## Embeddings

Let's analyze what happens in an RNN when we feed it words in a one-hot representation.

$$ h_t = f( W^h * h_{t-1} + W^x * x_t + b)$$

So if $x_t$ is a one-hot vector with a one at position $i$, then:

$$ W^x * x_t = W^x[:,i] $$

The information contributed by a word therefore boils down to selecting the corresponding column of the weight matrix.

In other words, the $i$-th column of the weight matrix is, in a sense, the representation of word $i$.
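The identity above can be checked numerically. The snippet below is a minimal sketch with toy sizes chosen only for illustration (`vocab_size`, `hidden_size` and the random `W_x` are assumptions, not values from this notebook): it compares the full matrix-vector product with a plain column lookup.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, hidden_size = 6, 4                    # toy sizes, for illustration only
W_x = rng.normal(size=(hidden_size, vocab_size))  # input weight matrix W^x

i = 2                                             # id of the current word
x_t = np.zeros(vocab_size)
x_t[i] = 1.0                                      # one-hot vector with a one at position i

product = W_x @ x_t                               # full matrix-vector product
column = W_x[:, i]                                # direct lookup of the i-th column

print(np.allclose(product, column))
```

The lookup gives the same vector without multiplying by all the zeros, which is the motivation for storing word representations in a separate table.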

So let's go one step further: let's add an extra layer to the network that holds the word representations, which are then passed on to compute the hidden state.


A network with an "embedding" layer then has the form:

<br>

<br>

$x_t$ - id of the input word at time step $t$.

$EMB$ - embedding matrix

<br>

$$emb_t = EMB[x_t]$$
$$ h_t = f( W^h * h_{t-1} + W^x * emb_t + b)$$

<br>

This layer is called the embedding layer.
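As a rough illustration of the two formulas above, here is a minimal NumPy sketch of a forward pass through a recurrent network with an embedding table. The sizes, the random weights and the choice of $f = \tanh$ are assumptions made for the example; `rnn_forward` is a hypothetical helper, not part of Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, emb_dim, hidden_size = 10, 3, 4        # toy sizes (assumptions)
EMB = rng.normal(size=(vocab_size, emb_dim))       # one row per word id
W_h = rng.normal(size=(hidden_size, hidden_size))  # recurrent weights W^h
W_x = rng.normal(size=(hidden_size, emb_dim))      # input weights W^x
b = np.zeros(hidden_size)

def rnn_forward(word_ids):
    """Return the last hidden state for a sequence of word ids."""
    h = np.zeros(hidden_size)                      # initial hidden state h_0
    for x_t in word_ids:
        emb_t = EMB[x_t]                           # emb_t = EMB[x_t]
        h = np.tanh(W_h @ h + W_x @ emb_t + b)     # h_t = f(W^h h_{t-1} + W^x emb_t + b)
    return h

h_last = rnn_forward([1, 7, 3, 3])
print(h_last.shape)
```

Note that `EMB[x_t]` is a row lookup; it gives the same result as multiplying a one-hot row vector by `EMB`, only without building the one-hot vector.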


<img src="Grafika/embeddings.jpg" width="700">
Source: https://www.slideshare.net/Geeks_Lab/aibigdata-lab-2016-62764857



### Note that the embeddings are parameters of the network and, at the same time, representations of words. This means that by training the network we learn the embeddings, that is, we learn word representations.


### Case study: IMDB

In [8]:
import numpy as np
from keras.preprocessing import sequence

from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, SimpleRNN, LSTM, Bidirectional

from keras.callbacks import EarlyStopping

from keras.datasets import imdb

### 5000 words

In [3]:
max_features = 5000

In [4]:
(X_train,y_train),(X_test,y_test) = imdb.load_data(num_words=max_features)

In [5]:
X_train.shape

(25000,)

In [6]:
len(X_train[1])

189

In [7]:
print(X_train[1])

[1, 194, 1153, 194, 2, 78, 228, 5, 6, 1463, 4369, 2, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 2, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 2, 5, 163, 11, 3215, 2, 4, 1153, 9, 194, 775, 7, 2, 2, 349, 2637, 148, 605, 2, 2, 15, 123, 125, 68, 2, 2, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 2, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 2, 5, 2, 656, 245, 2350, 5, 4, 2, 131, 152, 491, 18, 2, 32, 2, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]


Note that the sequences always start with "1": this is the start-of-sequence marker. The "start of sequence" token therefore gets its own embedding. Thanks to this, when the network processes the first word it can more easily take into account the fact that this word comes first.
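With its default arguments, the Keras IMDB loader reserves index 0 for padding, 1 for the start of a sequence and 2 for out-of-vocabulary words, and shifts the indices of real words by 3. This is why the review above contains so many 2s when the vocabulary is limited to 5000 words. The toy encoder below is only a sketch of that convention: `word_rank` and `encode` are hypothetical and do not come from the Keras API.

```python
START, OOV, INDEX_FROM = 1, 2, 3   # mirrors the defaults of imdb.load_data

# Hypothetical frequency ranks (1 = most frequent word)
word_rank = {"the": 1, "movie": 2, "was": 3, "great": 4, "flabbergasting": 5}

def encode(words, num_words):
    """Encode words as ids, replacing ids outside the vocabulary with OOV."""
    ids = [START]                                   # every sequence starts with 1
    for w in words:
        rank = word_rank.get(w)
        idx = rank + INDEX_FROM if rank is not None else OOV
        ids.append(idx if idx < num_words else OOV)  # too-rare words become 2
    return ids

encoded = encode(["the", "movie", "was", "flabbergasting"], num_words=7)
print(encoded)  # [1, 4, 5, 6, 2]
```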

Standardizing the sequence length: shorter sequences are padded with zeros and longer ones are truncated, so that all of them have the same length `max_len`.

In [10]:
max_len = 400

In [11]:
X_train = sequence.pad_sequences(X_train,maxlen=max_len)
X_test = sequence.pad_sequences(X_test,maxlen=max_len)

In [12]:
X_train.shape

(25000, 400)
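By default, `pad_sequences` pads at the front (`padding='pre'`) and also truncates from the front (`truncating='pre'`), so a review longer than `max_len` keeps only its last `max_len` tokens. The function below is a simplified NumPy sketch of that default behaviour on toy data; `pad_pre` is a hypothetical name, not the Keras implementation.

```python
import numpy as np

def pad_pre(sequences, maxlen):
    """Left-pad with zeros and keep only the last `maxlen` tokens of each sequence."""
    out = np.zeros((len(sequences), maxlen), dtype=np.int32)
    for row, seq in enumerate(sequences):
        trimmed = seq[-maxlen:]                 # drop tokens from the front if too long
        if trimmed:
            out[row, -len(trimmed):] = trimmed  # zeros stay in front of the tokens
    return out

padded = pad_pre([[1, 5, 9], [1, 4, 8, 3, 7, 6]], maxlen=4)
print(padded)
# [[0 1 5 9]
#  [8 3 7 6]]
```

In the second row the start marker `1` is lost, because truncation removes the beginning of the sequence.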

In [13]:
n_train = 5000
n_test = 2000

In [14]:
X_train = X_train[:n_train]
y_train = y_train[:n_train]
X_test = X_test[:n_test]
y_test = y_test[:n_test]

### Plain recurrent network (with embeddings)

In [15]:
embeding_dims = 32

In [17]:
model = Sequential()
model.add(Embedding(max_features,embeding_dims,input_length=max_len))
model.add(SimpleRNN(100,return_sequences=False))# many-to-one
model.add(Dense(1,activation='sigmoid'))

model.compile(loss='binary_crossentropy',metrics=['accuracy'],optimizer='adam')

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 400, 32)           160000    
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, 100)               13300     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 101       
Total params: 173,401
Trainable params: 173,401
Non-trainable params: 0
_________________________________________________________________
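The parameter counts in the summary can be reproduced by hand. The helper functions below are hypothetical names used only for this arithmetic: the embedding table has one row of 32 numbers per word, the simple RNN has the matrices $W^x$ and $W^h$ plus a bias, and the dense layer has one weight per input plus a bias.

```python
def embedding_params(vocab_size, emb_dim):
    # one emb_dim-dimensional vector per word id
    return vocab_size * emb_dim

def simple_rnn_params(input_dim, units):
    # W^x (input_dim x units) + W^h (units x units) + bias b (units)
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # weight matrix (input_dim x units) + bias (units)
    return units * (input_dim + 1)

counts = [embedding_params(5000, 32),
          simple_rnn_params(32, 100),
          dense_params(100, 1)]
print(counts, sum(counts))  # [160000, 13300, 101] 173401
```

Most of the parameters sit in the embedding table, not in the recurrent layer.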


In [18]:
early_stoping = EarlyStopping(patience=5)
model.fit(X_train,y_train,batch_size=32,epochs=100,callbacks=[early_stoping],validation_split=0.25,verbose=2)

Train on 3750 samples, validate on 1250 samples
Epoch 1/100
 - 26s - loss: 0.6929 - acc: 0.5320 - val_loss: 0.6845 - val_acc: 0.5568
Epoch 2/100
 - 25s - loss: 0.6251 - acc: 0.6997 - val_loss: 0.6542 - val_acc: 0.6056
Epoch 3/100
 - 25s - loss: 0.5279 - acc: 0.7717 - val_loss: 0.6418 - val_acc: 0.6024
Epoch 4/100
 - 26s - loss: 0.5350 - acc: 0.7819 - val_loss: 0.6215 - val_acc: 0.6400
Epoch 5/100
 - 26s - loss: 0.3823 - acc: 0.8344 - val_loss: 0.5384 - val_acc: 0.7504
Epoch 6/100
 - 26s - loss: 0.2556 - acc: 0.9011 - val_loss: 0.5771 - val_acc: 0.7248
Epoch 7/100
 - 25s - loss: 0.1807 - acc: 0.9344 - val_loss: 0.6936 - val_acc: 0.7416
Epoch 8/100
 - 25s - loss: 0.0940 - acc: 0.9680 - val_loss: 0.7869 - val_acc: 0.7256
Epoch 9/100
 - 25s - loss: 0.0469 - acc: 0.9872 - val_loss: 0.7189 - val_acc: 0.7704
Epoch 10/100
 - 26s - loss: 0.0463 - acc: 0.9904 - val_loss: 0.8962 - val_acc: 0.7552


<keras.callbacks.History at 0x7f8f59681c88>
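Training stopped after epoch 10 because `EarlyStopping(patience=5)` watches `val_loss` by default and gives up after 5 consecutive epochs without improvement. The sketch below replays that rule on the `val_loss` values from the log above; `stopping_epoch` is a hypothetical helper that assumes `min_delta=0`, not the Keras implementation.

```python
def stopping_epoch(val_losses, patience):
    """Return (epoch at which training stops, epoch with the best val_loss), 1-based."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0   # improvement: reset the counter
        else:
            wait += 1                                 # one more epoch without improvement
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# val_loss values copied from the training log above
val_loss = [0.6845, 0.6542, 0.6418, 0.6215, 0.5384,
            0.5771, 0.6936, 0.7869, 0.7189, 0.8962]
print(stopping_epoch(val_loss, patience=5))  # (10, 5)
```

The best validation loss was reached at epoch 5, but the model keeps the weights from the last epoch. Newer Keras versions offer `restore_best_weights=True` in `EarlyStopping` to roll back to the best epoch; whether it is available depends on the installed version.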

In [19]:
print(model.evaluate(X_train,y_train,verbose=2))
print(model.evaluate(X_test,y_test,verbose=2))

[0.23753432908952235, 0.9356]
[0.9043297662734985, 0.7475]


### Simple RNN + dense layer between the returned hidden state and the output

In [28]:
model2 = Sequential()
model2.add(Embedding(max_features,embeding_dims,input_length=max_len))
model2.add(SimpleRNN(100,return_sequences=False))# many-to-one
model2.add(Dense(50,activation='tanh'))
model2.add(Dense(1,activation='sigmoid'))

model2.compile(loss='binary_crossentropy',metrics=['accuracy'],optimizer='adam')

model2.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (None, 400, 32)           160000    
_________________________________________________________________
simple_rnn_5 (SimpleRNN)     (None, 100)               13300     
_________________________________________________________________
dense_5 (Dense)              (None, 50)                5050      
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 51        
Total params: 178,401
Trainable params: 178,401
Non-trainable params: 0
_________________________________________________________________


In [29]:
early_stoping = EarlyStopping(patience=5)
model2.fit(X_train,y_train,batch_size=32,epochs=100,callbacks=[early_stoping],validation_split=0.25,verbose=2)

Train on 3750 samples, validate on 1250 samples
Epoch 1/100
 - 13s - loss: 0.7024 - acc: 0.5072 - val_loss: 0.6954 - val_acc: 0.5072
Epoch 2/100
 - 12s - loss: 0.7006 - acc: 0.5165 - val_loss: 0.6887 - val_acc: 0.5480
Epoch 3/100
 - 12s - loss: 0.6892 - acc: 0.5349 - val_loss: 0.6845 - val_acc: 0.5544
Epoch 4/100
 - 12s - loss: 0.6668 - acc: 0.5920 - val_loss: 0.6676 - val_acc: 0.5616
Epoch 5/100
 - 12s - loss: 0.6063 - acc: 0.6795 - val_loss: 0.6780 - val_acc: 0.5896
Epoch 6/100
 - 16s - loss: 0.5345 - acc: 0.7189 - val_loss: 0.7133 - val_acc: 0.5824
Epoch 7/100
 - 16s - loss: 0.4854 - acc: 0.7456 - val_loss: 0.7807 - val_acc: 0.5768
Epoch 8/100
 - 14s - loss: 0.4503 - acc: 0.7755 - val_loss: 0.7526 - val_acc: 0.5936
Epoch 9/100
 - 15s - loss: 0.4077 - acc: 0.7915 - val_loss: 0.8054 - val_acc: 0.5976


<keras.callbacks.History at 0x7fd9c0334b38>

In [30]:
model2.evaluate(X_test,y_test,verbose=2)

[0.32980653083324435, 0.846]

## Two-layer recurrent network

In [44]:
model3 = Sequential()
model3.add(Embedding(max_features,embeding_dims,input_length=max_len))
model3.add(SimpleRNN(100,return_sequences=True))
model3.add(SimpleRNN(50,return_sequences=False))# many-to-one
model3.add(Dense(1,activation='sigmoid'))

model3.compile(loss='binary_crossentropy',metrics=['accuracy'],optimizer='adam')
early_stoping = EarlyStopping(patience=5)
model3.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_8 (Embedding)      (None, 400, 32)           160000    
_________________________________________________________________
simple_rnn_8 (SimpleRNN)     (None, 400, 100)          13300     
_________________________________________________________________
simple_rnn_9 (SimpleRNN)     (None, 50)                7550      
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 51        
Total params: 180,901
Trainable params: 180,901
Non-trainable params: 0
_________________________________________________________________


In [45]:
model3.fit(X_train,y_train,batch_size=32,epochs=100,callbacks=[early_stoping],validation_split=0.25,verbose=2)

Train on 3750 samples, validate on 1250 samples
Epoch 1/100
 - 26s - loss: 0.7007 - acc: 0.5037 - val_loss: 0.6911 - val_acc: 0.5144
Epoch 2/100
 - 31s - loss: 0.6153 - acc: 0.6589 - val_loss: 0.7629 - val_acc: 0.5304
Epoch 3/100
 - 29s - loss: 0.2996 - acc: 0.8824 - val_loss: 0.8891 - val_acc: 0.5312
Epoch 4/100
 - 27s - loss: 0.0761 - acc: 0.9765 - val_loss: 1.5083 - val_acc: 0.5152
Epoch 5/100
 - 31s - loss: 0.0236 - acc: 0.9949 - val_loss: 1.6281 - val_acc: 0.5304
Epoch 6/100
 - 27s - loss: 0.0056 - acc: 0.9992 - val_loss: 1.9399 - val_acc: 0.5232


<keras.callbacks.History at 0x7fd9c0e502b0>

In [None]:
# the model learns almost the entire training text by heart (it overfits)

In [46]:
model3.evaluate(X_test,y_test,verbose=2)

[1.9941312866210938, 0.5165]

## Bidirectional recurrent network

In [47]:
model4 = Sequential()
model4.add(Embedding(max_features,embeding_dims,input_length=max_len))
model4.add(Bidirectional(SimpleRNN(100,return_sequences=True)))
model4.add(SimpleRNN(50,return_sequences=False))# many-to-one
model4.add(Dense(1,activation='sigmoid'))

model4.compile(loss='binary_crossentropy',metrics=['accuracy'],optimizer='adam')
early_stoping = EarlyStopping(patience=5)
model4.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_9 (Embedding)      (None, 400, 32)           160000    
_________________________________________________________________
bidirectional_1 (Bidirection (None, 400, 200)          26600     
_________________________________________________________________
simple_rnn_11 (SimpleRNN)    (None, 50)                12550     
_________________________________________________________________
dense_9 (Dense)              (None, 1)                 51        
Total params: 199,201
Trainable params: 199,201
Non-trainable params: 0
_________________________________________________________________
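The output shape `(None, 400, 200)` comes from running two independent RNNs with 100 units each, one reading the sequence left to right and one right to left, and concatenating their hidden states at every time step. That also explains the parameter count: 2 * 13300 = 26600. The NumPy sketch below illustrates the idea with toy sizes, random weights and $f = \tanh$, all of which are assumptions; `make_rnn` and `run_rnn` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, units, seq_len = 3, 4, 5                # toy sizes (assumptions)

def make_rnn():
    """Create random parameters (W^h, W^x, b) for one simple RNN."""
    return (rng.normal(size=(units, units)),
            rng.normal(size=(units, emb_dim)),
            np.zeros(units))

def run_rnn(params, inputs):
    """Return the hidden states for all time steps, shape (len(inputs), units)."""
    W_h, W_x, b = params
    h, states = np.zeros(units), []
    for x_t in inputs:
        h = np.tanh(W_h @ h + W_x @ x_t + b)
        states.append(h)
    return np.stack(states)

inputs = rng.normal(size=(seq_len, emb_dim))     # stand-in for embedded words
forward = run_rnn(make_rnn(), inputs)
# the backward RNN reads the reversed sequence; its states are flipped back
# so that row t again corresponds to time step t
backward = run_rnn(make_rnn(), inputs[::-1])[::-1]
bi_states = np.concatenate([forward, backward], axis=-1)
print(bi_states.shape)  # (5, 8)
```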


In [48]:
model4.fit(X_train,y_train,batch_size=32,epochs=100,callbacks=[early_stoping],validation_split=0.25,verbose=2)

Train on 3750 samples, validate on 1250 samples
Epoch 1/100
 - 54s - loss: 0.7013 - acc: 0.5163 - val_loss: 0.6807 - val_acc: 0.5552
Epoch 2/100
 - 52s - loss: 0.6301 - acc: 0.6480 - val_loss: 0.6615 - val_acc: 0.6312
Epoch 3/100
 - 52s - loss: 0.4729 - acc: 0.7787 - val_loss: 0.7163 - val_acc: 0.6448
Epoch 4/100
 - 53s - loss: 0.3517 - acc: 0.8525 - val_loss: 0.7211 - val_acc: 0.6632
Epoch 5/100
 - 53s - loss: 0.2696 - acc: 0.8957 - val_loss: 0.7957 - val_acc: 0.6848
Epoch 6/100
 - 53s - loss: 0.2045 - acc: 0.9245 - val_loss: 0.8354 - val_acc: 0.6864
Epoch 7/100
 - 52s - loss: 0.1648 - acc: 0.9400 - val_loss: 0.9497 - val_acc: 0.6704


<keras.callbacks.History at 0x7fd9c0e50550>

In [49]:
model4.evaluate(X_test,y_test,verbose=2)

[1.004870327949524, 0.651]

# LSTM

On a GPU (for example in Google Colab): use `CuDNNLSTM` instead of `LSTM`.

In [52]:
model5 = Sequential()
model5.add(Embedding(max_features,embeding_dims,input_length=max_len))
model5.add(LSTM(100,return_sequences=False))# many-to-one
model5.add(Dense(1,activation='sigmoid'))

model5.compile(loss='binary_crossentropy',metrics=['accuracy'],optimizer='adam')
early_stoping = EarlyStopping(patience=1)
model5.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_11 (Embedding)     (None, 400, 32)           160000    
_________________________________________________________________
lstm_2 (LSTM)                (None, 100)               53200     
_________________________________________________________________
dense_11 (Dense)             (None, 1)                 101       
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
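The LSTM layer has exactly four times as many parameters as a `SimpleRNN` with the same input size and number of units (4 * 13300 = 53200), because each of its four internal blocks (input gate, forget gate, output gate and cell candidate) has its own input weights, recurrent weights and bias. A quick check, with `lstm_params` as a hypothetical helper:

```python
def lstm_params(input_dim, units):
    # four blocks (input, forget and output gates plus the cell candidate),
    # each with its own W^x, W^h and bias
    return 4 * units * (input_dim + units + 1)

lstm_count = lstm_params(32, 100)
print(lstm_count)  # 53200
```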


In [53]:
model5.fit(X_train,y_train,batch_size=32,epochs=100,callbacks=[early_stoping],validation_split=0.25,verbose=2)

Train on 3750 samples, validate on 1250 samples
Epoch 1/100
 - 57s - loss: 0.6797 - acc: 0.5995 - val_loss: 0.6361 - val_acc: 0.7200
Epoch 2/100
 - 61s - loss: 0.4379 - acc: 0.8123 - val_loss: 0.4132 - val_acc: 0.8336
Epoch 3/100
 - 57s - loss: 0.3032 - acc: 0.8880 - val_loss: 0.4614 - val_acc: 0.7856


<keras.callbacks.History at 0x7fd9b2126208>

In [54]:
model5.evaluate(X_test,y_test,verbose=2)

[0.4555453841686249, 0.791]