### Embeddings

This notebook code is taken from François Chollet's book *Deep Learning with Python*, published by Manning and available [on Amazon](https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438/ref=sr_1_fkmr0_1?keywords=deep+learning+python+challot&qid=1573571371&sr=8-1-fkmr0). You can see the original notebook [on the book's GitHub](https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/3.5-classifying-movie-reviews.ipynb).

This notebook uses the IMDB movie review data set that is built into Keras to demonstrate adding an Embedding layer to a model.


In [6]:
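import keras
from keras.datasets import imdb

# keep only the 10,000 most frequent words as features
max_features = 10000

# each review is loaded as a list of integer word indices
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=max_features)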

The data is loaded as in the previous Keras IMDB notebook. That notebook used vectorized one-hot representations of the presence or absence of words in each example. This notebook shows a simple approach that represents 20 words of each review with an embedding layer. Note that `pad_sequences`, used in the next cell, truncates from the front with its default arguments, so the 20 words kept from a longer review are its last 20.

In [7]:
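from keras import preprocessing

# number of words kept per review
maxlen = 20

# turn the lists of integers into 2D integer arrays of shape (samples, maxlen);
# by default, shorter reviews are zero-padded and longer ones truncated at the front
train_data = preprocessing.sequence.pad_sequences(train_data, maxlen=maxlen)
test_data = preprocessing.sequence.pad_sequences(test_data, maxlen=maxlen)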
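
With its default arguments, `pad_sequences` both pads and truncates at the front of each sequence. The following is a minimal pure-Python sketch of that behavior, not the Keras implementation; the helper name `pad_or_truncate` and the toy sequences are made up for illustration.

```python
def pad_or_truncate(sequences, maxlen, value=0):
    """Imitate the default pad_sequences behavior (assumes maxlen >= 1)."""
    result = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]                 # drop items from the front of long sequences
        padding = [value] * (maxlen - len(kept))   # zeros needed to reach maxlen
        result.append(padding + kept)              # padding goes in front of short sequences
    return result


short_review = [5, 6, 7]               # hypothetical word indices
long_review = list(range(1, 11))       # 1, 2, ..., 10
padded = pad_or_truncate([short_review, long_review], maxlen=5)
print(padded)
```

The short sequence gains leading zeros, while the long one keeps only its final five items.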

In [9]:
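# set up the Embedding layer in a Sequential model
from keras.models import Sequential
from keras.layers import Flatten, Dense, Embedding

model = Sequential()
# map each of the 10,000 word indices to an 8-dimensional vector;
# output shape is (samples, maxlen, 8)
model.add(Embedding(max_features, 8, input_length=maxlen))
# flatten the 3D tensor of embeddings into shape (samples, maxlen * 8)
model.add(Flatten())
# single sigmoid unit for binary (positive/negative) classification
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.summary()

# hold out 20% of the training data for validation
history = model.fit(train_data, train_labels, epochs=10, batch_size=32, validation_split=0.2)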



Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 20, 8)             80000     
_________________________________________________________________
flatten_3 (Flatten)          (None, 160)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 161       
Total params: 80,161
Trainable params: 80,161
Non-trainable params: 0
_________________________________________________________________
Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
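
The shapes and parameter counts in the summary above can be checked by hand. The following NumPy sketch imitates the forward pass of the three layers with random, untrained weights; it is an illustration of what the layers compute, not Keras code, and all variable names are assumptions made for this example.

```python
import numpy as np

max_features, embedding_dim, maxlen = 10000, 8, 20
rng = np.random.default_rng(0)

# Embedding layer: one 8-dimensional row per word index (10000 * 8 = 80,000 weights)
embedding_matrix = rng.normal(scale=0.1, size=(max_features, embedding_dim))
# Dense layer: 160 weights plus 1 bias = 161 parameters
dense_weights = rng.normal(scale=0.1, size=(maxlen * embedding_dim, 1))
dense_bias = np.zeros(1)

# a fake batch of 4 padded reviews, each holding 20 word indices
batch = rng.integers(0, max_features, size=(4, maxlen))

embedded = embedding_matrix[batch]               # row lookup -> (4, 20, 8)
flattened = embedded.reshape(len(batch), -1)     # Flatten    -> (4, 160)
logits = flattened @ dense_weights + dense_bias  # Dense      -> (4, 1)
probs = 1.0 / (1.0 + np.exp(-logits))            # sigmoid activation

n_params = embedding_matrix.size + dense_weights.size + dense_bias.size
print(embedded.shape, flattened.shape, probs.shape, n_params)
```

The embedding step is just a table lookup: each integer in the batch selects one row of the matrix, which is why the layer needs `max_features * 8` parameters regardless of the review length.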


The validation accuracy reached 75% with this approach. Like a bag-of-words model, it still considers each word in isolation. More powerful approaches, such as a 1D convolutional layer or recurrent layers, can capture relationships between words. Both of these approaches are demonstrated in other notebooks.

### Embeddings

One-hot vectors are sparse, binary, and high-dimensional, with one dimension per word in the vocabulary. A lower-dimensional alternative is to learn dense word embeddings from the data. Word embeddings are vector representations learned from word co-occurrence. Words that tend to occur together are probably related in some way, so their vectors should be similar.
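
To make the contrast concrete, the sketch below compares cosine similarity for one-hot vectors and for dense vectors. The three-word vocabulary and the 2-dimensional "embeddings" are hand-picked toy values, not learned ones, chosen only to show that dense vectors can encode similarity while one-hot vectors cannot.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


vocab = ["good", "great", "terrible"]

# one-hot: each word gets its own axis, so every pair of words is orthogonal
one_hot = np.eye(len(vocab))

# toy dense vectors (made up for illustration, NOT learned from data)
toy_embeddings = {
    "good": np.array([0.9, 0.1]),
    "great": np.array([0.8, 0.2]),
    "terrible": np.array([-0.9, 0.1]),
}

sim_one_hot = cosine(one_hot[0], one_hot[1])
sim_related = cosine(toy_embeddings["good"], toy_embeddings["great"])
sim_opposed = cosine(toy_embeddings["good"], toy_embeddings["terrible"])
print(sim_one_hot, sim_related, sim_opposed)
```

Any two distinct one-hot vectors have a similarity of zero, whereas the dense vectors place "good" close to "great" and far from "terrible".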


Word embeddings can be learned jointly with the rest of the model during training. The example above did this, with limited results because the data set was small, each review was cut to 20 words, and the model was not complex enough.

Another option is to use pretrained embeddings from other sources. Ideally, embeddings would be either trained on, or pretrained in, the domain of the problem. Fortunately, pretrained embeddings tend to be general enough for many applications. Popular pretrained embeddings include Word2vec and GloVe.
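
As a rough sketch of how pretrained vectors are typically prepared, the code below parses text in the GloVe file layout (a word followed by its vector components on each line) and fills an embedding matrix for a vocabulary. The three sample lines, the `word_index` mapping, and the sizes are invented for illustration; a real GloVe file has hundreds of thousands of lines and 50 or more dimensions.

```python
import io
import numpy as np

# tiny made-up stand-in for a GloVe text file
fake_glove_file = io.StringIO(
    "movie 0.1 0.2 0.3\n"
    "great 0.4 0.5 0.6\n"
    "awful -0.4 -0.5 -0.6\n"
)

# map each word to its pretrained vector
embeddings_index = {}
for line in fake_glove_file:
    word, *values = line.split()
    embeddings_index[word] = np.asarray(values, dtype="float32")

# hypothetical vocabulary: word -> integer index (0 is reserved for padding)
word_index = {"movie": 1, "great": 2, "unseen": 3}
max_words, embedding_dim = 4, 3

# rows for words without a pretrained vector stay all zeros
embedding_matrix = np.zeros((max_words, embedding_dim), dtype="float32")
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if i < max_words and vector is not None:
        embedding_matrix[i] = vector

print(embedding_matrix)
```

A matrix built this way has the same shape as the weights of an Embedding layer with `max_words` inputs and `embedding_dim` outputs, so it can be loaded into such a layer and, if desired, frozen during training.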