<img src="./pics/DL.png" width=110 align="left" style="margin-right: 10px">

# Introduction to Deep Learning

## 08. Embeddings

---

## [Word](https://keras.io/layers/embeddings/) [Embeddings](https://radimrehurek.com/gensim/models/word2vec.html)

<div style="display: inline-block;">
<img src="https://adriancolyer.files.wordpress.com/2016/04/word2vec-king-queen-vectors.png?w=566&zoom=2" width=400 align='left'>
<img src="https://adriancolyer.files.wordpress.com/2016/04/word2vec-king-queen-composition.png?w=566&zoom=2" width=400 align='left'>
</div>

<div style='align: clear'/>
<br>Images from <a href="https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/">the morning paper</a>

> _Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers. Conceptually it involves a mathematical embedding from a space with one dimension per word to a continuous vector space with a much lower dimension._  
_Methods to generate this mapping include neural networks, dimensionality reduction on the word co-occurrence matrix, probabilistic models, explainable knowledge base method, and explicit representation in terms of the context in which words appear._ - [source](https://en.wikipedia.org/wiki/Word_embedding)

The intuition behind the model is that words appearing in similar contexts tend to have similar meanings.
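
As a rough illustration of this intuition, the sketch below counts the words that surround each target word in a tiny made-up corpus and compares the resulting count vectors with cosine similarity. The corpus and the helper names `context_counts` and `cosine` are assumptions made for this example, and raw co-occurrence counts are only a crude stand-in for learned embeddings.

```python
from collections import Counter
from math import sqrt

# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "the car needs fuel",
]


def context_counts(target, sentences, window=2):
    """Count the words found within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            if word != target:
                continue
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            counts.update(words[j] for j in range(lo, hi) if j != i)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)  # a Counter returns 0 for missing keys
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


cat, dog, car = (context_counts(w, corpus) for w in ("cat", "dog", "car"))
print("cat vs dog:", cosine(cat, dog))
print("cat vs car:", cosine(cat, car))
```

In this toy corpus "cat" and "dog" share more context words than "cat" and "car", so the first similarity comes out higher.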

#### Training

<div style="display: inline-block;">
<img src="http://mccormickml.com/assets/word2vec/training_data.png" width=300 align='left'>
<img src="http://mccormickml.com/assets/word2vec/skip_gram_net_arch.png" width=300 align='left'>
<img src="http://mccormickml.com/assets/word2vec/word2vec_weight_matrix_lookup_table.png" width=300 align='left'>
</div>

<div style='align: clear'/>
<br>Images from <a href="http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/">Word2Vec Tutorial - The Skip-Gram Model</a>, by <a href="http://mccormickml.com/">Chris McCormick</a>
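
The lookup-table view in the last image can be checked with a small sketch: multiplying a one-hot vector by the weight matrix selects exactly one row of that matrix. The matrix `W` and its values are hypothetical, chosen only to keep the example readable.

```python
# Hypothetical weight matrix: 5 vocabulary words, 3 embedding dimensions.
W = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
    [1.3, 1.4, 1.5],
]


def one_hot_vector(index, size):
    """Return a vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vector_times_matrix(v, m):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(v[i] * m[i][j] for i in range(len(m))) for j in range(len(m[0]))]


word_index = 3
via_matmul = vector_times_matrix(one_hot_vector(word_index, len(W)), W)
via_lookup = W[word_index]

print(via_matmul)
print(via_lookup)
```

Because both routes give the same vector, an embedding layer can skip the multiplication and simply index into its weight matrix.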

There are two approaches to learning word embeddings:
- the continuous bag-of-words (CBOW) architecture: the model predicts the selected word from the context words in the surrounding window (the order of the context words does not matter)
- the skip-gram architecture: the model predicts the context words from the selected word (nearby context words are weighted more heavily than distant ones)
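
The two architectures differ mainly in how training examples are built from a sliding window. The sketch below generates both kinds of examples from a tokenized sentence. The function names `skipgram_pairs` and `cbow_pairs` are made up for this illustration, and the distance weighting of context words is not modelled.

```python
def skipgram_pairs(tokens, window=2):
    """Return (selected word, context word) pairs: one pair per context word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=2):
    """Return (context words, selected word) pairs: one pair per position.

    The context is sorted to stress that CBOW ignores word order.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tuple(sorted(tokens[j] for j in range(lo, hi) if j != i))
        pairs.append((context, center))
    return pairs


tokens = "the quick brown fox jumps".split()
print(skipgram_pairs(tokens, window=1))
print(cbow_pairs(tokens, window=1))
```

Skip-gram yields one training example per (selected word, context word) combination, whereas CBOW yields one example per position with the whole context as input.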


#### Further reading:

- https://www.tensorflow.org/tutorials/representation/word2vec#vector-representations-of-words
- https://www.quora.com/How-does-word2vec-work-Can-someone-walk-through-a-specific-example
- http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
- https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa
- https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526
- https://hackernoon.com/word-embeddings-in-nlp-and-its-applications-fab15eaf7430
- https://blog.cambridgespark.com/tutorial-build-your-own-embedding-and-use-it-in-a-neural-network-e9cde4a81296
- https://skymind.ai/wiki/word2vec
- https://github.com/anvaka/word2vec-graph
- https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/
- https://heartbeat.fritz.ai/using-a-keras-embedding-layer-to-handle-text-data-2c88dc019600
- [Google Word2Vec](https://code.google.com/archive/p/word2vec/)


---

### In Practice

#### Learning simple word-embeddings

In [None]:
import numpy as np

from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Embedding  # the keras.layers.embeddings path is missing in newer Keras releases

In [None]:
docs = ['Well done!',
        'Good work',
        'Great effort',
        'nice work',
        'Excellent!',
        'Weak',
        'Poor effort!',
        'not good',
        'poor work',
        'Could have done better.']

labels = np.array([1, 1, 1, 1, 1,
                   0, 0, 0, 0, 0])

In [None]:
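# integer encode the documents: despite its name, one_hot hashes each word
# to an integer index below vocab_size, so distinct words may collide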
vocab_size = 50
encoded_docs = [one_hot(d, vocab_size) for d in docs]
print(encoded_docs)
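
To make the encoding step less opaque, here is a simplified sketch of the hashing trick that `one_hot` relies on: each word is lower-cased, hashed, and mapped to an index between 1 and `n - 1`, leaving 0 free for padding. The function `hashed_encode`, its regular-expression tokenizer, and the choice of MD5 as a stable hash are assumptions of this example, so its indices will generally differ from the ones Keras prints.

```python
import re
from hashlib import md5


def hashed_encode(text, n):
    """Map each word of `text` to an integer in [1, n - 1] via a stable hash."""
    words = re.findall(r"[a-z0-9']+", text.lower())  # simplified tokenizer
    return [int(md5(w.encode("utf-8")).hexdigest(), 16) % (n - 1) + 1 for w in words]


for doc in ["Good work", "nice work", "Poor effort!", "poor work"]:
    print(doc, "->", hashed_encode(doc, 50))
```

The same word always receives the same index, but two different words can share one, which is why `vocab_size` is usually chosen well above the true vocabulary size.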

In [None]:
# pad documents to a max length of 4 words
max_length = 4
padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
print(padded_docs)
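
The padding step can be pictured with a few lines of plain Python. This is a minimal sketch of post-padding, assuming a positive `maxlen`. The helper `pad_post` is hypothetical, and it drops items from the front of over-long sequences, which mirrors what the Keras default `truncating='pre'` is understood to do.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad each sequence at the end with `value` up to `maxlen` items.

    Sequences longer than `maxlen` keep only their last `maxlen` items.
    """
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # truncate from the front if too long
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded


print(pad_post([[5, 3], [7], [1, 2, 3, 4, 5]], maxlen=4))
```

Every row ends up with the same length, which is what lets the documents be stacked into a single array for the model.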

In [None]:
# define the model
model = Sequential([
    Embedding(vocab_size, 8, input_length=max_length),
    Flatten(),
    Dense(1, activation='sigmoid')
])
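
As a sanity check on the model size, the number of trainable parameters can be worked out by hand from the values used above. This is plain arithmetic rather than a Keras call; the variable names are chosen for this sketch.

```python
vocab_size, embedding_dim, max_length = 50, 8, 4

# Embedding: one vector of length embedding_dim per vocabulary index.
embedding_params = vocab_size * embedding_dim

# Flatten: concatenates the max_length word vectors, no parameters.
flatten_units = max_length * embedding_dim

# Dense(1): one weight per flattened input plus one bias.
dense_params = flatten_units * 1 + 1

total_params = embedding_params + dense_params
print(embedding_params, flatten_units, dense_params, total_params)
```

These figures should match the `Param #` column of the model summary.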

In [None]:
# compile the model
model.compile(optimizer='adam', 
              loss='binary_crossentropy', 
              metrics=['accuracy'])

In [None]:
# summarize the model (summary() prints the table itself and returns None)
model.summary()

In [None]:
# fit the model
model.fit(padded_docs, labels, epochs=50, verbose=0)

In [None]:
# evaluate the model
score = model.evaluate(padded_docs, labels, verbose=0)
print('loss: {}, accuracy: {}'.format(*score))

#### Exercise: Classify the 20newsgroups dataset while building an embedding

---

### Further tutorials:
- https://www.pyimagesearch.com/2018/09/10/keras-tutorial-how-to-get-started-with-keras-deep-learning-and-python/
- https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/
- https://www.datacamp.com/community/tutorials/deep-learning-python
- https://elitedatascience.com/keras-tutorial-deep-learning-in-python
- https://www.guru99.com/keras-tutorial.html
- https://github.com/adventuresinML/adventures-in-ml-code