# Embeddings

We will use embeddings throughout this lab. An embedding is simply a weight matrix with one row per symbol:

In [1]:
import numpy as np

embedding_size = 4
vocab_size = 10

embedding = np.arange(embedding_size * vocab_size, dtype='float')
embedding = embedding.reshape(vocab_size, embedding_size)
print(embedding)

[[  0.   1.   2.   3.]
 [  4.   5.   6.   7.]
 [  8.   9.  10.  11.]
 [ 12.  13.  14.  15.]
 [ 16.  17.  18.  19.]
 [ 20.  21.  22.  23.]
 [ 24.  25.  26.  27.]
 [ 28.  29.  30.  31.]
 [ 32.  33.  34.  35.]
 [ 36.  37.  38.  39.]]


To access the embedding for a given symbol $i$, you may:
 - compute a one-hot encoding of $i$, then compute a dot product with the embedding matrix
 - simply index (slice) the embedding matrix by $i$, using numpy indexing

In [2]:
i = 3
onehot = np.zeros(vocab_size)
onehot[i] = 1.
onehot

array([ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.])

In [3]:
embedding_vector = np.dot(onehot, embedding)
print(embedding_vector)

[ 12.  13.  14.  15.]


In [4]:
print(embedding[i])

[ 12.  13.  14.  15.]
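
The same equivalence holds for several symbols at once. As a small sketch (the names `indices` and `onehots` are illustrative and not part of the lab), stacking one-hot rows and multiplying by the embedding matrix should give the same result as indexing with an array of integers:

```python
import numpy as np

embedding_size = 4
vocab_size = 10
embedding = np.arange(embedding_size * vocab_size, dtype='float')
embedding = embedding.reshape(vocab_size, embedding_size)

# Illustrative batch of symbol ids (an assumption for this example)
indices = np.array([0, 3, 9])

# Each row of the identity matrix is the one-hot encoding of its row index
onehots = np.eye(vocab_size)[indices]

via_dot = np.dot(onehots, embedding)   # shape (3, 4)
via_index = embedding[indices]         # shape (3, 4)

print(np.array_equal(via_dot, via_index))
```

Indexing avoids building the `(batch_size, vocab_size)` one-hot matrix, which matters once the vocabulary gets large.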


### The Embedding layer in Keras

In Keras, the `Embedding` layer has an extra parameter, `input_length`, which is typically used when the input is a sequence of symbols (think of a sequence of words). In our case, the length will always be 1.

```py
Embedding(output_dim=embedding_size, input_dim=vocab_size,
          input_length=sequence_length, name='my_embedding')
```
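
To get a feel for what the sequence length means for shapes, here is a plain numpy sketch of looking up a batch of sequences. It is only an analogy, not the Keras implementation; the `weights` matrix and the `sequences` array are made-up values for illustration:

```python
import numpy as np

embedding_size = 4
vocab_size = 10
sequence_length = 3

# Stand-in for the layer's weight matrix (illustrative values)
weights = np.arange(vocab_size * embedding_size, dtype='float32')
weights = weights.reshape(vocab_size, embedding_size)

# A batch of 2 sequences, each holding 3 symbol ids
sequences = np.array([[1, 2, 3],
                      [7, 8, 9]])

# Integer-array indexing looks up one row of weights per symbol id
embedded = weights[sequences]
print(embedded.shape)  # (2, 3, 4)
```

The result has one embedding vector per position of each sequence, so its shape is `(batch_size, sequence_length, embedding_size)`.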

In [5]:
from keras.layers import Embedding

embedding_layer = Embedding(
    output_dim=embedding_size, input_dim=vocab_size,
    input_length=1, name='my_embedding')

Using TensorFlow backend.


Let's use it as part of a Keras model:

In [6]:
from keras.layers import Input
from keras.models import Model

x = Input(shape=[1], name='input')
embedding = embedding_layer(x)
model = Model(input=x, output=embedding)
model.output_shape

(None, 1, 4)

The embedding weights are randomly initialized:

In [19]:
model.get_weights()

[array([[ 0.00010885,  0.04235527,  0.00520309,  0.0379991 ],
        [-0.0275134 ,  0.04514914, -0.03284432, -0.02928448],
        [ 0.0040774 , -0.01528351,  0.00097798,  0.00578408],
        [-0.01080111,  0.03560133, -0.03600425,  0.00415856],
        [-0.03910469,  0.04213334,  0.01893706, -0.0020257 ],
        [ 0.03863158, -0.00525854,  0.03180087,  0.02923371],
        [ 0.03271924, -0.02948186,  0.01590746, -0.02079001],
        [-0.01088799, -0.04269426, -0.00206385,  0.00180041],
        [ 0.0463028 ,  0.01722094,  0.02712139, -0.03782616],
        [-0.00826054,  0.04071173,  0.00534217,  0.03200009]], dtype=float32)]

In [21]:
model.predict([[0],
               [3]])

array([[[ 0.00010885,  0.04235527,  0.00520309,  0.0379991 ]],

       [[-0.01080111,  0.03560133, -0.03600425,  0.00415856]]], dtype=float32)

The output of an embedding layer is therefore a 3-d tensor of shape `(batch_size, sequence_length, embedding_size)`.
To remove the sequence dimension, which is unnecessary in our case since its length is always 1, we use the `Flatten()` layer:

In [22]:
from keras.layers import Flatten

x = Input(shape=[1], name='input')

# Add a flatten layer to remove useless "sequence" dimension
y = Flatten()(embedding_layer(x))

model2 = Model(input=x, output=y)
model2.output_shape

(None, 4)

In [23]:
model2.predict([[0],
                [3]])

array([[ 0.00010885,  0.04235527,  0.00520309,  0.0379991 ],
       [-0.01080111,  0.03560133, -0.03600425,  0.00415856]], dtype=float32)
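
In terms of array shapes, flattening keeps the batch dimension and merges all the others into one. A rough numpy equivalent, assuming a made-up array named `embedded` in place of the real layer output:

```python
import numpy as np

batch_size, sequence_length, embedding_size = 2, 1, 4

# Illustrative stand-in for the 3-d output of an embedding layer
embedded = np.arange(batch_size * sequence_length * embedding_size,
                     dtype='float32')
embedded = embedded.reshape(batch_size, sequence_length, embedding_size)

# Keep the batch axis and merge all remaining axes into one
flattened = embedded.reshape(batch_size, -1)
print(flattened.shape)  # (2, 4)
```

With a sequence length of 1, no values are reordered: each row of `flattened` is exactly the single embedding vector of the corresponding input.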

Note that we re-used the same `embedding_layer` instance in both `model` and `model2`; the two models therefore share exactly the same weights in memory.
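
The idea of sharing can be mimicked in plain numpy. The sketch below is only an analogy: `TinyEmbedding`, `model_a` and `model_b` are hypothetical names invented for this example, not Keras objects. Two functions built on the same layer instance both see an in-place update of its weights:

```python
import numpy as np

class TinyEmbedding:
    """Hypothetical stand-in for an embedding layer (not a Keras class)."""

    def __init__(self, vocab_size, embedding_size):
        self.weights = np.zeros((vocab_size, embedding_size))

    def __call__(self, indices):
        # Look up one row of weights per symbol id
        return self.weights[np.asarray(indices)]

layer = TinyEmbedding(vocab_size=10, embedding_size=4)

# Two "models" built on the same layer instance
def model_a(indices):
    return layer(indices)               # shape (batch, 1, 4)

def model_b(indices):
    out = layer(indices)
    return out.reshape(len(out), -1)    # shape (batch, 4)

# Updating the shared weights in place is visible through both models
layer.weights[3] = 1.0
print(model_a([[3]]))
print(model_b([[3]]))
```

In the same spirit, training one of two models that share a layer would also change the outputs of the other.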