# Embeddings

We will use embeddings throughout the lab. An embedding is simply represented by a weight matrix with one row per symbol:

In [1]:
import numpy as np

embedding_size = 4
vocab_size = 10

embedding = np.arange(embedding_size * vocab_size, dtype='float')
embedding = embedding.reshape(vocab_size, embedding_size)
print(embedding)

[[  0.   1.   2.   3.]
 [  4.   5.   6.   7.]
 [  8.   9.  10.  11.]
 [ 12.  13.  14.  15.]
 [ 16.  17.  18.  19.]
 [ 20.  21.  22.  23.]
 [ 24.  25.  26.  27.]
 [ 28.  29.  30.  31.]
 [ 32.  33.  34.  35.]
 [ 36.  37.  38.  39.]]


To access the embedding for a given symbol $i$, you may either:
 - compute a one-hot encoding of $i$, then take its dot product with the embedding matrix, or
 - simply index the embedding matrix by $i$, using numpy indexing

In [2]:
i = 3
onehot = np.zeros(vocab_size)
onehot[i] = 1.
onehot

array([ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.])

In [3]:
embedding_vector = np.dot(onehot, embedding)
print(embedding_vector)

[ 12.  13.  14.  15.]


In [4]:
print(embedding[i])

[ 12.  13.  14.  15.]
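
The same equivalence holds for several symbols at once. The sketch below is a minimal illustration, assuming the embedding matrix defined earlier; the `indices` array is a hypothetical batch of symbol ids chosen for the example.

```python
import numpy as np

embedding_size = 4
vocab_size = 10
embedding = np.arange(embedding_size * vocab_size, dtype='float')
embedding = embedding.reshape(vocab_size, embedding_size)

# Hypothetical batch of symbol ids (not part of the original lab)
indices = np.array([3, 0, 7])

# One-hot route: one row per symbol, then a matrix product
onehots = np.eye(vocab_size)[indices]   # shape (3, vocab_size)
via_dot = np.dot(onehots, embedding)    # shape (3, embedding_size)

# Indexing route: numpy integer-array indexing picks the same rows
via_index = embedding[indices]          # shape (3, embedding_size)

print(np.allclose(via_dot, via_index))
```

Indexing avoids building the `(batch_size, vocab_size)` one-hot matrix, which matters when the vocabulary is large.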


### The Embedding layer in Keras

In Keras, the `Embedding` layer takes an extra parameter, `input_length`, which is typically used when the input is a sequence of symbols (think of a sequence of words). In our case, the length will always be 1.

```py
Embedding(output_dim=embedding_size, input_dim=vocab_size,
          input_length=sequence_length, name='my_embedding')
```

In [5]:
from keras.layers import Embedding

embedding_layer = Embedding(
    output_dim=embedding_size, input_dim=vocab_size,
    input_length=1, name='my_embedding')

Using TensorFlow backend.


Let's use it as part of a Keras model:

In [6]:
from keras.layers import Input
from keras.models import Model

x = Input(shape=[1], name='input')
# Use a new name so that the numpy `embedding` matrix above is not overwritten
embedded = embedding_layer(x)
model = Model(inputs=x, outputs=embedded)
model.output_shape

(None, 1, 4)

The embedding weights are randomly initialized:

In [7]:
model.get_weights()

[array([[-0.01936201,  0.01061306, -0.00502486,  0.02421001],
        [-0.0071108 , -0.00054221, -0.00888319, -0.01043938],
        [ 0.02423747, -0.03029217,  0.04300025,  0.04906172],
        [ 0.02151183, -0.03897278, -0.02845473,  0.03577714],
        [ 0.02908627, -0.00221761,  0.04954894, -0.0006169 ],
        [-0.04881933, -0.00207155,  0.01976109, -0.03441812],
        [-0.04364808, -0.03180511,  0.00863915,  0.0453595 ],
        [ 0.02288247, -0.00607735, -0.04348791, -0.03095592],
        [-0.03011045,  0.02573893, -0.01666873, -0.0387949 ],
        [-0.04212544,  0.00182465, -0.00680971,  0.04644271]], dtype=float32)]

In [8]:
model.predict([[0],
               [3]])

array([[[-0.01936201,  0.01061306, -0.00502486,  0.02421001]],

       [[ 0.02151183, -0.03897278, -0.02845473,  0.03577714]]], dtype=float32)

The output of an embedding layer is therefore a 3-d tensor of shape `(batch_size, sequence_length, embedding_size)`.
To remove the sequence dimension, which is useless in our case, we use the `Flatten()` layer:

In [9]:
from keras.layers import Flatten

x = Input(shape=[1], name='input')

# Add a flatten layer to remove useless "sequence" dimension
y = Flatten()(embedding_layer(x))

model2 = Model(inputs=x, outputs=y)
model2.output_shape

(None, 4)

In [10]:
model2.predict([[0],
                [3]])

array([[-0.01936201,  0.01061306, -0.00502486,  0.02421001],
       [ 0.02151183, -0.03897278, -0.02845473,  0.03577714]], dtype=float32)
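
The shapes involved can be reproduced with plain numpy. This is only a sketch of the lookup and flatten steps, not of the Keras implementation: `weights` is a hypothetical stand-in for the layer's randomly initialized weight matrix.

```python
import numpy as np

vocab_size, embedding_size = 10, 4

# Hypothetical stand-in for the layer's weights (small random values)
rng = np.random.default_rng(0)
weights = rng.uniform(-0.05, 0.05, size=(vocab_size, embedding_size))

# Same input as above: 2 samples, each a sequence of length 1
batch = np.array([[0],
                  [3]])

# The lookup keeps the sequence dimension: (batch_size, sequence_length, embedding_size)
embedded = weights[batch]

# Flattening merges every non-batch dimension: (batch_size, embedding_size)
flattened = embedded.reshape(len(batch), -1)

print(embedded.shape, flattened.shape)
```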

Note that we re-used the same `embedding_layer` instance in both `model` and `model2`: the two models therefore share exactly the same weights in memory.
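
The consequence of sharing can be illustrated without Keras. In the sketch below, `TinyEmbedding`, `model_a` and `model_b` are hypothetical names invented for the example; they only mimic the idea of two models built around one layer instance.

```python
import numpy as np


class TinyEmbedding:
    """Hypothetical stand-in for an embedding layer (not the Keras class)."""

    def __init__(self, vocab_size, embedding_size, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.uniform(-0.05, 0.05,
                                   size=(vocab_size, embedding_size))

    def __call__(self, ids):
        return self.weights[np.asarray(ids)]


shared = TinyEmbedding(vocab_size=10, embedding_size=4)


def model_a(ids):
    # Keeps the sequence dimension, like `model`
    return shared(ids)


def model_b(ids):
    # Flattens the sequence dimension, like `model2`
    out = shared(ids)
    return out.reshape(out.shape[0], -1)


before = model_b([[3]]).copy()

# Update the shared weights in place, as a training step would
shared.weights[3] += 1.0

after_a = model_a([[3]])
after_b = model_b([[3]])

# Both "models" see the update because they hold the same layer object
print(np.allclose(after_a[0, 0], before[0] + 1.0),
      np.allclose(after_b[0], before[0] + 1.0))
```

In practice this means that training either model also changes the predictions of the other.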