# Embeddings

We will use embeddings throughout the lab. An embedding is simply represented by a weight matrix with one row per symbol:

In [5]:
import numpy as np

embedding_size = 4
vocab_size = 10

embedding = np.arange(embedding_size * vocab_size, dtype='float')
embedding = embedding.reshape(vocab_size, embedding_size)
print(embedding)

[[  0.   1.   2.   3.]
 [  4.   5.   6.   7.]
 [  8.   9.  10.  11.]
 [ 12.  13.  14.  15.]
 [ 16.  17.  18.  19.]
 [ 20.  21.  22.  23.]
 [ 24.  25.  26.  27.]
 [ 28.  29.  30.  31.]
 [ 32.  33.  34.  35.]
 [ 36.  37.  38.  39.]]


To access the embedding for a given symbol $i$, you may:
 - compute a one-hot encoding of $i$, then compute a dot product with the embedding matrix
 - simply index (slice) the embedding matrix by $i$, using numpy indexing

In [6]:
i = 3
onehot = np.zeros(vocab_size)
onehot[i] = 1.
item_embedding = np.dot(onehot, embedding)
print(item_embedding)
print(embedding[i])

[ 12.  13.  14.  15.]
[ 12.  13.  14.  15.]
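
The same equivalence carries over to a batch of symbols. The following is a minimal sketch added for illustration, not one of the original lab cells; the names `indices`, `onehots`, `by_matmul` and `by_indexing` and the chosen ids are assumptions. It rebuilds the embedding matrix so that it runs on its own.

```python
import numpy as np

embedding_size = 4
vocab_size = 10
embedding = np.arange(embedding_size * vocab_size, dtype='float')
embedding = embedding.reshape(vocab_size, embedding_size)

# Hypothetical batch of symbol ids (illustrative values)
indices = np.array([3, 0, 7])

# One row of the identity matrix per symbol gives the one-hot encodings
onehots = np.eye(vocab_size)[indices]      # shape (3, 10)
by_matmul = np.dot(onehots, embedding)     # shape (3, 4)

# Integer-array indexing selects the same rows without the matrix product
by_indexing = embedding[indices]           # shape (3, 4)

print(np.array_equal(by_matmul, by_indexing))
```

Indexing avoids building one-hot vectors of width `vocab_size`, which is why a lookup is usually preferred over the dot product when the vocabulary is large.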


In Keras, the `Embedding` layer takes an extra parameter, `input_length`, which is typically used when the input is a sequence of symbols (think of a sequence of words). In our case, the length will always be 1.

```py
Embedding(output_dim=embedding_size, input_dim=vocab_size,
          input_length=sequence_length, name='my_embedding')
```
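
To build intuition for the sequence dimension, here is a NumPy analogy (a sketch under assumptions, not Keras code and not part of the original lab): each symbol id in a batch of sequences is replaced by the matching row of a weight matrix. The names `weights`, `batch` and `looked_up` and the example ids are illustrative.

```python
import numpy as np

embedding_size = 4
vocab_size = 10
weights = np.arange(embedding_size * vocab_size, dtype='float')
weights = weights.reshape(vocab_size, embedding_size)

# Hypothetical batch: 2 sequences of 3 symbol ids each
batch = np.array([[1, 2, 3],
                  [9, 0, 4]])

# Indexing with a 2-d integer array appends the embedding axis
looked_up = weights[batch]
print(looked_up.shape)  # (2, 3, 4)
```

With a sequence length of 1, the middle axis has size 1 and carries no information.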

In [7]:
from keras.layers import Input, Embedding, Flatten
from keras.models import Model

Using TensorFlow backend.


In [8]:
x = Input(shape=[1], name='input')
emb = Embedding(output_dim=embedding_size, input_dim=vocab_size,
                input_length=1, name='my_embedding')(x)
model = Model(inputs=x, outputs=emb)
model.output_shape

(None, 1, 4)

The output of an embedding layer is therefore a 3-d tensor of shape `(batch_size, sequence_length, embedding_size)`.
To remove the sequence dimension, which is useless in our case, we use the `Flatten()` layer:

In [11]:
x = Input(shape=[1], name='input')
emb = Embedding(output_dim=embedding_size, input_dim=vocab_size,
                input_length=1, name='my_embedding')(x)

# Add a flatten layer to remove useless "sequence" dimension
y = Flatten()(emb)

model = Model(inputs=x, outputs=y)
model.output_shape

(None, 4)
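
The effect of flattening on the shape can be mimicked in NumPy. This is only an illustrative sketch, not the Keras implementation; `emb_out` is a made-up stand-in for the embedding layer output and the batch size of 5 is arbitrary.

```python
import numpy as np

batch_size, sequence_length, embedding_size = 5, 1, 4

# Stand-in for the output of the embedding layer
emb_out = np.arange(batch_size * sequence_length * embedding_size, dtype='float')
emb_out = emb_out.reshape(batch_size, sequence_length, embedding_size)

# Keep the batch axis and merge all remaining axes into one
flat = emb_out.reshape(batch_size, -1)
print(emb_out.shape, '->', flat.shape)  # (5, 1, 4) -> (5, 4)
```

With a sequence length greater than 1, the same operation would give `sequence_length * embedding_size` values per example, that is, the embeddings of the sequence laid end to end.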