# Chapter 11: Dense Vector Representations
Storing dense vectors in PyTorch: The `Embedding` and `EmbeddingBag` classes 

Programs from the book: [_Python for Natural Language Processing_](https://link.springer.com/book/9783031575488)

__Author__: Pierre Nugues

## Modules

In [1]:
import random
import torch
import torch.nn as nn

In [2]:
random.seed(4321)
torch.manual_seed(4321)

<torch._C.Generator at 0x12beb7210>

## Embeddings


We use the class `Embedding(num_embeddings, embedding_dim, ...)` to create 5000 dense vectors (embeddings) in a vector space of dimension 64. The vectors are stored as the rows of the `weight` matrix and are initialized with random values.

In [3]:
embedding = nn.Embedding(5000, 64)
embedding

Embedding(5000, 64)

In [4]:
embedding.weight[:5, :5]

tensor([[-0.4716, -0.3436, -1.1742,  0.1221,  1.3231],
        [-1.1889,  0.8678,  2.0916, -1.2002, -0.5946],
        [ 2.1430, -0.3934,  0.0314, -0.6845, -3.5251],
        [ 1.9011, -0.0659, -0.9426,  0.4688, -0.5444],
        [-0.6471, -1.9928,  1.3672, -1.6397, -0.1779]],
       grad_fn=<SliceBackward0>)

An embedding layer acts as a lookup table: given a tensor of indices, it returns the corresponding rows of the weight matrix.

In [5]:
word_idx = torch.LongTensor([3, 2, 1])
embedding(word_idx)[:, :5]

tensor([[ 1.9011, -0.0659, -0.9426,  0.4688, -0.5444],
        [ 2.1430, -0.3934,  0.0314, -0.6845, -3.5251],
        [-1.1889,  0.8678,  2.0916, -1.2002, -0.5946]],
       grad_fn=<SliceBackward0>)
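
To make the lookup-table behavior concrete, here is a minimal sketch on a small table. The sizes (10 vectors of dimension 4), the seed, and the variable names are assumptions chosen for illustration; they are not part of the notebook. It checks that calling the layer gives the same result as indexing the weight matrix, and as multiplying one-hot rows by that matrix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
emb = nn.Embedding(10, 4)  # hypothetical small table: 10 vectors of dimension 4
idx = torch.tensor([3, 2, 1])

# Calling the layer looks up the rows 3, 2, and 1
looked_up = emb(idx)

# Same result as indexing the weight matrix directly
indexed = emb.weight[idx]

# Same result as multiplying one-hot rows by the weight matrix
one_hot = F.one_hot(idx, num_classes=10).float()
multiplied = one_hot @ emb.weight

same_as_indexing = torch.equal(looked_up, indexed)
same_as_matmul = torch.allclose(looked_up, multiplied)
print(same_as_indexing, same_as_matmul)
```

The lookup avoids building the one-hot matrix, which would be costly with a large vocabulary.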

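When the input sequences have variable lengths, we have to align them to a maximal length. We then need a padding symbol to fill the sequences shorter than this maximal length. We tell PyTorch which index represents the padding symbol, usually 0, with the `padding_idx` parameter. The vector at this index is initialized to zeros and is not updated during training.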

In [6]:
embedding = nn.Embedding(5000, 64, padding_idx=0)

In [7]:
embedding

Embedding(5000, 64, padding_idx=0)

In [8]:
embedding.weight[:5, :5]

tensor([[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [-0.7611,  0.9961,  0.0208, -1.5157, -0.3545],
        [ 0.0871, -0.0454,  0.9313, -0.8555,  0.3771],
        [ 0.6364,  1.1133,  0.8546,  0.7088, -1.0786],
        [ 0.0289,  0.8395,  0.8943,  0.0564,  0.5867]],
       grad_fn=<SliceBackward0>)

In [9]:
word_idx = torch.LongTensor([0, 2, 0, 5])
embedding(word_idx)[:, :5]

tensor([[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0871, -0.0454,  0.9313, -0.8555,  0.3771],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.2631,  0.3616, -0.0205, -0.2178,  0.3249]],
       grad_fn=<SliceBackward0>)
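
As an illustration of padding, the sketch below aligns three index sequences of different lengths with `pad_sequence` and checks that the padding row stays at zero and receives no gradient. The sequences, the table size, and the use of `pad_sequence` are assumptions for this example; the notebook itself does not build a padded batch.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

torch.manual_seed(0)
emb = nn.Embedding(10, 4, padding_idx=0)

# Hypothetical index sequences of different lengths
seqs = [torch.tensor([5, 2, 7]), torch.tensor([3]), torch.tensor([4, 9])]

# Fill the shorter sequences with the padding index 0
batch = pad_sequence(seqs, batch_first=True, padding_value=0)  # shape (3, 3)

vectors = emb(batch)        # shape (3, 3, 4)
vectors.sum().backward()    # dummy loss, only to produce gradients

# The padding vector is zero and its gradient is zero too
pad_row_is_zero = bool((emb.weight[0] == 0).all())
pad_grad_is_zero = bool((emb.weight.grad[0] == 0).all())
print(batch)
print(pad_row_is_zero, pad_grad_is_zero)
```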

## Embedding Bags
Embedding bags handle variable-length sequences of indices when the corresponding embeddings are summed into a single vector. In CLD3, for instance, we have a weighted sum of a variable number of embeddings. See https://github.com/google/cld3

In [10]:
embedding_bag = nn.EmbeddingBag(8, 5, mode='sum')

In [11]:
embedding_bag.weight

Parameter containing:
tensor([[ 0.2474, -0.7485, -0.4725,  0.9527,  1.4553],
        [ 1.4757,  0.3881,  0.2289, -0.3961,  0.0492],
        [-0.2349,  0.9298, -1.0225,  0.0796,  0.5364],
        [-0.6228, -0.9396, -0.3125, -1.1847,  0.7243],
        [-0.3061,  0.8004, -0.6700,  1.3515, -0.9943],
        [ 0.2663,  0.3603,  1.4713, -1.1935,  1.5441],
        [-0.9372, -0.3915, -0.0108, -0.2004,  0.3531],
        [ 1.0411, -0.2576, -0.5129, -1.3512,  0.7840]], requires_grad=True)

An `EmbeddingBag` object takes the bags of indices it will sum as its first parameter. With a 2-D input, each row is a bag.

In [12]:
embedding_bag(torch.tensor([[1, 2], [3, 4]]))

tensor([[ 1.2408,  1.3179, -0.7936, -0.3164,  0.5856],
        [-0.9289, -0.1393, -0.9825,  0.1669, -0.2700]],
       grad_fn=<EmbeddingBagBackward0>)

In [13]:
embedding_bag(torch.tensor([[1]])) + embedding_bag(torch.tensor([[2]]))

tensor([[ 1.2408,  1.3179, -0.7936, -0.3164,  0.5856]], grad_fn=<AddBackward0>)

In [14]:
embedding_bag.weight[1] + embedding_bag.weight[2]

tensor([ 1.2408,  1.3179, -0.7936, -0.3164,  0.5856], grad_fn=<AddBackward0>)
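
Another way to read the cells above: with `mode='sum'` and a 2-D input, an embedding bag gives the same result as a plain `Embedding` lookup followed by a sum over each bag. The sketch below checks this on a fresh bag; the seed and the names `bag` and `emb` are assumptions for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bag = nn.EmbeddingBag(8, 5, mode='sum')

# A plain embedding layer holding a frozen copy of the same vectors
emb = nn.Embedding.from_pretrained(bag.weight.detach().clone())

idx = torch.tensor([[1, 2], [3, 4]])
bag_sum = bag(idx)                 # shape (2, 5): one vector per bag
lookup_sum = emb(idx).sum(dim=1)   # lookup gives (2, 2, 5), then sum inside each bag

same = torch.allclose(bag_sum, lookup_sum, atol=1e-6)
print(same)
```

The embedding bag skips the intermediate tensor of shape (2, 2, 5), which matters when the bags are large.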

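Or we may have a 1-D input and pass the start position of each bag as the second parameter, `offsets`. Here, the offsets 0 and 2 split the input into the bags `[1, 2]` and `[3, 4]`.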

In [15]:
embedding_bag(torch.tensor([1, 2, 3, 4]), offsets=torch.tensor([0, 2]))

tensor([[ 1.2408,  1.3179, -0.7936, -0.3164,  0.5856],
        [-0.9289, -0.1393, -0.9825,  0.1669, -0.2700]],
       grad_fn=<EmbeddingBagBackward0>)
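
The 1-D form is what makes bags of different sizes possible. As a minimal sketch, assuming a hypothetical ragged list `bags`, we can flatten the indices, derive the offsets from the bag lengths, and compare the result with explicit sums of rows of the weight matrix:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bag = nn.EmbeddingBag(8, 5, mode='sum')

# Hypothetical bags of different sizes
bags = [[1, 2, 3], [4], [5, 6]]

# Concatenate all the indices in a 1-D tensor
flat = torch.tensor([i for b in bags for i in b])

# offsets[k] is the position in `flat` where bag k starts
lengths = torch.tensor([len(b) for b in bags])
offsets = torch.cumsum(lengths, dim=0) - lengths  # tensor([0, 3, 4])

summed = bag(flat, offsets=offsets)  # shape (3, 5)

# Reference computation: sum the rows of the weight matrix bag by bag
expected = torch.stack([bag.weight[torch.tensor(b)].sum(dim=0) for b in bags])
matches = torch.allclose(summed, expected, atol=1e-6)
print(offsets, matches)
```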

We can also compute a weighted sum using the `per_sample_weights` parameter. Its shape must be the same as that of the input.

In [16]:
embedding_bag(torch.tensor([[1, 2], [3, 4]]), per_sample_weights=torch.tensor(
    [[0.5, 0.5], [0.2, 0.8]]))

tensor([[ 0.6204,  0.6590, -0.3968, -0.1582,  0.2928],
        [-0.3694,  0.4524, -0.5985,  0.8443, -0.6506]],
       grad_fn=<EmbeddingBagBackward0>)

In [17]:
0.5 * embedding_bag.weight[1] + 0.5 * embedding_bag.weight[2]

tensor([ 0.6204,  0.6590, -0.3968, -0.1582,  0.2928], grad_fn=<AddBackward0>)

In [18]:
0.2 * embedding_bag.weight[3] + 0.8 * embedding_bag.weight[4]

tensor([-0.3694,  0.4524, -0.5985,  0.8443, -0.6506], grad_fn=<AddBackward0>)

With offsets, the weights form a 1-D tensor aligned with the 1-D input:

In [19]:
embedding_bag(torch.tensor([1, 2, 3, 4]),
              offsets=torch.tensor([0, 2]),
              per_sample_weights=torch.tensor([0.5, 0.5, 0.2, 0.8]))

tensor([[ 0.6204,  0.6590, -0.3968, -0.1582,  0.2928],
        [-0.3694,  0.4524, -0.5985,  0.8443, -0.6506]],
       grad_fn=<EmbeddingBagBackward0>)
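
Combining offsets and weights, we can compute weighted sums over bags of different sizes. The sketch below is only an illustration: the `(index, count)` pairs are made up, and normalizing the counts to relative frequencies inside each bag is an assumption of this example, not a description of how CLD3 computes its weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bag = nn.EmbeddingBag(8, 5, mode='sum')

# Hypothetical bags of (index, count) pairs, of different sizes
bags = [[(1, 3), (2, 1)], [(3, 2), (4, 2), (5, 4)]]

flat, weights, offsets = [], [], []
for pairs in bags:
    offsets.append(len(flat))  # start position of this bag
    total = sum(count for _, count in pairs)
    for index, count in pairs:
        flat.append(index)
        weights.append(count / total)  # relative frequency inside the bag

flat = torch.tensor(flat)
weights = torch.tensor(weights)  # float32, the same dtype as bag.weight
offsets = torch.tensor(offsets)

weighted = bag(flat, offsets=offsets, per_sample_weights=weights)

# Manual check of the first bag: weights 3/4 and 1/4
first = 0.75 * bag.weight[1] + 0.25 * bag.weight[2]
first_matches = torch.allclose(weighted[0], first, atol=1e-6)
print(first_matches)
```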