# nn.Embedding

In [1]:
import torch
import torch.nn as nn

In [2]:
torch.__version__

'1.8.2+cu111'

> A simple lookup table that stores embeddings of a fixed dictionary and size.

## Example 1

Let's make an $8 \times 4$ lookup table: 8 embeddings, each of dimension 4. This is one of the most common ways to use `nn.Embedding`.

In [3]:
embedding_layer = nn.Embedding(num_embeddings=8, embedding_dim=4)

In [4]:
embedding_layer.weight

Parameter containing:
tensor([[-3.3307e-01,  3.6703e-01,  1.0469e+00,  4.7747e-01],
        [ 1.2580e+00, -8.4635e-01, -4.5423e-01,  7.3989e-01],
        [-9.6321e-01, -5.6505e-01,  1.4820e+00, -1.4327e+00],
        [ 1.3749e+00, -1.1486e+00,  1.7615e+00, -1.1193e+00],
        [ 5.0136e-01,  3.7170e-02, -1.4641e+00,  5.2615e-01],
        [-1.3963e+00,  7.1644e-02,  1.0268e+00,  2.0797e+00],
        [ 1.2618e+00,  9.1458e-06, -3.5095e-02, -7.5799e-01],
        [-9.7169e-02, -1.2832e-01,  8.4959e-01, -1.2375e+00]],
       requires_grad=True)

The weight matrix of the layer holds the embeddings themselves: row `i` is the embedding for index `i`. Calling the layer on a tensor of indices fetches the corresponding rows.

In [5]:
idx = torch.IntTensor([1, 3, 4])

In [6]:
embedding_layer(idx)

tensor([[ 1.2580, -0.8463, -0.4542,  0.7399],
        [ 1.3749, -1.1486,  1.7615, -1.1193],
        [ 0.5014,  0.0372, -1.4641,  0.5262]], grad_fn=<EmbeddingBackward>)
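
Because the forward pass is just row selection, its result should match plain indexing into `embedding_layer.weight`, and equivalently a product of one-hot vectors with the weight matrix. The following is a minimal check on a freshly created layer; the seed and variable names are illustrative choices, not part of the original notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
embedding_layer = nn.Embedding(num_embeddings=8, embedding_dim=4)
idx = torch.tensor([1, 3, 4])  # int64 (LongTensor) indices

looked_up = embedding_layer(idx)       # shape (3, 4)
indexed = embedding_layer.weight[idx]  # plain row indexing

# One-hot rows of shape (3, 8) times the (8, 4) weight matrix select the same rows.
one_hot = F.one_hot(idx, num_classes=8).to(embedding_layer.weight.dtype)
via_matmul = one_hot @ embedding_layer.weight

print(torch.equal(looked_up, indexed))        # True
print(torch.allclose(looked_up, via_matmul))  # True
```

The one-hot view explains why an embedding layer is often described as a linear layer without bias applied to one-hot inputs; the lookup simply skips the wasteful multiplication.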

## Example 2

Now let's try some of the other options. `padding_idx` initializes the row at that index to zeros, and that row receives no gradient, so it is not updated during training.

In [7]:
embedding_layer = nn.Embedding(num_embeddings=8, embedding_dim=4, padding_idx=2, max_norm=1, norm_type=2)

In [8]:
embedding_layer.weight

Parameter containing:
tensor([[-0.2921,  0.3198, -1.4463, -0.0211],
        [ 1.6262, -1.3835, -0.6565, -0.7844],
        [ 0.0000,  0.0000,  0.0000,  0.0000],
        [-0.2434,  0.1026,  0.3376,  2.4716],
        [-1.1623, -0.2063,  0.7496,  0.3677],
        [-1.6133, -1.6416,  0.2389, -0.4451],
        [-1.8385, -0.1169,  0.0202,  0.8692],
        [ 1.0441, -0.5111, -0.8074, -0.1289]], requires_grad=True)
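
The zero row is visible above; the "no gradient" part can be checked with a backward pass. This sketch uses a separate layer without `max_norm` (so the forward pass does not modify the weights) and an arbitrary batch that contains the padding index twice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(num_embeddings=8, embedding_dim=4, padding_idx=2)

print(emb.weight[2])  # row 2 is initialized to zeros

# Look up a batch that contains the padding index twice.
idx = torch.tensor([1, 2, 2, 3])
loss = emb(idx).sum()
loss.backward()

# Rows 1 and 3 each appear once, so their gradient is all ones;
# the padding row gets no gradient even though it was looked up.
print(emb.weight.grad[1])  # tensor([1., 1., 1., 1.])
print(emb.weight.grad[2])  # tensor([0., 0., 0., 0.])
```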

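Another interesting option is `max_norm`: an embedding vector whose norm exceeds `max_norm` is rescaled to have norm `max_norm`. The `norm_type` parameter sets the $p$ of the $p$-norm used for this check. The default is 2.0, the Euclidean (L2) norm of each embedding vector.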
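
To see what `norm_type` changes, here is a small sketch that uses `norm_type=1.0` (the L1 norm, i.e. the sum of absolute values) instead of the default 2.0. The table size and seed are arbitrary choices. After a lookup, the L1 norm of every queried row should be at most `max_norm`, while rows that were not queried keep their original values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(num_embeddings=5, embedding_dim=3, max_norm=1.0, norm_type=1.0)
before = emb.weight.detach().clone()  # snapshot of the weights before any lookup

idx = torch.tensor([0, 1])
emb(idx)  # the forward pass rescales rows 0 and 1 in place if their L1 norm exceeds 1

after = emb.weight.detach()
print(before.abs().sum(dim=-1))  # L1 norms before the lookup
print(after.abs().sum(dim=-1))   # rows 0 and 1 now have L1 norm <= 1.0 (up to rounding)
```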

In [9]:
embedding_layer.weight.norm(dim=-1)

tensor([1.5099, 2.3675, 0.0000, 2.5085, 1.4459, 2.3564, 2.0371, 1.4212],
       grad_fn=<CopyBackwards>)

Here, the norms of the embeddings are not capped at 1, even though we set `max_norm=1`. This looks surprising, but the following cells show how it works.

In [10]:
embedding_layer(idx)

tensor([[ 0.6869, -0.5844, -0.2773, -0.3313],
        [-0.0970,  0.0409,  0.1346,  0.9853],
        [-0.8039, -0.1427,  0.5184,  0.2543]], grad_fn=<EmbeddingBackward>)

Here we queried `embedding_layer` with `idx`, which is still `[1, 3, 4]` from Example 1. After this forward pass, the weight matrix itself has changed:

In [11]:
embedding_layer.weight

Parameter containing:
tensor([[-0.2921,  0.3198, -1.4463, -0.0211],
        [ 0.6869, -0.5844, -0.2773, -0.3313],
        [ 0.0000,  0.0000,  0.0000,  0.0000],
        [-0.0970,  0.0409,  0.1346,  0.9853],
        [-0.8039, -0.1427,  0.5184,  0.2543],
        [-1.6133, -1.6416,  0.2389, -0.4451],
        [-1.8385, -0.1169,  0.0202,  0.8692],
        [ 1.0441, -0.5111, -0.8074, -0.1289]], requires_grad=True)

In [12]:
embedding_layer.weight.norm(dim=-1)

tensor([1.5099, 1.0000, 0.0000, 1.0000, 1.0000, 2.3564, 2.0371, 1.4212],
       grad_fn=<CopyBackwards>)

Only the queried rows (1, 3, and 4) were rescaled to norm `max_norm`; all other rows are untouched. So the renormalization is applied lazily: the forward pass rescales, in place, only the embeddings it looks up. The following cells support this.

In [13]:
idx = torch.IntTensor([0, 2])
embedding_layer(idx)
embedding_layer.weight.norm(dim=-1)

tensor([1.0000, 1.0000, 0.0000, 1.0000, 1.0000, 2.3564, 2.0371, 1.4212],
       grad_fn=<CopyBackwards>)

In [14]:
idx = torch.IntTensor([7])
embedding_layer(idx)
embedding_layer.weight.norm(dim=-1)

tensor([1.0000, 1.0000, 0.0000, 1.0000, 1.0000, 2.3564, 2.0371, 1.0000],
       grad_fn=<CopyBackwards>)
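
The behavior observed in the cells above (only the looked-up rows are rescaled, in place, during the forward pass) can be reproduced by hand. The sketch below is an approximation written for illustration: the helper `renorm_rows_` is hypothetical, not a PyTorch API, and the small `eps` in the denominator is an assumption meant to mirror the tiny constant PyTorch adds to avoid division by zero. It rescales each unique queried row whose `norm_type`-norm exceeds `max_norm`, then compares the result with what `nn.Embedding` does.

```python
import torch
import torch.nn as nn

def renorm_rows_(weight, idx, max_norm, norm_type=2.0, eps=1e-7):
    """Hypothetical helper: in-place max-norm renormalization of the rows in idx."""
    with torch.no_grad():
        rows = torch.unique(idx)  # each row is rescaled at most once
        norms = weight[rows].norm(p=norm_type, dim=-1)
        too_big = norms > max_norm  # rows within the limit (e.g. the zero padding row) are skipped
        scale = max_norm / (norms[too_big] + eps)
        weight[rows[too_big]] *= scale.unsqueeze(-1)

torch.manual_seed(0)
emb = nn.Embedding(num_embeddings=8, embedding_dim=4, padding_idx=2, max_norm=1.0)
manual = emb.weight.detach().clone()  # copy of the weights before any lookup

idx = torch.tensor([1, 3, 4, 1])
emb(idx)                                 # PyTorch renormalizes the queried rows in place
renorm_rows_(manual, idx, max_norm=1.0)  # the sketch does the same by hand

print(torch.allclose(emb.weight.detach(), manual, atol=1e-6))
```

Because the rescaling happens inside the forward pass rather than as a constraint on the optimizer, rows that are never looked up can keep norms larger than `max_norm` indefinitely, which is exactly what rows 5 and 6 show above.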