[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/itmorn/AI.handbook/blob/main/DL/torch/nn/Sparse/Embedding.ipynb)

# Embedding
A simple lookup table that stores one embedding vector for each entry of a fixed-size dictionary. All vectors share the same fixed size.

This module is often used to store word embeddings and retrieve them by index. The input is a tensor of integer indices (IntTensor or LongTensor) of arbitrary shape `(*)`, and the output holds the corresponding embeddings, with shape `(*, embedding_dim)`.
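A minimal sketch of what the lookup does, using only public PyTorch calls (the seed and index values are arbitrary choices for illustration): the module's output is exactly row indexing into `embedding.weight`, and its shape is the index shape plus a trailing `embedding_dim` axis.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)

# Indices may have any shape; here: 2 sequences of 4 token ids each
idx = torch.tensor([[1, 2, 4, 5],
                    [4, 3, 2, 9]])

out = embedding(idx)
print(out.shape)  # torch.Size([2, 4, 3]): index shape + (embedding_dim,)

# The forward pass only selects rows of the weight matrix
by_indexing = embedding.weight[idx]
print(torch.equal(out, by_indexing))  # True
```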

**Definition**:  
`torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None, device=None, dtype=None)`
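The whole table is a single learnable parameter, `weight`, of shape `(num_embeddings, embedding_dim)`, initialized from N(0, 1) by default. A short sketch (sizes chosen arbitrarily) that checks the shape and shows that an index outside `0 .. num_embeddings - 1` is rejected; the `IndexError` type is what current PyTorch raises on CPU.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=3)

# One row per dictionary entry, one column per embedding dimension
print(emb.weight.shape)  # torch.Size([10, 3])

# Valid indices are 0 .. num_embeddings - 1; index 10 is out of range
raised = False
try:
    emb(torch.tensor([10]))
except IndexError as err:
    raised = True
    print("IndexError:", err)
```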

**Parameters**:  
- num_embeddings (int) – size of the dictionary of embeddings, i.e. the number of rows in the lookup table; valid indices are 0 to num_embeddings - 1

- embedding_dim (int) – the size of each embedding vector, i.e. the number of columns in the lookup table

- padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”. For a newly constructed Embedding, the embedding vector at padding_idx defaults to all zeros, but it can be set to another value to serve as the padding vector.

- max_norm (float, optional) – If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. This happens during the forward pass, only for the rows being looked up, and modifies the weight tensor in place.

- norm_type (float, optional) – The p of the p-norm used for the max_norm option. Default: 2.0.

- scale_grad_by_freq (bool, optional) – If True, gradients are scaled by the inverse of the frequency of the words in the mini-batch. Default: False.

- sparse (bool, optional) – If True, the gradient w.r.t. the weight matrix will be a sparse tensor. Default: False. Only a limited set of optimizers accept sparse gradients: currently optim.SGD (CPU and CUDA), optim.SparseAdam (CPU and CUDA) and optim.Adagrad (CPU).
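A small sketch of how `max_norm`, `scale_grad_by_freq` and `sparse` behave in practice. The table sizes and values are arbitrary, and `SparseAdam` is picked only as one optimizer that accepts sparse gradients.

```python
import torch
import torch.nn as nn

# --- max_norm: looked-up rows are rescaled in place during forward ---
emb = nn.Embedding(5, 4, max_norm=1.0)
with torch.no_grad():
    emb.weight.fill_(1.0)  # every row now has L2 norm 2.0
out = emb(torch.tensor([0, 1]))
print(out.norm(dim=1))         # roughly tensor([1., 1.])
print(emb.weight.norm(dim=1))  # rows 0-1 ~1.0 (changed in place), rows 2-4 still 2.0

# --- scale_grad_by_freq: each occurrence's gradient is divided by that index's count ---
emb_freq = nn.Embedding(5, 2, scale_grad_by_freq=True)
emb_freq(torch.tensor([1, 1, 1, 2])).sum().backward()
print(emb_freq.weight.grad)  # rows 1 and 2 both ~1.0 (row 1 would be 3.0 without scaling)

# --- sparse: weight.grad becomes a sparse tensor covering only the used rows ---
emb_sparse = nn.Embedding(10, 3, sparse=True)
opt = torch.optim.SparseAdam(list(emb_sparse.parameters()), lr=0.1)
emb_sparse(torch.tensor([1, 2])).sum().backward()
print(emb_sparse.weight.grad.is_sparse)  # True
opt.step()
```

Note that `max_norm` changes `weight` outside the optimizer, so the table you inspect after a forward pass may differ from the one you initialized.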

# Illustrated: building and looking up the table
    
<p align="center">
<img src="./imgs/Embedding.svg"
    width="1000" /></p>
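As a complement to the figure, the lookup can also be written as a matrix product: selecting row i of the table is the same as multiplying a one-hot vector for i by the weight matrix, so an embedding layer computes what a bias-free linear layer would compute on one-hot inputs, without materializing them. A minimal check using the same seed and indices as the example below:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(666)
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)
idx = torch.tensor([[1, 2, 4, 5],
                    [4, 3, 2, 9]])

# Table lookup: pick rows of the weight matrix
lookup = embedding(idx)  # shape (2, 4, 3)

# Same result via one-hot vectors times the weight matrix
one_hot = F.one_hot(idx, num_classes=10).to(embedding.weight.dtype)  # (2, 4, 10)
via_matmul = one_hot @ embedding.weight  # (2, 4, 10) @ (10, 3) -> (2, 4, 3)

print(torch.allclose(lookup, via_matmul))  # True
```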


```python
# Compute with the library (PyTorch)
import torch
import torch.nn as nn

torch.manual_seed(666)
input = torch.tensor([[1, 2, 4, 5],
                      [4, 3, 2, 9]], dtype=torch.long)
print("input:\n", input, "\n")

embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)
print("embedding.weight:\n", embedding.weight, "\n")

output = embedding(input)  # shape (2, 4, 3): one 3-dim vector per index
print("output:\n", output, "\n")

loss = output.sum()
optimizer = torch.optim.Adam(embedding.parameters(), lr=0.1)
loss.backward()   # gradient is non-zero only for rows that were looked up
optimizer.step()  # applies it: rows 1, 2, 3, 4, 5 and 9 of embedding.weight change

print("embedding.weight:\n", embedding.weight, "\n")
```


```
input:
 tensor([[1, 2, 4, 5],
        [4, 3, 2, 9]]) 

embedding.weight:
 Parameter containing:
tensor([[-0.7747,  0.7926, -0.0062],
        [-0.4377,  0.4657, -0.1880],
        [-0.8975,  0.4169, -0.3840],
        [ 0.0394,  0.4869, -0.1476],
        [-0.4459, -0.0336, -0.6461],
        [ 0.3470,  0.8133, -0.8232],
        [ 0.7238,  1.3477,  0.9699],
        [-1.0729,  0.4506,  0.0600],
        [-0.2728,  0.0554,  1.9797],
        [ 0.2763,  0.3080, -0.2687]], requires_grad=True) 

output:
 tensor([[[-0.4377,  0.4657, -0.1880],
         [-0.8975,  0.4169, -0.3840],
         [-0.4459, -0.0336, -0.6461],
         [ 0.3470,  0.8133, -0.8232]],

        [[-0.4459, -0.0336, -0.6461],
         [ 0.0394,  0.4869, -0.1476],
         [-0.8975,  0.4169, -0.3840],
         [ 0.2763,  0.3080, -0.2687]]], grad_fn=<EmbeddingBackward0>) 

embedding.weight:
 Parameter containing:
tensor([[-0.7747,  0.7926, -0.0062],
        [-0.5377,  0.3657, -0.2880],
        [-0.9975,  0.3169, -0.4840],
        [-
```
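Why does every touched row above move by the same 0.1, even row 2, which appears twice? A short sketch reproducing the first example (same seed and indices): the gradient of `output.sum()` with respect to row i equals the number of times i occurs in the input, and Adam's very first step with its default betas normalizes each non-zero gradient entry to roughly `lr` in magnitude, while rows with zero gradient do not move.

```python
import torch
import torch.nn as nn

torch.manual_seed(666)
idx = torch.tensor([[1, 2, 4, 5],
                    [4, 3, 2, 9]])
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)
optimizer = torch.optim.Adam(embedding.parameters(), lr=0.1)
before = embedding.weight.detach().clone()

embedding(idx).sum().backward()

# d(sum)/d(weight[i, :]) = how many times index i appears in idx
counts = torch.bincount(idx.flatten(), minlength=10).to(embedding.weight.dtype)
print(embedding.weight.grad[:, 0])  # tensor([0., 1., 2., 1., 2., 1., 0., 0., 0., 1.])

optimizer.step()
delta = embedding.weight.detach() - before
# First Adam step is about -lr * sign(grad): -0.1 for used rows, 0 for unused rows
print(delta[:, 0])
```

With plain SGD instead, the step would be `-lr * count`, so rows 2 and 4 would move twice as far as rows seen once.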

```python
# Compute with the library (PyTorch), this time with padding_idx=0
import torch
import torch.nn as nn

torch.manual_seed(666)
input = torch.tensor([[0, 2, 0, 5]], dtype=torch.long)
print("input:\n", input, "\n")

embedding = nn.Embedding(num_embeddings=10, embedding_dim=3, padding_idx=0)
print("embedding.weight:\n", embedding.weight, "\n")  # row 0 is initialized to zeros

output = embedding(input)
print("output:\n", output, "\n")

loss = output.sum()
optimizer = torch.optim.Adam(embedding.parameters(), lr=0.1)
loss.backward()
optimizer.step()  # row padding_idx gets zero gradient, so it is not updated; rows 2 and 5 move

print("embedding.weight:\n", embedding.weight, "\n")
```

```
input:
 tensor([[0, 2, 0, 5]]) 

embedding.weight:
 Parameter containing:
tensor([[ 0.0000,  0.0000,  0.0000],
        [-0.4377,  0.4657, -0.1880],
        [-0.8975,  0.4169, -0.3840],
        [ 0.0394,  0.4869, -0.1476],
        [-0.4459, -0.0336, -0.6461],
        [ 0.3470,  0.8133, -0.8232],
        [ 0.7238,  1.3477,  0.9699],
        [-1.0729,  0.4506,  0.0600],
        [-0.2728,  0.0554,  1.9797],
        [ 0.2763,  0.3080, -0.2687]], requires_grad=True) 

output:
 tensor([[[ 0.0000,  0.0000,  0.0000],
         [-0.8975,  0.4169, -0.3840],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.3470,  0.8133, -0.8232]]], grad_fn=<EmbeddingBackward0>) 

embedding.weight:
 Parameter containing:
tensor([[ 0.0000,  0.0000,  0.0000],
        [-0.4377,  0.4657, -0.1880],
        [-0.9975,  0.3169, -0.4840],
        [ 0.0394,  0.4869, -0.1476],
        [-0.4459, -0.0336, -0.6461],
        [ 0.2470,  0.7133, -0.9232],
        [ 0.7238,  1.3477,  0.9699],
        [-1.0729,  0.4506,  0.0600],
```
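A sketch that makes the `padding_idx` behaviour explicit, reusing the seed and indices of the second example: the gradient row for index 0 is zero even though 0 occurs twice, and, as the parameter description says, the padding vector may be overwritten with a non-zero value that then stays fixed during training. Plain SGD is swapped in here only to keep the arithmetic obvious.

```python
import torch
import torch.nn as nn

torch.manual_seed(666)
idx = torch.tensor([[0, 2, 0, 5]])
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3, padding_idx=0)

embedding(idx).sum().backward()
print(embedding.weight.grad[0])  # tensor([0., 0., 0.]): padding row gets no gradient
print(embedding.weight.grad[2])  # tensor([1., 1., 1.])
grad_pad = embedding.weight.grad[0].clone()

# Use a custom (non-zero) padding vector; it is still excluded from updates
with torch.no_grad():
    embedding.weight[0].fill_(0.5)
embedding.zero_grad()
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)
embedding(idx).sum().backward()
optimizer.step()
print(embedding.weight[0])  # still 0.5 in every component
```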
 