# Positional Encoding

To use the sequence order information, we can inject absolute or relative positional information by adding positional encoding to the input representations.

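Suppose the input representation $\mathbf{X}\in\mathbb{R}^{n\times d}$ contains the $d$-dimensional embeddings of the $n$ tokens in a sequence. The positional encoding outputs $\mathbf{X} + \mathbf{P}$, where $\mathbf{P}\in\mathbb{R}^{n\times d}$ is a positional embedding matrix of the same shape whose elements are: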

$$p_{i, 2j} = \sin\left(\frac{i}{10000^{2j/d}}\right)$$

$$p_{i, 2j + 1} = \cos\left(\frac{i}{10000^{2j/d}}\right)$$

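Each row $i$ of $\mathbf{P}$ corresponds to a position in the sequence.

Each column corresponds to one encoding dimension: even columns $2j$ use $\sin$, odd columns $2j + 1$ use $\cos$, and each pair shares a frequency that decreases as $j$ grows.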
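As a cross-check on the two formulas above, the following minimal sketch fills $\mathbf{P}$ entry by entry in plain Python. The function name `positional_encoding` and the sizes `n=4`, `d=4` are illustrative assumptions, and the loop is written for readability rather than speed.

```python
import math

def positional_encoding(n, d):
    """Return an n x d nested list P built from the sine/cosine formulas."""
    P = [[0.0] * d for _ in range(n)]
    for i in range(n):          # row: position in the sequence
        for k in range(d):      # column: encoding dimension
            j = k // 2          # columns 2j and 2j + 1 share one frequency
            angle = i / (10000 ** (2 * j / d))
            # Even columns use sine, odd columns use cosine
            P[i][k] = math.sin(angle) if k % 2 == 0 else math.cos(angle)
    return P

# Illustrative sizes only: 4 positions, 4 encoding dimensions
P = positional_encoding(n=4, d=4)
for row in P:
    print([round(v, 4) for v in row])
```

The row for position 0 is `[0.0, 1.0, 0.0, 1.0]`, because $\sin(0) = 0$ and $\cos(0) = 1$ at every frequency.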

In [1]:
import torch
from torch import nn

In [2]:
#@save
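class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding followed by dropout."""
    def __init__(self, num_hiddens, dropout, max_len=1000):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        # Create a long enough `P` with shape (1, max_len, num_hiddens);
        # the leading dimension broadcasts over the batch
        self.P = torch.zeros((1, max_len, num_hiddens))
        # X[i, j] = i / 10000^(2j / num_hiddens); assumes `num_hiddens` is even
        X = torch.arange(max_len, dtype=torch.float32).reshape(-1, 1) / \
            torch.pow(10000, torch.arange(0, num_hiddens, 2, dtype=torch.float32) / num_hiddens)
        self.P[:, :, 0::2] = torch.sin(X)  # even columns
        self.P[:, :, 1::2] = torch.cos(X)  # odd columns

    def forward(self, X):
        # X has shape (batch_size, num_steps, num_hiddens); add the first
        # num_steps rows of `P`, then apply dropout
        X = X + self.P[:, :X.shape[1], :].to(X.device)
        return self.dropout(X)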

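Positional encoding is analogous to binary representation: just as a higher bit of a binary number alternates less often than a lower bit, a higher encoding dimension oscillates at a lower frequency.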

![jupyter](../images/10/position.svg)
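The binary analogy can be made concrete with a small sketch that prints 0 through 7 in binary. The helper name `to_bits` and the width of 3 bits are assumptions chosen for illustration.

```python
def to_bits(i, width=3):
    """Format a non-negative integer as a zero-padded binary string."""
    return format(i, f"0{width}b")

for i in range(8):
    print(f"{i} in binary is {to_bits(i)}")
```

Reading the output column by column, the lowest bit flips on every number, the middle bit on every second number, and the highest bit on every fourth number. The sinusoidal encoding shows the same pattern of decreasing frequency along the encoding dimension, but with continuous values instead of bits.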