### Positional Encoding
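Positional encoding is a vector added element-wise to the input embedding (word vector) at each position in the sequence.
This vector is designed so that the model can recover both the absolute position of each token and the relative offset between tokens: for any fixed offset \( k \), the encoding at position \( \text{pos} + k \) is a linear function of the encoding at \( \text{pos} \).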
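A small numerical check of the relative-position claim, using the sinusoidal formula from the next section. For each frequency, the (sin, cos) pair at position pos + k is a fixed rotation of the pair at pos, so a single matrix that depends only on k maps every position to the one k steps later. This is a sketch: `sinusoidal_pe` and `shift_matrix` are hypothetical helper names, and it assumes an even `d_model`.

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model, base=10000.0):
    # Vectorized sinusoidal table: sin in even columns, cos in odd columns.
    # Assumption for this sketch: d_model is even.
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]           # pair index, (1, d_model // 2)
    angles = pos / base ** (2 * i / d_model)       # (seq_len, d_model // 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def shift_matrix(k, d_model, base=10000.0):
    # Block-diagonal matrix M_k with M_k @ PE[pos] == PE[pos + k] for every pos.
    # Each 2x2 block rotates one (sin, cos) pair by the angle w_i * k.
    m = np.zeros((d_model, d_model))
    for i in range(d_model // 2):
        w = 1.0 / base ** (2 * i / d_model)
        c, s = np.cos(w * k), np.sin(w * k)
        # sin(a + b) = sin a cos b + cos a sin b;  cos(a + b) = cos a cos b - sin a sin b
        m[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[c, s], [-s, c]]
    return m

pe = sinusoidal_pe(50, 16)
k = 3
M = shift_matrix(k, 16)

# Rows of pe[:-k] are PE[pos]; applying M to each row should give PE[pos + k]
print(np.allclose(pe[:-k] @ M.T, pe[k:]))  # True
```

The same `M` works at every position, which is what makes a fixed offset easy for a linear layer (such as an attention projection) to represent.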

### Positional Encoding Formula

For a token at position \( \text{pos} \) and dimension-pair index \( i \), the positional encoding entries are computed as:

#### For even dimensions (\( 2i \)):
$$
PE_{\text{(pos, 2i)}} = \sin\left(\frac{\text{pos}}{10000^{\frac{2i}{d}}}\right)
$$

#### For odd dimensions (\( 2i+1 \)):
$$
PE_{\text{(pos, 2i+1)}} = \cos\left(\frac{\text{pos}}{10000^{\frac{2i}{d}}}\right)
$$
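A worked example of the two formulas for a single position, written out with plain `math` so each term is visible. With \( d = 8 \) and \( \text{pos} = 1 \), the denominators are \( 10000^{0}, 10000^{1/4}, 10000^{1/2}, 10000^{3/4} \), i.e. 1, 10, 100 and 1000; the resulting values should match the row for position 1 in the printed matrix below (0.84147, 0.54030, 0.099833, 0.99500, ...).

```python
import math

d = 8      # embedding size, same as in the notebook example below
pos = 1    # second token in the sequence

row = []
for i in range(d // 2):                    # i indexes (sin, cos) pairs
    angle = pos / 10000 ** (2 * i / d)     # pos / 10000^(2i/d)
    row.append(math.sin(angle))            # PE(pos, 2i)
    row.append(math.cos(angle))            # PE(pos, 2i+1)

print([round(v, 5) for v in row])
```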

---

### Where:
- \( \text{pos} \): The position index of the token in the sequence.
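- \( i \): The index of a sine/cosine dimension pair, \( 0 \le i < d/2 \); embedding dimensions \( 2i \) and \( 2i+1 \) share the same frequency.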
- \( d \): The total embedding dimension size (e.g., 512).
- \( 10000 \): The base of the frequency progression. The wavelengths \( 2\pi \cdot 10000^{\frac{2i}{d}} \) range from \( 2\pi \) up to just under \( 10000 \cdot 2\pi \) positions.
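Because \( i \) counts pairs rather than individual columns, it runs from 0 to \( d/2 - 1 \) while the column index runs from 0 to \( d - 1 \); column \( j \) belongs to pair \( i = \lfloor j/2 \rfloor \), which is why the loop implementation below uses `i // 2`. A tiny sketch that prints this mapping for \( d = 8 \) (`mapping` is just an illustrative name):

```python
d = 8  # small embedding size for readability

mapping = []
for j in range(d):                       # j = actual column index in the PE vector
    i = j // 2                           # pair index: columns 2i and 2i+1 share one frequency
    fn = "sin" if j % 2 == 0 else "cos"  # even column -> sin, odd column -> cos
    mapping.append((j, i, fn))
    print(f"column {j}: pair i={i}, {fn}(pos / 10000^({2 * i}/{d}))")
```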

---

### Key Points:
1. Sine is applied to **even** dimensions, and cosine is applied to **odd** dimensions.
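2. The denominator \( 10000^{\frac{2i}{d}} \) grows with \( i \), so the angular frequency \( 1/10000^{\frac{2i}{d}} \) shrinks as \( i \) increases:
   - **High-frequency** components (wavelength close to \( 2\pi \) positions) for smaller \( i \).
   - **Low-frequency** components (wavelength approaching \( 10000 \cdot 2\pi \) positions) for larger \( i \).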
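A quick check of the frequency ordering for \( d = 512 \), the example size mentioned above: the frequencies strictly decrease with \( i \), and the wavelengths form a geometric progression starting at \( 2\pi \) positions. A NumPy sketch:

```python
import numpy as np

d = 512
i = np.arange(d // 2)                    # pair indices 0 .. 255
freq = 1.0 / 10000 ** (2 * i / d)        # angular frequency of pair i (radians per position)
wavelength = 2 * np.pi / freq            # positions per full sine/cosine cycle

print(f"i=0:   wavelength {wavelength[0]:.2f} positions")
print(f"i=255: wavelength {wavelength[-1]:.0f} positions")
print("frequencies strictly decreasing:", bool(np.all(np.diff(freq) < 0)))
```

The first pairs therefore distinguish neighbouring tokens, while the last pairs change so slowly that they vary smoothly across very long sequences.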


### Simplest Implementation

import numpy as np
import torch

def positional_encoding(seq_len, d_model):
    """
    Computes positional encodings using loops.
    
    Args:
        seq_len (int): Length of the sequence (number of tokens).
        d_model (int): Embedding dimension.
    
    Returns:
        torch.Tensor: Positional encoding matrix of shape (seq_len, d_model).
    """
    # Initialize positional encoding matrix
    pos_enc = np.zeros((seq_len, d_model))
    
    # Loop over each position in the sequence
    for pos in range(seq_len):
        # Loop over each dimension in the embedding
        for i in range(d_model):
            # Here i is the column index; the formula's pair index is i // 2,
            # so the denominator is 10000^(2*(i//2)/d_model)
            denominator = 10000 ** ((2 * (i // 2)) / d_model)
            
            # Apply sine to even dimensions, cosine to odd dimensions
            if i % 2 == 0:  # Even dimensions: 2i
                pos_enc[pos, i] = np.sin(pos / denominator)
            else:  # Odd dimensions: 2i+1
                pos_enc[pos, i] = np.cos(pos / denominator)
    
    # Convert to PyTorch tensor
    return torch.tensor(pos_enc, dtype=torch.float32)

# Example usage
seq_len = 10  # Sequence length (number of tokens)
d_model = 8   # Embedding dimension

pos_encoding = positional_encoding(seq_len, d_model)
print("Positional Encoding Matrix:")
print(pos_encoding)


Positional Encoding Matrix:
tensor([[ 0.0000e+00,  1.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,
          1.0000e+00,  0.0000e+00,  1.0000e+00],
        [ 8.4147e-01,  5.4030e-01,  9.9833e-02,  9.9500e-01,  9.9998e-03,
          9.9995e-01,  1.0000e-03,  1.0000e+00],
        [ 9.0930e-01, -4.1615e-01,  1.9867e-01,  9.8007e-01,  1.9999e-02,
          9.9980e-01,  2.0000e-03,  1.0000e+00],
        [ 1.4112e-01, -9.8999e-01,  2.9552e-01,  9.5534e-01,  2.9996e-02,
          9.9955e-01,  3.0000e-03,  1.0000e+00],
        [-7.5680e-01, -6.5364e-01,  3.8942e-01,  9.2106e-01,  3.9989e-02,
          9.9920e-01,  4.0000e-03,  9.9999e-01],
        [-9.5892e-01,  2.8366e-01,  4.7943e-01,  8.7758e-01,  4.9979e-02,
          9.9875e-01,  5.0000e-03,  9.9999e-01],
        [-2.7942e-01,  9.6017e-01,  5.6464e-01,  8.2534e-01,  5.9964e-02,
          9.9820e-01,  6.0000e-03,  9.9998e-01],
        [ 6.5699e-01,  7.5390e-01,  6.4422e-01,  7.6484e-01,  6.9943e-02,
          9.9755e-01,  6.9999e-03, ...
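A few properties can be checked directly against the printed matrix: row 0 alternates 0 and 1 (since \( \sin 0 = 0 \) and \( \cos 0 = 1 \)), every entry lies in \([-1, 1]\), and because each (sin, cos) pair satisfies \( \sin^2 + \cos^2 = 1 \), every row has the same squared norm \( d/2 \) regardless of position. The sketch recomputes the matrix with NumPy only, through a hypothetical helper `pe_matrix`, so it runs on its own.

```python
import numpy as np

def pe_matrix(seq_len, d_model):
    # Same formula as positional_encoding above, vectorized and NumPy-only
    pos = np.arange(seq_len)[:, None]                     # (seq_len, 1)
    j = np.arange(d_model)[None, :]                       # column index, (1, d_model)
    angles = pos / 10000 ** (2 * (j // 2) / d_model)      # (seq_len, d_model)
    return np.where(j % 2 == 0, np.sin(angles), np.cos(angles))

pe = pe_matrix(10, 8)

# Row 0: zeros in even (sine) columns, ones in odd (cosine) columns
print(pe[0])
# Each (sin, cos) pair lies on the unit circle, so every row has squared norm 8 / 2 = 4
print(np.allclose((pe ** 2).sum(axis=1), 8 / 2))
```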

### Efficient (Vectorized) Implementation

def positional_encoding(seq_len, d_model):
    """
    Computes the positional encodings for a sequence of length seq_len and embedding dimension d_model.
    
    Args:
        seq_len (int): Length of the input sequence.
        d_model (int): Dimension of the embedding space.
    
    Returns:
        torch.Tensor: Positional encoding matrix of shape (seq_len, d_model).
    """
    # Initialize a positional encoding matrix (seq_len, d_model)
    pos_enc = np.zeros((seq_len, d_model))
    
    # Get position indices (0 to seq_len-1)
    positions = np.arange(0, seq_len)[:, np.newaxis]  # Shape: (seq_len, 1)
    
    # Get dimension indices (0 to d_model-1)
    dimensions = np.arange(0, d_model)[np.newaxis, :]  # Shape: (1, d_model)
    
    # Compute the positional encoding formula
    denominator = 10000 ** (2 * (dimensions // 2) / d_model)
    pos_enc[:, 0::2] = np.sin(positions / denominator[:, 0::2])  # Apply sine to even indices
    pos_enc[:, 1::2] = np.cos(positions / denominator[:, 1::2])  # Apply cosine to odd indices
    
    # Convert to PyTorch tensor for compatibility with deep learning frameworks
    return torch.tensor(pos_enc, dtype=torch.float32)

# Example usage
seq_len = 10  # Sequence length (number of tokens)
d_model = 8   # Embedding dimension

pos_encoding = positional_encoding(seq_len, d_model)
print("Positional Encoding Matrix:")
print(pos_encoding)

Positional Encoding Matrix:
tensor([[ 0.0000e+00,  1.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,
          1.0000e+00,  0.0000e+00,  1.0000e+00],
        [ 8.4147e-01,  5.4030e-01,  9.9833e-02,  9.9500e-01,  9.9998e-03,
          9.9995e-01,  1.0000e-03,  1.0000e+00],
        [ 9.0930e-01, -4.1615e-01,  1.9867e-01,  9.8007e-01,  1.9999e-02,
          9.9980e-01,  2.0000e-03,  1.0000e+00],
        [ 1.4112e-01, -9.8999e-01,  2.9552e-01,  9.5534e-01,  2.9996e-02,
          9.9955e-01,  3.0000e-03,  1.0000e+00],
        [-7.5680e-01, -6.5364e-01,  3.8942e-01,  9.2106e-01,  3.9989e-02,
          9.9920e-01,  4.0000e-03,  9.9999e-01],
        [-9.5892e-01,  2.8366e-01,  4.7943e-01,  8.7758e-01,  4.9979e-02,
          9.9875e-01,  5.0000e-03,  9.9999e-01],
        [-2.7942e-01,  9.6017e-01,  5.6464e-01,  8.2534e-01,  5.9964e-02,
          9.9820e-01,  6.0000e-03,  9.9998e-01],
        [ 6.5699e-01,  7.5390e-01,  6.4422e-01,  7.6484e-01,  6.9943e-02,
          9.9755e-01,  6.9999e-03, ...
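The loop and vectorized versions should produce identical matrices, as the two printed outputs suggest. A minimal equivalence check, with the functions renamed `pe_loop` and `pe_vectorized` (hypothetical names) so both can coexist in one cell, over a few shapes including an odd `d_model`:

```python
import numpy as np

def pe_loop(seq_len, d_model):
    # Direct transcription of the formula, one entry at a time
    out = np.zeros((seq_len, d_model))
    for pos in range(seq_len):
        for j in range(d_model):
            angle = pos / 10000 ** (2 * (j // 2) / d_model)
            out[pos, j] = np.sin(angle) if j % 2 == 0 else np.cos(angle)
    return out

def pe_vectorized(seq_len, d_model):
    # Broadcast (seq_len, 1) positions against (1, d_model) denominators
    positions = np.arange(seq_len)[:, np.newaxis]
    dims = np.arange(d_model)[np.newaxis, :]
    denominator = 10000 ** (2 * (dims // 2) / d_model)
    out = np.zeros((seq_len, d_model))
    out[:, 0::2] = np.sin(positions / denominator[:, 0::2])
    out[:, 1::2] = np.cos(positions / denominator[:, 1::2])
    return out

results = {}
for seq_len, d_model in [(10, 8), (50, 64), (7, 5)]:
    results[(seq_len, d_model)] = np.allclose(pe_loop(seq_len, d_model), pe_vectorized(seq_len, d_model))
    print(seq_len, d_model, results[(seq_len, d_model)])
```

The vectorized form replaces `seq_len * d_model` Python-level iterations with a handful of array operations, which is where its speed advantage comes from.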

### PyTorch Implementation

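import math

import torch


class PositionalEncoding(torch.nn.Module):
    """Adds fixed sinusoidal positional encodings to a batch of embeddings."""

    def __init__(self, embed_dim, max_len=256):
        super().__init__()
        # Precompute a (max_len, embed_dim) table holding one encoding per position
        pe = torch.zeros(max_len, embed_dim)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
        # 1 / 10000^(2i/embed_dim), written as exp(-2i * ln(10000) / embed_dim)
        div_term = torch.exp(torch.arange(0, embed_dim, 2).float() * (-math.log(10000.0) / embed_dim))
        pe[:, 0::2] = torch.sin(position * div_term)
        # Slice div_term so an odd embed_dim also works (one fewer cosine column)
        pe[:, 1::2] = torch.cos(position * div_term[: embed_dim // 2])
        pe = pe.unsqueeze(0)  # (1, max_len, embed_dim), broadcasts over the batch
        # Non-persistent buffer: follows .to(device) but is not saved in state_dict
        self.register_buffer('pe', pe, persistent=False)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim) with seq_len <= max_len
        x = x + self.pe[:, :x.size(1)]
        return x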
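A usage sketch: the layer expects inputs of shape (batch, seq_len, embed_dim), adds the first seq_len rows of the cached table, and broadcasts over the batch. The class is repeated in condensed form so the cell runs on its own; the sizes and the random tensor standing in for token embeddings are arbitrary choices for illustration.

```python
import math

import torch


class PositionalEncoding(torch.nn.Module):
    # Condensed copy of the class above (assumes an even embed_dim) so this cell is self-contained
    def __init__(self, embed_dim, max_len=256):
        super().__init__()
        pe = torch.zeros(max_len, embed_dim)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, embed_dim, 2).float() * (-math.log(10000.0) / embed_dim))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0), persistent=False)

    def forward(self, x):
        return x + self.pe[:, :x.size(1)]


torch.manual_seed(0)
batch, seq_len, embed_dim = 2, 5, 16
embeddings = torch.randn(batch, seq_len, embed_dim)  # stand-in for real token embeddings

layer = PositionalEncoding(embed_dim)
out = layer(embeddings)
print(out.shape)                      # torch.Size([2, 5, 16])
print('pe' in layer.state_dict())     # False, because the buffer is non-persistent

# With all-zero "embeddings" the output is exactly the first seq_len rows of the table
zeros_out = layer(torch.zeros(1, seq_len, embed_dim))
print(torch.equal(zeros_out, layer.pe[:, :seq_len]))  # True
```

Registering the table as a buffer rather than a parameter keeps it out of the optimizer, since the encoding is fixed and not learned.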

position = torch.arange(0, 9, dtype=torch.float)
position

tensor([0., 1., 2., 3., 4., 5., 6., 7., 8.])
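In the class, this position vector is turned into a column with `unsqueeze(1)`, and multiplying it by `div_term` broadcasts to one angle per (position, pair). `div_term` is computed in log space: \( \exp\left(2i \cdot \frac{-\ln 10000}{d}\right) = 10000^{-\frac{2i}{d}} \), the same factor as in the formula. A NumPy sketch of both steps for \( d = 8 \), mirroring the tensor operations rather than calling PyTorch:

```python
import math

import numpy as np

d = 8
i2 = np.arange(0, d, 2)                                   # 2i = 0, 2, 4, 6
div_log = np.exp(i2 * (-math.log(10000.0) / d))           # form used in the PyTorch class
div_pow = 1.0 / 10000.0 ** (i2 / d)                       # direct form 1 / 10000^(2i/d)
print(np.allclose(div_log, div_pow))

position = np.arange(0, 9, dtype=float)[:, None]          # like torch.arange(0, 9).unsqueeze(1)
angles = position * div_log                               # (9, 1) * (4,) -> (9, 4)
print(angles.shape)
```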