<a href="https://colab.research.google.com/github/sharma-himanshukumar/LLM_Learning/blob/main/Various_positional_encoding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Documentation and When to Use Each Method

#### 1. Sinusoidal Positional Encoding

**Description**:
Sinusoidal positional encodings use sine and cosine functions to encode positions. Each pair of dimensions corresponds to a sinusoid of a different frequency, with wavelengths growing geometrically across the embedding.

**Features**:
- Fixed and deterministic.
- Provides a unique encoding for each position.
- Ensures smooth transitions between positions, which can help in capturing the order of tokens.

**When to Use**:
- When you want a simple, fixed positional encoding.
- Suitable for applications where the sequence length can vary and you want to avoid learnable parameters for positional encodings.

**Code**:
```python
import numpy as np

def get_sinusoidal_positional_encoding(seq_len, d_model):
    # Assumes d_model is even; each (sin, cos) pair shares one frequency
    PE = np.zeros((seq_len, d_model))
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            PE[pos, i] = np.sin(angle)
            PE[pos, i + 1] = np.cos(angle)
    return PE
```
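
The same table can be built without Python loops by using NumPy broadcasting. The following is a minimal sketch, assuming an even `d_model` and the standard formulation in which each sine/cosine pair shares one frequency; the function name `sinusoidal_encoding_vectorized` is hypothetical.

```python
import numpy as np

def sinusoidal_encoding_vectorized(seq_len, d_model):
    """Return a (seq_len, d_model) array; assumes d_model is even."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]  # (1, d_model / 2)
    # Broadcasting gives one angle per (position, frequency) pair
    angles = positions / (10000 ** (even_dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even columns hold the sines
    pe[:, 1::2] = np.cos(angles)  # odd columns hold the cosines
    return pe

pe = sinusoidal_encoding_vectorized(seq_len=10, d_model=16)
print(pe.shape)
```

Because nothing is learned, the function can be called with any `seq_len` at inference time.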

#### 2. Learnable Positional Encoding

**Description**:
Learnable positional encodings assign a unique embedding vector to each position, which is learned during training.

**Features**:
- Adaptable to the specific task and data.
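- Can capture task-specific positional patterns that a fixed function cannot express.
- Limited to the maximum sequence length fixed at construction (the embedding table has `seq_len` rows).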

**When to Use**:
- When you have enough data and computational resources to train the positional encodings.
- Useful for tasks where positional patterns are better learned from the data than fixed in advance, and where inference sequences stay within the trained maximum length.

**Code**:
```python
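import torch
import torch.nn as nn

class LearnablePositionalEncoding(nn.Module):
    def __init__(self, seq_len, d_model):
        super().__init__()
        # One trainable vector per position; seq_len is the maximum supported length
        self.positional_embeddings = nn.Embedding(seq_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model); returns encodings of shape (1, seq_len, d_model)
        positions = torch.arange(x.size(1), device=x.device).unsqueeze(0)
        return self.positional_embeddings(positions)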
```
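
In practice the learned position vectors are usually added to the token embeddings before the first attention layer. The following is a minimal sketch of that step; the wrapper name `TokenAndPositionEmbedding`, the `vocab_size` of 100, and the `max_len` parameter are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn

class TokenAndPositionEmbedding(nn.Module):
    def __init__(self, vocab_size, max_len, d_model):
        super().__init__()
        self.token_embeddings = nn.Embedding(vocab_size, d_model)
        # The position table has max_len rows, so longer inputs cannot be indexed
        self.positional_embeddings = nn.Embedding(max_len, d_model)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor with seq_len <= max_len
        positions = torch.arange(token_ids.size(1), device=token_ids.device).unsqueeze(0)
        # (batch, seq_len, d_model) + (1, seq_len, d_model) broadcasts over the batch
        return self.token_embeddings(token_ids) + self.positional_embeddings(positions)

embed = TokenAndPositionEmbedding(vocab_size=100, max_len=10, d_model=16)
token_ids = torch.randint(0, 100, (2, 10))
out = embed(token_ids)
print(out.shape)
```

Both tables receive gradients during training, so the position vectors adapt together with the token vectors.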

#### 3. Relative Positional Encoding

**Description**:
Relative positional encodings encode the relative distance between tokens instead of their absolute positions.

**Features**:
- Focuses on the relative position of tokens, which can be more important in some tasks.
- Allows for more flexible handling of varying sequence lengths.

**When to Use**:
- When relative positions are more critical than absolute positions (e.g., in some language modeling tasks).
- In models that need to handle sequences of varying lengths dynamically.

**Code**:
```python
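import torch
import torch.nn.functional as F

def get_relative_positional_encoding(seq_len, d_model):
    range_vec = torch.arange(seq_len)
    distance_mat = range_vec[None, :] - range_vec[:, None]
    # Shift offsets from [-(seq_len - 1), seq_len - 1] to valid non-negative indices
    distance_mat = distance_mat + seq_len - 1
    relative_positions = F.embedding(distance_mat, torch.randn(2 * seq_len - 1, d_model))
    return relative_positions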
```
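
To handle sequences of varying length with a single set of weights, a common approach is to clip offsets to a maximum distance and look them up in a trainable table. The following is a minimal sketch under that assumption; the class name `RelativePositionTable` and the `max_distance` parameter are hypothetical and not part of the code above.

```python
import torch
import torch.nn as nn

class RelativePositionTable(nn.Module):
    def __init__(self, max_distance, d_model):
        super().__init__()
        self.max_distance = max_distance
        # One trainable row per clipped offset in [-max_distance, max_distance]
        self.table = nn.Embedding(2 * max_distance + 1, d_model)

    def forward(self, seq_len):
        positions = torch.arange(seq_len, device=self.table.weight.device)
        offsets = positions[None, :] - positions[:, None]  # (seq_len, seq_len)
        # Offsets beyond max_distance share the embedding of the boundary offset
        offsets = offsets.clamp(-self.max_distance, self.max_distance)
        return self.table(offsets + self.max_distance)      # (seq_len, seq_len, d_model)

rel = RelativePositionTable(max_distance=4, d_model=16)
short = rel(6)   # shape (6, 6, 16)
long = rel(50)   # same table, longer sequence
print(short.shape, long.shape)
```

The table size depends only on `max_distance`, so the same module serves any sequence length.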

#### 4. Rotary Positional Embeddings (RoPE)

**Description**:
Rotary positional embeddings apply a rotation to the query and key vectors based on their positions during the attention mechanism.

**Features**:
- Integrates positional information directly into the attention mechanism.
- The dot product between a rotated query and a rotated key depends only on the relative offset between their positions.

**When to Use**:
- When you want to tightly couple positional information with the attention mechanism.
- Suitable for models that benefit from attention scores that depend on the relative offset between tokens rather than on absolute positions.

**Code**:
```python
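import torch

def apply_rope(x, seq_len, d_model):
    # x: (..., seq_len, d_model) with even d_model
    pos = torch.arange(seq_len, dtype=torch.float32, device=x.device).unsqueeze(1)
    freqs = torch.arange(0, d_model, 2, dtype=torch.float32, device=x.device) / d_model
    angles = pos * (10000 ** -freqs)  # (seq_len, d_model / 2)
    sin, cos = torch.sin(angles), torch.cos(angles)
    # Pair dimension j with j + d_model / 2 and rotate each pair by its angle
    x1, x2 = x[..., : d_model // 2], x[..., d_model // 2 :]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)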
```
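
A useful check on any rotary implementation is that the score between a rotated query and a rotated key depends only on the offset between their positions. The sketch below tests this numerically, assuming the rotate-half pairing convention (dimension `j` paired with `j + d/2`); the helper names `rope_rotate` and `score` are hypothetical.

```python
import torch

def rope_rotate(x, positions, base=10000.0):
    """Rotate the last dimension of x (which must be even) by position-dependent angles."""
    half = x.size(-1) // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = positions.to(torch.float32)[:, None] * inv_freq[None, :]  # (seq_len, half)
    cos, sin = torch.cos(angles), torch.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # 2D rotation applied to each (x1[j], x2[j]) pair
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

torch.manual_seed(0)
q = torch.randn(16)
k = torch.randn(16)

def score(m, n):
    # Dot product of q placed at position m and k placed at position n
    qm = rope_rotate(q[None, :], torch.tensor([m]))
    kn = rope_rotate(k[None, :], torch.tensor([n]))
    return (qm * kn).sum()

# Both pairs have offset 3, so the scores should agree up to floating-point error
print(torch.allclose(score(2, 5), score(7, 10), atol=1e-4))
```

Because each pair of dimensions undergoes a pure rotation, the norm of every vector is also preserved.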

### Example Usage and Output

This script generates and prints the positional encodings for a sample input sequence of length 10 with a model dimension of 16. The printed arrays allow a side-by-side comparison of the shapes and values produced by each method.


```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sinusoidal Positional Encoding
# Note: the cosine term uses exponent (i + 1) / d_model, so the sine and cosine
# in a pair have slightly different frequencies (the standard formulation uses
# i / d_model for both). The printed output below reflects this variant.
def get_sinusoidal_positional_encoding(seq_len, d_model):
    PE = np.zeros((seq_len, d_model))
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            PE[pos, i] = np.sin(pos / (10000 ** (i / d_model)))
            PE[pos, i + 1] = np.cos(pos / (10000 ** ((i + 1) / d_model)))
    return PE

# Learnable Positional Encoding
class LearnablePositionalEncoding(nn.Module):
    def __init__(self, seq_len, d_model):
        super().__init__()
        self.positional_embeddings = nn.Embedding(seq_len, d_model)

    def forward(self, x):
        positions = torch.arange(x.size(1), device=x.device).unsqueeze(0)
        return self.positional_embeddings(positions)

# Relative Positional Encoding (Simplified)
def get_relative_positional_encoding(seq_len, d_model):
    range_vec = torch.arange(seq_len)
    distance_mat = range_vec[None, :] - range_vec[:, None]
    # Shift offsets so they are valid non-negative embedding indices
    distance_mat = distance_mat + seq_len - 1
    relative_positions = F.embedding(distance_mat, torch.randn(2 * seq_len - 1, d_model))
    return relative_positions

# Rotary Positional Embeddings (RoPE)
def apply_rope(x, seq_len, d_model):
    pos = torch.arange(seq_len, dtype=torch.float32, device=x.device).unsqueeze(1)
    freqs = torch.arange(0, d_model, 2, dtype=torch.float32, device=x.device) / d_model
    angles = pos * (10000 ** -freqs)  # (seq_len, d_model / 2)
    sin, cos = torch.sin(angles), torch.cos(angles)
    # Pair dimension j with j + d_model / 2 and rotate each pair by its angle
    x1, x2 = x[..., : d_model // 2], x[..., d_model // 2 :]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Parameters
seq_len = 10
d_model = 16

# Sample input
x = torch.randn(1, seq_len, d_model)

# Sinusoidal Positional Encoding
sinusoidal_encoding = get_sinusoidal_positional_encoding(seq_len, d_model)
print("Sinusoidal Positional Encoding:\n", sinusoidal_encoding)

# Learnable Positional Encoding
learnable_pos_enc = LearnablePositionalEncoding(seq_len, d_model)
learnable_encoding = learnable_pos_enc(x)
print("Learnable Positional Encoding:\n", learnable_encoding)

# Relative Positional Encoding
relative_encoding = get_relative_positional_encoding(seq_len, d_model)
print("Relative Positional Encoding:\n", relative_encoding)

# Rotary Positional Embeddings (RoPE)
rope_encoding = apply_rope(x, seq_len, d_model)
print("Rotary Positional Encoding:\n", rope_encoding)
```


Sinusoidal Positional Encoding:
 [[ 0.00000000e+00  1.00000000e+00  0.00000000e+00  1.00000000e+00
   0.00000000e+00  1.00000000e+00  0.00000000e+00  1.00000000e+00
   0.00000000e+00  1.00000000e+00  0.00000000e+00  1.00000000e+00
   0.00000000e+00  1.00000000e+00  0.00000000e+00  1.00000000e+00]
 [ 8.41470985e-01  8.46009110e-01  3.10983593e-01  9.84230234e-01
   9.98334166e-02  9.98419278e-01  3.16175064e-02  9.99841890e-01
   9.99983333e-03  9.99984189e-01  3.16227239e-03  9.99998419e-01
   9.99999833e-04  9.99999842e-01  3.16227761e-04  9.99999984e-01]
 [ 9.09297427e-01  4.31462829e-01  5.91127117e-01  9.37418309e-01
   1.98669331e-01  9.93682109e-01  6.32033979e-02  9.99367611e-01
   1.99986667e-02  9.99936755e-01  6.32451316e-03  9.99993675e-01
   1.99999867e-03  9.99999368e-01  6.32455490e-04  9.99999937e-01]
 [ 1.41120008e-01 -1.15966142e-01  8.12648897e-01  8.61040649e-01
   2.95520207e-01  9.85803469e-01  9.47260913e-02  9.98577313e-01
   2.99955002e-02  9.99857701e-01  9.486