# Lecture 5 Transformer - Positional Encoding

#### Positional Encoding

Positional encoding is a technique for adding positional information to the input embeddings of a Transformer model, so that the model can take the order of tokens in a sequence into account. Without it, self-attention is order-agnostic: shuffling the input tokens simply shuffles the outputs. Below are three Python examples that progressively illustrate how positional encoding works:
- Basic Positional Encoding (Sine & Cosine) from Scratch
- Positional Encoding in a Simple Transformer Model
- Learnable Positional Encoding Using PyTorch (optional)
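The order-agnostic behavior of self-attention described above can be checked directly. The following is a minimal sketch, not part of the original examples. It makes two assumptions: a single `nn.MultiheadAttention` layer with 4 heads stands in for a full Transformer, and a random fixed tensor `pe` stands in for a positional encoding. Without positional information, permuting the tokens only permutes the outputs. Once a position-dependent table is added, the outputs actually change.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, seq_length, d_model = 2, 10, 16

# A single self-attention layer (dropout defaults to 0.0, so the pass is deterministic)
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)
attn.eval()

x = torch.randn(batch_size, seq_length, d_model)
perm = torch.randperm(seq_length)  # a random reordering of the token positions

def self_attend(t):
    out, _ = attn(t, t, t)  # query = key = value = t
    return out

with torch.no_grad():
    out = self_attend(x)
    out_shuffled = self_attend(x[:, perm])

# Without positional information, shuffling the inputs only shuffles the outputs
equivariant = torch.allclose(out_shuffled, out[:, perm], atol=1e-5)
print("Order-agnostic without positional encoding:", equivariant)

# Hypothetical fixed table: it is tied to positions, not to tokens, so it does not move with them
pe = torch.randn(1, seq_length, d_model)
with torch.no_grad():
    out_pe = self_attend(x + pe)
    out_pe_shuffled = self_attend(x[:, perm] + pe)

still_equivariant = torch.allclose(out_pe_shuffled, out_pe[:, perm], atol=1e-5)
print("Order-agnostic with positional encoding:", still_equivariant)
```

The first check holds because attention weights depend only on pairwise token similarities. The second check fails because each token now carries a vector that depends on where it sits.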

#### 3. Learnable Positional Encoding Using PyTorch
In this example, we implement learnable positional encoding, where the positional information is learned during training rather than being predefined using sine and cosine functions.
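The difference between a predefined and a learned table can be made concrete by building both and counting trainable parameters. The sketch below uses the standard sine/cosine definition, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). It assumes `d_model` is even. The names `SinusoidalPositionalEncoding` and `count_trainable` are hypothetical, chosen for illustration.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Fixed sine/cosine table stored as a buffer, so the optimizer never updates it."""
    def __init__(self, seq_length, d_model):
        super().__init__()
        position = torch.arange(seq_length, dtype=torch.float32).unsqueeze(1)  # (seq_length, 1)
        # 1 / 10000^(2i / d_model) for each even dimension index 2i
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(seq_length, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions (assumes even d_model)
        self.register_buffer("pe", pe.unsqueeze(0))   # (1, seq_length, d_model)

    def forward(self, x):
        return x + self.pe[:, : x.size(1)]

class LearnablePositionalEncoding(nn.Module):
    """Same interface, but the table is an nn.Parameter learned by backpropagation."""
    def __init__(self, seq_length, d_model):
        super().__init__()
        self.positional_encoding = nn.Parameter(torch.randn(1, seq_length, d_model))

    def forward(self, x):
        return x + self.positional_encoding[:, : x.size(1)]

def count_trainable(module):
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

seq_length, d_model = 10, 16
fixed = SinusoidalPositionalEncoding(seq_length, d_model)
learned = LearnablePositionalEncoding(seq_length, d_model)
print("Trainable parameters (sinusoidal):", count_trainable(fixed))   # 0
print("Trainable parameters (learnable):", count_trainable(learned))  # 160
```

The learnable version adds `seq_length * d_model` weights. In exchange it can adapt to the data, but it cannot represent positions beyond `seq_length`.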

```python
import torch
import torch.nn as nn

class LearnablePositionalEncoding(nn.Module):
    def __init__(self, seq_length, d_model):
        super().__init__()
        # Learnable positional encoding: one d_model-dimensional vector per position,
        # initialized randomly and updated by backpropagation like any other weight
        self.positional_encoding = nn.Parameter(torch.randn(1, seq_length, d_model))

    def forward(self, x):
        """
        Adds positional encoding to the input tensor.

        Args:
            x: Input tensor of shape (batch_size, seq_len, d_model), with seq_len <= seq_length

        Returns:
            A tensor of the same shape with positional encoding added to the input
        """
        # Slice to the actual sequence length so shorter sequences also work
        return x + self.positional_encoding[:, : x.size(1)]

# Example usage
seq_length = 10  # maximum number of tokens (words) per sentence
d_model = 16     # each token is represented by a 16-dimensional embedding vector
batch_size = 2   # two sentences

# Sample input tensor (random values standing in for token embeddings)
x = torch.randn(batch_size, seq_length, d_model)

# Create the learnable positional encoding layer
pos_encoding_layer = LearnablePositionalEncoding(seq_length, d_model)

# Forward pass
output = pos_encoding_layer(x)

print("Learnable Positional Encoding:\n", pos_encoding_layer.positional_encoding)
print("Input + Positional Encoding:\n", output)
```

```text
Learnable Positional Encoding:
 Parameter containing:
tensor([[[-0.1276, -0.6889,  0.8759, -0.0136,  0.7803,  1.0841,  0.1368,
          -1.1224,  0.5634,  1.3568, -0.4604,  0.0918,  0.1806,  1.8718,
          -0.9727,  0.5435],
         [ 1.4102,  0.8206, -0.8326, -0.3513,  2.6679,  1.2296,  0.5776,
           0.9855, -0.7315,  0.7948,  0.0057,  0.4612,  0.1047, -0.3921,
          -2.6505, -0.1576],
         [ 1.5602,  0.7220,  0.4299, -0.2796, -0.8408,  2.1729,  1.6441,
           0.4115,  0.8372, -0.7143, -1.4775,  1.6577,  0.6052,  1.5958,
           0.0325,  0.1703],
         [-0.2209, -0.2384,  0.6595,  0.3653, -1.7404,  0.9811,  0.3686,
           1.2329,  1.9262, -0.0741,  0.7718, -0.0156, -1.1351, -0.4721,
          -0.5552,  0.7829],
         [-0.5426, -0.9944, -1.2903, -0.5876,  1.7312,  0.4334, -0.5261,
           0.9414, -0.0810, -0.9085,  0.1460, -0.0696, -1.2196,  0.5070,
           0.9800, -1.0328],
         [ 0.2745, -0.6019, -1.6439,  1.7115, -0.9995, -0.3480,  0.9282
...
```
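The printed values are random, so it is more informative to check two properties of the forward pass directly. First, the (1, seq_length, d_model) table broadcasts over the batch, so the same vectors are added to every sequence. Second, sequences shorter than `seq_length` can be handled by using only the first rows of the table. This is a minimal sketch, assuming a variant of the class whose `forward` slices the table to `x.size(1)` rows. The slicing is an assumption made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class LearnablePositionalEncoding(nn.Module):
    def __init__(self, seq_length, d_model):
        super().__init__()
        self.positional_encoding = nn.Parameter(torch.randn(1, seq_length, d_model))

    def forward(self, x):
        # Assumed variant: use only the first x.size(1) rows so shorter sequences also work
        return x + self.positional_encoding[:, : x.size(1)]

seq_length, d_model, batch_size = 10, 16, 2
layer = LearnablePositionalEncoding(seq_length, d_model)

# (a) The (1, seq_length, d_model) table broadcasts over the batch dimension
x = torch.randn(batch_size, seq_length, d_model)
with torch.no_grad():
    added = layer(x) - x  # recovers the table (up to float rounding) for each sequence
    table = layer.positional_encoding[0]
same_for_all = all(torch.allclose(added[b], table, atol=1e-6) for b in range(batch_size))
print("Same encoding added to every sequence:", same_for_all)

# (b) A batch of 7-token sequences receives positions 0..6 of the table
short = torch.randn(batch_size, 7, d_model)
with torch.no_grad():
    short_out = layer(short)
print(short_out.shape)  # torch.Size([2, 7, 16])
```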

**Explanation:**
- Learnable Positional Encoding: Instead of computing the positional encoding with fixed sine and cosine functions, the encoding is a trainable tensor that is initialized randomly and learned during model training.
- Positional Encoding Parameter: `self.positional_encoding` is an `nn.Parameter`, so it is registered with the module and updated by the optimizer. Its shape is (1, seq_length, d_model). The leading dimension of 1 broadcasts over the batch dimension, so every sequence in a batch receives the same positional encodings.
- Forward Pass: The input tensor `x`, of shape (batch_size, seq_length, d_model), is added element-wise to the positional encoding. Each token embedding is therefore shifted by the vector for its position.
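Because the table is an `nn.Parameter`, a single ordinary optimizer step is enough to see it being learned. The sketch below makes two assumptions. It uses plain SGD, and it uses a made-up random regression target, which is purely hypothetical. In a real model, a task loss would be backpropagated through the attention layers into the table.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class LearnablePositionalEncoding(nn.Module):
    def __init__(self, seq_length, d_model):
        super().__init__()
        self.positional_encoding = nn.Parameter(torch.randn(1, seq_length, d_model))

    def forward(self, x):
        return x + self.positional_encoding[:, : x.size(1)]

seq_length, d_model, batch_size = 10, 16, 2
layer = LearnablePositionalEncoding(seq_length, d_model)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

x = torch.randn(batch_size, seq_length, d_model)
target = torch.randn(batch_size, seq_length, d_model)  # hypothetical target, for illustration only

before = layer.positional_encoding.detach().clone()

# One ordinary training step: forward, loss, backward, update
loss = nn.functional.mse_loss(layer(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

grad_shape = tuple(layer.positional_encoding.grad.shape)
changed = not torch.equal(before, layer.positional_encoding.detach())
print("Gradient shape:", grad_shape)         # (1, 10, 16)
print("Positional table updated:", changed)  # True
```

The gradient has the same shape as the table. Because of broadcasting, it is summed over the batch, so all sequences contribute to a single shared set of position vectors.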