# 17 - Positional Encoding in Transformers

Transformers do not have any built-in notion of sequence order. Positional encoding injects information about the position of each token, enabling the model to capture order and structure in sequences.

In this notebook, you'll scaffold the core logic of positional encoding and see how it is used in LLMs and other transformers.

## 🔢 Why Positional Encoding?

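Self-attention treats its input as an unordered set of tokens: permuting the input tokens simply permutes the outputs in the same way. To make the model sensitive to order, we must add position information to the input embeddings.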
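
The claim that self-attention ignores order can be checked numerically. The snippet below is a minimal sketch, assuming a single attention head with identity query/key/value projections (a simplification made here for brevity; `self_attention` is a hypothetical helper, not part of this notebook's scaffold).

```python
import numpy as np

def self_attention(x):
    """Single-head self-attention with identity projections (simplifying assumption)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # 4 tokens, d_model = 8
perm = np.array([2, 0, 3, 1])      # an arbitrary reordering of the tokens

out = self_attention(x)
out_perm = self_attention(x[perm])

# Reordering the inputs only reorders the outputs: no order information is used.
print(np.allclose(out_perm, out[perm]))  # True
```

Because the output for each token is the same wherever that token sits in the sequence, any order sensitivity has to come from the inputs themselves.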

**LLM/Transformer Context:**
- Transformer-based LLMs (e.g., GPT, BERT) rely on some form of positional information to model word order.

### Task:
- Scaffold a function to add positional encodings to input embeddings.
- Add a docstring explaining why this is necessary.

In [None]:
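import numpy as np

def add_positional_encoding(embeddings, pos_encodings):
    """
    Add positional encodings to input embeddings.

    Why: self-attention is permutation-equivariant, so without position
    information the attention layers cannot tell "dog bites man" from
    "man bites dog". Adding a position-dependent vector to each token
    embedding makes the same token look different at different positions.

    Args:
        embeddings (np.ndarray): Input embeddings (seq_len x d_model)
        pos_encodings (np.ndarray): Positional encodings (seq_len x d_model)
    Returns:
        np.ndarray: Position-aware embeddings (seq_len x d_model)
    """
    # TODO: Add positional encodings to embeddings (element-wise sum)
    pass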
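
One possible way to fill in the scaffold above is an element-wise sum. This is a sketch rather than the reference solution; the `_sketch` suffix and the explicit shape check are choices made here for illustration.

```python
import numpy as np

def add_positional_encoding_sketch(embeddings, pos_encodings):
    """Return embeddings + pos_encodings; both must have shape (seq_len, d_model)."""
    if embeddings.shape != pos_encodings.shape:
        raise ValueError(
            f"shape mismatch: {embeddings.shape} vs {pos_encodings.shape}"
        )
    return embeddings + pos_encodings

emb = np.ones((3, 4))                             # toy embeddings: 3 tokens, d_model = 4
pos = np.arange(12, dtype=float).reshape(3, 4)    # toy positional encodings
out = add_positional_encoding_sketch(emb, pos)
print(out[1])  # [5. 6. 7. 8.]
```

Summing (rather than concatenating) keeps the model dimension unchanged, which is how the original transformer combines the two.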

## 🧮 Sinusoidal Positional Encoding (Vaswani et al.)

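The original transformer (Vaswani et al.) uses fixed sine and cosine functions of different frequencies to encode positions. The authors chose them partly in the hope that the model could extrapolate to sequences longer than those seen during training.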

**LLM/Transformer Context:**
- Sinusoidal encodings are used in many transformer models and are a standard baseline.

### Task:
- Scaffold a function to generate sinusoidal positional encodings for a given sequence length and embedding dimension.
- Add a docstring explaining the formula and its properties.

In [None]:
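def sinusoidal_positional_encoding(seq_len, d_model):
    """
    Generate sinusoidal positional encodings (as in the original transformer).

    Formula, for position pos and dimension-pair index i:
        PE[pos, 2i]     = sin(pos / 10000^(2i / d_model))
        PE[pos, 2i + 1] = cos(pos / 10000^(2i / d_model))

    Properties: values lie in [-1, 1], nothing is learned, and for a fixed
    offset k, PE[pos + k] is a linear function of PE[pos].

    Args:
        seq_len (int): Length of the sequence.
        d_model (int): Embedding dimension.
    Returns:
        np.ndarray: Positional encodings (seq_len x d_model)
    """
    # TODO: Implement sinusoidal positional encoding
    pass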
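
A vectorized NumPy version of the sinusoidal formula might look like the following. It is a sketch, not the reference solution: the function name is hypothetical, and the handling of odd `d_model` (dropping the last cosine column) is an assumption made here.

```python
import numpy as np

def sinusoidal_positional_encoding_sketch(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    positions = np.arange(seq_len)[:, np.newaxis]          # (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[np.newaxis, :]    # values 2i, shape (1, ceil(d_model/2))
    angles = positions / np.power(10000.0, even_dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                           # even dimensions
    pe[:, 1::2] = np.cos(angles[:, : d_model // 2])        # odd dimensions
    return pe

pe_short = sinusoidal_positional_encoding_sketch(50, 16)
pe_long = sinusoidal_positional_encoding_sketch(100, 16)
print(pe_short.shape)                        # (50, 16)
print(np.allclose(pe_long[:50], pe_short))   # True
```

The second check shows why no retraining is needed to produce encodings for longer inputs: the encoding of a position depends only on that position, so a longer table simply extends a shorter one. Whether a trained model actually makes good use of those unseen positions is a separate question.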

## 🔗 Learnable Positional Embeddings

Many transformer models replace the fixed encodings with learnable positional embeddings: a table with one trainable vector per position, which lets the model adapt its position representations during training.

**LLM/Transformer Context:**
- Learnable absolute positional embeddings are used in models such as BERT and GPT-2; many more recent LLMs instead use relative schemes such as rotary position embeddings (RoPE).

### Task:
- Scaffold a function to initialize learnable positional embeddings.
- Add a docstring explaining their use.

In [None]:
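def init_learnable_positional_embeddings(seq_len, d_model):
    """
    Initialize learnable positional embeddings.

    Use: the returned table holds one vector per position. It is treated as
    a model parameter and updated by gradient descent, and row `pos` is added
    to the embedding of the token at position `pos`. The table only covers
    positions 0 .. seq_len - 1.

    Args:
        seq_len (int): Length of the sequence (number of positions in the table).
        d_model (int): Embedding dimension.
    Returns:
        np.ndarray: Learnable positional embeddings (seq_len x d_model)
    """
    # TODO: Initialize learnable positional embeddings (e.g., random or zeros)
    pass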
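
As a rough illustration of what "learnable" means here, the sketch below initializes the table with small Gaussian noise, looks up rows by position, and applies one toy gradient step. The function name, the `std=0.02` default, the learning rate, and the all-ones gradient are illustrative assumptions; in a real model the gradient would come from backpropagation in a framework that tracks parameters.

```python
import numpy as np

def init_learnable_positional_embeddings_sketch(max_len, d_model, std=0.02, seed=0):
    """Return a (max_len, d_model) table of small random values (std is an assumption)."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=std, size=(max_len, d_model))

pos_table = init_learnable_positional_embeddings_sketch(max_len=16, d_model=8)

# Look up the first 5 positions and add them to (toy, all-zero) token embeddings.
token_emb = np.zeros((5, 8))
x = token_emb + pos_table[:5]

# Toy update: a placeholder gradient stands in for what backprop would provide.
before = pos_table[:5].copy()
grad = np.ones_like(before)
pos_table[:5] -= 0.1 * grad    # plain SGD step with learning rate 0.1

print(pos_table.shape)  # (16, 8)
```

Unlike the sinusoidal version, this table has no rows beyond `max_len`, so positions past that limit cannot be represented without extending the table.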

## 🧠 Final Summary: Positional Encoding in LLMs

- Positional encoding is essential for transformers to model sequence order.
- Both fixed (sinusoidal) and learnable encodings are used in LLMs.
- Understanding positional encoding is key to building and interpreting transformer models.

In the next notebook, you'll see how positional encoding is integrated into the full transformer block!