# 21 - Positional Encoding in the Transformer

Transformers need positional encoding to represent token order, because self-attention by itself ignores order: it is permutation-equivariant, so shuffling the input tokens only shuffles the outputs in the same way. In this notebook, you'll scaffold the integration of positional encoding with the transformer block, as used in LLMs.
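Without positional information, self-attention treats its input as an unordered set. The sketch below checks this numerically with a single-head scaled dot-product attention written for illustration (the `self_attention` helper and the random weights are hypothetical, not part of this notebook's API): attending over a shuffled sequence gives the same rows as the original output, just shuffled the same way.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with no positional information."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq_len, seq_len)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
x = rng.normal(size=(seq_len, d_model))                # toy token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

perm = rng.permutation(seq_len)                        # shuffle the token order
out = self_attention(x, w_q, w_k, w_v)
out_shuffled = self_attention(x[perm], w_q, w_k, w_v)

# Shuffling the inputs only shuffles the outputs: order carries no information.
print(np.allclose(out_shuffled, out[perm]))            # True
```

Adding a different vector to each position breaks this symmetry, which is exactly what positional encoding does.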

## 🔢 Adding Positional Encoding to Input Embeddings

In models that use absolute positional encodings, the encodings are added element-wise to the input token embeddings once, before the first transformer block.

**LLM/Transformer Context:**
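- Adding absolute positional encodings to the input embeddings is the approach of the original Transformer, GPT-2, and BERT. Not every LLM works this way: T5 uses relative position biases inside attention, and many recent LLMs (e.g., LLaMA) apply rotary position embeddings (RoPE) to queries and keys instead.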

### Task:
- Scaffold a function to add positional encodings to input embeddings.
- Add a docstring explaining its role.

In [None]:
def add_positional_encoding_to_embeddings(embeddings, pos_encodings):
    """
    Add positional encodings to input embeddings before the transformer block.
    Args:
        embeddings (np.ndarray): Input embeddings (seq_len x d_model)
        pos_encodings (np.ndarray): Positional encodings (seq_len x d_model)
    Returns:
        np.ndarray: Position-aware embeddings (seq_len x d_model)
    """
    # TODO: Add positional encodings to embeddings
    pass
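One possible way to fill in the scaffold, as a sketch: the operation is a plain element-wise sum. Two choices here are assumptions beyond the docstring: the function accepts an encoding table with *more* than `seq_len` rows (so a table precomputed for a maximum length can be reused) and raises `ValueError` on a shape mismatch.

```python
import numpy as np

def add_positional_encoding_to_embeddings(embeddings, pos_encodings):
    """
    Add positional encodings to input embeddings before the transformer block.
    Args:
        embeddings (np.ndarray): Input embeddings (seq_len x d_model)
        pos_encodings (np.ndarray): Positional encodings (>= seq_len rows x d_model)
    Returns:
        np.ndarray: Position-aware embeddings (seq_len x d_model)
    """
    seq_len, d_model = embeddings.shape
    if pos_encodings.shape[0] < seq_len or pos_encodings.shape[1] != d_model:
        raise ValueError(
            f"pos_encodings shape {pos_encodings.shape} cannot cover embeddings shape {embeddings.shape}"
        )
    # Row t of the result = embedding of token t + encoding of position t.
    # Slicing to seq_len lets a longer precomputed table be reused (assumption).
    return embeddings + pos_encodings[:seq_len]

emb = np.ones((3, 4))                               # toy embeddings: 3 tokens, d_model = 4
pe = np.arange(12, dtype=float).reshape(3, 4)       # toy positional encodings
out = add_positional_encoding_to_embeddings(emb, pe)
print(out[0])                                       # [1. 2. 3. 4.]
```

Addition (rather than concatenation) keeps the width at `d_model`, so every downstream weight matrix keeps its shape.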

## 🧮 Sinusoidal vs. Learnable Positional Encodings

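Transformers can use either fixed (sinusoidal) or learnable positional encodings. Sinusoidal encodings have no parameters and are defined for any position, so they can in principle be computed for sequences longer than any seen in training. Learnable encodings are a trained table with one vector per position: they can adapt to the data, but they only cover the maximum length the table was built for.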
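For reference, the sinusoidal encoding from the original Transformer paper (Vaswani et al., 2017) assigns position $pos$ and dimension pair index $i$ (so dimensions $2i$ and $2i+1$) the values below. A learnable encoding instead treats each row of a parameter matrix $P \in \mathbb{R}^{L_{\max} \times d_{\text{model}}}$ as the encoding of one position, where $L_{\max}$ (notation introduced here) is the maximum supported length.

$$
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad
PE^{\text{learned}}_{pos} = P_{pos,\,:}
$$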

**LLM/Transformer Context:**
- GPT and BERT use learnable encodings; the original transformer used sinusoidal.

### Task:
- Scaffold code to select and initialize either sinusoidal or learnable positional encodings for a given sequence length and embedding dimension.
- Add comments explaining the choice.

In [None]:
# TODO: Choose and initialize positional encodings (sinusoidal or learnable)
pass
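A minimal sketch of the selection step, assuming NumPy and three illustrative helper names (`sinusoidal_positional_encoding`, `learnable_positional_encoding`, `init_positional_encoding`) that are not defined elsewhere in this notebook. The learnable table is only *initialized* here; in a real model its values would be updated by backpropagation. The initial scale `std=0.02` is an assumed small value, not one taken from this notebook.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed encodings: sin on even dimensions, cos on odd dimensions (no trainable parameters)."""
    positions = np.arange(seq_len)[:, np.newaxis]             # (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[np.newaxis, :]       # 2i for each sin/cos pair
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles[:, : d_model // 2])           # slice handles odd d_model
    return pe

def learnable_positional_encoding(max_len, d_model, rng=None, std=0.02):
    """Initial values of a trainable table with one row per position (std is an assumed scale)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(0.0, std, size=(max_len, d_model))

def init_positional_encoding(kind, seq_len, d_model, rng=None):
    if kind == "sinusoidal":
        # Deterministic and parameter-free; computable for any seq_len, even beyond training lengths
        return sinusoidal_positional_encoding(seq_len, d_model)
    if kind == "learnable":
        # Adds seq_len * d_model parameters; positions beyond seq_len have no learned vector
        return learnable_positional_encoding(seq_len, d_model, rng=rng)
    raise ValueError(f"unknown positional encoding kind: {kind!r}")

pe = init_positional_encoding("sinusoidal", seq_len=4, d_model=6)
print(pe.shape)  # (4, 6)
print(pe[0])     # [0. 1. 0. 1. 0. 1.]  (sin(0) = 0, cos(0) = 1)
```

Both options return a `(seq_len, d_model)` array, so either can be passed straight to the embedding-addition step.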

## 🔗 Integrating Positional Encoding in the Transformer Block

After adding positional encodings, the embeddings are passed through the transformer block (multi-head attention, feedforward, etc.).

**LLM/Transformer Context:**
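- With absolute positional encodings, this input step runs once, before the first transformer layer; the stacked layers that follow operate on its output. (Rotary encodings, by contrast, are applied inside every attention layer.)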

### Task:
- Scaffold a function to perform the full input step: embedding lookup, add positional encoding, and pass to the transformer block.
- Add a docstring explaining the workflow.

In [None]:
def transformer_input_step(token_indices, embedding_matrix, pos_encodings, transformer_block_fn):
    """
    Full input step for a transformer: embedding lookup, add positional encoding, pass to transformer block.
    Args:
        token_indices (np.ndarray): Sequence of token indices (seq_len,)
        embedding_matrix (np.ndarray): Token embedding matrix (vocab_size x d_model)
        pos_encodings (np.ndarray): Positional encodings (seq_len x d_model)
        transformer_block_fn (callable): Function for the transformer block
    Returns:
        np.ndarray: Output of the transformer block (seq_len x d_model)
    """
    # TODO: Implement the full input step for the transformer
    pass
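A sketch of the full input step, assuming NumPy arrays and that `pos_encodings` has at least `seq_len` rows. The embedding lookup is integer-array indexing into `embedding_matrix`. The identity function used as `transformer_block_fn` below is a hypothetical stand-in for a real block; the usage shows that the same token id at two positions yields two different vectors.

```python
import numpy as np

def transformer_input_step(token_indices, embedding_matrix, pos_encodings, transformer_block_fn):
    """Embedding lookup -> add positional encoding -> transformer block."""
    token_indices = np.asarray(token_indices)
    seq_len = token_indices.shape[0]
    # 1. Embedding lookup: row t is the embedding vector of token_indices[t]
    x = embedding_matrix[token_indices]              # (seq_len, d_model)
    # 2. Add positional encodings for positions 0..seq_len-1
    x = x + pos_encodings[:seq_len]
    # 3. Pass the position-aware embeddings through the transformer block
    return transformer_block_fn(x)

def identity_block(x):
    """Hypothetical placeholder for a real transformer block."""
    return x

vocab_size, d_model = 10, 4
E = np.arange(vocab_size * d_model, dtype=float).reshape(vocab_size, d_model)  # toy embedding matrix
P = 0.1 * np.arange(3 * d_model, dtype=float).reshape(3, d_model)              # toy encodings, 3 positions
out = transformer_input_step([2, 0, 2], E, P, identity_block)
print(out.shape)                    # (3, 4)
print(np.allclose(out[0], out[2]))  # False: same token id, different positions
```

Taking the block as a callable keeps this step independent of how attention and the feedforward network are implemented.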

## 🧠 Final Summary: Positional Encoding in LLMs

- Adding positional encoding to input embeddings is essential for transformers to model sequence order.
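- Both fixed and learnable encodings are used in practice: fixed ones need no parameters and extend to any position, while learnable ones adapt to the data but are limited to the trained maximum length.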
- Mastering this integration is key to building and understanding transformer-based LLMs.

In the next notebook, you'll build the full transformer block from scratch!