# 09 - RNN from Scratch

Recurrent Neural Networks (RNNs) are foundational for sequence modeling. While transformers have largely replaced RNNs in LLMs, understanding RNNs builds intuition for how models process sequences and maintain memory over time.

In this notebook, you'll scaffold the core logic of an RNN, step by step, and see how these ideas connect to modern LLMs.

## 🔢 RNN Cell: The Core Computation

An RNN cell processes one time step of a sequence, updating its hidden state:

$$ h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h) $$

**LLM/Transformer Context:**
- RNNs were among the earliest neural architectures applied to sequential data, a key challenge in language modeling. Transformers tackle the same problem but replace the recurrent hidden state with self-attention over all positions.

### Task:
- Scaffold a function for a single RNN cell step.
- Add a docstring explaining its role in sequence modeling.

In [None]:
import numpy as np

def rnn_cell_step(x_t, h_prev, W_xh, W_hh, b_h):
    """
    Compute one step of an RNN cell.
    Args:
        x_t (np.ndarray): Input at time t (input_dim,)
        h_prev (np.ndarray): Previous hidden state (hidden_dim,)
        W_xh (np.ndarray): Input-to-hidden weights (hidden_dim x input_dim)
        W_hh (np.ndarray): Hidden-to-hidden weights (hidden_dim x hidden_dim)
        b_h (np.ndarray): Hidden bias (hidden_dim,)
    Returns:
        np.ndarray: Current hidden state (hidden_dim,)
    """
    # TODO: Implement the RNN cell computation
    pass
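
For reference once you have attempted the task, here is one way the update rule above could be written with NumPy. It is a minimal sketch, not the only valid solution: the name `rnn_cell_step_sketch`, the toy dimensions, and the random initialization are illustrative assumptions, and inputs are assumed to be unbatched 1-D vectors as in the docstring.

```python
import numpy as np

def rnn_cell_step_sketch(x_t, h_prev, W_xh, W_hh, b_h):
    """One possible implementation of h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    # Matrix-vector products map the input and the previous state into hidden space.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy usage with assumed sizes (input_dim=3, hidden_dim=4).
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
x_t = rng.normal(size=input_dim)
h_prev = np.zeros(hidden_dim)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h_t = rnn_cell_step_sketch(x_t, h_prev, W_xh, W_hh, b_h)
print(h_t.shape)  # (4,)
```

Because `tanh` squashes its input, every entry of the new hidden state lies strictly between -1 and 1.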

## 🔗 Unrolling the RNN Over a Sequence

To process a sequence, the RNN cell is applied at each time step, passing the hidden state forward.

**LLM/Transformer Context:**
- Transformers also produce one representation per position, but they compute all positions in parallel with attention instead of passing a hidden state forward one step at a time.

### Task:
- Scaffold a function to run an RNN over an entire input sequence.
- Add a docstring explaining the sequence processing.

In [None]:
def rnn_forward(X_seq, h0, W_xh, W_hh, b_h):
    """
    Run an RNN over an input sequence.
    Args:
        X_seq (np.ndarray): Input sequence (seq_len x input_dim)
        h0 (np.ndarray): Initial hidden state (hidden_dim,)
        W_xh, W_hh, b_h: RNN parameters
    Returns:
        list: List of hidden states for each time step
    """
    # TODO: Implement RNN unrolling over the sequence
    pass
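
The unrolling described above could be sketched as a plain Python loop. This is one possible approach under stated assumptions: `rnn_forward_sketch` is a hypothetical name, the cell update is inlined so the block runs on its own, and the sequence is a single unbatched array of shape `(seq_len, input_dim)`.

```python
import numpy as np

def rnn_forward_sketch(X_seq, h0, W_xh, W_hh, b_h):
    """Apply the tanh RNN update at every time step and collect the hidden states."""
    h_states = []
    h = h0
    for x_t in X_seq:  # iterate over time steps (rows of X_seq)
        # Same update as the single-step cell: the new state depends on the old one.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        h_states.append(h)
    return h_states

# Toy usage with assumed sizes (seq_len=5, input_dim=3, hidden_dim=4).
rng = np.random.default_rng(0)
seq_len, input_dim, hidden_dim = 5, 3, 4
X_seq = rng.normal(size=(seq_len, input_dim))
h0 = np.zeros(hidden_dim)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h_states = rnn_forward_sketch(X_seq, h0, W_xh, W_hh, b_h)
print(len(h_states), h_states[-1].shape)  # 5 (4,)
```

The loop is inherently sequential: step `t` cannot start until step `t-1` has produced its hidden state, which is the parallelism limitation that attention avoids.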

## 🧮 Output Layer: From Hidden State to Prediction

After processing the sequence, the RNN's hidden state(s) are mapped to output predictions (e.g., next token probabilities).

**LLM/Transformer Context:**
- In language modeling, the output layer predicts the next token at each position, just like in transformers.

### Task:
- Scaffold a function to compute output logits from hidden states.
- Add a docstring explaining its use in sequence prediction.

In [None]:
def rnn_output_layer(h_states, W_hy, b_y):
    """
    Compute output logits from RNN hidden states.
    Args:
        h_states (list or np.ndarray): Hidden states (seq_len x hidden_dim)
        W_hy (np.ndarray): Hidden-to-output weights (output_dim x hidden_dim)
        b_y (np.ndarray): Output bias (output_dim,)
    Returns:
        np.ndarray: Output logits (seq_len x output_dim)
    """
    # TODO: Map hidden states to output logits
    pass
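
As a hedged illustration of the mapping from hidden states to logits, the sketch below stacks the hidden states into a matrix and applies one affine map to every time step at once. The names `rnn_output_layer_sketch` and `softmax_sketch` are hypothetical, and the softmax is included only to show how logits could be turned into next-token probabilities.

```python
import numpy as np

def rnn_output_layer_sketch(h_states, W_hy, b_y):
    """Map hidden states (seq_len x hidden_dim) to logits (seq_len x output_dim)."""
    H = np.asarray(h_states)   # accepts a list of vectors or a 2-D array
    return H @ W_hy.T + b_y    # b_y broadcasts across time steps

def softmax_sketch(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Toy usage with assumed sizes (seq_len=5, hidden_dim=4, output_dim=6).
rng = np.random.default_rng(0)
seq_len, hidden_dim, output_dim = 5, 4, 6
h_states = [rng.normal(size=hidden_dim) for _ in range(seq_len)]
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_y = np.zeros(output_dim)

logits = rnn_output_layer_sketch(h_states, W_hy, b_y)
probs = softmax_sketch(logits)
print(logits.shape)  # (5, 6)
```

Each row of `probs` is a distribution over the `output_dim` classes for that position.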

## 🔁 Training an RNN: Loss and Backpropagation

To train an RNN, compute the loss (e.g., cross-entropy) at each time step and backpropagate through time (BPTT).

**LLM/Transformer Context:**
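- BPTT is ordinary backpropagation applied to the unrolled RNN. Transformers are also trained with standard backpropagation, but because they have no recurrence, gradients do not have to flow through a long chain of time steps.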

### Task:
- Scaffold a function to compute the total loss over a sequence (e.g., sum of cross-entropy losses).
- Add a docstring explaining its role in sequence modeling.

In [None]:
def rnn_sequence_loss(logits_seq, targets_seq, loss_fn):
    """
    Compute the total loss over a sequence.
    Args:
        logits_seq (np.ndarray): Output logits (seq_len x output_dim)
        targets_seq (np.ndarray): True target indices (seq_len,)
        loss_fn (callable): Loss function (e.g., cross-entropy)
    Returns:
        float: Total loss over the sequence
    """
    # TODO: Compute total sequence loss
    pass
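
One way the summed loss could look is sketched below. The helper `cross_entropy_sketch` is a hypothetical per-step loss that takes raw logits and an integer target index; the scaffold leaves `loss_fn` open, so this calling convention is an assumption.

```python
import numpy as np

def cross_entropy_sketch(logits, target):
    """Cross-entropy for one time step, computed from raw logits."""
    shifted = logits - np.max(logits)                      # numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))  # log-softmax
    return -log_probs[target]

def rnn_sequence_loss_sketch(logits_seq, targets_seq, loss_fn):
    """Sum the per-step losses over the whole sequence."""
    return float(sum(loss_fn(logits, target)
                     for logits, target in zip(logits_seq, targets_seq)))

# Toy check: all-zero logits give a uniform distribution over 4 classes,
# so each step contributes log(4) regardless of the target.
logits_seq = np.zeros((3, 4))
targets_seq = np.array([0, 2, 1])
total = rnn_sequence_loss_sketch(logits_seq, targets_seq, cross_entropy_sketch)
print(np.isclose(total, 3 * np.log(4)))  # True
```

Summing makes the loss grow with sequence length; dividing by `len(targets_seq)` would give a per-token average instead, which is a design choice rather than a requirement of the task.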

## 🧠 Final Summary: RNNs and LLMs

- RNNs were among the earliest neural architectures applied to sequential data, a key challenge in language modeling.
- Transformers build on these ideas, replacing recurrence with self-attention for better parallelism and long-range memory.
- Understanding RNNs gives you a strong foundation for grasping the inner workings of LLMs and transformers.

In the next notebook, you'll dive deeper into backpropagation through time (BPTT) and see how gradients flow through sequences!