# 11 - RNN Character-Level Language Model

A character-level language model predicts the next character in a sequence, one character at a time. This is a foundational idea for LLMs, which predict the next token (word or subword) in a sequence.

In this notebook, you'll scaffold the components needed to build, train, and sample from a simple RNN-based character-level language model.

## 📚 Preparing the Dataset

To train a character-level model, you need to encode text as sequences of integer indices (one per character).

**LLM/Transformer Context:**
- LLMs use tokenization to convert text into sequences of integers. Here, we use characters as tokens.

### Task:
- Scaffold code to build a character vocabulary and encode text as integer sequences.
- Add comments explaining each step.

In [None]:
# TODO: Build char-to-index and index-to-char vocabularies
# TODO: Encode a sample text as a sequence of integer indices
pass
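
As a starting point, the vocabulary step might be sketched as follows. This is a minimal sketch, not the notebook's reference solution: the helper names `build_vocab`, `encode`, and `decode` and the sample string are assumptions made for illustration.

```python
def build_vocab(text):
    """Map each unique character to an integer index, and back (hypothetical helper)."""
    chars = sorted(set(text))  # sorting makes the mapping deterministic
    char_to_idx = {ch: i for i, ch in enumerate(chars)}
    idx_to_char = {i: ch for ch, i in char_to_idx.items()}
    return char_to_idx, idx_to_char


def encode(text, char_to_idx):
    """Turn a string into a list of integer indices, one per character."""
    return [char_to_idx[ch] for ch in text]


def decode(indices, idx_to_char):
    """Turn a list of integer indices back into a string."""
    return "".join(idx_to_char[i] for i in indices)


sample_text = "hello world"  # illustrative sample text, not from the notebook
char_to_idx, idx_to_char = build_vocab(sample_text)
encoded = encode(sample_text, char_to_idx)
print(char_to_idx)
print(encoded)
print(decode(encoded, idx_to_char))
```

Encoding followed by decoding should return the original string, which is a quick sanity check on the two mappings.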

## 🔢 Input/Target Sequence Preparation

For language modeling, the input is a sequence of characters, and the target is the same sequence shifted by one character (the next character at each position).

**LLM/Transformer Context:**
- This is the same as next-token prediction in LLMs.

### Task:
- Scaffold code to create input and target sequences for training.

In [None]:
# TODO: Given an encoded text, create input and target sequences for training
pass
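
The shift-by-one relationship between inputs and targets could look like this. It is a sketch under the assumption that the encoded text is a flat sequence of integers; the function name `make_input_target` is hypothetical.

```python
import numpy as np


def make_input_target(encoded):
    """Split an encoded sequence into inputs and next-character targets.

    inputs[t] is the character at position t; targets[t] is the character
    at position t + 1, so both arrays have length len(encoded) - 1.
    """
    encoded = np.asarray(encoded, dtype=np.int64)
    inputs = encoded[:-1]   # everything except the last character
    targets = encoded[1:]   # everything except the first character
    return inputs, targets


inputs, targets = make_input_target([3, 2, 4, 4, 5])
print(inputs)   # characters the model sees
print(targets)  # characters the model should predict
```

For longer texts you would typically cut the encoded array into fixed-length windows before applying the same shift, but that choice is left open here.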

## 🧮 RNN Language Model Architecture

The model consists of an embedding layer, an RNN, and an output layer mapping to vocabulary logits.

**LLM/Transformer Context:**
- LLMs use token embeddings, deep transformer blocks, and output logits for next-token prediction. Here, we use a single RNN layer for simplicity.

### Task:
- Scaffold functions to initialize model parameters: embedding matrix, RNN weights, and output weights.

In [None]:
def init_char_rnn_model(vocab_size, embed_dim, hidden_dim):
    """
    Initialize parameters for a character-level RNN language model.
    Args:
        vocab_size (int): Number of unique characters.
        embed_dim (int): Embedding dimension.
        hidden_dim (int): RNN hidden state dimension.
    Returns:
        dict: Model parameters (embedding, W_xh, W_hh, b_h, W_hy, b_y)
    """
    # TODO: Initialize all model parameters
    pass
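
One possible way to fill in the initializer is sketched below. The shape conventions (`W_xh` as `hidden_dim x embed_dim`, `W_hy` as `vocab_size x hidden_dim`), the `0.01` scale, and the extra `seed` argument are all assumptions of this sketch; the notebook only fixes the parameter names. The function is named `init_char_rnn_model_sketch` to keep it separate from the scaffold above.

```python
import numpy as np


def init_char_rnn_model_sketch(vocab_size, embed_dim, hidden_dim, seed=0):
    """Return a dict of small random parameters (shape conventions are assumed)."""
    rng = np.random.default_rng(seed)  # `seed` is an added, hypothetical argument
    scale = 0.01                       # small weights keep tanh away from saturation
    return {
        "embedding": rng.normal(0.0, scale, (vocab_size, embed_dim)),
        "W_xh": rng.normal(0.0, scale, (hidden_dim, embed_dim)),   # input -> hidden
        "W_hh": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),  # hidden -> hidden
        "b_h": np.zeros(hidden_dim),
        "W_hy": rng.normal(0.0, scale, (vocab_size, hidden_dim)),  # hidden -> logits
        "b_y": np.zeros(vocab_size),
    }


params = init_char_rnn_model_sketch(vocab_size=8, embed_dim=4, hidden_dim=16)
for name, value in params.items():
    print(name, value.shape)
```

Whichever shape convention you pick, the forward pass has to use the same one, so printing the shapes once is a cheap check.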

## 🔗 Forward Pass: Embedding, RNN, Output

The forward pass consists of embedding lookup, RNN unrolling, and output logits computation for each time step.

**LLM/Transformer Context:**
- The same embed, transform, and project-to-logits structure appears in LLMs; here a single RNN layer plays the role of the stacked transformer blocks, and the tokens are characters.

### Task:
- Scaffold a function for the forward pass through the model, returning logits for each time step.

In [None]:
def char_rnn_forward(input_seq, h0, params):
    """
    Forward pass for a character-level RNN language model.
    Args:
        input_seq (np.ndarray): Sequence of input character indices (seq_len,)
        h0 (np.ndarray): Initial hidden state (hidden_dim,)
        params (dict): Model parameters
    Returns:
        logits_seq (np.ndarray): Logits for each time step (seq_len x vocab_size)
        h_states (list): Hidden states for each time step
    """
    # TODO: Implement the forward pass (embedding -> RNN -> output)
    pass
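
The three stages (embedding lookup, recurrent update, output projection) might be unrolled as in the sketch below. It assumes a vanilla RNN with a `tanh` nonlinearity and the shape conventions `W_xh: hidden_dim x embed_dim`, `W_hy: vocab_size x hidden_dim`; neither is specified by the notebook. The random parameters exist only to make the example runnable.

```python
import numpy as np


def char_rnn_forward_sketch(input_seq, h0, params):
    """Unroll a vanilla tanh RNN over input_seq and collect per-step logits."""
    h = h0
    logits, h_states = [], []
    for idx in input_seq:
        x = params["embedding"][idx]  # embedding lookup: (embed_dim,)
        # Recurrent update: mix the current input with the previous hidden state.
        h = np.tanh(params["W_xh"] @ x + params["W_hh"] @ h + params["b_h"])
        # Output projection: one unnormalized score per vocabulary entry.
        logits.append(params["W_hy"] @ h + params["b_y"])
        h_states.append(h)
    return np.stack(logits), h_states


# Illustrative random parameters (sizes chosen arbitrarily for the demo).
vocab_size, embed_dim, hidden_dim = 8, 4, 16
rng = np.random.default_rng(0)
demo_params = {
    "embedding": rng.normal(0.0, 0.01, (vocab_size, embed_dim)),
    "W_xh": rng.normal(0.0, 0.01, (hidden_dim, embed_dim)),
    "W_hh": rng.normal(0.0, 0.01, (hidden_dim, hidden_dim)),
    "b_h": np.zeros(hidden_dim),
    "W_hy": rng.normal(0.0, 0.01, (vocab_size, hidden_dim)),
    "b_y": np.zeros(vocab_size),
}
demo_input = np.array([3, 2, 4, 4])
logits_seq, h_states = char_rnn_forward_sketch(demo_input, np.zeros(hidden_dim), demo_params)
print(logits_seq.shape, len(h_states))
```

Note that the hidden state computed at step `t` is reused at step `t + 1`, which is what lets the model condition on earlier characters.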

## 🧮 Loss Computation: Cross-Entropy

Compute the cross-entropy loss between the predicted logits and the true next character at each time step.

**LLM/Transformer Context:**
- This is the same loss used in LLMs for next-token prediction.

### Task:
- Scaffold a function to compute the average cross-entropy loss over a sequence.

In [None]:
def sequence_cross_entropy_loss(logits_seq, target_seq):
    """
    Compute average cross-entropy loss over a sequence.
    Args:
        logits_seq (np.ndarray): Logits for each time step (seq_len x vocab_size)
        target_seq (np.ndarray): True next character indices (seq_len,)
    Returns:
        float: Average loss
    """
    # TODO: Implement sequence cross-entropy loss
    pass
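
A numerically stable version of the loss could be sketched as follows. Subtracting the row-wise maximum before exponentiating is a standard trick to avoid overflow, not something the notebook prescribes; the function name carries a `_sketch` suffix to keep it apart from the scaffold.

```python
import numpy as np


def sequence_cross_entropy_loss_sketch(logits_seq, target_seq):
    """Average negative log-likelihood of the targets under softmax(logits)."""
    target_seq = np.asarray(target_seq, dtype=np.int64)
    # Shift each row so its maximum is 0; softmax is unchanged by this shift.
    shifted = logits_seq - logits_seq.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability assigned to the true next character at each step.
    picked = log_probs[np.arange(len(target_seq)), target_seq]
    return float(-picked.mean())


# With all-zero logits the prediction is uniform, so the loss equals log(vocab_size).
uniform_logits = np.zeros((3, 5))
uniform_loss = sequence_cross_entropy_loss_sketch(uniform_logits, [0, 2, 4])
print(uniform_loss, np.log(5))
```

The uniform case is a useful baseline: an untrained model with near-zero weights should start close to `log(vocab_size)`, and training should push the loss below it.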

## 🔁 Sampling: Generating Text

After training, you can generate new text by sampling a character from the model's predicted distribution, feeding it back as the next input, and repeating.

**LLM/Transformer Context:**
- This is how LLMs generate text, one token at a time.

### Task:
- Scaffold a function to sample a sequence of characters from the trained model, given a start character.

In [None]:
def sample_char_rnn(start_char_idx, params, char_to_idx, idx_to_char, sample_length):
    """
    Sample a sequence of characters from the trained RNN model.
    Args:
        start_char_idx (int): Index of the start character.
        params (dict): Trained model parameters.
        char_to_idx (dict): Character to index mapping.
        idx_to_char (dict): Index to character mapping.
        sample_length (int): Number of characters to sample.
    Returns:
        str: Generated text sequence.
    """
    # TODO: Implement text sampling from the RNN
    pass
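
The sampling loop might be sketched as below. Several details are assumptions of this sketch rather than the notebook's specification: the hidden state starts at zero, the RNN uses `tanh`, the returned string includes the start character followed by `sample_length` sampled characters, and an optional `rng` argument is added for reproducibility. The `char_to_idx` argument from the scaffold is omitted because this version does not need it.

```python
import numpy as np


def sample_char_rnn_sketch(start_char_idx, params, idx_to_char, sample_length, rng=None):
    """Generate text by repeatedly sampling the next character and feeding it back."""
    rng = np.random.default_rng() if rng is None else rng
    h = np.zeros(params["W_hh"].shape[0])  # assumed zero initial hidden state
    idx = start_char_idx
    out = [idx_to_char[idx]]
    for _ in range(sample_length):
        x = params["embedding"][idx]
        h = np.tanh(params["W_xh"] @ x + params["W_hh"] @ h + params["b_h"])
        logits = params["W_hy"] @ h + params["b_y"]
        # Softmax with max-subtraction for numerical stability.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        idx = int(rng.choice(len(probs), p=probs))  # sample, rather than take argmax
        out.append(idx_to_char[idx])
    return "".join(out)


# Illustrative untrained parameters and vocabulary, only to make the demo runnable.
demo_chars = sorted(set("hello world"))
demo_idx_to_char = dict(enumerate(demo_chars))
V, E, H = len(demo_chars), 4, 16
init_rng = np.random.default_rng(0)
demo_params = {
    "embedding": init_rng.normal(0.0, 0.01, (V, E)),
    "W_xh": init_rng.normal(0.0, 0.01, (H, E)),
    "W_hh": init_rng.normal(0.0, 0.01, (H, H)),
    "b_h": np.zeros(H),
    "W_hy": init_rng.normal(0.0, 0.01, (V, H)),
    "b_y": np.zeros(V),
}
generated = sample_char_rnn_sketch(3, demo_params, demo_idx_to_char, 20,
                                   rng=np.random.default_rng(1))
print(repr(generated))
```

With untrained parameters the output is close to uniformly random characters; sampling from the distribution instead of always taking the most likely character keeps trained models from looping on the same few characters.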

## 🧠 Final Summary: Character-Level Language Modeling and LLMs

- Character-level language models are a simple but powerful way to understand sequence modeling and next-token prediction.
- LLMs use the same principles, but with subword/word tokens, deep transformer blocks, and much larger vocabularies.
- Mastering this workflow gives you a strong foundation for building and understanding LLMs.

In the next notebook, you'll explore more advanced recurrent architectures like LSTM and GRU!