# 18 - Teacher Forcing in Sequence Models

Teacher forcing is a training technique for sequence-to-sequence models in which the decoder receives the ground-truth previous output as input, rather than the model's own prediction. Because early predictions are mostly wrong, feeding the ground truth keeps errors from compounding across time steps, so training converges faster and more stably.

In this notebook, you'll scaffold the logic for teacher forcing and see how it is used in LLMs and transformers.

## 🔁 What is Teacher Forcing?

During training, the decoder receives the ground-truth previous token as input at each step, rather than its own previous prediction.

**LLM/Transformer Context:**
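- Teacher forcing is used in training encoder-decoder models (including transformers) for tasks like translation and summarization.
- In transformers, a causal mask lets every position be trained in parallel from the ground-truth sequence shifted right by one token.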
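
To make the idea concrete, here is a minimal sketch of how decoder inputs and labels line up under teacher forcing. The token ids and the `SOS`/`EOS` values are made up for illustration; they do not come from any particular tokenizer.

```python
import numpy as np

# Hypothetical token ids; SOS = 1 and EOS = 2 are assumed conventions.
SOS, EOS = 1, 2
target_tokens = np.array([5, 8, 3])

# Decoder inputs under teacher forcing: the ground truth shifted right by one step.
decoder_inputs = np.concatenate(([SOS], target_tokens))
# Labels the decoder is trained to predict at each step.
decoder_labels = np.concatenate((target_tokens, [EOS]))

for step, (x, y) in enumerate(zip(decoder_inputs, decoder_labels)):
    print(f"step {step}: input={x} -> label={y}")
```

At every step the input is the true previous token, so a wrong prediction at step 1 has no effect on what the decoder sees at step 2.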

### Task:
- Scaffold a function to perform a decoder forward pass with teacher forcing.
- Add a docstring explaining its role.

In [None]:
def decoder_forward_teacher_forcing(context_vector, target_seq, decoder_params, cell_type='rnn'):
    """
    Run the decoder with teacher forcing: at step t, feed the ground-truth token from step t-1 as input
    (step 0 has no previous token, so it needs a start-of-sequence input).
    Args:
        context_vector (np.ndarray): Initial hidden state for the decoder (hidden_dim,)
        target_seq (np.ndarray): Ground-truth output sequence (seq_len x input_dim)
        decoder_params (dict): Decoder parameters (weights, biases, etc.)
        cell_type (str): 'rnn', 'lstm', or 'gru'
    Returns:
        np.ndarray: Output logits for each step (seq_len x output_dim)
    """
    # TODO: Implement decoder forward pass with teacher forcing
    pass
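
One way the scaffold above might be filled in, shown only for the vanilla RNN case. This is a sketch under assumptions, not a reference solution: the parameter keys (`W_xh`, `W_hh`, `b_h`, `W_hy`, `b_y`), the all-zeros start input, and the function name are hypothetical choices made for this example.

```python
import numpy as np

def rnn_decoder_teacher_forcing_sketch(context_vector, target_seq, params):
    """Vanilla-RNN decoder pass where step t receives the ground-truth vector from step t-1."""
    seq_len, input_dim = target_seq.shape
    h = context_vector
    # Assumption: an all-zeros vector stands in for the start-of-sequence input.
    x = np.zeros(input_dim)
    logits = []
    for t in range(seq_len):
        h = np.tanh(params["W_xh"] @ x + params["W_hh"] @ h + params["b_h"])
        logits.append(params["W_hy"] @ h + params["b_y"])
        # Teacher forcing: the next input is the true token, whatever the model predicted.
        x = target_seq[t]
    return np.stack(logits)

# Tiny usage example with random weights (shapes only, nothing is trained here).
rng = np.random.default_rng(0)
hidden_dim, vocab = 4, 3
params = {
    "W_xh": rng.normal(size=(hidden_dim, vocab)) * 0.1,
    "W_hh": rng.normal(size=(hidden_dim, hidden_dim)) * 0.1,
    "b_h": np.zeros(hidden_dim),
    "W_hy": rng.normal(size=(vocab, hidden_dim)) * 0.1,
    "b_y": np.zeros(vocab),
}
target_seq = np.eye(vocab)[[2, 0, 1, 1]]  # one-hot ground truth, shape (4, 3)
logits = rnn_decoder_teacher_forcing_sketch(np.zeros(hidden_dim), target_seq, params)
print(logits.shape)
```

Note that the model's own logits never feed back into the loop; they are only collected so that a loss can be computed against the ground truth afterwards.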

## 🧮 Scheduled Sampling (Optional Extension)

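Scheduled sampling is a technique where, during training, the model sometimes uses its own prediction as the next input instead of the ground-truth token. With pure teacher forcing the decoder only ever sees correct inputs during training, yet at inference it must condition on its own, possibly wrong, predictions. Exposing the model to its own outputs during training helps bridge that gap.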

**LLM/Transformer Context:**
- Scheduled sampling can help models become more robust to their own mistakes during generation.
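
The "scheduled" part refers to changing the mixing probability over the course of training, typically starting near pure teacher forcing and relying more on model predictions later. Below is a minimal sketch of one such schedule. The linear ramp, the `max_prob` cap, and the function name are assumptions made for illustration; other decay shapes are possible.

```python
def linear_sampling_prob(epoch, num_epochs, max_prob=0.5):
    """Hypothetical schedule: ramp the probability of feeding the model's
    own prediction linearly from 0 (first epoch) to max_prob (last epoch)."""
    if num_epochs <= 1:
        return max_prob
    frac = min(max(epoch / (num_epochs - 1), 0.0), 1.0)
    return max_prob * frac

for epoch in range(5):
    print(epoch, linear_sampling_prob(epoch, num_epochs=5))
```

The returned value could be passed as the `sampling_prob` argument of a scheduled-sampling decoder at the start of each epoch.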

### Task:
- Scaffold a function for scheduled sampling, where you probabilistically choose between the true token and the model's prediction at each step.
- Add a docstring explaining its use.

In [None]:
def decoder_forward_scheduled_sampling(context_vector, target_seq, decoder_params, sampling_prob, cell_type='rnn'):
    """
    Run the decoder with scheduled sampling: probabilistically use ground-truth or model prediction as next input.
    Args:
        context_vector (np.ndarray): Initial hidden state for the decoder (hidden_dim,)
        target_seq (np.ndarray): Ground-truth output sequence (seq_len x input_dim)
        decoder_params (dict): Decoder parameters (weights, biases, etc.)
        sampling_prob (float): Probability in [0, 1] of using the model's prediction as the next input; 0 reduces to pure teacher forcing.
        cell_type (str): 'rnn', 'lstm', or 'gru'
    Returns:
        np.ndarray: Output logits for each step (seq_len x output_dim)
    """
    # TODO: Implement decoder forward pass with scheduled sampling
    pass
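
A possible way to fill in this scaffold for the vanilla RNN case, again as a sketch under stated assumptions: tokens are one-hot vectors so that `input_dim` equals `output_dim`, the model's prediction is taken greedily with `argmax`, the start input is all zeros, and the parameter keys, the `rng` argument, and the function name are hypothetical.

```python
import numpy as np

def rnn_decoder_scheduled_sampling_sketch(context_vector, target_seq, params, sampling_prob, rng):
    """Vanilla-RNN decoder pass that mixes ground-truth and predicted inputs."""
    seq_len, input_dim = target_seq.shape
    h = context_vector
    x = np.zeros(input_dim)  # assumed start-of-sequence input
    logits = []
    for t in range(seq_len):
        h = np.tanh(params["W_xh"] @ x + params["W_hh"] @ h + params["b_h"])
        step_logits = params["W_hy"] @ h + params["b_y"]
        logits.append(step_logits)
        if rng.random() < sampling_prob:
            # Feed back the model's greedy prediction, re-encoded as a one-hot vector.
            x = np.eye(input_dim)[np.argmax(step_logits)]
        else:
            # Fall back to teacher forcing for this step.
            x = target_seq[t]
    return np.stack(logits)

# Tiny usage example with random weights (shapes only, nothing is trained here).
init_rng = np.random.default_rng(0)
hidden_dim, vocab = 4, 3
params = {
    "W_xh": init_rng.normal(size=(hidden_dim, vocab)) * 0.1,
    "W_hh": init_rng.normal(size=(hidden_dim, hidden_dim)) * 0.1,
    "b_h": np.zeros(hidden_dim),
    "W_hy": init_rng.normal(size=(vocab, hidden_dim)) * 0.1,
    "b_y": np.zeros(vocab),
}
h0 = np.zeros(hidden_dim)
target_seq = np.eye(vocab)[[2, 0, 1, 1]]
mixed = rnn_decoder_scheduled_sampling_sketch(
    h0, target_seq, params, sampling_prob=0.3, rng=np.random.default_rng(1)
)
print(mixed.shape)
```

With `sampling_prob=0.0` the loop is pure teacher forcing, and with `sampling_prob=1.0` it ignores `target_seq` entirely and runs on its own predictions, as it would at inference.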

## 🧠 Final Summary: Teacher Forcing in LLMs

- Teacher forcing accelerates training and stabilizes sequence models by providing the correct context at each step.
- Scheduled sampling helps bridge the gap between training (teacher forcing) and inference (autoregressive generation).
- Teacher forcing is the standard way to train encoder-decoder models and transformers for sequence generation tasks; scheduled sampling is an optional refinement rather than a default.

In the next notebook, you'll explore self-attention and the transformer architecture!