# 15 - Encoder-Decoder Architecture (No Attention)

Encoder-decoder models underpin many sequence-to-sequence (seq2seq) tasks, such as machine translation and summarization. Before attention was introduced, these models passed all information from the encoder to the decoder through a single fixed-size context vector.
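In symbols, one standard way to write this (notation assumed here, not defined elsewhere in the notebook): the encoder reads inputs $x_1, \dots, x_T$, its final hidden state becomes the context vector $c$, and the decoder models each output token conditioned only on earlier outputs and $c$:

$$
h_t = f_{\text{enc}}(x_t, h_{t-1}), \qquad c = h_T
$$

$$
p(y_1, \dots, y_{T'} \mid x_1, \dots, x_T) = \prod_{t=1}^{T'} p(y_t \mid y_1, \dots, y_{t-1}, c)
$$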

In this notebook, you'll scaffold the components of a basic encoder-decoder model and see how this structure led to the transformer architecture used in LLMs.

## 🔢 Encoder: Sequence to Context Vector

The encoder processes the input sequence and produces a fixed-size context vector (usually the final hidden state).
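A minimal sketch of the `'rnn'` case, assuming a vanilla (Elman) cell $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$. The function name `rnn_encoder_forward` and the parameter keys `W_xh`, `W_hh`, `b_h` are hypothetical; the notebook does not fix what `encoder_params` contains.

```python
import numpy as np

def rnn_encoder_forward(X_seq, h0, params):
    """Vanilla RNN encoder: returns the final hidden state as the context vector.

    Hypothetical parameter keys:
      W_xh: (hidden_dim, input_dim), W_hh: (hidden_dim, hidden_dim), b_h: (hidden_dim,)
    """
    h = h0
    for x_t in X_seq:  # iterate over time steps (rows of X_seq)
        h = np.tanh(params["W_xh"] @ x_t + params["W_hh"] @ h + params["b_h"])
    return h  # context vector: the last hidden state

# Usage: 7 input steps of dimension 3 are compressed into one vector of size 4
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
params = {
    "W_xh": rng.normal(scale=0.1, size=(hidden_dim, input_dim)),
    "W_hh": rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
    "b_h": np.zeros(hidden_dim),
}
X_seq = rng.normal(size=(7, input_dim))
context = rnn_encoder_forward(X_seq, np.zeros(hidden_dim), params)
print(context.shape)  # (4,)
```

Every intermediate hidden state is discarded; only the last one reaches the decoder.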

**LLM/Transformer Context:**
- The encoder is analogous to the encoder stack in transformer-based models like BERT and T5.

### Task:
- Scaffold a function to run an RNN/LSTM/GRU encoder over an input sequence and return the final hidden state (context vector).
- Add a docstring explaining its role.

```python
def encoder_forward(X_seq, h0, encoder_params, cell_type='rnn'):
    """
    Run the encoder over an input sequence to produce a context vector.
    Args:
        X_seq (np.ndarray): Input sequence (seq_len x input_dim)
        h0 (np.ndarray): Initial hidden state (hidden_dim,)
        encoder_params (dict): Encoder parameters (weights, biases, etc.)
        cell_type (str): 'rnn', 'lstm', or 'gru'
    Returns:
        np.ndarray: Context vector, i.e. the final hidden state (hidden_dim,)
    """
    # TODO: Implement the encoder forward pass for the chosen cell type.
    # For 'lstm', a cell state must also be carried through the loop.
    pass
```
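For the `'lstm'` and `'gru'` branches, only the per-step update differs from a plain RNN. Below is a sketch of one step of each, assuming the standard LSTM equations and one common GRU variant ($h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}$, with the reset gate applied to $U_n h_{t-1}$). The stacked-weight keys `W`, `U`, `b`, the gate ordering, and the helper names `lstm_step`, `gru_step`, `encode` are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h, c, p):
    """One LSTM step. Hypothetical keys: W (4H, D), U (4H, H), b (4H,),
    with gates stacked as [input, forget, candidate, output]."""
    H = h.shape[0]
    a = p["W"] @ x_t + p["U"] @ h + p["b"]
    i = sigmoid(a[0:H])          # input gate
    f = sigmoid(a[H:2 * H])      # forget gate
    g = np.tanh(a[2 * H:3 * H])  # candidate cell state
    o = sigmoid(a[3 * H:4 * H])  # output gate
    c_new = f * c + i * g        # keep part of the old memory, write part of the new
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def gru_step(x_t, h, p):
    """One GRU step (one common variant). Hypothetical keys: W (3H, D), U (3H, H), b (3H,),
    stacked as [reset, update, candidate]."""
    H = h.shape[0]
    wx = p["W"] @ x_t + p["b"]
    uh = p["U"] @ h
    r = sigmoid(wx[0:H] + uh[0:H])            # reset gate
    z = sigmoid(wx[H:2 * H] + uh[H:2 * H])    # update gate
    n = np.tanh(wx[2 * H:] + r * uh[2 * H:])  # candidate state
    return (1.0 - z) * n + z * h              # interpolate between candidate and old state

def encode(X_seq, h0, p, cell_type="gru"):
    """Run the chosen cell over X_seq and return the final hidden state."""
    h, c = h0, np.zeros_like(h0)  # the LSTM cell state starts at zero
    for x_t in X_seq:
        if cell_type == "lstm":
            h, c = lstm_step(x_t, h, c, p)
        elif cell_type == "gru":
            h = gru_step(x_t, h, p)
        else:
            raise ValueError(f"unsupported cell_type: {cell_type}")
    return h

# Usage with small random weights
rng = np.random.default_rng(0)
D, H = 3, 5
X = rng.normal(size=(6, D))
lstm_p = {"W": rng.normal(scale=0.1, size=(4 * H, D)),
          "U": rng.normal(scale=0.1, size=(4 * H, H)), "b": np.zeros(4 * H)}
gru_p = {"W": rng.normal(scale=0.1, size=(3 * H, D)),
         "U": rng.normal(scale=0.1, size=(3 * H, H)), "b": np.zeros(3 * H)}
print(encode(X, np.zeros(H), lstm_p, "lstm").shape)  # (5,)
print(encode(X, np.zeros(H), gru_p, "gru").shape)    # (5,)
```

Stacking all gates into one matrix keeps each step to a couple of matrix-vector products, which is also how many libraries lay out recurrent weights.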

## 🔗 Decoder: Context Vector to Output Sequence

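The decoder generates the output sequence one token at a time, using the context vector as its initial hidden state. At each step it takes the previous output token as input; during training, the ground-truth previous token is usually fed in instead of the model's own prediction (teacher forcing).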
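A minimal sketch of a teacher-forced `'rnn'` decoder, assuming the decoder inputs are already embedded vectors (e.g. the target sequence shifted right by one step) and hypothetical parameter keys `W_xh`, `W_hh`, `b_h` for the recurrence plus an output projection `W_hy`, `b_y` that maps each hidden state to vocabulary logits. The name `rnn_decoder_forward` is also hypothetical.

```python
import numpy as np

def rnn_decoder_forward(context_vector, Y_seq, params):
    """Teacher-forced vanilla RNN decoder.

    Returns logits of shape (seq_len, output_dim); assumes seq_len >= 1.
    """
    h = context_vector  # the context vector initializes the decoder state
    logits = []
    for y_in in Y_seq:  # one (previous, ground-truth) token embedding per step
        h = np.tanh(params["W_xh"] @ y_in + params["W_hh"] @ h + params["b_h"])
        logits.append(params["W_hy"] @ h + params["b_y"])  # unnormalized scores over the vocabulary
    return np.stack(logits)

# Usage with small random weights
rng = np.random.default_rng(1)
emb_dim, hidden_dim, vocab_size, steps = 3, 4, 10, 5
params = {
    "W_xh": rng.normal(scale=0.1, size=(hidden_dim, emb_dim)),
    "W_hh": rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
    "b_h": np.zeros(hidden_dim),
    "W_hy": rng.normal(scale=0.1, size=(vocab_size, hidden_dim)),
    "b_y": np.zeros(vocab_size),
}
context = rng.normal(size=hidden_dim)
Y_in = rng.normal(size=(steps, emb_dim))
print(rnn_decoder_forward(context, Y_in, params).shape)  # (5, 10)
```

The context vector enters only through the initial state, so its influence has to survive every recurrent update.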

**LLM/Transformer Context:**
- The decoder is analogous to the decoder stack in transformer-based models like GPT and T5.

### Task:
- Scaffold a function to run an RNN/LSTM/GRU decoder, generating an output sequence from the context vector.
- Add a docstring explaining its role.

```python
def decoder_forward(context_vector, Y_seq, decoder_params, cell_type='rnn'):
    """
    Run the decoder to generate an output sequence from the context vector.
    Args:
        context_vector (np.ndarray): Encoder context, used as the decoder's initial hidden state (hidden_dim,)
        Y_seq (np.ndarray): Decoder input sequence, e.g. the target sequence shifted
            right by one step for teacher forcing (seq_len x input_dim)
        decoder_params (dict): Decoder parameters (weights, biases, output projection, etc.)
        cell_type (str): 'rnn', 'lstm', or 'gru'
    Returns:
        np.ndarray: Output logits, one row per step (seq_len x output_dim)
    """
    # TODO: Implement decoder forward pass for the chosen cell type
    pass
```
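At inference time there are no ground-truth tokens to feed in, so the decoder must consume its own predictions. A minimal sketch of greedy decoding with a vanilla RNN decoder, assuming an embedding matrix `E`, hypothetical parameter keys `W_xh`, `W_hh`, `b_h`, `W_hy`, `b_y`, and hypothetical start/end token ids `bos_id` and `eos_id`:

```python
import numpy as np

def greedy_decode(context_vector, params, E, bos_id, eos_id, max_len=20):
    """Generate token ids one at a time, feeding each argmax prediction back in.

    Hypothetical keys: W_xh (H, D), W_hh (H, H), b_h (H,), W_hy (V, H), b_y (V,).
    E is an embedding matrix of shape (V, D).
    """
    h = context_vector
    token = bos_id
    output_ids = []
    for _ in range(max_len):
        h = np.tanh(params["W_xh"] @ E[token] + params["W_hh"] @ h + params["b_h"])
        logits = params["W_hy"] @ h + params["b_y"]
        token = int(np.argmax(logits))  # greedy choice: the single most likely next token
        if token == eos_id:
            break
        output_ids.append(token)
    return output_ids

# Usage with small random weights
rng = np.random.default_rng(2)
V, D, H = 6, 3, 4
E = rng.normal(size=(V, D))
params = {
    "W_xh": rng.normal(scale=0.1, size=(H, D)),
    "W_hh": rng.normal(scale=0.1, size=(H, H)),
    "b_h": np.zeros(H),
    "W_hy": rng.normal(scale=0.1, size=(V, H)),
    "b_y": np.zeros(V),
}
ids = greedy_decode(np.zeros(H), params, E, bos_id=0, eos_id=1, max_len=10)
print(ids)  # at most 10 token ids; stops early if eos_id is predicted
```

Greedy decoding is the simplest strategy; beam search or sampling would replace the `argmax` line.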

## 🧮 Sequence-to-Sequence Training (No Attention)

Train the encoder and decoder jointly by minimizing a loss between the predicted and true output sequences, typically the per-step cross-entropy summed (or averaged) over output positions.
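Written out, a common form of this objective under teacher forcing is the negative log-likelihood of the target tokens (notation assumed here: $c$ is the context vector, $z_t$ the decoder's logits at step $t$, $V$ the vocabulary size):

$$
\mathcal{L} = -\sum_{t=1}^{T'} \log p(y_t \mid y_1, \dots, y_{t-1}, c), \qquad
p(y_t = k \mid y_1, \dots, y_{t-1}, c) = \frac{\exp(z_{t,k})}{\sum_{j=1}^{V} \exp(z_{t,j})}
$$

Each term is the cross-entropy between the softmax of $z_t$ and the one-hot target $y_t$, so the total loss is the per-step cross-entropy summed over the output sequence.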

**LLM/Transformer Context:**
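- This objective was used to train translation, summarization, and other seq2seq models before attention and transformers, and the same per-token cross-entropy is still the standard training objective for transformer seq2seq models and decoder-only LLMs.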

### Task:
- Scaffold a function to compute the total loss over the output sequence.
- Add a docstring explaining its role.

```python
def seq2seq_loss(logits_seq, target_seq, loss_fn):
    """
    Compute the total loss for the output sequence.
    Args:
        logits_seq (np.ndarray): Output logits (seq_len x output_dim)
        target_seq (np.ndarray): True output token indices (seq_len,)
        loss_fn (callable): Per-step loss function (e.g., cross-entropy)
    Returns:
        float: Sum of the per-step losses over the sequence
    """
    # TODO: Apply loss_fn at each time step and sum the results
    pass
```
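A minimal sketch of filling in `seq2seq_loss`, assuming `loss_fn` takes one step's logits and an integer target index (a hypothetical interface; the scaffold does not pin it down). The `cross_entropy` helper subtracts the maximum logit before exponentiating so that large logits do not overflow.

```python
import numpy as np

def cross_entropy(logits, target):
    """Softmax cross-entropy for one step: -log softmax(logits)[target], computed stably."""
    shifted = logits - np.max(logits)  # subtract the max to avoid overflow in exp
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[target]

def seq2seq_loss(logits_seq, target_seq, loss_fn):
    """Sum the per-step loss over the output sequence."""
    return float(sum(loss_fn(logits_t, int(y_t))
                     for logits_t, y_t in zip(logits_seq, target_seq)))

# Usage: uniform logits over 4 classes cost ln(4) per step, so 3 steps cost 3 * ln(4)
logits_seq = np.zeros((3, 4))
target_seq = np.array([0, 2, 1])
print(seq2seq_loss(logits_seq, target_seq, cross_entropy))  # approximately 4.1589
```

Dividing by the sequence length instead of summing gives a per-token average, which makes losses comparable across sequences of different lengths.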

## 🧠 Final Summary: Encoder-Decoder and LLMs

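- Encoder-decoder models underpin many sequence-to-sequence tasks.
- Before attention, these models relied on a single fixed-size context vector, so the entire input had to be compressed into `hidden_dim` numbers; this bottleneck limits how well they handle long sequences.
- Attention removes the bottleneck by letting the decoder look at every encoder hidden state at each decoding step. Transformers are built entirely on attention: self-attention within each stack and, in encoder-decoder models, cross-attention from the decoder to the encoder outputs.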
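To make the bottleneck concrete, the sketch below (hypothetical random weights, vanilla RNN without bias for brevity) encodes a 5-step and a 500-step input. The final-state encoder returns a vector of the same size for both, while keeping every hidden state, which is what an attention-based decoder would look over, grows with the input length.

```python
import numpy as np

def encode_final_state(X_seq, W_xh, W_hh):
    """Vanilla RNN encoder returning only the final hidden state (the context vector)."""
    h = np.zeros(W_hh.shape[0])
    for x_t in X_seq:
        h = np.tanh(W_xh @ x_t + W_hh @ h)
    return h

def encode_all_states(X_seq, W_xh, W_hh):
    """Same recurrence, but keeps every hidden state (one row per input step)."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in X_seq:
        h = np.tanh(W_xh @ x_t + W_hh @ h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

for seq_len in (5, 500):
    X = rng.normal(size=(seq_len, input_dim))
    print(seq_len, encode_final_state(X, W_xh, W_hh).shape, encode_all_states(X, W_xh, W_hh).shape)
# 5 (16,) (5, 16)
# 500 (16,) (500, 16)
```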

In the next notebook, you'll add attention to the encoder-decoder architecture, paving the way to transformers!