```{contents}
```
## Encoder–Decoder Model (Seq2Seq Architecture)

An **Encoder–Decoder model** is a neural network architecture that maps an input sequence to an output sequence, which may differ in length.
It underlies many systems for machine translation, summarization, question answering, and multimodal tasks.

---

### High-Level Idea

**Input sequence → Encoder → Context representation → Decoder → Output sequence**

Example:
English sentence → Encoder → semantic representation → Decoder → French sentence

---

### Components

#### Encoder

The **encoder** reads the entire input sequence and converts it into a **set of internal representations**.

**Responsibilities:**

* Understand input meaning
* Capture relationships between tokens
* Produce hidden states representing the input

**Operations (in a Transformer encoder):**

* Token embeddings
* Positional encoding
* Multi-head self-attention
* Feedforward layers

**Output:** a sequence of contextual vectors, one per input token.
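
The first two encoder operations can be sketched in a few lines. This is a minimal illustration, assuming the fixed sinusoidal positional encoding from the original Transformer (many models use learned or relative positions instead). The names `sinusoidal_position`, `embed`, and `table` are hypothetical, the embedding table is random rather than trained, and plain Python lists stand in for tensors.

```python
import math
import random

def sinusoidal_position(pos, d_model):
    """Sinusoidal encoding for one position: sin on even dims, cos on odd dims."""
    vec = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec

def embed(token_ids, table):
    """Look up each token's embedding and add its positional encoding."""
    d_model = len(table[0])
    return [
        [e + p for e, p in zip(table[tok], sinusoidal_position(pos, d_model))]
        for pos, tok in enumerate(token_ids)
    ]

# Hypothetical toy setup: 10-token vocabulary, 8-dimensional embeddings.
random.seed(0)
vocab_size, d_model = 10, 8
table = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(vocab_size)]

encoder_input = embed([3, 1, 4], table)
print(len(encoder_input), len(encoder_input[0]))  # → 3 8
```

The result is one vector per input token. The self-attention and feedforward layers then refine these vectors into the contextual representations the decoder reads.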

---

#### Decoder

The **decoder** generates the output sequence **one token at a time**, conditioned on:

1. Its own previous outputs
2. The encoder’s representations

**Operations (in a Transformer decoder):**

* Masked self-attention (prevents each position from attending to later positions)
* Cross-attention (attends to encoder outputs)
* Feedforward layers
* Linear projection and softmax over the vocabulary
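
The one-token-at-a-time loop can be sketched with greedy decoding. This is an illustration under stated assumptions, not a real model: `toy_step` is a hypothetical stand-in for a trained decoder that simply copies the source and then emits an end token, the raw source ids play the role of the encoder's representations, and the `BOS`/`EOS` ids are made up.

```python
import math

BOS, EOS = 0, 1  # hypothetical special token ids

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_step(source_ids, prefix, vocab_size):
    """Stand-in for a trained decoder: returns next-token logits.

    It copies the source: step t favours source token t, then EOS.
    """
    t = len(prefix) - 1  # number of tokens generated so far
    target = source_ids[t] if t < len(source_ids) else EOS
    return [5.0 if tok == target else 0.0 for tok in range(vocab_size)]

def greedy_decode(source_ids, step_fn, vocab_size, max_len=20):
    """Generate tokens one at a time, feeding each output back as input."""
    prefix = [BOS]
    while len(prefix) < max_len:
        probs = softmax(step_fn(source_ids, prefix, vocab_size))
        next_tok = max(range(vocab_size), key=probs.__getitem__)  # argmax
        prefix.append(next_tok)
        if next_tok == EOS:
            break
    return prefix[1:]  # drop the BOS marker

print(greedy_decode([4, 2, 7], toy_step, vocab_size=8))  # → [4, 2, 7, 1]
```

Each step sees both the source and everything generated so far, which mirrors the two conditioning inputs listed above. Real systems often replace the argmax with sampling or beam search.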

---

### Attention Mechanism

#### Self-Attention (Encoder & Decoder)

Learns relationships **within the same sequence**: queries, keys, and values are all computed from that one sequence.

#### Cross-Attention (Decoder only)

Connects output generation to the input: queries come from the decoder, while keys and values come from the encoder states.
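
The two attention types differ only in where the queries, keys, and values come from. The sketch below uses scaled dot-product attention, `softmax(QKᵀ / √d) V`, on plain Python lists. It is a simplified illustration: the learned projection matrices, multiple heads, residual connections, and layer normalization of a real Transformer are omitted, and the state vectors are made-up numbers.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, mask=None):
    """Scaled dot-product attention; mask[i][j] is True where i may attend to j."""
    d = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        if mask is not None:
            scores = [s if mask[i][j] else float("-inf")
                      for j, s in enumerate(scores)]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs

def causal_mask(n):
    """Position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 source positions
decoder_states = [[0.5, 0.5], [1.0, -1.0]]             # 2 target positions

# Encoder self-attention: Q, K, V all from the encoder, no mask.
enc_self = attention(encoder_states, encoder_states, encoder_states)

# Decoder self-attention: Q, K, V all from the decoder, causal mask.
dec_self = attention(decoder_states, decoder_states, decoder_states,
                     causal_mask(len(decoder_states)))

# Cross-attention: Q from the decoder, K and V from the encoder.
cross = attention(dec_self, encoder_states, encoder_states)

print(dec_self[0])                # → [0.5, 0.5]
print(len(cross), len(cross[0]))  # → 2 2
```

The first decoder position can only attend to itself, so its output equals its own value vector. The cross-attention output has one row per decoder position, each a weighted mix of encoder states.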

---

### Transformer Encoder–Decoder Example

```
Input Tokens                    Previous Output Tokens
   ↓                               ↓
[ Encoder Blocks ]              [ Decoder Blocks ]
   ↓                               masked self-attention
Encoded Representations  ────→     cross-attention
                                   feedforward
                                   ↓
                                Softmax → Next Output Token
```

---

### Why Encoder–Decoder Works Well

| Benefit                       | Explanation                                                        |
| ----------------------------- | ------------------------------------------------------------------ |
| Flexible input/output lengths | Input and output may differ in length (translation, summarization) |
| Strong context understanding  | Encoder sees the whole input before any output is generated        |
| Controlled generation         | Each output token conditions on encoder outputs and prior tokens   |

---

### Use Cases

* Machine Translation
* Document Summarization
* Speech-to-Text
* Image Captioning
* Question Answering
* Multimodal AI systems

---

### Comparison with Decoder-Only Models

| Feature                              | Encoder–Decoder | Decoder-Only |
| ------------------------------------ | --------------- | ------------ |
| Processes input & output separately  | Yes             | No           |
| Cross-attention                      | Yes             | No           |
| Suitability for transformation tasks | High            | Moderate     |
| Prompt simplicity                    | Medium          | High         |

---

### Examples of Encoder–Decoder Models

* T5
* BART
* MarianMT
* Pegasus
* Whisper (speech → text)