

# Example Sentence (English → French)

**Input (English):**

> “The dog that barked all night kept the neighbors awake and made them very angry.”
> (15 words)

**Target (French):**

> “Le chien qui a aboyé toute la nuit a empêché les voisins de dormir et les a rendus très fâchés.”



# **1. LSTM (Long Short-Term Memory)**

### **Processing**

* Reads **one word at a time, sequentially**.
* Keeps a hidden state that updates with each word.
* Example:

  * Step 1: “The” → hidden state h1.
  * Step 2: “dog” → h2 depends on h1.
  * Step 9: “neighbors” → h9 depends on h8.
  * Step 15: “angry” → h15 depends on everything before it (compressed into a single state).
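
The sequential update described above can be sketched with a toy LSTM cell. This is a minimal illustration, not a trained model: the scalar weights, the `embed` helper, and the word-length "embedding" are hypothetical stand-ins chosen only so the loop runs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step with scalar input, hidden state, and cell state."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate cell value
    c = f * c_prev + i * g       # new cell state mixes old memory and new input
    h = o * math.tanh(c)         # new hidden state
    return h, c

def embed(word):
    # Hypothetical toy "embedding": word length scaled to a small number.
    return len(word) / 10.0

sentence = ("The dog that barked all night kept the neighbors "
            "awake and made them very angry").split()

# Hypothetical weights: every parameter set to 0.5 for illustration only.
names = ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"]
weights = {name: 0.5 for name in names}

h, c = 0.0, 0.0
states = []
for step, word in enumerate(sentence, start=1):
    h, c = lstm_step(embed(word), h, c, weights)  # h_t depends only on h_{t-1}, c_{t-1}, x_t
    states.append(h)
    print(f"Step {step:2d}: {word!r} -> h{step} = {h:.4f}")
```

Each iteration needs the previous `h` and `c`, so the 15 steps cannot run in parallel, and the final state is the only summary of the whole sentence.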

### **Impact**

* By the time the model reaches “made them very angry,” the information about *“dog barked all night”* has passed through many state updates and may already be **diluted**.
* The LSTM's gates reduce this problem but do not remove it: the *cause* (“barked all night”) and the *effect* (“neighbors angry”) are far apart, and every intermediate step can weaken the link between them.

### **Result**

* A plausible translation may miss nuance:

  * “Le chien a aboyé la nuit. Les voisins étaient très fâchés.”
    (Drops the causal connection and the detail that the neighbors were kept awake.)


# **2. GRU (Gated Recurrent Unit)**

### **Processing**

* Same **step-by-step** style as LSTM but with **simpler gates** (reset/update).
* Slightly faster, with fewer parameters (no separate cell state or output gate).
* Still: “angry” (the last word) depends on “dog” (the second word) only indirectly, through 13 state updates.
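
A toy GRU step makes the reset/update gating concrete. This is a sketch under stated assumptions: scalar states, hypothetical weights all set to 0.5, a made-up `embed` helper, and the convention where the update gate `z` weights the new candidate (some libraries use the mirrored convention).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, w):
    """One GRU step with scalar input and hidden state."""
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev + w["bz"])  # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev + w["br"])  # reset gate
    # Candidate state: the reset gate decides how much of h_prev feeds in.
    h_tilde = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev) + w["bh"])
    # Interpolate between the old state and the candidate.
    return (1.0 - z) * h_prev + z * h_tilde

def embed(word):
    # Hypothetical toy "embedding": word length scaled to a small number.
    return len(word) / 10.0

sentence = ("The dog that barked all night kept the neighbors "
            "awake and made them very angry").split()

# Hypothetical weights for illustration only.
names = ["wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh"]
weights = {name: 0.5 for name in names}

h = 0.0
states = []
for word in sentence:
    h = gru_step(embed(word), h, weights)  # still strictly one word after another
    states.append(h)
```

Compared with an LSTM cell, there is no separate cell state and no output gate, which is where the parameter savings come from. The loop itself is just as sequential.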

### **Impact**

* In practice, a GRU often retains context about as well as an LSTM on short and medium-length sentences.
* But across **15 words**, the contribution of early words can still fade (**vanishing dependency strength**).
* It encodes that the *neighbors were angry*, but only weakly *why*.
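
A rough back-of-the-envelope calculation shows why the dependency strength fades. Assume, as a deliberate simplification, that each recurrent step carries over a fixed fraction `keep_per_step` of the previous state (roughly what a constant gate value would do) and ignore everything else the gates compute. The function name and the sample fractions are illustrative, not measurements.

```python
def retained_fraction(keep_per_step, steps):
    """Fraction of an early word's contribution left after `steps` updates,
    assuming a constant carry-over fraction per step (a simplification)."""
    return keep_per_step ** steps

distances = (1, 5, 10, 15)
table = {}
for keep in (0.95, 0.8, 0.5):
    table[keep] = [retained_fraction(keep, k) for k in distances]
    formatted = ", ".join(f"{v:.4f}" for v in table[keep])
    print(f"keep={keep}: after {distances} steps -> {formatted}")
```

The decay is geometric in the number of steps, so even a fairly high carry-over fraction leaves only part of the original signal once a dozen or so updates separate two words.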

### **Result**

* The translation may be slightly better but is still weak:

  * “Le chien a aboyé toute la nuit. Les voisins étaient très fâchés.”
    (Captures both events, but **loses the cause-effect link** “kept awake → angry”.)



# **3. Transformer**

### **Processing**

* Processes **all words in parallel** with self-attention.
* At each layer, every word **looks at every other word directly**.
* Example:

  * “neighbors” attends strongly to “kept awake.”
  * “angry” attends strongly to “barked all night.”
  * “dog” attends to “barked.”
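
The "every word looks at every other word" step can be sketched as scaled dot-product self-attention. This is a minimal pure-Python illustration: the three tokens and their 2-dimensional vectors are hand-picked, hypothetical values, and the learned query/key/value projections of a real Transformer are replaced by the identity for brevity.

```python
import math

def softmax(scores):
    m = max(scores)                               # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this query against every key in one pass: no recurrence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

tokens = ["dog", "barked", "angry"]
# Hypothetical embeddings: "dog" and "barked" are deliberately placed close together.
x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

outputs, weights = self_attention(x, x, x)
for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and is computed independently of the other rows, which is what allows all positions to be processed in parallel.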

### **Impact**

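* Long-range dependencies are preserved: *barking all night* can be tied to *angry neighbors* in a single attention step, with no chain of intermediate states.
* Multiple attention heads can capture **different relationships**. The roles below are illustrative; learned heads are rarely this cleanly separated: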

  * Head 1: subject–verb (*dog ↔ barked*).
  * Head 2: cause–effect (*barked ↔ angry*).
  * Head 3: time (*all night ↔ kept awake*).

### **Result**

* Translation is accurate and keeps causal meaning:

  * “Le chien qui a aboyé toute la nuit a empêché les voisins de dormir et les a rendus très fâchés.”
    (Preserves both **semantics** and **structure**).

---

# **Detailed Comparison Table**

| Feature                 | LSTM                                      | GRU                            | Transformer                        |
| ----------------------- | ----------------------------------------- | ------------------------------ | ---------------------------------- |
| **Processing**          | Sequential (1 word at a time)             | Sequential (simpler gates)     | Parallel (all words at once)       |
| **Memory**              | Good short/medium, weak long              | Comparable to LSTM             | Strong (direct long-range links)   |
| **Speed**               | Slow (sequential steps)                   | Faster than LSTM               | Fast to train (parallelizable)     |
| **Parameters**          | Many                                      | Fewer than LSTM                | Many, but scale efficiently        |
| **Translation Example** | Loses cause-effect                        | Partial cause-effect           | Preserves full cause-effect        |
| **Output (French)**     | Neighbors angry, but missing “kept awake” | Slightly better but incomplete | Full, accurate sentence preserved  |
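
The "Parameters" row can be made concrete with a standard count for a single recurrent layer. The sketch below assumes one bias vector per gate block (some libraries keep two, which changes the totals slightly) and uses hypothetical sizes of 256 for the input and 512 for the hidden state.

```python
def recurrent_params(input_size, hidden_size, num_blocks):
    """Parameters in one recurrent layer with `num_blocks` gate-like blocks.
    Each block has input weights, recurrent weights, and one bias vector."""
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block

input_size, hidden_size = 256, 512   # hypothetical sizes for illustration

# LSTM: forget, input, and output gates plus the candidate cell value.
lstm_params = recurrent_params(input_size, hidden_size, 4)
# GRU: update and reset gates plus the candidate hidden state.
gru_params = recurrent_params(input_size, hidden_size, 3)

print(f"LSTM: {lstm_params:,} parameters")
print(f"GRU:  {gru_params:,} parameters")
print(f"GRU / LSTM ratio: {gru_params / lstm_params:.2f}")
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, whatever sizes are chosen.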



**Summary:**

* **LSTM**: Strong for short text, struggles with long dependencies.
* **GRU**: Faster & simpler than LSTM, but still sequential.
* **Transformer**: Handles **long, complex sentences** best because attention allows direct links between *any* two words.

