## 1. Introduction & Motivation

### What We've Learned So Far

In our RNN journey, we have covered:
1. **Vanilla RNN** - Basic recurrent neural network
2. **LSTM** - Long Short-Term Memory (handles long-term dependencies)
3. **GRU** - Gated Recurrent Unit (simpler alternative to LSTM)
4. **Deep RNNs** - Stacking multiple RNN layers

Now, we'll learn one final important concept: **Bidirectional RNNs**

### The Limitation of Unidirectional RNNs

In a standard (unidirectional) RNN:

```
Time Step 1      Time Step 2      Time Step 3
    ↓                ↓                ↓
   [x₁]             [x₂]             [x₃]
    ↓                ↓                ↓
  ┌───┐    h₁      ┌───┐    h₂      ┌───┐    h₃
  │RNN│ ────────→  │RNN│ ────────→  │RNN│ ────────→ ŷ
  └───┘            └───┘            └───┘
```

**Key Observation:**
- Information flows **only from left to right**
- The final output ŷ depends on x₁, x₂, x₃ (all **past** inputs)
- But what if **future inputs** should affect **past outputs**?

### When Future Inputs Affect Past Outputs

There are many NLP scenarios where understanding the **full context** (both past AND future) is crucial:

| Scenario | Why Future Context Matters |
|----------|---------------------------|
| Named Entity Recognition | "Amazon" could be a company OR a river - need to see following words |
| Machine Translation | Word order differs between languages |
| Part-of-Speech Tagging | Same word can be noun/verb depending on context |
| Sentiment Analysis | Later words (e.g. a "but ..." clause) can reverse the sentiment of earlier words |

---

## 2. The Problem: Named Entity Recognition (NER) Example

### What is NER?

**Named Entity Recognition (NER)** is an NLP task where you identify and classify named entities in text:

- **Person (PER)**: "Nitish", "Elon Musk"
- **Organization (ORG)**: "Google", "Amazon"
- **Location (LOC)**: "Delhi", "Amazon River"

### The Amazon Problem

Consider these two sentences:

**Sentence 1:** "I love **Amazon**. It's a great **website**."

**Sentence 2:** "I love **Amazon**. It's a beautiful **river**."

### The Challenge

```
Processing left-to-right:

"I" → "love" → "Amazon" → ???
                   ↓
          Is it ORG or LOC?
          We don't know yet!
```

When processing "Amazon":
- If we only see "I love Amazon" → **Ambiguous!** (Could be company or river)
- If we also see "...website" → **Organization (ORG)**
- If we also see "...river" → **Location (LOC)**

### The Key Insight

**The future input ("website" or "river") affects the classification of a past word ("Amazon")!**

A unidirectional RNN cannot handle this because it only has access to past context, not future context.

**Solution: Bidirectional RNN** - Process the sequence from **both directions**!

---

## 3. Bidirectional RNN Architecture

### The Core Idea

Bidirectional RNN uses **two separate RNNs**:
1. **Forward RNN** (→): Processes sequence left-to-right
2. **Backward RNN** (←): Processes sequence right-to-left

Then, at each time step, we **concatenate** the outputs from both RNNs.
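
Here is a minimal NumPy sketch of this idea. It assumes a vanilla tanh RNN cell with random weights, and the names `rnn_pass` and `birnn` are illustrative, not a Keras API. The backward RNN is just a second RNN with its own weights, run over the reversed sequence. Its states are then flipped back so that position $t$ of both directions lines up before concatenation.

```python
import numpy as np

def rnn_pass(xs, W, U, b):
    """Run a tanh RNN left-to-right over xs (shape (T, D)); return all states, shape (T, H)."""
    h = np.zeros(W.shape[0])
    states = []
    for x_t in xs:
        h = np.tanh(W @ h + U @ x_t + b)
        states.append(h)
    return np.stack(states)

def birnn(xs, fwd, bwd):
    """Bidirectional pass; fwd and bwd are separate (W, U, b) weight tuples."""
    h_fwd = rnn_pass(xs, *fwd)               # forward states for t = 1..T
    h_bwd = rnn_pass(xs[::-1], *bwd)[::-1]   # run on reversed input, then flip back to align with t
    return np.concatenate([h_fwd, h_bwd], axis=1)  # concatenate per time step: (T, 2H)

rng = np.random.default_rng(0)
T, D, H = 4, 3, 5  # 4 time steps, 3-dim inputs, 5 hidden units per direction

def init_weights():
    """Random (W, U, b) for one direction, for illustration only."""
    return rng.normal(size=(H, H)) * 0.5, rng.normal(size=(H, D)) * 0.5, np.zeros(H)

fwd, bwd = init_weights(), init_weights()
xs = rng.normal(size=(T, D))
out = birnn(xs, fwd, bwd)
print(out.shape)  # (4, 10): 5 forward + 5 backward features at each of the 4 steps
```

Keras's `Bidirectional` wrapper does the same bookkeeping for you. Its `merge_mode` argument defaults to `'concat'`; other options include `'sum'`, `'mul'` and `'ave'`.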

### Architecture Diagram

```
                  x₁              x₂              x₃              x₄
               (Amazon)         (the)          (best)        (website)
                  ↓               ↓               ↓               ↓
              ┌───────┐       ┌───────┐       ┌───────┐       ┌───────┐
Forward →     │  RNN  │ ───→  │  RNN  │ ───→  │  RNN  │ ───→  │  RNN  │
  (Blue)      │   →   │       │   →   │       │   →   │       │   →   │
              └───────┘       └───────┘       └───────┘       └───────┘
                  ↓               ↓               ↓               ↓
                 h₁→             h₂→             h₃→             h₄→
                  ↓               ↓               ↓               ↓
              [CONCAT] → ŷ₁   [CONCAT] → ŷ₂   [CONCAT] → ŷ₃   [CONCAT] → ŷ₄
                  ↑               ↑               ↑               ↑
                 h₁←             h₂←             h₃←             h₄←
                  ↑               ↑               ↑               ↑
              ┌───────┐       ┌───────┐       ┌───────┐       ┌───────┐
Backward ←    │  RNN  │ ←───  │  RNN  │ ←───  │  RNN  │ ←───  │  RNN  │
  (Green)     │   ←   │       │   ←   │       │   ←   │       │   ←   │
              └───────┘       └───────┘       └───────┘       └───────┘
                  ↑               ↑               ↑               ↑
               (Amazon)         (the)          (best)        (website)
```

### How It Solves the Amazon Problem

At time step 1 (processing "Amazon"):

- **Forward RNN (h₁→)**: Has seen only "Amazon"
- **Backward RNN (h₁←)**: Has seen "website", "best", "the", "Amazon"

When we concatenate [h₁→, h₁←], the output ŷ₁ has information from **both directions**!

The backward state carries the information that "website" comes later, which gives the model the evidence it needs to tag "Amazon" as an **Organization**.
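
A small NumPy sketch can check this claim numerically. It makes three assumptions: the weights are random (untrained), the word vectors are random, and the sentences are shortened to "I love Amazon great website" and "I love Amazon beautiful river". The forward state at "Amazon" depends only on "I love Amazon", so it is identical for both sentences. The backward state differs, because it has already read "website" or "river".

```python
import numpy as np

rng = np.random.default_rng(42)
D, H = 8, 6  # embedding size, hidden units per direction
vocab = ["I", "love", "Amazon", "great", "website", "beautiful", "river"]
emb = {w: rng.normal(size=D) for w in vocab}  # hypothetical random word vectors

def make_cell():
    """Random (W, U, b) for one direction."""
    return rng.normal(size=(H, H)) * 0.5, rng.normal(size=(H, D)) * 0.5, np.zeros(H)

def run(words, W, U, b):
    """Left-to-right tanh RNN over a list of words; returns one state per word."""
    h, states = np.zeros(H), []
    for w in words:
        h = np.tanh(W @ h + U @ emb[w] + b)
        states.append(h)
    return states

fwd, bwd = make_cell(), make_cell()
s1 = ["I", "love", "Amazon", "great", "website"]
s2 = ["I", "love", "Amazon", "beautiful", "river"]
t = 2  # position of "Amazon"

h_fwd_1, h_fwd_2 = run(s1, *fwd)[t], run(s2, *fwd)[t]
# Backward states: run on the reversed sentence, flip back, then pick position t
h_bwd_1, h_bwd_2 = run(s1[::-1], *bwd)[::-1][t], run(s2[::-1], *bwd)[::-1][t]

print(np.allclose(h_fwd_1, h_fwd_2))  # True: the forward state never saw the last word
print(np.allclose(h_bwd_1, h_bwd_2))  # False: the backward state saw "website" vs "river"
```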

---

## 4. Mathematical Formulation

### Forward RNN Equation

The forward hidden state at time $t$:

$$\overrightarrow{h_t} = \tanh(W_{\rightarrow} \cdot \overrightarrow{h_{t-1}} + U_{\rightarrow} \cdot x_t + b_{\rightarrow})$$

Where:
- $\overrightarrow{h_t}$ = Forward hidden state at time $t$
- $\overrightarrow{h_{t-1}}$ = Previous forward hidden state
- $W_{\rightarrow}$ = Forward recurrent weights
- $U_{\rightarrow}$ = Forward input weights
- $b_{\rightarrow}$ = Forward bias

### Backward RNN Equation

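The backward hidden state at time $t$ uses its own, separate weights $W_{\leftarrow}$, $U_{\leftarrow}$ and $b_{\leftarrow}$:

$$\overleftarrow{h_t} = \tanh(W_{\leftarrow} \cdot \overleftarrow{h_{t+1}} + U_{\leftarrow} \cdot x_t + b_{\leftarrow})$$

**Key Difference:** Notice $\overleftarrow{h_{t+1}}$ where the forward equation has $\overrightarrow{h_{t-1}}$! The backward pass starts at the last time step $T$ (from an initial state, typically zeros) and moves toward $t = 1$: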
- Time step 4 affects time step 3
- Time step 3 affects time step 2
- And so on...

### Output Equation

The final output at time $t$ is computed by concatenating both hidden states:

$$\hat{y}_t = \sigma(W_y \cdot [\overrightarrow{h_t}; \overleftarrow{h_t}] + b_y)$$

Where:
- $[\overrightarrow{h_t}; \overleftarrow{h_t}]$ = Concatenation of forward and backward hidden states
- $W_y$ = Output weights
- $b_y$ = Output bias
- $\sigma$ = Activation function (sigmoid for binary, softmax for multi-class)
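
The three equations can be transcribed almost line by line into NumPy. This sketch assumes:

- random weights;
- one sigmoid output per time step, as in the per-step equation above;
- zero initial states $\overrightarrow{h_0} = \overleftarrow{h_{T+1}} = 0$;
- 1-based time indices stored in padded arrays.

The last few lines also check that the $\overleftarrow{h_{t+1}}$ recursion yields the same states as an ordinary left-to-right pass over the reversed input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
T, D, H = 4, 3, 5
# Separate parameters for each direction (random values for illustration only)
W_f, U_f, b_f = rng.normal(size=(H, H)) * 0.5, rng.normal(size=(H, D)) * 0.5, np.zeros(H)
W_b, U_b, b_b = rng.normal(size=(H, H)) * 0.5, rng.normal(size=(H, D)) * 0.5, np.zeros(H)
W_y, b_y = rng.normal(size=(1, 2 * H)), np.zeros(1)
x = rng.normal(size=(T, D))  # x[t - 1] holds x_t

# Forward: h_t = tanh(W_f h_{t-1} + U_f x_t + b_f), for t = 1..T
h_f = np.zeros((T + 1, H))  # row 0 is the initial forward state (zeros)
for t in range(1, T + 1):
    h_f[t] = np.tanh(W_f @ h_f[t - 1] + U_f @ x[t - 1] + b_f)

# Backward: h_t = tanh(W_b h_{t+1} + U_b x_t + b_b), for t = T..1
h_b = np.zeros((T + 2, H))  # row T + 1 is the initial backward state (zeros)
for t in range(T, 0, -1):
    h_b[t] = np.tanh(W_b @ h_b[t + 1] + U_b @ x[t - 1] + b_b)

# Output: y_hat_t = sigmoid(W_y [forward h_t ; backward h_t] + b_y)
y_hat = np.array([sigmoid(W_y @ np.concatenate([h_f[t], h_b[t]]) + b_y) for t in range(1, T + 1)])
print(y_hat.shape)  # (4, 1): one probability per time step

# Cross-check: the t+1 recursion equals a plain left-to-right pass over the reversed input
h, rev_states = np.zeros(H), []
for x_t in x[::-1]:
    h = np.tanh(W_b @ h + U_b @ x_t + b_b)
    rev_states.append(h)
print(np.allclose(np.stack(rev_states[::-1]), h_b[1:T + 1]))  # True
```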

### Summary of Equations

| Component | Equation |
|-----------|----------|
| Forward Hidden | $\overrightarrow{h_t} = \tanh(W_{\rightarrow} \cdot \overrightarrow{h_{t-1}} + U_{\rightarrow} \cdot x_t + b_{\rightarrow})$ |
| Backward Hidden | $\overleftarrow{h_t} = \tanh(W_{\leftarrow} \cdot \overleftarrow{h_{t+1}} + U_{\leftarrow} \cdot x_t + b_{\leftarrow})$ |
| Output | $\hat{y}_t = \sigma(W_y \cdot [\overrightarrow{h_t}; \overleftarrow{h_t}] + b_y)$ |

---

## 5. Implementation in Keras

Keras provides a `Bidirectional` wrapper that makes it extremely easy to create bidirectional RNNs!

```python
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, LSTM, GRU, Dense, Bidirectional
import numpy as np
import matplotlib.pyplot as plt
```

```python
# Load IMDB dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

# Pad sequences
x_train = pad_sequences(x_train, maxlen=100)
x_test = pad_sequences(x_test, maxlen=100)

print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")
```

### 5.1 Unidirectional SimpleRNN (Baseline)

```python
# Unidirectional SimpleRNN
unidirectional_model = Sequential([
    Embedding(10000, 32, input_length=100),
    SimpleRNN(5),  # 5 units
    Dense(1, activation='sigmoid')
])

unidirectional_model.build(input_shape=(None, 100))
print("Unidirectional SimpleRNN:")
unidirectional_model.summary()
```

### 5.2 Bidirectional SimpleRNN

Converting to bidirectional is simple - just wrap the RNN layer with `Bidirectional()`!

```python
# Bidirectional SimpleRNN
bidirectional_rnn = Sequential([
    Embedding(10000, 32, input_length=100),
    Bidirectional(SimpleRNN(5)),  # Wrapped with Bidirectional!
    Dense(1, activation='sigmoid')
])

bidirectional_rnn.build(input_shape=(None, 100))
print("Bidirectional SimpleRNN:")
bidirectional_rnn.summary()
```

### Understanding the Parameter Count

Notice:
- **Unidirectional SimpleRNN**: 190 parameters
- **Bidirectional SimpleRNN**: 380 parameters (exactly **2x**!)

This is because bidirectional uses **two separate RNNs** (forward + backward), each with its own weights!
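
The 190 can be recovered by hand. A SimpleRNN layer with $H$ units reading $D$-dimensional inputs (here $D = 32$, the embedding size) has an input matrix $U$, a recurrent matrix $W$ and a bias $b$:

$$\text{params} = \underbrace{H \cdot D}_{U} + \underbrace{H \cdot H}_{W} + \underbrace{H}_{b} = 5 \cdot 32 + 5 \cdot 5 + 5 = 160 + 25 + 5 = 190$$

The bidirectional wrapper holds two such layers, so its count is $2 \times 190 = 380$.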

---

### 5.3 Bidirectional LSTM (BiLSTM)

The same bidirectional concept works with LSTM and GRU. **BiLSTM** is very commonly used in practice!

```python
# Bidirectional LSTM (BiLSTM)
bilstm_model = Sequential([
    Embedding(10000, 32, input_length=100),
    Bidirectional(LSTM(5)),  # BiLSTM!
    Dense(1, activation='sigmoid')
])

bilstm_model.build(input_shape=(None, 100))
print("Bidirectional LSTM (BiLSTM):")
bilstm_model.summary()
```

### 5.4 Bidirectional GRU (BiGRU)

```python
# Bidirectional GRU (BiGRU)
bigru_model = Sequential([
    Embedding(10000, 32, input_length=100),
    Bidirectional(GRU(5)),  # BiGRU!
    Dense(1, activation='sigmoid')
])

bigru_model.build(input_shape=(None, 100))
print("Bidirectional GRU (BiGRU):")
bigru_model.summary()
```

### Parameter Comparison

Let's compare the parameter counts:

```python
# Parameter comparison
models = {
    'Unidirectional RNN': unidirectional_model,
    'Bidirectional RNN': bidirectional_rnn,
    'BiLSTM': bilstm_model,
    'BiGRU': bigru_model
}

print("Parameter Count Comparison:")
print("=" * 45)
for name, model in models.items():
    # Get only RNN layer params (excluding embedding and dense)
    rnn_params = model.layers[1].count_params()
    total_params = model.count_params()
    print(f"{name:25s}: RNN Layer = {rnn_params:5d}, Total = {total_params:,}")
```
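
To sanity-check these numbers without running TensorFlow, here is a hypothetical helper (`rnn_params` is not a Keras function) that applies the per-gate formula. It assumes Keras defaults: LSTM has no peephole connections, and GRU uses `reset_after=True` (the TensorFlow 2 default), which adds a second bias vector per gate.

```python
def rnn_params(cell, units, input_dim, bidirectional=False):
    """Parameter count of one recurrent layer, assuming Keras default settings."""
    block = units * input_dim + units * units + units  # U, W and b for one gate
    if cell == "simple":
        n = block
    elif cell == "lstm":
        n = 4 * block                # input, forget, cell-candidate and output gates
    elif cell == "gru":
        n = 3 * (block + units)      # reset_after=True: extra recurrent bias per gate
    else:
        raise ValueError(f"unknown cell type: {cell}")
    return 2 * n if bidirectional else n

EMBEDDING = 10000 * 32  # Embedding(10000, 32)
configs = [("Unidirectional RNN", "simple", False), ("Bidirectional RNN", "simple", True),
           ("BiLSTM", "lstm", True), ("BiGRU", "gru", True)]
for name, cell, bi in configs:
    rnn = rnn_params(cell, units=5, input_dim=32, bidirectional=bi)
    dense = 5 * (2 if bi else 1) + 1  # Dense(1) on 5 or 10 features, plus its bias
    print(f"{name:25s}: RNN Layer = {rnn:5d}, Total = {EMBEDDING + rnn + dense:,}")
```

Under these assumptions, the recurrent layers come out to 190, 380, 1,520 and 1,170 parameters. These should match the `count_params()` values printed by the comparison cell.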

---

## 6. Practical Example: Sentiment Analysis with BiLSTM

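```python
# Build a BiLSTM model for sentiment analysis
model = Sequential([
    Embedding(10000, 32, input_length=100),
    # Note: recurrent_dropout > 0 prevents Keras from using the fused cuDNN kernel,
    # so this layer trains more slowly on a GPU
    Bidirectional(LSTM(32, dropout=0.2, recurrent_dropout=0.2)),
    Dense(1, activation='sigmoid')
])

# Build explicitly so summary() can report shapes and parameter counts before training
model.build(input_shape=(None, 100))

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

model.summary()
```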

```python
# Train the model
history = model.fit(
    x_train, y_train,
    epochs=5,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)
```

```python
# Visualize training progress
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss
axes[0].plot(history.history['loss'], label='Training Loss', marker='o')
axes[0].plot(history.history['val_loss'], label='Validation Loss', marker='s')
axes[0].set_title('BiLSTM: Training vs Validation Loss')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].legend()
axes[0].grid(True)

# Accuracy
axes[1].plot(history.history['accuracy'], label='Training Accuracy', marker='o')
axes[1].plot(history.history['val_accuracy'], label='Validation Accuracy', marker='s')
axes[1].set_title('BiLSTM: Training vs Validation Accuracy')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.show()
```

```python
# Evaluate on test set
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"\nTest Accuracy: {test_accuracy:.4f}")
print(f"Test Loss: {test_loss:.4f}")
```

---

## 7. Stacked Bidirectional RNNs

You can also stack multiple bidirectional layers (Deep Bidirectional RNN)!

```python
# Stacked Bidirectional LSTM
stacked_bilstm = Sequential([
    Embedding(10000, 32, input_length=100),
    Bidirectional(LSTM(32, return_sequences=True)),  # First BiLSTM layer
    Bidirectional(LSTM(16)),                          # Second BiLSTM layer
    Dense(1, activation='sigmoid')
])

stacked_bilstm.build(input_shape=(None, 100))
print("Stacked Bidirectional LSTM:")
stacked_bilstm.summary()
```

### Understanding Output Shapes

- **First BiLSTM (32 units)**: Output shape = `(None, 100, 64)`
  - 32 forward + 32 backward = **64 features per time step**
  - `return_sequences=True` keeps all 100 time steps

- **Second BiLSTM (16 units)**: Output shape = `(None, 32)`
  - 16 forward + 16 backward = **32 features**
  - Only returns final state (for classification)
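
The shapes can be traced with a NumPy sketch. It makes these simplifications:

- weights are random and there are no biases;
- a plain tanh cell stands in for LSTM, since only the shapes matter here;
- a single sequence is used, so the batch dimension `None` is dropped.

The sketch also makes one detail explicit. When `return_sequences=False`, the concatenated output pairs the forward state after the **last** step with the backward state after it reaches the **first** step, because that is where the backward pass ends. The `bidirectional` function is illustrative, not the Keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_pass(xs, H):
    """Tanh RNN with fresh random weights (no bias); returns all states, shape (T, H)."""
    W = rng.normal(size=(H, H)) * 0.3
    U = rng.normal(size=(H, xs.shape[1])) * 0.3
    h, states = np.zeros(H), []
    for x_t in xs:
        h = np.tanh(W @ h + U @ x_t)
        states.append(h)
    return np.stack(states)

def bidirectional(xs, H, return_sequences):
    """Illustrative bidirectional layer with separate forward and backward weights."""
    f = rnn_pass(xs, H)        # forward states for t = 1..T
    b = rnn_pass(xs[::-1], H)  # backward states, in processing order t = T..1
    if return_sequences:
        return np.concatenate([f, b[::-1]], axis=1)  # (T, 2H), aligned per time step
    # Final states only: forward after x_T, backward after reaching x_1
    return np.concatenate([f[-1], b[-1]])            # (2H,)

x = rng.normal(size=(100, 32))  # one review: 100 steps of 32-dim embeddings
layer1 = bidirectional(x, 32, return_sequences=True)        # shape of Bidirectional(LSTM(32, return_sequences=True))
layer2 = bidirectional(layer1, 16, return_sequences=False)  # shape of Bidirectional(LSTM(16))
print(layer1.shape, layer2.shape)  # (100, 64) (32,)
```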

---

## 8. Applications of Bidirectional RNNs

### Best Use Cases

| Application | Why Bidirectional Helps |
|-------------|------------------------|
| **Named Entity Recognition (NER)** | Context from both sides determines entity type |
| **Part-of-Speech (POS) Tagging** | Same word can be different POS based on context |
| **Machine Translation** | Word order differs between languages |
| **Sentiment Analysis** | Negations and modifiers can appear before or after |
| **Time Series Forecasting** | The complete historical input window can be read in both directions (the model still never sees the future values it predicts) |

### Real-World Examples

1. **NER**: "Apple announced..." vs "I ate an apple..." → Company vs Fruit
2. **POS Tagging**: "I can fish" (verb) vs "I caught a fish" (noun)
3. **Sentiment**: "not bad" ‚Üí positive despite "not" and "bad"

---

## 9. Advantages and Disadvantages

### ✅ Advantages

| Advantage | Description |
|-----------|-------------|
| **Full Context** | Access to both past and future information at every time step |
| **Better Accuracy** | Often outperforms unidirectional RNNs on NLP tasks where the whole sequence is available |
| **Flexible** | Works with any RNN cell (SimpleRNN, LSTM, GRU) |
| **Easy to Implement** | Keras `Bidirectional()` wrapper makes it simple |

### ❌ Disadvantages

| Disadvantage | Description |
|--------------|-------------|
| **Double Parameters** | 2x weights and biases in the recurrent layer → longer training time |
| **Overfitting Risk** | More parameters = higher chance of overfitting |
| **Latency Issues** | Unsuitable for **real-time** (streaming) applications that must produce output before the input ends |
| **Requires Full Sequence** | The backward pass cannot start until the entire input is available |

### When NOT to Use Bidirectional RNNs

**Real-time Speech Recognition Example:**

```
User speaking: "Hi my name is..."
                    ↓
           BiRNN: "I need to wait for
                   the complete sentence!"
                    ↓
           High latency! ❌
```

For real-time applications where data arrives incrementally, unidirectional RNNs are preferred.
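
A toy simulation makes the latency difference concrete. It assumes a hypothetical `microphone` generator that delivers one random feature frame at a time, and small random tanh RNN cells for both directions. The unidirectional model emits its first output after the first frame. The bidirectional model cannot start its backward pass until every frame has arrived.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, N_FRAMES = 4, 3, 5
W_f, U_f = rng.normal(size=(H, H)) * 0.5, rng.normal(size=(H, D)) * 0.5
W_b, U_b = rng.normal(size=(H, H)) * 0.5, rng.normal(size=(H, D)) * 0.5
arrived = []  # records which frames have been delivered so far

def microphone():
    """Hypothetical audio source: yields one frame at a time."""
    for i in range(N_FRAMES):
        arrived.append(i)
        yield rng.normal(size=D)

def stream_unidirectional(frames):
    """Emit a forward state as soon as each frame arrives."""
    h = np.zeros(H)
    for x_t in frames:
        h = np.tanh(W_f @ h + U_f @ x_t)
        yield h

def bidirectional_first_output(frames):
    """The output at t = 1 needs the backward state at t = 1, i.e. the whole input."""
    xs = list(frames)              # blocks until the utterance is complete
    h_fwd = np.tanh(U_f @ xs[0])   # forward state at t = 1 (initial state is zero)
    h_bwd = np.zeros(H)
    for x_t in reversed(xs):       # backward pass from the last frame down to t = 1
        h_bwd = np.tanh(W_b @ h_bwd + U_b @ x_t)
    return np.concatenate([h_fwd, h_bwd])

next(stream_unidirectional(microphone()))
frames_before_uni = len(arrived)   # frames consumed before the first unidirectional output
arrived.clear()
bidirectional_first_output(microphone())
frames_before_bi = len(arrived)    # frames consumed before the first bidirectional output
print(frames_before_uni, frames_before_bi)  # 1 5
```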

---

## 10. Visualization: Information Flow Comparison

```python
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Unidirectional RNN
ax1 = axes[0]
ax1.set_xlim(0, 10)
ax1.set_ylim(0, 6)
ax1.set_title('Unidirectional RNN', fontsize=14, fontweight='bold')
ax1.axis('off')

# Draw boxes and arrows for unidirectional
for i, (x, word) in enumerate([(1, 'x₁'), (3.5, 'x₂'), (6, 'x₃'), (8.5, 'x₄')]):
    # Input
    ax1.text(x+0.5, 1, word, ha='center', fontsize=11)
    ax1.annotate('', xy=(x+0.5, 2), xytext=(x+0.5, 1.3),
                arrowprops=dict(arrowstyle='->', color='black'))
    # RNN box
    rect = mpatches.FancyBboxPatch((x, 2), 1, 1, boxstyle='round,pad=0.05',
                                    facecolor='steelblue', edgecolor='black')
    ax1.add_patch(rect)
    ax1.text(x+0.5, 2.5, 'RNN', ha='center', va='center', color='white', fontweight='bold')
    # Output
    ax1.annotate('', xy=(x+0.5, 4.5), xytext=(x+0.5, 3.2),
                arrowprops=dict(arrowstyle='->', color='black'))
    ax1.text(x+0.5, 4.7, f'ŷ{i+1}', ha='center', fontsize=11)
    # Forward arrow
    if i < 3:
        ax1.annotate('', xy=(x+2, 2.5), xytext=(x+1.2, 2.5),
                    arrowprops=dict(arrowstyle='->', color='steelblue', lw=2))

ax1.text(5, 5.5, 'Information flows only LEFT → RIGHT', ha='center', fontsize=11, style='italic')

# Bidirectional RNN
ax2 = axes[1]
ax2.set_xlim(0, 10)
ax2.set_ylim(0, 6)
ax2.set_title('Bidirectional RNN', fontsize=14, fontweight='bold')
ax2.axis('off')

for i, (x, word) in enumerate([(1, 'x₁'), (3.5, 'x₂'), (6, 'x₃'), (8.5, 'x₄')]):
    # Input
    ax2.text(x+0.5, 0.3, word, ha='center', fontsize=11)

    # Forward RNN (blue)
    rect_f = mpatches.FancyBboxPatch((x, 1.2), 1, 0.8, boxstyle='round,pad=0.05',
                                      facecolor='steelblue', edgecolor='black')
    ax2.add_patch(rect_f)
    ax2.text(x+0.5, 1.6, '→', ha='center', va='center', color='white', fontsize=14)

    # Backward RNN (green)
    rect_b = mpatches.FancyBboxPatch((x, 2.2), 1, 0.8, boxstyle='round,pad=0.05',
                                      facecolor='forestgreen', edgecolor='black')
    ax2.add_patch(rect_b)
    ax2.text(x+0.5, 2.6, '←', ha='center', va='center', color='white', fontsize=14)

    # Concat
    rect_c = mpatches.FancyBboxPatch((x+0.2, 3.3), 0.6, 0.5, boxstyle='round,pad=0.02',
                                      facecolor='orange', edgecolor='black')
    ax2.add_patch(rect_c)
    ax2.text(x+0.5, 3.55, 'C', ha='center', va='center', fontsize=10, fontweight='bold')

    # Output
    ax2.text(x+0.5, 4.3, f'ŷ{i+1}', ha='center', fontsize=11)

    # Forward arrows
    if i < 3:
        ax2.annotate('', xy=(x+2, 1.6), xytext=(x+1.2, 1.6),
                    arrowprops=dict(arrowstyle='->', color='steelblue', lw=1.5))
    # Backward arrows
    if i > 0:
        ax2.annotate('', xy=(x-0.8, 2.6), xytext=(x, 2.6),
                    arrowprops=dict(arrowstyle='->', color='forestgreen', lw=1.5))

ax2.text(5, 5.2, 'Information flows BOTH directions!', ha='center', fontsize=11, style='italic')

# Legend
legend_elements = [
    mpatches.Patch(facecolor='steelblue', label='Forward RNN'),
    mpatches.Patch(facecolor='forestgreen', label='Backward RNN'),
    mpatches.Patch(facecolor='orange', label='Concatenate')
]
ax2.legend(handles=legend_elements, loc='lower right', fontsize=9)

plt.tight_layout()
plt.show()
```

---

## 11. Summary

### Key Takeaways

1. **Bidirectional RNNs** use **two RNNs**: one forward (→) and one backward (←)

2. **Why use them?** When future context affects current predictions:
   - Named Entity Recognition
   - POS Tagging
   - Machine Translation

3. **Implementation in Keras** is simple:
   ```python
   Bidirectional(LSTM(units))  # Just wrap with Bidirectional!
   ```

4. **Parameter count of the recurrent layer doubles** (2x weights and biases)

5. **Common variants**:
   - **BiLSTM** - Most popular in practice
   - **BiGRU** - Fewer parameters than BiLSTM, often faster

6. **Limitations**:
   - Unsuitable for real-time (streaming) applications
   - Requires full sequence before processing
   - Higher computational cost

### Mathematical Summary

| Direction | Equation |
|-----------|----------|
| Forward | $\overrightarrow{h_t} = f(\overrightarrow{h_{t-1}}, x_t)$ |
| Backward | $\overleftarrow{h_t} = f(\overleftarrow{h_{t+1}}, x_t)$ |
| Output | $\hat{y}_t = g([\overrightarrow{h_t}; \overleftarrow{h_t}])$ |

### What's Next?

Now that you've mastered RNNs (Vanilla, LSTM, GRU, Deep, Bidirectional), you're ready for:
- **Encoder-Decoder Architecture** (Sequence-to-Sequence models)
- **Attention Mechanism** (The foundation of Transformers!)
- **Transformers** (State-of-the-art for NLP)

---

**Congratulations!** 🎉 You now understand Bidirectional RNNs and can apply them to various NLP tasks!