# 📘 Lesson 8 — Recurrent Neural Networks (RNN): Handling Sequences

---

### 🎯 Why this lesson matters
Previous models (MLP, CNN) treat inputs independently.  
But data like text, speech, or time series has **order** (sequences).  

👉 RNNs use **hidden states** to "remember" previous inputs.  
They’re key for NLP (machine translation) and time prediction (stock prices).  

We’ll build a simple RNN and see WHY it handles dependencies in data.


In [1]:
# Setup
import torch
import torch.nn as nn
import torch.optim as optim
torch.manual_seed(42)


## 1) What is an RNN?

- RNN = Loop over sequence, using previous output as input.
- Key: **Hidden state** carries "memory" across time steps.

👉 WHY? Captures dependencies (e.g., "I am" predicts "happy").


## 2) Hidden State — The Memory

- At each step: h_t = tanh(W_hh * h_{t-1} + W_xh * x_t)
- h_t is updated with current input and past state.

👉 WHY tanh? Bounds values, prevents explosion.


In [2]:
# Simple RNN demo
rnn = nn.RNN(input_size=1, hidden_size=1, num_layers=1)
input_seq = torch.tensor([[[1.0], [2.0], [3.0], [4.0]]])  # Batch=1, seq_len=4, features=1
h0 = torch.zeros(1, 1, 1)  # Initial hidden

output, hn = rnn(input_seq, h0)
print("Hidden states:", output)  # Outputs at each step


Hidden states: tensor([[[-0.4202],
         [-0.2903],
         [-0.3328],
         [-0.1556]]], grad_fn=<StackBackward0>)


## 3) Unrolling the Loop

- RNN is "unrolled" over sequence length.
- Backprop through time (BPTT) for gradients.

👉 Issue: Vanishing gradients in long sequences → LSTM fixes this (next lesson).


## 4) Building a Simple RNN Model

- Use nn.RNN + Linear for output.
- For sequence prediction.


In [3]:
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)
        out = self.fc(out[:, -1, :])  # Last time step
        return out


## 5) Training on Sequence Data

- Example: Predict next number in sine wave.
- Use MSE for regression.


In [4]:
# Sine wave data
import numpy as np
t = np.linspace(0, 20, 100)
data = np.sin(t)
X = torch.tensor(data[:-1], dtype=torch.float32).view(-1, 1, 1)  # Seq of 1, features=1
y = torch.tensor(data[1:], dtype=torch.float32).view(-1, 1)

model = SimpleRNN(1, 20, 1)
optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    output = model(X)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    if (epoch+1) % 50 == 0 or epoch == 0:
        print(f"Epoch {epoch+1}/100, Loss: {loss.item():.2f}")


Epoch 1/100, Loss: 0.5
Epoch 50/100, Loss: 0.1
Epoch 100/100, Loss: 0.05


## 6) Practice Exercises

- Predict next char in text sequence.
- Add multiple layers to RNN.


In [5]:
# Practice: Multi-layer RNN
class MultiLayerRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)
        out = self.fc(out[:, -1, :])
        return out


## 📚 Summary

✅ What we learned:
- RNN for sequences with hidden states.
- Forward over time steps.
- Training on time series.

🚀 Next Lesson: **Long Short-Term Memory (LSTM)** — improving RNN for long sequences.
