# Project 8: RNN

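## Generate the Time-Series Data
A synthetic weather sequence is created with an autoregressive rule:

- Each new temperature depends on the previous two temperatures
- Plus a bit of Gaussian noise (standard deviation 0.5)

This gives a time-series pattern with memory and randomness. Because the two coefficients (0.7 and 0.2) sum to less than 1 and there is no constant term, the series drifts downward from its seed values on average.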

In [1]:
import numpy as np

In [2]:
temps = []
temps.append(20)   # seed day 0
temps.append(21)   # seed day 1

for t in range(2, 100):
    next_temp = (
        0.7 * temps[-1] +
        0.2 * temps[-2] +
        np.random.normal(0, 0.5)
    )
    temps.append(next_temp)

input_sq = temps[:5]
target = temps[5]
print(input_sq, target)

[20, 21, 19.165781784734058, 16.087429935013184, 15.307597421130193] 14.672780156181451
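
The cell above draws its noise from NumPy's global random state without a seed, so the printed numbers change from run to run. A minimal sketch of a reproducible variant follows. The helper name `generate_temps` and the seed value 0 are assumptions for illustration and are not part of the original notebook, so its values will not match the output printed above.

```python
import numpy as np

def generate_temps(n_days=100, seed=0):
    """Hypothetical helper: same AR(2) rule as above, with a seeded generator."""
    rng = np.random.default_rng(seed)   # local generator; the seed value is arbitrary
    temps = [20.0, 21.0]                # seed days 0 and 1, as in the notebook
    for _ in range(2, n_days):
        noise = rng.normal(0.0, 0.5)    # Gaussian noise, standard deviation 0.5
        temps.append(0.7 * temps[-1] + 0.2 * temps[-2] + noise)
    return temps

temps_a = generate_temps()
temps_b = generate_temps()
print(temps_a[:6])
print(temps_a == temps_b)   # same seed, same series
```

Using a local `Generator` rather than the global state keeps the data generation independent of the random weight initialization later on.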


## Initialize the RNN Parameters

**A minimal RNN cell with:**

- W_xh: input → hidden
- W_hh: hidden → hidden (the recurrence)
- b_h: hidden bias


- W_hy: hidden → output
- b_y: output bias

These are all randomly initialized, just like in any other neural network. The weights are scaled by 0.01 to start small; the biases are drawn at full scale.

In [3]:
hidden_size = 1

W_xh = np.random.randn(hidden_size) * 0.01   # input -> hidden
W_hh = np.random.randn(hidden_size) * 0.01   # hidden -> hidden
b_h  = np.random.randn(hidden_size)          # hidden bias

W_hy = np.random.randn(hidden_size) * 0.01   # hidden -> output
b_y  = np.random.randn()                 # output bias

print("\nWeights:")
print("W_xh:", W_xh)
print("W_hh:", W_hh)
print("b_h :", b_h, "\n")
print("W_hy:", W_hy)
print("b_y :", b_y)


Weights:
W_xh: [-0.01120693]
W_hh: [0.00323926]
b_h : [1.49454101] 

W_hy: [0.0246548]
b_y : -1.1758237039803698


## Forward Pass Through Time

**This is where the RNN differs from a normal feed-forward network.**

For each timestep in the input sequence:

Compute the pre-activation

- $a_t = W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h$

Apply the activation

- $h_t = \tanh(a_t)$

Store:

- the raw activation ($a_t$)
- the hidden state ($h_t$)

**This loop is the "unrolling through time" that gives RNNs memory.**

**After the final timestep $T$, compute the output:**

- $y = W_{hy}\, h_T + b_y$

In [4]:
# Forward pass with storage for BPTT
hs = [0.0]   # h_0
raws = []    # a_t

h = 0.0
for x in input_sq:
    a = W_xh * x + W_hh * h + b_h
    h = np.tanh(a)
    raws.append(a)
    hs.append(h)

y_pred = W_hy * h + b_y

print("Final h:", h)
print("Prediction:", y_pred)
print("Target:", target)


Final h: [0.86821787]
Prediction: [-1.15441796]
Target: 14.672780156181451
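
The untrained prediction above lands almost exactly on `b_y`. A short replay of the forward pass shows why. The weights and inputs are copied from the printed outputs (so they are rounded to the digits shown), and the result should agree with the printed values only to about that precision.

```python
import numpy as np

# Weights and inputs copied from the printed outputs above (rounded as shown)
W_xh, W_hh, b_h = -0.01120693, 0.00323926, 1.49454101
W_hy, b_y = 0.0246548, -1.1758237039803698
input_sq = [20, 21, 19.165781784734058, 16.087429935013184, 15.307597421130193]

h = 0.0
for x in input_sq:
    a = W_xh * x + W_hh * h + b_h   # pre-activation, dominated by b_h here
    h = np.tanh(a)

y_pred = W_hy * h + b_y
print("h_T   :", h)
print("W_hy*h:", W_hy * h)          # small contribution from the hidden state
print("y_pred:", y_pred)            # close to b_y
```

With `W_hy` scaled by 0.01 and `h` bounded by tanh, the hidden state can shift the output by only a few hundredths, so the starting prediction is set almost entirely by the random output bias.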


## Compute the Loss
**Squared error (for a single prediction):**

- $L = (y_{\text{pred}} - \text{target})^2$

This measures how far the prediction is from the true next temperature.


## Backpropagation Through Time (BPTT)
**This is the heart of training an RNN.**

**Step A: Start at the output**

Compute:

- gradient of the loss with respect to the output
- gradient with respect to W_hy and b_y
- gradient flowing back into the final hidden state

**Step B: Walk backward through each timestep**

For each timestep (in reverse):

Compute the derivative of tanh

- $\partial h_t / \partial a_t = 1 - \tanh^2(a_t)$

Compute gradients for:

- W_xh
- W_hh
- b_h

Propagate the gradient to the previous hidden state using W_hh

This is the "through time" part: the gradient flows backward across all timesteps.
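
For the scalar case used in the cells of this notebook, the two steps can be written as a short set of recurrences. The notation is an assumption introduced here for illustration: $g_t$ is the gradient arriving at $h_t$, $\delta_t$ is the gradient at $a_t$, and $T$ is the sequence length.

```latex
\begin{aligned}
g_T &= \frac{\partial L}{\partial y}\, W_{hy}, \qquad
\frac{\partial L}{\partial y} = 2\,(y_{\text{pred}} - \text{target}) \\
\delta_t &= \bigl(1 - \tanh^2(a_t)\bigr)\, g_t \\
g_{t-1} &= W_{hh}\, \delta_t \\
\frac{\partial L}{\partial W_{xh}} &= \sum_{t=1}^{T} \delta_t\, x_t, \qquad
\frac{\partial L}{\partial W_{hh}} = \sum_{t=1}^{T} \delta_t\, h_{t-1}, \qquad
\frac{\partial L}{\partial b_h} = \sum_{t=1}^{T} \delta_t
\end{aligned}
```

Each step backward multiplies the gradient by $W_{hh}\,(1 - \tanh^2(a_t))$, so with a small $|W_{hh}|$ the earlier timesteps contribute very little to the accumulated gradients.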

In [5]:
# Loss
loss = (y_pred - target)**2
print("Loss:", loss)

# dL/dy
dL_dy = 2 * (y_pred - target)

# Output layer gradients
dW_hy = dL_dy * hs[-1]
db_y  = dL_dy

# Gradient flowing into last hidden state
dh_next = dL_dy * W_hy


Loss: [250.50020029]


## Update the Parameters
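**Just like any neural network:**

$\theta \leftarrow \theta - \eta \cdot \nabla_\theta L$

This rule applies to all weights and biases:

- W_xh, W_hh, b_h
- W_hy, b_y

**This is standard gradient descent.** The next cell first completes Step B of BPTT by accumulating the gradients for the recurrent parameters; the update itself is applied inside `train_step` in the class below.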
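
A minimal sketch of one such update on the scalar model. The parameter values are copied from the printed weights, and the gradient values from the output of the BPTT cell that follows. The helper name `sgd_step` is hypothetical, and the learning rate is assumed to be the `train_step` default of 0.0001.

```python
# Parameter and gradient values copied from the printed outputs in this notebook
params = {"W_xh": -0.01120693, "W_hh": 0.00323926, "b_h": 1.49454101,
          "W_hy": 0.0246548, "b_y": -1.1758237039803698}
grads = {"W_xh": -2.94371973, "W_hh": -0.16653561, "b_h": -0.19229653,
         "W_hy": -27.48291246, "b_y": -31.65439624}

lr = 0.0001  # assumed: same default as train_step later in the notebook

def sgd_step(params, grads, lr):
    """Hypothetical helper: one plain update, theta <- theta - lr * grad."""
    return {name: value - lr * grads[name] for name, value in params.items()}

updated = sgd_step(params, grads, lr)
for name in params:
    print(f"{name}: {params[name]:.8f} -> {updated[name]:.8f}")
```

Every gradient here is negative, so every parameter increases, which pushes the prediction up toward the target.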

In [6]:
# Initialize RNN parameter grads
dW_xh = 0.0
dW_hh = 0.0
db_h  = 0.0

# Backprop through time
for t in reversed(range(len(input_sq))):
    a_t = raws[t]
    h_prev = hs[t]
    x_t = input_sq[t]

    # derivative of tanh
    da = (1 - np.tanh(a_t)**2) * dh_next

    # accumulate grads
    dW_xh += da * x_t
    dW_hh += da * h_prev
    db_h  += da

    # propagate to previous h
    dh_next = da * W_hh

print("dW_xh:", dW_xh)
print("dW_hh:", dW_hh)
print("db_h :", db_h)
print("dW_hy:", dW_hy)
print("db_y :", db_y)




dW_xh: [-2.94371973]
dW_hh: [-0.16653561]
db_h : [-0.19229653]
dW_hy: [-27.48291246]
db_y : [-31.65439624]
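
Hand-written BPTT is easy to get subtly wrong, so comparing it against finite differences is a useful sanity check. The sketch below assumes the scalar model and the printed (rounded) weights from earlier. The function names `forward`, `bptt` and `numeric_grad` are introduced here for illustration.

```python
import numpy as np

input_sq = [20, 21, 19.165781784734058, 16.087429935013184, 15.307597421130193]
target = 14.672780156181451
# Weights copied from the printed output earlier (rounded as shown there)
params = {"W_xh": -0.01120693, "W_hh": 0.00323926, "b_h": 1.49454101,
          "W_hy": 0.0246548, "b_y": -1.1758237039803698}

def forward(p):
    """Scalar forward pass; returns the loss plus stored states for BPTT."""
    hs, raws, h = [0.0], [], 0.0
    for x in input_sq:
        a = p["W_xh"] * x + p["W_hh"] * h + p["b_h"]
        h = np.tanh(a)
        raws.append(a)
        hs.append(h)
    y_pred = p["W_hy"] * h + p["b_y"]
    return (y_pred - target) ** 2, y_pred, hs, raws

def bptt(p):
    """Analytic gradients, following the same steps as the cells above."""
    _, y_pred, hs, raws = forward(p)
    dL_dy = 2 * (y_pred - target)
    g = {"W_xh": 0.0, "W_hh": 0.0, "b_h": 0.0,
         "W_hy": dL_dy * hs[-1], "b_y": dL_dy}
    dh_next = dL_dy * p["W_hy"]
    for t in reversed(range(len(input_sq))):
        da = (1 - np.tanh(raws[t]) ** 2) * dh_next
        g["W_xh"] += da * input_sq[t]
        g["W_hh"] += da * hs[t]
        g["b_h"] += da
        dh_next = da * p["W_hh"]
    return g

def numeric_grad(p, name, eps=1e-6):
    """Central finite difference of the loss for one parameter."""
    plus, minus = dict(p), dict(p)
    plus[name] += eps
    minus[name] -= eps
    return (forward(plus)[0] - forward(minus)[0]) / (2 * eps)

analytic = bptt(params)
for name in params:
    print(f"{name}: analytic={analytic[name]:.6f}  numeric={numeric_grad(params, name):.6f}")
```

If the two columns agree to several decimal places, the backward pass is consistent with the forward pass.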


## Wrap Everything Into a Class
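The logic is encapsulated in two classes:

**RNNCell**

- Handles one timestep
- Computes the hidden state

**RNNPredictor**

- Unrolls the RNN across a sequence
- Computes the output
- Stores activations
- Performs BPTT
- Clips gradients elementwise to [-1, 1]
- Updates parameters

This loosely mirrors the cell/model split found in deep-learning libraries (PyTorch, for example, has a single-step `RNNCell` module).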

In [7]:
class RNNCell:
    def __init__(self, input_size, hidden_size):
        self.input_size = input_size
        self.hidden_size = hidden_size

        self.W_xh = np.random.randn(hidden_size, input_size) * 0.01
        self.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
        self.b_h  = np.random.randn(hidden_size)

    def forward(self, x_t, h_prev):
        raw = self.W_xh @ x_t + self.W_hh @ h_prev + self.b_h
        h_t = np.tanh(raw)
        return h_t, raw


class RNNPredictor:
    def __init__(self, input_size, hidden_size):
        self.cell = RNNCell(input_size, hidden_size)

        self.W_hy = np.random.randn(1, hidden_size) * 0.01
        self.b_y  = np.random.randn()

    def forward_sequence(self, sequence):
        h = np.zeros(self.cell.hidden_size)
        hs = [h]      # store hidden states
        raws = []     # store raw pre-activations

        for x in sequence:
            x_t = np.array([x])
            h, raw = self.cell.forward(x_t, h)
            hs.append(h)
            raws.append(raw)

        y_pred = self.W_hy @ h + self.b_y
        return y_pred, hs, raws

    def train_step(self, sequence, target, lr=0.0001):
        y_pred, hs, raws = self.forward_sequence(sequence)

        # ----- Loss -----
        loss = (y_pred - target)**2

        # ----- Gradients -----
        dL_dy = 2 * (y_pred - target)  # array of shape (1,)

        # Output layer grads
        dW_hy = dL_dy * hs[-1].reshape(1, -1)
        db_y  = dL_dy

        # Backprop into last hidden state
        dh_next = (self.W_hy.T * dL_dy).flatten()

        # Initialize grads for RNN cell
        dW_xh = np.zeros_like(self.cell.W_xh)
        dW_hh = np.zeros_like(self.cell.W_hh)
        db_h  = np.zeros_like(self.cell.b_h)

 

        # ----- BPTT -----
        for t in reversed(range(len(sequence))):
            raw = raws[t]
            h_prev = hs[t]
        
            dtanh = (1 - np.tanh(raw)**2) * dh_next
        
            x_t = np.array([sequence[t]])
            dW_xh += dtanh.reshape(-1,1) @ x_t.reshape(1,-1)
            dW_hh += dtanh.reshape(-1,1) @ h_prev.reshape(1,-1)
            db_h  += dtanh
        
            dh_next = self.cell.W_hh.T @ dtanh
        
        # ----- Gradient Clipping -----
        clip_value = 1.0
        dW_xh = np.clip(dW_xh, -clip_value, clip_value)
        dW_hh = np.clip(dW_hh, -clip_value, clip_value)
        db_h  = np.clip(db_h,  -clip_value, clip_value)
        dW_hy = np.clip(dW_hy, -clip_value, clip_value)
        db_y  = np.clip(db_y,  -clip_value, clip_value)
        
        # ----- Update weights -----
        self.W_hy -= lr * dW_hy
        self.b_y  -= lr * db_y
        self.cell.W_xh -= lr * dW_xh
        self.cell.W_hh -= lr * dW_hh
        self.cell.b_h  -= lr * db_h


        return loss, y_pred
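
`train_step` clips every gradient entry independently with `np.clip`, which can change the direction of the overall gradient. A common alternative, which the notebook does not use, is to rescale all gradients together by their joint L2 norm. The sketch below shows that variant. The helper name `clip_by_global_norm` and the toy gradient values are made up for illustration.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Hypothetical helper: rescale all gradients if their joint L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))  # 1e-12 guards against division by zero
    return [g * scale for g in grads], total_norm

# Toy gradients with made-up values, shaped like dW_hy and db_h for hidden_size=3
grads = [np.array([[3.0, -4.0, 0.0]]), np.array([0.0, 0.0, 12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print("norm before:", norm_before)
print("norm after :", norm_after)
```

Norm-based clipping keeps the ratios between gradient entries intact, while elementwise clipping caps each entry on its own. Which one trains better on this data is not something the notebook tests.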


In [8]:
def make_dataset(temps, seq_len=5):
    X = []
    y = []
    for i in range(len(temps) - seq_len):
        X.append(temps[i:i+seq_len])
        y.append(temps[i+seq_len])
    return np.array(X), np.array(y)


In [9]:
X, y = make_dataset(temps, seq_len=5)
print(X.shape, y.shape)


(95, 5) (95,)
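
The same windows can be built without a Python loop. This is a sketch, assuming NumPy 1.20 or newer for `sliding_window_view`. The function name `make_dataset_vectorized` is hypothetical, and the demo series is a stand-in rather than the notebook's data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # available in NumPy >= 1.20

def make_dataset_vectorized(temps, seq_len=5):
    """Hypothetical alternative to make_dataset: same windows, no Python loop."""
    arr = np.asarray(temps, dtype=float)
    windows = sliding_window_view(arr, seq_len + 1)   # each row: seq_len inputs + 1 target
    return windows[:, :-1].copy(), windows[:, -1].copy()

# Stand-in series (not the notebook's data), used only to check shapes and alignment
demo = np.arange(100, dtype=float)
X_demo, y_demo = make_dataset_vectorized(demo, seq_len=5)
print(X_demo.shape, y_demo.shape)   # (95, 5) (95,)
print(X_demo[0], y_demo[0])
```

The `.copy()` calls turn the read-only views into ordinary writable arrays.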


In [10]:
model = RNNPredictor(input_size=1, hidden_size=20)

for epoch in range(2500):
    total_loss = 0

    for seq, target in zip(X, y):
        loss, pred = model.train_step(seq, target)
        total_loss += loss

    if epoch % 250 == 0:
        print(f"epoch {epoch}, total_loss={total_loss}")



epoch 0, total_loss=[2141.77044345]
epoch 250, total_loss=[565.42410998]
epoch 500, total_loss=[167.68090605]
epoch 750, total_loss=[85.33345951]
epoch 1000, total_loss=[53.24991208]
epoch 1250, total_loss=[38.46009129]
epoch 1500, total_loss=[31.93027038]
epoch 1750, total_loss=[27.95588329]
epoch 2000, total_loss=[26.03146945]
epoch 2250, total_loss=[25.23165896]
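
The printed `total_loss` is a sum over all 95 training windows, so dividing by 95 gives a per-sample figure that can be compared with the noise in the generator. The sketch below uses only numbers that appear in the notebook.

```python
import math

total_loss_final = 25.23165896   # last printed value (epoch 2250)
n_samples = 95                   # number of training windows, from X.shape
noise_std = 0.5                  # standard deviation used in np.random.normal(0, 0.5)

mse = total_loss_final / n_samples
rmse = math.sqrt(mse)
noise_variance = noise_std ** 2  # variance of the noise term in the generator

print(f"per-sample MSE: {mse:.4f}")
print(f"RMSE:           {rmse:.4f}")
print(f"noise variance: {noise_variance:.4f}")
```

The per-sample training error comes out at roughly 0.27, close to the noise variance of 0.25. That suggests the remaining error is mostly noise that no predictor could remove, though this is training error, not a measurement on held-out data.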


In [11]:
test_seq = temps[:5]          # or any 5-day window
pred, hs, raws = model.forward_sequence(test_seq)

print("Input:", test_seq)
print("Prediction:", pred)
print("True next value:", temps[5])

Input: [20, 21, 19.165781784734058, 16.087429935013184, 15.307597421130193]
Prediction: [12.81218303]
True next value: 14.672780156181451


In [13]:
error = (pred - temps[5])
error

array([-1.86059713])
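
A single error is easier to judge next to simple baselines. The sketch below compares the printed RNN prediction with two reference points: a persistence baseline (tomorrow equals today) and the noise-free generating rule. All numbers are copied from the outputs above. Note that `temps[:5]` is also a training window, so this is not a held-out test.

```python
# Values copied from the printed outputs above
last_input = 15.307597421130193      # final value of the 5-day window
prev_input = 16.087429935013184      # value one day earlier
true_next = 14.672780156181451
rnn_pred = 12.81218303

# Persistence baseline: predict that tomorrow equals today
persistence_error = last_input - true_next

# Noise-free AR rule used to generate the data (its conditional mean)
ar_pred = 0.7 * last_input + 0.2 * prev_input
ar_error = ar_pred - true_next

print("RNN error:        ", rnn_pred - true_next)
print("Persistence error:", persistence_error)
print("AR-rule error:    ", ar_error)
```

On this one window both baselines are closer to the true value than the RNN. One window is too little to draw conclusions from, so a fairer comparison would average the errors over windows that were kept out of training.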