# 🔄 Chapter 15: Sequence Processing with RNNs & CNNs

Learn how to work with sequences, such as time series or text, using Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs).

## I. 🧠 Recurrent Neurons and Layers

Recurrent Neural Networks process sequential data by maintaining a hidden state that evolves over time. This allows them to capture temporal dependencies.

In [1]:
import tensorflow as tf
from tensorflow.keras import layers

# Define a simple RNN cell with 5 units
cell = layers.SimpleRNNCell(units=5)

# Wrap the cell into an RNN layer that returns sequences and state
rnn_layer = layers.RNN(cell, return_sequences=True, return_state=True)

# Generate synthetic data: batch size=2, time steps=10, features=3
X = tf.random.uniform((2, 10, 3))

# Run data through RNN
all_outputs, final_state = rnn_layer(X)

# Output shapes
print("All outputs shape:", all_outputs.shape)  # (2, 10, 5)
print("Final state shape:", final_state.shape)  # (2, 5)

All outputs shape: (2, 10, 5)
Final state shape: (2, 5)
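
To make the hidden-state update concrete, here is a minimal NumPy sketch of what a tanh recurrent layer computes at each time step. The function name `simple_rnn_forward`, the weight names, and the random initialization are illustrative assumptions, not Keras internals.

```python
import numpy as np

def simple_rnn_forward(X, W_x, W_h, b):
    """Run a tanh recurrent layer over X with shape (batch, time, features)."""
    batch, steps, _ = X.shape
    units = W_h.shape[0]
    h = np.zeros((batch, units))            # initial hidden state
    outputs = []
    for t in range(steps):
        # h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b)
        h = np.tanh(X[:, t, :] @ W_x + h @ W_h + b)
        outputs.append(h)
    # Stack per-step states into (batch, time, units); also return the last state
    return np.stack(outputs, axis=1), h

rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(2, 10, 3))       # batch=2, time steps=10, features=3
W_x = rng.normal(scale=0.1, size=(3, 5))    # input-to-hidden weights (hypothetical values)
W_h = rng.normal(scale=0.1, size=(5, 5))    # hidden-to-hidden weights (hypothetical values)
b = np.zeros(5)

all_outputs_np, final_state_np = simple_rnn_forward(X_demo, W_x, W_h, b)
print(all_outputs_np.shape, final_state_np.shape)
```

The final state is simply the output at the last time step, which is why the two shapes differ only by the time axis.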


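Gated cells such as **LSTM** and **GRU** use gates to control what is kept in memory, which helps them retain information over longer sequences and mitigates the vanishing-gradient problem. An LSTM carries two states: a hidden state and a cell state.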

In [2]:
# LSTM layer with 5 units
lstm_layer = layers.LSTM(5, return_sequences=True, return_state=True)

# Process the same input data
all_out, final_hidden, final_cell = lstm_layer(X)

# Shapes of final states
print("Final hidden state shape:", final_hidden.shape)  # (2, 5)
print("Final cell state shape:", final_cell.shape)    # (2, 5)

Final hidden state shape: (2, 5)
Final cell state shape: (2, 5)
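
The two states can be illustrated with a single LSTM step written in NumPy. This is a simplified sketch of the standard LSTM equations; the helper names (`sigmoid`, `lstm_step`), the packed weight layout, and the random weights are assumptions made for illustration and may not match how Keras stores its parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step.

    W: (features, 4 * units), U: (units, 4 * units), b: (4 * units,)
    The four column blocks hold the input, forget, candidate, and output parts.
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # gates take values in (0, 1)
    g = np.tanh(g)                                 # candidate cell update
    c_t = f * c_prev + i * g                       # cell state: forget old, add new
    h_t = o * np.tanh(c_t)                         # hidden state: gated view of the cell
    return h_t, c_t

rng = np.random.default_rng(0)
units, features, batch = 5, 3, 2
W = rng.normal(scale=0.1, size=(features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h_demo = np.zeros((batch, units))
c_demo = np.zeros((batch, units))
for x_t in rng.uniform(size=(10, batch, features)):   # 10 time steps
    h_demo, c_demo = lstm_step(x_t, h_demo, c_demo, W, U, b)

print(h_demo.shape, c_demo.shape)
```

Because the cell state is updated additively (`f * c_prev + i * g`), information can persist across many steps when the forget gate stays close to 1.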


## II. 📝 Training RNNs

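Training an RNN follows the same pattern as any other Keras model: call `model.compile()` and then `model.fit()`. Under the hood, gradients are computed with backpropagation through time (BPTT), which unrolls the network across the time steps.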

## III. 📈 Forecasting a Time Series

Let's explore a simple time series prediction problem.

### A. Baseline Metric

First, establish a naive baseline: use the value at the previous time step as the prediction for the current step.

In [7]:
import numpy as np
import matplotlib.pyplot as plt

# Generate a sine wave time series
series = np.sin(np.arange(1000) / 100)

# Naive prediction: the value at step t-1 is the forecast for step t
naive_preds = series[:-1]
actuals = series[1:]

# Compute MAE between actual values and naive predictions
baseline_mae = np.mean(np.abs(actuals - naive_preds))
print("Baseline MAE:", baseline_mae)

Baseline MAE: 0.006542123263128362
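
The models in the following sections are trained with an MSE loss, so it can help to express the naive baseline in the same metric. The sketch below recomputes the baseline on the same sine series and reports both MAE and MSE. The variable names ending in `_demo` are illustrative and chosen to avoid overwriting the notebook's variables.

```python
import numpy as np

# Same synthetic series as above
series_demo = np.sin(np.arange(1000) / 100)

# Naive forecast: the previous value predicts the next one
naive_forecast = series_demo[:-1]
targets = series_demo[1:]

baseline_mae_demo = np.mean(np.abs(targets - naive_forecast))
baseline_mse_demo = np.mean((targets - naive_forecast) ** 2)

print("Baseline MAE:", baseline_mae_demo)
print("Baseline MSE:", baseline_mse_demo)
```

A model's validation MSE should be compared against the baseline MSE rather than the baseline MAE, since the two metrics are on different scales.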


### B. Simple RNN Model

Create a model to learn from the time series data using a simple RNN.

In [8]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

# Define window size and features
n_past = 20
n_features = 1

# Function to create dataset
def create_dataset(series, window_size):
    X, y = [], []
    for i in range(len(series) - window_size):
        X.append(series[i:(i + window_size)])
        y.append(series[i + window_size])
    return np.expand_dims(np.array(X), axis=-1), np.array(y)

# Prepare data
X_data, y_data = create_dataset(series, n_past)

# Build the model (an explicit Input layer avoids the input_shape deprecation warning)
model = Sequential([
    layers.Input(shape=(n_past, n_features)),
    layers.SimpleRNN(20, return_sequences=False),
    layers.Dense(1)
])

# Compile and train
model.compile(loss="mse", optimizer="adam")
model.fit(X_data, y_data, epochs=10, validation_split=0.2)

Epoch 1/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 2s 23ms/step - loss: 0.0142 - val_loss: 0.0020
Epoch 2/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - loss: 5.7160e-04 - val_loss: 2.9905e-04
Epoch 3/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - loss: 2.7936e-04 - val_loss: 3.1991e-04
Epoch 4/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - loss: 1.4383e-04 - val_loss: 1.9127e-04
Epoch 5/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - loss: 1.0754e-04 - val_loss: 1.2957e-04
Epoch 6/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - loss: 8.4583e-05 - val_loss: 1.2556e-04
Epoch 7/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - loss: 6.8464e-05 - val_loss: 9.5178e-05
Epoch 8/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - loss: 5.5331e-05 - val_loss: 3.7441e-05
...

<keras.src.callbacks.history.History at 0x7e0d0cfe3130>
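
The windowing loop in `create_dataset` can also be written without an explicit Python loop. The following sketch uses NumPy's `sliding_window_view` (available in NumPy 1.20 and later); the function name `create_dataset_vectorized` is a hypothetical alternative, not part of the original notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_dataset_vectorized(series, window_size):
    """Return (X, y) where each X[i] holds window_size values and y[i] is the next one."""
    # Each row contains window_size inputs followed by the target value
    windows = sliding_window_view(series, window_size + 1)
    X = windows[:, :-1, np.newaxis].copy()   # shape: (samples, window_size, 1)
    y = windows[:, -1].copy()                # shape: (samples,)
    return X, y

series_demo = np.sin(np.arange(1000) / 100)
X_demo, y_demo = create_dataset_vectorized(series_demo, 20)
print(X_demo.shape, y_demo.shape)
```

The `.copy()` calls turn the read-only views into ordinary arrays, which costs some memory but avoids surprises when the data is passed on to other libraries.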

### C. Deep RNNs

Stack multiple RNN layers for more capacity.

In [9]:
# Build a deeper RNN model; every recurrent layer except the last returns full sequences
deep_rnn = Sequential([
    layers.Input(shape=(n_past, n_features)),
    layers.SimpleRNN(20, return_sequences=True),
    layers.SimpleRNN(20),
    layers.Dense(1)
])

# Compile and train
deep_rnn.compile(loss="mse", optimizer="adam")
deep_rnn.fit(X_data, y_data, epochs=10, validation_split=0.2)

Epoch 1/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 5s 98ms/step - loss: 0.1256 - val_loss: 0.0113
Epoch 2/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step - loss: 0.0038 - val_loss: 9.3827e-04
Epoch 3/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step - loss: 0.0011 - val_loss: 6.7834e-04
Epoch 4/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 5.7681e-04 - val_loss: 4.8777e-04
Epoch 5/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 3.7745e-04 - val_loss: 5.3150e-04
Epoch 6/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 1s 23ms/step - loss: 2.7355e-04 - val_loss: 5.1251e-04
Epoch 7/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 2.3090e-04 - val_loss: 3.2359e-04
Epoch 8/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 1.6483e-04 - val_loss: 2.1019e-04


<keras.src.callbacks.history.History at 0x7e0d0c38c7c0>

### D. Multi-Step Forecast

Use the trained model to predict multiple future steps iteratively.

In [10]:
# Take the last window from the series
last_window = series[-n_past:]

# Initialize list to hold predictions
preds = []

# Prepare input for prediction
input_eval = np.expand_dims(last_window, axis=(0,2))  # shape: (1, n_past, 1)

# Predict next 10 steps
for _ in range(10):
    pred = model.predict(input_eval)[0,0]
    preds.append(pred)
    # Append the predicted value to the input window
    input_eval = np.append(input_eval[:,1:,:], [[[pred]]], axis=1)

print("Future predictions:", preds)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 188ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 53ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 53ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 50ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 51ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 54ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 53ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 53ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 52ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 50ms/step
Future predictions: [np.float32(-0.53750026), np.float32(-0.5411351), np.float32(-0.54376066), np.float32(-0.5447591), np.float32(-0.54547787), np.float32(-0.544994), np.float32(-0.54423875), np.float32(-0.5422231), np.float32(-0.54106146), np.float32(-0.54056686)]
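
The iterative loop above can be wrapped in a reusable helper that accepts any one-step predictor. The sketch below is one possible way to do this; `forecast_iteratively` and the stand-in predictor `linear_extrapolation` are hypothetical names used only to keep the example independent of a trained model.

```python
import numpy as np

def forecast_iteratively(predict_fn, last_window, n_steps):
    """Roll a one-step predictor forward, feeding each prediction back as input.

    predict_fn takes an array of shape (1, window, 1) and returns the next value
    (any array-like whose first element is the prediction).
    """
    window = np.asarray(last_window, dtype=float).reshape(1, -1, 1)
    preds = []
    for _ in range(n_steps):
        next_value = float(np.ravel(predict_fn(window))[0])
        preds.append(next_value)
        # Drop the oldest value and append the new prediction
        window = np.concatenate([window[:, 1:, :], [[[next_value]]]], axis=1)
    return np.array(preds)

# Stand-in predictor (assumption): continue the line through the last two points
def linear_extrapolation(window):
    return 2 * window[:, -1, 0] - window[:, -2, 0]

preds_demo = forecast_iteratively(linear_extrapolation, [0.0, 1.0, 2.0, 3.0], n_steps=3)
print(preds_demo)
```

With the trained network, `predict_fn` could be `model.predict`. Because every prediction is fed back as input, errors tend to accumulate as the forecast horizon grows.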


## IV. 🕰️ Handling Long Sequences

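Backpropagating through many time steps multiplies gradients repeatedly, so on long sequences they can shrink toward zero (vanish) or grow without bound (explode). Common remedies include gated cells, gradient clipping, and residual connections.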
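
The effect of repeated multiplication can be seen in a small NumPy experiment. This is a deliberately simplified sketch: it assumes a linear recurrence (no activation function) and a recurrent weight matrix that is a scaled orthogonal matrix, so that the gradient norm changes by exactly the scale factor at every step. The name `backprop_norm` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # random orthogonal matrix

def backprop_norm(scale, n_steps):
    """Norm of a gradient after being propagated back through n_steps linear steps."""
    W = scale * Q                  # recurrent weights with all singular values equal to scale
    grad = np.ones(5)
    for _ in range(n_steps):
        grad = W.T @ grad          # one step of backpropagation through time
    return np.linalg.norm(grad)

vanishing = backprop_norm(0.5, 50)
exploding = backprop_norm(1.5, 50)
print("scale 0.5:", vanishing)
print("scale 1.5:", exploding)
```

Real RNNs also multiply by the derivative of the activation at each step, but the qualitative behavior is similar: factors below 1 drive the gradient toward zero, and factors above 1 make it blow up.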

### A. Vanishing/Exploding Gradients

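Gated units such as LSTM and GRU mitigate vanishing gradients, while gradient clipping keeps exploding gradients in check. In Keras, clipping is configured on the optimizer.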

In [11]:
# Gradient clipping via the optimizer: clipvalue=1.0 clips each gradient component to [-1.0, 1.0]
optimizer = tf.keras.optimizers.Adam(clipvalue=1.0)
model.compile(loss="mse", optimizer=optimizer)
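
Clipping by value and clipping by norm behave differently, which the following NumPy sketch illustrates. The helper names `clip_by_value` and `clip_by_norm` are illustrative and only mimic the idea; they are not the Keras implementation.

```python
import numpy as np

def clip_by_value(grad, limit):
    """Clip every component independently to [-limit, limit]."""
    return np.clip(grad, -limit, limit)

def clip_by_norm(grad, max_norm):
    """Rescale the whole vector if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm <= max_norm:
        return grad
    return grad * (max_norm / norm)

g = np.array([3.0, -4.0, 0.5])
g_value = clip_by_value(g, 1.0)
g_norm = clip_by_norm(g, 1.0)

print("by value:", g_value)   # components are capped, so the direction changes
print("by norm: ", g_norm)    # direction is preserved, only the length shrinks
```

Keras optimizers also accept a `clipnorm` argument for norm-based clipping, which preserves the direction of the gradient it rescales.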

### B. Short-Term Memory with CNNs

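Convolutional models capture local context with small kernels. Causal padding ensures that each output depends only on the current and past time steps, and dilated convolutions widen the receptive field so that stacked layers can reach long-range dependencies.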
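
Both ideas can be checked with a few lines of NumPy. The sketch below implements a single-channel causal convolution with optional dilation and computes the receptive field of a stack of such layers. The function names `causal_conv1d` and `receptive_field` are hypothetical, and the implementation favors readability over speed.

```python
import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    """y[t] = sum_j kernel[j] * x[t - j * dilation], with x treated as zero before t = 0."""
    k = len(kernel)
    pad = (k - 1) * dilation
    x_padded = np.concatenate([np.zeros(pad), x])   # left padding only: no access to the future
    return np.array([
        sum(kernel[j] * x_padded[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

def receptive_field(kernel_size, dilations):
    """Number of input steps seen by one output of a stack of dilated causal layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

x_demo = np.arange(10, dtype=float)
y_demo = causal_conv1d(x_demo, [0.5, 0.3, 0.2], dilation=2)

# Changing a value at t = 5 must leave all outputs before t = 5 untouched
x_changed = x_demo.copy()
x_changed[5] = 100.0
y_changed = causal_conv1d(x_changed, [0.5, 0.3, 0.2], dilation=2)
print(np.allclose(y_demo[:5], y_changed[:5]))

print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of parameters grows only linearly.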

## ✅ Practical Code Example: Causal 1D Convolution

Construct a sequence model combining Conv1D with LSTM.

In [12]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D

# Build a model with a causal convolution followed by an LSTM
model_causal = Sequential([
    layers.Input(shape=(n_past, n_features)),
    Conv1D(filters=32, kernel_size=3, padding="causal", activation="relu"),
    layers.LSTM(20),
    layers.Dense(1)
])

# Compile and train
model_causal.compile(loss="mse", optimizer="adam")
model_causal.fit(X_data, y_data, epochs=10, validation_split=0.2)

Epoch 1/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 2s 24ms/step - loss: 0.4476 - val_loss: 0.0173
Epoch 2/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - loss: 0.0181 - val_loss: 0.0049
Epoch 3/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - loss: 0.0046 - val_loss: 0.0072
Epoch 4/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - loss: 0.0030 - val_loss: 0.0054
Epoch 5/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - loss: 0.0025 - val_loss: 0.0048
Epoch 6/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - loss: 0.0023 - val_loss: 0.0049
Epoch 7/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - loss: 0.0021 - val_loss: 0.0050
Epoch 8/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - loss: 0.0020 - val_loss: 0.0042
Epoch 9/10
...

<keras.src.callbacks.history.History at 0x7e0d0cff8610>

## 🧠 Chapter Summary

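- **RNNs** process sequences step by step, carrying a hidden state that summarizes what they have seen so far.
- **Gated units** like LSTM/GRU mitigate vanishing gradients and retain information over longer sequences.
- **Deep/stacked RNNs** add capacity; every recurrent layer except the last must return full sequences.
- **Multi-step forecasting** can be done iteratively by feeding each prediction back into the input window.
- **Convolutional sequence models** capture local context, and dilated causal convolutions extend their reach to long-range dependencies.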

## 🧪 Exercises to Try

1. Train an **LSTM** or **GRU** model and compare it with `SimpleRNN`.
2. Try a **CNN → LSTM** architecture and check whether it improves the forecasts.
3. Experiment with **gradient clipping**, **residual connections**, or **layer normalization**.
4. Build a **Bidirectional RNN** to process sequential data forward and backward.
5. Forecast longer sequences (20–50 steps) and compare strategies.