# Recurrent Neural Networks (RNNs) with PyTorch

This notebook provides a comprehensive guide to understanding and implementing Recurrent Neural Networks (RNNs) using PyTorch. We'll start with the basic concepts and gradually build up to more complex implementations.

## What we'll cover:

1. Understanding RNN architecture and concepts
2. Building a simple RNN from scratch
3. Using PyTorch's built-in RNN modules
4. Implementing more advanced RNN variants like LSTM and GRU
5. Training RNN models and making predictions

## 1. Import Required Libraries

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check if GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

## 2. Understanding RNN Architecture

### What is a Recurrent Neural Network?

A Recurrent Neural Network (RNN) is a type of neural network designed to work with sequential data. Unlike feedforward neural networks, RNNs have connections that feed back into the network, creating a form of memory that allows information to persist.

### Key Components of RNNs:

1. **Input (x)**: The current input in the sequence
2. **Hidden state (h)**: The "memory" that captures information from previous inputs
3. **Output (y)**: The prediction for the current step

### How Information Flows in an RNN:

For each time step t:
- The RNN takes the current input x_t and the previous hidden state h_{t-1}
- Combines them to update the current hidden state h_t
- Produces an output y_t based on h_t

### Mathematical Representation:

\begin{align}
h_t &= \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h) \\
y_t &= W_{hy} h_t + b_y
\end{align}

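Where:
- W_{xh}, W_{hh}, and W_{hy} are the input-to-hidden, hidden-to-hidden, and hidden-to-output weight matrices
- b_h and b_y are bias vectors
- tanh is the activation function, applied element-wise to produce the hidden state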
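
As a rough illustration of these two equations, the sketch below applies the recurrence step by step with NumPy. The layer sizes, the random weights, and the helper name `rnn_step` are assumptions made for this example only; they are not part of the models built later in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumed for this sketch)
n_in, n_hidden, n_out = 3, 4, 2

# Parameters named after the symbols in the equations
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))
b_h = np.zeros(n_hidden)
b_y = np.zeros(n_out)

def rnn_step(x_t, h_prev):
    """Apply the recurrence once and return (y_t, h_t)."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    y_t = W_hy @ h_t + b_y
    return y_t, h_t

# Run three time steps, carrying the hidden state forward
toy_inputs = rng.normal(size=(3, n_in))
h = np.zeros(n_hidden)  # h_0
for x_t in toy_inputs:
    y, h = rnn_step(x_t, h)

print(h.shape, y.shape)  # (4,) (2,)
```

The same weights are reused at every step; only the hidden state `h` changes, which is what gives the network its memory.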

### Why Use RNNs?

RNNs are particularly useful for:
- Time series prediction
- Natural language processing
- Speech recognition
- And other tasks involving sequential or temporal data

Let's visualize the RNN architecture:

In [None]:
# Simple visualization of an unfolded RNN
def plot_rnn_structure():
    fig, ax = plt.subplots(figsize=(10, 6))
    
    # Hidden states
    for t in range(4):
        # Draw the hidden state circle
        circle = plt.Circle((t, 1), 0.2, fill=True, color='skyblue', alpha=0.8)
        ax.add_patch(circle)
        ax.text(t, 1, f'h{t}', ha='center', va='center')
        
        # Draw the recurrent connection from the previous hidden state
        if t > 0:
            ax.arrow(t-1, 1, 0.6, 0, head_width=0.05, head_length=0.1, fc='k', ec='k')
        
        # Draw the input box
        if t > 0:
            rect = plt.Rectangle((t-0.3, 0.3), 0.6, 0.4, fill=True, color='lightgreen', alpha=0.8)
            ax.add_patch(rect)
            ax.text(t, 0.5, f'x{t}', ha='center', va='center')
            
            # Connect input to hidden state
            ax.arrow(t, 0.7, 0, 0.1, head_width=0.05, head_length=0.1, fc='k', ec='k')
            
        # Draw the output box
        if t > 0:
            rect = plt.Rectangle((t-0.3, 1.6), 0.6, 0.4, fill=True, color='salmon', alpha=0.8)
            ax.add_patch(rect)
            ax.text(t, 1.8, f'y{t}', ha='center', va='center')
            
            # Connect hidden state to output
            ax.arrow(t, 1.2, 0, 0.4, head_width=0.05, head_length=0.1, fc='k', ec='k')
    
    # Add labels and title
    ax.set_xlim(-0.5, 3.5)
    ax.set_ylim(0, 2.3)
    ax.set_title('Recurrent Neural Network Architecture (Unfolded)')
    ax.axis('off')
    
    plt.text(1.5, 0.1, 'Inputs', ha='center')
    plt.text(1.5, 1, 'Hidden States', ha='center')
    plt.text(1.5, 2.1, 'Outputs', ha='center')
    
    plt.tight_layout()
    plt.show()

plot_rnn_structure()

## 3. Preparing Data

For this tutorial, we'll create a simple synthetic dataset to demonstrate RNNs. We'll generate a noisy sine wave and train our RNN to predict the next value in the series from a fixed-length window of preceding values.
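
Next-value prediction is usually framed as a sliding window over the series: each input is a run of consecutive values and the target is the value that follows it. The toy sketch below shows the idea on six integers; `make_windows` is a hypothetical helper written for this illustration.

```python
import numpy as np

def make_windows(series, window):
    """Split a 1-D series into (input windows, next-value targets)."""
    n_samples = len(series) - window
    X = np.array([series[i:i + window] for i in range(n_samples)])
    y = np.array([series[i + window] for i in range(n_samples)])
    return X, y

toy_series = np.arange(6)  # [0, 1, 2, 3, 4, 5]
X_toy, y_toy = make_windows(toy_series, window=3)

print(X_toy.tolist())  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(y_toy.tolist())  # [3, 4, 5]
```

A series of length N with window length w yields N - w samples, and neighbouring windows overlap in all but one value.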

In [None]:
# Generate a sine wave dataset
def generate_sine_wave(sample_size=1000, sequence_length=50):
    # Generate a sine wave
    time_steps = np.linspace(0, 100, sample_size)
    data = np.sin(time_steps * 0.1)  # sin(0.1 * t) over t in [0, 100], about 1.6 full periods
    
    # Add some noise to make it more interesting
    data += 0.1 * np.random.randn(sample_size)
    
    # Create sequences for training
    X, y = [], []
    for i in range(len(data) - sequence_length):
        X.append(data[i:i+sequence_length])
        y.append(data[i+sequence_length])
    
    # Convert to float32 tensors shaped for RNN input: (batch_size, seq_len, features)
    X = torch.tensor(np.array(X), dtype=torch.float32).view(-1, sequence_length, 1)
    y = torch.tensor(np.array(y), dtype=torch.float32).view(-1, 1)
    
    # Split data into training and testing sets
    train_size = int(len(X) * 0.8)
    X_train, X_test = X[:train_size], X[train_size:]
    y_train, y_test = y[:train_size], y[train_size:]
    
    return X_train, y_train, X_test, y_test, data

# Generate the data
seq_length = 20
X_train, y_train, X_test, y_test, sine_data = generate_sine_wave(sequence_length=seq_length)

print(f"Training data shape: {X_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test data shape: {X_test.shape}")
print(f"Test labels shape: {y_test.shape}")

# Plot a sample of our sine wave data
plt.figure(figsize=(12, 4))
plt.plot(sine_data[:100])
plt.title("Sample of Sine Wave Data")
plt.xlabel("Time Steps")
plt.ylabel("Amplitude")
plt.grid(True)
plt.show()

# Visualize a single training example
plt.figure(figsize=(12, 4))
plt.plot(X_train[0].view(-1).detach().numpy(), label='Input Sequence')
plt.scatter(seq_length, y_train[0].item(), color='red', label='Target Value')
plt.title("Example of Input Sequence and Target")
plt.xlabel("Time Steps")
plt.ylabel("Amplitude")
plt.legend()
plt.grid(True)
plt.show()

## 4. Building a Simple RNN from Scratch

To understand how RNNs work internally, let's implement a simple RNN cell from scratch using PyTorch.

In [None]:
class SimpleRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNNCell, self).__init__()
        
        # Scale factor that keeps the initial pre-activations small so tanh does not start saturated
        init_scale = hidden_size ** -0.5
        
        # Weights for the input layer
        self.W_xh = nn.Parameter(torch.randn(input_size, hidden_size) * init_scale)
        
        # Weights for the hidden layer
        self.W_hh = nn.Parameter(torch.randn(hidden_size, hidden_size) * init_scale)
        
        # Bias for hidden layer
        self.b_h = nn.Parameter(torch.zeros(hidden_size))
        
        # Weights and bias for output layer
        self.W_hy = nn.Parameter(torch.randn(hidden_size, output_size) * init_scale)
        self.b_y = nn.Parameter(torch.zeros(output_size))
        
        # Activation function
        self.tanh = nn.Tanh()
        
    def forward(self, x, h_prev):
        # Calculate hidden state
        h = self.tanh(x @ self.W_xh + h_prev @ self.W_hh + self.b_h)
        
        # Calculate output
        y = h @ self.W_hy + self.b_y
        
        return y, h

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size
        self.cell = SimpleRNNCell(input_size, hidden_size, output_size)
    
    def forward(self, x_sequence):
        # x_sequence shape: (batch_size, sequence_length, input_size)
        batch_size = x_sequence.size(0)
        sequence_length = x_sequence.size(1)
        
        # Initialize hidden state with zeros
        h = torch.zeros(batch_size, self.hidden_size).to(x_sequence.device)
        
        # Process the sequence
        outputs = []
        for t in range(sequence_length):
            x_t = x_sequence[:, t, :]
            output, h = self.cell(x_t, h)
            outputs.append(output)
        
        # Stack outputs along sequence dimension
        outputs = torch.stack(outputs, dim=1)
        
        # Return only the last output
        return outputs[:, -1, :]

# Initialize our custom RNN
input_size = 1  # Dimension of input features
hidden_size = 32  # Number of hidden units
output_size = 1  # Dimension of output

# Create model instance
custom_rnn = SimpleRNN(input_size, hidden_size, output_size).to(device)
print(custom_rnn)

Now, let's define a function to train our custom RNN model.

In [None]:
def train_model(model, X_train, y_train, epochs=100, learning_rate=0.01):
    # Move data to device
    X_train = X_train.to(device)
    y_train = y_train.to(device)
    
    # Loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    
    # Training loop
    losses = []
    for epoch in range(epochs):
        # Forward pass
        outputs = model(X_train)
        loss = criterion(outputs, y_train)
        
        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        losses.append(loss.item())
        
        # Print progress
        if (epoch+1) % 10 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')
    
    return losses

# Train our custom RNN model
custom_rnn_losses = train_model(custom_rnn, X_train, y_train, epochs=100)

# Plot losses
plt.figure(figsize=(10, 5))
plt.plot(custom_rnn_losses)
plt.title('Training Loss (Custom RNN)')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

## 5. Using PyTorch's RNN Modules

Now that we understand the basic mechanics of an RNN, let's leverage PyTorch's built-in RNN modules which are optimized for performance and ease of use.
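
Before wrapping `nn.RNN` in a model, it can help to look at what the layer returns. The sketch below uses arbitrary illustrative sizes (the variable names are assumptions for this example) and checks the shapes of the two return values when `batch_first=True`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumed for this sketch)
n_batch, n_steps, n_features, n_hidden = 8, 20, 1, 32

demo_rnn = nn.RNN(input_size=n_features, hidden_size=n_hidden, batch_first=True)
demo_x = torch.randn(n_batch, n_steps, n_features)

# If no initial hidden state is passed, nn.RNN starts from zeros
demo_out, demo_h_n = demo_rnn(demo_x)

print(demo_out.shape)  # torch.Size([8, 20, 32]) -> hidden state at every time step
print(demo_h_n.shape)  # torch.Size([1, 8, 32])  -> final hidden state per layer

# For a single-layer, unidirectional RNN the last step of the output equals the final hidden state
print(torch.allclose(demo_out[:, -1, :], demo_h_n[-1]))  # True
```

This is why taking `out[:, -1, :]` is a common way to summarize a whole sequence before a final linear layer.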

In [None]:
class PyTorchRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1, dropout=0):
        super(PyTorchRNN, self).__init__()
        
        # RNN layer
        self.rnn = nn.RNN(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,  # input shape (batch_size, seq_len, features)
            dropout=dropout if num_layers > 1 else 0
        )
        
        # Fully connected output layer
        self.fc = nn.Linear(hidden_size, output_size)
        
        # Store parameters
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
    def forward(self, x):
        # Initialize hidden state with zeros
        batch_size = x.size(0)
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device)
        
        # Forward pass through RNN
        # out shape = (batch_size, seq_length, hidden_size)
        out, _ = self.rnn(x, h0)
        
        # We only need the last time step's output
        out = self.fc(out[:, -1, :])
        
        return out

# Initialize the PyTorch RNN model
rnn_model = PyTorchRNN(
    input_size=1,
    hidden_size=32,
    output_size=1,
    num_layers=1
).to(device)

print(rnn_model)

# Train the PyTorch RNN model
rnn_losses = train_model(rnn_model, X_train, y_train, epochs=100)

# Plot losses
plt.figure(figsize=(10, 5))
plt.plot(rnn_losses)
plt.title('Training Loss (PyTorch RNN)')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

## 6. Training the RNN Model

We've already trained our models with basic parameters. Now, let's create a more comprehensive training function with evaluation and implement early stopping to avoid overfitting.
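
The patience rule behind early stopping can be sketched without any model at all. The hypothetical helper below (`early_stopping_epoch`, written for this illustration) walks through a made-up list of validation losses and reports when training would stop and which epoch was best.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the counter
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1

        if epochs_without_improvement >= patience:
            return epoch, best_epoch

    # Patience never ran out, so training ran to the end
    return len(val_losses), best_epoch

# Made-up losses for illustration
toy_val_losses = [0.90, 0.50, 0.40, 0.42, 0.41, 0.45, 0.39]
stop_epoch, best_epoch = early_stopping_epoch(toy_val_losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

With `patience=3` the loop stops at epoch 6 and never sees the lower loss at epoch 7, which is the trade-off a small patience value makes.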

In [None]:
def train_with_validation(model, X_train, y_train, X_val=None, y_val=None, 
                         epochs=100, learning_rate=0.01, patience=10):
    # If validation data not provided, use 20% of training data
    if X_val is None or y_val is None:
        val_size = int(len(X_train) * 0.2)
        indices = torch.randperm(len(X_train))
        train_indices = indices[val_size:]
        val_indices = indices[:val_size]
        
        X_val, y_val = X_train[val_indices], y_train[val_indices]
        X_train, y_train = X_train[train_indices], y_train[train_indices]
    
    # Move data to device
    X_train = X_train.to(device)
    y_train = y_train.to(device)
    X_val = X_val.to(device)
    y_val = y_val.to(device)
    
    # Loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    
    # For early stopping
    best_val_loss = float('inf')
    no_improve_epochs = 0
    best_model = None
    
    # For tracking metrics
    train_losses = []
    val_losses = []
    
    for epoch in range(epochs):
        # Training phase
        model.train()
        optimizer.zero_grad()
        train_outputs = model(X_train)
        train_loss = criterion(train_outputs, y_train)
        train_loss.backward()
        optimizer.step()
        
        # Validation phase
        model.eval()
        with torch.no_grad():
            val_outputs = model(X_val)
            val_loss = criterion(val_outputs, y_val)
        
        # Track losses
        train_losses.append(train_loss.item())
        val_losses.append(val_loss.item())
        
        # Print progress
        if (epoch + 1) % 10 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], '
                  f'Train Loss: {train_loss.item():.4f}, '
                  f'Val Loss: {val_loss.item():.4f}')
        
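        # Check for early stopping
        if val_loss.item() < best_val_loss:
            best_val_loss = val_loss.item()
            no_improve_epochs = 0
            # Clone the tensors: state_dict().copy() is shallow and would keep tracking the live weights
            best_model = {k: v.detach().clone() for k, v in model.state_dict().items()}
        else:
            no_improve_epochs += 1
            
        if no_improve_epochs >= patience:
            print(f"Early stopping at epoch {epoch+1}")
            break
    
    # Restore the best weights seen on the validation set, whether or not training stopped early
    if best_model is not None:
        model.load_state_dict(best_model)
    
    return model, train_losses, val_losses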

# Initialize a fresh model
rnn_model_with_val = PyTorchRNN(
    input_size=1,
    hidden_size=32,
    output_size=1,
    num_layers=1
).to(device)

# Train with validation and early stopping
trained_model, train_losses, val_losses = train_with_validation(
    rnn_model_with_val, X_train, y_train, 
    X_test, y_test,  # Test set doubles as the validation set for brevity, so test metrics will be optimistic
    epochs=200, 
    learning_rate=0.01,
    patience=20
)

# Plot training and validation losses
plt.figure(figsize=(10, 5))
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()

## 7. Making Predictions with the RNN

Now that we've trained our RNN model, let's use it to make predictions and visualize the results.

In [None]:
def predict_and_plot(model, X_test, y_test, original_data, sequence_length):
    # Set model to evaluation mode
    model.eval()
    
    # Move test data to device
    X_test = X_test.to(device)
    
    # Make predictions
    with torch.no_grad():
        y_pred = model(X_test).cpu().numpy().flatten()
    
    # Get actual test values
    y_actual = y_test.cpu().numpy().flatten()
    
    # Calculate error metrics
    mse = np.mean((y_actual - y_pred) ** 2)
    rmse = np.sqrt(mse)
    
    print(f"Test MSE: {mse:.4f}")
    print(f"Test RMSE: {rmse:.4f}")
    
    # Plot results
    plt.figure(figsize=(12, 6))
    
    # Plot a segment of the original data
    time_steps = np.arange(200)
    plt.plot(time_steps, original_data[:200], 'b-', label='Original Data')
    
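    # Plot predictions at their true position in the series.
    # This assumes the test targets are the final len(y_actual) points of original_data,
    # which holds for the chronological split in generate_sine_wave.
    offset = len(original_data) - len(y_actual)
    max_points = min(150, len(y_pred))  # Limit the number of points for clarity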
    
    pred_time_steps = np.arange(offset, offset + max_points)
    plt.plot(pred_time_steps, y_pred[:max_points], 'r-', label='Predictions')
    plt.plot(pred_time_steps, y_actual[:max_points], 'g-', label='Actual Values')
    
    plt.title('RNN Predictions vs Actual Values')
    plt.xlabel('Time Steps')
    plt.ylabel('Amplitude')
    plt.legend()
    plt.grid(True)
    plt.show()
    
    # Also plot a scatter plot of actual vs predicted
    plt.figure(figsize=(8, 8))
    plt.scatter(y_actual, y_pred, alpha=0.5)
    plt.plot([-1, 1], [-1, 1], 'r--')  # Diagonal line for reference
    plt.title('Actual vs Predicted Values')
    plt.xlabel('Actual Values')
    plt.ylabel('Predicted Values')
    plt.grid(True)
    plt.axis('equal')
    plt.show()
    
    return y_pred, y_actual

# Make predictions using our trained model
predictions, actuals = predict_and_plot(trained_model, X_test, y_test, sine_data, seq_length)

## 8. Implementing LSTM and GRU Models

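Standard RNNs often struggle with long-term dependencies because gradients tend to vanish (or explode) as they are propagated back through many time steps. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks add gating mechanisms that mitigate the vanishing gradient problem; exploding gradients are usually handled separately, for example with gradient clipping.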
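
A deliberately simplified sketch of the vanishing gradient effect: take a scalar recurrence h_t = tanh(w * h_{t-1}) with no input term (an assumption made to keep the arithmetic visible) and multiply the per-step derivatives together. The helper name and the values of `w` and `h0` are illustrative only.

```python
import math

def gradient_through_time(w, h0, steps):
    """Return d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2)
        grad *= w * (1.0 - h ** 2)
    return grad

for steps in (1, 5, 10, 20):
    print(steps, gradient_through_time(w=0.5, h0=0.5, steps=steps))
```

Each factor is at most `w` in magnitude, so with `w = 0.5` the gradient after 20 steps is below 0.5 ** 20 (about 1e-6). Early inputs therefore have almost no influence on the update.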

### LSTM Architecture
LSTM keeps a separate cell state alongside the hidden state and uses three gates to regulate information flow:
1. Forget gate: Decides what information to discard from the cell state
2. Input gate: Decides how much new candidate information to write into the cell state
3. Output gate: Controls what part of the cell state is exposed as the hidden state
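
One common way to write the LSTM update, in the same style as the RNN equations earlier in this notebook, is sketched below. The symbols $W_*$, $U_*$, and $b_*$ (per-gate input weights, recurrent weights, and biases), $\sigma$ (the logistic sigmoid), and $\odot$ (element-wise product) are notation introduced here for illustration; references differ in how they name and group these parameters.

\begin{align}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{align}

Here $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, $\tilde{c}_t$ is the candidate cell state, and $c_t$ is the cell state. The additive form of the $c_t$ update is the usual explanation for why gradients survive across more time steps than in a plain RNN.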

### GRU Architecture
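GRU is a simplified variant of LSTM that merges the cell state and hidden state into one and uses two gates:
1. Reset gate: Determines how much of the previous hidden state feeds into the new candidate state
2. Update gate: Controls how much of the previous hidden state is kept versus replaced by the candidate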
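
The difference in gating shows up directly in the parameter counts of PyTorch's layers: `nn.LSTM` holds four gate-sized weight blocks (three gates plus the candidate state) and `nn.GRU` holds three, compared with one for `nn.RNN`. The sketch below counts them for a single layer; `count_parameters` is a hypothetical helper written for this illustration.

```python
import torch.nn as nn

def count_parameters(module):
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

n_features, n_hidden = 1, 32  # same sizes as the models in this notebook

rnn_params = count_parameters(nn.RNN(n_features, n_hidden, batch_first=True))
gru_params = count_parameters(nn.GRU(n_features, n_hidden, batch_first=True))
lstm_params = count_parameters(nn.LSTM(n_features, n_hidden, batch_first=True))

print(rnn_params, gru_params, lstm_params)  # 1120 3360 4480
```

For the same hidden size, a GRU layer has three times and an LSTM layer four times the parameters of a plain RNN layer. That is worth keeping in mind when comparing their losses.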

In [None]:
# LSTM Model
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # LSTM layer
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True  # input shape (batch_size, seq_len, features)
        )
        
        # Fully connected output layer
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        # Initialize hidden state and cell state with zeros
        batch_size = x.size(0)
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device)
        
        # Forward propagate LSTM
        # out shape = (batch_size, seq_length, hidden_size)
        out, _ = self.lstm(x, (h0, c0))
        
        # Use only the last time step output
        out = self.fc(out[:, -1, :])
        
        return out

# GRU Model
class GRUModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(GRUModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # GRU layer
        self.gru = nn.GRU(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True  # input shape (batch_size, seq_len, features)
        )
        
        # Fully connected output layer
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        # Initialize hidden state with zeros
        batch_size = x.size(0)
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device)
        
        # Forward propagate GRU
        # out shape = (batch_size, seq_length, hidden_size)
        out, _ = self.gru(x, h0)
        
        # Use only the last time step output
        out = self.fc(out[:, -1, :])
        
        return out

# Initialize LSTM and GRU models
lstm_model = LSTMModel(input_size=1, hidden_size=32, output_size=1).to(device)
gru_model = GRUModel(input_size=1, hidden_size=32, output_size=1).to(device)

# Print model architectures
print("LSTM Model:")
print(lstm_model)
print("\nGRU Model:")
print(gru_model)

Now let's train both the LSTM and GRU models and compare their performance.

In [None]:
# Train LSTM model
print("Training LSTM Model:")
lstm_model, lstm_train_losses, lstm_val_losses = train_with_validation(
    lstm_model, X_train, y_train, 
    X_test, y_test,
    epochs=150, 
    learning_rate=0.01,
    patience=15
)

# Train GRU model
print("\nTraining GRU Model:")
gru_model, gru_train_losses, gru_val_losses = train_with_validation(
    gru_model, X_train, y_train, 
    X_test, y_test,
    epochs=150, 
    learning_rate=0.01,
    patience=15
)

# Plot training losses for all models
plt.figure(figsize=(12, 6))
plt.plot(train_losses, label='Simple RNN')
plt.plot(lstm_train_losses, label='LSTM')
plt.plot(gru_train_losses, label='GRU')
plt.title('Training Loss Comparison')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()

# Plot validation losses for all models
plt.figure(figsize=(12, 6))
plt.plot(val_losses, label='Simple RNN')
plt.plot(lstm_val_losses, label='LSTM')
plt.plot(gru_val_losses, label='GRU')
plt.title('Validation Loss Comparison')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()

# Make predictions using LSTM and GRU models
print("\nLSTM Model predictions:")
lstm_pred, lstm_actual = predict_and_plot(lstm_model, X_test, y_test, sine_data, seq_length)

print("\nGRU Model predictions:")
gru_pred, gru_actual = predict_and_plot(gru_model, X_test, y_test, sine_data, seq_length)

## 9. Visualizing RNN Results

Let's create some visualizations to better understand how our RNN models work and process sequential data.

In [None]:
# Compare predictions from all models on the same plot
def compare_model_predictions(models, names, X_test, y_test):
    plt.figure(figsize=(14, 7))
    
    # Get actual values
    y_actual = y_test.cpu().numpy().flatten()
    
    # Plot actual values
    plt.plot(y_actual, 'k-', label='Actual Values', linewidth=2)
    
    # Plot predictions for each model
    colors = ['r', 'g', 'b']
    
    for i, (model, name) in enumerate(zip(models, names)):
        model.eval()
        with torch.no_grad():
            y_pred = model(X_test.to(device)).cpu().numpy().flatten()
        
        plt.plot(y_pred, colors[i]+'--', label=f'{name} Predictions', alpha=0.8)
        
        # Calculate metrics
        mse = np.mean((y_actual - y_pred) ** 2)
        print(f"{name} Test MSE: {mse:.6f}")
    
    plt.title('Model Predictions Comparison')
    plt.xlabel('Test Sample')
    plt.ylabel('Value')
    plt.legend()
    plt.grid(True)
    plt.show()

# Compare our three models
models = [trained_model, lstm_model, gru_model]
names = ["Simple RNN", "LSTM", "GRU"]
compare_model_predictions(models, names, X_test, y_test)

In [None]:
# Visualize the hidden state evolution for a simple RNN
class RNNWithHidden(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNNWithHidden, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        self.hidden_size = hidden_size
    
    def forward(self, x, return_hidden=False):
        batch_size = x.size(0)
        h0 = torch.zeros(1, batch_size, self.hidden_size).to(x.device)
        
        # Process each time step manually to get all hidden states
        if return_hidden:
            h = h0
            hidden_states = []
            
            for t in range(x.size(1)):
                # Get input for this time step
                x_t = x[:, t:t+1, :]
                
                # Run through RNN
                out, h = self.rnn(x_t, h)
                
                # Save hidden state
                hidden_states.append(h.clone())
            
            # Stack all hidden states
            hidden_states = torch.cat(hidden_states, dim=0).transpose(0, 1)
            
            # Get final output
            out = self.fc(out[:, -1, :])
            
            return out, hidden_states
        else:
            # Standard forward pass
            out, _ = self.rnn(x, h0)
            out = self.fc(out[:, -1, :])
            return out

# Create and train a simple model for visualization
vis_rnn = RNNWithHidden(input_size=1, hidden_size=4, output_size=1).to(device)

# Train the model
vis_rnn_losses = train_model(vis_rnn, X_train, y_train, epochs=50)

# Get a sample sequence and hidden states
sample_idx = 5
sample_sequence = X_test[sample_idx:sample_idx+1]
_, hidden_states = vis_rnn(sample_sequence.to(device), return_hidden=True)
hidden_states = hidden_states.cpu().detach().numpy()[0]  # Get the first batch

# Visualize the hidden state evolution
plt.figure(figsize=(12, 8))

# Plot each hidden unit
for i in range(hidden_states.shape[1]):
    plt.subplot(hidden_states.shape[1], 1, i+1)
    plt.plot(hidden_states[:, i])
    plt.title(f'Hidden Unit {i+1}')
    plt.grid(True)

plt.tight_layout()
plt.suptitle('Evolution of Hidden States Over Time', y=1.05, fontsize=16)
plt.show()

## 10. Conclusion and Next Steps

In this notebook, we've covered:

1. **Basic RNN Concepts**: Understanding the architecture and how information flows through RNNs
2. **Implementation**: Building RNNs from scratch and using PyTorch's built-in modules
3. **Advanced Architectures**: Implementing LSTM and GRU models to handle long-term dependencies 
4. **Training and Evaluation**: Proper training procedures with validation and early stopping
5. **Visualization**: Understanding how the hidden states evolve and model performance analysis

### Next Steps for Further Learning:

1. **Application to Real-World Data**: Apply RNNs to real-world sequential data like stock prices, weather data, or text data
2. **Bidirectional RNNs**: Learn about bidirectional RNNs that process sequences in both directions
3. **Sequence-to-Sequence Models**: Explore models for tasks like translation where both input and output are sequences
4. **Attention Mechanisms**: Implement attention to improve performance on long sequences
5. **Combining CNNs and RNNs**: For tasks like image captioning or video analysis

### Advanced RNN Applications:

- **Text Generation**: Generate human-like text by training on large text corpora
- **Time Series Forecasting**: Predict future values in financial or scientific time series data
- **Anomaly Detection**: Detect unusual patterns in sequential data
- **Music Generation**: Create new music by learning patterns from existing compositions