# Lab 4.4: Introduction to Recurrent Neural Networks

## Duration: 45 minutes

## Learning Objectives
By the end of this lab, you will be able to:
- Explain why RNNs are well suited to sequence data
- Build your first RNN using TensorFlow/Keras
- Compare RNN performance with regular Dense networks on sequence tasks
- Use SimpleRNN, LSTM, and GRU layers
- Apply RNNs to text classification and time series problems

## Prerequisites
- **Labs 4.1, 4.2, & 4.3 completed** (TensorFlow basics, deep networks, CNNs)
- Understanding of neural networks
- Basic knowledge of sequences and time series

## Key Concepts
- **Sequences**: Data where order matters (text, time series, audio)
- **Memory**: How RNNs remember information from previous steps
- **LSTM**: Long Short-Term Memory for long sequences
- **GRU**: Gated Recurrent Unit (simpler than LSTM)
- **Text Processing**: Converting words to numbers
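
To make the **Memory** idea concrete, here is a minimal NumPy sketch of one recurrent layer's forward pass. It is an illustration under assumptions, not Keras's implementation: the function name `rnn_forward`, the weight names (`W_x`, `W_h`, `b`), the layer sizes, and the random weights are all chosen only for this example.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a simple (Elman-style) RNN over a sequence and return the final hidden state."""
    h = np.zeros(W_h.shape[0])                 # hidden state starts at zero
    for x_t in inputs:                         # one step per sequence element, in order
        h = np.tanh(W_x @ x_t + W_h @ h + b)   # new state mixes current input with previous state
    return h

rng = np.random.default_rng(0)                 # illustrative random weights (assumption)
W_x = 0.5 * rng.normal(size=(4, 3))            # input (3 features) -> hidden (4 units)
W_h = 0.5 * rng.normal(size=(4, 4))            # hidden -> hidden: this is the "memory" path
b = np.zeros(4)

sequence = rng.normal(size=(5, 3))             # 5 time steps, 3 features each
h_forward = rnn_forward(sequence, W_x, W_h, b)
h_reversed = rnn_forward(sequence[::-1], W_x, W_h, b)

print("Final state (original order):", h_forward)
print("Final state (reversed order):", h_reversed)
print("Same result?", np.allclose(h_forward, h_reversed))
```

Because each step feeds the previous hidden state back in, the same elements in a different order generally produce a different final state. That order sensitivity is the property the rest of this lab relies on.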

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
import warnings
warnings.filterwarnings('ignore')

# Set random seeds
np.random.seed(42)
tf.random.set_seed(42)

print("Lab 4.4: Introduction to Recurrent Neural Networks")
print("=" * 60)
print(f"TensorFlow version: {tf.__version__}")

print("\n🎯 What are sequences?")
print("  • Text: 'I love deep learning' → words have order!")
print("  • Time series: Stock prices over time")
print("  • Audio: Sound waves over time")
print("  • Video: Frames in temporal order")

print("\n❓ Why not just use Dense or CNN layers?")
print("  • Dense: No built-in notion of order; needs fixed-size inputs")
print("  • CNN: Captures local patterns, but only within a fixed window")
print("  • We need memory to carry context across the whole sequence!")

# Example of sequence importance
examples = [
    "The movie was not bad",  # Positive
    "The movie was bad",      # Negative
    "Not the best movie",     # Negative  
    "The best movie",         # Positive
]

print("\n📝 Sequence Examples (word order matters!):")
for i, example in enumerate(examples, 1):
    print(f"  {i}. '{example}'")
    
print("\n🧠 RNNs can understand context and sequence dependencies!")

## Step 1: The Problem - Dense Networks Can't Handle Sequences

Let's first see why regular networks fail on sequential data:
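
One way to see the problem is to look at a bag-of-words encoding, a common fixed-size input for Dense networks. The sketch below uses a hypothetical helper, `bag_of_words`, that is not part of the lab code; it shows that reordering words leaves the encoding unchanged.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count each lowercase word, discarding position information."""
    return Counter(sentence.lower().split())

a = bag_of_words("great movie")
b = bag_of_words("movie great")
print(a == b)  # True: both contain exactly one 'great' and one 'movie'

c = bag_of_words("The movie was not bad")
d = bag_of_words("The movie was bad")
print(c - d)   # Counter({'not': 1}): only the extra word is visible, not where it appears
```

A model fed these counts can learn that "not" is present, but not which word it modifies.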

In [None]:
## Step 4: Comparing RNN Types - LSTM vs GRU vs SimpleRNN

Now let's build LSTM and GRU models and compare their performance with our SimpleRNN:
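
Before comparing the models, it can help to estimate how many trainable parameters each recurrent layer has. The sketch below uses the standard per-gate counting; the function names are hypothetical, and the GRU formula assumes the TensorFlow 2 default `reset_after=True`, which keeps separate input and recurrent biases.

```python
def simple_rnn_params(input_dim, units):
    """One set of input weights, recurrent weights, and biases."""
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    """Four weight sets: input, forget, and output gates plus the cell candidate."""
    return 4 * units * (input_dim + units + 1)

def gru_params(input_dim, units):
    """Three weight sets (update gate, reset gate, candidate), each with two bias vectors.

    Assumes reset_after=True (separate input and recurrent biases).
    """
    return 3 * units * (input_dim + units + 2)

embedding_dim, units = 32, 32  # matches output_dim=32 and the 32 hidden units used in this lab
print(simple_rnn_params(embedding_dim, units))  # 2080
print(lstm_params(embedding_dim, units))        # 8320
print(gru_params(embedding_dim, units))         # 6336
```

These counts cover only the recurrent layer. In the models built here, the `Embedding` layer (10,000 × 32 weights) accounts for most of the total reported by `count_params()`.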

In [None]:
print("Comparing Different RNN Architectures:")
print("=" * 60)

# Build LSTM model
# (keras.Input sets the sequence length; Embedding's `input_length` argument is deprecated in Keras 3)
lstm_model = keras.Sequential([
    keras.Input(shape=(max_length,)),
    layers.Embedding(input_dim=10000, output_dim=32),
    layers.LSTM(32, dropout=0.2),
    layers.Dense(1, activation='sigmoid')
], name='LSTM_Model')

# Build GRU model (same layout, only the recurrent layer differs)
gru_model = keras.Sequential([
    keras.Input(shape=(max_length,)),
    layers.Embedding(input_dim=10000, output_dim=32),
    layers.GRU(32, dropout=0.2),
    layers.Dense(1, activation='sigmoid')
], name='GRU_Model')

# Compile models
for model in [lstm_model, gru_model]:
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy']
    )

print("🔍 Model Architecture Comparison:")
print("-" * 40)

models = {
    'SimpleRNN': simple_rnn_model,
    'LSTM': lstm_model,
    'GRU': gru_model
}

for name, model in models.items():
    if name != 'SimpleRNN':  # SimpleRNN already built
        model.build(input_shape=(None, max_length))
    
    params = model.count_params()
    # Find the recurrent layer by type rather than by matching class-name substrings
    rnn_layer = next(
        layer for layer in model.layers
        if isinstance(layer, (layers.SimpleRNN, layers.LSTM, layers.GRU))
    )
    
    print(f"\n{name}:")
    print(f"  • Total parameters: {params:,}")
    print(f"  • RNN layer type: {rnn_layer.__class__.__name__}")
    print(f"  • Hidden units: {rnn_layer.units}")

print("\n🧠 Understanding the Differences:")
print("-" * 40)
print("SimpleRNN:")
print("  ✅ Simple and fast")
print("  ❌ Vanishing gradient problem")
print("  ❌ Poor long-term memory")

print("\nLSTM (Long Short-Term Memory):")
print("  ✅ Mitigates the vanishing gradient problem")
print("  ✅ Much better long-term memory than SimpleRNN")
print("  ❌ More complex, slower")

print("\nGRU (Gated Recurrent Unit):")
print("  ✅ Good long-term memory")
print("  ✅ Simpler than LSTM")
print("  ✅ Faster than LSTM")

# Train LSTM model
print("\n🚀 Training LSTM model...")
lstm_history = lstm_model.fit(
    X_train_padded[:5000], y_train_imdb[:5000],
    validation_split=0.2,
    epochs=3,
    batch_size=128,
    verbose=0
)

# Train GRU model
print("🚀 Training GRU model...")
gru_history = gru_model.fit(
    X_train_padded[:5000], y_train_imdb[:5000],
    validation_split=0.2,
    epochs=3,
    batch_size=128,
    verbose=0
)

# Evaluate all models
print("\n📊 Performance Comparison:")
print("-" * 30)

results = {}
for name, model in models.items():
    loss, accuracy = model.evaluate(X_test_padded[:1000], y_test_imdb[:1000], verbose=0)
    results[name] = {'accuracy': accuracy, 'loss': loss}
    print(f"{name:>10}: {accuracy:.4f} accuracy")

# Find best model
best_model = max(results.items(), key=lambda x: x[1]['accuracy'])
print(f"\n🏆 Best performing model: {best_model[0]} ({best_model[1]['accuracy']:.4f})")

# Visualize training curves
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
plt.plot(simple_rnn_history.history['val_accuracy'], label='SimpleRNN', linewidth=2)
plt.plot(lstm_history.history['val_accuracy'], label='LSTM', linewidth=2)
plt.plot(gru_history.history['val_accuracy'], label='GRU', linewidth=2)
plt.title('Validation Accuracy Comparison')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 3, 2)
model_names = list(results.keys())
accuracies = [results[name]['accuracy'] for name in model_names]
colors = ['orange', 'blue', 'green']
bars = plt.bar(model_names, accuracies, color=colors, alpha=0.7)
plt.title('Final Test Accuracy')
plt.ylabel('Accuracy')
for bar, acc in zip(bars, accuracies):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.005,
             f'{acc:.3f}', ha='center', fontweight='bold')
plt.grid(axis='y', alpha=0.3)

plt.subplot(1, 3, 3)
param_counts = [m.count_params() for m in models.values()]
param_bars = plt.bar(model_names, param_counts, color=colors, alpha=0.7)
plt.title('Parameter Count')
plt.ylabel('Parameters')
for bar, count in zip(param_bars, param_counts):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 500,
             f'{count:,}', ha='center', fontweight='bold', rotation=45 if count > 50000 else 0)

plt.tight_layout()
plt.show()

print("\n💡 Key Insights:")
print("  • LSTM/GRU typically outperform SimpleRNN")
print("  • GRU often matches LSTM performance with fewer parameters")
print("  • RNNs model word order explicitly, unlike Dense networks on bag-of-words inputs")
print("  • Choice depends on your specific task and computational budget")

## Step 3: Building Your First SimpleRNN

Now let's create a Simple RNN to understand the basics:
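
The comparison code in this lab refers to `simple_rnn_model` and `simple_rnn_history`. As a rough sketch of what such a model might look like, the block below mirrors the LSTM and GRU models (same embedding size and 32 hidden units). This architecture is an assumption, not necessarily the lab's exact model, and `X_demo` / `y_demo` are random stand-in data so the sketch runs on its own.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

max_length = 200    # same padded length used elsewhere in this lab
vocab_size = 10000  # matches num_words=10000 in imdb.load_data

# Hypothetical SimpleRNN model mirroring the LSTM/GRU models in this lab
simple_rnn_model = keras.Sequential([
    keras.Input(shape=(max_length,)),
    layers.Embedding(input_dim=vocab_size, output_dim=32),  # word IDs -> 32-dim vectors
    layers.SimpleRNN(32),                                   # returns only the final hidden state
    layers.Dense(1, activation='sigmoid')                   # probability of a positive review
], name='SimpleRNN_Model')

simple_rnn_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Stand-in random data (assumption); swap in X_train_padded / y_train_imdb for the real task
rng = np.random.default_rng(42)
X_demo = rng.integers(1, vocab_size, size=(64, max_length))
y_demo = rng.integers(0, 2, size=(64,))

simple_rnn_history = simple_rnn_model.fit(
    X_demo, y_demo, validation_split=0.2, epochs=1, batch_size=32, verbose=0
)
print(f"Total parameters: {simple_rnn_model.count_params():,}")
```

With random labels the accuracy is meaningless; the point is only the layer layout and the training call.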

In [None]:
print("Preparing Sequential Data for RNNs:")
print("=" * 50)

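# Pad sequences to make them the same length.
# Note: with padding='post' and no masking, the RNN reads the padding zeros last;
# Keras's default padding='pre' or Embedding(mask_zero=True) avoids that.
max_length = 200  # Reasonable length that captures most reviews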

print(f"Padding sequences to length {max_length}...")
X_train_padded = pad_sequences(X_train_imdb, maxlen=max_length, padding='post', truncating='post')
X_test_padded = pad_sequences(X_test_imdb, maxlen=max_length, padding='post', truncating='post')

print(f"Training data shape: {X_train_padded.shape}")
print(f"Test data shape: {X_test_padded.shape}")

# Show what padding looks like
print("\nExample of padded sequence:")
sample_review = X_train_padded[0]
print(f"Original length: {len(X_train_imdb[0])}")
print(f"Padded length: {len(sample_review)}")
print(f"First 20 words: {sample_review[:20]}")
print(f"Last 20 words: {sample_review[-20:]}")
print("(Note: 0s are padding tokens)")

# Visualize the impact of padding
plt.figure(figsize=(12, 4))

# Before padding
plt.subplot(1, 2, 1)
original_lengths = [len(x) for x in X_train_imdb[:1000]]
plt.hist(original_lengths, bins=30, alpha=0.7, color='blue')
plt.title('Original Sequence Lengths')
plt.xlabel('Length')
plt.ylabel('Frequency')
plt.axvline(max_length, color='red', linestyle='--', label=f'Max Length: {max_length}')
plt.legend()

# After padding
plt.subplot(1, 2, 2)
padded_lengths = [np.count_nonzero(x) for x in X_train_padded[:1000]]  # Count non-zero (non-padding)
plt.hist(padded_lengths, bins=30, alpha=0.7, color='green')
plt.title('Actual Content Lengths (After Padding)')
plt.xlabel('Content Length')
plt.ylabel('Frequency')

plt.tight_layout()
plt.show()

# Calculate padding statistics
actual_content_lengths = [np.count_nonzero(x) for x in X_train_padded]
truncated_samples = sum(len(x) > max_length for x in X_train_imdb)
padding_ratio = 1 - (np.mean(actual_content_lengths) / max_length)

print(f"\n📊 Padding Impact:")
print(f"  Samples truncated: {truncated_samples} ({truncated_samples/len(X_train_imdb)*100:.1f}%)")
print(f"  Average padding ratio: {padding_ratio:.2f} ({padding_ratio*100:.1f}% padding)")
print(f"  Efficiency: {(1-padding_ratio)*100:.1f}% of data is actual content")

print("\n✅ Data Preparation Complete:")
print("  • Sequences padded to uniform length")
print("  • Truncated long sequences")
print("  • Ready for RNN processing")
print("  • Each word is represented by an integer ID")

print("Preparing Sequential Data for RNNs:")
print("=" * 50)

# Load IMDB movie review dataset for sentiment analysis
print("Loading IMDB movie reviews dataset...")
(X_train_imdb, y_train_imdb), (X_test_imdb, y_test_imdb) = imdb.load_data(num_words=10000)

print(f"Training samples: {len(X_train_imdb)}")
print(f"Test samples: {len(X_test_imdb)}")
print(f"Classes: {len(set(y_train_imdb))} (0=negative, 1=positive)")

# Look at some sample reviews
word_index = imdb.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}

def decode_review(encoded_review):
    return ' '.join([reverse_word_index.get(i - 3, '?') for i in encoded_review])

print("\nSample Reviews:")
print("-" * 30)
for i in range(3):
    review_text = decode_review(X_train_imdb[i])
    sentiment = "POSITIVE" if y_train_imdb[i] == 1 else "NEGATIVE"
    print(f"\nReview {i+1} ({sentiment}):")
    print(f"Length: {len(X_train_imdb[i])} words")
    print(f"Text: {review_text[:200]}...")

# Show the sequence length problem
sequence_lengths = [len(x) for x in X_train_imdb]
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.hist(sequence_lengths, bins=50, alpha=0.7)
plt.title('Distribution of Review Lengths')
plt.xlabel('Number of Words')
plt.ylabel('Frequency')
plt.axvline(np.mean(sequence_lengths), color='red', linestyle='--', 
           label=f'Mean: {np.mean(sequence_lengths):.0f}')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(range(len(sequence_lengths[:1000])), sorted(sequence_lengths[:1000]))
plt.title('Sorted Sequence Lengths (First 1000 samples)')
plt.xlabel('Rank (shortest to longest)')
plt.ylabel('Number of Words')

plt.tight_layout()
plt.show()

print(f"\n📊 Sequence Statistics:")
print(f"  Shortest review: {min(sequence_lengths)} words")
print(f"  Longest review: {max(sequence_lengths)} words")
print(f"  Average length: {np.mean(sequence_lengths):.1f} words")
print(f"  Standard deviation: {np.std(sequence_lengths):.1f} words")

print("\n❗ Problems for Dense Networks:")
print("  • Need fixed-size inputs (can't take variable-length reviews directly)")
print("  • All sequences must be padded/truncated to one length first")
print("  • No built-in notion of word order or weight sharing across positions")
print("  • Hard to capture long-range dependencies")
print("  • With bag-of-words inputs, 'movie great' and 'great movie' look identical")


## Step 2: Preparing Sequential Data

Before we can use RNNs, we need to prepare our sequential data:
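
The lab pads reviews with `pad_sequences(..., padding='post', truncating='post')`. As a plain-Python sketch of what those two settings do to a single sequence (the helper `pad_post` is hypothetical and only mimics this one configuration):

```python
def pad_post(sequence, maxlen, pad_value=0):
    """Keep the first `maxlen` tokens, then right-pad with `pad_value` up to `maxlen`."""
    truncated = sequence[:maxlen]                                 # truncating='post' drops tokens from the end
    return truncated + [pad_value] * (maxlen - len(truncated))   # padding='post' appends zeros

print(pad_post([5, 8, 2], maxlen=5))           # [5, 8, 2, 0, 0]
print(pad_post([5, 8, 2, 9, 4, 7], maxlen=5))  # [5, 8, 2, 9, 4]
```

Short reviews gain trailing zeros and long reviews lose their final words, so every row ends up with the same length.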