# Lesson 9a: Convolutional Neural Networks & Transfer Learning

**Learning Objectives:**
- Understand the fundamental components of Convolutional Neural Networks (CNNs)
- Learn how convolution, pooling, and activation layers work together
- Build CNNs from scratch for image classification tasks
- Master transfer learning techniques using pre-trained models
- Apply data augmentation strategies to improve model generalization
- Implement fine-tuning strategies for domain adaptation

**Prerequisites:**
- Understanding of basic neural networks (Lesson 3a, 3b)
- Familiarity with gradient descent and backpropagation
- Basic knowledge of Python and NumPy

**Why CNNs Matter in 2025:**
Convolutional Neural Networks revolutionized computer vision and remain the foundation for modern visual AI systems. They are essential for image classification, object detection, medical imaging, autonomous vehicles, and countless production applications. Understanding CNNs is critical for any machine learning practitioner working with visual or spatial data.

---

## 1. Installation & Setup

This notebook requires TensorFlow/Keras for deep learning implementations. We'll automatically install all required dependencies.

In [None]:
# Auto-install required packages
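import sys
import subprocess
import importlib
import re

# pip distribution names whose import name differs
IMPORT_NAMES = {'scikit-learn': 'sklearn', 'pillow': 'PIL'}

def install_package(requirement):
    """Install a package using pip if it cannot already be imported."""
    # Strip extras and version specifiers: 'tensorflow>=2.13.0' -> 'tensorflow'
    dist_name = re.split(r'[\[<>=!~ ]', requirement, maxsplit=1)[0]
    try:
        importlib.import_module(IMPORT_NAMES.get(dist_name, dist_name))
        print(f"✓ {dist_name} already installed")
    except ImportError:
        print(f"Installing {requirement}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", requirement])
        print(f"✓ {requirement} installed successfully")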

# Required packages
packages = [
    'tensorflow>=2.13.0',
    'numpy>=1.24.0',
    'matplotlib>=3.7.0',
    'scikit-learn>=1.3.0',
    'pillow>=10.0.0',
    'seaborn>=0.12.0'
]

for package in packages:
    install_package(package)

print("\n✓ All dependencies installed successfully!")

In [None]:
# Standard imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Tuple, List
import warnings

# TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16, ResNet50, MobileNetV2
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Scikit-learn utilities
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Configuration
warnings.filterwarnings('ignore')
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")

---

## 2. CNN Fundamentals: Understanding the Building Blocks

### 2.1 What Makes CNNs Special?

Unlike fully connected neural networks, CNNs leverage three key architectural innovations:

1. **Local Connectivity**: Each neuron connects only to a small region of the input, capturing local patterns
2. **Parameter Sharing**: The same filter is applied across the entire image, drastically reducing parameters
3. **Translation Equivariance**: Because the same filter is shared across positions, a feature is detected wherever it appears in the image; pooling then adds approximate translation invariance
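
To make the parameter-sharing point concrete, the sketch below counts the weights needed to map a 28×28 grayscale image to a 26×26×32 output, once with a fully connected layer and once with a 3×3 convolution. The helper names and layer sizes are illustrative assumptions chosen for the comparison.

```python
def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Weights plus biases for a fully connected layer."""
    return n_inputs * n_outputs + n_outputs

def conv2d_params(kernel_size: int, in_channels: int, num_filters: int) -> int:
    """Weights plus biases for a conv layer; independent of image size."""
    return kernel_size * kernel_size * in_channels * num_filters + num_filters

n_inputs = 28 * 28 * 1    # flattened grayscale image
n_outputs = 26 * 26 * 32  # same number of output units as the conv layer

print(f"Dense layer:  {dense_params(n_inputs, n_outputs):,} parameters")
print(f"Conv2D layer: {conv2d_params(3, 1, 32):,} parameters")
```

The convolutional layer needs only 320 parameters because each of its 32 filters reuses the same 9 weights (plus one bias) at every image position.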

### 2.2 Core Components

**Convolutional Layers:**
- Apply learnable filters (kernels) to detect features like edges, textures, and patterns
- Each filter slides across the input to produce a feature map
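- Mathematical operation: element-wise multiplication of the filter with each input patch, followed by a sum (strictly a cross-correlation, since the filter is not flipped)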
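
The size of the resulting feature map follows from the input size, kernel size, stride, and padding. The helper below is a minimal sketch of that arithmetic; `conv_output_size` is a hypothetical name, and it assumes square inputs and the same padding on both sides.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Spatial size of a conv output: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(6, 3))                        # 6x6 input, 3x3 filter, no padding
print(conv_output_size(28, 3, padding=1))            # padding of 1 keeps the size
print(conv_output_size(224, 7, stride=2, padding=3)) # a strided layer halves the size
```

With no padding and stride 1, a 3×3 filter shrinks each spatial dimension by 2, so a 6×6 input yields a 4×4 feature map.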

**Pooling Layers:**
- Reduce spatial dimensions while retaining important information
- Max pooling: takes maximum value in each window (most common)
- Average pooling: takes average value in each window
- Benefits: reduces computation, adds some translation invariance, and can help reduce overfitting

**Activation Functions:**
- ReLU (Rectified Linear Unit): most common, f(x) = max(0, x)
- Introduces non-linearity, enabling complex pattern recognition
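
As a small illustration of the formula above, ReLU can be written in one line of NumPy. The function name `relu` and the sample values are illustrative choices, not part of this lesson's later code.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Element-wise max(0, x): negatives become zero, positives pass through."""
    return np.maximum(0.0, x)

feature_map = np.array([[-3.0, 0.0], [1.5, 3.0]])
print(relu(feature_map))
```

Only the negative entry changes; without this kind of non-linearity, a stack of convolutions would collapse into a single linear operation.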

Let's visualize these operations:

In [None]:
def visualize_convolution_operation():
    """Demonstrate how convolution works with a simple example."""
    
    # Create a simple 6x6 input image with a vertical edge
    input_image = np.array([
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1]
    ])
    
    # Vertical edge detection filter (Prewitt operator; Sobel would weight the middle row by 2)
    vertical_filter = np.array([
        [-1, 0, 1],
        [-1, 0, 1],
        [-1, 0, 1]
    ])
    
    # Horizontal edge detection filter
    horizontal_filter = np.array([
        [-1, -1, -1],
        [0, 0, 0],
        [1, 1, 1]
    ])
    
    # Manual convolution operation
    def convolve2d(image, kernel):
        """Simple 2D convolution without padding."""
        output_size = image.shape[0] - kernel.shape[0] + 1
        output = np.zeros((output_size, output_size))
        
        for i in range(output_size):
            for j in range(output_size):
                region = image[i:i+kernel.shape[0], j:j+kernel.shape[1]]
                output[i, j] = np.sum(region * kernel)
        
        return output
    
    vertical_output = convolve2d(input_image, vertical_filter)
    horizontal_output = convolve2d(input_image, horizontal_filter)
    
    # Visualization
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    
    # Row 1: Vertical edge detection
    axes[0, 0].imshow(input_image, cmap='gray')
    axes[0, 0].set_title('Input Image\n(Vertical Edge)', fontsize=12, fontweight='bold')
    axes[0, 0].axis('off')
    
    axes[0, 1].imshow(vertical_filter, cmap='RdBu', vmin=-1, vmax=1)
    axes[0, 1].set_title('Vertical Edge Filter\n(Prewitt)', fontsize=12, fontweight='bold')
    for i in range(3):
        for j in range(3):
            axes[0, 1].text(j, i, f'{vertical_filter[i, j]:.0f}', 
                          ha='center', va='center', fontsize=10)
    axes[0, 1].axis('off')
    
    axes[0, 2].imshow(vertical_output, cmap='hot')
    axes[0, 2].set_title('Feature Map\n(Strong Response!)', fontsize=12, fontweight='bold')
    axes[0, 2].axis('off')
    
    # Row 2: Horizontal edge detection
    axes[1, 0].imshow(input_image, cmap='gray')
    axes[1, 0].set_title('Same Input Image', fontsize=12, fontweight='bold')
    axes[1, 0].axis('off')
    
    axes[1, 1].imshow(horizontal_filter, cmap='RdBu', vmin=-1, vmax=1)
    axes[1, 1].set_title('Horizontal Edge Filter', fontsize=12, fontweight='bold')
    for i in range(3):
        for j in range(3):
            axes[1, 1].text(j, i, f'{horizontal_filter[i, j]:.0f}', 
                          ha='center', va='center', fontsize=10)
    axes[1, 1].axis('off')
    
    axes[1, 2].imshow(horizontal_output, cmap='hot')
    axes[1, 2].set_title('Feature Map\n(No Response)', fontsize=12, fontweight='bold')
    axes[1, 2].axis('off')
    
    plt.tight_layout()
    plt.savefig('cnn_convolution_demo.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("Key Insight:")
    print("The vertical edge filter strongly responds to vertical edges in the image.")
    print("The horizontal edge filter gives zero response to the same vertical edge.")
    print("This demonstrates how different filters detect different features!")

visualize_convolution_operation()

In [None]:
def demonstrate_pooling():
    """Visualize max pooling and average pooling operations."""
    
    # Create a sample feature map
    feature_map = np.array([
        [1, 3, 2, 4],
        [5, 6, 1, 3],
        [2, 4, 8, 2],
        [1, 3, 5, 7]
    ])
    
    # Max pooling 2x2
    max_pooled = np.array([
        [np.max(feature_map[0:2, 0:2]), np.max(feature_map[0:2, 2:4])],
        [np.max(feature_map[2:4, 0:2]), np.max(feature_map[2:4, 2:4])]
    ])
    
    # Average pooling 2x2
    avg_pooled = np.array([
        [np.mean(feature_map[0:2, 0:2]), np.mean(feature_map[0:2, 2:4])],
        [np.mean(feature_map[2:4, 0:2]), np.mean(feature_map[2:4, 2:4])]
    ])
    
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    
    # Original feature map
    im1 = axes[0].imshow(feature_map, cmap='viridis', vmin=0, vmax=8)
    axes[0].set_title('Original Feature Map\n(4×4)', fontsize=12, fontweight='bold')
    for i in range(4):
        for j in range(4):
            axes[0].text(j, i, f'{feature_map[i, j]:.0f}', 
                        ha='center', va='center', fontsize=14, color='white')
    axes[0].set_xticks([])
    axes[0].set_yticks([])
    
    # Max pooled
    im2 = axes[1].imshow(max_pooled, cmap='viridis', vmin=0, vmax=8)
    axes[1].set_title('Max Pooling (2×2)\nTakes Maximum', fontsize=12, fontweight='bold')
    for i in range(2):
        for j in range(2):
            axes[1].text(j, i, f'{max_pooled[i, j]:.0f}', 
                        ha='center', va='center', fontsize=14, color='white')
    axes[1].set_xticks([])
    axes[1].set_yticks([])
    
    # Average pooled
    im3 = axes[2].imshow(avg_pooled, cmap='viridis', vmin=0, vmax=8)
    axes[2].set_title('Average Pooling (2×2)\nTakes Average', fontsize=12, fontweight='bold')
    for i in range(2):
        for j in range(2):
            axes[2].text(j, i, f'{avg_pooled[i, j]:.1f}', 
                        ha='center', va='center', fontsize=14, color='white')
    axes[2].set_xticks([])
    axes[2].set_yticks([])
    
    plt.tight_layout()
    plt.savefig('cnn_pooling_demo.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("\nPooling Effects:")
    print(f"Original size: {feature_map.shape} → Pooled size: {max_pooled.shape}")
    print(f"Activations reduced: {feature_map.size} → {max_pooled.size} (75% reduction)")
    print("\nMax pooling preserves strongest activations (most common choice).")
    print("Average pooling preserves overall information (useful for certain architectures).")

demonstrate_pooling()

---

## 3. Building Your First CNN from Scratch

We'll build a CNN for the classic MNIST handwritten digit classification task. This demonstrates all core CNN concepts in a working implementation.
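
Before building the model, it can help to trace by hand how the tensor shape and parameter count change through the layers. The sketch below assumes the architecture used in this section (two 3×3 convolutions without padding, each followed by 2×2 max pooling, then Dense(128) and Dense(10)); `trace_simple_cnn` is a hypothetical helper, and dropout is left out because it changes neither shape nor parameter count.

```python
def trace_simple_cnn(input_hw=28, in_channels=1, num_classes=10):
    """Return (layer name, output shape, parameter count) for each layer."""
    rows = []
    size, channels = input_hw, in_channels
    for name, filters in [("conv1", 32), ("conv2", 64)]:
        params = 3 * 3 * channels * filters + filters  # weights + biases
        size = size - 3 + 1                            # 3x3 conv, no padding
        channels = filters
        rows.append((name, (size, size, channels), params))
        size = size // 2                               # 2x2 max pooling
        rows.append((name.replace("conv", "pool"), (size, size, channels), 0))
    flat = size * size * channels
    rows.append(("flatten", (flat,), 0))
    rows.append(("fc1", (128,), flat * 128 + 128))
    rows.append(("output", (num_classes,), 128 * num_classes + num_classes))
    return rows

for name, shape, params in trace_simple_cnn():
    print(f"{name:<8} {str(shape):<14} {params:>8,}")
print(f"Total parameters: {sum(p for _, _, p in trace_simple_cnn()):,}")
```

Most of the parameters sit in the first dense layer, not in the convolutions. These numbers can be checked against the output of `model.summary()`.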

In [None]:
# Load MNIST dataset
print("Loading MNIST dataset...")
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()

# Normalize pixel values to [0, 1] range
X_train_full = X_train_full.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# Reshape to add channel dimension (28, 28) → (28, 28, 1)
X_train_full = X_train_full.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

# Create validation split
X_train, X_val = X_train_full[:-10000], X_train_full[-10000:]
y_train, y_val = y_train_full[:-10000], y_train_full[-10000:]

print(f"\nDataset shapes:")
print(f"Training: {X_train.shape}, Labels: {y_train.shape}")
print(f"Validation: {X_val.shape}, Labels: {y_val.shape}")
print(f"Test: {X_test.shape}, Labels: {y_test.shape}")

# Visualize sample images
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(X_train[i].squeeze(), cmap='gray')
    ax.set_title(f'Label: {y_train[i]}', fontsize=11)
    ax.axis('off')
plt.suptitle('Sample MNIST Digits', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
def build_simple_cnn(input_shape=(28, 28, 1), num_classes=10):
    """
    Build a simple CNN for MNIST classification.
    
    Architecture:
    - Conv2D(32 filters, 3x3) + ReLU + MaxPooling(2x2)
    - Conv2D(64 filters, 3x3) + ReLU + MaxPooling(2x2)
    - Flatten
    - Dense(128) + ReLU + Dropout(0.5)
    - Dense(num_classes) + Softmax
    """
    model = models.Sequential([
        # First convolutional block
        layers.Conv2D(32, kernel_size=(3, 3), activation='relu', 
                     input_shape=input_shape, name='conv1'),
        layers.MaxPooling2D(pool_size=(2, 2), name='pool1'),
        
        # Second convolutional block
        layers.Conv2D(64, kernel_size=(3, 3), activation='relu', name='conv2'),
        layers.MaxPooling2D(pool_size=(2, 2), name='pool2'),
        
        # Flatten and fully connected layers
        layers.Flatten(name='flatten'),
        layers.Dense(128, activation='relu', name='fc1'),
        layers.Dropout(0.5, name='dropout'),
        layers.Dense(num_classes, activation='softmax', name='output')
    ])
    
    return model

# Build the model
model = build_simple_cnn()

# Compile with optimizer, loss, and metrics
model.compile(
    optimizer=optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Display model architecture
print("\n" + "="*70)
print("CNN ARCHITECTURE SUMMARY")
print("="*70)
model.summary()
print("="*70)

In [None]:
# Define callbacks for training
callbacks = [
    EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True,
        verbose=1
    ),
    ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=3,
        min_lr=1e-7,
        verbose=1
    )
]

# Train the model
print("\nTraining CNN on MNIST...\n")
history = model.fit(
    X_train, y_train,
    batch_size=128,
    epochs=15,
    validation_data=(X_val, y_val),
    callbacks=callbacks,
    verbose=1
)

print("\n✓ Training completed!")

In [None]:
# Visualize training history
def plot_training_history(history):
    """Plot training and validation metrics."""
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Accuracy plot
    ax1.plot(history.history['accuracy'], label='Training Accuracy', linewidth=2)
    ax1.plot(history.history['val_accuracy'], label='Validation Accuracy', linewidth=2)
    ax1.set_xlabel('Epoch', fontsize=12)
    ax1.set_ylabel('Accuracy', fontsize=12)
    ax1.set_title('Model Accuracy', fontsize=14, fontweight='bold')
    ax1.legend(fontsize=11)
    ax1.grid(True, alpha=0.3)
    
    # Loss plot
    ax2.plot(history.history['loss'], label='Training Loss', linewidth=2)
    ax2.plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
    ax2.set_xlabel('Epoch', fontsize=12)
    ax2.set_ylabel('Loss', fontsize=12)
    ax2.set_title('Model Loss', fontsize=14, fontweight='bold')
    ax2.legend(fontsize=11)
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('cnn_training_history.png', dpi=150, bbox_inches='tight')
    plt.show()

plot_training_history(history)

# Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"\n{'='*60}")
print(f"FINAL TEST RESULTS")
print(f"{'='*60}")
print(f"Test Accuracy: {test_accuracy*100:.2f}%")
print(f"Test Loss: {test_loss:.4f}")
print(f"{'='*60}")

In [None]:
# Generate predictions and confusion matrix
y_pred = model.predict(X_test, verbose=0)
y_pred_classes = np.argmax(y_pred, axis=1)

# Confusion matrix
cm = confusion_matrix(y_test, y_pred_classes)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar_kws={'label': 'Count'})
plt.xlabel('Predicted Label', fontsize=12)
plt.ylabel('True Label', fontsize=12)
plt.title('Confusion Matrix - MNIST Classification', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('cnn_confusion_matrix.png', dpi=150, bbox_inches='tight')
plt.show()

# Classification report
print("\nDetailed Classification Report:")
print("="*60)
print(classification_report(y_test, y_pred_classes, 
                          target_names=[str(i) for i in range(10)]))

---

## 4. Transfer Learning: Leveraging Pre-trained Models

### Why Transfer Learning?

Training deep CNNs from scratch requires:
- Massive datasets (millions of images)
- Extensive computational resources (GPUs/TPUs)
- Days or weeks of training time

**Transfer learning solves this by:**
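1. Using models pre-trained on large datasets (the ImageNet ILSVRC subset behind most pre-trained weights has about 1.28M training images across 1000 classes; the full ImageNet collection has over 14M images)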
2. Fine-tuning these models for your specific task
3. Achieving excellent results with small datasets and limited compute

### Common Pre-trained Architectures (2025):

- **VGG16/VGG19**: Simple, deep architectures (16-19 layers)
- **ResNet50/ResNet101/ResNet152**: Residual connections enable very deep networks (50-152 layers)
- **MobileNetV2/V3**: Lightweight models for mobile/edge deployment
- **EfficientNet**: State-of-the-art accuracy with compound scaling
- **Vision Transformers (ViT)**: Transformer-based architectures (covered in Lesson 9c)

We'll demonstrate transfer learning using a small custom dataset.

In [None]:
# Create a synthetic image classification dataset
# In practice, you would load your own images

def create_synthetic_image_dataset(num_samples=1000, img_size=224):
    """
    Create synthetic image dataset for demonstration.
    In real applications, use your own image data.
    
    Returns:
        X: Images of shape (num_samples, img_size, img_size, 3)
        y: Labels (0 or 1 for binary classification)
    """
    np.random.seed(42)
    
    X = []
    y = []
    
    for i in range(num_samples):
        # Class 0: Images with more blue channel
        if i < num_samples // 2:
            img = np.random.rand(img_size, img_size, 3)
            img[:, :, 2] += 0.3  # Boost blue channel
            img = np.clip(img, 0, 1)
            y.append(0)
        # Class 1: Images with more red channel
        else:
            img = np.random.rand(img_size, img_size, 3)
            img[:, :, 0] += 0.3  # Boost red channel
            img = np.clip(img, 0, 1)
            y.append(1)
        
        X.append(img)
    
    return np.array(X, dtype='float32'), np.array(y)

# Create dataset
print("Creating synthetic image dataset...")
X_images, y_images = create_synthetic_image_dataset(num_samples=1000, img_size=224)

# Split into train/validation/test
X_temp, X_test_tl, y_temp, y_test_tl = train_test_split(
    X_images, y_images, test_size=0.2, random_state=42, stratify=y_images
)
X_train_tl, X_val_tl, y_train_tl, y_val_tl = train_test_split(
    X_temp, y_temp, test_size=0.2, random_state=42, stratify=y_temp
)

print(f"\nDataset splits:")
print(f"Train: {X_train_tl.shape}, Labels: {y_train_tl.shape}")
print(f"Validation: {X_val_tl.shape}, Labels: {y_val_tl.shape}")
print(f"Test: {X_test_tl.shape}, Labels: {y_test_tl.shape}")

# Visualize samples
fig, axes = plt.subplots(2, 4, figsize=(12, 6))
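# The training split is shuffled, so pick examples by label rather than by position
class0_idx = np.where(y_train_tl == 0)[0][:4]
class1_idx = np.where(y_train_tl == 1)[0][:4]
for i in range(4):
    axes[0, i].imshow(X_train_tl[class0_idx[i]])
    axes[0, i].set_title('Class 0 (Blue-ish)', fontsize=10)
    axes[0, i].axis('off')
    
    axes[1, i].imshow(X_train_tl[class1_idx[i]])
    axes[1, i].set_title('Class 1 (Red-ish)', fontsize=10)
    axes[1, i].axis('off')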

plt.suptitle('Sample Images from Synthetic Dataset', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
def build_transfer_learning_model(base_model_name='ResNet50', trainable_layers=0):
    """
    Build a transfer learning model using a pre-trained base.
    
    Args:
        base_model_name: 'VGG16', 'ResNet50', or 'MobileNetV2'
        trainable_layers: Number of top layers to make trainable (0 = freeze all)
    
    Returns:
        Compiled Keras model
    """
    # Load pre-trained base model
    if base_model_name == 'VGG16':
        base_model = VGG16(weights='imagenet', include_top=False, 
                          input_shape=(224, 224, 3))
    elif base_model_name == 'ResNet50':
        base_model = ResNet50(weights='imagenet', include_top=False, 
                             input_shape=(224, 224, 3))
    elif base_model_name == 'MobileNetV2':
        base_model = MobileNetV2(weights='imagenet', include_top=False, 
                                input_shape=(224, 224, 3))
    else:
        raise ValueError(f"Unknown base model: {base_model_name}")
    
    # Freeze base model layers
    base_model.trainable = False
    
    # Optionally unfreeze top layers for fine-tuning
    if trainable_layers > 0:
        base_model.trainable = True
        for layer in base_model.layers[:-trainable_layers]:
            layer.trainable = False
    
    # Build complete model
    model = models.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(1, activation='sigmoid')  # Binary classification
    ])
    
    # Compile
    model.compile(
        optimizer=optimizers.Adam(learning_rate=0.0001),
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    
    return model

# Build transfer learning model using ResNet50
print("Building transfer learning model with ResNet50...\n")
tl_model = build_transfer_learning_model(base_model_name='ResNet50', trainable_layers=0)

print("\n" + "="*70)
print("TRANSFER LEARNING MODEL SUMMARY")
print("="*70)
tl_model.summary()
print("="*70)

# Count trainable vs non-trainable parameters
trainable_params = sum([np.prod(v.shape) for v in tl_model.trainable_weights])
non_trainable_params = sum([np.prod(v.shape) for v in tl_model.non_trainable_weights])

print(f"\nTrainable parameters: {trainable_params:,}")
print(f"Non-trainable parameters: {non_trainable_params:,}")
print(f"Total parameters: {trainable_params + non_trainable_params:,}")

In [None]:
# Data augmentation for better generalization
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    zoom_range=0.2,
    fill_mode='nearest'
)

# Train the transfer learning model
print("\nTraining transfer learning model...\n")

history_tl = tl_model.fit(
    datagen.flow(X_train_tl, y_train_tl, batch_size=32),
    epochs=10,
    validation_data=(X_val_tl, y_val_tl),
    callbacks=[
        EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
        ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)
    ],
    verbose=1
)

print("\n✓ Transfer learning training completed!")

In [None]:
# Evaluate transfer learning model
test_loss_tl, test_acc_tl = tl_model.evaluate(X_test_tl, y_test_tl, verbose=0)

print(f"\n{'='*60}")
print(f"TRANSFER LEARNING TEST RESULTS")
print(f"{'='*60}")
print(f"Test Accuracy: {test_acc_tl*100:.2f}%")
print(f"Test Loss: {test_loss_tl:.4f}")
print(f"{'='*60}")

# Plot training history
plot_training_history(history_tl)

---

## 5. Fine-Tuning Strategy

### Two-Stage Fine-Tuning Approach:

**Stage 1: Feature Extraction**
- Freeze all pre-trained layers
- Train only the new top layers
- Fast training, prevents catastrophic forgetting

**Stage 2: Fine-Tuning**
- Unfreeze top layers of base model
- Train with very low learning rate
- Adapts features to your specific domain

Let's demonstrate this strategy:

In [None]:
# Stage 2: Fine-tuning - Unfreeze top layers
print("Stage 2: Fine-tuning top layers...\n")

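# Continue from the Stage 1 model so the trained top layers are kept
tl_model_finetuned = tl_model
base_model = tl_model_finetuned.layers[0]  # the pre-trained ResNet50 base
base_model.trainable = True
for layer in base_model.layers[:-10]:  # keep all but the last 10 layers frozen
    layer.trainable = False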

# Use much lower learning rate for fine-tuning
tl_model_finetuned.compile(
    optimizer=optimizers.Adam(learning_rate=1e-5),  # 10x smaller than the Stage 1 rate (1e-4)
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("Fine-tuning with unfrozen top layers...\n")

history_ft = tl_model_finetuned.fit(
    datagen.flow(X_train_tl, y_train_tl, batch_size=32),
    epochs=10,
    validation_data=(X_val_tl, y_val_tl),
    callbacks=[
        EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
        ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-7)
    ],
    verbose=1
)

# Evaluate fine-tuned model
test_loss_ft, test_acc_ft = tl_model_finetuned.evaluate(X_test_tl, y_test_tl, verbose=0)

print(f"\n{'='*60}")
print(f"FINE-TUNED MODEL TEST RESULTS")
print(f"{'='*60}")
print(f"Test Accuracy: {test_acc_ft*100:.2f}%")
print(f"Test Loss: {test_loss_ft:.4f}")
print(f"{'='*60}")
print(f"\nImprovement from fine-tuning: {(test_acc_ft - test_acc_tl)*100:.2f} percentage points")

---

## 6. Data Augmentation Techniques

Data augmentation artificially expands your training dataset by applying transformations. This is critical for preventing overfitting when training with limited data.

### Common Augmentation Techniques:

1. **Geometric Transformations**: Rotation, flipping, shifting, zooming
2. **Color Transformations**: Brightness, contrast, saturation adjustments
3. **Advanced Techniques**: Cutout, mixup, CutMix
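
The geometric and color transformations are handled by library utilities, but Cutout and mixup are simple enough to sketch in NumPy. The functions below are simplified illustrations under stated assumptions: `cutout` keeps the masked square fully inside the image, `mixup` blends a single pair of examples with one-hot labels, and all names and parameter values are hypothetical.

```python
import numpy as np

def cutout(image, mask_size, rng):
    """Zero out one randomly placed square patch (Cutout-style)."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - mask_size + 1)
    left = rng.integers(0, w - mask_size + 1)
    out = image.copy()
    out[top:top + mask_size, left:left + mask_size] = 0.0
    return out

def mixup(x1, y1, x2, y2, alpha, rng):
    """Blend two examples and their one-hot labels with weight lam ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
image_a = np.ones((8, 8, 3), dtype=np.float32)
image_b = np.zeros((8, 8, 3), dtype=np.float32)

masked = cutout(image_a, mask_size=4, rng=rng)
mixed_x, mixed_y = mixup(image_a, np.array([1.0, 0.0]),
                         image_b, np.array([0.0, 1.0]), alpha=0.2, rng=rng)
print("Pixels zeroed by cutout:", int(np.sum(masked == 0)))
print("Mixed label:", mixed_y)
```

Because mixup produces soft labels, it needs a loss that accepts label distributions, such as categorical cross-entropy, not its sparse variant.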

Let's visualize different augmentations:


In [None]:
# Demonstrate various data augmentation techniques
sample_image = X_train_tl[0:1]  # Take first image

# Define different augmentation strategies
augmentation_configs = [
    ('Original', ImageDataGenerator()),
    ('Rotation (30°)', ImageDataGenerator(rotation_range=30)),
    ('Horizontal Flip', ImageDataGenerator(horizontal_flip=True)),
    ('Width Shift (20%)', ImageDataGenerator(width_shift_range=0.2)),
    ('Height Shift (20%)', ImageDataGenerator(height_shift_range=0.2)),
    ('Zoom (20%)', ImageDataGenerator(zoom_range=0.2)),
    ('Brightness (±30%)', ImageDataGenerator(brightness_range=[0.7, 1.3])),
    ('Combined', ImageDataGenerator(
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True,
        zoom_range=0.2,
        brightness_range=[0.8, 1.2]
    ))
]

fig, axes = plt.subplots(2, 4, figsize=(15, 8))
axes = axes.flatten()

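for idx, (name, aug_gen) in enumerate(augmentation_configs):
    # Named aug_gen so the training `datagen` defined earlier is not overwritten.
    # brightness_range goes through PIL and works on 0-255 pixel values, so scale
    # up before augmenting and back down to [0, 1] for display.
    aug_iter = aug_gen.flow(sample_image * 255.0, batch_size=1)
    aug_image = next(aug_iter)[0] / 255.0
    
    axes[idx].imshow(np.clip(aug_image, 0.0, 1.0))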
    axes[idx].set_title(name, fontsize=11, fontweight='bold')
    axes[idx].axis('off')

plt.suptitle('Data Augmentation Techniques', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('data_augmentation_examples.png', dpi=150, bbox_inches='tight')
plt.show()

print("\nData Augmentation Best Practices:")
print("1. Apply augmentation ONLY to training data, never to validation/test")
print("2. Choose augmentations that match your domain (e.g., no vertical flip for text)")
print("3. Combine multiple techniques for stronger regularization")
print("4. Monitor validation performance to avoid excessive augmentation")

---

## 7. Modern CNN Architectures Overview

### Evolution of CNN Architectures (2012-2025):

**AlexNet (2012)**
- 8 layers, 60M parameters
- Popularized ReLU and dropout in large-scale CNNs
- Won ImageNet by huge margin

**VGGNet (2014)**
- Very deep (16-19 layers)
- Simple architecture: 3×3 convs only
- 138M parameters (VGG16)

**ResNet (2015)**
- Introduced skip connections (residual learning)
- Enabled training of very deep networks (50-152 layers)
- Mitigated the vanishing gradient problem in very deep networks
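
The idea of a skip connection can be sketched with plain NumPy. This toy block uses two dense transformations in place of convolutions and omits batch normalization, so it is a simplified stand-in and not the actual ResNet block; `residual_block`, `w1`, and `w2` are hypothetical names.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two dense layers with a ReLU between."""
    hidden = np.maximum(0.0, x @ w1)
    return np.maximum(0.0, hidden @ w2 + x)  # skip connection adds the input back

x = np.array([[1.0, 2.0, 3.0]])
zero_w = np.zeros((3, 3))
print(residual_block(x, zero_w, zero_w))
```

With all weights at zero the block simply passes a non-negative input through unchanged. Each block therefore only has to learn a correction to the identity, which is one reason very deep stacks of such blocks remain trainable.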

**Inception/GoogLeNet (2014)**
- Multi-scale processing with inception modules
- Efficient with fewer parameters

**MobileNet (2017-2019)**
- Designed for mobile and edge devices
- Depthwise separable convolutions
- Several times fewer parameters than ResNet50 (about 3.5M for MobileNetV2 vs. about 25.6M)
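
The saving from depthwise separable convolutions comes from splitting one convolution into a per-channel spatial filter and a 1×1 channel-mixing step. The sketch below compares weight counts for a single layer; the layer sizes are illustrative assumptions, biases are ignored, and the helper names are hypothetical.

```python
def standard_conv_weights(k: int, c_in: int, c_out: int) -> int:
    """Every output channel has its own k x k filter over all input channels."""
    return k * k * c_in * c_out

def depthwise_separable_weights(k: int, c_in: int, c_out: int) -> int:
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution mixes the channels
    return depthwise + pointwise

k, c_in, c_out = 3, 64, 128
standard = standard_conv_weights(k, c_in, c_out)
separable = depthwise_separable_weights(k, c_in, c_out)
print(f"Standard:  {standard:,} weights")
print(f"Separable: {separable:,} weights ({standard / separable:.1f}x fewer)")
```

For 3×3 kernels the per-layer reduction approaches 9× as the number of output channels grows.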

**EfficientNet (2019)**
- Compound scaling (depth, width, resolution)
- State-of-the-art accuracy-to-efficiency ratio

**Vision Transformers (2020-2025)**
- Apply transformer architecture to images
- Can outperform CNNs with sufficient data
- Covered in detail in Lesson 9c

Let's compare some key architectures:

In [None]:
# Compare different pre-trained architectures
architectures = [
    ('VGG16', VGG16),
    ('ResNet50', ResNet50),
    ('MobileNetV2', MobileNetV2)
]

comparison_data = []

print("\n" + "="*80)
print("CNN ARCHITECTURE COMPARISON")
print("="*80)
print(f"{'Architecture':<20} {'Parameters':<15} {'Layers':<10} {'Top-1 Acc*':<15}")
print("="*80)

for name, arch_class in architectures:
    # Separate name so the trained MNIST `model` from Section 3 is not overwritten
    arch_model = arch_class(weights='imagenet', include_top=True, input_shape=(224, 224, 3))
    params = arch_model.count_params()
    num_layers = len(arch_model.layers)
    
    # Approximate ImageNet top-1 accuracy (from literature)
    accuracies = {'VGG16': 71.3, 'ResNet50': 76.1, 'MobileNetV2': 71.8}
    accuracy = accuracies.get(name, 'N/A')
    
    print(f"{name:<20} {params:>13,}  {num_layers:<10} {accuracy}%")
    
    comparison_data.append({
        'name': name,
        'params': params / 1e6,  # Convert to millions
        'accuracy': accuracy
    })

print("="*80)
print("* ImageNet Top-1 Accuracy (approximate)\n")

# Visualize comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

names = [d['name'] for d in comparison_data]
params = [d['params'] for d in comparison_data]
accs = [d['accuracy'] for d in comparison_data]

# Parameters comparison
ax1.bar(names, params, color=['#3498db', '#e74c3c', '#2ecc71'])
ax1.set_ylabel('Parameters (Millions)', fontsize=12)
ax1.set_title('Model Size Comparison', fontsize=13, fontweight='bold')
ax1.grid(axis='y', alpha=0.3)

# Accuracy comparison
ax2.bar(names, accs, color=['#3498db', '#e74c3c', '#2ecc71'])
ax2.set_ylabel('Top-1 Accuracy (%)', fontsize=12)
ax2.set_title('ImageNet Accuracy Comparison', fontsize=13, fontweight='bold')
ax2.set_ylim([65, 80])
ax2.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.savefig('architecture_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("\nKey Insights:")
print("• ResNet50: Best accuracy but moderate size")
print("• MobileNetV2: Smallest size, good for mobile deployment")
print("• VGG16: Largest size, simple architecture, decent accuracy")

---

## 8. Production Best Practices

### Key Considerations for Deploying CNNs:

**1. Model Selection:**
- Choose architecture based on deployment constraints (mobile, cloud, edge)
- Balance accuracy vs. inference speed vs. model size
- Consider MobileNet/EfficientNet for resource-constrained environments

**2. Optimization Techniques:**
- Quantization: Convert FP32 to INT8 (4× smaller, faster inference)
- Pruning: Remove redundant weights
- Knowledge distillation: Train smaller model to mimic larger one
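
To show where the 4× size reduction from quantization comes from, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization. It is an illustration only: deployment toolchains typically also calibrate activations and may quantize per channel, and the function names and random weights below are assumptions.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> (int8 values, scale)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in for a weight matrix
q, scale = quantize_int8(w)

print(f"float32 bytes: {w.nbytes}, int8 bytes: {q.nbytes}")
print(f"max abs error: {np.max(np.abs(dequantize(q, scale) - w)):.5f}")
```

Each weight is stored in one byte instead of four, and the rounding error per weight is bounded by about half the scale factor.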

**3. Training Best Practices:**
- Use data augmentation on the training set to reduce overfitting
- Implement early stopping and learning rate scheduling
- Monitor validation metrics closely
- Use proper train/val/test splits (no data leakage!)

**4. Transfer Learning Guidelines:**
- Start with pre-trained weights when possible
- Freeze base layers initially, fine-tune later
- Use very low learning rates for fine-tuning (1e-5 to 1e-6)
- Match input preprocessing to pre-training (ImageNet normalization)
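
The last guideline is easy to overlook when images are already scaled to [0, 1]. The Keras ImageNet weights for ResNet50 and VGG16 expect "caffe"-style inputs (BGR channel order, 0-255 scale, per-channel mean subtracted), while MobileNetV2 expects values in [-1, 1]. The NumPy functions below are hypothetical stand-ins that sketch those two conventions; in practice the library helpers such as `tf.keras.applications.resnet50.preprocess_input`, which take 0-255 RGB input, are the safer choice.

```python
import numpy as np

# Per-channel ImageNet means in BGR order on the 0-255 scale ("caffe" style)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(rgb_batch_0_1):
    """Map RGB images in [0, 1] to zero-centered BGR on the 0-255 scale."""
    x = rgb_batch_0_1.astype(np.float32) * 255.0
    x = x[..., ::-1]              # RGB -> BGR
    return x - IMAGENET_BGR_MEAN  # subtract the per-channel mean

def scale_to_minus1_plus1(rgb_batch_0_1):
    """Map RGB images in [0, 1] to the [-1, 1] range."""
    return rgb_batch_0_1.astype(np.float32) * 2.0 - 1.0

pure_red = np.zeros((1, 2, 2, 3), dtype=np.float32)
pure_red[..., 0] = 1.0  # R = 1, G = 0, B = 0

print(caffe_style_preprocess(pure_red)[0, 0, 0])  # red ends up in the last (BGR) slot
print(scale_to_minus1_plus1(pure_red)[0, 0, 0])
```

Feeding [0, 1] images straight into a ResNet50 base, as a quick demo might, gives the frozen layers inputs far from what they saw during pre-training, which can limit how much the pre-trained features help.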

**5. Monitoring and Debugging:**
- Track training curves (loss, accuracy)
- Visualize predictions and errors
- Use confusion matrices for classification tasks
- Check class balance and distribution shifts

In [None]:
# Production-ready training pipeline example

class CNNProductionPipeline:
    """Production-ready CNN training pipeline with best practices."""
    
    def __init__(self, base_model='ResNet50', num_classes=10):
        self.base_model_name = base_model
        self.num_classes = num_classes
        self.model = None
        self.history = None
    
    def build_model(self, input_shape=(224, 224, 3), trainable_base=False):
        """Build transfer learning model with custom top layers."""
        
        # Load pre-trained base
        base_models = {
            'VGG16': VGG16,
            'ResNet50': ResNet50,
            'MobileNetV2': MobileNetV2
        }
        
        base = base_models[self.base_model_name](
            weights='imagenet',
            include_top=False,
            input_shape=input_shape
        )
        
        base.trainable = trainable_base
        
        # Build complete model
        self.model = models.Sequential([
            base,
            layers.GlobalAveragePooling2D(),
            layers.BatchNormalization(),
            layers.Dense(512, activation='relu'),
            layers.Dropout(0.5),
            layers.Dense(256, activation='relu'),
            layers.Dropout(0.3),
            layers.Dense(self.num_classes, activation='softmax')
        ])
        
        return self.model
    
    def compile_model(self, learning_rate=0.001):
        """Compile model with optimizer and metrics."""
        
        self.model.compile(
            optimizer=optimizers.Adam(learning_rate=learning_rate),
            loss='sparse_categorical_crossentropy',
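            # Labels are integer class IDs (sparse loss), so the top-k metric
            # must be the sparse variant rather than TopKCategoricalAccuracy
            metrics=['accuracy', 
                    keras.metrics.SparseTopKCategoricalAccuracy(k=5, name='top5_acc')]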
        )
    
    def get_callbacks(self, checkpoint_path='best_model.h5'):
        """Create production callbacks for training."""
        
        return [
            # Save best model
            keras.callbacks.ModelCheckpoint(
                checkpoint_path,
                monitor='val_accuracy',
                save_best_only=True,
                verbose=1
            ),
            
            # Early stopping
            EarlyStopping(
                monitor='val_loss',
                patience=10,
                restore_best_weights=True,
                verbose=1
            ),
            
            # Learning rate reduction
            ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=5,
                min_lr=1e-7,
                verbose=1
            ),
            
            # TensorBoard logging
            keras.callbacks.TensorBoard(
                log_dir='./logs',
                histogram_freq=1
            )
        ]
    
    def create_data_augmentation(self):
        """Create data augmentation pipeline."""
        
        return ImageDataGenerator(
            rotation_range=20,
            width_shift_range=0.2,
            height_shift_range=0.2,
            horizontal_flip=True,
            zoom_range=0.2,
            shear_range=0.15,
            brightness_range=[0.8, 1.2],
            fill_mode='nearest'
        )
    
    def train(self, X_train, y_train, X_val, y_val, 
             epochs=50, batch_size=32, use_augmentation=True):
        """Train the model with all best practices."""
        
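        callbacks = self.get_callbacks()
        
        if use_augmentation:
            # The generator yields already-batched (x, y) pairs
            datagen = self.create_data_augmentation()
            self.history = self.model.fit(
                datagen.flow(X_train, y_train, batch_size=batch_size),
                epochs=epochs,
                validation_data=(X_val, y_val),
                callbacks=callbacks,
                verbose=1
            )
        else:
            # Arrays are passed as separate x and y arguments, with batch_size
            self.history = self.model.fit(
                X_train, y_train,
                batch_size=batch_size,
                epochs=epochs,
                validation_data=(X_val, y_val),
                callbacks=callbacks,
                verbose=1
            )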
        
        return self.history
    
    def evaluate(self, X_test, y_test):
        """Evaluate model on test set."""
        
        # return_dict=True pairs each value with its metric name directly,
        # which is more robust than zipping with model.metrics_names
        metrics = self.model.evaluate(X_test, y_test, verbose=0, return_dict=True)
        
        print("\n" + "="*60)
        print("TEST SET EVALUATION")
        print("="*60)
        for metric, value in metrics.items():
            if 'acc' in metric:
                print(f"{metric}: {value*100:.2f}%")
            else:
                print(f"{metric}: {value:.4f}")
        print("="*60)
        
        return metrics

# Demonstrate usage
print("\nProduction Pipeline Example:")
print("="*60)
pipeline = CNNProductionPipeline(base_model='MobileNetV2', num_classes=10)
model_prod = pipeline.build_model()
pipeline.compile_model(learning_rate=0.001)

print("\n✓ Production pipeline initialized")
print("\nPipeline includes:")
print("  • Transfer learning with pre-trained weights")
print("  • Data augmentation")
print("  • Model checkpointing")
print("  • Early stopping")
print("  • Learning rate scheduling")
print("  • TensorBoard logging")
print("  • Multiple evaluation metrics (accuracy, top-5 accuracy)")
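
To see how the early stopping and learning rate callbacks configured in `get_callbacks` interact, here is a simplified pure-Python simulation. It is a sketch of the patience logic only, not Keras's implementation: it ignores `min_delta` and `cooldown`, and the function name `simulate_callbacks` and the validation loss sequence are made up for illustration.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.5, lr_patience=5,
                       min_lr=1e-7, stop_patience=10):
    """Simplified model of ReduceLROnPlateau + EarlyStopping on val_loss.

    Returns (stop_epoch, lr_history); stop_epoch is None if training
    runs through every epoch without triggering early stopping.
    """
    best = float("inf")
    lr_wait = 0      # epochs without improvement since the last LR change
    stop_wait = 0    # epochs without improvement since the best epoch
    lr_history = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            lr_wait = 0
            stop_wait = 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience:
                lr = max(lr * factor, min_lr)  # halve the LR, never below min_lr
                lr_wait = 0
        lr_history.append(lr)
        if stop_wait >= stop_patience:
            return epoch, lr_history
    return None, lr_history

# Made-up curve: improves for three epochs, then plateaus
val_losses = [1.0, 0.8, 0.7] + [0.75] * 12
stop_epoch, lr_history = simulate_callbacks(val_losses)

print("Stopped at epoch index:", stop_epoch)   # 12
print(f"Final learning rate: {lr_history[-1]:.6f}")  # 0.000250
```

With `patience=5` for the learning rate and `patience=10` for early stopping, the model gets two learning rate reductions on a plateau before training stops. Setting the early stopping patience at or below the learning rate patience would stop training before a reduced learning rate has had a chance to help.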

---

## 9. Summary & Key Takeaways

### What We Learned:

**CNN Fundamentals:**
- Convolutional layers detect local patterns using learnable filters
- Pooling layers reduce spatial dimensions and provide a degree of invariance to small translations
- CNNs leverage local connectivity and parameter sharing for efficiency
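
The efficiency claim can be made concrete with a little arithmetic. The sketch below compares the parameter count of a 3×3 convolution with 32 filters on a 28×28 grayscale image against a dense layer producing an output of the same size. The layer sizes are illustrative and the helper names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights are shared across all spatial positions, plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(in_units, out_units):
    """Every input connects to every output, plus one bias per output."""
    return in_units * out_units + out_units

def output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel_size) // stride + 1

side = 28                                   # MNIST-sized input, 1 channel
conv_side = output_size(side, 3)            # 'valid' 3x3 conv
pool_side = output_size(conv_side, 2, stride=2)  # 2x2 max pooling

conv_count = conv2d_params(3, 1, 32)
dense_count = dense_params(side * side * 1, conv_side * conv_side * 32)

print("Conv output side:", conv_side)        # 26
print("After 2x2 pooling:", pool_side)       # 13
print("Conv parameters:", conv_count)        # 320
print("Equivalent dense parameters:", dense_count)  # 16981120
```

The convolution needs 320 parameters where a fully connected layer with the same output size would need about 17 million, which is the practical payoff of local connectivity and parameter sharing.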

**Building CNNs:**
- Start simple, add complexity as needed
- Stack conv-pool blocks followed by dense layers
- Use ReLU activations, with dropout for regularization

**Transfer Learning:**
- Pre-trained models save massive computational resources
- Two-stage approach: feature extraction → fine-tuning
- Critical for small datasets and limited compute

**Production Best Practices:**
- Use data augmentation to improve generalization
- Implement proper callbacks (early stopping, LR scheduling)
- Monitor multiple metrics and validate on held-out data
- Choose architecture based on deployment constraints

### When to Use CNNs (2025):

**✅ Excellent for:**
- Image classification and object detection
- Medical imaging analysis
- Video analysis and action recognition
- Any task with spatial or grid-like data

**⚠️ Consider alternatives:**
- Vision Transformers when very large pre-training datasets are available (roughly 14M+ images, i.e. ImageNet-21k scale or larger)
- Graph Neural Networks for non-grid structured data
- RNNs/Transformers for sequential data (time series, text)

### Next Steps:

- **Lesson 9b**: RNNs & Sequences - Learn about sequential data processing
- **Lesson 9c**: Transformers & Attention - Master the architecture dominating AI in 2025
- **Advanced topics**: Object detection (YOLO, Faster R-CNN), semantic segmentation (U-Net), GANs

---

## 10. Exercises & Further Exploration

### Exercise 1: Build a Custom CNN
Modify the simple CNN architecture to improve MNIST accuracy beyond 99%. Try:
- Adding more convolutional layers
- Using batch normalization
- Experimenting with different filter sizes

### Exercise 2: Transfer Learning on Your Data
Apply transfer learning to a real-world dataset:
- Download a small image dataset (e.g., Cats vs Dogs, Flowers)
- Use different pre-trained models (VGG16, ResNet50, EfficientNet)
- Compare results and computational requirements

### Exercise 3: Data Augmentation Impact
Train the same model with and without data augmentation:
- Measure accuracy difference
- Visualize training curves
- Analyze overfitting behavior

### Further Reading:

- **Original Papers**:
  - AlexNet: "ImageNet Classification with Deep Convolutional Neural Networks" (Krizhevsky et al., 2012)
  - ResNet: "Deep Residual Learning for Image Recognition" (He et al., 2015)
  - EfficientNet: "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" (Tan & Le, 2019)

- **Resources**:
  - Stanford CS231n: Convolutional Neural Networks
  - fast.ai Practical Deep Learning course
  - TensorFlow/Keras documentation and tutorials

---

**Congratulations!** You now understand the fundamentals of CNNs and transfer learning. These skills are essential for computer vision tasks and form the foundation for many modern AI applications.

Continue to **Lesson 9b: RNNs & Sequences** to learn about processing sequential data! 🚀