# Convolutional Neural Networks (CNNs) in PyTorch

This notebook is a guide to understanding and implementing Convolutional Neural Networks (CNNs) in PyTorch. CNNs are neural networks specialized for data with a grid-like topology, and they are particularly effective for image analysis.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models
import matplotlib.pyplot as plt
import numpy as np
import time
import copy
from PIL import Image
from torchvision.utils import make_grid
import os

# Set random seed for reproducibility
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

## 1. Introduction to CNNs

CNNs are designed to automatically and adaptively learn spatial hierarchies of features from input images. They use three key concepts:

- **Local Receptive Fields**: Each neuron is connected to only a small region of the input
- **Shared Weights**: The same filter is applied at every spatial location
- **Pooling**: Summarizes features over small neighborhoods, which lowers resolution and makes the output less sensitive to small translations
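
To make the effect of local receptive fields and shared weights concrete, the sketch below compares the parameter count of a 3x3 convolution with that of a fully connected layer producing an output of the same size. The layer sizes (a 28x28 grayscale input mapped to 16 feature maps) and the helper `count_params` are illustrative assumptions, not part of the notebook's model.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration: a 28x28 grayscale input mapped to
# 16 feature maps with the same spatial size.
conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
dense = nn.Linear(28 * 28, 16 * 28 * 28)

def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

conv_params = count_params(conv)    # 16 filters * (3*3*1 weights) + 16 biases
dense_params = count_params(dense)  # 784 * 12544 weights + 12544 biases

print(f"Conv2d parameters: {conv_params}")
print(f"Linear parameters: {dense_params}")
print(f"Ratio: {dense_params / conv_params:.0f}x")
```

The convolution reuses the same nine weights per filter at every position, so its parameter count does not depend on the image size, while the dense layer grows with both input and output resolution.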

Let's explore the core components that make CNNs so effective for computer vision tasks.

## 2. CNN Components Explained

In [None]:
# Create a sample input to demonstrate CNN components
sample_input = torch.randn(1, 1, 28, 28)  # Batch size, channels, height, width
print(f"Sample input shape: {sample_input.shape}")

# Demonstrate convolutional layer
conv_layer = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1)
conv_output = conv_layer(sample_input)
print(f"Conv2d: Input {sample_input.shape} -> Output {conv_output.shape}")
print(f"Conv2d parameters: {sum(p.numel() for p in conv_layer.parameters())}")

# Demonstrate batch normalization
bn_layer = nn.BatchNorm2d(16)
bn_output = bn_layer(conv_output)
print(f"BatchNorm2d: Input {conv_output.shape} -> Output {bn_output.shape}")

# Demonstrate activation function
relu_output = F.relu(bn_output)
print(f"ReLU: Input {bn_output.shape} -> Output {relu_output.shape}")

# Demonstrate max pooling
pool_layer = nn.MaxPool2d(kernel_size=2, stride=2)
pool_output = pool_layer(relu_output)
print(f"MaxPool2d: Input {relu_output.shape} -> Output {pool_output.shape}")

# Demonstrate adaptive average pooling
avgpool_layer = nn.AdaptiveAvgPool2d((1, 1))
avgpool_output = avgpool_layer(pool_output)
print(f"AdaptiveAvgPool2d: Input {pool_output.shape} -> Output {avgpool_output.shape}")

# Demonstrate flattening for fully connected layers
flattened = torch.flatten(avgpool_output, 1)
print(f"Flatten: Input {avgpool_output.shape} -> Output {flattened.shape}")
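
The shapes printed above follow from the standard output-size rule for `Conv2d` and `MaxPool2d`: `out = floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1`. The sketch below wraps that rule in a helper (`conv_output_size` is a name introduced here for illustration) and cross-checks it against PyTorch for the layer settings used in this section.

```python
import torch
import torch.nn as nn

def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of Conv2d / MaxPool2d along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# 3x3 convolution, stride 1, padding 1: the spatial size is preserved
after_conv = conv_output_size(28, kernel_size=3, stride=1, padding=1)
# 2x2 max pooling, stride 2: the spatial size is halved
after_pool = conv_output_size(after_conv, kernel_size=2, stride=2)
print(f"After conv: {after_conv}, after pool: {after_pool}")

# Cross-check against the actual layers
x = torch.randn(1, 1, 28, 28)
y = nn.MaxPool2d(kernel_size=2, stride=2)(
    nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1)(x)
)
print(f"PyTorch output shape: {tuple(y.shape)}")
```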

## 3. Building a Complete CNN Architecture

In [None]:
class CNNComponents(nn.Module):
    """A comprehensive CNN demonstrating various architectural components."""
    def __init__(self, in_channels=1, num_classes=10):
        super(CNNComponents, self).__init__()
        
        # Convolutional Layer 1
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)  # Batch Normalization
        
        # Convolutional Layer 2
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)  # Batch Normalization
        
        # Max Pooling
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Global Average Pooling
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        
        # Dropout for regularization
        self.dropout = nn.Dropout(0.25)
        
        # Fully Connected Layers
        self.fc1 = nn.Linear(64, 128)
        self.fc2 = nn.Linear(128, num_classes)
    
    def forward(self, x):
        # First convolutional block
        x = self.conv1(x)              # Convolution
        x = self.bn1(x)                # Batch Normalization
        x = F.relu(x)                  # ReLU Activation
        x = self.pool(x)               # Max Pooling
        
        # Second convolutional block
        x = self.conv2(x)              # Convolution
        x = self.bn2(x)                # Batch Normalization
        x = F.relu(x)                  # ReLU Activation
        x = self.pool(x)               # Max Pooling
        
        # Global average pooling
        x = self.avgpool(x)            # Average Pooling
        x = torch.flatten(x, 1)        # Flatten
        
        # Fully connected layers with dropout
        x = self.dropout(x)            # Dropout
        x = self.fc1(x)                # Fully Connected
        x = F.relu(x)                  # ReLU Activation
        x = self.dropout(x)            # Dropout
        x = self.fc2(x)                # Output Layer
        
        return x

# Create and inspect the model
model = CNNComponents(in_channels=1, num_classes=10).to(device)
print("CNN Architecture:")
print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters())}")

# Test with a sample input
sample_input = torch.randn(2, 1, 28, 28).to(device)
output = model(sample_input)
print(f"\nInput shape: {sample_input.shape}")
print(f"Output shape: {output.shape}")
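
The total printed above can be broken down per layer. The sketch below is one possible way to do that; `parameter_breakdown` is a hypothetical helper, and the `nn.Sequential` stand-in only mirrors the layer sizes of `CNNComponents` so that the example runs on its own (its layers are named `"0"`, `"1"`, ... instead of `conv1`, `bn1`, ...).

```python
import torch.nn as nn

def parameter_breakdown(model):
    """Return {submodule name: number of parameters it owns directly}."""
    counts = {}
    for name, module in model.named_modules():
        # recurse=False counts only parameters owned by this module itself
        n = sum(p.numel() for p in module.parameters(recurse=False))
        if n > 0:
            counts[name] = n
    return counts

# Stand-in with the same parameterized layers as CNNComponents
layers = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 1*32*3*3 + 32 = 320
    nn.BatchNorm2d(32),                           # 32 scales + 32 shifts = 64
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 32*64*3*3 + 64 = 18496
    nn.BatchNorm2d(64),                           # 64 scales + 64 shifts = 128
    nn.Linear(64, 128),                           # 64*128 + 128 = 8320
    nn.Linear(128, 10),                           # 128*10 + 10 = 1290
)

breakdown = parameter_breakdown(layers)
for name, n in breakdown.items():
    print(f"layer {name}: {n:>6}")
print(f"Total: {sum(breakdown.values())}")
```

Calling `parameter_breakdown(model)` on the `CNNComponents` instance should give the same counts under the attribute names `conv1`, `bn1`, and so on. Pooling, dropout, and activations contribute no parameters.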

## 4. Data Preparation for CNNs

CNNs generally train more reliably on normalized inputs, and augmentation helps them generalize beyond the training images:

In [None]:
# Define transforms for training and testing
train_transform = transforms.Compose([
    transforms.RandomRotation(10),                    # Random rotation ±10 degrees
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # Random shift up to 10% (a horizontal flip would mirror digits)
    transforms.ToTensor(),                            # Convert to tensor
    transforms.Normalize((0.1307,), (0.3081,))       # MNIST normalization
])

test_transform = transforms.Compose([
    transforms.ToTensor(),                            # Convert to tensor
    transforms.Normalize((0.1307,), (0.3081,))       # MNIST normalization
])

# Load MNIST dataset
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=train_transform)
test_dataset = datasets.MNIST('./data', train=False, transform=test_transform)

# Create data loaders
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Training batches: {len(train_loader)}")
print(f"Test batches: {len(test_loader)}")

# Visualize some samples
def visualize_samples(dataset, num_samples=8):
    plt.figure(figsize=(12, 3))
    for i in range(num_samples):
        image, label = dataset[i]
        plt.subplot(1, num_samples, i + 1)
        plt.imshow(image.squeeze(), cmap='gray')
        plt.title(f'Label: {label}')
        plt.axis('off')
    plt.tight_layout()
    plt.show()

visualize_samples(train_dataset)
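
The constants `0.1307` and `0.3081` passed to `Normalize` are the commonly quoted mean and standard deviation of the MNIST training pixels after scaling to `[0, 1]`. For another dataset, equivalent statistics could be computed along the lines of the sketch below. The helper `compute_mean_std` and the synthetic data are assumptions made so the example runs without a download; in practice the loader would wrap the training set with only `transforms.ToTensor()` applied.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def compute_mean_std(loader):
    """Per-channel mean and (population) std over all pixels in a loader."""
    n_pixels = 0
    channel_sum = None
    channel_sq_sum = None
    for images, _ in loader:
        # images has shape (batch, channels, height, width)
        b, c, h, w = images.shape
        if channel_sum is None:
            channel_sum = torch.zeros(c, dtype=torch.float64)
            channel_sq_sum = torch.zeros(c, dtype=torch.float64)
        images = images.double()  # accumulate in float64 for accuracy
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
        n_pixels += b * h * w
    mean = channel_sum / n_pixels
    std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()
    return mean, std

# Synthetic stand-in for an image dataset with pixel values in [0, 1]
torch.manual_seed(0)
fake_images = torch.rand(256, 1, 28, 28)
fake_labels = torch.zeros(256, dtype=torch.long)
stats_loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=64)

mean, std = compute_mean_std(stats_loader)
print(f"mean: {mean.tolist()}, std: {std.tolist()}")
```

Statistics should be computed on the training split only and then reused unchanged for the test transform, as the cell above does.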

## 5. Training the CNN

In [None]:
# Initialize model, loss function, and optimizer
model = CNNComponents(in_channels=1, num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training function
def train_model(model, train_loader, criterion, optimizer, num_epochs=2):
    model.train()
    train_losses = []
    train_accuracies = []
    
    for epoch in range(num_epochs):
        running_loss = 0.0
        correct = 0
        total = 0
        
        for i, (images, labels) in enumerate(train_loader):
            images, labels = images.to(device), labels.to(device)
            
            # Zero gradients
            optimizer.zero_grad()
            
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            # Backward pass and optimization
            loss.backward()
            optimizer.step()
            
            # Statistics
            running_loss += loss.item()
            _, predicted = torch.max(outputs.detach(), 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            
            # Print progress
            if (i + 1) % 200 == 0:
                print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], '
                      f'Loss: {loss.item():.4f}, Accuracy: {100 * correct / total:.2f}%')
        
        # Calculate epoch statistics
        epoch_loss = running_loss / len(train_loader)
        epoch_accuracy = 100 * correct / total
        train_losses.append(epoch_loss)
        train_accuracies.append(epoch_accuracy)
        
        print(f'Epoch [{epoch+1}/{num_epochs}] completed: '
              f'Average Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.2f}%')
    
    return train_losses, train_accuracies

# Train the model
print("Starting CNN training...")
train_losses, train_accuracies = train_model(model, train_loader, criterion, optimizer, num_epochs=2)
print("Training completed!")
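
The loop above tracks training accuracy only. A common extension is to hold out a validation split and keep the weights from the best epoch. The sketch below shows one way this might look; `accuracy`, `fit_keep_best`, `toy_model`, and the random stand-in data are hypothetical names introduced for illustration, and everything runs on the CPU to keep the example short.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split

def accuracy(model, loader):
    """Fraction of correctly classified samples in `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return correct / total

def fit_keep_best(model, train_loader, val_loader, criterion, optimizer, num_epochs=3):
    """Train, then restore the weights from the epoch with the best validation accuracy."""
    best_acc = -1.0
    best_state = copy.deepcopy(model.state_dict())
    for epoch in range(num_epochs):
        model.train()  # evaluation switched to eval mode, so switch back each epoch
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        val_acc = accuracy(model, val_loader)
        if val_acc > best_acc:
            best_acc = val_acc
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return best_acc

# Random stand-in data: 200 "images" with 10 classes (not real MNIST)
torch.manual_seed(0)
data = TensorDataset(torch.randn(200, 1, 28, 28), torch.randint(0, 10, (200,)))
train_set, val_set = random_split(data, [160, 40])
toy_train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
toy_val_loader = DataLoader(val_set, batch_size=32)

toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
best_acc = fit_keep_best(
    toy_model, toy_train_loader, toy_val_loader,
    nn.CrossEntropyLoss(), optim.Adam(toy_model.parameters(), lr=1e-3),
)
print(f"Best validation accuracy: {best_acc:.2f}")
```

`copy.deepcopy` is needed because `state_dict()` returns references to the live tensors, which later optimizer steps would overwrite.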

## 6. Model Evaluation and Testing

In [None]:
# Evaluation function
def evaluate_model(model, test_loader):
    model.eval()
    correct = 0
    total = 0
    class_correct = [0.0] * 10
    class_total = [0.0] * 10
    
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            
            # Per-class accuracy (no squeeze: it would break on a batch of size 1)
            matches = predicted == labels
            for i in range(labels.size(0)):
                label = labels[i].item()
                class_correct[label] += matches[i].item()
                class_total[label] += 1
    
    overall_accuracy = 100 * correct / total
    print(f'Overall Test Accuracy: {overall_accuracy:.2f}%')
    
    # Per-class accuracy
    print('\nPer-class Accuracy:')
    for i in range(10):
        if class_total[i] > 0:
            class_acc = 100 * class_correct[i] / class_total[i]
            print(f'Class {i}: {class_acc:.2f}%')
    
    return overall_accuracy

# Evaluate the trained model
test_accuracy = evaluate_model(model, test_loader)

# Plot training progress
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(train_losses)
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(train_accuracies)
plt.title('Training Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.grid(True)

plt.tight_layout()
plt.show()
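
Per-class accuracy shows which digits are hard but not what they are mistaken for. A confusion matrix adds that information. The sketch below builds one with `torch.bincount`; the function name `confusion_matrix` and the small 3-class label and prediction tensors are made up for illustration. In the notebook, the inputs would be the concatenated `labels` and `predicted` tensors from the evaluation loop.

```python
import torch

def confusion_matrix(labels, predictions, num_classes):
    """Rows are true classes, columns are predicted classes."""
    # Encode each (true, predicted) pair as one index, then count occurrences
    flat = labels * num_classes + predictions
    counts = torch.bincount(flat, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

# Hypothetical labels and predictions for a 3-class problem
labels = torch.tensor([0, 0, 1, 1, 2, 2, 2])
predictions = torch.tensor([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(labels, predictions, num_classes=3)
print(cm)

# Per-class accuracy is the diagonal divided by the row sums
per_class_acc = cm.diagonal().float() / cm.sum(dim=1).clamp(min=1).float()
print(per_class_acc)
```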

## 7. Filter Visualization

Let's visualize what the CNN has learned by examining the filters in the first convolutional layer:

In [None]:
def visualize_filters(model, layer_name="conv1"):
    """Visualize the filters of a specific layer in a CNN model."""
    for name, module in model.named_modules():
        if name == layer_name:
            weights = module.weight.data.cpu()
            
            # Normalize the weights for better visualization
            weights = (weights - weights.min()) / (weights.max() - weights.min())
            
            # Plot the filters
            plt.figure(figsize=(12, 6))
            num_filters = min(32, weights.shape[0])  # Show up to 32 filters
            num_cols = 8
            num_rows = (num_filters + num_cols - 1) // num_cols
            
            for i in range(num_filters):
                plt.subplot(num_rows, num_cols, i + 1)
                
                # For single-channel input (grayscale)
                if weights.shape[1] == 1:
                    plt.imshow(weights[i, 0], cmap='viridis')
                else:
                    # For multi-channel input (RGB)
                    plt.imshow(weights[i].permute(1, 2, 0))
                    
                plt.axis('off')
                plt.title(f'Filter {i+1}', fontsize=8)
            
            plt.suptitle(f'Filters in {layer_name}', fontsize=14)
            plt.tight_layout()
            plt.show()
            return
    
    print(f"Layer '{layer_name}' not found in the model.")

# Visualize the learned filters
visualize_filters(model, "conv1")

# Function to visualize feature maps
def visualize_feature_maps(model, input_image, layer_name="conv1"):
    """Visualize feature maps produced by a specific layer."""
    activation = {}
    
    def get_activation(name):
        def hook(model, input, output):
            activation[name] = output.detach()
        return hook
    
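    # Register hook and keep the handle so it can be removed afterwards
    handles = []
    for name, module in model.named_modules():
        if name == layer_name:
            handles.append(module.register_forward_hook(get_activation(name)))
    
    # Forward pass
    model.eval()
    with torch.no_grad():
        _ = model(input_image.unsqueeze(0).to(device))
    
    # Remove the hook so repeated calls do not stack hooks on the layer
    for handle in handles:
        handle.remove()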
    
    # Get feature maps
    if layer_name in activation:
        feature_maps = activation[layer_name].cpu().squeeze(0)
        
        # Plot feature maps
        plt.figure(figsize=(15, 8))
        num_maps = min(16, feature_maps.shape[0])  # Show up to 16 feature maps
        num_cols = 8
        num_rows = (num_maps + num_cols - 1) // num_cols
        
        for i in range(num_maps):
            plt.subplot(num_rows, num_cols, i + 1)
            plt.imshow(feature_maps[i], cmap='viridis')
            plt.axis('off')
            plt.title(f'Map {i+1}', fontsize=8)
        
        plt.suptitle(f'Feature Maps from {layer_name}', fontsize=14)
        plt.tight_layout()
        plt.show()
    else:
        print(f"No activation captured for layer '{layer_name}'")

# Visualize feature maps for a sample image
sample_image, _ = test_dataset[0]
print("Visualizing feature maps for a sample image:")
plt.figure(figsize=(3, 3))
plt.imshow(sample_image.squeeze(), cmap='gray')
plt.title('Input Image')
plt.axis('off')
plt.show()

visualize_feature_maps(model, sample_image, "conv1")
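
Forward hooks can also capture several layers in one pass, which is useful for comparing early and late feature maps. The sketch below is a minimal version under a few assumptions: `capture_activations` is a hypothetical helper, and the small `nn.Sequential` network stands in for a trained model (its layers are addressed by index names such as `"0"` and `"3"`).

```python
import torch
import torch.nn as nn

def capture_activations(model, layer_names, inputs):
    """Run one forward pass and return {layer name: output tensor}."""
    activations = {}
    handles = []
    modules = dict(model.named_modules())
    for name in layer_names:
        # name=name binds the current value; a plain closure would see the last one
        def hook(module, inp, out, name=name):
            activations[name] = out.detach()
        handles.append(modules[name].register_forward_hook(hook))
    was_training = model.training
    model.eval()
    try:
        with torch.no_grad():
            model(inputs)
    finally:
        for handle in handles:
            handle.remove()        # leave no lingering hooks on the model
        model.train(was_training)  # restore the previous train/eval mode
    return activations

# Small stand-in network; not the trained model from this notebook
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
acts = capture_activations(net, ["0", "3"], torch.randn(1, 1, 28, 28))
for name, act in acts.items():
    print(f"layer {name}: {tuple(act.shape)}")
```

With the notebook's model, the layer names would be `"conv1"` and `"conv2"`, and the returned tensors could be plotted the same way as in `visualize_feature_maps`.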

## 8. Transfer Learning Example

Transfer learning allows us to leverage pre-trained models for our specific tasks:

In [None]:
# Example of transfer learning with a pre-trained ResNet
def create_transfer_learning_model(num_classes=10, pretrained=True):
    """Create a transfer learning model based on ResNet18."""
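    # Load ResNet18, with ImageNet weights if requested
    # (torchvision >= 0.13 uses `weights`; the `pretrained` flag is deprecated)
    weights = models.ResNet18_Weights.DEFAULT if pretrained else None
    model = models.resnet18(weights=weights)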
    
    # Replace the first convolutional layer to accept single-channel input (for MNIST).
    # The new layer is randomly initialized, so the pre-trained conv1 weights are discarded.
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    
    # Replace the final fully connected layer
    num_features = model.fc.in_features
    model.fc = nn.Linear(num_features, num_classes)
    
    return model

# Create transfer learning model
transfer_model = create_transfer_learning_model(num_classes=10, pretrained=True).to(device)
print("Transfer Learning Model (ResNet18-based):")
print(f"Total parameters: {sum(p.numel() for p in transfer_model.parameters())}")

# Feature extraction vs fine-tuning
def set_parameter_requires_grad(model, feature_extracting):
    """Set requires_grad for model parameters based on feature extraction mode."""
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False
        # Train the final classifier and the replaced first layer: both are
        # newly initialized, so freezing conv1 would lock in random filters
        for module in (model.conv1, model.fc):
            for param in module.parameters():
                param.requires_grad = True

# Example: Feature extraction mode
print("\nSetting up for feature extraction (freezing pre-trained weights)...")
set_parameter_requires_grad(transfer_model, feature_extracting=True)

# Count trainable parameters
trainable_params = sum(p.numel() for p in transfer_model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in transfer_model.parameters())
print(f"Trainable parameters: {trainable_params}")
print(f"Total parameters: {total_params}")
print(f"Percentage of trainable parameters: {100 * trainable_params / total_params:.2f}%")

# Test the transfer learning model
transfer_model.eval()  # use running batch-norm statistics for this shape check
sample_input = torch.randn(2, 1, 28, 28).to(device)
with torch.no_grad():
    output = transfer_model(sample_input)
print(f"\nTransfer model output shape: {output.shape}")
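
Replacing `conv1` with a fresh layer throws away its pre-trained filters. One alternative, sketched below, is to build the single-channel layer from the RGB one by summing the kernels over the input-channel dimension, so that a grayscale image produces the same response as the same image copied into all three channels. `adapt_conv_to_grayscale` is a hypothetical helper, and a randomly initialized `Conv2d` with ResNet18's `conv1` shape stands in for the pre-trained layer so the example runs without downloading weights.

```python
import torch
import torch.nn as nn

def adapt_conv_to_grayscale(conv_rgb):
    """Build a 1-channel copy of a 3-channel Conv2d by summing its RGB kernels."""
    conv_gray = nn.Conv2d(
        1, conv_rgb.out_channels,
        kernel_size=conv_rgb.kernel_size, stride=conv_rgb.stride,
        padding=conv_rgb.padding, bias=conv_rgb.bias is not None,
    )
    with torch.no_grad():
        # Weight shape goes from (out, 3, k, k) to (out, 1, k, k)
        conv_gray.weight.copy_(conv_rgb.weight.sum(dim=1, keepdim=True))
        if conv_rgb.bias is not None:
            conv_gray.bias.copy_(conv_rgb.bias)
    return conv_gray

# Stand-in for a pre-trained first layer (same shape as ResNet18's conv1)
torch.manual_seed(0)
conv_rgb = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
conv_gray = adapt_conv_to_grayscale(conv_rgb)

gray = torch.randn(2, 1, 28, 28)
out_gray = conv_gray(gray)
out_rgb = conv_rgb(gray.repeat(1, 3, 1, 1))  # same image copied into R, G, B
print(f"Output shape: {tuple(out_gray.shape)}")
print(f"Matches RGB layer on replicated input: {torch.allclose(out_gray, out_rgb, atol=1e-4)}")
```

In the notebook's setting this would mean calling the helper on the original `model.conv1` before overwriting it. Whether that beats training a fresh first layer on MNIST is an empirical question this sketch does not answer.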

## Conclusion

This notebook covered the fundamentals of Convolutional Neural Networks in PyTorch:

1. **CNN Architecture**: Understanding convolution, pooling, and fully connected layers
2. **Building CNNs**: Creating complete architectures with batch normalization and dropout
3. **Data Preprocessing**: Applying transforms and augmentation for better performance
4. **Training**: Implementing training loops with proper monitoring
5. **Evaluation**: Testing model performance and analyzing results
6. **Visualization**: Understanding what CNNs learn through filter and feature map visualization
7. **Transfer Learning**: Leveraging pre-trained models for improved performance

**Key Takeaways:**
- CNNs excel at capturing spatial hierarchies in images
- Data augmentation is crucial for generalization
- Batch normalization and dropout help with training stability and overfitting
- Transfer learning can significantly improve performance, especially with limited data
- Visualization helps understand what the network has learned

**Next Steps:**
- Experiment with different CNN architectures (ResNet, DenseNet, EfficientNet)
- Try different datasets (CIFAR-10, ImageNet)
- Explore advanced techniques like attention mechanisms
- Learn about object detection and segmentation architectures