<a href="https://colab.research.google.com/github/profliuhao/CSIT599/blob/main/CSIT599_module3_Pytorch_for_Advanced_Computer_Vision_Exercise_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## CNN Architecture Evolution: From LeNet to ResNet - PyTorch Version
### Exercise for Students - SEQUENTIAL LEARNING APPROACH

This exercise demonstrates the evolution of CNN architectures using PyTorch:
1. Traditional CNN (Basic vanilla CNN)
2. LeNet-5 (1998) - The pioneer
3. AlexNet (2012) - Deep learning breakthrough
4. VGG-16 (2014) - Deep and uniform
5. Inception/GoogLeNet (2014) - Multi-scale features
6. ResNet (2015) - Skip connections revolution

BONUS: Using pre-built models from torchvision and Hugging Face

Dataset: CIFAR-10 (32x32 color images, 10 classes)

Instructions:
1. Fill in the blanks marked with "# TODO: STUDENT FILL IN"
2. Run each architecture sequentially
3. Compare custom vs pre-built implementations
4. Explore Hugging Face vision models

Requirements:
pip install torch torchvision transformers pillow matplotlib scikit-learn

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
import time
from sklearn.metrics import classification_report

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

# Check device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

In [None]:

# ============================================================================
# STEP 1: DATA PREPARATION - CIFAR-10
# ============================================================================

def load_cifar10(batch_size=128):
    """
    Load and preprocess CIFAR-10 dataset.

    torchvision transforms handle data preprocessing and augmentation.

    Returns:
        tuple: (train_loader, test_loader, classes)
    """
    print("STEP 1: LOADING CIFAR-10 DATASET")
    print("="*50)

    # Define transforms
    # TODO: STUDENT FILL IN
    # transforms.ToTensor() converts PIL images to float tensors scaled to [0, 1]
    # transforms.Normalize with mean=[0.5, 0.5, 0.5] and std=[0.5, 0.5, 0.5] maps them to [-1, 1]
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((________, ________, ________),   # mean for RGB
                           (________, ________, ________))     # std for RGB
    ])

    # Load training data
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    trainloader = DataLoader(trainset, batch_size=batch_size,
                            shuffle=True, num_workers=2)

    # Load test data
    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    testloader = DataLoader(testset, batch_size=batch_size,
                           shuffle=False, num_workers=2)

    classes = ('plane', 'car', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck')

    print(f"Training samples: {len(trainset)}")
    print(f"Test samples: {len(testset)}")
    print(f"Classes: {classes}")
    print(f"Batch size: {batch_size}")

    return trainloader, testloader, classes

def visualize_cifar10_samples(trainloader, classes):
    """Visualize sample images from CIFAR-10."""
    # Get a batch of images
    dataiter = iter(trainloader)
    images, labels = next(dataiter)

    # Convert images for display
    images = images / 2 + 0.5  # Unnormalize: undo Normalize(0.5, 0.5) back to [0, 1]

    # Plot
    fig, axes = plt.subplots(4, 8, figsize=(12, 6))
    for i, ax in enumerate(axes.flat):
        if i < len(images):
            # Convert from (C, H, W) to (H, W, C)
            img = images[i].permute(1, 2, 0).numpy()
            ax.imshow(img)
            ax.set_title(classes[labels[i]], fontsize=8)
        ax.axis('off')

    plt.suptitle('CIFAR-10 Sample Images')
    plt.tight_layout()
    plt.show()
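
The `Normalize` blanks in `load_cifar10()` and the `images / 2 + 0.5` line in `visualize_cifar10_samples()` are inverses of each other. A plain-Python sketch of the per-channel arithmetic, assuming mean = std = 0.5 as the TODO comment states (`normalize` and `unnormalize` are hypothetical helpers, not torchvision functions):

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel arithmetic of transforms.Normalize: (value - mean) / std."""
    return (value - mean) / std


def unnormalize(value, mean=0.5, std=0.5):
    """Inverse mapping; with mean = std = 0.5 this equals value / 2 + 0.5."""
    return value * std + mean


# ToTensor() yields pixels in [0, 1]; Normalize(0.5, 0.5) maps them to [-1, 1].
for pixel in (0.0, 0.25, 0.5, 1.0):
    z = normalize(pixel)
    print(f"{pixel:.2f} -> {z:+.2f} -> {unnormalize(z):.2f}")
```

Using other per-channel statistics only changes `mean` and `std`, but the unnormalization used for plotting must change with them.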


In [None]:
# ============================================================================
# STEP 2: TRADITIONAL CNN (BASELINE)
# ============================================================================

class TraditionalCNN(nn.Module):
    """
    Traditional/Vanilla CNN as baseline.

    PyTorch uses nn.Module as base class for all neural networks.
    Define layers in __init__ and forward pass in forward().
    """
    def __init__(self):
        super(TraditionalCNN, self).__init__()

        # TODO: STUDENT FILL IN
        # Define convolutional layers
        # nn.Conv2d(in_channels, out_channels, kernel_size)
        # First conv: 3 input channels (RGB), 32 output channels, 3x3 kernel
        self.conv1 = nn.Conv2d(________, ________, ________, padding=1)
        self.pool = nn.MaxPool2d(2, 2)

        # TODO: STUDENT FILL IN
        # Second conv: 32 input channels, 64 output channels, 3x3 kernel
        self.conv2 = nn.Conv2d(________, ________, ________, padding=1)

        # TODO: STUDENT FILL IN
        # Third conv: 64 input channels, 128 output channels, 3x3 kernel
        self.conv3 = nn.Conv2d(________, ________, ________, padding=1)

        # TODO: STUDENT FILL IN
        # Fully connected layers
        # nn.Linear(input_features, output_features)
        self.fc1 = nn.Linear(128 * 4 * 4, ________)  # 128 units
        self.fc2 = nn.Linear(________, ________)     # 128 to 10 classes

    def forward(self, x):
        """
        Forward pass defines the computation performed at every call.

        Args:
            x: Input tensor of shape (batch_size, 3, 32, 32)

        Returns:
            Output tensor of shape (batch_size, 10)
        """
        # TODO: STUDENT FILL IN
        # Apply conv1, ReLU, pooling
        # F.relu() is the ReLU activation function
        x = self.pool(F.relu(self.conv1(________)))

        # TODO: STUDENT FILL IN
        # Apply conv2, ReLU, pooling
        x = self.pool(F.relu(self.conv2(________)))

        # TODO: STUDENT FILL IN
        # Apply conv3, ReLU, pooling
        x = self.pool(F.relu(self.conv3(________)))

        # TODO: STUDENT FILL IN
        # Flatten to (batch_size, 128 * 4 * 4); -1 tells view() to infer the batch size
        x = x.view(-1, ________ * ________ * ________)  # 128 * 4 * 4

        # TODO: STUDENT FILL IN
        # Apply fc1 with ReLU
        x = F.relu(self.fc1(________))

        # TODO: STUDENT FILL IN
        # Apply fc2 with no activation: nn.CrossEntropyLoss applies log-softmax
        # internally, so the model returns raw logits
        x = self.fc2(________)

        return x

def create_traditional_cnn():
    """Create and initialize Traditional CNN."""
    print("\nSTEP 2: BUILDING TRADITIONAL CNN (BASELINE)")
    print("="*50)

    model = TraditionalCNN().to(device)

    # Print model architecture
    print("\nTraditional CNN Architecture:")
    print(model)

    # Count parameters
    total_params = sum(p.numel() for p in model.parameters())
    print(f"\nTotal parameters: {total_params:,}")

    return model
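
Where does `128 * 4 * 4` come from? Each 3x3 convolution with `padding=1` keeps the spatial size, and each `MaxPool2d(2, 2)` halves it. A plain-Python sketch using the standard output-size formula floor((H + 2P - K) / S) + 1 (the helper names are illustrative, not PyTorch APIs):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a Conv2d or MaxPool2d layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_traditional_cnn(size=32):
    """Follow a 32x32 CIFAR-10 image through TraditionalCNN's three stages."""
    shapes = []
    for out_ch in (32, 64, 128):
        size = conv_out(size, kernel=3, stride=1, padding=1)  # conv keeps size
        size = conv_out(size, kernel=2, stride=2)             # pool halves it
        shapes.append((out_ch, size, size))
    return shapes


shapes = trace_traditional_cnn()
print(shapes)                          # [(32, 16, 16), (64, 8, 8), (128, 4, 4)]
c, h, w = shapes[-1]
print("fc1 in_features:", c * h * w)   # 2048 = 128 * 4 * 4
```

The same formula explains LeNet-5's `16 * 6 * 6`: its second 5x5 convolution has no padding, so a 16x16 map shrinks to 12x12 before pooling to 6x6.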

In [None]:

# ============================================================================
# STEP 3: LeNet-5 (1998) - THE PIONEER
# ============================================================================

class LeNet5(nn.Module):
    """
    LeNet-5 architecture (adapted for CIFAR-10).

    Original LeNet-5 (1998) by Yann LeCun et al., one of the first CNNs
    to succeed in practice (handwritten digit recognition).
    """
    def __init__(self):
        super(LeNet5, self).__init__()

        # Convolutional layers (using original tanh activation)
        self.conv1 = nn.Conv2d(3, 6, 5, padding=2)
        self.conv2 = nn.Conv2d(6, 16, 5)

        # Fully connected layers
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
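        # Use tanh (close to the original scaled-tanh activation) instead of ReLU.
        # Max pooling stands in for the original average-pooling (subsampling) layers.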
        x = F.max_pool2d(torch.tanh(self.conv1(x)), 2)
        x = F.max_pool2d(torch.tanh(self.conv2(x)), 2)
        x = x.view(-1, 16 * 6 * 6)
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = self.fc3(x)
        return x

def create_lenet5():
    """Create LeNet-5 model."""
    print("\nSTEP 3: BUILDING LeNet-5 (1998)")
    print("="*50)
    print("Innovation: First successful CNN architecture!")

    model = LeNet5().to(device)
    print("\nLeNet-5 Architecture:")
    print(model)

    total_params = sum(p.numel() for p in model.parameters())
    print(f"\nTotal parameters: {total_params:,}")
    print("Note: Using tanh activation (original) instead of ReLU")

    return model
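
`create_lenet5()` prints the parameter total computed with `p.numel()`. The same number can be derived by hand: a Conv2d layer has `out * in * k * k` weights plus `out` biases, and a Linear layer has `out * in` weights plus `out` biases. A plain-Python sketch for the CIFAR-10 adaptation above (helper names are illustrative):

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Parameters of nn.Conv2d with a square kernel."""
    return out_ch * in_ch * kernel * kernel + (out_ch if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Parameters of nn.Linear."""
    return out_features * in_features + (out_features if bias else 0)


lenet5_layers = {
    "conv1": conv2d_params(3, 6, 5),
    "conv2": conv2d_params(6, 16, 5),
    "fc1": linear_params(16 * 6 * 6, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
for name, count in lenet5_layers.items():
    print(f"{name:<6} {count:>7,}")
total = sum(lenet5_layers.values())
print(f"total  {total:>7,}")   # fc1 alone holds 69,240 of the 83,126 parameters
```

Most parameters sit in the first fully connected layer, a pattern that becomes far more pronounced in AlexNet's 4096-unit classifier.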

In [None]:
# ============================================================================
# STEP 4: AlexNet (2012) - DEEP LEARNING BREAKTHROUGH
# ============================================================================

class AlexNet(nn.Module):
    """
    AlexNet architecture (adapted for CIFAR-10).

    Original AlexNet (2012) - Deep learning revolution.
    Key innovations: ReLU + Dropout.
    """
    def __init__(self):
        super(AlexNet, self).__init__()

        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(96, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.classifier = nn.Sequential(
            nn.Dropout(0.5),  # KEY: Dropout introduced!
            nn.Linear(256 * 4 * 4, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, 10),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # Flatten
        x = self.classifier(x)
        return x

def create_alexnet():
    """Create AlexNet model."""
    print("\nSTEP 4: BUILDING AlexNet (2012)")
    print("="*50)
    print("Innovation: ReLU activation + Dropout + Deep network!")

    model = AlexNet().to(device)
    print("\nAlexNet Architecture:")
    print(model)

    total_params = sum(p.numel() for p in model.parameters())
    print(f"\nTotal parameters: {total_params:,}")
    print("Key innovations: ReLU + Dropout!")

    return model
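
`nn.Dropout(0.5)` zeroes each activation with probability p during training and scales the survivors by 1 / (1 - p), so the expected activation is unchanged; in `model.eval()` mode it is the identity. A plain-Python sketch of this "inverted dropout" behavior (the function is illustrative, not PyTorch's implementation):

```python
import random


def inverted_dropout(values, p=0.5, training=True, rng=None):
    """Zero each value with probability p and rescale survivors by 1 / (1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not training or p == 0.0:
        return list(values)          # eval mode: identity
    if p == 1.0:
        return [0.0] * len(values)   # everything is dropped
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)
print(inverted_dropout([1.0] * 8, p=0.5, rng=rng))  # each entry is 0.0 or 2.0

# The rescaling keeps the average close to the original value of 1.0.
sample = inverted_dropout([1.0] * 100_000, p=0.5, rng=rng)
print(sum(sample) / len(sample))

print(inverted_dropout([1.0, 2.0], training=False))  # [1.0, 2.0]
```

This is why `train_model()` must switch between `model.train()` and `model.eval()`: the same layer behaves differently in the two modes.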

In [None]:
# ============================================================================
# STEP 5: VGG-16 (2014) - DEEP AND UNIFORM
# ============================================================================

class VGG16(nn.Module):
    """
    VGG-16 style architecture (simplified for CIFAR-10: 7 conv + 3 FC layers
    instead of the original 13 conv + 3 FC layers).

    Key innovation: Very deep network with uniform 3x3 filters.
    """
    def __init__(self):
        super(VGG16, self).__init__()

        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 2
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 3
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.classifier = nn.Sequential(
            nn.Linear(256 * 4 * 4, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

def create_vgg16():
    """Create VGG-16 model."""
    print("\nSTEP 5: BUILDING VGG-16 (2014)")
    print("="*50)
    print("Innovation: Very deep network + Uniform 3x3 filters!")

    model = VGG16().to(device)
    print("\nVGG-16 Architecture:")
    print(model)

    total_params = sum(p.numel() for p in model.parameters())
    print(f"\nTotal parameters: {total_params:,}")
    print("Key innovation: Very deep with uniform 3x3 convolutions!")

    return model
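
Why uniform 3x3 filters? Stacking n stride-1 3x3 convolutions gives a receptive field of 1 + 2n pixels, the same as one (1 + 2n)x(1 + 2n) convolution, but with fewer weights and a ReLU between layers. A plain-Python check for C input and output channels, ignoring biases (helper names are illustrative):

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of num_layers stride-1 convolutions of size kernel."""
    return 1 + num_layers * (kernel - 1)


def stacked_weights(num_layers, channels, kernel=3):
    """Weights in num_layers conv layers mapping channels -> channels, no bias."""
    return num_layers * kernel * kernel * channels * channels


def single_weights(kernel, channels):
    """Weights in one kernel x kernel conv mapping channels -> channels, no bias."""
    return kernel * kernel * channels * channels


C = 256  # channel width of Block 3 above
for n in (2, 3):
    rf = stacked_receptive_field(n)
    print(f"{n} x 3x3 convs: receptive field {rf}x{rf}, "
          f"{stacked_weights(n, C):,} weights vs {single_weights(rf, C):,} "
          f"for one {rf}x{rf} conv")
```

Block 3 above stacks three 3x3 convolutions, so each of its outputs sees a 7x7 window of the block's input at roughly 55% of the weights of a single 7x7 layer.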

In [None]:
# ============================================================================
# STEP 6: Inception Module and GoogLeNet
# ============================================================================

class InceptionModule(nn.Module):
    """
    Inception module for multi-scale feature extraction.

    Key innovation: Parallel convolutions at different scales.
    """
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(InceptionModule, self).__init__()

        # 1x1 convolution branch
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, ch1x1, kernel_size=1),
            nn.ReLU(inplace=True)
        )

        # 3x3 convolution branch
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, ch3x3red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3red, ch3x3, kernel_size=3, padding=1),
            nn.ReLU(inplace=True)
        )

        # 5x5 convolution branch
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, ch5x5red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5red, ch5x5, kernel_size=5, padding=2),
            nn.ReLU(inplace=True)
        )

        # Max pooling branch
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, pool_proj, kernel_size=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        # TODO: STUDENT FILL IN
        # Apply all branches and concatenate along channel dimension
        branch1 = self.branch1(________)
        branch2 = self.branch2(________)
        branch3 = self.branch3(________)
        branch4 = self.branch4(________)

        # Concatenate along channel dimension (dim=1)
        outputs = torch.cat([________, ________, ________, ________], dim=________)
        return outputs

class InceptionNet(nn.Module):
    """Inception/GoogLeNet architecture (simplified for CIFAR-10)."""
    def __init__(self):
        super(InceptionNet, self).__init__()

        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Inception modules
        self.inception3a = InceptionModule(64, 64, 96, 128, 16, 32, 32)
        self.inception3b = InceptionModule(256, 128, 128, 192, 32, 96, 64)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

        self.inception4a = InceptionModule(480, 192, 96, 208, 16, 48, 64)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(512, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.inception3a(x)
        x = self.inception3b(x)
        x = self.maxpool(x)
        x = self.inception4a(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.dropout(x)
        x = self.fc(x)
        return x

def create_inception():
    """Create Inception model."""
    print("\nSTEP 6: BUILDING Inception/GoogLeNet (2014)")
    print("="*50)
    print("Innovation: Multi-scale feature extraction with Inception modules!")

    model = InceptionNet().to(device)
    print("\nInception Architecture:")
    print(model)

    total_params = sum(p.numel() for p in model.parameters())
    print(f"\nTotal parameters: {total_params:,}")
    print("Key innovation: Parallel multi-scale convolutions!")

    return model
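
Because the four branches are concatenated along the channel dimension, an Inception module's output width is ch1x1 + ch3x3 + ch5x5 + pool_proj. That is why each module's `in_channels` matches the previous module's output and why `fc` takes 512 features. The 1x1 "reduce" convolutions also cut cost before the 5x5 convolution. A plain-Python check using the arguments passed in `InceptionNet` (helper names are illustrative):

```python
def inception_out_channels(in_channels, ch1x1, ch3x3red, ch3x3,
                           ch5x5red, ch5x5, pool_proj):
    """Channels after concatenating the four branches (reduce widths are internal)."""
    return ch1x1 + ch3x3 + ch5x5 + pool_proj


modules = {
    "inception3a": (64, 64, 96, 128, 16, 32, 32),
    "inception3b": (256, 128, 128, 192, 32, 96, 64),
    "inception4a": (480, 192, 96, 208, 16, 48, 64),
}
prev_out = 64  # channels produced by the stem conv1
for name, args in modules.items():
    assert args[0] == prev_out, f"{name} expects {args[0]} channels, got {prev_out}"
    prev_out = inception_out_channels(*args)
    print(f"{name}: {args[0]} -> {prev_out} channels")

# Weights in inception3a's 5x5 branch with and without the 1x1 reduction.
in_ch, red, out_ch = 64, 16, 32
direct = in_ch * out_ch * 5 * 5                      # 64 -> 32 with a 5x5 conv
reduced = in_ch * red * 1 * 1 + red * out_ch * 5 * 5  # 64 -> 16 (1x1) -> 32 (5x5)
print(f"5x5 branch weights: {direct:,} direct vs {reduced:,} with 1x1 reduction")
```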

In [None]:
# ============================================================================
# STEP 7: ResNet - SKIP CONNECTIONS REVOLUTION
# ============================================================================

class ResidualBlock(nn.Module):
    """
    Residual block with skip connection.

    Key innovation: x + F(x) instead of just F(x).
    Enables training very deep networks.
    """
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()

        # Main path: two 3x3 conv layers, each followed by batch normalization
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Shortcut connection
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                         stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        # TODO: STUDENT FILL IN
        # Main path
        out = F.relu(self.bn1(self.conv1(________)))
        out = self.bn2(self.conv2(________))

        # TODO: STUDENT FILL IN
        # Add skip connection: out + shortcut
        out += self.shortcut(________)
        out = F.relu(________)

        return out

class ResNet(nn.Module):
    """ResNet architecture (simplified for CIFAR-10)."""
    def __init__(self):
        super(ResNet, self).__init__()

        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )

        # Residual blocks
        self.layer1 = self._make_layer(64, 64, 2, stride=1)
        self.layer2 = self._make_layer(64, 128, 2, stride=2)
        self.layer3 = self._make_layer(128, 256, 2, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(256, 10)

    def _make_layer(self, in_channels, out_channels, num_blocks, stride):
        layers = []
        layers.append(ResidualBlock(in_channels, out_channels, stride))
        for _ in range(1, num_blocks):
            layers.append(ResidualBlock(out_channels, out_channels, 1))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

def create_resnet():
    """Create ResNet model."""
    print("\nSTEP 7: BUILDING ResNet (2015)")
    print("="*50)
    print("Innovation: Skip connections enable very deep networks!")

    model = ResNet().to(device)
    print("\nResNet Architecture:")
    print(model)

    total_params = sum(p.numel() for p in model.parameters())
    print(f"\nTotal parameters: {total_params:,}")
    print("Key innovation: Skip connections (x + F(x))!")

    return model
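
`ResidualBlock` only adds a 1x1 projection to the shortcut when the main path changes the tensor's shape (stride != 1 or in_channels != out_channels); otherwise the shortcut is the identity and the addition is well defined. A plain-Python sketch that mirrors `ResNet._make_layer` and lists which blocks need a projection (the helper name is illustrative):

```python
def plan_resnet(stages=((64, 64, 2, 1), (64, 128, 2, 2), (128, 256, 2, 2)),
                size=32):
    """Mirror ResNet._make_layer: rows of (in_ch, out_ch, stride, shortcut, size)."""
    plan = []
    for in_ch, out_ch, num_blocks, stride in stages:
        # Only the first block of a stage may downsample or widen the tensor.
        block_args = [(in_ch, out_ch, stride)] + [(out_ch, out_ch, 1)] * (num_blocks - 1)
        for b_in, b_out, b_stride in block_args:
            needs_projection = b_stride != 1 or b_in != b_out
            size = (size + 2 * 1 - 3) // b_stride + 1  # 3x3 conv, padding=1
            shortcut = "1x1 conv + BN" if needs_projection else "identity"
            plan.append((b_in, b_out, b_stride, shortcut, size))
    return plan


for row in plan_resnet():
    print(row)
```

Only the first block of `layer2` and `layer3` needs a projection; the final 256x8x8 feature map is what `AdaptiveAvgPool2d((1, 1))` reduces to the 256 features fed to `fc`.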

In [None]:
# ============================================================================
# BONUS: USING PRE-BUILT MODELS FROM TORCHVISION
# ============================================================================

def use_pretrained_resnet():
    """
    Load pre-trained ResNet from torchvision.

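    torchvision provides models pre-trained on ImageNet (224x224 inputs with
    ImageNet normalization); 32x32 CIFAR-10 images still run through ResNet18,
    but resizing them often transfers better.
    The final layer must be replaced for CIFAR-10 (10 classes).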
    """
    print("\nBONUS: USING PRE-TRAINED ResNet from torchvision")
    print("="*50)

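    # Load ResNet18 with ImageNet weights (torchvision >= 0.13 API;
    # the older pretrained=True argument is deprecated)
    model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)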

    # Modify final layer for CIFAR-10 (10 classes instead of 1000)
    num_features = model.fc.in_features
    model.fc = nn.Linear(num_features, 10)

    model = model.to(device)

    print("Pre-trained ResNet18 loaded and modified for CIFAR-10")
    print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")

    return model

def use_pretrained_vgg():
    """Load pre-trained VGG from torchvision."""
    print("\nBONUS: USING PRE-TRAINED VGG from torchvision")
    print("="*50)

    # ImageNet weights via the torchvision >= 0.13 weights API
    model = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.DEFAULT)

    # Modify classifier for CIFAR-10
    model.classifier[6] = nn.Linear(4096, 10)

    model = model.to(device)

    print("Pre-trained VGG16 loaded and modified for CIFAR-10")
    print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")

    return model

In [None]:
# ============================================================================
# BONUS: USING HUGGING FACE TRANSFORMERS FOR VISION
# ============================================================================

def use_huggingface_vit():
    """
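    Build a small Vision Transformer (ViT) with Hugging Face transformers.

    The model is created from a ViTConfig, so its weights are randomly
    initialized (not pre-trained). Its forward() returns an
    ImageClassifierOutput rather than a tensor; use `.logits` to get the
    (batch_size, 10) scores, since train_model() expects a plain tensor.

    Note: ViT is not a CNN; it is a transformer-based architecture that
    splits the image into patches and processes them with self-attention.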

    Requires: pip install transformers
    """
    print("\nBONUS: USING Vision Transformer from Hugging Face")
    print("="*50)

    try:
        from transformers import ViTForImageClassification, ViTConfig

        # Create ViT configuration for CIFAR-10
        config = ViTConfig(
            image_size=32,
            patch_size=4,
            num_channels=3,
            num_labels=10,
            hidden_size=384,
            num_hidden_layers=6,
            num_attention_heads=6
        )

        # Create model
        model = ViTForImageClassification(config)
        model = model.to(device)

        print("Vision Transformer (ViT) created from ViTConfig (randomly initialized)")
        print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")
        print("\nNote: ViT uses transformers (attention), not convolutions!")
        print("This represents the modern evolution beyond CNNs.")

        return model
    except ImportError:
        print("Hugging Face transformers not installed.")
        print("Install with: pip install transformers")
        return None
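
With `image_size=32` and `patch_size=4`, ViT cuts each CIFAR-10 image into non-overlapping 4x4 patches, flattens each patch, and projects it to `hidden_size`; Hugging Face's ViT also prepends a [CLS] token whose final state feeds the classifier. A plain-Python sketch of the resulting sequence shape (the helper name is illustrative):

```python
def vit_sequence_shape(image_size=32, patch_size=4, num_channels=3,
                       hidden_size=384, num_attention_heads=6):
    """Token count and sizes for a ViT configured like use_huggingface_vit()."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    num_patches = (image_size // patch_size) ** 2
    return {
        "num_patches": num_patches,                                  # 8 * 8
        "sequence_length": num_patches + 1,                          # + [CLS]
        "values_per_patch": patch_size * patch_size * num_channels,  # 4 * 4 * 3
        "head_dim": hidden_size // num_attention_heads,              # 384 / 6
    }


print(vit_sequence_shape())
# {'num_patches': 64, 'sequence_length': 65, 'values_per_patch': 48, 'head_dim': 64}
```

Self-attention cost grows with the square of the sequence length, so `patch_size` is a key knob: `patch_size=2` would give 257 tokens instead of 65.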

In [None]:
# ============================================================================
# TRAINING AND EVALUATION FUNCTIONS
# ============================================================================

def train_model(model, trainloader, testloader, epochs=10, lr=0.001):
    """
    Train a PyTorch model.

    Args:
        model: PyTorch model
        trainloader: Training data loader
        testloader: Test data loader
        epochs: Number of epochs
        lr: Learning rate

    Returns:
        tuple: (training_history, training_time)
    """
    print(f"\nTraining {model.__class__.__name__}...")

    # Define loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    # Training history
    history = {'train_loss': [], 'train_acc': [], 'test_loss': [], 'test_acc': []}

    start_time = time.time()

    for epoch in range(epochs):
        # Training phase
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0

        for i, (inputs, labels) in enumerate(trainloader):
            inputs, labels = inputs.to(device), labels.to(device)

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward + backward + optimize
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # Statistics
            running_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

        train_loss = running_loss / len(trainloader)
        train_acc = correct / total

        # Evaluation phase
        model.eval()
        test_loss = 0.0
        correct = 0
        total = 0

        with torch.no_grad():
            for inputs, labels in testloader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, labels)

                test_loss += loss.item()
                _, predicted = outputs.max(1)
                total += labels.size(0)
                correct += predicted.eq(labels).sum().item()

        test_loss = test_loss / len(testloader)
        test_acc = correct / total

        # Store history
        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)
        history['test_loss'].append(test_loss)
        history['test_acc'].append(test_acc)

        print(f'Epoch [{epoch+1}/{epochs}] '
              f'Train Loss: {train_loss:.4f} Acc: {train_acc:.4f} | '
              f'Test Loss: {test_loss:.4f} Acc: {test_acc:.4f}')

    training_time = time.time() - start_time
    print(f"Training completed in {training_time:.2f} seconds")

    return history, training_time

def evaluate_model(model, testloader):
    """Evaluate model on test set."""
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, labels in testloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

    accuracy = correct / total
    print(f"Test Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")

    return accuracy
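
`train_model()` reports `running_loss / len(trainloader)`, the mean of per-batch mean losses. With 50,000 training images and `batch_size=128`, the last batch holds only 80 images (50,000 = 390 * 128 + 80), so this slightly over-weights those samples compared with a true per-sample mean. A plain-Python sketch of both averages, using made-up batch losses (illustrative numbers, not results from this notebook):

```python
def mean_of_batch_means(batch_losses):
    """What train_model() reports: sum of per-batch means / number of batches."""
    return sum(loss for loss, _ in batch_losses) / len(batch_losses)


def per_sample_mean(batch_losses):
    """True average over samples: weight each batch mean by its batch size."""
    total = sum(loss * size for loss, size in batch_losses)
    count = sum(size for _, size in batch_losses)
    return total / count


# Hypothetical (mean_loss, batch_size) pairs: two full batches and a short last one.
batches = [(1.0, 128), (1.0, 128), (4.0, 80)]
print(mean_of_batch_means(batches))  # 2.0
print(per_sample_mean(batches))      # about 1.714, i.e. (128 + 128 + 320) / 336
```

In the PyTorch loop, the weighted version accumulates `loss.item() * labels.size(0)` and divides by the sample count. With 391 batches the gap is usually small, so the reported numbers remain fine for comparing architectures.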

In [None]:
# ============================================================================
# COMPARISON AND VISUALIZATION
# ============================================================================

def compare_models(results):
    """Compare all trained models."""
    print("\n" + "="*60)
    print("MODEL COMPARISON")
    print("="*60)

    print(f"\n{'Model':<20} {'Accuracy':<12} {'Time (s)':<12} {'Params':<12}")
    print("-" * 60)

    for name, data in results.items():
        print(f"{name:<20} {data['accuracy']:<12.4f} "
              f"{data['time']:<12.2f} {data['params']:<12,}")

    # Rankings
    sorted_by_acc = sorted(results.items(), key=lambda x: x[1]['accuracy'], reverse=True)
    print("\n🏆 Accuracy Rankings:")
    for i, (name, data) in enumerate(sorted_by_acc, 1):
        print(f"  {i}. {name}: {data['accuracy']:.4f}")

def plot_comparison(results):
    """Plot comparison charts."""
    models = list(results.keys())
    accuracies = [results[m]['accuracy'] for m in models]
    params = [results[m]['params'] / 1e6 for m in models]
    times = [results[m]['time'] for m in models]

    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))

    # Accuracy
    colors = ['red', 'orange', 'yellow', 'lightgreen', 'cyan', 'blue']
    ax1.bar(models, accuracies, color=colors)
    ax1.set_title('Test Accuracy Comparison')
    ax1.set_ylabel('Accuracy')
    ax1.set_ylim([0, 1])
    ax1.tick_params(axis='x', rotation=45)

    # Parameters
    ax2.bar(models, params, color=colors)
    ax2.set_title('Model Complexity (Parameters)')
    ax2.set_ylabel('Parameters (Millions)')
    ax2.tick_params(axis='x', rotation=45)

    # Training time
    ax3.bar(models, times, color=colors)
    ax3.set_title('Training Time (10 epochs)')
    ax3.set_ylabel('Time (seconds)')
    ax3.tick_params(axis='x', rotation=45)

    # Accuracy vs Parameters
    ax4.scatter(params, accuracies, c=range(len(models)), cmap='viridis', s=200)
    for i, model in enumerate(models):
        ax4.annotate(model, (params[i], accuracies[i]), fontsize=8)
    ax4.set_xlabel('Parameters (Millions)')
    ax4.set_ylabel('Accuracy')
    ax4.set_title('Accuracy vs Model Complexity')
    ax4.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('pytorch_cnn_comparison.png', dpi=150)
    plt.show()

In [None]:
# ============================================================================
# MAIN EXECUTION
# ============================================================================

if __name__ == "__main__":
    print("CNN ARCHITECTURE EVOLUTION - PyTorch Version")
    print("="*60)

    # STEP 1: Load and prepare data
    print("\n📥 Downloading and preparing CIFAR-10 dataset...")
    print("(This may take a few minutes on first run)")
    trainloader, testloader, classes = load_cifar10(batch_size=128)
    visualize_cifar10_samples(trainloader, classes)

    # Dictionary to store results
    results = {}

    # Define architectures
    architectures = [
        ("Traditional CNN", create_traditional_cnn),
        ("LeNet-5", create_lenet5),
        ("AlexNet", create_alexnet),
        ("VGG-16", create_vgg16),
        ("Inception", create_inception),
        ("ResNet", create_resnet)
    ]

    EPOCHS = 10

    # Train each architecture
    for name, create_func in architectures:
        print(f"\n{'='*60}")
        print(f"TRAINING: {name}")
        print(f"{'='*60}")

        model = create_func()
        history, training_time = train_model(model, trainloader, testloader,
                                            epochs=EPOCHS)
        accuracy = evaluate_model(model, testloader)

        results[name] = {
            'accuracy': accuracy,
            'time': training_time,
            'params': sum(p.numel() for p in model.parameters()),
            'history': history
        }

        print(f"✅ {name} completed!")

    # Compare all models
    compare_models(results)
    plot_comparison(results)

    print("\n🎉 EXERCISE COMPLETED!")

    # Bonus: Show how to use pre-built models
    print("\n" + "="*60)
    print("BONUS: PRE-BUILT MODELS")
    print("="*60)
    print("\nTo use pre-built models from torchvision:")
    print("  model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)")
    print("  # Modify final layer for your task")
    print("\nTo use Hugging Face transformers:")
    print("  from transformers import ViTForImageClassification")
    print("  model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')")




## 🎓 PYTORCH-SPECIFIC LEARNING POINTS:

1. PyTorch Model Structure:
   - Define layers in __init__()
   - Implement forward pass in forward()
   - Use nn.Module as base class

2. Training Loop:
   - model.train() for training mode
   - model.eval() for evaluation mode
   - optimizer.zero_grad() before backward()
   - loss.backward() for backpropagation
   - optimizer.step() to update weights

3. Data Loading:
   - Use torch.utils.data.DataLoader
   - transforms for preprocessing
   - Automatic batching and shuffling

4. GPU Acceleration:
   - .to(device) to move model/data to GPU
   - device selects the GPU when torch.cuda.is_available(), else the CPU; nothing runs on the GPU until you call .to(device)

5. Pre-built Models:
   - torchvision.models for CNNs
   - transformers library for modern architectures
   - Transfer learning capabilities

🔧 REQUIREMENTS:
pip install torch torchvision transformers pillow matplotlib scikit-learn

🚀 NEXT STEPS:
1. Try transfer learning with pre-trained models
2. Experiment with data augmentation
3. Implement custom loss functions
4. Explore model ensembling
5. Try Vision Transformers (ViT)
