# Recognizing UVA landmarks with neural nets (100 pts)

## Robert Clay Harris: jbm2rt

The UVA Grounds is known for its Jeffersonian architecture and for its place in U.S. history as a model for college and university campuses throughout the country.

In this assignment, you will attempt to build image recognition systems that classify different buildings/landmarks on Grounds. You will implement various CNN architectures covered in Chapters 7-8, including VGG blocks, Network-in-Network (NiN) with GAP, Inception modules, and ResNet blocks. You'll also explore transfer learning with pretrained models.

Total Points: 100 + 5 bonus points
- Part 1: Implement VGG-style blocks (15 pts)
- Part 2: Implement NiN with Global Average Pooling (15 pts)
- Part 3: Implement Inception modules (15 pts)
- Part 4: Implement ResNet blocks (15 pts)
- Part 5: Transfer Learning with Pretrained Models (20 pts)
- Part 6: Efficient Architectures (20 pts)
- Bonus: Achieve >94% accuracy on the test set (5 pts)


Dataset: UVA Landmarks with 18 classes

To make things easier, some code has been provided to help you process the data; you may modify it to fit your needs. You must submit the .ipynb and .pdf files via UVA Canvas, named in the format yourcomputingID_assignment_2.*

Best of luck, and have fun!

# Import Dataset
The full dataset is large (over 37 GB), with more than 13K images across 18 classes, so it takes a while to download, extract, and process. To save you time and effort, a subset of the data has been resized and compressed to only 379 MB and stored on a Firebase server. Your grade will be benchmarked on this subset.

In [18]:
"""
Dataset: UVA Landmarks with 18 classes
Submission: yourcomputingID_assignment_2.ipynb and pdf

IMPORTANT DESIGN PRINCIPLES:
1. Use BatchNorm after every Conv layer (before activation)
2. Use ReLU activation (inplace=True saves memory)
3. Use bias=False in Conv when followed by BatchNorm
4. Initialize weights sensibly for better convergence (e.g., Kaiming/He init for ReLU layers)
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets, transforms, models
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import os
import zipfile
import urllib.request
from tqdm import tqdm

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

# ============================================================================
# Data Loading and Preprocessing
# ============================================================================

def download_dataset():
    """Download and extract the UVA landmarks dataset."""
    url = "https://firebasestorage.googleapis.com/v0/b/uva-landmark-images.appspot.com/o/dataset.zip?alt=media&token=e1403951-30d6-42b8-ba4e-394af1a2ddb7"

    if not os.path.exists('dataset'):
        print("Downloading dataset...")
        urllib.request.urlretrieve(url, 'dataset.zip')
        print("Extracting dataset...")
        with zipfile.ZipFile('dataset.zip', 'r') as zip_ref:
            zip_ref.extractall('.')
        os.remove('dataset.zip')
    else:
        print("Dataset already exists.")

# Download dataset
download_dataset()

# Dataset parameters
data_dir = "dataset/"
batch_size = 128  # matches train_loader below (val_loader uses 256)
img_height = 150
img_width = 150
num_classes = 18

# Class names for UVA landmarks
class_names = ['AcademicalVillage', 'AldermanLibrary', 'AlumniHall', 'AquaticFitnessCenter',
               'BavaroHall', 'BrooksHall', 'ClarkHall', 'MadisonHall', 'MinorHall',
               'NewCabellHall', 'NewcombHall', 'OldCabellHall', 'OlssonHall', 'RiceHall',
               'Rotunda', 'ScottStadium', 'ThorntonHall', 'UniversityChapel']

# Data transforms
train_transform = transforms.Compose([
    transforms.Resize((img_height, img_width)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

val_transform = transforms.Compose([
    transforms.Resize((img_height, img_width)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

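# Create datasets.
# random_split returns Subsets that share ONE underlying ImageFolder, so setting
# `.dataset.transform` on each Subset lets the second assignment overwrite the
# first (training would silently lose augmentation). Instead, build two
# ImageFolder views of the same folder and index both with the same split.
train_base = datasets.ImageFolder(data_dir, transform=train_transform)
val_base = datasets.ImageFolder(data_dir, transform=val_transform)
train_size = int(0.8 * len(train_base))
val_size = len(train_base) - train_size
train_split, val_split = torch.utils.data.random_split(
    train_base, [train_size, val_size],
    generator=torch.Generator().manual_seed(42)
)
train_dataset = torch.utils.data.Subset(train_base, train_split.indices)
val_dataset = torch.utils.data.Subset(val_base, val_split.indices)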

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=8)
val_loader   = DataLoader(val_dataset,   batch_size=256, shuffle=False, num_workers=8)

print(f"Training samples: {len(train_dataset)}")
print(f"Validation samples: {len(val_dataset)}")

Using device: cpu
Dataset already exists.
Training samples: 11428
Validation samples: 2858
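
A small sketch of why per-split transforms need care: `torch.utils.data.random_split` returns `Subset` objects that wrap the *same* dataset instance, so assigning `.dataset.transform` through one split changes it for both. `ToyDataset` below is a hypothetical stand-in for `ImageFolder` that just applies its current transform to the index.

```python
import torch
from torch.utils.data import Dataset, random_split


class ToyDataset(Dataset):
    """Hypothetical stand-in for ImageFolder: returns transform(index)."""
    def __init__(self, n):
        self.n = n
        self.transform = None

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Apply whatever transform is currently attached to the dataset
        return self.transform(idx) if self.transform else idx


base = ToyDataset(10)
train_sub, val_sub = random_split(base, [8, 2],
                                  generator=torch.Generator().manual_seed(42))

# Both Subsets point at the very same dataset object...
print(train_sub.dataset is val_sub.dataset)  # True

# ...so the second assignment overwrites the first one.
train_sub.dataset.transform = lambda i: ("train", i)
val_sub.dataset.transform = lambda i: ("val", i)
print(train_sub[0][0])  # val
```

A common remedy is to give each split its own dataset object (for example two `ImageFolder` instances over the same folder, each with its own transform) and reuse the same split indices for both.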


In [19]:
# ============================================================================
# Part 1: VGG-Style Blocks (15 points)
# ============================================================================

class VGGBlock(nn.Module):
    """
    VGG-style block:
      - num_convs times: Conv(3x3, bias=False) -> BN -> ReLU
      - then MaxPool2d(2, 2)
    """
    def __init__(self, in_channels, out_channels, num_convs=2):
        super(VGGBlock, self).__init__()
        layers = []
        c_in = in_channels
        for _ in range(num_convs):
            layers += [
                nn.Conv2d(c_in, out_channels, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            ]
            c_in = out_channels
        layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)


class VGGNet(nn.Module):
    """
    Simplified VGG:
      stem: 3 -> 64 (Conv-BN-ReLU)
      blocks: 64->128, 128->256 (each halving spatial size)
      GAP -> Linear(256 -> num_classes)
    """
    def __init__(self, num_classes=18):
        super(VGGNet, self).__init__()
        stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        block1 = VGGBlock(64, 128, num_convs=2)
        block2 = VGGBlock(128, 256, num_convs=2)

        # feature extractor
        self.features = nn.Sequential(stem, block1, block2)

        # global average pooling and classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.features(x)            # conv features
        x = self.avgpool(x)             # (B, 256, 1, 1)
        x = torch.flatten(x, 1)         # (B, 256)
        x = self.classifier(x)          # logits
        return x
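
As a cross-check of the `VGGNet Parameters: 1,113,938` line printed by the test run, the count can be reproduced by hand: each bias-free 3x3 conv has 9·C_in·C_out weights, each BatchNorm2d adds a weight and a bias per channel (its running mean/variance are buffers, not parameters), MaxPool and GAP have none, and the final Linear has 256·18 + 18. The `conv_bn` helper is purely illustrative.

```python
def conv_bn(c_in, c_out, k=3):
    """Bias-free k x k conv weights plus BatchNorm2d weight and bias."""
    return k * k * c_in * c_out + 2 * c_out


stem = conv_bn(3, 64)
block1 = conv_bn(64, 128) + conv_bn(128, 128)    # VGGBlock(64, 128, num_convs=2)
block2 = conv_bn(128, 256) + conv_bn(256, 256)   # VGGBlock(128, 256, num_convs=2)
head = 256 * 18 + 18                             # Linear(256 -> 18) with bias
total = stem + block1 + block2 + head

print(f"{total:,}")                              # 1,113,938
print(f"block2 share: {block2 / total:.0%}")
```

Most of the parameters sit in the last block, while the classifier stays tiny because GAP reduces the feature map to 256 numbers regardless of the input resolution.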

In [20]:
# ============================================================================
# Part 2: Network in Network (NiN) with GAP (15 points)
# ============================================================================

class NiNBlock(nn.Module):
    """
    NiN block:
      spatial conv (k=kernel_size, stride, padding)
      -> 1x1 conv
      -> 1x1 conv
    Each conv: Conv (bias=False) -> BN -> ReLU
    """
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super(NiNBlock, self).__init__()
        self.conv_block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size,
                      stride=stride, padding=padding, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),

            nn.Conv2d(out_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),

            nn.Conv2d(out_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv_block(x)


class NiN(nn.Module):
    """
    Network-in-Network with GAP:
      NiN(3->96, k=11, s=4) -> MaxPool(3,2)
      NiN(96->256, k=5, p=2) -> MaxPool(3,2)
      NiN(256->384, k=3, p=1) -> MaxPool(3,2)
      NiN(384->num_classes, k=3, p=1)
      GAP -> logits (no FC needed)
    """
    def __init__(self, num_classes=18):
        super(NiN, self).__init__()
        self.features = nn.Sequential(
            NiNBlock(3, 96, kernel_size=11, stride=4, padding=0),
            nn.MaxPool2d(kernel_size=3, stride=2),

            NiNBlock(96, 256, kernel_size=5, stride=1, padding=2),
            nn.MaxPool2d(kernel_size=3, stride=2),

            NiNBlock(256, 384, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=3, stride=2),

            NiNBlock(384, num_classes, kernel_size=3, stride=1, padding=1),

            nn.AdaptiveAvgPool2d((1, 1)),  # GAP to 1x1
        )

    def forward(self, x):
        x = self.features(x)     # (B, num_classes, 1, 1)
        x = torch.flatten(x, 1)  # (B, num_classes)
        return x
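
With the 150×150 inputs used here, the spatial size through NiN can be traced with the standard Conv2d/MaxPool2d output-size formula ⌊(n + 2p − k)/s⌋ + 1 (dilation 1 and `ceil_mode=False`, both PyTorch defaults). The 1x1 convs inside each NiNBlock leave the size unchanged, so only the spatial conv and the pools matter. `conv_out` is a local helper for this sketch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of Conv2d/MaxPool2d (dilation=1, ceil_mode=False)."""
    return (size + 2 * padding - kernel) // stride + 1


# (label, kernel, stride, padding) for the spatial layers of NiN
stages = [
    ("NiN 11x11 s4", 11, 4, 0),
    ("MaxPool 3 s2", 3, 2, 0),
    ("NiN 5x5 p2", 5, 1, 2),
    ("MaxPool 3 s2", 3, 2, 0),
    ("NiN 3x3 p1", 3, 1, 1),
    ("MaxPool 3 s2", 3, 2, 0),
    ("NiN 3x3 p1", 3, 1, 1),
]

size, sizes = 150, []
for label, k, s, p in stages:
    size = conv_out(size, k, s, p)   # 1x1 convs inside a NiNBlock keep the size
    sizes.append(size)
    print(f"{label:13s} -> {size}x{size}")

print(sizes)  # [35, 17, 17, 8, 8, 3, 3]
```

The last block therefore emits an 18-channel 3×3 map, and `AdaptiveAvgPool2d((1, 1))` averages each channel into one logit, which is why no fully connected layer is needed.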

In [21]:
# ============================================================================
# Part 3: Inception Module (15 points)
# ============================================================================

class InceptionBlock(nn.Module):
    """
    Inception block with 4 parallel branches:
      1) 1x1
      2) 1x1 -> 3x3
      3) 1x1 -> 5x5
      4) 3x3 maxpool -> 1x1
    Each conv: Conv(bias=False) -> BN -> ReLU
    """
    def __init__(self, in_channels, ch1x1, ch3x3_reduce, ch3x3, ch5x5_reduce, ch5x5, pool_proj):
        super(InceptionBlock, self).__init__()

        # Branch 1: 1x1
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, ch1x1, kernel_size=1, bias=False),
            nn.BatchNorm2d(ch1x1),
            nn.ReLU(inplace=True),
        )

        # Branch 2: 1x1 -> 3x3
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, ch3x3_reduce, kernel_size=1, bias=False),
            nn.BatchNorm2d(ch3x3_reduce),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3_reduce, ch3x3, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(ch3x3),
            nn.ReLU(inplace=True),
        )

        # Branch 3: 1x1 -> 5x5
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, ch5x5_reduce, kernel_size=1, bias=False),
            nn.BatchNorm2d(ch5x5_reduce),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5_reduce, ch5x5, kernel_size=5, padding=2, bias=False),
            nn.BatchNorm2d(ch5x5),
            nn.ReLU(inplace=True),
        )

        # Branch 4: 3x3 maxpool (s=1, p=1) -> 1x1
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, pool_proj, kernel_size=1, bias=False),
            nn.BatchNorm2d(pool_proj),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        b1 = self.branch1(x)
        b2 = self.branch2(x)
        b3 = self.branch3(x)
        b4 = self.branch4(x)
        return torch.cat([b1, b2, b3, b4], dim=1)


class SimpleGoogLeNet(nn.Module):
    """
    Simplified GoogLeNet with Inception modules.
    """
    def __init__(self, num_classes=18):
        super(SimpleGoogLeNet, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )

        # Inception3a: input=64, output=256 (64+128+32+32)
        self.inception3a = InceptionBlock(64, 64, 96, 128, 16, 32, 32)
        # Inception3b: input=256, output=480 (128+192+96+64)
        self.inception3b = InceptionBlock(256, 128, 128, 192, 32, 96, 64)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(480, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.inception3a(x)
        x = self.inception3b(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
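
The 1x1 "reduce" convolutions in branches 2 and 3 exist mainly to cut cost before the larger kernels. Counting bias-free conv weights (BatchNorm parameters ignored) for branch 3 of `inception3a` against a hypothetical direct 5x5 conv from 64 to 32 channels makes the saving concrete; the concatenated output widths match the comments in `SimpleGoogLeNet`.

```python
def conv_weights(c_in, c_out, k):
    """Weights of a bias-free k x k Conv2d (BatchNorm parameters not counted)."""
    return k * k * c_in * c_out


# Branch 3 of inception3a: 1x1 reduce 64 -> 16, then 5x5 16 -> 32
with_reduce = conv_weights(64, 16, 1) + conv_weights(16, 32, 5)
# Hypothetical alternative without the 1x1 reduction: direct 5x5, 64 -> 32
direct = conv_weights(64, 32, 5)
print(with_reduce, direct, round(direct / with_reduce, 1))  # 13824 51200 3.7

# Output width of each block = sum of the four branch widths
out_3a = 64 + 128 + 32 + 32
out_3b = 128 + 192 + 96 + 64
print(out_3a, out_3b)  # 256 480
```

At a fixed spatial size and stride 1, FLOPs scale with these weight counts, so the reduction saves compute in the same proportion.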

In [22]:
# ============================================================================
# Part 4: ResNet Blocks (15 points)
# ============================================================================

class BasicBlock(nn.Module):
    """
    Basic ResNet block:
      Main: 3x3(s=stride) -> BN -> ReLU -> 3x3 -> BN
      Skip: Identity or 1x1(s=stride) -> BN (when shape changes)
      Out:  Add -> ReLU
    """
    def __init__(self, in_channels, out_channels, stride=1):
        super(BasicBlock, self).__init__()
        # main path
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # skip path (projection if shape changes)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out = out + self.shortcut(identity)
        out = self.relu(out)
        return out


class ResNet18(nn.Module):
    """
    ResNet-18:
      conv1 (7x7,s=2) -> maxpool
      layer1: 2x(64)
      layer2: 2x(128), downsample
      layer3: 2x(256), downsample
      layer4: 2x(512), downsample
      GAP -> FC
    """
    def __init__(self, num_classes=18):
        super(ResNet18, self).__init__()
        # stem
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )

        # residual stages
        self.layer1 = self._make_layer(64, 64,  num_blocks=2, stride=1)
        self.layer2 = self._make_layer(64, 128, num_blocks=2, stride=2)
        self.layer3 = self._make_layer(128, 256, num_blocks=2, stride=2)
        self.layer4 = self._make_layer(256, 512, num_blocks=2, stride=2)

        # head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def _make_layer(self, in_channels, out_channels, num_blocks, stride):
        # first block may downsample / change channels
        blocks = [BasicBlock(in_channels, out_channels, stride=stride)]
        # remaining blocks keep same channels, stride=1
        for _ in range(1, num_blocks):
            blocks.append(BasicBlock(out_channels, out_channels, stride=1))
        return nn.Sequential(*blocks)

    def forward(self, x):
        x = self.conv1(x)
        x = self.layer1(x)   # 1/4 size
        x = self.layer2(x)   # 1/8
        x = self.layer3(x)   # 1/16
        x = self.layer4(x)   # 1/32
        x = self.avgpool(x)  # (B, 512, 1, 1)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
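
The stage comments in `ResNet18.forward` (1/4 through 1/32) are nominal ratios. Tracing a 150×150 input with the same floor formula shows the actual map sizes. It also checks that each 1x1 stride-2 projection shortcut lands on the same grid as the 3x3 stride-2 main path, which the addition in `BasicBlock.forward` requires. `conv_out` is a local helper for this sketch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of Conv2d/MaxPool2d (dilation=1, ceil_mode=False)."""
    return (size + 2 * padding - kernel) // stride + 1


size = conv_out(150, 7, stride=2, padding=3)    # stem conv 7x7 s2   -> 75
size = conv_out(size, 3, stride=2, padding=1)   # stem maxpool 3 s2  -> 38
sizes = {"layer1": size}                        # layer1 uses stride 1
for name in ("layer2", "layer3", "layer4"):
    main = conv_out(size, 3, stride=2, padding=1)   # first conv of the stage
    skip = conv_out(size, 1, stride=2)              # 1x1 projection shortcut
    assert main == skip, "residual addition needs matching shapes"
    size = main
    sizes[name] = size

print(sizes)  # {'layer1': 38, 'layer2': 19, 'layer3': 10, 'layer4': 5}
```

Because of rounding, layer4 sees a 5×5 map rather than 150/32 ≈ 4.7; GAP makes the head indifferent to this.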

In [23]:
# ============================================================================
# Part 5: Transfer Learning (20 points)
# ============================================================================

def get_pretrained_model(model_name='resnet18', num_classes=18, feature_extract=True):
    """
    Return a pretrained model with its final classifier replaced for `num_classes`.
    If `feature_extract` is True, freeze all feature parameters.
    Supported: 'resnet18', 'vgg16', 'mobilenet_v2'
    """
    name = model_name.lower()

    if name == 'resnet18':
        model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        if feature_extract:
            for p in model.parameters():
                p.requires_grad = False
        in_feats = model.fc.in_features
        model.fc = nn.Linear(in_feats, num_classes)
        return model

    elif name == 'vgg16':
        model = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        if feature_extract:
            for p in model.features.parameters():  # freeze conv backbone
                p.requires_grad = False
        # replace last linear layer
        in_feats = model.classifier[6].in_features
        model.classifier[6] = nn.Linear(in_feats, num_classes)
        return model

    elif name == 'mobilenet_v2':
        model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
        if feature_extract:
            for p in model.parameters():
                p.requires_grad = False
        # replace last linear layer in classifier
        in_feats = model.classifier[1].in_features
        model.classifier[1] = nn.Linear(in_feats, num_classes)
        return model

    else:
        raise ValueError(f"Unsupported model_name '{model_name}'. "
                         "Choose from: 'resnet18', 'vgg16', 'mobilenet_v2'.")

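The freezing pattern in `get_pretrained_model` can be seen on a toy model without downloading any weights. The sketch below (the toy module layout and the `count_params` helper are hypothetical) freezes everything, swaps in a new head, and counts what will actually train. For the real ResNet-18, whose `fc.in_features` is 512, the replaced head has 512·18 + 18 = 9,234 trainable parameters in feature-extraction mode.

```python
import torch
import torch.nn as nn

# Hypothetical toy model: a small "backbone" plus a head registered as `fc`,
# mimicking how torchvision's ResNet-18 exposes its final layer.
model = nn.Sequential()
model.add_module("backbone", nn.Sequential(nn.Linear(32, 64), nn.ReLU()))
model.add_module("fc", nn.Linear(64, 10))

# Feature extraction: freeze every existing parameter...
for p in model.parameters():
    p.requires_grad = False
# ...then replace the head; a freshly built layer has requires_grad=True.
model.fc = nn.Linear(model.fc.in_features, 18)


def count_params(m, trainable_only=False):
    return sum(p.numel() for p in m.parameters()
               if p.requires_grad or not trainable_only)


print(count_params(model), count_params(model, trainable_only=True))  # 3282 1170

# Optional: give the optimizer only the parameters that will be updated.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)
```

`train_model` passes all parameters to Adam; frozen ones never receive a `.grad`, so Adam skips them and the outcome is the same, but filtering makes the intent explicit.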
In [24]:
# ============================================================================
# Training and Evaluation Functions (Provided - No modifications needed)
# ============================================================================

def train_epoch(model, dataloader, criterion, optimizer, device):
    """Train the model for one epoch."""
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, labels in tqdm(dataloader, desc="Training"):
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

    epoch_loss = running_loss / len(dataloader)
    epoch_acc = 100. * correct / total
    return epoch_loss, epoch_acc


def evaluate(model, dataloader, criterion, device):
    """Evaluate the model."""
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, labels in tqdm(dataloader, desc="Evaluating"):
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            running_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

    epoch_loss = running_loss / len(dataloader)
    epoch_acc = 100. * correct / total
    return epoch_loss, epoch_acc


def train_model(model, train_loader, val_loader, num_epochs=10, lr=0.001):
    """
    Train and evaluate a model.

    Returns:
        Dictionary with training history
    """
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

    history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}

    for epoch in range(num_epochs):
        print(f'\nEpoch {epoch+1}/{num_epochs}')
        print('-' * 30)

        train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
        val_loss, val_acc = evaluate(model, val_loader, criterion, device)

        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)

        print(f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%')
        print(f'Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%')

        scheduler.step()

    return history


def plot_training_history(history, title="Training History"):
    """Plot training and validation loss/accuracy."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

    ax1.plot(history['train_loss'], label='Train Loss')
    ax1.plot(history['val_loss'], label='Val Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.set_title(f'{title} - Loss')
    ax1.legend()
    ax1.grid(True)

    ax2.plot(history['train_acc'], label='Train Acc')
    ax2.plot(history['val_acc'], label='Val Acc')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy (%)')
    ax2.set_title(f'{title} - Accuracy')
    ax2.legend()
    ax2.grid(True)

    plt.tight_layout()
    plt.show()
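
`train_model` builds `StepLR(step_size=5, gamma=0.1)` and calls `scheduler.step()` once per epoch, so the learning rate during 0-based epoch e is lr·0.1^⌊e/5⌋. The sketch below checks that closed form (`step_lr` is a hypothetical helper) against the real scheduler on a dummy parameter.

```python
import torch


def step_lr(base_lr, epoch, step_size=5, gamma=0.1):
    # Closed form of StepLR when scheduler.step() is called once per epoch
    return base_lr * gamma ** (epoch // step_size)


# Cross-check against the real scheduler on a dummy parameter
param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.Adam([param], lr=1e-3)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=5, gamma=0.1)

observed = []
for epoch in range(12):
    observed.append(opt.param_groups[0]["lr"])   # LR used during this epoch
    opt.step()                                   # (training would happen here)
    sched.step()

expected = [step_lr(1e-3, e) for e in range(12)]
print(all(abs(a - b) < 1e-12 for a, b in zip(observed, expected)))  # True
print(observed[:5] == [1e-3] * 5)                                   # True
```

One consequence: with `test_epochs = 5` the decay never takes effect, because the first drop applies from the sixth epoch onward.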

In [25]:
# ============================================================================
# Main Execution - Test Your Implementations
# ============================================================================

if __name__ == "__main__":
    print("=" * 60)
    print("Testing your CNN implementations on UVA Landmarks Dataset")
    print("=" * 60)

    # Test each architecture with fewer epochs for quick validation
    test_epochs = 5  # Increase to 20-30 for better results

    # Dictionary to store results
    results = {}

    # Part 1: Test VGGNet
    print("\n" + "="*60)
    print("Part 1: Testing VGGNet")
    print("="*60)
    try:
        vgg_model = VGGNet(num_classes=num_classes)
        print(f"VGGNet Parameters: {sum(p.numel() for p in vgg_model.parameters()):,}")
        vgg_history = train_model(vgg_model, train_loader, val_loader, num_epochs=test_epochs)
        results['VGGNet'] = vgg_history['val_acc'][-1]
        plot_training_history(vgg_history, "VGGNet")
    except Exception as e:
        print(f"Error in VGGNet: {e}")
        results['VGGNet'] = 0

    # Part 2: Test NiN
    print("\n" + "="*60)
    print("Part 2: Testing Network in Network")
    print("="*60)
    try:
        nin_model = NiN(num_classes=num_classes)
        print(f"NiN Parameters: {sum(p.numel() for p in nin_model.parameters()):,}")
        nin_history = train_model(nin_model, train_loader, val_loader, num_epochs=test_epochs)
        results['NiN'] = nin_history['val_acc'][-1]
        plot_training_history(nin_history, "NiN")
    except Exception as e:
        print(f"Error in NiN: {e}")
        results['NiN'] = 0

    # Part 3: Test Inception
    print("\n" + "="*60)
    print("Part 3: Testing Inception Module")
    print("="*60)
    try:
        inception_model = SimpleGoogLeNet(num_classes=num_classes)
        print(f"GoogLeNet Parameters: {sum(p.numel() for p in inception_model.parameters()):,}")
        inception_history = train_model(inception_model, train_loader, val_loader, num_epochs=test_epochs)
        results['Inception'] = inception_history['val_acc'][-1]
        plot_training_history(inception_history, "Inception")
    except Exception as e:
        print(f"Error in Inception: {e}")
        results['Inception'] = 0

    # Part 4: Test ResNet
    print("\n" + "="*60)
    print("Part 4: Testing ResNet")
    print("="*60)
    try:
        resnet_model = ResNet18(num_classes=num_classes)
        print(f"ResNet18 Parameters: {sum(p.numel() for p in resnet_model.parameters()):,}")
        resnet_history = train_model(resnet_model, train_loader, val_loader, num_epochs=test_epochs)
        results['ResNet18'] = resnet_history['val_acc'][-1]
        plot_training_history(resnet_history, "ResNet18")
    except Exception as e:
        print(f"Error in ResNet: {e}")
        results['ResNet18'] = 0

    # Part 5: Test Transfer Learning
    print("\n" + "="*60)
    print("Part 5: Testing Transfer Learning")
    print("="*60)

    # Test feature extraction
    try:
        print("\nTesting Feature Extraction (frozen backbone)...")
        pretrained_frozen = get_pretrained_model('resnet18', num_classes=num_classes, feature_extract=True)
        frozen_history = train_model(pretrained_frozen, train_loader, val_loader, num_epochs=test_epochs)
        results['Transfer_Frozen'] = frozen_history['val_acc'][-1]
        plot_training_history(frozen_history, "Transfer Learning (Frozen)")
    except Exception as e:
        print(f"Error in Transfer Learning (Frozen): {e}")
        results['Transfer_Frozen'] = 0

    # Test fine-tuning
    try:
        print("\nTesting Fine-tuning (trainable backbone)...")
        pretrained_finetune = get_pretrained_model('resnet18', num_classes=num_classes, feature_extract=False)
        finetune_history = train_model(pretrained_finetune, train_loader, val_loader,
                                      num_epochs=test_epochs, lr=0.0001)
        results['Transfer_Finetune'] = finetune_history['val_acc'][-1]
        plot_training_history(finetune_history, "Transfer Learning (Fine-tune)")
    except Exception as e:
        print(f"Error in Transfer Learning (Fine-tune): {e}")
        results['Transfer_Finetune'] = 0

    # Print summary of results
    print("\n" + "="*60)
    print("RESULTS SUMMARY")
    print("="*60)
    for model_name, accuracy in results.items():
        print(f"{model_name:20s}: {accuracy:.2f}%")

    best_model = max(results, key=results.get)
    best_accuracy = results[best_model]
    print(f"\nBest Model: {best_model} with {best_accuracy:.2f}% validation accuracy")

    if best_accuracy > 94:
        print("\n🎉 BONUS ACHIEVED! Accuracy > 94%")
    else:
        print(f"\nKeep improving! Current best: {best_accuracy:.2f}% (Target: 94% for bonus)")

    print("\n" + "="*60)
    print("Assignment Complete!")
    print("Remember to submit: yourcomputingID_assignment_2.ipynb and .pdf")
    print("="*60)

Testing your CNN implementations on UVA Landmarks Dataset

Part 1: Testing VGGNet
VGGNet Parameters: 1,113,938

Epoch 1/5
------------------------------


Training:  58%|█████▊    | 52/90 [04:43<03:27,  5.46s/it]


KeyboardInterrupt: 
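
The efficient-architectures extension below states parameter and FLOP formulas for standard, depthwise, and pointwise convolutions. A small calculator that encodes them makes the trade-off concrete. It assumes H and W are the output map size, 2 FLOPs per multiply-add, and bias-free convs since BatchNorm follows. The 56×56, 128→128 layer and the helper names are hypothetical.

```python
def conv_cost(h, w, k, c_in, c_out, kind="standard", bias=False):
    """Return (params, FLOPs) for one conv layer with an h x w output map.

    kind: "standard", "depthwise" (groups=c_in, so c_out equals c_in),
    or "pointwise" (1x1). FLOPs count 2 per multiply-add.
    """
    if kind == "standard":
        params = k * k * c_in * c_out + (c_out if bias else 0)
        flops = 2 * h * w * k * k * c_in * c_out
    elif kind == "depthwise":
        params = k * k * c_in + (c_in if bias else 0)
        flops = 2 * h * w * k * k * c_in
    elif kind == "pointwise":
        params = c_in * c_out + (c_out if bias else 0)
        flops = 2 * h * w * c_in * c_out
    else:
        raise ValueError(f"unknown kind: {kind}")
    return params, flops


def size_kib(num_params, bytes_per_param=4):
    """Weight storage in KiB, assuming float32 (4 bytes per parameter)."""
    return num_params * bytes_per_param / 1024


# Hypothetical layer: 56x56 output map, 128 -> 128 channels, bias-free
H, W, C = 56, 56, 128
std = conv_cost(H, W, 3, C, C, "standard")
dw = conv_cost(H, W, 3, C, C, "depthwise")
pw = conv_cost(H, W, 1, C, C, "pointwise")
sep = (dw[0] + pw[0], dw[1] + pw[1])

print("standard :", std)   # (147456, 924844032)
print("separable:", sep)   # (17536, 109985792)
print(f"{size_kib(std[0]):.1f} KiB vs {size_kib(sep[0]):.1f} KiB")  # 576.0 KiB vs 68.5 KiB
```

The FLOP ratio equals the parameter ratio here only because both layers produce the same 56×56 output; the memory figures cover weights, not activations.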

In [None]:
"""
Assignment Extension: Memory-Efficient Architectures for Edge Deployment
========================================================================
Learning Objectives:
1. Implement depthwise separable convolutions (MobileNet)
2. Build inverted residual blocks (MobileNetV2)
3. Understand FLOPs vs parameters vs memory trade-offs
4. Design models for memory-constrained devices

Total Points: 20
- Depthwise Separable Conv implementation (5 pts)
- Inverted Residual Block implementation (5 pts)
- MobileNetV2 architecture (5 pts)
- Model efficiency analysis (5 pts)

IMPORTANT CONCEPTS:
- FLOPs (Floating Point Operations): Measure of computational cost
  Standard Conv: FLOPs = 2 × H × W × K² × C_in × C_out
  Depthwise Conv: FLOPs = 2 × H × W × K² × C_in
  Pointwise Conv: FLOPs = 2 × H × W × C_in × C_out

- Parameters: Number of trainable weights
  Standard Conv: params = K² × C_in × C_out + C_out (bias)
  Depthwise: params = K² × C_in + C_in
  Pointwise: params = C_in × C_out + C_out

- Memory: Storage needed for model weights (typically 4 bytes per float32 param)
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets, transforms, models
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import os
import zipfile
import urllib.request
from tqdm import tqdm
import time

# Set seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

# Download dataset function
def download_dataset():
    """Download and extract the UVA landmarks dataset."""
    url = "https://firebasestorage.googleapis.com/v0/b/uva-landmark-images.appspot.com/o/dataset.zip?alt=media&token=e1403951-30d6-42b8-ba4e-394af1a2ddb7"

    if not os.path.exists('dataset'):
        print("Downloading dataset...")
        urllib.request.urlretrieve(url, 'dataset.zip')
        print("Extracting dataset...")
        with zipfile.ZipFile('dataset.zip', 'r') as zip_ref:
            zip_ref.extractall('.')
        os.remove('dataset.zip')
    else:
        print("Dataset already exists.")

download_dataset()

# Dataset parameters
data_dir = "dataset/"
batch_size = 32
img_size = 224  # MobileNet/EfficientNet use 224x224
num_classes = 18

# Data transforms
transform_train = transforms.Compose([
    transforms.RandomResizedCrop(img_size, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

transform_val = transforms.Compose([
    transforms.Resize((img_size, img_size)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

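# Create datasets.
# random_split returns Subsets that share ONE underlying ImageFolder, so
# assigning `.dataset.transform` per Subset would let the val transform
# overwrite the train transform. Build two ImageFolder views instead and
# index both with the same seeded split.
train_base = datasets.ImageFolder(data_dir, transform=transform_train)
val_base = datasets.ImageFolder(data_dir, transform=transform_val)
train_size = int(0.8 * len(train_base))
val_size = len(train_base) - train_size
train_split, val_split = torch.utils.data.random_split(
    train_base, [train_size, val_size],
    generator=torch.Generator().manual_seed(42)
)
train_dataset = torch.utils.data.Subset(train_base, train_split.indices)
val_dataset = torch.utils.data.Subset(val_base, val_split.indices)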

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

print(f"Training samples: {len(train_dataset)}")
print(f"Validation samples: {len(val_dataset)}")

# ============================================================================
# Part 1: Depthwise Separable Convolution (5 points)
# ============================================================================

class DepthwiseSeparableConv(nn.Module):
    """
    TODO (5 pts): Implement depthwise separable convolution as used in MobileNet.

    This replaces a standard convolution with two operations:
    1. Depthwise convolution: Apply a single filter per input channel
    2. Pointwise convolution: 1x1 conv to combine channels

    Design Requirements:
    - Depthwise: Conv2d with groups=in_channels (each filter operates on one channel)
    - Pointwise: Conv2d with kernel_size=1 (combines information across channels)
    - Use BatchNorm2d after each convolution
    - Use ReLU6 activation (clamps output between 0 and 6, better for quantization)
    - No bias in conv layers when using BatchNorm (bias=False)

    Efficiency Analysis:
    - Standard 3x3 conv: 9 × C_in × C_out parameters
    - Depthwise + Pointwise: 9 × C_in + C_in × C_out parameters
    - Reduction factor: 1 / (1/C_out + 1/9), i.e. roughly 8-9x for typical channel counts

    Example: C_in=128, C_out=128
    - Standard: 9 × 128 × 128 = 147,456 params
    - DW+PW: 9 × 128 + 128 × 128 = 17,536 params (8.4x reduction!)
    """
    def __init__(self, in_channels, out_channels, stride=1):
        super(DepthwiseSeparableConv, self).__init__()

        # TODO: Implement depthwise convolution
        # Hint: Use groups=in_channels to make each filter operate on single channel
        # Structure: Conv2d -> BatchNorm2d -> ReLU6
        # Kernel size should be 3x3 with padding=1 to maintain spatial dimensions

        # TODO: Implement pointwise convolution
        # Hint: This is a 1x1 convolution that combines the depthwise outputs
        # Structure: Conv2d(kernel_size=1) -> BatchNorm2d -> ReLU6

        pass

    def forward(self, x):
        # TODO: Apply depthwise then pointwise convolution
        # Return the final output
        pass
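
# ----------------------------------------------------------------------------
# Sanity check (a sketch, not part of the TODO): for a kxk kernel the parameter
# ratio between a standard conv and depthwise + pointwise is
#     (k² * C_in * C_out) / (k² * C_in + C_in * C_out) = 1 / (1/C_out + 1/k²)
# so it does not depend on C_in and approaches k² (9x for 3x3) as C_out grows.
# `dw_separable_reduction` is a hypothetical helper, not a library function.
# ----------------------------------------------------------------------------
def dw_separable_reduction(c_in, c_out, k=3):
    standard = k * k * c_in * c_out          # bias-free standard conv weights
    separable = k * k * c_in + c_in * c_out  # depthwise + pointwise weights
    return standard / separable


for c_out in (32, 128, 512):
    print(f"C_out={c_out:4d}: {dw_separable_reduction(128, c_out):.2f}x fewer params")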


# ============================================================================
# Part 2: Inverted Residual Block (5 points)
# ============================================================================

class InvertedResidual(nn.Module):
    """
    TODO (5 pts): Implement the inverted residual block from MobileNetV2.

    Key Innovation: "Inverted" means we expand channels first, then compress
    Traditional residual: wide -> narrow -> wide
    Inverted residual: narrow -> wide -> narrow

    Architecture:
    1. Expansion layer: 1x1 conv to expand channels by expand_ratio
    2. Depthwise layer: 3x3 depthwise conv (may have stride for downsampling)
    3. Projection layer: 1x1 conv to project back to output channels
    4. Skip connection: Only when stride=1 AND in_channels=out_channels

    Design Requirements:
    - Expansion: Only if expand_ratio != 1 (skip if no expansion needed)
    - All layers use BatchNorm, but NO activation after final projection
    - Use ReLU6 for all activations except the last layer
    - hidden_dim = in_channels * expand_ratio

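    Design Notes:
    - Linear bottleneck (no activation after projection) avoids ReLU discarding
      information in the low-dimensional output
    - Skip connection only when input and output shapes match, since the
      element-wise addition requires identical shapes (no projection shortcut)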

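    FLOP Analysis for one block (H×W×C_in input, stride=1), counted as
    multiply-accumulates (MACs):
    - Expansion: H×W×C_in×(expand_ratio×C_in) MACs
    - Depthwise: H×W×9×(expand_ratio×C_in) MACs
    - Projection: H×W×(expand_ratio×C_in)×C_out MACs
    With stride=2, the depthwise and projection terms use the (H/2)×(W/2) output
    size. Multiply by 2 to get FLOPs as counted in estimate_flops.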
    """
    def __init__(self, in_channels, out_channels, stride=1, expand_ratio=6):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        hidden_dim = in_channels * expand_ratio

        # TODO: Determine when to use residual connection
        # Residual only when: stride=1 AND in_channels=out_channels
        # Store as self.use_residual

        # TODO: Build the layers list
        # If expand_ratio != 1:
        #   Add expansion layer: Conv2d(kernel=1) -> BN -> ReLU6
        # Always add:
        #   Depthwise: Conv2d(kernel=3, stride=stride, padding=1, groups=hidden_dim) -> BN -> ReLU6
        #   Projection: Conv2d(kernel=1) -> BN (NO activation here!)

        # TODO: Combine layers into self.conv = nn.Sequential(*layers)

        pass

    def forward(self, x):
        # TODO: Implement forward pass
        # If self.use_residual is True: return x + self.conv(x)
        # Otherwise: return self.conv(x)
        pass
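

# ----------------------------------------------------------------------------
# Sketch: per-block cost of an inverted residual, following the FLOP analysis
# in the InvertedResidual docstring. Assumptions: counts are multiply-
# accumulates (MACs), the depthwise kernel is 3x3 with padding=1, and
# BatchNorm/ReLU6 costs are ignored. inverted_residual_macs is a hypothetical
# helper for illustration only.
# ----------------------------------------------------------------------------


def inverted_residual_macs(h, w, c_in, c_out, expand_ratio=6, stride=1):
    """Return a dict of MACs for the expansion, depthwise, and projection layers."""
    hidden = c_in * expand_ratio
    # Output resolution of a 3x3 conv with padding=1 is ceil(size / stride)
    h_out = (h - 1) // stride + 1
    w_out = (w - 1) // stride + 1
    # The expansion runs at the input resolution and is skipped when t == 1
    expansion = h * w * c_in * hidden if expand_ratio != 1 else 0
    depthwise = h_out * w_out * 9 * hidden
    projection = h_out * w_out * hidden * c_out
    return {"expansion": expansion, "depthwise": depthwise,
            "projection": projection,
            "total": expansion + depthwise + projection}

# Usage: inverted_residual_macs(56, 56, 32, 32) gives 43,954,176 MACs in total.
# The two 1x1 convs account for about 88% of that and the depthwise conv for
# about 12%, which is why the 1x1 layers dominate MobileNetV2's compute.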


# ============================================================================
# Part 3: MobileNetV2 Architecture (5 points)
# ============================================================================

class MobileNetV2(nn.Module):
    """
    TODO (5 pts): Build a simplified MobileNetV2 architecture.

    Architecture Overview:
    1. Initial conv: 3 -> 32 channels, stride=2 (downsample to 112x112)
    2. Series of InvertedResidual blocks with specific configurations
    3. Final conv: expand to 1280 channels (wide feature layer)
    4. Global average pooling
    5. Classifier: Linear(1280, num_classes)

    Block Configuration:
    Each block has (in_channels, out_channels, stride, expand_ratio)
    - t=1: no expansion, t=6: 6x expansion
    - s=2: downsample spatial dimensions by 2

    Design Principles:
    - Width multiplier: Can scale all channels by a factor (0.5, 0.75, 1.0)
    - Resolution multiplier: Can use smaller input sizes (192, 160, 128)

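    Memory Considerations:
    - Peak activation memory occurs at the expanded (hidden_dim-channel) tensors
    - Skip connections add no parameters, but the block input must be kept in
      memory until the addition
    - Total params ≈ 2.2M for width_mult=1.0 with an 18-class classifier
      (≈ 3.5M with a 1000-class ImageNet head), vs. ResNet-50's ≈ 25.6M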

    Suggested Block Sequence (simplified):
    Stage 1: 32 -> 16, t=1, s=1
    Stage 2: 16 -> 24, t=6, s=2 (downsample)
             24 -> 24, t=6, s=1
    Stage 3: 24 -> 32, t=6, s=2 (downsample)
             32 -> 32, t=6, s=1 (repeat 2x)
    Stage 4: 32 -> 64, t=6, s=2 (downsample)
             64 -> 64, t=6, s=1 (repeat 3x)
    Stage 5: 64 -> 96, t=6, s=1
             96 -> 96, t=6, s=1 (repeat 2x)
    Stage 6: 96 -> 160, t=6, s=2 (downsample)
             160 -> 160, t=6, s=1 (repeat 2x)
    Stage 7: 160 -> 320, t=6, s=1
    """
    def __init__(self, num_classes=18, width_mult=1.0, dropout_prob=0.2):
        super(MobileNetV2, self).__init__()

        # TODO: Build the initial convolution layer
        # Conv2d(3, 32, kernel=3, stride=2, padding=1) -> BN -> ReLU6
        # This reduces 224x224 to 112x112

        # TODO: Build the InvertedResidual blocks
        # Create a nn.Sequential with all the blocks following the configuration above
        # Remember to apply width_mult to scale channel counts if needed

        # TODO: Build the final convolution layer
        # Conv2d(320, 1280, kernel=1) -> BN -> ReLU6
        # This creates a wide feature layer for better classification

        # TODO: Add global average pooling
        # Use nn.AdaptiveAvgPool2d((1, 1)) to handle any input size

        # TODO: Add dropout for regularization (optional but recommended)
        # nn.Dropout(p=dropout_prob) before the classifier

        # TODO: Add the final classifier
        # nn.Linear(1280, num_classes)

        # TODO: Initialize weights properly
        # Call self._initialize_weights()

        pass

    def forward(self, x):
        # TODO: Implement the forward pass
        # Pass through: features -> avgpool -> flatten -> dropout -> classifier
        pass

    def _initialize_weights(self):
        """Initialize model weights for better training."""
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)
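

# ----------------------------------------------------------------------------
# Sketch: expanding the "Suggested Block Sequence" from the MobileNetV2
# docstring into per-block arguments (in_channels, out_channels, stride,
# expand_ratio) and applying width_mult. Assumption: channels are rounded to a
# multiple of 8 with a rule modeled on the one torchvision's MobileNetV2 uses.
# The names MOBILENETV2_STAGES, make_divisible, and expand_block_config are
# hypothetical.
# ----------------------------------------------------------------------------

# (expand_ratio t, out_channels c, number of blocks n, stride of first block s)
MOBILENETV2_STAGES = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def make_divisible(value, divisor=8):
    """Round value to the nearest multiple of divisor, never dropping >10%."""
    new_value = max(divisor, int(value + divisor / 2) // divisor * divisor)
    if new_value < 0.9 * value:
        new_value += divisor
    return new_value


def expand_block_config(stages=MOBILENETV2_STAGES, width_mult=1.0,
                        stem_channels=32, input_size=224):
    """Return (block_args, final_channels, final_spatial_size)."""
    in_ch = make_divisible(stem_channels * width_mult)
    size = (input_size - 1) // 2 + 1  # stem conv: 3x3, stride 2, padding 1
    blocks = []
    for t, c, n, s in stages:
        out_ch = make_divisible(c * width_mult)
        for i in range(n):
            stride = s if i == 0 else 1  # only a stage's first block downsamples
            blocks.append((in_ch, out_ch, stride, t))
            size = (size - 1) // stride + 1
            in_ch = out_ch
    return blocks, in_ch, size

# Usage: expand_block_config() yields 17 blocks ending at 320 channels on a
# 7x7 feature map for a 224x224 input. Each tuple matches the argument order of
# InvertedResidual(in_channels, out_channels, stride, expand_ratio).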


# ============================================================================
# Part 4: Model Efficiency Analysis (5 points)
# ============================================================================

def count_parameters(model):
    """Count total and trainable parameters."""
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total_params, trainable_params
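

# ----------------------------------------------------------------------------
# Sketch: total and trainable counts diverge once layers are frozen, as in
# transfer learning. The toy two-layer model and the helper name
# freeze_all_but_last are hypothetical.
# ----------------------------------------------------------------------------
import torch.nn as nn


def freeze_all_but_last(model):
    """Set requires_grad=False on every parameter except the last child's."""
    children = list(model.children())
    for child in children[:-1]:
        for p in child.parameters():
            p.requires_grad = False
    return model

# Usage: for nn.Sequential(nn.Linear(10, 5), nn.Linear(5, 2)), the total stays
# at 67 parameters while the trainable count drops from 67 to 12 after freezing.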

def get_model_size_mb(model):
    """Calculate model size in MB (assuming float32 weights)."""
    param_size = 0
    for param in model.parameters():
        param_size += param.nelement() * param.element_size()

    buffer_size = 0
    for buffer in model.buffers():
        buffer_size += buffer.nelement() * buffer.element_size()

    size_all_mb = (param_size + buffer_size) / (1024 ** 2)
    return size_all_mb
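

# ----------------------------------------------------------------------------
# Sketch: why get_model_size_mb also walks model.buffers(). BatchNorm keeps its
# running statistics as buffers (not parameters), and num_batches_tracked is
# an int64 scalar, so element_size() is not always 4 bytes. The helper name
# tensor_bytes is hypothetical.
# ----------------------------------------------------------------------------
import torch.nn as nn


def tensor_bytes(tensors):
    """Total storage in bytes for an iterable of tensors."""
    return sum(t.nelement() * t.element_size() for t in tensors)

# Usage, for bn = nn.BatchNorm2d(64):
#   tensor_bytes(bn.parameters()) is 512 (weight + bias, 64 float32 values each)
#   tensor_bytes(bn.buffers()) is 520 (running_mean + running_var + one int64)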

def estimate_flops(model, input_shape=(1, 3, 224, 224)):
    """
    TODO (Bonus): Estimate FLOPs for your model.

    Simplified FLOP counting:
    - Conv2d: 2 × H_out × W_out × K² × C_in × C_out / groups
    - Linear: 2 × in_features × out_features
    - BatchNorm: ~2 × H_out × W_out × C (a scale and shift per element; often ignored)

    This is complex to implement fully, so a simplified version is fine.
    You can also use existing libraries like thop or ptflops if available.
    """
    # Optional implementation
    total_flops = 0
    # Hint: You'd need to track input/output shapes through the network
    return total_flops
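

# ----------------------------------------------------------------------------
# Sketch: a hook-based take on estimate_flops that uses the simplified rules in
# its docstring (2 FLOPs per multiply-accumulate for Conv2d and Linear).
# BatchNorm, activations, and pooling are ignored, counts are per sample, and
# each Conv2d/Linear is assumed to run once per forward pass.
# estimate_flops_with_hooks is hypothetical. Libraries such as thop or ptflops
# handle more layer types.
# ----------------------------------------------------------------------------
import torch
import torch.nn as nn


def estimate_flops_with_hooks(model, input_shape=(1, 3, 224, 224)):
    """Approximate FLOPs for one sample by hooking Conv2d and Linear layers."""
    flops = 0

    def conv_hook(module, inputs, output):
        nonlocal flops
        kh, kw = module.kernel_size
        out_elems = output[0].numel()  # C_out * H_out * W_out for one sample
        flops += 2 * out_elems * (module.in_channels // module.groups) * kh * kw

    def linear_hook(module, inputs, output):
        nonlocal flops
        flops += 2 * module.in_features * module.out_features

    handles = []
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            handles.append(m.register_forward_hook(conv_hook))
        elif isinstance(m, nn.Linear):
            handles.append(m.register_forward_hook(linear_hook))

    was_training = model.training
    model.eval()
    device = next(model.parameters()).device
    with torch.no_grad():
        model(torch.zeros(input_shape, device=device))
    for handle in handles:  # remove hooks so later forward passes are unaffected
        handle.remove()
    model.train(was_training)
    return flops

# Usage: for nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.Flatten(),
# nn.Linear(512, 10)) on a (1, 3, 8, 8) input, this returns 27,648 + 10,240.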

def measure_inference_time(model, input_shape=(1, 3, 224, 224), num_runs=100):
    """Measure average inference time in milliseconds."""
    model.eval()
    device = next(model.parameters()).device
    dummy_input = torch.randn(input_shape).to(device)

    # Warm up (important for accurate timing)
    for _ in range(10):
        with torch.no_grad():
            _ = model(dummy_input)

    # Synchronize only if the model runs on CUDA (GPU kernels launch asynchronously)
    use_cuda = device.type == 'cuda'
    if use_cuda:
        torch.cuda.synchronize()

    # Time the inference with a monotonic, high-resolution clock
    start_time = time.perf_counter()
    with torch.no_grad():
        for _ in range(num_runs):
            _ = model(dummy_input)

    if use_cuda:
        torch.cuda.synchronize()

    end_time = time.perf_counter()
    avg_time = (end_time - start_time) / num_runs * 1000  # Convert to ms
    return avg_time
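

# ----------------------------------------------------------------------------
# Sketch: an alternative timer based on torch.utils.benchmark, offered here as
# a different technique from measure_inference_time for comparison.
# Timer.blocked_autorange() chooses the number of runs itself and should
# synchronize CUDA when it is available. The reported median is in seconds.
# Timer defaults to num_threads=1, so CPU timings may be slower than the loop
# above. The helper name measure_inference_time_benchmark is hypothetical.
# ----------------------------------------------------------------------------
import torch
import torch.utils.benchmark as benchmark


def measure_inference_time_benchmark(model, input_shape=(1, 3, 224, 224)):
    """Median inference time in milliseconds."""
    model.eval()
    device = next(model.parameters()).device
    dummy_input = torch.randn(input_shape, device=device)

    def run():
        with torch.no_grad():
            model(dummy_input)

    timer = benchmark.Timer(stmt="run()", globals={"run": run})
    return timer.blocked_autorange(min_run_time=0.5).median * 1000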

# ============================================================================
# Training Functions (Provided - No TODO)
# ============================================================================

def train_epoch(model, dataloader, criterion, optimizer, device):
    """Train for one epoch."""
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    progress_bar = tqdm(dataloader, desc="Training")
    for batch_idx, (inputs, labels) in enumerate(progress_bar, start=1):
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

        # Update progress bar with the running mean loss over batches seen so far
        # (batch_idx is exact; progress_bar.n can lag behind the loop)
        progress_bar.set_postfix({
            'loss': running_loss / batch_idx,
            'acc': 100. * correct / total
        })

    return running_loss / len(dataloader), 100. * correct / total

def evaluate(model, dataloader, criterion, device):
    """Evaluate model on validation/test set."""
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, labels in tqdm(dataloader, desc="Evaluating"):
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            running_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

    return running_loss / len(dataloader), 100. * correct / total

# ============================================================================
# Visualization Functions (Provided - No TODO)
# ============================================================================

def plot_model_comparison(models_dict):
    """Compare efficiency metrics of different models."""
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))

    model_names = list(models_dict.keys())
    params_list = []
    size_list = []
    time_list = []

    for name, model in models_dict.items():
        total_params, _ = count_parameters(model)
        params_list.append(total_params / 1e6)  # Convert to millions
        size_list.append(get_model_size_mb(model))
        time_list.append(measure_inference_time(model.to(device)))

    # Plot 1: Parameters
    axes[0, 0].bar(model_names, params_list, color='blue', alpha=0.7)
    axes[0, 0].set_ylabel('Parameters (Millions)')
    axes[0, 0].set_title('Model Parameters Comparison')
    axes[0, 0].tick_params(axis='x', rotation=45)
    axes[0, 0].grid(True, axis='y')

    # Plot 2: Model Size
    axes[0, 1].bar(model_names, size_list, color='green', alpha=0.7)
    axes[0, 1].set_ylabel('Size (MB)')
    axes[0, 1].set_title('Model Size (Parameters + Buffers)')
    axes[0, 1].tick_params(axis='x', rotation=45)
    axes[0, 1].grid(True, axis='y')

    # Plot 3: Inference Time
    axes[1, 0].bar(model_names, time_list, color='red', alpha=0.7)
    axes[1, 0].set_ylabel('Time (ms)')
    axes[1, 0].set_title('Inference Time (Lower is Better)')
    axes[1, 0].tick_params(axis='x', rotation=45)
    axes[1, 0].grid(True, axis='y')

    # Plot 4: Parameters vs. inference time trade-off
    axes[1, 1].scatter(params_list, time_list, s=100, alpha=0.7)
    for i, name in enumerate(model_names):
        axes[1, 1].annotate(name, (params_list[i], time_list[i]),
                           fontsize=8, ha='right')
    axes[1, 1].set_xlabel('Parameters (Millions)')
    axes[1, 1].set_ylabel('Inference Time (ms)')
    axes[1, 1].set_title('Efficiency Trade-off (Lower-left is Better)')
    axes[1, 1].grid(True)

    plt.tight_layout()
    plt.show()

def plot_training_curves(history):
    """Plot training and validation curves."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))

    # Loss curves
    axes[0].plot(history['train_loss'], label='Train Loss')
    axes[0].plot(history['val_loss'], label='Val Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].set_title('Training Progress - Loss')
    axes[0].legend()
    axes[0].grid(True)

    # Accuracy curves
    axes[1].plot(history['train_acc'], label='Train Acc')
    axes[1].plot(history['val_acc'], label='Val Acc')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Accuracy (%)')
    axes[1].set_title('Training Progress - Accuracy')
    axes[1].legend()
    axes[1].grid(True)

    plt.tight_layout()
    plt.show()

# ============================================================================
# Main Execution - Test Your Implementation
# ============================================================================

if __name__ == "__main__":
    print("="*80)
    print("Testing Your Efficient Architecture Implementation")
    print("="*80)

    # Test your implementations
    try:
        # Test DepthwiseSeparableConv
        print("\n1. Testing DepthwiseSeparableConv...")
        dw_conv = DepthwiseSeparableConv(32, 64)
        test_input = torch.randn(1, 32, 56, 56)
        output = dw_conv(test_input)
        print(f"   Input shape: {test_input.shape}")
        print(f"   Output shape: {output.shape}")
        print(f"   ✓ DepthwiseSeparableConv working!")
    except Exception as e:
        print(f"   ✗ Error in DepthwiseSeparableConv: {e}")

    try:
        # Test InvertedResidual
        print("\n2. Testing InvertedResidual...")
        inv_res = InvertedResidual(32, 32, stride=1, expand_ratio=6)
        test_input = torch.randn(1, 32, 56, 56)
        output = inv_res(test_input)
        print(f"   Input shape: {test_input.shape}")
        print(f"   Output shape: {output.shape}")
        print(f"   ✓ InvertedResidual working!")
    except Exception as e:
        print(f"   ✗ Error in InvertedResidual: {e}")

    try:
        # Test MobileNetV2
        print("\n3. Testing MobileNetV2...")
        mobilenet = MobileNetV2(num_classes=num_classes)
        test_input = torch.randn(1, 3, 224, 224)
        output = mobilenet(test_input)
        print(f"   Input shape: {test_input.shape}")
        print(f"   Output shape: {output.shape}")

        # Analyze model
        total_params, trainable_params = count_parameters(mobilenet)
        model_size = get_model_size_mb(mobilenet)
        print(f"   Total parameters: {total_params:,}")
        print(f"   Trainable parameters: {trainable_params:,}")
        print(f"   Model size: {model_size:.2f} MB")
        print(f"   ✓ MobileNetV2 working!")

    except Exception as e:
        print(f"   ✗ Error in MobileNetV2: {e}")

    # Compare with other models
    print("\n" + "="*80)
    print("Model Comparison")
    print("="*80)

    try:
        # Create models for comparison
        models_to_compare = {
            'Your MobileNetV2': MobileNetV2(num_classes=num_classes),
            'ResNet18': models.resnet18(num_classes=num_classes),
            # weights=None builds torchvision's architecture with random (not pretrained) weights
            'torchvision MobileNetV2': models.mobilenet_v2(weights=None, num_classes=num_classes)
        }

        # Compare models
        for name, model in models_to_compare.items():
            total_params, _ = count_parameters(model)
            size_mb = get_model_size_mb(model)
            print(f"{name:20s}: {total_params/1e6:.2f}M params, {size_mb:.2f} MB")

        # Visualize comparison
        plot_model_comparison(models_to_compare)

    except Exception as e:
        print(f"Error in model comparison: {e}")

    # Train your model (optional - takes time)
    print("\n" + "="*80)
    print("Training Your MobileNetV2")
    print("="*80)

    train_model = input("Do you want to train your MobileNetV2? (y/n): ")

    if train_model.lower() == 'y':
        try:
            model = MobileNetV2(num_classes=num_classes, dropout_prob=0.2)
            model = model.to(device)

            # Setup training
            criterion = nn.CrossEntropyLoss()
            optimizer = optim.Adam(model.parameters(), lr=0.001)
            scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

            # Training loop
            num_epochs = 5
            history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}

            for epoch in range(num_epochs):
                print(f"\nEpoch {epoch+1}/{num_epochs}")
                print("-" * 30)

                train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
                val_loss, val_acc = evaluate(model, val_loader, criterion, device)

                history['train_loss'].append(train_loss)
                history['train_acc'].append(train_acc)
                history['val_loss'].append(val_loss)
                history['val_acc'].append(val_acc)

                print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%")
                print(f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")

                scheduler.step()

            # Plot training curves
            plot_training_curves(history)

            print(f"\nFinal Validation Accuracy: {history['val_acc'][-1]:.2f}%")

            # Success criteria
            if history['val_acc'][-1] > 80:
                print("✓ Great job! Your model achieves good accuracy while being efficient!")
            elif history['val_acc'][-1] > 70:
                print("✓ Good start! Try fine-tuning hyperparameters or training longer.")
            else:
                print("Keep working! Check your implementation and try different settings.")

        except Exception as e:
            print(f"Error during training: {e}")

    print("\n" + "="*80)
    print("Assignment Complete!")
    print("="*80)
    print("\nKey Takeaways:")
    print("1. Depthwise separable convolutions reduce parameters by ~8-9x")
    print("2. Inverted residuals with linear bottlenecks preserve information")
    print("3. MobileNetV2 achieves ResNet-level accuracy with 10x fewer parameters")
    print("4. Efficient models are crucial for edge deployment (phones, IoT, etc.)")
    print("="*80)