# Lab 1: CNN Robustness Report - Student Notebook

## Objective
Build a comprehensive CNN robustness report by:
1. Training a baseline CNN model
2. Detecting overfitting from training curves
3. Applying two regularization techniques (L2 weight decay and Dropout) and comparing them against the baseline
4. Running adversarial attacks (FGSM)
5. Generating a final robustness report

## Deliverables
Your final report must include:
- ✅ Clean accuracy (baseline model)
- ✅ Noisy accuracy (Gaussian noise perturbation)
- ✅ Adversarial accuracy (FGSM attack)
- ✅ Model size (MB)
- ✅ Inference time (ms per image)
- ✅ Observations (overfitting analysis, regularization effectiveness, robustness assessment)

## Instructions
- **🔴 RED CODE**: Code you need to complete (fill in the blanks)
- **🟢 GREEN CODE**: Complete code already provided
- **📝 COMMENTS**: Explain what each section does
- **💡 HINTS**: Refer to Part 1, 2, 3 notebooks from class

## Section 1: Setup & Data Loading

In [None]:
# 🟢 GREEN CODE - Complete (Run this as-is)
import os
import time
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from torchvision.models import resnet18
from tqdm import tqdm

np.random.seed(42)
torch.manual_seed(42)

print("✅ Libraries imported")

In [None]:
# 🟢 GREEN CODE - Device and paths
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Device: {device}")

# Update DATASET_PATH to the dataset location on your own machine
DATASET_PATH = r"C:\Users\Lucifer\python_workspace\BITS\AI_Quality_Engineering\dataset"
TRAIN_PATH = os.path.join(DATASET_PATH, "train")
VAL_PATH = os.path.join(DATASET_PATH, "val")
TEST_PATH = os.path.join(DATASET_PATH, "test")

In [None]:
# 🟢 GREEN CODE - Data transformations and loading
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

train_dataset = ImageFolder(TRAIN_PATH, transform=transform)
val_dataset = ImageFolder(VAL_PATH, transform=transform)
test_dataset = ImageFolder(TEST_PATH, transform=transform)

class_names = train_dataset.classes
num_classes = len(class_names)

print(f"Classes: {class_names}")
print(f"Train: {len(train_dataset)} | Val: {len(val_dataset)} | Test: {len(test_dataset)}")

In [None]:
# 🔴 RED CODE - TODO: Create data loaders
# Create train_loader, val_loader, test_loader with batch_size=64
# Use shuffle=True for train, shuffle=False for val/test

BATCH_SIZE = 64
train_loader = DataLoader(  # TODO: Complete this line
    
)
val_loader = DataLoader(  # TODO: Complete this line
    
)
test_loader = DataLoader(  # TODO: Complete this line
    
)

print(f"✅ Data loaders created with batch size {BATCH_SIZE}")
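
The following is a minimal sketch of how `DataLoader` batching and shuffling behave, using a small synthetic `TensorDataset` in place of the lab's `ImageFolder` datasets. The names `toy_dataset`, `toy_train_loader`, and `toy_eval_loader` and the batch size of 4 are hypothetical and chosen only for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for an image dataset: 10 samples of shape (3, 8, 8).
toy_images = torch.zeros(10, 3, 8, 8)
toy_labels = torch.arange(10)
toy_dataset = TensorDataset(toy_images, toy_labels)

# Shuffle only the training loader; keep evaluation order fixed so that
# validation and test metrics are reproducible.
toy_train_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True)
toy_eval_loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)

# len(loader) is the number of batches, not the number of samples.
print(len(toy_train_loader))  # 3 batches: 4 + 4 + 2 samples

eval_labels = torch.cat([labels for _, labels in toy_eval_loader])
print(eval_labels.tolist())  # dataset order is preserved when shuffle=False
```

Because `len(loader)` counts batches, dividing a summed per-batch loss by `len(loader)` gives the mean loss per batch, which is how `train_epoch()` and `evaluate()` in this notebook compute `avg_loss`.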

## Section 2: Define Models with Regularization

In [None]:
# 🟢 GREEN CODE - Baseline model (no regularization)
class BaselineModel(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
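        # weights=None gives random initialization; it replaces the
        # deprecated pretrained=False (requires torchvision >= 0.13)
        self.resnet = resnet18(weights=None)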
        self.resnet.fc = nn.Linear(self.resnet.fc.in_features, num_classes)
    
    def forward(self, x):
        return self.resnet(x)

print("✅ BaselineModel defined")

In [None]:
# 🔴 RED CODE - TODO: Create L2 Regularized Model
# Add L2 regularization using weight_decay in optimizer
# This model should be identical to BaselineModel
# The difference is in how we create the optimizer (will add weight_decay)

class L2RegularizedModel(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # TODO: Create ResNet-18 model
        # Hint: Same as BaselineModel
        pass
    
    def forward(self, x):
        # TODO: Implement forward pass
        pass

print("✅ L2RegularizedModel defined")
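
To make the effect of `weight_decay` concrete, here is a small sketch that isolates it on a single scalar weight. It deliberately swaps in `optim.SGD` for the notebook's `optim.Adam`, because the SGD update can be checked by hand; Adam also adds `weight_decay * w` to the gradient, but then rescales the step using its moment estimates. The learning rate and decay values are hypothetical.

```python
import torch
import torch.optim as optim

# One scalar weight and a loss whose gradient with respect to it is zero,
# so any change in the weight comes from weight decay alone.
w = torch.nn.Parameter(torch.tensor([1.0]))
optimizer = optim.SGD([w], lr=0.1, weight_decay=0.5)

optimizer.zero_grad()
loss = (w * 0.0).sum()
loss.backward()
optimizer.step()

# SGD update with weight decay: w <- w - lr * (grad + weight_decay * w)
#                                  = 1.0 - 0.1 * (0.0 + 0.5 * 1.0) = 0.95
print(w.item())
```

The weight shrinks toward zero even though the loss gradient is zero, which is why the model class itself does not need to change: the regularization lives entirely in the optimizer.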

In [None]:
# 🔴 RED CODE - TODO: Create Dropout Model
# Add Dropout layer before final classification

class DropoutModel(nn.Module):
    def __init__(self, num_classes, dropout_rate=0.5):
        super().__init__()
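        # weights=None gives random initialization; it replaces the
        # deprecated pretrained=False (requires torchvision >= 0.13)
        self.resnet = resnet18(weights=None)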
        # TODO: Replace final layer with sequential containing Dropout + Linear
        # Use nn.Sequential(nn.Dropout(dropout_rate), nn.Linear(...))
        self.resnet.fc = ___  # TODO: Complete this
    
    def forward(self, x):
        # TODO: Implement forward pass
        pass

print("✅ DropoutModel defined")
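
The sketch below shows how a Dropout layer behaves differently in train and eval mode, which is why `model.train()` and `model.eval()` matter for this model. The `head` module and its layer sizes (16 inputs, 4 outputs) are hypothetical and are not the ResNet-18 dimensions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical classification head: Dropout followed by Linear.
head = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(16, 4))
features = torch.ones(2, 16)

head.train()
# In train mode each forward pass samples a new dropout mask; surviving
# activations are scaled by 1 / (1 - p) to keep the expected value unchanged.
dropped = head[0](features)
print(dropped.unique().tolist())  # each value is 0.0 (dropped) or 2.0 (kept, scaled)

head.eval()
# In eval mode dropout is the identity, so repeated passes are deterministic.
out_a = head(features)
out_b = head(features)
print(torch.equal(out_a, out_b))  # True
```

Since dropout is active only in train mode, training accuracy for a dropout model is measured with noise while validation accuracy is not, which can make the train-val gap look smaller than it would with dropout disabled.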

## Section 3: Training & Evaluation Functions

In [None]:
# 🟢 GREEN CODE - Training function (complete)
def train_epoch(model, loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    for images, labels in tqdm(loader, desc="Training", leave=False):
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    
    avg_loss = running_loss / len(loader)
    accuracy = 100 * correct / total
    return avg_loss, accuracy

print("✅ train_epoch() defined")

In [None]:
# 🔴 RED CODE - TODO: Complete evaluation function
def evaluate(model, loader, criterion, device):
    model.eval()
    correct = 0
    total = 0
    running_loss = 0.0
    
    with torch.no_grad():
        for images, labels in tqdm(loader, desc="Evaluating", leave=False):
            images, labels = images.to(device), labels.to(device)
            # TODO: Forward pass
            outputs = ___
            # TODO: Calculate loss
            loss = ___
            
            running_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    avg_loss = running_loss / len(loader)
    accuracy = 100 * correct / total
    return avg_loss, accuracy

print("✅ evaluate() defined")

## Section 4: Train All Three Models

In [None]:
# 🔴 RED CODE - TODO: Train Baseline Model
# Create model, optimizer, criterion and train for 30 epochs

print("\n" + "="*60)
print("Training BASELINE Model (No Regularization)")
print("="*60)

baseline_model = BaselineModel(num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer_baseline = optim.Adam(baseline_model.parameters(), lr=0.001)  # TODO: Add weight_decay=0 (or no weight decay)

baseline_history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}
NUM_EPOCHS = 30

# TODO: Complete the training loop
for epoch in range(NUM_EPOCHS):
    train_loss, train_acc = train_epoch(baseline_model, train_loader, criterion, optimizer_baseline, device)
    val_loss, val_acc = evaluate(baseline_model, val_loader, criterion, device)
    
    baseline_history['train_loss'].append(train_loss)
    baseline_history['train_acc'].append(train_acc)
    baseline_history['val_loss'].append(val_loss)
    baseline_history['val_acc'].append(val_acc)
    
    # TODO: Print epoch results (Epoch [X/30] Train: XXX% Val: XXX%)
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{NUM_EPOCHS}] Train Acc: {train_acc:.2f}% | Val Acc: {val_acc:.2f}%")

print("✅ Baseline model trained")

In [None]:
# 🔴 RED CODE - TODO: Train L2 Regularized Model
print("\n" + "="*60)
print("Training L2 REGULARIZED Model (Weight Decay = 0.001)")
print("="*60)

l2_model = L2RegularizedModel(num_classes).to(device)
# TODO: Create optimizer with weight_decay=0.001
optimizer_l2 = optim.Adam(l2_model.parameters(), lr=0.001, weight_decay=___)  # TODO: What weight_decay value?

l2_history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}

# TODO: Complete the training loop (similar to baseline)
for epoch in range(NUM_EPOCHS):
    # TODO: Train and evaluate
    # TODO: Update history
    # TODO: Print every 10 epochs
    pass

print("✅ L2 Regularized model trained")

In [None]:
# 🔴 RED CODE - TODO: Train Dropout Model
print("\n" + "="*60)
print("Training DROPOUT Model (Dropout Rate = 0.5)")
print("="*60)

dropout_model = DropoutModel(num_classes, dropout_rate=0.5).to(device)
optimizer_dropout = optim.Adam(dropout_model.parameters(), lr=0.001)

dropout_history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}

# TODO: Complete the training loop
for epoch in range(NUM_EPOCHS):
    # TODO: Implement (same pattern as above)
    pass

print("✅ Dropout model trained")

## Section 5: Detect Overfitting

In [None]:
# 🔴 RED CODE - TODO: Plot training curves and analyze overfitting
# Create a figure with 3 subplots (one for each model)
# Each subplot should show train and val accuracy curves

fig, axes = plt.subplots(1, 3, figsize=(16, 5))
fig.suptitle('Overfitting Analysis: Baseline vs L2 vs Dropout', fontsize=16, fontweight='bold')

models_data = [
    ('Baseline', baseline_history, axes[0]),
    ('L2 Regularized', l2_history, axes[1]),
    ('Dropout', dropout_history, axes[2])
]

# TODO: Complete the plotting loop
for model_name, history, ax in models_data:
    # TODO: Plot train and val accuracy
    # TODO: Set labels, title, legend
    # TODO: Add grid
    pass

plt.tight_layout()
plt.savefig('overfitting_comparison.png', dpi=150)
plt.show()

print("✅ Overfit curves plotted")

In [None]:
# 🔴 RED CODE - TODO: Calculate overfitting metrics
# For each model, calculate the gap between final train and val accuracy

overfitting_analysis = {}

# Baseline
baseline_gap = baseline_history['train_acc'][-1] - baseline_history['val_acc'][-1]
overfitting_analysis['Baseline'] = baseline_gap

# L2 Regularized
# TODO: Calculate L2 gap
l2_gap = ___
overfitting_analysis['L2 Regularized'] = l2_gap

# Dropout
# TODO: Calculate Dropout gap
dropout_gap = ___
overfitting_analysis['Dropout'] = dropout_gap

print("\n📊 OVERFITTING ANALYSIS (Train-Val Gap):")
for model_name, gap in overfitting_analysis.items():
    print(f"{model_name:20s}: {gap:6.2f}% gap", end="")
    if gap > 10:
        print(" ⚠️  HIGH OVERFITTING")
    elif gap > 5:
        print(" ⚠️  MODERATE OVERFITTING")
    else:
        print(" ✅ GOOD GENERALIZATION")
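
As a worked illustration of the train-val gap, the sketch below computes it on made-up accuracy curves. The helper `train_val_gap` and its `last_k` parameter are hypothetical additions, not part of the lab: averaging over the last few epochs is one possible way to make the gap less sensitive to a single noisy epoch.

```python
def train_val_gap(history, last_k=1):
    """Mean train-minus-val accuracy over the last `last_k` epochs."""
    train = history['train_acc'][-last_k:]
    val = history['val_acc'][-last_k:]
    return sum(train) / len(train) - sum(val) / len(val)

# Hypothetical accuracy curves (percent), not results from this lab.
toy_history = {
    'train_acc': [60.0, 80.0, 95.0, 99.0],
    'val_acc':   [58.0, 75.0, 82.0, 83.0],
}

print(train_val_gap(toy_history))            # 16.0 (final epoch only)
print(train_val_gap(toy_history, last_k=2))  # 14.5 (mean of last two epochs)
```

With the thresholds used in the cell above, both values would be flagged as high overfitting because they exceed 10 percentage points.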

## Section 6: Model Complexity Analysis

In [None]:
# 🟢 GREEN CODE - Helper functions
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def get_model_size_mb(model):
    torch.save(model.state_dict(), "temp_model.pth")
    size_mb = os.path.getsize("temp_model.pth") / (1024 * 1024)
    os.remove("temp_model.pth")
    return size_mb

print("✅ Helper functions defined")

In [None]:
# 🔴 RED CODE - TODO: Calculate model sizes and parameters
model_complexity = {}

# TODO: For each model, calculate parameters and size
for model_name, model in [('Baseline', baseline_model), ('L2 Regularized', l2_model), ('Dropout', dropout_model)]:
    params = count_parameters(model)
    size = get_model_size_mb(model)
    model_complexity[model_name] = {'params': params, 'size': size}
    print(f"{model_name}: {params:,} parameters | {size:.2f} MB")

## Section 7: Robustness Testing - Noise and Adversarial Attacks

In [None]:
# 🔴 RED CODE - TODO: Implement Gaussian Noise Perturbation
def add_gaussian_noise(images, noise_std=0.1):
    """
    Add Gaussian noise to images.
    
    Args:
        images: Tensor of shape (B, C, H, W)
        noise_std: Standard deviation of Gaussian noise
    
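    Returns:
        Noisy images, clipped to the valid range of normalized pixels
    """
    # TODO: Create random noise with torch.randn_like
    # TODO: Add noise to images
    noise = ___
    noisy_images = ___
    # Clip to the valid range. The inputs are ImageNet-normalized, so the
    # per-channel bounds are (0 - mean) / std and (1 - mean) / std, not [-1, 1].
    mean = torch.tensor([0.485, 0.456, 0.406], device=images.device).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225], device=images.device).view(1, 3, 1, 1)
    lower, upper = (0 - mean) / std, (1 - mean) / std
    return torch.maximum(torch.minimum(noisy_images, upper), lower)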

print("✅ Gaussian noise function defined")
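
Because the transform in Section 1 normalizes with ImageNet statistics, pixel values 0 and 1 do not map to -1 and 1. The sketch below computes the actual per-channel bounds and applies noise-then-clip in that normalized space. The `toy_images` batch is hypothetical random data, and the noise level 0.1 is only an example.

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

# Pixel values 0 and 1 map to these per-channel bounds after normalization.
lower = (0.0 - mean) / std
upper = (1.0 - mean) / std
print(lower.flatten().tolist())  # roughly [-2.12, -2.04, -1.80]
print(upper.flatten().tolist())  # roughly [2.25, 2.43, 2.64]

torch.manual_seed(0)
# Hypothetical batch of already-normalized images.
toy_images = (torch.rand(4, 3, 8, 8) - mean) / std
noisy = toy_images + 0.1 * torch.randn_like(toy_images)
# Clip each channel to its own valid range (bounds broadcast over B, H, W).
noisy = torch.maximum(torch.minimum(noisy, upper), lower)
```

Note that a noise standard deviation of 0.1 in normalized space corresponds to roughly 0.1 × 0.225 ≈ 0.02 in raw pixel units, so the perturbation is smaller than it may first appear.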

In [None]:
# 🔴 RED CODE - TODO: Implement FGSM Attack
def fgsm_attack(model, images, labels, device, epsilon=0.05):
    """
    Fast Gradient Sign Method (FGSM) Attack.
    
    Perturb images in the direction of the gradient to fool the model.
    
    Args:
        model: Neural network
        images: Input images
        labels: True labels
        device: Device to run on
        epsilon: Attack strength
    
    Returns:
        Adversarial images
    """
    # TODO: Enable gradient tracking for images
    images.requires_grad = ___
    
    # Forward pass
    outputs = model(images)
    loss = nn.CrossEntropyLoss()(outputs, labels)
    
    # TODO: Compute gradients
    model.zero_grad()
    loss.backward()
    
    # TODO: Get gradient sign and create adversarial examples
    data_grad = images.grad.data
    sign_data_grad = ___
    
    # TODO: Create perturbed images
    perturbed_images = ___
    
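    # Clip to the valid range of ImageNet-normalized pixels: per channel,
    # (0 - mean) / std to (1 - mean) / std (not [-1, 1])
    mean = torch.tensor([0.485, 0.456, 0.406], device=images.device).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225], device=images.device).view(1, 3, 1, 1)
    lower, upper = (0 - mean) / std, (1 - mean) / std
    return torch.maximum(torch.minimum(perturbed_images, upper), lower).detach()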

print("✅ FGSM attack function defined")
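
The FGSM update can be illustrated on a toy problem. The sketch below uses a hypothetical linear classifier (`toy_model`) and random inputs rather than the lab's ResNet-18 and images; it shows the gradient being taken with respect to the input, the signed step of size `epsilon`, and the resulting increase in loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny classifier and inputs, used only to illustrate the update.
toy_model = nn.Linear(8, 3)
x = torch.randn(5, 8)
y = torch.tensor([0, 1, 2, 0, 1])
epsilon = 0.05
loss_fn = nn.CrossEntropyLoss()

# Track gradients with respect to the input rather than the weights.
x_adv = x.clone().requires_grad_(True)
loss = loss_fn(toy_model(x_adv), y)
toy_model.zero_grad()
loss.backward()

# FGSM step: move every input element by epsilon in the direction that
# increases the loss.
x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

with torch.no_grad():
    clean_loss = loss_fn(toy_model(x), y).item()
    adv_loss = loss_fn(toy_model(x_adv), y).item()
print(f"clean loss: {clean_loss:.4f} | adversarial loss: {adv_loss:.4f}")
```

The attack needs gradients, so the adversarial examples cannot be generated inside a `torch.no_grad()` block; only the final prediction on the already-perturbed images can.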

In [None]:
# 🔴 RED CODE - TODO: Evaluate robustness on noisy images
def evaluate_on_noisy(model, loader, device, noise_std=0.1):
    """Evaluate model on Gaussian noise-perturbed images"""
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():
        for images, labels in tqdm(loader, desc=f"Testing noise (σ={noise_std})", leave=False):
            images, labels = images.to(device), labels.to(device)
            # TODO: Add Gaussian noise
            noisy_images = ___
            # TODO: Get predictions
            outputs = ___
            _, predicted = torch.max(outputs, 1)
            # TODO: Update accuracy
            total += ___
            correct += ___
    
    accuracy = 100 * correct / total
    return accuracy

print("✅ Noisy evaluation function defined")

In [None]:
# 🔴 RED CODE - TODO: Evaluate robustness against FGSM attacks
def evaluate_on_adversarial(model, loader, device, epsilon=0.05):
    """Evaluate model on FGSM adversarial examples"""
    model.eval()
    correct = 0
    total = 0
    
    for images, labels in tqdm(loader, desc=f"Testing FGSM (ε={epsilon})", leave=False):
        images, labels = images.to(device), labels.to(device)
        # TODO: Generate adversarial examples
        adv_images = ___
        # TODO: Evaluate on adversarial images
        # TODO: Update accuracy
        pass
    
    accuracy = 100 * correct / total
    return accuracy

print("✅ Adversarial evaluation function defined")

In [None]:
# 🔴 RED CODE - TODO: Test all models on clean, noisy, and adversarial data
print("\n" + "="*70)
print("ROBUSTNESS EVALUATION")
print("="*70)

robustness_results = {}

for model_name, model in [('Baseline', baseline_model), ('L2 Regularized', l2_model), ('Dropout', dropout_model)]:
    print(f"\n{model_name}:")
    
    # Clean accuracy
    clean_loss, clean_acc = evaluate(model, test_loader, criterion, device)
    # TODO: Get noisy accuracy
    noisy_acc = ___
    # TODO: Get adversarial accuracy
    adv_acc = ___
    
    robustness_results[model_name] = {
        'clean': clean_acc,
        'noisy': noisy_acc,
        'adversarial': adv_acc
    }
    
    print(f"  Clean Accuracy:       {clean_acc:.2f}%")
    print(f"  Noisy Accuracy (σ=0.1): {noisy_acc:.2f}%")
    print(f"  Adversarial Accuracy (ε=0.05): {adv_acc:.2f}%")

## Section 8: Generate Final Report

In [None]:
# 🔴 RED CODE - TODO: Create comprehensive robustness report
print("\n" + "="*80)
print("🎯 CNN ROBUSTNESS REPORT")
print("="*80)

print("\n📊 ACCURACY METRICS:")
print("-" * 80)
print(f"{'Model':<25} {'Clean':<15} {'Noisy':<15} {'Adversarial':<15}")
print("-" * 80)

# TODO: Print results for each model
for model_name in ['Baseline', 'L2 Regularized', 'Dropout']:
    # TODO: Get results and print in table format
    pass

print("\n⏱️  INFERENCE TIME & MODEL SIZE:")
print("-" * 80)
# TODO: Print inference time and model size for each model
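
The notebook does not define a helper for the inference-time deliverable, so here is one possible sketch. The function `measure_inference_ms`, its `warmup` and `runs` parameters, and the toy model are all hypothetical; the approach (warm-up passes, `torch.no_grad()`, GPU synchronization before reading the clock) is a common pattern rather than a prescribed method for this lab.

```python
import time
import torch
import torch.nn as nn

def measure_inference_ms(model, sample_batch, device, warmup=2, runs=10):
    """Average forward-pass time in milliseconds per image."""
    model.eval()
    sample_batch = sample_batch.to(device)
    with torch.no_grad():
        for _ in range(warmup):       # warm-up passes are not timed
            model(sample_batch)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU work to finish
        start = time.perf_counter()
        for _ in range(runs):
            model(sample_batch)
        if device.type == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return 1000 * elapsed / (runs * sample_batch.size(0))

# Hypothetical small model and batch, for illustration only.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4))
toy_batch = torch.randn(16, 3, 8, 8)
ms_per_image = measure_inference_ms(toy_model, toy_batch, torch.device("cpu"))
print(f"{ms_per_image:.4f} ms per image")
```

Per-image time depends on batch size and hardware, so report both alongside the number when comparing the three models.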

In [None]:
# 🔴 RED CODE - TODO: Model complexity table (parameters and size)
print("\n📏 MODEL COMPLEXITY:")
print("-" * 80)
print(f"{'Model':<25} {'Parameters':<20} {'Size (MB)':<15}")
print("-" * 80)

# TODO: Print parameters and size for each model
for model_name in ['Baseline', 'L2 Regularized', 'Dropout']:
    params = model_complexity[model_name]['params']
    size = model_complexity[model_name]['size']
    # TODO: Print in table format
    pass

In [None]:
# 🔴 RED CODE - TODO: Write your observations
print("\n💡 OBSERVATIONS & ANALYSIS:")
print("="*80)

observations = """
Please answer the following questions:

1. OVERFITTING ANALYSIS:
   - Which model shows the least overfitting? Why?
   - TODO: Write your observation here
   
2. REGULARIZATION EFFECTIVENESS:
   - How effective is L2 regularization compared to Dropout?
   - TODO: Write your observation here
   
3. ROBUSTNESS ASSESSMENT:
   - Which model is most robust to adversarial attacks?
   - What's your hypothesis for why?
   - TODO: Write your observation here
   
4. ACCURACY-COMPLEXITY TRADEOFF:
   - Is there a correlation between model size and accuracy?
   - TODO: Write your observation here
   
5. RECOMMENDATIONS:
   - Which model would you deploy in production and why?
   - TODO: Write your recommendation here
"""

print(observations)

# TODO: Write your answers below
YOUR_OBSERVATIONS = """
"""

print(YOUR_OBSERVATIONS)

In [None]:
# 🔴 RED CODE - TODO: Create final summary visualization
# Create a bar chart comparing clean, noisy, and adversarial accuracy for all 3 models

fig, ax = plt.subplots(figsize=(10, 6))

# TODO: Prepare data for plotting
# Hint: Use model names as x-axis, and plot 3 bars per model

# TODO: Create grouped bar chart
# Note: You'll need to offset the bars for each accuracy type

ax.set_xlabel('Model', fontsize=12, fontweight='bold')
ax.set_ylabel('Accuracy (%)', fontsize=12, fontweight='bold')
ax.set_title('Robustness Comparison: Clean vs Noisy vs Adversarial', fontsize=14, fontweight='bold')
ax.legend(['Clean', 'Noisy', 'Adversarial'], fontsize=11)
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('robustness_summary.png', dpi=150)
plt.show()

print("✅ Summary visualization created")
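
The bar offsets mentioned in the hints can be sketched as follows. The model names and accuracy values in `toy_results` are hypothetical placeholders, not results from this lab, and the bar width of 0.25 is an arbitrary choice that leaves a gap between groups.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical accuracies (percent), not results from this lab.
toy_results = {
    'Model A': {'clean': 90.0, 'noisy': 80.0, 'adversarial': 40.0},
    'Model B': {'clean': 88.0, 'noisy': 82.0, 'adversarial': 45.0},
}
metrics = ['clean', 'noisy', 'adversarial']
x = np.arange(len(toy_results))  # one group per model
width = 0.25

fig, ax = plt.subplots(figsize=(8, 5))
for i, metric in enumerate(metrics):
    values = [toy_results[name][metric] for name in toy_results]
    # Shift each metric's bars so the three bars of a group are centred on x.
    ax.bar(x + (i - 1) * width, values, width, label=metric.capitalize())

ax.set_xticks(x)
ax.set_xticklabels(list(toy_results))
ax.set_ylabel('Accuracy (%)')
ax.legend()
```

Passing `label=` to each `ax.bar` call ties every legend entry to its own bars, so the legend stays correct even if the plotting order changes.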

## Submission Checklist

Before submitting, ensure your report includes:

- [ ] **Clean Accuracy**: Baseline model test accuracy
- [ ] **Noisy Accuracy**: Accuracy with Gaussian noise (σ=0.1)
- [ ] **Adversarial Accuracy**: Accuracy with FGSM attack (ε=0.05)
- [ ] **Model Size**: Size in MB for all 3 models
- [ ] **Inference Time**: Per-image inference time
- [ ] **Overfitting Analysis**: Explanation of train-val gap for each model
- [ ] **Regularization Comparison**: Which technique worked better?
- [ ] **Robustness Assessment**: Which model is most robust and why?
- [ ] **Visualizations**: Training curves and robustness comparison chart
- [ ] **Observations**: Detailed answers to all 5 questions above

## Questions & Tips

- If stuck, refer to Part 1, 2, and 3 notebooks from class
- Look for `# TODO:` comments to find missing code
- Use the solution notebook only as a last resort!
- Good luck! 🚀