<a href="https://colab.research.google.com/github/lucianoselimaj/MLDL_Labs/blob/main/Pytorch_AlexNet_exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### PyTorch AlexNet Exercises

Welcome to the PyTorch AlexNet exercise template notebook.

This notebook poses several questions; your goal is to answer them by writing Python and PyTorch code.

What we'll cover:
1. Define the full AlexNet architecture
2. Implement the training and validation loops
3. Add result tracking and reporting
4. (Optional) Add model saving and decay experiments
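
Item 3 above asks for result tracking and reporting. One possible approach, offered as a minimal sketch rather than the notebook's own solution, is to keep per-epoch metrics in a dictionary keyed by configuration. The names `history`, `record`, `best_config`, and `report` are hypothetical and do not appear elsewhere in the notebook.

```python
from collections import defaultdict

# Hypothetical tracker, not part of the notebook: maps each
# (learning_rate, batch_size) pair to a list of per-epoch metric dicts.
history = defaultdict(list)


def record(lr, batch_size, epoch, train_loss, train_acc, val_loss, val_acc):
    """Append one epoch's metrics for a given configuration."""
    history[(lr, batch_size)].append({
        "epoch": epoch,
        "train_loss": train_loss,
        "train_acc": train_acc,
        "val_loss": val_loss,
        "val_acc": val_acc,
    })


def best_config(history):
    """Return ((lr, batch_size), best_val_acc) over all recorded epochs."""
    best_key, best_acc = None, float("-inf")
    for key, epochs in history.items():
        top = max(e["val_acc"] for e in epochs)
        if top > best_acc:
            best_key, best_acc = key, top
    return best_key, best_acc


def report(history):
    """Print one summary line per configuration."""
    for (lr, batch_size), epochs in sorted(history.items()):
        last = epochs[-1]
        top = max(e["val_acc"] for e in epochs)
        print(f"LR={lr:<7} BS={batch_size:<3} "
              f"final val acc={last['val_acc']:.2f}% | best val acc={top:.2f}%")
```

Inside the epoch loop, a call such as `record(lr, batch_size, epoch + 1, running_loss / len(train_loader), train_acc, val_loss / len(val_loader), val_acc)` placed after the validation phase would populate the tracker, and `report(history)` after the grid search would summarize it.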


In [2]:
# Core library for ML
import torch
import torch.nn as nn
import torch.optim as optim  # Optimizers
# Standard datasets and transforms for data preprocessing
from torchvision import datasets, transforms, models
# Batched data loading
from torch.utils.data import DataLoader
import numpy as np
import time

import os
import zipfile
import urllib.request
import shutil
from collections import defaultdict


# Define dataset directory
dataset_dir = './tiny-imagenet-200'
zip_path = './tiny-imagenet-200.zip'
url = 'http://cs231n.stanford.edu/tiny-imagenet-200.zip'



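# --------------------------------------------------
# Step 1: Download the dataset if it doesn't exist
# --------------------------------------------------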
if not os.path.exists(dataset_dir):
    print("Downloading Tiny ImageNet dataset...")
    urllib.request.urlretrieve(url, zip_path)

    print("Extracting dataset...")
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall()

    print("Dataset ready!")
else:
    print("Tiny ImageNet already downloaded.")


# --------------------------------------------------
# Step 2: Reorganize Validation Folder (only first time)
# --------------------------------------------------

val_dir = os.path.join(dataset_dir, 'val')
val_images_dir = os.path.join(val_dir, 'images')
val_annotations_file = os.path.join(val_dir, 'val_annotations.txt')

# Only reorganize if not already done
if not os.path.exists(os.path.join(val_dir, 'n01443537')):
    print("🔧 Reorganizing validation folder...")
    val_annotations = defaultdict(list)

    with open(val_annotations_file, 'r') as f:
        for line in f:  # Iterate lazily instead of loading every line at once
            parts = line.strip().split('\t')
            filename, class_name = parts[0], parts[1]
            val_annotations[class_name].append(filename)

    for class_name, filenames in val_annotations.items():
        class_dir = os.path.join(val_dir, class_name, 'images')
        os.makedirs(class_dir, exist_ok=True)
        for fname in filenames:
            src = os.path.join(val_images_dir, fname)
            dst = os.path.join(class_dir, fname)
            shutil.move(src, dst)

    shutil.rmtree(val_images_dir)
    print("✅ Validation folder reorganized.")



# Define a custom AlexNet architecture using PyTorch
class AlexNet(nn.Module):
    def __init__(self, num_classes=200):  # Tiny ImageNet has 200 classes
        super().__init__()  # Initialize the parent nn.Module class

        # Define the convolutional feature extractor part of AlexNet
        self.features = nn.Sequential(
            # First convolutional layer: input has 3 channels (RGB), output 64 feature maps
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),  # Apply ReLU activation function

            # Max pooling layer to reduce spatial dimensions (size and complexity)
            nn.MaxPool2d(kernel_size=3, stride=2),

            # Second convolutional layer: 64 input channels, 192 output filters
            nn.Conv2d(in_channels=64, out_channels=192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),  # Apply ReLU activation
            nn.MaxPool2d(kernel_size=3, stride=2),  # Again, reduce dimensions

            # Third convolutional layer: increases depth to 384 feature maps
            nn.Conv2d(in_channels=192, out_channels=384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),

            # Fourth convolutional layer: reduces to 256 feature maps
            nn.Conv2d(in_channels=384, out_channels=256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),

            # Fifth convolutional layer: maintains 256 filters
            nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),

            # Final pooling: yields a 6x6 spatial feature map for 224x224 inputs
            nn.MaxPool2d(kernel_size=3, stride=2)
        )

        # Define the fully connected layers (classifier part)
        self.classifier = nn.Sequential(
            nn.Dropout(),  # Dropout: randomly deactivate neurons (for regularization)

            # First fully connected layer: input = flattened feature map (256×6×6), output = 4096 neurons
            nn.Linear(in_features=256 * 6 * 6, out_features=4096),
            nn.ReLU(inplace=True),  # ReLU activation

            nn.Dropout(),  # Another dropout layer for regularization

            # Second fully connected layer: 4096 → 4096 neurons
            nn.Linear(in_features=4096, out_features=4096),
            nn.ReLU(inplace=True),

            # Final fully connected layer: maps to num_classes outputs (200 for Tiny ImageNet)
            nn.Linear(in_features=4096, out_features=num_classes)
        )

    # Define the forward pass: how data moves through the network
    def forward(self, x):
        x = self.features(x)              # Pass input through convolutional layers
        x = torch.flatten(x, 1)           # Flatten output (preserve batch dim)
        x = self.classifier(x)            # Pass through fully connected layers
        return x                          # Output: class scores (logits)
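

# Sanity check (an added sketch, not part of the original exercise).
# The first Linear layer expects 256 * 6 * 6 inputs. Applying the standard
# output-size formula floor((n + 2*padding - kernel) / stride) + 1 to the
# layer settings above shows that a 224x224 input ends at 6x6, while a native
# 64x64 Tiny ImageNet image would shrink to 1x1 and not fit the classifier.
# `_out_size` and `_feature_map_size` are hypothetical helper names.
def _out_size(n, kernel, stride=1, padding=0):
    return (n + 2 * padding - kernel) // stride + 1


def _feature_map_size(n):
    n = _out_size(n, kernel=11, stride=4, padding=2)  # conv1
    n = _out_size(n, kernel=3, stride=2)              # pool1
    n = _out_size(n, kernel=5, padding=2)             # conv2
    n = _out_size(n, kernel=3, stride=2)              # pool2
    n = _out_size(n, kernel=3, padding=1)             # conv3
    n = _out_size(n, kernel=3, padding=1)             # conv4
    n = _out_size(n, kernel=3, padding=1)             # conv5
    return _out_size(n, kernel=3, stride=2)           # pool3


assert _feature_map_size(224) == 6  # Matches in_features=256 * 6 * 6
assert _feature_map_size(64) == 1   # Un-resized input would be too small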


# List of learning rates to test in your experiments
learning_rates = [0.1, 0.001, 0.0001]
# List of batch sizes to test
batch_sizes = [16, 32, 64]
# Number of epochs to train the model for each combination
num_epochs = 10


# Define the image transformations (resizing, normalization)
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize to 224x224 so the conv stack ends at 6x6, matching the classifier
    transforms.ToTensor(),  # Convert PIL images to PyTorch tensors
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # Normalize using ImageNet means
                         std=[0.229, 0.224, 0.225])   # And standard deviations
])


train_dir = os.path.join(dataset_dir, 'train')
val_dir = os.path.join(dataset_dir, 'val')
# Load the training dataset (ImageFolder expects one subfolder per class)
train_dataset = datasets.ImageFolder(train_dir, transform=transform)
# Load the validation dataset
val_dataset = datasets.ImageFolder(val_dir, transform=transform)


# Loop over all combinations of learning rate and batch size
for lr in learning_rates:
    for batch_size in batch_sizes:
        # Print current configuration being trained
        print(f"\n🔁 Training AlexNet | LR={lr}, Batch Size={batch_size}")

        # Create DataLoader for training set
        # batch_size controls how many images are processed per step
        # shuffle=True to randomly mix training data every epoch
        train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

        # Create DataLoader for validation set
        # shuffle=False since we just need performance stats
        val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

        # Initialize a new instance of AlexNet
        # .cuda() moves the model to the GPU for faster training (requires a CUDA-capable device)
        net = AlexNet(num_classes=200).cuda()

        # Define the loss function: CrossEntropyLoss is used for multi-class classification
        criterion = nn.CrossEntropyLoss()

        # Define the optimizer: SGD with momentum and weight decay
        # lr: learning rate (from loop), momentum improves convergence speed
        # weight_decay adds L2 regularization to prevent overfitting
        optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)

        # Main training loop: run for num_epochs epochs
        for epoch in range(num_epochs):
            net.train()  # Set the model to training mode
            running_loss = 0.0  # Accumulate total loss per epoch
            correct = 0         # Count correct predictions
            total = 0           # Count total predictions

            # Iterate over all training batches
            for inputs, targets in train_loader:
                # Move input images and labels to GPU
                inputs, targets = inputs.cuda(), targets.cuda()

                optimizer.zero_grad()      # Clear previous gradients
                outputs = net(inputs)      # Forward pass through the model
                loss = criterion(outputs, targets)  # Compute loss
                loss.backward()            # Backpropagation (compute gradients)
                optimizer.step()           # Update model weights

                running_loss += loss.item()  # Add current batch loss to total

                # Get predictions by choosing class with highest logit
                _, predicted = outputs.max(1)

                # Update total number of examples and correct predictions
                total += targets.size(0)
                correct += predicted.eq(targets).sum().item()

            # Compute training accuracy after all batches
            train_acc = 100. * correct / total

            # Print training loss and accuracy for this epoch
            print(f"Epoch [{epoch+1}/{num_epochs}] → "
                  f"Train Loss: {running_loss/len(train_loader):.4f} | "
                  f"Train Acc: {train_acc:.2f}%")

            # ----------------- Validation Phase -----------------
            net.eval()  # Set model to evaluation mode (disables dropout; this model has no batch norm layers)
            val_loss = 0.0
            val_correct = 0
            val_total = 0

            with torch.no_grad():  # No gradient tracking during validation
                for inputs, targets in val_loader:
                    inputs, targets = inputs.cuda(), targets.cuda()
                    outputs = net(inputs)              # Forward pass
                    loss = criterion(outputs, targets) # Compute loss
                    val_loss += loss.item()            # Accumulate loss

                    # Get predicted classes
                    _, predicted = outputs.max(1)

                    # Update total and correct predictions
                    val_total += targets.size(0)
                    val_correct += predicted.eq(targets).sum().item()

            # Calculate validation accuracy
            val_acc = 100. * val_correct / val_total

            # Print validation loss and accuracy
            print(f"              Val Loss: {val_loss/len(val_loader):.4f} | "
                  f"Val Acc: {val_acc:.2f}%")

        # After all epochs, save the trained weights (state_dict) to a file
        model_name = f"alexnet_lr{lr}_bs{batch_size}.pth"
        torch.save(net.state_dict(), model_name)
        print(f"✅ Saved: {model_name}")



Tiny ImageNet already downloaded.
🔧 Reorganizing validation folder...
✅ Validation folder reorganized.

🔁 Training AlexNet | LR=0.1, Batch Size=16
Epoch [1/10] → Train Loss: 5.3136 | Train Acc: 0.43%
              Val Loss: 5.3112 | Val Acc: 0.50%
Epoch [2/10] → Train Loss: 5.3133 | Train Acc: 0.46%
              Val Loss: 5.3117 | Val Acc: 0.50%
Epoch [3/10] → Train Loss: 5.3137 | Train Acc: 0.50%
              Val Loss: 5.3126 | Val Acc: 0.50%
Epoch [4/10] → Train Loss: 5.3134 | Train Acc: 0.49%
              Val Loss: 5.3109 | Val Acc: 0.50%
Epoch [5/10] → Train Loss: 5.3139 | Train Acc: 0.48%
              Val Loss: 5.3076 | Val Acc: 0.50%
Epoch [6/10] → Train Loss: 5.3130 | Train Acc: 0.50%
              Val Loss: 5.3129 | Val Acc: 0.50%
Epoch [7/10] → Train Loss: 5.3134 | Train Acc: 0.46%
              Val Loss: 5.3124 | Val Acc: 0.50%
Epoch [8/10] → Train Loss: 5.3128 | Train Acc: 0.48%
              Val Loss: 5.3159 | Val Acc: 0.50%
Epoch [9/10] → Train Loss: 5.3132 | Train Acc

KeyboardInterrupt: 
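
The losses logged above sit close to ln(200), and 0.50% accuracy equals 1/200. Both are what a classifier guessing uniformly over 200 classes would score. The short computation below, a sketch that assumes only the class count used in the notebook, makes that baseline explicit.

```python
import math

num_classes = 200  # Tiny ImageNet class count, as used in the notebook

# Cross-entropy of a uniform prediction: -log(1 / num_classes) = log(num_classes)
chance_loss = math.log(num_classes)

# Expected accuracy (in percent) when every class is equally likely
chance_acc = 100.0 / num_classes

print(f"Chance-level loss: {chance_loss:.4f}")      # 5.2983
print(f"Chance-level accuracy: {chance_acc:.2f}%")  # 0.50%
```

A run that stays at these values, as the LR=0.1, batch size 16 run does in the truncated log above, is probably producing near-uniform predictions. The log alone cannot confirm the cause, but a lower learning rate is a common first thing to try.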