# DATASCI 503, Group Work 12: Fashion MNIST with Regularization

**Instructions:** During lab section, and afterward as necessary, you will collaborate in two-person teams to complete the problems that are interspersed below. The GSI will help individual teams encountering difficulty, make announcements addressing common issues, and help ensure progress for all teams. **During lab, feel free to flag down your GSI to ask questions at any point!** Upon completion, one member of the team should submit their team's work through Canvas **as HTML**.

In this assignment, we will apply deep learning to the Fashion MNIST dataset.
The Fashion MNIST dataset is a popular benchmark dataset for machine learning and computer vision, often used as a drop-in replacement for the original MNIST dataset of handwritten digits. Fashion MNIST is composed of 70,000 grayscale images in total, each having a resolution of 28x28 pixels. The dataset is divided into 60,000 training images and 10,000 testing images. Each image depicts an item of clothing or accessory, categorized into one of ten classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, and Ankle boot.

This dataset was designed to provide a more challenging classification task than the original MNIST digits, largely because the images are more complex and the differences between classes are more subtle. Each image has a corresponding label, making Fashion MNIST suitable for image classification and for testing computer vision models. It serves as a standardized dataset for evaluating and comparing the performance of machine learning algorithms.

For more details on Fashion MNIST, see the [official documentation](https://github.com/zalandoresearch/fashion-mnist).

PyTorch, via torchvision, makes it easy to load the Fashion MNIST dataset. Run the cell below to download the data and create training, validation, and test dataloaders.

In [None]:
import numpy as np
import seaborn as sns
import torch
import torchvision
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix
from torch import nn, optim
from torch.utils.data import DataLoader, random_split
from torchvision import transforms

# Set random seed for reproducibility
torch.manual_seed(42)

# Define a transform that converts images to tensors (this scales pixel values to [0, 1])
transform = transforms.Compose([transforms.ToTensor()])

# Download the full training dataset
full_train_dataset = torchvision.datasets.FashionMNIST(
    root="./data", train=True, download=True, transform=transform
)

# Define the sizes
train_size = int(0.7 * len(full_train_dataset))
val_size = len(full_train_dataset) - train_size

# Split the dataset into training and validation sets
train_dataset, val_dataset = random_split(full_train_dataset, [train_size, val_size])

# Download the test dataset
test_dataset = torchvision.datasets.FashionMNIST(
    root="./data", train=False, download=True, transform=transform
)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=1024, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=1024, shuffle=False)

# Loss function to use throughout
criterion = nn.CrossEntropyLoss()

We provide you with the following function for training your model.

In [None]:
def train_model(model, train_loader, val_loader, optimizer, num_epochs=50):
    """Train a model and return training and validation losses.

    Args:
        model: PyTorch model to train
        train_loader: DataLoader for training data
        val_loader: DataLoader for validation data
        optimizer: PyTorch optimizer
        num_epochs: Number of training epochs

    Returns:
        Tuple of (training_losses, val_losses) lists
    """
    val_losses = []
    training_losses = []

    for epoch in range(num_epochs):
        model.train()
        train_loss = 0
        for batch_x, batch_y in train_loader:
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()

        train_loss /= len(train_loader)
        training_losses.append(train_loss)

        model.eval()
        val_loss = 0
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                outputs = model(batch_x)
                loss = criterion(outputs, batch_y)
                val_loss += loss.item()

        val_loss /= len(val_loader)

        val_losses.append(val_loss)
        if (epoch + 1) % 10 == 0:
            print(
                f"Epoch {epoch + 1}/{num_epochs}, "
                f"Training Loss: {train_loss:.4f}, "
                f"Validation Loss: {val_loss:.4f}"
            )

    return training_losses, val_losses

### Problem 1: Demonstrating Underfitting

Create a model that underfits the data, and train it for 50 epochs. Using the `train_model` function above, demonstrate that the training loss drops steadily while the validation loss remains similar (within 0.05) to the training loss throughout training. This behavior indicates that the model lacks sufficient capacity to fully capture the patterns in the data.

**Hint:** Use a very small hidden layer (e.g., 2 neurons) to severely limit model capacity.

After training, create a plot showing training and validation losses over epochs.

In [None]:
# BEGIN SOLUTION
# Create a model with very limited capacity (only 2 hidden neurons)
# This will cause underfitting since the model can't learn complex patterns
model_underfit = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 2), nn.ReLU(), nn.Linear(2, 10))

optimizer = optim.Adam(model_underfit.parameters(), lr=0.001)

underfit_training_losses, underfit_validation_losses = train_model(
    model_underfit, train_loader, val_loader, optimizer, num_epochs=50
)
# END SOLUTION

In [None]:
# BEGIN SOLUTION
# Plot shows training and validation losses tracking closely together
plt.figure(figsize=(8, 5))
plt.plot(underfit_training_losses, label="Training Loss")
plt.plot(underfit_validation_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Underfitting: Training and Validation Losses")
plt.legend()
plt.show()
# END SOLUTION

In [None]:
# Test assertions
# Check that training and validation losses are within 0.05 of each other for most epochs
loss_differences = np.abs(np.array(underfit_training_losses) - np.array(underfit_validation_losses))
proportion_close = np.mean(loss_differences < 0.05)
assert (
    proportion_close > 0.9
), f"Expected 90%+ of epochs to have losses within 0.05, got {proportion_close:.1%}"
assert (
    len(underfit_training_losses) == 50
), f"Expected 50 epochs, got {len(underfit_training_losses)}"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify the model has limited capacity (small hidden layer)
model_params = sum(p.numel() for p in model_underfit.parameters())
assert model_params < 2000, "Model has too many parameters for underfitting demo"
# Verify final losses are still relatively high (underfitting)
assert underfit_training_losses[-1] > 0.5, "Training loss too low for underfitting demo"
assert underfit_validation_losses[-1] > 0.5, "Validation loss too low for underfitting demo"
# END HIDDEN TESTS

### Problem 2: Demonstrating Overfitting

Now demonstrate overfitting with a larger model. During training, the training loss should drop steadily while the validation loss should drop initially and then begin increasing. This divergence between training and validation loss is the hallmark of overfitting.

**Hint:** Use a larger model with more hidden neurons (e.g., 128 neurons per layer).
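
The pattern described above can be checked numerically as well as visually. The following is a minimal sketch, not part of the assignment: the helper `summarize_curves` and the toy loss values are hypothetical, chosen only to illustrate a validation curve that bottoms out and then rises while the training curve keeps falling.

```python
def summarize_curves(training_losses, validation_losses):
    """Summarize how far the validation curve has drifted from the training curve."""
    # Index of the epoch with the lowest validation loss (0-based)
    best_idx = min(range(len(validation_losses)), key=validation_losses.__getitem__)
    return {
        "best_epoch": best_idx + 1,  # report as a 1-based epoch number
        "best_val_loss": validation_losses[best_idx],
        # Gap between validation and training loss at the end of training
        "final_gap": validation_losses[-1] - training_losses[-1],
        # How much the validation loss rose after its minimum
        "val_rise": validation_losses[-1] - validation_losses[best_idx],
    }


# Made-up losses for illustration only (not real training output)
toy_train = [0.60, 0.45, 0.35, 0.28, 0.22, 0.17]
toy_val = [0.62, 0.50, 0.44, 0.43, 0.46, 0.51]

summary = summarize_curves(toy_train, toy_val)
print(summary)
```

A large positive `final_gap` together with a positive `val_rise` is the signature of overfitting; once you have real curves, you can pass your own loss lists to the same helper.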

After training, create a plot showing training and validation losses over epochs.

In [None]:
# BEGIN SOLUTION
# Create a larger model that will overfit the training data
model_overfit = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
optimizer = optim.Adam(model_overfit.parameters(), lr=0.001)

overfit_training_losses, overfit_validation_losses = train_model(
    model_overfit, train_loader, val_loader, optimizer, num_epochs=50
)
# END SOLUTION

In [None]:
# BEGIN SOLUTION
# Plot shows training loss decreasing while validation loss increases (overfitting)
plt.figure(figsize=(8, 5))
plt.plot(overfit_training_losses, label="Training Loss")
plt.plot(overfit_validation_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Overfitting: Training and Validation Losses Diverge")
plt.legend()
plt.show()
# END SOLUTION

In [None]:
# Test assertions
# Check for overfitting: training loss should be much lower than validation loss at end
final_gap = overfit_validation_losses[-1] - overfit_training_losses[-1]
assert final_gap > 0.1, f"Expected significant gap between losses, got {final_gap:.3f}"
# Check that validation loss reaches its minimum before epoch 40 (so it rises afterward)
min_val_idx = np.argmin(overfit_validation_losses)
assert min_val_idx < 40, "Validation loss should reach minimum before epoch 40 for overfitting"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify that validation loss at end is higher than at minimum
assert overfit_validation_losses[-1] > min(
    overfit_validation_losses
), "Validation loss should increase after minimum"
# Verify training loss is low (model is memorizing)
assert overfit_training_losses[-1] < 0.2, "Training loss should be low when overfitting"
# END HIDDEN TESTS

### Problem 3: Early Stopping

Early stopping is a regularization technique that stops training once the validation loss stops improving. After each epoch, it compares the new validation loss to the best validation loss seen so far. If the new loss is not lower than the best value by at least a threshold (`min_delta`), a counter is incremented; otherwise the best value is updated and the counter resets to zero. When the counter reaches a given limit (called `patience`), training stops.

Modify the training function to implement early stopping. Your function should:
1. Track the best validation loss seen so far
2. Increment a counter when the validation loss does not improve by at least `min_delta`
3. Stop training when the counter reaches `patience`
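
The three steps above can be sketched in isolation, without any model or data. This is only an illustration of the counter logic: the function name `early_stopping_epoch` and the loss values are hypothetical, and in the actual problem the same logic belongs inside the training loop.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.01):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best_val_loss = float("inf")
    patience_counter = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best_val_loss - min_delta:
            # Improvement of at least min_delta: record it and reset the counter
            best_val_loss = val_loss
            patience_counter = 0
        else:
            # No sufficient improvement this epoch
            patience_counter += 1
        if patience_counter >= patience:
            return epoch
    return None


# Made-up validation losses for illustration only
example_losses = [0.60, 0.50, 0.45, 0.449, 0.46, 0.47, 0.48, 0.44, 0.50]
stop_epoch = early_stopping_epoch(example_losses, patience=3, min_delta=0.01)
print(stop_epoch)
```

Note that the drop from 0.45 to 0.449 does not reset the counter, because it is smaller than `min_delta`. With `patience=3`, the counter reaches its limit three epochs after the best value of 0.45 was recorded.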

After training, create a plot showing training and validation losses over epochs.

In [None]:
# BEGIN SOLUTION
def train_model_with_early_stopping(
    model, train_loader, val_loader, optimizer, num_epochs=100, patience=5, min_delta=0.01
):
    """Train a model with early stopping.

    Args:
        model: PyTorch model to train
        train_loader: DataLoader for training data
        val_loader: DataLoader for validation data
        optimizer: PyTorch optimizer
        num_epochs: Maximum number of training epochs
        patience: Number of epochs to wait for improvement
        min_delta: Minimum improvement required to reset patience counter

    Returns:
        Tuple of (training_losses, validation_losses) lists
    """
    best_val_loss = float("inf")
    patience_counter = 0

    validation_losses = []
    training_losses = []

    for epoch in range(num_epochs):
        # Training phase
        model.train()
        train_loss = 0
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()

        train_loss /= len(train_loader)
        training_losses.append(train_loss)

        # Validation phase
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for inputs, targets in val_loader:
                outputs = model(inputs)
                loss = criterion(outputs, targets)
                val_loss += loss.item()

        val_loss /= len(val_loader)
        validation_losses.append(val_loss)

        # Print training and validation loss
        print(
            f"Epoch {epoch + 1}/{num_epochs}, "
            f"Training Loss: {train_loss:.4f}, "
            f"Validation Loss: {val_loss:.4f}"
        )

        # Check for early stopping
        if val_loss < best_val_loss - min_delta:
            best_val_loss = val_loss
            patience_counter = 0  # Reset the counter if there's improvement
        else:
            patience_counter += 1

        if patience_counter >= patience:
            print(f"Early stopping triggered at epoch {epoch + 1}")
            break

    return training_losses, validation_losses


# END SOLUTION

In [None]:
# BEGIN SOLUTION
# Train a model with early stopping
model_early = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
optimizer = optim.Adam(model_early.parameters(), lr=0.001)

early_training_losses, early_validation_losses = train_model_with_early_stopping(
    model_early, train_loader, val_loader, optimizer, num_epochs=50
)
# END SOLUTION

In [None]:
# BEGIN SOLUTION
# Plot the training curve showing where early stopping occurred
plt.figure(figsize=(8, 5))
plt.plot(early_training_losses, label="Training Loss")
plt.plot(early_validation_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title(f"Early Stopping at Epoch {len(early_training_losses)}")
plt.legend()
plt.show()
# END SOLUTION

In [None]:
# Test assertions
import inspect

# Check that early stopping triggered before 50 epochs
assert (
    len(early_training_losses) < 50
), f"Expected early stopping before 50 epochs, ran {len(early_training_losses)}"
assert len(early_training_losses) == len(
    early_validation_losses
), "Training and validation loss lists should have same length"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify the function exists and has correct signature
sig = inspect.signature(train_model_with_early_stopping)
assert "patience" in sig.parameters, "Function should have patience parameter"
assert "min_delta" in sig.parameters, "Function should have min_delta parameter"
# Verify early stopping stopped at a reasonable epoch (not too early)
assert len(early_training_losses) > 5, "Early stopping occurred too soon"
# END HIDDEN TESTS

### Problem 4: Reduced Hidden Dimension

One way to reduce overfitting is to reduce the model capacity by using fewer hidden neurons. Try reducing the size of your model from Problem 2 to avoid overfitting.
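
To get a feel for how strongly the hidden width affects capacity, it can help to count parameters before training anything. The sketch below assumes a fully connected network with bias terms; the helper `mlp_parameter_count` and the width of 32 are illustrative choices, not requirements of the problem.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights and biases in a fully connected network with the given layer sizes."""
    # Each Linear(n_in, n_out) layer has n_in * n_out weights plus n_out biases
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Two hidden layers of width 128 (as in the Problem 2 hint) versus width 32 (one possible reduction)
wide_count = mlp_parameter_count([28 * 28, 128, 128, 10])
narrow_count = mlp_parameter_count([28 * 28, 32, 32, 10])

print(f"width 128: {wide_count} parameters")
print(f"width  32: {narrow_count} parameters")
```

Most of the parameters sit in the first layer, because each hidden neuron connects to all 784 input pixels, so shrinking the hidden width reduces the total roughly in proportion.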

After training, create a plot showing training and validation losses over epochs.

In [None]:
# BEGIN SOLUTION
# Create a smaller model with fewer hidden neurons (32 instead of 128)
model_smaller = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 32),
    nn.ReLU(),
    nn.Linear(32, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)
optimizer = optim.Adam(model_smaller.parameters(), lr=0.001)

smaller_training_losses, smaller_validation_losses = train_model(
    model_smaller, train_loader, val_loader, optimizer, num_epochs=50
)
# END SOLUTION

In [None]:
# BEGIN SOLUTION
# Plot shows less divergence between training and validation losses
plt.figure(figsize=(8, 5))
plt.plot(smaller_training_losses, label="Training Loss")
plt.plot(smaller_validation_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Reduced Hidden Dimension: Less Overfitting")
plt.legend()
plt.show()
# END SOLUTION

In [None]:
# Test assertions
# Smaller model should overfit less (smaller gap between train and val loss)
smaller_gap = smaller_validation_losses[-1] - smaller_training_losses[-1]
overfit_gap = overfit_validation_losses[-1] - overfit_training_losses[-1]
assert (
    smaller_gap < overfit_gap
), f"Expected smaller gap with reduced model, got {smaller_gap:.3f} vs {overfit_gap:.3f}"
assert len(smaller_training_losses) == 50, f"Expected 50 epochs, got {len(smaller_training_losses)}"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify the model has fewer parameters than the overfitting model
smaller_params = sum(p.numel() for p in model_smaller.parameters())
overfit_params = sum(p.numel() for p in model_overfit.parameters())
assert smaller_params < overfit_params, "Smaller model should have fewer parameters"
# END HIDDEN TESTS

### Problem 5: L2 Regularization

L2 regularization (also called weight decay) penalizes large weights by adding a penalty proportional to the sum of squared weights to the loss function. In PyTorch, this is implemented through the `weight_decay` parameter in the optimizer.

Use L2 regularization when training the model from Problem 2 to reduce overfitting.

**Hint:** Set `weight_decay=0.01` in the Adam optimizer.
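
To make the link between the penalty and the `weight_decay` argument concrete, the following sketch trains two identical tiny models on random data: one uses `weight_decay` in `optim.Adam`, the other adds the penalty `0.5 * weight_decay * sum(w**2)` to the loss by hand. The tiny model, the random data, and the variable names are assumptions made for illustration. The comparison applies to `optim.Adam`; `optim.AdamW` applies weight decay differently and would not match.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
wd = 0.01

# Small random classification problem (illustrative only)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
loss_fn = nn.CrossEntropyLoss()

# Two models starting from identical parameters
model_a = nn.Linear(4, 3)
model_b = nn.Linear(4, 3)
model_b.load_state_dict(model_a.state_dict())

opt_a = optim.Adam(model_a.parameters(), lr=0.001, weight_decay=wd)  # built-in weight decay
opt_b = optim.Adam(model_b.parameters(), lr=0.001)  # penalty added to the loss by hand

for _ in range(5):
    opt_a.zero_grad()
    loss_fn(model_a(x), y).backward()
    opt_a.step()

    opt_b.zero_grad()
    # Explicit L2 penalty over all parameters (weights and biases)
    penalty = sum((p**2).sum() for p in model_b.parameters())
    (loss_fn(model_b(x), y) + 0.5 * wd * penalty).backward()
    opt_b.step()

# Largest absolute difference between corresponding parameters
max_diff = max(
    (pa - pb).abs().max().item()
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(f"max parameter difference: {max_diff:.2e}")
```

One practical consequence: when `weight_decay` is used, the penalty never appears in the loss value returned by `criterion`, so the training losses recorded by `train_model` remain plain cross-entropy values.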

After training, create a plot showing training and validation losses over epochs.

In [None]:
# BEGIN SOLUTION
# Create the same large model but train with L2 regularization (weight decay)
model_l2 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
optimizer = optim.Adam(model_l2.parameters(), lr=0.001, weight_decay=0.01)

l2_training_losses, l2_validation_losses = train_model(
    model_l2, train_loader, val_loader, optimizer, num_epochs=50
)
# END SOLUTION

In [None]:
# BEGIN SOLUTION
# Plot shows training and validation losses staying closer together
plt.figure(figsize=(8, 5))
plt.plot(l2_training_losses, label="Training Loss")
plt.plot(l2_validation_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("L2 Regularization: Training and Validation Losses")
plt.legend()
plt.show()
# END SOLUTION

In [None]:
# Test assertions
# L2 regularization should reduce overfitting (smaller gap between train and val)
l2_gap = l2_validation_losses[-1] - l2_training_losses[-1]
assert (
    l2_gap < overfit_gap
), f"Expected L2 to reduce overfitting gap, got {l2_gap:.3f} vs {overfit_gap:.3f}"
assert len(l2_training_losses) == 50, f"Expected 50 epochs, got {len(l2_training_losses)}"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify training loss is higher than unregularized (weight decay constrains the weights)
assert (
    l2_training_losses[-1] > overfit_training_losses[-1]
), "L2 training loss should be higher due to regularization"
# Verify validation loss improved or stayed stable
assert l2_validation_losses[-1] < 0.6, "L2 validation loss should be reasonable"
# END HIDDEN TESTS

### Problem 6: Dropout Regularization

Dropout is a regularization technique that randomly sets a fraction of neurons to zero during training, which prevents co-adaptation of neurons and reduces overfitting.

Add dropout to the model from Problem 2 to reduce overfitting.

**Hint:** Use `nn.Dropout(p)` where `p` is the probability of setting a neuron to zero (e.g., `p=0.5` or higher).
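
The following small sketch shows what `nn.Dropout` does to a tensor in training mode versus evaluation mode. The input of all ones and the choice `p=0.5` are illustrative. In training mode, PyTorch also rescales the surviving entries by `1 / (1 - p)` so that the expected activation is unchanged; in evaluation mode, the layer passes its input through untouched.

```python
import torch
from torch import nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: entries are zeroed at random, survivors are scaled by 1 / (1 - p) = 2
drop.train()
train_out = drop(x)
zero_fraction = (train_out == 0).float().mean().item()

# Evaluation mode: dropout is disabled and the input passes through unchanged
drop.eval()
eval_out = drop(x)

print(f"fraction zeroed in training mode: {zero_fraction:.2f}")
print(f"values seen in training mode: {train_out.unique().tolist()}")
print(f"unchanged in eval mode: {torch.equal(eval_out, x)}")
```

This is why `train_model` calls `model.train()` before the training batches and `model.eval()` before validation: the validation loss is computed with every neuron active, while the training loss is computed on the randomly thinned network.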

After training, create a plot showing training and validation losses over epochs.

In [None]:
# BEGIN SOLUTION
# Add dropout after the first hidden layer to regularize
model_dropout = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Dropout(0.7),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
optimizer = optim.Adam(model_dropout.parameters(), lr=0.001)

dropout_training_losses, dropout_validation_losses = train_model(
    model_dropout, train_loader, val_loader, optimizer, num_epochs=50
)
# END SOLUTION

In [None]:
# BEGIN SOLUTION
# Plot shows dropout preventing overfitting
plt.figure(figsize=(8, 5))
plt.plot(dropout_training_losses, label="Training Loss")
plt.plot(dropout_validation_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Dropout Regularization: Training and Validation Losses")
plt.legend()
plt.show()
# END SOLUTION

In [None]:
# Test assertions
# Dropout should reduce overfitting (smaller gap between train and val)
dropout_gap = dropout_validation_losses[-1] - dropout_training_losses[-1]
assert (
    dropout_gap < overfit_gap
), f"Expected dropout to reduce overfitting gap, got {dropout_gap:.3f} vs {overfit_gap:.3f}"
assert len(dropout_training_losses) == 50, f"Expected 50 epochs, got {len(dropout_training_losses)}"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify training loss is higher than unregularized (dropout makes training harder)
assert (
    dropout_training_losses[-1] > overfit_training_losses[-1]
), "Dropout training loss should be higher during training"
# Verify model contains dropout layers
has_dropout = any(isinstance(m, nn.Dropout) for m in model_dropout.modules())
assert has_dropout, "Model should contain Dropout layer"
# END HIDDEN TESTS

### Problem 7: Combining L2 and Dropout

Use both L2 regularization (weight decay) and dropout together to reduce overfitting.

After training, create a plot showing training and validation losses over epochs.

In [None]:
# BEGIN SOLUTION
# Combine both L2 regularization and dropout
model_l2dropout = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Dropout(0.7),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

optimizer = optim.Adam(model_l2dropout.parameters(), lr=0.001, weight_decay=0.01)

l2dropout_training_losses, l2dropout_validation_losses = train_model(
    model_l2dropout, train_loader, val_loader, optimizer, num_epochs=50
)
# END SOLUTION

In [None]:
# BEGIN SOLUTION
# Plot shows combined regularization effect
plt.figure(figsize=(8, 5))
plt.plot(l2dropout_training_losses, label="Training Loss")
plt.plot(l2dropout_validation_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("L2 + Dropout: Training and Validation Losses")
plt.legend()
plt.show()
# END SOLUTION

In [None]:
# Test assertions
# Combined regularization should reduce overfitting
l2dropout_gap = l2dropout_validation_losses[-1] - l2dropout_training_losses[-1]
assert (
    l2dropout_gap < overfit_gap
), f"Expected combined regularization to reduce gap, got {l2dropout_gap:.3f}"
assert (
    len(l2dropout_training_losses) == 50
), f"Expected 50 epochs, got {len(l2dropout_training_losses)}"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify model contains dropout
has_dropout = any(isinstance(m, nn.Dropout) for m in model_l2dropout.modules())
assert has_dropout, "Model should contain Dropout layer"
# Training loss should be relatively high due to strong regularization
assert l2dropout_training_losses[-1] > 0.4, "Strong regularization should keep training loss higher"
# END HIDDEN TESTS

### Problem 8: Combining Dropout and Early Stopping

Use both dropout and early stopping together to reduce overfitting.

After training, create a plot showing training and validation losses over epochs.

In [None]:
# BEGIN SOLUTION
# Combine dropout with early stopping
model_dropoutearly = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Dropout(0.7),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

optimizer = optim.Adam(model_dropoutearly.parameters(), lr=0.001)

dropoutearly_training_losses, dropoutearly_validation_losses = train_model_with_early_stopping(
    model_dropoutearly, train_loader, val_loader, optimizer, num_epochs=50
)
# END SOLUTION

In [None]:
# BEGIN SOLUTION
# Plot shows dropout + early stopping
plt.figure(figsize=(8, 5))
plt.plot(dropoutearly_training_losses, label="Training Loss")
plt.plot(dropoutearly_validation_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title(f"Dropout + Early Stopping at Epoch {len(dropoutearly_training_losses)}")
plt.legend()
plt.show()
# END SOLUTION

In [None]:
# Test assertions
# Should run at most 50 epochs (early stopping may end training sooner)
assert (
    len(dropoutearly_training_losses) <= 50
), f"Should run at most 50 epochs, ran {len(dropoutearly_training_losses)}"
assert len(dropoutearly_training_losses) == len(
    dropoutearly_validation_losses
), "Training and validation loss lists should have same length"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify model contains dropout
has_dropout = any(isinstance(m, nn.Dropout) for m in model_dropoutearly.modules())
assert has_dropout, "Model should contain Dropout layer"
# Verify some training occurred
assert len(dropoutearly_training_losses) > 3, "Should train for more than 3 epochs"
# END HIDDEN TESTS

### Problem 9: Implicit Regularization

Some regularization occurs implicitly through training hyperparameters. In particular, larger learning rates and smaller batch sizes can act as implicit regularizers by adding noise to the gradient updates.

Modify the learning rate and/or batch size to reduce overfitting. You will need to create a new DataLoader with a different batch size.

**Note:** You should use the same model architecture as in Problem 2 (the overfitting model).
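
The claim that smaller batches add noise to the gradient updates can be illustrated with a simulation that involves no neural network at all. In the sketch below, the "per-example gradients" are synthetic numbers drawn from a normal distribution (an assumption made purely for illustration), and batches are drawn with replacement for simplicity. A mini-batch gradient is the mean over the batch, so its spread shrinks roughly like one over the square root of the batch size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one gradient coordinate per training example (illustrative only)
per_example_grads = rng.normal(loc=1.0, scale=1.0, size=42_000)


def batch_gradient_std(batch_size, num_batches=2_000):
    """Standard deviation of the mini-batch gradient estimate across many random batches."""
    estimates = [
        # Sample a batch (with replacement, for simplicity) and average its gradients
        per_example_grads[rng.integers(0, len(per_example_grads), size=batch_size)].mean()
        for _ in range(num_batches)
    ]
    return float(np.std(estimates))


noise_small_batch = batch_gradient_std(32)
noise_large_batch = batch_gradient_std(256)

print(f"std of gradient estimate, batch size  32: {noise_small_batch:.4f}")
print(f"std of gradient estimate, batch size 256: {noise_large_batch:.4f}")
```

The learning rate then scales how far each noisy step moves the weights, which is one intuition for why a larger learning rate and a smaller batch size can both have a regularizing effect.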

After training, create a plot showing training and validation losses over epochs.

In [None]:
# BEGIN SOLUTION
# Create dataloaders with smaller batch size for implicit regularization
implicit_train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
implicit_val_loader = DataLoader(val_dataset, batch_size=1024, shuffle=False)

# Same architecture as overfitting model
model_implreg = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Use a higher learning rate (0.005 instead of 0.001)
optimizer = optim.Adam(model_implreg.parameters(), lr=0.005)

implreg_training_losses, implreg_validation_losses = train_model(
    model_implreg, implicit_train_loader, implicit_val_loader, optimizer, num_epochs=50
)
# END SOLUTION

In [None]:
# BEGIN SOLUTION
# Plot implicit regularization results
plt.figure(figsize=(8, 5))
plt.plot(implreg_training_losses, label="Training Loss")
plt.plot(implreg_validation_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Implicit Regularization: Smaller Batch Size + Higher LR")
plt.legend()
plt.show()
# END SOLUTION

In [None]:
# Test assertions
assert len(implreg_training_losses) == 50, f"Expected 50 epochs, got {len(implreg_training_losses)}"
assert len(implreg_training_losses) == len(
    implreg_validation_losses
), "Training and validation loss lists should have same length"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify training occurred
assert implreg_training_losses[-1] < implreg_training_losses[0], "Training loss should decrease"
# Verify reasonable validation loss
assert implreg_validation_losses[-1] < 1.0, "Validation loss should be reasonable"
# END HIDDEN TESTS

### Problem 10: Model Comparison and Analysis

Run the code below to print the validation and test loss for each model.

Then create a single figure with 3 subplots showing the confusion matrices for the three models with the best test loss. Use seaborn's `heatmap` function to visualize the confusion matrices.

Finally, write a comment explaining which model you would choose for deployment and why.
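
As background for reading the confusion matrices, here is a minimal sketch of how such a matrix is built from true and predicted labels, using NumPy only. The helper `confusion_counts` and the toy labels are hypothetical. The layout matches the convention of `sklearn.metrics.confusion_matrix`: rows are true classes and columns are predicted classes.

```python
import numpy as np


def confusion_counts(true_labels, predicted_labels, num_classes):
    """Build a confusion matrix with true classes as rows and predicted classes as columns."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 at position (true, predicted) for every example; np.add.at handles repeated indices
    np.add.at(matrix, (np.asarray(true_labels), np.asarray(predicted_labels)), 1)
    return matrix


# Made-up labels for a 3-class problem (illustration only)
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]

toy_matrix = confusion_counts(toy_true, toy_pred, num_classes=3)
# Per-class recall: correct predictions on the diagonal divided by the row totals
toy_recall = toy_matrix.diagonal() / toy_matrix.sum(axis=1)

print(toy_matrix)
print(toy_recall)
```

Off-diagonal entries show which classes are mistaken for which. For Fashion MNIST, the label indices 0 through 9 follow the class order listed in the introduction, from T-shirt/top to Ankle boot.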

In [None]:
# List of all models and their names
models = [
    model_underfit,
    model_overfit,
    model_early,
    model_smaller,
    model_l2,
    model_dropout,
    model_l2dropout,
    model_dropoutearly,
    model_implreg,
]
model_names = [
    "Underfit",
    "Overfit",
    "Early Stopping",
    "Smaller Model",
    "L2 Regularization",
    "Dropout",
    "L2 and Dropout",
    "Dropout and Early Stopping",
    "Implicit Regularization",
]


def get_loss(model, loader):
    """Compute average loss for a model on a data loader."""
    model.eval()
    loss = 0
    with torch.no_grad():
        for batch_x, batch_y in loader:
            outputs = model(batch_x)
            batch_loss = criterion(outputs, batch_y)
            loss += batch_loss.item()
    return loss / len(loader)


# Print validation and test losses for all models
test_losses = []
for model, model_name in zip(models, model_names):
    val = get_loss(model, val_loader)
    test = get_loss(model, test_loader)
    test_losses.append(test)
    print(f"{model_name:>25}: validation={val:.4f} test={test:.4f}")

In [None]:
# BEGIN SOLUTION
# Get indices of three models with lowest test loss
best_indices = np.argsort(test_losses)[:3]
best_models = [models[idx] for idx in best_indices]
best_model_names = [model_names[idx] for idx in best_indices]

# Create confusion matrix subplots for the three best models
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for idx, (model, model_name) in enumerate(zip(best_models, best_model_names)):
    model.eval()
    predictions = []
    true_labels = []
    with torch.no_grad():
        for batch_x, batch_y in test_loader:
            outputs = model(batch_x)
            _, predicted = torch.max(outputs, 1)
            predictions.extend(predicted.tolist())
            true_labels.extend(batch_y.tolist())

    conf_matrix = confusion_matrix(true_labels, predictions)
    sns.heatmap(conf_matrix, annot=True, fmt="d", ax=axes[idx])
    axes[idx].set_title(f"Confusion Matrix: {model_name}")
    axes[idx].set_xlabel("Predicted Labels")
    axes[idx].set_ylabel("True Labels")

plt.tight_layout()
plt.show()
# END SOLUTION

Which model would you choose for deployment and why?

> BEGIN SOLUTION

I would choose the early stopping model for deployment. It achieves the lowest test loss among all models, indicating the best generalization to unseen data. Early stopping effectively prevents overfitting by halting training at the optimal point before the model starts memorizing the training data. Additionally, it does not require architectural changes (like dropout) or hyperparameter tuning for regularization strength (like L2 weight decay), making it simpler to implement and maintain. The confusion matrix shows good performance across most classes, with the expected difficulty distinguishing similar clothing items like shirts and T-shirts.
> END SOLUTION


In [None]:
# Test assertions
# Verify that best_indices contains 3 models
assert len(best_indices) == 3, f"Expected 3 best models, got {len(best_indices)}"
# Verify that test_losses list has the right length
assert len(test_losses) == 9, f"Expected 9 test losses, got {len(test_losses)}"
print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify all test losses are positive
assert all(loss > 0 for loss in test_losses), "All test losses should be positive"
# Verify best models are sorted correctly
best_test_losses = [test_losses[idx] for idx in best_indices]
assert best_test_losses == sorted(best_test_losses), "Best models should be sorted by test loss"
# END HIDDEN TESTS