# Lab 6: Optimization Methods in PyTorch

The goal of this lab is to improve the performance of a deep learning model by implementing various regularization and normalization techniques in PyTorch.

**What You'll Do:**
- Apply **L1/L2 weight decay**.
- Implement **Dropout**.
- Implement **Normalization (BatchNorm, LayerNorm, etc.)**.
- Use **Early Stopping**.
- Experiment with **Data Augmentation (CutMix, Mixup)**.

You'll be given challenges where you must use the **PyTorch documentation** to complete missing parts!

## Part 1: Imports

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np

torch.manual_seed(24)  # For reproducibility

## Part 2: Load the Dataset
We'll use CIFAR-10, a small image classification dataset.

In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # one mean/std per RGB channel
])

train_dataset = datasets.CIFAR10(root='./data', train=True, transform=transform, download=True)
val_dataset = datasets.CIFAR10(root='./data', train=False, transform=transform, download=True)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)

## Part 3: Define a Simple CNN
We'll start with a basic CNN model and optimize it throughout the challenge.

In [None]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 16 * 16, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x

model = SimpleCNN()

## Part 5: Apply L2 Regularization (Weight Decay)
Modify the optimizer to use **L2 regularization**. Check out the PyTorch documentation for the Adam optimizer [here](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html).

In [None]:
# weight_decay adds an L2 penalty on the parameters
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
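
Adam's `weight_decay` argument covers only the L2 penalty. For the L1 part of the "L1/L2" goal, one common approach is to add the sum of absolute parameter values to the loss by hand. The following is a minimal sketch under assumptions: `l1_penalty`, `l1_lambda`, and the tiny `toy_model` are hypothetical names chosen for illustration, and the strength `1e-5` is an arbitrary placeholder, not a recommended value.

```python
import torch
import torch.nn as nn

def l1_penalty(model: nn.Module) -> torch.Tensor:
    """Sum of absolute values of all trainable parameters (hypothetical helper)."""
    return sum(p.abs().sum() for p in model.parameters() if p.requires_grad)

toy_model = nn.Linear(4, 2)          # stand-in for SimpleCNN, to keep the sketch small
toy_criterion = nn.CrossEntropyLoss()
l1_lambda = 1e-5                     # assumed penalty strength; tune for your model

inputs = torch.randn(8, 4)           # fake batch of 8 samples
targets = torch.randint(0, 2, (8,))  # fake class labels

base_loss = toy_criterion(toy_model(inputs), targets)
loss = base_loss + l1_lambda * l1_penalty(toy_model)  # data loss + L1 term
loss.backward()
```

In the real training loop, the same `l1_lambda * l1_penalty(model)` term would be added to `loss` before calling `loss.backward()`.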

## Part 6: Add Dropout to the Model
Modify the **SimpleCNN** class you created above to include **Dropout layers**. Check out the documentation on dropout layers [here](https://pytorch.org/docs/stable/nn.html#dropout-layers).

In [None]:
class DropoutCNN(nn.Module):
    def __init__(self):
        super(DropoutCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(p=0.2)  # zero out 20% of activations during training
        self.fc1 = nn.Linear(32 * 16 * 16, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.dropout(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x

model = DropoutCNN()

## Part 7: Add Batch Normalization
Modify the CNN to include **BatchNorm** after each convolutional layer.

**Extra challenge**: Try to find the documentation for this on your own!

In [None]:
class BatchNormCNN(nn.Module):
    def __init__(self):
        super(BatchNormCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.batchnorm1 = nn.BatchNorm2d(32)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 16 * 16, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.batchnorm1(self.conv1(x))))
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x

model = BatchNormCNN()
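
BatchNorm relies on statistics computed across the batch. For the other normalization layers mentioned in the lab goals, one batch-independent option is `nn.GroupNorm` with a single group, which normalizes each sample across all of its channels and spatial positions, much like a layer norm for convolutional features. The sketch below is an illustrative variant, not part of the original lab solution; `GroupNormCNN` is a hypothetical class name.

```python
import torch
import torch.nn as nn

class GroupNormCNN(nn.Module):
    """Same layout as BatchNormCNN, with a single-group GroupNorm instead."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        # One group over all 32 channels: statistics are per sample, not per batch
        self.norm1 = nn.GroupNorm(num_groups=1, num_channels=32)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 16 * 16, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.norm1(self.conv1(x))))
        x = x.view(x.size(0), -1)  # flatten to (batch, features)
        return self.fc1(x)

# Sanity check with a fake CIFAR-10-sized batch
dummy = torch.randn(4, 3, 32, 32)
logits = GroupNormCNN()(dummy)
```

Because the statistics do not depend on the batch, this layer behaves the same in `train()` and `eval()` mode.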

## Part 8: Implement Early Stopping
**Challenge 1**: Fill in the training code inside the loop.

**Challenge 2**: Modify the loop so that training stops if the validation loss doesn't improve for **N consecutive epochs** (the `patience`).

In [None]:
best_loss = float('inf')
patience = 5  # Adjust as needed
counter = 0

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(10):

    model.train()
    train_loss = 0.0

    for batch in train_loader:
        optimizer.zero_grad()
        inputs, targets = batch
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    train_loss /= len(train_loader)  # Compute average training loss

    model.eval()
    val_loss = 0.0

    with torch.no_grad():
        for batch in val_loader:
            inputs, targets = batch
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            val_loss += loss.item()

    val_loss /= len(val_loader)  # Compute average validation loss
    print(f"Epoch {epoch+1}: Train Loss = {train_loss:.4f}, Validation Loss = {val_loss:.4f}")

    if val_loss < best_loss:
        best_loss = val_loss
        counter = 0
    else:
        counter += 1
        if counter >= patience:
            print('Early stopping triggered!')
            break
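
The loop above stops training but leaves the model with the weights from its final epoch, which may be worse than the best ones seen. A common extension is to snapshot the best weights and restore them afterwards. This is a minimal sketch under assumptions: `EarlyStopper`, `min_delta`, and the made-up validation losses are hypothetical and used only for illustration.

```python
import copy
import torch.nn as nn

class EarlyStopper:
    """Tracks the best validation loss and keeps a copy of the best weights."""
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta      # minimum decrease that counts as improvement
        self.best_loss = float('inf')
        self.best_state = None
        self.counter = 0

    def step(self, val_loss, model):
        """Returns True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(model.state_dict())
            self.counter = 0
            return False
        self.counter += 1
        return self.counter >= self.patience

# Usage with made-up validation losses instead of a real training run
toy_model = nn.Linear(2, 2)
stopper = EarlyStopper(patience=2)
fake_val_losses = [1.0, 0.8, 0.9, 0.85, 0.7]
stopped_at = None
for epoch, val_loss in enumerate(fake_val_losses, start=1):
    if stopper.step(val_loss, toy_model):
        stopped_at = epoch
        break

toy_model.load_state_dict(stopper.best_state)  # restore the best weights
```

With `patience=2`, the two non-improving epochs after the loss of 0.8 trigger the stop, so the final loss of 0.7 is never reached. This illustrates the trade-off: a small `patience` can end training before a later improvement.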

## Part 9: Data Augmentation

**Here's the big challenge for today**: Apply both ***CutMix*** and ***MixUp*** to your data!

1.   Augment your data with both CutMix and MixUp. In other words, randomly select images and apply one or the other method. Do not apply both methods to the same image.
2.   Use your new augmented data to train a simple CNN with Dropout, BatchNorm, and Early Stopping.



In [None]:
from torchvision.transforms import v2

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

cutmix = v2.CutMix(num_classes=10)
mixup = v2.MixUp(num_classes=10)
cutmix_or_mixup = v2.RandomChoice([cutmix, mixup])

# Initialize for early stopping
best_val_loss = float('inf')
patience = 5  # Number of epochs to wait for improvement
counter = 0

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(10):

    model.train()
    train_loss = 0.0

    for images, labels in train_loader:
        images, labels = cutmix_or_mixup(images, labels)  # Apply CutMix or MixUp

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    train_loss /= len(train_loader)  # Average training loss

    # Validation phase
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for images, labels in val_loader:
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()

    val_loss /= len(val_loader)  # Average validation loss

    print(f"Epoch {epoch + 1}: Train Loss = {train_loss:.4f}, Validation Loss = {val_loss:.4f}")

    # Early stopping check
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        counter = 0  # Reset counter if validation loss improves
    else:
        counter += 1
        if counter >= patience:
            print("Early stopping triggered!")
            break  # Stop training

**Suppose you only want to augment a proportion of your data.**
For each batch, you could draw a random number `r` uniformly from [0, 1) and compare it against two probabilities, `mixup_prob` and `cutmix_prob` (with `mixup_prob + cutmix_prob <= 1`):

* If `r < mixup_prob`, MixUp is applied.
* Otherwise, if `r < mixup_prob + cutmix_prob`, CutMix is applied.
* Otherwise, no augmentation is applied.