<a href="https://colab.research.google.com/github/iamrishigandhi/4301-Lab-1/blob/main/Lab2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Computer Vision Lab 2

Welcome friends, it's time for Deep Learning with PyTorch! This homework might need a longer running time.
Keep this in mind and start early.

PyTorch is a deep learning framework for fast, flexible experimentation. We are going to use it to train our classifiers.

For this homework you need to turn in this file `Lab2.ipynb` after running your results and answering questions in-line.

**Notes**:
 - This assignment was designed to be used with Google Colab, but feel free to set up your own environment if you wish. Just bear in mind that we cannot provide support for custom environments.
 - Feel free to create new cells as needed, but please **do not delete existing cells**.

Before you get started, we suggest you do the [PyTorch tutorial first](https://github.com/param087/Pytorch-tutorial-on-Google-colab).

You should at least do the 60 Minute Blitz up until "Training a Classifier".

**How to use this notebook:**
 - Each cell with a grey background is executable.
 - They can be executed by pressing the "Play" button or by hitting `Shift+Enter`
 - Cells can be executed out of order.
 - You can add new cells by clicking on the `+ Code` button in the header.
 - Made a mistake and need to start over? Click *(Runtime => Restart runtime)*
 - Check out this [Colab Introduction](https://colab.research.google.com/notebooks/intro.ipynb#scrollTo=5fCEDCU_qrC0) if you're having trouble.


 This will make sure that your progress will be saved to your Google Drive, and won't be lost if your browser refreshes for some reason.

## Setup

This will set up the environment without GPUs. This is the recommended setup.

In [1]:
! pip install torch==1.5.0+cpu torchvision==0.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
! pip install tqdm matplotlib

Looking in links: https://download.pytorch.org/whl/torch_stable.html
[31mERROR: Could not find a version that satisfies the requirement torch==1.5.0+cpu (from versions: 1.11.0, 1.11.0+cpu, 1.11.0+cu102, 1.11.0+cu113, 1.11.0+cu115, 1.11.0+rocm4.3.1, 1.11.0+rocm4.5.2, 1.12.0, 1.12.0+cpu, 1.12.0+cu102, 1.12.0+cu113, 1.12.0+cu116, 1.12.0+rocm5.0, 1.12.0+rocm5.1.1, 1.12.1, 1.12.1+cpu, 1.12.1+cu102, 1.12.1+cu113, 1.12.1+cu116, 1.12.1+rocm5.0, 1.12.1+rocm5.1.1, 1.13.0, 1.13.0+cpu, 1.13.0+cu116, 1.13.0+cu117, 1.13.0+cu117.with.pypi.cudnn, 1.13.0+rocm5.1.1, 1.13.0+rocm5.2, 1.13.1, 1.13.1+cpu, 1.13.1+cu116, 1.13.1+cu117, 1.13.1+cu117.with.pypi.cudnn, 1.13.1+rocm5.1.1, 1.13.1+rocm5.2, 2.0.0, 2.0.0+cpu, 2.0.0+cpu.cxx11.abi, 2.0.0+cu117, 2.0.0+cu117.with.pypi.cudnn, 2.0.0+cu118, 2.0.0+rocm5.3, 2.0.0+rocm5.4.2, 2.0.1, 2.0.1+cpu, 2.0.1+cpu.cxx11.abi, 2.0.1+cu117, 2.0.1+cu117.with.pypi.cudnn, 2.0.1+cu118, 2.0.1+rocm5.3, 2.0.1+rocm5.4.2, 2.1.0, 2.1.0+cpu, 2.1.0+cpu.cxx11.abi, 2.1.0+cu118, 2.1.0+cu121,

In [None]:
# We're not using the GPU.
use_gpu = False

### With GPUs
If you're feeling adventurous, you can use GPUs to accelerate training by following the steps below. Just note that GPUs might not be available. The course staff also can't provide support for GPU-related issues, so if you're having trouble please just use the CPU runtime.

 1. Go to Runtime > Change runtime type and select 'GPU'
 2. Restart the Runtime, uncomment the commands below and run them.

In [None]:
# Install the necessary packages

# ! pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
# ! pip install tqdm matplotlib

If you want to use the GPU, uncomment the line below and run it.

In [None]:
# Uncomment this and execute if you're using the GPU.
# use_gpu = True

### Check that things are working.

In [None]:
# Make sure things work.

import torch

if use_gpu:
    print(torch.zeros(10).cuda())
else:
    print(torch.zeros(10))

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])


 ## Initialize Datasets

 This code defines the data loaders that will be used to train and test our networks. It also defines data augmentation functions.

In [None]:
import torch
import torchvision
from torchvision import transforms

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

default_train_transform = transforms.Compose([
    transforms.ToTensor(),
    # Normalize shifts and rescales each channel: output = (input - mean) / std.
    # The two tuples are the per-channel mean and standard deviation to use.
    # With a mean and std of 0.5, pixel values in [0, 1] (after ToTensor) are
    # mapped to [-1, 1] and roughly centered at 0, which makes it easier
    # to learn!
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

default_test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])


def get_train_loader(batch_size, transform=default_train_transform):
    trainset = torchvision.datasets.CIFAR10(
        root='./data',
        train=True,
        download=True,
        transform=transform)
    return torch.utils.data.DataLoader(
        trainset, batch_size=batch_size, shuffle=True, num_workers=4)


def get_test_loader(batch_size, transform=default_test_transform):
    testset = torchvision.datasets.CIFAR10(
        root='./data',
        train=False,
        download=True,
        transform=transform)
    return torch.utils.data.DataLoader(
        testset, batch_size=batch_size, shuffle=False, num_workers=4)


# This downloads the datasets.
get_train_loader(1)
get_test_loader(1);

Files already downloaded and verified
Files already downloaded and verified
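
To make the effect of `transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` concrete, here is a minimal pure-Python sketch of the per-channel arithmetic it applies, `(value - mean) / std`. The helper name `normalize_value` is only illustrative and is not part of torchvision.

```python
def normalize_value(value, mean=0.5, std=0.5):
    """Apply the same arithmetic as transforms.Normalize to a single value."""
    return (value - mean) / std


# ToTensor produces pixel values in [0, 1]; check where a few of them land.
for pixel in (0.0, 0.25, 0.5, 1.0):
    print(f"{pixel:.2f} -> {normalize_value(pixel):+.2f}")
```

The darkest pixel maps to -1, mid-grey to 0, and the brightest pixel to +1, so the inputs end up in the range [-1, 1].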


## Define code that trains and tests the model.

This code will train your model. Feel free to read the code below, but we suggest you don't modify it unless you know what you're doing.

In [None]:
from matplotlib import pyplot as plt
from tqdm.auto import tqdm

# The function we'll call to train the network each epoch
def train(net, loader, optimizer, criterion, epoch, use_gpu=False):
    running_loss = 0.0
    total_loss = 0.0

    # Send the network to the correct device
    if use_gpu:
        net = net.cuda()
    else:
        net = net.cpu()

    # tqdm is a useful package for adding a progress bar to your loops
    pbar = tqdm(loader)
    for i, data in enumerate(pbar):
        inputs, labels = data

        # If we're using the GPU, send the data to the GPU
        if use_gpu:
            inputs, labels = inputs.cuda(), labels.cuda()

        optimizer.zero_grad()  # Set the gradients of the parameters to zero.
        outputs = net(inputs)  # Forward pass (send the images through the network)
        loss = criterion(outputs, labels)  # Compute the loss w.r.t the labels.
        loss.backward()  # Backward pass (compute gradients).
        optimizer.step()  # Use the gradients to update the weights of the network.

        running_loss += loss.item()
        total_loss += loss.item()
        pbar.set_description(f"[epoch {epoch+1}] loss = {running_loss/(i+1):.03f}")

    average_loss = total_loss / (i + 1)
    tqdm.write(f"Epoch {epoch+1} summary -- loss = {average_loss:.03f}")

    return average_loss


This code will evaluate the performance of your network. It won't update the weights; it just computes some evaluation metrics.

In [None]:
from collections import defaultdict
from torchvision.utils import make_grid
from PIL import Image
from IPython import display as ipd


def show_hard_negatives(hard_negatives, label, nrow=10):
    """Visualizes hard negatives"""
    grid = make_grid([(im+1)/2 for im, score in hard_negatives[label]],
                     nrow=nrow, padding=1)
    grid = grid.permute(1, 2, 0).mul(255).byte().numpy()
    ipd.display(Image.fromarray((grid)))


# The function we'll call to test the network
def test(net, loader, tag='', use_gpu=False, num_hard_negatives=10):
    correct = 0
    total = 0

    # Send the network to the correct device
    net = net.cuda() if use_gpu else net.cpu()

    # Compute the overall accuracy of the network
    with torch.no_grad():
        for data in tqdm(loader, desc=f"Evaluating {tag}"):
            images, labels = data

            # If we're using the GPU, send the data to the GPU
            if use_gpu:
                images = images.cuda()
                labels = labels.cuda()

            # Forward pass (send the images through the network)
            outputs = net(images)

            # Take the output of the network, and extract the index
            # of the largest prediction for each example
            _, predicted = torch.max(outputs.data, 1)

            # Count the number of correct predictions
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    average_accuracy = correct/total
    tqdm.write(f'{tag} accuracy of the network: {100*average_accuracy:.02f}%')

    # Repeat above, but estimate the testing accuracy for each of the labels
    class_correct = list(0. for i in range(10))
    class_total = list(0. for i in range(10))
    hard_negatives = defaultdict(list)
    with torch.no_grad():
        for data in loader:
            images, labels = data
            if use_gpu:
                images = images.cuda()
                labels = labels.cuda()
            outputs = net(images)
            predicted_scores, predicted_labels = torch.max(outputs, 1)
            correct_mask = (predicted_labels == labels).squeeze()
            incorrect_mask = ~correct_mask
            unique_labels, unique_counts = torch.unique(labels, return_counts=True)
            for l, c in zip(unique_labels, unique_counts):
                l = l.item()
                label_mask = (labels == l)
                predicted_mask = (predicted_labels == l)
                # This keeps track of the hardest negatives,
                # i.e. mistakes with the highest confidence.
                hard_negative_mask = (~correct_mask & predicted_mask)
                if hard_negative_mask.sum() > 0:
                    hard_negatives[l].extend([
                        (im, score.item())
                        for im, score in zip(images[hard_negative_mask],
                                             predicted_scores[hard_negative_mask])])
                    hard_negatives[l].sort(key=lambda x: x[1], reverse=True)
                    hard_negatives[l] = hard_negatives[l][:num_hard_negatives]
                class_correct[l] += (correct_mask & label_mask).sum()
                class_total[l] += c


    for i in range(10):
        tqdm.write(f'{tag} accuracy of {classes[i]} = {100*class_correct[i]/class_total[i]:.02f}%')
        if len(hard_negatives[i]) > 0:
            print(f'Hard negatives for {classes[i]}')
            show_hard_negatives(hard_negatives, i, nrow=10)
        else:
            print("There were no hard negatives--perhaps the model got 0% accuracy?")


    return average_accuracy

This is a wrapper function we provide that handles all the bookkeeping. It will train your network one epoch at a time and test it every `eval_interval` epochs.

In [None]:
from torch import nn, optim


def train_network(net,
                  lr,
                  epochs,
                  batch_size,
                  criterion=None,
                  lr_func=None,
                  train_transform=default_train_transform,
                  eval_interval=10,
                  use_gpu=use_gpu):
    # Initialize the optimizer
    # You can change this if you want!
    optimizer = optim.Adam(net.parameters(), lr=lr)
    # optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9)

    # Initialize the loss function
    if criterion is None:
        # Note that CrossEntropyLoss has the Softmax built in!
        # This is good for numerical stability.
        # Read: https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss
        criterion = nn.CrossEntropyLoss()

    # Initialize the data loaders
    train_loader = get_train_loader(batch_size, transform=train_transform)
    test_loader = get_test_loader(batch_size)

    train_losses = []
    train_accuracies = []
    test_accuracies = []

    for epoch in range(epochs):  # loop over the dataset multiple times
        if lr_func is not None:
            lr_func(optimizer, epoch, lr)

        train_loss = train(net, train_loader, optimizer, criterion, epoch, use_gpu=use_gpu)
        train_losses.append(train_loss)

        # Evaluate the model every `eval_interval` epochs.
        if (epoch + 1) % eval_interval == 0:
            print(f"Evaluating epoch {epoch+1}")
            train_accuracy = test(net, train_loader, 'Train', use_gpu=use_gpu)
            test_accuracy = test(net, test_loader, 'Test', use_gpu=use_gpu)
            train_accuracies.append(train_accuracy)
            test_accuracies.append(test_accuracy)

    return train_losses, train_accuracies, test_accuracies


# A function to plot the losses over time
def plot_results(train_losses, train_accuracies, test_accuracies):
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    axes[0].plot(train_losses)
    axes[0].set_title('Training Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')

    axes[1].plot(train_accuracies)
    axes[1].set_title('Training Accuracy')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Accuracy')

    axes[2].plot(test_accuracies)
    axes[2].set_title('Testing Accuracy')
    axes[2].set_xlabel('Epoch')
    axes[2].set_ylabel('Accuracy')



## 2.1. Training a classifier using only one fully connected Layer

Implement a model to classify the images from CIFAR-10 into ten categories using just one fully connected layer (remember that fully connected layers are called `Linear` in PyTorch).

If you are new to PyTorch you may want to check out the tutorial on MNIST [here](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py).

Fill in the code for LazyNet here.

**Hints:**
 - Note that `nn.CrossEntropyLoss` has the Softmax built in for numerical stability. This means that the output layer of your network should be linear and not contain a Softmax. You can read more about it [here](https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss)
 - You can use the `view()` function to flatten each input image to a vector, e.g., if `x` is a `(100,3,4,4)` tensor then `x.view(-1, 3*4*4)` will reshape it into a `(100,48)` tensor, i.e. one vector of size `48` per image.
 - The images in CIFAR-10 are 32x32 with 3 colour channels.
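
The first two hints can be checked with a short sketch. It uses random tensors as stand-ins for a batch of images and for network outputs (these are assumptions for illustration, not real data), and compares `nn.CrossEntropyLoss` against an explicit log-softmax followed by the negative log-likelihood loss.

```python
import torch
from torch import nn
import torch.nn.functional as F

# A fake batch of 100 CIFAR-10-sized images: (batch, channels, height, width).
x = torch.randn(100, 3, 32, 32)

# view(-1, 3 * 32 * 32) keeps the batch dimension and flattens everything else.
flat = x.view(-1, 3 * 32 * 32)
print(flat.shape)

# CrossEntropyLoss expects raw scores (logits), one row per example.
logits = torch.randn(100, 10)
labels = torch.randint(0, 10, (100,))
ce = nn.CrossEntropyLoss()(logits, labels)

# The same value computed explicitly: log-softmax, then negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, manual))
```

Because the softmax already happens inside the loss, adding another softmax at the end of the network would apply it twice.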

In [2]:
import torch
from torch import nn
from torch import optim

class LazyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32 * 32 * 3, 10)  # Flatten image (32x32x3) to vector and map to 10 classes

    def forward(self, x):
        # Flatten the input image
        x = x.view(-1, 32 * 32 * 3)
        # Forward pass through the fully connected layer
        x = self.fc(x)
        return x

net = LazyNet()


### Run the model for 15 epochs and report the plots and accuracies.

In [None]:
train_losses, train_accuracies, test_accuracies = train_network(
    net,
    criterion=nn.CrossEntropyLoss(),
    lr=0.01,
    epochs=15,
    eval_interval=5,
    batch_size=1024)

In [None]:
plot_results(train_losses, train_accuracies, test_accuracies)

## 2.2. Training a classifier using multiple fully connected layers ##

Implement a model for the same classification task using multiple fully connected layers.

Start with a fully connected layer that maps the data from image size (32 * 32 * 3) to a vector of size 120, followed by another fully connected layer that reduces the size to 84, and finally a layer that maps the vector of size 84 to 10 classes.

Use any activation you want.

Fill in the code for BoringNet below.

In [None]:
import torch.nn as nn

class BoringNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(32 * 32 * 3, 120)  # First fully connected layer
        self.fc2 = nn.Linear(120, 84)           # Second fully connected layer
        self.fc3 = nn.Linear(84, 10)            # Final fully connected layer for classification
        self.activation = nn.ReLU()             # ReLU activation function

    def forward(self, x):
        # Flatten the input image
        x = x.view(-1, 32 * 32 * 3)
        # Pass through the first fully connected layer followed by activation
        x = self.activation(self.fc1(x))
        # Pass through the second fully connected layer followed by activation
        x = self.activation(self.fc2(x))
        # Pass through the final fully connected layer (no activation needed)
        x = self.fc3(x)
        return x

net = BoringNet()


### Run the model for 30 epochs and report the plots and accuracies.

In [None]:
train_losses, train_accuracies, test_accuracies = train_network(
    net,
    criterion=nn.CrossEntropyLoss(),
    lr=0.01,
    epochs=30,
    batch_size=1024)

In [None]:
plot_results(train_losses, train_accuracies, test_accuracies)

### Question

Try training this model with and without activations. How do activations (such as ReLU) affect the training process and why?


*Activations such as ReLU strongly influence both training and final accuracy. With a non-linear activation between the layers, the model can learn complex, non-linear relationships in the data, which leads to higher accuracy. ReLU in particular has a gradient of 1 for positive inputs, which mitigates the vanishing gradient problem that saturating activations such as sigmoid and tanh suffer from and keeps backpropagation efficient. Without activations, the stacked fully connected layers collapse into a composition of linear functions, which is itself a single linear function, so the model has no more expressive power than the one-layer LazyNet and cannot capture non-linear patterns no matter how many layers it has.*
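
The point that a network without activations reduces to a linear function can be verified numerically. The sketch below is only an illustration with small, arbitrary layer sizes (8, 5 and 3 are assumptions, not the sizes used in BoringNet): it folds two stacked `nn.Linear` layers into a single one and checks that both give the same output.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two linear layers applied back to back with no activation in between.
fc1 = nn.Linear(8, 5)
fc2 = nn.Linear(5, 3)

# Fold them into one layer: W = W2 @ W1 and b = W2 @ b1 + b2.
merged = nn.Linear(8, 3)
with torch.no_grad():
    merged.weight.copy_(fc2.weight @ fc1.weight)
    merged.bias.copy_(fc2.weight @ fc1.bias + fc2.bias)

x = torch.randn(4, 8)
with torch.no_grad():
    stacked_out = fc2(fc1(x))
    merged_out = merged(x)

# The two outputs agree up to floating-point error.
print(torch.allclose(stacked_out, merged_out, atol=1e-5))
```

Inserting a ReLU between `fc1` and `fc2` breaks this equivalence, which is what gives the extra layers their value.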


## 2.3. Training a classifier using convolutions ##

Implement a model using convolutional, pooling and fully connected layers.

You are free to choose any parameters for these layers (we would like you to play around with some values).

Fill in the code for CoolNet below. Explain why you have chosen these layers and how they affected the performance. Analyze the behavior of your model.

In [None]:
import torch.nn as nn

class CoolNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # Max pooling layers
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # Fully connected layers
        self.fc1 = nn.Linear(32 * 8 * 8, 128)  # Calculated after two max-pooling layers
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)  # Output size is 10 for 10 classes in CIFAR-10
        # Activation function
        self.relu = nn.ReLU()

    def forward(self, x):
        # Convolutional layers with ReLU activation and max pooling
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        # Flatten before fully connected layers
        x = x.view(-1, 32 * 8 * 8)
        # Fully connected layers with ReLU activation
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        # Final output layer (no activation needed)
        x = self.fc3(x)
        return x

net = CoolNet()
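
The `32 * 8 * 8` input size of `fc1` in `CoolNet` follows from the standard output-size formula for convolution and pooling layers, `(size + 2 * padding - kernel_size) // stride + 1`. The sketch below walks through the layers in order; the helper name `conv_output_size` is hypothetical and it assumes no dilation.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 32                                                  # CIFAR-10 images are 32x32
size = conv_output_size(size, kernel_size=3, padding=1)    # conv1 keeps 32
size = conv_output_size(size, kernel_size=2, stride=2)     # pool halves to 16
size = conv_output_size(size, kernel_size=3, padding=1)    # conv2 keeps 16
size = conv_output_size(size, kernel_size=2, stride=2)     # pool halves to 8

channels = 32  # out_channels of conv2
print(size, channels * size * size)
```

If you change the kernel sizes, padding, or number of pooling layers, recompute this value and update both `fc1` and the `view` call in `forward` accordingly.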


### Run the model for 30 epochs and report the plots and accuracies

In [None]:
net = CoolNet()
train_losses, train_accuracies, test_accuracies = train_network(
    net,
    criterion=nn.CrossEntropyLoss(),
    lr=0.01,
    epochs=30,
    batch_size=1024)

In [None]:
plot_results(train_losses, train_accuracies, test_accuracies)

### 2.3.1. How does batch size affect training?

Try using three different values for batch size. How do these values affect training and why?

In [None]:
# Batch sizes to compare
batch_sizes = [64, 128, 256]

# Training curves for each batch size, keyed by batch size
batch_size_results = {}

for batch_size in batch_sizes:
    # Start from a freshly initialized network so the runs are comparable
    net = CoolNet()

    # train_network builds the CIFAR-10 data loaders itself from batch_size
    # and returns (train_losses, train_accuracies, test_accuracies)
    batch_size_results[batch_size] = train_network(
        net,
        lr=0.01,
        epochs=10,
        batch_size=batch_size)
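
One effect of the batch size that does not depend on the model is the number of weight updates per epoch. As a rough sketch, assuming the standard 50,000-image CIFAR-10 training split and the `DataLoader` default of keeping the last partial batch:

```python
import math

num_train_images = 50_000  # size of the CIFAR-10 training split

for batch_size in (64, 128, 256, 1024):
    # DataLoader keeps the last partial batch by default, hence the ceiling.
    steps_per_epoch = math.ceil(num_train_images / batch_size)
    print(f"batch_size={batch_size:4d} -> {steps_per_epoch} optimizer steps per epoch")
```

Smaller batches give more, noisier updates per epoch; larger batches give fewer, smoother updates. This is worth keeping in mind when comparing runs trained for the same number of epochs.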


### 2.3.2. How does learning rate work?

When you are trying to train a neural network it is really hard to choose a proper learning rate.

Try to train your model with different learning rates and plot the training accuracy, test accuracy and loss and compare the training progress for learning rates = 10, 0.1, 0.01, 0.0001.

Analyze the results and choose the best one. Why did you choose this value?



In [None]:
import matplotlib.pyplot as plt
from torch import nn

# Learning rates to compare
learning_rates = [10, 0.1, 0.01, 0.0001]

# Lists to store training progress
train_losses_list = []
train_accuracies_list = []
test_accuracies_list = []

for lr in learning_rates:
    # Start from a freshly initialized CoolNet for each learning rate
    net = CoolNet()

    # train_network creates the optimizer and the data loaders itself,
    # so we only pass the learning rate and the batch size.
    train_losses, train_accuracies, test_accuracies = train_network(
        net,
        criterion=nn.CrossEntropyLoss(),
        lr=lr,
        epochs=10,
        eval_interval=5,
        batch_size=1024)

    # Store training progress
    train_losses_list.append(train_losses)
    train_accuracies_list.append(train_accuracies)
    test_accuracies_list.append(test_accuracies)

# Plot training progress for each learning rate
plt.figure(figsize=(15, 10))

# Plot the training loss (one value per epoch)
plt.subplot(2, 1, 1)
for i, lr in enumerate(learning_rates):
    plt.plot(train_losses_list[i], label=f'LR={lr}')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.legend()

# Plot training and test accuracies (one value per evaluation)
plt.subplot(2, 1, 2)
for i, lr in enumerate(learning_rates):
    plt.plot(train_accuracies_list[i], marker='o', label=f'Train LR={lr}')
    plt.plot(test_accuracies_list[i], marker='o', label=f'Test LR={lr}')
plt.xlabel('Evaluation (every 5 epochs)')
plt.ylabel('Accuracy')
plt.title('Training and Test Accuracy')
plt.legend()

plt.tight_layout()
plt.show()


**Question**:
Analyze the results and choose one value to use. Why did you choose this value?

After analyzing the training progress with different learning rates, I chose a learning rate of 0.01. With this value training was stable: the loss decreased gradually without oscillating or diverging, and both training and test accuracy improved consistently, which indicates that the model was learning and generalizing effectively. A learning rate of 0.01 strikes a good balance between stability and convergence speed. It allows the model to learn efficiently without risking divergence or slow convergence, which makes it suitable for training CoolNet on the CIFAR-10 dataset.
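
As a toy illustration of why both very large and very small learning rates are a problem, the sketch below runs plain gradient descent on `f(x) = x**2`. This is an assumption-laden stand-in: it is not Adam, not CoolNet, and says nothing about the exact values that work best on CIFAR-10. It only shows the qualitative behavior.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x**2 with plain gradient descent and return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2 * x          # derivative of x**2
        x = x - lr * grad     # gradient descent update
    return x


for lr in (10, 0.1, 0.01, 0.0001):
    print(f"lr={lr:<7} final |x| = {abs(gradient_descent(lr)):.3e}")
```

With `lr=10` every step overshoots the minimum and the iterate grows without bound, while with `lr=0.0001` it barely moves from its starting point in 20 steps. The intermediate values make steady progress towards the minimum at 0.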

### 2.3.3. Learning rate scheduling
During training it is often useful to reduce the learning rate as the training progresses.

Fill in `set_learning_rate` below to scale the learning rate by 0.1 (reduce by 90%) every 30 epochs and observe the behavior of the network for 90 epochs.


In [None]:
def set_learning_rate(optimizer, epoch, base_lr):
    lr = base_lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
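
To see which values this schedule produces, here is a small sketch that evaluates the same formula as `set_learning_rate` without needing an optimizer. The helper name `scheduled_lr` and its keyword arguments are hypothetical, chosen only for this illustration.

```python
def scheduled_lr(epoch, base_lr=0.01, drop_every=30, factor=0.1):
    """Learning rate that set_learning_rate would assign at a given epoch."""
    return base_lr * (factor ** (epoch // drop_every))


for epoch in (0, 29, 30, 59, 60, 89):
    print(f"epoch {epoch:2d}: lr = {scheduled_lr(epoch):.6f}")
```

With `lr=0.01` the rate stays at 0.01 for epochs 0 to 29, drops to 0.001 for epochs 30 to 59, and to 0.0001 for epochs 60 to 89. PyTorch also ships a built-in step decay, `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)`, which could be used instead of a hand-written function.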


In [None]:
net = CoolNet()
train_losses, train_accuracies, test_accuracies = train_network(
    net,
    lr_func=set_learning_rate,
    criterion=nn.CrossEntropyLoss(),
    lr=0.01,
    epochs=90,
    batch_size=1024)

**Question**:
What do you observe? Why do you think it is useful to decrease the learning rate over time?


**Observation:** As training progresses with the learning rate reduced by 90% every 30 epochs, several observations can be made:

 - Initially, the model makes large updates to its parameters due to the higher learning rate.
 - As the learning rate decreases over time, the updates become smaller, allowing the model to fine-tune its parameters more delicately.
 - This gradual reduction in learning rate helps prevent overshooting good parameter values and allows the model to converge more smoothly towards a minimum of the loss function.

**Importance of decreasing the learning rate:** Decreasing the learning rate over time is useful for several reasons:

 - **Stability:** It helps stabilize the training process by preventing the model from making overly large updates to its parameters, which can cause instability or divergence.
 - **Convergence:** Gradually reducing the learning rate allows the optimization process to settle into narrower and deeper areas of the loss landscape, facilitating convergence towards a better solution.
 - **Fine-tuning:** Lowering the learning rate enables the model to fine-tune its parameters more precisely, which can improve generalization performance on unseen data.

### 2.3.4. Data Augmentation

Most of the popular computer vision datasets have tens of thousands of images.

CIFAR-10 is a dataset of 60,000 32x32 colour images in 10 classes, which is relatively small compared to ImageNet, which has over 1M images.

The more parameters a model has, the more likely it is to overfit to a small dataset.
As you might have already seen while training CoolNet, after some iterations the training accuracy reaches its maximum (saturates) while the test accuracy is still relatively low.

To solve this problem, we use data augmentation to help the network avoid overfitting.

Add data transformations to the transform below and compare the results. You are free to use any type and any number of data augmentation techniques.

Just be aware that data augmentation should only happen during the training phase.

In [None]:
import torchvision.transforms as transforms

# Define data augmentation transformations for training data
train_transform = transforms.Compose([
    # Randomly apply horizontal flips with a probability of 0.5
    transforms.RandomHorizontalFlip(),
    # Randomly apply vertical flips with a probability of 0.5
    transforms.RandomVerticalFlip(),
    # Randomly rotate the image by a maximum of 10 degrees
    transforms.RandomRotation(10),
    # Randomly adjust brightness and contrast
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0, hue=0),
    # Randomly apply affine transformations such as scaling, shearing, and translation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=0.1),
    # Convert the image to a PyTorch tensor
    transforms.ToTensor(),
    # Normalize the image data
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])


In [None]:
net = CoolNet()
train_losses, train_accuracies, test_accuracies = train_network(
    net,
    criterion=nn.CrossEntropyLoss(),
    train_transform=train_transform,
    lr=0.01,
    epochs=30,
    batch_size=1024)

In [None]:
plot_results(train_losses, train_accuracies, test_accuracies)

**Question**: How does the model trained with data augmentation compare to the model trained without?

Training the model with data augmentation often yields superior generalization compared to training without it. Without data augmentation, the model may overfit to the training data, memorizing specific patterns rather than learning generalizable features. This leads to high training accuracy but poor performance on unseen data. Data augmentation introduces variations to the training data, such as rotations, flips, and crops, diversifying the training examples. By exposing the model to a broader range of variations, data augmentation encourages the model to learn more robust and invariant representations. Consequently, the model becomes better equipped to handle variations in real-world data, resulting in improved performance on unseen examples. Data augmentation acts as a form of regularization, helping prevent overfitting by providing a more comprehensive and diverse training dataset.

### 2.3.5. Change the loss function

Try Mean Squared Error loss instead of Cross Entropy.

In [None]:
class MSELossClassification(nn.Module):
  def forward(self, output, labels):
    one_hot_encoded_labels = \
      torch.nn.functional.one_hot(labels, num_classes=output.shape[1]).float()
    return nn.functional.mse_loss(output, one_hot_encoded_labels)

net = CoolNet()
train_losses, train_accuracies, test_accuracies = train_network(
    net,
    criterion=MSELossClassification(),
    lr=0.01,
    epochs=50,
    batch_size=1024)

In [None]:
plot_results(train_losses, train_accuracies, test_accuracies)

**Question**:
How does this affect the results? Explain why you think this is happening.

Replacing Cross Entropy loss with Mean Squared Error (MSE) for classification tasks often leads to inferior results. MSE loss, designed for regression tasks, penalizes the squared difference between the network's raw outputs and the one-hot encoded target labels, which does not align well with the objective of classification: it pushes every output towards exactly 0 or 1, so even a confident, correct prediction with a large score is penalized. MSE also does not treat the outputs as a probability distribution over the classes, leading to less calibrated probability estimates and suboptimal convergence. Cross Entropy loss, tailored for classification, applies a softmax and maximizes the probability of the correct class, making it more suitable for classification tasks and yielding better performance overall.
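
The mismatch between MSE and classification can be seen on a single hand-picked example. The sketch below re-implements both losses in plain Python for one sample (the function names and the example scores are assumptions for illustration); like `MSELossClassification` above, the MSE is taken between the raw outputs and the one-hot target, averaged over the classes.

```python
import math


def cross_entropy(logits, label):
    """Cross entropy of raw scores for one example (softmax is applied inside)."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]


def mse_to_one_hot(outputs, label):
    """Mean squared error between raw outputs and the one-hot target."""
    target = [1.0 if i == label else 0.0 for i in range(len(outputs))]
    return sum((o - t) ** 2 for o, t in zip(outputs, target)) / len(outputs)


# A confident and correct prediction for class 0.
outputs = [10.0, 0.0, 0.0]
print(f"cross entropy = {cross_entropy(outputs, 0):.5f}")
print(f"MSE           = {mse_to_one_hot(outputs, 0):.1f}")
```

Cross entropy is close to zero because the correct class clearly wins, while MSE is large because the winning score is far from the target value of 1. The MSE objective therefore spends effort pulling correct scores back towards 1 instead of separating the classes.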

## Turning In

You're done! You just need to turn in the notebook file.

Go to `File > Download .ipynb` and download the file as `Lab2.ipynb`. Turn in only this file.

Make sure that you've answered all questions and all plots are correct.