**Important: We highly recommend that you download and then upload this notebook to Google Colab on Google Drive. That way, you can take advantage of GPU hardware to accelerate training. You can submit the notebook file with outputs shown directly to Gradescope.**

In [5]:
from torch.utils.data import DataLoader, random_split
from torch.utils.data.dataset import TensorDataset
from collections import defaultdict
from tqdm import tqdm

import matplotlib.pyplot as plt
import torch.nn.functional as F
import torch.optim as optim
import torch.nn as nn
import numpy as np
import random

import warnings
import torch
import os

warnings.filterwarnings("ignore")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Device used: {device}")

In [6]:
def display_data_loader_images(
    dataset: TensorDataset
) -> None:
    """
        Display 10 random images with this function. If you want to
        display a different set of images, simply call this
        function again on the same TensorDataset object.

        Args
            dataset: a TensorDataset object containing
                Imagenette images from Problem Set 6.
    """

    label_map = {
        0: "cassette",
        1: "chainsaw",
        2: "church",
        3: "english springer",
        4: "french horn",
        5: "garbage truck",
        6: "gas pump",
        7: "golf ball",
        8: "parachute",
        9: "tench",
    }

    assert isinstance(dataset, TensorDataset)
    random_images = [dataset[i] for i in random.sample(range(len(dataset)), 10)]
    fig, axes = plt.subplots(2, 5, figsize=(10, 4))
    for i in range(10):
        ax = axes[i // 5, i % 5]
        image, label = random_images[i]
        ax.imshow(np.moveaxis(image.numpy().squeeze(), 0, -1))
        ax.axis("off")
        ax.set_title(f"{label_map[label.item()]}")

    plt.show()

# Image Classification

For this last assignment, you will use PyTorch to implement a convolutional neural network image classifier. You will walk through the steps of setting up data, implementing training and validation code, specifying parameters, and learning and evaluating a model.

First, we will need to import the Imagenette data. To do so, download both `imagenette_train.pt` and `imagenette_test.pt` from the shared Google Drive [link](https://drive.google.com/drive/folders/1K8wD1dhGJ2ULG8KoBjDs6uzh5UwmfNw-?usp=sharing). Once you have saved them to a directory in your own Google Drive, run the function `import_imagenette_data()`, passing `/content/drive/MyDrive/<DIR_NAME>/` as the argument.

In [7]:
def import_imagenette_data(
    dir: str = "/content/drive/MyDrive/"
):

    from google.colab import drive
    drive.mount('/content/drive')
    print("Downloading Imagenette Data...", end=" ")
    train_data = torch.load(os.path.join(dir, "imagenette_train.pt"))
    test_data = torch.load(os.path.join(dir, "imagenette_test.pt"))
    print("Download Complete!")

    return train_data, test_data

In [9]:
BASE_DIR = "/content/drive/MyDrive/"
DIR_NAME = "ai-hw-5/"
FULL_DIR = os.path.join(BASE_DIR, DIR_NAME)
all_train_data, test_data = import_imagenette_data(FULL_DIR)

If you'd like to look at any of the images, you can use the provided `display_data_loader_images` function to do so, as demonstrated below.

In [10]:
display_data_loader_images(all_train_data) # insert your dataset of interest here

# Task 1: `CNN` Class (8 points)

Our first task will be to write a class that stores a convolutional neural network model. Write the `__init__()` function. It should define two `nn.Sequential` attributes: `self.conv` and `self.fc`. `self.conv` should be a convolutional sequence; `self.fc` should be a simple multi-layer perceptron.

* The first layer of `self.conv` is a 2D convolutional layer that takes the specified number of in-channels, 16 out-channels, a kernel size of 8, and a stride length of 4. Follow this with a rectified linear unit, and then a 2D max pooling layer with kernel size 2 and stride length 2. Finally, repeat these three layers, but this time change the convolutional layer to have 16 in-channels, 32 out-channels, a kernel size of 4, and a stride length of 2.

* `self.fc` should consist of a flattening layer to flatten the output of `self.conv`. Then add a linear layer, a rectified linear unit, a linear layer, a rectified linear unit, and one final linear layer. The input dimension of `self.fc` should be the same as the output dimension of `self.conv`. The output dimension of `self.fc` should be the number of classes. For all other intermediate input/output dimensions, fix them to any number you'd like (we recommend anything above 256).

* Be sure to call `super().__init__()` so that `CNN` can access methods from `nn.Module`.

* Be sure to send each of the constructed sequences to the specified device.
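The input dimension of `self.fc` depends on the spatial size of the images, which the layer descriptions above do not fix. As a rough sketch, the size after each convolution or pooling layer (with no padding and dilation 1, the PyTorch defaults) is `(size - kernel) // stride + 1`. The helper names `conv_out` and `flattened_size` and the side length 224 are illustrative assumptions, not part of the assignment; check the actual shape of your images before relying on any number.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a conv or max-pool layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_size(image_size: int, out_channels: int = 32) -> int:
    """Number of features entering the first linear layer for a square image."""
    s = conv_out(image_size, kernel=8, stride=4)  # first convolution
    s = conv_out(s, kernel=2, stride=2)           # first max pool
    s = conv_out(s, kernel=4, stride=2)           # second convolution
    s = conv_out(s, kernel=2, stride=2)           # second max pool
    return out_channels * s * s


# 224 is a hypothetical side length, not necessarily that of the provided data.
print(flattened_size(224))  # 1152
```

An alternative that avoids the arithmetic is to pass one batch through `self.conv` and inspect the shape of the result.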

In [11]:
class CNN(nn.Module):
    def __init__(
            self,
            in_channels: int = 3,
            num_classes: int = 10,
            device: str = "cpu",
    ):
        # TODO
        pass

    def forward(
            self,
            x,
    ):
        return self.fc(self.conv(x))

# Task 2: Data and Learning Setup (6 points)

At the beginning, we imported two datasets: ```all_train_data``` and ```test_data```. Store the training data and test data in two separate `DataLoader` objects, one for training and one for validation. Specify a batch size of 32. Then print out the number of data points in each dataset.
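As a small illustration of how a `DataLoader` wraps a dataset, the sketch below uses a synthetic `TensorDataset` in place of the real data. The tensor shapes, the name `toy_loader`, and the choice of `shuffle=True` are assumptions made for the example only.

```python
import torch
from torch.utils.data import DataLoader
from torch.utils.data.dataset import TensorDataset

# Synthetic stand-in for the real data: 100 random 3-channel 64x64 "images"
# with integer labels in [0, 10). These shapes are illustrative assumptions.
toy_data = TensorDataset(torch.rand(100, 3, 64, 64), torch.randint(0, 10, (100,)))

# shuffle=True reorders the samples every epoch, which is common for training.
toy_loader = DataLoader(toy_data, batch_size=32, shuffle=True)

print(len(toy_loader.dataset))  # number of data points: 100
print(len(toy_loader))          # number of batches: 4 (the last one holds 4 samples)

# Each iteration yields one batch of images and the matching labels.
images, labels = next(iter(toy_loader))
print(images.shape, labels.shape)
```

Note that `len()` of a `DataLoader` counts batches, while `len()` of its `.dataset` attribute counts individual data points.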

In order to train and evaluate our model, we need an optimizer and a criterion. For the former, we will use a procedure called [Adam](https://arxiv.org/abs/1412.6980) (ADAptive Moment estimation). Adam works similarly to stochastic gradient descent, except that it _adapts_ the step size of each parameter using running averages of the gradient and of the squared gradient (its first and second _moments_).
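To see what the adaptive step size means in practice, here is a minimal pure-Python sketch of the very first Adam update for a single parameter, following the update rule in the Adam paper with PyTorch's default hyperparameters. The function name `adam_first_step` is hypothetical, and the sketch ignores everything after the first step.

```python
import math


def adam_first_step(grad: float, lr: float = 1e-4, beta1: float = 0.9,
                    beta2: float = 0.999, eps: float = 1e-8) -> float:
    """Size of the parameter change Adam makes on its first step (t = 1)."""
    m = (1 - beta1) * grad        # running average of gradients (first moment)
    v = (1 - beta2) * grad ** 2   # running average of squared gradients (second moment)
    m_hat = m / (1 - beta1 ** 1)  # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** 1)
    return lr * m_hat / (math.sqrt(v_hat) + eps)


# Plain SGD would move by lr * grad, i.e. 1e-6 and 1e-2 respectively.
print(adam_first_step(0.01))   # roughly 1e-4
print(adam_first_step(100.0))  # roughly 1e-4
```

Because the gradient is divided by the square root of its own squared average, the first step has a size close to the learning rate whether the gradient is tiny or huge.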

For the criterion, we will use cross-entropy loss, which generalizes the log loss that we saw in class for logistic regression to more than two classes.
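The connection to log loss can be checked numerically. The sketch below, using made-up logits for a single sample, compares PyTorch's `nn.CrossEntropyLoss` with the negative log of the softmax probability assigned to the correct class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # one sample, three classes (made-up values)
target = torch.tensor([0])                 # index of the correct class

# nn.CrossEntropyLoss expects raw logits; it applies log-softmax internally.
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)

# Manual computation: softmax turns logits into probabilities, then take
# the negative log of the probability of the correct class.
probs = F.softmax(logits, dim=1)
manual = -torch.log(probs[0, target[0]])

print(torch.isclose(loss, manual))
```

One practical consequence is that the model's final layer should output raw scores, without a softmax of its own.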

* First initialize your CNN model, setting the number of input channels to 3 (one for each color channel), the number of classes to 10, and the device to the device you are currently using.

* Initialize an optimizer variable using the Adam optimizer from the PyTorch library. Set the learning rate to $1 \times 10^{-4}$.

* Initialize a criterion variable using the cross-entropy loss criterion from the PyTorch library.

# Task 3: `train()` Function (16 points)

Now we will implement ```train()```, the main function that will iterate through our data to learn the model. It takes in 7 parameters:

1. ```train_data```: this is a DataLoader object containing the training data.
2. ```val_data```: this is a DataLoader object containing the validation data.
3. ```model```: this is the CNN model that you instantiated to be trained.
4. ```criterion```: this is the criterion to be used during training.
5. ```optimizer```: this is the optimizer to be used during training.
6. ```num_epochs```: this is the number of epochs to train for.
7. ```device```: this is the device to send all computations to.

To train, you will iterate over the specified number of epochs. In each epoch, you will execute the training phase by iterating through the training DataLoader, calculating losses, computing gradients, and applying update steps. You will then repeat these steps on the validation DataLoader for the validation phase, but without computing gradients or updating the model.

Within this function, you should also update the defined `info` dictionary. It contains four lists, updated at the end of each epoch: `"train_losses"`, `"train_accuracies"`, `"val_losses"`, `"val_accuracies"`. Return this dictionary when this function completes.
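The two phases of an epoch follow a common PyTorch pattern, sketched below on a toy linear model with synthetic data. This is not the reference solution: the helper `run_epoch`, the toy shapes, and the learning rate are assumptions for illustration, and the sketch omits moving tensors to a device and recording results in `info`.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
toy_data = TensorDataset(torch.rand(64, 4), torch.randint(0, 3, (64,)))
toy_loader = DataLoader(toy_data, batch_size=16)
toy_model = nn.Linear(4, 3)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(toy_model.parameters(), lr=1e-2)


def run_epoch(loader, model, training: bool):
    """Return (mean loss, accuracy) over one pass through `loader`."""
    model.train(training)  # switches layers such as dropout between modes
    total_loss, correct, seen = 0.0, 0, 0
    # Gradients are only tracked during the training phase.
    with torch.set_grad_enabled(training):
        for inputs, labels in loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            if training:
                optimizer.zero_grad()  # clear gradients from the previous batch
                loss.backward()        # compute gradients
                optimizer.step()       # update the parameters
            total_loss += loss.item() * labels.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen


train_loss, train_acc = run_epoch(toy_loader, toy_model, training=True)
val_loss, val_acc = run_epoch(toy_loader, toy_model, training=False)
print(train_loss, train_acc, val_loss, val_acc)
```

Weighting each batch loss by the batch size keeps the epoch average correct even when the last batch is smaller than the others.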

In [16]:
def train(
        train_data: DataLoader,
        val_data: DataLoader,
        model: nn.Module,
        criterion: nn.Module,
        optimizer: optim.Optimizer,
        num_epochs: int,
        device: str = "cpu",
):
    info = defaultdict(list)

    # TODO

    return info

Once you have this function implemented, call it to train the model with all of our defined parameters. Use 50 epochs.

# Task 4: Loss and Accuracy Curves (6 points)

Use the ```plot_info()``` function below to plot the training and validation losses and accuracies in `info`. Then briefly answer the following questions.

* Give a qualitative description of the training and validation loss curves. What do you notice about the loss curves in relation to each other?

* Let's focus on the validation loss and accuracy curves. Around how many epochs does the model begin to overfit? What do the loss and accuracy curves look like when this happens?

* Let's zoom in once more on the validation loss curve only. How does this curve inform you on when your model has the best performance without overfitting the training data?

In [18]:
def plot_info(info):
    fig, ax = plt.subplots(1, 2, figsize=(10, 3))
    ax[0].plot(info["train_losses"], label="Train Loss")
    ax[0].plot(info["val_losses"], label="Validation Loss")
    ax[0].set_xlabel("Epochs")
    ax[0].set_ylabel("Loss")
    ax[0].set_title("Loss vs. Epochs")
    ax[0].legend()
    ax[1].plot(info["train_accuracies"], label="Train Accuracy")
    ax[1].plot(info["val_accuracies"], label="Validation Accuracy")
    ax[1].set_xlabel("Epochs")
    ax[1].set_ylabel("Accuracy")
    ax[1].set_title("Accuracy vs. Epochs")
    ax[1].legend()
    plt.show()