### DAT303 Module 8.1 Notebook
---
Name:    
Date:

> It is assumed you are using the module8 conda environment specified in the *module8.yaml* file downloaded from Canvas. Be sure to read all cells in this notebook. You are only to provide code in cells that contain `##### YOUR CODE HERE #####` and written responses in cells that contain `YOUR WRITTEN RESPONSE HERE`. Ensure that code cells are executed sequentially to prevent unexpected errors.

In this notebook, you will work with the CIFAR-10 dataset to train a convolutional neural network in order to classify input images into one of the 10 pre-defined classes (more information on CIFAR-10 is available [here](https://dmacc.instructure.com/courses/11337/pages/the-cifar-10-dataset)). 

This notebook differs from previous modules in that you are not expected to do as much coding: the code to load the data, define the model, initialize the optimizer and set up the training and validation loops has been provided. You will re-use this existing code and assess how adjusting parameters such as dropout, learning rate and momentum affects the model's performance. Note that training on a CPU may take some time: training BasicCNN for 25 epochs on a CPU requires anywhere from 5 to 15 minutes.


* Part I: Evaluate BasicCNN with default settings.
* Part II: Evaluate BasicCNN with different combinations of learning rate, dropout and momentum.

<br>

**BE SURE TO READ THE INSTRUCTIONS FOR ALL SECTIONS!!!**



<br>

## Part I.


1. Start by loading in the CIFAR-10 dataset. The next cell loads and normalizes the images and creates `train_loader` and `valid_loader` instances, which are iterated over during training. Execute the cell below (no additional code required).


In [None]:

import warnings
warnings.filterwarnings("ignore", category=UserWarning)

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

np.set_printoptions(suppress=True, precision=8, linewidth=1000)
pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


# Number of images to process in each batch. 
batch_size = 32


# Specify CIFAR-10 classes.
classes = [
    "plane", "car", "bird", "cat", "deer", "dog", 
    "frog", "horse", "ship", "truck"
    ]


# Convert images to tensors and normalize each channel from [0, 1] to [-1, 1].
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
     )


# Download CIFAR-10 training and validation data.
train_ds = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
    )
valid_ds = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transform
    )

# Create training and validation DataLoader instances. This is what gets 
# iterated over during training. 
train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(valid_ds, batch_size=batch_size, shuffle=False)

print(f"\nNumber of training batches of size {batch_size}: {len(train_loader)}")



<br>
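
As a quick sanity check on the cell above, the sketch below reproduces two numbers by hand: the batch count printed by the cell and the value range produced by `transforms.Normalize`. It assumes the standard CIFAR-10 training split of 50,000 images and that `DataLoader` keeps the final partial batch (its default, `drop_last=False`). The helper `normalize` is a hypothetical stand-in for the per-channel formula `(x - mean) / std`; it is not part of the notebook.

```python
import math

n_train = 50_000   # assumed: size of the CIFAR-10 training split
batch_size = 32    # same value as in the cell above

# With drop_last=False (the DataLoader default), the last partial batch is kept.
n_batches = math.ceil(n_train / batch_size)
print(f"Expected number of training batches: {n_batches}")


def normalize(x, mean=0.5, std=0.5):
    """Hypothetical helper: per-channel formula applied by transforms.Normalize."""
    return (x - mean) / std


# ToTensor scales pixel values to [0, 1]; Normalize then maps them to [-1, 1].
for pixel in (0.0, 0.5, 1.0):
    print(f"{pixel:.1f} -> {normalize(pixel):+.1f}")
```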

2. Display a batch of images to get an idea of what the training data looks like. Execute the cell below (no additional code required).


In [None]:

# Get first batch from train loader. 
images, labels = next(iter(train_loader))

fig, ax = plt.subplots(1, 1, figsize=(9, 9), tight_layout=True)
ax.axis("off")
ax.set_title("CIFAR-10 Training Images", fontsize=12)
ax.imshow(np.transpose(torchvision.utils.make_grid(images[:64], padding=1, normalize=True), (1, 2, 0)))
plt.show()



<br>

3. Define the network architecture. Refer to the *Anatomy of a PyTorch Neural Network* page in Canvas for a description of each component of the model. The cell also prints the total number of trainable parameters. Execute the cell below (no additional code required).

In [None]:
  
import torch.nn as nn
import torch.nn.functional as F


class BasicCNN(nn.Module):
    def __init__(self, dropout=0):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
        self.fc1 = nn.Linear(in_features=16 * 5 * 5, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=84)
        self.fc3 = nn.Linear(in_features=84, out_features=10)
        self.drp = nn.Dropout(p=dropout)
        
    def forward(self, X):
        output = self.pool(F.relu(self.conv1(X)))
        output = self.pool(F.relu(self.conv2(output)))
        output = torch.flatten(output, 1)
        output = F.relu(self.drp(self.fc1(output)))
        output = F.relu(self.drp(self.fc2(output)))
        output = self.fc3(output)
        return output



# Print number of trainable parameters.
nbr_params = sum(p.numel() for p in BasicCNN().parameters())
print(f"BasicCNN nbr_params: {nbr_params:,.0f}")


<br>
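
The `16 * 5 * 5` input size of `fc1` and the printed parameter count can both be derived by hand. The sketch below does that arithmetic in plain Python, assuming 3 x 32 x 32 input images and the default stride of 1 and padding of 0 for the convolutions. The helper names (`conv_out`, `conv_params`, `linear_params`) are illustrative and not part of the notebook.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Output height/width of a square convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 32                                        # CIFAR-10 images are 3 x 32 x 32
size = conv_out(size, kernel_size=5)             # conv1: 32 -> 28
size = conv_out(size, kernel_size=2, stride=2)   # pool:  28 -> 14
size = conv_out(size, kernel_size=5)             # conv2: 14 -> 10
size = conv_out(size, kernel_size=2, stride=2)   # pool:  10 -> 5
flat_features = 16 * size * size                 # should match fc1's in_features


def conv_params(c_in, c_out, k):
    """Weights plus biases of a Conv2d layer with a k x k kernel."""
    return c_in * c_out * k * k + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a Linear layer."""
    return n_in * n_out + n_out


# Pooling and dropout layers have no learnable parameters.
total = (
    conv_params(3, 6, 5)
    + conv_params(6, 16, 5)
    + linear_params(flat_features, 120)
    + linear_params(120, 84)
    + linear_params(84, 10)
)
print(f"flattened features: {flat_features}, total parameters: {total:,}")
```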

4. Define the loss function and optimizer. We use cross-entropy loss as the loss function and stochastic gradient descent (SGD) as the optimizer. We also initialize an instance of our model, passing it the specified value for dropout. For the first run, dropout is set to 0. You'll have an opportunity to evaluate different values for dropout later in the notebook. Execute the cell below (no additional code required).

In [None]:

import torch.optim as optim


# Configuration ----------------------------------------------------------------

# Number of epochs.
n_epochs = 25

# Learning rate.
lr = 0.001

# Momentum.
momentum = .90

# Dropout.
dropout = 0.0

# ------------------------------------------------------------------------------

# Check if gpu is available. If not, use cpu. 
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Initialize instance of BasicCNN. Put on device for completeness.
mdl = BasicCNN(dropout=dropout).to(device)

# Specify loss function.
loss_fn = nn.CrossEntropyLoss()

# Specify optimizer. 
optimizer = optim.SGD(mdl.parameters(), lr=lr, momentum=momentum)

print(f"device: {device}")



<br>
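
To make the roles of `lr` and `momentum` concrete, here is a minimal scalar sketch of the update that `optim.SGD` performs with its default settings (`dampening=0`, `nesterov=False`, no weight decay): a running "velocity" accumulates past gradients, and the parameter moves against that velocity. The function `sgd_momentum_step` and the toy objective `f(w) = w**2` are assumptions made for illustration; they are not part of the notebook.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update for a single scalar parameter."""
    velocity = momentum * velocity + grad   # accumulate a running gradient
    w = w - lr * velocity                   # step against the accumulated gradient
    return w, velocity


w, velocity = 1.0, 0.0
history = []
for step in range(3):
    grad = 2.0 * w                          # gradient of the toy objective f(w) = w**2
    w, velocity = sgd_momentum_step(w, grad, velocity)
    history.append(w)
    print(f"step={step + 1}: w={w:.6f}, velocity={velocity:.6f}")
```

With `momentum=0` the velocity is just the current gradient, which recovers plain SGD. Larger momentum values let earlier gradients keep influencing later steps.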

5. Define training and validation functions. One epoch is a full pass through the training data. For example, if we had 1,000 training images and used a batch size of 50, one epoch would consist of evaluating 1000 / 50 = 20 batches of images. We usually train a network for multiple epochs. If we trained the model for 10 epochs, we would pass the full training dataset through the model 10 times.  
In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model's parameters. Execute the cell below (no additional code required).

In [None]:


def epoch_trainer(epoch, data_loader, model, loss_fn, optimizer, device, verbose=True):
    """
    Execute a single training epoch. Return the training loss and
    accuracy from the last checkpoint (taken every 250 batches).
    """
    loss, checkpoint_loss, correct, samples = 0.0, 0.0, 0, 0

    # Put model in train mode.
    model.train()

    # Iterate over batches in data_loader. 
    for ii, (X, yactual) in enumerate(data_loader):

        # Send datasets to device. 
        X, yactual = X.to(device), yactual.to(device)

        # Zero out parameter gradients.
        optimizer.zero_grad()

        # Get model predictions (forward pass). 
        ypred = model(X)

        # Compute loss. 
        loss_ii = loss_fn(ypred, yactual)

        # Backpropagation and optimizer step. 
        loss_ii.backward()
        optimizer.step()

        # Update running_loss.
        loss+=loss_ii.item()
        correct+=(ypred.argmax(dim=1)==yactual).type(torch.float).sum().item()
        samples+=yactual.size(dim=0)

        # Report average loss and accuracy every 250 mini-batches.
        if ii % 250 == 0:
            checkpoint_acc = correct / samples
            # The first checkpoint (ii=0) covers 1 batch; later ones cover 250.
            checkpoint_loss = loss / (1 if ii == 0 else 250)

            if verbose:
                print(f"\t+ [train][epoch={epoch}, batch={ii}] loss = {checkpoint_loss:,.5f}, acc = {checkpoint_acc:.5f}.")
            
            loss, correct, samples = 0.0, 0, 0

    return checkpoint_loss, checkpoint_acc
        


def epoch_validator(data_loader, model, loss_fn, optimizer, device):
    """
    Execute a single validation epoch. Return average validation loss
    and accuracy.
    """
    valid_loss, correct = 0.0, 0

    # Put model in evaluation mode (disables dropout).
    model.eval()

    with torch.no_grad():

        for ii, (X, yactual) in enumerate(data_loader, start=1):

            # Send dataset and target to device. 
            X, yactual = X.to(device), yactual.to(device)

            # Get model predictions. 
            ypred = model(X)

            # Compute loss and update valid_loss.
            valid_loss+=loss_fn(ypred, yactual).item()

            # Count number of correct class predictions.
            correct+=(ypred.argmax(dim=1)==yactual).type(torch.float).sum().item()

    loss, acc = valid_loss / ii, correct / len(data_loader.dataset)

    return loss, acc





<br>
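
The calls to `model.train()` and `model.eval()` in the functions above matter because dropout behaves differently in the two modes. The sketch below is a hypothetical pure-Python version of inverted dropout, the scheme `nn.Dropout` uses: in training mode each value is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`; in evaluation mode the input passes through unchanged. The function name `dropout` and the sample activations are illustrative only.

```python
import random


def dropout(values, p, training, rng):
    """Hypothetical pure-Python sketch of inverted dropout on a list of floats."""
    if not training or p == 0.0:
        return list(values)                  # evaluation mode: identity
    scale = 1.0 / (1.0 - p)                  # keeps the expected value unchanged
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False, rng=rng)

print("train mode:", train_out)              # some values zeroed, the rest doubled
print("eval mode: ", eval_out)               # identical to the input
```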

6. Train the model for the specified number of epochs. `results` is a list of 4-tuples consisting of train loss, train accuracy, validation loss and validation accuracy. Note that for this dataset, any accuracy above 10% shows that the model has learned something, since 10% is the expected accuracy of a model that selects classes at random. Execute the cell below (no additional code required).

In [None]:

results = []

for epoch in range(1, n_epochs + 1):

    tloss, tacc = epoch_trainer(
        epoch=epoch, data_loader=train_loader, model=mdl, loss_fn=loss_fn, 
        optimizer=optimizer, device=device
        )
    
    vloss, vacc = epoch_validator(
        data_loader=valid_loader, model=mdl, loss_fn=loss_fn, 
        optimizer=optimizer, device=device
        )
    
    print(f"[epoch={epoch}]: tloss={tloss:.5f}, tacc={tacc:.5f}, vloss={vloss:.5f}, vacc={vacc:.5f}.")

    # Append metrics to results.
    results.append((tloss, tacc, vloss, vacc))



<br>
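
The 10% accuracy baseline has a counterpart for the loss. The sketch below computes cross-entropy for a single sample whose logits are all equal, that is, a model with no preference between the 10 classes. Under that assumption the loss is `ln(10)`, so a validation loss near that value suggests the model is doing no better than chance. The `cross_entropy` function is a hand-written illustration of the per-sample formula; the notebook itself uses `nn.CrossEntropyLoss`.

```python
import math

n_classes = 10


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    max_logit = max(logits)                  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


uniform_logits = [0.0] * n_classes           # no preference between classes
chance_loss = cross_entropy(uniform_logits, target=3)
chance_acc = 1.0 / n_classes

print(f"chance-level loss: {chance_loss:.4f}, chance-level accuracy: {chance_acc:.2f}")
```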

7. Which epoch achieved the minimum validation loss? The maximum validation accuracy? What are the values of the minimum validation loss and maximum validation accuracy?

In [None]:

##### YOUR CODE HERE #####


<br>

8. Create a side-by-side plot with training and validation loss on the left and training and validation accuracy on the right. Be sure to include a legend and label your axes.

In [None]:

##### YOUR CODE HERE #####


<br>

9. Inspecting the graph, at which epoch do train and validation loss start to diverge? At which epoch do train and validation accuracy start to diverge?


YOUR WRITTEN RESPONSE HERE


<br>

## Part II.

1.  Re-run the training pipeline above, this time using different values for learning rate, momentum and dropout. For each set of parameters, train the model for 25 epochs and record the minimum validation loss and maximum validation accuracy. Note that for each set of parameters, you will need to re-initialize both the `BasicCNN` instance (since it takes dropout as a parameter) and the optimizer (since it takes learning rate and momentum as parameters). The `params` list of dicts provided in the next cell contains the full set of parameter combinations you are to evaluate.   
Note that this step may take a while (possibly longer than an hour).

In [None]:

# Parameter combinations to evaluate for ablation test.
params = [
    {"lr": .001, "momentum": 0.90, "dropout": .10},
    {"lr": .001, "momentum": 0.95, "dropout": .05},
    {"lr": .0005, "momentum": 0.90, "dropout": .10},
    {"lr": .0005, "momentum": 0.95, "dropout": .05},
    {"lr": .001, "momentum": 0.875, "dropout": .10},
    {"lr": .001, "momentum": 0.875, "dropout": .05},
    {"lr": .001, "momentum": 0.95, "dropout": .05},
    {"lr": .0025, "momentum": 0.95, "dropout": .10},
    {"lr": .0025, "momentum": 0.95, "dropout": .0},
    {"lr": .001, "momentum": 0.99, "dropout": .10},
    {"lr": .001, "momentum": 0.90, "dropout": .50},
    {"lr": .005, "momentum": 0.90, "dropout": .50},
    ]

# List of 2-tuples containing (min_valid_loss, max_valid_acc) for each set 
# of evaluated parameters. Upon completion, should have same length as params.
all_results = []


##### YOUR CODE HERE #####



<br>
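
Because each configuration can take several minutes to train, it may be worth checking whether any entries in `params` repeat before starting the loop. The sketch below is one hypothetical way to do that; it copies the `params` list from the cell above so that it runs on its own. The names `seen` and `duplicates` are illustrative. How to handle a repeated entry is up to you, but keep in mind that `all_results` is expected to have the same length as `params`.

```python
from collections import defaultdict

# Copy of the params list from the cell above.
params = [
    {"lr": .001, "momentum": 0.90, "dropout": .10},
    {"lr": .001, "momentum": 0.95, "dropout": .05},
    {"lr": .0005, "momentum": 0.90, "dropout": .10},
    {"lr": .0005, "momentum": 0.95, "dropout": .05},
    {"lr": .001, "momentum": 0.875, "dropout": .10},
    {"lr": .001, "momentum": 0.875, "dropout": .05},
    {"lr": .001, "momentum": 0.95, "dropout": .05},
    {"lr": .0025, "momentum": 0.95, "dropout": .10},
    {"lr": .0025, "momentum": 0.95, "dropout": .0},
    {"lr": .001, "momentum": 0.99, "dropout": .10},
    {"lr": .001, "momentum": 0.90, "dropout": .50},
    {"lr": .005, "momentum": 0.90, "dropout": .50},
    ]

# Group indices by their (lr, momentum, dropout) combination.
seen = defaultdict(list)
for idx, p in enumerate(params):
    seen[(p["lr"], p["momentum"], p["dropout"])].append(idx)

# Keep only combinations that occur at more than one index.
duplicates = {key: idxs for key, idxs in seen.items() if len(idxs) > 1}
print(f"Repeated configurations (key -> indices): {duplicates}")
```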

2. Create a side-by-side barplot with validation loss on the left and validation accuracy on the right for each parameter group. The x-axis should be the index into the params list of the given parameter set (i.e., 0, 1, 2, ...).


In [None]:

##### YOUR CODE HERE #####



<br>

3. Which parameter set obtained the minimum validation loss? Was this higher or lower than the best validation loss achieved in Part I?


YOUR WRITTEN RESPONSE HERE



<br>

4. Which parameter(s) (learning rate, momentum or dropout) had the most significant impact when changed from the original model? In which direction (increase or decrease) were these changes relative to the model parameterization in Part I?



YOUR WRITTEN RESPONSE HERE



<br>

5. If you were to continue working on this image classifier, what other aspects of model development might you further explore to improve model performance (2-3 sentences)?



YOUR WRITTEN RESPONSE HERE
