## DAT303 - Module \#9 Notebook
---
Name:    
Date:

> It is assumed you are using the module9 conda environment specified in the *module9.yaml* file downloaded from Canvas. Be sure to read all cells in this notebook. You are only to provide code in cells that contain `##### YOUR CODE HERE #####` and written responses in cells that contain `YOUR WRITTEN RESPONSE HERE`. Ensure that code cells are executed sequentially to prevent unexpected errors.

In this notebook, you will again work with the CIFAR-10 dataset, but this time using pre-trained models in order to classify input images into one of the 10 pre-defined classes (more information on CIFAR-10 is available [here](https://dmacc.instructure.com/courses/11337/pages/the-cifar-10-dataset)). 


The tasks for this assignment are the following:

* Part I: Create the dataset, define the network, and define the training and validation functions. 
* Part II: Train `resnet34` on CIFAR-10. 
* Part III: Train `densenet121` on CIFAR-10.
* Part IV: Train `vgg11` on CIFAR-10.
* Part V: Compare performance of the 3 models against the final test set. 


<br>


**BE SURE TO READ THE INSTRUCTIONS FOR ALL SECTIONS!!!**



<br>

## Part I.


1. Start by loading in the CIFAR-10 dataset. The next cell loads and normalizes the images and creates `train_loader`, `valid_loader` and `test_loader` instances. Note that `test_loader` will only be used in Part V to evaluate each of the best fitting models in order to determine which model generalizes best to unseen data. Recall the differences between training, validation and test sets:

    - **Training set**: Used to train the model. During this phase, the model learns the patterns and relationships within the data by adjusting its weights based on the loss calculated from its predictions. The model iteratively processes the training data, adjusting its parameters to minimize the error on this set.

    - **Validation set**: Used to tune hyperparameters and evaluate the model's performance during training. It helps in assessing the model's ability to generalize to new data. During training, the model is periodically evaluated on the validation set to monitor its performance and prevent overfitting. Hyperparameter tuning, such as adjusting learning rates, batch sizes, and network architecture, is done based on validation set performance.

    - **Test set**: Used to evaluate the final performance of the model after training is complete. It provides an unbiased assessment of the model's generalization capability to entirely new data. The model's performance on the test set is reported as the final measure of its accuracy, precision, recall, F1 score, or other relevant metrics. This set is only used once after the model has been trained and validated.

<br>


Execute the cell below (no additional code required).


In [None]:

import warnings
warnings.filterwarnings("ignore", category=UserWarning)

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import torch
import torchvision
from torchvision import models
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

np.set_printoptions(suppress=True, precision=8, linewidth=1000)
pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


# Number of images to process in each batch. 
batch_size = 32


# Convert images to tensors and normalize each channel to the range [-1, 1]. 
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
     )


# Download CIFAR-10 training and test dataset.
train_ds = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
    )
test_ds = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transform
    )

# Use 50% of test data for validation set.
num_test = len(test_ds)
indices = list(range(num_test))
np.random.shuffle(indices)
split = int(np.floor(.50 * num_test))
valid_idx, test_idx = indices[split:], indices[:split]
valid_sampler = SubsetRandomSampler(valid_idx)
test_sampler = SubsetRandomSampler(test_idx)
train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(test_ds, batch_size=batch_size, sampler=valid_sampler)
test_loader = DataLoader(test_ds, batch_size=batch_size, sampler=test_sampler)

print(f"\nNumber of training batches of size {batch_size}  : {len(train_loader)}")
print(f"Number of validation batches of size {batch_size}: {len(valid_loader)}")
print(f"Number of test batches of size {batch_size}      : {len(test_loader)}")
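
To make the normalization step in the cell above concrete: `ToTensor` scales 8-bit pixel values to [0, 1], and `Normalize` with a per-channel mean and standard deviation of 0.5 then maps them to [-1, 1] via (x - mean) / std. The plain-Python sketch below reproduces that arithmetic for a few pixel values. The helper name `normalize_pixel` is made up for illustration and is not part of torchvision.

```python
def normalize_pixel(raw, mean=0.5, std=0.5):
    """Mimic ToTensor followed by Normalize for a single 8-bit pixel value."""
    scaled = raw / 255.0          # ToTensor: [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std  # Normalize: (x - mean) / std


# Darkest, intermediate and brightest pixel values.
for raw in (0, 51, 255):
    print(raw, "->", normalize_pixel(raw))
```

The darkest pixel lands at -1 and the brightest at +1, so every channel is centered around zero before it reaches the network.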



<br>

2. Display a batch of images to get an idea of what the training data looks like. Execute the cell below (no additional code required).


In [None]:

# Get first batch from train loader. 
images, labels = next(iter(train_loader))

fig, ax = plt.subplots(1, 1, figsize=(9, 9), tight_layout=True)
ax.axis("off")
ax.set_title("CIFAR-10 Training Images", fontsize=12)
ax.imshow(np.transpose(torchvision.utils.make_grid(images[:64], padding=1, normalize=True), (1, 2, 0)))
plt.show()



<br>

3. Define the network architecture. Refer to the *Anatomy of a Pre-Trained Network in PyTorch* page in Canvas for a description of each component of the model. `PretrainedNetwork` accepts a generic `pt_model` parameter, which serves as a stand-in for pre-trained PyTorch models. We disable gradient updates for all model parameters except for those in the final fully-connected layers. 
Note that this architecture is slightly more complicated than what is shown in the Canvas page in order to accommodate the resnet, densenet and vgg pre-trained models. Execute the cell below (no additional code required).

In [None]:
  
import torch.nn as nn


class PretrainedNetwork(nn.Module):
    def __init__(self, pt_model, dropout=0):
        super().__init__()

        self.mdl = pt_model

        # Set requires_grad to False for pretrained model backbone.
        for param in self.mdl.parameters():
            param.requires_grad = False

        # Get dimension of last layer to adjust for CIFAR-10 data. The PyTorch 
        # API is not consistent w.r.t. pre-trained models, so this is necessary 
        # because of differences in naming conventions. 
        pt_model_name = self.mdl.__class__.__name__.strip().lower()

        if pt_model_name.startswith("resnet"):
            pt_num_features = self.mdl.fc.in_features
            self.mdl.fc = nn.Sequential(
                nn.Linear(in_features=pt_num_features, out_features=128),
                nn.Dropout(p=dropout),
                nn.ReLU(),
                nn.Linear(in_features=128, out_features=10)
            )

        elif pt_model_name.startswith("densenet"):
            pt_num_features = self.mdl.classifier.in_features
            self.mdl.classifier = nn.Sequential(
                nn.Linear(in_features=pt_num_features, out_features=128),
                nn.Dropout(p=dropout),
                nn.ReLU(),
                nn.Linear(in_features=128, out_features=10)
            )

        elif pt_model_name.startswith("vgg"):
            pt_num_features = self.mdl.classifier[0].in_features
            self.mdl.classifier = nn.Sequential(
                nn.Linear(in_features=pt_num_features, out_features=128),
                nn.Dropout(p=dropout),
                nn.ReLU(),
                nn.Linear(in_features=128, out_features=10)
            )

    def forward(self, input):
        output = self.mdl(input)
        return output


















Here, the entire network is frozen except for the new classification head that replaces the final layer. Setting `requires_grad = False` on a parameter excludes it from gradient computation in `backward()`.

The pre-trained models themselves (`resnet34`, `densenet121` and `vgg11`) are loaded in Parts II through IV.
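
As a small illustration of freezing, the sketch below freezes a tiny stand-in backbone and attaches a trainable head, then counts parameters. The `backbone` and `head` modules and their layer sizes are hypothetical and chosen only to keep the example fast; they are not torchvision models and are not the `PretrainedNetwork` class defined above.

```python
import torch.nn as nn

# Hypothetical stand-in for a pre-trained backbone (NOT a torchvision model).
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())

# Freeze every backbone parameter so backward() computes no gradients for it.
for param in backbone.parameters():
    param.requires_grad = False

# Attach a new head; freshly created layers default to requires_grad=True.
head = nn.Linear(16, 10)
model = nn.Sequential(backbone, head)

# Count all parameters, then only those that will receive gradient updates.
n_total = sum(p.numel() for p in model.parameters())
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"total parameters    : {n_total}")
print(f"trainable parameters: {n_trainable}")
```

Only the head's weights and biases are counted as trainable, which is the same situation `PretrainedNetwork` sets up at a much larger scale.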

<br>

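4. Define the loss function and training configuration. We use cross-entropy loss as the loss function and stochastic gradient descent (SGD) as the optimizer. The optimizer itself is created in each training cell, since it requires the model's parameters. Execute the cell below (no additional code required).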

In [None]:

import torch.optim as optim
from torch.optim import lr_scheduler

# Configuration ----------------------------------------------------------------

# Number of epochs.
n_epochs = 25

# Learning rate.
lr = 0.001

# Momentum.
momentum = .90

# Dropout.
dropout = 0.125

# ------------------------------------------------------------------------------

# Check if gpu is available. If not, use cpu. 
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Specify loss function.
loss_fn = nn.CrossEntropyLoss()


print(f"device: {device}")
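
For intuition about the loss specified above, the sketch below computes cross-entropy for a single sample in plain Python as the negative log of the softmax probability assigned to the true class. The function name `cross_entropy` and the example logits are illustrative assumptions. `nn.CrossEntropyLoss` applies the same formula to raw model outputs and, by default, averages it over the batch.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[target]


uniform = [0.0] * 10       # no preference among the 10 classes
confident = [0.0] * 10
confident[3] = 5.0         # strong preference for class 3

print(cross_entropy(uniform, target=3))    # equals ln(10)
print(cross_entropy(confident, target=3))  # small: confident and correct
print(cross_entropy(confident, target=7))  # large: confident but wrong
```

A model that spreads its belief evenly over 10 classes scores ln(10), roughly 2.30, which is a useful reference point when reading the loss values printed during training.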


<br>

5. *Define training and validation functions*. One epoch is a full pass through the training data. For example, if we had 1,000 training images and used a batch size of 50, one epoch would consist of evaluating 1000 / 50 = 20 batches of images. We usually train a network for multiple epochs. If we trained the model for 10 epochs, we would pass the full training dataset through the model 10 times.  
In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model's parameters. Execute the cell below (no additional code required).
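
The batch arithmetic above can be sketched as follows. The helper `batches_per_epoch` is a hypothetical name; it assumes the final partial batch is kept, which is the `DataLoader` default (`drop_last=False`). The CIFAR-10 figures assume 50,000 training images and the 5,000-image validation split created in Part I.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches in one epoch when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)


print(batches_per_epoch(1000, 50))   # the example from the text
print(batches_per_epoch(50000, 32))  # CIFAR-10 training set
print(batches_per_epoch(5000, 32))   # validation split
```

These counts should match the batch totals printed when the data loaders were created.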

In [None]:


def epoch_trainer(epoch, data_loader, model, loss_fn, optimizer, device, verbose=True):
    """
    Execute a single training epoch. Return the training loss and accuracy
    from the most recent checkpoint (checkpoints occur every 250 batches). 
    """
    loss, checkpoint_loss, correct, samples = 0.0, 0.0, 0, 0

    # Put model in train mode.
    model.train()

    # Iterate over batches in data_loader. 
    for ii, (X, yactual) in enumerate(data_loader):

        # Send datasets to device. 
        X, yactual = X.to(device), yactual.to(device)

        # Zero out parameter gradients.
        optimizer.zero_grad()

        # Get model predictions (forward pass). 
        ypred = model(X)

        # Compute loss. 
        loss_ii = loss_fn(ypred, yactual)

        # Backpropagation and optimizer step. 
        loss_ii.backward()
        optimizer.step()

        # Update running loss, correct count and sample count.
        loss+=loss_ii.item()
        correct+=(ypred.argmax(dim=1)==yactual).type(torch.float).sum().item()
        samples+=yactual.size(dim=0)

        # Report running loss and accuracy every 250 mini-batches.
        if ii % 250 == 0:
            checkpoint_acc = correct / samples
            # The first checkpoint (ii=0) covers 1 batch; later ones cover 250.
            checkpoint_loss = loss / (1 if ii == 0 else 250)

            if verbose:
                print(f"\t+ [train][epoch={epoch}, batch={ii}] loss = {checkpoint_loss:,.5f}, acc = {checkpoint_acc:.5f}.")
            
            loss, correct, samples = 0.0, 0, 0

    return checkpoint_loss, checkpoint_acc
        


def epoch_validator(data_loader, model, loss_fn, device):
    """
    Execute a single validation epoch. Return average validation loss
    and accuracy.
    """
    valid_loss, correct = 0.0, 0

    # Put model in evaluation mode.
    model.eval()

    with torch.no_grad():

        for ii, (X, yactual) in enumerate(data_loader, start=1):

            # Send dataset and target to device. 
            X, yactual = X.to(device), yactual.to(device)

            # Get model predictions. 
            ypred = model(X)

            # Compute loss and update valid_loss.
            valid_loss+=loss_fn(ypred, yactual).item()

            # Count number of correct class predictions.
            correct+=(ypred.argmax(dim=1)==yactual).type(torch.float).sum().item()

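    # Divide by the number of samples the loader actually draws (its sampler's
    # length), since a subset sampler sees only part of data_loader.dataset.
    loss, acc = valid_loss / ii, correct / len(data_loader.sampler)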

    return loss, acc





<br>

## Part II.

<br>

1. Initialize an instance of `PretrainedNetwork` that accepts the `resnet34` pre-trained model. Identify this model as `mdl1`. Be sure to pass in `dropout`, which is set to .125 above. Put model on `device` (i.e. append `.to(device)`).

In [None]:

##### YOUR CODE HERE #####



<br>

2. In a sentence or two, describe the defining characteristics of ResNet models. What does the '34' represent in 'resnet34'?


YOUR WRITTEN RESPONSE HERE



<br>

3. Print the total number of parameters in `mdl1`.

In [None]:

##### YOUR CODE HERE #####



<br>

4. Train `mdl1` for the specified number of epochs. `results1` is a list of 4-tuples, each consisting of train loss, train accuracy, validation loss and validation accuracy. Within the loop, whenever an epoch achieves a new minimum validation loss, the model at that point is written to disk as *best-mdl1.pth*. Execute the cell below (no additional code required).


In [None]:

results1 = []
best_loss = np.inf

# Specify optimizer. Decay learning rate by a factor of .10 every 5 epochs.
optimizer = optim.SGD(mdl1.parameters(), lr=lr, momentum=momentum)
scheduler = lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.10)


for epoch in range(1, n_epochs + 1):

    tloss, tacc = epoch_trainer(
        epoch=epoch, data_loader=train_loader, model=mdl1, loss_fn=loss_fn, 
        optimizer=optimizer, device=device
        )
    
    vloss, vacc = epoch_validator(
        data_loader=valid_loader, model=mdl1, loss_fn=loss_fn, 
        device=device
        )

    if vloss < best_loss:
        best_loss = vloss
        torch.save(mdl1, "best-mdl1.pth")

    scheduler.step()
    
    print(f"mdl1 [epoch={epoch}-of-{n_epochs}]: tloss={tloss:.5f}, tacc={tacc:.5f}, vloss={vloss:.5f}, vacc={vacc:.5f}.")

    # Append metrics to results.
    results1.append((tloss, tacc, vloss, vacc))
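
The scheduler in the training cell above decays the learning rate by a factor of 0.10 every 5 epochs. The sketch below reproduces that schedule in plain Python, assuming the scheduler is stepped once at the end of each epoch as in the loop above. `step_lr` is a hypothetical helper written for illustration, not a PyTorch function.

```python
def step_lr(base_lr, epoch, step_size=5, gamma=0.10):
    """Learning rate in effect during a 1-indexed epoch under step decay."""
    return base_lr * gamma ** ((epoch - 1) // step_size)


# Learning rate at the start, around the first decay, and at the last epoch.
for epoch in (1, 5, 6, 11, 25):
    print(f"epoch {epoch:>2}: lr = {step_lr(0.001, epoch):.0e}")
```

With 25 epochs and a starting rate of 0.001, the rate drops four times, so parameter updates in the later epochs are very small.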


<br>

5. Which epoch achieved the minimum validation loss for `mdl1`? The maximum validation accuracy? What are the values of the minimum validation loss and maximum validation accuracy?

In [None]:

##### YOUR CODE HERE #####


<br>

## Part III.

<br>

1. Initialize an instance of `PretrainedNetwork` that accepts the `densenet121` pre-trained model. Identify this model as `mdl2`. Be sure to pass in `dropout`, which is set to .125 above. Put the model on `device` (i.e. call `.to(device)`).

In [None]:

##### YOUR CODE HERE #####




<br>

2. In a sentence or two, describe the defining characteristics of DenseNet models. What does the '121' represent in 'densenet121'?



YOUR WRITTEN RESPONSE HERE



<br>

3. Print the total number of parameters in `mdl2`.

In [None]:

##### YOUR CODE HERE #####



<br>

4. Train `mdl2` for the specified number of epochs. `results2` is a list of 4-tuples, each consisting of train loss, train accuracy, validation loss and validation accuracy. Within the loop, whenever an epoch achieves a new minimum validation loss, the model at that point is written to disk as *best-mdl2.pth*. Copy the training code from Part II, replacing instances of `mdl1` with `mdl2`, and execute the cell to commence training.

In [None]:

results2 = []
best_loss = np.inf

##### YOUR CODE HERE #####


<br>

5. Which epoch achieved the minimum validation loss for `mdl2`? The maximum validation accuracy? What are the values of the minimum validation loss and maximum validation accuracy?

In [None]:

##### YOUR CODE HERE #####



<br>

## Part IV.

<br>

1. Initialize an instance of `PretrainedNetwork` that accepts the `vgg11` pre-trained model. Identify this model as `mdl3`. Be sure to pass in `dropout`, which is set to .125 above. Put the model on `device` (i.e. call `.to(device)`).

In [None]:

##### YOUR CODE HERE #####


<br>

2. In a sentence or two, describe the defining characteristics of VGG models. What does the '11' represent in 'vgg11'?


YOUR WRITTEN RESPONSE HERE



<br>


3. Print the total number of parameters in `mdl3`.


In [None]:

##### YOUR CODE HERE #####



<br>

4. Train `mdl3` for the specified number of epochs. `results3` is a list of 4-tuples, each consisting of train loss, train accuracy, validation loss and validation accuracy. Within the loop, whenever an epoch achieves a new minimum validation loss, the model at that point is written to disk as *best-mdl3.pth*. Copy the training code from Part II, replacing instances of `mdl1` with `mdl3`, and execute the cell to commence training.
  

In [None]:

results3 = []
best_loss = np.inf

##### YOUR CODE HERE #####


<br>

5. Which epoch achieved the minimum validation loss for `mdl3`? The maximum validation accuracy? What are the values of the minimum validation loss and maximum validation accuracy?

In [None]:

##### YOUR CODE HERE #####



<br>

## Part V.


1. Create a side-by-side plot with validation loss on the left and validation accuracy on the right for `mdl1`, `mdl2` and `mdl3` (three separate line plots on each facet). These metrics are available in `results1`, `results2` and `results3`. 
For the validation accuracy plot, include a horizontal dashed line at 10% highlighting the accuracy of the null model. 
Be sure to include a legend and label your axes. 


In [None]:

##### YOUR CODE HERE #####



<br>

2. Which model achieved minimum validation loss? Did the same model achieve maximum validation accuracy?


YOUR WRITTEN RESPONSE HERE


<br>

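3. Using `torch.load`, load the best performing model objects from *best-mdl1.pth*, *best-mdl2.pth* and *best-mdl3.pth*. Bind them to `mdl1`, `mdl2` and `mdl3` respectively. Be sure to append `.to(device)`. Depending on your PyTorch version, you may need to pass `weights_only=False`, since these files store entire model objects rather than state dictionaries.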

In [None]:

##### YOUR CODE HERE #####



<br>

4. Using the `epoch_validator` function defined earlier, compute the final loss and accuracy on the test set for `mdl1`, `mdl2` and `mdl3`. Be sure to pass `test_loader` into `epoch_validator`. Print test loss and accuracy for each.

In [None]:

##### YOUR CODE HERE #####


<br>

5. Which model achieved the highest accuracy on the test set? Does this correspond to the model that had the most parameters?


YOUR WRITTEN RESPONSE HERE

<br>

6. In the next cell, `X` represents a batch of 32 images and `y` the corresponding labels (see the `classes` list below: within `y`, 0 represents "plane", 1 represents "car", etc.).   
When `X` is passed into the model, the result will be a 32x10 tensor. The predicted class for each row is the index with the maximum value. You can use `torch.argmax` to find the index with the maximum value per row (the result should be a tensor of size 32). Using the best performing model identified in the previous question, determine the class predictions for the batch of 32 images and compare them to the actual classes. How many images did your model correctly predict?

In [None]:

# CIFAR-10 label descriptions.
classes = [
    "plane", "car", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"
    ]

X, y = next(iter(test_loader))
X, y = X.to(device), y.to(device)


##### YOUR CODE HERE ##### 
