# Program 3
v5.1

Adapted from previous iterations by Colette and Param.

## Base Setup

This section contains the basic environment setup for this notebook, including imports, constants, and any variables that should be easy to find and change.

In [None]:
#Import modules
import torch
import torchvision
import os
import datetime
import shutil
import platform
import time

import numpy as np
import matplotlib.pyplot as plt
import torch.nn as nn
import pandas as pd
import seaborn as sns

from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
from PIL import Image
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.utils import shuffle
from numpy.random import choice
from ipylab import JupyterFrontEnd
from IPython.display import clear_output

This is a set of constants used mainly for workspace setup.

`TRAINING_DATA`: The relative or absolute path to the directory containing the training data.\
`CHECKPOINT`: The base name of the model checkpoint file to create. Will be modified later for checkpoint saving. When not saving each epoch, the current version of the program strips the last 6 characters from this name, so it must end in "epoch_" or a similar 6-character suffix.\
`OUTPUT_DIR`: The path, relative to the current directory, of the directory to be used for output files made by this notebook. <b>Note:</b> The directory structure may already exist, but it does not need to. A later function will make it if it does not exist.\
`NOTEBOOK_NAME`: The exact name of this notebook, including the file extension. Needed later for programmatic html conversion and copying of the notebook.\
`SAVED_FILES`: Not technically a constant, but should not be altered by user. Used to keep track of any non-checkpoint file that gets saved to later move to output.\
`APP`: JupyterFrontEnd instance that is used to save the notebook programmatically later.

In [None]:
#Define constants
TRAINING_DATA = "../../Data/Training/Primordial/v10/"
CHECKPOINT = "Resnet34_200px_v10_test_epoch_"
OUTPUT_DIR = "Output/Testing/Run Whatever/"
NOTEBOOK_NAME = "Program 3 - Final v1.ipynb" #Make sure this is identical to the name of THIS notebook
SAVED_FILES = [] #Leave as an empty list
APP = JupyterFrontEnd() #Needed to save the notebook programmatically later, do not change.
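
As a rough illustration of how the `CHECKPOINT` base name is used later, the sketch below derives both kinds of checkpoint filename from it. The helper names `epoch_checkpoint_name` and `best_checkpoint_name` are hypothetical and exist only for this example; the notebook builds these strings inline.

```python
#Value copied from the constants cell above
CHECKPOINT = "Resnet34_200px_v10_test_epoch_"

def epoch_checkpoint_name(base, epoch):
    '''Filename used when a checkpoint is saved for every epoch (hypothetical helper).'''
    return base + str(epoch) + ".pt"

def best_checkpoint_name(base, kind):
    '''Filename used when only the best epochs are saved (hypothetical helper).
    Drops the trailing 6 characters ("epoch_") before appending the descriptor.'''
    return base[:-6] + kind + ".pt"

print(epoch_checkpoint_name(CHECKPOINT, 3))
print(best_checkpoint_name(CHECKPOINT, "best_loss"))
```

With the value above, this yields `Resnet34_200px_v10_test_epoch_3.pt` and `Resnet34_200px_v10_test_best_loss.pt`.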

These are variables and flags for functions that get used later.

`transform`: The set of torchvision transforms to be applied to training images before they are input into the model.\
`batch_size`: The number of images per DataLoader batch. Heavily affects VRAM usage. Speed testing has determined 128 to be optimal for my hardware.\
`num_workers`: The number of worker processes the DataLoader uses to load data in parallel. Affects system RAM usage. Speed testing has determined 16 to be optimal for my hardware. <b>Note:</b> Windows is incapable of parallelizing Jupyter Notebooks like this; therefore, this variable will be set to 0 if on Windows.\
`freeze_model`: Flag to determine whether or not to freeze all layers of the model except the final layer. Testing has shown better prediction performance with this set to False.\
`use_class_weights`: Flag to determine whether or not to pass the calculated weights of each class to the loss function. Testing has shown better prediction performance with this set to True.\
`sgd_learning_rate`: The learning rate to give to the SGD optimizer. Testing has shown 0.00007 to be the best performing learning rate with the current training data set.\
`sgd_momentum`: The momentum to give to the SGD optimizer. Testing has shown an inconclusive effect on prediction performance. Using 0.9 for current training data set.\
`sgd_weight_decay`: The weight decay value to give to the SGD optimizer. Testing has shown an inconclusive effect on prediction performance. Using 0.01 for current training data set.\
`num_epochs`: The number of epochs to train for. Total number of epochs run will be this + 1 due to running an untrained epoch 0 for control purposes.\
`save_each_epoch`: Flag to determine whether to save a checkpoint for every epoch or only the epochs that have the best accuracy and loss values.\
`use_amp`: Flag to determine whether or not to use PyTorch's Automatic Mixed Precision. Can significantly boost performance with minimal cost to calculation accuracy.\
`save_figs`: Flag to determine whether to save any extra figures created in this notebook for analysis purposes. Checkpoints will still be saved regardless.

In [None]:
#Define variables
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean = [0.5, 0.5, 0.5], std = [0.5, 0.5, 0.5])
])

batch_size = 128
num_workers = 0 if platform.system() == "Windows" else 16

freeze_model = False
use_class_weights = True

sgd_learning_rate = 0.00007
sgd_momentum = 0.9
sgd_weight_decay = 0.01

num_epochs = 10
save_each_epoch = True

use_amp = True

save_figs = True
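
To make the `Normalize` step in `transform` more concrete, here is a minimal sketch of the arithmetic it applies to each channel value. It assumes `ToTensor` has already scaled pixel values to the range 0 to 1 (its behavior for ordinary 8-bit images); the function names `normalize` and `denormalize` are hypothetical and not part of torchvision.

```python
def normalize(value, mean = 0.5, std = 0.5):
    '''Per-channel normalization formula: (value - mean) / std.'''
    return (value - mean) / std

def denormalize(value, mean = 0.5, std = 0.5):
    '''Inverse of normalize, e.g. for displaying a transformed image.'''
    return value * std + mean

#With mean 0.5 and std 0.5, values in [0, 1] are mapped onto [-1, 1]
for pixel in [0.0, 0.25, 0.5, 1.0]:
    print(pixel, "->", normalize(pixel))
```

The inverse is worth keeping in mind when plotting transformed images, since plotting functions generally expect values back in the 0 to 1 range.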

## Class and Function Declarations

This section contains all the Classes and Functions used by this notebook.

`ImageFolderWithPaths`: Class that extends `torchvision.datasets.ImageFolder`

This class extends the base torchvision ImageFolder class to add the ability to return the path to the image alongside the image and its label.

<b>Methods:</b>\
&emsp;`__getitem__`: Gets the item in the dataset at an index. Override.

&emsp;<b>Parameters:</b>\
&emsp;&emsp;`index`: An index value to be used to get an item in the dataset.

&emsp;Calls `super().__getitem__()` with the `index` parameter to get the image and label to be trained on.\
&emsp;Also gets the image path from `self.imgs`.

&emsp;<b>Returns:</b>\
&emsp;&emsp;A tuple containing the image, its label, and its path.

In [None]:
class ImageFolderWithPaths(datasets.ImageFolder):
    '''Image folder dataset class extending torchvision.datasets.ImageFolder'''
    def __getitem__(self, index):
        '''Gets the image and label at the given index. Returns them alongside the path to the image.'''
        image, label = super().__getitem__(index)
        path = self.imgs[index][0]
        return (image, label, path)

`setup_model`: Function used to prepare the image classification model being used.

<b>Parameters:</b>\
&emsp;`model`: The image classification model to be setup.\
&emsp;`out_features`: The number of output features to use in the final layer of the model.

If `freeze_model` flag is set to True, this function will loop through the existing layers of the model and prevent them from being altered during training.\
The function then replaces the fully connected layer of the model with a Linear layer that has a number of output features equal to `out_features`.

<b>Returns:</b>\
&emsp;The modified model sent to the torch `device` being used.

In [None]:
def setup_model(model, out_features):
    '''Applies desired modifications to given model. Returns model on torch device.'''
    #Freeze model, if desired
    if freeze_model:
        for param in model.parameters():
            param.requires_grad = False

    #Replace final layer
    model.fc = nn.Linear(model.fc.in_features, out_features)

    return model.to(device)

`train_model`: The main training loop for the model.

<b>Parameters:</b>\
&emsp;`model`: The already set up classification model to be trained.\
&emsp;`criterion`: The loss function for the model.\
&emsp;`optimizer`: The optimizer for the model.\
&emsp;`scheduler`: The learning rate scheduler for the model. <b>Default:</b> None.\
&emsp;`num_epochs`: The number of epochs to train for. <b>Default:</b> 10.

Iterates through `num_epochs` + 1 number of epochs for training. Epoch 0 is used with 0 learning rate to test the pretrained/untrained model as a control.\
Further iterates between "train" and "test" phases for each epoch.\
Makes predictions on every image in the DataLoader for the current phase. In the "train" phase, backpropagates the loss and steps `optimizer`; `scheduler`, if one is given, is stepped once at the end of the phase. The loss and the `optimizer` step go through a `torch.amp.GradScaler` called `scaler`, which scales the loss so that small gradients do not underflow when reduced precision is used.\
Saves losses and accuracy percentages for each phase of each epoch. Also keeps track of which epochs had the lowest loss and highest accuracy.\
Saves checkpoints for either each epoch or only the best loss and accuracy epochs depending on the state of the `save_each_epoch` flag.\
Prints progress, loss, and accuracy during and after each epoch.

<b>Returns:</b>\
&emsp;Dictionaries containing the per epoch losses and accuracies separated by phase ("train" or "test").
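
The reason a learning rate of 0 makes epoch 0 a control can be sketched with a hand-written SGD-with-momentum update for a single scalar weight. This is a textbook version of the update written for illustration, under the assumption that it mirrors what the optimizer does; it is not PyTorch's implementation, and `sgd_step` is a hypothetical name.

```python
def sgd_step(weight, grad, velocity, lr, momentum = 0.9, weight_decay = 0.01):
    '''One textbook SGD-with-momentum update for a single scalar weight.
    Returns the new weight and the new velocity.'''
    grad = grad + weight_decay * weight      #L2 weight decay folded into the gradient
    velocity = momentum * velocity + grad    #Accumulate momentum
    weight = weight - lr * velocity          #The step is scaled by the learning rate
    return weight, velocity

weight, velocity = 1.5, 0.0

#"Epoch 0": learning rate 0, so the weight cannot move
for grad in [0.2, -0.4, 0.1]:
    weight, velocity = sgd_step(weight, grad, velocity, lr = 0.0)

weight_after_epoch_0 = weight
print(weight_after_epoch_0)

#"Epoch 1": the real learning rate is restored and the weight starts to change
weight, velocity = sgd_step(weight, 0.3, velocity, lr = 0.00007)
print(weight)
```

In this sketch the velocity still accumulates while the learning rate is 0, even though the weight itself stays fixed.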

In [None]:
def train_model(model, criterion, optimizer, scheduler = None, num_epochs = 10):
    '''Trains the given model using the given loss function and optimizer on the given train dataloader.
    Tests the model's learning on the given test dataloader.
    Runs for num_epochs epochs.
    Returns train and test losses and accuracies as dictionaries of lists.'''
    start_time = datetime.datetime.now()
    
    epoch_losses = {
        'train': [],
        'test': []
    }
    epoch_accuracies = {
        'train': [],
        'test': []
    }

    max_test_accuracy = 0.0
    min_test_loss = np.inf
    best_accuracy_epoch = 0
    best_loss_epoch = 0
    total_steps = image_dataloader_sizes['train']
    learning_rate = optimizer.param_groups[0]['lr']

    scaler = torch.amp.GradScaler(enabled = use_amp)

    for epoch in range(num_epochs + 1):
        epoch_start_time = datetime.datetime.now()
        
        #Set lr to 0 for epoch 0 and return to original lr after
        if epoch == 0:
            optimizer.param_groups[0]['lr'] = 0
        elif epoch == 1:
            optimizer.param_groups[0]['lr'] = learning_rate

        print("Epoch: {} of {} - lr: {}".format(epoch, num_epochs, optimizer.param_groups[0]['lr']))

        #Iterate between train and test phases per epoch
        for phase in ['train', 'test']:
            if phase == 'train':
                model.train()
            else:
                model.eval()

            running_loss = 0.0
            running_corrects = 0

            #Iterate over data
            for batch, (data, targets, _) in enumerate(image_dataloaders[phase]):
                data, targets = data.to(device), targets.to(device)

                #Zero gradients
                optimizer.zero_grad()

                #Forward
                with torch.set_grad_enabled(phase == 'train'):
                    with torch.autocast(device_type = "cuda" if torch.cuda.is_available() else "cpu", enabled = use_amp):
                        outputs = model(data)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, targets)

                    #Backward
                    if phase == 'train':
                        scaler.scale(loss).backward()
                        scaler.step(optimizer)
                        scaler.update()

                #Batch Stats
                running_loss += loss.item() * data.size(0)
                running_corrects += torch.sum(preds == targets.data)

                #Print status during training
                if batch % 1000 == 0 and phase == 'train':
                    print("  Step: {} of {} - Loss: {:.4f}".format(batch, total_steps, loss.item()))

            #Step scheduler if it exists
            if phase == 'train' and scheduler is not None:
                scheduler.step()

            #Epoch stats
            epoch_losses[phase].append(running_loss / image_dataset_sizes[phase])
            epoch_accuracies[phase].append((100 * running_corrects.double() / image_dataset_sizes[phase]).item())

            print("  Phase: {}".format(phase))
            print("    Loss: {:.3f} - Mean Loss: {:.3f} - Accuracy: {:.1f}%".format(epoch_losses[phase][epoch], np.mean(epoch_losses[phase]), epoch_accuracies[phase][epoch]))
            
            #Save model if network improved on test set
            if phase == 'test' and epoch_accuracies[phase][epoch] > max_test_accuracy:
                max_test_accuracy = epoch_accuracies[phase][epoch]
                best_accuracy_epoch = epoch
                print("****New Best Accuracy****")

                if not save_each_epoch:
                    print("    Saving Checkpoint")
                    torch.save(model.state_dict(), CHECKPOINT[:-6] + "best_accuracy.pt")

            if phase == 'test' and epoch_losses[phase][epoch] < min_test_loss:
                min_test_loss = epoch_losses[phase][epoch]
                best_loss_epoch = epoch
                print("****New Best Loss****")

                if not save_each_epoch:
                    print("    Saving Checkpoint")
                    torch.save(model.state_dict(), CHECKPOINT[:-6] + "best_loss.pt")

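            #Save once per epoch, after the test phase (the weights do not change during testing)
            if phase == 'test' and save_each_epoch: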
                print("  **Saving Epoch Checkpoint**")
                torch.save(model.state_dict(), CHECKPOINT + str(epoch) + ".pt")

        #Print runtimes
        epoch_end_time = datetime.datetime.now()
        
        print("  Epoch Time: {}".format(epoch_end_time - epoch_start_time))
        print("  Total Time: {}\n".format(epoch_end_time - start_time))

    #Print total runtime and lowest test loss
    print("Training finished in: {}".format(datetime.datetime.now() - start_time))
    print("  Best test accuracy: {:.1f}%".format(max_test_accuracy))
    print("  Best accuracy epoch: {}".format(best_accuracy_epoch))
    print("  Best test loss: {:.3f}".format(min_test_loss))
    print("  Best loss epoch: {}".format(best_loss_epoch))
    
    return epoch_losses, epoch_accuracies
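
The epoch loss in `train_model` is accumulated as `loss.item() * data.size(0)` and divided by the dataset size at the end. A minimal sketch of why, assuming the criterion returns a per-batch mean (the default reduction for `CrossEntropyLoss`): weighting each batch by its size keeps a smaller final batch from being over-counted. The numbers and the name `epoch_mean_loss` are hypothetical.

```python
def epoch_mean_loss(batch_mean_losses, batch_sizes):
    '''Per-image mean loss over an epoch, from per-batch mean losses.'''
    running_loss = 0.0
    for loss, size in zip(batch_mean_losses, batch_sizes):
        running_loss += loss * size   #Undo the per-batch averaging
    return running_loss / sum(batch_sizes)

#Hypothetical numbers: 300 images with batch_size = 128 gives batches of 128, 128, and 44
losses = [0.50, 0.40, 1.00]
sizes = [128, 128, 44]

weighted = epoch_mean_loss(losses, sizes)
naive = sum(losses) / len(losses)   #Treats the 44-image batch like a full batch

print("Weighted: {:.4f} - Naive: {:.4f}".format(weighted, naive))
```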

`make_output_dir`: Creates the base output directory defined by `OUTPUT_DIR`, adds a sub-directory based on `epoch` or `checkpoint` parameters if they exist, then creates a time-stamped directory within the subdirectory.

<b>Parameters:</b>\
&emsp;`epoch`: The epoch number to make an output folder for when `save_each_epoch` is True. Must be passed as a string. <b>Default:</b> None\
&emsp;`checkpoint`: The descriptor of the checkpoint, `best_accuracy` or `best_loss`, to make an output folder for when `save_each_epoch` is False. Must be passed as a string. <b>Default:</b> None

Only one of `epoch` and `checkpoint` can be passed at once. If both are passed, the function will raise an error.\
First, makes the directory tree specified by `OUTPUT_DIR` if the final directory does not already exist.\
Then, creates a directory within `OUTPUT_DIR` based on the `epoch` or `checkpoint` that is passed. Saves this directory as `subdir`. If neither is passed, skips this step.\
Next, creates a directory within `subdir` that is time-stamped with the current date and time.\
Saves the time-stamped directory to a global constant `TIME_STAMP_OUTPUT_DIR` to be used in final cleanup.

In [None]:
def make_output_dir(epoch = None, checkpoint = None):
    '''Check if the directory specified by OUTPUT_DIR exists. Create directory if it does not exist.
    Create directories for each epoch/checkpoint depending on passed parameters. Finally, create time-stamped directory within.'''
    #Make sure only one of epoch and checkpoint are provided, not both
    if epoch is not None and checkpoint is not None:
        raise ValueError("Only one of 'epoch' and 'checkpoint' can be passed at a time. Got values for both.")
        
    #Get the current time to timestamp
    time = datetime.datetime.now()
    
    #Create base output directory if it does not exist
    if not os.path.exists(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)

    if epoch is not None:
        #Create directory for current epoch if it does not exist
        subdir = OUTPUT_DIR + "Epoch " + epoch + "/"
        
        if not os.path.exists(subdir):
            os.mkdir(subdir)
    elif checkpoint is not None:
        #Create directory for current checkpoint if it does not exist
        subdir = OUTPUT_DIR + checkpoint + "/"

        if not os.path.exists(subdir):
            os.mkdir(subdir)
    else:
        #If neither epoch nor checkpoint is given, place the time-stamped directory directly in OUTPUT_DIR
        subdir = OUTPUT_DIR
    
    #Define global scope constant
    global TIME_STAMP_OUTPUT_DIR
    TIME_STAMP_OUTPUT_DIR = subdir + time.strftime("%Y-%m-%d_%H-%M-%S")
    
    #Make time-stamped output directory
    os.mkdir(TIME_STAMP_OUTPUT_DIR)
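
The directory layout this produces can be sketched without touching the real output folder. The example below builds the same kind of path inside a temporary directory; `build_output_path` is a hypothetical helper, the fixed date is made up, and placing the time-stamped directory directly under the output directory when neither argument is given is an assumption of this sketch.

```python
import datetime
import os
import tempfile

def build_output_path(output_dir, epoch = None, checkpoint = None, now = None):
    '''Return the time-stamped output path for the given epoch or checkpoint (hypothetical helper).'''
    if epoch is not None and checkpoint is not None:
        raise ValueError("Only one of 'epoch' and 'checkpoint' can be passed at a time.")

    now = now or datetime.datetime.now()

    if epoch is not None:
        subdir = os.path.join(output_dir, "Epoch " + epoch)
    elif checkpoint is not None:
        subdir = os.path.join(output_dir, checkpoint)
    else:
        subdir = output_dir   #Assumption: no extra level when neither is given

    return os.path.join(subdir, now.strftime("%Y-%m-%d_%H-%M-%S"))

with tempfile.TemporaryDirectory() as scratch:
    path = build_output_path(scratch, epoch = "3", now = datetime.datetime(2024, 1, 2, 3, 4, 5))
    os.makedirs(path, exist_ok = True)   #Creates every missing level in one call
    created = os.path.isdir(path)
    relative = os.path.relpath(path, scratch)

print(relative)
```

Using `os.makedirs` with `exist_ok = True` is one way to avoid the separate existence checks, at the cost of silently reusing a directory that is already there.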

`train_test_graph`: Graphs the train and test loss/accuracy data from model training.

<b>Parameters:</b>\
&emsp;`data`: The dictionary containing both train and test data to graph.\
&emsp;`graph_type`: A string to represent which type of graph this is. (i.e. "Loss" or "Accuracy")

Graphs the train and test data on separate lines on the same plot and adds a legend with labels.\
If `save_figs` flag is set to True, saves the graph as a png and keeps track of it within the `SAVED_FILES` list.

In [None]:
def train_test_graph(data, graph_type):
    '''Graph the given data and save to a file. Determines whether graph is loss or accuracy based on graph_type.'''
    global SAVED_FILES
    
    _, ax = plt.subplots(figsize = (10, 5))

    ax.set_title(graph_type + " Graph")
    ax.set_ylabel(graph_type)
    ax.set_xlabel("Epoch")

    if graph_type == "Accuracy":
        ax.set_yticks(range(0, 101, 10))

    ax.plot(data['train'], label = "Train")
    ax.plot(data['test'], label = "Test")

    ax.legend()

    if save_figs:
        file = graph_type + "_Graph.png"

        #Keep track of file
        SAVED_FILES.append(file)
        
        plt.savefig(file)

    plt.show()

`dataloader_predictions`: Makes predictions on the given DataLoader phase.

<b>Parameters:</b>\
&emsp;`phase`: The phase of the DataLoader to make predictions using. (i.e. "train", "test", "validate")

Iterates through the entire DataLoader for `phase` and makes predictions on every image.\
Saves filename, actual and predicted class, whether the prediction was the same as the actual class, and the probabilities for each class into a pandas DataFrame.

<b>Returns:</b>\
&emsp;The predicted class labels, actual class labels, and the pandas DataFrame that was created.

In [None]:
def dataloader_predictions(phase):
    '''Makes predictions on the images contained in the dataloader for the given phase.
    Returns predicted labels, true labels, and a dataframe compiling the prediction data.'''
    predicted_labels = []
    true_labels = []
    
    prediction_df_base = {
        'Filename': [],
        'True Class': [],
        'Predicted Class': [],
        'Prediction': [],
        'Negative': [],
        'Primordial': [],
        'Transitional Primordial': [],
        'Primary': [],
        'Transitional Primary': [],
        'Secondary': [],
        'Multilayer': []
    }

    #Iterate over dataloader for phase
    with torch.no_grad():
        model.eval()
        
        for inputs, labels, filenames in image_dataloaders[phase]:
            inputs, labels = inputs.to(device), labels.to(device)
            
            with torch.autocast(device_type = "cuda" if torch.cuda.is_available() else "cpu", enabled = use_amp):
                output = model(inputs)
    
            #Softmax in float32 so the exponentials cannot overflow in reduced precision
            probabilities = torch.softmax(output.float(), dim = 1).cpu().numpy()
            
            output = torch.argmax(output, dim = 1).cpu().numpy()
            predicted_labels.extend(output)
            
            labels = labels.data.cpu().numpy()
            true_labels.extend(labels)
    
            prediction_df_base['Filename'].extend(list(filenames))
            prediction_df_base['True Class'].extend(list(labels))
            prediction_df_base['Predicted Class'].extend(list(output))
            prediction_df_base['Prediction'].extend(list(output == labels))
            prediction_df_base['Negative'].extend(list(probabilities[:, 0]))
            prediction_df_base['Primordial'].extend(list(probabilities[:, 1]))
            prediction_df_base['Transitional Primordial'].extend(list(probabilities[:, 2]))
            prediction_df_base['Primary'].extend(list(probabilities[:, 3]))
            prediction_df_base['Transitional Primary'].extend(list(probabilities[:, 4]))
            prediction_df_base['Secondary'].extend(list(probabilities[:, 5]))
            prediction_df_base['Multilayer'].extend(list(probabilities[:, 6]))

    return predicted_labels, true_labels, pd.DataFrame(prediction_df_base)
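
The per-class probability columns are a softmax over the model's raw outputs. As a minimal, dependency-free sketch of that calculation (the logits below are made up, and `softmax` here is a plain Python stand-in, not the torch function):

```python
import math

def softmax(logits):
    '''Numerically stable softmax: subtract the max before exponentiating.'''
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

#Hypothetical raw model outputs for one image across the 7 classes
logits = [0.3, 2.1, 1.7, -0.5, 0.0, -1.2, 0.4]

probabilities = softmax(logits)
predicted_class = probabilities.index(max(probabilities))

print([round(p, 3) for p in probabilities])
print("Predicted class index: {}".format(predicted_class))
```

Because softmax preserves ordering, the predicted class is the same whether the maximum is taken over the raw outputs or over the probabilities.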

`make_confusion_matrix`: Creates a confusion matrix based on given predicted labels and actual labels and creates a diagonal confusion matrix based on the same data. Plots both.

<b>Parameters:</b>\
&emsp;`predictions`: The predicted labels used to create the confusion matrix.\
&emsp;`labels`: The actual labels used to create the confusion matrix.\
&emsp;`phase`: The subset of the training data set that was used to generate the predicted labels. (i.e. "train", "test", "validate")

Creates confusion matrix using `predictions` and `labels`.\
Then, converts confusion matrix to percentages for easier viewing.\
Calls `plot_confusion_matrix` to display the full confusion matrix.\
Creates a diagonal confusion matrix in which each class's value is its diagonal percentage plus the percentages predicted one class higher and one class lower than the actual class. Negative keeps only its own diagonal value and is never added to another class; the first non-negative class adds only the class above it, and the last class adds only the class below it.\
Then, calls `plot_confusion_matrix` to display the diagonal confusion matrix.

In [None]:
def make_confusion_matrix(predictions, labels, phase):
    '''Create a confusion matrix for given prediction and true labels. Determine which image set was used based on phase.
    Plot the confusion matrix and a graph showing each class percentage with a +-1 class buffer.'''
    conf_matrix = confusion_matrix(labels, predictions)
    percentage_conf_matrix = conf_matrix.astype(np.float32) / np.sum(conf_matrix)

    #Make confusion matrix percentage-based
    for i in range(len(classes)):
        percentage_conf_matrix[i, :] = percentage_conf_matrix[i, :] * 100 / np.sum(percentage_conf_matrix[i, :])

    #Plot confusion matrix
    plot_confusion_matrix(percentage_conf_matrix, phase)

    #Make modified confusion matrix of diagonal percentages with one class up and down, excluding negative
    percentage_conf_matrix_diagonal = []

    for i in range(len(classes)):
        if i == 0:
            percentage_conf_matrix_diagonal.append([percentage_conf_matrix[i, i]])
        elif i == 1:
            percentage_conf_matrix_diagonal.append([percentage_conf_matrix[i, i] + percentage_conf_matrix[i, i + 1]])
        elif i > 1 and i < 6:
            percentage_conf_matrix_diagonal.append([percentage_conf_matrix[i, i] + percentage_conf_matrix[i, i + 1] + percentage_conf_matrix[i, i - 1]])
        else:
            percentage_conf_matrix_diagonal.append([percentage_conf_matrix[i, i] + percentage_conf_matrix[i, i - 1]])

    percentage_conf_matrix_diagonal = np.array(percentage_conf_matrix_diagonal)

    #Plot modified diagonal confusion matrix
    plot_confusion_matrix(percentage_conf_matrix_diagonal, phase)
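
The +-1 class calculation can be checked by hand on a small example. The sketch below uses a made-up 4-class matrix (class 0 standing in for Negative) and plain Python lists instead of NumPy; the function names are hypothetical.

```python
def row_percentages(matrix):
    '''Convert each row of counts to percentages of that row's total.'''
    return [[100 * value / sum(row) for value in row] for row in matrix]

def diagonal_plus_minus_one(pct):
    '''Diagonal percentage plus its neighbors, never mixing class 0 with the others.'''
    n = len(pct)
    result = []
    for i in range(n):
        total = pct[i][i]
        if i >= 1 and i + 1 < n:   #Add one class higher (not for class 0 or the last class)
            total += pct[i][i + 1]
        if i >= 2:                 #Add one class lower (never folds class 0 in)
            total += pct[i][i - 1]
        result.append(total)
    return result

#Hypothetical counts: rows are true classes, columns are predicted classes
counts = [
    [8, 2, 0, 0],
    [1, 6, 3, 0],
    [0, 2, 5, 3],
    [0, 1, 4, 5]
]

diagonal = diagonal_plus_minus_one(row_percentages(counts))
print(diagonal)
```

For these counts the result is `[80.0, 90.0, 100.0, 90.0]`: class 2, for instance, is 50% exact plus 30% one class higher plus 20% one class lower.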

`plot_confusion_matrix`: Displays the given confusion matrix.

<b>Parameters:</b>\
&emsp;`conf_matrix`: Confusion matrix to display.\
&emsp;`phase`: The subset of training images used to create the confusion matrix. (i.e. "train", "test", "validate")

Displays the given confusion matrix using the Seaborn heatmap function. Determines whether it is the full or the diagonal matrix from its NumPy `.shape` attribute. Uses `phase` to label which DataLoader was used.\
If `save_figs` flag is True, saves the confusion matrix as a png using `phase` as part of the filename and keeps track of the file using `SAVED_FILES`.

In [None]:
def plot_confusion_matrix(conf_matrix, phase):
    '''Plots given confusion matrix and saves to a file. Determines which image set was used based on phase.'''
    global SAVED_FILES
    
    class_labels = ["Negative", "Primordial", "Transitional Primordial", "Primary", "Transitional Primary", "Secondary", "Multilayer"]
    
    if conf_matrix.shape[1] != 1: #If given normal confusion matrix
        _, ax = plt.subplots(figsize = (14, 12))
    
        sns.heatmap(conf_matrix, cmap = 'BuPu', annot = True, annot_kws = {'size': 18}, vmin = 0, vmax = 100)
    
        ax.set_xlabel("Predicted Labels")
        ax.set_xticklabels(class_labels)
        ax.set_ylabel("True Labels")
        ax.set_yticklabels(class_labels)
        ax.set_title(phase.capitalize() + " Data Confusion Matrix")

        if save_figs:
            file = phase.capitalize() + "_Confusion_Matrix.png"
            
            #Keep track of file
            SAVED_FILES.append(file)
            
            plt.savefig(file)
    
        plt.show()
    else: #If given modified diagonal confusion matrix
        _, ax = plt.subplots(figsize = (4, 12))

        sns.heatmap(conf_matrix, cmap = 'BuPu', annot = True, annot_kws = {'size': 18}, vmin = 0, vmax = 100)

        ax.set_xticklabels([])
        ax.set_yticklabels(class_labels)
        ax.set_title(phase.capitalize() + " Data: Diagonal +-1 Class")

        if save_figs:
            file = phase.capitalize() + "_Diagonal_+-1_Class.png"

            #Keep track of file
            SAVED_FILES.append(file)
            
            plt.savefig(file)

        plt.show()

`save_dataframe`: Saves the given dataframe using phase to differentiate.

<b>Parameters:</b>\
&emsp;`df`: DataFrame to save.\
&emsp;`phase`: The DataLoader phase this DataFrame belongs to. (i.e. "train", "test", "validate")

If `save_figs` flag is set to True, this function will save the given DataFrame to a parquet file whose filename begins with the capitalized `phase` string.\
Does nothing otherwise.

In [None]:
def save_dataframe(df, phase):
    '''Saves given dataframe to file. Determines which dataloader was used to generate df using phase.'''
    global SAVED_FILES
    
    if save_figs:
        file = phase.capitalize() + "_Prediction_Dataframe.parquet"

        #Keep track of file
        SAVED_FILES.append(file)
        
        df.to_parquet(file)

`display_predictions`: Uses predictions made by `dataloader_predictions` to create plots showing images, what class they were predicted to be by the model, and what class they were actually labelled as.

<b>Parameters:</b>\
&emsp;`df`: The Pandas DataFrame produced by `dataloader_predictions`.\
&emsp;`phase`: The phase of the training data set being used. (i.e. "train", "test", "validate")\
&emsp;`rows`: The number of rows to use for the final display. Affects the total number of images to be displayed. <b>Default:</b> 5.\
&emsp;`cols`: The number of columns to use for the final display. Affects the total number of images to be displayed. <b>Default:</b> 3.\
&emsp;`incorrect_only`: Flag to determine whether or not to display only images that were incorrectly predicted. <b>Default:</b> False.

If the `incorrect_only` flag is set to True, modify the DataFrame, `df`, to remove any entries where the model prediction and actual label for an image were the same. Otherwise, does not change the DataFrame.\
Randomly samples `rows` * `cols` number of images from `df`. Then, displays those images, the model predictions for them, and the human annotations for them on a labelled plot.\
If `save_figs` flag is True, displayed predictions are saved as pngs, with the `phase` and `incorrect_only` parameters determining filenames, and kept track of using `SAVED_FILES`.

In [None]:
def display_predictions(df, phase, rows = 5, cols = 3, incorrect_only = False):
    '''Randomly sample rows * cols images from the given prediction dataframe, weighted so that each class is equally likely to be drawn.
    Display the images and their true and predicted labels.
    If incorrect_only is True, only images that were incorrectly predicted will be displayed.
    Use phase to name the saved figure.'''
    global SAVED_FILES
    
    total_images = rows * cols
    label_translation = {
        0: "0_Negative",
        1: "1_Primordial",
        2: "2_Transitional Primordial",
        3: "3_Primary",
        4: "4_Transitional Primary",
        5: "5_Secondary",
        6: "6_Multilayer"
    }

    if incorrect_only: #If only looking for images that were incorrectly predicted
        df = df[df["Prediction"] == False]
    
    #Randomly sample total_images number of images from the DataFrame, weighted inversely to class size so that each class is equally likely to be drawn
    class_counts = df["True Class"].value_counts()
    predictions = df.sample(total_images, weights = (1 / df["True Class"].map(class_counts)).values, ignore_index = True)
    predictions = predictions[["Filename", "True Class", "Predicted Class"]]
    
    fig, axes = plt.subplots(rows, cols, figsize = (cols * 5, rows * 5))

    axes = axes.flat

    #Plot each image
    for ax, (image_file, true_class, pred_class) in zip(axes, predictions.values):
        image = Image.open(image_file)
        file_name = os.path.splitext(os.path.basename(image_file))[0].split("_w")

        ax.imshow(image)
        ax.set_title(file_name[0] + "\nw" + file_name[1] + "\n\nTrue Label: " + label_translation[true_class] + "\nPredicted Label: " + label_translation[pred_class] + " (" + str(true_class == pred_class) + ")")

    fig.set_tight_layout(True)

    #Save plots if desired
    if save_figs:
        if incorrect_only:
            file = phase.capitalize() + "_Random_Image_Prediction_Results_Incorrect_Only.png"

            #Keep track of file
            SAVED_FILES.append(file)
            
            plt.savefig(file)
        else:
            file = phase.capitalize() + "_Random_Image_Prediction_Results.png"
            
            #Keep track of file
            SAVED_FILES.append(file)
            
            plt.savefig(file)

    plt.show()
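
The class-balanced sampling in `display_predictions` rests on a simple idea: give each row a weight inversely proportional to the size of its class, so that every class carries the same total weight. A minimal sketch with made-up labels, using only the standard library (`balanced_weights` is a hypothetical name):

```python
from collections import Counter

def balanced_weights(true_classes):
    '''Weight each row by 1 / (size of its class) so every class has the same total weight.'''
    counts = Counter(true_classes)
    return [1 / counts[label] for label in true_classes]

#Hypothetical labels: class 0 is common, class 2 is rare
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
weights = balanced_weights(labels)

#Sum the weight held by each class
totals = Counter()
for label, weight in zip(labels, weights):
    totals[label] += weight

print(dict(totals))
```

Each class ends up with roughly the same total weight, so on any single draw the rare class is about as likely to be picked as the common one. Scaling every weight by the same constant does not change this, since sampling weights are normalized before use.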

## Data and Model Setup

This section contains the setup for the Datasets, DataLoaders, and image classification model.

Set the device to use for PyTorch. Then print out the device to make sure it is using the correct one.\
&emsp;"cuda" = GPU, "cpu" = CPU

In [None]:
#Torch device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device: {}".format(device)) #Check torch device

Create dictionaries that store the datasets and DataLoaders as well as their sizes for each subset of the training data ("train", "test", "validate").

<b>Dictionary Structures:</b>
```python
image_datasets = {
    'train': ImageFolderWithPaths object,
    'test': ImageFolderWithPaths object,
    'validate': ImageFolderWithPaths object
}
image_dataloaders = {
    'train': DataLoader object,
    'test': DataLoader object,
    'validate': DataLoader object
}
image_dataset_sizes = {
    'train': length of image_datasets['train'],
    'test': length of image_datasets['test'],
    'validate': length of image_datasets['validate']
}
image_dataloader_sizes = {
    'train': length of image_dataloaders['train'],
    'test': length of image_dataloaders['test'],
    'validate': length of image_dataloaders['validate']
}
```

In [None]:
#Create datasets and dataloaders
image_datasets = {}
image_dataloaders = {}
image_dataset_sizes = {}
image_dataloader_sizes = {}

for phase in ['train', 'test', 'validate']:
    image_datasets[phase] = ImageFolderWithPaths(TRAINING_DATA + phase + "/", transform = transform)
    image_dataloaders[phase] = DataLoader(image_datasets[phase], batch_size = batch_size, shuffle = True, num_workers = num_workers, pin_memory = True)
    image_dataset_sizes[phase] = len(image_datasets[phase])
    image_dataloader_sizes[phase] = len(image_dataloaders[phase])

print("Dataset Sizes: {}\n".format(image_dataset_sizes))
print("Dataloader Sizes: {}\n".format(image_dataloader_sizes))

#Define classes
classes = image_datasets['train'].classes

print("Classes: {}".format(classes))
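
The two size dictionaries are related by the batch size: a DataLoader's length is the number of batches, not the number of images. A minimal sketch of that relationship, assuming the default `drop_last = False` used above (the dataset size is made up and `dataloader_length` is a hypothetical name):

```python
import math

def dataloader_length(dataset_size, batch_size, drop_last = False):
    '''Number of batches a loader yields for a dataset of the given size.'''
    if drop_last:
        return dataset_size // batch_size
    return math.ceil(dataset_size / batch_size)

#Hypothetical dataset of 1000 images with batch_size = 128
num_batches = dataloader_length(1000, 128)
last_batch_size = 1000 - (num_batches - 1) * 128

print("Batches: {} - Last batch size: {}".format(num_batches, last_batch_size))
```

Here 1000 images give 8 batches, the last of which holds only 104 images.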

Determine the total number of images in the train subset of the training data. Also determine the number of images in each class of the train subset.

In [None]:
#Determine individual class sizes for the train subset
class_sizes = {}

for class_name in classes:
    #Sum over every directory visited so nested subfolders are counted instead of overwriting the total
    class_sizes[class_name] = sum(len(files) for _, _, files in os.walk(TRAINING_DATA + "train/" + class_name + "/"))

print("Train Data Class Sizes: {}".format(class_sizes))
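
Counting with `os.walk` tallies every file under a class folder, whether or not it is an image. The following is a self-contained sketch of the same idea on a throwaway directory tree. The function name, class names, and file names are hypothetical.

```python
import os
import tempfile

def count_files_per_class(train_dir):
    """Map each class subdirectory of train_dir to the number of files beneath it."""
    sizes = {}
    for class_name in sorted(os.listdir(train_dir)):
        class_dir = os.path.join(train_dir, class_name)
        if not os.path.isdir(class_dir):
            continue
        # Sum across os.walk so files in nested subfolders are included
        sizes[class_name] = sum(len(files) for _, _, files in os.walk(class_dir))
    return sizes

# Build a tiny throwaway tree: two hypothetical classes, one with a nested folder
with tempfile.TemporaryDirectory() as root:
    for rel_path in ["cat/a.png", "cat/b.png", "cat/extra/c.png", "dog/d.png"]:
        full_path = os.path.join(root, rel_path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        open(full_path, "w").close()
    demo_sizes = count_files_per_class(root)

print(demo_sizes)  # {'cat': 3, 'dog': 1}
```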

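Using the calculated class sizes, create a weight for each class by dividing the number of images in the largest class by the number of images in that class. The largest class therefore gets a weight of 1, and smaller classes get proportionally larger weights.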

In [None]:
#Calculate class weights for uneven class sizes
class_weights = []

largest_class_size = max(class_sizes.values()) #Computed once rather than on every iteration

for size in class_sizes.values():
    class_weights.append(largest_class_size / size)

class_weights = torch.tensor(class_weights).to(device)

print("Class Weights: {}".format(class_weights))
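
As a worked example of the weighting rule, the sketch below applies it to made-up class sizes. The function name and the sizes are hypothetical and are not taken from the training data.

```python
def inverse_frequency_weights(class_sizes):
    """Weight each class by (largest class size) / (its own size)."""
    largest = max(class_sizes.values())
    return {name: largest / size for name, size in class_sizes.items()}

# Hypothetical class sizes
example_weights = inverse_frequency_weights({"a": 400, "b": 200, "c": 100})
print(example_weights)  # {'a': 1.0, 'b': 2.0, 'c': 4.0}
```

A class with a quarter as many images as the largest class ends up with four times the weight.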

Call `setup_model`, passing it the pretrained ResNet34 and the number of classes.

If the `use_class_weights` flag is True, create the loss function with the class weights. Otherwise, create it without weights.\
The loss function used is CrossEntropyLoss.
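
To show what the class weights change, here is a pure-Python illustration of weighted cross-entropy. It is a sketch of the formula, not how PyTorch implements it. The normalization by the summed weights follows the behavior documented for `nn.CrossEntropyLoss` with its default `reduction = 'mean'`. The function name, logits, targets, and weights are all made up.

```python
import math

def weighted_cross_entropy(logits, targets, class_weights=None):
    """Mean cross-entropy over a batch, optionally weighted per class.

    logits: list of per-sample score lists; targets: list of class indices.
    """
    total_loss = 0.0
    total_weight = 0.0
    for scores, target in zip(logits, targets):
        # Negative log-softmax of the target class, computed stably
        max_score = max(scores)
        log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        sample_loss = log_sum_exp - scores[target]
        weight = 1.0 if class_weights is None else class_weights[target]
        total_loss += weight * sample_loss
        total_weight += weight
    # With weights, the mean is taken over the summed weights, not the sample count
    return total_loss / total_weight

# Hypothetical batch: a confident correct prediction for class 0,
# and an undecided prediction for the rarer class 1
example_logits = [[2.0, 0.0], [0.0, 0.0]]
example_targets = [0, 1]

unweighted = weighted_cross_entropy(example_logits, example_targets)
weighted = weighted_cross_entropy(example_logits, example_targets, class_weights=[1.0, 4.0])
print(unweighted, weighted)
```

The weighted value comes out larger here because the sample from the heavily weighted class is the one with the higher loss.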

Create the optimizer, passing it `sgd_learning_rate`, `sgd_momentum`, and `sgd_weight_decay`.\
The optimizer used is SGD (Stochastic Gradient Descent).
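
The three hyperparameters interact as sketched below for a single scalar parameter. This follows the update rule described in the `torch.optim.SGD` documentation under its defaults of `dampening = 0` and `nesterov = False`. The function and the numeric values are hypothetical and chosen only to keep the arithmetic simple.

```python
def sgd_step(param, grad, velocity, lr, momentum, weight_decay):
    """One SGD update for a single scalar parameter.

    Returns the updated (param, velocity). Pass velocity=None on the first step.
    """
    # Weight decay adds an L2 penalty term to the gradient
    grad = grad + weight_decay * param
    # The first step seeds the velocity with the gradient itself
    velocity = grad if velocity is None else momentum * velocity + grad
    return param - lr * velocity, velocity

# Hypothetical values, not the notebook's settings
param, velocity = 1.0, None
for step in range(2):
    param, velocity = sgd_step(param, grad=0.5, velocity=velocity,
                               lr=0.1, momentum=0.9, weight_decay=0.0)
    print(step, param, velocity)
```

With a constant gradient, the second step moves the parameter further than the first because the velocity has accumulated.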

In [None]:
#Model setup
model = setup_model(models.resnet34(weights = "ResNet34_Weights.DEFAULT"), len(classes))

#Loss function
if use_class_weights:
    criterion = nn.CrossEntropyLoss(weight = class_weights)
else:
    criterion = nn.CrossEntropyLoss()

#Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = sgd_learning_rate, momentum = sgd_momentum, weight_decay = sgd_weight_decay)

## Main Code

This section contains the main code of this notebook that trains the model and then analyzes it.

### Display some Training Images

Pull 8 images from one batch of the train subset and display each original image side by side with its transformed version. This requires `batch_size` to be at least 8.

In [None]:
#Display random images and their transformed counterparts
for images, labels, file_paths in image_dataloaders['train']:
    images = images.numpy()

    cols = ["Original Image", "Transformed Image"]
    rows = labels[:8]
    
    fig, ax = plt.subplots(8, 2, figsize = (8, 25))

    for axis, col in zip(ax[0], cols):
        axis.set_title(col)

    for axis, row in zip(ax[:, 0], rows):
        axis.set_ylabel(classes[row])

    for i in range(8):
        ax[i, 0].imshow(Image.open(file_paths[i]))
        ax[i, 1].imshow(np.clip(np.transpose(images[i], (1, 2, 0)), a_min = 0, a_max = 1))

    fig.tight_layout()
    plt.show()
    break
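
The `np.transpose` and `np.clip` calls in the cell above exist because PyTorch stores images channels-first, while `imshow` expects channels-last data with float values in the range 0 to 1. The sketch below shows both steps on a random array standing in for one transformed image. It assumes the `transform` defined earlier can produce values outside that range (for example through normalization), which is not confirmed here.

```python
import numpy as np

# Hypothetical stand-in for one transformed image: 3 channels, 4 x 5 pixels,
# with some values outside [0, 1]
rng = np.random.default_rng(0)
chw_image = rng.normal(loc=0.5, scale=1.0, size=(3, 4, 5))

# Move the channel axis last: (C, H, W) -> (H, W, C)
hwc_image = np.transpose(chw_image, (1, 2, 0))

# Clamp to the range imshow expects for float RGB data
displayable = np.clip(hwc_image, a_min=0, a_max=1)

print(chw_image.shape, "->", displayable.shape)  # (3, 4, 5) -> (4, 5, 3)
```

Clipping only makes the image displayable. It does not undo any normalization, so colors in the transformed column may look different from the original.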

### Model Training

Train the model for `num_epochs` + 1 epochs (numbered 0 through `num_epochs`) by calling `train_model` and passing it the model, loss function, and optimizer created earlier, as well as `num_epochs`.\
Once finished, plot the losses and accuracies using `train_test_graph`.

In [None]:
#Train model
losses, accuracies = train_model(model, criterion, optimizer, num_epochs = num_epochs)

In [None]:
#Graph losses during training and testing
train_test_graph(losses, "Loss")

In [None]:
#Graph accuracies during training and testing
train_test_graph(accuracies, "Accuracy")
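
The same per-epoch curves can be used to locate the epochs with the highest accuracy and the lowest loss. The selection logic inside `train_model` is not shown in this section, so the sketch below is only an illustration: the `history` dictionary, its keys, and the metric values are hypothetical, and the real structure of `losses` and `accuracies` may differ.

```python
def best_epochs(history):
    """Return (epoch with highest accuracy, epoch with lowest loss)."""
    accuracies = history["accuracy"]
    losses = history["loss"]
    best_accuracy_epoch = max(range(len(accuracies)), key=accuracies.__getitem__)
    best_loss_epoch = min(range(len(losses)), key=losses.__getitem__)
    return best_accuracy_epoch, best_loss_epoch

# Hypothetical per-epoch test metrics for epochs 0 through 3
example_history = {
    "accuracy": [0.61, 0.74, 0.79, 0.78],
    "loss": [1.10, 0.72, 0.66, 0.59],
}
print(best_epochs(example_history))  # (2, 3)
```

As the example shows, the two criteria can point to different epochs.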

### Model Analysis and File Saving Per Epoch

If the `save_each_epoch` flag is set to True, iterate over every epoch from 0 through `num_epochs` and load that epoch's checkpoint. For each dataloader phase ("train", "test", "validate"), make predictions with `dataloader_predictions`, plot confusion matrices from those predictions, save the prediction dataframe to a file, and print the classification report. Then display randomly sampled images with their predicted and actual labels, followed by a second sample drawn only from incorrect predictions. Once all three phases are finished, create an output directory for the current epoch, move the checkpoint and saved files into it, save the notebook, and copy it into the same directory. After the last epoch, "Done" is printed to the output.

If the `save_each_epoch` flag is set to False, the same steps run for only the "best_accuracy" and "best_loss" checkpoints instead of for every epoch, with one output directory per checkpoint. After both checkpoints are finished, "Done" is printed to the output.

In [None]:
if save_each_epoch: #If a checkpoint was saved for each epoch during training
    #Iterate over each epoch
    for epoch in range(0, num_epochs + 1):
        print("---------- Epoch: {} ----------".format(epoch))
        
        #Load checkpoint for epoch
        model.load_state_dict(torch.load(CHECKPOINT + str(epoch) + ".pt", map_location = device, weights_only = True))
    
        #Iterate over each phase
        for phase in ['train', 'test', 'validate']:
            print("  ---------- Phase: {} ----------".format(phase.capitalize()))
            
            #Make predictions
            predicted_labels, true_labels, prediction_df = dataloader_predictions(phase)
    
            #Make and plot confusion matrices
            make_confusion_matrix(predicted_labels, true_labels, phase)
    
            #Save prediction dataframe
            save_dataframe(prediction_df, phase)
    
            #Classification report
            print(classification_report(true_labels, predicted_labels))
    
            #Display random images and their predictions
            display_predictions(prediction_df, phase)
    
            #Display random images that were incorrectly predicted
            display_predictions(prediction_df, phase, incorrect_only = True)
    
        #Move outputs and save notebook for epoch
        print("  ---------- Saving and Moving Files ----------")
        
        #Create time-stamped output directory
        make_output_dir(epoch = str(epoch))
        print("    Output Dir: {}".format(TIME_STAMP_OUTPUT_DIR))
    
        #Move output files to output directory
        shutil.move(CHECKPOINT + str(epoch) + ".pt", TIME_STAMP_OUTPUT_DIR)
    
        if len(SAVED_FILES) > 0:
            for file in SAVED_FILES:
                shutil.move(file, TIME_STAMP_OUTPUT_DIR)
    
        SAVED_FILES.clear()
    
        #Save notebook
        APP.commands.execute("docmanager:save")
    
        #Copy notebook to output directory
        shutil.copy2(NOTEBOOK_NAME, TIME_STAMP_OUTPUT_DIR)
    
        #Clear cell output to avoid file save errors due to overflow
        clear_output(wait = True)
    
    print("Done")
else: #If checkpoints for only the best accuracy and loss epochs were saved during training
    #Iterate over each checkpoint
    for checkpoint in ["best_accuracy", "best_loss"]:
        print("---------- Checkpoint: {} ----------".format(checkpoint))
        
        #Load checkpoint
        model.load_state_dict(torch.load(CHECKPOINT[:-6] + checkpoint + ".pt", map_location = device, weights_only = True))
    
        #Iterate over each phase
        for phase in ['train', 'test', 'validate']:
            print("  ---------- Phase: {} ----------".format(phase.capitalize()))
            
            #Make predictions
            predicted_labels, true_labels, prediction_df = dataloader_predictions(phase)
    
            #Make and plot confusion matrices
            make_confusion_matrix(predicted_labels, true_labels, phase)
    
            #Save prediction dataframe
            save_dataframe(prediction_df, phase)
    
            #Classification report
            print(classification_report(true_labels, predicted_labels))
    
            #Display random images and their predictions
            display_predictions(prediction_df, phase)
    
            #Display random images that were incorrectly predicted
            display_predictions(prediction_df, phase, incorrect_only = True)
    
        #Move outputs and save notebook for checkpoint
        print("  ---------- Saving and Moving Files ----------")
        
        #Create time-stamped output directory
        make_output_dir(checkpoint = checkpoint)
        print("    Output Dir: {}".format(TIME_STAMP_OUTPUT_DIR))
    
        #Move output files to output directory
        shutil.move(CHECKPOINT[:-6] + checkpoint + ".pt", TIME_STAMP_OUTPUT_DIR)
    
        if len(SAVED_FILES) > 0:
            for file in SAVED_FILES:
                shutil.move(file, TIME_STAMP_OUTPUT_DIR)
    
        SAVED_FILES.clear()
    
        #Save notebook
        APP.commands.execute("docmanager:save")
    
        #Copy notebook to output directory
        shutil.copy2(NOTEBOOK_NAME, TIME_STAMP_OUTPUT_DIR)
    
        #Clear cell output to avoid file save errors due to overflow
        clear_output(wait = True)
    
    print("Done")
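
The two branches in the cell above differ only in which checkpoint files they visit. The sketch below shows how those file names are derived from `CHECKPOINT` in each mode. The helper `checkpoint_files` is hypothetical and is not defined anywhere in this notebook.

```python
CHECKPOINT = "Resnet34_200px_v10_test_epoch_"  # Value from the constants cell

def checkpoint_files(save_each_epoch, num_epochs):
    """Return (label, filename) pairs for the checkpoints the analysis loop loads."""
    if save_each_epoch:
        return [(str(epoch), CHECKPOINT + str(epoch) + ".pt")
                for epoch in range(num_epochs + 1)]
    # Strip the trailing "epoch_" (6 characters) before adding the checkpoint name
    base = CHECKPOINT[:-6]
    return [(name, base + name + ".pt") for name in ["best_accuracy", "best_loss"]]

print(checkpoint_files(True, 2))
print(checkpoint_files(False, 2))
```

With a helper along these lines, a single loop body could serve both modes. The `[:-6]` slice is only correct while `CHECKPOINT` ends in exactly 6 characters that should be removed, as noted in the constants description.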

### Convert Notebook to HTML and Move to Output

Programmatically save the notebook using `APP`. (For some reason, the `time.sleep()` call is needed for the save command to execute properly.)\
Then, convert the notebook to HTML and move it to the base `OUTPUT_DIR`.

In [None]:
#For some reason, this is needed for the save command just below it to function
time.sleep(1)

#Save notebook
APP.commands.execute("docmanager:save")

#Convert notebook to html
!jupyter nbconvert --to html "$NOTEBOOK_NAME"

#Move html to output directory
shutil.move(NOTEBOOK_NAME[:-6] + ".html", OUTPUT_DIR + NOTEBOOK_NAME[:-6] + "_" + datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + ".html")
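
The destination name in the last line is built by slicing off the 6-character ".ipynb" extension and relies on `OUTPUT_DIR` ending in a path separator. A possible alternative, sketched below with a hypothetical helper, notebook name, and output directory, uses `os.path.splitext` and `os.path.join` so that neither assumption is needed.

```python
import datetime
import os

def timestamped_html_path(notebook_name, output_dir, now=None):
    """Build the destination path for the converted notebook."""
    now = now or datetime.datetime.now()
    stem, _extension = os.path.splitext(notebook_name)  # "Example.ipynb" -> "Example"
    file_name = stem + "_" + now.strftime("%Y-%m-%d_%H-%M-%S") + ".html"
    # os.path.join inserts a separator only when output_dir lacks a trailing one
    return os.path.join(output_dir, file_name)

# Hypothetical notebook name and output directory, with a fixed time for repeatability
fixed_time = datetime.datetime(2024, 1, 2, 3, 4, 5)
example_path = timestamped_html_path("Example.ipynb", "Output", now=fixed_time)
print(example_path)
```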