<h1>Convolutional Neural Networks</h1>
<br>
<img src="https://miro.medium.com/max/2000/1*1TI1aGBZ4dybR6__DI9dzA.png" width="900" align="center">

<br><br>
In this lab we will construct and train a Convolutional Neural Network (CNN), a neural network that contains convolution kernels with learnable parameters.<br>
We will also learn a bit more about PyTorch transforms and how to create and save "checkpoints" for our model!

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torch.utils.data.dataloader as dataloader
import os
import random
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import clear_output

In [None]:
#the size of our mini batches
batch_size     = 64
#How many epochs (full passes over the training set) to train for
num_epochs     = 5
#optimizer learning rate
learning_rate  = 1e-4
#initialise what epoch we start from
start_epoch    = 0
#initialise best valid accuracy 
best_valid_acc = 0
#where to load/save the dataset from 
data_set_root = "data"

#start from a checkpoint or start from scratch?
start_from_checkpoint = False
#A directory to save our model to (will create it if it doesn't exist)
save_dir = 'Models'
#A name for our model!
model_name = 'LeNet5_MNIST'

In [None]:
#Set device to GPU_indx if a GPU is available, otherwise use the CPU
GPU_indx = 0
device = torch.device(GPU_indx if torch.cuda.is_available() else 'cpu')

<h3> Create a transform for the input data </h3>
As we have seen, we often wish to perform some operations on data before we pass it through our model. Such operations include cropping or resizing images, affine transforms and data normalization. PyTorch's torchvision module provides a large number of such "transforms", which can be chained together sequentially using "Compose". <br>

PyTorch's built-in datasets take a transform as an argument and apply it to each sample before returning it to you! This makes preprocessing data really easy! We will see more about data preprocessing in a later lab!

[torchvision.transforms](https://pytorch.org/docs/stable/torchvision/transforms.html)

In [None]:
#Prepare a composition of transforms
#transforms.Compose will perform the transforms in order
#NOTE: some transforms only take in a PIL Image, others only a Tensor
#E.g. ToTensor takes in a PIL Image (or ndarray), Normalize takes in a Tensor
#Here Resize is applied to the PIL Image before it is converted to a Tensor
#Refer to the documentation
transform = transforms.Compose([
            transforms.Resize(32),
            transforms.ToTensor(),
            transforms.Normalize([0.1307], [0.308])])

#Note: ToTensor() converts uint8 image data to a float tensor and re-scales it to the range 0-1
#Note: We are normalizing with the dataset mean and std

<h3> Create the training, testing and validation data</h3>
When training machine learning systems it is best practice to split our TOTAL dataset into three segments: the training set, the validation set and the testing set. Up until now we have only had a train/test split and have used the test set to gauge performance during training. For the most "unbiased" results, though, we should not touch the test set until training is done! So if we want to evaluate our model on an "unseen" part of the dataset during training we need another split: the validation set. <br><br>
<b>Training set</b>   - the data we train our model on<br>
<b>Validation set</b> - the data we use to gauge model performance during training<br>
<b>Testing set</b>   - the data we use to "rate" our trained model<br>
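
The sketch below shows one way to carve a validation set out of a training set, using a 100-element toy dataset rather than MNIST. The names `toy_data` and `split` and the 90%/10% ratio are assumptions made for illustration. It also checks that re-using the same seed reproduces the same split, which matters when training is resumed later.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset of 100 samples (hypothetical stand-in for the real training data)
toy_data = TensorDataset(torch.arange(100))

# 90% for training, the remainder for validation
n_train = int(len(toy_data) * 0.9)
n_valid = len(toy_data) - n_train

def split(seed):
    # A generator with a fixed seed makes the random split repeatable
    gen = torch.Generator().manual_seed(seed)
    return random_split(toy_data, [n_train, n_valid], generator=gen)

train_a, valid_a = split(42)
train_b, valid_b = split(42)

# Same seed -> identical validation indices on every run
same = valid_a.indices == valid_b.indices
# Training and validation subsets never share a sample
disjoint = set(train_a.indices).isdisjoint(valid_a.indices)
print(len(train_a), len(valid_a), same, disjoint)
```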

In [None]:
#Define our MNIST Datasets
#Can also try with CIFAR10 Dataset
#https://pytorch.org/docs/stable/torchvision/datasets.html#mnist
train_data = ########Fill out#########
test_data  = ########Fill out#########

#We are going to split the training dataset into a train and validation set 90%/10%
validation_split = 0.9

#Determine the number of samples for each split
n_train_examples = int(len(train_data)*validation_split)
n_valid_examples = len(train_data) - n_train_examples

#The function random_split will take our dataset, split it randomly and give us datasets
#of the sizes we specify
#Note: we can split it into more than two pieces!
train_data, valid_data = torch.utils.data.random_split(train_data, [n_train_examples, n_valid_examples],
                                                       generator=torch.Generator().manual_seed(42))

#IMPORTANT TO KNOW!
#Here we pass random_split a generator with a manual seed. This is very important: without it,
#every time we randomly split our training and validation set we would get different splits!
#For example, if we saved our model and reloaded it in the future to train some more, the dataset we would then
#train with would almost certainly contain datapoints that WERE in the validation set initially!
#Our model would therefore be trained on both validation and training data, which is very bad!
#Setting the manual seed to the same value every time prevents this!

<h3> Check the lengths of all the datasets</h3>

In [None]:
print(f'Number of training examples: {len(train_data)}')
print(f'Number of validation examples: {len(valid_data)}')
print(f'Number of testing examples: {len(test_data)}')

<h3> Create the dataloader</h3>

In [None]:
#Create the training, validation and evaluation/test dataloaders
#It is best practice to separate your data into these three datasets
#Though depending on your task you may only need training + evaluation/test, or maybe only a training set
#(It also depends on how much data you have)
#https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataloader
train_loader =  ########Fill out#########
valid_loader =  ########Fill out#########
test_loader  =  ########Fill out#########

<h2> Create the LeNet5 network</h2>
LeNet5 is a "classic" convolutional neural network (one of the oldest, dating back to 1998), and we will create an implementation of it here! It uses both convolutional layers and linear layers to "learn" features of the image and perform the classification. It also uses "Max Pooling" to downsample the "feature maps" (the 2D hidden layers at the output of a convolutional layer). In the image at the top of the notebook the downsampling layers are labelled "Subsampling".

[Max Pooling](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html)
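
The feature-map sizes quoted in the code comments below can be checked with a little arithmetic. The sketch assumes 5x5 convolutions with stride 1 and no padding, and a 2x2 max pool whose stride equals its window size. The 2x2 window is an assumption inferred from the 28 to 14 halving, and the helper names `conv_out` and `pool_out` are made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula for a convolution along one spatial dimension
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    # Max pooling with stride equal to the kernel size (the PyTorch default)
    return (size - kernel) // kernel + 1

size = 32                    # input images are resized to 32x32
size = conv_out(size, 5)     # conv1: 32 -> 28
after_conv1 = size
size = pool_out(size, 2)     # pool:  28 -> 14
size = conv_out(size, 5)     # conv2: 14 -> 10
after_conv2 = size
size = pool_out(size, 2)     # pool:  10 -> 5

# conv2 produces 16 feature maps, so the flattened vector has 16 * 5 * 5 features
flat_features = 16 * size * size
print(after_conv1, after_conv2, size, flat_features)
```

The final value is the number of input features the first linear layer has to accept.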

In [None]:
class LeNet(nn.Module):
    def __init__(self, channels_in):
        #Call the __init__ function of the parent nn.Module class
        super(LeNet, self).__init__()
        #Define 2 Convolution Layers
        
        #conv1 kernel shape - 6 x channels_in x 5 x 5
        ########Fill out#########
        
        #conv2 kernel shape - 16x6x5x5
        ########Fill out#########
        
        #Define MaxPooling Layers
        #https://computersciencewiki.org/index.php/Max-pooling_/_Pooling
        #Default stride is equal to kernel_size
        #NOTE: You only need to create ONE maxpooling layer 
        #You can use the SAME maxpooling layer multiple times in your forward pass
        ########Fill out#########
        
        #Define 3 Linear/Fully connected/Dense Layers
        #Input to linear1 is the number of features from previous conv - 16x5x5
        #output of linear1 is 120
        ########Fill out#########
        
        #output of linear2 is 84
        ########Fill out#########
        
        #output of linear3 is 10
        ########Fill out#########
            
    def forward(self, x):
        #Pass input through conv layers
        #input shape is BatchSize-channels_in-32-32 (channels_in is 1 for MNIST, 3 for CIFAR10)
        
        #Conv then F.relu()  ########Fill out#########
        #output shape is BatchSize-6-28-28
        
        #maxpool  ########Fill out#########
        #output shape is BatchSize-6-14-14

        #Conv then F.relu()  ########Fill out#########
        #output shape is BatchSize-16-10-10
        
        #maxpool  ########Fill out#########
        #output shape is BatchSize-16-5-5

        #Flatten output to shape BatchSize-16x5x5
        ########Fill out#########
        
        #linear then F.relu()  ########Fill out#########
        #output shape is BatchSize-120
        
        #linear then F.relu()  ########Fill out#########
        #output shape is BatchSize-84
        
        #linear to output  ########Fill out#########
        #output shape is BatchSize-10
        return out5

<h3> Create our model and view the output! </h3>

In [None]:
#create a dataloader iterator object
dataiter = iter(train_loader)
#sample a batch from the iterator
images, labels = next(dataiter)

In [None]:
#create an instance of our network
#set channels_in to the number of channels of the dataset images
net =  ########Fill out#########
#view the network
print(net)

In [None]:
#pass image through network
out =  ########Fill out#########
#check output!
out.shape

<h3> Set up the optimizer </h3>

In [None]:
#Pass our network parameters to the optimizer and set lr to learning_rate
#Use the Adam optimizer!
#https://pytorch.org/docs/stable/optim.html#torch.optim.Adam
optimizer = ########Fill out#########

In [None]:
#Define a Cross Entropy Loss
#https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss
########Fill out#########

<h3> Loading Checkpoints</h3>
This bit of code loads the parameters of a model and an optimizer from file if start_from_checkpoint == True. Saving your model parameters during training is a good idea, as it lets you resume training later or recover the best model seen so far!
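
To make the idea concrete, here is a minimal save-and-restore round trip using a tiny linear model and a temporary directory rather than the lab's network. The names `toy_net`, `toy_opt` and `toy_checkpoint.pt` and the stored values are hypothetical. The dictionary keys are assumed to follow the layout used in the cells below.

```python
import os
import tempfile
import torch
import torch.nn as nn
import torch.optim as optim

# Tiny stand-in model and optimizer (hypothetical, not the lab's LeNet)
toy_net = nn.Linear(4, 2)
toy_opt = optim.Adam(toy_net.parameters(), lr=1e-4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_checkpoint.pt")

    # A checkpoint is just a dictionary of the state we want to restore later
    torch.save({
        'epoch':                3,
        'model_state_dict':     toy_net.state_dict(),
        'optimizer_state_dict': toy_opt.state_dict(),
        'valid_acc':            0.5,
    }, path)

    # Build fresh objects, then overwrite their state from the file
    restored_net = nn.Linear(4, 2)
    restored_opt = optim.Adam(restored_net.parameters(), lr=1e-4)
    check_point = torch.load(path, map_location='cpu')
    restored_net.load_state_dict(check_point['model_state_dict'])
    restored_opt.load_state_dict(check_point['optimizer_state_dict'])

# The restored weights should be identical to the ones that were saved
weights_match = torch.equal(toy_net.weight, restored_net.weight)
print(check_point['epoch'], weights_match)
```

Saving the optimizer state alongside the model matters for optimizers such as Adam, which keep running statistics per parameter.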

In [None]:
#Create Save Path from save_dir and model_name, we will save and load our checkpoint here
save_path = os.path.join(save_dir, model_name + ".pt")

#Create the save directory if it does not exist
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)

#Load Checkpoint
if start_from_checkpoint:
    #Check if checkpoint exists
    if os.path.isfile(save_path):
        #load Checkpoint
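        #map_location places the saved tensors on the current device (e.g. a GPU checkpoint loaded on a CPU-only machine)
        check_point = torch.load(save_path, map_location=device)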
        #Checkpoint is saved as a python dictionary
        #https://www.w3schools.com/python/python_dictionaries.asp
        #here we unpack the dictionary to get our previous training states
        net.load_state_dict(check_point['model_state_dict'])
        optimizer.load_state_dict(check_point['optimizer_state_dict'])
        #the stored epoch has already been trained, so resume from the one after it
        start_epoch = check_point['epoch'] + 1
        best_valid_acc = check_point['valid_acc']
        print("Checkpoint loaded, starting from epoch:", start_epoch)
    else:
        #Raise Error if it does not exist
        raise ValueError("Checkpoint Does not exist")
else:
    #If a checkpoint exists and start_from_checkpoint = False
    #Raise an error to prevent accidental overwriting
    if os.path.isfile(save_path):
        raise ValueError("Warning Checkpoint exists")
    else:
        print("Starting from scratch")

# Define the training process

In [None]:
#This function should perform a single training epoch using our training data
def train(net, device, loader, optimizer, loss_fun, loss_logger):
    
    #Set Network in train mode
    net.train()
    
    #Perform a single epoch of training on the input dataloader, logging the loss at every step 
    ########Fill out#########
        
    #return the logger array       
    return loss_logger

# Define the testing process

In [None]:
#This function should perform a single evaluation epoch, it WILL NOT be used to train our model
def evaluate(net, device, loader):
    
    #initialise counter
    epoch_acc = 0
    
    #Set network in evaluation mode
    #Layers like Dropout will be disabled
    #Layers like Batchnorm will stop calculating running mean and standard deviation
    #and use current stored values
    #(More on these layer types soon!)
    net.eval()
    
    #Perform a single epoch of evaluation and return the accuracy of the model on the input dataloader
    ########Fill out#########
            
    #return the accuracy from the epoch     
    return epoch_acc

# The training process

In [None]:
training_loss_logger = []
validation_acc_logger = []
training_acc_logger = []

In [None]:
#This cell implements our training loop
for epoch in range(start_epoch, num_epochs):
    
    #call the training function and pass training dataloader etc
    ########Fill out#########
    
    #call the evaluate function and pass the dataloader for both validation and training
    train_acc = ########Fill out#########
    valid_acc = ########Fill out#########
    #log the accuracies 
    ########Fill out#########
    ########Fill out#########

    #If this model has the highest performance on the validation set so far
    #then save a checkpoint
    #{} defines a dictionary, each entry of the dictionary is indexed with a string key
    if (valid_acc > best_valid_acc):
        best_valid_acc = valid_acc
        print("Saving Model")
        torch.save({
            'epoch':                 epoch,
            'model_state_dict':      net.state_dict(),
            'optimizer_state_dict':  optimizer.state_dict(), 
            'train_acc':             train_acc,
            'valid_acc':             valid_acc,
        }, save_path)
    
    clear_output(True)
    print(f'| Epoch: {epoch+1:02} | Train Acc: {train_acc*100:05.2f}% | Val. Acc: {valid_acc*100:05.2f}% |')
print("Training Complete")

In [None]:
#plot out the training_loss_logger 

In [None]:
#plot out the validation_acc_logger and  training_acc_logger

# Evaluate

In [None]:
#call the evaluate function and pass the test dataloader to see how good our model is!
########Fill out#########