<h1>Convolutional Neural Networks</h1>
<img src="https://miro.medium.com/max/4348/1*PXworfAP2IombUzBsDMg7Q.png" width="750" align="center">

In this lab we will construct and train a "Convolutional Neural Network" (CNN), i.e. a neural network that contains convolution kernels with learnable parameters.<br>
We will also learn a bit more about PyTorch transforms and how to create and save "checkpoints" for our model!
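
To make "convolution kernels with learnable parameters" a little more concrete, here is a minimal plain-Python sketch of what one single-channel convolution computes (strictly speaking a cross-correlation, which is what PyTorch's convolution layers compute). The function name `conv2d_valid`, the toy image and the hand-picked kernel are illustrative assumptions and not part of the lab; in a real network the kernel entries are the parameters that get learned.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 "image": dark on the left, bright on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked 2x2 kernel that responds to dark-to-bright vertical edges
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is only non-zero in the column where the edge sits, which is the sense in which a kernel "detects" a feature.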

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torch.utils.data.dataloader as dataloader
import os
import random
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import clear_output

In [None]:
#the size of our mini batches
batch_size     = 64
#How many iterations of our dataset
num_epochs     = 5
#optimizer learning rate
learning_rate  = 0.01
#initialise what epoch we start from
start_epoch    = 0
#initialise best valid accuracy 
best_valid_acc = 0
#where to load/save the dataset from 
data_set_root = "data"

#start from a checkpoint or start from scratch?
start_from_checkpoint = False
#A directory to save our model to (will create it if it doesn't exist)
save_dir = 'Models'
#A name for our model!
model_name = 'LeNet5_MNIST'

In [None]:
#Set device to GPU_indx if GPU is available
GPU_indx = 0
device = torch.device(GPU_indx if torch.cuda.is_available() else 'cpu')

<h3> Create a transform for the input data </h3>
As we have seen, we often wish to perform some operations on data before we pass it through our model. Such operations include cropping or resizing images, affine transforms and data normalization. PyTorch's torchvision module has a large number of such "transforms", which can be strung together sequentially using "Compose". <br>

PyTorch's built-in datasets take a transform as an input and will apply this transform to the data before passing it to you! This makes preprocessing data really easy! We will see more about data preprocessing in a later lab!

[torchvision.transforms](https://pytorch.org/docs/stable/torchvision/transforms.html)
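
As a rough illustration of what the `ToTensor` and `Normalize` steps used in this lab do to a single pixel, the sketch below reproduces the arithmetic in plain Python. The helper names `to_unit_range` and `normalize` are hypothetical; the mean and standard deviation are the values passed to `Normalize` in the next cell.

```python
def to_unit_range(pixel_uint8):
    """Mimic the re-scaling done by ToTensor: 0..255 becomes 0.0..1.0."""
    return pixel_uint8 / 255.0


def normalize(value, mean, std):
    """Mimic Normalize for one channel: subtract the mean, divide by the std."""
    return (value - mean) / std


mean, std = 0.1307, 0.308

black = normalize(to_unit_range(0), mean, std)    # darkest possible pixel
white = normalize(to_unit_range(255), mean, std)  # brightest possible pixel
print(f"black -> {black:.3f}, white -> {white:.3f}")
```

After normalization the pixel values are no longer confined to 0-1; they are centred around zero with roughly unit spread across the dataset.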

In [None]:
#Prepare a composition of transforms
#transforms.Compose will perform the transforms in order
#NOTE: some transforms only take in a PIL image, others only a Tensor
#E.g. Resize and ToTensor take in a PIL Image, Normalize takes in a Tensor
#Refer to documentation
transform = transforms.Compose([
            transforms.Resize(32),
            transforms.ToTensor(),
            transforms.Normalize([0.1307], [0.308])])

#Note: ToTensor() will convert uint8 and similar type data to a float and re-scale to 0-1
#Note: We are normalizing with the dataset mean and std 

<h3> Create the training, testing and validation data</h3>
When training many machine learning systems it is best practice to split our TOTAL dataset into three segments: the training set, the validation set and the testing set. Up until now we have only had a train/test split and have used the test set to gauge performance during training. For the most "unbiased" results, though, we should not use our test set until training is done! So if we want to evaluate our model on an "unseen" part of the dataset during training, we need another split: the validation set. <br>
Training set   - the data we train our model on <br>
Validation set - the data we use to gauge model performance during training <br>
Testing set    - the data we use to "rate" our trained model
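
Conceptually, a random split just shuffles the sample indices and cuts the shuffled list at the chosen fraction. The sketch below shows that idea in plain Python; it is not how `random_split` is implemented internally, the helper `split_indices` and the fixed seed are assumptions for illustration, and 60000 is the size of the standard MNIST training set.

```python
import random


def split_indices(n_total, train_fraction, seed=0):
    """Shuffle the indices 0..n_total-1 and cut them into two disjoint parts."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    n_train = int(n_total * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(60000, 0.9)
print(len(train_idx), len(valid_idx))  # 54000 6000
```

Because the two index lists are disjoint, no validation image is ever seen during training.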

In [None]:
#Create our MNIST train and test datasets
#Can also try with CIFAR10 Dataset
#https://pytorch.org/docs/stable/torchvision/datasets.html#mnist
train_data = ########Fill out#########
test_data  = ########Fill out#########

#We are going to split the train dataset into a train and validation set 90/10
validation_split = 0.9

#Determine the number of samples for each split
n_train_examples = int(len(train_data)*validation_split)
n_valid_examples = len(train_data) - n_train_examples

#The function random_split will take our dataset, split it randomly and give us datasets
#that are the sizes we gave it
#Note: we can split it into more than two pieces!
train_data, valid_data = torch.utils.data.random_split(train_data, [n_train_examples, n_valid_examples])

<h3> Check the lengths of all the datasets</h3>

In [None]:
print(f'Number of training examples: {len(train_data)}')
print(f'Number of validation examples: {len(valid_data)}')
print(f'Number of testing examples: {len(test_data)}')

<h3> Create the dataloader</h3>
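
A dataloader hands out the dataset in mini-batches, so `len(loader)` (used later when averaging the loss) is the number of batches, not the number of images. As a small sketch of that arithmetic, assuming the 54000/6000/10000 split sizes for MNIST and the `batch_size` of 64 set earlier (the helper `batches_per_epoch` is hypothetical):

```python
import math


def batches_per_epoch(n_samples, batch_size, drop_last=False):
    """Number of mini-batches needed to cover n_samples once."""
    if drop_last:
        # An incomplete final batch is discarded
        return n_samples // batch_size
    # An incomplete final batch is kept, so round up
    return math.ceil(n_samples / batch_size)


batch_size = 64
counts = {name: batches_per_epoch(n, batch_size)
          for name, n in [("train", 54000), ("valid", 6000), ("test", 10000)]}
print(counts)  # {'train': 844, 'valid': 94, 'test': 157}
```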

In [None]:
#Create the training, Validation and Evaluation/Test Dataloaders
#It is best practice to separate your data into these three Datasets
#Though depending on your task you may only need Training + Evaluation/Test or maybe only a Training set
#(It also depends on how much data you have)
#https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataloader
train_loader =  ########Fill out#########
valid_loader =  ########Fill out#########
test_loader  =  ########Fill out#########

<h2> Create the LeNet5 network</h2>
LeNet5 is a "classic" convolutional neural network (one of the oldest, dating back to 1998), and we will create an implementation of it here! It uses both convolutional layers and linear layers to "learn" features of the image and perform the classification. Our implementation also uses "Max Pooling" to downsample the "feature maps" (the 2D hidden layers at the output of a convolutional layer).

[Max Pooling](https://computersciencewiki.org/index.php/Max-pooling_/_Pooling)
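
As a minimal sketch of the downsampling step, the plain-Python function below applies 2x2 max pooling with stride 2 to a single feature map. The function name and the toy values are illustrative assumptions; in the network this job is done by `nn.MaxPool2d`.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list with even height and width."""
    pooled = []
    for i in range(0, len(feature_map), 2):
        row = []
        for j in range(0, len(feature_map[0]), 2):
            # Keep only the largest value in each non-overlapping 2x2 window
            window = (
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            )
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

Each spatial dimension is halved, which is why a 28x28 feature map becomes 14x14 after pooling.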

In [None]:
class LeNet(nn.Module):
    def __init__(self, channels_in):
        #Call the __init__ function of the parent nn.Module class
        super(LeNet, self).__init__()
        #Define Convolution Layers
        #conv1: 6 kernels of size channels_in x 5 x 5
        self.conv1 = ########Fill out#########
        
        #conv2: 16 kernels of size 6 x 5 x 5
        self.conv2 =  ########Fill out#########
        
        #Define MaxPooling Layers
        #https://computersciencewiki.org/index.php/Max-pooling_/_Pooling
        #Default Stride is = to kernel_size
        self.maxpool = nn.MaxPool2d(kernel_size=2)
        
        #Define Linear/Fully connected/ Dense Layers
        #Input to linear1 is the number of features from previous conv - 16x5x5
        #output of linear1 is 120
        self.linear1 =  ########Fill out#########
        #output of linear2 is 84
        self.linear2 =  ########Fill out#########
        #output of linear3 is 10
        self.linear3 =  ########Fill out#########
            
    def forward(self, x):
        #Pass input through conv layers
        #x shape is BatchSize-channels_in-32-32 (channels_in is 1 for MNIST)
        
        out1 = #Conv then F.relu()  ########Fill out#########
        #out1 shape is BatchSize-6-28-28
        out1 = #maxpool  ########Fill out#########
        #out1 shape is BatchSize-6-14-14

        out2 = #Conv then F.relu()  ########Fill out#########
        #out2 shape is BatchSize-16-10-10
        out2 = #maxpool  ########Fill out#########
        #out2 shape is BatchSize-16-5-5

        #Flatten out2 to shape BatchSize-16x5x5
        out2 =  ########Fill out#########
        
        out3 = #linear then F.relu()  ########Fill out#########
        #out3 shape is BatchSize-120
        out4 = #linear then F.relu()  ########Fill out#########
        #out4 shape is BatchSize-84
        out5 = #linear to output  ########Fill out#########
        #out5 shape is BatchSize-10
        return out5

<h3> Create our model and view the output! </h3>
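
Before running the model it can help to predict the shapes by hand. The sketch below traces the spatial size through the network using the standard output-size formula for a convolution, assuming 5x5 kernels with stride 1 and no padding, and 2x2 pooling, as described in the shape comments of the `forward` method. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Spatial output size of pooling whose stride equals its kernel size."""
    return (size - kernel) // kernel + 1


sizes = [32]                             # input images are resized to 32x32
sizes.append(conv_out(sizes[-1], 5))     # after conv1
sizes.append(pool_out(sizes[-1]))        # after first max pool
sizes.append(conv_out(sizes[-1], 5))     # after conv2
sizes.append(pool_out(sizes[-1]))        # after second max pool

flat_features = 16 * sizes[-1] * sizes[-1]  # 16 channels come out of conv2
print(sizes, flat_features)  # [32, 28, 14, 10, 5] 400
```

The final value is the number of input features the first linear layer has to accept.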

In [None]:
#create a dataloader iterable object
dataiter = iter(train_loader)
#sample from the iterable object
images, labels = next(dataiter)

In [None]:
#create an instance of our network
#set channels_in to the number of channels of the dataset images
net =  ########Fill out#########
#view the network
print(net)

In [None]:
#pass image through network
out =  ########Fill out#########
#check output
out.shape

<h3> Set up the optimizer </h3>

In [None]:
#Pass our network parameters to the optimiser and set our lr as the learning_rate
#https://pytorch.org/docs/stable/optim.html#torch.optim.Adam
optimizer = optim.Adam(########Fill out#########)

In [None]:
#Define a Cross Entropy Loss
#https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss
Loss_fun = ########Fill out#########

<h3> Loading Checkpoints</h3>
This bit of code will load the parameters of a model and an optimizer from file if start_from_checkpoint == True. Saving your model parameters during training is a good idea!
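
The idea can be tried in isolation first. The following is a small self-contained sketch of a save/load round trip, assuming a toy `nn.Linear` model and a temporary file rather than the lab's network and save path; the dictionary keys mirror the ones used in this lab, and the stored epoch and accuracy are placeholder values.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

# Toy stand-ins for the lab's network and optimizer (assumptions for illustration)
model = nn.Linear(4, 2)
optimizer = optim.Adam(model.parameters(), lr=0.01)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_checkpoint.pt")

    # A checkpoint is just a dictionary of everything needed to resume
    torch.save({
        "epoch": 3,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "valid_acc": 0.5,
    }, path)

    # Build fresh objects, then overwrite their state from the file
    restored_model = nn.Linear(4, 2)
    restored_optimizer = optim.Adam(restored_model.parameters(), lr=0.01)
    checkpoint = torch.load(path, map_location="cpu")
    restored_model.load_state_dict(checkpoint["model_state_dict"])
    restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    restored_epoch = checkpoint["epoch"]

weights_match = torch.equal(model.weight, restored_model.weight)
print(restored_epoch, weights_match)
```

Saving the optimizer state alongside the model matters for optimizers such as Adam, which keep running statistics that would otherwise be reset on resume.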

In [None]:
#Create Save Path from save_dir and model_name, we will save and load our checkpoint here
Save_Path = os.path.join(save_dir, model_name + ".pt")

#Create the save directory if it does not exist
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)

#Load Checkpoint
if start_from_checkpoint:
    #Check if checkpoint exists
    if os.path.isfile(Save_Path):
        #load Checkpoint
        check_point = torch.load(Save_Path, map_location=device)
        #Checkpoint is saved as a python dictionary
        #https://www.w3schools.com/python/python_dictionaries.asp
        #here we unpack the dictionary to get our previous training states
        net.load_state_dict(check_point['model_state_dict'])
        optimizer.load_state_dict(check_point['optimizer_state_dict'])
        start_epoch = check_point['epoch']
        best_valid_acc = check_point['valid_acc']
        print("Checkpoint loaded, starting from epoch:", start_epoch)
    else:
        #Raise Error if it does not exist
        raise ValueError("Checkpoint Does not exist")
else:
    #If checkpoint does exist and start_from_checkpoint = False
    #Raise an error to prevent accidental overwriting
    if os.path.isfile(Save_Path):
        raise ValueError("Warning Checkpoint exists")
    else:
        print("Starting from scratch")

# Define the accuracy calculator
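
The accuracy is the fraction of samples whose highest-scoring class matches the label. As a plain-Python restatement of that idea (the helper names and the toy scores are assumptions for illustration; the lab's own function works on tensors):

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(scores, labels):
    """Fraction of rows in `scores` whose arg-max equals the matching label."""
    predictions = [argmax(row) for row in scores]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# One row of class scores per sample (3 classes, 4 samples)
scores = [
    [2.0, 0.1, -1.0],  # predicts class 0
    [0.3, 0.2, 1.5],   # predicts class 2
    [0.0, 4.0, 0.5],   # predicts class 1
    [1.0, 0.9, 0.8],   # predicts class 0
]
labels = [0, 2, 1, 2]

batch_accuracy = accuracy(scores, labels)
print(batch_accuracy)  # 0.75
```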

In [None]:
def calculate_accuracy(fx, y):
    preds = fx.max(1, keepdim=True)[1]
    correct = preds.eq(y.view_as(preds)).sum()
    acc = correct.float()/preds.shape[0]
    return acc

# Define the training process

In [None]:
#This function should perform a single training epoch using our training data
def train(net, device, loader, optimizer, Loss_fun, loss_logger):
    
    #initialise counters
    epoch_loss = 0
    epoch_acc = 0
    
    #Set Network in train mode
    ########Fill out#########
    
    for i, (x, y) in enumerate(loader):
        
        #load images and labels to device
        x =  # x is the image
        y =  # y is the corresponding label
                
        #Forward pass of image through network and get output
        fx = ########Fill out#########
        
        #Calculate loss using loss function
        loss = ########Fill out#########
        
        #calculate the accuracy
        acc = ########Fill out#########

        #Zero Gradients
        ########Fill out#########
        
        #Backpropagate Gradients
        ########Fill out#########
        
        #Do a single optimization step
        ########Fill out#########
        
        #create the cumulative sum of the loss and acc
        epoch_loss += ########Fill out#########
        epoch_acc += ########Fill out#########
        
        #log the loss for plotting
        loss_logger.append(loss.item())

        #clear_output is a handy function from the IPython.display module
        #it simply clears the output of the running cell
        
        clear_output(True)
        print("TRAINING: | Iteration [%d/%d] | Loss %.2f |" %(i+1 ,len(loader) , loss.item()))
        
    #return the average loss and acc from the epoch as well as the logger array       
    return epoch_loss / len(loader), epoch_acc / len(loader), loss_logger

# Define the testing process

In [None]:
#This function should perform a single evaluation epoch and will be passed our validation or evaluation/test data
#it WILL NOT be used to train our model
def evaluate(net, device, loader, Loss_fun, loss_logger = None):
    
    epoch_loss = 0
    epoch_acc = 0
    
    #Set network in evaluation mode
    #Layers like Dropout will be disabled
    #Layers like Batchnorm will stop calculating running mean and standard deviation
    #and use current stored values
    ########Fill out#########
    
    with torch.no_grad():
        for i, (x, y) in enumerate(loader):
            
            #load images and labels to device
            x = ########Fill out#########
            y = ########Fill out#########
            
            #Forward pass of image through network
            fx = ########Fill out#########
            
            #Calculate loss using loss function
            loss = ########Fill out#########
            
            #calculate the accuracy
            acc = ########Fill out#########
            
            #log the cumulative sum of the loss and acc
            epoch_loss += ########Fill out#########
            epoch_acc += ########Fill out#########
            
            #log the loss for plotting if we passed a logger to the function
            if not (loss_logger is None):
                loss_logger.append(loss.item())
            
            clear_output(True)
            #divide by the number of batches seen so far to show the running average accuracy
            print("EVALUATION: | Iteration [%d/%d] | Loss %.2f | Accuracy %.2f%% |" %(i+1 ,len(loader), loss.item(), 100*(epoch_acc/ (i+1))))
    
    #return the average loss and acc from the epoch as well as the logger array       
    return epoch_loss / len(loader), epoch_acc / len(loader), loss_logger

# The training process

In [None]:
#This cell implements our training loop
training_loss_logger = []
validation_loss_logger = []

for epoch in range(start_epoch, num_epochs):
    
    #call the training function and pass training dataloader etc
    train_loss, train_acc, training_loss_logger = ########Fill out#########
    
    #call the evaluate function and pass validation dataloader etc
    valid_loss, valid_acc, validation_loss_logger = ########Fill out#########

    #If this model has the highest performance on the validation set 
    #then save a checkpoint
    #{} define a dictionary, each entry of the dictionary is indexed with a string
    if (valid_acc > best_valid_acc):
        #remember the new best accuracy so later epochs have to beat it
        best_valid_acc = valid_acc
        print("Saving Model")
        torch.save({
            'epoch':                 epoch,
            'model_state_dict':      net.state_dict(),
            'optimizer_state_dict':  optimizer.state_dict(), 
            'train_acc':             train_acc,
            'valid_acc':             valid_acc,
        }, Save_Path)
    
    print(f'| Epoch: {epoch+1:02} | Train Loss: {train_loss:.3f} | Train Acc: {train_acc*100:05.2f}% | Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:05.2f}% |')

In [None]:
#plot out the training_loss_logger and validation_loss_logger
plt.figure(figsize = (10,10))
train_x = np.linspace(0, num_epochs, ########Fill out#########)
plt.plot(train_x, training_loss_logger, c = "y")
valid_x = np.linspace(0, num_epochs, ########Fill out#########)
plt.plot(valid_x, validation_loss_logger, c = "k")

plt.title("LeNet")
plt.legend(["Training Loss", "Validation Loss"])

# Evaluate

In [None]:
#call the evaluate function and pass the evaluation/test dataloader etc
test_loss, test_acc, _ = ########Fill out#########