# Practical Work 6: Classification of MNIST digits with a neural network

Practical work originally created by Alasdair Newson (https://sites.google.com/site/alasdairnewson/)

and later modified by Loïc Le Folgoc.

### Goal:

We want to implement a Convolutional Neural Network (CNN) for image recognition on the MNIST dataset (images of handwritten digits).

We will first code the simple ConvNet described below using the Pytorch environment (https://pytorch.org/).

- The input of the CNN is a set of (c,m,n) image tensors, where the number of channels c and the image size m×n depend on the dataset (for MNIST, c=1 and m=n=28).
- We apply
    - a Convolutional layer of 32 filters of shape (3,3), with stride (1,1) and padding='same' (i.e. we apply zero-padding)
    - additive biases
    - a ReLU activation function
    - a Convolutional layer of 32 filters of shape (3,3), with stride (1,1) and padding='same' (i.e. we apply zero-padding)
    - additive biases
    - a ReLU activation function
    - a Max Pooling layer of shape (2,2) and stride (2,2) (i.e. we halve the size in each spatial dimension)
    - a Flatten operation, which reshapes the data into a vector so that a Fully-Connected layer can be applied to it
    - a Fully-Connected (dense) layer with one output per class
    - a softmax activation function, whose outputs are the class probabilities $P(y_c | X)$ (multi-class problem)

    
For convolutional layers, we will use the border conditions "SAME".
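
For reference, the softmax at the end of the list maps the $C$ outputs $z_0, \dots, z_{C-1}$ of the fully-connected layer (the logits) to probabilities. The symbols $z_c$ and $C$ are notation introduced here for illustration only:

$$
P(y_c \mid X) = \frac{\exp(z_c)}{\sum_{k=0}^{C-1} \exp(z_k)}, \qquad c = 0, \dots, C-1
$$

Since the exponential is increasing, the most probable class is also the one with the largest logit.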
    
### Your task:
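You need to add the missing parts of the code: the parts between `# BEGIN STUDENT CODE` and `# END STUDENT CODE` (or `# BEGIN FILL IN STUDENT` and `# END FILL IN STUDENT`), and the steps left as `# ...` in the training loop.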

## Load packages

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms

### CNN model in Pytorch

There are several ways to write a CNN model in Pytorch. In this lab, you will be using the _Sequential_ class of Pytorch (similar to the Sequential API of Keras/TensorFlow). The syntax is shown further on.
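
One of the other ways is to subclass `torch.nn.Module` and write the forward pass explicitly. The sketch below is a hypothetical equivalent of the network built later with `Sequential`; the class name `SmallConvNet` and its default arguments are illustrative assumptions, and it assumes 28×28 inputs (hence the `14 * 14` after pooling):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallConvNet(nn.Module):
    """Hypothetical subclass-style version of the two-conv-layer CNN (assumes 28x28 inputs)."""

    def __init__(self, n_in=1, n_filters=32, n_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(n_in, n_filters, kernel_size=3, stride=1, padding='same')
        self.conv2 = nn.Conv2d(n_filters, n_filters, kernel_size=3, stride=1, padding='same')
        self.fc = nn.Linear(14 * 14 * n_filters, n_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)   # 28x28 -> 14x14
        x = torch.flatten(x, start_dim=1)              # keep the batch dimension
        return self.fc(x)                              # raw logits, no softmax

model = SmallConvNet()
out = model(torch.zeros(2, 1, 28, 28))
print(out.shape)  # torch.Size([2, 10])
```

The `Sequential` form is shorter when the data simply flows through the layers in order; subclassing pays off when the forward pass needs branches or reused intermediate results.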



## Import data

We first import the MNIST dataset. The training set is imported in `mnist_trainset` and the test set in `mnist_testset`.

In practice, training on `mnist_trainset` takes too much time for this practical work. For this reason, we define a smaller training set (`mnist_trainset_reduced`) with a random subset of images. We will use `mnist_trainset_reduced` when training.

In [None]:
# Convert input to Pytorch tensors (ToTensor includes a rescaling from the range [0,255] to [0.0,1.0])
input_transform=transforms.Compose([transforms.ToTensor()])

# Download MNIST training data
mnist_trainset = datasets.MNIST(root='./data',train=True,download=True,transform=input_transform)
print(mnist_trainset)

# Download test dataset
mnist_testset = datasets.MNIST(root='./data',train=False,download=True,transform=input_transform)

# Keep a random subset of max_mnist_size training images (the second part of the split is discarded)
max_mnist_size = 2000
mnist_trainset_reduced = torch.utils.data.random_split(mnist_trainset, [max_mnist_size, len(mnist_trainset)-max_mnist_size])[0]

We also access the training and test data directly as `torch` tensors. We will use them for visualization purposes and to compute the final training/test accuracies.

In [None]:
# Extract the actual data and labels
# (add a channel dimension, (N,28,28) -> (N,1,28,28), and rescale to [0,1] as ToTensor does)
X_train = torch.unsqueeze(mnist_trainset.data,dim=1)[mnist_trainset_reduced.indices]/255.0
Y_train = mnist_trainset.targets[mnist_trainset_reduced.indices]
X_test = torch.unsqueeze(mnist_testset.data,dim=1)/255.0
Y_test = mnist_testset.targets

## Exploring the data

We can explore `mnist_trainset` manually; when we train the model, however, we will use the ```DataLoader``` of Pytorch (see later).

The images are contained in a sub-structure of ```mnist_trainset``` called ```data```. The labels are contained in another sub-structure of ```mnist_trainset``` called ```targets```. Note that these are kept in their native format (the transformations are not applied to them), so to use them we have to apply the transformation manually, as above.

__NOTE__ In general, if you want to find out what a structure contains, use the command ```dir()```; it gives you a list of the sub-structures.

__NOTE__ `mnist_trainset_reduced` is a `Subset` object (it stores the parent dataset in `.dataset` and the selected positions in `.indices`), not an `MNIST` dataset object. We cannot call `.data` and `.targets` directly on it, although we can pass it as an argument to a `DataLoader`.
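
The behaviour of a `Subset` can be checked on a small stand-in. The sketch below uses a hypothetical `TensorDataset` of 100 blank images in place of `mnist_trainset`; the point is that the subset exposes `.dataset` and `.indices`, which is why the code above indexes `mnist_trainset.data` with `mnist_trainset_reduced.indices`:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for mnist_trainset: 100 blank 28x28 images with labels 0..9
images = torch.zeros(100, 1, 28, 28)
labels = torch.arange(100) % 10
full = TensorDataset(images, labels)

torch.manual_seed(0)
reduced, rest = random_split(full, [20, 80])

print(type(reduced).__name__)    # Subset
print(len(reduced), len(rest))   # 20 80
print(hasattr(reduced, 'data'))  # False: a Subset has no .data / .targets
# Indexing the subset is the same as indexing the parent at reduced.indices
print(torch.equal(reduced[0][1], full[reduced.indices[0]][1]))
```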

In [None]:
print(dir(mnist_trainset))

print("Size of training data : ", mnist_trainset.data.shape)
print("Size of training labels : ", mnist_trainset.targets.shape)


The MNIST dataset has 10 classes:

In [None]:
mnist_list = [ '0', '1','2','3','4','5','6','7','8','9']

## Display some of the images

In [None]:
plt.figure(figsize=(10, 6))
for idx in range(0,10):
    plt.subplot(2, 5, idx+1)
    rand_ind = np.random.randint(0,mnist_trainset.data.shape[0])
    plt.imshow(mnist_trainset.data[rand_ind,:,:],cmap='gray')
    plt.title(mnist_list[int(mnist_trainset.targets[rand_ind])])

## Defining the model for MNIST

We will now define the simple CNN described above, for use with MNIST. The input of the CNN is a set of (1,28,28) image tensors (Pytorch uses the channels-first layout: channels, height, width). We apply the following layers:

- a Convolutional layer of 32 filters of shape (3,3), with stride (1,1) and padding='same'
- a ReLU activation function
- a Convolutional layer of 32 filters of shape (3,3), with stride (1,1) and padding='same'
- a ReLU activation function
- a Max Pooling layer of shape (2,2) and stride (2,2) (i.e. we halve the size in each spatial dimension)
- a Flatten operation, which reshapes the data into a vector so that a Fully-Connected layer can be applied to it
- a Dense (fully-connected) layer. Note that you will have to determine its input size, that is to say the number of elements after the last Max Pooling layer.
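
The input size of the dense layer can be found by hand or by pushing a dummy image through the convolutional part and reading off the shape. A minimal sketch of both, assuming 28×28 greyscale inputs and 32 filters as above (`features` is a hypothetical name for the layers before the dense layer):

```python
import torch
import torch.nn as nn

# Convolutional part only, with the layer settings listed above
features = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1, padding='same'),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, stride=1, padding='same'),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
)

# 'same' padding keeps 28x28; the 2x2 max pooling halves it to 14x14
n_flat_formula = (28 // 2) * (28 // 2) * 32
with torch.no_grad():
    n_flat_probe = features(torch.zeros(1, 1, 28, 28)).shape[1]
print(n_flat_formula, n_flat_probe)  # 6272 6272
```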

__VERY IMPORTANT NOTE !!!__

Pytorch carries out the softmax that we would expect at the end of our network inside the loss function that we will use (`torch.nn.CrossEntropyLoss` applies a log-softmax to its input before computing the negative log-likelihood), so there is no need to add it. Nevertheless, you must understand that the network output is a vector of logits which is _not_ normalised to be a probability distribution. This will be important later on.
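
A quick numerical check of this, assuming the loss is `torch.nn.CrossEntropyLoss`: applied to raw logits, it gives the same value as a log-softmax followed by the negative log-likelihood of the true classes. The logits and targets below are random toy values, not MNIST outputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # toy raw outputs for a batch of 4
targets = torch.tensor([3, 0, 9, 1])   # toy true class indices

ce = nn.CrossEntropyLoss(reduction='sum')(logits, targets)
# Same value by hand: log-softmax over classes, pick the true class, negate, sum
manual = -F.log_softmax(logits, dim=1)[torch.arange(4), targets].sum()
print(torch.allclose(ce, manual))      # True
```

A consequence is that adding a softmax layer at the end of the model would apply the softmax twice inside this loss.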

Now, we define the following hyper-parameters of the model:

In [None]:
learning_rate = 0.01
n_epochs = 10
batch_size = 64
nb_classes = int(mnist_trainset.targets.max()+1)

nb_filters = 32       # number of convolutional filters to use
kernel_size = (3, 3)  # convolution kernel size
pool_size = (2, 2)    # size of pooling area for max pooling

# --- Size of the successive layers
n_h_0 = 1             # greyscale input images
n_h_1 = nb_filters
n_h_2 = nb_filters

## Defining a CNN with the Sequential API of Pytorch for MNIST

We are now going to create the CNN with Pytorch.

The Sequential approach is quite similar to that of Tensorflow. To define a model, just write:

```my_model = torch.nn.Sequential( first_layer, second_layer, ...)```

Your task here is to understand the following lines and to draw the architecture of the neural network they define.

You can use the documentation of Pytorch to understand the parameters used in these lines.
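
When drawing the architecture, it can help to list the layers with their number of trainable parameters. A sketch that rebuilds the same `Sequential` model with the hyper-parameter values above written out (1 input channel, 32 filters, 10 classes) and iterates over its layers:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=(3, 3), stride=1, padding='same'),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=(3, 3), stride=1, padding='same'),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),
    nn.Flatten(),
    nn.Linear(14 * 14 * 32, 10),
)

counts = []
for i, layer in enumerate(model):
    # weights + biases of this layer (0 for ReLU, pooling and Flatten)
    n = sum(p.numel() for p in layer.parameters())
    counts.append(n)
    print(i, layer.__class__.__name__, n)
print("total", sum(counts))  # total 72298
```

For example, the first convolution has 1×32×3×3 weights plus 32 biases, i.e. 320 parameters; most of the parameters sit in the final dense layer.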

In [None]:
mnist_model = torch.nn.Sequential(
        torch.nn.Conv2d(n_h_0,n_h_1,kernel_size=kernel_size,stride=1,padding='same'),
        torch.nn.ReLU(),
        torch.nn.Conv2d(n_h_1,n_h_2,kernel_size=kernel_size,stride=1,padding='same'),
        torch.nn.ReLU(),
        torch.nn.MaxPool2d(kernel_size=pool_size,stride=pool_size),
        torch.nn.Flatten(),
        torch.nn.Linear(14*14*n_h_2,nb_classes),
    )

## Define dataloader

We use the ```torch.utils.data.DataLoader``` class of Pytorch to easily iterate over mini-batches of data. The dataset applies the transformations we specified (here, conversion to a Pytorch tensor with rescaling to [0,1]) each time a sample is accessed, and the ```DataLoader``` groups the resulting samples into batches, optionally shuffling them at every epoch.

We will train using the smaller training set, `mnist_trainset_reduced`.
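
To see what the loop over the loader yields, here is a sketch on a hypothetical stand-in dataset of the same size as `mnist_trainset_reduced` (2000 blank images), with the same batch size of 64. Since 2000 is not a multiple of 64, the last batch is smaller:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-in: 2000 blank 1x28x28 images with label 0
toy = TensorDataset(torch.zeros(2000, 1, 28, 28), torch.zeros(2000, dtype=torch.long))
loader = DataLoader(toy, batch_size=64, shuffle=True)

batch_sizes = [imgs.shape[0] for imgs, labels in loader]
print(len(loader), batch_sizes[0], batch_sizes[-1])  # 32 64 16
imgs, labels = next(iter(loader))
print(imgs.shape, labels.shape)  # torch.Size([64, 1, 28, 28]) torch.Size([64])
```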

In [None]:
mnist_train_loader = torch.utils.data.DataLoader(mnist_trainset_reduced, batch_size=batch_size, shuffle=True)

## Define loss function and optimiser

Pytorch provides an easy way to define the loss criterion to optimise. The syntax is as follows (assuming the Adam optimiser is used):

- ```criterion = torch.nn.BCELoss()``` or ```criterion = torch.nn.CrossEntropyLoss()```, etc., depending on your problem.
- ```optimizer = torch.optim.Adam(mnist_model.parameters(), lr=learning_rate)```

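Fill in the following code, choosing the correct criterion to optimise. The losses of the individual data samples can be aggregated into the batch loss in several ways, set by the `reduction` argument of the criterion. Choose `reduction='sum'`, which takes the sum of the individual losses; the training loop below then divides the loss accumulated over an epoch by the number of training samples.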
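
The two usual reductions differ only by a factor equal to the batch size. A sketch on random toy logits and labels (not MNIST data), using `torch.nn.CrossEntropyLoss` as the example criterion:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(64, 10)               # toy logits for a batch of 64
targets = torch.randint(0, 10, (64,))      # toy labels

loss_sum = nn.CrossEntropyLoss(reduction='sum')(logits, targets)
loss_mean = nn.CrossEntropyLoss(reduction='mean')(logits, targets)
# With reduction='sum', the batch loss is N times the per-sample average
print(torch.allclose(loss_sum, loss_mean * 64))  # True
```

Summing the `'sum'` losses over all batches and dividing by the dataset size therefore gives the average loss per training sample.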

In [None]:
# BEGIN STUDENT CODE
# ...
# END STUDENT CODE

## CNN prediction conversion

We recall here that the output of the classification CNN in Pytorch is a vector which is __NOT__ normalised to be a probability distribution. Therefore, for the purposes of finding the prediction of the CNN, we create a function which first converts an input vector to a probability distribution, and then determines the most likely class for each vector. The output should be, for each vector, an integer between 0 and (number of classes) $-1$.

The inputs to this function will be Pytorch tensors, so you can use the following Pytorch functions on them :

- ```torch.nn.Softmax(dim=1)``` (normalises along the class dimension of a (batch, classes) tensor)
- ```torch.argmax(..., dim=1)```

Create this function now.
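
As an illustration of the two functions (not a required form for `vector_to_class`), here they are on a hypothetical batch of two logit vectors over three classes. `dim=1` is the class dimension of a (batch, classes) tensor:

```python
import torch

logits = torch.tensor([[2.0, -1.0, 0.5],
                       [0.1, 0.2, 3.0]])   # toy batch: 2 vectors, 3 classes
probs = torch.nn.Softmax(dim=1)(logits)    # each row now sums to 1
pred = torch.argmax(probs, dim=1)          # most likely class per row
print(pred)                                # tensor([0, 2])
# The softmax preserves the ordering, so the argmax of the logits is the same
print(torch.equal(pred, torch.argmax(logits, dim=1)))  # True
```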

In [None]:
def vector_to_class(x):
  # BEGIN STUDENT CODE
  # ... #
  # END STUDENT CODE
  return y

## Accuracy

Now, define a function which calculates the accuracy of the output of the neural network with respect to the true labels. Both inputs, `predict` and `labels`, are vectors of class numbers (similar to the output of `vector_to_class`, but converted to numpy arrays); the accuracy is the fraction of positions where they agree.
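
With numpy arrays, comparing the two vectors element-wise gives a boolean array whose mean is the accuracy. A sketch on hypothetical toy predictions:

```python
import numpy as np

predict = np.array([0, 1, 2, 3, 4])   # toy predicted classes
labels = np.array([0, 1, 1, 3, 0])    # toy true classes
# Fraction of positions where predicted and true class agree
accuracy = np.mean(predict == labels)
print(accuracy)  # 0.6
```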

In [None]:
def cnn_accuracy(predict,labels):
  # BEGIN STUDENT CODE
  # ...
  # END STUDENT CODE
  return accuracy

## Training the model

Now, we carry out the actual training of the model.
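
The five steps left as `# ...` in the loop below follow the standard Pytorch pattern. Here is a minimal sketch of that pattern on a hypothetical toy problem (a linear classifier on 64 random 10-dimensional inputs with 3 classes, not MNIST), with the same criterion and optimiser settings as above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical toy problem: 64 random inputs, 3 classes
X = torch.randn(64, 10)
y = torch.randint(0, 3, (64,))
model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss(reduction='sum')
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for step in range(50):
    predict = model(X)              # forward pass: raw logits
    loss = criterion(predict, y)    # scalar loss for the batch
    optimizer.zero_grad()           # reset gradients left over from the previous step
    loss.backward()                 # backpropagation
    optimizer.step()                # parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

`zero_grad()` is needed because Pytorch accumulates gradients across calls to `backward()`; it may equally be called before the forward pass.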

In [None]:
train_losses=[]
valid_losses=[]

for epoch in range(0,n_epochs):
  train_loss=0.0
  all_labels = []
  all_predicted = []

  for batch_idx, (imgs, labels) in enumerate(mnist_train_loader):
    # pass the samples through the network
    # ...
    # apply loss function
    # ...
    # set the gradients back to 0
    # ...
    # backpropagation
    # ...
    # parameter update
    # ...
    # accumulate the (summed) batch loss; it is averaged per sample after the epoch
    train_loss += loss.item()
    # store labels and class predictions
    all_labels.extend(labels.tolist())
    all_predicted.extend(vector_to_class(predict).tolist())

  print('Epoch:{} Train Loss:{:.4f}'.format(epoch,train_loss/len(mnist_train_loader.dataset)))

  # calculate accuracy
  print('Accuracy:{:.4f}'.format(cnn_accuracy(np.array(all_predicted),np.array(all_labels))))

Let's compute the final training and test accuracies:
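
One way to get predictions for a whole tensor such as `X_test` is to run the model without gradient tracking and, to limit memory use, in chunks. The helper `predict_in_batches` below is hypothetical, and the model and data in the usage part are stand-ins, not the trained CNN:

```python
import numpy as np
import torch
import torch.nn as nn

def predict_in_batches(model, X, batch_size=1000):
    """Hypothetical helper: predicted class for every row of X, without tracking gradients."""
    preds = []
    with torch.no_grad():
        for start in range(0, X.shape[0], batch_size):
            logits = model(X[start:start + batch_size])
            preds.append(torch.argmax(logits, dim=1))
    return torch.cat(preds).numpy()

# Usage with a stand-in model and random "images" of MNIST shape
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
X_toy = torch.rand(2500, 1, 28, 28)
Y_toy = torch.randint(0, 10, (2500,))
pred = predict_in_batches(toy_model, X_toy)
acc = np.mean(pred == Y_toy.numpy())
print(pred.shape)  # (2500,)
```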

In [None]:

# Calculate accuracy on the training set and the test set

# BEGIN FILL IN STUDENT (use X_train, Y_train, X_test, Y_test)
# ...
# END FILL IN STUDENT

print("Train Accuracy:", train_accuracy)
print("Test Accuracy:", test_accuracy)

In [None]:
print("Visual results : ")

plt.figure(figsize=(10, 6))
for idx in range(0,10):
    plt.subplot(2, 5, idx+1)
    rand_ind = np.random.randint(0,X_test.shape[0])
    test_img = torch.unsqueeze(X_test[rand_ind,:,:,:],dim=0)  # add a batch dimension: (1,28,28) -> (1,1,28,28)
    predicted_class = vector_to_class(mnist_model(test_img))
    plt.imshow(test_img.squeeze(),cmap='gray')
    plt.title(mnist_list[int(predicted_class)])