#### Objective

- Write a convolutional neural network for the MNIST Database
- Unlike the previous lab, we are going to make use of convolutional networks instead of linear to classify the inputs

![](img/lab4/convolutionalnetwork.png)

In [1]:
import torch
from torchvision import transforms, datasets
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

train = datasets.MNIST("", train = True, download = True, transform = transforms.Compose([transforms.ToTensor()]))
test = datasets.MNIST("", train = False, download = True, transform = transforms.Compose([transforms.ToTensor()]))

trainset = torch.utils.data.DataLoader(train, batch_size = 10, shuffle = True)
testset = torch.utils.data.DataLoader(test, batch_size = 10, shuffle = True)
# Batchsize is a parameter that determines the number of samples in each batch before updating the optimizers (weights)

In [2]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 5, padding=2) # first two represents dimension, and 5 represents dimension of convolution kernel
        self.conv2 = nn.Conv2d(32, 64, 5, padding=2) # convolution kernel is 5x5 in this example, why is padding = 2?
        self.fc1 = nn.Linear(64*7*7, 128)
        self.fc2 = nn.Linear(128, 10)

    def convs(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2)) # 2x2 patch
        x = F.max_pool2d(F.relu(self.conv2(x)), (2,2))
        return x

    def forward(self, x): # take an input tensor, x, and produce an output tensor
        x = self.convs(x) # applies the defined self.convs modules, reduces spatial dimensions
        x = x.view(-1, 64*7*7) # reshapes output tensor into a 1-D tensor (-1) the output has 64 channels and the height and width is reduced to 7 (due to pooling)
        x = F.relu(self.fc1(x)) # applies linear layer to the flattened output tensor followed by ReLU activation, adding non-linear element to the model
        x = self.fc2(x) # applies linear layer

        # computes to softmax function of given input
        return F.softmax(x, dim = 1)

# create instance of neural network
net = Net()

self.conv: 1 (input channel, as it is grayscale), 32 (output channel, number of filters to be used), 5 (kernel size, size of the convolutional filter in this case 5), padding = 2 (amount of padding added to the input image to ensure that the output feature map has the same spatial dimensions, 2 pixels is added to each sides of the image)

##### what is the padding for?
padding is used to preserve the spatial resolution of the input image and prevent information loss at the edges of the image during convolution

##### what is the point of pooling?
pooling is used to reduce the spatial dimensions while retaining the most important information. This reduces the parameters in the network and prevents overfitting. Max pooling takes the maximum value of a local neighborhood pixels (in this case a 2x2 patch), resulting ina smaller output feature map. Overall, pooling helps make the network computationally more efficient.

In [3]:
# sets up optimisation method for backpropagation alg
# optimiser = optim.SGD(net.parameters(), lr = 0.001) # SGD (Stoachastic Gradient Descent)
optimiser = optim.Adam(net.parameters(), lr = 0.001) # Adam (increases Accuracy drastically)

In [4]:
Epochs = 3 # number of training epochs

# iterate over the training data (trainset)
for epoch in range(Epochs):
    for data in trainset:
        X, y = data # assign input (X) and labels (y)
        net.zero_grad() # set gradiant stored to zero to reset gradient value for each iteration
        output = net.forward(X) # change as the network is given a matrix instead of a vector for input
        loss = F.nll_loss(output, y) # loss function (cross-entropy sicne we are working w classifier)
        loss.backward() # compute gradient wrt loss function over each parameter of the network (must set gradient to 0, line 48)
        optimiser.step() # update parameters of the network according to the optimisation alg and gradient stored within each variable

correct, total = 0, 0

with torch.no_grad():
    for data in testset:
        X, y = data
        output = net.forward(X)
        for idx, i in enumerate(output):
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1
print("Accuracy: ", round(correct/total, 3))

Accuracy:  0.972


### Possible Questions
##### Why does it take significantly more time?
It takes longer time to compute an epoch, likely because they involve additional computational steps such as convolutions and pooling, which can be computationally expensive compared to the simpler matrix multiplications in ANNs.
##### Changing the number of neurons?
##### Changing the optimiser and activation functions?
##### Modifying the pooling layer?
##### Modifying the dimensions of the convolutional kernel (and padding)?

###### Is it reasonable to expect to reach 100% classification accuracy?

Achieving 100% accuracy on MNIST dataset is possible, given that it is a relatively simple task. However, it is still possible to overfit, especially if the model is too complex, and therefore, the training set would not be representative of the general dataset or new data. Essentially it would be remembering the patterns of the training set if overfit rather than the actual dataset.

Achieving high accuracy on the MNIST is not a sufficient condition for the model to be considered good, and the model should be evaluated on its ability to generalise to new data, and its performance should be compared to other models on the same tasks.