# Lab 4

### Objectives

1. Write a convolutional neural network for the MNIST Database

In this lab, convolutional layers are used.

The network looks as follows:</br>
<img src=".\i\lab4-network.png" width="400"> </br>

- Activation function for each layer is the recitifed linear function (ReLu)
- Pooling layer Mp takes the maximum value of a patch of the image.
- The output layer is made of softmax functions
- 1st layer has 32 separate kernels 
- 2nd layer has 64 kernels (64 separate 7x7 images)

Note you can put any image size in! The dimensions set here are to help learn this specific dataset but it does not affect new inputs.

In [1]:
# copying code from lab 3 
import torch
from torchvision import transforms, datasets
import torch.nn as nn 
import torch.nn.functional as F 
import torch.optim as optim

train = datasets.MNIST(
    "", 
    train=True, 
    download=True, 
    transform=transforms.Compose([transforms.ToTensor()]))
test = datasets.MNIST("", train=False, download=True, transform=transforms.Compose([transforms.ToTensor()]))

trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=True)

### Changing Net class

In [2]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # applies 2d convolution
        # first 2 inputs are dimensions of layer,
        # 3rd is dimension of convolutional kernel (5x5 - based on what works well)
        # last param = padding = 2 because...?
        self.conv1 = nn.Conv2d(1, 32, 5, padding=2)
        self.conv2 = nn.Conv2d(32, 64, 5, padding=2)

        self.fc1 = nn.Linear(64*7*7, 128)
        self.fc2 = nn.Linear(128, 10)

    def convs(self, x):
        # in each convolutional layer, ReLu is the activation function
        # max_pool2d() takes the max value of a patch of the image (patch = 2x2)
        # --> this reduces the dimension of the matrix and the number of features to learn
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))

        return x

    def forward(self, x):
        x = self.convs(x) # applies two convolutional layers
        x = x.view(-1, 64*7*7) # unpack 64x7x7 tensor to be linear
        #print(x)

        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        return F.softmax(x, dim=1)


### Running

In [4]:
net = Net()

optimizer = optim.Adam(net.parameters(), lr=0.001)

for epoch in range(3):
    for data in trainset:
        X, y = data
        net.zero_grad()
        output = net.forward(X) # no need for X.view as this is in the function
        loss = F.nll_loss(output, y)
        loss.backward()
        optimizer.step()

    #print("loss: ", loss)

correct = 0
total = 0

with torch.no_grad():
    for data in testset:
        X, y = data
        output = net.forward(X)
        for idx, i in enumerate(output):
            # argmax is finding the highest probability out of y (from SoftMax)
            # y is a label between 0 and 9
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print("Accuracy: ", round(correct/total, 3))

tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0316, 0.0003, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0167, 0.0033, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0024, 0.0206, 0.0000],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 0.0027, 0.0109, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0002, 0.0104, 0.0000]],
       grad_fn=<ViewBackward0>)
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0602, 0.0183, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0029, 0.0045, 0.0000],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 0.0598, 0.0089, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0222, 0.0146, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0055, 0.0072, 0.0000]],
       grad_fn=<ViewBackward0>)
tensor([[0.0081, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0081, 0.0003, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
        [0.0

### Results

Accuracy:  0.872

Loss tensor approaches 0 for all elements with each epoch.

### General comments

- This code takes significantly longer to run than the previous one.
- Note that "hyperparameters" refers to the kernels, optimizers, and everything else that is part of the learning but are parameters of the model rather than the weights and bias.