# Lab 3

### Objectives

1. Understand how to use PyTorch to write a neural network
2. Write a neural network with multiple hidden layers

This lab uses linear (fully connected) layers to build the following network:</br>
<img src=".\i\lab3-network.png" width="400"> </br> </br>

The network has:
- A 28*28 pixel input, flattened to a 784-dimensional vector (a 1-dimensional tensor with 784 elements)
- 3 hidden layers
- 64 neurons in each hidden layer
- An output layer with 10 neurons, one per digit from 0 to 9 (a 10-dimensional vector, i.e. a 1-dimensional tensor with 10 elements)
- Sigmoid activation functions in the hidden layers
- A softmax function over the neurons in the output layer
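
As a quick check on the architecture listed above, the number of trainable parameters can be worked out by hand. The sketch below is plain Python arithmetic; the helper name `count_parameters` is made up for illustration and is not part of the lab code.

```python
# Layer sizes taken from the description above: 784 inputs, three hidden
# layers of 64 neurons each, and 10 outputs.
layer_sizes = [28 * 28, 64, 64, 64, 10]

def count_parameters(sizes):
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out  # entries of the weight matrix
        total += n_out         # entries of the bias vector
    return total

n_params = count_parameters(layer_sizes)
print(n_params)  # 59210
```

Most of the parameters (50,240 of them) sit in the first layer, because it maps 784 inputs to 64 neurons.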


In [1]:
import torch
from torchvision import transforms, datasets
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

Importing data.

We are using the MNIST database: http://yann.lecun.com/exdb/mnist/.
- The input is a 28 by 28 matrix (2-dimensional tensor), which corresponds to the pixels of the image.
- The label is an integer between 0 and 9.

In [2]:
train = datasets.MNIST(
    "", 
    train=True, 
    download=True, 
    # Compose chains several transformations; ToTensor converts a PIL Image or numpy.ndarray to a tensor.
    transform=transforms.Compose([transforms.ToTensor()]))

test = datasets.MNIST("", train=False, download=True, transform=transforms.Compose([transforms.ToTensor()]))

# DataLoader() wraps an iterable over the given dataset and supports automatic batching, sampling, shuffling and multiprocess data loading
trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=True)
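
To see what a `DataLoader` yields per iteration, the sketch below builds one over a synthetic dataset. The random tensors and the names `fake_train` and `loader` are assumptions standing in for MNIST (so nothing is downloaded); only the shapes match the real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST (an assumption, to avoid the download):
# 100 random "images" of shape 1x28x28 and 100 random labels in 0..9.
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
fake_train = TensorDataset(images, labels)

loader = DataLoader(fake_train, batch_size=10, shuffle=True)

X, y = next(iter(loader))          # fetch a single batch
print(X.shape)                     # torch.Size([10, 1, 28, 28])
print(y.shape)                     # torch.Size([10])
print(X.view(-1, 28 * 28).shape)   # torch.Size([10, 784])
print(len(loader))                 # 10 batches of 10 samples each
```

Each batch is a pair of tensors: the images, with the batch size as the first dimension, and the matching labels.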

Defining the network.

In [3]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64) # linear transformation xW + b with 28*28 inputs and 64 outputs
        self.fc2 = nn.Linear(64, 64) # automatically creates the weight and bias tensors
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x)) # apply the sigmoid activation to the linearly transformed input
        x = torch.sigmoid(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))
        x = self.fc4(x)
        return F.softmax(x, dim=1)

net = Net()
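
A forward pass on random input is a cheap way to check that the layer sizes line up. The sketch below uses an `nn.Sequential` stand-in with the same layer sizes and activations as the class above (the name `model` and the random batch are assumptions for illustration).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Compact stand-in for the Net class: same layer sizes and activations.
model = nn.Sequential(
    nn.Linear(28 * 28, 64), nn.Sigmoid(),
    nn.Linear(64, 64), nn.Sigmoid(),
    nn.Linear(64, 64), nn.Sigmoid(),
    nn.Linear(64, 10), nn.Softmax(dim=1),
)

batch = torch.rand(5, 28 * 28)   # 5 random flattened "images"
probs = model(batch)

print(probs.shape)               # torch.Size([5, 10])
print(probs.sum(dim=1))          # each row sums to approximately 1
print(probs.argmax(dim=1))       # predicted digit for each image
```

The softmax over `dim=1` turns each row of 10 outputs into a probability distribution over the digits.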

Setting up the optimizer and the loss function, then training for a number of epochs. At each iteration, we compute the loss, backpropagate to get its gradient with respect to every parameter, and update the parameters.

NB: ```net.zero_grad()``` resets the gradient of every parameter in the network to 0. PyTorch's "autograd" feature (which lets tensors keep track of the operations applied to them and provides derivatives with respect to their inputs) adds each newly computed gradient to the value already stored instead of overwriting it.

Without this line, the gradients computed by ```loss.backward()``` would accumulate across iterations, leading to incorrect training.

More info on autograd: https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
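
The accumulation behaviour can be seen on a single scalar. This is a minimal sketch with a made-up variable `w`, not part of the lab network.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2w = 4.
(w ** 2).backward()
first = w.grad.item()

# Second backward pass without a reset: the new gradient is added on top.
(w ** 2).backward()
accumulated = w.grad.item()

# Reset the stored gradient, then run backward again.
w.grad.zero_()
(w ** 2).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 4.0 8.0 4.0
```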

In [6]:
optimizer = optim.SGD(net.parameters(), lr=0.001) # optimization method used to update the parameters after backpropagation

for epoch in range(3): # number of training epochs
    for data in trainset: # iterating over training data
        X, y = data # assign input (X) and labels (y)
        net.zero_grad() # see note above
        output = net.forward( # compute the output using the forward method
            X.view(-1, 28*28) # transform 28*28 tensor input to a 784 vector
        ) 
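        loss = F.nll_loss(output, y) # negative log-likelihood loss for classification; note that nll_loss expects log-probabilities (e.g. from F.log_softmax), while this network outputs softmax probabilities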
        loss.backward() # compute gradient wrt loss function over each parameter of network
        optimizer.step() # update parameters of the network according to optimisation algorithm and gradient stored within each variable
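
One detail worth checking about the loss: `F.nll_loss` expects log-probabilities, so it only equals the cross-entropy when it is fed the output of `F.log_softmax`. The sketch below compares the two cases on made-up logits and labels (all values are assumptions for illustration).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw outputs of a final linear layer
targets = torch.tensor([3, 1, 7, 0])   # made-up labels

# Cross-entropy is the NLL loss applied to log-probabilities.
ce = F.cross_entropy(logits, targets)
nll_log = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# With plain softmax probabilities, nll_loss returns minus the mean
# probability of the correct class, which is a different quantity.
probs = F.softmax(logits, dim=1)
nll_prob = F.nll_loss(probs, targets)
manual = -probs[torch.arange(4), targets].mean()

print(torch.allclose(ce, nll_log))       # True
print(torch.allclose(nll_prob, manual))  # True
```

Minimizing minus the correct-class probability still pushes the network toward the right answer, but its gradients are scaled differently from those of the true cross-entropy.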

Testing the trained network.

NB: ```torch.no_grad()``` disables gradient tracking, so no gradients are computed or stored during the test session (there is no loss to be minimized), which saves memory and computation.
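
The effect of `torch.no_grad()` can be checked on a single tensor. This is a minimal sketch with a made-up variable `w`.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

y_tracked = w * 2            # recorded by autograd
with torch.no_grad():
    y_untracked = w * 2      # not recorded inside the no_grad block

print(y_tracked.requires_grad)    # True
print(y_untracked.requires_grad)  # False
```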

In [7]:
correct = 0
total = 0

with torch.no_grad():
    for data in testset:
        X, y = data
        output = net.forward(X.view(-1, 28*28))
        for idx, i in enumerate(output):
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print("Accuracy: ", round(correct/total, 3))

Accuracy:  0.942
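
The per-sample loop in the test cell can also be written with tensor operations. The sketch below shows one possible vectorized form on made-up outputs and labels (3 classes instead of 10, purely for readability).

```python
import torch

# Made-up network outputs for a batch of 4 samples and 3 classes.
output = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.2, 0.2, 0.6],
                       [0.3, 0.4, 0.3]])
y = torch.tensor([1, 0, 2, 0])

predictions = output.argmax(dim=1)         # tensor([1, 0, 2, 1])
correct = (predictions == y).sum().item()  # 3 correct predictions
total = y.size(0)                          # 4 samples in the batch
print(correct / total)                     # 0.75
```

Inside the test loop, `correct` and `total` would be incremented by these per-batch values instead of being assigned.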


Re-running with ReLU activation functions. The cell below uses SGD; swapping ```optim.SGD``` for ```optim.Adam``` gives the Adam variant.

In [10]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return F.softmax(x, dim=1)

net = Net()

optimizer = optim.SGD(net.parameters(), lr=0.001)

for epoch in range(3):
    for data in trainset:
        X, y = data
        net.zero_grad()
        output = net.forward(X.view(-1, 28*28))
        loss = F.nll_loss(output, y)
        loss.backward()
        optimizer.step()

correct = 0
total = 0

with torch.no_grad():
    for data in testset:
        X, y = data
        output = net.forward(X.view(-1, 28*28))
        for idx, i in enumerate(output):
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print("Accuracy: ", round(correct/total, 3))

Accuracy:  0.104


**Results for different activation functions and optimizers** </br>

- Sigmoid and SGD: accuracy == 0.098. To get a higher accuracy here, you need to train for many more epochs, which is very slow with SGD.
- ReLU and SGD: accuracy == 0.141
- Sigmoid and Adam: accuracy == 0.942
- ReLU and Adam: accuracy == 0.939
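
Switching between the optimizers compared above takes a one-line change. The sketch below shows the swap on a toy problem; the synthetic data, the small `model`, and the use of `F.cross_entropy` on raw logits are all assumptions for illustration, not the lab's MNIST setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

# Toy data (an assumption, not MNIST): 2 features, label 1 if x0 + x1 > 0.
X = torch.randn(200, 2)
y = (X.sum(dim=1) > 0).long()

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))

# The optimizer swap: optim.Adam in place of optim.SGD.
optimizer = optim.Adam(model.parameters(), lr=0.001)

initial_loss = F.cross_entropy(model(X), y).item()
for step in range(200):
    optimizer.zero_grad()                 # reset the stored gradients
    loss = F.cross_entropy(model(X), y)   # forward pass and loss
    loss.backward()                       # backpropagation
    optimizer.step()                      # Adam parameter update
final_loss = F.cross_entropy(model(X), y).item()

print(initial_loss, final_loss)  # the final loss should be the lower one
```

The rest of the training loop is unchanged, since both optimizers expose the same `zero_grad()` and `step()` interface.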

PLEASE SEE NOTEBOOK 3.1 FOR FURTHER DETAILS ON THIS AND ON CHANGING THE NUMBER OF LAYERS IN THE NETWORK.
