# _That one_ MNIST Data Multilayer perceptron model

This notebook is an assignment for the MLP module of Duke University's _Introduction to Machine Learning_ course on Coursera (a great course, by the way; check it out!).

_For this assignment, we had to implement a multilayer perceptron to classify the digits in the MNIST dataset. The input consisted of the usual 28×28 images. We had to use a hidden layer with 500 nodes and ReLU as the activation function, and an output layer with ten nodes. The loss function was cross-entropy; PyTorch's `F.cross_entropy` applies log-softmax internally, so there was no need to apply softmax to the output._
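
To make the point about softmax concrete, here is a minimal sketch (not part of the assignment) that compares `F.cross_entropy` on raw outputs with an explicit log-softmax followed by `F.nll_loss`. The logits and labels are made-up values used only for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical raw outputs ("logits") for 4 images and 10 classes.
logits = torch.randn(4, 10)
labels = torch.tensor([3, 0, 7, 1])

# cross_entropy takes the raw logits directly...
loss_direct = F.cross_entropy(logits, labels)

# ...and should match log-softmax followed by negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(loss_direct, loss_manual))
```

Applying softmax to the output before calling `F.cross_entropy` would normalize the values twice, which is why the network's last layer is left linear.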


### Imports

In [15]:
import numpy as np
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms


#constants
IMG_HEIGHT = 28
IMG_WIDTH = 28
IMG_SIZE_VEC = IMG_HEIGHT*IMG_WIDTH
FIRST_LAYER_SIZE = 500
OUTPUT_LAYER_SIZE = 10
EPOCHS = 10

### Loading the MNIST data

In [16]:
#load the data
mnist_train = datasets.MNIST(root="./datasets", train=True, 
                             transform=transforms.ToTensor(), download=True)
mnist_test = datasets.MNIST(root="./datasets", train=False, 
                            transform=transforms.ToTensor(), download=True)


### Setting up batches

In [None]:
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(mnist_test, batch_size=100, shuffle=False)



In [None]:
#making it concrete
data_iterator = iter(train_loader)
images, labels = next(data_iterator)

print("Image batch dimension:  ", images.shape)
print("Label batch dimension:  ", labels.shape)

Image batch dimension:   torch.Size([100, 1, 28, 28])
Label batch dimension:   torch.Size([100])
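
The shapes above are what the forward pass has to deal with: each batch holds 100 single-channel 28×28 images, while a fully connected layer expects one flat vector per image. A small sketch of the reshaping step, using a zero tensor as a stand-in for a real batch:

```python
import torch

# Stand-in for one batch from the loader: 100 single-channel 28x28 images.
images = torch.zeros(100, 1, 28, 28)

# -1 lets PyTorch infer the batch dimension; each image becomes a 784-vector.
x = images.view(-1, 28 * 28)

print(x.shape)  # torch.Size([100, 784])

# With 60,000 training images and batches of 100, one epoch has 600 batches.
batches_per_epoch = 60_000 // 100
```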


### The actual code

In [None]:
#parameters: weights drawn from N(0, 1) and scaled by 1/sqrt(fan-in), biases start at zero
W1 = torch.randn(IMG_SIZE_VEC, FIRST_LAYER_SIZE)/np.sqrt(IMG_SIZE_VEC)
W1.requires_grad_()
b1 = torch.zeros(FIRST_LAYER_SIZE, requires_grad=True)

W2 = torch.randn(FIRST_LAYER_SIZE, OUTPUT_LAYER_SIZE)/np.sqrt(FIRST_LAYER_SIZE)
W2.requires_grad_()
b2 = torch.zeros(OUTPUT_LAYER_SIZE, requires_grad=True)

#minimization: plain SGD over all four parameter tensors
optimizer = torch.optim.SGD([W1, W2, b1, b2], lr=0.1)

#go through every batch of the training set, EPOCHS times
for epoch in range(EPOCHS):
    i = 0
    for images, labels in train_loader:
        i += 1
        if i%100 == 0: print("Epoch: {} | Batch number: {}".format(epoch+1, i))
            
        #reset the gradient at each batch
        optimizer.zero_grad()
        
        #the so-called forward pass
        #flatten each image in the batch into a vector (100 images, 28*28 components each)
        x = images.view(-1, IMG_SIZE_VEC)

        #hidden layer pre-activation
        z = torch.matmul(x, W1) + b1

        z = F.relu(z)

        y = torch.matmul(z, W2) + b2
        
        #the loss function
        cross_entropy = F.cross_entropy(y, labels)

        #backward pass: compute the gradients, then take one SGD step
        cross_entropy.backward()
        optimizer.step()
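
For comparison only: the assignment builds the layers by hand, but the same 784 → 500 → 10 architecture could probably be written with `torch.nn` modules as in the sketch below. This is an assumption-laden alternative, not the notebook's method; in particular, `nn.Linear` uses its own default initialization rather than the `randn / sqrt(n)` scaling above, and the batch here is random dummy data used only to check shapes.

```python
import torch
from torch import nn

# Same layer sizes as the notebook: 784 -> 500 (ReLU) -> 10.
model = nn.Sequential(
    nn.Flatten(),             # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(28 * 28, 500),  # plays the role of W1, b1
    nn.ReLU(),
    nn.Linear(500, 10),       # plays the role of W2, b2; raw logits, no softmax
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One training step on a dummy batch (random pixels and labels).
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))

optimizer.zero_grad()
logits = model(images)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()

# Total number of trainable values across weights and biases.
n_params = sum(p.numel() for p in model.parameters())
print(logits.shape, n_params)
```

Writing the matrices out by hand, as the notebook does, keeps every step of the forward pass visible, which is presumably the point of the exercise.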

### Testing!

In [None]:
correct = 0
n_samples = len(mnist_test)

with torch.no_grad():
    #go through the test set minibatches 
    for images, labels in test_loader:
        #forward pass
        #vectorize it
        x = images.view(-1, IMG_SIZE_VEC)

        #calculate the prediction
        z = torch.matmul(x, W1) + b1

        z = F.relu(z)

        y = torch.matmul(z, W2) + b2
        
        #get the index of the largest value in each row of y,
        #that is, the digit the model thinks each image
        #in the batch represents

        predictions = torch.argmax(y, dim=1)

        #boolean vector marking the correct predictions
        predictions_vec = (predictions == labels)
        
        correct += torch.sum(predictions_vec)
    
print('Test accuracy: {}'.format(correct/n_samples))
#with 10 epochs, the accuracy is around 97%


Test accuracy: 0.9740999937057495
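
The accuracy computation can be checked by hand on a tiny example. The sketch below uses four made-up rows of scores over three classes (not MNIST data) and follows the same argmax-and-compare steps as the test loop.

```python
import torch

# Hypothetical scores for 4 samples over 3 classes, plus their true labels.
logits = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 3.0],
                       [2.0, 0.5, 0.1]])
labels = torch.tensor([1, 0, 2, 1])

# Index of the largest score in each row: tensor([1, 0, 2, 0])
predictions = torch.argmax(logits, dim=1)

# Three of the four predictions match the labels.
correct = (predictions == labels).sum().item()
accuracy = correct / len(labels)

print(accuracy)  # 0.75
```

Calling `.item()` turns the count into a plain Python integer, so the division yields an ordinary float rather than a zero-dimensional tensor.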
