<a href="https://colab.research.google.com/github/ferngonzalezp/deep_learning_lab/blob/main/DL_labSession1_part_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MNIST Handwritten Digit Classification

In this part, we will build a neural network to classify handwritten digits. We will use the most basic ("vanilla") kind of neural network, the multilayer perceptron (MLP), and implement it with the PyTorch library. You can find additional information in the [docs](https://pytorch.org/docs/stable/index.html).

First, we import the libraries we will need:

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.datasets as datasets
import torchvision
import numpy as np
import matplotlib.pyplot as plt

Next, we download the data. torchvision provides the MNIST dataset, already split into a training set (60,000 images) and a test set (10,000 images).

In [None]:
transform = torchvision.transforms.ToTensor()  # PIL image -> float tensor of shape (C, H, W) with values in [0, 1]
train_dataset = datasets.MNIST(root='./', download=True, train=True, transform=transform)
test_dataset = datasets.MNIST(root='./', download=True, train=False, transform=transform)

Let's explore the dataset to see what it outputs.

In [None]:
example = train_dataset[np.random.randint(0,len(train_dataset))]
plt.imshow(example[0][0], cmap='gray')
plt.title(example[1])

As we can see, the images have a 28 x 28 resolution and a single (grayscale) channel, so each image is a tensor of shape `(1, 28, 28)`. Each image comes with a label, the digit it shows, which goes from 0 to 9.
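The `(1, 28, 28)` shape and the pixel range come from the `ToTensor` transform. As a rough sketch of what it does for an 8-bit grayscale image (an approximation written by hand, not torchvision's actual implementation), the code below converts a made-up random image (`fake_image`, not an MNIST digit) into a float tensor scaled to [0, 1] with a leading channel dimension.

```python
import numpy as np
import torch

# A made-up 28 x 28 grayscale image with 8-bit pixel values in [0, 255],
# standing in for one raw MNIST digit.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Roughly what ToTensor does for a single-channel 8-bit image:
# convert to float, scale to [0, 1] and add a leading channel dimension.
tensor = torch.from_numpy(fake_image).float().div(255.0).unsqueeze(0)

print(tensor.shape)  # torch.Size([1, 28, 28])
print(tensor.min().item() >= 0.0, tensor.max().item() <= 1.0)  # True True

# example[0][0] in the cell above selects the single channel,
# leaving a 28 x 28 array that plt.imshow can display.
print(tensor[0].shape)  # torch.Size([28, 28])
```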

____________________

**Building a model:**

PyTorch provides all the tools needed to define a model in its ``nn`` module. We will build a Multi Layer Perceptron out of linear layers. A common MLP with one hidden layer can be written as follows:

In [None]:
class mlp(nn.Module):
  def __init__(self):
    super(mlp,self).__init__()
    self.model = nn.Sequential(
        nn.Linear(28**2,32),
        nn.Sigmoid(),
        nn.Linear(32,10),
        nn.Softmax(dim=1)
    )
  def forward(self,x):
    x = torch.flatten(x,1)
    return self.model(x)

This is an MLP with one hidden layer. The number of hidden layers is usually called the "depth" of the network, and the number of neurons per hidden layer its "width"; in this case we have a depth of 1 and a width of 32. The input size, 28 x 28 = 784, is the number of pixels in the image, and the output has 10 values, one per digit class. In the ``forward()`` method we first flatten each image into a vector so that it can be processed by the linear layers. After the hidden linear layer we apply a non-linear activation; here we use the sigmoid function, which keeps its outputs between zero and one. The output layer uses the softmax function instead, which makes the 10 outputs non-negative and sum to one, because they represent the probability that the image corresponds to each digit.
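The shape bookkeeping and the two activations can be checked on random data. This sketch rebuilds the same layers in a standalone `nn.Sequential` (`demo_mlp` is a hypothetical name, chosen so the notebook's `mlp` class and `model` variable are untouched) and feeds it a fake batch of four random "images", not MNIST data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A fake batch of 4 single-channel 28 x 28 images (random values, not MNIST).
x = torch.rand(4, 1, 28, 28)

# torch.flatten(x, 1) keeps the batch dimension and merges the rest:
# (4, 1, 28, 28) -> (4, 784)
flat = torch.flatten(x, 1)
print(flat.shape)  # torch.Size([4, 784])

# The same layers as in the mlp class above.
demo_mlp = nn.Sequential(
    nn.Linear(28**2, 32),
    nn.Sigmoid(),
    nn.Linear(32, 10),
    nn.Softmax(dim=1),
)

hidden = demo_mlp[1](demo_mlp[0](flat))  # hidden activations after the sigmoid
probs = demo_mlp(flat)                   # output probabilities after the softmax

print(hidden.min().item() > 0, hidden.max().item() < 1)  # True True
print(probs.shape)       # torch.Size([4, 10])
print(probs.sum(dim=1))  # each row sums to 1 (up to rounding)
```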

 ______________

We can define a model by hand, as in the previous example, but it is more convenient to build the list of layers in a loop and then pass it to the ``nn.Sequential`` module. Try to define a function with arguments **depth** and **width** that builds the list of layers of the neural network.

In [None]:
def layers(depth,width):
  layers = []
  # code for building list of layers
  #tip1: use the .append() method
  #tip2: The input and output layers don't change
  return layers
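If you want to check your answer, here is one possible solution. It is a sketch that assumes `depth` counts the hidden layers and that every hidden layer has `width` neurons followed by a sigmoid, mirroring the one-hidden-layer model above. The function name `build_layers` and its `n_inputs`/`n_classes` parameters are hypothetical, chosen so your own `layers` function is not overwritten.

```python
import torch
import torch.nn as nn

def build_layers(depth, width, n_inputs=28**2, n_classes=10):
    """Return a list of layers for an MLP with `depth` hidden layers of size `width`."""
    layers = []
    # Input layer: flattened image -> first hidden layer
    layers.append(nn.Linear(n_inputs, width))
    layers.append(nn.Sigmoid())
    # Remaining hidden layers: width -> width
    for _ in range(depth - 1):
        layers.append(nn.Linear(width, width))
        layers.append(nn.Sigmoid())
    # Output layer: last hidden layer -> one probability per class
    layers.append(nn.Linear(width, n_classes))
    layers.append(nn.Softmax(dim=1))
    return layers

demo_net = nn.Sequential(*build_layers(depth=2, width=32))
print(demo_net)
print(demo_net(torch.rand(5, 28**2)).shape)  # torch.Size([5, 10])
```

With `depth=1` and `width=32` this reproduces the hand-written model above; returning `build_layers(depth, width)` from your `layers` function would make the `mlp` class below work.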

In [None]:
class mlp(nn.Module):
  def __init__(self, depth, width):
    super(mlp,self).__init__()
    self.model = nn.Sequential(*layers(depth,width))
  def forward(self,x):
    x = torch.flatten(x,1)
    return self.model(x)

Neural networks are usually trained on GPUs. A GPU computes the matrix operations at the heart of a neural network in parallel, which typically makes training and inference much faster than on a few CPU cores. The following line of code detects whether a GPU is available and creates the corresponding device object. To train and use a model on a GPU, we simply call the ``.to(device)`` method on the model and on every tensor (inputs and labels) that we pass to it.

In [None]:
#Use GPU if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
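One detail worth knowing is that `.to()` behaves slightly differently for modules and tensors: a module is moved in place (and also returned), while a tensor is not modified and a tensor on the target device is returned instead, so the result must be reassigned. A minimal sketch on a made-up layer and batch (the `demo_` names are hypothetical); it also runs on a CPU-only machine.

```python
import torch
import torch.nn as nn

demo_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Modules: .to() moves the parameters in place (and also returns the module).
demo_layer = nn.Linear(784, 10).to(demo_device)

# Tensors: .to() does not modify the original tensor; it returns a tensor
# on the target device, so the result must be reassigned.
demo_batch = torch.rand(8, 784)
demo_batch = demo_batch.to(demo_device)

# Model and data must live on the same device before the forward pass.
demo_out = demo_layer(demo_batch)
print(demo_out.device, next(demo_layer.parameters()).device)
```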

In [None]:
model = mlp(depth=1,width=32).to(device)
print(model)
pytorch_total_params = sum(p.numel() for p in model.parameters())
print("number of parameters in model: %d"%(pytorch_total_params))
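The printed count can be checked by hand: an `nn.Linear(n_in, n_out)` layer has `n_in * n_out` weights plus `n_out` biases, and the activation functions have no parameters. The helpers below (`linear_params` and `count_mlp_params`, hypothetical names) assume the architecture of the one-hidden-layer model above, generalized to `depth` hidden layers of size `width`.

```python
def linear_params(n_in, n_out):
    # weights (n_in * n_out) + biases (n_out)
    return n_in * n_out + n_out

def count_mlp_params(depth, width, n_inputs=28**2, n_classes=10):
    total = linear_params(n_inputs, width)              # input -> first hidden layer
    total += (depth - 1) * linear_params(width, width)  # extra hidden layers
    total += linear_params(width, n_classes)            # last hidden layer -> output
    return total

print(count_mlp_params(depth=1, width=32))  # 25450 = (784*32 + 32) + (32*10 + 10)
print(count_mlp_params(depth=2, width=32))  # 26506 = 25450 + (32*32 + 32)
```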

Next, we specify the hyperparameters for training our model: the batch size, the number of epochs and the learning rate.

In [None]:
#Hyperparameters
batch_size = 100
n_epochs = 25
learn_rate = 2e-1
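With these values, each epoch is one pass over the 60,000 training images in mini-batches of 100, so the optimizer takes 600 gradient steps per epoch and 15,000 in total. The arithmetic, as a quick sketch (the training-set size is hard-coded from the standard MNIST split rather than read from `train_dataset`, and the `demo_` names are hypothetical copies so the hyperparameters above are not overwritten):

```python
import math

n_train = 60_000        # images in the MNIST training split
demo_batch_size = 100   # same value as batch_size above
demo_epochs = 25        # same value as n_epochs above

steps_per_epoch = math.ceil(n_train / demo_batch_size)  # a smaller last batch would still count
total_steps = steps_per_epoch * demo_epochs
prints_per_epoch = steps_per_epoch // 50                 # the training loop prints every 50 batches

print(steps_per_epoch, total_steps, prints_per_epoch)    # 600 15000 12
```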

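PyTorch uses ``DataLoader`` objects to sample mini-batches from a dataset. With ``shuffle=True``, the training set is reshuffled at every epoch; the test set does not need shuffling.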

In [None]:
#DataLoaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size)
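To see what the loaders yield, here is a small stand-in: a `TensorDataset` of 250 random "images" and labels (made-up data, not MNIST) wrapped in a `DataLoader` with the same batch size. Each iteration returns `[inputs, labels]`, and with the default `drop_last=False` the last batch is smaller when the dataset size is not a multiple of the batch size.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
fake_images = torch.rand(250, 1, 28, 28)    # 250 random "images"
fake_labels = torch.randint(0, 10, (250,))  # random digit labels 0..9
fake_loader = DataLoader(TensorDataset(fake_images, fake_labels),
                         batch_size=100, shuffle=True)

print(len(fake_loader))  # 3 batches: 100 + 100 + 50
batch_sizes = []
for inputs_b, labels_b in fake_loader:
    batch_sizes.append(inputs_b.shape[0])
    print(inputs_b.shape, labels_b.shape)
```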

The next things we need to define are the loss function and the optimizer. For the optimizer we will use the stochastic gradient descent (SGD) algorithm we have already studied; luckily we don't need to code it ourselves, because PyTorch already provides a good implementation along with other [optimizers](https://pytorch.org/docs/stable/optim.html#algorithms).

The loss function we will use in this case is the [Binary Cross-Entropy](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html#torch.nn.BCELoss). Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability between 0 and 1, and it increases as the predicted probability diverges from the actual label. So predicting a probability of 0.012 when the actual label is 1 would be bad and result in a high loss value, while a perfect model would have a log loss of 0. This is also why we use the softmax function in the output layer: it transforms the outputs of the network into values between 0 and 1 that sum to one. If the model were trained to perfection, it would output 1 at the position of the correct label and 0 everywhere else.
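To make the loss concrete, here is a small sketch that computes binary cross-entropy by hand, $-\frac{1}{N}\sum_i \left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]$, and compares it with `nn.BCELoss`, which by default averages over every element (here all 10 outputs of the sample). The probabilities below are made up for illustration.

```python
import math
import torch
import torch.nn as nn

# Single prediction: true label 1, predicted probability 0.012 -> large loss
print(-math.log(0.012))  # about 4.42
# A confident correct prediction gives a loss close to 0
print(-math.log(0.99))   # about 0.01

# Made-up softmax output for one image (10 classes, sums to 1)
probs = torch.tensor([[0.01, 0.01, 0.02, 0.05, 0.01, 0.80, 0.03, 0.03, 0.02, 0.02]])
# One-hot target for the digit 5
target = torch.zeros(1, 10)
target[0, 5] = 1.0

# Binary cross-entropy written out, averaged over all elements
manual = -(target * torch.log(probs) + (1 - target) * torch.log(1 - probs)).mean()
builtin = nn.BCELoss()(probs, target)
print(manual.item(), builtin.item())  # the two values agree
```

For reference, a common alternative in PyTorch code is `nn.CrossEntropyLoss`, which takes raw scores (no softmax layer) and integer labels; this notebook keeps the softmax + `BCELoss` combination.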


Since the network outputs 10 probabilities per image while the dataset provides a single integer label, we need to convert the target labels into a [one-hot](https://en.wikipedia.org/wiki/One-hot) encoding. PyTorch has a built-in function for this, ``F.one_hot()``. For example, for the target 5, the one-hot version would be: ``[0,0,0,0,0,1,0,0,0,0]``.
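A quick check of the example above: `F.one_hot` expects integer (int64) class indices and returns an int64 tensor with an extra last dimension of size `num_classes`. Since `BCELoss` needs floating-point targets, the training loop converts the result with `.type_as(inputs)`. The labels here are made up.

```python
import torch
import torch.nn.functional as F

labels_example = torch.tensor([5, 0, 9])  # integer class labels (int64)
one_hot = F.one_hot(labels_example, num_classes=10)
print(one_hot[0])  # tensor([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])

# BCELoss needs floating-point targets
one_hot_float = one_hot.float()
print(one_hot.dtype, one_hot_float.dtype)  # torch.int64 torch.float32
```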

In [None]:
#Loss function and Optimizer
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(),lr=learn_rate)
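Plain SGD as configured here (no momentum or weight decay) updates each parameter as $\theta \leftarrow \theta - \eta \, \nabla_\theta L$, where $\eta$ is `learn_rate`. The sketch below checks this on a tiny made-up layer and loss (the `sgd_` names and the squared-error loss are assumptions for illustration only): it applies one `step()` and compares the result with the update computed by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lr = 0.2                                  # same value as learn_rate above
sgd_layer = nn.Linear(3, 1)
sgd_opt = torch.optim.SGD(sgd_layer.parameters(), lr=lr)

x = torch.rand(4, 3)
y = torch.rand(4, 1)

sgd_opt.zero_grad()
sgd_loss = ((sgd_layer(x) - y) ** 2).mean()  # a made-up regression loss
sgd_loss.backward()

# Expected update by hand (new tensors, computed before the step): theta - lr * grad
expected_w = sgd_layer.weight.detach() - lr * sgd_layer.weight.grad
expected_b = sgd_layer.bias.detach() - lr * sgd_layer.bias.grad

sgd_opt.step()
print(torch.allclose(sgd_layer.weight, expected_w),
      torch.allclose(sgd_layer.bias, expected_b))  # True True
```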

In [None]:
#Training Loop
train_loss = []
val_loss = []
for epoch in range(n_epochs):  # loop over the dataset multiple times
    epoch_loss = 0.0
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs = inputs.to(device)
        labels = F.one_hot(labels,10).type_as(inputs)
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        # print statistics
        running_loss += loss.item()

        if i % 50 == 49:    # print every 50 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 50))
            running_loss = 0.0
    train_loss.append(epoch_loss/(i+1))
    #Evaluation of the trained model
    correct = 0
    total = 0
    epoch_loss = 0.0
    print("validating...")
    with torch.no_grad():
        for i, data in enumerate(test_loader, 0):
            inputs, labels = data
            inputs = inputs.to(device)
            outputs = model(inputs)
            predicted = torch.argmax(outputs,dim=1)
            loss = criterion(outputs, F.one_hot(labels,10).type_as(inputs))
            labels = labels.to(device)
            total += labels.shape[0]
            correct += (predicted == labels).sum().item()
            epoch_loss += loss.item()
    print('Accuracy of the network on the test images: %d %%' % (
        100 * correct / total))
    val_loss.append(epoch_loss/(i+1))

print('Finished Training')

In [None]:
plt.plot(train_loss, label='training')
plt.plot(val_loss, label='validation')
plt.legend()
plt.title('Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')

Now we want to test our trained model, using accuracy as the performance metric. Accuracy is simply:

\begin{equation}
  \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}
\end{equation}

It is worth noting that accuracy is not always a reliable measure of performance in real-world problems, although it is good enough for this case. Try to think of cases in which accuracy fails to correctly measure the performance of a model.
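Here is the formula applied to a made-up mini-batch: `torch.argmax` picks the class with the highest probability for each image, and comparing with the true labels gives the number of correct predictions. The outputs and labels below are invented for illustration.

```python
import torch

# Made-up network outputs for 4 images: 0.02 everywhere, 0.82 at the predicted digit
# (so each row sums to 1)
outputs_example = torch.full((4, 10), 0.02)
outputs_example[0, 1] = 0.82  # predicts 1
outputs_example[1, 2] = 0.82  # predicts 2
outputs_example[2, 9] = 0.82  # predicts 9
outputs_example[3, 7] = 0.82  # predicts 7
labels_true = torch.tensor([1, 2, 9, 4])  # the last image is actually a 4

pred = torch.argmax(outputs_example, dim=1)
n_correct = (pred == labels_true).sum().item()
n_total = labels_true.shape[0]
print(pred)                                # tensor([1, 2, 9, 7])
print(n_correct, n_total, n_correct / n_total)  # 3 4 0.75
```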

In [None]:
#Evaluation of the trained model
correct = 0
total = 0
with torch.no_grad():
    for i, data in enumerate(test_loader, 0):
        inputs, labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)
        outputs = model(inputs)
        predicted = torch.argmax(outputs,dim=1)
        total += labels.shape[0]
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the test images: %d %%' % (
    100 * correct / total))

We obtained a 93% test accuracy with this model, which has just one hidden layer. Do you think you can improve on this result? Below we plot some test samples along with their predicted labels.

In [None]:
from math import ceil
n_samples = 6
idx = np.random.randint(0, len(test_dataset), n_samples)  # random test indices
rows = ceil(n_samples/3)
plt.figure(figsize=(rows*5,10))
with torch.no_grad():
  for i in range(n_samples):
    image = test_dataset[idx[i]][0]
    # add a batch dimension: (1, 28, 28) -> (1, 1, 28, 28)
    pred_label = torch.argmax(model(image.unsqueeze(0).to(device)), dim=1).item()
    plt.subplot(rows,3,i+1)
    plt.imshow(image[0], cmap='gray')
    plt.title('prediction: %d'%(pred_label))