# MNIST Handwritten Digit Recognition in PyTorch

Handwritten digit classification is the classic "Hello World" exercise of machine learning.

Here we'll see that a simple feedforward neural network already handles it rather well.

<center><img src="img/mnist.png" width="400" /></center>


## Import PyTorch

In [None]:
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable  # Deprecated since PyTorch 0.4: plain tensors support autograd

import matplotlib.pyplot as plt

## Initialize Hyper-parameters

In [None]:
input_size = 784       # Image size: 28 x 28 = 784 pixels
hidden_size = 500      # Number of nodes in the hidden layer
num_classes = 10       # Number of output classes: the digits 0 to 9
num_epochs = 5         # Number of passes over the entire training dataset
batch_size = 100       # Number of images processed in one iteration
learning_rate = 0.001  # Step size of the optimizer updates

## Download MNIST Dataset

Set download=True the first time you run the code: the cell below uses download=False, which assumes that the data is already stored in ./data.

The train dataset is composed of 60k images, and the test dataset of 10k images.

In [None]:
train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=False)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

## Load the Dataset

We shuffle train_dataset when loading it so that the learning process does not depend on the order of the data. The test_loader keeps the original order of the test set, so we can check that the model copes with inputs arriving in an arbitrary, unshuffled order.

In [None]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
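
As a quick sanity check, the number of mini-batches each loader yields can be worked out by hand from the dataset sizes quoted earlier (60k and 10k images) and the hyper-parameters. The sketch below only repeats those numbers; it does not touch the actual loaders.

```python
# Values taken from the hyper-parameter cell and the dataset sizes quoted in the text
n_train, n_test = 60_000, 10_000
batch_size = 100
num_epochs = 5

steps_per_epoch = n_train // batch_size      # mini-batches per pass over the training set
total_steps = steps_per_epoch * num_epochs   # optimizer updates over the whole training
test_batches = n_test // batch_size          # mini-batches needed to cover the test set

print(steps_per_epoch, total_steps, test_batches)
```

With the real loaders, len(train_loader) and len(test_loader) should give the same per-epoch counts.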

## Let's take a look at some examples
We'll use the test_loader for this, together with the enumerate function, which lets us loop over its batches.

In [None]:
# Create an iterator over test_loader and fetch the first batch
examples = enumerate(test_loader)
batch_idx, (example_data, example_targets) = next(examples)

# Let's see what one batch of test data consists of
# --> we have [batch_size] examples of 28x28 pixels in grayscale (i.e. a single channel instead of three RGB channels, hence the 1)
print("batch data shape:", example_data.shape)

print("batch target shape:", example_targets.shape)


## Exercise: print pixel content of 1st image and its target value

Not very instructive but at least we see the raw values

## Plot some examples with matplotlib

In [None]:
fig = plt.figure()
for i in range(6):
  plt.subplot(2,3,i+1)
  plt.tight_layout()
  plt.imshow(example_data[i][0], cmap='gray', interpolation='none')
  plt.title("Ground Truth: {}".format(example_targets[i].item()))
  plt.xticks([])
  plt.yticks([])
    
plt.show()

## Feedforward Neural Network (FNN) Model using nn.Sequential

Use the nn package to define our model as a sequence of layers:

* nn.Sequential is a Module which contains other Modules, and applies them in sequence to produce its output. 

* Each Linear Module computes output from input using a linear function, and holds internal Tensors for its weight and bias.

* The NN Architecture below is 784 -> 500 -> 10, with a ReLU activation function on the hidden layer

In [None]:
net = torch.nn.Sequential(
    torch.nn.Linear(input_size, hidden_size),     # 1st fully-connected layer: 784 (input data) -> 500 (hidden nodes)
    torch.nn.ReLU(),                              # Non-linear ReLU layer: max(0,x)
    torch.nn.Linear(hidden_size, num_classes),    # Last fully-connected layer: 500 (hidden nodes) -> 10 (output classes)
)
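
To make the "internal Tensors for its weight and bias" concrete, here is a minimal sketch that rebuilds the same 784 -> 500 -> 10 architecture and lists the parameters it holds. The layer sizes are repeated so that the snippet runs on its own.

```python
import torch.nn as nn

# Same layer sizes as the hyper-parameter cell (repeated to keep the sketch self-contained)
input_size, hidden_size, num_classes = 784, 500, 10

net = nn.Sequential(
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, num_classes),
)

# Each Linear layer holds a weight of shape (out_features, in_features)
# and a bias of shape (out_features,); ReLU has no parameters
for name, param in net.named_parameters():
    print(name, tuple(param.shape))

n_params = sum(p.numel() for p in net.parameters())
print("trainable parameters:", n_params)  # 784*500 + 500 + 500*10 + 10
```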

## Optional: enable GPU 

In [None]:
# net.cuda()    # Uncomment this line to run the model on the GPU; the input batches must then be moved to the GPU as well
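
A common alternative to calling .cuda() directly is to select a device once and move both the model and the data to it. The sketch below assumes a stand-in model with the same layer sizes as net and a random batch; the names model and batch are illustrative and not part of this notebook.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is visible to PyTorch, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model with the same layer sizes as `net`
model = nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10)).to(device)

# Inputs must live on the same device as the model's parameters
batch = torch.rand(100, 784, device=device)
scores = model(batch)
print(scores.shape, scores.device)
```

Written this way, the same code runs unchanged on machines with and without a GPU.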

## Loss Function and Optimizer

The function CrossEntropyLoss is the cross-entropy loss for multi-class classification. It combines a log-softmax with the negative log-likelihood loss, see [pytorch doc.](https://pytorch.org/docs/master/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)
For C classes, a NN output $\vec{y}$ of dimension C and an integer target value $t \in [0,C-1]$, this loss function calculates:
$$\ell(\vec{y},t) = - y_t + \log \left( \sum_{j=0}^{C-1} e^{y_j} \right)$$
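
The formula can be checked numerically against nn.CrossEntropyLoss. This is a minimal sketch on a single made-up output vector and target; the values of y and t are arbitrary and only serve the comparison.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C = 10
y = torch.randn(C)          # made-up network output for one sample
t = 3                       # made-up target class

# Formula from the text: -y_t + log(sum_j exp(y_j))
manual = -y[t] + torch.logsumexp(y, dim=0)

# CrossEntropyLoss expects a batch: scores of shape (N, C) and targets of shape (N,)
builtin = nn.CrossEntropyLoss()(y.unsqueeze(0), torch.tensor([t]))

print(manual.item(), builtin.item())
```

Both numbers should agree up to floating-point rounding. For a batch of several samples, CrossEntropyLoss averages the per-sample losses by default.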

As an alternative to Stochastic Gradient Descent we'll use the Adam method, see [here.](https://pytorch.org/docs/stable/optim.html)

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

## Train the FNN Model

Note: torch.nn expects mini-batches. Its layers take inputs that are a mini-batch of samples, not a single sample. If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.
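
As an illustration of the fake batch dimension, the sketch below passes one flattened image through a single Linear layer. The layer and the random sample are stand-ins, not objects defined in this notebook.

```python
import torch
import torch.nn as nn

layer = nn.Linear(784, 10)          # stand-in for a layer of the network

sample = torch.rand(784)            # one flattened image, no batch dimension
batched = sample.unsqueeze(0)       # shape (1, 784): a mini-batch of one sample

out = layer(batched)
print(sample.shape, batched.shape, out.shape)
```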

In [None]:
#initialize some counters
train_losses = []
train_counter = []
test_losses = []
test_counter = [i*len(train_loader.dataset) for i in range(num_epochs + 1)]

In [None]:
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):   # Load a batch of images with its (index, data, class)
        images = images.view(-1, 28*28)                   # Flatten each 28 x 28 image into a vector of size 784 (view works like numpy's reshape)
                                                          # Wrapping tensors in Variable is no longer needed since PyTorch 0.4
        
        optimizer.zero_grad()                             # Reset the gradients accumulated in the previous iteration
        outputs = net(images)                             # Forward pass: compute the class scores for each image
        loss = criterion(outputs, labels)                 # Compute the loss between the class scores and the true labels
        loss.backward()                                   # Backward pass: compute the gradients of the loss w.r.t. the weights
        optimizer.step()                                  # Optimizer: update all the weights using these gradients
        
        if (i+1) % 100 == 0:                              # Logging
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                 %(epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.item()))
            
        if (i+1) % 10 == 0:                              # keep track of loss value
            train_losses.append(loss.item())
            train_counter.append(((i+1)*batch_size) + ((epoch)*len(train_loader.dataset)))
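
The flattening step used in the loop can be inspected in isolation. The sketch below uses a random tensor with the shape of one MNIST batch as a stand-in for real images.

```python
import torch

batch = torch.rand(100, 1, 28, 28)       # random stand-in for one batch of MNIST images
flat = batch.view(-1, 28 * 28)           # -1 lets PyTorch infer the batch dimension

print(batch.shape, "->", flat.shape)
# view does not copy: both tensors share the same underlying storage
print(flat.data_ptr() == batch.data_ptr())
```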

## Exercise: evaluate the model's training performance

Plot the evolution of the loss as a function of the number of training examples seen.

## Optional: save the model for future use

In [None]:
#torch.save(net.state_dict(), 'fnn_mnist_model.pkl')

## Exercise: test the FNN Model

Now it is time to test the model's performance on the test dataset. Unlike in training, no loss is calculated and the weights are not updated.

Loop over the test_loader data, pass each batch of images through the NN (see how this is done above) and count how many images are correctly classified. To pick the best class from the output (i.e. the class with the highest score), you can use the following function:

```python
_, predicted = torch.max(outputs.data, 1)
```
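
To see what torch.max returns, here is a small sketch on hand-made scores for 3 samples and 4 classes. The values and labels are purely illustrative and unrelated to MNIST.

```python
import torch

# Hand-made scores for 3 samples and 4 classes (illustrative values only)
outputs = torch.tensor([[0.1, 2.0, -1.0, 0.3],
                        [1.5, 0.2,  0.1, 0.0],
                        [0.0, 0.1,  0.2, 3.0]])
labels = torch.tensor([1, 0, 2])

# torch.max along dim 1 returns (max values, indices of the max);
# the indices are the predicted classes
_, predicted = torch.max(outputs, 1)
correct = (predicted == labels).sum().item()

print(predicted.tolist(), correct)
```

Here the first two predictions match their labels and the third does not, so correct is 2 out of 3.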

## Exercise: try another NN structure

Train a new neural network structure:
* Add two layers of 250 and 100 nodes to obtain the NN architecture 784 -> 500 -> 250 -> 100 -> 10
* Use ReLU on all hidden layers
* Add a Sigmoid activation function on the output layer (torch.nn.Sigmoid())
* Run the training again

## Exercise: add dropout

Apply a 20% dropout on the tensor returned by the 1st hidden layer of the previous network, then redo the training.

Dropout: during training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. Each channel will be zeroed out independently on every forward call.
See more [here.](https://pytorch.org/docs/stable/nn.html#torch.nn.Dropout)
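
The behaviour described above can be observed on a toy tensor. This sketch applies nn.Dropout with p=0.2 to a vector of ones, once in training mode and once in evaluation mode. It is only an illustration of the layer, not a solution to the exercise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.2)
x = torch.ones(1000)

drop.train()                 # training mode: elements are zeroed with probability p
y_train = drop(x)            # survivors are scaled by 1 / (1 - p) = 1.25

drop.eval()                  # evaluation mode: dropout is the identity
y_eval = drop(x)

print("fraction zeroed in training:", (y_train == 0).float().mean().item())
print("values seen in training:", y_train.unique().tolist())
print("unchanged in eval:", torch.equal(y_eval, x))
```

Because dropout behaves differently in the two modes, a network containing it is usually switched with net.train() before training and net.eval() before testing.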


## Exercise (optional): try to improve the accuracy of the network
Add layers, play with activation functions, etc.!