## MNIST Digit Recognizer

This notebook describes my solution for recognizing handwritten digits from the [MNIST](http://yann.lecun.com/exdb/mnist/) database.

The MNIST database consists of 60k training images and 10k test images, all with a fixed size of 28x28 pixels.
In this exercise, the training data were randomly split into training and validation subsets (80% and 20%), which makes it possible to monitor the model's performance on held-out data after each epoch.
The model with the lowest validation loss is saved and then used for predicting the results on the test set.


In [None]:
from torch import nn,optim, utils
import torch
import torch.nn.functional as F
from torch.utils.data.sampler import SubsetRandomSampler
from torchvision import datasets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

- Download the training and test data and save them locally
- Transform the data into tensors


In [None]:
batch_size = 20
valid_size = 0.2

transform = transforms.ToTensor()
# download=True fetches the files only if they are not already present under root
train_data = datasets.MNIST(root='~/pytorch_nanodegree/.pytorch/MNIST_data/', train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root='~/pytorch_nanodegree/.pytorch/MNIST_data/', train=False,
                           download=True, transform=transform)
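
As a rough illustration of what the tensor conversion does to pixel values: `transforms.ToTensor()` maps 8-bit intensities in [0, 255] to floats in [0.0, 1.0]. The pure-Python sketch below mimics only that scaling on a tiny made-up patch; `to_unit_range` is a hypothetical helper, not part of torchvision.

```python
def to_unit_range(image_u8):
    """Scale a nested list of 8-bit pixel values (0..255) to floats in [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image_u8]

# A made-up 2x3 grayscale patch (real MNIST images are 28x28).
patch = [[0, 128, 255],
         [64, 32, 16]]
scaled = to_unit_range(patch)
print(scaled[0])
```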

- Shuffle the training data and split it into training (80%) and validation (20%) sets

In [None]:
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]
print("Number of samples: \nTrain subset: %d \nValidation subset: %d\nTest subset: %d"
          %(len(train_idx),len(valid_idx),len(test_data)))

train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)


train_loader = utils.data.DataLoader(train_data, batch_size=batch_size,
            sampler=train_sampler)
valid_loader = utils.data.DataLoader(train_data, batch_size=batch_size, 
            sampler=valid_sampler)
test_loader = utils.data.DataLoader(test_data, batch_size=batch_size)
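
The index arithmetic behind the split can be checked in isolation. The sketch below uses only the standard library; `split_indices` and its `seed` argument are hypothetical names introduced for illustration (the notebook itself shuffles with `np.random.shuffle` and no fixed seed).

```python
import math
import random

def split_indices(num_samples, valid_fraction, seed=0):
    """Shuffle 0..num_samples-1 and cut off the first valid_fraction as validation."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seed is an assumption, for repeatability
    split = int(math.floor(valid_fraction * num_samples))
    return indices[split:], indices[:split]

train_idx, valid_idx = split_indices(60000, 0.2)
print(len(train_idx), len(valid_idx))
# The two subsets should share no index.
print(set(train_idx).isdisjoint(valid_idx))
```

Because both loaders draw from the same `train_data` object, the disjoint index lists are what keep validation samples out of training.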


- We can also visualize some samples from the training data

In [None]:
# Get one training batch
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()

fig = plt.figure(figsize=(16, 5))
# Plot the images in this batch with their labels, in 2 rows
for idx in np.arange(batch_size):
    # add_subplot needs integer grid sizes, hence the integer division
    ax = fig.add_subplot(2, batch_size // 2, idx + 1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(str(labels[idx].item()))

## Defining the network architecture
A multilayer perceptron with two hidden layers was used in this model.
The input is a 784-dim Tensor (the flattened 28x28-pixel input image), each hidden layer has 1024 nodes,
and the output is a 10-dim Tensor with one score per class (digits 0-9); the cross-entropy loss converts these scores into class probabilities.
The hyperparameters are:
- ReLU activation
- dropout: 0.2
- loss function: [cross entropy](https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html#cross-entropy)
- optimization: [SGD (stochastic gradient descent)](http://ruder.io/optimizing-gradient-descent/index.html#stochasticgradientdescent)
- learning rate: 0.01
- maximum of 40 epochs at training stage
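
To make the loss function concrete, here is a minimal pure-Python sketch of how raw output scores turn into probabilities and a cross-entropy value. `nn.CrossEntropyLoss` combines a log-softmax with a negative log-likelihood; the helpers `softmax` and `cross_entropy` below and the score values are made up for illustration and handle a single sample only.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target])

# Made-up scores for the 10 digit classes; class 3 has the largest score.
logits = [0.1, -1.2, 0.3, 4.0, 0.0, -0.5, 1.1, 0.2, -2.0, 0.7]
probs = softmax(logits)
predicted = probs.index(max(probs))
print(predicted, round(sum(probs), 6))
# The loss is smaller when the target is the class the scores favor.
print(cross_entropy(logits, 3) < cross_entropy(logits, 8))
```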

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 1024)
        self.fc2 = nn.Linear(1024, 1024)
        self.fc3 = nn.Linear(1024, 10)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        #first it converts the input image into 784*1
        x = x.view(-1, 28 * 28)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.fc3(x)

        return x
    
model = Net()
print(model)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr = 0.01)
epochs = 40  # maximum number of training epochs
valid_loss_min = np.inf  # lowest validation loss seen so far
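
As a quick sanity check on the size of this architecture, the number of trainable parameters can be worked out by hand. The sketch assumes each `nn.Linear` layer keeps its default bias term and that dropout adds no parameters; `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

# (inputs, outputs) for fc1, fc2, fc3 as defined in Net
layer_sizes = [(28 * 28, 1024), (1024, 1024), (1024, 10)]
per_layer = [linear_params(i, o) for i, o in layer_sizes]
print(per_layer, sum(per_layer))
```

The total should agree with `sum(p.numel() for p in model.parameters())` evaluated on the model above.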

## Training stage

In [None]:

for epoch in range(epochs):
    train_loss = 0.0
    valid_loss = 0.0

    model.train()
    
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output,target)
        loss.backward() 
        optimizer.step()
        train_loss += loss.item()*data.size(0)

    
    model.eval()
    # Gradients are not needed for validation
    with torch.no_grad():
        for data, target in valid_loader:
            output = model(data)
            loss = criterion(output, target)
            valid_loss += loss.item() * data.size(0)

    # Average over the samples each loader actually visits (its sampler subset),
    # not over the full 60k-image dataset
    train_loss = train_loss / len(train_loader.sampler)
    valid_loss = valid_loss / len(valid_loader.sampler)
    print('Epoch {} \tTraining Loss: {:.6f} \tValid Loss: {:.6f}'.format(epoch+1, train_loss, valid_loss))

    # If the validation loss decreased, save the current model to a file
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min, valid_loss))
        torch.save(model.state_dict(), 'model.pth')
        valid_loss_min = valid_loss
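
The loop multiplies each batch loss by `data.size(0)` because `nn.CrossEntropyLoss` returns the mean over the batch by default; multiplying recovers the batch sum, so the final division gives a per-sample mean when the divisor is the number of samples the loader actually visited. The sketch below shows the difference from naively averaging batch means; the loss values and the helper `mean_loss` are made up for illustration.

```python
def mean_loss(batch_losses, batch_sizes):
    """Per-sample mean loss from per-batch mean losses and batch sizes."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)

# Made-up values: two full batches of 20 and a final partial batch of 5.
batch_losses = [0.50, 0.30, 0.90]
batch_sizes = [20, 20, 5]
weighted = mean_loss(batch_losses, batch_sizes)
naive = sum(batch_losses) / len(batch_losses)
print(round(weighted, 4), round(naive, 4))
```

With equal batch sizes the two averages coincide; they diverge only when the last batch is smaller.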
        
        

## Testing the model
The final model was evaluated on the MNIST test set.
The accuracy of each class is also shown, which makes it possible to identify the best and worst predicted classes individually.

In [None]:
test_loss = 0.0
class_correct = [0.0] * 10
class_total = [0.0] * 10

# Restore the weights with the lowest validation loss saved during training
model.load_state_dict(torch.load('model.pth'))
model.eval()

# Gradients are not needed for evaluation
with torch.no_grad():
    for data, target in test_loader:
        output = model(data)
        loss = criterion(output, target)
        test_loss += loss.item() * data.size(0)
        # The predicted class is the index of the largest output score
        _, pred = torch.max(output, 1)
        correct = pred.eq(target)

        # Use the actual batch length so a smaller final batch is handled
        for i in range(len(target)):
            label = target[i].item()
            class_correct[label] += correct[i].item()
            class_total[label] += 1

test_loss = test_loss / len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))


for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            str(i), 100 * class_correct[i] / class_total[i],
            class_correct[i], class_total[i]))
    else:
        print('Test Accuracy of %5s: N/A (no test examples)' % str(i))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
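
The per-class bookkeeping can be tried on a few hand-written values without a trained model. This is a minimal sketch using only the standard library; `per_class_accuracy` is a hypothetical helper and the predictions and labels are made up.

```python
from collections import Counter

def per_class_accuracy(predictions, labels):
    """Return {class: (correct, total)} for every class that appears in labels."""
    correct, total = Counter(), Counter()
    for pred, label in zip(predictions, labels):
        total[label] += 1
        correct[label] += int(pred == label)
    return {c: (correct[c], total[c]) for c in sorted(total)}

# Made-up predictions and labels for a handful of test samples.
labels      = [7, 2, 1, 0, 4, 1, 4, 9]
predictions = [7, 2, 1, 0, 9, 1, 4, 4]
stats = per_class_accuracy(predictions, labels)
for digit, (hit, count) in stats.items():
    print(digit, f"{100 * hit / count:.0f}% ({hit}/{count})")
```

Counting per true label, as here and in the cell above, yields the recall of each digit; classes that never occur in the labels are simply absent from the result.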


- A test batch can also be shown with the final predictions, followed by the true labels in parentheses.

In [None]:
dataiter = iter(test_loader)
images, labels = next(dataiter)

output = model(images)
_, preds = torch.max(output, 1)
images = images.numpy()

fig = plt.figure(figsize=(16, 8))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20 // 2, idx + 1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title("{} ({})".format(str(preds[idx].item()), str(labels[idx].item())),
                 color=("green" if preds[idx]==labels[idx] else "red"))