# Training a Convolutional Neural Network for digit recognition

This notebook was built by following a tutorial in the wonderful book *Machine Learning with PyTorch and Scikit-Learn* by Raschka, Liu and Mirjalili (2022, Packt Publishing).  In this notebook, we do the following:

1. Use the PyTorch library (https://pytorch.org/) to construct and train a convolutional neural network (CNN) on the MNIST handwritten digit database (http://yann.lecun.com/exdb/mnist/).
2. Create a Kaggle submission of our predictions.
3. As a bonus, deploy the trained model as an interactive web app using the Gradio library (https://gradio.app/).

Note that the Gradio app can be found by visiting my Hugging Face page: https://huggingface.co/spaces/etweedy/digits

## Import libraries

In [2]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch import optim
import torchvision
from torchvision import transforms
from torch.utils.data import Subset, DataLoader, Dataset
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Load the MNIST dataset and create PyTorch DataLoaders

In [3]:
df = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')
df_train,df_val = train_test_split(df,test_size=0.1, random_state=42, shuffle=True)

We now create a custom PyTorch Dataset child class called MNISTDataset, which we will use to process our DataFrames into PyTorch tensors.  Before any transform is applied, the __getitem__ method either:
* if is_test=True, returns just the image as a numpy array of shape (28,28) with greyscale pixel values in [0,255], with no ground truth label (the test data has none)
* if is_test=False, returns that image numpy array as well as the ground truth label as an integer in [0,9]
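
To make the row layout concrete, here is a small sketch using a synthetic row rather than real MNIST data. It assumes the layout the Dataset code relies on: the label in the first column, followed by 784 pixel values.

```python
import numpy as np

# Synthetic stand-in for one row of train.csv: label first, then 784 pixel values.
rng = np.random.default_rng(0)
row = np.concatenate([[7], rng.integers(0, 256, size=784)])

label = int(row[0])               # ground truth digit (y)
image = row[1:].reshape(28, 28)   # pixel values (X) as a 28x28 greyscale image

print(label, image.shape)
```

A test row would have no label column, so all 784 values would be reshaped directly.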

In [4]:
class MNISTDataset(Dataset):
    def __init__(self, data, transform=None, is_test=False):
        super().__init__()
        self.dataset = data
        self.transform = transform
        self.is_test = is_test

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        data = self.dataset.iloc[index].to_numpy()
        if self.is_test:
            # test rows have no label column: every value in the row is a pixel
            label = None
            image = data.reshape(28, 28)
        else:
            # for each row, column 0 is the label (y) and the remaining columns are the pixels (X)
            label = data[0]
            image = data[1:].reshape(28, 28)
        if self.transform is not None:
            # cast to uint8 so the transform treats the array as an 8-bit greyscale image
            image = self.transform(image.astype(np.uint8))
        return image, label


Next we will define a transform for the Dataset class to use, which will be a composition of:
1. Conversion from numpy array to PIL image using ToPILImage(), then
2. Conversion to a PyTorch tensor of shape (1,28,28) using ToTensor().  Note that ToTensor() automatically rescales 8-bit pixel values in [0,255] to [0,1] so we needn't do that separately.

In [5]:
transform = transforms.Compose([transforms.ToPILImage(), transforms.ToTensor()])
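
The effect of the transform described above on a single image can be mimicked in plain NumPy. This is only a sketch of the arithmetic (divide by 255, add a channel axis), not torchvision's implementation, and the helper name to_unit_tensor_like is made up for illustration.

```python
import numpy as np

def to_unit_tensor_like(image_uint8):
    """Mimic the transform: (28,28) uint8 in [0,255] -> (1,28,28) float32 in [0,1]."""
    scaled = image_uint8.astype(np.float32) / 255.0
    return scaled[np.newaxis, :, :]  # add the leading channel axis

# Synthetic image that contains every value from 0 to 255.
image = (np.arange(784) % 256).astype(np.uint8).reshape(28, 28)
out = to_unit_tensor_like(image)
print(out.shape, out.dtype, out.min(), out.max())
```

The leading axis of size 1 is the single greyscale channel that the first convolutional layer expects.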

In [6]:

ds_train = MNISTDataset(df_train,transform = transform)
ds_val = MNISTDataset(df_val,transform=transform)
ds_test = MNISTDataset(df_test,transform = transform, is_test=True)

In [7]:
batch_size=64
torch.manual_seed(1)
dl_train = DataLoader(ds_train,batch_size,shuffle=True)
dl_val = DataLoader(ds_val,batch_size,shuffle=True)

## Construct CNN model

We build a CNN with two convolutional layers, each followed by batch normalization, ReLU activation, and 2x2 max-pooling, then a flattening layer and two linear layers with a dropout layer between them.

In [8]:
model = nn.Sequential()
model.add_module(
    'conv1',
    nn.Conv2d(
        in_channels=1,out_channels=32,
        kernel_size=5,padding=2
    ),
)
model.add_module('bn1',nn.BatchNorm2d(32))
model.add_module('relu1',nn.ReLU())
model.add_module('pool1',nn.MaxPool2d(kernel_size=2))
model.add_module(
    'conv2',
    nn.Conv2d(
        in_channels=32,out_channels=64,
        kernel_size=5,padding=2
    ),
)
model.add_module('bn2',nn.BatchNorm2d(64))
model.add_module('relu2',nn.ReLU())
model.add_module('pool2',nn.MaxPool2d(kernel_size=2))
model.add_module('flatten',nn.Flatten())
model.add_module('fc1',nn.Linear(3136,1024))
model.add_module('dropout',nn.Dropout(p=0.5))
model.add_module('fc2',nn.Linear(1024,10))

model.to(device)

Sequential(
  (conv1): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu2): ReLU()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=3136, out_features=1024, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc2): Linear(in_features=1024, out_features=10, bias=True)
)
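
The 3136 input features of fc1 follow from the layer settings. A short sketch of the arithmetic, using the standard output-size formula for convolution and pooling layers (conv_out is a helper name introduced here for illustration):

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=5, padding=2)   # conv1 keeps 28
size = conv_out(size, kernel=2, stride=2)    # pool1 -> 14
size = conv_out(size, kernel=5, padding=2)   # conv2 keeps 14
size = conv_out(size, kernel=2, stride=2)    # pool2 -> 7

flat_features = 64 * size * size             # 64 output channels from conv2
print(size, flat_features)                   # 7 3136
```

With kernel_size=5 and padding=2 the convolutions preserve the spatial size, so only the two pooling layers shrink the image, from 28 to 14 to 7.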

In [9]:
loss_fn = nn.CrossEntropyLoss()
opt = optim.Adam(model.parameters(),lr=0.0015)

## Training the CNN

We now construct our training function, which keeps track of training loss and accuracy and validation loss and accuracy after each epoch.  Accuracy values are printed as we progress through training.

In [9]:
def train(model,num_epochs,dl_train,dl_val):
    loss_hist_train = [0]*num_epochs
    acc_hist_train = [0]*num_epochs
    loss_hist_val = [0]*num_epochs
    acc_hist_val = [0]*num_epochs
    for epoch in range(num_epochs):
        model.train()
        for x_batch,y_batch in dl_train:
            x_batch=x_batch.to(device)
            y_batch=y_batch.to(device)
            pred = model(x_batch)
            loss = loss_fn(pred,y_batch)
            loss.backward()
            opt.step()
            opt.zero_grad()
            loss_hist_train[epoch] += loss.item()*y_batch.size(0)
            is_correct=(torch.argmax(pred,dim=1) == y_batch).float()
            acc_hist_train[epoch] += is_correct.sum().item()  # store a Python float, not a tensor on the device
            
        loss_hist_train[epoch] /= len(dl_train.dataset)
        acc_hist_train[epoch] /= len(dl_train.dataset)
        
        model.eval()
    
        with torch.no_grad():
            for x_batch,y_batch in dl_val:
                x_batch=x_batch.to(device)
                y_batch=y_batch.to(device)
                pred = model(x_batch)
                loss = loss_fn(pred,y_batch)
                loss_hist_val[epoch] += loss.item()*y_batch.size(0)
                is_correct=(torch.argmax(pred,dim=1) == y_batch).float()
                acc_hist_val[epoch] += is_correct.sum().item()  # store a Python float, not a tensor on the device
            loss_hist_val[epoch] /= len(dl_val.dataset)
            acc_hist_val[epoch] /= len(dl_val.dataset)
        
            print(f' Epoch {epoch+1} ---- train accuracy: {acc_hist_train[epoch]:.4f} ---- val accuracy: {acc_hist_val[epoch]:.4f}')
        
    return loss_hist_train,loss_hist_val,acc_hist_train,acc_hist_val
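
The loop above multiplies each batch loss by the batch size before summing, then divides by the dataset size. The toy numbers below (made up for illustration) show why: when the last batch is smaller, a plain mean of the batch means differs from the per-sample mean.

```python
# Hypothetical per-batch mean losses and batch sizes (the last batch is smaller).
batch_losses = [0.50, 0.40, 0.10]
batch_sizes = [64, 64, 8]

# What train() accumulates: sum of loss * batch_size, divided by the dataset size.
weighted = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

# Naive alternative: mean of the batch means, which over-weights the small batch.
naive = sum(batch_losses) / len(batch_losses)

print(f"per-sample mean: {weighted:.4f}, mean of batch means: {naive:.4f}")
```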

In [10]:
torch.manual_seed(1)
num_epochs=10

In [11]:
hist = train(model,num_epochs,dl_train,dl_val)

 Epoch 1 ---- train accuracy: 0.9224 ---- val accuracy: 0.9552
 Epoch 2 ---- train accuracy: 0.9706 ---- val accuracy: 0.9774
 Epoch 3 ---- train accuracy: 0.9849 ---- val accuracy: 0.9819
 Epoch 4 ---- train accuracy: 0.9877 ---- val accuracy: 0.9857
 Epoch 5 ---- train accuracy: 0.9889 ---- val accuracy: 0.9850
 Epoch 6 ---- train accuracy: 0.9914 ---- val accuracy: 0.9898
 Epoch 7 ---- train accuracy: 0.9921 ---- val accuracy: 0.9833
 Epoch 8 ---- train accuracy: 0.9919 ---- val accuracy: 0.9860
 Epoch 9 ---- train accuracy: 0.9922 ---- val accuracy: 0.9886
 Epoch 10 ---- train accuracy: 0.9934 ---- val accuracy: 0.9845
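
Validation accuracy does not improve monotonically in the log above. A small sketch for picking out the best epoch, with the values copied by hand from that log:

```python
# Validation accuracies copied from the training log above (epochs 1-10).
val_acc = [0.9552, 0.9774, 0.9819, 0.9857, 0.9850,
           0.9898, 0.9833, 0.9860, 0.9886, 0.9845]

best_idx = max(range(len(val_acc)), key=val_acc.__getitem__)
print(f"best epoch: {best_idx + 1}, val accuracy: {val_acc[best_idx]:.4f}")
# -> best epoch: 6, val accuracy: 0.9898
```

With the tuple returned by train(), hist[3] plays the role of val_acc. Differences this small may well be noise, so this is a diagnostic rather than evidence that the epoch 6 weights are better.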


## Prepare the submission file

Load sample_submission.csv as a DataFrame, into which we will write our predicted labels.

In [12]:
sub_df = pd.read_csv('/kaggle/input/digit-recognizer/sample_submission.csv')
sub_df

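|       | ImageId | Label |
|-------|---------|-------|
| 0     | 1       | 0     |
| 1     | 2       | 0     |
| 2     | 3       | 0     |
| 3     | 4       | 0     |
| 4     | 5       | 0     |
| ...   | ...     | ...   |
| 27995 | 27996   | 0     |
| 27996 | 27997   | 0     |
| 27997 | 27998   | 0     |
| 27998 | 27999   | 0     |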


Stack the ds_test images, each a PyTorch tensor of shape (1,28,28), into a single tensor of shape (N,1,28,28), where N is the number of test images, and send it to the device where the model is.

In [13]:
X_test = np.array([ds_test[x][0].numpy() for x in range(len(ds_test))])
X_test = torch.Tensor(X_test).to(device)

Make the predictions and insert the predicted labels into sub_df.  Then export our submission file as a .csv.

In [14]:
with torch.no_grad():
    preds=model(X_test).argmax(-1)
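
Passing all test images through the model in a single call can exhaust memory on smaller GPUs. Below is a sketch of a batched alternative. The helper predict_in_batches is hypothetical, and a tiny stand-in model with random data is used only so the snippet runs on its own.

```python
import torch
import torch.nn as nn

def predict_in_batches(model, inputs, batch_size=256):
    """Return argmax class predictions for `inputs`, computed chunk by chunk."""
    model.eval()  # disable dropout, use running batch-norm statistics
    preds = []
    with torch.no_grad():
        for chunk in torch.split(inputs, batch_size):
            preds.append(model(chunk).argmax(dim=-1))
    return torch.cat(preds)

# Stand-in model and data, only to make the sketch runnable.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_inputs = torch.rand(1000, 1, 28, 28)
toy_preds = predict_in_batches(toy_model, toy_inputs)
print(toy_preds.shape)
```

In the notebook, the same helper would be called with model and X_test in place of the stand-ins.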

In [15]:
# assign all predictions at once (avoids pandas chained assignment through .iloc)
sub_df['Label'] = preds.cpu().numpy()

In [16]:
sub_df

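|       | ImageId | Label |
|-------|---------|-------|
| 0     | 1       | 2     |
| 1     | 2       | 0     |
| 2     | 3       | 9     |
| 3     | 4       | 9     |
| 4     | 5       | 3     |
| ...   | ...     | ...   |
| 27995 | 27996   | 9     |
| 27996 | 27997   | 7     |
| 27997 | 27998   | 3     |
| 27998 | 27999   | 9     |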


In [17]:
sub_df.to_csv('../working/submission.csv', index=False)

## Saving the model

In [18]:
torch.save(model,'mnist_model.pth')

In [19]:
torch.save(model.state_dict(),'mnist_model_weights.pth')
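
Of the two files saved above, the state_dict is generally the more portable, since it stores only the parameter tensors. Reloading it requires rebuilding the same architecture first. The sketch below uses a stand-in model and an in-memory buffer instead of the real CNN and a .pth file, and build_model is a hypothetical helper.

```python
import io
import torch
import torch.nn as nn

def build_model():
    # Stand-in architecture; in the notebook this would be the CNN defined above.
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

trained = build_model()
buffer = io.BytesIO()                       # in-memory file instead of a .pth on disk
torch.save(trained.state_dict(), buffer)

buffer.seek(0)
restored = build_model()                    # same architecture, fresh weights
restored.load_state_dict(torch.load(buffer))
restored.eval()                             # switch to inference mode

x = torch.rand(4, 1, 28, 28)
print(torch.equal(trained(x), restored(x)))
```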

## Gradio app implementation

The following code creates an interactive Gradio app, which asks the user to draw a digit on an in-browser sketchpad and then guesses the digit using the model we've trained.  See this link for an implementation hosted on my Hugging Face account: https://huggingface.co/spaces/etweedy/digits

There are several steps to this implementation:
1. Write a prediction function which will take in an image from the Gradio sketchpad and make a prediction of the digit using our model.
2. Write the code that launches the Gradio interface.

In [20]:
%%capture
! pip install gradio
import gradio as gr

In [21]:
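def predict(img):
    # img is assumed to be a 28x28 greyscale array with values in [0,255]
    x = torch.tensor(img, dtype=torch.float32).unsqueeze(0).unsqueeze(0) / 255.
    x = x.to(device)  # the model lives on `device`
    model.eval()      # dropout off, batch norm uses running statistics
    with torch.no_grad():
        pred = model(x)[0]
    return int(pred.argmax())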

The following cell creates a locally hosted version of the web app, which you can open and play with in your browser if you are running this notebook on your local machine.

It's easy to share your machine learning project as a Gradio Space on Hugging Face! More info: https://huggingface.co/blog/gradio-spaces

In [None]:
title = "Guess that digit"
description = "Draw your favorite base-10 digit (0-9) and click submit - I'll try to guess what you drew! I do a bit better if you're not too messy and your digit is fairly centered."
gr.Interface(
    fn=predict,
    inputs="sketchpad",
    outputs="label",
    title=title,
    description=description,
).launch()
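
The predict function above reports only the most likely digit. If per-class confidences are wanted, a softmax turns the model's raw scores into probabilities. Gradio's "label" output is, to our knowledge, able to display a dictionary mapping class names to confidences, but that should be checked against the documentation for the installed version. The helper name and the logits below are made up for illustration.

```python
import numpy as np

def logits_to_confidences(logits):
    """Turn a length-10 vector of raw scores into {digit: probability}."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()                  # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return {str(d): float(p) for d, p in enumerate(probs)}

# Made-up logits standing in for model(x)[0].
conf = logits_to_confidences([0.1, 0.2, 3.0, 0.0, -1.0, 0.5, 0.3, 0.2, 0.1, 0.0])
print(max(conf, key=conf.get), round(sum(conf.values()), 6))
```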