# Digit Recognition using CNN

This notebook contains a solution for the well-known Digit Recognizer competition hosted on Kaggle.
We will perform the following steps throughout the code pipeline:

1.   Building an ETL (extract, transform, load) pipeline.
2.   Creating a custom dataset class to prepare the data.
3.   Implementing a model in PyTorch from scratch.
4.   Tuning the model's hyperparameters using the RunBuilder and RunManager classes.
5.   Evaluating model performance using TensorBoard and pandas.
6.   Getting predictions for the test dataset.
7.   Exporting the model using ONNX.








# Importing required libraries

In [None]:
import pandas as pd
import numpy as np
import torch
from PIL import Image
import torch.nn as nn
import torchvision.transforms as transform
from torch.utils.data import Dataset, DataLoader,random_split
import matplotlib.pyplot as plt
import torchvision
from collections import OrderedDict, namedtuple
from itertools import product
import torchvision.models as models
import torch.nn.functional as F
import torch.optim as optim
from tqdm.notebook import tqdm_notebook
from torch.utils.tensorboard import SummaryWriter
import time
import json
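# torch.set_deterministic was removed from PyTorch; this is its replacement.
# On CUDA it may also require setting the CUBLAS_WORKSPACE_CONFIG environment variable.
torch.use_deterministic_algorithms(True)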
import torch.onnx
torch.manual_seed(0)
from IPython.display import clear_output
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
device

# Importing dataset

I imported the dataset from Google Drive.
Change the path below to match where your training data is stored.

In [None]:
!unzip /content/drive/MyDrive/digit-recognizer.zip

# Creating a dataset
Now we'll create a custom dataset class.
This class inherits from the *Dataset* class provided by PyTorch.

*   `digitRecognizerDataset` inherits from *Dataset* and acts as the data provider for our DataLoader.

*   We'll implement two methods in our class:

1. `__getitem__(self,idx)`: takes an index as input and returns the data at that index.
2. `__len__(self)`: returns the length of the dataset.

*   The class also applies any transformations we provide to each sample.



In [None]:
class digitRecognizerDataset(Dataset):
  def __init__(self,path,train=True,transform=None):
    self.path = path
    self.train = train            # True: the first column of each row is the label
    self.transform = transform
    self.df = pd.read_csv(self.path)

  def __getitem__(self,idx):
    row = self.df.iloc[idx].values
    if(self.train):
      # First value is the label; the remaining 784 values are the pixels.
      label = row[0]
      image = row[1:].reshape(28,28,1)/255   # scale pixels to [0,1]
      if(self.transform):
        image = self.transform(image)
      return image,label
    else:
      # Test rows contain pixels only, so there is no label to return.
      image = row.reshape(28,28,1)/255
      if(self.transform):
        image = self.transform(image)
      return image

  def __len__(self):
    return len(self.df.index)
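
To make the row layout concrete, here is a minimal sketch in plain Python (no pandas or PyTorch) of what `__getitem__` does with one training row. It assumes, as the class above does, that the first value is the label and the remaining 784 values are pixel intensities between 0 and 255. The helper name `split_row` and the synthetic row are hypothetical and only serve as illustration.

```python
def split_row(row, side=28):
    """Split one CSV row into (image, label); image is side x side, scaled to [0, 1]."""
    label, pixels = row[0], row[1:]
    if len(pixels) != side * side:
        raise ValueError(f"expected {side * side} pixels, got {len(pixels)}")
    # Rebuild the 2-D grid row by row, scaling each pixel from [0, 255] to [0, 1].
    image = [[pixels[r * side + c] / 255 for c in range(side)] for r in range(side)]
    return image, label


# Synthetic row: label 7, then 784 pixels with the first column of the image lit.
row = [7] + [255 if i % 28 == 0 else 0 for i in range(784)]
image, label = split_row(row)
print(label, len(image), len(image[0]), image[0][0], image[0][1])
```

The real class additionally keeps a trailing channel axis (28,28,1) so that the ToTensor transform can move it to the front.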

# Transformations
Now we'll create some transformations for our train and test sets.

* We can use multiple transformations depending on the dataset and the type of input our network expects.
* We can even use different transformations for the train and test datasets depending on the use case. For now we'll use the same transformation for both.


In [None]:
train_transform = transform.Compose([
                                     transform.ToTensor()
])
test_transform = transform.Compose([
                                    transform.ToTensor()
])

# DataLoader
DataLoader uses our dataset class to split the data into batches. This helps during training and testing because we can choose a batch size that fits our system memory and gives good training performance.

In [None]:
train_set = digitRecognizerDataset('train.csv',transform=train_transform)
train_size = int(0.8*len(train_set))
validation_size = int(len(train_set) - train_size)
train_set,validation_set = random_split(train_set,[train_size,validation_size])
train_loader = DataLoader(train_set,shuffle=True,batch_size=8)
validation_loader = DataLoader(validation_set,shuffle=True,batch_size=100)
test_set = digitRecognizerDataset('test.csv',train=False,transform=test_transform)
test_loader = DataLoader(test_set,batch_size=8)
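
The split and batch counts in the cell above follow from simple arithmetic, sketched here in plain Python. The row count of 42,000 for train.csv is an assumption about the Kaggle file, and the helper names are hypothetical. The batch count uses a ceiling because DataLoader keeps the final, smaller batch by default.

```python
import math


def split_sizes(n_samples, train_fraction=0.8):
    """Same arithmetic as the cell above: truncate the train share, give the rest to validation."""
    train_size = int(train_fraction * n_samples)
    return train_size, n_samples - train_size


def num_batches(n_samples, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)


n = 42_000  # assumed number of rows in train.csv
train_size, val_size = split_sizes(n)
print(train_size, val_size)
print(num_batches(train_size, 8), num_batches(val_size, 100))
```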

# Display Image
Before designing the model, we should get familiar with the data. We'll use the "make_grid" method to display images side by side in a grid.
We can also use matplotlib to display the images and their labels.

In [None]:
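def displayImages(images,labels):
  plt.figure(figsize=(15,15))
  print(labels)
  # Pixels are already in [0,1] (no Normalize transform was applied),
  # so no un-normalization is needed before plotting.
  # make_grid returns (C,H,W); imshow expects (H,W,C).
  plt.imshow(images.permute(1,2,0).numpy())
  plt.xticks([])
  plt.yticks([])
  plt.show()
images,labels = next(iter(train_loader))
grid = torchvision.utils.make_grid(images)
displayImages(grid,labels)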

# Designing our model
The first thing to keep in mind when designing any model is what kind of data we are dealing with. Since this notebook deals with image data, we'll use a CNN for this problem.
We'll create our model in 3 steps:

1.   Create a class that inherits from nn.Module (this class keeps track of the layers and their parameters).
2.   In the `__init__` method, create the model architecture using nn.Sequential.
3.   Create a forward method. This method is called automatically when we pass data to an instance of the class; nn.Module handles this call internally.




In [None]:
class Network(nn.Module):
  def __init__(self):
    super(Network,self).__init__()
    self.network = nn.Sequential(
              nn.Conv2d(in_channels=1,out_channels=20,kernel_size=3), # output(26,26)
              nn.ReLU(),
              nn.BatchNorm2d(20),
              nn.MaxPool2d(kernel_size=3,stride=2),                    # output(12,12)

              nn.Conv2d(in_channels=20,out_channels=32,kernel_size=3),  #output (10,10)
              nn.ReLU(),
              nn.BatchNorm2d(32),
              nn.Dropout(0.25),

              nn.Conv2d(in_channels=32,out_channels=48,kernel_size=3), #output (8,8)
              nn.ReLU(),
              nn.Dropout(0.25),

              nn.Flatten(),
              nn.Linear(in_features=8*8*48,out_features=500),
              nn.ReLU(),
              nn.BatchNorm1d(num_features=500),
              nn.Linear(in_features=500,out_features=10)
    )

  def forward(self,tensor):
    return self.network(tensor)
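
The output sizes noted in the comments above can be checked with the standard formula for convolution and pooling layers, floor((size + 2*padding - kernel) / stride) + 1. The sketch below assumes dilation 1 and floor rounding (the PyTorch defaults); `conv_out` is a hypothetical helper, not part of the notebook.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for Conv2d/MaxPool2d (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28
size = conv_out(size, kernel=3)             # first Conv2d
after_conv1 = size
size = conv_out(size, kernel=3, stride=2)   # MaxPool2d
after_pool = size
size = conv_out(size, kernel=3)             # second Conv2d
size = conv_out(size, kernel=3)             # third Conv2d
flat_features = size * size * 48            # 48 channels enter nn.Flatten
print(after_conv1, after_pool, size, flat_features)
```

The final product is the value that `in_features` of the first Linear layer has to match.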

# Creating a run Builder class
The RunBuilder class takes all the hyperparameters we want to tune and returns every possible combination of them.

In [None]:
params = OrderedDict(
    batch_size = [16,32,64],
    lr=[0.1,0.01,0.05],
    epochs=[10,20,50]
)
class RunBuilder():
  @staticmethod
  def get_runs(params):
    Run = namedtuple('Run',params.keys())
    runs = []
    for run in product(*params.values()):
      runs.append(Run(*run))
    return runs
for param in RunBuilder.get_runs(params):
  print(param._asdict())
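
Because the combinations are a Cartesian product, the number of runs grows multiplicatively with each hyperparameter list. The self-contained sketch below repeats the same idea with the grid from the cell above and counts the runs and the total number of epochs they would train for; the variable names are illustrative.

```python
from collections import OrderedDict, namedtuple
from itertools import product

grid = OrderedDict(
    batch_size=[16, 32, 64],
    lr=[0.1, 0.01, 0.05],
    epochs=[10, 20, 50],
)
Run = namedtuple('Run', grid.keys())
# One Run per combination; fields are accessible by name (run.lr, run.epochs, ...).
runs = [Run(*values) for values in product(*grid.values())]

total_epochs = sum(run.epochs for run in runs)
print(len(runs), total_epochs)
print(runs[0], runs[-1])
```

Counting total epochs before starting is a cheap way to estimate how long a full sweep will take.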

# Run Manager
The run manager is the most complex yet elegant part of this notebook. It acts as an abstraction layer over our training loop to track the model's training cycle.
It performs the following tasks:

*   Keeps track of the number of epochs and runs executed
*   Records the model's performance on the training and validation sets for each hyperparameter combination
*   Saves the data for each run to a TensorBoard SummaryWriter for later evaluation
*   Saves the results of all runs to CSV and JSON files





In [None]:
class RunManager():
  def __init__(self):
    self.run_start_time = None
    self.run_count = 0
    self.run_params = None
    self.run_data = []

    self.epoch_start_time = None
    self.epoch_loss = 0
    self.epoch_correct_preds = 0
    self.epoch_count=0
    self.correct_val_preds = 0
    self.loader = None
    self.network = None
    self.params = None

  @torch.no_grad()
  def eval(self):
    self.network.eval()
    for images,labels in validation_loader:
      images,labels = images.to(device),labels.to(device)
      preds = self.network(images.float())
      self.correct_val_preds +=self.correct_preds(preds,labels)
    self.network.train()
    return self.correct_val_preds/len(validation_loader.dataset)

 
  def start_run(self,network,loader,params):
    self.network = network
    self.loader = loader
    self.run_params = params

    self.run_count+=1
    self.run_start_time = time.time()
    self.tb = SummaryWriter(comment=f'-{self.run_params}')

    images,labels = next(iter(loader))
    images,labels = images.to(device).float(),labels.to(device)
    grid = torchvision.utils.make_grid(images)
    self.tb.add_image("images",grid)
    self.tb.add_graph(self.network,images)
  
  def end_run(self):
    self.tb.close()
    self.epoch_count = 0


  def start_epoch(self):
    self.epoch_count+=1
    self.epoch_start_time = time.time()
    self.epoch_loss = 0
    self.epoch_correct_preds = 0
    self.correct_val_preds = 0

  def end_epoch(self):
    epoch_duration = time.time() - self.epoch_start_time
    run_duration = time.time() - self.run_start_time
    loss = self.epoch_loss/len(self.loader.dataset)
    accuracy = self.epoch_correct_preds/len(self.loader.dataset)
    
    self.tb.add_scalar('Accuracy',accuracy,self.epoch_count)
    self.tb.add_scalar('Loss',loss,self.epoch_count)
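    for name,param in self.network.named_parameters():
      self.tb.add_histogram(name,param,self.epoch_count)
      # Gradients may be None if they were cleared after the last optimizer step.
      if param.grad is not None:
        self.tb.add_histogram(f'{name}.grad',param.grad,self.epoch_count)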
    results = OrderedDict()
    results['loss'] = loss
    results['Train Accuracy'] = accuracy
    results['Val Accuracy'] = self.eval()
    results['epoch_duration'] = epoch_duration
    results['run_duration'] = run_duration
    results['run'] = self.run_count
    results['epoch'] = self.epoch_count
    for k,v in self.run_params._asdict().items(): results[k]=v
    self.run_data.append(results)
    df = pd.DataFrame.from_dict(self.run_data,orient='columns')

    clear_output(wait=True)
    display(df)

  def track_loss(self,loss):
    self.epoch_loss+=loss.item()*self.loader.batch_size

  def track_correct_preds(self,preds,labels):
    self.epoch_correct_preds+=self.correct_preds(preds,labels)
  
  @torch.no_grad()
  def correct_preds(self,preds,labels):
    return preds.argmax(dim=1).eq(labels).sum().item()

  def save(self,filename):
    pd.DataFrame.from_dict(
        self.run_data,
        orient='columns'
    ).to_csv(f'{filename}.csv')

    with open(f'{filename}.json','w',encoding='utf-8') as f:
      json.dump(self.run_data,f,ensure_ascii=False,indent=4)
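
One detail of `track_loss` is worth noting: it weights every batch by `loader.batch_size`, so when the dataset size is not a multiple of the batch size, the smaller final batch is overweighted. The plain-Python sketch below compares that approach with weighting by the actual batch sizes. The function names and numbers are hypothetical and chosen to exaggerate the effect.

```python
def epoch_loss_fixed(batch_losses, batch_size, n_samples):
    """Mirrors track_loss: every batch is weighted by the nominal batch size."""
    return sum(loss * batch_size for loss in batch_losses) / n_samples


def epoch_loss_exact(batch_losses, batch_sizes):
    """Weights each batch's mean loss by the number of samples it actually held."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


batch_losses = [0.5, 0.5, 0.5]   # mean loss reported for each batch
batch_sizes = [64, 64, 22]       # 150 samples, so the last batch is partial
fixed = epoch_loss_fixed(batch_losses, 64, sum(batch_sizes))
exact = epoch_loss_exact(batch_losses, batch_sizes)
print(fixed, exact)
```

With thousands of batches per epoch the difference is small, but passing the real batch size (for example `len(labels)`) would remove it entirely.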
  

# Training loop
Finally we'll train our model with different hyperparameters. You can pass in as many values as you want. Try different combinations and tweak your model accordingly.

In [None]:
m = RunManager()
params = OrderedDict(
    batch_size = [64,128],
    lr=[0.001],
    epochs=[100]
)
for run in RunBuilder.get_runs(params):
  network = Network()
  network = network.to(device)
  optimizer = optim.Adam(network.parameters(),lr=run.lr)
  loader = DataLoader(train_set,shuffle=True,batch_size=run.batch_size)
  m.start_run(network,loader,run)
  for epoch in range(run.epochs):
    m.start_epoch()
    for batch in loader:
      images,labels = batch
      images,labels = images.to(device).float(),labels.to(device)
      optimizer.zero_grad()   # clear stale gradients before the backward pass
      preds = network(images)
      loss = F.cross_entropy(preds,labels)
      loss.backward()
      optimizer.step()
      m.track_loss(loss)
      m.track_correct_preds(preds,labels)
    m.end_epoch()
  m.end_run()
m.save("results")

In [None]:
df = pd.read_csv('results.csv')

In [None]:
df.sort_values(by=["Train Accuracy","Val Accuracy"],ascending=False).head()

In [None]:
df.sort_values(by=["Val Accuracy","Train Accuracy"],ascending=False).head()

# Final Train
Finally we'll train the model using the hyperparameters chosen from the validation and training accuracy results above.

In [None]:
import copy
epochs=50
batch_size=128
lr=0.001
network = Network()
network = network.to(device)
train_loader = DataLoader(train_set,shuffle=True,batch_size=batch_size)
optimizer = optim.Adam(network.parameters(),lr=lr)
min_loss=float('inf')
best_model = None
for _ in tqdm_notebook(range(epochs)):
  total_loss=0
  for batch in train_loader:
    images,labels = batch
    images,labels = images.to(device),labels.to(device)
    optimizer.zero_grad()   # clear stale gradients before the backward pass
    preds = network(images.float())
    loss = F.cross_entropy(preds,labels)
    loss.backward()
    optimizer.step()
    total_loss+=loss.item()
  # Keep a snapshot of the weights from the epoch with the lowest training loss.
  if(total_loss<min_loss):
    min_loss = total_loss
    best_model = copy.deepcopy(network.state_dict())
  print("loss",total_loss/len(train_loader))
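
The `copy.deepcopy` call matters because `state_dict()` hands back references to the live weights, which keep changing as training continues. The sketch below isolates that pattern in plain Python, with a dictionary of lists standing in for the state dict and made-up per-epoch losses; none of these names come from the notebook.

```python
import copy

state = {"weight": [0.0, 0.0]}    # stand-in for network.state_dict()
epoch_losses = [0.9, 0.4, 0.6]    # illustrative losses; epoch 2 is the best

best_loss = float("inf")
best_by_reference = None
best_by_copy = None
for epoch, loss in enumerate(epoch_losses, start=1):
    state["weight"][0] = float(epoch)   # "training" mutates the weights in place
    if loss < best_loss:
        best_loss = loss
        best_by_reference = state                # alias: follows later updates
        best_by_copy = copy.deepcopy(state)      # frozen snapshot of this epoch

print(best_loss, best_by_reference["weight"], best_by_copy["weight"])
```

Only the deep copy still holds the epoch-2 values at the end; the plain reference has drifted to the last epoch.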

# Loading our best model
Now we'll load the best weights and set the model to eval mode. Eval mode makes the batchnorm layers use their running statistics and disables the dropout layers. Note that eval mode does not turn off gradient tracking; to avoid building a computational graph during inference, run the forward pass inside torch.no_grad().

In [None]:
network.load_state_dict(best_model)
network.eval()

# Using Onnx
ONNX is a platform-independent file format for saving DL and ML models so that they can be used across different frameworks and hardware without framework-specific conversion work by the developer.

We'll use this ONNX file to create our application.

In [None]:
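onnx_model_path = "model.onnx"
# Trace with one dummy image; mark the batch axis dynamic so any batch size works.
dummy_input = torch.randn(1,1,28,28,device=device)
torch.onnx.export(network.to(device),dummy_input,onnx_model_path,verbose=True,
                  input_names=["input"],output_names=["output"],
                  dynamic_axes={"input":{0:"batch"},"output":{0:"batch"}})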

# Kaggle Time
Now you can run the model and get predictions for the test set. You can then submit the resulting CSV to the Kaggle competition: [Digit Recognizer on Kaggle](https://www.kaggle.com/c/digit-recognizer/data)

In [None]:
predictions = []
with torch.no_grad():   # inference only: skip building the autograd graph
  for image in test_loader:
    preds = network(image.to(device).float())
    # softmax is monotonic, so argmax over the raw logits gives the same labels
    predictions.extend(preds.argmax(dim=1).cpu().tolist())
submission = {
    "ImageId":list(range(1,len(predictions)+1)),
    "Label":predictions
}
result = pd.DataFrame(submission)
result.to_csv("kaggle_submission.csv",index=False)
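
The submission file built above has two columns, ImageId (starting at 1) and Label. As a quick way to see the expected layout without pandas, the sketch below writes the same format with the standard csv module into an in-memory buffer. `write_submission` and the three sample labels are hypothetical.

```python
import csv
import io


def write_submission(labels, handle):
    """Write predicted labels as ImageId,Label rows, with ImageId counting from 1."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["ImageId", "Label"])
    for image_id, label in enumerate(labels, start=1):
        writer.writerow([image_id, int(label)])


buffer = io.StringIO()
write_submission([2, 0, 9], buffer)
print(buffer.getvalue())
```

Comparing the first few lines of kaggle_submission.csv against this layout is a simple sanity check before uploading.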