Digit recognition is not particularly difficult or advanced. It is a kind of “Hello World!” program: not that cool, but this is exactly where you start. So I decided to share my work and at the same time refresh my knowledge, as it has been a long time since I last played with images.

## Data Import and Exploration

We start by importing all the necessary packages.

In [None]:
import pandas as pd
import random
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
%matplotlib inline

from sklearn.model_selection import train_test_split

import torch
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

import os
print(os.listdir("../input"))

The MNIST dataset of hand-written digits is the “Hello World” dataset for this task. We will use the version provided by the competition here at Kaggle, whose training file contains about 42,000 labelled digits. No need to reinvent the wheel.

In [None]:
# load data
warnings.filterwarnings("ignore")
df = pd.read_csv('../input/train.csv')
df.head()

In [None]:
len(df)

As the output of head() shows, the first column of the dataset contains the labels and the remaining columns contain the pixels of each 28×28 image, which is why there are 784 (28 × 28) more columns. It is also useful to check the length of the dataset after every modification to make sure everything went as expected.
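To make the layout concrete, here is a small numpy sketch on synthetic data rather than the Kaggle file. It shows that a flat row of 784 values becomes a 28×28 image in row-major order. Flat pixel index k therefore lands at row k // 28 and column k % 28. The helper name pixel_position is hypothetical.

```python
import numpy as np

# Synthetic "row": values 0..783 standing in for the 784 pixel columns
flat = np.arange(784)

# reshape uses row-major (C) order by default: pixels fill the image row by row
img = flat.reshape((28, 28))

def pixel_position(k, width=28):
    """Hypothetical helper: (row, column) of flat pixel index k in a width x width image."""
    return divmod(k, width)

r, c = pixel_position(100)
print(img.shape)        # (28, 28)
print(r, c, img[r, c])  # 3 16 100
```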

Next, let’s visualize the pixels and look at the images we have. We use randint() to select a random image every time we run the code below. We also have to convert the pixels to a numpy array (right now they are a pandas Series) and reshape it to 28×28 to be able to plot it.

In [None]:
ix = random.randint(0, len(df)-1)
# positional access: column 0 is the label, the remaining 784 columns are pixels
label, pixels = df.iloc[ix, 0], df.iloc[ix, 1:]
img = pixels.to_numpy().reshape((28,28))
print('label: ' + str(label))
plt.imshow(img)

## Data Preprocessing

Now, to make our life a little easier, we will transform our dataframe so that it has only two columns: label and img, where img is a numpy array of pixels. Optionally, we can also reduce the size of the dataframe for faster computation (first we want to make sure everything works, and only then start playing with the model). The commented-out line in the code below does exactly that.

In [None]:
# transforming df for easier manipulation
def transform_df(df):
    labels, imgs = [], []
    for index, row in df.iterrows():
        # positional access: first value is the label, the rest are pixels
        label, pixels = row.iloc[0], row.iloc[1:]
        img = pixels.to_numpy()
        labels.append(label)
        imgs.append(img)

    df_img = pd.DataFrame({'label': labels, 'img': imgs})
    # to speed up the process we can use for example only 1000 samples
    # df_img = df_img[:1000]
    return df_img

df_img = transform_df(df)
df_img.head()
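iterrows() builds a new Series for every row, which is slow on tens of thousands of rows. A vectorized alternative is sketched below. The function name transform_df_fast is hypothetical. The sketch assumes, as in the Kaggle file, that the first column is the label and the rest are pixels, and it should produce the same two-column layout as transform_df.

```python
import numpy as np
import pandas as pd

def transform_df_fast(df):
    """Hypothetical vectorized version of transform_df: label + one pixel array per row."""
    labels = df.iloc[:, 0].to_numpy()
    pixels = df.iloc[:, 1:].to_numpy()   # shape (n_rows, 784)
    # list(...) splits the 2-D array into one 1-D array per row
    return pd.DataFrame({'label': labels, 'img': list(pixels)})

# Tiny synthetic frame with the same column layout as train.csv
demo = pd.DataFrame(np.arange(2 * 785).reshape(2, 785),
                    columns=['label'] + ['pixel{}'.format(i) for i in range(784)])
out = transform_df_fast(demo)
print(out.shape)              # (2, 2)
print(out.loc[1, 'img'][:3])  # [786 787 788]
```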

In [None]:
# checking images using new df structure
ix = random.randint(0, len(df_img)-1)
img = df_img.loc[ix].img.reshape((28,28))
label = df_img.loc[ix].label
print('label: ' + str(label))
plt.imshow(img)

Once the data is prepared, we split it into two datasets: one to train our model and another to test its performance. A convenient way to do that is sklearn’s train_test_split(). We set test_size=0.2, a common choice (usually 20-30% of the data is held out for testing), which leaves 80% for training. It is also good practice to shuffle the data (shuffle=True is the default, but here it is set explicitly), because some datasets are ordered. Without shuffling, the model might learn to recognize 0s and 1s but have no idea that an 8 exists, for example.

In [None]:
train_df, test_df = train_test_split(df_img, test_size=0.2, shuffle=True)
print(len(train_df), len(test_df))
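A toy example with synthetic labels (not the real data) shows why shuffling matters. If the labels were sorted, an unshuffled 80/20 split would leave whole classes out of the training set. The sketch uses plain numpy rather than train_test_split. Passing stratify=df_img.label to train_test_split would additionally keep the class proportions equal in both parts.

```python
import numpy as np

# Toy labels sorted by class: 100 zeros, 100 ones, ..., 100 nines
labels = np.repeat(np.arange(10), 100)
split = int(0.8 * len(labels))   # 80% for training

# Without shuffling, the last 20% of rows (all 8s and 9s) never reach training
train_plain = labels[:split]
print(np.unique(train_plain))     # [0 1 2 3 4 5 6 7]

# After a random permutation, every class ends up in the training part
rng = np.random.default_rng(0)
shuffled = labels[rng.permutation(len(labels))]
train_shuffled = shuffled[:split]
print(np.unique(train_shuffled))  # [0 1 2 3 4 5 6 7 8 9]
```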

In [None]:
train_df.head()

## Building a Model

We checked the length and the head of the datasets and everything looks good, so we can start building our model. For this we will use PyTorch.

Next, we have to wrap our data in a PyTorch Dataset. torch.utils.data.Dataset is an abstract class representing a dataset. A custom dataset should inherit from Dataset and override the following methods:

* `__len__`, so that len(dataset) returns the size of the dataset
* `__getitem__`, to support indexing such that dataset[i] returns the i-th sample
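Those two methods are all Python needs for len() and indexing with [] to work. DataLoader relies on the same pair for map-style datasets. It calls `__len__` to know how many samples exist and `__getitem__` to fetch each one. The torch-free sketch below mirrors the structure of the dataset class that follows. The class name ToyDataset is hypothetical.

```python
import numpy as np

class ToyDataset:
    """Hypothetical stand-in showing the __len__ / __getitem__ protocol."""
    def __init__(self, imgs, labels):
        self.imgs = imgs
        self.labels = labels

    def __len__(self):
        # called by len(dataset)
        return len(self.imgs)

    def __getitem__(self, ix):
        # called by dataset[ix]; returns one (image, label) sample
        return self.imgs[ix].astype(np.float32), self.labels[ix]

imgs = np.zeros((5, 784), dtype=np.int64)
labels = np.array([3, 1, 4, 1, 5])
ds = ToyDataset(imgs, labels)
img, label = ds[2]
print(len(ds), img.dtype, label)  # 5 float32 4
```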

In [None]:
# create torch dataset
from torch.utils.data import Dataset
class MNISTDataset(Dataset):
  def __init__(self, imgs, labels):    
    super(MNISTDataset, self).__init__()
    self.imgs = imgs
    self.labels = labels
  def __len__(self):
    return len(self.imgs)
  def __getitem__(self, ix):
    img = self.imgs[ix]
    label = self.labels[ix]
    return torch.from_numpy(img).float(), label

dataset = {
    'train': MNISTDataset(train_df.img.values, train_df.label.values),
    'test': MNISTDataset(test_df.img.values, test_df.label.values)
} 

len(dataset['train'])

In [None]:
# again checking image, now based on torch dataset
ix = random.randint(0, len(dataset['train'])-1)
img, label = dataset['train'][ix]
print(img.shape, img.dtype)
print(label)
plt.imshow(img.reshape((28,28)))

The beauty of PyTorch is how simple it is to define a model. We define each layer by its number of inputs and outputs. We then add batch normalization to improve training, which normalizes the inputs of a layer to zero mean and unit variance over each batch and then applies a learnable scale and shift. Finally we add an activation function, in this case ReLU.
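During training, batch normalization standardizes each feature with the mean and (biased) variance of the current batch. It then applies a learnable scale (gamma) and shift (beta). Below is a minimal numpy sketch of that forward pass. It ignores the running statistics that nn.BatchNorm1d keeps for evaluation mode. The value eps=1e-5 is assumed to match PyTorch's default.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm over a (batch, features) array."""
    mean = x.mean(axis=0)                  # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature biased variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta            # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=(32, 4))  # batch of 32, 4 features
out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```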

The first layer takes 784 inputs (one per pixel) and produces 512 outputs. This number is fairly arbitrary: I tried a few different values, this one performed pretty well, so I kept it. The next layer has 512 inputs (the inputs of layer n+1 must match the outputs of layer n) and 256 outputs. The one after that has 256 inputs and 128 outputs, and the last one has 128 inputs and 10 outputs (one neuron per digit). The last layer has no batch normalization or ReLU: it outputs raw scores (logits), which is what the loss function used below expects.
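The layer sizes determine how big the model is. Below is a quick back-of-the-envelope count of trainable parameters in plain Python, following the layer sizes described above. Each Linear layer has in*out weights plus out biases, and each BatchNorm1d has a learnable scale and shift per feature. In PyTorch the same number should come out of sum(p.numel() for p in model.parameters()), since the batch-norm running statistics are buffers, not parameters. The helper names are hypothetical.

```python
def linear_params(in_f, out_f):
    # weight matrix (out_f x in_f) plus one bias per output
    return in_f * out_f + out_f

def batchnorm_params(num_features):
    # learnable gamma and beta per feature
    return 2 * num_features

sizes = [784, 512, 256, 128]
total = 0
for in_f, out_f in zip(sizes, sizes[1:]):
    # each hidden block: Linear + BatchNorm1d (ReLU has no parameters)
    total += linear_params(in_f, out_f) + batchnorm_params(out_f)
total += linear_params(128, 10)  # final layer: no batch norm / ReLU

print(total)  # 569226
```

Most of the parameters sit in the first layer (401,920 of them), which is one reason convolutional networks, which share weights across the image, are the usual choice for images.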

In [None]:
# create model
import torch.nn as nn
def block(in_f, out_f):
  return nn.Sequential(
      nn.Linear(in_f, out_f),
      nn.BatchNorm1d(out_f),
      nn.ReLU(inplace=True),
      #nn.Dropout(),
  )
model = nn.Sequential(
  block(784,512),
  block(512,256),
  block(256,128),
  nn.Linear(128, 10)
)
model.to(device)

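Now we need to create a few additional components for training:

* criterion – the loss function, in our case CrossEntropyLoss, which applies log-softmax to the raw model outputs and computes the negative log-likelihood of the correct class
* optimizer – updates the model weights from the gradients; here Adam with learning rate lr=0.1
* scheduler – lowers the learning rate when the model stops improving (quite a powerful technique that lets us tweak the system on the go); ReduceLROnPlateau in 'max' mode multiplies the learning rate by factor=0.1 when validation accuracy has not improved for more than patience=3 epochs, down to min_lr=0.0001
* dataloader – a PyTorch class that provides single- or multi-process iterators over the dataset, here in batches of 32, with shuffling for the training set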
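CrossEntropyLoss takes raw scores (logits) and integer class labels. It applies log-softmax internally and averages the negative log-probability of the correct class over the batch. That is why the last layer of the model has no activation. Below is a numpy sketch of the same computation, not PyTorch's implementation. It uses the log-sum-exp trick for numerical stability.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch of raw scores and integer labels."""
    # subtract the row max before exponentiating to avoid overflow (log-sum-exp trick)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick the log-probability of the correct class for every sample
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
print(cross_entropy(logits, labels))

# equal scores for all 10 classes give a loss of ln(10), about 2.3026
print(cross_entropy(np.zeros((1, 10)), np.array([7])))
```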

In [None]:
from torch.utils.data import DataLoader
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, 'max', factor=0.1, patience=3, min_lr=0.0001, verbose=True)

dataloader = {
    'train': DataLoader(dataset['train'], batch_size=32, shuffle=True, num_workers=4),
    'test': DataLoader(dataset['test'], batch_size=32, shuffle=False, num_workers=4),
}
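The scheduler settings above can be made concrete with a simplified pure-Python imitation of ReduceLROnPlateau in 'max' mode. It ignores PyTorch's threshold and cooldown options, so it is a sketch of the idea, not the library's exact logic. The learning rate is multiplied by factor once the metric has failed to improve for more than patience consecutive steps, and it never goes below min_lr. The function name plateau_schedule is hypothetical.

```python
def plateau_schedule(metrics, lr=0.1, factor=0.1, patience=3, min_lr=0.0001):
    """Simplified 'max'-mode plateau scheduler; returns the lr after each step."""
    best = float('-inf')
    bad_epochs = 0
    history = []
    for m in metrics:
        if m > best:               # improvement: remember it and reset the counter
            best = m
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:  # more than `patience` steps without improvement
            lr = max(lr * factor, min_lr)
            bad_epochs = 0
        history.append(lr)
    return history

# accuracy stalls after the second epoch
accs = [0.90, 0.91, 0.91, 0.91, 0.91, 0.91]
print(plateau_schedule(accs))  # lr stays at 0.1, then drops to about 0.01 on the 6th step
```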

## Training and Evaluating the Model

With all of the above in place, we can start training and evaluating our model. Although we allow up to 100 epochs, it is useful to stop the loop when the model stops improving. Here we set early_stop = 10: if validation accuracy does not improve for 10 epochs in a row, training stops.

Training process: we iterate through the training data in batches, moving the images and labels to the device defined earlier. The model takes the images and outputs a score for each class (preds). We then clear the old gradients (zero_grad()), compute the loss, backpropagate it to get the gradients (loss.backward()), perform an optimizer step and append the loss value to the total_loss list.

Testing process: we iterate through the test data, make predictions, and calculate the loss and accuracy of the model. torch.max(preds, 1) returns both the maximum score in each row and its index. The index is the predicted class, and because the class indices 0-9 are the digits themselves, it can be compared directly with the labels. Counting the matches over all batches and dividing by the size of the test set gives the accuracy.
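Here is the same prediction-and-accuracy step written with numpy on made-up scores, so it can be checked by hand. argmax over axis 1 picks the highest-scoring class per row, which is what torch.max(preds, 1) returns as its second value. Accuracy is the fraction of rows where that index matches the label.

```python
import numpy as np

# Made-up scores for 4 samples and 3 classes (rows = samples)
scores = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.1, 0.9],
                   [0.3, 0.8, 0.2]])
labels = np.array([1, 0, 2, 0])

preds = scores.argmax(axis=1)      # index of the maximum score in each row
correct = (preds == labels).sum()  # analogous to (preds == labels).sum().item()
acc = correct / len(labels)
print(preds, acc)                  # [1 0 2 1] 0.75
```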

Whenever validation accuracy improves, we save the model; if we hit early_stop, we exit and report the best result. Usually training won’t need all 100 epochs.

In [None]:
# train
best_acc, stop, early_stop = 0, 0, 10
for e in range(100):

    model.train()
    total_loss = []
    for imgs, labels in tqdm(dataloader['train']):
        imgs, labels = imgs.to(device), labels.to(device)
        preds = model(imgs)
        optimizer.zero_grad()
        loss = criterion(preds, labels)
        loss.backward()
        optimizer.step()
        total_loss.append(loss.data)

    model.eval()
    val_loss, acc = [], 0.
    with torch.no_grad():
        for imgs, labels in tqdm(dataloader['test']):
            imgs, labels = imgs.to(device), labels.to(device)
            preds = model(imgs)
            loss = criterion(preds, labels)
            val_loss.append(loss.data)
            _, preds = torch.max(preds, 1)
            acc += (preds == labels).sum().item()

    acc /= len(dataset['test'])
    if acc > best_acc:
        print('\n Best model ! saved.')
        torch.save(model.state_dict(), 'best_model.pt')
        best_acc = acc
        stop = -1

    stop += 1
    if stop >= early_stop:
        break

    scheduler.step(acc)

    print('\n Epoch {}, Training loss: {:.4f}, Val loss: {:.4f}, Val acc: {:.4f}'.format(
        e + 1, torch.mean(torch.stack(total_loss)), torch.mean(torch.stack(val_loss)), acc))

print('\n Best model with acc: {}'.format(best_acc))
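The early-stopping bookkeeping in the loop above (best_acc, stop, early_stop) can be isolated in a few lines of plain Python. The helper epochs_until_stop is hypothetical. It mirrors the loop's counter and reports how many epochs would run for a given sequence of validation accuracies.

```python
def epochs_until_stop(accs, early_stop=10, max_epochs=100):
    """Return (epochs run, best accuracy) using the same counter as the training loop."""
    best_acc, stop = 0.0, 0
    for e, acc in enumerate(accs[:max_epochs]):
        if acc > best_acc:      # new best model: reset the counter
            best_acc = acc
            stop = -1
        stop += 1               # 0 right after an improvement, then 1, 2, ...
        if stop >= early_stop:  # early_stop epochs in a row without improvement
            return e + 1, best_acc
    return min(len(accs), max_epochs), best_acc

# improves for 3 epochs, then plateaus
accs = [0.90, 0.93, 0.95] + [0.94] * 20
print(epochs_until_stop(accs))  # (13, 0.95)
```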

Once the best model is found and saved, we can load it back, feed it individual images and see how it performs.

In [None]:
# test
model.load_state_dict(torch.load('best_model.pt'))
model.to(device)
model.eval()

ix = random.randint(0, len(dataset['test'])-1)
img, label = dataset['test'][ix]
with torch.no_grad():  # no gradients needed for inference
    pred = model(img.unsqueeze(0).to(device)).cpu()
pred_label = torch.argmax(pred).item()
print('Ground Truth: {}, Prediction: {}'.format(label, pred_label))
plt.imshow(img.reshape((28,28)))

## Submission of results

In [None]:
submission = pd.read_csv('../input/test.csv')
submission.head()

In [None]:
imgs = []
for index, row in submission.iterrows():
    # test.csv has no label column: every column is a pixel
    img = row.to_numpy()
    imgs.append(img)

submission_transf = pd.DataFrame({'img': imgs})
submission_transf.head()

In [None]:
# converting into pytorch dataset
# inserting index values as labels
submission_pt = {
    'test': MNISTDataset(submission_transf.img.values, submission_transf.index.values)
} 

In [None]:
# test individual samples from the submission dataset
model.load_state_dict(torch.load('best_model.pt'))
model.to(device)
model.eval()

ix = random.randint(0, len(submission_pt['test'])-1)
img, idx = submission_pt['test'][ix]
with torch.no_grad():
    pred = model(img.unsqueeze(0).to(device)).cpu()
pred_label = torch.argmax(pred).item()
print('Index: {}, Prediction: {}'.format(idx, pred_label))
plt.imshow(img.reshape((28,28)))

In [None]:
# make predictions on every image
subm_dict = dict()

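with torch.no_grad():  # no gradients needed for inference
    for ix in range(len(submission_pt['test'])):
        img, idx = submission_pt['test'][ix]
        pred = model(img.unsqueeze(0).to(device)).cpu()
        pred_label = torch.argmax(pred)
        # ImageId in the submission file is 1-based
        subm_dict[idx+1] = pred_label.item()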

In [None]:
# create submission file
final_df = pd.DataFrame.from_dict(subm_dict, orient='index')
final_df.index.name = 'ImageId'
final_df.columns = ['Label']
final_df.to_csv('submission.csv')
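Building a dictionary works, but the same file can also be written directly from an array of predicted labels. Below is a pandas sketch. The function make_submission and its preds argument are hypothetical; preds stands in for the model's predictions in the row order of test.csv. As in the code above, ImageId starts at 1, and index=False keeps pandas from writing its own index column.

```python
import numpy as np
import pandas as pd

def make_submission(preds, path='submission.csv'):
    """Hypothetical helper: write ImageId (1-based) and Label from predictions in test.csv order."""
    final_df = pd.DataFrame({
        'ImageId': np.arange(1, len(preds) + 1),
        'Label': np.asarray(preds, dtype=int),
    })
    final_df.to_csv(path, index=False)  # no extra pandas index column
    return final_df

# Hypothetical predictions for 3 images
df_out = make_submission([7, 2, 1], path='submission_demo.csv')
print(open('submission_demo.csv').read())
```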

As said at the beginning, this is a “Hello World” of image recognition: we did not use a convolutional neural network, which is what is normally used for tasks like this. It is just an entry-level walkthrough to understand the flow. I don’t usually work with images, so if there are mistakes, please let me know. It was a nice refresher for me, and hopefully it helps someone else too.