## Fast.ai Multi-Class Image Classification of MNIST Handwritten Digits

#### Sunil Kumar

A working version of this notebook is available as a Kaggle Kernel at https://www.kaggle.com/suniliitb96/fast-ai-classification-with-mnist-digits

Classifying digits into ten classes works much the same way as the binary classification in https://www.kaggle.com/suniliitb96/fast-ai-learning-through-cats-dogs. This solution applies image augmentation during training and the same augmentation when predicting on the test images.

##### Fast.ai specific terms:
* Fast.ai infers the type of classification problem, binary or multi-class, from the training label values.
* Plain nets and ResNets expect input of a certain size. ResNet50 was pre-trained on 224 x 224 RGB images, i.e., a batch_size x 224 x 224 x 3 data buffer per mini-batch (PyTorch stores it channels-first, as batch_size x 3 x 224 x 224). Fast.ai resizes input images of arbitrary size, and the augmented intermediate images, to the requested size while applying the specified image augmentations.
* Fast.ai's SGDR (SGD with warm restarts) is a form of cyclical learning rate schedule: the learning rate is annealed within each cycle and then reset. The restarts can help the optimizer escape poor local minima.
* Fast.ai's differential learning rates are used to fine-tune the pre-trained ResNet weights, with smaller learning rates for the earlier layer groups.
* If training employs image augmentations (through 'aug_tfms' in 'tfms_from_model', whose result is passed to 'ImageClassifierData'), then prediction should use 'learn.TTA(...)' rather than plain 'learn.predict(...)'.
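
The SGDR schedule mentioned above can be sketched in a few lines. This is a minimal illustration of half-cosine annealing with warm restarts, not fast.ai's own implementation; the function name `sgdr_learning_rate` and its parameters are assumptions made for this example.

```python
import math

def sgdr_learning_rate(max_lr, iteration, iters_per_cycle):
    """Half-cosine decay from max_lr toward 0, restarting at every cycle boundary."""
    # Position inside the current cycle, in [0, 1)
    position = (iteration % iters_per_cycle) / iters_per_cycle
    return 0.5 * max_lr * (1 + math.cos(math.pi * position))

# Two cycles of 4 iterations each, with a peak learning rate of 0.01
schedule = [sgdr_learning_rate(0.01, i, 4) for i in range(8)]
```

The rate falls within a cycle and jumps back to the peak at iteration 4, which is the "warm restart".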

##### A few points to note:
* The pre-trained plain nets (LeNet, AlexNet, ..., GoogLeNet) and ResNets used here were trained on color images, so they expect color input as well. Hence, we colorize our grayscale images by replicating the gray value into the three RGB channels.
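
As a small sketch of this channel replication, the snippet below turns a made-up 2x2 grayscale image into an RGB image with `np.stack`. The array values are illustrative only.

```python
import numpy as np

# One hypothetical 2x2 grayscale image with 8-bit values
gray = np.array([[0, 128], [255, 64]], dtype=np.uint8)

# Repeat the single channel three times along a new last axis
rgb = np.stack((gray,) * 3, axis=-1)

print(rgb.shape)  # height x width x channels
```

Every channel holds the same values, so the image still looks gray but has the three-channel layout a color model expects.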

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import cv2

import os

from fastai.conv_learner import *
from fastai.plots import *

from sklearn.model_selection import train_test_split

In [None]:
os.listdir("../input")

In [None]:
# 42k train & 28k test images of size 28x28 are available in row-per-image flattened csv
train_img_lbl = pd.read_csv("../input/train.csv")
test_img = pd.read_csv("../input/test.csv")

In [None]:
# train's 1st column is label
train_img = train_img_lbl.iloc[:, 1:]
train_label = train_img_lbl.iloc[:, 0:1]

In [None]:
train_img = train_img.values.reshape(-1, 28, 28)
test_img = test_img.values.reshape(-1, 28, 28)

(train_img.shape, test_img.shape)

In [None]:
# Convert 8-bit grayscale images to 24-bit RGB by replicating the gray channel three times
train_img = np.stack((train_img,)*3, axis = -1).astype('float32')
test_img = np.stack((test_img,)*3, axis = -1).astype('float32')

(train_img.shape, test_img.shape)

In [None]:
train_img, val_img, train_lbl, val_lbl = train_test_split(train_img, train_label, train_size=0.8, random_state=1, stratify=train_label)
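
The `stratify` argument keeps the class proportions the same in the training and validation sets. The idea can be sketched in plain Python as below. This is an illustration of stratified splitting in general, not scikit-learn's implementation; `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac, seed=0):
    """Split indices so that each class keeps roughly train_frac of its samples for training."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class
        cut = int(round(train_frac * len(indices)))
        train_idx.extend(indices[:cut])
        val_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(val_idx)

# Toy labels: 10 samples of class 0 and 20 samples of class 1
labels = [0] * 10 + [1] * 20
train_idx, val_idx = stratified_split(labels, train_frac=0.8)
```

Both splits keep the original 1:2 ratio between the two classes.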

In [None]:
train_lbl = train_lbl.values.flatten()
val_lbl = val_lbl.values.flatten()

In [None]:
# Though a 30° random rotation looks quite large, it gave good results with limited samples
# This relatively large random rotation was tried to check whether it helps avoid mislabeling

arch = resnet50
sz = 28
classes = np.unique(train_lbl)
data = ImageClassifierData.from_arrays(path = "/tmp",
                                     trn = (train_img, train_lbl),
                                     val = (val_img, val_lbl),
                                     classes = classes,
                                     test = test_img,
                                     tfms = tfms_from_model(arch, sz, aug_tfms = [RandomRotateZoom(deg=30, zoom=1.2, stretch=1.0)]))

In [None]:
learn = ConvLearner.pretrained(arch, data, precompute = True)

In [None]:
###
### Search for a suitable learning rate for our newly added last layer (with 'precompute=True', the ResNet50 weights below the last layer are reused as is)
###
#lrf=learn.lr_find()
#learn.sched.plot_lr()

#learn.sched.plot()

###
### Use the identified learning rate for our newly added last layer
### Note that even without running the 3 lines of the learning rate finder above, 0.01 is a common starting choice that also worked for the 28x28 MNIST digit images
###
#learn.fit(0.01, 2)
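
The idea behind a learning rate finder is to train briefly while the learning rate grows geometrically, then pick a rate from the region where the loss still drops steeply. The sketch below only generates such a sequence of rates. The function name and the 1e-5 to 10 range are assumptions for illustration, not a reproduction of `learn.lr_find()`.

```python
def lr_range_test_schedule(start_lr, end_lr, num_iters):
    """Learning rates growing geometrically from start_lr to end_lr (num_iters >= 2)."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_iters - 1)) for i in range(num_iters)]

# Seven steps, each ten times larger than the previous one
lrs = lr_range_test_schedule(1e-5, 10.0, 7)
```

Plotting the loss against these rates on a log axis gives the kind of curve that `learn.sched.plot()` displays.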

In [None]:
###
### SGDR (SGD with warm restarts): fast.ai decays the LR along a half-cosine curve (from 0.01 toward 0) within each cycle, here one epoch since cycle_len = 1, and then restarts at 1e-2
###
learn.fit(1e-2, 10, cycle_len = 1)
learn.sched.plot_lr()

In [None]:
###
### Continue from the model whose last layer was trained with precompute=True
### Unfreeze all layers (all weights learned so far are retained) => this sets precompute=False and makes all layers learnable
### Effectively, the network starts from the original pre-trained ResNet weights (minus its last layer) plus the last layer trained above while ResNet was frozen
### Now all layers can be trained further
###
learn.unfreeze()

# Differential LR (LR identified above for the last layer group, x0.1 for the middle group, x0.01 for the earliest group)
lr=np.array([1e-4, 1e-3, 1e-2])

learn.fit(lr, 3, cycle_len = 1, cycle_mult =  2)
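
To make the effect of differential learning rates concrete, here is a minimal sketch of one plain SGD step in which each layer group uses its own rate. The three groups, their weights and their gradients are made-up values, and `sgd_step` is a hypothetical helper rather than part of fast.ai.

```python
def sgd_step(param_groups, grads, lrs):
    """One plain SGD update where each layer group has its own learning rate."""
    return [
        [w - lr * g for w, g in zip(weights, group_grads)]
        for weights, group_grads, lr in zip(param_groups, grads, lrs)
    ]

# Three hypothetical layer groups: earliest layers, middle layers, new head
params = [[1.0, 1.0], [1.0], [1.0]]
grads = [[1.0, 2.0], [1.0], [1.0]]
updated = sgd_step(params, grads, lrs=[1e-4, 1e-3, 1e-2])
```

For the same gradient, the earliest group moves 100 times less than the head, which keeps the pre-trained low-level features close to their original values.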

In [None]:
learn.sched.plot_lr()
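
With `cycle_mult = 2`, each cycle is twice as long as the previous one, so the plot shows restarts spaced further and further apart. The arithmetic for the call above (3 cycles, `cycle_len = 1`) can be checked with a short sketch; `cycle_lengths` is a hypothetical helper written for this example.

```python
def cycle_lengths(n_cycles, cycle_len, cycle_mult=1):
    """Length in epochs of each SGDR cycle."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycles)]

lengths = cycle_lengths(3, cycle_len=1, cycle_mult=2)
total_epochs = sum(lengths)
print(lengths, total_epochs)
```

The three cycles last 1, 2 and 4 epochs, for 7 epochs in total.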

In [None]:
#temp = learn.predict(is_test = True)
#pred = np.argmax(temp, axis = 1)

log_preds, y = learn.TTA(is_test=True)
probs_test = np.mean(np.exp(log_preds), 0)

pred_df = pd.DataFrame(probs_test)
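
The averaging step can be illustrated on a tiny made-up array. The sketch assumes, as the cell above does, that the test-time augmentation output holds log-probabilities with shape (augmented passes, images, classes); the numbers are hypothetical.

```python
import numpy as np

# Hypothetical log-probabilities: 2 augmented passes, 2 images, 3 classes
log_preds = np.log(np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]],
]))

# Convert back to probabilities and average over the augmented passes
probs = np.exp(log_preds).mean(axis=0)

# The predicted class is the one with the highest averaged probability
labels = probs.argmax(axis=1)
```

Averaging probabilities rather than log-probabilities means each augmented pass contributes an ordinary arithmetic vote for every class.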

In [None]:
pred_df = pred_df.assign(Label = pred_df.values.argmax(axis=1))
pred_df = pred_df.assign(ImageId = pred_df.index.values + 1)

In [None]:
submit_df = pred_df[['ImageId', 'Label']]
submit_df.shape

In [None]:
f, ax = plt.subplots(5, 5, figsize = (15, 15))

for i in range(0,25):
    ax[i//5, i%5].imshow(test_img[i].astype('int'))
    ax[i//5, i%5].axis('off')
    ax[i//5, i%5].set_title("Predicted:{}".format(submit_df.Label[i]))    

plt.show()

In [None]:
submit_df.to_csv('submission.csv', index=False)

### References

1. [Fast.ai Learning through Cats & Dogs Image Binary Classification](https://www.kaggle.com/suniliitb96/fast-ai-learning-through-cats-dogs)