<h1><center><font size="6">fastai-v2/GPU for Chinese MNIST Prediction</font></center></h1>

# <a id='0'>Table of Contents</a>

- <a href='#1'>Introduction</a>  
- <a href='#2'>Preparing our Data</a>   
- <a href='#3'>Preparing our DataBlock</a>   
- <a href='#4'>Preparing our Learner</a> 
- <a href='#5'>Baseline Model</a> 
- <a href='#6'>Improving our Model</a> 
- <a href='#7'>Conclusions</a> 

# <a id='1'>Introduction</a> 

The Chinese MNIST dataset provides us with 15,000 images of Chinese numbers handwritten by 100 volunteers. Each participant provided 10 samples of the 15 Chinese characters for numbers. 

The objective of this notebook is to demonstrate how to solve the Chinese MNIST classification task with 0.999 accuracy using: 
1. `fastai` version 2
2. GPU acceleration
3. Multilabel classification

# <a id='2'>Preparing our Data</a>   

### Import Dependencies

In [None]:
import fastai
from fastai.vision.all import *
from fastai.vision.widgets import *
import pandas as pd
import os

### Import Files

In [None]:
#lists the input files available in the Kaggle environment
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

After importing your files, you can create a `path` to the folder that contains your images. fastai adds an `.ls()` method to `Path` objects; it returns an `L`, fastai's list-like class that behaves much like a Python `list` but has additional functionality.

In [None]:
#creates a path to the folder containing image files
path = Path("../input/chinese-mnist/data/data")

#makes .ls() format easier to read  
Path.BASE_PATH = path

#checks image files using .ls() method
path.ls()

### Create a DataFrame

The Chinese MNIST dataset also contains a CSV file that we can use to label our variables.

In [None]:
df = pd.read_csv("../input/chinese-mnist/chinese_mnist.csv")
df.head()

We currently don't have a column that we can use to reference our x variables/images. 

Notice that our image files have a similar structure. For example, `"input_47_6_7.jpg"` and `"input_12_8_2.jpg"` share various components. All of our image files begin and end similarly, and they all contain, in the same order, the number of the participant, the number of the sample, and the code of the Chinese character.

We can create a new column in Pandas and use our existing columns to concatenate our file names.

In [None]:
df['fname'] = ("input_" + df['suite_id'].astype(str) 
               + "_" 
               + df['sample_id'].astype(str) 
               + "_" 
               + df['code'].astype(str) 
               + ".jpg")
df.head()
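
The naming pattern described above can be checked in isolation. The sketch below is plain Python and does not touch the dataset; `build_fname` and `parse_fname` are hypothetical helpers introduced only for illustration, assuming every file follows the `input_<suite_id>_<sample_id>_<code>.jpg` layout.

```python
import re
from typing import Tuple

# Hypothetical helpers (not part of the dataset or of fastai) that mirror the
# "input_<suite_id>_<sample_id>_<code>.jpg" naming pattern.
FNAME_RE = re.compile(r"^input_(\d+)_(\d+)_(\d+)\.jpg$")

def build_fname(suite_id: int, sample_id: int, code: int) -> str:
    """Assemble a file name from the three identifying columns."""
    return f"input_{suite_id}_{sample_id}_{code}.jpg"

def parse_fname(fname: str) -> Tuple[int, int, int]:
    """Recover (suite_id, sample_id, code) from a file name."""
    match = FNAME_RE.match(fname)
    if match is None:
        raise ValueError(f"unexpected file name: {fname}")
    suite_id, sample_id, code = (int(group) for group in match.groups())
    return suite_id, sample_id, code

fname = build_fname(47, 6, 7)
print(fname)               # input_47_6_7.jpg
print(parse_fname(fname))  # (47, 6, 7)
```

A round trip like this is a cheap way to confirm that the concatenated `fname` column matches the files on disk before building a `DataBlock`.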

# <a id='3'>Preparing our DataBlock</a>  

### Define Variables

Now that we have our DataFrame, we can define our variables. We can use our new `fname` column for our x variables, but we will need to attach a path to each variable. We can also use our `value` column to label our y variables.

In [None]:
def get_x(r): return path/r['fname']

#converts each value to a string and wraps it in a list, since MultiCategoryBlock expects a list of labels per item
def get_y(r): return r['value'].astype(str).split(" ")

### From DataFrames to DataLoaders

Before we jump into creating a `DataLoaders` object, let's review some terminology. Note that the last two classes are specific to fastai and build on top of PyTorch's `Dataset` and `DataLoader` classes.

* `Dataset`: A collection that returns a tuple of your independent and dependent variable for a single item.
* `DataLoader`: An iterator that provides a stream of mini-batches, where each mini-batch is a tuple of a batch of independent variables and a batch of dependent variables.
* `Datasets`: An object that contains a training `Dataset` and a validation `Dataset`.
* `DataLoaders`: An object that contains a training `DataLoader` and a validation `DataLoader`.
<p><font size="1">*From "Deep Learning for Coders with Fastai and PyTorch" - credit to fastai/Jeremy Howard/Sylvain Gugger</font></p>
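
To make the first two terms concrete, here is a minimal plain-Python sketch. It does not use PyTorch or fastai; `toy_dataset` and `toy_dataloader` are made-up names that only imitate the behavior described in the definitions above.

```python
from typing import Iterator, List, Tuple

# A toy "Dataset": indexing returns one (independent, dependent) tuple.
toy_dataset: List[Tuple[float, int]] = [(float(i), i % 2) for i in range(10)]

def toy_dataloader(dataset, batch_size: int) -> Iterator[Tuple[list, list]]:
    """Yield mini-batches as (batch of xs, batch of ys) tuples."""
    for start in range(0, len(dataset), batch_size):
        items = dataset[start:start + batch_size]
        xs = [x for x, _ in items]
        ys = [y for _, y in items]
        yield xs, ys

batches = list(toy_dataloader(toy_dataset, batch_size=4))
print(toy_dataset[0])  # (0.0, 0)
print(len(batches))    # 3
print(batches[0])      # ([0.0, 1.0, 2.0, 3.0], [0, 1, 0, 1])
```

The real classes add shuffling, tensor collation, and GPU transfer, but the shape of the data is the same: single items in a `Dataset`, tuples of batches from a `DataLoader`.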

We will need to compile a `DataBlock` to create our `DataLoaders` object:

In [None]:
#creates a Datablock object
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                  splitter=RandomSplitter(seed=42),
                  get_x=get_x,
                  get_y=get_y,
                  item_tfms = RandomResizedCrop(128, min_scale=0.35))

# passes our dataframe into the dataloaders method of our DataBlock object
dls = dblock.dataloaders(df)

Let's break this down: 
* `blocks` lets us pass `ImageBlock` and `MultiCategoryBlock`. Even though we converted our x variables into image paths, we still need a method to open our images and to transform them into tensors. `ImageBlock` does this. 
* `MultiCategoryBlock` allows us to have multiple labels for each item. More on one-hot encoding later.
* `splitter` splits our `DataFrame` into a training and validation set. The default split is 80/20.
* `get_x` calls our `get_x` function to retrieve our image paths. 
* `get_y` calls our `get_y` function to retrieve our image labels.
* `item_tfms` applies a transform to each image individually. Here `RandomResizedCrop` crops and resizes every image to 128x128 pixels, so that all images share one size and can be stacked into a single batch tensor for the GPU.
* `dblock.dataloaders(df)` creates a `DataLoaders` object and passes in our `DataFrame`.
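
The idea behind a seeded 80/20 split can be sketched in plain Python. This is not fastai's `RandomSplitter` implementation, and it will not reproduce the same indices; `random_split` is a hypothetical function that only illustrates the principle.

```python
import random

def random_split(n_items: int, valid_pct: float = 0.2, seed: int = 42):
    """Shuffle indices with a fixed seed and cut off the validation share."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(valid_pct * n_items)
    return indices[cut:], indices[:cut]  # (train indices, valid indices)

train_idx, valid_idx = random_split(15_000)
print(len(train_idx), len(valid_idx))  # 12000 3000
```

Fixing the seed means the same images land in the validation set on every run, which keeps accuracy figures comparable between experiments.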

Let's analyze our new `DataLoaders` object: 

In [None]:
#displays number of batches for our training and validation sets
len(dls.train), len(dls.valid)

In [None]:
#displays a batch with images and labels
dls.show_batch(nrows=1, ncols=5)

In [None]:
#displays training Dataset
dls.train_ds

In [None]:
#displays validation Dataset
dls.valid_ds

Notice that our `DataLoaders` object has split our `DataFrame` into a training `Dataset` of 12,000 and a validation `Dataset` of 3,000. Furthermore, our `DataLoaders` object transforms our `Dataset` objects into batches. 
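
The batch counts follow from simple arithmetic. The sketch below assumes a batch size of 64 and assumes that the training loader drops its final incomplete batch while the validation loader keeps it; compare the result with `len(dls.train)` and `len(dls.valid)` rather than taking it as given. `n_batches` is a hypothetical helper.

```python
import math

def n_batches(n_items: int, batch_size: int = 64, drop_last: bool = False) -> int:
    """Number of mini-batches needed to cover n_items."""
    if drop_last:
        # Assumption: an incomplete final batch is discarded.
        return n_items // batch_size
    return math.ceil(n_items / batch_size)

print(n_batches(12_000, drop_last=True))   # 187
print(n_batches(3_000, drop_last=False))   # 47
```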

Also notice the structure of our `Dataset` objects:
* Our x variables are images with a size of 64x64 pixels. This is the original image size; the resize from `item_tfms` is applied later, when the `DataLoader` builds each batch.
* The lists of 0s and 1s contain our category labels in ***one-hot encoding***. Each category is considered independently and a 1 is granted if the category is present. Therefore, we can expect to see 15 digits for our 15 possible categories. 
* Although we don't expect to find multiple labels in our data, our multilabel classification approach allows our model to choose no label in the absence of a prediction above our threshold. This is in contrast to a single-label classification model with a softmax loss function, which always predicts a category label even when there are no valid matches.
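
A small plain-Python sketch of one-hot encoding and threshold-based decoding follows. The `VOCAB` list is an assumption made for illustration; the vocabulary and ordering that fastai actually uses are stored in `dls.vocab`. `one_hot` and `decode` are hypothetical helpers.

```python
# Assumed vocabulary of 15 labels; check dls.vocab for the real order.
VOCAB = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
         "10", "100", "1000", "10000", "100000000"]

def one_hot(labels, vocab=VOCAB):
    """Return one 0/1 entry per vocabulary item."""
    return [1.0 if item in labels else 0.0 for item in vocab]

def decode(probs, vocab=VOCAB, thresh=0.5):
    """Keep every label whose probability exceeds the threshold."""
    return [item for item, p in zip(vocab, probs) if p > thresh]

target = one_hot(["10"])
print(len(target), sum(target))  # 15 1.0
print(decode(target))            # ['10']
print(decode([0.1] * 15))        # []
```

The last line shows the property mentioned above: when no probability clears the threshold, the decoded label list is simply empty.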

# <a id='4'>Preparing our Learner</a>   

### Batch Testing the Model

Before we test our `Learner` object, let's generate predictions from a single batch.

In [None]:
#uses fastai's resnet18 model
learn = cnn_learner(dls, resnet18)

#creates a batch from our train dataset
x,y = to_cpu(dls.train.one_batch())

#generates predictions from our batch
batch = learn.model(x)

In [None]:
#analyzes batch
batch.shape

Our batch size is 64 images, which is the default for fastai, and each image is generating predictions for 15 separate categories. Let's analyze a single image:

In [None]:
#we can index into our batch to return predictions for a single image
batch[0]

As we expected, our image is receiving predictions for each of our 15 categories. We will compare our predictions with our targets/labels to calculate a loss.

### Loss Function

By default, fastai will apply a Binary Cross-Entropy loss function to multilabel classification problems. We can inspect the `loss_func` attribute of our learner object to see our loss function. 

In [None]:
learn.loss_func

* Because we have a one-hot-encoded dependent variable in which any number of categories (including none) may be active, we cannot use a cross-entropy loss function. The softmax function that it uses forces the activations to sum to 1, which tends to push one activation above the others, so it cannot identify multiple labels (or no label) in one image.
* Instead, a Binary Cross-Entropy loss function uses a sigmoid function to transform each prediction independently into an activation between 0 and 1. Each activation is then compared with its target using a function similar to `mnist_loss` from the fastai book.
* `BCEWithLogitsLoss` combines the sigmoid and the binary cross-entropy loss in a single function.
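
The contrast between softmax and sigmoid can be seen with a few made-up logits. This is a plain-Python sketch using only the `math` module; `softmax` and `sigmoid_scalar` are illustrative helpers, not the PyTorch functions.

```python
import math

def softmax(logits):
    """Turn logits into probabilities that always sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_scalar(z):
    """Squash a single logit into the range (0, 1), independently of the others."""
    return 1 / (1 + math.exp(-z))

# Three weak logits: the model is not confident about any category.
weak = [-3.0, -2.5, -3.5]
print([round(p, 2) for p in softmax(weak)])         # [0.31, 0.51, 0.19]
print([round(sigmoid_scalar(z), 2) for z in weak])  # [0.05, 0.08, 0.03]
```

Softmax still crowns a winner with more than half of the probability mass, while the sigmoid outputs all stay far below a 0.5 threshold, so no label would be assigned.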

In [None]:
#defining our own sigmoid function
def sigmoid(x): return 1/(1+torch.exp(-x))

#defining our own BCELoss function  
def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()
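
To see how this loss reacts to good and bad predictions, the same formula can be evaluated by hand on a few made-up logits. The sketch below is a plain-Python mirror of the function above, not the PyTorch implementation; `bce_with_logits` is a hypothetical name.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all entries, applying sigmoid first."""
    losses = []
    for z, t in zip(logits, targets):
        p = 1 / (1 + math.exp(-z))         # sigmoid activation
        picked = p if t == 1 else 1 - p    # probability given to the true state
        losses.append(-math.log(picked))
    return sum(losses) / len(losses)

targets = [1, 0, 0]
good = bce_with_logits([4.0, -4.0, -4.0], targets)  # confident and correct
bad = bce_with_logits([-4.0, 4.0, -4.0], targets)   # confident and wrong on two entries
print(good, bad)  # good is close to 0, bad is far larger
```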

Now that we understand `BCEWithLogitsLoss`, let's compare our batch predictions with our targets.

In [None]:
#creates a loss function
loss_func = nn.BCEWithLogitsLoss()

#passes our predictions and our labels into our loss function
loss = loss_func(batch, y)

#prints out our loss
loss

Notice the `grad_fn` attribute. This tells us that PyTorch, which fastai is built on, is automatically keeping track of our gradients for us! Our gradients will be calculated from our loss and they will be used to update our parameters.

### Metrics for Accuracy

We will also need to make sure our metric is compatible with our multilabel classification task. Because we could have more than one prediction on a single image, we need to pick a threshold to evaluate the accuracy of each prediction. The default threshold in fastai is 0.5, but Jeremy Howard suggests using 0.2.

In [None]:
def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    if sigmoid: inp = inp.sigmoid()
    return ((inp>thresh)==targ.bool()).float().mean()
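
The effect of the threshold is easiest to see on a tiny made-up example. The sketch below is a plain-Python analogue of the metric above that takes probabilities (already passed through sigmoid) instead of tensors; `accuracy_multi_py` is a hypothetical name.

```python
def accuracy_multi_py(probs, targets, thresh=0.5):
    """Fraction of (image, category) entries whose thresholded prediction matches."""
    matches = [
        (p > thresh) == bool(t)
        for row_p, row_t in zip(probs, targets)
        for p, t in zip(row_p, row_t)
    ]
    return sum(matches) / len(matches)

# Two images, three categories; the second image's true label only scores 0.4.
probs = [[0.9, 0.1, 0.1],
         [0.4, 0.05, 0.1]]
targets = [[1, 0, 0],
           [1, 0, 0]]

print(accuracy_multi_py(probs, targets, thresh=0.5))  # 5 of 6 entries correct
print(accuracy_multi_py(probs, targets, thresh=0.2))  # all 6 entries correct
```

A lower threshold recovers the hesitant prediction here, but it can also let through more false positives, so the value is worth checking against the validation set.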

# <a id='5'>Baseline Model</a>

### Results

Now that we have our `DataLoaders` object, let's create a `learner`. 

In [None]:
learn = cnn_learner(dls, resnet18, metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(6)

Our baseline model was able to achieve 0.999 accuracy on our first attempt. Let's break down the `learner` object to see how we got our results:  
* `cnn_learner` is a fastai function that builds a `Learner` around a pretrained convolutional neural network. 
* `dls` is our `DataLoaders` object that contains our images in batches of training and validation sets.
* `resnet18` tells fastai we want to use a pretrained `cnn` with 18 layers.
* `metrics` calls our `accuracy_multi()` function which is needed for multilabel classification.
* `fine_tune` is a fastai method that allows us to train our model and pass in the total number of epochs.
* `base_lr` is our learning rate: each gradient is multiplied by it to decide how far the corresponding weight moves. When `base_lr` is not passed, `fine_tune` falls back to its own default (2e-3 in current fastai releases), so it does not need to be specified. 
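
The role of the learning rate can be shown with a single plain gradient-descent step. This is a deliberately simplified sketch: fastai's default optimizer is more elaborate than plain SGD, and `sgd_step` is a hypothetical helper with made-up weights and gradients.

```python
def sgd_step(weights, grads, lr):
    """Move each weight against its gradient, scaled by the learning rate."""
    return [w - lr * g for w, g in zip(weights, grads)]

weights = [0.5, -1.0]
grads = [2.0, -4.0]

small = sgd_step(weights, grads, lr=1e-3)  # weights barely move
large = sgd_step(weights, grads, lr=1e-2)  # ten times the step size
print(small)
print(large)
```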

### Model Analysis

Now that we have our results, we can plot our top losses with fastai's `ClassificationInterpretation` class. Notice that the `probabilities` entry for each image is a tensor of 15 predictions. 

In [None]:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(5, nrows=1)

Out of the 3,000 images in our validation set, our model only mislabeled a few. Because we are using one-hot encoding, our results say that actual 1s were predicted as 0s x times, and actual 0s were predicted as 1s y times. The sum of x and y gives the total number of wrong entries, which is an upper bound on the number of mislabelled images, because a single image can contribute both a missed 1 and a spurious 1.

In [None]:
interp.most_confused(5)
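
The relationship between per-entry errors and per-image errors can be worked through on a toy example. The sketch below uses made-up, already thresholded predictions; `count_errors` is a hypothetical helper and is not part of `ClassificationInterpretation`.

```python
def count_errors(preds, targets):
    """Return (missed 1s, spurious 1s, images with at least one wrong entry)."""
    false_neg = false_pos = wrong_images = 0
    for row_p, row_t in zip(preds, targets):
        fn = sum(1 for p, t in zip(row_p, row_t) if t == 1 and p == 0)
        fp = sum(1 for p, t in zip(row_p, row_t) if t == 0 and p == 1)
        false_neg += fn
        false_pos += fp
        if fn or fp:
            wrong_images += 1
    return false_neg, false_pos, wrong_images

preds = [[1, 0, 0],    # correct
         [0, 1, 0],    # wrong category: one missed 1 and one spurious 1
         [0, 0, 0]]    # no label predicted: one missed 1
targets = [[1, 0, 0],
           [1, 0, 0],
           [0, 0, 1]]

print(count_errors(preds, targets))  # (2, 1, 2)
```

Here three entries are wrong but only two images are affected, because the second image is counted once as a missed 1 and once as a spurious 1.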

# <a id='6'>Improving our Model</a>

### Learning Rate Finder

We can improve our model by finding our ideal learning rate. fastai lets us do this with the `.lr_find()` method on our `learner` object:

In [None]:
learn = cnn_learner(dls, resnet18, metrics=partial(accuracy_multi, thresh=0.2))
learn.lr_find()
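
The learning rate finder trains for a short run of mini-batches while raising the learning rate exponentially and recording the loss. The sketch below only generates such a schedule in plain Python. The start value, end value, and number of iterations are assumptions based on the defaults as I understand them; `lr_range` is a hypothetical helper, not fastai's code.

```python
def lr_range(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Exponentially spaced learning rates from start_lr to end_lr."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]

lrs = lr_range()
print(len(lrs))          # 100
print(lrs[0], lrs[-1])   # starts at 1e-7 and ends at roughly 10
```

Because the spacing is exponential, the plot's x-axis is logarithmic, which is why whole decades such as 1e-7 to 1e-3 appear as equal-width bands.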

We can see that there's not much activity between 1e-7 and 1e-3 (the default learning rate of a fastai `Learner`), so let's test a learning rate of 1e-2:

In [None]:
learn = cnn_learner(dls, resnet18, metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(6, base_lr=1e-2)

Our results improved! But there's still more we can do. 

### Unfreezing and Transfer Learning

Because we are using transfer learning, we are replacing the final linear layer of our `cnn` with a new layer of random weights. We want to train a model in such a way that it is able to remember the useful ideas from the pretrained model so that it can adjust these weights as required for our specific task. We can do this by freezing pretrained layers and only updating the weights for our new linear layer. 

A `cnn_learner` starts with its pretrained layers frozen, so calling the `fit_one_cycle` method at this point trains only the new head:

In [None]:
learn = cnn_learner(dls, resnet18, metrics=partial(accuracy_multi, thresh=0.2))
learn.fit_one_cycle(3, lr_max=1e-2)
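
The "one cycle" in `fit_one_cycle` refers to a learning rate that warms up to a maximum and then anneals back down. The following plain-Python sketch shows a cosine version of that shape. The parameter values (`div`, `div_final`, `pct_start`) are assumptions based on the defaults as I understand them, and `cos_anneal` and `one_cycle_lr` are hypothetical helpers rather than fastai's scheduler.

```python
import math

def cos_anneal(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pos, lr_max=1e-2, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pos, where pos runs from 0 to 1."""
    if pos < pct_start:
        # Warm-up: rise from lr_max/div to lr_max.
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    # Annealing: fall from lr_max to lr_max/div_final.
    return cos_anneal(lr_max, lr_max / div_final,
                      (pos - pct_start) / (1 - pct_start))

for pos in (0.0, 0.25, 0.5, 1.0):
    print(pos, one_cycle_lr(pos))
```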

Now that we've gone through 3 epochs, we can unfreeze our pretrained layers using the fastai `.unfreeze()` method:

In [None]:
learn.unfreeze()

After unfreezing our layers, we call the `.lr_find` method again, since the pretrained layers are now trainable as well and the learning rate that suited the new head alone may no longer be appropriate:

In [None]:
learn.lr_find()

We don't have a steep descending slope like our previous plot because the model has already been trained. The goal is to pick a point well before the sharp increase, not the point of maximum gradient. Given the flattened slope, we will train our model with our transferred weights for 6 more epochs.

In [None]:
learn.fit_one_cycle(6, 1e-4)

Our accuracy has increased once more!

# <a id='7'>Conclusions</a>

We were able to create a baseline model with 0.999+ accuracy using the fastai library, GPU acceleration, and multilabel classification. We were further able to improve our model by freezing our pretrained weights and by finding suitable learning rates for multiple steps in our training. There is still some fine-tuning we can do, but this should be enough to get others started with fastai and multilabel classification!

Thank you to Jeremy Howard.