# Using the Facial Expression Recognition Challenge dataset with fastai

This is my take on lessons [one](https://course.fast.ai/videos/?lesson=1) and [two](https://course.fast.ai/videos/?lesson=2) of the fastai course. I decided to use the fastai library and see how well it works out of the box on the [2013 Facial Expression Recognition Challenge](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge).

## Adding the 2013 Facial Expression Recognition dataset to the kernel
Before we start, we must add the FER dataset to our kernel. To do so, I followed the instructions in [this post on Kaggle](https://www.kaggle.com/product-feedback/45472).

Click on **File** and then **Add or upload data** from within a kernel you are editing. You’ll see that **Competition Data** is now listed alongside Datasets and Kernel Output Files in the popup. You can search for specific competitions using the search box.
![competition-data](https://i.imgur.com/ndFipL4.png)

Every notebook starts with the following three lines; they ensure that any edits to libraries you make are reloaded here automatically, and also that any charts or images displayed are shown in this notebook.

```python
%reload_ext autoreload
%autoreload 2
%matplotlib inline
```

We import all the necessary packages. We are going to work with the [fastai V1 library](http://www.fast.ai/2018/10/02/fastai-ai/), which sits on top of [PyTorch 1.0](https://hackernoon.com/pytorch-1-0-468332ba5163). The fastai library provides many useful functions that let us quickly build neural networks and train our models.

We will also import [fastai.widgets](https://docs.fast.ai/widgets.image_cleaner.html#Image-Cleaner-Widget), which offers several widgets to support the workflow of a deep learning practitioner. The widgets help you organize, clean, and prepare the data for your model, and they are grouped by data type.

```python
from fastai import *
from fastai.vision import *
from fastai.widgets import *

import os
import sys
import cv2
import shutil
import tarfile
import numpy as np
import pandas as pd  # used later to read cleaned.csv
```

I ran into an issue with the CSV file: only the one extracted from the competition's `fer2013.tar.gz` archive works as intended. Since the input directory is read-only, the archive cannot be extracted in place, so the competition folder first has to be brought into the working directory.

```python
# The input folder is read-only, so copy the competition files to /kaggle/working
# before extracting fer2013.tar.gz
path = '/kaggle/input/challenges-in-representation-learning-facial-expression-recognition-challenge'
print(f"Source directory:\n{path}\n\nThe directory contains:\n{os.listdir(path)}\n")

# Destination path
destination = '/kaggle/working'
copied_dir = os.path.join(destination, os.path.basename(path))

if not os.path.isdir(copied_dir):
    try:
        # Copy the whole competition folder (including fer2013.tar.gz) to working
        shutil.copytree(path, copied_dir)
    except OSError:
        print(sys.exc_info())

# Remove files
# shutil.rmtree(copied_dir)
```

```python
# Rename the folder, since its original name is very long
if not os.path.isdir('/kaggle/working/fer-challenge'):
    os.rename("/kaggle/working/challenges-in-representation-learning-facial-expression-recognition-challenge", "/kaggle/working/fer-challenge")

# Work from the copied dataset in /kaggle/working
os.chdir("/kaggle/working/fer-challenge")

# Extract fer2013.tar.gz; the context manager closes the archive afterwards
with tarfile.open("fer2013.tar.gz") as tf:
    tf.extractall()
```

The competition dataset gives us a CSV file containing pixel values rather than the images themselves. The code below, taken from the [competition's discussion](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/discussion/29428), converts those pixels into grayscale images.

Thanks to [MadScientist](https://www.kaggle.com/madmlscientist) for this code snippet.


```python
output_path = "/kaggle/working/fer-challenge/images"

# Start from an empty output folder
if os.path.exists(output_path):
    shutil.rmtree(output_path)
os.makedirs(output_path)

label_names = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

# Columns: label index, space-separated pixel string, usage (Training/PublicTest/PrivateTest)
data = np.genfromtxt('fer2013/fer2013.csv', delimiter=',', dtype=None, encoding=None)
labels = data[1:, 0].astype(np.int32)
image_buffer = data[1:, 1]
images = np.array([np.fromstring(image, np.uint8, sep=' ') for image in image_buffer])
usage = data[1:, 2]
dataset = zip(labels, images, usage)

for i, (label, pixels, use) in enumerate(dataset):
    # PrivateTest images are added to the training folder; PublicTest stays separate
    if use == "Training" or use == "PrivateTest":
        usage_path = os.path.join(output_path, "Training")
    else:
        usage_path = os.path.join(output_path, use)

    # One sub-folder per emotion, created on demand
    label_path = os.path.join(usage_path, label_names[label])
    os.makedirs(label_path, exist_ok=True)

    img = pixels.reshape((48, 48))
    img_path = os.path.join(label_path, '%08d.jpg' % i)
    cv2.imwrite(img_path, img)
    # print('Write {}'.format(img_path))
```
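
To make the conversion above concrete, here is a stdlib-only sketch of what happens to a single CSV row: the space-separated pixel string becomes a 48x48 grid, and the usage value decides the output folder (`PrivateTest` rows are folded into `Training`, as in the loop above). The helpers `parse_pixels` and `target_folder` are hypothetical names for illustration, and the sketch skips writing the JPEG.

```python
import os

LABEL_NAMES = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

def parse_pixels(pixel_string, side=48):
    """Turn a space-separated pixel string into a side x side list of rows."""
    values = [int(v) for v in pixel_string.split()]
    if len(values) != side * side:
        raise ValueError(f"expected {side * side} pixels, got {len(values)}")
    return [values[r * side:(r + 1) * side] for r in range(side)]

def target_folder(output_path, label, usage):
    """Mirror the loop above: PrivateTest rows go to Training, others keep their usage name."""
    split = "Training" if usage in ("Training", "PrivateTest") else usage
    return os.path.join(output_path, split, LABEL_NAMES[label])

# Usage example with a synthetic, uniformly grey image
row = parse_pixels(" ".join(["128"] * 48 * 48))
print(len(row), len(row[0]))  # 48 48
print(target_folder("/tmp/images", 3, "PrivateTest"))  # on POSIX: /tmp/images/Training/Happy
```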

### Option A: Using cleaned.csv file to filter the training dataset

Because I have run this kernel multiple times, I already have a `cleaned.csv` file; you can generate your own in the *Cleaning Up* section of this kernel.

If you have a `cleaned.csv` file, run the code below (Option A) and skip Option B.

```python
# Copy cleaned.csv into the images folder
path = '/kaggle/input/cleaned/cleaned.csv'
destination = '/kaggle/working/fer-challenge/images/cleaned.csv'
shutil.copyfile(path, destination)
```

```python
# Point path at the folder where we wrote our images
path = "/kaggle/working/fer-challenge/images"
df = pd.read_csv(os.path.join(path, 'cleaned.csv'))
```

```python
np.random.seed(42)  # makes the random 20% validation split repeatable
data = ImageDataBunch.from_csv(path, folder=".", valid_pct=0.2, csv_labels='cleaned.csv',
                               ds_tfms=get_transforms(), size=224, num_workers=8).normalize(imagenet_stats)
```
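
`valid_pct=0.2` holds out a random 20% of the rows for validation, and `np.random.seed(42)` makes that split repeatable between runs. The sketch below shows the idea with the standard library only. It is not fastai's implementation (fastai draws its permutation from NumPy's global random state, so the actual indices will differ), and `seeded_split` is a hypothetical helper.

```python
import random

def seeded_split(n_items, valid_pct=0.2, seed=42):
    """Return (train_idx, valid_idx) from a seeded shuffle of range(n_items)."""
    rng = random.Random(seed)       # local generator, so the split is repeatable
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)  # number of validation items
    return sorted(idxs[cut:]), sorted(idxs[:cut])

train_idx, valid_idx = seeded_split(10)
print(len(train_idx), len(valid_idx))  # 8 2
```

Re-running with the same seed gives the same split, which is what lets you compare training runs on identical validation data.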

### Option B: Load the training dataset from folder without using cleaned.csv file

Uncomment and run this code instead of *Option A* if this is your first time running the kernel and you haven't obtained a cleaned.csv file yet.

```python
# # Change path to where we formed our images
# path = "/kaggle/working/fer-challenge/images"
# os.chdir(path)

# tfms = get_transforms(do_flip=False)
# data = ImageDataBunch.from_folder(path, train="Training", valid_pct=0.2, ds_tfms=tfms,
#                                   size=26, num_workers=0, bs=64).normalize(imagenet_stats)
```

```python
print(f"Classes in our data: {data.classes}\nNumber of classes: {data.c}\n"
      f"Training Dataset Length: {len(data.train_ds)}\nValidation Dataset Length: {len(data.valid_ds)}")

data.show_batch(rows=3, columns=5, figsize=(5, 5))
```

## Train model

To train the model, I followed [Poonam's](https://forums.fast.ai/t/why-do-we-need-to-unfreeze-the-learner-everytime-before-retarining-even-if-learn-fit-one-cycle-works-fine-without-learn-unfreeze/41614/5) advice on the fastai forum about when to freeze and unfreeze the learner during training. I suggest reading her explanation to better understand how to optimize your model.

```python
# ResNet-34 pretrained on ImageNet; the backbone starts frozen
learn = cnn_learner(data, models.resnet34, metrics=[accuracy, error_rate])
learn.fit_one_cycle(4)
```
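
`fit_one_cycle` does not train at a constant learning rate: it warms up from a low value to `max_lr` and then anneals down to a much smaller one. The sketch below reproduces that shape with cosine annealing in both phases, using what I understand to be fastai v1's defaults (`pct_start=0.3`, `div_factor=25`, a final divisor of `div_factor * 1e4`, and a default maximum of `3e-3`). Treat these numbers as assumptions and `one_cycle_lrs` as a hypothetical helper; fastai also cycles momentum in the opposite direction, which is omitted here.

```python
import math

def cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle_lrs(n_iter, max_lr=3e-3, pct_start=0.3, div_factor=25.0, final_div=None):
    """Per-iteration learning rates for a one-cycle schedule (sketch)."""
    if final_div is None:
        final_div = div_factor * 1e4
    n_up = max(1, int(n_iter * pct_start))  # warm-up iterations
    n_down = n_iter - n_up                  # annealing iterations
    sched = [cos_anneal(max_lr / div_factor, max_lr, i / n_up) for i in range(n_up)]
    sched += [cos_anneal(max_lr, max_lr / final_div, i / n_down) for i in range(n_down)]
    return sched

sched = one_cycle_lrs(100)
print(f"start={sched[0]:.2e} peak={max(sched):.2e} end={sched[-1]:.2e}")
```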

```python
learn.save('stage-1')
```

Training with the pretrained backbone frozen means that only the newly added, randomly initialized layers in the head are updated. Once those layers have converged somewhat, we unfreeze the entire model and continue training.

> Note: With the fastai library, loading the model will load it in a frozen state by default.

```python
learn.load('stage-1')
```

```python
learn.unfreeze()  # make every layer group trainable
```

```python
learn.lr_find()
learn.recorder.plot()
```

In the learning rate finder plot, we look for the steepest downward slope that persists over a wide range of learning rates. In this case there is no clear downward slope, so we limit the learning rate to the range `3e-6` to `3e-3`.
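
Reading the plot by eye is the usual approach, but the "steepest downward slope" idea can also be expressed numerically: measure how fast the loss falls per decade of learning rate and pick the point where it falls fastest. Recent fastai v1 releases offer something similar in spirit through `recorder.plot(suggestion=True)`. The helper `steepest_descent_lr` and the sample curve below are illustrative assumptions, not fastai's code or this kernel's results.

```python
import math

def steepest_descent_lr(lrs, losses):
    """Return the learning rate at the start of the steepest loss decrease,
    with the slope measured against log10 of the learning rate."""
    best_lr, best_slope = None, 0.0
    for i in range(len(lrs) - 1):
        run = math.log10(lrs[i + 1]) - math.log10(lrs[i])
        slope = (losses[i + 1] - losses[i]) / run
        if slope < best_slope:  # more negative means a steeper drop
            best_lr, best_slope = lrs[i], slope
    return best_lr              # None if the loss never decreases

# Made-up curve: the loss drops fastest between 1e-4 and 1e-3
finder_lrs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
finder_losses = [2.00, 1.99, 1.90, 1.20, 1.10, 3.00]
print(steepest_descent_lr(finder_lrs, finder_losses))  # 0.0001
```

A flat curve returns `None`, which matches the situation described above where no clear slope exists and a range has to be chosen by judgment.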

```python
learn.fit_one_cycle(2, max_lr=slice(3e-6, 3e-3))
```
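
Passing `slice(3e-6, 3e-3)` as `max_lr` sets discriminative learning rates: the earliest layer group gets the smallest rate and the head gets the largest. As far as I can tell from fastai v1's source, the rates in between are spaced geometrically, and a `cnn_learner` built on ResNet-34 has three layer groups. The sketch below reimplements that spacing with the standard library; `geometric_lrs` is a hypothetical name.

```python
def geometric_lrs(start, stop, n_groups):
    """Spread learning rates from start to stop with a constant ratio between groups."""
    if n_groups == 1:
        return [stop]
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step ** i for i in range(n_groups)]

group_lrs = geometric_lrs(3e-6, 3e-3, 3)
print([f"{lr:.2e}" for lr in group_lrs])  # ['3.00e-06', '9.49e-05', '3.00e-03']
```

Geometric rather than linear spacing keeps the ratio between neighbouring groups constant, so the middle group sits at the geometric mean of the two ends.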

```python
learn.save('stage-2')
```

## Interpretation

We can use the ClassificationInterpretation class to have a look at what's going on.

```python
learn.load('stage-2');
```

```python
interp = ClassificationInterpretation.from_learner(learn)
```

```python
interp.plot_confusion_matrix()
```
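
A confusion matrix counts, for every pair of actual and predicted classes, how many validation images fell into that cell; the diagonal holds the correct predictions. The stdlib sketch below builds the same kind of table from two made-up label lists. `count_confusion` is a hypothetical helper, not the fastai method.

```python
def count_confusion(actual, predicted, n_classes):
    """Rows are actual classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        cm[a][p] += 1
    return cm

# Made-up example with 3 classes (say 0=Angry, 1=Happy, 2=Sad)
actual    = [0, 0, 1, 1, 1, 2, 2, 2]
predicted = [0, 2, 1, 1, 0, 2, 2, 1]
for row in count_confusion(actual, predicted, 3):
    print(row)
# [1, 0, 1]
# [1, 2, 0]
# [0, 1, 2]
```

Large off-diagonal cells point to pairs of emotions the model mixes up, which is where the cleaning step below tends to pay off.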

## Cleaning Up

Some of our top losses aren't due to bad performance by our model. There are images in our dataset that shouldn't be there.

Using the `ImageCleaner` widget from `fastai.widgets` we can prune our top losses, removing photos that don't belong.

First, we need to get the file paths of our top losses, which we can do with `.from_toplosses`. We then pass the indices of the top losses and the corresponding dataset to `ImageCleaner`.

Note that the widget does not delete images from disk. Instead, it creates a new CSV file, `cleaned.csv`, from which you can create a new ImageDataBunch with the corrected labels and continue training your model.
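
Because `cleaned.csv` is a plain CSV, it is easy to inspect before building a new DataBunch, for example to see how many images remain per class after cleaning. The sketch below assumes the file has a header row with `name` and `label` columns (an assumption about the widget's output, so check your own file) and uses an in-memory sample with made-up file names instead of the real file. `count_labels` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

def count_labels(csv_file):
    """Count rows per label in a CSV with 'name' and 'label' columns."""
    reader = csv.DictReader(csv_file)
    return Counter(row["label"] for row in reader)

sample = io.StringIO(
    "name,label\n"
    "Training/Happy/00000007.jpg,Happy\n"
    "Training/Sad/00000010.jpg,Sad\n"
    "Training/Happy/00000021.jpg,Happy\n"
)
print(count_labels(sample))  # Counter({'Happy': 2, 'Sad': 1})
```

For the real file, open it with `open(path_to_cleaned_csv, newline='')` and pass the file object instead of `sample`.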

Note: set the number of images you'd like to view with the `n_imgs` argument of `from_toplosses`, e.g. `n_imgs=100`.

```python
# DataBunch with no validation split, covering every image in the Training folder
db = (ImageList.from_folder("/kaggle/working/fer-challenge/images/Training")
                .split_none()
                .label_from_folder()
                .transform(get_transforms(), size=224)
                .databunch()
     )
```

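```python
learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)

# Load the stage-2 weights saved earlier; an absolute path is used because
# learn_cln's own folder is images/Training rather than images
learn_cln.load('/kaggle/working/fer-challenge/images/models/stage-2');
```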

```python
ds, idxs = DatasetFormatter().from_toplosses(learn_cln)
```
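
`from_toplosses` ranks images by their individual loss, so the images the model got most confidently wrong come first. For a classifier trained with cross-entropy (fastai v1's default for single-label classification), each image's loss is `-log(p)`, where `p` is the probability the model assigned to the image's labelled class. The sketch below ranks made-up predictions this way; the probabilities and the helper `rank_by_loss` are illustrative, not fastai's implementation.

```python
import math

def rank_by_loss(probs, targets, k):
    """Return the k (loss, index) pairs with the highest cross-entropy loss."""
    losses = [(-math.log(p[t]), i) for i, (p, t) in enumerate(zip(probs, targets))]
    return sorted(losses, reverse=True)[:k]

# Made-up softmax outputs for 4 images over 3 classes, and their labelled classes
probs = [
    [0.90, 0.05, 0.05],  # confident and right
    [0.10, 0.80, 0.10],  # confident and right
    [0.05, 0.05, 0.90],  # labelled 0 but predicted 2: confidently wrong
    [0.40, 0.35, 0.25],  # right but unsure
]
targets = [0, 1, 0, 0]
for loss, idx in rank_by_loss(probs, targets, 2):
    print(idx, round(loss, 3))
# 2 2.996
# 3 0.916
```

A confidently wrong image is either a genuine model error or a mislabelled photo, which is why reviewing the top of this list is an efficient way to clean the data.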

With `ImageCleaner`, the widget runs inside our kernel, and we can correct labels or delete images that don't match any of our labels. This reduces the noise in our dataset and should improve the performance of our learner.

```python
ImageCleaner(ds, idxs, path, batch_size=6)
```

Flag photos for deletion by clicking 'Delete'. Then click 'Next Batch' to delete the flagged photos and keep the rest in that row. ImageCleaner keeps showing new rows of images until there are none left; here, that means until every image from the top losses has been reviewed.

You can also find duplicates in your dataset and delete them. To do this, run `.from_similars` to get the indices of potential duplicates, then run `ImageCleaner` with `duplicates=True`. The workflow is the same as for misclassified images: choose the ones you want to delete and click 'Next Batch' until there are no images left.
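
Finding duplicates boils down to comparing images in some feature space and flagging the pairs that are most alike. As I understand it, fastai's `from_similars` takes activations from a layer of the trained model and compares them with cosine similarity. The sketch below shows only the comparison step on made-up feature vectors; `most_similar_pairs` is a hypothetical helper, and it assumes no vector is all zeros.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar_pairs(features, k):
    """Return the k most similar (similarity, i, j) pairs, highest first."""
    scored = [(cosine(features[i], features[j]), i, j)
              for i, j in combinations(range(len(features)), 2)]
    return sorted(scored, reverse=True)[:k]

features = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.2],
    [0.9, 0.1, 0.01],  # nearly identical to image 0
    [0.0, 0.2, 0.9],
]
sim, i, j = most_similar_pairs(features, 1)[0]
print(i, j, round(sim, 4))
```

Comparing every pair is quadratic in the number of images, so on a dataset this size it is worth limiting the search to a subset or a single class at a time.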

## Validating the test set

Since the test set comes with labels, we approach it as if we were validating our model rather than predicting on an unlabeled set of images. To do that, we create `data_test` from the images we split into the `Training` and `PublicTest` folders, then validate the model on the `PublicTest` images to see how well it performs. We do this instead of submitting predictions because submissions for this competition are closed. For more information, check out the [fastai docs](https://docs.fast.ai/data_block.html#Add-a-test-set).

```python
learn.load('stage-2')

# Training folder as the train set, labelled PublicTest images as the validation set
data_test = (ImageList.from_folder(path)
             .split_by_folder(train='Training', valid='PublicTest')
             .label_from_folder()
             .transform(tfms=get_transforms(), size=224)
             .databunch()
             .normalize(imagenet_stats)  # same statistics as the Option A training data
            )

# Returns the loss followed by each metric: accuracy, error_rate
loss, acc, err_r = learn.validate(data_test.valid_dl)
```

```python
print(f"Our final model's loss on the PublicTest set: {round(float(loss), 3)}, "
      f"with Accuracy: {round(acc.item(), 3)} and Error Rate: {round(err_r.item(), 3)}")
```
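
`learn.validate` returns the loss followed by each metric passed to the learner, which is why it unpacks into three values here. Accuracy and error rate are complements, so the two numbers printed above always sum to 1. A stdlib sketch with made-up predictions; the helper names are hypothetical and deliberately differ from fastai's `accuracy` and `error_rate` so they don't shadow them in the notebook.

```python
def accuracy_sketch(actual, predicted):
    """Fraction of predictions that match the labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def error_rate_sketch(actual, predicted):
    """Fraction of predictions that are wrong (the complement of accuracy)."""
    return 1 - accuracy_sketch(actual, predicted)

actual    = [3, 3, 0, 6, 4]
predicted = [3, 4, 0, 6, 4]
print(accuracy_sketch(actual, predicted), error_rate_sketch(actual, predicted))
```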

## Putting the model in production

To put my model in production, I used lankinen's approach and followed his instructions in his [medium article](https://medium.com/@lankinen/fastai-model-to-production-this-is-how-you-make-web-app-that-use-your-model-57d8999450cf).

### Notes
>When I tried to install `torch_nightly`, the URL provided gave me a 404 error. To get around it, I visited the URL, manually found the latest version applicable to my environment (Ubuntu 18), and installed it.
>
>```shell
>wget https://download.pytorch.org/whl/nightly/cpu/torch_nightly-1.2.0.dev20190805%2Bcpu-cp36-cp36m-linux_x86_64.whl
>pip3 install torch_nightly-1.2.0.dev20190805%2Bcpu-cp36-cp36m-linux_x86_64.whl
>```
>
>I also had to use a t2.large instance, which provides 8 GB of RAM, and increase the volume to 25 GB, because I ran out of memory and disk space while installing the fastai library.
>
>We will need the `export.pkl` file, which we get from `data.export()`.
>
>Finally, just before starting the server, (re)install torchvision by typing:
>`pip3 install torchvision`