## Prediction with Apparel dataset


This dataset was created so that I could practise multi-label classification, following lesson 3 of Jeremy Howard's fast.ai course.
The dataset contains 8 clothing categories in 9 colours. The goal of multi-label classification is to tag each photo with every label that applies to it, based on these categories and colours.
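In a multi-label problem the target for each image is a multi-hot vector (one 0/1 entry per possible tag) rather than a single class index. A minimal sketch, assuming hypothetically that a combined label such as `"black_dress"` encodes a colour and a category separated by `_`. The label strings below are illustrative only, not the dataset's actual class list.

```python
def to_tags(label, delim="_"):
    """Split a combined label like 'black_dress' into its tags (hypothetical naming scheme)."""
    return label.split(delim)


def multi_hot(tags, vocab):
    """Encode a list of tags as a 0/1 vector over a fixed, sorted tag vocabulary."""
    return [1 if t in tags else 0 for t in vocab]


# Illustrative labels only; the real folder names may differ.
labels = ["black_dress", "blue_pants", "red_dress"]
vocab = sorted({t for lbl in labels for t in to_tags(lbl)})
encoded = [multi_hot(to_tags(lbl), vocab) for lbl in labels]
# vocab      -> ['black', 'blue', 'dress', 'pants', 'red']
# encoded[0] -> [1, 0, 1, 0, 0]
```

Each image can now switch on several entries at once, which is what distinguishes this from single-label classification, where exactly one class is correct.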

Source: https://www.kaggle.com/kaiska/apparel-dataset

Kaggle CLI reference: https://github.com/Kaggle/kaggle-api

In [0]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [0]:
# Set up fastai (v1) for Colab
!curl -s https://course.fast.ai/setup/colab | bash
# Pin PyTorch/torchvision to the versions used in this notebook
!pip uninstall torch torchvision -y
!pip install torch==1.4.0 torchvision==0.5.0

In [0]:
#from google.colab import drive
#drive.mount('/content/gdrive', force_remount=True)
#root_dir = "/content/gdrive/My Drive/"
#base_dir = root_dir + 'fastai-v3/'

In [0]:
from fastai.vision import *

## Getting the data

First, install the Kaggle API by executing the following cell, or by running the same command in your terminal. Depending on your platform you may need to modify it slightly, either by adding `source activate fastai` or similar, or by prefixing `pip` with a path; have a look at how `conda install` is called for your platform in the appropriate *Returning to work* section of https://course.fast.ai/. Depending on your environment, you may also need to append `--user` to the command.

In [0]:
! pip install -q kaggle  

Then you need to upload your Kaggle credentials to your instance. Log in to Kaggle, click on your profile picture, then on 'My account'. Scroll down until you find a button named 'Create New API Token' and click on it. This will trigger the download of a file named `kaggle.json`.

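In Colab, run the next cell and select `kaggle.json` to upload it to the directory the notebook is running in (on a regular Jupyter server, click "Upload" on your main Jupyter page instead). Then execute the two commands that follow (or run them in a terminal) to move the file into `~/.kaggle/`. For Windows, use the two commented-out commands instead.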

In [0]:
from google.colab import files

files.upload()

In [0]:
! mkdir -p ~/.kaggle/
! mv kaggle.json ~/.kaggle/

# For Windows, uncomment these two commands
# ! mkdir %userprofile%\.kaggle
# ! move kaggle.json %userprofile%\.kaggle

In [0]:
! chmod 600 ~/.kaggle/kaggle.json   # keep the API key private; the Kaggle CLI warns if other users can read it
# ! kaggle datasets list
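The upload, move and `chmod` steps above can also be done from Python, which avoids needing separate Windows commands. A minimal sketch: `install_kaggle_token` is a hypothetical helper, not part of the Kaggle CLI, and it assumes the CLI looks for `kaggle.json` under `~/.kaggle/` (on Windows, `%userprofile%\.kaggle`), as the shell commands above do.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(src="kaggle.json", home=None):
    """Move a Kaggle API token into <home>/.kaggle/ and restrict its permissions.

    Hypothetical helper mirroring the shell cells above; `home` defaults to
    the current user's home directory.
    """
    home = Path(home) if home is not None else Path.home()
    dest_dir = home / ".kaggle"
    dest_dir.mkdir(parents=True, exist_ok=True)   # like `mkdir -p ~/.kaggle/`
    dest = dest_dir / "kaggle.json"
    shutil.move(str(src), str(dest))              # like `mv kaggle.json ~/.kaggle/`
    os.chmod(dest, 0o600)                         # like `chmod 600`; on Windows only the read-only flag is affected
    return dest


# Usage (after uploading kaggle.json to the working directory):
# install_kaggle_token("kaggle.json")
```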

In [0]:
path = Config.data_path()/'apparel'
path.mkdir(parents=True, exist_ok=True)
path

In [0]:
! kaggle datasets download -d kaiska/apparel-dataset --force -p {path} --unzip 

In [0]:
# path.ls()

## Image classification

In [0]:
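bs = 32
tfms = get_transforms(do_flip=False)
data = (ImageList.from_folder(path=path).split_by_rand_pct()  # random 20% validation split
        .label_from_folder()                                   # one label per image: its folder name
        .transform(tfms, size=256).databunch(bs=bs)            # pass bs, otherwise the default of 64 is used
        .normalize(imagenet_stats))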
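Two steps of the data block above decide what the model learns from: `label_from_folder()` takes each image's label from the name of its parent folder, and `split_by_rand_pct()` holds out a random fraction of the images (0.2 by default in fastai v1) for validation. The following is a pure-Python sketch of those two ideas, not fastai's implementation; `label_from_parent` and `rand_pct_split` are hypothetical names, and the paths are made up.

```python
import random
from pathlib import Path


def label_from_parent(paths):
    """Pair each image path with the name of the folder that contains it."""
    return [(p, Path(p).parent.name) for p in paths]


def rand_pct_split(items, valid_pct=0.2, seed=None):
    """Randomly hold out `valid_pct` of the items as a validation set.

    Passing a seed makes the split reproducible between runs.
    """
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)
    valid_idx = set(idx[:int(valid_pct * len(items))])
    train = [it for i, it in enumerate(items) if i not in valid_idx]
    valid = [it for i, it in enumerate(items) if i in valid_idx]
    return train, valid


# Hypothetical folder layout: apparel/<folder>/<image>.jpg
paths = [f"apparel/{folder}/img{i}.jpg" for folder in ("a_x", "b_y") for i in range(5)]
items = label_from_parent(paths)
train, valid = rand_pct_split(items, valid_pct=0.2, seed=42)
# len(train) -> 8, len(valid) -> 2
```

Since fastai v1's `split_by_rand_pct` also accepts a `seed`, fixing it in the notebook would make the validation set, and therefore the reported accuracy, comparable across runs.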

In [0]:
data.show_batch(rows=3, figsize=(12,9))

In [0]:
print(data.classes)
len(data.classes),data.c

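To create a `Learner` we use the same function as in lesson 1, and our base architecture is resnet50 again. For multi-label data the metrics are a little different: `accuracy_thresh` is used instead of `accuracy`. In lesson 1, we determined the prediction for a given class by picking the final activation that was the biggest, but in a multi-label problem each activation can be 0 or 1 independently of the others. `accuracy_thresh` selects the ones that are above a certain threshold (0.5 by default) and compares them to the ground truth. Note, however, that `label_from_folder()` above gives each image a single label (its folder name), so the cell below uses plain `accuracy` and leaves the thresholded metrics commented out.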
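To make the difference between the two metrics concrete, here is a pure-Python sketch (no fastai or PyTorch) of argmax accuracy versus thresholded accuracy. It assumes, as we understand fastai v1's `accuracy_thresh` to do by default, that a sigmoid is applied to the raw activations before thresholding. The logits and targets are made-up toy values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def accuracy_argmax(logits, target_idx):
    """Single-label accuracy: the predicted class is the largest activation."""
    hits = [max(range(len(row)), key=row.__getitem__) == t
            for row, t in zip(logits, target_idx)]
    return sum(hits) / len(hits)


def accuracy_thresh(logits, targets, thresh=0.5):
    """Multi-label accuracy: a class is predicted when sigmoid(activation) > thresh,
    and every (sample, class) entry is compared with the 0/1 target."""
    correct = total = 0
    for row, tgt in zip(logits, targets):
        for x, y in zip(row, tgt):
            correct += int((sigmoid(x) > thresh) == bool(y))
            total += 1
    return correct / total


logits = [[2.0, -1.0, 0.5], [-3.0, 1.5, -0.2]]   # toy activations for 2 images, 3 classes
targets = [[1, 0, 1], [0, 1, 1]]                 # multi-hot ground truth
# accuracy_thresh(logits, targets)   -> 0.8333... (5 of 6 entries correct)
# accuracy_argmax(logits, [0, 1])    -> 1.0
```

Lowering the threshold (the commented-out `acc_02` uses 0.2) makes the model predict more tags per image: here, with `thresh=0.4`, the last entry (sigmoid(-0.2) ≈ 0.45) also counts as predicted and all 6 entries match.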

As for `fbeta`, it is the F-beta score, the metric Kaggle used for the Planet competition in lesson 3. See [here](https://en.wikipedia.org/wiki/F1_score) for more details.
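The F-beta score combines precision $P$ and recall $R$ as $F_\beta = (1+\beta^2)\,PR / (\beta^2 P + R)$; with $\beta = 2$, recall counts more than precision. Below is a minimal pure-Python sketch that computes it per image and then averages over images. We believe this sample-averaged form and the defaults `thresh=0.2`, `beta=2` match fastai v1's `fbeta`, but treat that as an assumption. The probabilities are toy values that have already been passed through a sigmoid.

```python
def fbeta(probs, targets, thresh=0.2, beta=2.0, eps=1e-9):
    """Sample-averaged F-beta score for multi-label predictions.

    `probs` are per-class probabilities; a class counts as predicted when
    its probability exceeds `thresh`. `eps` avoids division by zero.
    """
    beta2 = beta ** 2
    scores = []
    for row, tgt in zip(probs, targets):
        pred = [p > thresh for p in row]
        tp = sum(1 for p, y in zip(pred, tgt) if p and y)   # true positives
        precision = tp / (sum(pred) + eps)
        recall = tp / (sum(tgt) + eps)
        scores.append((1 + beta2) * precision * recall / (beta2 * precision + recall + eps))
    return sum(scores) / len(scores)


probs = [[0.9, 0.1, 0.3], [0.05, 0.8, 0.1]]   # toy probabilities, 2 images x 3 classes
targets = [[1, 0, 0], [0, 1, 1]]
# Image 1: P = 1/2, R = 1 -> F2 = 5/6; image 2: P = 1, R = 1/2 -> F2 = 5/9
# fbeta(probs, targets) -> 25/36 ≈ 0.694
```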

In [0]:
arch = models.resnet50

In [0]:
# Thresholded metrics for multi-label targets; unused here because the labels are single-label
# acc_02 = partial(accuracy_thresh, thresh=0.2)
# f_score = partial(fbeta, thresh=0.2)
learn = cnn_learner(data, arch, metrics=accuracy)

We use the LR Finder to pick a good learning rate.
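The LR Finder trains for a short while with a learning rate that grows exponentially (fastai v1's `lr_find` defaults are, as far as we know, `start_lr=1e-7`, `end_lr=10`, `num_it=100`) and records the loss at each step. A common heuristic is to pick a rate where the loss is falling fastest, well before it blows up. The sketch below shows the sweep and that heuristic in pure Python. It skips the loss smoothing fastai applies, and the losses in the example are made up.

```python
import math


def lr_sweep(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Learning rates growing exponentially (evenly on a log scale) from start_lr to end_lr."""
    if num_it < 2:
        return [start_lr]
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]


def steepest_descent_lr(lrs, losses):
    """Return the learning rate at which the loss drops fastest per unit of log(lr)."""
    best_i, best_slope = None, float("inf")
    for i in range(1, len(lrs)):
        slope = (losses[i] - losses[i - 1]) / (math.log(lrs[i]) - math.log(lrs[i - 1]))
        if slope < best_slope:
            best_i, best_slope = i, slope
    return lrs[best_i]


lrs = lr_sweep(num_it=5)             # ~[1e-7, 1e-5, 1e-3, 1e-1, 10]
losses = [2.0, 1.9, 1.2, 1.1, 5.0]   # made-up losses: steep drop, plateau, divergence
# steepest_descent_lr(lrs, losses) -> ~1e-3
```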

In [0]:
learn.lr_find()

In [0]:
learn.recorder.plot()

Then we can fit the head of our network.

In [0]:
lr = 0.01

In [0]:
learn.fit_one_cycle(5, slice(lr))
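`fit_one_cycle` does not train at a fixed learning rate: it warms up from a small value to the maximum and then anneals back down. The sketch below generates such a schedule in pure Python, assuming what we understand to be fastai v1's defaults (`div_factor=25`, `pct_start=0.3`, `final_div=div_factor*1e4`, cosine curves in both phases). It leaves out momentum, which fastai also cycles, and `one_cycle_lrs` is a hypothetical name.

```python
import math


def cos_anneal(start, end, pct):
    """Cosine interpolation: returns `start` at pct=0 and `end` at pct=1."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))


def one_cycle_lrs(n_iter, lr_max, div_factor=25.0, pct_start=0.3, final_div=None):
    """Per-iteration learning rates: warm up from lr_max/div_factor to lr_max over the
    first pct_start of iterations, then decay to lr_max/final_div."""
    final_div = div_factor * 1e4 if final_div is None else final_div
    n_up = max(1, int(n_iter * pct_start))
    n_down = n_iter - n_up
    lrs = [cos_anneal(lr_max / div_factor, lr_max, i / n_up) for i in range(n_up)]
    lrs += [cos_anneal(lr_max, lr_max / final_div, i / max(1, n_down - 1)) for i in range(n_down)]
    return lrs


lrs = one_cycle_lrs(100, lr_max=0.01)
# lrs[0] ≈ 0.0004 (0.01 / 25), peak ≈ 0.01 at index 30, lrs[-1] ≈ 4e-8 (0.01 / 250000)
```

The low starting rate keeps the freshly initialised head stable, and the long decay at the end lets the weights settle.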

In [0]:
learn.save('stage-1-rn50')

...And fine-tune the whole model:

In [0]:
learn.unfreeze()

In [0]:
learn.lr_find()
learn.recorder.plot()

In [0]:
learn.fit_one_cycle(5, slice(1e-6, lr/5))
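Passing a `slice` gives each layer group its own learning rate (discriminative learning rates): the earliest layers, which hold the most generic pretrained features, change least. The sketch below reproduces how we understand fastai v1 to spread such a slice. `slice(start, stop)` is spaced geometrically across the groups. `slice(stop)` gives `stop/10` to every group except the last, which gets `stop`. A plain number is used for every group. It assumes three layer groups, which we believe is what `cnn_learner` builds for a ResNet; check `len(learn.layer_groups)`.

```python
def even_mults(start, stop, n):
    """n values spaced geometrically (evenly on a log scale) from start to stop."""
    if n == 1:
        return [stop]
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]


def layer_group_lrs(lr, n_groups):
    """Per-layer-group learning rates for a number or slice (sketch of the idea)."""
    if isinstance(lr, slice):
        if lr.start is not None:
            return even_mults(lr.start, lr.stop, n_groups)
        return [lr.stop / 10] * (n_groups - 1) + [lr.stop]
    return [lr] * n_groups


lr = 0.01
stage1 = layer_group_lrs(slice(lr), 3)            # -> [0.001, 0.001, 0.01]; only the head trains while frozen
stage2 = layer_group_lrs(slice(1e-6, lr / 5), 3)  # -> ~[1e-6, 4.47e-5, 2e-3]
```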

Wow! 97% accuracy. 

In [0]:
learn.save('stage-2-rn50')

In [0]:
learn.recorder.plot_losses()

In [0]:
learn.export()