# iMet 2020 — FGVC7 dataset

## A small test using fastai

Hello World! This is my second submission for Kaggle's [iMet 2020 - FGVC7 competition](https://www.kaggle.com/c/imet-2020-fgvc7/overview). Here we try some approaches taught in fastai's classes. A lot of the inspiration came from other places; you can find the references at the bottom, ok?

* __Notes:__ in this third iteration, we increased the batch size a little bit more, and also the number of epochs in the first and second training rounds.

We start by importing everything from `fastai.vision`. Hey, not very PEP8-ish, but stay with me.

In [None]:
from fastai.vision import *

Let's check if we have a GPU available.

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Now we use the Jupyter magic `%matplotlib inline`, so all figures we generate will appear inside this notebook.

In [None]:
%matplotlib inline

Let's choose a batch size here. In my tests, I had better results with a large batch size (larger than 32), although the literature suggests otherwise (see [4] and [5] in the references). I am not sure why, but I will check it.

In [None]:
batch_size = 400

Setting up our input and output paths, train and test folders.

In [None]:
path_input = Path('/kaggle/input/imet-2020-fgvc7')
folder_train = 'train'
folder_test = 'test'

path_output = Path('/kaggle/working')

Let's take a look at the file `train.csv`. Note that a filename (`id`) can have several labels (`attribute_ids`); these labels are separated from each other by spaces.

In [None]:
train_csv = pd.read_csv(path_input/'train.csv')
train_csv.head()
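
To make the label format concrete, here is a minimal sketch that parses the same layout with the standard library only. The two rows (`img_a`, `img_b`) and their attribute numbers are made up for illustration; they are not taken from the real `train.csv`.

```python
import csv
import io

# Hypothetical two-row sample in the same layout as train.csv.
sample_csv = io.StringIO(
    "id,attribute_ids\n"
    "img_a,448 2429 782\n"
    "img_b,2997\n"
)

# One image id maps to a list of labels, split on the space delimiter.
labels_by_id = {
    row["id"]: row["attribute_ids"].split(" ")
    for row in csv.DictReader(sample_csv)
}

for image_id, labels in labels_by_id.items():
    print(image_id, "has", len(labels), "label(s):", labels)
```

This is the same splitting that `label_delim=' '` asks fastai to do for us later on.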

Now we define our source image list, using `ImageList`.

* First, we set a seed, so the random train/validation split is the same on every run.
* We use `from_csv()` to list the images named in `train.csv`: each `id` is looked up in `folder_train` under `path_input`, with the `.png` suffix added.
* We will also hold out 20% of those images as the validation dataset (`split_by_rand_pct(0.2)`).
* `label_from_df(label_delim=' ')` ensures that labels will be split whenever we have a space between them.

In [None]:
np.random.seed(42)  # the "Answer" is 42! 
source = (ImageList.from_csv(path=path_input,
                             csv_name='train.csv',
                             folder=folder_train,
                             suffix='.png')
                   .split_by_rand_pct(0.2)
                   .label_from_df(label_delim=' ')
         )
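
The idea behind a seeded random split can be sketched in a few lines. This is not fastai's code: as far as I can tell, fastai shuffles indices with NumPy's global random generator (hence `np.random.seed(42)` above), while this sketch uses Python's `random` module, and the function name and the ten fake ids are only for illustration.

```python
import random

def split_by_rand_pct(items, valid_pct=0.2, seed=None):
    """Shuffle a copy of `items` and hold out `valid_pct` of it for validation."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(valid_pct * len(shuffled))
    # The first `cut` shuffled items become validation, the rest training.
    return shuffled[cut:], shuffled[:cut]

image_ids = [f"img_{i:03d}" for i in range(10)]  # hypothetical ids
train_ids, valid_ids = split_by_rand_pct(image_ids, valid_pct=0.2, seed=42)
print(len(train_ids), "training items,", len(valid_ids), "validation items")
```

Calling the function again with the same seed gives the same two lists, which is what makes results comparable between runs.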

Now we define some transforms for data augmentation. We will allow some variation in lighting and zoom, and disable warping.

In [None]:
aug_transforms = get_transforms(max_lighting=0.1,
                                max_zoom=1.05,
                                max_warp=0.)

OK, with source and transforms ready, it's time to create our dataset.
* First, we use `transform()` to apply our augmentation transforms and set an image size of `(128, 128)`.
* Then, we use `databunch()` to group the images into batches of `batch_size` images each (`bs=batch_size`).
* We also use `normalize(imagenet_stats)` to normalize the images with ImageNet's statistics. These are `mean = [0.485, 0.456, 0.406]` and `std = [0.229, 0.224, 0.225]`, as seen in [`fastai.vision.data`](https://github.com/fastai/fastai/blob/aeae6a0ae0cb975d916d43b3e1d71afd42894977/fastai/vision/data.py#L78). Each value applies to one channel of the RGB input images.

In [None]:
data = (source.transform(aug_transforms, size=128)
              .databunch(bs=batch_size)
              .normalize(imagenet_stats)
       )
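
What the normalization does to a single pixel can be sketched as below. It assumes channel values already scaled to the `[0, 1]` range; the helper names are hypothetical, and fastai applies the same arithmetic to whole image tensors rather than to tuples.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Subtract the channel mean and divide by the channel std."""
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))

def denormalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert normalize_pixel, e.g. before displaying an image."""
    return tuple(value * s + m for value, m, s in zip(rgb, mean, std))

mid_gray = (0.5, 0.5, 0.5)
normalized = normalize_pixel(mid_gray)
restored = denormalize_pixel(normalized)
print("normalized:", normalized)
print("restored:  ", restored)
```

A pixel exactly equal to the ImageNet mean maps to zero in every channel, which matches what the pretrained ResNet saw during its original training.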

Great! Our data is ready. Let's see some examples using `data.show_batch()`.

In [None]:
data.show_batch(rows=3, figsize=(10, 10))

Now we copy the pretrained models to the folders where they are expected, since we cannot use the internet.

In [None]:
# copying the pretrained ResNet-50 weights...
!mkdir -p /root/.cache/torch/checkpoints
!cp /kaggle/input/resnet/resnet50-19c8e357.pth /root/.cache/torch/checkpoints/resnet50-19c8e357.pth

# ... and the previously trained models.
!cp /kaggle/input/pretrained-models/imet-stage1_size128.pth /kaggle/working
!cp /kaggle/input/pretrained-models/imet-stage2_size128.pth /kaggle/working

Now we define the architecture we will train on the dataset. We will use the ResNet with 50 layers:

In [None]:
architecture = models.resnet50

Let's define some metrics to evaluate our model.
* Here we use accuracy with a threshold of 0.2 (`accuracy_thresh`), and the F-2 score (`fbeta`), the metric used to evaluate submissions in this competition. `beta`'s default in `fbeta` is already 2.
* `partial()` fixes the `thresh` argument, so fastai can call each metric with only predictions and targets at the end of every epoch.

In [None]:
acc_02 = partial(accuracy_thresh, thresh=0.2) 
f_score = partial(fbeta, thresh=0.2)
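
To see what these two metrics measure, here is a simplified sketch for a single image. It takes probabilities directly (fastai's versions work on batches of tensors and, by default, apply a sigmoid first), and the function names and toy numbers are hypothetical.

```python
def apply_threshold(probabilities, thresh=0.2):
    """Turn probabilities into 0/1 predictions."""
    return [1 if p > thresh else 0 for p in probabilities]

def accuracy_at_threshold(probabilities, targets, thresh=0.2):
    """Fraction of labels (positive or negative) predicted correctly."""
    predicted = apply_threshold(probabilities, thresh)
    hits = sum(int(p == t) for p, t in zip(predicted, targets))
    return hits / len(targets)

def f_beta(probabilities, targets, thresh=0.2, beta=2.0):
    """F-beta score; beta=2 weighs recall more heavily than precision."""
    predicted = apply_threshold(probabilities, thresh)
    true_pos = sum(p * t for p, t in zip(predicted, targets))
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(predicted)
    recall = true_pos / sum(targets)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Toy image with 1000 possible labels: 2 are true, 2 are predicted, 1 overlaps.
targets = [1, 1] + [0] * 998
probabilities = [0.9, 0.0, 0.9] + [0.0] * 997

print("accuracy:", accuracy_at_threshold(probabilities, targets))
print("F2 score:", f_beta(probabilities, targets))
```

With many possible labels and only a few true ones per image, almost every label is a correct "no", so the thresholded accuracy stays close to 1 even when half of the predicted labels are wrong. That may explain why `accuracy_thresh` and `fbeta` look so different in the training logs.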

Time to create our model! We use `cnn_learner` for that. The inputs are our data, the architecture, and the metrics.

In [None]:
learn = cnn_learner(data,
                    architecture,
                    metrics=[acc_02, f_score],
                    model_dir=path_output)

We can use `lr_find()` to check a suitable learning rate. With `suggestion=True`, the plot also marks a suggested value: the learning rate at which the loss has its minimum numerical gradient, that is, where it falls the fastest.

In [None]:
# learn.lr_find()
# learn.recorder.plot(suggestion=True)
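
The "min numerical gradient" idea can be sketched without fastai. This is a simplification under my own assumptions: the learning rates and losses below are invented, the helper names are hypothetical, and the real plot also smooths the losses and skips a few points at both ends of the curve.

```python
def numerical_gradient(values):
    """Central differences inside, one-sided differences at the two ends."""
    last = len(values) - 1
    gradients = []
    for i in range(len(values)):
        if i == 0:
            gradients.append(values[1] - values[0])
        elif i == last:
            gradients.append(values[last] - values[last - 1])
        else:
            gradients.append((values[i + 1] - values[i - 1]) / 2)
    return gradients

def suggest_lr(lrs, losses):
    """Return the learning rate where the loss is falling the fastest."""
    gradients = numerical_gradient(losses)
    steepest = min(range(len(gradients)), key=gradients.__getitem__)
    return lrs[steepest]

# Invented learning-rate sweep: the loss drops, flattens, then blows up.
lrs = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [1.00, 0.90, 0.50, 0.45, 0.60]
suggested = suggest_lr(lrs, losses)
print("suggested learning rate:", suggested)
```

Note that the suggestion is the point of steepest descent, not the point of lowest loss; the lowest loss usually sits too close to where training starts to diverge.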

In [None]:
learning_rate = 3.31E-02  # as suggested by fastai's lr_find()

Let's start fitting! First we fit for 100 epochs. We use the 1cycle policy to fit.

After fitting, we save the partial results using `learn.save()`.

To fit and save, please uncomment the lines below (remove the `#`).

In [None]:
# learn.fit_one_cycle(100, slice(learning_rate))
# learn.save('imet-stage1_size128')
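
For the curious, the shape of a 1cycle learning-rate schedule can be sketched as below. The defaults (`pct_start=0.3`, `div_factor=25`, a final rate of `lr_max / (div_factor * 1e4)`, cosine annealing in both phases) are what I recall from fastai v1, so please treat them as assumptions. The sketch also leaves out the momentum schedule that runs in the opposite direction, and it uses 100 steps only for readability; fastai updates the rate at every batch, not every epoch.

```python
import math

def annealing_cos(start, end, pct):
    """Cosine interpolation from `start` (pct=0) to `end` (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle_lrs(lr_max, total_steps, pct_start=0.3, div_factor=25.0, final_div=None):
    """Warm up to `lr_max`, then anneal down to a very small learning rate."""
    if final_div is None:
        final_div = div_factor * 1e4
    warmup_steps = int(total_steps * pct_start)
    anneal_steps = total_steps - warmup_steps
    low_lr = lr_max / div_factor
    schedule = []
    for step in range(warmup_steps):
        schedule.append(annealing_cos(low_lr, lr_max, step / warmup_steps))
    for step in range(anneal_steps):
        pct = step / max(1, anneal_steps - 1)
        schedule.append(annealing_cos(lr_max, lr_max / final_div, pct))
    return schedule

schedule = one_cycle_lrs(lr_max=3.31e-2, total_steps=100)
for step in (0, 15, 30, 65, 99):
    print(f"step {step:3d}: lr = {schedule[step]:.3e}")
```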

My results during this training stage were:

```
epoch     train_loss  valid_loss  accuracy_thresh  fbeta     time
(...)
97        0.002964    0.003352    0.998843         0.587971  09:15
98        0.002957    0.003352    0.998839         0.588603  09:16
99        0.002953    0.003348    0.998832         0.589520  09:15
```

I am using an NVIDIA Tesla K80 with 12 GB of RAM, on an Ubuntu Linux 20.04 machine. My driver version is 440.64 and my CUDA version is 10.2.

We already fitted this network, so we use `learn.load()` to load it. After loading, we unfreeze the network (making all of its layer groups trainable) to continue fitting a little bit more. A more precise definition of `unfreeze()`, from the [fastai documentation](https://docs.fast.ai/basic_train.html):

> * when we say `unfreeze`, we mean that in the specified layer groups the `requires_grad` of all layers with weights (except BatchNorm layers) are set `True`, so the layer weights will be updated during training.

In [None]:
learn.load('imet-stage1_size128')
learn.save('imet-stage1_size128')
learn.unfreeze()

Now we use `lr_find()` to return a new learning rate suggestion.

In [None]:
# learn.lr_find()
# learn.recorder.plot(suggestion=True)

learning_rate = 1.74E-05  # fastai's lr_find() suggested 8.32E-06 this time. I was sleeping, so I'll try this on a next iteration :)

With the new learning rate, we come back to fitting. Now we fit for 100 more epochs.

To fit and save, please uncomment the lines below (remove the `#`).

In [None]:
# learn.fit_one_cycle(100, slice(learning_rate, learning_rate/10), wd=0.1)
# learn.save('imet-stage2_size128')
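
Passing a `slice` to `fit_one_cycle()` gives each layer group its own learning rate. The sketch below mirrors my reading of how fastai v1 expands a slice; the function names follow fastai's, but this is a standalone re-implementation, and the three layer groups are an assumption about what `cnn_learner` creates for a ResNet.

```python
def even_mults(start, stop, n):
    """Spread `n` values geometrically from `start` to `stop`."""
    step = (stop / start) ** (1 / (n - 1))
    return [start * step**i for i in range(n)]

def lr_range(lr, n_groups=3):
    """Expand a float or a slice into one learning rate per layer group."""
    if not isinstance(lr, slice):
        return [lr] * n_groups
    if lr.start:
        return even_mults(lr.start, lr.stop, n_groups)
    # Only a stop value: the last group gets it, earlier groups get a tenth.
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]

stage1 = lr_range(slice(3.31e-2))
stage2 = lr_range(slice(1.74e-5, 1.74e-5 / 10))
print("stage 1:", ["%.2e" % lr for lr in stage1])
print("stage 2:", ["%.2e" % lr for lr in stage2])
```

If this reading is right, the stage 2 call gives the earliest layer group the largest rate and the head the smallest, because the slice starts above where it stops. The more common pattern in the fastai course is the other way around, `slice(lr/10, lr)`, so that is something to try on a next iteration.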

My results during this training stage were:

```
epoch     train_loss  valid_loss  accuracy_thresh  fbeta     time
(...)
97        0.002891    0.003315    0.998829         0.595562  11:57
98        0.002901    0.003321    0.998842         0.593342  11:56
99        0.002903    0.003327    0.998858         0.590404  11:57
```

We already fitted this network, so we use the command `learn.load()` one more time.

In [None]:
learn.load('imet-stage2_size128')
learn.save('imet-stage2_size128')

We finish our training session by freezing the network and exporting the learner, trained weights included, to `export.pkl`. A definition of `freeze()`, from the [fastai documentation](https://docs.fast.ai/basic_train.html):

> When we say `freeze`, we mean that in the specified layer groups the `requires_grad` of all layers with weights (except BatchNorm layers) are set `False`, so the layer weights won't be updated during training. 

In [None]:
learn.freeze()
learn.export(path_output/'export.pkl')

## Generating `submission.csv`

Finally, we use the trained model to predict on the test data and generate the submission file.

To predict on the test images, we:
* Generate a list with all test images, using `ItemList.from_folder()`.
* Load our learner and point it to the test images, using `load_learner()`.
* Predict (infer) on the test images, using `learn.get_preds()`.

In [None]:
itemlist_test = ItemList.from_folder(path_input/folder_test)
learn = load_learner(path=path_output, test=itemlist_test)
predictions, _ = learn.get_preds(ds_type=DatasetType.Test)

The next steps are:
* Set a probability threshold above which a label counts as predicted.
* Gather the labels whose predicted probability is higher than that threshold, and join them into one space-separated string per image.
* Gather the filenames (without extension) of the processed test images.

In [None]:
threshold = 0.1
labelled_preds = [' '.join(learn.data.classes[idx]
                           for idx, pred in enumerate(prediction) if pred > threshold)
                  for prediction in predictions]
filenames = [f.name[:-4] for f in learn.data.test_ds.items]
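
Here is the same thresholding step on toy data, so each piece can be checked by eye. The class names, the probabilities, and the file name `abc123.png` are made up, and `labels_above_threshold` is a hypothetical helper, not a fastai function.

```python
from pathlib import Path

def labels_above_threshold(probabilities, classes, threshold=0.1):
    """Join the class names whose probability exceeds `threshold`."""
    return ' '.join(label for label, p in zip(classes, probabilities)
                    if p > threshold)

classes = ['13', '51', '405']       # hypothetical attribute ids
probabilities = [0.50, 0.05, 0.20]  # hypothetical model output for one image

label_string = labels_above_threshold(probabilities, classes)
print(repr(label_string))

# Path.stem drops the extension; for a '.png' file it matches f.name[:-4].
test_file = Path('/kaggle/input/imet-2020-fgvc7/test/abc123.png')
image_id = test_file.stem
print(image_id)
```

An image with no probability above the threshold ends up with an empty string, which is worth remembering when picking the threshold value.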

Finally, we create a pandas `DataFrame` to hold the filenames and their predicted labels, and write it to a `.csv` file: the submission file.

In [None]:
data_frame = pd.DataFrame({'id':filenames, 'attribute_ids':labelled_preds},
                           columns=['id', 'attribute_ids'])
data_frame.to_csv(path_output/'submission.csv',
                  index=False)

## References

Some references I used throughout the process. Thank you y'all!

* The basics, and all the structure of the notebook, came from Jeremy Howard's awesome course on `fastai`. Some notes — more succinct, from @PoonamV, and more detailed, from @hiromi — follow as well:

[1] https://course.fast.ai/videos/?lesson=3

[2a] https://forums.fast.ai/t/deep-learning-lesson-3-notes/29829

[2b] https://github.com/hiromis/notes/blob/master/Lesson3.md

* Where I learned how to copy the offline neural network, and the submission code as well:

[3] https://www.kaggle.com/aminyakubu/aptos-2019-blindness-detection-fast-ai

* Why the batch size should not matter. Masters & Luschi (2018) show that a small batch size (up to 32) could even have _better performance_:

[4] https://blog.janestreet.com/does-batch-size-matter/

[5] https://arxiv.org/abs/1804.07612

* The great ResNet paper:

[6] https://arxiv.org/abs/1512.03385

* The 1cycle learning rate policy, from Smith (2018):

[7] https://arxiv.org/abs/1803.09820

* Where I learned how to use trained-elsewhere nets on Kaggle:

[8] https://www.kaggle.com/finlaymacrae/fastai-resnet34-transfer-learning

* The Hitchhiker's Guide to the Galaxy and 42 as the "Answer to the Ultimate Question of Life, the Universe, and Everything":

[42] https://en.wikipedia.org/wiki/42_(number)#The_Hitchhiker's_Guide_to_the_Galaxy

_This notebook is licensed under the BSD 3-Clause License._

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

  * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
  * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.
  * Neither the name of the <organization> nor the
    names of its contributors may be used to endorse or promote products
    derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.