# Stanford Cars - NB9: Mish EfficientNet + Ranger - 5 run avg trial

## TL;DR
- Achieved **93.8%** mean test set accuracy over 5 runs of 40 epochs each on Stanford Cars using Mish EfficientNet-b3 + Ranger
- Beat the EfficientNet paper EfficientNet-b3 result by **0.2%**
- The EfficientNet authors' best result using b3 was 93.6%; their best overall result was 94.8% (current SOTA) with EfficientNet-b7
- Used MEfficientNet-b3, created by swapping the Swish activation function for the **Mish** activation function
- Used the **Ranger** optimisation function (a combination of RAdam and Lookahead) and trained with **FlatCosAnnealScheduler**
- EfficientNet-b3 with Ranger but without Mish gave a test set accuracy of around 93.4% (0.4% lower). It was also much more stable to train than my attempts with RMSProp, the optimiser used in the paper


## Credits: 
- Ranger - @lessw2020
    - Lookahead paper: [Lookahead Optimizer: k steps forward, 1 step back](https://arxiv.org/abs/1907.08610)
    - RAdam paper: [On the Variance of the Adaptive Learning Rate and Beyond, RAdam](https://arxiv.org/abs/1908.03265)
    - @lessw2020 Ranger implementation https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer/blob/master/ranger.py
    - version 9.3.19 used
 
- Mish @digantamisra98
    - Paper: [Mish: A Self Regularized Non-Monotonic Neural Activation Function](https://arxiv.org/abs/1908.08681v1)
    - Mish Repo: https://github.com/digantamisra98/Mish
    - Mish blog: https://medium.com/@lessw/meet-mish-new-state-of-the-art-ai-activation-function-the-successor-to-relu-846a6d93471f
    - Mish code implementation - @lessw2020 - https://github.com/lessw2020/mish/blob/master/mish.py
    
- EfficientNet - @lukemelas
    - EfficientNet PyTorch implementation that I swapped Mish into: https://github.com/lukemelas/EfficientNet-PyTorch

- FlatCosAnnealScheduler - @muellerzr
    - Code taken from the fastai thread below; it is currently being added to the fastai repo

- [Inspirational fastai thread, credit to all the contributors here](https://forums.fast.ai/t/meet-mish-new-activation-function-possible-successor-to-relu/53299/280)


### Training Params used:
   - 40 epochs
   - lr = 15e-4
   - start_pct = 0.10
   - wd = 1e-3
   - bn_wd=False
   - true_wd=True

*Default Ranger params were used*: 
   - alpha=0.5
   - k=6
   - N_sma_threshhold=5
   - betas=(.95,0.999)
   - eps=1e-5
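
To make the `alpha` and `k` parameters above concrete, here is a minimal sketch of the Lookahead update on a single scalar weight. It is an illustration under stated assumptions, not the Ranger implementation: the inner optimiser here is plain SGD (Ranger uses RAdam), and `lookahead_sgd` and `grad_fn` are hypothetical names made up for this example.

```python
def lookahead_sgd(grad_fn, w0, lr=0.1, alpha=0.5, k=6, steps=60):
    """Toy Lookahead on one scalar weight.

    Assumption: the inner ("fast") optimiser is plain SGD, used only to keep
    the sketch short. Ranger wraps RAdam instead.
    """
    slow = w0  # slow weights, updated once every k fast steps
    fast = w0  # fast weights, updated on every step
    for step in range(1, steps + 1):
        fast = fast - lr * grad_fn(fast)  # inner optimiser step
        if step % k == 0:
            # Move the slow weights a fraction alpha towards the fast weights,
            # then restart the fast weights from the new slow weights.
            slow = slow + alpha * (fast - slow)
            fast = slow
    return slow


# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
final_w = lookahead_sgd(lambda w: 2 * (w - 3), w0=0.0)
print(final_w)
```

With `alpha=0.5` the slow weights move halfway towards the fast weights every `k=6` steps, which is the "k steps forward, 1 step back" idea from the Lookahead paper.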

### Augmentations used:
- Image size : 299 x 299
- Standard Fastai transforms from **get_transforms()**: 
     - do_flip = True, max_rotate = 10.0, max_zoom = 1.1, max_lighting = 0.2, max_warp = 0.2, p_affine = 0.75, p_lighting = 0.75
- **ResizeMethod.SQUISH**, which I found worked quite well from testing with ResNet152

### Training Notes
- Unlike the tests on the fastai forums with XResNet and the Imagewoof dataset, this setup performed better with a shorter flat-lr phase (start_pct = 0.10) followed by a longer cosine anneal.
- I used the full test set as the validation set, as was done for Imagewoof in the fastai thread linked above.
- I manually restarted the GPU kernel and changed the run count between runs, as weights seemed to persist from one run to the next. This happened even when using learn.purge() and learn.destroy(). It was mentioned on the forums that the Lookahead element of the Ranger implementation might be responsible, but the problem persisted even with version 9.3.19, which was supposed to address the issue.
- Ran on a Paperspace P4000 machine

### Thanks
Thanks as always to the amazing team at fast.ai and the fastai community! This and the following notebooks are all thanks to fastai's AMAZING MOOC and deep learning library. Check out https://fast.ai for the course and library, you won't regret it!

In [2]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

import pandas as pd
from pathlib import Path
import json
from PIL import ImageDraw, ImageFont
from matplotlib import patches, patheffects
import matplotlib.pyplot as plt

import scipy.io as sio

In [3]:
from fastai import *
from fastai.vision import *
from fastai.utils.mem import *

In [4]:
# @lukemelas EfficientNet implementation: https://github.com/lukemelas/EfficientNet-PyTorch
from efficientnet_pytorch import EfficientNet

# @lessw2020 implementation : https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer/blob/master/ranger.py
# version 9.3.19 used
from ranger import Ranger

from helper_functions import compare_most_confused, compare_top_losses, show_img

## Getting the Data

In [5]:
path = 'data/stanford-cars/'

In [6]:
labels_df = pd.read_csv('labels_df.csv')
labels_df.head(3)

| Unnamed: 0 | filename | bbox_x1 | bbox_y1 | bbox_x2 | bbox_y2 | class_id | class_name | is_test | filename_cropped | bbox_h | bbox_w |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00001.jpg | 39 | 116 | 569 | 375 | 14 | Audi TTS Coupe 2012 | 0 | cropped_00001.jpg | 260 | 531 |
| 1 | 00002.jpg | 36 | 116 | 868 | 587 | 3 | Acura TL Sedan 2012 | 0 | cropped_00002.jpg | 472 | 833 |
| 2 | 00003.jpg | 85 | 109 | 601 | 381 | 91 | Dodge Dakota Club Cab 2007 | 0 | cropped_00003.jpg | 273 | 517 |


Let's look closer at the data: how many class_ids do we have? Does it match the number of class names?
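
One way to answer that question is sketched below. It is a dependency-free illustration that uses only the three rows shown above rather than the full `labels_df`, so the counts it prints describe that sample only. `check_label_mapping` is a hypothetical helper, not something from the notebook.

```python
import csv
import io
from collections import defaultdict

# Only the three rows displayed above; the real labels_df.csv has many more.
sample_csv = """filename,class_id,class_name
00001.jpg,14,Audi TTS Coupe 2012
00002.jpg,3,Acura TL Sedan 2012
00003.jpg,91,Dodge Dakota Club Cab 2007
"""


def check_label_mapping(rows):
    """Return (number of ids, number of names, whether ids and names map one-to-one)."""
    names_per_id = defaultdict(set)
    ids_per_name = defaultdict(set)
    for row in rows:
        names_per_id[row["class_id"]].add(row["class_name"])
        ids_per_name[row["class_name"]].add(row["class_id"])
    one_to_one = (all(len(v) == 1 for v in names_per_id.values())
                  and all(len(v) == 1 for v in ids_per_name.values()))
    return len(names_per_id), len(ids_per_name), one_to_one


rows = list(csv.DictReader(io.StringIO(sample_csv)))
n_ids, n_names, one_to_one = check_label_mapping(rows)
print(n_ids, n_names, one_to_one)
```

On the full dataframe the same check can be done with pandas, for example by comparing `labels_df['class_id'].nunique()` with `labels_df['class_name'].nunique()`.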

## Data Loading
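`get_data` applies the standard fastai image transforms and returns two databunches: `data`, which holds out 20% of the training data for validation, and `data_test`, which uses the full test set as the validation set.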

In [7]:
def get_data(SZ:int=299, do_cutout:bool=False, p_cutout:float=0.75):
    SEED = 42
    LABEL = 'class_name'
    
    if do_cutout:
        cutout_tfm = cutout(n_holes=(1,2), length=(100, 100), p=p_cutout)
        car_tfms = get_transforms(xtra_tfms=[cutout_tfm])
    else: car_tfms = get_transforms()

    #tfms = get_transforms()

    trn_labels_df = labels_df.loc[labels_df['is_test']==0, ['filename', 'class_name', 'class_id']].copy()

    src = (ImageList.from_df(trn_labels_df, path, folder='train', cols='filename')
                        .split_by_rand_pct(valid_pct=0.2, seed=SEED)
                        .label_from_df(cols=LABEL))

    data = (src.transform(car_tfms, 
                          size=SZ,  
                          resize_method=ResizeMethod.SQUISH, 
                          padding_mode='reflection')
                .databunch()
                .normalize(imagenet_stats))
    
    # Get test data
    TEST_SZ = 299
    src_test = (ImageList.from_df(labels_df, path, folder='merged', cols='filename')
           # the 'is_test' column has values of 1 for the test set
           .split_from_df(col='is_test')
           .label_from_df(cols=LABEL))

    data_test = (src_test.transform(car_tfms, 
                                  size=SZ,  
                                  resize_method=ResizeMethod.SQUISH, 
                                  padding_mode='reflection')
                .databunch()
                .normalize(imagenet_stats))
    
    return data, data_test, src, src_test, car_tfms

data, data_test, src, src_test, car_tfms = get_data(do_cutout=False)

### Flat and cosine annealer

In [1]:
# By @muellerzr on the fastai forums:
# https://forums.fast.ai/t/meet-mish-new-activation-function-possible-successor-to-relu/53299/133       

from fastai.callbacks import *

def FlatCosAnnealScheduler(learn, lr:float=4e-3, tot_epochs:int=1, moms:Floats=(0.95,0.999),
                          start_pct:float=0.72, curve='cosine'):
    "Manage FCFit training as found in the ImageNette experiments"
    n = len(learn.data.train_dl)
    anneal_start = int(n * tot_epochs * start_pct)
    batch_finish = ((n * tot_epochs) - anneal_start)
    if curve=="cosine":
        curve_type=annealing_cos
    elif curve=="linear":
        curve_type=annealing_linear
    elif curve=="exponential":
        curve_type=annealing_exp
    else:
        raise ValueError(f"annealing type not supported {curve}")

    phase0 = TrainingPhase(anneal_start).schedule_hp('lr', lr).schedule_hp('mom', moms[0])
    phase1 = TrainingPhase(batch_finish).schedule_hp('lr', lr, anneal=curve_type).schedule_hp('mom', moms[1])
    phases = [phase0, phase1]
    return GeneralScheduler(learn, phases)
                
def fit_fc(learn:Learner, tot_epochs:int=None, lr:float=defaults.lr,  moms:Tuple[float,float]=(0.95,0.85), start_pct:float=0.72,
                  wd:float=None, callbacks:Optional[CallbackList]=None, show_curve:bool=False)->None:
    "Fit a model with Flat Cosine Annealing"
    max_lr = learn.lr_range(lr)
    callbacks = listify(callbacks)
    callbacks.append(FlatCosAnnealScheduler(learn, lr, moms=moms, start_pct=start_pct, tot_epochs=tot_epochs))
    learn.fit(tot_epochs, max_lr, wd=wd, callbacks=callbacks)
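
To show the shape of the schedule that `fit_fc` produces, here is a small standalone sketch of the learning rate per iteration. It assumes the cosine phase anneals from `lr` down to 0, and `flat_cos_lr` is a hypothetical helper written for this illustration; it does not call fastai.

```python
import math


def flat_cos_lr(iteration, total_iters, lr=15e-4, start_pct=0.10):
    """Learning rate at a given iteration: flat, then cosine-annealed.

    Assumptions: the anneal ends at 0 and start_pct < 1.
    """
    anneal_start = int(total_iters * start_pct)
    if iteration < anneal_start:
        return lr  # flat phase
    # Fraction of the annealing phase completed, from 0.0 to 1.0.
    pct = (iteration - anneal_start) / (total_iters - anneal_start)
    return lr / 2 * (math.cos(math.pi * pct) + 1)


# Print a few points of a 1000-iteration schedule with the settings used here.
for it in (0, 99, 100, 550, 1000):
    print(it, flat_cos_lr(it, total_iters=1000))
```

With `start_pct=0.10`, the first 10% of iterations run at the full learning rate and the remaining 90% follow the cosine curve, which is the "short flat, long anneal" setup described in the training notes.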

## Save Metrics

In [11]:
def save_metrics_to_csv(exp_name, run_count, learn, metrics):
    for m in metrics:
        name = f'{m}_{exp_name}_run{str(run_count)}_2019-09_04'

        ls = []
        if m == 'val_loss_and_acc':
            acc = []
            for l in learn.recorder.metrics:
                 acc.append(l[0].item())
            ls = learn.recorder.val_losses 

            d = {name: ls, 'acc': acc}
            df = pd.DataFrame(d)
            #df.columns = [name, 'acc']
        elif m == 'trn_loss':
            for l in learn.recorder.losses:
                ls.append(l.item())
            df = pd.DataFrame(ls)
            df.columns = [name]

        df.to_csv(f'{name}_{m}.csv')
        print(df.head())

# MEfficientNet + Ranger Trial

In [3]:
# Modified version of @lukemelas' EfficientNet implementation with Mish instead of Swish activation
from MEfficientNet_PyTorch.efficientnet_pytorch import EfficientNet as MEfficientNet

effnet_b3 = 'efficientnet-b3'
def getModel(data, model_name):
    model = MEfficientNet.from_pretrained(model_name)
    model._fc = nn.Linear(1536, data.c)
    return model

mish_model = getModel(data, effnet_b3) 
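
For reference, the two activations being swapped can be written out as scalar functions. This is a plain-Python sketch of the formulas (Mish is `x * tanh(softplus(x))`, Swish is `x * sigmoid(x)`), not the PyTorch module used inside MEfficientNet. The helper names are my own.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mish(x):
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def swish(x):
    """Swish: x * sigmoid(x), the default EfficientNet activation."""
    return x * sigmoid(x)


# Compare the two activations at a few sample inputs.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  mish={mish(x):+.4f}  swish={swish(x):+.4f}")
```

Both functions are smooth and non-monotonic, and both dip slightly below zero for negative inputs.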

In [14]:
exp_name = 'mefficient_b3_ranger_40e_15e4_wd1e-3_10pct_start'
metrics = ['trn_loss', 'val_loss_and_acc']

#Adding Mish activation to EfficientNet-b3 meant reducing bs from 32 -> 24
data_test.batch_size = 24

# Manually restarted the gpu kernel and changed the run count as weights seemed to be being saved between runs
run_count = 5

learn = Learner(data_test, 
                model=mish_model,
                wd = 1e-3,
                opt_func=Ranger,
                bn_wd=False,
                true_wd=True,
                metrics=[accuracy],
                loss_func=LabelSmoothingCrossEntropy()
               ).to_fp16()

fit_fc(learn, tot_epochs=40, lr=15e-4, start_pct=0.10, wd=1e-3, show_curve=False)

learn.save(f'9_{exp_name}_run{run_count}')

# SAVE METRICS
save_metrics_to_csv(exp_name, run_count, learn, metrics)

| epoch | train_loss | valid_loss | accuracy | time |
|---|---|---|---|---|
| 0 | 3.728042 | 2.873347 | 0.458774 | 04:38 |
| 1 | 1.944554 | 1.743006 | 0.74543 | 04:38 |
| 2 | 1.516537 | 1.591039 | 0.792439 | 04:38 |
| 3 | 1.334032 | 1.482762 | 0.830991 | 04:38 |
| 4 | 1.264926 | 1.473987 | 0.829126 | 04:38 |
| 5 | 1.249027 | 1.411017 | 0.861957 | 04:38 |
| 6 | 1.1902 | 1.417674 | 0.851884 | 04:38 |
| 7 | 1.150759 | 1.384996 | 0.85947 | 04:38 |
| 8 | 1.098737 | 1.330367 | 0.875762 | 04:37 |
| 9 | 1.102312 | 1.352557 | 0.877254 | 04:39 |


   trn_loss_mefficient_b3_ranger_40e_15e4_wd1e-3_10pct_start_run5_2019-09_04
0                                           5.269136                        
1                                           5.274109                        
2                                           5.290759                        
3                                           5.294561                        
4                                           5.287923                        
   val_loss_and_acc_mefficient_b3_ranger_40e_15e4_wd1e-3_10pct_start_run5_2019-09_04  \
0                                           2.873347                                   
1                                           1.743006                                   
2                                           1.591039                                   
3                                           1.482762                                   
4                                           1.473987                                   

        a

## Mean Test Set Accuracy

In [4]:
aa = pd.read_csv('val_loss_and_acc_mefficient_ranger_40e_15e4_wd1e-3_10pct_start_run1_2019-09_04_val_loss_and_acc.csv')
a = pd.read_csv('val_loss_and_acc_mefficient_b3_ranger_40e_15e4_wd1e-3_10pct_start_run2_2019-09_04_val_loss_and_acc.csv')
b = pd.read_csv('val_loss_and_acc_mefficient_b3_ranger_40e_15e4_wd1e-3_10pct_start_run3_2019-09_04_val_loss_and_acc.csv')
c = pd.read_csv('val_loss_and_acc_mefficient_b3_ranger_40e_15e4_wd1e-3_10pct_start_run4_2019-09_04_val_loss_and_acc.csv')
d = pd.read_csv('val_loss_and_acc_mefficient_b3_ranger_40e_15e4_wd1e-3_10pct_start_run5_2019-09_04_val_loss_and_acc.csv')

In [6]:
(aa['acc'][39:].values[0] 
 + a['acc'][39:].values[0] 
 + b['acc'][39:].values[0]
 + c['acc'][39:].values[0]
 + d['acc'][39:].values[0]) / 5

0.9380425333976745
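
The cell above takes the accuracy at index 39 (the last of the 40 epochs) from each run and averages the five values. A more general sketch of the same calculation, which also reports the spread between runs, might look like the following. The accuracy values in it are placeholders chosen for illustration, not results from this notebook, and `summarise_final_accuracy` is a hypothetical helper.

```python
from statistics import mean, stdev


def summarise_final_accuracy(runs):
    """Return (mean, sample standard deviation) of the last-epoch accuracy.

    `runs` is a list of per-epoch accuracy lists, one list per run.
    """
    finals = [acc[-1] for acc in runs]
    return mean(finals), stdev(finals)


# PLACEHOLDER per-epoch accuracies for three imaginary two-epoch runs.
placeholder_runs = [
    [0.50, 0.90],
    [0.52, 0.92],
    [0.51, 0.91],
]

mean_acc, std_acc = summarise_final_accuracy(placeholder_runs)
print(mean_acc, std_acc)
```

With the CSVs loaded above, each run's list could be obtained with something like `aa['acc'].tolist()`. Reporting the standard deviation alongside the mean shows how much the runs disagree, which is useful context for a gap as small as 0.2%.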