# Dog Breed Identification
* Video starts at 1h16m. https://youtu.be/JNxcznsrRb8?t=1h16m59s
* https://www.kaggle.com/c/dog-breed-identification
* http://www.github.com/floydwch/kaggle-cli

This demo uses the Dog Breed Identification competition on Kaggle. The data is downloaded from Kaggle with `kaggle-cli`.

#### To install Kaggle-cli
`pip install kaggle-cli`
#### To download data files
`kg download -u <username> -p <password> -c <competition>`<br/>
Replace `<competition>` with `dog-breed-identification`.

### [Official pretrained models](https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/discussion/31862):
* DenseNet: https://github.com/flyyufelix/DenseNet-Keras
* ResNet-101: https://gist.github.com/flyyufelix/65018873f8cb2bbe95f429c474aa1294
* ResNet-152: https://gist.github.com/flyyufelix/7e2eafb149f72f4d38dd661882c554a6
* SqueezeNet: https://github.com/rcmalli/keras-squeezenet
* Inception v4: https://github.com/titu1994/Inception-v4/releases
* VGG16: https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3
* VGG19: https://gist.github.com/baraldilorenzo/8d096f48a1be4a2d660d
* Other Keras models: https://keras.io/applications/

### Overview of the Steps
1. Enable data augmentation, and `precompute=True`
1. Use `lr_find()` to find highest learning rate where loss is still clearly improving
1. Train last layer from precomputed activations for 1-2 epochs
1. Train last layer with data augmentation (i.e. `precompute=False`) for 2-3 epochs with `cycle_len=1`
1. Unfreeze all layers
1. Set earlier layers to 3x-10x lower learning rate than next higher layer
1. Use `lr_find()` again
1. Train full network with `cycle_mult=2` until over-fitting
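
Step 6 above can be sketched in a few lines of plain Python. The helper name `layer_group_lrs`, the choice of three layer groups, and the factor of 10 are assumptions made for illustration; they are not taken from the library. The sketch only shows how a "3x-10x lower per earlier group" rule turns one top learning rate into a list of rates.

```python
def layer_group_lrs(top_lr, n_groups=3, factor=10.0):
    """Return one learning rate per layer group, earliest group first.

    Hypothetical helper: each earlier group gets a rate `factor` times
    lower than the group after it; the last group keeps `top_lr`.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    return [top_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]


# Example: the last layers train at 1e-2, earlier groups at 10x lower each.
lrs = layer_group_lrs(1e-2, n_groups=3, factor=10.0)
print(lrs)
```

A factor near 3 suits data that resembles the pretraining images, since the early layers then need less adjustment; a factor near 10 keeps the early layers almost fixed.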

In [1]:
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
from fastai.imports import * 
from fastai.torch_imports import * 
from fastai.transforms import * 
from fastai.conv_learner import * 
from fastai.model import * 
from fastai.dataset import * 
from fastai.sgdr import * 
from fastai.plots import *
import os
import pandas as pd

In [None]:
torch.cuda.set_device(1)

In [3]:
PATH = "data/dogbreed/"
size = 224 
arch = resnext101_64 
batch_size = 58

In [None]:
label_csv = os.path.join(PATH, 'labels.csv')
n = len(list(open(label_csv)))-1
val_idxs = get_cv_idxs(n)
print(n)
print(len(val_idxs))
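
The cell above relies on `get_cv_idxs` to choose which rows become the validation set. Its internals are not shown here, so the following is only a minimal sketch of the idea, assuming a random 20% hold-out with a fixed seed. The function name `sample_val_idxs`, the 20% default, and the seed are illustrative assumptions, and the sketch uses the standard library rather than the library's own implementation.

```python
import random


def sample_val_idxs(n, val_pct=0.2, seed=42):
    """Pick a reproducible random subset of row indexes for validation.

    Hypothetical stand-in for a validation-index helper: shuffle the
    indexes 0..n-1 with a fixed seed and keep the first val_pct share.
    """
    rng = random.Random(seed)
    idxs = list(range(n))
    rng.shuffle(idxs)
    return idxs[:int(val_pct * n)]


# Toy example with 1000 rows (not the real dataset size).
val = sample_val_idxs(1000)
print(len(val))
```

Fixing the seed matters because the same rows must stay in the validation set when the data object is rebuilt at a different image size.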

## 2 Initial exploration

In [4]:
!ls $PATH

ls: data/dogbreed/: No such file or directory


In [None]:
label_df = pd.read_csv(label_csv)

In [None]:
label_df.head()

In [None]:
label_df.pivot_table(index='breed', aggfunc=len).sort_values('id', ascending=False)

In [None]:
tfms = tfms_from_model(arch, size, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_csv(PATH, 'train', label_csv, test_name='test', val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=batch_size)

In [None]:
fn = PATH + data.trn_ds.fnames[0]; fn

In [None]:
img = PIL.Image.open(fn); img

In [None]:
img.size

In [None]:
size_d = {k: PIL.Image.open(PATH+k).size for k in data.trn_ds.fnames}

In [None]:
row_size, col_size = list(zip(*size_d.values()))
row_size = np.array(row_size); col_size = np.array(col_size)
row_size[:5]


In [None]:
plt.hist(row_size);

In [None]:
plt.hist(row_size[row_size < 1000])

In [None]:
plt.hist(col_size)

In [None]:
plt.hist(col_size[col_size < 1000])

In [None]:
len(data.trn_ds), len(data.test_ds)

In [None]:
len(data.classes), data.classes[:5]

## 3 Initial model

In [None]:
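def get_data(size, batch_size):
    # Build the transforms and the data object for the given image size and batch size
    tfms = tfms_from_model(arch, size, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_csv(PATH, 'train', label_csv, test_name='test', num_workers=4,
                                        val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=batch_size)
    # For smaller sizes, work from resized copies of the images stored under 'tmp'
    return data if size > 300 else data.resize(340, 'tmp')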

### 3.1 Precompute

In [None]:
data = get_data(size, batch_size)

In [None]:
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(1e-2, 5)

### 3.2 Augmentation

In [None]:
from sklearn import metrics

In [None]:
data = get_data(size, batch_size)

In [None]:
learn = ConvLearner.pretrained(arch, data, precompute=True, ps=0.5)

In [None]:
learn.fit(1e-2, 2)

In [None]:
learn.save('224_pre')

In [None]:
learn.load('224_pre')

### 3.3 Increase size

In [None]:
learn.set_data(get_data(299, batch_size))
learn.freeze()

In [None]:
learn.fit(1e-2, 3, cycle_len=1)

In [None]:
learn.fit(1e-2, 3, cycle_len=1, cycle_mult=2)
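
The two `fit` calls above differ only in `cycle_mult`, which changes how many epochs are run in total. A small sketch of the arithmetic, assuming the second positional argument is the number of cycles and that each cycle is `cycle_mult` times longer than the previous one; the helper `cycle_lengths` is hypothetical and not part of the library.

```python
def cycle_lengths(n_cycles, cycle_len=1, cycle_mult=1):
    """Length in epochs of each restart cycle (hypothetical helper).

    The first cycle lasts cycle_len epochs; each later cycle is
    cycle_mult times longer than the one before it.
    """
    return [cycle_len * cycle_mult ** i for i in range(n_cycles)]


# 3 cycles, cycle_len=1, no multiplier: three one-epoch cycles.
flat = cycle_lengths(3, cycle_len=1)

# 3 cycles, cycle_len=1, cycle_mult=2: cycles of 1, 2 and 4 epochs.
lengths = cycle_lengths(3, cycle_len=1, cycle_mult=2)
total_epochs = sum(lengths)
print(flat, lengths, total_epochs)
```

Under these assumptions the second call trains for more than twice as long as the first, which is worth keeping in mind when estimating run time.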

In [None]:
log_preds, y = learn.TTA() 
probs = np.exp(log_preds)  
accuracy(log_preds, y), metrics.log_loss(y, probs)
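
The cell above exponentiates the log-predictions before passing them to `metrics.log_loss`. As a rough illustration of what that metric measures, here is a standard-library sketch of multi-class log loss and accuracy computed directly from log-probabilities. The function names and the toy values are made up for the example and do not come from the notebook.

```python
import math


def log_loss_from_log_probs(log_probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(row[y] for row, y in zip(log_probs, labels)) / len(labels)


def accuracy_from_log_probs(log_probs, labels):
    """Share of rows whose highest-scoring class equals the label."""
    hits = sum(max(range(len(row)), key=row.__getitem__) == y
               for row, y in zip(log_probs, labels))
    return hits / len(labels)


# Toy predictions for two images over three classes (illustrative values).
toy_probs = [[0.7, 0.2, 0.1],
             [0.1, 0.8, 0.1]]
toy_log_probs = [[math.log(p) for p in row] for row in toy_probs]
toy_labels = [0, 1]

loss = log_loss_from_log_probs(toy_log_probs, toy_labels)
acc = accuracy_from_log_probs(toy_log_probs, toy_labels)
print(loss, acc)
```

Log loss penalises confident wrong answers heavily, so it can keep changing between runs even when accuracy stays the same.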

In [None]:
learn.save('299_pre')

In [None]:
learn.load('299_pre')

In [None]:
learn.fit(1e-2, 1, cycle_len=2)

In [None]:
learn.save('299_pre')

In [None]:
log_preds, y = learn.TTA() 
probs = np.exp(log_preds) 
accuracy(log_preds, y), metrics.log_loss(y, probs)