## Let's train a classifier to tell apart celebs who look alike

First of all, thanks to Jeremy Howard, the Fastai team, and the community. I have learned so much from them. This would not be possible without them and their awesome Fastai library.

In [None]:
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
import sys
sys.path.append("../../")

In [None]:
# This file contains all the main external libs we'll use
from fastai.imports import *

In [None]:
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

In [None]:
PATH = "../../storage/jessoreva/"

Make sure CUDA and cuDNN are available:

In [None]:
torch.cuda.is_available()

In [None]:
torch.backends.cudnn.enabled

## Explore the data

The Fastai library assumes that you have *train* and *valid* directories, and that each of them has one subdirectory per class you wish to recognize (in this case, 'eva' and 'jess').
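
As a quick sanity check of that layout, a small helper along these lines can count the files per split and class. This is only a sketch: `count_images` is a hypothetical helper, not part of Fastai, and it assumes the class folders are named `eva` and `jess` as described above.

```python
import os

def count_images(path, splits=("train", "valid"), classes=("eva", "jess")):
    """Return {split: {class: n_files}}; raise if an expected folder is missing."""
    counts = {}
    for split in splits:
        counts[split] = {}
        for cls in classes:
            folder = os.path.join(path, split, cls)
            if not os.path.isdir(folder):
                raise FileNotFoundError(f"expected folder not found: {folder}")
            counts[split][cls] = len(os.listdir(folder))
    return counts
```

Calling `count_images(PATH)` before training would surface a missing or misnamed folder early, and also shows how balanced the two classes are.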

In [None]:
os.listdir(PATH)

In [None]:
os.listdir(f'{PATH}valid')

In [None]:
files = os.listdir(f'{PATH}valid/eva')[:5]
files

In [None]:
img = plt.imread(f'{PATH}valid/eva/{files[0]}')
plt.imshow(img);

In [None]:
img.shape

In [None]:
img[:4,:4]

## Check to see what a pretrained resnet34 can get us

In [None]:
# Uncomment the below if you need to reset your precomputed activations
# shutil.rmtree(f'{PATH}tmp', ignore_errors=True)

In [None]:
arch=resnet34
sz=128
data = ImageClassifierData.from_paths(PATH, bs=2, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)

In [None]:
lrf=learn.lr_find()

In [None]:
learn.sched.plot_lr()

One iteration = one *minibatch* step of SGD. One epoch consists of (num_train_samples / batch_size) iterations.
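
For a concrete feel of the numbers: the count of iterations in one epoch is the number of training samples divided by the batch size, rounded up. The sketch below assumes a hypothetical training set of 100 images and that the last, smaller batch is kept rather than dropped; `iterations_per_epoch` is an illustrative helper, not a Fastai function.

```python
import math

def iterations_per_epoch(num_train_samples, batch_size):
    """Number of minibatches (SGD steps) needed to see every training sample once."""
    return math.ceil(num_train_samples / batch_size)

print(iterations_per_epoch(100, 2))   # 50 steps with the bs=2 used above
print(iterations_per_epoch(100, 64))  # 2 steps (the last batch is smaller)
```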

In [None]:
learn.sched.plot()

In [None]:
learn.fit(0.0001, 12)

In [None]:
lrf=learn.lr_find()
learn.sched.plot()

In [None]:
learn.fit(0.00003, 5)

- The results are garbage if we just use the pretrained model!
- Note: since *precompute=True*, data augmentation has no effect.
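
Why augmentation has no effect with precomputed activations can be illustrated with a toy cache. This is a conceptual sketch only, not Fastai's implementation: `backbone`, `augment`, and `CachedFeatures` are hypothetical stand-ins for the frozen convolutional layers, an augmentation, and the activation cache.

```python
def backbone(image):
    """Stand-in for the frozen conv layers: any deterministic function of the pixels."""
    return sum(image) / len(image)

def augment(image):
    """Stand-in for an augmentation: mirror the pixel row and brighten it slightly."""
    return [p + 0.1 for p in reversed(image)]

class CachedFeatures:
    """Computes backbone activations once per image index and reuses them afterwards."""
    def __init__(self):
        self.cache = {}

    def features(self, idx, image):
        if idx not in self.cache:
            self.cache[idx] = backbone(image)
        return self.cache[idx]

image = [0.0, 0.5, 1.0]
cached = CachedFeatures()
first = cached.features(0, image)
second = cached.features(0, augment(image))   # the augmented pixels are never looked at

print(first == second)                               # True: the cached activation is reused
print(backbone(image) == backbone(augment(image)))   # False: recomputing would see the change
```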

## Improving the model

### Let's try Data augmentation

In [None]:
sz=128
tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on, max_zoom=1.1)

In [None]:
def get_augs():
    data = ImageClassifierData.from_paths(PATH, bs=2, tfms=tfms, num_workers=1)
    x,_ = next(iter(data.aug_dl))
    return data.trn_ds.denorm(x)[1]

ims = np.stack([get_augs() for i in range(6)])
plots(ims, rows=2)

Next, create a new `data` object that includes this augmentation in the transforms.

In [None]:
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(arch, data, precompute=True)

In [None]:
learn.fit(4e-4, 1)

In [None]:
learn.precompute=False  # activations are now recalculated in every forward pass, so data augmentation takes effect :)

By default when we create a learner, it sets all but the last layer to *frozen*. That means that it's still only updating the weights in the last layer when we call `fit`.

In [None]:
learn.freeze()

In [None]:
learn.fit(4e-4, 4, cycle_len=1, cycle_mult=2)   # 4 cycles: the first lasts 1 epoch, and each later cycle is twice as long as the previous one

In [None]:
learn.sched.plot_lr()

- Validation loss isn't improving much, so there's probably no point further training the last layer on its own.
- Note, up to this point, learn.freeze() is in effect which means only the last layer is being trained.
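
The schedule behind the `fit` call above can be sketched in a few lines. This is a minimal illustration of cosine annealing with restarts, assuming the learning rate decays from its starting value toward zero within each cycle and then jumps back up; `cycle_lengths` and `sgdr_lr` are hypothetical helpers, not Fastai functions.

```python
import math

def cycle_lengths(n_cycles, cycle_len=1, cycle_mult=1):
    """Length in epochs of each cycle: cycle_len, cycle_len*cycle_mult, ..."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycles)]

def sgdr_lr(base_lr, iteration, iters_in_cycle):
    """Cosine-annealed learning rate at a given iteration within one cycle."""
    return base_lr / 2 * (1 + math.cos(math.pi * iteration / iters_in_cycle))

lengths = cycle_lengths(4, cycle_len=1, cycle_mult=2)
print(lengths, sum(lengths))   # [1, 2, 4, 8] 15
```

With `cycle_len=1` and `cycle_mult=2`, four cycles therefore add up to 15 epochs in total, with a learning-rate restart at the start of each cycle.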

In [None]:
learn.save('128_lastlayer')

In [None]:
learn.load('128_lastlayer')

### Unfreeze and Differential Learning Rates

*Unfreeze* all the layers (this allows the weights in every layer to be updated).

In [None]:
learn.unfreeze()

Below are some wise words from Jeremy :) (his example learning rates differ slightly from the `lr` values used in this notebook):


Note that the other layers have *already* been trained to recognize imagenet photos (whereas our final layers were randomly initialized), so we want to be careful not to destroy the carefully tuned weights that are already there.

Generally speaking, the earlier layers (as we've seen) have more general-purpose features. Therefore we would expect them to need less fine-tuning for new datasets. For this reason we will use different learning rates for different layers: the first few layers will be at 1e-4, the middle layers at 1e-3, and our FC layers we'll leave at 1e-2 as before. We refer to this as *differential learning rates*, although there's no standard name for this technique in the literature that we're aware of.
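
To make the idea concrete, here is a minimal sketch of one plain SGD update in which each layer group gets its own learning rate. It is not how Fastai implements this internally; `sgd_step` and the toy weights and gradients are hypothetical, and the rates are the example values quoted above.

```python
def sgd_step(layer_groups, grads, lrs):
    """One plain SGD update where each layer group uses its own learning rate."""
    return [
        [w - lr * g for w, g in zip(weights, group_grads)]
        for weights, group_grads, lr in zip(layer_groups, grads, lrs)
    ]

# Toy weights and gradients for three groups: early layers, middle layers, final layers
layer_groups = [[1.0, 1.0], [1.0], [1.0]]
grads        = [[1.0, -1.0], [1.0], [1.0]]
lrs          = [1e-4, 1e-3, 1e-2]

updated = sgd_step(layer_groups, grads, lrs)
print(updated)
```

For the same gradient magnitude, the early-layer weights move 100 times less than the final-layer weights, which is what protects the pretrained features.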

In [None]:
lr=np.array([1e-4,3e-4,9e-3])

In [None]:
learn.fit(lr, 3, cycle_len=1, cycle_mult=2)

In [None]:
learn.fit(lr, 3, cycle_len=1, cycle_mult=2)

In [None]:
learn.sched.plot_loss()

Note that `learn.sched.plot_lr()` plots only the learning rate of the *final layers*. The learning rates of the earlier layers stay at fixed fractions of the final-layer rate, as initially requested: since we set `lr=np.array([1e-4,3e-4,9e-3])`, the first layers have a 90x smaller and the middle layers a 30x smaller learning rate.

In [None]:
learn.save('128_all')

In [None]:
learn.load('128_all')

### Change the size to 256 and repeat (sz=256)

In [None]:
sz=256
tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on, max_zoom=1.1)

In [None]:
ims = np.stack([get_augs() for i in range(8)])
plots(ims, rows=2)

In [None]:
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn.set_data(data)
learn.freeze()
learn.fit(lr, 3, cycle_len=1, cycle_mult=2)

In [None]:
learn.unfreeze()
learn.fit(lr, 3, cycle_len=1, cycle_mult=2)

In [None]:
learn.fit(lr, 3, cycle_len=1, cycle_mult=2)

There is something else we can do with data augmentation: use it at *inference* time (also known as *test* time). Not surprisingly, this is known as *test time augmentation*, or just *TTA*.

TTA simply makes predictions not just on the images in your validation set, but also makes predictions on a number of randomly augmented versions of them too (by default, it uses the original image along with 4 randomly augmented versions). It then takes the average prediction from these images, and uses that. To use TTA on the validation set, we can use the learner's `TTA()` method.
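
The averaging step can be sketched without any library. This is an illustration under stated assumptions, not Fastai's code: `tta_average` is a hypothetical helper, and the toy log-probabilities are made up.

```python
import math

def tta_average(log_preds):
    """Average class probabilities over augmented copies.

    log_preds[a][i][c] is the log-probability of class c for image i
    under augmentation a (a = 0 is the original image).
    """
    n_aug = len(log_preds)
    n_images = len(log_preds[0])
    n_classes = len(log_preds[0][0])
    return [
        [sum(math.exp(log_preds[a][i][c]) for a in range(n_aug)) / n_aug
         for c in range(n_classes)]
        for i in range(n_images)
    ]

# Two versions (original + one augmented copy) of a single image, two classes
toy_log_preds = [
    [[math.log(0.8), math.log(0.2)]],
    [[math.log(0.6), math.log(0.4)]],
]
toy_probs = tta_average(toy_log_preds)
print(toy_probs)   # approximately [[0.7, 0.3]]
```

This is the same computation as `np.mean(np.exp(log_preds), 0)`: exponentiate first, then average over the augmentation axis.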

In [None]:
log_preds,y = learn.TTA()
probs = np.mean(np.exp(log_preds),0)

In [None]:
accuracy_np(probs, y)

## Analyzing results: looking at pictures

As well as looking at the overall metrics, it's also a good idea to look at examples of each of:
1. A few correct labels at random
2. A few incorrect labels at random
3. The most correct labels of each class (i.e. those with highest probability that are correct)
4. The most incorrect labels of each class (i.e. those with highest probability that are incorrect)
5. The most uncertain labels (i.e. those with probability closest to 0.5).

In [None]:
# These are the labels of the validation data
data.val_y

In [None]:
# from here we know that 'eva' is label 0 and 'jess' is label 1.
data.classes

In [None]:
# this gives predictions for the validation set. Predictions are log probabilities
log_preds = learn.predict()
log_preds.shape

In [None]:
log_preds[:10]   # [log-probability of 'eva', log-probability of 'jess']

In [None]:
preds = np.argmax(log_preds, axis=1)  # from log probabilities to 0 or 1
probs = np.exp(log_preds[:,1])        # pr(jess)

In [None]:
probs[:]
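
The conversion from log probabilities to labels and probabilities used above can be checked on a toy example. The numbers below are made up, and the pure-Python expressions are only stand-ins for the `np.argmax` and `np.exp` calls.

```python
import math

toy_log_preds = [[math.log(0.9), math.log(0.1)],
                 [math.log(0.3), math.log(0.7)]]

# Predicted label = index of the larger log-probability (same idea as np.argmax(..., axis=1))
toy_preds = [row.index(max(row)) for row in toy_log_preds]
# Probability of class 1 = exp of its log-probability
toy_probs = [math.exp(row[1]) for row in toy_log_preds]

print(toy_preds)   # [0, 1]
```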

In [None]:
def rand_by_mask(mask):
    idxs = np.where(mask)[0]   # sample at most as many indices as the mask actually selects
    return np.random.choice(idxs, min(len(idxs), 4), replace=False)
def rand_by_correct(is_correct): return rand_by_mask((preds == data.val_y)==is_correct)

def plots(ims, figsize=(12,6), rows=1, titles=None):
    f = plt.figure(figsize=figsize)
    for i in range(len(ims)):
        sp = f.add_subplot(rows, len(ims)//rows, i+1)
        sp.axis('Off')
        if titles is not None: sp.set_title(titles[i], fontsize=16)
        plt.imshow(ims[i])
        
def load_img_id(ds, idx): return np.array(PIL.Image.open(PATH+ds.fnames[idx]))

def plot_val_with_title(idxs, title):
    imgs = [load_img_id(data.val_ds,x) for x in idxs]
    title_probs = [probs[x] for x in idxs]
    print(title)
    return plots(imgs, rows=1, titles=title_probs, figsize=(16,8)) if len(imgs)>0 else print('Not Found.')

In [None]:
# 1. A few correct labels at random
plot_val_with_title(rand_by_correct(True), "Correctly classified")

In [None]:
# 2. A few incorrect labels at random
plot_val_with_title(rand_by_correct(False), "Incorrectly classified")

In [None]:
def most_by_mask(mask, mult):
    idxs = np.where(mask)[0]
    return idxs[np.argsort(mult * probs[idxs])[:4]]

def most_by_correct(y, is_correct): 
    mult = -1 if (y==1)==is_correct else 1
    return most_by_mask(((preds == data.val_y)==is_correct) & (data.val_y == y), mult)

In [None]:
plot_val_with_title(most_by_correct(0, True), "Most correct Evangeline")

In [None]:
plot_val_with_title(most_by_correct(1, True), "Most correct Jessica")

In [None]:
plot_val_with_title(most_by_correct(0, False), "Most incorrect Evangeline")

In [None]:
plot_val_with_title(most_by_correct(1, False), "Most incorrect Jessica")

In [None]:
most_uncertain = np.argsort(np.abs(probs -0.5))[:4]
plot_val_with_title(most_uncertain, "Most uncertain predictions")

## Analyzing results

### Confusion matrix 

In [None]:
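log_preds,y = learn.TTA()   # recompute: `probs` was overwritten with a 1-D array above
probs = np.mean(np.exp(log_preds),0)
preds = np.argmax(probs, axis=1)
probs = probs[:,1]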

In [None]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y, preds)

In [None]:
plot_confusion_matrix(cm, data.classes)
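
How the entries of such a matrix are counted can be shown in a few lines. This is a plain-Python sketch with made-up labels, assuming the row = true label, column = predicted label convention; `confusion_counts` is a hypothetical helper, not the scikit-learn function.

```python
def confusion_counts(y_true, y_pred, n_classes=2):
    """cm[i][j] = number of samples with true label i that were predicted as j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

# Toy labels: 0 = 'eva', 1 = 'jess'
print(confusion_counts([0, 0, 1, 1, 1], [0, 1, 1, 1, 0]))   # [[1, 1], [1, 2]]
```

The diagonal holds the correct predictions; the off-diagonal cells show which of the two look-alikes gets mistaken for the other.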