## Image classification with Convolutional Neural Networks

Welcome to the first week of the second deep learning certificate! We're going to use convolutional neural networks (CNNs) to allow our computer to see - something that is only possible thanks to deep learning.

## Introduction to our first task: 'Dogs vs Cats'

We're going to try to create a model to enter the Dogs vs Cats competition at Kaggle. There are 25,000 labelled dog and cat photos available for training, and 12,500 in the test set that we have to try to label for this competition. According to the Kaggle website, when this competition was launched (end of 2013): "State of the art: The current literature suggests machine classifiers can score above 80% accuracy on this task". So if we can beat 80%, then we will be at the cutting edge as of 2013!

In [None]:
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

Here we import the libraries we need. We'll learn about what each does during the course.

In [None]:
import pandas as pd

In [None]:
!pip freeze | grep fastai

In [None]:
# This file contains all the main external libs we'll use
from fastai.imports import *
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

`PATH` is the path to your data - if you use the recommended setup approaches from the lesson, you won't need to change this. `sz` is the size that the images will be resized to in order to ensure that the training runs quickly. We'll be talking about this parameter a lot during the course. Leave it at `224` for now.

In [None]:
ConvLearner

In [None]:
PATH = "../input/"
TMP_PATH = "/tmp/tmp"
MODEL_PATH = "/tmp/model/"
sz=224

It's important that you have a working NVidia GPU set up. The programming framework used behind the scenes to work with NVidia GPUs is called CUDA. Therefore, you need to ensure the following line returns `True` before you proceed. If you have problems with this, please check the FAQ and ask for help on [the forums](http://forums.fast.ai).

In [None]:
torch.cuda.is_available()

In addition, NVidia provides special accelerated functions for deep learning in a package called cuDNN. Although not strictly necessary, it will improve training performance significantly, and is included by default in all supported fastai configurations. Therefore, if the following does not return `True`, you may want to look into why.

In [None]:
torch.backends.cudnn.enabled
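
The two checks above can also be combined into a single diagnostic cell. The following is a minimal sketch: `gpu_status` is a hypothetical helper name, not part of fastai or PyTorch, and it returns a message rather than failing if PyTorch is not installed.

```python
import importlib.util

def gpu_status():
    """Return a one-line summary of CUDA and cuDNN availability."""
    # Check for PyTorch first so that the cell does not raise ImportError.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch
    if not torch.cuda.is_available():
        return "PyTorch found, but no CUDA GPU is available"
    cudnn = "enabled" if torch.backends.cudnn.enabled else "disabled"
    return f"CUDA GPU: {torch.cuda.get_device_name(0)}, cuDNN {cudnn}"

print(gpu_status())
```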

## First look at cat pictures

In [None]:
os.listdir(PATH)

In [None]:
fnames = np.array([f'train/{f}' for f in sorted(os.listdir(f'{PATH}train'))])
# labels = np.array([(0 if 'cat' in fname else 1) for fname in fnames])

In [None]:
fnames[:10]

In [None]:
img = plt.imread(f'{PATH}{fnames[10]}')
plt.imshow(img);

Here is what the raw data looks like:

In [None]:
img.shape

In [None]:
img[:4,:4]

## Label df

In [None]:
!head ../input/labels.csv

In [None]:
!ls ../input/

In [None]:
label_df = pd.read_csv(f'{PATH}labels.csv')  # PATH already ends with '/'

In [None]:
label_df.head()

In [None]:
label_df['breed'].value_counts()

## Our first model: quick start

We're going to use a <b>pre-trained</b> model, that is, a model created by someone else to solve a different problem. Instead of building a model from scratch to solve a similar problem, we'll use a model trained on ImageNet (1.2 million images and 1000 classes) as a starting point. The model is a Convolutional Neural Network (CNN), a type of Neural Network that builds state-of-the-art models for computer vision. We'll be learning all about CNNs during this course.

We will be using a <b>ResNet</b> model; the code below sets `arch = resnet50`. ResNet is the architecture that won the 2015 ImageNet competition, and the number in the name is the layer count of the variant. Here is more info on [resnet models](https://github.com/KaimingHe/deep-residual-networks). We'll be studying them in depth later, but for now we'll focus on using them effectively.

Here's how to train and evaluate a *dogs vs cats* model in 3 lines of code, and under 20 seconds:

In [None]:
# arch=resnet101
# arch = resnet152
arch = resnet50
bs = 56
sz = 224

In [None]:
n = len(label_df)
val_idxs = get_cv_idxs(n)

In [None]:
# Uncomment the below if you need to reset your precomputed activations
# shutil.rmtree(f'{PATH}tmp', ignore_errors=True)

In [None]:
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_csv(f'{PATH}', 'train', f'{PATH}labels.csv', bs=bs, test_name='test', 
                                    val_idxs=val_idxs, tfms=tfms, suffix='.jpg')

### Look at data

In [None]:
fn = PATH + data.trn_ds.fnames[0]; fn

In [None]:
img = PIL.Image.open(fn); img

In [None]:
img.size

In [None]:
size_d = {k: PIL.Image.open(PATH + k).size for k in data.trn_ds.fnames}


In [None]:
row_sz, col_sz = zip(*size_d.values())
row_sz = np.array(row_sz); col_sz = np.array(col_sz)

In [None]:
from collections import Counter

In [None]:
len(Counter(row_sz))

In [None]:
len(Counter(col_sz))

In [None]:
plt.hist(row_sz)
plt.title('Row size histogram')

In [None]:
plt.hist(row_sz[row_sz < 1000])
plt.title('Row size histogram')

In [None]:
plt.hist(col_sz[col_sz < 1000])
plt.title('Col size histogram')

### Training


In [None]:
len(label_df)

In [None]:
def get_data(sz, bs):
    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_csv(f'{PATH}', 'train', f'{PATH}labels.csv', bs=bs, test_name='test', 
                                    val_idxs=val_idxs, tfms=tfms, suffix='.jpg')
    return data if sz > 300 else data.resize(340, TMP_PATH)

## 3.1. Precompute 

In [None]:
data = get_data(sz, bs)

In [None]:
learn = ConvLearner.pretrained(arch, data, precompute=True)

In [None]:
learn

In [None]:
learn.fit(0.01, 5)

## 3.2. Augment

In [None]:
data = get_data(sz, bs)

In [None]:
learn = ConvLearner.pretrained(arch, data, precompute=True, ps=0.5)

In [None]:
learn

In [None]:
lrf=learn.lr_find()

In [None]:
learn.sched.plot_lr()

In [None]:
learn.sched.plot()

In [None]:
learn.fit(1e-2, 2)

In [None]:
learn.precompute = False

In [None]:
learn.fit(1e-2, 5, cycle_len=1)

In [None]:
learn.save('224_pre')

## 3.3. Increase image size

In [None]:
learn.set_data(get_data(299, bs))

In [None]:
learn

In [None]:
tfms_from_model

In [None]:
learn.fit(1e-2, n_cycle=3, cycle_len=1)

In [None]:
learn.fit(1e-2, n_cycle=3, cycle_len=1, cycle_mult=2)

In [None]:
learn.save('299_pre')

In [None]:
learn.load('299_pre')

In [None]:
learn.fit(1e-2, n_cycle=1, cycle_len=2)

In [None]:
log_preds, y = learn.TTA()
probs = np.mean(np.exp(log_preds),0)
preds = np.argmax(probs, axis=1) 

In [None]:
log_preds.shape, y.shape, probs.shape

In [None]:
# probs = np.exp(log_preds)
accuracy_np(probs, y), metrics.log_loss(y, probs)

In [None]:
# preds already holds class indices, so compare them with the labels directly
(preds == y).mean()

How good is this model? Well, as we mentioned, prior to this competition, the state of the art was 80% accuracy. But the competition resulted in a huge jump to 98.9% accuracy, with the author of a popular deep learning library winning the competition. Extraordinarily, less than 4 years later, we can now beat that result in seconds! Even last year in this same course, our initial model had 98.3% accuracy, which is nearly double the error we're getting just a year later, and that took around 10 minutes to compute.

## Analyzing results: looking at pictures

As well as looking at the overall metrics, it's also a good idea to look at examples of each of:
1. A few correct labels at random
2. A few incorrect labels at random
3. The most correct labels of each class (i.e. those with highest probability that are correct)
4. The most incorrect labels of each class (i.e. those with highest probability that are incorrect)
5. The most uncertain labels (i.e. those with probability closest to 0.5).
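
The cells below plot categories 1 and 2. Categories 3 to 5 can be selected in a similar way; the following is a minimal sketch on toy data. The helper names `most_by_confidence` and `most_uncertain` are hypothetical, `probs` is assumed to be an array of shape `(n_images, n_classes)`, and "most uncertain" is taken to mean the lowest top probability, which for two classes is the same as being closest to 0.5.

```python
import numpy as np

def most_by_confidence(probs, y, correct=True, k=4):
    """Indices of the k most confident predictions that are correct (or incorrect)."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    idxs = np.where((preds == y) == correct)[0]
    # Sort the selected images by descending confidence.
    return idxs[np.argsort(-conf[idxs])][:k]

def most_uncertain(probs, k=4):
    """Indices of the k predictions whose top probability is lowest."""
    return np.argsort(probs.max(axis=1))[:k]

# Toy data: 5 images, 2 classes (made-up numbers, not model output).
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.55, 0.45], [0.3, 0.7], [0.95, 0.05]])
y = np.array([0, 1, 1, 0, 0])

print(most_by_confidence(probs, y, correct=True, k=2))    # most correct
print(most_by_confidence(probs, y, correct=False, k=2))   # most incorrect
print(most_uncertain(probs, k=1))                         # most uncertain
```

With the notebook's own variables, the returned indices could be passed to `plot_val_with_title`, provided `probs` holds the validation-set probabilities and `y` is `data.val_y`.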

In [None]:
# These are the labels for the validation data
data.val_y

In [None]:
# the class names; each label in data.val_y is an index into this list
data.classes

In [None]:
# Predictions for the validation set, as log-probabilities
log_preds = learn.predict()
log_preds.shape

In [None]:
log_preds[:10]

In [None]:
probs.shape

In [None]:
preds = np.argmax(probs, axis=1)  # from probabilities to the index of the most likely class

In [None]:
preds.shape

In [None]:
def rand_by_mask(mask): return np.random.choice(np.where(mask)[0], 4, replace=False)
def rand_by_correct(is_correct): return rand_by_mask((preds == data.val_y)==is_correct)

In [None]:
def plots(ims, figsize=(12,6), rows=1, titles=None):
    f = plt.figure(figsize=figsize)
    for i in range(len(ims)):
        sp = f.add_subplot(rows, len(ims)//rows, i+1)
        sp.axis('Off')
        if titles is not None: sp.set_title(titles[i], fontsize=16)
        plt.imshow(ims[i])

In [None]:
def load_img_id(ds, idx): return np.array(PIL.Image.open(PATH+ds.fnames[idx]))

def plot_val_with_title(idxs, title):
    imgs = [load_img_id(data.val_ds,x) for x in idxs]
#     title_probs = [max(probs[x]) for x in idxs]
    title_probs = [f'Label: {data.classes[data.val_y[x]]} \n Pred: {data.classes[preds[x]]} \n {max(probs[x])}' for x in idxs ]
    print(title)
    return plots(imgs, rows=1, titles=title_probs, figsize=(16,8))

In [None]:
# 1. A few correct labels at random
plot_val_with_title(rand_by_correct(True), "Correctly classified")

In [None]:
# 2. A few incorrect labels at random
plot_val_with_title(rand_by_correct(False), "Incorrectly classified")

## 4. Predict

In [None]:
log_preds, y = learn.TTA(is_test=True)

In [None]:
len(data.test_ds)

In [None]:
probs = np.mean(np.exp(log_preds),0)
preds = np.argmax(probs, axis=1) 

In [None]:
df = pd.DataFrame(probs, columns=data.classes)

In [None]:
iddf = pd.DataFrame([x.split('/')[1].split('.')[0] for x in data.test_ds.fnames], columns=['id'])

In [None]:
iddf.head()

In [None]:
testdf = pd.concat([iddf, df], axis=1)

In [None]:
testdf.head()

In [None]:
testdf.to_csv('submission.csv', index=False)

There is something else we can do with data augmentation: use it at *inference* time (also known as *test* time). Not surprisingly, this is known as *test time augmentation*, or just *TTA*.

TTA simply makes predictions not just on the images in your validation set, but also makes predictions on a number of randomly augmented versions of them too (by default, it uses the original image along with 4 randomly augmented versions). It then takes the average prediction from these images, and uses that. To use TTA on the validation set, we can use the learner's `TTA()` method.

I generally see about a 10-20% reduction in error on this dataset when using TTA at this point, which is an amazing result for such a quick and easy technique!
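
The averaging step can be illustrated without a model. This is a minimal sketch that assumes, as the `np.mean(np.exp(log_preds), 0)` line in this notebook suggests, that the first value returned by `TTA()` has shape `(n_versions, n_images, n_classes)` and contains log-probabilities. The numbers are made up.

```python
import numpy as np

# Toy stand-in for the log-predictions returned by learn.TTA():
# 3 versions of 2 images, 2 classes.
log_preds = np.log(np.array([
    [[0.6, 0.4], [0.3, 0.7]],   # original images
    [[0.8, 0.2], [0.4, 0.6]],   # augmented version 1
    [[0.4, 0.6], [0.2, 0.8]],   # augmented version 2
]))

# Convert back to probabilities, then average over the versions axis.
probs = np.exp(log_preds).mean(axis=0)   # shape (n_images, n_classes)
preds = probs.argmax(axis=1)             # predicted class per image

print(probs)
print(preds)
```

Note that augmented version 2 on its own would have picked class 1 for the first image; averaging over all versions keeps the prediction at class 0.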

## Analyzing results

### Confusion matrix 
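
A confusion matrix counts, for each true class, how often each class was predicted. The following is a minimal NumPy sketch on toy labels. `confusion_counts` is a hypothetical helper name; scikit-learn's `confusion_matrix` produces the same table with rows as true classes and columns as predicted classes.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when the same cell is hit more than once.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Made-up labels for 6 images and 3 classes.
y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])

cm = confusion_counts(y_true, y_pred, n_classes=3)
print(cm)
```

In this notebook, `y_true` would be `data.val_y` and `y_pred` the validation `preds`. The diagonal holds the correct predictions, so `cm.trace() / cm.sum()` is the accuracy.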

## Review: easy steps to train a world-class image classifier

1. Start with `precompute=True`
1. Use `lr_find()` to find the highest learning rate where the loss is still clearly improving
1. Train the last layer from precomputed activations for 1-2 epochs
1. Train the last layer with data augmentation (i.e. `precompute=False`) for 2-3 epochs with `cycle_len=1`
1. Unfreeze all layers
1. Set earlier layers to a 3x-10x lower learning rate than the next higher layer
1. Use `lr_find()` again
1. Train the full network with `cycle_mult=2` until over-fitting
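
Two of the steps above involve simple arithmetic: how many epochs a `cycle_len` / `cycle_mult` schedule runs, and what per-layer-group learning rates look like. The sketch below does not call fastai. The helper names are hypothetical, and it assumes that `cycle_mult` multiplies the length of each successive cycle and that the network is split into three layer groups.

```python
def cycle_lengths(n_cycle, cycle_len=1, cycle_mult=1):
    """Length in epochs of each cycle, each cycle_mult times longer than the last."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycle)]

def layer_group_lrs(top_lr, n_groups=3, ratio=3):
    """Learning rates from the earliest to the last layer group,
    each `ratio` times lower than the next higher group."""
    return [top_lr / ratio ** (n_groups - 1 - i) for i in range(n_groups)]

lengths = cycle_lengths(n_cycle=3, cycle_len=1, cycle_mult=2)
print(lengths, sum(lengths))   # [1, 2, 4] 7

print(layer_group_lrs(0.01, ratio=10))
```

Under these assumptions, a call such as `learn.fit(1e-2, n_cycle=3, cycle_len=1, cycle_mult=2)` trains for 7 epochs in total.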

## Analyzing results: loss and accuracy

When we run `learn.fit`, it prints 3 performance values (see above). For example, in a row reading 0.03, 0.0226, 0.9927, the first value is the **loss** on the training set, the second is the loss on the validation set, and the third is the validation accuracy. What is the loss? What is accuracy? Why not just show accuracy?

**Accuracy** is the ratio of the number of correct predictions to the total number of predictions.

In machine learning, the **loss** function (or cost function) represents the price paid for inaccurate predictions.

The loss associated with one example in binary classification is given by:
`-(y * log(p) + (1-y) * log(1-p))`
where `y` is the true label of `x` and `p` is the probability predicted by our model that the label is 1.
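
To see why the loss is reported alongside accuracy, the formula can be applied to two sets of made-up predictions that lead to the same decisions. This is a minimal sketch using only the standard library; the function names are illustrative, and a prediction counts as label 1 when `p > 0.5`.

```python
import math

def binary_log_loss(y, p):
    """Loss for one example: y is the true label (0 or 1),
    p is the predicted probability that the label is 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def mean_loss_and_accuracy(ys, ps):
    """Average loss and accuracy over a list of examples."""
    losses = [binary_log_loss(y, p) for y, p in zip(ys, ps)]
    hits = [(p > 0.5) == (y == 1) for y, p in zip(ys, ps)]
    return sum(losses) / len(ys), sum(hits) / len(ys)

ys = [1, 0, 1, 0]
confident = [0.95, 0.05, 0.90, 0.10]   # made-up predictions
hesitant = [0.60, 0.40, 0.55, 0.45]    # same decisions, less certainty

print(mean_loss_and_accuracy(ys, confident))
print(mean_loss_and_accuracy(ys, hesitant))
```

Both sets of predictions are 100% accurate, but the hesitant one has a much higher loss. The loss therefore keeps changing as the model becomes more (or less) sure of its answers, even when accuracy does not move.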