# Introduction

&ndash; *So simple it's almost silly*

This is just a teaser for what you can achieve in image classification using more powerful convolutional neural networks. 

***This will be covered in detail in the course DAT255: Practical deep learning.*** If you plan to do an MSc in Bergen you're welcome to join the course in Spring 2021! In addition to image classification and other computer vision tasks, we'll discuss natural language processing, recommendation systems and predictive models from structured data. 

<a href="https://www.hvl.no/en/studies-at-hvl/study-programmes/course/dat255"><img width=60% src="assets/dat255.png"></a>

The examples below are based on `fastai`, a deep learning library built on top of PyTorch: https://github.com/fastai/fastai. 

> If you want to run this on your own computer you have to have a compatible NVIDIA GPU, and install CUDA, cuDNN and fastai. 

**Note:** Most of the code will be very unfamiliar to you! You're not meant to understand what's going on in detail. If you want to learn more, consider doing the fast.ai course: https://course.fast.ai

> If you want to learn a lot of machine learning and deep learning, and solve important real-life problems, consider doing a master's degree with us! Or at least follow DAT255. :-)

# Example 1: CIFAR-10

In the previous notebook we created a convolutional neural network in PyTorch to classify the CIFAR-10 images.

<img src="assets/cifar10.png">

After a bit of work we reached an accuracy of approximately 57%, which is not too bad as there are 10 (essentially balanced) classes in CIFAR-10. 

Let's see what we can do by bringing in some bigger guns:

In [None]:
!nvidia-smi

In [None]:
from pathlib import Path
NB_DIR = %pwd
NB_DIR = Path(NB_DIR)

In [None]:
%matplotlib inline
from utils import plot_confusion_matrix

In [None]:
# Setup fastai and PyTorch
from fastai import *
from fastai.vision import *
torch.backends.cudnn.benchmark = True

Import the CNN model that we'll use. It's a 22-layer wide ResNet model: https://arxiv.org/abs/1605.07146. If you take a peek at the source code you'll see that it's not completely unfamiliar to you (as it's a PyTorch model): https://github.com/fastai/fastai/blob/master/fastai/vision/models/wrn.py. 

In [None]:
from fastai.vision.models.wrn import wrn_22

Download and unpack CIFAR-10:

In [None]:
path = untar_data(URLs.CIFAR)

Set up a data loader. As we saw in the previous notebook, we need this to feed data to the network. We'll also use what's called **data augmentation**: expanding the data set by using random paddings and flips of the images. 

> Creating new images by changing the images you already have in various ways is a trick almost always used in deep learning for images. It's a way to create "new" data from what you already have.
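
To make the idea concrete, here is a minimal pure-Python sketch of the two augmentations mentioned above (random padding followed by a random crop, and a random horizontal flip) applied to a tiny 3x3 "image". The function names are made up for illustration, and the zero padding is a simplifying assumption; this is not how `fastai` implements `rand_pad` or `get_transforms`.

```python
import random

def random_flip(image, rng):
    """Mirror the image left-to-right with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]

def random_pad_crop(image, padding, rng, fill=0):
    """Pad every side with `padding` pixels, then crop a random window of the original size."""
    height, width = len(image), len(image[0])
    padded_width = width + 2 * padding
    padded = [[fill] * padded_width for _ in range(padding)]
    padded += [[fill] * padding + row[:] + [fill] * padding for row in image]
    padded += [[fill] * padded_width for _ in range(padding)]
    top = rng.randint(0, 2 * padding)   # randint includes both endpoints
    left = rng.randint(0, 2 * padding)
    return [row[left:left + width] for row in padded[top:top + height]]

def augment(image, rng, padding=1):
    """Produce one randomly shifted and possibly flipped copy of the image."""
    return random_pad_crop(random_flip(image, rng), padding, rng)

rng = random.Random(0)
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
augmented = [augment(image, rng) for _ in range(4)]
for variant in augmented:
    print(variant)
```

Every call returns an image of the same size as the input, but shifted and possibly mirrored, so the network rarely sees exactly the same pixels twice.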

In [None]:
ds_tfms = get_transforms(xtra_tfms=[*rand_pad(4,32)])

In [None]:
normalize=cifar_stats
data = ImageDataBunch.from_folder(path, valid='test', ds_tfms=ds_tfms, bs=256).normalize(normalize)

Here's a fastai-specific way to create a "learner" from our network and then train it (using **lots of very clever tricks** behind the scenes).

In [None]:
learn = Learner(data, wrn_22(), metrics=accuracy)

In [None]:
#learn = learn.to_fp16()

Just to show you one of many possible tricks to improve neural network training: by computing the loss achieved when varying the learning rate one can get an indication of what learning rate to choose (i.e. hyperparameter optimization).
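
Here is a rough sketch of the idea on a toy problem: we run a short training pass on a simple quadratic loss while the learning rate grows geometrically, record the loss at each step, and stop once the loss blows up. The toy loss, the stopping rule and the "largest drop" suggestion rule are all assumptions made for illustration; `learn.lr_find()` works on real mini-batches and uses its own heuristics.

```python
def lr_range_test(start_lr=1e-4, end_lr=10.0, num_steps=50):
    """Train on the toy loss (w - 2)^2 while the learning rate grows geometrically."""
    weight = 5.0  # deliberately poor starting point
    growth = (end_lr / start_lr) ** (1 / (num_steps - 1))
    lr, best_loss, history = start_lr, float("inf"), []
    for _ in range(num_steps):
        loss = (weight - 2.0) ** 2
        history.append((lr, loss))
        if loss > 4 * best_loss:  # the loss has started to diverge
            break
        best_loss = min(best_loss, loss)
        weight -= lr * 2 * (weight - 2.0)  # one gradient descent step
        lr *= growth
    return history

def suggest_lr(history):
    """Pick the learning rate whose step reduced the loss the most."""
    drops = [(history[i][1] - history[i + 1][1], history[i][0])
             for i in range(len(history) - 1)]
    return max(drops)[1]

history = lr_range_test()
suggested_lr = suggest_lr(history)
print(f"recorded {len(history)} steps, suggested lr: {suggested_lr:.4f}")
```

Plotting the recorded losses against the learning rates (on a log scale) gives a curve of the same kind as the one produced by `learn.recorder.plot()`.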

In [None]:
learn.lr_find()
learn.recorder.plot(suggestion=True)

We want somewhere to store our models:

In [None]:
learn.model_dir = NB_DIR/'models/'
learn.model_dir.mkdir(exist_ok=True)
#learn = learn.load('cifar-model')

In [None]:
learn.callback_fns += [ShowGraph]

As you understand from the name and the parameters going into the function, there's quite a lot of interesting stuff buried in `.fit_one_cycle`. We can't go into it here. Let's just use it to see what happens. 
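
For the curious, one of the ingredients is a learning rate that first rises and then falls during training. The sketch below shows such a schedule under a few assumptions: cosine-shaped ramps, a start value of `max_lr / div_factor`, a peak after a fraction `pct_start` of the steps, and a very small final value (the `final_div` parameter is my own choice for illustration). The real `.fit_one_cycle` does more than this, so treat it as a simplified picture.

```python
import math

def cosine_anneal(start, end, fraction):
    """Move from `start` to `end` along half a cosine wave as `fraction` goes from 0 to 1."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * fraction))

def one_cycle_lr(step, total_steps, max_lr=3e-3, div_factor=10,
                 pct_start=0.5, final_div=1e5):
    """Learning rate at a given step: warm up to max_lr, then anneal down."""
    warmup_steps = int(total_steps * pct_start)
    if step < warmup_steps:
        return cosine_anneal(max_lr / div_factor, max_lr, step / warmup_steps)
    fraction = (step - warmup_steps) / (total_steps - warmup_steps)
    # Assumption: the schedule ends at a tiny learning rate, max_lr / final_div.
    return cosine_anneal(max_lr, max_lr / final_div, fraction)

schedule = [one_cycle_lr(step, total_steps=100) for step in range(100)]
print(f"start: {schedule[0]:.2e}, peak: {max(schedule):.2e}, end: {schedule[-1]:.2e}")
```

With the values used in the training cell (`3e-3`, `div_factor=10`, `pct_start=0.5`), this schedule starts at `3e-4`, peaks at `3e-3` halfway through and then decays.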

In [None]:
learn.fit_one_cycle(100, 3e-3, wd=0.4, div_factor=10, pct_start=0.5)

**Save the model**

In [None]:
learn.model_dir = NB_DIR/'models/'
learn.model_dir.mkdir(exist_ok=True)

In [None]:
#learn.save('cifar-model')

In [None]:
learn = learn.load('cifar-model')

**Inspect results**

Let's predict on the test data and see what we got:

In [None]:
preds,y,losses=learn.TTA(with_loss=True)

(Another of the clever tricks happens behind the scenes in the above cell: data augmentation is also used when making predictions. This is often called test-time augmentation, or TTA.)
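
A minimal sketch of the idea, assuming plain averaging over a handful of transformed copies of the input. The `toy_model` below is a hypothetical stand-in for a trained network, and `fastai`'s `TTA` combines the predictions with its own weighting, so this only illustrates the principle.

```python
def toy_model(image):
    """Hypothetical stand-in for a trained network: returns [p_class0, p_class1]."""
    # The position-dependent weights make the output change when the image is flipped.
    score = sum(weight * pixel for weight, pixel in zip((0.1, 0.3, 0.6), image))
    p_class1 = min(max(score, 0.0), 1.0)
    return [1 - p_class1, p_class1]

def identity(image):
    return image

def flip(image):
    return image[::-1]

def predict_tta(model, image, transforms):
    """Average the predicted class probabilities over transformed copies of the image."""
    predictions = [model(transform(image)) for transform in transforms]
    n_classes = len(predictions[0])
    return [sum(p[c] for p in predictions) / len(predictions) for c in range(n_classes)]

image = [0.2, 0.5, 0.9]
plain = toy_model(image)
averaged = predict_tta(toy_model, image, [identity, flip])
print("plain:", plain)
print("with TTA:", averaged)
```

Averaging over several views tends to smooth out predictions that depend too much on exactly how an object happens to be positioned in the image.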

In [None]:
acc = accuracy(preds,y)
acc

In [None]:
print(f"Misclassified {int((1 - acc)*len(data.valid_ds))} of {len(data.valid_ds)} images")

Here are the ones we missed the most (i.e. highest loss) among the test data:

In [None]:
interp = ClassificationInterpretation(learn, preds,y,losses)

In [None]:
interp.plot_top_losses(16)

In [None]:
cm = interp.confusion_matrix()

In [None]:
fig, ax = plt.subplots(figsize=(16,10))
plot_confusion_matrix(cm, classes=data.valid_ds.classes, ax=ax)
plt.show()
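
If you're wondering what a confusion matrix actually contains, here is a small sketch that builds one by hand from made-up labels. Rows are the actual classes and columns the predicted classes, which I assume matches the layout of the plot above.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count how often each actual class (row) was predicted as each class (column)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Made-up labels for seven images from three classes.
actual    = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 0, 2]

toy_cm = confusion_matrix(actual, predicted, n_classes=3)
toy_accuracy = sum(toy_cm[i][i] for i in range(3)) / len(actual)

for row in toy_cm:
    print(row)
print(f"accuracy: {toy_accuracy:.3f}")
```

The diagonal holds the correct predictions, so the accuracy is the sum of the diagonal divided by the number of images. Everything off the diagonal tells you which classes get mixed up.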

**Is this a good result?**

https://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#43494641522d3130

https://benchmarks.ai/cifar-10

**Could we do better?** 

Of course! One simple thing that's easy to add is *ensembling*. We can create a bunch of different high-performing models, ensemble them (e.g. using the `VotingClassifier` from earlier), and likely get a model that outperforms every single ensemble member. Have a look back at Part 4 of the course for a refresher. 
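
As a reminder of how this works, here is a minimal sketch of "soft voting": we average the class probabilities predicted by several models and pick the class with the highest average. The three sets of probabilities are made up for illustration and don't come from any model in this notebook.

```python
def average_probabilities(member_predictions):
    """Average the class probabilities predicted by the ensemble members."""
    n_members = len(member_predictions)
    n_classes = len(member_predictions[0])
    return [sum(member[c] for member in member_predictions) / n_members
            for c in range(n_classes)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Made-up predictions from three models for one image with three possible classes.
member_predictions = [
    [0.6, 0.3, 0.1],  # model A prefers class 0
    [0.2, 0.7, 0.1],  # model B prefers class 1
    [0.1, 0.5, 0.4],  # model C prefers class 1
]

ensemble = average_probabilities(member_predictions)
print("ensemble probabilities:", ensemble)
print("ensemble prediction:", argmax(ensemble))
```

Here the ensemble picks class 1, overruling model A. For neural networks, averaging predicted probabilities like this is a common choice, since the models already output a probability for each class.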

# Example 2: MNIST

To see that this way of doing computer vision is quite general, let's try it out on the MNIST data set. Back in Part 2 of the course we used a random forest to do this and got an accuracy of about 97%.

In [None]:
from fastai import *
from fastai.vision import *
torch.backends.cudnn.benchmark = True

In [None]:
path = untar_data(URLs.MNIST)

In [None]:
ds_tfms = get_transforms(do_flip=False, xtra_tfms=[*rand_pad(2,28)], max_lighting=0.)

In [None]:
normalize=imagenet_stats
data = ImageDataBunch.from_folder(path, ds_tfms=ds_tfms, bs=256, train='training', valid='testing').normalize(normalize)

In [None]:
learn = cnn_learner(data, models.resnet18, metrics=accuracy, pretrained=True)
learn.callback_fns += [ShowGraph]

In [None]:
NB_DIR = %pwd
NB_DIR = Path(NB_DIR)
learn.model_dir = NB_DIR/'models/'
#learn = learn.load('mnist-model')

Unlike the CIFAR-10 model above, which was trained from scratch, our MNIST model is _pretrained_ on the ImageNet dataset, meaning that we're doing what's often called _transfer learning_ or _fine-tuning_. Knowing that, there are some clever tricks we can use to train pretrained models efficiently, and we use some of them below (this explains the `.freeze()` and `.unfreeze()` calls, among other things). 

I'll say a bit more about this in the lecture ([linky](https://predictiveprogrammer.com/wp-content/uploads/2018/06/visualize_cnn-1024x730.png)).
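
One such trick is to give the early layers of a pretrained network smaller learning rates than the later ones, since the early layers already contain useful general-purpose features. The `slice(low, high)` arguments passed to `.fit_one_cycle` in this notebook are related to this. The sketch below spreads learning rates geometrically between two values across a number of layer groups; that this mirrors what `fastai` does internally is my assumption, and the function name is made up.

```python
def layer_group_lrs(low, high, n_groups):
    """Spread learning rates geometrically from `low` (earliest layers) to `high` (last layers)."""
    if n_groups == 1:
        return [high]
    step = (high / low) ** (1 / (n_groups - 1))
    return [low * step ** i for i in range(n_groups)]

spread = layer_group_lrs(1e-5, 1e-3, 3)    # early, middle and late layer groups
uniform = layer_group_lrs(5e-4, 5e-4, 3)   # same value everywhere
print("spread:", spread)
print("uniform:", uniform)
```

When both ends of the range are the same, every layer group simply gets the same learning rate.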

In [None]:
learn.freeze()

In [None]:
learn.fit_one_cycle(1, 1e-2)

In [None]:
learn.unfreeze()

In [None]:
learn.fit_one_cycle(12, max_lr=slice(5e-4, 5e-4))

In [None]:
#learn = learn.load('mnist-model')

In [None]:
#learn.save('mnist-model')

In [None]:
preds,y,losses=learn.get_preds(with_loss=True)
acc = accuracy(preds,y)
acc

With this accuracy, as there are 10,000 test images, we only misclassified...

In [None]:
10000*(1-acc)

...images

Here they are:

In [None]:
interp = ClassificationInterpretation(learn, preds,y,losses)

In [None]:
interp.plot_top_losses(48)

In [None]:
cm = interp.confusion_matrix()

In [None]:
fig, ax = plt.subplots(figsize=(16,8))
plot_confusion_matrix(cm, classes=data.valid_ds.classes, ax=ax)
plt.show()

It's hard to do better, as the misclassified digits are mostly very understandable. But again, using some standard tricks like _ensembling_ will pick up a couple more digits. 

# Example 3: Dogs vs cats

Let's follow the above procedure for something more interesting: distinguishing dogs from cats.

In [None]:
from fastai import *
from fastai.vision import *
torch.backends.cudnn.benchmark = True

Import the CNN model that we'll use. Again, the code to implement this (state-of-the-art) model will not be completely unfamiliar to you: https://github.com/Cadene/pretrained-models.pytorch/blob/master/pretrainedmodels/models/senet.py. 

In [None]:
from fastai.vision.models.cadene_models import se_resnext50_32x4d
model = se_resnext50_32x4d

In [None]:
NB_DIR = %pwd
NB_DIR = Path(NB_DIR)

In [None]:
path = untar_data(URLs.DOGS)
path

In [None]:
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=128, bs=128).normalize(imagenet_stats)
data.show_batch(rows=4)

Here we're doing another data augmentation trick, often called _progressive resizing_: train the model first on severely scaled-down images, then progressively on larger and larger images. 
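
A quick bit of arithmetic shows why starting small is attractive. Assuming the work per image in a convolutional network grows roughly in proportion to the number of pixels (an approximation), we can compare the three image sizes used in this example:

```python
sizes = [64, 128, 224]  # the image sizes used in this example

final_pixels = sizes[-1] ** 2
relative_cost = {size: size ** 2 / final_pixels for size in sizes}

for size, cost in relative_cost.items():
    print(f"{size}x{size}: {size ** 2:>6} pixels, about {cost:.0%} of the cost at {sizes[-1]}x{sizes[-1]}")
```

Under this assumption a 64x64 image costs roughly a twelfth of a 224x224 image, so the early epochs are cheap and the expensive full-size images are only used for the final fine-tuning.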

In [None]:
# We start by resizing to 64x64x3
sz = 64

In [None]:
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=sz, bs=128).normalize(imagenet_stats)

In [None]:
learn = cnn_learner(data, model, metrics=accuracy, pretrained=True)#.to_fp16()
learn.model_dir = NB_DIR/'models/'

In [None]:
learn.freeze()

In [None]:
learn.lr_find()
learn.recorder.plot(suggestion=True)

In [None]:
learn.fit_one_cycle(1, 5e-4)

In [None]:
#learn.save(f'dogsvscats-model-{sz}-lvl1')

In [None]:
#learn = learn.load(f'dogsvscats-model-{sz}-lvl1')

In [None]:
learn.unfreeze()

In [None]:
learn.lr_find()
learn.recorder.plot(suggestion=True)

In [None]:
learn.fit_one_cycle(1, max_lr=slice(1e-5, 1e-4))

In [None]:
#learn.save(f'dogsvscats-model-{sz}-final')

In [None]:
#learn = learn.load(f'dogsvscats-model-{sz}-final')

In [None]:
gc.collect()

In [None]:
sz = 128

In [None]:
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=sz, bs=128).normalize(imagenet_stats)

In [None]:
learn.data = data

In [None]:
learn.unfreeze()
learn.lr_find()
learn.recorder.plot(suggestion=True)

In [None]:
learn.fit_one_cycle(1, max_lr=slice(1e-4, 1e-3))

In [None]:
#learn.save(f'dogsvscats-model-{sz}-final')

In [None]:
#learn = learn.load(f'dogsvscats-model-{sz}-final')

In [None]:
sz = 224

In [None]:
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=sz, bs=128).normalize(imagenet_stats)

In [None]:
learn.data = data

In [None]:
learn.unfreeze()
learn.lr_find()
learn.recorder.plot(suggestion=True)

In [None]:
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-4))

In [None]:
#learn.save(f'dogsvscats-model-{sz}-final')

In [None]:
learn = learn.load(f'dogsvscats-model-{sz}-final')

## How did we do?

In [None]:
preds,y,losses = learn.TTA(with_loss=True)

In [None]:
accuracy(preds,y)

In [None]:
interp = ClassificationInterpretation(learn, preds, y, losses)

In [None]:
interp.plot_confusion_matrix()

We only misclassified 3 cats as dogs and 5 dogs as cats. 

Let's have a look at them:

In [None]:
interp.plot_top_losses(16)

In [None]:
interp.plot_top_losses(16, heatmap=True) # "Explainable AI"

## Submit to Kaggle

For fun, let's submit predictions on the unseen test data from Kaggle.

Small trick: Fine-tune a bit on both train and val data. We want to use all the labeled data to construct our model before submission. 

In [None]:
sz=224
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=sz, bs=128, train='train_val').normalize(imagenet_stats)
# Point the learner at the combined train+val data before fine-tuning
learn.data = data

In [None]:
learn.fit_one_cycle(1, max_lr=slice(1e-6, 1e-5))

In [None]:
#learn.save(f'dogsvscats-model-{sz}-train_val-final')

In [None]:
#learn = cnn_learner(data, model, metrics=accuracy, pretrained=True)#.to_fp16()
#learn.model_dir = NB_DIR/'models/'

#learn = learn.load(f'dogsvscats-model-{sz}-train_val-final')

In [None]:
learn.data.add_test(list((path/'test1').iterdir()))

In [None]:
log_predictions,y = learn.TTA(ds_type=DatasetType.Test)

In [None]:
len(log_predictions)

In [None]:
#np.save('models/dogscats-test-tta-preds', log_predictions)

In [None]:
#np.save('models/dogscats-test-tta-y', y)

In [None]:
#log_predictions = np.load('models/dogscats-test-tta-preds.npy')
#y = np.load('models/dogscats-test-tta-y.npy')

Here are our predictions:

In [None]:
log_predictions[:2]

We need to submit our predictions as requested by the Kaggle competition:

In [None]:
labelled_preds = log_predictions[:, 1]

In [None]:
labelled_preds[:2]

In [None]:
fnames = [f.name[:-4] for f in learn.data.test_ds.items]

In [None]:
fnames=list(map(int,fnames))
df = pd.DataFrame({'id':fnames, 'label':labelled_preds}, columns=['id', 'label'])

df=df.sort_values(by=['id'])

In [None]:
df.head()

**Clip the predictions** 
(to not get too punished when making very certain but wrong predictions)

In [None]:
min_clip=0.005
max_clip=0.995
clipped = log_predictions.numpy().clip(min=min_clip, max=max_clip)

In [None]:
labelled_preds_clip = clipped[:, 1]

In [None]:
df_clip = pd.DataFrame({'id':fnames, 'label':labelled_preds_clip}, columns=['id', 'label'])
df_clip=df_clip.sort_values(by=['id'])

In [None]:
#df_clip.to_csv(f'catsvsdogs-submission_clip_{min_clip}.csv', index=False)

In [None]:
df_clip.head()
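
To see why clipping helps, here is a small worked example. It assumes the competition is scored with log loss and that label 1 means "dog" (matching the column we picked out above). The probability `0.9999999` is made up for illustration.

```python
import math

def log_loss_single(label, prob_dog):
    """Log loss for one image: label is 1 for dog and 0 for cat."""
    return -(label * math.log(prob_dog) + (1 - label) * math.log(1 - prob_dog))

def clip(prob, low=0.005, high=0.995):
    """Keep a predicted probability inside [low, high]."""
    return min(max(prob, low), high)

confident = 0.9999999  # the model is almost certain it's a dog

wrong_raw = log_loss_single(0, confident)            # it was actually a cat
wrong_clipped = log_loss_single(0, clip(confident))
right_raw = log_loss_single(1, confident)            # it really was a dog
right_clipped = log_loss_single(1, clip(confident))

print(f"wrong: {wrong_raw:.3f} unclipped vs {wrong_clipped:.3f} clipped")
print(f"right: {right_raw:.7f} unclipped vs {right_clipped:.7f} clipped")
```

A single very confident mistake costs around 16 without clipping but only around 5.3 with it, while the price we pay on correct predictions is tiny (about 0.005 per image). As long as the model makes at least a few confident mistakes, clipping lowers the average loss.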

Let's submit to Kaggle...

How did we do?

# What's next?

CIFAR-10, MNIST and Dogs vs Cats are fun and all, but not particularly useful in themselves. However, the path from what we've done above to things that are actually potentially useful (and definitely interesting) is not very long. 

One could for example swap out the above dogs and cats with X-ray images and try to create a system that can detect abnormalities directly from bone X-rays:

<img src="assets/mura-dataset.png">

Have a look at the Stanford ML Group's Bone X-Ray Deep Learning Competition at https://stanfordmlgroup.github.io/competitions/mura if you want to give it a shot. 

<img src="assets/mura-competition.png">