# Introduction

In this notebook we'll attack the MedNIST data set presented in the previous notebook using a deep learning library called _fastai_. We'll also study another, more difficult data set.

> **Note:** To run this notebook locally you have to have the fastai library installed. See https://docs.fast.ai for instructions if you want to install on your own computer. If you're using Google Colab or Paperspace Gradient, running the notebook will install what's necessary.

# Setup

In [None]:
# This is a quick check of whether the notebook is currently running on Google Colaboratory, as that makes some difference for the code below.
# We'll do this in every notebook of the course.
if 'google.colab' in str(get_ipython()):
    print('The notebook is running on Colab. colab=True.')
    colab=True
else:
    print('The notebook is not running on Colab. colab=False.')
    colab=False

In [None]:
# Set to True if you're using Paperspace Gradient:
gradient=False

In [None]:
if colab or gradient:
    !pip install -Uqq fastbook
    import fastbook
    fastbook.setup_book()
    from fastbook import *
    !pip install fastai-amalgam
    !pip install palettable
    !pip install matplotlib_venn
    from fastai.vision.all import *
    NB_DIR = Path.cwd()
else:
    from fastai.vision.all import *
    NB_DIR = Path.cwd()
    DATA = NB_DIR    
    
if colab:
    DATA = Path('./gdrive/MyDrive/ColabData')
    DATA.mkdir(exist_ok=True)
if gradient:
    DATA = Path('/storage')
    DATA.mkdir(exist_ok=True)


In [None]:
import os, shutil, gc

# Load data

In [None]:
if gradient:
    path = untar_data("https://www.dropbox.com/s/5wwskxctvcxiuea/MedNIST.tar.gz?dl=1", archive='MedNIST.tar.gz', dest='/storage')
else:
    path = untar_data("https://www.dropbox.com/s/5wwskxctvcxiuea/MedNIST.tar.gz?dl=1", archive='MedNIST.tar.gz')

In [None]:
path.ls()

We set up a data loader, more precisely fastai's [`ImageDataLoaders`](https://docs.fast.ai/vision.data.html#ImageDataLoaders), setting aside 30% of the images as validation data:

In [None]:
fnames = get_image_files(path)
def label_func(x): return x.parent.name
dls = ImageDataLoaders.from_path_func(path, fnames, label_func, valid_pct=0.3)
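
Conceptually, `valid_pct=0.3` means the items are shuffled and 30% of them are held out for validation. Here's a minimal NumPy sketch of that idea. The helper `random_split` is hypothetical and not part of fastai, whose splitter may differ in details such as the random number generator it uses:

```python
import numpy as np

def random_split(n_items, valid_pct=0.3, seed=None):
    """Shuffle item indices and hold out a fraction of them for validation.

    Simplified sketch of a random train/validation split (not fastai's code).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_items)   # indices 0..n_items-1 in random order
    cut = int(valid_pct * n_items)   # number of validation items
    return idx[cut:], idx[:cut]      # (train indices, validation indices)

train_idx, valid_idx = random_split(10, valid_pct=0.3, seed=0)
print(len(train_idx), len(valid_idx))  # 7 3
```

Fixing the seed makes the split reproducible, which matters when comparing models trained in different runs.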

In [None]:
dls.show_batch()

In [None]:
print(f'Number of training images: {len(dls.train_ds)}')
print(f'Number of validation images: {len(dls.valid_ds)}')

# Train a model

Here we create what fastai calls a _learner_. It combines the above dataloaders with a neural network of a specific architecture, here a `resnet18`, whose weights have been pretrained on the roughly 1.4 million images of the ImageNet competition data.

In [None]:
learn = cnn_learner(dls, resnet18, pretrained=True, metrics=accuracy)

Let's train it for a bit:

In [None]:
learn.fine_tune(1)

The model is essentially 100% accurate on the validation data after only seconds of training.

# Evaluate model

We can have a look at some predictions:

In [None]:
learn.show_results(figsize=(10,10))

...and also at the confusion matrix and the images the model found most difficult:

In [None]:
interp = ClassificationInterpretation.from_learner(learn)

In [None]:
interp.plot_confusion_matrix(figsize=(8,8))
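
A confusion matrix simply counts, for each pair of classes, how often an image of one true class was predicted as another. Here's a minimal NumPy sketch with made-up labels, using the convention that entry (i, j) counts images of true class i predicted as class j, so correct predictions lie on the diagonal. The helper `confusion_matrix` is hypothetical, not fastai's implementation:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 0 2]]
```

The accuracy is the sum of the diagonal divided by the total count, here 5/6.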

Here are the four images the model misclassified, followed by the ones it came closest to misclassifying, ordered by loss (highest first):

In [None]:
interp.plot_top_losses(9, figsize=(12,12))
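
The ranking is by loss. Assuming a single-label classifier trained with cross-entropy, an image's loss is minus the log of the probability the model assigns to its true class, so confidently wrong predictions get the largest losses. A minimal NumPy sketch of this ranking with made-up probabilities (the helper `top_losses` is hypothetical; fastai computes the losses with the learner's own loss function):

```python
import numpy as np

def top_losses(probs, y_true, k):
    """Return (losses, indices) of the k examples with the highest cross-entropy loss.

    probs:  array of shape (n_examples, n_classes) with predicted class probabilities.
    y_true: array of shape (n_examples,) with the true class indices.
    """
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true)
    # Cross-entropy per example: minus the log-probability of the true class
    losses = -np.log(probs[np.arange(len(y_true)), y_true])
    order = np.argsort(losses)[::-1][:k]   # highest loss first
    return losses[order], order

probs = np.array([[0.9, 0.1],     # correct and confident -> low loss
                  [0.4, 0.6],     # wrong but unsure      -> moderate loss
                  [0.05, 0.95]])  # wrong and confident   -> high loss
y_true = np.array([0, 0, 0])
losses, idx = top_losses(probs, y_true, k=2)
print(idx)  # [2 1]
```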

This problem is clearly too easy for methods as powerful as these. Let's try a harder one!

In [None]:
# Free up memory:
learn = None
dls = None
path=None
gc.collect()
torch.cuda.empty_cache()

# Another example

In [None]:
# This is a quick check of whether the notebook is currently running on Google Colaboratory, as that makes some difference for the code below.
# We'll do this in every notebook of the course.
if 'google.colab' in str(get_ipython()):
    print('The notebook is running on Colab. colab=True.')
    colab=True
else:
    print('The notebook is not running on Colab. colab=False.')
    colab=False

# Set to True if you're using Paperspace Gradient:
gradient=False

if colab or gradient:
    !pip install -Uqq fastbook
    import fastbook
    fastbook.setup_book()
    from fastbook import *
    !pip install fastai-amalgam
    !pip install palettable
    !pip install matplotlib_venn
    from fastai.vision.all import *
    NB_DIR = Path.cwd()
else:
    from fastai.vision.all import *
    NB_DIR = Path.cwd()
    DATA = NB_DIR    
    
if colab:
    DATA = Path('./gdrive/MyDrive/ColabData')
    DATA.mkdir(exist_ok=True)
if gradient:
    DATA = Path('/storage')
    DATA.mkdir(exist_ok=True)


In [None]:
import os, shutil, gc

We need something more challenging to see what these more powerful models and training methods can achieve.

Let's keep things relatively simple by using one of the data sets collected in the _fast.ai Datasets_ repository (https://course.fast.ai/datasets). You'll find the options at the previous link, or by listing the attributes of `URLs`:

In [None]:
print([d for d in dir(URLs) if '__' not in d])

Let's use the Caltech-UCSD Birds-200-2011 data set of 200 different bird species, with 11,788 images in total:<br>
<img src="assets/birds_collage.jpg">

In [None]:
path = untar_data(URLs.CUB_200_2011)

Let's figure out what we've downloaded. In particular, where we can find the images and the corresponding image labels:

In [None]:
path.ls()

In [None]:
(path/'CUB_200_2011').ls()

In [None]:
images = path/'CUB_200_2011'/'images'
images.ls()

We see that the images are stored in 200 separate subfolders, one per bird species, each named with a class number followed by the species name.
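
To check how many images each species has (and hence whether the classes are roughly balanced), one can count the image files in each subfolder. A minimal sketch using only `pathlib`; the helper `count_images_per_class` is hypothetical and not part of fastai:

```python
from pathlib import Path

def count_images_per_class(root, exts=('.jpg', '.jpeg', '.png')):
    """Map each subfolder name of `root` to the number of image files it contains."""
    root = Path(root)
    return {d.name: sum(1 for f in d.iterdir() if f.suffix.lower() in exts)
            for d in sorted(root.iterdir()) if d.is_dir()}

# Usage in this notebook (assuming `images` points at the CUB images folder):
# counts = count_images_per_class(images)
# print(len(counts), min(counts.values()), max(counts.values()))
```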

In [None]:
(path/'CUB_200_2011'/'images'/'051.Horned_Grebe').ls()

There are 60 images of Horned Grebes. Here's one:

In [None]:
fname = path/'CUB_200_2011'/'images'/'051.Horned_Grebe'/'Horned_Grebe_0069_34990.jpg'

In [None]:
from fastai.vision import *

In [None]:
im = Image.open(fname)
show_image(im, figsize=(10,10))
plt.show()

In [None]:
im.shape

## Create a dataloader

In [None]:
item_sz=300
db = DataBlock(blocks=(ImageBlock, CategoryBlock), 
               get_items=get_image_files,
               get_y=parent_label,
               splitter=RandomSplitter(seed=42),
               item_tfms=Resize(item_sz),
               batch_tfms=Normalize.from_stats(*imagenet_stats))

In [None]:
# If you run out of GPU memory, then you can lower the batch size
bs = 64
dls = db.dataloaders(images, bs=bs)

In [None]:
print(f'Number of training images: {len(dls.train_ds)}')
print(f'Number of validation images: {len(dls.valid_ds)}')

Here are 6 images from one batch of 64 (the training data is shuffled, so they are effectively chosen at random).

In [None]:
dls.show_batch(max_n=6, figsize=(12,8))

We create a learner as above.

In [None]:
learn = cnn_learner(dls, resnet18, pretrained=True, metrics=accuracy).to_fp16()

In [None]:
# Uncomment to print the model architecture:
#learn.model

In [None]:
lr = learn.lr_find()

In [None]:
lr

In [None]:
learn.fine_tune(5, base_lr=lr.valley)

### Is this a good result?

Here's the state of the art on the same data set as of 2014: https://pub.inf-cv.uni-jena.de/pdf/Goering14:NPT.

<img src="assets/goering.png">

<img src="assets/goering_approach.png">

Not too bad for something that could be constructed this easily.

Later in this notebook and in the course we'll learn several tricks that could be used to improve the results (e.g. progressive resizing, more advanced data augmentation, ensembling, and more).

## Evaluating the model

Here are a few predictions on validation data:

In [None]:
learn.show_results(figsize=(12,12))

In [None]:
interp = ClassificationInterpretation.from_learner(learn)

These are the images with the highest loss, i.e. mostly images the model classified incorrectly with high confidence. In some sense, these are its worst mistakes:

In [None]:
interp.plot_top_losses(6, figsize=(16,12))

### Inspecting the model

By computing the gradients of a class score with respect to the activations of the final convolutional layer, and using them to weight those activations, one can produce a heatmap that indicates which regions of the image the model based its prediction on. This technique is called _Grad-CAM_ (gradient-weighted class activation mapping).

<img src="assets/gradcam.png">
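
Once the final convolutional layer's activations and the gradients of a class score with respect to them are available (in PyTorch they would typically be captured with hooks, which is omitted here), the heatmap itself is cheap to compute. Below is a minimal NumPy sketch of this combination step of Grad-CAM; the function name `gradcam_map` is hypothetical and this is not the fastai-amalgam implementation. Each feature map is weighted by its spatially averaged gradient, the weighted maps are summed, and a ReLU keeps only regions that increase the class score:

```python
import numpy as np

def gradcam_map(activations, gradients):
    """Combine feature maps and their gradients into a Grad-CAM heatmap.

    activations: array of shape (K, H, W), the K feature maps of the last conv layer.
    gradients:   array of shape (K, H, W), d(class score)/d(activations).
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))                  # one weight per feature map
    cam = np.tensordot(weights, activations, axes=(0, 0))  # weighted sum of maps -> (H, W)
    cam = np.maximum(cam, 0)                               # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

rng = np.random.default_rng(0)
heatmap = gradcam_map(rng.random((8, 7, 7)), rng.standard_normal((8, 7, 7)))
print(heatmap.shape)  # (7, 7)
```

In practice the low-resolution map is upsampled to the input image size before being overlaid on the image.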

We'll use the Grad-CAM implementation from [fastai-amalgam](https://github.com/Synopsis/amalgam). See [here](https://github.com/Synopsis/amalgam/blob/master/fastai_amalgam/interpret/gradcam.py) for details.

In [None]:
import sys
!{sys.executable} -m pip install matplotlib_venn fastai_amalgam
!conda install --yes --prefix {sys.prefix} palettable

In [None]:
# See the file `some_utils.py` to check what's imported here
# to compute the gradcam maps.
sys.path.append("../")
from some_utils import *

In [None]:
import random
# Pick 4 distinct validation images at random and show their Grad-CAM heatmaps
for img_fn in random.sample(list(dls.valid_ds.items), k=4):
    gcam = gradcam(learn, img_fn, show_original=True)
    plt.figure(figsize=(16,8))
    plt.imshow(gcam)
    plt.axis('off')
    plt.show()

## Improving the results

In [None]:
# Free up memory:
learn = None
dls = None
gc.collect()
torch.cuda.empty_cache()

### Data augmentation

We'll apply some transformations to the data, as in the PyTorch notebook. The random rotations and left-right flips (along with the zooms and contrast changes in the code below) are examples of **data augmentation**. By randomly transforming the images while keeping their labels, one can in a sense create "extra" training data, and also make the trained model more robust to those transformations.
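
A minimal NumPy sketch of the idea, using a random horizontal flip and a random crop (a crude form of zoom) on a single image array instead of fastai's GPU batch transforms. The function `augment` is hypothetical, and rotation is left out for brevity:

```python
import numpy as np

def augment(image, rng, crop_frac=0.9):
    """Return a randomly flipped and cropped copy of an (H, W, C) image array.

    The label is not involved: the augmented image keeps the original's label.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1]                  # horizontal (left-right) flip
    h, w = image.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top = rng.integers(0, h - ch + 1)           # random crop position (a crude "zoom")
    left = rng.integers(0, w - cw + 1)
    return image[top:top + ch, left:left + cw]

rng = np.random.default_rng(42)
img = np.arange(10 * 10 * 3).reshape(10, 10, 3)
label = "Horned_Grebe"
aug_img, aug_label = augment(img, rng), label   # the label is unchanged
print(aug_img.shape)  # (9, 9, 3)
```

Each call produces a slightly different image, so every epoch the model sees new variants of the same labelled examples.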

In [None]:
# Note: To speed up processing and use less GPU memory (at the cost of accuracy), 
# one can set the sizes in the Resize methods to something smaller

item_sz = 500
batch_sz = 400

if colab or gradient:
    item_sz = 400
    batch_sz = 300
    

item_tfms = Resize(item_sz, method='pad', pad_mode='zeros')

batch_tfms = [Resize(batch_sz, method='pad', pad_mode='zeros'), Flip(), Zoom(),
              Contrast(), Rotate(max_deg=20), Normalize.from_stats(*imagenet_stats)]


db = DataBlock(blocks=(ImageBlock, CategoryBlock), 
               get_items=get_image_files,
               get_y=parent_label,
               splitter=RandomSplitter(seed=42),
               item_tfms=item_tfms,
               batch_tfms=batch_tfms
              )

In [None]:
bs=32
dls = db.dataloaders(images, bs=bs)

We can take a look at some data augmentation results for a single image:

In [None]:
dls.show_batch(max_n=6, figsize=(12,8), unique=True)

> Data augmentation is in general a topic worth thinking hard about each time you face a new data set. Some transformations may be very important to include (to increase raw model performance or to make the model invariant to certain transformations), while others should be left out because they produce unrealistic images or change what the label of the image should be: rotating a handwritten "6" by 180 degrees turns it into a "9", for example. In practice, designing good data augmentation strategies may require substantial domain knowledge.

> We'll have more to say about this later.

In [None]:
learn = cnn_learner(dls, resnet18, metrics=accuracy).to_fp16()

In [None]:
lr = learn.lr_find()

In [None]:
lr

In [None]:
learn.fine_tune(7, base_lr=lr.valley)

> We'll have a look at other, more advanced data augmentation techniques later in the course.

### Test-time augmentation

We can also use data augmentation at test time: for each image we produce several predictions, one for each of a set of random augmentations applied to the image, and combine them (for example by averaging). This is called _test-time augmentation_ or TTA.
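
A minimal NumPy sketch of the idea, assuming a `predict` function that returns class probabilities for one image. Here it is a hypothetical toy model that deliberately isn't flip-invariant, not a trained network. The plain prediction and the predictions on augmented copies are simply averaged. fastai's `learn.tta()` weights the augmented and plain predictions somewhat differently; see its documentation:

```python
import numpy as np

def tta_predict(predict, image, augmentations):
    """Average the class probabilities predicted for `image` and its augmented copies.

    predict: function mapping an image to a 1-D array of class probabilities.
    augmentations: list of functions, each mapping an image to an augmented image.
    """
    preds = [predict(image)] + [predict(aug(image)) for aug in augmentations]
    return np.mean(preds, axis=0)

def left_column_model(img):
    # Hypothetical stand-in for a model: looks only at the left-most pixel column
    p = float(img[:, 0].mean()) / 255.0
    return np.array([1.0 - p, p])

def hflip(img):
    return img[:, ::-1]

img = np.zeros((4, 4))
img[:, 0] = 255.0
print(left_column_model(img))                        # [0. 1.]
print(tta_predict(left_column_model, img, [hflip]))  # [0.5 0.5]
```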

Here's the result without TTA:

In [None]:
y_preds_proba, y_true = learn.get_preds()

In [None]:
accuracy(y_preds_proba, y_true)

Here's the result with TTA:

In [None]:
y_preds_proba, y_true = learn.tta()

In [None]:
accuracy(y_preds_proba, y_true)

### Other CNN model architectures

We used an 18-layer ResNet above. Let's try a few other model architectures:

> **NB:** The below experiment will take a _long_ time to run, and require quite a lot of computational resources. You may want to just browse the code and its output, and try the approach on your own data later.

In [None]:
models = {
    'dn169':       densenet169,
    'rn50':        resnet50,
    'squeezenet':  squeezenet1_0,
    'rn34':        resnet34, 
    'dn121':       densenet121, 
}

In [None]:
epochs = 10
preds = {} # For storing the model predictions
acc = {}   # For storing the model accuracies

for m in models.keys():
    print(f"Training the model {m}\n")
    learn = cnn_learner(dls, models[m], metrics=accuracy).to_fp16()
    lr = learn.lr_find(show_plot=False)
    learn.fine_tune(epochs, base_lr=lr.valley)
    y_preds_probs, y_true = learn.tta()
    preds[m] = y_preds_probs
    acc[m] = accuracy(y_preds_probs, y_true)
    print(f"Accuracy for model {m} with TTA is {acc[m]}")
    print('#'*40)

### Here are the training outputs

<img src="assets/other_models_results.png">

### Performance

In [None]:
# Code used to save precomputed results to disk, for reference
import pickle

def save_obj(obj, name):
    with open(f'assets/{name}.pkl', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

def load_obj(name):
    with open(f'assets/{name}.pkl', 'rb') as f:
        return pickle.load(f)
    
#save_obj(acc, 'acc')
#save_obj(preds, 'preds')

In [None]:
import pickle

In [None]:
# Load the results from disk if you didn't run the above training process:
if colab:
    # `urllib.request` must be imported explicitly; `import urllib` alone is not enough
    import urllib.request
    with urllib.request.urlopen('https://github.com/alu042/DAT801/raw/master/extra-deep_learning/assets/acc.pkl') as url:
        acc = pickle.load(url)
    with urllib.request.urlopen('https://github.com/alu042/DAT801/raw/master/extra-deep_learning/assets/preds.pkl') as url:
        preds = pickle.load(url)
else:
    acc = load_obj('acc')
    preds = load_obj('preds')

Here are the accuracies for the models we tried:

In [None]:
acc

Most of the models reach similar accuracies:

In [None]:
import seaborn as sns
plt.figure(figsize=(8,6))
vals = [float(a) for a in acc.values()]
sns.barplot(x=list(acc.keys()), y=vals)
plt.show()

Here are the model predictions:

In [None]:
preds

### Model ensembling

> TODO: We could use different splits of the data to obtain different models (similar in spirit to bagging). Then we would need a separate test set.

In [None]:
if colab:
    # `urllib.request` must be imported explicitly; `import urllib` alone is not enough
    import urllib.request
    with urllib.request.urlopen('https://github.com/alu042/DAT801/raw/master/extra-deep_learning/assets/acc.pkl') as url:
        acc = pickle.load(url)
    with urllib.request.urlopen('https://github.com/alu042/DAT801/raw/master/extra-deep_learning/assets/preds.pkl') as url:
        preds = pickle.load(url)
else:
    acc = load_obj('acc')
    preds = load_obj('preds')

Once you have multiple models of similar performance, a simple trick to obtain an even better model is to construct an _ensemble_. 

For classifiers, a simple strategy is to have each model in the ensemble vote on a class, and then use the majority class as the final output prediction. This is called _hard voting_. 

Another frequently used simple strategy is to average the models' confidence scores (predicted probabilities) and then predict the class with the highest average score (or, for binary problems, apply a threshold to the averaged score). In other words, a model that is more confident than another model for a given instance contributes more to the prediction. This is called _soft voting_.
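
The two strategies can disagree. In this NumPy sketch (made-up probabilities for three hypothetical models and a single image), two models narrowly prefer class 1 while the third is very confident about class 0. Hard voting picks class 1; soft voting picks class 0. The function names are hypothetical:

```python
import numpy as np

def hard_vote(probs):
    """probs: (n_models, n_examples, n_classes). Majority vote over each model's predicted class."""
    votes = probs.argmax(axis=2)  # (n_models, n_examples)
    n_classes = probs.shape[2]
    # For each example, count the votes per class; ties go to the lowest class index
    return np.array([np.bincount(v, minlength=n_classes).argmax() for v in votes.T])

def soft_vote(probs):
    """Average the probabilities over models, then pick the most probable class."""
    return probs.mean(axis=0).argmax(axis=1)

probs = np.array([[[0.45, 0.55]],   # model 1: class 1, barely
                  [[0.45, 0.55]],   # model 2: class 1, barely
                  [[0.95, 0.05]]])  # model 3: class 0, very confident
print(hard_vote(probs), soft_vote(probs))  # [1] [0]
```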

Here's an example of soft voting, using the models trained above except SqueezeNet:

In [None]:
preds.keys()

In [None]:
del preds['squeezenet']
del acc['squeezenet']

In [None]:
# Soft voting: average the predicted probabilities of the different models.
# torch.stack builds a new tensor, so the stored predictions in `preds` are not
# modified in place, and .mean(dim=0) divides by the number of models.
ensembled_probs = torch.stack(list(preds.values())).mean(dim=0)

In [None]:
acc['ensemble'] = accuracy(ensembled_probs, y_true)

In [None]:
acc

### No pretraining

Note that all the above results were obtained with models _pretrained_ on the ImageNet challenge data. This arguably makes the comparison with the 2014 state-of-the-art results unfair: the pretrained models have already learned from over a million labeled ImageNet images (including many images of birds) before ever seeing our data set.

Therefore, let's try again without pretraining.

In [None]:
learn = cnn_learner(dls, resnet18, metrics=accuracy, pretrained=False).to_fp16()

In [None]:
learn.unfreeze()

In [None]:
lr = learn.lr_find()

In [None]:
lr

In [None]:
learn.fit_one_cycle(50, lr_max=lr.valley)

#### Precomputed output:

<img width=40% src="assets/cub-no-pretrain.png">