# Introduction to Deep Learning: Course 1
_Adapted from [Dataflowr Module 1](https://dataflowr.github.io/website/modules/1-intro-general-overview/) by Marc Lelarge_

Let's start with an existing model for one of the most popular tasks in machine learning: image classification.
In this notebook, we focus on the [Dogs vs Cats competition](https://www.kaggle.com/c/dogs-vs-cats) at Kaggle.

## System setup

Import the required packages, check the current version of PyTorch, and check that GPU is available (on Colab you may need to change the runtime first).

In [None]:
import os
import time

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision
from torchvision import models, transforms, datasets

# Prefer a CUDA GPU, then Apple Silicon (MPS), and fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
print(f"{torch.__version__=}")
print(f"Using {device=}")


## Download the dataset

There are 25,000 labeled dog and cat photos available for training, and 12,500 in the test set that we have to try to label for this competition. According to the Kaggle website, when this competition was launched (end of 2013): "State of the art: The current literature suggests machine classifiers can score above 80% accuracy on this task". So if you can beat 80%, then you will be at the cutting edge as of 2013!

Jeremy Howard (fast.ai) provides a direct link to the dogscats dataset. He has sorted the cats and dogs into separate folders and created a validation folder.

In [None]:
%mkdir data
# Change into the directory where the dataset will be stored.
# Modify this line if you run the notebook elsewhere (e.g. on Colab: %cd /content/data/)
%cd data/
!wget http://files.fast.ai/data/examples/dogscats.tgz

In [None]:
!tar -zxvf dogscats.tgz
%cd dogscats/

Here is the file tree of `dogscats`:
```bash
.
├── test1 # contains 12500 images of cats and dogs
├── train
|   └── cats # contains 11500 images of cats
|   └── dogs # contains 11500 images of dogs
├── valid
|   └── cats # contains 1000 images of cats
|   └── dogs # contains 1000 images of dogs
├── sample
|   └── train
|       └── cats # contains 8 images of cats
|       └── dogs # contains 8 images of dogs    
|   └── valid 
|       └── cats # contains 4 images of cats
|       └── dogs # contains 4 images of dogs    
├── models # empty folder
```

The 12,500 unlabeled images of the competition's test set are in the `test1` sub-folder, while the 25,000 labeled images have been split into a training set (23,000 images) and a validation set (2,000 images).

The sub-folder `sample` is here only to make sure the code is running properly on a very small dataset.
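To double-check these numbers after extracting the archive, here is a minimal sketch that counts the files in each class sub-folder. It assumes the layout shown in the tree (one sub-folder per class under each split) and counts every regular file; `count_images` is a hypothetical helper, not part of any library.

```python
from pathlib import Path


def count_images(root, splits=("train", "valid")):
    """Count the files in each class sub-folder of each split (hypothetical helper)."""
    counts = {}
    for split in splits:
        split_dir = Path(root) / split
        counts[split] = {
            # one entry per class folder, e.g. 'cats' and 'dogs'
            class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts
```

Run from the folder that contains `dogscats`, e.g. `count_images('./dogscats')`, it should report 11,500 images per class for `train` and 1,000 per class for `valid`, matching the tree.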

In [None]:
%cd ..

## Data processing

In [None]:
data_dir = './dogscats' # modify if needed

`datasets` is a module of the `torchvision` package (see [torchvision.datasets](http://pytorch.org/docs/master/torchvision/datasets.html)) that provides ready-made dataset classes for data loading. Combined with `torch.utils.data.DataLoader` (used below), it gives a loader that fetches images from the disk in parallel worker processes, groups them into mini-batches and feeds them continuously to the training loop, one batch per _forward_/_backward_ pass through the network.

Images need a bit of preparation before passing them through the network. They need to all have the same size $224\times 224 \times 3$ plus some extra formatting done below by the normalize transform (explained later).

In [None]:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

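imagenet_format = transforms.Compose([
                transforms.CenterCrop(224), # Crop the central 224x224 region (zero-padded if the image is smaller)
                transforms.ToTensor(), # Convert to a Torch tensor of shape (3, H, W) with values in [0, 1]
                normalize, # Normalize each channel with the ImageNet mean and std
            ])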

In [None]:
dsets = {x: datasets.ImageFolder(os.path.join(data_dir, x), imagenet_format)
         for x in ['train', 'valid']}

We used `datasets.ImageFolder` to load the datasets. Let's look at this class.

In [None]:
?datasets.ImageFolder

We see that `datasets.ImageFolder` has the attributes `classes`, `class_to_idx` and `imgs`.

Let's see what they are.

In [None]:
dset_classes = dsets['train'].classes
dset_classes

The names of the classes are directly inferred from the structure of the folder:
```bash
├── train
|   └── cats
|   └── dogs
```

In [None]:
dsets['train'].class_to_idx

Label 0 corresponds to cats and 1 to dogs.
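The labels come from the alphabetical order of the class sub-folders, which is why `cats` gets 0 and `dogs` gets 1. A minimal pure-Python sketch that mimics this mapping (an approximation of what `ImageFolder` does; `class_mapping` is a hypothetical name):

```python
import os


def class_mapping(directory):
    """Mimic how ImageFolder labels classes: sorted sub-folder names -> 0, 1, ..."""
    # only sub-directories count as classes; stray files are ignored
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx
```

Because the order is alphabetical, adding a folder named e.g. `birds` would shift `cats` to 1 and `dogs` to 2.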

We can see that the first 5 entries of `dsets['train'].imgs` are pairs `(path_to_image, label)`:

In [None]:
dsets['train'].imgs[:5]

In [None]:
dset_sizes = {x: len(dsets[x]) for x in ['train', 'valid']}
dset_sizes

As expected, we have 23,000 images in the training set and 2,000 in the validation set.

The `torchvision` package allows complex pre-processing/transforms of the input data (_e.g._ normalization, cropping, flipping, jittering). A sequence of transforms can be grouped into a pipeline with `torchvision.transforms.Compose`, see [torchvision.transforms](http://pytorch.org/docs/master/torchvision/transforms.html).

The IPython help operator `?` lets you inspect objects you defined and forgot about!

In [None]:
?imagenet_format

Where do this normalization and the magic constants for `mean` and `std` come from?

As explained in the [PyTorch doc](https://pytorch.org/docs/stable/torchvision/models.html), this comes from the pretrained model we are going to use. All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using `mean = [0.485, 0.456, 0.406]` and `std = [0.229, 0.224, 0.225]`, which are the per-channel mean and standard deviation of the ImageNet training images.
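Per channel $c$, the normalization computes $x'_c = (x_c - \mu_c)/\sigma_c$. A NumPy sketch of what `ToTensor` followed by `Normalize` does to an image, and of the inverse operation used later to display images. It assumes pixel values are already in [0, 1] (`ToTensor` divides 8-bit values by 255) and uses a random array as a stand-in for a real image:

```python
import numpy as np

# ImageNet statistics quoted above (one value per RGB channel)
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])


def normalize_chw(img_hwc):
    """Mimic ToTensor + Normalize: H x W x 3 floats in [0, 1] -> 3 x H x W normalized."""
    chw = img_hwc.transpose(2, 0, 1)  # channels first, as ToTensor does
    return (chw - MEAN[:, None, None]) / STD[:, None, None]


def denormalize_hwc(img_chw):
    """Undo the normalization and go back to H x W x 3 for plotting."""
    hwc = img_chw.transpose(1, 2, 0)
    return np.clip(STD * hwc + MEAN, 0, 1)


rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))  # stand-in for a real image scaled to [0, 1]
x = normalize_chw(img)           # values below the channel mean become negative
```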

We can now define 2 dataloaders for the train and valid datasets using the `DataLoader` class.
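How many batches each loader yields follows from the dataset size and `batch_size`: with `DataLoader`'s default `drop_last=False`, the last, possibly smaller, batch is kept. A small pure-Python sketch of that arithmetic applied to the sizes used in this notebook (the helper names are ours, not part of PyTorch):

```python
import math


def num_batches(n_samples, batch_size, drop_last=False):
    """Number of mini-batches in one pass over the data."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


def last_batch_size(n_samples, batch_size):
    """Size of the final (possibly partial) batch when drop_last=False."""
    remainder = n_samples % batch_size
    return remainder if remainder else batch_size


print(num_batches(2000, 5), last_batch_size(2000, 5))      # 400 5   (validation loader)
print(num_batches(23000, 64), last_batch_size(23000, 64))  # 360 24  (training loader)
```

For a dataset of known length with the default sampler, `len(loader_valid)` should match `num_batches(2000, 5)`.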

In [None]:
loader_train = torch.utils.data.DataLoader(dsets['train'], batch_size=64, shuffle=True, num_workers=6)
loader_valid = torch.utils.data.DataLoader(dsets['valid'], batch_size=5, shuffle=False, num_workers=6)

In [None]:
?torch.utils.data.DataLoader

In [None]:
count = len(loader_valid)
inputs_try, labels_try = next(iter(loader_valid))

In [None]:
labels_try

In [None]:
inputs_try.shape

The validation dataset contains 2,000 images, hence there are 400 batches of size 5. `labels_try` contains the labels of the first batch and `inputs_try` contains the images of the first batch.
But what is an image here?

In [None]:
inputs_try[0]

A 3-channel RGB image has shape (3 x H x W). Note that entries can be negative because of the normalization.

In [None]:
def imshow(inp, title=None):
    """Display a normalized image tensor of shape (3, H, W)."""
    inp = inp.numpy().transpose((1, 2, 0))  # (3, H, W) -> (H, W, 3) for matplotlib
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = np.clip(std * inp + mean, 0, 1)   # undo the normalization
    plt.imshow(inp)
    if title is not None:
        plt.title(title)

In [None]:
# Make a grid from batch from the validation data
out = torchvision.utils.make_grid(inputs_try)

imshow(out, title=[dset_classes[x] for x in labels_try])

In [None]:
# Get a batch of training data
inputs, classes = next(iter(loader_train))

n_images = 8

# Make a grid from batch
out = torchvision.utils.make_grid(inputs[0:n_images])

imshow(out, title=[dset_classes[x] for x in classes[0:n_images]])

## The VGG model

The torchvision module comes with a zoo of popular CNN architectures that are already trained on [ImageNet](http://www.image-net.org/) (1.2M training images). The first time a model is requested with pretrained weights (`weights='DEFAULT'` below; older torchvision versions used `pretrained=True`), the weights are downloaded over the internet and cached locally, by default under `~/.cache/torch/hub/checkpoints`.
Subsequent calls read them directly from the cache.

In [None]:
model_vgg = models.vgg16(weights='DEFAULT')

We will first use the VGG model without any modification. In order to interpret the results, we need to import the 1,000 ImageNet categories, available at: [https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json](https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json)
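Each entry of that JSON file maps a class index, stored as a string key, to a pair `[WordNet ID, readable name]`; this is why the loading code uses `class_dict[str(i)][1]`. A minimal sketch on a two-entry excerpt in the same format (the first two ImageNet classes):

```python
import json

# Two-entry excerpt in the same format as imagenet_class_index.json
sample = '{"0": ["n01440764", "tench"], "1": ["n01443537", "goldfish"]}'


def index_to_names(class_dict):
    """Turn {'0': [wnid, name], ...} into a list whose position i holds class i's name."""
    return [class_dict[str(i)][1] for i in range(len(class_dict))]


names = index_to_names(json.loads(sample))
print(names)  # ['tench', 'goldfish']
```

Recent torchvision versions also ship the category names with the weights (e.g. `models.VGG16_Weights.DEFAULT.meta["categories"]`), although their spelling may differ slightly from the JSON file.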

In [None]:
!wget https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json

In [None]:
import json

fpath = 'imagenet_class_index.json'

with open(fpath) as f:
    class_dict = json.load(f)
dic_imagenet = [class_dict[str(i)][1] for i in range(len(class_dict))]

In [None]:
dic_imagenet[:4]

Now let's try to run the model on our small input, and see the results.

Note: In PyTorch, we need to transfer the input tensors to the device to use the GPU.

In [None]:
inputs_try , labels_try = inputs_try.to(device), labels_try.to(device)

model_vgg = model_vgg.to(device)

In [None]:
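model_vgg.eval()  # inference mode: disables the dropout layers of the classifier
with torch.no_grad():  # no gradients are needed just to make predictions
    outputs_try = model_vgg(inputs_try)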

In [None]:
outputs_try

In [None]:
outputs_try.shape

To translate the outputs of the network (raw scores, often called logits) into 'probabilities', we pass them through a [Softmax function](https://en.wikipedia.org/wiki/Softmax_function).
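The softmax maps a vector of scores $z$ to $\mathrm{softmax}(z)_k = e^{z_k} / \sum_j e^{z_j}$, so every entry is positive and each row sums to 1. A NumPy sketch of what `nn.Softmax(dim=1)` computes on a batch of scores (toy 3-class values instead of VGG's 1,000 classes); subtracting the row maximum first is a standard trick that avoids overflow without changing the result:

```python
import numpy as np


def softmax(logits, axis=1):
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1000.0, 999.0]])  # a naive exp would overflow on row 2
probs = softmax(logits)
# same role as torch.max(probs, dim=1): best probability and its class index
vals, preds = probs.max(axis=1), probs.argmax(axis=1)
```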

In [None]:
m_softm = nn.Softmax(dim=1)
probs = m_softm(outputs_try)
vals_try,preds_try = torch.max(probs,dim=1)

Let's check that we obtain probabilities!

In [None]:
torch.sum(probs,1)

In [None]:
vals_try

The predictions correspond to the labels of the ImageNet categories.

In [None]:
print(preds_try)
print([dic_imagenet[i] for i in preds_try.data])

Here are the predictions (label with maximum probability) and the corresponding images.

In [None]:
out = torchvision.utils.make_grid(inputs_try.data.cpu())

imshow(out, title=[dic_imagenet[i] for i in preds_try.data])

## Modify the last layer and freeze the rest

Let's look at the current model.

In [None]:
print(model_vgg)

We'll learn about what these different blocks do later in the course. For now, it's enough to know that:

- Convolution layers are for finding small to medium size patterns in images -- analyzing the images locally
- Dense (fully connected) layers are for combining patterns across an image -- analyzing the images globally
- Pooling layers downsample -- in order to reduce image size and to improve invariance of learned features
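The pooling layers explain the input size of the classifier. In VGG16, the 3x3 convolutions use padding 1 and keep the spatial size, while each of the five max-pooling layers halves it, so a $224\times 224$ image ends up as a $512\times 7\times 7$ feature map, i.e. the 25,088 input features of the first `Linear` layer (torchvision also inserts an `AdaptiveAvgPool2d((7, 7))` before the classifier). A pure-Python sketch tracking the shape, assuming the layer configuration printed above:

```python
# VGG16 feature extractor: numbers are conv output channels, "M" is a 2x2 max-pool (stride 2)
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def feature_map_shape(cfg, in_channels=3, size=224):
    """Track (channels, height, width) through padded 3x3 convs and 2x2 pools."""
    channels, h, w = in_channels, size, size
    for layer in cfg:
        if layer == "M":
            h, w = h // 2, w // 2  # pooling halves the spatial size
        else:
            channels = layer       # a padded 3x3 conv keeps H and W, changes the channels
    return channels, h, w


c, h, w = feature_map_shape(VGG16_CFG)
print((c, h, w), c * h * w)  # (512, 7, 7) 25088
```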

![vgg16](https://dataflowr.github.io/notebooks/Module1/img/vgg16.png)

In this practical example, our goal is to reuse the already trained model and only change the number of output classes. To this end, we replace the last `nn.Linear` layer, trained for 1,000 classes, with a new one with 2 classes. To freeze the weights of the other layers during training, we set `requires_grad = False` on all the existing parameters before adding the new layer. This way, no gradient is computed for them during backprop, and hence their weights are not updated. The freshly created layer has `requires_grad=True` by default, so only the weights of the 2-class layer will be updated.
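To see how much of the network this leaves trainable, the parameter counts can be worked out by hand: a 3x3 convolution with $c_{in}$ input and $c_{out}$ output channels has $9\,c_{in} c_{out} + c_{out}$ parameters, and a linear layer has $n_{in} n_{out} + n_{out}$. A pure-Python sketch, assuming the VGG16 layer sizes printed above:

```python
# Output channels of VGG16's 13 convolutions (max-pooling layers have no parameters)
CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]


def conv_params(channels, in_channels=3, k=3):
    """Weights (k*k*c_in*c_out) plus biases (c_out) of a stack of k x k convolutions."""
    total = 0
    for out_channels in channels:
        total += k * k * in_channels * out_channels + out_channels
        in_channels = out_channels
    return total


def linear_params(n_in, n_out):
    """Weight matrix plus bias vector of a fully connected layer."""
    return n_in * n_out + n_out


features = conv_params(CONV_CHANNELS)
hidden = linear_params(25088, 4096) + linear_params(4096, 4096)
total_imagenet = features + hidden + linear_params(4096, 1000)  # original VGG16
total_catsdogs = features + hidden + linear_params(4096, 2)     # after replacing the head
trainable = linear_params(4096, 2)                              # only the new layer is unfrozen
print(features, total_imagenet, total_catsdogs, trainable)
# 14714688 138357544 134268738 8194
```

In PyTorch, `sum(p.numel() for p in model_vgg.parameters() if p.requires_grad)` should give the same 8,194 once the new layer is in place.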

In [None]:
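# Freeze every pretrained parameter
for param in model_vgg.parameters():
    param.requires_grad = False
# Replace the last classifier layer (index 6): 4096 features -> 2 classes (cat, dog)
model_vgg.classifier[6] = nn.Linear(4096, 2)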

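Note that we do not add a softmax layer at the end of the network: the `nn.CrossEntropyLoss` used below applies `LogSoftmax` internally (see the PyTorch documentation for [LogSoftmax](https://pytorch.org/docs/stable/nn.html#logsoftmax)).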

In [None]:
print(model_vgg.classifier)

In [None]:
model_vgg = model_vgg.to(device)

## Training the new Layer

### Creating loss function and optimizer

We choose a loss function for our classification task.
The loss is the objective function we are trying to minimize during training.
See the PyTorch documentation for [CrossEntropyLoss](https://docs.pytorch.org/docs/stable/generated/torch.nn.modules.loss.CrossEntropyLoss.html) and the [torch.optim module](https://docs.pytorch.org/docs/stable/optim.html).
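`nn.CrossEntropyLoss` takes the raw scores (logits) of the network together with integer class labels; it applies a log-softmax followed by the negative log-likelihood and, with its default `reduction='mean'`, returns the average over the batch. A NumPy sketch of that computation on two toy examples:

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_entropy(logits, targets):
    """Mean over the batch of -log p(target): NLL applied to log-softmax outputs."""
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(targets)), targets].mean()


logits = np.array([[2.0, 0.0],   # confident "cat" (class 0)
                   [0.0, 0.0]])  # undecided between the two classes
targets = np.array([0, 1])
loss = cross_entropy(logits, targets)
```

Because the loss is a batch average, multiplying it by the batch size before summing over batches gives a proper per-image average at the end of an epoch.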

In [None]:
criterion = nn.CrossEntropyLoss()
lr = 0.001
optimizer_vgg = torch.optim.SGD(model_vgg.classifier[6].parameters(),lr = lr)

We can now train our model to minimize the loss.
This is a classic training loop. For each batch:
- Execute the forward pass to compute the output of the network
- Compute the loss using the output and the expected value
- Reset the gradients, then execute the backward pass to compute new ones
- Update the parameters

Repeat the entire process for several epochs (passes over the full dataset).
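The same loop can be illustrated without PyTorch. Below is a NumPy sketch that trains a 2-class linear layer (the same kind of layer as our new head) with mini-batch SGD on synthetic features. The data, sizes and learning rate are made up for illustration, and the gradients are derived by hand, which PyTorch's `loss.backward()` does automatically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for frozen VGG features: two Gaussian blobs centered at -1 and +1
n, d = 200, 16
labels = rng.integers(0, 2, size=n)
feats = rng.normal(size=(n, d)) + (2 * labels - 1)[:, None]

W = np.zeros((d, 2))  # weights of the 2-class linear layer
b = np.zeros(2)       # biases
lr, batch_size = 0.1, 20


def forward(x):
    return x @ W + b  # logits, shape (batch, 2)


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


losses = []
for epoch in range(5):
    perm = rng.permutation(n)  # shuffle=True
    epoch_loss = 0.0
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        x, y = feats[idx], labels[idx]
        p = softmax(forward(x))                         # 1. forward pass
        loss = -np.log(p[np.arange(len(y)), y]).mean()  # 2. mean cross-entropy
        grad_z = p.copy()                               # 3. backward: dL/dlogits = (p - onehot) / batch
        grad_z[np.arange(len(y)), y] -= 1.0
        grad_z /= len(y)
        W -= lr * (x.T @ grad_z)                        # 4. SGD update of weights and biases
        b -= lr * grad_z.sum(axis=0)
        epoch_loss += loss * len(y)                     # undo the batch mean before summing
    losses.append(epoch_loss / n)

accuracy = (forward(feats).argmax(axis=1) == labels).mean()
```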

In [None]:
def train_model(model, dataloader, size, epochs=1, optimizer=None):
    model.train()  # training mode (enables dropout in the classifier)

    for epoch in range(epochs):
        running_loss = 0.0
        running_corrects = 0
        for inputs, classes in dataloader:
            inputs, classes = inputs.to(device), classes.to(device)  # move to GPU
            outputs = model(inputs)             # forward pass
            loss = criterion(outputs, classes)  # mean loss over the batch
            optimizer.zero_grad()               # reset gradients from the previous step
            loss.backward()                     # backward pass
            optimizer.step()                    # update the trainable parameters
            # statistics
            preds = outputs.argmax(dim=1)
            running_loss += loss.item() * inputs.size(0)  # undo the batch mean
            running_corrects += (preds == classes).sum().item()
        epoch_loss = running_loss / size
        epoch_acc = running_corrects / size
        print(f'Epoch {epoch + 1}/{epochs} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

In [None]:
%%time
train_model(model_vgg,loader_train,size=dset_sizes['train'],epochs=2,optimizer=optimizer_vgg)

It is now time to test our new model. The following test function iterates over a dataset batch by batch, but we do not execute a backward pass or update the parameters.

In [None]:
def test_model(model, dataloader, size):
    model.eval()  # evaluation mode (disables dropout)
    predictions = np.zeros(size)
    all_classes = np.zeros(size)
    all_proba = np.zeros((size, 2))
    i = 0
    running_loss = 0.0
    running_corrects = 0
    with torch.no_grad():  # no backward pass, so no need to track gradients
        for inputs, classes in dataloader:
            inputs = inputs.to(device)
            classes = classes.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, classes)
            preds = outputs.argmax(dim=1)
            # statistics
            running_loss += loss.item() * inputs.size(0)  # undo the batch mean
            running_corrects += (preds == classes).sum().item()
            n = len(classes)
            predictions[i:i + n] = preds.cpu().numpy()
            all_classes[i:i + n] = classes.cpu().numpy()
            all_proba[i:i + n, :] = m_softm(outputs).cpu().numpy()
            i += n
    epoch_loss = running_loss / size
    epoch_acc = running_corrects / size
    print(f'Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
    return predictions, all_proba, all_classes

In [None]:
%%time

predictions, all_proba, all_classes = test_model(model_vgg,loader_valid,size=dset_sizes['valid'])

Let's visualize a few results in the validation set.

In [None]:
inputs, classes = next(iter(loader_valid))

n_images = 5  # loader_valid uses batch_size=5, so a batch holds at most 5 images

out = torchvision.utils.make_grid(inputs[0:n_images])

imshow(out, title=[dset_classes[x] for x in classes[0:n_images]])

As with the original VGG model, we can convert the output of the network into probabilities using a softmax.

In [None]:
with torch.no_grad():  # prediction only, no gradients needed
    outputs = model_vgg(inputs[:n_images].to(device))
print(m_softm(outputs))

## Model predictions

The most important metrics for us to look at are for the validation set, since we want to check for over-fitting.

With our first model, we should try to overfit before we start worrying about how to handle that - there's no point even thinking about regularization, data augmentation, etc. if you're still underfitting!

As well as looking at the overall metrics, it's also a good idea to look at examples of each of:

   1. A few correct labels at random
   2. A few incorrect labels at random
   3. The most correct labels of each class (i.e. those with highest probability that are correct)
   4. The most incorrect labels of each class (i.e. those with highest probability that are incorrect)
   5. The most uncertain labels (i.e. those with probability closest to 0.5).

In general, these are particularly useful for debugging problems in the model. Since our model is very simple, there may not be too much to learn at this stage...
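One way to collect the indices for all five views at once is sketched below in NumPy. It assumes the arrays returned by `test_model` (`predictions`, `all_classes`, `all_proba`, where `all_proba[:, k]` is the probability of class k); `pick_examples` is a hypothetical helper. Since `np.argsort` sorts in ascending order, its result is reversed to put the most confident predictions first:

```python
import numpy as np


def pick_examples(predictions, labels, proba, n_view=8, seed=0):
    """Return index arrays for the five views listed above."""
    rng = np.random.default_rng(seed)
    correct = np.where(predictions == labels)[0]
    incorrect = np.where(predictions != labels)[0]
    # probability the model gave to the class it predicted, for each image
    conf = proba[np.arange(len(predictions)), predictions.astype(int)]
    views = {
        "random_correct": rng.permutation(correct)[:n_view],
        "random_incorrect": rng.permutation(incorrect)[:n_view],
    }
    for k in (0, 1):
        ok = correct[predictions[correct] == k]
        bad = incorrect[predictions[incorrect] == k]
        # reverse the ascending argsort so the highest confidence comes first
        views[f"most_correct_{k}"] = ok[np.argsort(conf[ok])[::-1][:n_view]]
        views[f"most_incorrect_{k}"] = bad[np.argsort(conf[bad])[::-1][:n_view]]
    # closest to 0.5 means the model could not decide between cat and dog
    views["most_uncertain"] = np.argsort(np.abs(proba[:, 1] - 0.5))[:n_view]
    return views
```

In the notebook, it could be called as `pick_examples(predictions, all_classes, all_proba, n_view)`, and each index array passed to the display loops below.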

In [None]:
# Number of images to view for each visualization task
n_view = 8

### 1. Correct predictions

First, let's compute the overall accuracy on the validation set.

In [None]:
correct = np.where(predictions==all_classes)[0]
len(correct)/dset_sizes['valid']

In our run, the model reached about 97.9% accuracy (the exact figure may vary slightly from run to run). Let's look at a few random correct predictions.

In [None]:
from numpy.random import permutation
from IPython.display import Image, display
for x in permutation(correct)[:n_view]:
    display(Image(filename=dsets['valid'].imgs[x][0], retina=True))

### 2. Incorrect predictions

Now let's examine some incorrect predictions to understand where the model fails.

In [None]:
incorrect = np.where(predictions!=all_classes)[0]
for x in permutation(incorrect)[:n_view]:
    print(dsets['valid'].imgs[x][1], predictions[x])
    display(Image(filename=dsets['valid'].imgs[x][0], retina=True))

### 3. Most confident correct predictions

We now look at the predictions where the model was most confident and correct. For cats, we sort the correctly classified cats by their probability of class 0; since `np.argsort` sorts in ascending order, we reverse the result to get the highest probabilities first.

In [None]:
correct_cats = np.where((predictions==0) & (predictions==all_classes))[0]
most_correct_cats = np.argsort(all_proba[correct_cats,0])[::-1][:n_view]  # descending probability

for x in most_correct_cats:
    print(dsets['valid'].imgs[correct_cats[x]][1], predictions[correct_cats[x]])
    display(Image(filename=dsets['valid'].imgs[correct_cats[x]][0], retina=True))

Similarly, we can look at the images the model was most confident were dogs.

In [None]:
correct_dogs = np.where((predictions==1) & (predictions==all_classes))[0]
most_correct_dogs = np.argsort(all_proba[correct_dogs,1])[::-1][:n_view]  # descending probability

for x in most_correct_dogs:
    print(dsets['valid'].imgs[correct_dogs[x]][1], predictions[correct_dogs[x]])
    display(Image(filename=dsets['valid'].imgs[correct_dogs[x]][0], retina=True))

### 4. Most confident incorrect predictions (most wrong)

Now, these are the images where the model was very confident but wrong. We look for images predicted as cats (class 0) but which are actually dogs (class 1).

In [None]:
incorrect_cats = np.where((predictions==0) & (predictions!=all_classes))[0]
most_incorrect_cats = np.argsort(all_proba[incorrect_cats, 0])[::-1][:n_view]  # most confident mistakes first

for x in most_incorrect_cats:
    print(dsets['valid'].imgs[incorrect_cats[x]][1], predictions[incorrect_cats[x]])
    display(Image(filename=dsets['valid'].imgs[incorrect_cats[x]][0], retina=True))

### 5. Most uncertain predictions

Finally, let's look at the predictions where the model was most uncertain. These are images where the probability is closest to 0.5 (the model cannot decide between cat and dog).

In [None]:
uncertainty = np.abs(all_proba[:, 1] - 0.5)
most_uncertain = np.argsort(uncertainty)[:n_view]

for x in most_uncertain:
    print(dsets['valid'].imgs[x][1], all_proba[x, :])
    display(Image(filename=dsets['valid'].imgs[x][0], retina=True))

## Conclusion

What did we do in the end? We probably killed a fly with a sledgehammer!

In our case, the sledgehammer is VGG pretrained on ImageNet, a dataset containing a lot of pictures of cats and dogs. Indeed, we saw that without modification the network was able to predict dog and cat breeds. Hence, it is not very surprising that the features computed by VGG are well suited to our classification task. In the end, we only need to learn the parameters of the last linear layer, i.e., 8,194 parameters (do not forget the biases: $2\times 4096+2$). This can even be done on a CPU without any problem.

Nevertheless, this example is still instructive, as it shows all the necessary steps in a deep learning project. Here we did not struggle with the learning process of a deep network, but we did all the preliminary engineering tasks:

- downloading a dataset,
- setting up the environment to use a GPU,
- preparing the data,
- using a pretrained model,
- retraining a new last layer on a different task.

These steps are essential in any deep learning project and a necessary requirement before having fun playing with network architectures and understanding the learning process.