# Pre-trained deep neural networks in PyTorch

**Objectives**

This week, we will apply what we learned in the tutorials and get a quick idea of what a deep neural network is capable of when it comes to image classification tasks. To do so, we will play with a pre-trained neural network (ResNet101). 

## Contents:

1. Pre-trained deep neural networks in PyTorch
2. Making predictions using a neural network in PyTorch  
    1. Defining a preprocess pipeline using PyTorch's transforms  
    2. Loading and preprocessing data  
    3. Making predictions using our neural network  
    4. Interpreting the output  
3. Playing with the ResNet model
4. Good to know

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models
from torchvision import transforms
from PIL import Image
from os import listdir

## 1. Pre-trained deep neural networks in PyTorch

As written in the documentation:

> The [torchvision.models](https://pytorch.org/vision/stable/models.html#torchvision-models) subpackage contains definitions of models for addressing different tasks, including: image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection and video classification. \[...\] It provides pre-trained models.

[ResNet](https://pytorch.org/vision/stable/models.html#id10) is a deep residual neural network designed for image classification. In PyTorch, several pre-trained ResNet models are available with different depths (resnet18, resnet34, resnet50, resnet101 and resnet152). Here we will use [resnet101](https://pytorch.org/vision/stable/models.html#torchvision.models.resnet101).

These pre-trained models were built and trained exactly as we did with our custom neural networks in the tutorials and can also be used in the exact same way. Unsurprisingly, they also subclass [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#module).

In [2]:
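# The next line is all we need to create an instance of a pre-trained ResNet101 model
# 101 means that we choose the ResNet architecture with 101 layers
# Note: in torchvision >= 0.13, 'pretrained=True' is deprecated in favour of
# 'weights=models.ResNet101_Weights.IMAGENET1K_V1'
resnet = models.resnet101(pretrained=True)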
print("Pytorch class of pre-trained  models: ", type(resnet))
print("Which is subclass of a nn.Module:     ", issubclass(type(resnet), nn.Module))
print("\n", resnet)      

Downloading: "https://download.pytorch.org/models/resnet101-63fe2227.pth" to /home/natacha/.cache/torch/hub/checkpoints/resnet101-63fe2227.pth


  0%|          | 0.00/171M [00:00<?, ?B/s]

Pytorch class of pre-trained  models:  <class 'torchvision.models.resnet.ResNet'>
Which is subclass of a nn.Module:      True

 ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=Tru

**QUESTIONS**

1. If we have 1000 different labels (e.g. cat, dog, mouse, goose, etc.), what should be the dimension of the output layer of the neural network?
1. In the output above we can see a module called "Sequential". We already met this module in the second and third tutorials. Can you briefly explain what it is?
1. In the output above we can also see a module called "Bottleneck". This module was very quickly mentioned in the third tutorial. Do you remember what it is?

## 2. Making predictions using a neural network in PyTorch

In this section we will:

1. Load an image and our labels
1. Preprocess our image
1. Make predictions using our neural network
1. Interpret the output

### 2.1 Defining a preprocess pipeline using PyTorch's transforms

As we saw in the tutorials, the [torchvision.transforms](https://pytorch.org/vision/stable/transforms.html#torchvision-transforms) module can easily perform the most common image transformations such as [Resize](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.Resize), [CenterCrop](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.CenterCrop), [ToTensor](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.ToTensor), [Normalize](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.Normalize), etc. In addition, this module allows us to quickly define preprocessing pipelines using the [transforms.Compose](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.Compose) class.

In the following cell we define the pre-processing transformations that will be applied to our input images. Remember that, when it comes to storing numerical data, the "PyTorch-friendly objects" are not NumPy arrays but PyTorch's [tensors](https://pytorch.org/docs/stable/tensors.html#torch.Tensor), and that the [ToTensor](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.ToTensor) transform implicitly:

1. Rearranges a ``(H, W, C)`` image into a ``(C, H, W)`` tensor (Height, Width, Channel (color))
2. Rescales ``[0, 255]`` integer arrays into ``[0, 1]`` float tensors
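
The two implicit steps listed above can be mimicked by hand on a dummy image. This is only a sketch of the effect of `ToTensor`, not its actual implementation; the tiny 4x6 "image" and the variable names are made up for illustration.

```python
import torch

# A fake 4x6 RGB image stored the way image libraries usually store it:
# (H, W, C) layout, integer values in [0, 255].
hwc = torch.randint(0, 256, (4, 6, 3), dtype=torch.uint8)

# Step 1: move the channel axis first, (H, W, C) -> (C, H, W).
# Step 2: convert to float and rescale to [0, 1].
chw = hwc.permute(2, 0, 1).to(torch.float32) / 255.0

print("Before:", tuple(hwc.shape), hwc.dtype)
print("After: ", tuple(chw.shape), chw.dtype)
```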

**TODO** 

Use [transforms.Compose](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.Compose) together with the appropriate transforms to define a preprocessor ``preprocessor`` that:
1. Resizes images so that their shorter side is ``256`` pixels (``transforms.Resize(256)`` keeps the aspect ratio, so the result is ``256x256`` only if the input is square)
1. Crops images, keeping only the ``224x224`` pixels at the center
1. Transforms images to tensors
1. Normalizes tensors, using ``mean = [0.485, 0.456, 0.406]`` and ``std = [0.229, 0.224, 0.225]``

In [3]:
preprocessor = transforms.Compose([
    transforms.Resize(256),     # Resize so that the shorter side is 256 pixels (aspect ratio is kept)
    transforms.CenterCrop(224), # Crop the 224x224 center (usually where the interesting object is)
    transforms.ToTensor(),      # Convert to a (C, H, W) float tensor with values in [0, 1]
    transforms.Normalize(       # Normalize input the same way ResNet training inputs were normalized
        mean=[0.485, 0.456, 0.406], # Per-channel mean, matching what was presented to ResNet during training
        std=[0.229, 0.224, 0.225]   # Per-channel standard deviation, same here
    )
])
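
The arithmetic behind the `Normalize` step is simply `(x - mean) / std`, applied channel by channel. The sketch below reproduces it by hand on a random tensor standing in for an image, and also shows how to undo it (handy if you want to display a preprocessed image). The variable names are illustrative.

```python
import torch

# Per-channel statistics, reshaped to (3, 1, 1) so that they broadcast
# over the height and width of a (C, H, W) tensor.
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Random values in [0, 1] standing in for the output of ToTensor.
img_t = torch.rand(3, 224, 224)

# What Normalize computes for each channel.
normalized = (img_t - mean) / std

# Inverse operation, to get back to the [0, 1] range.
recovered = normalized * std + mean

print("Range after normalization: ",
      normalized.min().item(), normalized.max().item())
```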

### 2.2 Loading and preprocessing data

In [4]:
# ------------------------------
# Images
# ------------------------------

# Load one of our images
img = Image.open("imgs/Bobby.jpeg")
# Preprocess our image using our preprocessor ('t' stands for 'tensor')
img_t = preprocessor(img)
# Reshape so that it is a batch (of size 1) as required in PyTorch
batch_t = torch.unsqueeze(img_t, 0)
# Check that it has the required shape (N, C, H, W)
# (See 2nd tutorial if you're struggling with shape conventions in PyTorch)
print("Shape of our input batch: ", batch_t.size())

# ------------------------------
# Labels
# ------------------------------

# Read all the labels with which ResNet was trained and store them in the list 'labels'
with open('list_labels.txt') as f:
    labels = [line.strip() for line in f.readlines()]

Shape of our input batch:  torch.Size([1, 3, 224, 224])
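
As a side note on the batch dimension, the sketch below shows on dummy tensors that `torch.unsqueeze(x, 0)` and indexing with `None` give the same shape, and that several images of identical size can be grouped into one batch with `torch.stack`. The tensors here are random placeholders, not real images.

```python
import torch

# Two dummy "preprocessed images" of shape (C, H, W).
img_a = torch.rand(3, 224, 224)
img_b = torch.rand(3, 224, 224)

# Two equivalent ways of adding a leading batch dimension of size 1.
single = torch.unsqueeze(img_a, 0)
same = img_a[None]

# Stacking several (C, H, W) tensors creates a batch of size N.
pair = torch.stack([img_a, img_b])

print(single.shape, same.shape, pair.shape)
```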


### 2.3 Making predictions using our neural network

After recalling that: 

> "Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use [model.train()](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train) or [model.eval()](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval) (from the [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#module)) as appropriate."

We are now ready to make some predictions on our images. Let's show the output of the resnet model given our image of Bobby the Golden Retriever.
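
To make the train/eval distinction quoted above concrete, here is a small sketch using a standalone batch normalization layer (a stand-in for the many such layers inside ResNet). In training mode the layer normalizes with the statistics of the current batch; in evaluation mode it uses its stored running statistics, so the same input gives a different output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(3)              # standalone layer, 3 features
x = torch.randn(16, 3) * 5 + 10     # batch of 16 samples, mean ~10, std ~5

bn.train()                          # training mode: use batch statistics
y_train = bn(x)

bn.eval()                           # evaluation mode: use running statistics
y_eval = bn(x)

print("train mode, per-feature mean:", y_train.mean(dim=0))
print("eval mode,  per-feature mean:", y_eval.mean(dim=0))
print("bn.training:", bn.training)
```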

**QUESTION** 

1. Set ``resnet`` in evaluation mode.
1. Compute the output ``out`` corresponding to the input batch ``batch_t`` (defined in the cell above) 
1. Print the output tensor
1. Print the dimension of the output tensor using the [Tensor.size()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.size) method
1. Does it match your previous answer about the output dimension?

In [5]:
# Pytorch method to indicate that we are now using the model to make predictions and not to train it 
resnet.eval()  
# Feed our image and get the output
out = resnet(batch_t)
print("\n Output batch: \n", out)
# TODO: Print the dimension of 'out'
print("\nShape of our output batch: ", out.size())


 Output batch: 
 tensor([[-3.4803e+00, -1.6618e+00, -2.4515e+00, -3.2662e+00, -3.2466e+00,
         -1.3611e+00, -2.0465e+00, -2.5112e+00, -1.3043e+00, -2.8900e+00,
         -1.6862e+00, -1.3055e+00, -2.6129e+00, -2.9645e+00, -2.4300e+00,
         -2.8143e+00, -3.3019e+00, -7.9404e-01, -6.5182e-01, -1.2308e+00,
         -3.0193e+00, -3.9457e+00, -2.2675e+00, -1.0811e+00, -1.0232e+00,
         -1.0442e+00, -3.0918e+00, -2.4613e+00, -2.1964e+00, -3.2354e+00,
         -3.3013e+00, -1.8553e+00, -2.0921e+00, -2.1327e+00, -1.9102e+00,
         -3.2403e+00, -1.1396e+00, -1.0925e+00, -1.2186e+00, -9.3332e-01,
         -4.5093e-01, -1.5489e+00,  1.4161e+00,  1.0871e-01, -1.8442e+00,
         -1.4806e+00,  9.6227e-01, -9.9456e-01, -3.0060e+00, -2.7384e+00,
         -2.5798e+00, -2.0666e+00, -1.8022e+00, -1.9328e+00, -1.7726e+00,
         -1.3041e+00, -4.5848e-01, -2.0537e+00, -3.2804e+00, -5.0451e-01,
         -3.8174e-01, -1.1147e+00, -7.3998e-01, -1.4299e+00, -1.4883e+00,
         -2.1073e+00

### 2.4 Interpreting the output

You don't know what to do with that tensor, right? How do you know if this output tensor means that the image is a dog or a cat or something else?

Well, that's actually simple. The first idea would be to find the most activated output unit, that is to say, the index of the maximum value, and then look up the label with the corresponding index. To do so we use the [torch.max](https://pytorch.org/docs/stable/generated/torch.max.html?highlight=max#torch.max) function.

In [6]:
_, index = torch.max(out, dim=1)
print(
    "Index: ", index,  
    "\nLabel: ", labels[index], 
    "\nOutput value: ", out[0, index]
    ) 

Index:  tensor([207]) 
Label:  golden retriever 
Output value:  tensor([15.6744], grad_fn=<IndexBackward0>)
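
The `grad_fn=<IndexBackward0>` in the output above indicates that autograd is still tracking operations, which is not needed when we only make predictions. A common option, sketched below on a tiny stand-in model (a single `nn.Linear` layer, not ResNet), is to wrap the forward pass in `torch.no_grad()`.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 3)   # tiny stand-in for a real network
model.eval()
x = torch.rand(1, 4)

# Default behaviour: the output remembers how it was computed.
tracked = model(x)

# Inside no_grad, no computation graph is built.
with torch.no_grad():
    untracked = model(x)

print("tracked has grad_fn:  ", tracked.grad_fn is not None)
print("untracked has grad_fn:", untracked.grad_fn is not None)
```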


Now the question is: how should we interpret this output value? How can we tell whether the model hesitates between this label and another one?

We would like to convert this tensor value into something that could be interpreted as the confidence that the model has in its prediction. To do so, we use the [softmax](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.softmax) function, which maps our outputs to values in \[0, 1\] that sum to 1.

For more information about the SoftMax function, you can watch the videos by Andrew Ng:
- [Softmax Regression (C2W3L08)](https://www.youtube.com/watch?v=LLux1SW--oM)
- [Training Softmax Classifier (C2W3L09)](https://www.youtube.com/watch?v=ueO_Ph0Pyqk)
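
As a small worked example, softmax turns each score z_i into exp(z_i) / sum_j exp(z_j). The sketch below checks, on three made-up scores, that `F.softmax` agrees with this formula, that the results sum to 1, and that the ranking of the classes is unchanged.

```python
import torch
import torch.nn.functional as F

# Made-up scores (logits) for a batch of one sample and three classes.
logits = torch.tensor([[2.0, 1.0, 0.1]])

# Built-in softmax over the class dimension.
probs = F.softmax(logits, dim=1)

# The same computation written out explicitly.
exp_scores = torch.exp(logits)
manual = exp_scores / exp_scores.sum(dim=1, keepdim=True)

print("softmax:", probs)
print("sum:    ", probs.sum(dim=1))
print("same top class:", torch.argmax(probs, dim=1) == torch.argmax(logits, dim=1))
```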

**QUESTION** 

1. Find the index corresponding to the max value of ``out`` **Hint:** Look at the previous cell 

In [7]:
# TODO: Find the index corresponding to the max value of out
_, index = torch.max(out, 1)
confidences = F.softmax(out, dim=1)[0]
percentages = confidences * 100
print(
    "Label: ",labels[index[0]], 
    "\nConfidence: ", round(percentages[index[0]].item(), 2), "%")

Label:  golden retriever 
Confidence:  96.29 %


#### Top-1 and Top-5 errors

When evaluating an image classifier we often use the terms *Top-1 error* and *Top-5 error*.

If the classifier’s top guess is the correct answer (e.g., the highest score is for the “dog” class, and the test image is actually of a dog), then the correct answer is said to be in the Top-1. If the correct answer is among the classifier’s top 5 guesses, it is said to be in the Top-5. The Top-1 (respectively Top-5) error is the fraction of test images for which the correct answer is not in the Top-1 (respectively Top-5).

The top-1 score is the conventional accuracy: it checks whether the top class (the one with the highest confidence) is the same as the target label. Finding that top class is what we did in the cell above. The top-5 score, on the other hand, checks whether the target label is one of the top 5 predictions (the 5 with the highest confidences). To find them we use the [torch.sort](https://pytorch.org/docs/stable/generated/torch.sort.html#torch-sort) function.
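
The difference between the two checks can be sketched on made-up scores for 8 classes. Note that this sketch uses `torch.topk` rather than the `torch.sort` call used in this notebook; the scores and the target index are invented for illustration.

```python
import torch

# Made-up scores for one sample and 8 classes.
logits = torch.tensor([[0.1, 2.5, 0.3, 1.9, 0.0, 3.2, 1.1, 0.7]])
target = 3   # assumed index of the true label

# Indices of the 5 highest scores, in descending order of score.
_, top5 = torch.topk(logits, k=5, dim=1)

top1_hit = top5[0, 0].item() == target    # is the best guess correct?
top5_hit = target in top5[0].tolist()     # is the target among the 5 best?

print("Top-5 indices:", top5[0].tolist())
print("Top-1 correct:", top1_hit)
print("Top-5 correct:", top5_hit)
```

Here the true class has only the third-highest score, so the prediction counts as a Top-1 error but not as a Top-5 error.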

**QUESTIONS**

1. Complete the code below **Hint:** Look at how we preprocessed the first image Bobby 
2. Does the model seem confident about the first prediction?

In [8]:
num_preds = 5

img = Image.open("imgs/golden_retriever_online.jpeg")
# TODO: preprocess the image 
img_t = preprocessor(img) 
# TODO: create a batch of size 1
batch_t = torch.unsqueeze(img_t, 0)
# TODO: Compute the output tensor of the image batch contained in batch_t
out = resnet(batch_t)
# TODO: Compute the percentage representing the confidence of the model about the output
percentages = F.softmax(out, dim=1)[0] * 100 
_, indices = torch.sort(out, descending=True)

results = [(labels[idx], round(percentages[idx].item(), 2)) for idx in indices[0][:num_preds]]
for i_pred in range(num_preds):
    print(
        "Guess number ", i_pred, ": ",
        "\n    Label: ", results[i_pred][0], 
        "\n    Confidence: ",  results[i_pred][1],"%"
        )

Guess number  0 :  
    Label:  golden retriever 
    Confidence:  97.04 %
Guess number  1 :  
    Label:  cocker spaniel, English cocker spaniel, cocker 
    Confidence:  0.48 %
Guess number  2 :  
    Label:  tennis ball 
    Confidence:  0.37 %
Guess number  3 :  
    Label:  Pembroke, Pembroke Welsh corgi 
    Confidence:  0.29 %
Guess number  4 :  
    Label:  Irish setter, red setter 
    Confidence:  0.2 %


## 3. Playing with the ResNet model

Put all the images that you want in the 'imgs/' folder (these could be personal pictures or images taken from the internet).

**QUESTIONS**

1. Complete the code below so that for each image it prints the 5 best guesses according to the model
2. When the image is a dog, what are usually the 1st, 2nd, 3rd guesses? 
3. Use one of your personal pictures of an object whose label is in the list of labels.
4. Try to find an image on the web whose label is in the list of labels but whose corresponding prediction is wrong. How can you try to make it difficult for the model to recognize the object? 
5. Try to find an image on the web whose label is NOT in the list of labels with which the model was trained. Look at the output, is it consistent even though it is necessarily wrong? 
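
One practical caveat when filling the 'imgs/' folder: the loading loop in the next cell tries to open every file it finds, so a stray non-image file would make `Image.open` fail. A possible safeguard, sketched below, is to keep only files with an image extension. The helper name `list_image_files`, the set of extensions and the file names other than `Bobby.jpeg` are assumptions made for this example; the demo runs on a temporary folder so that it does not depend on your own files.

```python
import tempfile
from pathlib import Path

# Assumed set of extensions; extend it to match your own files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_files(folder):
    """Return the sorted paths of the files in 'folder' that look like images."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


# Demo on a temporary folder containing two "images" and one text file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Bobby.jpeg", "notes.txt", "cat.PNG"]:
        (Path(tmp) / name).touch()
    found = [p.name for p in list_image_files(tmp)]

print(found)
```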

In [9]:
# ------------------------------
# Load inputs
# ------------------------------

# Load all the images in the 'imgs/' folder
list_img_t = []                  # Where input tensors will be stored
path_imgs = 'imgs/'   
list_files = listdir(path_imgs)  # Find all filenames in the 'imgs/' folder
for f in list_files:
    img = Image.open(path_imgs + f)
    img = img.convert('RGB')  # Because some of the images are in the RGBA format while ResNet requires an RGB format
    img_t = preprocessor(img) # TODO: preprocess the image
    list_img_t.append(torch.unsqueeze(img_t, 0) )

# ------------------------------
# Make predictions
# ------------------------------
num_preds = 5
for i, batch_t in enumerate(list_img_t):
    print("\n ====== ", list_files[i], " ====== ")

    # TODO: Compute the output tensor of the tensor image contained in batch_t
    out = resnet(batch_t)
    # TODO: Compute the percentage representing the confidence of the model about the output
    percentages = F.softmax(out, dim=1)[0] * 100 
    # TODO: Sort the out tensor in descending order
    _, indices = torch.sort(out, descending=True)
    results = [(labels[idx], round(percentages[idx].item(), 2)) for idx in indices[0][:num_preds]]
    for i_pred in range(num_preds):
        print(
            "Guess number ", i_pred, ": ",
            "\n    Label: ", results[i_pred][0], 
            "\n    Confidence: ",  results[i_pred][1],"%"
            )


Guess number  0 :  
    Label:  Eskimo dog, husky 
    Confidence:  20.56 %
Guess number  1 :  
    Label:  chow, chow chow 
    Confidence:  20.46 %
Guess number  2 :  
    Label:  Samoyed, Samoyede 
    Confidence:  20.06 %
Guess number  3 :  
    Label:  malamute, malemute, Alaskan malamute 
    Confidence:  14.73 %
Guess number  4 :  
    Label:  golden retriever 
    Confidence:  8.75 %

Guess number  0 :  
    Label:  sea lion 
    Confidence:  51.53 %
Guess number  1 :  
    Label:  otter 
    Confidence:  27.64 %
Guess number  2 :  
    Label:  snow leopard, ounce, Panthera uncia 
    Confidence:  6.98 %
Guess number  3 :  
    Label:  tabby, tabby cat 
    Confidence:  4.03 %
Guess number  4 :  
    Label:  Egyptian cat 
    Confidence:  3.8 %

Guess number  0 :  
    Label:  golden retriever 
    Confidence:  96.29 %
Guess number  1 :  
    Label:  Labrador retriever 
    Confidence:  2.81 %
Guess number  2 :  
    Label:  cocker spaniel, English cocker spaniel, cocker 
    

## 4. Good to know
- In PyTorch, data are stored in [tensors](https://pytorch.org/docs/stable/tensors.html#torch.Tensor). This is the PyTorch counterpart of NumPy's array and most of the methods that are available in NumPy are also available in PyTorch (e.g. 
[size](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.size), 
[amax](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.amax), 
[argmax](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.argmax), 
[sort](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.sort), 
[abs](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.abs), 
[cos](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.cos), 
[sum](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.sum) etc.)
- In PyTorch, all neural networks should be a class that is itself a subclass of PyTorch's [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#module) class
- There are many well-known deep neural network architectures available in the [torchvision.models](https://pytorch.org/vision/stable/models.html?highlight=models) sub-package. 
  - For each of these architectures a pre-trained model is available. 
  - Some of them such as the ResNet architecture even have multiple pre-trained model instances of different depths. For the [ResNet](https://pytorch.org/vision/stable/models.html#id10) class, we have [resnet18](https://pytorch.org/vision/stable/models.html#torchvision.models.resnet18), [resnet50](https://pytorch.org/vision/stable/models.html#torchvision.models.resnet50), [resnet101](https://pytorch.org/vision/stable/models.html#torchvision.models.resnet101), etc.
- During the preprocessing, we can use the [torchvision.transforms](https://pytorch.org/vision/stable/transforms.html#torchvision-transforms) module to perform the most common image transformations
- Some models use modules that have different training and evaluation behavior, such as batch normalization. To switch between these modes, we use [model.train()](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train) and [model.eval()](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval) accordingly
- Top-1 and Top-5 scores are commonly used in image classification
- When there are more than 2 possible classes we often use the [SoftMax](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.softmax) function in the output layer to convert the output tensor values into confidence values.
- However, we will see in this course that we don't need a softmax function in the output layer if we use the [nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html?highlight=crossentro#torch.nn.CrossEntropyLoss) loss function, because it expects raw scores (logits) and applies a log-softmax internally.
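
The last point can be checked numerically. The sketch below, on made-up scores and targets, compares `nn.CrossEntropyLoss` applied to raw scores with a negative log-likelihood loss applied to the log-softmax of the same scores; the two values should coincide.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up raw scores (logits) for 2 samples and 3 classes, with their targets.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])
targets = torch.tensor([0, 1])

# Cross-entropy computed directly on the raw scores.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity in two explicit steps: log-softmax, then NLL loss.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print("CrossEntropyLoss:      ", loss_ce.item())
print("log_softmax + nll_loss:", loss_manual.item())
```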