## Visualization of CNN: Grad-CAM
* **Objective**: Convolutional Neural Networks (CNNs) are widely used in computer vision and are powerful for processing grid-like data. However, we hardly know how and why they work, because they cannot be decomposed into individually intuitive components. In this assignment, we use Grad-CAM, which highlights the regions of the input image that were important for the network's prediction.


* NB: if `PIL` is not installed, try `conda install pillow`.

In [None]:
# Standard library imports
import pickle
import urllib.request

# Third party imports
import matplotlib.pyplot as plt
import numpy as np
import torch
from PIL import Image
from torchvision import models, datasets, transforms

# Local application imports

%matplotlib inline

### Download the Model
We provide a pretrained `ResNet-34` model for the `ImageNet` classification dataset.
* **ImageNet**: A large dataset of photographs with 1,000 classes.
* **ResNet-34**: A deep architecture for image classification.

In [None]:
resnet34 = models.resnet34(pretrained=True)
resnet34.eval() # set the model to evaluation mode

![ResNet34](https://miro.medium.com/max/1050/1*Y-u7dH4WC-dXyn9jOG4w0w.png)


The input image must be of size (3x224x224).

The network starts with a convolution layer followed by max pooling, then four stages of residual blocks (`layer1` to `layer4` in torchvision).

The output of the last stage is of size (512x7x7).

Average pooling is applied to this output to obtain a 1D array of 512 features, which is fed to a linear layer that outputs 1000 values (one per class). There is no softmax at the end, so the output is directly the raw class scores.
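The last two steps (average pooling, then the linear layer) can be sketched with NumPy on random stand-in arrays. The weights `W` and `b` below are made-up placeholders, not the ResNet-34 parameters; only the shapes are meant to be informative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the output of the last residual stage: 512 maps of size 7x7
feature_maps = rng.standard_normal((512, 7, 7))

# Global average pooling: one number per feature map
pooled = feature_maps.mean(axis=(1, 2))    # shape (512,)

# Hypothetical linear layer (random weights, for shape illustration only)
W = rng.standard_normal((1000, 512)) * 0.01
b = np.zeros(1000)
scores = W @ pooled + b                    # shape (1000,): raw class scores

print(pooled.shape, scores.shape)
```

Because the pooling is a plain spatial mean, each of the 512 feature maps contributes to a class score through a single weight, which is what makes a per-map weighting such as Grad-CAM natural for this architecture.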

In [None]:
classes = pickle.load(urllib.request.urlopen('https://gist.githubusercontent.com/yrevar/6135f1bd8dcf2e0cc683/raw/d133d61a09d7e5a3b36b8c111a8dd5c4b5d560ee/imagenet1000_clsid_to_human.pkl'))

### Input Images
We provide 20 images from ImageNet (use the download link on the course webpage, or download them directly with the command line given below).<br>
To use the pretrained resnet34 model, the input image must be resized to `(224, 224)` and normalized using `mean = [0.485, 0.456, 0.406]` and `std = [0.229, 0.224, 0.225]`.

In [None]:
def preprocess_image(dir_path):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    dataset = datasets.ImageFolder(dir_path, transforms.Compose([
            transforms.Resize(256),  # resize the shorter side to 256 pixels
            transforms.CenterCrop(224),  # crop the central 224x224 region
            transforms.ToTensor(),  # convert the PIL image to a tensor in [0, 1]
            normalize]))  # normalize each channel with the ImageNet mean and std

    return dataset
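The per-channel normalization used in the pipeline above, and its inverse (handy when a normalized tensor has to be displayed again), can be sketched in NumPy. The helper names `normalize` and `denormalize` are illustrative and not part of torchvision; the input is assumed to be a channel-first array with values in [0, 1].

```python
import numpy as np

# ImageNet statistics from the text, shaped to broadcast over (3, H, W)
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def normalize(img_chw):
    """Subtract the channel mean and divide by the channel std."""
    return (img_chw - MEAN) / STD

def denormalize(tensor_chw):
    """Invert `normalize` and clip back to the displayable range [0, 1]."""
    return np.clip(tensor_chw * STD + MEAN, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((3, 224, 224))          # random stand-in for a real image
restored = denormalize(normalize(img))   # round trip recovers the image
```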

In [None]:
import os
os.makedirs("data/TP2_images", exist_ok=True)  # also creates "data/"; no error if the cell is re-run
!cd data/TP2_images && wget "https://www.lri.fr/~gcharpia/deeppractice/2023/TP2/TP2_images.zip" && unzip TP2_images.zip

In [None]:
# The images should be in a *sub*-folder of "data/" (ex: data/TP2_images/images.jpg) and *not* directly in "data/"!
# otherwise the function won't find them
dir_path = "data/" 
dataset = preprocess_image(dir_path)

In [None]:
# show the original image
index = 12
input_image = Image.open(dataset.imgs[index][0]).convert('RGB')
plt.imshow(input_image)

In [None]:
output = resnet34(dataset[index][0].view(1, 3, 224, 224))
values, indices = torch.topk(output, 3)
print("Top 3-classes:", indices[0].numpy(), [classes[x] for x in indices[0].numpy()])
print("Raw class scores:", values[0].detach().numpy())
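The values printed above are raw scores, not probabilities. If probabilities are wanted, a softmax can be applied afterwards; it does not change the ranking of the classes. The scores below are made-up numbers for illustration, not outputs of the model.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1D array of raw scores."""
    shifted = scores - scores.max()   # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

raw_scores = np.array([2.0, 1.0, 0.1, -1.5])   # hypothetical raw class scores
probs = softmax(raw_scores)

# Indices of the three largest scores, highest first
top3 = np.argsort(raw_scores)[::-1][:3]
print(top3, probs[top3])
```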

## Grad-CAM 

* **Overview:** Given an image, and a category (‘tiger cat’) as input, we forward-propagate the image through the model to obtain the `raw class scores` before softmax. The gradients are set to zero for all classes except the desired class (tiger cat), which is set to 1. This signal is then backpropagated to the `rectified convolutional feature map` of interest, where we can compute the coarse Grad-CAM localization (blue heatmap).


* **To Do**: Define your own function Grad_CAM to achieve the visualization of the given images. For each image, choose the top-3 possible labels as the desired classes. Compare the heatmaps of the three classes, and conclude. 


* **To be submitted within 2 weeks**: this notebook, **cleaned** (i.e. without results, for file size reasons: `menu > kernel > restart and clean`), in a state ready to be executed (if one just presses 'Enter' till the end, one should obtain all the results for all images) with a few comments at the end. No additional report, just the notebook!


* **Hints**: 
 + Grad-CAM needs both the output of the chosen feature maps and the gradient of the class score with respect to that output. In PyTorch, hooks are designed for this purpose. Read the [hook](https://pytorch.org/tutorials/beginner/former_torchies/nnft_tutorial.html#forward-and-backward-function-hooks) tutorial carefully. 
 + The pretrained resnet34 model has no activation function after its last layer: its output is the `raw class scores`, so you can use them directly. 
 + The feature maps are of size 7x7, so your heatmap will have the same size. To make it easier to read, project the heatmap onto the resized image (224x224, before normalization, not the original image). The function [`torch.nn.functional.interpolate`](https://pytorch.org/docs/stable/nn.functional.html?highlight=interpolate#torch.nn.functional.interpolate) may help.  
 + The paper: [Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/pdf/1610.02391.pdf)
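The hints above can be put together in a minimal sketch. In the paper's notation, the weight of feature map $A^k$ for class $c$ is $\alpha_k^c = \frac{1}{Z}\sum_{i,j} \frac{\partial y^c}{\partial A^k_{ij}}$, and the heatmap is $L^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$. To stay fast and self-contained, the code uses a tiny made-up network instead of ResNet-34, and the names `store` and `save_activation` are illustrative. It is one possible way to record activations and gradients, not the expected solution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny stand-in network (hypothetical, NOT ResNet-34)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 5),
)
model.eval()

store = {}

def save_activation(module, inputs, output):
    # Forward hook: keep the feature maps and ask autograd to hand us
    # their gradient during the backward pass
    store["activation"] = output
    output.register_hook(lambda grad: store.__setitem__("gradient", grad))

handle = model[1].register_forward_hook(save_activation)

x = torch.randn(1, 3, 32, 32)
scores = model(x)                       # raw class scores, no softmax
target = int(scores.argmax(dim=1))      # explain the top-1 class

model.zero_grad()
scores[0, target].backward()            # gradient is 1 for target, 0 elsewhere
handle.remove()

acts = store["activation"].detach()     # (1, 8, 16, 16)
grads = store["gradient"]               # same shape as acts

weights = grads.mean(dim=(2, 3), keepdim=True)             # alpha_k^c
cam = F.relu((weights * acts).sum(dim=1, keepdim=True))    # (1, 1, 16, 16)

# Project the coarse map onto the input resolution and rescale to [0, 1]
cam = F.interpolate(cam, size=(32, 32), mode="bilinear", align_corners=False)
cam = cam - cam.min()
cam = cam / (cam.max() + 1e-8)
```

With ResNet-34, the hook would presumably be registered on the last residual stage (`resnet34.layer4`), whose output has the 512x7x7 shape mentioned earlier, and the map would be interpolated to 224x224.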

Class: ‘pug, pug-dog’ | Class: ‘tabby, tabby cat’
- | - 
![alt](https://raw.githubusercontent.com/jacobgil/pytorch-grad-cam/master/examples/dog.jpg)| ![alt](https://raw.githubusercontent.com/jacobgil/pytorch-grad-cam/master/examples/cat.jpg)
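Overlays like the ones above blend a colored heatmap with the image. A simplified NumPy sketch follows, with two deliberate shortcuts that should be read as assumptions: nearest-neighbour upsampling replaces the bilinear interpolation suggested in the hints, and the heatmap is drawn in the red channel only rather than with a full colormap. The names `upsample_nearest` and `overlay` are illustrative.

```python
import numpy as np

def upsample_nearest(cam, factor):
    """Repeat every cell of a 2D map `factor` times along both axes."""
    return np.kron(cam, np.ones((factor, factor)))

def overlay(image_hwc, cam_small, image_weight=0.5):
    """Blend an RGB image in [0, 1] with a coarse heatmap."""
    factor = image_hwc.shape[0] // cam_small.shape[0]
    cam = upsample_nearest(cam_small, factor)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # rescale to [0, 1]
    heat = np.zeros_like(image_hwc)
    heat[..., 0] = cam                       # red channel carries the heatmap
    return image_weight * image_hwc + (1 - image_weight) * heat

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))    # random stand-in for the resized image
cam_7x7 = rng.random((7, 7))         # random stand-in for a Grad-CAM map
blended = overlay(image, cam_7x7)    # shape (224, 224, 3), values in [0, 1]
```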

In [None]:
from grad_cam import VanillaGradCAM  # local module providing the Grad-CAM implementation
import cv2

# Same geometric preprocessing as in `preprocess_image` (without ToTensor/Normalize),
# so that the heatmap is overlaid on exactly the crop the network saw
to_display = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224)])

# Set the model in evaluation mode
resnet34.eval()

for img_idx in range(len(dataset)):

    original_img = Image.open(dataset.imgs[img_idx][0]).convert('RGB')
    resized_img = np.float32(np.array(to_display(original_img))) / 255
    input_tensor = dataset[img_idx][0].view(1, 3, 224, 224)

    # Create an instance of GradCAM
    gradcam = VanillaGradCAM(resnet34, target_layer=32)

    output = resnet34(input_tensor)
    values, indices = torch.topk(output, 3)

    # One subplot per top-3 class
    fig, ax = plt.subplots(1, 3, figsize=(20, 10))

    # Distinct loop variables: the original inner loop reused `i` and `val`
    for col, class_idx in enumerate(indices[0].numpy()):
        target_class = classes[class_idx]
        # Generate a GradCAM heatmap for this class
        cam = gradcam.generate_class_activation_maps(input_tensor=input_tensor, classes_dict=classes, target_class=target_class)
        heatmap = gradcam.display_heatmap(img=resized_img, cam=cam, use_rgb=True, colormap=cv2.COLORMAP_JET, image_weight=0.5)

        ax[col].imshow(heatmap)
        ax[col].set_title(target_class)
        ax[col].axis('off')

    plt.show()


## Comments

#### Interpretation
Looking at the results for each image in the dataset, the heatmap for the top-1 class (computed from the gradients at the last `nn.Conv2d` layer) is the most relevant almost every time. In some cases, however, the top-3 classes are not very different, the top-1 heatmap is less clear, and the heatmaps for the other classes can even be better.

#### Analysis
- For many of the images analysed, the heatmaps for the top-3 classes show that the model relies on the same features/patterns, whichever class is being explained.

- To elaborate, for many images the predicted classes belong to the same "category" (e.g. felines with tiger cat, lion, puma). That is, they share many of the same characteristics, thus the same region will be highlighted in the heatmaps.

- On the other hand, some heatmaps diverge a lot between the classes they try to explain. Taking image n°5 (the lying dog), the heatmap for the class "Norwegian elkhound" is very different from the heatmap for the class "Cardigan". The first one highlights the dog's head, while the second one highlights the dog's tail. Thus, displaying multiple heatmaps for the same image can be very useful for understanding the model's reasoning and the small differences between the classes.
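The visual comparison described above could be complemented by a number. One possible choice, which is an assumption of this sketch and was not used to produce the observations above, is the cosine similarity between two heatmaps. The two toy maps below are hand-made and do not come from the notebook's results.

```python
import numpy as np

def cosine_similarity(cam_a, cam_b):
    """Cosine similarity between two heatmaps of the same shape."""
    a, b = cam_a.ravel(), cam_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Toy 7x7 maps: one active in the upper-left area, one in the lower-right
head = np.zeros((7, 7))
head[1:3, 1:3] = 1.0
tail = np.zeros((7, 7))
tail[4:6, 4:6] = 1.0

same = cosine_similarity(head, head)       # close to 1: identical focus
different = cosine_similarity(head, tail)  # 0: disjoint regions
print(same, different)
```

A value near 1 would correspond to classes explained by the same region (the "same category" case), and a value near 0 to cases like the head versus tail example.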

#### Conclusion
To conclude, Gradient-weighted Class Activation Mapping (Grad-CAM) allows us to visualize the regions of an input image that the deep neural network focuses on when making its predictions. It genuinely helps us gain insight into the internal workings of the network, in particular into how it interprets the visual features of an image.

In our task, Grad-CAM was applied to a classifier that distinguishes different animal species and even different breeds within the same species. By using Grad-CAM, we were able to better understand the characteristics that the network pays attention to when making its predictions. For example, the heatmaps can suggest that the network relies on specific fur patterns or unique markings to differentiate between animal species and breeds. This information can be useful for improving the accuracy of further predictions, as well as for designing new and more efficient neural network architectures.