## Visualization of CNN: Grad-CAM
* **Objective**: Convolutional Neural Networks (CNNs) are widely used in computer vision and are powerful for processing grid-like data. However, it is hard to know how and why they work, because they cannot be decomposed into individually intuitive components. In this assignment, we use Grad-CAM, which highlights the regions of the input image that were important for the network's prediction.

* **To be submitted by next session**: this notebook, **cleaned** (i.e. without results, for file size reasons: `menu > kernel > restart and clean`), in a state ready to be executed (if one just presses 'Enter' till the end, one should obtain all the results for all images) with a few comments at the end. No additional report, just the notebook!

* NB: if `PIL` is not installed, try `conda install pillow`.


In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, datasets, transforms
import matplotlib.pyplot as plt
import pickle
import urllib.request

import numpy as np
from PIL import Image

%matplotlib inline

### Download the Model
We provide a pretrained `ResNet-34` model for the `ImageNet` classification dataset.
* **ImageNet**: A large dataset of photographs with 1 000 classes.
* **ResNet-34**: A deep architecture for image classification.

In [None]:
resnet34 = models.resnet34(pretrained=True)
resnet34.eval() # set the model to evaluation mode

![ResNet34](https://miro.medium.com/max/1050/1*Y-u7dH4WC-dXyn9jOG4w0w.png)

In [3]:
classes = pickle.load(urllib.request.urlopen('https://gist.githubusercontent.com/yrevar/6135f1bd8dcf2e0cc683/raw/d133d61a09d7e5a3b36b8c111a8dd5c4b5d560ee/imagenet1000_clsid_to_human.pkl') )

### Input Images
We provide 20 images from ImageNet (a download link is available on the course webpage, or they can be downloaded directly with the command line below).<br>
To use the pretrained resnet34 model, the input image must be resized to `(224, 224)` and normalized using `mean = [0.485, 0.456, 0.406]` and `std = [0.229, 0.224, 0.225]`.

In [4]:
def preprocess_image(dir_path):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    dataset = datasets.ImageFolder(dir_path, transforms.Compose([
            transforms.Resize(256), # resize the shorter side to 256 pixels
            transforms.CenterCrop(224), # crop the central 224x224 region
            transforms.ToTensor(), # convert the PIL image to a tensor in [0, 1]
            normalize])) # normalize each channel

    return (dataset)

In [None]:
# The images should be in a *sub*-folder of "data/" (ex: data/TP2_images/images.jpg) and *not* directly in "data/"!
# otherwise the function won't find them
!rm -rf data

import os
os.mkdir("data")
os.mkdir("data/TP2_images")
!cd data/TP2_images && wget "https://www.lri.fr/~gcharpia/deeppractice/2022/TP2/TP2_images.zip" && unzip TP2_images.zip
# dir_path = project_folder+"/data/" 
dataset = preprocess_image("data/")

In [None]:
# show the original image
index = 5
input_image = Image.open(dataset.imgs[index][0]).convert('RGB')
plt.imshow(input_image)

In [None]:
output = resnet34(dataset[index][0].view(1, 3, 224, 224))
values, indices = torch.topk(output, 3)
print("Top 3-classes:", indices[0].numpy(), [classes[x] for x in indices[0].numpy()])
print("Raw class scores:", values[0].detach().numpy())
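The values printed above are raw class scores (logits), not probabilities. As a minimal sketch, they can be turned into probabilities with a softmax. The helper name `softmax` and the example scores below are hypothetical and are not values produced by the model; in practice the softmax would be applied to the full vector of 1 000 scores.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores (logits) into probabilities."""
    scores = np.asarray(scores, dtype=np.float64)
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Hypothetical raw scores, for illustration only
example_scores = [2.0, 1.0, 0.1]
probabilities = softmax(example_scores)
print(probabilities)
```

On the model output itself, `torch.softmax(output, dim=1)` should give the same result directly on the tensor.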

### Grad-CAM 
* **Overview:** Given an image and a category ('tiger cat') as input, we forward-propagate the image through the model to obtain the `raw class scores` before softmax. The gradients are set to zero for all classes except the desired class (tiger cat), which is set to 1. This signal is then backpropagated to the `rectified convolutional feature map` of interest, where we can compute the coarse Grad-CAM localization (blue heatmap).


* **To Do**: Define your own function Grad_CAM to achieve the visualization of the given images. For each image, choose the top-3 possible labels as the desired classes. Compare the heatmaps of the three classes, and conclude. 


* **Hints**: 
 + We need to record the output and grad_output of the feature maps to achieve Grad-CAM. PyTorch provides hooks for this purpose. Read the [hook](https://pytorch.org/tutorials/beginner/former_torchies/nnft_tutorial.html#forward-and-backward-function-hooks) tutorial carefully. 
 + The pretrained model resnet34 has no activation function after its last layer, so its output is the `raw class scores`, which you can use directly. 
 + The feature maps are 7x7, so your heatmap will have the same size. To see it more clearly, project the heatmap onto the resized image (224x224, before normalization, not the original one). The function [`torch.nn.functional.interpolate`](https://pytorch.org/docs/stable/nn.functional.html?highlight=interpolate#torch.nn.functional.interpolate) may help.  
 + The method is described in the paper [Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/pdf/1610.02391.pdf)
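
For reference, the two steps of Grad-CAM can be written as below. Here $y^c$ is the raw score of class $c$, $A^k$ is the $k$-th feature map of the chosen layer, and $Z$ is the number of spatial positions in a feature map (49 for a 7x7 map). The first expression averages the gradients over space to get one weight per channel; the second combines the feature maps with these weights and keeps only the positive part.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\left( \sum_k \alpha_k^c \, A^k \right)
```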

![Grad-CAM](https://da2so.github.io/assets/post_img/2020-08-10-GradCAM/2.png)

https://github.com/jacobgil/pytorch-grad-cam/tree/master/pytorch_grad_cam
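
The hints on hooks and interpolation can be illustrated with a minimal sketch on a toy network. Everything here is an assumption made for illustration: `toy_model`, `forward_hook`, `stored` and the 14x14 random input are hypothetical stand-ins, not the resnet34 setup of this assignment. The sketch records the feature maps with a forward hook, catches their gradient with a tensor hook, and upsamples the resulting 7x7 map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy network standing in for resnet34 (no download needed)
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 5),
)
toy_model.eval()

stored = {}

def forward_hook(module, inputs, output):
    # Keep the feature maps and attach a tensor hook to catch their gradient
    stored["activations"] = output.detach()
    output.register_hook(lambda grad: stored.update(gradients=grad.detach()))

# Hook the ReLU output, i.e. the rectified feature maps
handle = toy_model[1].register_forward_hook(forward_hook)

image = torch.randn(1, 3, 14, 14)   # random stand-in for a preprocessed image
scores = toy_model(image)           # raw class scores, shape (1, 5)
target = scores.argmax(dim=1).item()
scores[0, target].backward()        # backpropagate the chosen class score only
handle.remove()                     # remove the hook once we are done

# One weight per channel: spatial mean of the gradients
weights = stored["gradients"].mean(dim=(2, 3), keepdim=True)
# Weighted sum of the feature maps over channels, followed by ReLU
cam = F.relu((weights * stored["activations"]).sum(dim=1, keepdim=True))
# Project the coarse 7x7 map onto the input resolution
cam_upsampled = F.interpolate(cam, size=(14, 14), mode="bilinear", align_corners=False)
print(cam.shape, cam_upsampled.shape)
```

Keeping the handle returned by `register_forward_hook` and calling `handle.remove()` avoids accumulating hooks when the same model is reused for several classes.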

In [8]:

resnet34 = models.resnet34(pretrained=True)
resnet34.eval() # set the model to evaluation mode

In [9]:

# Compute the channel weights: spatial average of the gradients, shape (batch, channels)
def get_weights(grads):
    return np.mean(grads, axis=(2, 3))

# Weight each activation map by its channel weight and sum over the channels
def get_cam(grads, acts):
    weights = get_weights(grads)
    return np.sum((weights[:, :, None, None] * acts), axis=1)

# Average the grad-cam maps obtained from each hooked layer of layer 4
def get_mean_cam(gradients, activations):
    cam_images = []
    for i in range(len(gradients)):
        cam = get_cam(gradients[i].data.numpy(), activations[i].data.numpy())
        cam_images.append(np.maximum(cam, 0)[:, None, :]) # ReLU

    cam_images = np.concatenate(cam_images, axis=1)
    cam = np.mean(cam_images, axis=1)
    max_value = np.max(cam)
    if max_value > 0: # avoid dividing by zero when the map is all zeros
        cam /= max_value

    return cam
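
As a small worked example of the weighting step above, the sketch below applies the same operations to hand-made arrays. The values of `grads` and `acts` are hypothetical toy numbers chosen so the arithmetic is easy to follow: channel 0 gets weight 1.0 and channel 1 gets weight -0.5.

```python
import numpy as np

# Toy gradients and activations, shape (batch=1, channels=2, height=2, width=2)
grads = np.array([[[[1.0, 1.0], [1.0, 1.0]],
                   [[-0.5, -0.5], [-0.5, -0.5]]]])
acts = np.array([[[[1.0, 2.0], [3.0, 4.0]],
                  [[4.0, 2.0], [2.0, 10.0]]]])

# Channel weights: spatial mean of the gradients -> [[1.0, -0.5]]
weights = grads.mean(axis=(2, 3))

# Weighted sum over channels: 1.0 * acts[0, 0] - 0.5 * acts[0, 1]
cam = (weights[:, :, None, None] * acts).sum(axis=1)

cam = np.maximum(cam, 0)   # ReLU keeps only the positive evidence
cam = cam / cam.max()      # normalize to [0, 1]
print(cam[0])              # [[0.  0.5]
                           #  [1.  0. ]]
```

Before the ReLU the map is `[[-1, 1], [2, -1]]`: the negative entries are positions where the second channel, which counts against the class, dominates.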


In [None]:


activations = []
gradients = []

# Forward hook: store the feature maps, upsampled to the input resolution
def save_activation(module, input, output):
    output = torch.nn.functional.interpolate(output, size=(224, 224), mode="bilinear")
    activations.append(output.cpu().detach())

# Tensor hook: store the gradient of the class score w.r.t. the feature maps
def _store_grad(grad):
    grad = torch.nn.functional.interpolate(grad, size=(224, 224), mode="bilinear")
    # Back propagation visits the layers in reverse order, so prepend
    # to keep gradients[i] aligned with activations[i]
    gradients.insert(0, grad.cpu().detach())

# Forward hook: attach a tensor hook to the output to retrieve its gradient
# in the back propagation
def save_gradient(module, input, output):
    output.register_hook(_store_grad)

# Compute the grad cam image for the top k classes detected in an input image
def grad_cam(index, k=3):
    heatmaps = []
    label_classes = None
    for i in range(k+1):
        # Start each pass with empty buffers
        activations.clear()
        gradients.clear()

        # Initialize a new resnet model, so that hooks do not accumulate
        resnet34 = models.resnet34(pretrained=True)
        resnet34.eval() # set the model to evaluation mode
        
        target_layers = [resnet34.layer4[0].conv1, resnet34.layer4[0].conv2, 
                        resnet34.layer4[1].conv1, resnet34.layer4[1].conv2, 
                        resnet34.layer4[2].conv1, resnet34.layer4[2].conv2]
        
        # Set up the hooks to retrieve the activations and gradients
        for target_layer in target_layers:
            target_layer.register_forward_hook(save_activation)
            target_layer.register_forward_hook(save_gradient)
        
        output = resnet34(dataset[index][0].view(1, 3, 224, 224))
        # Get the top k predicted labels
        values, indices = torch.topk(output, k)
        if label_classes is None:
            label_classes = [classes[x] for x in indices[0].numpy()]
        if i < k:
            label = indices[0].numpy()[i]
        else:
            # Add a random label for comparison
            label = np.random.randint(len(classes))
            label_classes.append(classes[label] + " (random)")
        # Back propagate the score of the chosen class to get the gradients
        output[:, label].backward()
        # Compute the grad-cam heatmap from the activations and gradients
        heatmap = get_mean_cam(gradients, activations)
        heatmaps.append(heatmap)

    return heatmaps, label_classes

index = np.random.randint(len(dataset))
heatmaps, label_classes = grad_cam(index)

fig = plt.figure(figsize=(20, 40))

# Plot the heatmaps
for i, heatmap in enumerate(heatmaps):
    fig.add_subplot(1, 5, i + 1)
    plt.title(label_classes[i])
    plt.imshow(heatmap[0])

plt.show()


In [None]:

import cv2
import torchvision.transforms.functional as TF

# Merge an image and a heatmap to superpose the two in one image
def merge_img_heatmap(img, heatmap):
    # Scale the heatmap to [0, 255] and drop the batch dimension
    heatmap = np.uint8(heatmap * 255)[0]
    # applyColorMap returns a BGR image; convert it to RGB for matplotlib
    heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
    heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)

    # Blend the image (40%) with the colored heatmap (60%)
    combination = cv2.addWeighted(img, 0.4, heatmap, 0.6, 0)

    return combination

img = dataset[index][0].view(3, 224, 224)
img = img.numpy().transpose(1, 2, 0)
img = (img - np.min(img)) / (np.max(img) - np.min(img))
img = np.uint8(img * 255)

fig = plt.figure(figsize=(20, 40))

# Plot the original image
fig.add_subplot(1, 5, 1)
plt.title("Original image")
plt.imshow(img)

# Plot the image and its heatmap for a few labels
for i, heatmap in enumerate(heatmaps):
    fig.add_subplot(1, 5, i + 2)
    plt.title(label_classes[i])
    plt.imshow(merge_img_heatmap(img, heatmap))
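
The display code above rescales the normalized tensor with a min-max stretch, which can shift the colors slightly. An alternative, sketched here under the assumption that the image was normalized with the ImageNet mean and std given earlier, is to invert the normalization. The helper `denormalize` and the synthetic test image are hypothetical.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def denormalize(chw_image):
    """Invert Normalize(mean, std) on a (3, H, W) array; return an (H, W, 3) uint8 image."""
    hwc = np.transpose(chw_image, (1, 2, 0))       # channels last, as matplotlib expects
    restored = hwc * IMAGENET_STD + IMAGENET_MEAN  # undo (x - mean) / std per channel
    restored = np.clip(restored, 0.0, 1.0)         # guard against rounding overshoot
    return np.uint8(np.round(restored * 255))

# Round trip on a synthetic image with values in [0, 1)
rng = np.random.default_rng(0)
original = rng.random((3, 4, 4))
normalized = (original - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]
recovered = denormalize(normalized)
print(recovered.shape, recovered.dtype)
```

In this notebook it could be used as `img = denormalize(dataset[index][0].numpy())` in place of the min-max rescaling.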


In [None]:

for index in range(len(dataset)):

    heatmaps, label_classes = grad_cam(index)

    img = dataset[index][0].view(3, 224, 224)
    img = img.numpy().transpose(1, 2, 0)
    img = (img - np.min(img)) / (np.max(img) - np.min(img))
    img = np.uint8(img * 255)

    fig = plt.figure(figsize=(20, 40))

    # Plot the original image
    fig.add_subplot(1, 5, 1)
    plt.title("Original image")
    plt.imshow(img)

    # Plot the image and its heatmap for a few labels
    for i, heatmap in enumerate(heatmaps):
        fig.add_subplot(1, 5, i + 2)
        plt.title(label_classes[i])
        plt.imshow(merge_img_heatmap(img, heatmap))
    
    print("Done visualizing image {}/{}.".format(index+1, len(dataset)))


We defined our own Grad-CAM function and visualized the given images as shown above, using the top 3 predicted labels as the desired classes. The fourth heatmap is the Grad-CAM result for a random label and, as expected, it does not make any sense.

After comparing the heatmaps with the original images, we think our Grad-CAM works well: as the final visualization shows, the heatmaps highlight the main object of each image. This tells us where the network paid attention, which could be used to make further training more efficient.