# "Class Activation Map"
> "Class Activation Map explained"

- toc: true
- branch: master
- badges: true
- comments: true
- author: Pramesh Gautam
- categories: [computer-vision]

In [1]:
# imports

import torch
import matplotlib.pyplot as plt
from torchvision import models, transforms
import imageio
import json
import ast
from copy import deepcopy
from torch import nn
import torch.nn.functional as F

# CAM

Class activation maps (CAM) were introduced in [Learning Deep Features for Discriminative Localization](https://arxiv.org/abs/1512.04150) as a way to use classifier networks for localization tasks. They can also be used to interpret a model and see which regions of an input the network relies on to classify it. CAM uses the weights of the final fully connected layer to weight the feature maps of the last convolutional layer; the weighted sum over channels is the activation map.
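
Written out, with $f_k(x, y)$ the activation of channel $k$ of the last convolutional layer at spatial location $(x, y)$ and $w_k^c$ the final-layer weight connecting channel $k$ to class $c$, the map for class $c$ is the weighted sum below. The symbols follow the paper's notation and the bias term is ignored.

$$
M_c(x, y) = \sum_k w_k^c \, f_k(x, y)
$$

Because global average pooling and the final layer are both linear, the score for class $c$ is the spatial average of this map over the $H \times W$ locations:

$$
S_c = \sum_k w_k^c \, \frac{1}{HW} \sum_{x, y} f_k(x, y) = \frac{1}{HW} \sum_{x, y} M_c(x, y)
$$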

![Class activation map computation](/images/CAM.png)

As seen in the figure above, once the input image passes through the CONV layers, suppose it produces a feature map of shape $1\times2048\times7\times7$ in $B \times C \times H \times W$ format. Global Average Pooling then averages over the spatial dimensions, producing an output of shape $1\times2048$. The final layer has $2048\times1000$ weights mapping the output of the GAP layer to the class scores (ImageNet has 1000 classes).
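
The shapes above can be checked with random tensors. This is a minimal sketch, not the code used later in this post: `features`, `gap`, `fc` and `class_idx` are illustrative names, the feature map is random rather than produced by a real network, and the bias is disabled so that the class score equals the spatial mean of the class activation map.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the output of the last CONV block: B x C x H x W.
features = torch.randn(1, 2048, 7, 7)

# Global Average Pooling: average over the spatial dimensions H and W.
gap = features.mean(dim=(2, 3))  # shape: 1 x 2048

# Final linear layer; PyTorch stores its weight as 1000 x 2048 (classes x channels).
fc = nn.Linear(2048, 1000, bias=False)
scores = fc(gap)  # shape: 1 x 1000

# CAM for one arbitrarily chosen class: weight each channel, then sum over channels.
class_idx = 3
cam = (fc.weight[class_idx].view(1, -1, 1, 1) * features).sum(dim=1)  # shape: 1 x 7 x 7

# GAP and the linear layer are both linear, so the class score is the spatial mean of the CAM.
assert torch.allclose(cam.mean(dim=(1, 2)), scores[:, class_idx], atol=1e-4)
```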

We'll use PyTorch hooks to extract the intermediate feature maps. Hooks are functions that are executed during the forward or backward pass of the network. You can learn more about hooks [here](https://web.stanford.edu/~nanbhas/blog/forward-hooks-pytorch/).
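
As a minimal sketch of the mechanism, the toy network below (a hypothetical stand-in, not the ResNet used later) registers a forward hook on one layer, captures that layer's output during a forward pass, and then removes the hook through the handle returned by `register_forward_hook`.

```python
import torch
from torch import nn

# Toy network used only for illustration.
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),
)

captured = {}

def save_output(module, inputs, output):
    # Called by PyTorch right after the module's forward pass has run.
    captured["conv"] = output.detach()

handle = toy_model[0].register_forward_hook(save_output)

with torch.no_grad():
    logits = toy_model(torch.randn(1, 3, 16, 16))

print(captured["conv"].shape)  # torch.Size([1, 8, 16, 16])
print(logits.shape)            # torch.Size([1, 4])

# Remove the hook so later forward passes no longer trigger it.
handle.remove()
```

Keeping the handle around is a design choice: without `handle.remove()`, every additional call to `register_forward_hook` on the same layer adds another hook.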

In [2]:
# define hooks to save activation

activation = {}

def get_activation(name):
    def hook(model, input, output):        
        activation[name] = output.detach()
    return hook

def get_cam(input_image, model, transform):
    # load the image, apply the preprocessing transform and add a batch dimension
    input_data = transform(imageio.imread(input_image)).unsqueeze(0)
    # parse the class-index-to-label mapping, stored in the file as a Python literal
    with open("imagenet1000_clsidx_to_labels.txt") as f:
        labels = ast.literal_eval(f.read())

    # we multiply the output of layer4 (last convolutional block) by the weights that map from
    # avgpool layer to fc layer. Since weights can be extracted from the model itself, we only
    # use hooks to save the output of the last convolutional layer.
    handle = model.layer4.register_forward_hook(get_activation("layer4"))

    model.eval()
    with torch.no_grad():
        preds = model(input_data)
        preds_softmax = F.softmax(preds, dim=1)
        top_prob, top_pred = preds_softmax.max(dim=1)

    # remove the hook so repeated calls do not stack duplicate hooks
    handle.remove()

    return top_pred, top_prob, labels[top_pred.item()], activation

In [3]:
image = "n02102040_7490.JPEG"
model = models.resnet50(pretrained=True)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])


# pass the composed `transform`, not the `transforms` module
pred_class, pred_prob, pred_label, activation = get_cam(image, model, transform)

pred_class, pred_prob, pred_label


In [None]:
input_img = imageio.imread(image)
plt.imshow(input_img);

In [None]:
# weights of the predicted class, reshaped to 1 x 2048 x 1 x 1 so they broadcast over H and W
fc_weights = model.fc.weight[pred_class, :].unsqueeze(2).unsqueeze(3)
fc_weights.shape

In [None]:
activation["layer4"].shape

In [None]:
# weight each channel by its class weight and sum over the channel dimension
res = torch.einsum("bchw,bchw->bhw", fc_weights, activation["layer4"])
res.shape
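
The `einsum` call multiplies each channel of the feature map by the corresponding class weight and sums over the channel axis; the size-1 spatial dimensions of the weights broadcast across $H$ and $W$. A small check on random tensors (the names `w` and `feats` are illustrative and the values are not taken from the model) suggests that it matches an explicit multiply-and-sum:

```python
import torch

torch.manual_seed(0)

w = torch.randn(1, 2048, 1, 1)      # class weights, broadcast over H and W
feats = torch.randn(1, 2048, 7, 7)  # random stand-in for activation["layer4"]

via_einsum = torch.einsum("bchw,bchw->bhw", w, feats)
via_sum = (w * feats).sum(dim=1)

print(via_einsum.shape)  # torch.Size([1, 7, 7])
assert torch.allclose(via_einsum, via_sum, rtol=1e-4, atol=1e-3)
```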

In [None]:
combined_cam = res.unsqueeze(0)

In [None]:
combined_cam.shape

In [None]:
# resize the CAM to the height and width of the original input image
final_cam = F.interpolate(combined_cam, tuple(input_img.shape[:2]), mode="bilinear")
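
`F.interpolate` in `bilinear` mode expects a 4-D tensor of shape $B \times C \times H \times W$, which is why the map gets an extra leading dimension before resizing. A minimal sketch with a random $7\times7$ map and a hypothetical $224\times224$ target size (the real target is the height and width of the input image):

```python
import torch
import torch.nn.functional as F

# B x C x H x W, random stand-in for the class activation map
small_cam = torch.rand(1, 1, 7, 7)

# align_corners=False is the default for bilinear mode; it is spelled out here for clarity
upsampled = F.interpolate(small_cam, size=(224, 224), mode="bilinear", align_corners=False)

print(upsampled.shape)  # torch.Size([1, 1, 224, 224])
```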

In [None]:
plt.imshow(input_img)
plt.imshow(final_cam.squeeze().detach().numpy(), alpha=0.8)