# Eigen-CAM
This example interactively demonstrates Eigen-CAM using nnabla's pre-trained model.

[Mohammed Bany Muhammad, Mohammed Yeasin. Eigen-CAM: Class Activation Map using Principal Components. CoRR, 2020.](https://arxiv.org/abs/2008.00299)

Where does a machine learning model look in an image when it makes a decision? This paper describes Eigen-CAM, an algorithm that uses heat maps to indicate the image areas a CNN relies on as the basis for its judgment in image classification. Unlike gradient-based methods such as Grad-CAM [R.R. Selvaraju + 2016], which weight the feature maps of the final convolution layer by the gradient of a class score, Eigen-CAM works on the feature maps alone, so the resulting map is independent of any particular class score. The output of the final convolution layer is decomposed with SVD, and its projection onto the leading singular vectors (the principal components) is displayed as the heat map. Because no backpropagation is needed, the operation is very light and runs at high speed.

![Eigen-CAM algorithm](https://github.com/sony/nnabla-examples/raw/master/responsible_ai/eigencam/images/overview.png)
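
As a rough illustration of the idea described above, the following sketch builds a synthetic activation tensor (not taken from the notebook's model) and projects it onto its first right singular vector. The shapes and the names `pattern`, `weights`, and `activations` are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a conv layer output: 8 channels on a 6x6 grid.
channels, height, width = 8, 6, 6
pattern = np.zeros((height, width))
pattern[2:4, 2:4] = 1.0  # a bright 2x2 region shared by every channel

weights = rng.uniform(0.5, 1.5, size=channels)
activations = weights[:, None, None] * pattern
activations += 0.01 * rng.standard_normal(activations.shape)

# Flatten to (positions, channels), centre each channel, and take the SVD.
flat = activations.reshape(channels, -1).T
flat = flat - flat.mean(axis=0)
_, _, vt = np.linalg.svd(flat, full_matrices=False)

# Project every spatial position onto the first right singular vector.
projection = (flat @ vt[0]).reshape(height, width)

# The sign of a singular vector is arbitrary, so compare magnitudes.
strongest = np.unravel_index(np.argmax(np.abs(projection)), projection.shape)
print(strongest)
```

The strongest response should fall inside the 2x2 region that all channels share, which is the kind of structure the first principal component picks up.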

# Preparation
Let's start by installing nnabla and accessing the [nnabla-examples repository](https://github.com/sony/nnabla-examples). If you're running on Colab, make sure that your runtime type is set to GPU, which can be done from the top menu (Runtime → Change runtime type), and make sure to click **Connect** on the top right-hand side of the screen before you start.

In [None]:
# May show warnings for newly imported packages if run in Colab default python environment.
# Please click the `RESTART RUNTIME` to run the following script correctly.
# The error message of conflicts is acceptable.
!pip install nnabla-ext-cuda116
!git clone https://github.com/sony/nnabla-examples.git

In [None]:
%cd nnabla-examples/responsible_ai/eigencam

Import dependencies

In [None]:
import os
import cv2
import urllib.request
import numpy as np
import matplotlib.pyplot as plt
import nnabla as nn
import nnabla.functions as F
from nnabla.utils.image_utils import imread, imresize
from nnabla.models.imagenet import VGG16

from nnabla.ext_utils import get_extension_context
from utils import overlay_images, resize_image_for_yolo, decode_img_str, encode_img

ctx = get_extension_context('cudnn')
nn.set_default_context(ctx)

## Image Preparation 
Download the image to apply Eigen-CAM to.

In [None]:
url = 'https://upload.wikimedia.org/wikipedia/commons/4/4e/A_crab_spider_on_a_flower_preying_upon_a_euglossine_bee%2C_while_a_butterfly_looks_for_nectar.jpg'
img_path = 'input_flower_moth_spider.jpg'
if not os.path.isfile(img_path):
    tgt = urllib.request.urlopen(url).read()
    with open(img_path, mode='wb') as f:
        f.write(tgt)

Take a look at what the image looks like.  
We can see a flower in the middle on which a butterfly rests.

In [None]:
img = imread(img_path, size=(224, 224), channel_first=True)
plt.imshow(img.transpose(1,2,0))
plt.show()

## Network Definition
Loading the model is very simple.<br>
You can choose other models such as `VGG11`, `VGG13`, by specifying the model's name as an argument. Of course, you can choose other pretrained models as well. See the [Docs](https://nnabla.readthedocs.io/en/latest/python/api/models/imagenet.html).

**NOTE**: If you use the `VGG16` for the first time, nnabla will automatically download the weights from `https://nnabla.org` and it may take up to a few minutes.

In [None]:
model = VGG16()

In [None]:
batch_size = 1
x = nn.Variable((batch_size,) + model.input_shape)
vgg = model(x, returns_net=True)
vgg_variables = vgg.variables

We now define the input, and extract the necessary outputs.  
middle_layer: the last convolution layer  
pred: final output of the model

In [None]:
input_name = list(vgg.inputs.keys())[0]
vgg_variables[input_name].d = img
middle_layer = vgg_variables['VGG16/ReLU_13']
pred = vgg_variables["VGG16/Affine_3"]

Let's see how the model predicted the image.  
We can see the model classified the image as we expect.  
Labels related to butterflies rank high, while the flower is also recognized, although it only has the 14th highest score.

In [None]:
pred.forward()

In [None]:
predicted_labels = np.argsort(-pred.d[0])
for i, predicted_label in enumerate(predicted_labels[:15]):
    print(f'Top {i+1}, Label index: {predicted_label},  Label name: {model.category_names[predicted_label]}')
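
The ranking above sorts raw scores. Assuming `pred` holds unnormalized logits (it is the output of the last affine layer), a softmax would turn them into probabilities without changing the order. The sketch below uses made-up scores and label names rather than the model's output.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical scores and labels, for illustration only.
logits = np.array([2.0, 0.5, 3.0, -1.0])
labels = ["flower", "spider", "butterfly", "bee"]

probs = softmax(logits)
ranking = np.argsort(-probs)  # same order as np.argsort(-logits)

for rank, idx in enumerate(ranking[:3], start=1):
    print(f"Top {rank}: {labels[idx]} ({probs[idx]:.3f})")
```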

## Eigen-CAM Computation

Let's compute the heatmap from the SVD of the output of the last convolution layer.

In [None]:
def eigencam(middle_layer, eigenvector_index=0):
    """
    Calculate EigenCAM.
    Parameters
    ----------
    middle_layer: nn.Variable
        The layer of interest to apply EigenCAM
    eigenvector_index: int
        Index of the right singular vector to project onto (0 is the leading one)
    Returns
    -------
    heatmap: ndarray
        2D array of same size as width and height of middle_layer
    """
    conv_layer_output = middle_layer.d
    heatmap = get_2d_projection(conv_layer_output, eigenvector_index)
    max_v, min_v = np.max(heatmap), np.min(heatmap)
    if max_v != min_v:
        heatmap = (heatmap - min_v) / (max_v - min_v)
    return heatmap[0]

def get_2d_projection(activation_batch, eigenvector_index=0):
    # https://github.com/jacobgil/pytorch-grad-cam/blob/master/pytorch_grad_cam/utils/svd_on_activations.py
    # TBD: use pytorch batch svd implementation
    activation_batch[np.isnan(activation_batch)] = 0
    activation_batch[np.isinf(activation_batch)] = 0
    
    projections = []
    for activations in activation_batch:
        reshaped_activations = (activations).reshape(
            activations.shape[0], -1).transpose()
        # Centering before the SVD seems to be important here,
        # Otherwise the image returned is negative
        reshaped_activations = reshaped_activations - \
            reshaped_activations.mean(axis=0)
        U, S, VT = np.linalg.svd(reshaped_activations, full_matrices=True)
        projection = reshaped_activations @ VT[eigenvector_index, :]
        projection = projection.reshape(activations.shape[1:])
        projections.append(projection)
    return np.float32(projections)

In [None]:
heatmap = eigencam(middle_layer)
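
One property worth keeping in mind when reading these heatmaps: the SVD does not fix the sign of a singular vector, so a projection can in principle come out inverted. The sketch below shows this on synthetic data; `min_max` is a hypothetical helper that mirrors the min-max scaling used in `eigencam`, and the random matrix stands in for real activations.

```python
import numpy as np

def min_max(a):
    """Scale an array to [0, 1]; constant arrays are returned unchanged."""
    lo, hi = a.min(), a.max()
    return (a - lo) / (hi - lo) if hi != lo else a

rng = np.random.default_rng(1)
flat = rng.standard_normal((16, 4))  # (positions, channels), synthetic
flat = flat - flat.mean(axis=0)

_, _, vt = np.linalg.svd(flat, full_matrices=False)
v = vt[0]

heat_pos = min_max(flat @ v)
heat_neg = min_max(flat @ -v)  # same component, opposite sign

# Both are valid projections; after scaling, one is the inverse of the other.
print(np.allclose(heat_pos, 1.0 - heat_neg))
```

If a heatmap seems to highlight the background instead of the object, checking the inverted map is a cheap first step.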

## Visualization
Take a look at what the heatmap looks like in the layer of interest.

In [None]:
plt.matshow(heatmap)
plt.show()

Then we overlay the heatmap onto the original image to understand where the model focused.

In [None]:
base_img = imread(img_path, size=(224, 224))
overlaid_img_butterfly = overlay_images(base_img, heatmap)
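
The helper `overlay_images` comes from the repository's `utils` module, and its implementation is not shown here. As a hypothetical illustration of what such an overlay involves, the sketch below upsamples a coarse heatmap by pixel repetition, maps it to a red tint, and alpha-blends it with the image. The function name `blend_heatmap` and the blending weight are assumptions, not the helper's actual behaviour.

```python
import numpy as np

def blend_heatmap(image, heatmap, alpha=0.5):
    """Alpha-blend a [0, 1] heatmap onto an HxWx3 uint8 image (hypothetical helper)."""
    h, w = image.shape[:2]
    fy, fx = h // heatmap.shape[0], w // heatmap.shape[1]
    # Nearest-neighbour upsampling; assumes the image size is a multiple
    # of the heatmap size (e.g. 224 = 16 * 14).
    upsampled = np.repeat(np.repeat(heatmap, fy, axis=0), fx, axis=1)

    color = np.zeros_like(image, dtype=np.float32)
    color[..., 0] = 255.0 * upsampled  # put the heat into the red channel

    blended = (1.0 - alpha) * image.astype(np.float32) + alpha * color
    return np.clip(blended, 0, 255).astype(np.uint8)

# Synthetic inputs standing in for `base_img` and `heatmap`.
image = np.full((224, 224, 3), 128, dtype=np.uint8)
heat = np.zeros((14, 14), dtype=np.float32)
heat[7, 7] = 1.0

out = blend_heatmap(image, heat)
print(out.shape, out.dtype)
```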

Now let's display the overlaid image.  
We can speculate that the model recognized the butterfly by focusing on its wing.

In [None]:
plt.imshow(overlaid_img_butterfly)
plt.show()

Let's visualize the heatmap for another eigenvector.

In [None]:
heatmap = eigencam(middle_layer, eigenvector_index=1)
plt.matshow(heatmap)
plt.show()

We can see that the model's focus is spread more widely than it was for the butterfly, as if the heatmap wrapped around the flower.

In [None]:
overlaid_img = overlay_images(base_img, heatmap)
plt.imshow(overlaid_img)
plt.show()

Finally, let's visualize the top-3 eigenvectors. We can see that the heatmaps focus on the butterfly, the flower, and the spider.

In [None]:
fig = plt.figure(figsize=(10, 40))

ax = fig.add_subplot(1, 4, 1)
ax.imshow(base_img)
ax.axis("off")
plt.title("original image")

for i in range(3):
    ax = fig.add_subplot(1, 4, i+2)
    heatmap = eigencam(middle_layer, eigenvector_index=i)
    overlaid_img = overlay_images(base_img, heatmap)
    plt.title(f"using\n  {('1st', '2nd', '3rd')[i]} singular vector")
    ax.imshow(overlaid_img)
    ax.axis("off")
plt.show()

# Detection using YOLO v2
### Preparation
Let's also download the pre-trained weight parameters and an auxiliary file. We will also convert the weight parameters to .h5 format to make them compatible with NNabla. This may take a few minutes.

In [None]:
%cd ../../object-detection/yolov2/

In [None]:
!wget https://pjreddie.com/media/files/yolov2.weights
!wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names

In [None]:
!python convert_yolov2_weights_to_nnabla.py --input yolov2.weights

In [None]:
import yolov2
from draw_utils import DrawBoundingBoxes
from arg_utils import get_anchors_by_name_or_parse
from yolov2_detection import draw_bounding_boxes

Set the parameters for YOLO.

In [None]:
class_names = "coco.names"
classes = 80
weights = 'yolov2.h5'
width = 608
height = 608 
anchors = np.array(get_anchors_by_name_or_parse("coco"))
num_anchors = len(anchors) // 2
anchors = np.array(anchors).reshape(-1, 2)
thresh = 0.5
nms = 0.45  # IoU threshold for non-maximum suppression
nms_per_class = True

names = np.genfromtxt(class_names, dtype=str, delimiter='?')
rng = np.random.RandomState(1223)
colors = rng.randint(0, 256, (classes, 3)).astype(np.uint8)
colors = [tuple(c.tolist()) for c in colors]

In [None]:
# Load parameters
nn.clear_parameters()
_ = nn.load_parameters(weights)

#### Build a YOLO v2 network

In [None]:
feature_dict = {}
x = nn.Variable((1, 3, width, width))
y = yolov2.yolov2(x, num_anchors, classes,
                  test=True, feature_dict=feature_dict)
y = yolov2.yolov2_activate(y, num_anchors, anchors)
y = F.nms_detection2d(y, thresh, nms, nms_per_class)
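
To give a feel for the `nms` argument: non-maximum suppression typically compares boxes by intersection over union (IoU) and drops a box that overlaps a higher-scoring one by more than the threshold. The sketch below is a generic illustration with made-up corner-format boxes; it is not the implementation behind `F.nms_detection2d`, and the names `iou` and `simple_nms` are hypothetical.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def simple_nms(boxes, scores, iou_threshold):
    """Greedy NMS: keep the best box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Made-up boxes: the first two overlap heavily, the third is far away.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = simple_nms(boxes, scores, iou_threshold=0.45)
print(kept)
```

Since IoU always lies between 0 and 1, a threshold above 1 would never suppress anything in this scheme.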

# Upload Image
Run the following cell to upload your own image.

In [None]:
from google.colab import files
upload_img = files.upload()

Let's rename the image for convenience.

In [None]:
ext = os.path.splitext(list(upload_img.keys())[-1])[-1]
os.rename(list(upload_img.keys())[-1], "input_image{}".format(ext)) 
input_img = "input_image" + ext
img_orig = imread(input_img, num_channels=3)
im_h, im_w, _ = img_orig.shape
img, new_w, new_h = resize_image_for_yolo(img_orig)
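
The implementation of `resize_image_for_yolo` lives in the repository's `utils` module and is not shown here. Since it returns a new width and height alongside the image, one plausible reading is that it resizes while preserving the aspect ratio (letterboxing). The sketch below only computes such a size, under that assumption; `letterbox_size` is a hypothetical name.

```python
def letterbox_size(im_w, im_h, target_w, target_h):
    """Largest size with the original aspect ratio that fits in the target.

    Hypothetical helper; integer arithmetic avoids floating-point rounding.
    """
    if target_w * im_h <= target_h * im_w:
        # Width is the limiting side.
        return target_w, max(1, im_h * target_w // im_w)
    # Height is the limiting side.
    return max(1, im_w * target_h // im_h), target_h

# A 1280x720 frame fitted into the 608x608 network input.
new_w, new_h = letterbox_size(1280, 720, 608, 608)
print(new_w, new_h)
```

Under this assumption, ratios such as `new_w / width` describe which fraction of the network input the actual image occupies.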

# Execute YOLO v2

In [None]:
in_img = img.transpose(2, 0, 1).reshape(1, 3, width, width)
x.d = in_img
y.forward(clear_buffer=True)
bboxes = y.d[0]

In [None]:
img_draw = draw_bounding_boxes(img_orig, bboxes, im_w, im_h, names, colors, new_w * 1.0 / width, new_h * 1.0 / height, 0.5)
plt.imshow(img_draw)
plt.show()

# Get middle layer activations

In [None]:
from collections import OrderedDict
class get_middle_variables:
    def __init__(self):
        self.middle_vars_dict = OrderedDict()
        self.middle_layer_count_dict = OrderedDict()
    def __call__(self, f):
        if f.name in self.middle_layer_count_dict:
            self.middle_layer_count_dict[f.name] += 1
        else:
            self.middle_layer_count_dict[f.name] = 1
        key = f.name + '_{}'.format(self.middle_layer_count_dict[f.name])
        self.middle_vars_dict[key] = f.outputs[0]

In [None]:
GET_MIDDLE_VARIABLES_CLASS = get_middle_variables()
y.visit(GET_MIDDLE_VARIABLES_CLASS)
middle_vars = GET_MIDDLE_VARIABLES_CLASS.middle_vars_dict
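
The visitor above names each output after the function that produced it plus a running count, which is where a key such as `ConvolutionCudaCudnn_23` comes from. The sketch below reproduces only that naming scheme on a made-up list of function names; `index_by_name` is a hypothetical helper, and the real names depend on the backend in use.

```python
from collections import OrderedDict

def index_by_name(function_names):
    """Map 'Name_k' to the position of the k-th occurrence of each name."""
    counts = {}
    indexed = OrderedDict()
    for position, name in enumerate(function_names):
        counts[name] = counts.get(name, 0) + 1
        indexed[f"{name}_{counts[name]}"] = position
    return indexed

# Hypothetical visiting order, for illustration only.
visited = ["Convolution", "BatchNormalization", "LeakyReLU",
           "Convolution", "BatchNormalization", "LeakyReLU"]
keys = list(index_by_name(visited))
print(keys)
```

In the notebook itself, `list(middle_vars.keys())` shows which keys are available for the layer of interest.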

In [None]:
fig = plt.figure(figsize=(20, 5))

ax = fig.add_subplot(1, 4, 1)
ax.imshow(img_orig)
ax.axis("off")
plt.title("original image")

for i in range(3):
    ax = fig.add_subplot(1, 4, i+2)
    heatmap = eigencam(middle_vars['ConvolutionCudaCudnn_23'], eigenvector_index=i) #set variables key
    overlaid_img = overlay_images(img_orig, heatmap)
    plt.title(f"using\n  {('1st', '2nd', '3rd')[i]} singular vector")
    ax.imshow(overlaid_img)
    ax.axis("off")
plt.show()

# Real-time Visualization for YOLO v2

In [None]:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip
!pip install bottle
!pip install bottle_websocket
!pip install gevent

### Set the processing callback for the notebook and render the webcam using JavaScript

In [None]:
import IPython
import base64
from google.colab import output
from io import BytesIO

def run(img_str):
    #decode to image
    decimg = decode_img_str(img_str)
    im_h, im_w, _ = decimg.shape

    ##### detection and visualization 
    img, new_w, new_h = resize_image_for_yolo(decimg)
    in_img = img.transpose(2, 0, 1).reshape(1, 3, width, width)
    x.d = in_img
    y.forward(clear_buffer=True)
    bboxes = y.d[0]

    GET_MIDDLE_VARIABLES_CLASS = get_middle_variables()
    y.visit(GET_MIDDLE_VARIABLES_CLASS)
    middle_vars = GET_MIDDLE_VARIABLES_CLASS.middle_vars_dict

    heatmap = eigencam(middle_vars["ConvolutionCudaCudnn_23"], eigenvector_index=0)
  
    img_draw = draw_bounding_boxes(
        decimg, bboxes, im_w, im_h, names, colors, new_w * 1.0 / width, new_h * 1.0 / height, 0.4)
    
    out_img = overlay_images(img_draw, heatmap)

    #encode to string
    img_str = encode_img(out_img)

    return IPython.display.JSON({'img_str': img_str})

output.register_callback('notebook.run', run)
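
The browser side sends each frame as the result of `toDataURL`, which is a base64 data URL, and the callback's `img_str` is used directly as an image source. The helpers `decode_img_str` and `encode_img` come from `utils` and are not shown here; the sketch below only illustrates the data URL wrapping with the standard library. The function names and the placeholder bytes are assumptions, and no actual JPEG decoding is performed.

```python
import base64

def split_data_url(data_url):
    """Return (mime_type, raw_bytes) from a base64 data URL."""
    header, payload = data_url.split(",", 1)
    mime_type = header[len("data:"):].split(";")[0]
    return mime_type, base64.b64decode(payload)

def make_data_url(mime_type, raw_bytes):
    """Build a base64 data URL from raw bytes."""
    payload = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{mime_type};base64,{payload}"

# Placeholder bytes; a real frame would be JPEG-encoded image data.
frame = b"\xff\xd8 not a real jpeg \xff\xd9"
url = make_data_url("image/jpeg", frame)
mime, decoded = split_data_url(url)
print(mime, decoded == frame)
```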

In [None]:
from IPython.display import display, Javascript
from google.colab.output import eval_js

def use_webcam(quality=0.8):
  js = Javascript('''
    async function useCam(quality) {
      const div = document.createElement('div');
      document.body.appendChild(div);

      // camera btn
      var current_deviceId = "test";
      var new_deviceId = "test";
      const camera_div = document.createElement('div');
      document.body.appendChild(camera_div);
      //get deviceIds
      navigator.mediaDevices.enumerateDevices()
      .then(function(devices) {
          devices.forEach(function(device, i) {
              //camera selection button
              if (device.deviceId != "") {
                const camera_btn = document.createElement('button');
                camera_btn.textContent = "camera" + i;
                camera_btn.onclick = function() {
                  new_deviceId = device.deviceId
                };
                camera_div.appendChild(camera_btn);
              }
          });
      })
      .catch(function(err) {
        console.log(err.name + ": " + err.message);
      });

      //video element
      const video = document.createElement('video');
      video.style.display = 'None';
      const stream = await navigator.mediaDevices.getUserMedia({video: { deviceId: current_deviceId } });
      div.appendChild(video);
      video.srcObject = stream;
      await video.play();

      //canvas for display. frame rate is depending on display size and jpeg quality.
      display_size = 500
      const src_canvas = document.createElement('canvas');
      src_canvas.width  = display_size;
      src_canvas.height = display_size * video.videoHeight / video.videoWidth;
      const src_canvasCtx = src_canvas.getContext('2d');
      src_canvasCtx.translate(src_canvas.width, 0);
      src_canvasCtx.scale(-1, 1);
      div.appendChild(src_canvas);

      const dst_canvas = document.createElement('canvas');
      dst_canvas.width  = src_canvas.width;
      dst_canvas.height = src_canvas.height;
      const dst_canvasCtx = dst_canvas.getContext('2d');
      div.appendChild(dst_canvas);

      //exit button
      const btn_div = document.createElement('div');
      document.body.appendChild(btn_div);
      const exit_btn = document.createElement('button');
      exit_btn.textContent = 'Exit';
      var exit_flg = true
      exit_btn.onclick = function() {exit_flg = false};
      btn_div.appendChild(exit_btn);


      // Resize the output to fit the video element.
      google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);

      var send_num = 0
      // loop
      _canvasUpdate();
      async function _canvasUpdate() {

            src_canvasCtx.drawImage(video, 0, 0, video.videoWidth, video.videoHeight, 0, 0, src_canvas.width, src_canvas.height);     
            if (send_num<1){
                send_num += 1
                const img = src_canvas.toDataURL('image/jpeg', quality);
                const result = google.colab.kernel.invokeFunction('notebook.run', [img], {});
                result.then(function(value) {
                    parse = JSON.parse(JSON.stringify(value))["data"]
                    parse = JSON.parse(JSON.stringify(parse))["application/json"]
                    parse = JSON.parse(JSON.stringify(parse))["img_str"]
                    var image = new Image()
                    image.src = parse;
                    image.onload = function(){dst_canvasCtx.drawImage(image, 0, 0)}
                    send_num -= 1
                })
            }
            if (exit_flg){
                requestAnimationFrame(_canvasUpdate);   
            }else{
                stream.getVideoTracks()[0].stop();
            }
            if (new_deviceId != current_deviceId) {
              console.log("change camera!");
              current_deviceId = new_deviceId;
              const stream = await navigator.mediaDevices.getUserMedia({video: { deviceId: current_deviceId } });
              video.srcObject = stream;
              await video.play();
            }
      };
    }
    ''')
  display(js)
  data = eval_js('useCam({})'.format(quality))

In [None]:
use_webcam()