# Visualizing what convnets learn
The representations learned by convnets are highly amenable to visualization because they're *representations of visual concepts*.

- Visualizing intermediate activations
- Visualizing convnet filters
- Visualizing heatmaps of class activation in an image

## Visualizing intermediate activations
This technique displays the feature maps output by the convolution and pooling layers of a network for a given input (the output of a layer is often called its *activation*). 
It shows how an input is decomposed by the different filters the network has learned.

In [None]:
from keras.models import load_model

from keras.preprocessing import image
import numpy as np

import matplotlib.pyplot as plt

from keras import models

In [None]:
model = load_model('/tf/data/saved-models/cats_and_dogs_small_2.h5')
model.summary()

Get an input image from the test set (not part of the images the network was trained on).

In [None]:
img_path = '/tf/data/test/cats/cat.1700.jpg'

img = image.load_img(img_path, target_size=(150, 150))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor /= 255.

print(img_tensor.shape)

plt.imshow(img_tensor[0])
plt.show()

In order to extract the feature maps, create a Keras model that takes batches of images as input and outputs the activations of all convolution and pooling layers. 
Use the ```Model``` class instead of the ```Sequential``` class, because ```Model``` allows for models with **multiple outputs**, unlike ```Sequential```.

In [None]:
layer_outputs = [layer.output for layer in model.layers[:8]]
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

When fed an image input, this model returns the values of the layer activations in the original model. It has one input (the image) and 8 outputs (one per layer activation).

For instance, this is the activation of the first convolution layer for the cat image input:

In [None]:
activations = activation_model.predict(img_tensor)

first_layer_activation = activations[0]
print(first_layer_activation.shape)

The output is a 148 x 148 feature map with 32 channels (you can verify this with ```model.summary()```). Plotting the channel at index 4 (channels are indexed from 0) of the activation of the first layer of the original model:

In [None]:
plt.matshow(first_layer_activation[0, :, :, 4], cmap='viridis')
plt.show()

In [None]:
plt.matshow(first_layer_activation[0, :, :, 7], cmap='viridis')
plt.show()

The specific filters learned by convolution layers aren't deterministic, so they vary from one trained model to another. Typical examples of what a filter may encode are a *diagonal edge detector* or a *bright green dot*.

We can also extract and plot every channel in each of the 8 activation maps and stack the results in one big image tensor, with channels stacked side by side:

In [None]:
# Names of the layers, to have them as part of the plot
layer_names = []
for layer in model.layers[:8]:
    layer_names.append(layer.name)

# Define size of big image tensor
images_per_row = 16

# Display the feature maps
for layer_name, layer_activation in zip(layer_names, activations):
    # Number of features in the feature map
    n_features = layer_activation.shape[-1]

    # The feature map has shape (1, size, size, n_features)
    size = layer_activation.shape[1]

    # Tiles of the activation channels in this matrix
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))

    for col in range(n_cols):
        for row in range(images_per_row):
            # Copy so the in-place normalization below doesn't modify `activations`
            channel_image = layer_activation[0, :, :, col * images_per_row + row].copy()
            # Post-process the channel so it is easier to read as an image
            channel_image -= channel_image.mean()
            channel_image /= (channel_image.std() + 1e-5)  # epsilon avoids dividing by zero on blank channels
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size, row * size : (row + 1) * size] = channel_image

    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')
    plt.show()


#### Observations
- The **first layer** acts as a collection of various **edge detectors**. The activations retain almost all of the information present in the initial picture.
- As you go **higher**, the activations become increasingly **abstract** and less visually interpretable. They begin to encode **higher-level concepts** such as “cat ear” and “cat eye.” Higher representations carry increasingly **less information about the visual contents of the image**, and increasingly **more information related to the class of the image**.
- The sparsity of the activations increases with the depth of the layer: in the first layer, all filters are activated by the input image; but in the **higher layers, more and more filters are blank** => This means the pattern encoded by the filter isn’t found in the input image.
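
The sparsity observation can also be checked numerically. The following is a minimal sketch, assuming a channel counts as "blank" when its activation is zero everywhere. The helper name ```blank_channel_fraction``` and the synthetic arrays are hypothetical stand-ins for the entries of the ```activations``` list computed earlier.

```python
import numpy as np

def blank_channel_fraction(layer_activation, tol=0.0):
    """Fraction of channels whose activation never exceeds `tol`.

    `layer_activation` has shape (1, height, width, n_channels).
    """
    # Largest absolute response of each channel over the spatial dimensions
    peak_per_channel = np.abs(layer_activation[0]).max(axis=(0, 1))
    return float(np.mean(peak_per_channel <= tol))

# Synthetic stand-ins for real activations (hypothetical data)
rng = np.random.default_rng(0)
dense_layer = rng.random((1, 8, 8, 4)) + 0.1   # every channel responds
sparse_layer = dense_layer.copy()
sparse_layer[..., :3] = 0.0                    # 3 of the 4 channels are blank

dense_fraction = blank_channel_fraction(dense_layer)
sparse_fraction = blank_channel_fraction(sparse_layer)
print(dense_fraction, sparse_fraction)
```

With the real model, the same helper could be applied to each entry of ```activations```; if the observation above holds, the fraction should tend to grow with depth.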

## Visualizing convnet filters
Another way of inspecting the filters learned by convnets is to display the visual pattern that each filter is meant to respond to. This can be done with **gradient ascent in input space**: repeatedly updating the values of the input image of a convnet along the gradient so as to *maximize* the response of a specific filter, starting from a blank input image. 

The resulting input image will be one that the chosen filter is maximally responsive to.

- Build a loss function that measures the value of a given filter in a given convolution layer
- Use gradient ascent to adjust the values of the input image so as to maximize this activation value

**EXAMPLE:** loss for the activation of filter 0 in the layer block3_conv1 of the VGG16 network, pretrained on ImageNet:

In [None]:
from tensorflow.keras.applications import VGG16
from tensorflow.keras import backend as K

model = VGG16(weights='imagenet',
            include_top=False)

layer_name = 'block3_conv1'
filter_index = 0

layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:, :, :, filter_index])

To implement gradient descent, we need the gradient of this loss with respect to the model’s input:

In [None]:
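import tensorflow as tf

# Under eager execution the tape must record the forward pass on a concrete image.
# Assumption: a 150x150 gray image with some noise as the starting point.
extractor = tf.keras.Model(inputs=model.input, outputs=layer_output)
input_img = tf.Variable(tf.random.uniform((1, 150, 150, 3)) * 20 + 128.)
with tf.GradientTape() as gtape:
    loss = tf.reduce_mean(extractor(input_img)[:, :, :, filter_index])
grads = gtape.gradient(loss, input_img)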

A non-obvious trick that helps the gradient-descent process go smoothly is to normalize the gradient tensor by dividing it by its L2 norm (computed here as the square root of the mean of the squared values in the tensor) => so the magnitude of the updates applied to the input image always stays within the same range.

In [None]:
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
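
Putting the pieces together, the update loop can be sketched on a toy problem. This is a minimal NumPy illustration, not the VGG16 procedure: a single hand-written 3x3 filter followed by a ReLU stands in for a convolution layer, and the gradient is computed analytically instead of with TensorFlow. The function name, the kernel, the image size, and the step size are all assumptions made for the example.

```python
import numpy as np

def relu_filter_response(img, kernel):
    """Mean ReLU response of one filter over the image, and its gradient w.r.t. the image."""
    kh, kw = kernel.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    total, grad = 0.0, np.zeros_like(img)
    for i in range(out_h):
        for j in range(out_w):
            act = np.sum(img[i:i + kh, j:j + kw] * kernel)
            if act > 0:  # ReLU passes only positive responses
                total += act
                grad[i:i + kh, j:j + kw] += kernel
    n = out_h * out_w
    return total / n, grad / n

rng = np.random.default_rng(0)
img = rng.uniform(-0.1, 0.1, size=(16, 16))   # near-blank starting image
kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])            # hypothetical vertical-edge filter
step = 1.0

initial_response, _ = relu_filter_response(img, kernel)
for _ in range(30):
    _, grads = relu_filter_response(img, kernel)
    grads /= np.sqrt(np.mean(np.square(grads))) + 1e-5   # normalize the gradient magnitude
    img += step * grads                                   # ascent: move along the gradient
final_response, _ = relu_filter_response(img, kernel)
print(initial_response, final_response)
```

The final response should be larger than the initial one, and ```img``` ends up showing the pattern this toy filter responds to. With a real convnet the gradient would come from ```tf.GradientTape``` at every step instead of the analytic formula.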

## Visualizing heatmaps of class activation
This is useful for understanding which parts of a given image led a convnet to its final classification decision.
- Why did the network think the image contained the chosen class?
- Where is the class located in the picture?

This technique is called *class activation map (CAM)* visualization and consists of producing heatmaps of class activation over input images. A *class activation heatmap* is a 2D grid of scores associated with a specific output class, computed for every location in any input image, indicating how important each location is with respect to that class.

Intuitively, it weights a spatial map of “how intensely the input image **activates different channels**” by “**how important each channel is** with regard to the class,” resulting in:
***a spatial map of “how intensely the input image activates the class.”***
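
The weighting described above can be illustrated with a tiny hand-made example. This is a sketch under stated assumptions: the 4x4 feature map with 2 channels and the gradient values are invented for illustration, and ```cam``` is a hypothetical variable name.

```python
import numpy as np

# Hypothetical 4x4 feature map with 2 channels (stand-in for a conv layer output)
feature_map = np.zeros((4, 4, 2))
feature_map[0:2, 0:2, 0] = 1.0   # channel 0 fires in the top-left corner
feature_map[2:4, 2:4, 1] = 1.0   # channel 1 fires in the bottom-right corner

# Hypothetical gradients of the class score with respect to the feature map
grads = np.zeros((4, 4, 2))
grads[..., 0] = 0.5     # channel 0 supports the class
grads[..., 1] = -0.5    # channel 1 counts against it

# "How important each channel is": average the gradient over spatial positions
channel_importance = grads.mean(axis=(0, 1))

# Weight each channel by its importance, then average over channels
cam = (feature_map * channel_importance).mean(axis=-1)

# Keep only positive evidence and rescale to [0, 1]
cam = np.maximum(cam, 0)
if cam.max() > 0:
    cam /= cam.max()
print(cam)
```

Only the top-left region lights up: it is where the channel that supports the class is active.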

In [None]:
from tensorflow.keras.applications.vgg16 import VGG16

model = VGG16(weights='imagenet') # include the densely connected classifier on top, previously discarded

To use this, we must convert any image into something the VGG16 model can read:
- Resize to 224x224
- Convert to a Numpy ```float32``` tensor
- Apply preprocessing rules (```keras.applications.vgg16.preprocess_input```)
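
To make the last step less opaque: for VGG16, ```preprocess_input``` is understood to apply "caffe"-style preprocessing, that is, converting RGB to BGR and subtracting the per-channel ImageNet means without rescaling. The NumPy sketch below mimics that behaviour for illustration only. The function name ```vgg16_style_preprocess``` is hypothetical, and the mean values are assumed to match the ones Keras uses; real code should call the Keras function.

```python
import numpy as np

# ImageNet per-channel means in BGR order (assumed to match Keras' "caffe" mode)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(rgb_batch):
    """Illustrative re-implementation; use Keras' preprocess_input in practice."""
    x = np.asarray(rgb_batch, dtype=np.float32)
    x = x[..., ::-1]                 # RGB -> BGR
    return x - IMAGENET_BGR_MEAN     # zero-center each channel, no rescaling

batch = np.full((1, 224, 224, 3), 128.0, dtype=np.float32)  # hypothetical gray image
out = vgg16_style_preprocess(batch)
print(out.shape, out[0, 0, 0])
```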

In [None]:
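# Import from tensorflow.keras so the helpers match the VGG16 model loaded above
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np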

img_path = '.......'

img = image.load_img(img_path, target_size=(224, 224))

x = image.img_to_array(img)

x = np.expand_dims(x, axis=0)

x = preprocess_input(x)

**Setting up the Grad-CAM algorithm**

In [None]:
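import tensorflow as tf

preds = model.predict(x)
class_index = np.argmax(preds[0])

last_conv_layer = model.get_layer('block5_conv3')

# K.gradients is not supported under eager execution, so use tf.GradientTape
# on a model that returns both the last conv feature map and the predictions
grad_model = tf.keras.models.Model(model.inputs, [last_conv_layer.output, model.output])

with tf.GradientTape() as tape:
    conv_output, predictions = grad_model(x)
    african_elephant_output = predictions[:, class_index]

# Gradient of the top predicted class score with respect to the feature map
grads = tape.gradient(african_elephant_output, conv_output)

# Mean gradient per channel: how important each channel is for the class
pooled_grads_value = tf.reduce_mean(grads, axis=(0, 1, 2)).numpy()
conv_layer_output_value = conv_output[0].numpy()

# Weight each of the 512 channels by its importance
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]

heatmap = np.mean(conv_layer_output_value, axis=-1)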


**Heatmap post-processing** -> normalize the heatmap between 0 and 1 (for visualization purposes)

In [None]:
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
plt.matshow(heatmap)

Use **OpenCV** to generate an image that superimposes the original image on the heatmap:

In [None]:
import cv2

In [None]:
img = cv2.imread(img_path)

heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))

heatmap = np.uint8(255 * heatmap)

heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)

superimposed_img = heatmap * 0.4 + img

cv2.imwrite('......./elephant_cam.jpg', superimposed_img)
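
The blending step is plain arithmetic, so it can be sketched without OpenCV. The example below uses tiny synthetic arrays in place of the real image and the color-mapped heatmap. The ```blend``` helper and the explicit clip to the 0-255 range are additions made for this illustration, since the weighted sum can exceed what 8 bits can hold.

```python
import numpy as np

def blend(heatmap_bgr, img_bgr, alpha=0.4):
    """Overlay a color heatmap on an image: alpha * heatmap + image, clipped to 8 bits."""
    mixed = heatmap_bgr.astype(np.float32) * alpha + img_bgr.astype(np.float32)
    return np.clip(mixed, 0, 255).astype(np.uint8)

img_demo = np.full((2, 2, 3), 200, dtype=np.uint8)   # bright gray image
heat_demo = np.zeros((2, 2, 3), dtype=np.uint8)
heat_demo[0, 0] = (0, 0, 255)                        # one "hot" red pixel (BGR order)

overlay = blend(heat_demo, img_demo)
print(overlay[0, 0], overlay[1, 1])
```

The hot pixel saturates in the red channel while the rest of the image is unchanged, which is the effect the 0.4 intensity factor aims for on a real photo.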