This notebook visualizes the intermediate activations of a trained model.

The representations learned by convnets are highly amenable to visualization, in large part because they are representations of visual concepts. Since 2013, a wide array of techniques have been developed for visualizing and interpreting these representations. 

Visualizing intermediate convnet outputs ("intermediate activations") helps us understand how successive convnet layers transform their input, and gives a first idea of the meaning of individual convnet filters.

Visualizing intermediate activations consists of displaying the feature maps output by the various convolution and pooling layers in a network, given a certain input. The output of a layer is often called its "activation" because it is the output of the layer's activation function. This gives a view into how an input is decomposed into the different filters learned by the network. The feature maps to visualize have three dimensions: width, height, and depth (channels). Each channel encodes relatively independent features, so the proper way to visualize these feature maps is to plot the contents of each channel independently, as a 2D image.
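
As a minimal sketch of the per-channel idea, the snippet below uses a random NumPy array as a stand-in for a real feature map and splits it into one 2D image per channel. The shape and the name `feature_map` are assumptions made for illustration only; they do not come from the trained model.

```python
import numpy as np

# Hypothetical feature map: height 4, width 5, 3 channels (random stand-in data)
rng = np.random.default_rng(0)
feature_map = rng.random((4, 5, 3))

# Slice out each channel so that every entry is a separate 2D image
channel_images = [feature_map[:, :, c] for c in range(feature_map.shape[-1])]

print(len(channel_images))      # number of channels
print(channel_images[0].shape)  # each entry is a 2D (height, width) array
```

Each entry of `channel_images` could then be passed to `plt.matshow` on its own.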

In [None]:
#load libraries
from keras.preprocessing import image
import keras
import numpy as np
from keras.models import load_model
import matplotlib.pyplot as plt
from keras import models
%matplotlib inline

In [None]:
#check keras version
keras.__version__

In [None]:
#load the trained model
model = load_model('vgg16_custom.09-0.9650.h5')
model.summary()  

In [None]:
#load the image
img_path = 'cxr (1).jpg'

In [None]:
#Preprocess the image into a 4D tensor
img = image.load_img(img_path, target_size=(300,300))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)

In [None]:
# The model was trained on inputs that were preprocessed in the following way
# (pixel values divided by 255):
img_tensor /= 255.
print(img_tensor.shape)

#display the image
plt.imshow(img_tensor[0])
plt.show()
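
The preprocessing steps above can be sketched without an image file. The snippet below is only an illustration: `fake_img` is a hypothetical random array standing in for the decoded image, and it is cast to float32 because the in-place division would fail on an integer array.

```python
import numpy as np

# Hypothetical stand-in for a decoded RGB image: 300x300 pixels, values in [0, 255]
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(300, 300, 3)).astype("float32")

# Add a leading batch axis, then rescale pixel values to the [0, 1] range
batch = np.expand_dims(fake_img, axis=0)
batch /= 255.0

print(batch.shape)
print(batch.min() >= 0.0, batch.max() <= 1.0)
```

The leading axis of size 1 is the batch dimension that `predict` expects, which is why a single image is still passed as a 4D tensor.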

To extract the feature maps to look at, create a Keras model that takes batches of images as input and outputs the activations of all convolution and pooling layers. This uses the Keras class Model, which is instantiated with two arguments: an input tensor (or list of input tensors) and an output tensor (or list of output tensors). The resulting object is a Keras model that maps the specified inputs to the specified outputs. It also allows for models with multiple outputs.

In [None]:
# Display the index and name of each layer in the trained model
model.summary()
for i, layer in enumerate(model.layers):
   print(i, layer.name)

In [None]:
# Extract the outputs of the layers up to the GAP layer, skipping the layer at index 0:
layer_outputs = [layer.output for layer in model.layers[1:18]] # layers 0-17 precede the GAP layer; this slice keeps indices 1-17 (17 outputs)

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

When fed an image input, this model returns the values of the layer activations in the original model. This is a multi-output model. In the general case, a model could have any number of inputs and outputs. This one has one input and 17 outputs (the slice model.layers[1:18] selects 17 layers), one output per layer activation.
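
Because a Python slice excludes its end index, `model.layers[1:18]` selects 17 layers, and entry `i` of the output list belongs to `model.layers[i + 1]`. The sketch below checks this with a hypothetical list of placeholder names; the real names come from the model.

```python
# Hypothetical placeholder names standing in for the entries of model.layers
layer_names_all = ["layer_%d" % i for i in range(20)]

# Same slice as the one used to build layer_outputs: indices 1 through 17 inclusive
selected = layer_names_all[1:18]

print(len(selected))              # 17
print(selected[0], selected[-1])  # layer_1 layer_17

# Entry i of the selection corresponds to index i + 1 in the full list
for i, name in enumerate(selected):
    assert name == layer_names_all[i + 1]
```

Any list of names used to label the activations needs the same slice, otherwise every title is shifted by one layer.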

In [None]:
# This will return a list of Numpy arrays: one array per layer activation
activations = activation_model.predict(img_tensor)

# activations[1] is the second selected layer (model.layers[2]), since the list starts at model.layers[1]:
second_layer_activation = activations[1]
print(second_layer_activation.shape) #(1, 300, 300, 64)

# It's a 300*300 feature map with 64 channels. Visualizing the channels at index 60 and 10:
plt.matshow(second_layer_activation[0, :, :, 60], cmap='viridis')
plt.show()

# This channel appears to encode a diagonal edge detector.
# Now the channel at index 10. Your own channels may vary, since the specific filters learned by convolution layers are not deterministic.
plt.matshow(second_layer_activation[0, :, :, 10], cmap='viridis')
plt.show()

The next step plots a complete visualization of all the activations in the network: extract and plot every channel in each of the 17 activation maps (those of the convolution and pooling layers before the GAP layer) and stack the results in one big image tensor per layer, with channels stacked side by side.

In [None]:
# Names of the layers, so they can be used as plot titles.
# Use the same slice as layer_outputs so that names and activations line up.
layer_names = []
for layer in model.layers[1:18]:
    layer_names.append(layer.name)

images_per_row = 16 

# Display feature maps
for layer_name, layer_activation in zip(layer_names, activations):
    
    # This is the number of features in the feature map
    n_features = layer_activation.shape[-1]

    # The feature map has shape (1, size, size, n_features)
    size = layer_activation.shape[1]

    # Tile the activation channels in this matrix
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))

    # Tile each filter into this big horizontal grid
    for col in range(n_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0,
                                             :, :,
                                             col * images_per_row + row]
            # Post-process the feature to make it visually palatable
            channel_image -= channel_image.mean()
            channel_image /= channel_image.std() + 1e-5  # epsilon avoids division by zero for blank channels
            channel_image *= 64 
            channel_image += 128 
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size,
                         row * size : (row + 1) * size] = channel_image

    # Display the grid
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')
    
plt.show()
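
The tiling and normalization logic of the loop above can be tried on synthetic data without loading the model. The following is a minimal sketch, not the notebook's own code: `tile_channels` is a hypothetical helper, the random array stands in for one layer's activation, and the small `eps` term is an added assumption that guards against blank channels with zero standard deviation.

```python
import numpy as np

def tile_channels(activation, images_per_row=16, eps=1e-5):
    """Tile the channels of a (1, size, size, n_features) array into a 2D grid."""
    size = activation.shape[1]
    n_features = activation.shape[-1]
    n_rows = n_features // images_per_row  # leftover channels are dropped
    grid = np.zeros((size * n_rows, images_per_row * size), dtype="uint8")
    for r in range(n_rows):
        for c in range(images_per_row):
            # astype returns a copy, so the source activations are not modified
            channel = activation[0, :, :, r * images_per_row + c].astype("float32")
            # Standardize, then map to a displayable 0-255 range
            channel = (channel - channel.mean()) / (channel.std() + eps)
            channel = np.clip(channel * 64 + 128, 0, 255).astype("uint8")
            grid[r * size:(r + 1) * size, c * size:(c + 1) * size] = channel
    return grid

# Hypothetical activation: one 8x8 feature map with 32 channels
rng = np.random.default_rng(0)
fake_activation = rng.random((1, 8, 8, 32)).astype("float32")
grid = tile_channels(fake_activation)
print(grid.shape)  # (16, 128)
```

With 32 channels and 16 images per row, the grid has two rows of 8x8 tiles, hence the (16, 128) shape. A channel that is entirely constant maps to a uniform mid-gray tile (value 128) rather than producing NaN values.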

Things to note:
The first layer acts as a collection of various edge detectors. At that stage, the activations retain almost all of the information present in the initial picture.

As we go higher up, the activations become increasingly abstract and less visually interpretable. They start encoding higher-level concepts.

Higher-up representations carry increasingly less information about the visual contents of the image, and increasingly more information related to the class of the image.

The sparsity of the activations increases with the depth of the layer: in the first layer, all filters are activated by the input image, but in the following layers more and more filters are blank. A blank filter means that the pattern it encodes isn't found in the input image.

This illustrates a universal characteristic of the representations learned by deep neural networks: the features extracted by a layer get increasingly abstract with the depth of the layer. The activations of layers higher up carry less and less information about the specific input being seen, and more and more information about the target, the image class.

A deep neural network effectively acts as an information distillation pipeline, with raw data going in and getting repeatedly transformed so that irrelevant information gets filtered out (e.g. the specific visual appearance of the image) while useful information gets magnified and refined (e.g. the class of the image).

This is analogous to the way humans and animals perceive the world: after observing a scene for a few seconds, a human can remember which abstract objects were present in it (e.g. bicycle, tree) but cannot remember the specific appearance of these objects.

The human brain has learned to completely abstract its visual input, transforming it into high-level visual concepts while completely filtering out irrelevant visual details, which makes it tremendously difficult to remember how things around us actually look.
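
The sparsity observation noted above can also be checked numerically rather than by eye. The sketch below is an illustration under stated assumptions: `blank_channel_fraction` is a hypothetical helper, the tolerance `tol` is an arbitrary choice, and the random array stands in for a real entry of the `activations` list.

```python
import numpy as np

def blank_channel_fraction(activation, tol=1e-6):
    """Fraction of channels in a (1, h, w, n_features) array that are all (near) zero."""
    per_channel_max = np.abs(activation[0]).max(axis=(0, 1))
    return float(np.mean(per_channel_max <= tol))

# Hypothetical activation with 8 channels, 3 of which are forced to be blank
rng = np.random.default_rng(0)
fake_activation = rng.random((1, 6, 6, 8)) + 0.1
fake_activation[..., [1, 4, 7]] = 0.0

print(blank_channel_fraction(fake_activation))  # 0.375
```

Applying such a function to each array returned by `activation_model.predict` would give one value per layer, which makes it possible to see whether the fraction of blank filters grows with depth for a given input.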