# Tutorial on visualization of Neural Networks

This exercise aims at exploring different ways of visualizing Neural Networks:
- t-SNE of representations (CIFAR10)
- grad-CAM (ImageNet)
- activation maximization (ImageNet)

First, some preliminaries that set up plotting and data access on Google Drive. Just execute the following cells.

In [None]:
import numpy as np
from matplotlib import pyplot as plt 
from matplotlib.colors import ListedColormap

plt.rcParams['figure.figsize'] = (20.0, 20.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%matplotlib inline
%load_ext autoreload
%autoreload 2

import importlib.util
import sys
from google.colab import drive
drive.mount('/content/gdrive')

#%cd /content/gdrive/My\ Drive/Colab\ Notebooks/dlia_course/practical_sessions/


In [None]:
# basic imports
import os
import time

# import keras and tensorflow classes
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras import optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# check the hardware settings
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

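# TensorFlow places operations on the GPU automatically when one is available.
# A bare tf.device(...) call has no effect: it only returns a context manager,
# which would have to be used as `with tf.device('/device:GPU:0'):`.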

# Visualization of the encodings by t-SNE

First, we will visualize encodings of a network trained on the CIFAR-10 data set. Here, we import and preprocess the data. We keep the original integer labels in `y_data` (to be used for visualization later on). For training, we need to transform them to one-hot vectors.
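
As a minimal illustration of what the one-hot transformation does, the following NumPy sketch encodes a few made-up labels. The toy labels and the variable names are assumptions for illustration only; the notebook itself uses `to_categorical` from Keras.

```python
import numpy as np

# Toy integer labels with the same layout as the CIFAR-10 labels: (n_samples, 1).
labels = np.array([[3], [0], [9], [3]])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype="float32")[labels.flatten()]

# argmax along the class axis recovers the original integer labels.
recovered = one_hot.argmax(axis=1)

print(one_hot.shape)
print(recovered)
```

Each row contains a single 1 at the position of the class index, which is the target format expected by the `categorical_crossentropy` loss.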

In [None]:
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

(x_data, y_data), (x_test, y_test) = cifar10.load_data()

# preprocessing (normalization)
x_data = x_data.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# encodings in one-hot-vectors
y_train_cat = to_categorical(y_data)
y_test_cat = to_categorical(y_test)

# train/val separation
x_train = x_data[:40000]
x_val = x_data[40000:]
y_train = y_train_cat[:40000]
y_val = y_train_cat[40000:]


We will first visualize the data. 

In [None]:
# helper function to plot a few images for each class
def plot_array(fig, X, Y, classes_to_plot=None, samples_per_class=7):
    if classes_to_plot is None:
        classes_to_plot = np.unique(Y)
    num_classes = len(classes_to_plot)

    for k, y in enumerate(classes_to_plot):
        idxs = np.flatnonzero(Y == y)
        idxs = np.random.choice(idxs, samples_per_class, replace=False)
        #print(y, idxs)

        for i, idx in enumerate(idxs):
            plt_idx = i * num_classes + k + 1
            ax = fig.add_subplot(samples_per_class, num_classes, plt_idx)
            ax.imshow(X[idx].astype(np.uint8))
            ax.axis('off')
fig = plt.figure(figsize=(12, 12))
plot_array(fig, x_data*255, y_data, samples_per_class=10)

Next, we define a neural network and train it on the data set. We output the model summary, which also gives you the name of each layer.
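
To help read the parameter counts in the summary, here is a small sketch that recomputes them by hand for the architecture defined in the next cell. The helper names `conv2d_params` and `dense_params` are hypothetical; the formulas assume standard convolution and dense layers with one bias per output unit.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel of size kernel_h x kernel_w x in_channels per filter, plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    # One weight per input unit plus one bias, for every output unit.
    return (in_units + 1) * out_units


# (input channels, filters) of the six 3x3 convolutions in the model.
conv_channels = [(3, 32), (32, 32), (32, 64), (64, 64), (64, 128), (128, 128)]
conv_total = sum(conv2d_params(3, 3, c_in, c_out) for c_in, c_out in conv_channels)

# Three 2x2 poolings reduce 32 -> 16 -> 8 -> 4, so Flatten yields 4 * 4 * 128 values.
flat_units = 4 * 4 * 128
dense_total = dense_params(flat_units, 128) + dense_params(128, 10)

total = conv_total + dense_total
print(conv_total, dense_total, total)
```

Pooling, dropout and flatten layers have no trainable parameters, so they do not contribute to the total.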

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.optimizers import SGD, RMSprop

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same', input_shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
# compile model
opt = RMSprop(learning_rate=0.001, rho=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

model.summary()


Now we train the neural network and observe the training and validation error. The goal is a solution that reaches roughly 75\% accuracy or more on the test set.

In [None]:
# model fit
history = model.fit(x_train, y_train, epochs=10, batch_size=64, 
                    validation_data=(x_val, y_val))

model.evaluate(x=x_test, y=y_test_cat)

In [None]:
# We plot the learning curves (loss and accuracy)

# plot loss
plt.title('Cross Entropy Loss')
plt.plot(history.history['loss'], color='blue', label='training loss')
plt.plot(history.history['val_loss'], color='orange', label='validation loss')
plt.xlabel('epoch')
plt.legend()  # without this call the labels above are not displayed
plt.show()

# plot accuracy
plt.title('Classification Accuracy')
plt.plot(history.history['accuracy'], color='blue', label='training acc')
plt.plot(history.history['val_accuracy'], color='orange', label='validation acc')
plt.xlabel('epoch')
plt.legend()
plt.show()


Now we define a helper function for scatter plots. Just execute the following cell.

In [None]:
class_definition = {0: 'airplane', 1: 'automobile', 2: 'bird', 3: 'cat', 
                    4: 'deer', 5: 'dog', 6: 'frog', 7: 'horse', 8: 'ship', 
                    9: 'truck'}

from matplotlib.colors import to_hex

# definition of the scatterplot
def make_scatterplot(X, y, feature1=None, feature2=None, 
                     class_indices=None, class_definition=None):
    if class_indices is None:
        class_indices = np.unique(y)
    if class_definition is None:
        class_definition = dict(zip(class_indices, [str(i) for i in class_indices]))
    if feature1 is None:
      feature1 = 'Component 1'
    if feature2 is None:
      feature2 = 'Component 2'

    # colors (plt.cm.get_cmap was removed in recent matplotlib versions;
    # plt.get_cmap is the equivalent call)
    colors = plt.get_cmap('tab10', 10).colors[:,:3]

    fig = plt.figure(figsize = (8,8))
    ax = fig.add_subplot(1,1,1) 
    ax.set_xlabel(feature1)
    ax.set_ylabel(feature2)
    ax.set_title('Scatter plot: %s vs. %s' % (feature2, feature1))

    comp1 = X[:,0]
    comp2 = X[:,1]
    for class_index in class_indices:
        class_label = class_definition[class_index]
        ax.scatter(comp1[y==class_index],
                   comp2[y==class_index],
                   c=to_hex(colors[class_index]),
                   label=class_label,
                   s=15)
    ax.legend()
    ax.grid()

Now, we will extract the features of a layer and visualize the distribution of the encodings with t-SNE. 

First, we start with layer `flatten`. This is the last layer before the dense layers in the network. 
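
The shape of the extracted features can be anticipated with a small NumPy sketch. It assumes the architecture defined above, where three 2x2 poolings reduce the 32x32 input to 4x4 with 128 channels. The array `pooled` is a stand-in filled with zeros, not real network output.

```python
import numpy as np

# Stand-in for the output of the last pooling block for 1000 images:
# (batch, height, width, channels) = (1000, 4, 4, 128).
pooled = np.zeros((1000, 4, 4, 128), dtype="float32")

# Flatten keeps the batch axis and merges all remaining axes into one.
features_like = pooled.reshape(pooled.shape[0], -1)

print(features_like.shape)
```

Each image is therefore represented by one vector with 4 * 4 * 128 entries at this layer.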

In [None]:
from tensorflow.keras.models import Model

# we limit ourselves to 1000 training samples. t-SNE does not scale 
# very well with the number of samples. 
x = x_data[:1000]
y = y_data[:1000]

layer_name = 'flatten'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
features = intermediate_layer_model.predict(x)

print(features.shape)

Finally, we perform t-SNE. 

In [None]:
from sklearn.manifold import TSNE
X_embedded = TSNE(n_components=2, perplexity=5).fit_transform(features)
make_scatterplot(X_embedded, y.flatten(), class_definition=class_definition)

**Assignment**: Explain what a feature vector is (hint: other terms in the literature are "encodings" and "embeddings").

**Assignment**: Try out several perplexities: 5, 10, 30. What do you observe? 
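
As background for this assignment, the sketch below illustrates what perplexity measures in t-SNE. For one point, the neighbour distribution is a Gaussian of width `sigma` over the distances to all other points, and the perplexity is $2^{H}$, where $H$ is the entropy of that distribution in bits. The function `conditional_perplexity`, the random points and the chosen `sigma` values are assumptions for illustration. In practice, t-SNE works the other way round and searches, for each point, for the width that matches the requested perplexity.

```python
import numpy as np


def conditional_perplexity(distances_sq, sigma):
    """Perplexity 2**H of the Gaussian neighbour distribution of one point."""
    # Unnormalized log-similarities to all other points.
    logits = -distances_sq / (2.0 * sigma ** 2)
    logits = logits - logits.max()  # for numerical stability
    p = np.exp(logits)
    p = p / p.sum()
    # Entropy in bits; the small constant avoids log2(0).
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return 2.0 ** entropy


rng = np.random.default_rng(0)
points = rng.normal(size=(200, 5))

# Squared distances from the first point to all others.
d2 = np.sum((points[1:] - points[0]) ** 2, axis=1)

perplexities = [conditional_perplexity(d2, s) for s in (0.3, 1.0, 3.0)]
for s, perp in zip((0.3, 1.0, 3.0), perplexities):
    print(s, perp)
```

A wider Gaussian spreads the probability over more neighbours, so perplexity can be read as a soft count of the neighbours each point takes into account.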

**Assignment**: Now visualize the outputs of the two fully connected layers. Why are the representations so strikingly different? Imagine you wanted to use the same representations in another project (same image size, but other classes). Which of the representations seems less useful? Why?

**Assignment**: Visualize the t-SNE plot of the model before training (flatten layer and last fully connected layer). What do you observe?

# Classification activation maps (grad-CAM)

Class activation maps are certainly among the most popular visualization methods for network inspection.
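
For orientation, the core computation of grad-CAM can be sketched in NumPy. Given the activations $A^k$ of a convolutional layer and the gradients of one class score with respect to them, each feature map is weighted by its spatially averaged gradient, and the weighted sum is passed through a ReLU. The function name `grad_cam_from_arrays` and the random input arrays are assumptions for illustration. In the notebook, `tf-keras-vis` obtains the real activations and gradients from the network.

```python
import numpy as np


def grad_cam_from_arrays(activations, gradients):
    """Grad-CAM map for one image; both inputs have shape (H, W, K)."""
    # alpha_k: global average pooling of the gradients over spatial positions.
    weights = gradients.mean(axis=(0, 1))  # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(activations, weights, axes=([2], [0]))  # shape (H, W)
    # ReLU keeps only regions with a positive influence on the class score.
    cam = np.maximum(cam, 0.0)
    # Rescale to [0, 1] for display, unless the map is entirely zero.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam


# Random stand-ins with the shape of the last VGG16 convolutional layer output.
rng = np.random.default_rng(0)
A = rng.random((14, 14, 512)).astype("float32")
G = rng.normal(size=(14, 14, 512)).astype("float32")

cam = grad_cam_from_arrays(A, G)
print(cam.shape, cam.min(), cam.max())
```

The resulting map has the spatial resolution of the chosen layer and has to be upsampled to the image size before it is overlaid on the input.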

In [None]:
!pip install --upgrade tf-keras-vis

In [None]:
from tf_keras_vis.gradcam import Gradcam

Next, we will load VGG16, pretrained on `ImageNet`. This is the network we are going to investigate. 

In [None]:
# Pretrained network: VGG16
from tensorflow.keras.applications import VGG16

# We want to extract the entire network, including the final layer.
model = VGG16(weights='imagenet',include_top=True)

# We show the summary of model (to recall the dimensions)
model.summary()

Now, we load an image from the `ImageNet` database. We have downloaded these images: they are in the folder `imagenet`.

In [None]:
import os
filename = '418657219_3567961db1.jpg'

# animals:
# bird : filename = '418657219_3567961db1.jpg'
# dog : filename = '425248370_b15374000e.jpg'
# bird : filename = '485627874_8f4144223a.jpg'

# bridges: 
# filename = 'bridge1.jpg'
# filename = 'bridge2.jpg'
# filename = 'bridge3.jpg'

folder_name = '/content/gdrive/MyDrive/01_PROJECTS/practical_sessions/imagenet'
# folder_name = '/content/gdrive/My Drive/Colab Notebooks/dlia_course/practical_sessions/imagenet'
from tensorflow.keras.preprocessing.image import load_img
image = load_img(os.path.join(folder_name, filename), target_size=(224, 224))
plt.title('Bird 1')
plt.imshow(image)


Now, we will predict the label of the image.

In [None]:
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions

# convert the image pixels to a numpy array
img_prep = img_to_array(image)

# reshape data (the model expects a batch of images.)
img_prep = img_prep.reshape((1, img_prep.shape[0], img_prep.shape[1], img_prep.shape[2]))

# prepare the image for the VGG model
img_prep = preprocess_input(img_prep)

# predict the probability across all output classes
img_prediction = model.predict(img_prep)

# the output is a 1000-dimensional vector of posterior probabilities.
print('Shape of the output vector:', img_prediction.shape)

# result
print('Prediction result:', decode_predictions(img_prediction, top=3))
max_index = np.argmax(img_prediction)
print('Solution Index: ', max_index)

We need to define a few helper functions. The first is `model_modifier`, which replaces the `softmax` activation of the last layer by a linear activation. The reason is that we cannot study the influence of neurons on the output $y_k$ if that output depends on the scores of all classes, which is the case with `softmax`. The second is `loss`, which selects the class score to be explained.
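
The coupling between classes introduced by `softmax` can be seen in a small NumPy sketch with made-up logits. The values and variable names are assumptions for illustration only.

```python
import numpy as np


def softmax(z):
    # Subtract the maximum for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()


logits = np.array([2.0, 1.0, 0.5])
k = 0  # class of interest

# Lower the logit of another class and leave the logit of class k untouched.
changed = logits.copy()
changed[1] -= 1.0

p_before = softmax(logits)[k]
p_after = softmax(changed)[k]

print(p_before, p_after)      # the softmax output for class k increases
print(logits[k], changed[k])  # the linear score of class k is unchanged
```

With `softmax`, the output for class $k$ can grow simply because other classes become less likely, so gradients of the probability mix evidence for class $k$ with evidence against the other classes. The linear score does not have this problem.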

In [None]:
# model modifier
def model_modifier(m):
    m.layers[-1].activation = tf.keras.activations.linear
    return m

def loss(img_prediction):
    # the loss returns the score of the image for the predicted (top-scoring) class,
    # which is not necessarily the correct class.
    # if you want to test the importance for the prediction of another class
    # you have to adapt the index accordingly.
    predicted_class_index = np.argmax(img_prediction)
    return img_prediction[0][predicted_class_index]

In [None]:
from tf_keras_vis.utils import normalize
from matplotlib import cm
from tf_keras_vis.gradcam import Gradcam

# Create Gradcam object
gradcam = Gradcam(model,
                  model_modifier=model_modifier,
                  clone=False)

# Generate heatmap with GradCAM
cam = gradcam(loss,
              img_prep,
              penultimate_layer=-1, # model.layers number
             )
cam = normalize(cam)

plt.imshow(image)
heatmap = np.uint8(cm.jet(cam[0]) * 255)
plt.imshow(heatmap, cmap='jet', alpha=0.5)


**Assignment:** Test grad-CAM first on the three animal images, and verify that you obtain a reasonable result. Then test the algorithm on the three bridge images. Visualize the top-3 predictions. (Make a nice plot with the results.)

*   For `bridge1.jpg` you obtain a wrong classification, but what can be said about the learned network in view of the visualization of the top-3 predictions? 
*   For `bridge3.jpg` you get the right result, but what can you say about the "understanding" of the image in view of your visualization result? 

Note that the correspondence of class names and indices can be found at: 
[ImageNet](https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a)


# Activation Maximization

So far, we visualized activations of images, i.e. we focused on visualizations of inner network representations for given image data.

We can also visualize properties of the network itself. A popular method is activation maximization, where we seek an image that maximizes the activation of a given neuron inside the network. 

For this, we solve the maximization problem: 
\begin{equation}
x^{\ast} = {\arg \max}_{x} z(x)
\end{equation}
where $z(x)$ is the value of an arbitrary neuron (or a set of neurons, e.g. the neurons in one feature map) in the network. 

Typically, $z(x)=S_c(x)$ is the value of the output layer for one particular class. We therefore seek the image that maximizes the output for a particular class (e.g. the output for `water_ouzel`, `index: 20`). 
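
The optimization itself is done by gradient ascent on the input. The following toy sketch shows the principle on a function whose maximizer is known in closed form. The objective $z(x) = w^\top x - \frac{\lambda}{2}\|x\|^2$, the vector `w`, the value of `lam` and the step size are all assumptions for illustration. They are not the objective, regularizers or optimizer that `tf-keras-vis` applies to the real network.

```python
import numpy as np

# Toy "neuron": a linear response with a quadratic penalty that keeps x bounded.
w = np.array([1.0, -2.0, 0.5])
lam = 0.5


def z(x):
    return w @ x - 0.5 * lam * (x @ x)


def grad_z(x):
    # Analytical gradient of z with respect to the input x.
    return w - lam * x


# Gradient ascent: start from a neutral input and follow the gradient uphill.
x = np.zeros_like(w)
step = 0.1
for _ in range(500):
    x = x + step * grad_z(x)

x_star = w / lam  # closed-form maximizer of this toy objective
print(x, x_star)
```

For a real network the gradient is obtained by backpropagation to the input image, and some form of regularization is needed in the same way as the penalty term here, since the score could otherwise be increased without bound.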

In [None]:
# INDEX is the index of the class in the 1000-dimensional output vector
INDEX = 20

from tf_keras_vis.activation_maximization import ActivationMaximization

activation_maximization = ActivationMaximization(model,
                                                 model_modifier,
                                                 clone=False)

def loss(output):
    return output[:, INDEX]

from tf_keras_vis.utils.callbacks import Print

activation = activation_maximization(loss, callbacks=[Print(interval=50)])
generated_img = activation[0].astype(np.uint8)

plt.imshow(generated_img)
plt.tight_layout()


**Assignment**: You might want to play with this, e.g. 385: Indian elephant, 2: great white shark, 555: fire truck

Other classes can be found at: 
[ImageNet](https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a)

**Assignment:** Generate the activation maximization image for the same index, for a network initialized randomly. What do you observe? 