<a href="https://colab.research.google.com/github/jglombitza/Introspection_tutorial/blob/main/activation_maximization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Activation maximization
In this task, we use activation maximization to visualize which patterns the features of a CNN trained on MNIST are sensitive to. This gives us a deeper understanding of how CNNs work.

In [1]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

KTF = keras.backend
layers = keras.layers

print("keras", keras.__version__)

keras 2.8.0


### Download and preprocess data

In [2]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype(np.float32)[...,np.newaxis] / 255.
x_test = x_test.astype(np.float32)[...,np.newaxis] / 255.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
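To make the effect of the preprocessing on array shapes explicit, the following minimal sketch applies the same steps to a small synthetic batch instead of the real download. The arrays `fake_images` and `fake_labels` are made up for illustration, and `np.eye` is used as a stand-in for `keras.utils.to_categorical`.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 random 28x28 uint8 images with labels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 0, 9, 1, 3])

# Same steps as above: cast to float, add a channel axis, scale to [0, 1].
x = fake_images.astype(np.float32)[..., np.newaxis] / 255.

# One-hot encoding; np.eye is used here in place of keras.utils.to_categorical.
y = np.eye(10, dtype=np.float32)[fake_labels]

print(x.shape, x.dtype)  # images now carry a trailing channel axis
print(y.shape, y[0])     # one row of length 10 per label
```

The trailing channel axis is what `Conv2D` expects for grayscale input, which is why `input_shape=(28,28,1)` is used in the model.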

### Set up a convolutional neural network with at least 4 CNN layers.


---


**Task 1:**
- Update the model to feature more than a single Convolutional layer.


---



In [3]:
model = keras.models.Sequential()
model = keras.models.Sequential([
    layers.Conv2D(16, (3,3), activation='relu', padding='same', input_shape=(28,28,1), name='conv2d_1'),
    layers.GlobalMaxPooling2D(),
    layers.Dropout(0.5),
    layers.Dense(10),
    layers.Activation('softmax', name='softmax_layer')])
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_1 (Conv2D)           (None, 28, 28, 16)        160       
                                                                 
 global_max_pooling2d (Globa  (None, 16)               0         
 lMaxPooling2D)                                                  
                                                                 
 dropout (Dropout)           (None, 16)                0         
                                                                 
 dense (Dense)               (None, 10)                170       
                                                                 
 softmax_layer (Activation)  (None, 10)                0         
                                                                 
Total params: 330
Trainable params: 330
Non-trainable params: 0
________________________________________________________
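One possible way to approach Task 1 is sketched below: four stacked convolutional layers named `conv2d_1` to `conv2d_4`, so that a layer name such as `conv2d_3` exists for the visualization later on. The name `deeper_model`, the filter counts, and the choice to keep the spatial size at 28x28 are assumptions made for illustration, not a reference solution.

```python
from tensorflow import keras

layers = keras.layers

# Illustrative architecture; the filter counts are arbitrary choices.
deeper_model = keras.models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', padding='same',
                  input_shape=(28, 28, 1), name='conv2d_1'),
    layers.Conv2D(16, (3, 3), activation='relu', padding='same', name='conv2d_2'),
    layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv2d_3'),
    layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv2d_4'),
    layers.GlobalMaxPooling2D(),
    layers.Dropout(0.5),
    layers.Dense(10),
    layers.Activation('softmax', name='softmax_layer')])
deeper_model.summary()
```

With at most 32 feature maps per layer, all visualizations still fit into the 8x8 plotting grid used at the end of the notebook.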

#### Compile and train the model


---


**Task 2:**
- Train the model until convergence (for MNIST, this usually takes no more than 20 epochs).



---



In [4]:
model.compile(
    loss='categorical_crossentropy',
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    metrics=['accuracy'])


results = model.fit(x_train, y_train,
                    batch_size=100,
                    epochs=1,
                    verbose=1,
                    validation_split=0.1
                    )
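For Task 2, one way to train "until convergence" without guessing the number of epochs is an early-stopping callback. The sketch below is a hedged illustration on tiny random stand-in data (`x_demo`, `y_demo`) with a deliberately small `demo_model`; the `patience` value is an assumption, and in the notebook you would pass the callback to the `model.fit` call above instead.

```python
import numpy as np
from tensorflow import keras

# Tiny synthetic stand-in data so the sketch runs without the MNIST download.
rng = np.random.default_rng(0)
x_demo = rng.random((200, 28, 28, 1), dtype=np.float32)
y_demo = keras.utils.to_categorical(rng.integers(0, 10, size=200), 10)

demo_model = keras.models.Sequential([
    keras.layers.Conv2D(4, (3, 3), activation='relu', padding='same',
                        input_shape=(28, 28, 1)),
    keras.layers.GlobalMaxPooling2D(),
    keras.layers.Dense(10, activation='softmax')])
demo_model.compile(loss='categorical_crossentropy',
                   optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                   metrics=['accuracy'])

# Stop once the validation loss has not improved for `patience` epochs
# and restore the weights of the best epoch seen so far.
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=2,
                                           restore_best_weights=True)

history = demo_model.fit(x_demo, y_demo, batch_size=100, epochs=20, verbose=0,
                         validation_split=0.1, callbacks=[early_stop])
print("epochs actually run:", len(history.history['loss']))
```

Here `epochs=20` acts only as an upper bound; training ends earlier if the validation loss stalls.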



### Implementation of activation maximization
Select a layer you want to visualize and perform activation maximization.

In [5]:
gradient_updates = 50
step_size = 1.

def normalize(x):
    '''Normalize gradients by their root mean square (the L2 norm divided by sqrt(n))'''
    return x / (KTF.sqrt(KTF.mean(KTF.square(x))) + KTF.epsilon())
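Note that `normalize` divides by the square root of the *mean* squared gradient, so the result has a root mean square of about 1 rather than an L2 norm of 1. The following NumPy sketch re-implements the function for illustration only; `normalize_np` and the value of `eps` are assumptions chosen to mirror `KTF.epsilon()`.

```python
import numpy as np

def normalize_np(x, eps=1e-7):
    # NumPy re-implementation of `normalize` for illustration (eps is an assumed value).
    return x / (np.sqrt(np.mean(np.square(x))) + eps)

g = np.array([3., 4., 0., 0.])
g_norm = normalize_np(g)

rms = np.sqrt(np.mean(g_norm ** 2))  # root mean square after normalization
l2 = np.linalg.norm(g_norm)          # L2 norm after normalization
print(rms, l2)  # RMS is ~1, while the L2 norm is ~sqrt(4) = 2
```

In practice this means that with `step_size = 1.` each gradient ascent step changes a typical pixel by roughly 1, independent of the raw gradient magnitude.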




---


**Task 3:**
In the following, implement activation maximization to visualize which patterns a specific feature map is sensitive to:
- Start from a uniformly distributed noise 'image' (note that the shape has to be `(1, 28, 28, 1)`, as we use a batch size of `1`).
- Choose one specific feature map using `filter_index` from your tensor of activations `layer_output`.
- Create a scalar loss as discussed in Chapter 12 (maximize the average feature map activation). Make use of `KTF.mean()`, which averages over all elements of the tensor.
- Then calculate the gradients of the loss with respect to your starting 'image' and repeat the procedure `gradient_updates = 50` times. You can calculate the gradients using the following expression:
`grads = gtape.gradient(YOUR_OBJECTIVE, THE_VARIABLE_YOU_WANT_TO_OPTIMIZE)`
- Then, normalize the gradients using `grads = normalize(grads)` (already in the code).
- Finally, implement the gradient ascent step (you may use `assign_sub` or `assign_add` to adapt the parameters).


---


*Remember to construct a Keras variable `KTF.variable(...)` for the input (we want to find an input that 'maximizes' the output, so we build an input that holds adaptive parameters which we can train using TensorFlow / Keras)*
The following code snippet may help you to implement the maximization: 


In [None]:
visualized_feature = []  # list of visualized feature maps
layer_dict = {layer.name: layer for layer in model.layers}
layer_name = "conv2d_3"  # visualize the third convolutional layer (this name must exist in your Task 1 model)

layer_output = layer_dict[layer_name].output
print("shape of layer output (tensor of activations):", layer_output.shape)
sub_model = keras.models.Model(model.inputs, layer_output)  # model.inputs is already a list

# activation maximization
for filter_index in range(layer_output.shape[-1]):  # iterate over feature maps
    print('Processing filter %d' % (filter_index+1))
    input_img = KTF.variable([0]) # instead of '[0]' use noise as the (start) input image with correct shape

    for i in range(gradient_updates):

        with tf.GradientTape() as gtape:
            layer_output = sub_model(input_img)  # propagate the "image" through your model up to the chosen layer; the output is the tensor of all activations in all feature maps

            loss = 0  # <--: define your scalar loss HERE using keras and layer_output
            # <-- estimate gradients here
            grads = normalize(grads)
            # <-- perform gradient ascent here

    visualized_feature.append(input_img.numpy())  # cast to numpy array

**Hints:**
- You can generate uniform noise using: `np.random.uniform(0,1, shape)`
- As the loss, we introduced the average response of the feature map: $1/n_{pix} \sum_{ij} h_{ij}$, where $i, j$ are the pixel indices (height and width).
Thus, we need to implement it via: `loss = KTF.mean(layer_output[..., filter_index])`.
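Putting the hints together, the gradient ascent loop could look roughly like the sketch below. It is one possible way to fill in the blanks, not a reference solution: it uses a small untrained stand-in network `demo` so that it runs on its own, `tf.Variable` and `tf.reduce_mean` in place of the `KTF` helpers, and only a single `filter_index`. All of these are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Small untrained stand-in network; in the tutorial use your trained `model`.
demo = keras.models.Sequential([
    keras.layers.Conv2D(4, (3, 3), activation='relu', padding='same',
                        input_shape=(28, 28, 1), name='conv2d_1'),
    keras.layers.Conv2D(4, (3, 3), activation='relu', padding='same', name='conv2d_2')])
sub_model = keras.models.Model(demo.inputs, demo.get_layer('conv2d_2').output)

def normalize(x):
    # Scale the gradient to unit root mean square.
    return x / (tf.sqrt(tf.reduce_mean(tf.square(x))) + 1e-7)

filter_index, gradient_updates, step_size = 0, 50, 1.

# Start from uniform noise with batch size 1.
start = np.random.uniform(0, 1, (1, 28, 28, 1)).astype(np.float32)
input_img = tf.Variable(start)

losses = []
for _ in range(gradient_updates):
    with tf.GradientTape() as gtape:
        # Read the variable as a plain tensor before calling the model.
        activations = sub_model(tf.convert_to_tensor(input_img))
        # Average response of the chosen feature map.
        loss = tf.reduce_mean(activations[..., filter_index])
    grads = gtape.gradient(loss, input_img)
    # Gradient ascent: move the image in the direction that increases the loss.
    input_img.assign_add(step_size * normalize(grads))
    losses.append(float(loss))

print("first loss: %.4f, last loss: %.4f" % (losses[0], losses[-1]))
```

Because the goal is to *maximize* the activation, the update uses `assign_add`; using `assign_sub` with the negated loss would be equivalent.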

#### Plot images to visualize to which patterns the respective feature maps are sensitive.

In [None]:
def deprocess_image(x):
    # rescale the visualization to the value range of "MNIST images" (0-255)
    x = x - x.mean()  # creates a copy, so the stored feature is not modified in place
    x /= (x.std() + KTF.epsilon())
    # x *= 0.1
    x += 0.5
    x *= 255
    x = np.clip(x, 0, 255).astype('uint8')
    return x
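Since the image is standardized to unit standard deviation before being shifted by 0.5 and scaled to 0-255, a large share of pixels ends up clipped to pure black or white; the commented-out `x *= 0.1` line reduces this. The NumPy sketch below illustrates the effect on random data. The function `deprocess_sketch` and its `contrast` and `eps` parameters are hypothetical and only mirror the logic of `deprocess_image`.

```python
import numpy as np

def deprocess_sketch(x, contrast=1.0, eps=1e-7):
    # Illustrative variant of `deprocess_image`; `contrast` and `eps` are assumed parameters.
    x = (x - x.mean()) / (x.std() + eps)  # zero mean, unit standard deviation
    x = x * contrast + 0.5                # centre around mid-grey
    return np.clip(x * 255, 0, 255).astype('uint8')

rng = np.random.default_rng(0)
img = rng.normal(size=(1, 28, 28, 1))  # random stand-in for a visualized feature

for contrast in (1.0, 0.1):
    out = deprocess_sketch(img, contrast)
    saturated = np.mean((out == 0) | (out == 255))
    print("contrast %.1f: %.0f%% of pixels saturated" % (contrast, 100 * saturated))
```

Which setting gives the more readable picture depends on the feature map, so it is worth trying both.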

In [None]:
plt.figure(figsize=(10,10))

for i, feature_ in enumerate(visualized_feature):
    feature_image = deprocess_image(feature_)
    ax = plt.subplot(8, 8, 1 + i)  # 8x8 grid, i.e. room for up to 64 feature maps
    plt.imshow(feature_image.squeeze())
    ax.axis('off')
    plt.title("feature %s" % i)
    
plt.tight_layout()

# Deep Dream on MNIST
**Bonus**
If you finish the tasks early, perform activation maximization using other objectives, e.g., the Deep Dream loss.
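As a starting point for the bonus, the sketch below shows one common way to write a Deep Dream style objective: instead of a single feature map, the activations of one or more whole layers are maximized. The exact form used here (mean squared activation, summed over layers) and the function name `deep_dream_loss` are assumptions; the definition from the lecture may differ.

```python
import tensorflow as tf

def deep_dream_loss(layer_activations):
    # Assumed objective: mean squared activation, summed over the selected layers.
    return tf.add_n([tf.reduce_mean(tf.square(act)) for act in layer_activations])

# Hypothetical activations of two layers for a batch of one image.
acts = [tf.ones((1, 28, 28, 16)), 2. * tf.ones((1, 28, 28, 32))]
print(float(deep_dream_loss(acts)))  # 1**2 + 2**2 = 5.0
```

To use it, the sub-model would return a list of layer outputs, and this loss would replace the per-filter mean inside the gradient ascent loop.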