# R255: Topics of Machine Learning - Explainable AI Practical

In this practical, we will explore some of the tools and methods that we discussed and presented in our lectures. In particular we will focus on:
- Showing how to use popular model-agnostic feature attribution libraries (e.g., LIME) to explain and analyse a pre-trained classifier.
- Implementing vanilla saliency maps for DNNs and using them to debug a DNN that fails to perform well on its test set.
- Extending vanilla saliency maps via SmoothGrad, a variation that generates smoother and more robust saliency maps than vanilla gradients.
- If time allows, we will also work on further extensions of saliency maps.

Before getting started, a few general comments (please **read carefully**):
- We will **focus on methodology rather than fair evaluation and fine-tuning**. This means that we will avoid trying to use very complex models and architectures for the sake of time. Nevertheless, all the methods we are discussing and working with today should be perfectly applicable to larger, more complex models, which means you can use them for your own work and research!
- To get **ticked for participation**, please show that you have completed this notebook to one of the instructors before leaving the session. 
- As much as we would've loved to cover more advanced topics like those in Concept Learning, it is tricky to do so within an hour and thirty minutes. Nevertheless, if you are done with this worksheet early and are interested in more advanced exercises, feel free to look into the other notebook in this repo or do some of the optional exercises here. This notebook is the one we will ask students assigned to this topic as their main R255 topic to complete for their non-project marks.
- **If you are stuck at any point**, and/or if anything is unclear, please do not hesitate to ask any of the instructors! Chances are someone else will have the same question as you do and we are here to help as much as we can.

## General Setup

Time to get our hands dirty with some of the XAI methods we learned in class! For this, we will first install some libraries which will
be useful to explore the different methods we will play with today:

In [None]:
!pip install lime

We will also import a bunch of useful libraries which will enable us to train models that we can then dissect and explore. In this practical, we will use [TensorFlow](https://www.tensorflow.org/) as our main framework of choice. This has been chosen as TensorFlow allows easy construction of DNNs without much scaffolding and enables easy gradient manipulation in computational graphs.

In [None]:
# We will use TensorFlow for constructing DNNs
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
import os
import random

# And do all plotting with matplotlib
import matplotlib.pyplot as plt

Before we move even further, let's double-check that you are connected to a GPU instance in Colab. For this, please run the following command:

In [None]:
print("Connected GPUs:", tf.config.list_physical_devices('GPU'))

If you see a non-empty list, then you are good to go. Otherwise, double-check that your runtime in Colab has been set to use a GPU by going to "Runtime" -> "Change runtime type" -> "Hardware Accelerator" -> "GPU". If, after making this change and restarting the notebook's kernel via "Runtime" -> "Restart runtime", you are still not seeing any GPUs, please contact one of the instructors for help.

## Utility Functions
Last bit of setup: we will use some simple utility functions to encourage determinism in this work. As a convention, we suggest that this call is included at the beginning of every answer that involves running some potentially non-deterministic code (which with TF is almost everything!).

In [None]:
def set_seeds(seed=42):
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    tf.random.set_seed(seed)
    np.random.seed(seed)
set_seeds(42)

---
---
---
---

# Part 1: Data Loading and Model Training

## Part 1.1: Dataset Loading

In this practical, we will work with a very simple task constructed from the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database). Specifically, we will work with 28 x 56 grayscale
images, each containing 2 handwritten digits. Each digit in each sample is between 0 and 5 (inclusive)
and the two digits are placed next to each other, concatenated along the x-axis (i.e., each handwritten digit was
originally 28 x 28). Each sample is annotated with one out of 11 labels indicating the result of adding
both of these numbers. Our task, therefore, is to learn to predict the sum of two handwritten digits given both numbers in a single image.
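
To make the construction concrete, here is a minimal sketch of how one such sample could be assembled from two 28 x 28 digit images. The arrays below are random placeholders rather than real MNIST digits, `make_sum_sample` is a hypothetical helper (not the script used to build the provided dataset), and the trailing channel axis is an assumption based on how the samples are indexed later in this notebook.

```python
import numpy as np

def make_sum_sample(left_img, right_img, left_digit, right_digit):
    """Concatenates two (28, 28) digit images side by side and labels them with their sum."""
    # Concatenate along the x-axis (columns), giving a (28, 56) image
    image = np.concatenate([left_img, right_img], axis=1)
    # Add a trailing channel axis so the sample has shape (28, 56, 1)
    image = image[..., np.newaxis]
    label = left_digit + right_digit
    return image, label

# Placeholder "digits": random pixel intensities standing in for real MNIST images
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)
right = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

sample, label = make_sum_sample(left, right, left_digit=3, right_digit=5)
print(sample.shape, label)
```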


To save some time, we have pre-processed the MNIST dataset for you and saved
this dataset in this repo. We will begin by loading both its training and test splits by unpacking
`train_sum_mnist.npz` and `test_sum_mnist.npz` respectively.

In [None]:
# Let's load our MNIST Sum Training Set
train_1_data = np.load("train_sum_mnist.npz")
x_train = train_1_data['x_train']
y_train = train_1_data['y_train']
print("Training samples shape:", x_train.shape)
print("Training labels shape:", y_train.shape)

# And our MNIST Sum testing set
test_data = np.load("test_sum_mnist.npz")
x_test = test_data['x_test']
y_test = test_data['y_test']
print("Testing samples shape:", x_test.shape)
print("Testing labels shape:", y_test.shape)


Notice that we have a total of 36k samples (30k for training and 6k for testing) and labels are NOT one-hot encoded
but rather given in their categorical form.
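
As a quick illustration of the difference between the two label formats, the sketch below converts a handful of made-up categorical labels into one-hot vectors and back using NumPy only. The label values are placeholders chosen for illustration.

```python
import numpy as np

y = np.array([0, 3, 10, 5])   # categorical labels (sums between 0 and 10)
n_classes = 11

# Row i of the identity matrix is the one-hot vector for class i
y_one_hot = np.eye(n_classes, dtype=np.float32)[y]

# Taking the argmax along the class axis recovers the categorical labels
y_back = np.argmax(y_one_hot, axis=-1)
print(y_one_hot.shape, y_back)
```

Categorical labels like ours pair naturally with "sparse" losses and metrics, which is why we will not need to one-hot encode them for training.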

To get a sense of what the samples in this dataset look like, it is always helpful to visualise a few samples from its training and testing sets.

Run the code below to plot 5 random samples from both datasets:

In [None]:
# We will plot 5 random samples of both the testing and training sets
# to visualise what they look like before diving into training
# our model
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes
num_samples = 5
fig, axs = plt.subplots(2, num_samples, figsize=(12, 3))
for j, (name, ds) in enumerate([
    ("Training", x_train),
    ("Testing", x_test),
]):
    idxs = np.random.choice(ds.shape[0], num_samples, replace=False)
    for i, ax in enumerate(axs[j, :]):
        X = ds[idxs[i], :, :, 0]
        ax.imshow(X, cmap='viridis', vmin=0, vmax=255)
        ax.grid(False)
        ax.set_xticks([])
        ax.set_yticks([])
        if not i:
            ax.set_ylabel(name, fontsize=20)

We can also explore how different labels are distributed in the training set. This can be important to determine how we should fairly evaluate a model trained to solve this task.

Run the following code to visualise a histogram of the training class distribution in this dataset.

In [None]:
n_classes = len(np.unique(y_train))
fig, axs = plt.subplots(1, 2, figsize=(12, 3))
for ax, (ds_name, ds) in zip(axs, [("train", y_train), ("test", y_test)]):
    ax.bar(
        list(range(n_classes)),
        np.sum(tf.one_hot(ds, n_classes), axis=0),
        align='center',
        width=0.5,
    )
    ax.set_xticks(list(range(n_classes)))
    ax.set_ylabel("Counts", fontsize=15)
    ax.set_xlabel("Class Label", fontsize=15)
    ax.set_title(f"{ds_name} set class distribution", fontsize=15)
plt.show()

We see that classes, both in the training and test sets, are roughly centered at 5. This makes sense as 5 is the most common value one gets from adding two numbers uniformly sampled from {0, 1, 2, 3, 4, 5}. Cool, all looks good so far.
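
If both digits really are sampled uniformly and independently from {0, 1, 2, 3, 4, 5} (an assumption about how the dataset was built), the expected shape of this histogram can be worked out by enumerating all 36 pairs, as in this small sketch:

```python
from collections import Counter
from itertools import product

digits = range(6)  # {0, 1, 2, 3, 4, 5}
# Count how many of the 36 equally likely (left, right) pairs produce each sum
counts = Counter(a + b for a, b in product(digits, repeat=2))

for total in sorted(counts):
    print(f"sum={total:2d}  pairs={counts[total]}  probability={counts[total] / 36:.3f}")
```

The counts rise from a single pair for a sum of 0 up to six pairs for a sum of 5, and then fall symmetrically back to a single pair for a sum of 10.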

Let's move on to training a model on this dataset. Notice that in practice we may want to correct at some point for this imbalance by weighting our model's loss function in favor of less represented classes, but for now we will ignore such magic tricks for the sake of simplicity.
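
For reference only, one common way of building such weights is inverse class frequency. The sketch below uses a hypothetical helper, `inverse_frequency_weights`, and toy labels; it is one possible weighting scheme, not something this practical relies on.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Maps each class to n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Toy labels: class 1 is three times as frequent as class 0
toy_labels = np.array([0, 1, 1, 1])
class_weight = inverse_frequency_weights(toy_labels)
print(class_weight)
```

A dictionary of this form could then be passed to Keras through the `class_weight` argument of `fit`, so that errors on rare classes count more in the loss.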

---
---

## Part 1.2: Model Training

We will now train a simple Convolutional DNN to predict the result obtained when adding the two numbers in the input image. Before doing so, and in order to save us some headaches when using some of the libraries out there, we will extend our samples to be RGB samples by replicating their values across 3 different channels:

In [None]:
# For simplicity, we will treat all images as RGB images.
# Notice that we will not normalise them for now even though this may help.
x_train = np.stack((x_train[:, :, :, 0].astype(np.float32),)*3, axis=-1)
x_test = np.stack((x_test[:, :, :, 0].astype(np.float32),)*3, axis=-1)

### Warmup Exercise 1.2.1
#### Time to do some training and actual coding! Please complete the missing parts in the code below:

**Hint**: If you need a super quick refresher of TensorFlow, take a look at [this](https://www.tensorflow.org/datasets/keras_example) super quick and easy official example.

In [None]:
# Fix seed for determinism purposes
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes

# Let's compute the number of classes in our dataset
n_classes = len(np.unique(y_train))


# And time to create our model's architecture. For the sake of simplicity, we will build a model that has:
#     - one 3x3 convolution with 4 output filters
#     - one flattening layer
#     - one fully connected (i.e., dense) layer with 64 hidden units in it and a ReLU activation
#     - one output fully connected (i.e., dense) layer with n_classes output units in it and no activation function
dnn = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(
        input_shape=(28, 28*2, 3),
        filters=4,
        kernel_size=(3, 3),
        strides=(1, 1),
        padding='valid',
        activation='relu'
    ),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(n_classes)
])

# Compile with:
# (1) an Adam optimizer with learning rate of 0.001
# (2) a sparse categorical cross entropy loss function (make sure it goes from the logits!)
# (3) a sparse categorical accuracy metric to track for debugging purposes
dnn.compile(
    optimizer=...TODO..., # <--- Complete this!
    loss=...TODO..., # <--- Complete this!
    metrics=[...TODO...],  # <--- Complete this!
)

# And finally train this model for a few epochs, say 40, with a relatively large batch size, say 128
_ = dnn.fit(
    x_train,
    y_train,
    epochs=...TODO..., # <--- Complete this!,
    batch_size=...TODO..., # <--- Complete this!
    validation_split=0.1, # We will use 10% of our data for validation purposes
    shuffle=True,
)

### Warmup Exercise 1.2.2
#### Compute the accuracy of this model on its training and test sets. How do the two compare?

In [None]:
# TODO: write your answer here!

### Warmup Exercise 1.2.3
#### Before diving deeper into what may be happening here, do you have any guesses for the reason behind the observed generalisation gap?

**TODO**: write your answer here!

### Warmup Exercise 1.2.4
#### If you had to design an experiment to test your hypothesis above, what would it look like?
**No need to run the actual experiment** (although feel free to do so if you have time later!). Rather, we are looking for you to highlight ways in which you can use the content taught in this course to solve real research problems!

**TODO**: write your answer here!

---
---

# Part 2: DNN Debugging with Local Interpretable Model-agnostic Explanations (LIME)

The gap that we observed above is a weird one, especially since the model's validation accuracy wasn't bad at all. This is where we begin to dive into how XAI can help you better understand these scenarios.

With that in mind, let's first try and understand how our trained model is "attending" to each of its input images when making a prediction. If we are able to notice some interesting difference between how it reasons about samples in its training set and how it reasons about samples in its test set, then we may get a big hint of what is happening here. The first and easiest tool one can use to begin such an exploration is **feature importance methods**!

In class, we learned several methods within the family of feature importance XAI methods. However, here we will focus on exploring LIME, as it is one of the most widely used XAI methods out there and it offers a very nice pip-installable library. To begin using LIME in image domains, we will need to import the library we recently installed:

In [None]:
from lime import lime_image

This module includes the class `LimeImageExplainer` which will allow us to generate visual explanations highlighting which parts of an image were considered to be important for a given model (see [this](https://github.com/marcotcr/lime/blob/master/doc/notebooks/Tutorial%20-%20Image%20Classification%20Keras.ipynb) short tutorial if you want more details on this module). These visual explanations, represented as masks, explain which super-pixels (or "features") are important for the top predicted class and can show positive and negative contributions using different colors (green with yellow borders for a positive contribution and red for a negative contribution).
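
To build some intuition for what happens under the hood, here is a rough NumPy-only sketch of the idea behind LIME for images: hide random subsets of super-pixels, query the black-box model on each perturbed image, and fit a weighted linear model on the "kept/hidden" indicators. It is a simplification under stated assumptions, not the `lime` library's implementation: `lime_sketch` is a hypothetical function, square grid cells stand in for a proper segmentation, and the proximity kernel and plain weighted least squares are illustrative choices.

```python
import numpy as np

def lime_sketch(image, predict_fn, grid=4, num_samples=500, kernel_width=0.25, seed=0):
    """Returns one importance weight per super-pixel (here: a square grid cell)."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    # Assign every pixel to a grid cell (a stand-in for a real segmentation algorithm)
    rows = np.arange(h) * grid // h
    cols = np.arange(w) * grid // w
    segments = rows[:, None] * grid + cols[None, :]
    n_segments = grid * grid

    # Each row of `z` says which super-pixels are kept (1) or hidden (0)
    z = rng.integers(0, 2, size=(num_samples, n_segments))
    z[0, :] = 1  # always include the unperturbed image

    # Query the black-box model on every perturbed image (hidden cells are set to 0)
    outputs = np.array([predict_fn(image * z_i[segments]) for z_i in z])

    # Weight perturbed samples by their proximity to the original image
    distance = 1.0 - z.mean(axis=1)  # fraction of hidden super-pixels
    sample_weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Fit a weighted linear model: outputs ~ intercept + z @ coefficients
    design = np.hstack([np.ones((num_samples, 1)), z])
    sqrt_w = np.sqrt(sample_weights)
    solution, *_ = np.linalg.lstsq(design * sqrt_w[:, None], outputs * sqrt_w, rcond=None)
    return solution[1:].reshape(grid, grid)

def toy_model(img):
    # A toy "model" that only looks at the top-left quadrant of the image
    return img[:4, :4].mean()

importance = lime_sketch(np.ones((8, 8)), toy_model)
print(np.round(importance, 2))
```

In this toy setting only the four grid cells covering the top-left quadrant receive non-zero weight, which matches what the toy model actually looks at.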

To simplify our use of LIME in the following section, **we have provided you with a function that abstracts LIME's APIs** to generate explanations for a given set of samples:

In [None]:
def generate_explanations(
    model,
    input_samples,
    num_samples=5,
):
    """
    A list of output masks, one for each input sample, of LIME explanations
    generated for `model`'s output when predicting a label for
    `input_samples`.
    Args:
        model: a valid TensorFlow Model
        input_samples: a Tensor of testing samples with shape (B, H, W, C),
        num_samples: the number of samples to generate when learning LIME's local classifiers.
    Returns:
        LIME's explanations for the given input samples
    """
    # Select a number of examples to visualise
    # And generate a LIME explanation for all of them:
    explainer = lime_image.LimeImageExplainer()
    explanations = []
    for example in input_samples:
        explanation = explainer.explain_instance(
            example.astype('double'),
            model.predict,
            top_labels=1,  # We want an explanation only for the top predicted label
            hide_color=0,  # We will zero everything that the explanation is not looking at
            num_samples=num_samples,  # How many samples will LIME generate to learn its linear classifier
        )
        explanations.append(explanation)
    return explanations

Let's use this to generate explanations for a few random input samples in our training set:

In [None]:
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes
# We will generate explanations for 8 random samples of the training set
train_idxs = np.random.choice(x_train.shape[0], 8, replace=False)
train_explanations = generate_explanations(
    model=dnn,
    input_samples=x_train[train_idxs, :, :, :],
    num_samples=5,
)

This will allow us to visually inspect which parts of the input image were considered most important for the class
predicted by our model. For this, we also provide a helper function abstracting some of the details of plotting these masks (still, we would recommend going over it to make sure you fully understand it):

In [None]:
from skimage.segmentation import mark_boundaries
def visualise_explanations(
    model,
    explanations,
    input_samples,
    input_labels,
    visualise_samples=False,
    num_features=1,
    n_rows=1,
):
    """
    Visualises the given list of LIME explanations generated for corresponding
    `input_samples` with labels `input_labels`
    Args:
        model: a valid TensorFlow Model used to generate the input explanations
        explanations: a list of B LIME explanations (e.g., the output of generate_explanations)
        input_samples: a Tensor of testing samples with shape (B, H, W, C),
        input_labels: a Tensor of corresponding testing labels with shape (B,),
        visualise_samples: a flag indicating whether we want to include the original
                           samples as part of the visualisation.
        num_features: the number of most important super-pixels to highlight in each explanation.
        n_rows: the number of rows to use to visualise all B samples. Must divide B.
    Returns:
        None
    """
    n_examples = len(explanations)
    n_cols = n_examples // n_rows
    if visualise_samples:
        n_rows = n_rows * 2
        
    width = n_cols * 4
    height = n_rows * 2
    fig, axs = plt.subplots(
        n_rows,
        n_cols,
        figsize=(width, height),
        squeeze=False,
    )
    for i, (example, label, explanation) in enumerate(zip(input_samples, input_labels, explanations)):
        r = i // n_cols
        c = i % n_cols
        if visualise_samples:
            # First plot example
            ax = axs[2*r, c]
            ax.imshow(example[:, :, 0], cmap='viridis')
            ax.grid(False)
            ax.set_xticks([])
            ax.set_yticks([])
            if c == 0:
                ax.set_ylabel("Training Sample")

        # Then plot mask showing most important features!
        if visualise_samples:
            ax = axs[2*r + 1, c]
        else:
            ax = axs[r, c]
            
        temp, mask = explanation.get_image_and_mask(
            explanation.top_labels[0],
            positive_only=False,  # Highlight both positive and negative contributions
            num_features=num_features,  # Show only the `num_features` most important super-pixels
            hide_rest=False,  # For now, let's show the background image together with the parts that
                             # are considered most important for the input sample
        )
        boundaries = mark_boundaries(temp / 2 + 0.5, mask[:, :])
        ax.imshow(boundaries)
        ax.grid(False)
        ax.set_xticks([])
        ax.set_yticks([])
        color = 'green' if np.argmax(model(np.expand_dims(example, axis=0)), axis=-1) == label else 'red'
        for spine in ax.spines.values():
            spine.set_edgecolor(color)
            spine.set_linewidth(6)
        # The label only needs to be set once per axis, not once per spine
        ax.set_xlabel("Correctly Predicted" if color == 'green' else "Mispredicted")
        if c == 0:
            ax.set_ylabel("Explanation")


In [None]:
visualise_explanations(
    model=dnn,
    explanations=train_explanations,
    input_samples=x_train[train_idxs, :, :, :],
    input_labels=y_train[train_idxs],
    visualise_samples=False,
    num_features=1,
    n_rows=2,
)

-----

### Exercise 2.1
#### (a) These explanations seem a bit off... On closer inspection of the code, we can see that we are using a very small number of generated samples to train our linear interpretable models for LIME. Below, run an ablation showing how the explanation of a training sample changes as you vary the number of samples used for LIME within {2, 10, 100, 1000, 2500}.

In [None]:
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes

# TODO: write your answer here!

#### (b) What things do you notice as the number of LIME samples increases? Why do you think this is the case?

**TODO**: write your answer here!

#### (c) Try also playing with the number of super-pixels (or features) we show in each image (controlled via the `num_features` argument).  What do you observe?

**TODO**: write your answer here!

---

Part of understanding why our model performs well in its training set but not in its testing set is to see how it also perceives images from the testing set. Now that we know how to use LIME to do this, let's see if something clear pops up when looking at the testing set:

### Exercise 2.2
#### Visualise the explanations for this model, but for a few random samples from the testing dataset
When calling LIME, use a "reasonable" number of LIME samples based on your insights obtained in the previous exercise.

In [None]:
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes

# TODO: write your answer here!

---

### Exercise 2.3
#### Do you notice anything suspicious in the images above? What can this tell you of a possible source for the observed generalisation gap?

**TODO**: write your answer here!

---

### Exercise 2.4
#### Describe an experiment that would confirm your hypothesis if successful

**TODO**: write your answer here!

---

### Exercise 2.5 (Optional)
#### If time allows, run the experiment proposed above to see if your hypothesis holds

In [None]:
# TODO: write your answer here (do it only if time allows!)

---
---
---
---

# Part 3: DNN Debugging Via Saliency Maps

So far, we have been able to analyse our misbehaving DNN using LIME and were able to come up with a testable hypothesis as to what may be happening. Nevertheless, as we saw in our LIME experiments, some of the explanations that we were given were hard to interpret, making us wonder if there are better, more robust approaches we could use to debug our DNN.

In this section we will explore DNN-specific feature attribution methods (i.e., **Saliency Methods** from lecture!) to see if the explanations generated by these methods also bring evidence in favor of the conjecture we made above regarding the source of the generalisation gap in our model. We will begin with *Vanilla Gradient* and then move to more sophisticated methods after we have analysed this method a bit more.

## Part 3.1: Vanilla Gradient

If you recall from our lectures, the *Vanilla Gradient* saliency method measures how important each input feature is for a given output by looking at the derivative of the model's output with respect to its input features. This takes advantage of the fact that DNNs are differentiable models and uses the definition of gradient as a measurement of first-order sensitivity to changes in the input. If you need an extra recap of what Vanilla Gradient does, please take a look at [this short but clear description](https://christophm.github.io/interpretable-ml-book/pixel-attribution.html#vanilla-gradient-saliency-maps) or, for more details, at the [original paper](https://arxiv.org/abs/1312.6034) proposing this method.
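
As a small illustration of this idea outside of TensorFlow, the sketch below computes the gradient of the top-class probability of a toy softmax-linear classifier with respect to its input, and checks it against finite differences. The classifier, its random weights, and the helper `vanilla_gradient_linear` are all made up for illustration; for a real DNN we would let automatic differentiation do this work for us.

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def vanilla_gradient_linear(W, b, x):
    """Gradient of the top-class probability of softmax(W @ x + b) with respect to x."""
    probs = softmax(W @ x + b)
    c = int(np.argmax(probs))
    # Chain rule: d p_c / d x = p_c * (W_c - sum_k p_k W_k)
    grad = probs[c] * (W[c] - probs @ W)
    return c, grad

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))  # 3 classes, 5 input features
b = rng.normal(size=3)
x = rng.normal(size=5)

top_class, grad = vanilla_gradient_linear(W, b, x)
saliency = np.abs(grad)  # importance = magnitude of the first-order sensitivity

# Sanity check against central finite differences
eps = 1e-6
numeric = np.array([
    (softmax(W @ (x + eps * e) + b)[top_class]
     - softmax(W @ (x - eps * e) + b)[top_class]) / (2 * eps)
    for e in np.eye(5)
])
print(top_class, saliency, np.max(np.abs(grad - numeric)))
```

Features with a large absolute gradient are those for which a small change in value would move the predicted probability the most.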

### Exercise 3.1.1
#### Implement the "vanilla gradient" method

Remember that *vanilla Gradient* generates a heatmap for the sample we are trying to explain by computing the gradient of the model's output prediction with respect to the input features. In this exercise, we ask you to implement this in TensorFlow by completing the function `vanilla_gradient` whose signature is described below:

**Hint 1**: In TensorFlow you can compute gradients of a function `fun(x)` with respect to some variable of interest `x` using a `tf.GradientTape` as follows:

```python
input_var = tf.Variable(x, dtype=float)
with tf.GradientTape() as tape:
    out = fun(input_var)
grads = tape.gradient(out, input_var)
```

The big caveat is that you need to make sure that the entire computation you are differentiating is done inside the scope of the tape you are using (i.e., within the code block below `with tf.GradientTape() as tape:`). Further documentation for `tf.GradientTape` can be found [here](https://www.tensorflow.org/api_docs/python/tf/GradientTape).



**Hint 2**: You may use the following function to normalise a saliency map into a grayscale image:
```python
def saliency_to_grayscale(tensor):
    """
    Transform tensor over RGB axis to grayscale.
    Args:
        tensor (tf.Tensor): 4D-Tensor with shape (batch_size, H, W, 3)
    Returns:
        tf.Tensor: 3D-Tensor of grayscale tensor, with shape (batch_size, H, W)
    """
    grayscale_tensor = tf.reduce_sum(tensor, axis=-1)
    return tf.image.per_image_standardization(grayscale_tensor)
```


In [None]:
def vanilla_gradient(model, samples):
    """
    Outputs a saliency map, using the Vanilla Gradient algorithm, indicating which input
    pixels are important for the class predicted by `model` for inputs in `samples`
    Args:
        model: a valid TensorFlow Model whose outputs we want to explain
        samples: a Tensor of testing samples with shape (B, H, W, C) which we want to explain
    Returns:
        a 3D tensor with shape (B, H, W) with as many grayscale saliency maps as inputs
        in `samples`
    """
    # TODO: Implement me!
    pass

In [None]:
def saliency_to_grayscale(tensor):
    """
    Transform tensor over RGB axis to grayscale.
    Args:
        tensor (tf.Tensor): 4D-Tensor with shape (batch_size, H, W, 3)
    Returns:
        tf.Tensor: 3D-Tensor of grayscale tensor, with shape (batch_size, H, W)
    """
    grayscale_tensor = np.mean(tensor, axis=-1)
    return tf.image.per_image_standardization(grayscale_tensor)

def vanilla_gradient(model, samples):
    """
    Outputs a saliency map, using the Vanilla Gradient algorithm, indicating which input
    pixels are important for the class predicted by `model` for inputs in `samples`
    Args:
        model: a valid TensorFlow Model whose outputs we want to explain
        samples: a Tensor of testing samples with shape (B, H, W, C) which we want to explain
    Returns:
        a 3D tensor with shape (B, H, W) with as many grayscale saliency maps as inputs
        in `samples`
    """
    images = tf.Variable(samples, dtype=float)
    with tf.GradientTape() as tape:
        pred = tf.nn.softmax(model(images, training=False), axis=-1)
        target_classes = np.argmax(pred.numpy(), axis=-1)
        loss = []
        for i, idx in enumerate(target_classes):
            loss.append(pred[i:i+1, idx])
        loss = tf.concat(loss, axis=0)
    grads = tape.gradient(loss, images)
    saliency = saliency_to_grayscale(np.abs(grads))

    return saliency

---

Now that we have implemented our vanilla gradient method, it is time to use it to explore the same network we explored with LIME. For this, as we did with LIME, we will first **provide you with a simple method to visualise the generated saliency maps**:

In [None]:
def visualise_saliency_explanations(
    model,
    saliency_maps,
    input_samples,
    input_labels,
    visualise_samples=True,
    n_rows=1,
):
    """
    Visualises the given list of saliency maps generated for corresponding
    `input_samples` with labels `input_labels`
    Args:
        model: a valid TensorFlow Model used to generate the input explanations
        saliency_maps: a tensor with shape (B, H, W) with saliency maps corresponding to input_samples
        input_samples: a Tensor of testing samples with shape (B, H, W, C),
        input_labels: a Tensor of corresponding testing labels with shape (B,),
        visualise_samples: a flag indicating whether we want to include the original
                           samples as part of the visualisation.
        n_rows: the number of rows to use to visualise all B samples. Must divide B.
    Returns:
        None
    """
    samples_idxs = list(range(input_samples.shape[0]))
    n_cols = len(samples_idxs) // n_rows
    if visualise_samples:
        n_rows = n_rows * 2
        
    width = n_cols * 4
    height = n_rows * 2
    fig, axs = plt.subplots(
        n_rows,
        n_cols,
        figsize=(width, height),
        squeeze=False,
    )
    for i, idx in enumerate(samples_idxs):
        r = i // n_cols
        c = i % n_cols
        if visualise_samples:
            # First plot example
            ax = axs[2*r, c]
            ax.imshow(input_samples[idx, :, :, 0]/255.0, cmap='viridis')
            ax.grid(False)
            ax.set_xticks([])
            ax.set_yticks([])
            if c == 0:
                ax.set_ylabel("Training Sample")

        # Then plot mask showing most important features!
        if visualise_samples:
            ax = axs[2*r + 1, c]
        else:
            ax = axs[r, c]
            
        explanation = saliency_maps[idx:idx+1, :, :]
        ax.imshow(explanation[0], cmap='hot')
        ax.grid(False)
        ax.set_xticks([])
        ax.set_yticks([])
        color = 'green' if np.argmax(model(input_samples[idx:idx+1, :, :, :]), axis=-1) == input_labels[idx] else 'red'
        for spine in ax.spines.values():
            spine.set_edgecolor(color)
            spine.set_linewidth(6)
        # The label only needs to be set once per axis, not once per spine
        ax.set_xlabel("Correctly Predicted" if color == 'green' else "Mispredicted")
        if c == 0:
            ax.set_ylabel("Explanation")


This should allow us to visualise the same 8 training samples we visualised for LIME, but using our vanilla saliency method:

In [None]:
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes
samples_idxs = np.random.choice(x_train.shape[0], 8, replace=False)
visualise_saliency_explanations(
    model=dnn,
    saliency_maps=vanilla_gradient(
        model=dnn,
        samples=x_train[samples_idxs, :, :, :],
    ),
    input_samples=x_train[samples_idxs, :, :, :],
    input_labels=y_train[samples_idxs],
    visualise_samples=True,
    n_rows=2,
)

### Exercise 3.1.2
#### Visualise the explanations generated by vanilla gradient for 8 TEST samples in our MNIST additive task

In [None]:
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes

# TODO: write your answer here!

---

### Exercise 3.1.3
#### (a) Are these results what you expected? Do any of these explanations look odd to you?

**TODO**: write your answer here!

#### (b) How do these results compare to LIME's results qualitatively? Do you have any intuition as to why they may differ the way they do?

**TODO**: write your answer here!

---
---

## Part 3.2: SmoothGrad

One of the biggest issues you can observe when using Vanilla Gradient is that its saliency maps are very noisy. This is because derivatives can change a lot over very small changes in the input, making these maps noisy and fragile. To improve on this, we will implement a method originally proposed by [Smilkov et al.](https://arxiv.org/abs/1706.03825) where we smooth the maps generated by vanilla gradient through a simple but clever averaging. Specifically, we will average the saliency maps computed for multiple versions of the same input, each with different Gaussian noise added to it. This will allow us to capture the features that remain salient under the small changes introduced by the noise we add to the image before computing each saliency map. For a quick and nice explanation of the mathematics behind SmoothGrad, please read [this short](https://christophm.github.io/interpretable-ml-book/pixel-attribution.html#smoothgrad) section before continuing.
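
To see why averaging over noisy inputs helps, consider the toy sketch below. It uses a made-up function whose gradient is a slowly varying trend `w` plus a rapidly oscillating term, and compares the raw gradient with an average of gradients taken at noisy copies of the input. Note the assumptions: the function, the helper names, and the noise settings are purely illustrative, and this sketch averages raw analytic gradients rather than the normalised saliency maps of a DNN.

```python
import numpy as np

def toy_gradient(x, w, eps=0.01, k=50.0):
    """Analytic gradient of f(x) = sum(w * x + eps * sin(k * x))."""
    return w + eps * k * np.cos(k * x)

def smoothgrad(grad_fn, x, n_trials=200, noise_level=0.2, seed=0):
    """Averages grad_fn over `n_trials` noisy copies of x."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_level, size=(n_trials,) + x.shape)
    grads = np.stack([grad_fn(x + n) for n in noise])
    return grads.mean(axis=0)

w = np.linspace(-1.0, 1.0, 16)  # the slow "trend" we would like the map to reveal
x = np.linspace(0.0, 1.0, 16)

vanilla = toy_gradient(x, w)
smooth = smoothgrad(lambda z: toy_gradient(z, w), x)

print("vanilla error:   ", np.abs(vanilla - w).mean())
print("smoothgrad error:", np.abs(smooth - w).mean())
```

The oscillating term dominates the raw gradient at any single point but largely cancels out once we average over a neighbourhood of the input, leaving an estimate that is much closer to the underlying trend.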

### Exercise 3.2.1
#### (a) Implement the SmoothGrad function whose signature is given below:

**Hint**: it may be useful to reuse `vanilla_gradient` for this implementation

In [None]:
def smoothgrad_saliency(model, samples, n_trials=5, noise_level=1):
    """
    Outputs a saliency map, using SmoothGrad on top of Vanilla Gradient, indicating which input
    pixels are important for the class predicted by `model` for inputs in `samples`.
    
    SmoothGrad proceeds as follows:
        1. For each sample we want to explain, we generate `n_trials` noisy copies of it by
           adding Gaussian noise with mean 0 and std `noise_level`.
        2. We then compute the vanilla gradient saliency map for each of these noisy copies.
        3. We output the mean of all the saliency maps across all trials for the same sample.
    Args:
        model: a valid TensorFlow Model whose outputs we want to explain
        samples: a Tensor of testing samples with shape (B, H, W, C) which we want to explain
        n_trials: how many noisy samples we will generate for each of the B inputs in `samples`
        noise_level: the std used when adding Gaussian noise to each input sample.
    Returns:
        a 3D tensor with shape (B, H, W) with as many grayscale saliency maps as inputs
        in `samples`
    """
    pass

#### (b) Before getting any sense of SmoothGrad's hyperparameters,  run SmoothGrad with its default arguments on 8 random test samples and visualise the output masks

**Hint**: use `visualise_saliency_explanations` to do the visualisation for you!

In [None]:
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes

# TODO: write your answer here!

#### (c) How do these explanations qualitatively compare to those extracted by both LIME and Vanilla Saliency? Why do you think that is the case?

**TODO**: write your answer here!

---

### Exercise 3.2.2
#### (a) To try and get a sense of how SmoothGrad behaves, we first explore its sensitivity to the noise level. For this, run an ablation over the level of noise added for each trial.

When running this ablation, try out values of `noise_level` in `{0.01, 0.1, 1, 10, 100, 1000}` using a fixed number of trials (say 50 for now).

In [None]:
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes

# TODO: write your answer here!

#### (b) What hypothesis/conclusions may you reach from the results you are observing? Do you have any intuition as to why the ablation results look like they do?

**TODO**: write your answer here!

---

### Exercise 3.2.3
#### (a) Similarly, we now explore a related question: how does the number of samples used to compute SmoothGrad affect its output?

To answer this question, run a simple ablation varying the number of trials within {1, 2, 4, 16, 64, 256, 512} and visualise the explanations generated for a single sample with SmoothGrad. You can use your results from the previous exercise to set `noise_level` to a sensible value.

**Hint**: the code used for the previous exercise should be easily adaptable to quickly run this ablation.

In [None]:
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes

# TODO: write your answer here!

#### (b) What hypothesis/conclusions may you reach from the results you are observing? Do you have any intuition as to why the ablation results look like they do?

**TODO**: write your answer here!

### Exercise 3.2.4
#### Finally, use the knowledge obtained from the last two ablations to select reasonable values for `n_trials` and `noise_level` to see the explanations generated by SmoothGrad for some random test samples

**Hint**: You may find it useful to rerun the code from Exercise 3.2.1.b here with the "correct" parameters

In [None]:
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes

# TODO: write your answer here!

---

### Exercise 3.2.5 (Optional and only if time allows!)
#### There is good evidence that scaling the gradient by the actual input may generate more meaningful saliency maps (one can think of it as a first-order Taylor approximation). Try running the experiments above, but this time scale the output of Vanilla Gradient by its inputs.

Try it out and you should get crisper heatmaps! Notice how you can apply SmoothGrad to this variation as well to get even better heatmaps. This is a good example of how SmoothGrad is a method for improving any given saliency method rather than a saliency method itself.
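
The Taylor-approximation intuition is easiest to see on a linear model, where the first-order approximation is exact. In the toy sketch below (random weights and inputs, no bias term, all made up for illustration), the gradient-times-input attributions sum exactly to the model's output. For a non-linear DNN this only holds approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=6)
x = rng.normal(size=6)

def f(x):
    return float(w @ x)  # a linear "model" with no bias term

gradient = w                     # df/dx for a linear model
grad_times_input = gradient * x  # element-wise scaling by the input

# For this linear model the attributions sum exactly to the output
print(grad_times_input.sum(), f(x))
```

Unlike the raw gradient, which only says how sensitive the output is to each feature, this product also accounts for how large each feature actually is in the sample being explained.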



In [None]:
set_seeds(42)  # <---- Make sure things are deterministic for comparison purposes

# TODO: write your answer here!

---
---
---
---