<a href="https://colab.research.google.com/github/shaunak-badani/XAI/blob/Ass5/Assignment05/ExplainableDeepLearning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AIPI 590 - XAI | Assignment #05
## Explainable Deep Learning
## Shaunak Badani


> This notebook tests a hypothesis on the IMDB dataset using Integrated Gradients, and then visualizes the resulting word-level attributions.

In [20]:
!pip install "alibi[tensorflow]"



In [32]:
import os
os.environ["TF_USE_LEGACY_KERAS"] = "1"

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Embedding, Conv1D, GlobalMaxPooling1D, Dropout
from tensorflow.keras.utils import to_categorical
from alibi.explainers import IntegratedGradients
import matplotlib.pyplot as plt
print('TF version: ', tf.__version__)
print('Eager execution enabled: ', tf.executing_eagerly()) # True

TF version:  2.14.1
Eager execution enabled:  True


In [2]:
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words = 10000)

In [3]:
print(f"Number of training sequences: {len(x_train)}")
print(f"Number of testing sequences: {len(x_test)}")

Number of training sequences: 25000
Number of testing sequences: 25000


In [4]:
# Pad (or truncate) every sequence to the same fixed length
max_length = 100
x_train = sequence.pad_sequences(x_train, maxlen = max_length)
x_test = sequence.pad_sequences(x_test, maxlen = max_length)
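
With its default settings, `pad_sequences` pads and truncates at the front of each sequence, which is why a long review decodes to only its last 100 tokens. The sketch below mimics that behaviour in plain NumPy. `pad_pre` is a hypothetical helper written for illustration, not part of Keras.

```python
import numpy as np

def pad_pre(seqs, maxlen, value=0):
    """Hypothetical helper: mimics the default 'pre' padding and truncation."""
    out = np.full((len(seqs), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(seqs):
        kept = seq[-maxlen:]              # 'pre' truncation keeps the last maxlen tokens
        if len(kept) > 0:
            out[row, -len(kept):] = kept  # 'pre' padding leaves the fill value at the front
    return out

padded = pad_pre([[5, 6], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(padded)
```

The short sequence gains leading zeros, and the long one loses its first two tokens.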

In [5]:
y_train, y_test = to_categorical(y_train), to_categorical(y_test)
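
`to_categorical` turns each integer label into a one-hot row, so label 1 (positive) becomes `[0., 1.]` and label 0 (negative) becomes `[1., 0.]`. A minimal NumPy equivalent, using made-up toy labels:

```python
import numpy as np

toy_labels = np.array([1, 0, 0, 1])                 # toy labels, not taken from the dataset
one_hot = np.eye(2, dtype=np.float32)[toy_labels]   # row i of the identity encodes class i
recovered = one_hot.argmax(axis=1)                  # argmax undoes the encoding
print(one_hot)
```

The same `argmax` trick is what later converts the model's softmax output back into a class index.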


In [6]:
index = imdb.get_word_index()
reverse_index = {value: key for (key, value) in index.items()}

In [7]:
def decode_sentence(x, reverse_index):
    return " ".join([reverse_index.get(i - 3, 'UNK') for i in x])

In [8]:
decode_sentence(x_test[50], reverse_index)

"you into your hearts but really it ended and i felt like i had watched a 5 minute cartoon on kids tv br br i don't have children of my own but when i do i fully intend to show them quality children's movies like the movie toy story and finding UNK even though they are too childish for me these days i can see how they would be of great appeal to young children not so with this appalling attempt at a movie br br oh and one more thing not enough UNK he should have his own movie"
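
The `i - 3` in `decode_sentence` accounts for the offset that `imdb.load_data` applies by default: indices 0, 1 and 2 are reserved for padding, start-of-sequence and out-of-vocabulary tokens, so the word ranked `r` in the word index is encoded as `r + 3`. A small sketch with a made-up three-word vocabulary (`toy_index` and `decode` are illustrative names only):

```python
toy_index = {"the": 1, "movie": 2, "great": 3}           # word -> rank, like get_word_index()
toy_reverse = {rank: word for word, rank in toy_index.items()}

def decode(ids, reverse_index, index_from=3):
    # Reserved ids (0, 1, 2) fall below the offset and decode to 'UNK'.
    return " ".join(reverse_index.get(i - index_from, "UNK") for i in ids)

encoded = [0, 0, 1, 4, 5, 2, 6]   # pad, pad, start, "the", "movie", oov, "great"
decoded = decode(encoded, toy_reverse)
print(decoded)
```

Padding and the start marker therefore also show up as `UNK` in decoded text.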

## Building the model

- We'll train a one-dimensional convolutional neural network with global max pooling.

- The trained model is then explained with the Integrated Gradients method.
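
To get a feel for the shapes involved, here is a rough NumPy sketch of what a `valid` 1D convolution followed by global max pooling does to one embedded review. The weights are random and dropout and the dense layers are left out, so this only illustrates the shape flow. It is not the Keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_embed, filters, kernel_size = 100, 50, 250, 3

x = rng.normal(size=(seq_len, d_embed))                # one embedded review
w = rng.normal(size=(kernel_size, d_embed, filters))   # random stand-in for the kernel
b = np.zeros(filters)

# 'valid' padding: only windows that fit entirely inside the sequence
windows = np.stack([x[i:i + kernel_size] for i in range(seq_len - kernel_size + 1)])
conv = np.maximum(np.einsum("tkd,kdf->tf", windows, w) + b, 0.0)  # ReLU activation
pooled = conv.max(axis=0)                                          # global max over time

print(conv.shape, pooled.shape)
```

Each of the 250 filters ends up contributing a single number per review: its strongest response over all 3-word windows.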

In [13]:
sequence_len = max_length
features = 10000
d_embed = 50
inputs = Input(shape = (sequence_len, ), dtype = tf.int32)
embedded_sequences = Embedding(features, d_embed)(inputs)

filters = 250
kernel_size = 3
hidden_dims = 250
out = Conv1D(filters, kernel_size, padding = 'valid', activation = 'relu', strides = 1)(embedded_sequences)
out = Dropout(0.4)(out)
out = GlobalMaxPooling1D()(out)
out = Dense(hidden_dims, activation = 'relu')(out)
out = Dropout(0.4)(out)

outputs = Dense(2, activation = 'softmax')(out)

In [14]:
y_train.shape

(25000, 2)

In [15]:

y_train

array([[0., 1.],
       [1., 0.],
       [1., 0.],
       ...,
       [1., 0.],
       [0., 1.],
       [1., 0.]], dtype=float32)

In [16]:
model = Model(inputs=inputs, outputs = outputs)

model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.fit(x_train, y_train, batch_size = 256, epochs = 3, validation_data = (x_test, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.src.callbacks.History at 0x7f9f47cb9290>

# Integrated gradients

- Each word in the sequence is mapped to a 50-dimensional embedding vector.
- A 100-token sequence therefore becomes a 100 x 50 matrix.
- The attributions are computed with respect to this matrix, so they also have shape 100 x 50.
- If N samples are explained, the attribution tensor has shape (N, 100, 50).
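
The attributions have the same shape as the input because integrated gradients assigns one value per input coordinate: the input-minus-baseline difference times the gradient averaged along a straight path from the baseline to the input. The sketch below is a toy version under stated assumptions: a hand-written sigmoid model with an analytic gradient (`f`, `grad_f` and `integrated_gradients` are hypothetical names), a zero baseline, and Gauss-Legendre quadrature for the path integral. It is not alibi's implementation.

```python
import numpy as np

def f(x, w):
    """Toy differentiable model: sigmoid of a linear score."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def grad_f(x, w):
    """Analytic gradient of f with respect to x."""
    p = f(x, w)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, n_steps=50):
    # Gauss-Legendre nodes live on [-1, 1]; rescale them to the path parameter in [0, 1].
    nodes, weights = np.polynomial.legendre.leggauss(n_steps)
    alphas, weights = 0.5 * (nodes + 1.0), 0.5 * weights
    path = baseline + alphas[:, None] * (x - baseline)      # (n_steps, d)
    grads = np.stack([grad_f(p, w) for p in path])          # (n_steps, d)
    avg_grad = (weights[:, None] * grads).sum(axis=0)       # quadrature estimate of the integral
    return (x - baseline) * avg_grad

x = np.array([1.0, -2.0, 0.5])
w = np.array([0.8, 0.3, -1.2])
baseline = np.zeros_like(x)

attr = integrated_gradients(x, baseline, w)
gap = f(x, w) - f(baseline, w)
print(attr, attr.sum(), gap)
```

The attributions sum to the change in model output between baseline and input (the completeness property), which is a handy sanity check on any integrated-gradients result.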

In [17]:
layer = model.layers[1]
layer

<keras.src.layers.core.embedding.Embedding at 0x7f9f472c40d0>

In [18]:
n_steps = 50
method = "gausslegendre"
internal_batch_size = 100
nb_samples = 10
ig = IntegratedGradients(model,
                         layer=layer,
                         n_steps=n_steps,
                         method=method,
                         internal_batch_size=internal_batch_size)

In [25]:
test_batch = x_test[:nb_samples]
probabilities = model(test_batch).numpy()
preds = probabilities.argmax(axis = 1)
explanation = ig.explain(test_batch, baselines = None, target=preds, attribute_to_layer_inputs = False)

In [26]:
explanation.meta

{'name': 'IntegratedGradients',
 'type': ['whitebox'],
 'explanations': ['local'],
 'params': {'target_fn': None,
  'method': 'gausslegendre',
  'n_steps': 50,
  'internal_batch_size': 100,
  'layer': 1},
 'version': '0.9.6'}

In [28]:
explanation.attributions[0].shape

(10, 100, 50)

In [36]:
attrs = explanation.attributions[0]
attrs = attrs.sum(axis = 2) # Sum over the embedding dimension to get one attribution per token
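
Summing over the last axis collapses each token's 50 embedding-level attributions into a single signed score, so the result has one value per token. A shape check on random stand-in data (the values are made up, only the shapes mirror the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
toy_attrs = rng.normal(size=(10, 100, 50))   # (samples, tokens, embedding dims)
token_attrs = toy_attrs.sum(axis=2)          # (samples, tokens)

print(toy_attrs.shape, "->", token_attrs.shape)
```

A plain sum keeps the sign, so positive and negative contributions can still be told apart when the tokens are colored.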

### Visualize the attributions

In [38]:
sample_sentence_no = 4
x_i = test_batch[sample_sentence_no]
attrs_i = attrs[sample_sentence_no]
pred = preds[sample_sentence_no]
pred_dict = {1: 'Positive review', 0: 'Negative review'}

In [40]:
print('Predicted label =  {}: {}'.format(pred, pred_dict[pred]))

Predicted label =  1: Positive review


In [41]:
from IPython.display import HTML
def hlstr(string, color='white'):
    """
    Return HTML markup highlighting text with the desired color.
    """
    return f"<mark style='background-color:{color}'>{string} </mark>"

In [42]:
def colorize(attrs, cmap='PiYG'):
    """
    Compute hex colors based on the attributions for a single instance.
    Uses a diverging colorscale by default and normalizes and scales
    the colormap so that colors are consistent with the attributions.
    """
    import matplotlib as mpl
    cmap_bound = np.abs(attrs).max()
    norm = mpl.colors.Normalize(vmin=-cmap_bound, vmax=cmap_bound)
    cmap = mpl.colormaps[cmap]  # mpl.cm.get_cmap is deprecated in recent Matplotlib releases

    # now compute hex values of colors
    colors = list(map(lambda x: mpl.colors.rgb2hex(cmap(norm(x))), attrs))
    return colors
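
The symmetric bound in `colorize` is what keeps the diverging colormap honest: a zero attribution always lands on the neutral midpoint, whichever sign dominates. The arithmetic that `Normalize(vmin=-b, vmax=b)` performs can be sketched in NumPy as follows (`symmetric_normalize` is an illustrative helper, and the attribution values are made up):

```python
import numpy as np

def symmetric_normalize(attrs):
    """Map attributions from [-bound, bound] to [0, 1], with 0 landing on 0.5."""
    bound = np.abs(attrs).max()
    return (attrs + bound) / (2.0 * bound)

toy = np.array([-0.2, 0.0, 0.1, 0.4])
scaled = symmetric_normalize(toy)
print(scaled)
```

With a min-max scaling instead, the most negative token would always map to one end of the colormap, even if its attribution were tiny compared with the positive ones.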

In [43]:
words = decode_sentence(x_i, reverse_index).split()
colors = colorize(attrs_i)



In [44]:
HTML("".join(list(map(hlstr, words, colors))))