# Layer-wise Relevance Propagation

This notebook is adapted from the [BERT-explainability notebook](https://colab.sandbox.google.com/github/hila-chefer/Transformer-Explainability/blob/main/BERT_explainability.ipynb) by Hila Chefer et al., which accompanies their paper [Transformer Interpretability Beyond Attention Visualization](https://arxiv.org/abs/2012.09838), published at [CVPR 2021](https://cvpr2021.thecvf.com/).

**Using a GPU will make this notebook run faster.**

Start by cloning the repository with the official PyTorch implementation of the paper `Transformer Interpretability Beyond Attention Visualization`.
The paper introduces a method for visualizing the classifications made by Transformer-based models on NLP and vision tasks.
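
As a rough summary of the method described in the paper (a sketch of the main idea, not a full statement; see the paper for exact definitions): for each Transformer block $b$, the attention map $A^{(b)}$ is weighted by its gradient $\nabla A^{(b)}$ with respect to the target class and by its LRP relevance $R^{(n_b)}$, and the per-block results are multiplied across all $B$ blocks:

$$
\bar{A}^{(b)} = I + \mathbb{E}_h\left(\nabla A^{(b)} \odot R^{(n_b)}\right)^{+},
\qquad
C = \bar{A}^{(1)} \cdot \bar{A}^{(2)} \cdots \bar{A}^{(B)}
$$

Here $\mathbb{E}_h$ is the mean over attention heads, $(\cdot)^{+}$ keeps only the positive values, and the identity $I$ accounts for the skip connections. For a classifier, the row of $C$ that corresponds to the `[CLS]` token is read as the per-token relevance.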

In [2]:
# !git clone https://github.com/hila-chefer/Transformer-Explainability.git

Next, install the additional requirements for this notebook.

In [4]:
# !pip install --user -r requirements.txt

#### Restart the kernel

After you install the additional packages, restart the notebook kernel so that it can find them.

Next, we'll import the necessary packages and libraries.

In [5]:
import os
os.chdir('./Transformer-Explainability')

In [6]:
from BERT_explainability.modules.BERT.BertForSequenceClassification import BertForSequenceClassification
from BERT_explainability.modules.BERT.ExplanationGenerator import Generator
from captum.attr import visualization

import matplotlib.pyplot as plt
import torch
from transformers import AutoTokenizer


In [7]:
if torch.cuda.is_available():
    print("using GPU...")
    model = BertForSequenceClassification.from_pretrained(
        "textattack/bert-base-uncased-SST-2").to("cuda")
else:
    print("...not using GPU...")
    model = BertForSequenceClassification.from_pretrained(
        "textattack/bert-base-uncased-SST-2")

model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    "textattack/bert-base-uncased-SST-2")

# initialize the explanations generator
explanations = Generator(model)

classifications = ["NEGATIVE", "POSITIVE"]

using GPU...


In [8]:
# encode a sentence
text_batch = ["If you like the original, you'll love this movie."]
encoding = tokenizer(text_batch, return_tensors='pt')

if torch.cuda.is_available():
    print("using GPU...")
    input_ids = encoding['input_ids'].to("cuda")
    attention_mask = encoding['attention_mask'].to("cuda")
else:
    print("not using GPU...")
    input_ids = encoding["input_ids"]
    attention_mask = encoding["attention_mask"]

using GPU...


Here, `encoding` is the tokenizer output, which contains `input_ids`, `token_type_ids`, and `attention_mask`. The cell above pulls out `input_ids` and `attention_mask`, which are passed to the LRP explanation method in the next cell.
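
To make the three fields concrete, here is a minimal pure-Python sketch of how a BERT-style encoding is laid out when sequences of different lengths are padded into one batch. The helper `build_encoding` is hypothetical and is not part of `transformers`; the token IDs are made-up placeholders, except for 101, 102, and 0, which are the `[CLS]`, `[SEP]`, and `[PAD]` IDs in the `bert-base-uncased` vocabulary.

```python
def build_encoding(sequences, pad_id=0):
    """Pad lists of token IDs to a common length (hypothetical helper).

    Returns a dict with the same keys as the tokenizer output used above.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, token_type_ids, attention_mask = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # Single-sentence inputs belong entirely to segment 0.
        token_type_ids.append([0] * max_len)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


# Placeholder IDs: only 101 ([CLS]), 102 ([SEP]) and 0 ([PAD]) are real.
toy = build_encoding([[101, 11, 12, 13, 102], [101, 21, 102]])
print(toy["attention_mask"])  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

With the single sentence used in this notebook there is nothing to pad, so the attention mask consists only of ones.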

In [9]:
# The true class is positive (label 1).
true_class = 1

# generate an explanation for the input
lrp_expl = explanations.generate_LRP(input_ids=input_ids,
                                     attention_mask=attention_mask,
                                     start_layer=0)[0]

# normalize scores
lrp_expl = (lrp_expl - lrp_expl.min()) / (lrp_expl.max() - lrp_expl.min())
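
The normalization step above is a min-max rescaling, which maps the smallest score to 0 and the largest to 1. A minimal pure-Python sketch follows; the function name `min_max_normalize` is hypothetical, and the guard for a constant input is an addition that the cell above does not have (there, identical scores would cause a division by zero).

```python
def min_max_normalize(scores):
    """Rescale a list of numbers linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal: no token stands out, so return zeros
        # instead of dividing by zero (assumption, not in the notebook).
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


example = min_max_normalize([2.0, 4.0, 6.0])
print(example)  # [0.0, 0.5, 1.0]
```

Because of this rescaling, the scores are only comparable within one input sentence, not across sentences.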

In [10]:
# Get the model classification.
output = torch.nn.functional.softmax(model(input_ids=input_ids,
                                           attention_mask=attention_mask)[0],
                                     dim=-1)
classification = output.argmax(dim=-1).item()

# Get class name.
class_name = classifications[classification]

# If the classification is negative, higher explanation scores are more
# negative, so flip for visualization.
if class_name == "NEGATIVE":
    lrp_expl *= (-1)
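
The classification step above turns the model's two logits into probabilities with a softmax and then takes the index of the largest one. A minimal pure-Python sketch of that computation, using made-up logits (the values are illustrative, not the model's actual output):

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEGATIVE", "POSITIVE"]
logits = [-2.0, 3.0]  # hypothetical logits for one sentence
probs = softmax(logits)

# argmax: index of the highest probability
predicted = max(range(len(probs)), key=probs.__getitem__)
print(labels[predicted])  # POSITIVE
```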

In [11]:
tokens = tokenizer.convert_ids_to_tokens(input_ids.flatten())

for token, score in zip(tokens, lrp_expl):
    print(f'({token}: {score.item()})')

([CLS]: 0.0)
(if: 0.28011012077331543)
(you: 0.2685689330101013)
(like: 0.18668441474437714)
(the: 0.18705449998378754)
(original: 0.17445366084575653)
(,: 0.0)
(you: 0.2654431164264679)
(': 0.031408846378326416)
(ll: 0.47678276896476746)
(love: 1.0)
(this: 0.6331197619438171)
(movie: 0.2599993348121643)
(.: 0.013178937137126923)
([SEP]: 0.1188938245177269)
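
One way to read this output is to rank the tokens by their normalized relevance. The sketch below copies the scores printed above into a plain list and picks the three highest; the variable names are illustrative.

```python
# (token, normalized relevance) pairs copied from the output above.
token_scores = [
    ("[CLS]", 0.0),
    ("if", 0.28011012077331543),
    ("you", 0.2685689330101013),
    ("like", 0.18668441474437714),
    ("the", 0.18705449998378754),
    ("original", 0.17445366084575653),
    (",", 0.0),
    ("you", 0.2654431164264679),
    ("'", 0.031408846378326416),
    ("ll", 0.47678276896476746),
    ("love", 1.0),
    ("this", 0.6331197619438171),
    ("movie", 0.2599993348121643),
    (".", 0.013178937137126923),
    ("[SEP]", 0.1188938245177269),
]

# Sort by score, highest first, and keep the top three tokens.
top3 = sorted(token_scores, key=lambda pair: pair[1], reverse=True)[:3]
print([token for token, _ in top3])  # ['love', 'this', 'll']
```

The word "love" dominates the explanation, which fits the positive prediction for this sentence.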


In [12]:
vis_data_records = [
    visualization.VisualizationDataRecord(word_attributions=lrp_expl,
                                          pred_prob=output[0][classification],
                                          pred_class=classification,
                                          true_class=true_class,
                                          attr_class=true_class,
                                          attr_score=1,
                                          raw_input_ids=tokens,
                                          convergence_score=1)]

visualization.visualize_text(vis_data_records)
plt.show()

| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
|---|---|---|---|---|
| 1.0 | 1 (1.00) | 1.0 | 1.0 | [CLS] if you like the original , you ' ll love this movie . [SEP] |
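
In the rendered notebook, the "Word Importance" column is color-coded by attribution score, which the text export above cannot show. The following sketch illustrates the general idea of mapping scores to background colors in HTML. It is not Captum's implementation: the functions `score_to_color` and `render_tokens` and the color scale are assumptions made for illustration.

```python
from html import escape


def score_to_color(score):
    """Map a score in [-1, 1] to an HSL color (hypothetical scale).

    Positive scores become green, negative scores red; stronger scores
    are darker, and a score of 0 is white.
    """
    score = max(-1.0, min(1.0, score))  # clip to the expected range
    hue = 120 if score >= 0 else 0
    lightness = 100 - int(50 * abs(score))
    return f"hsl({hue}, 75%, {lightness}%)"


def render_tokens(tokens, scores):
    """Wrap each token in a <span> whose background encodes its score."""
    spans = [
        f'<span style="background-color: {score_to_color(s)}">{escape(t)}</span>'
        for t, s in zip(tokens, scores)
    ]
    return " ".join(spans)


html_snippet = render_tokens(["love", "this", "."], [1.0, 0.63, 0.01])
print(html_snippet)
```

In a notebook, such a string could be displayed with `IPython.display.HTML(html_snippet)`.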


Copyright 2022 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.