## 3. Example Driven

## Introduction

Such approaches explain the prediction of an input instance by identifying and presenting other instances, usually from available labeled data, that are semantically similar to the input instance.
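
As a rough illustration of the idea, the sketch below retrieves the labeled instances closest to an input by cosine similarity over fixed-size vectors. The function name `most_similar_examples`, the toy vectors, and the labels are all hypothetical; in practice the vectors could come from any sentence encoder, and this is not the method of any specific paper.

```python
import numpy as np


def most_similar_examples(query_vec, train_vecs, train_labels, k=2):
    """Return (index, label, cosine similarity) for the k nearest labeled instances."""
    query = query_vec / np.linalg.norm(query_vec)
    train = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    sims = train @ query                 # cosine similarity to every labeled instance
    top = np.argsort(-sims)[:k]          # indices of the k highest similarities
    return [(int(i), train_labels[i], float(sims[i])) for i in top]


# Toy data (hypothetical): three labeled instances in a 2-dimensional space.
train_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
train_labels = ["label_a", "label_b", "label_c"]
query_vec = np.array([1.0, 0.1])

for idx, label, sim in most_similar_examples(query_vec, train_vecs, train_labels):
    print(idx, label, round(sim, 3))
```

The retrieved neighbours and their labels are what would be shown to the user as the "explanation" of the prediction.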

Among other examples, the paper proposes [Auditing Deep Learning processes through Kernel-based Explanatory Models](https://aclanthology.org/D19-1415.pdf).
* We tried to clone and use the library, but it is notably outdated: some of its dependencies have long-deprecated features and the code has not been adapted to them. The README gives very little information on usage. After updating the code by hand to work with the outdated libraries, the server could be loaded locally, but it required a username and password that are hardcoded in the project source. Even after bypassing these problems, the server stopped loading, so we discarded this method for this project due to its lack of maintenance.

# Experiments - LRP

LRP stands for layerwise relevance propagation.
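
In broad terms, LRP starts from an output score and redistributes it backwards, layer by layer, so that each input receives a share of the relevance. The sketch below shows one common variant, the epsilon rule, for a single linear layer in NumPy. It is only an illustration under stated assumptions: `lrp_epsilon_linear` and the toy weights are hypothetical, and we assume (without having verified it) that the `eps` argument used later in this notebook plays a similar stabilizing role inside `interpret_nlp`.

```python
import numpy as np


def lrp_epsilon_linear(x, W, b, rel_out, eps=0.1):
    """Epsilon-rule LRP for one linear layer z = x @ W + b.

    x: (n_in,), W: (n_in, n_out), b: (n_out,), rel_out: (n_out,).
    Returns the relevance assigned to each input, shape (n_in,).
    """
    z = x @ W + b
    # Stabilizer: push each pre-activation away from zero by eps, keeping its sign.
    z_stab = z + eps * np.where(z >= 0, 1.0, -1.0)
    s = rel_out / z_stab
    # R_i = x_i * sum_j W_ij * R_j / (z_j + eps * sign(z_j))
    return x * (W @ s)


# Toy layer (hypothetical values).
x = np.array([1.0, 2.0])
W = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.zeros(2)

z = x @ W + b
rel_out = np.zeros_like(z)
rel_out[1] = z[1]          # explain output unit 1 only

print(lrp_epsilon_linear(x, W, b, rel_out, eps=1e-9))  # near-exact redistribution
print(lrp_epsilon_linear(x, W, b, rel_out, eps=0.1))   # stabilizer absorbs some relevance
```

With a zero bias and a very small `eps`, the input relevances sum to the relevance that was propagated back; a larger `eps` trades some of that conservation for numerical stability.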

In [32]:
import numpy as np
import torch
from IPython.display import display, HTML
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer

from interpret_nlp.modules.lrp_bert_modules import LRPBertForSequenceClassification
from interpret_nlp.visualization.heatmap import html_heatmap

from src.preprocess import get_train_dev_test_data
from tokenizers import TextInputSequence


First we need our finetuned model:

In [None]:
# Uncomment this to finetune the model!
# it takes a while, so it is commented to avoid re-doing the same step several times

# !python3 src/finetune.py

The module we are going to use for LRP requires a series of files that we need to store from our model:

In [6]:
path = "pretrained_model/"
config_path = f"{path}/config.json"
state_dict_path = f"{path}/training_args.bin"

In [7]:
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "pretrained_model/", num_labels=4, output_attentions=True
)

In [8]:
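# Note: config_path and state_dict_path point inside pretrained_model/, so these
# calls overwrite any existing config.json and training_args.bin there with pickled objects.
torch.save(model.config, config_path)
torch.save(model.state_dict(), state_dict_path)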

Now that we have got everything set up, we can load our model configuration into the LRP model and use it with our test data.

In [9]:
model = LRPBertForSequenceClassification(torch.load(config_path))
model.load_state_dict(torch.load(state_dict_path))

<All keys matched successfully>

In [11]:
_, _, test = get_train_dev_test_data()

In [59]:
def plot_attr_forward_pass(sample):
    print(f"LRP results for sentence {sample}")
    inputs = tokenizer(sample, return_tensors="pt")
    model.attr()
    output = model(**inputs)

    print("Attr Forward Pass Output:")
    print(output)
    return output

In [60]:
def plot_lrp_scores(sample, output):
    tokens = tokenizer.tokenize(TextInputSequence(sample))

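    # Seed the backward pass with the logit of class index 1 only;
    # all other classes start with zero relevance.
    rel_y = np.zeros(output.shape)
    rel_y[:, 1] = output[:, 1]
    rel_word, rel_pos, rel_type, rel_embed = model.attr_backward(rel_y, eps=0.1)
    # Take the first (only) sequence, drop the first and last positions
    # ([CLS] and [SEP]), and sum over the last axis to get one score per token.
    rel_word = np.sum(rel_word[0, 1:-1], -1)
    rel_pos = np.sum(rel_pos[0, 1:-1], -1)
    rel_type = np.sum(rel_type[0, 1:-1], -1)
    rel_embed = np.sum(rel_embed[0, 1:-1], -1)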

    print("LRP Scores:")
    for t, s in zip(tokens, rel_embed):
        print(t, s, sep=": ")

    print("Relevance of word embeddings:")
    display(HTML(html_heatmap(tokens, list(rel_word))))

    print("Relevance of positional embeddings:")
    display(HTML(html_heatmap(tokens, list(rel_pos))))

    print("Relevance of type embeddings:")
    display(HTML(html_heatmap(tokens, list(rel_type))))

    print("Relevance of combined embeddings:")
    display(HTML(html_heatmap(tokens, list(rel_embed))))
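
The function above always seeds the backward pass with class index 1 (`rel_y[:, 1] = output[:, 1]`), whichever class the model actually predicts. If one wanted to explain the predicted class instead, the seed could be built as sketched below. This assumes `output` is a NumPy array of shape `(1, num_labels)`, as the printed outputs suggest; `relevance_seed` is a hypothetical helper and not part of `interpret_nlp`.

```python
import numpy as np


def relevance_seed(logits, target=None):
    """Build the initial relevance array for one target class.

    If target is None, the class with the highest logit is used.
    Returns (seed, target), where seed has the same shape as logits.
    """
    logits = np.asarray(logits, dtype=float)
    if target is None:
        target = int(np.argmax(logits[0]))
    seed = np.zeros_like(logits)
    seed[:, target] = logits[:, target]   # only the target class keeps its logit
    return seed, target


# Logits shaped like the forward-pass outputs printed in this notebook.
example_logits = np.array([[-0.9460913, -1.5576597, 4.5470047, -2.16687]])
seed, target = relevance_seed(example_logits)
print(target, seed)
```

Passing `target=1` reproduces the seed used in `plot_lrp_scores`.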

In [61]:
sample = test["verse_text"][22]
output = plot_attr_forward_pass(sample)
plot_lrp_scores(sample, output)

LRP results for sentence Fuck The Patriarchy keychain on the ground
Attr Forward Pass Output:
[[-0.9460913 -1.5576597  4.5470047 -2.16687  ]]
LRP Scores:
Fuck: 0.0035158381137659032
The: 0.01145847863435421
Patriarch: 0.006922768180123209
##y: -0.03876505146881995
key: -0.03356181444475432
##cha: 0.01869896553331519
##in: -0.00391570416790995
on: 0.052141691096447076
the: -0.011557318260907308
ground: -0.010196334002654885
Relevance of word embeddings:


Relevance of positional embeddings:


Relevance of type embeddings:


Relevance of combined embeddings:


In [62]:
sample = test["verse_text"][-2]
output = plot_attr_forward_pass(sample)
plot_lrp_scores(sample, output)

LRP results for sentence Just between us, did the love affair maim you all too well?
Attr Forward Pass Output:
[[ 2.465749   -1.7446934  -1.069228    0.37906298]]
LRP Scores:
Just: -0.004472918022042428
between: -0.014479319003017129
us: -0.003072913182871711
,: -0.00022758517016512742
did: -0.0230418528388121
the: 0.0027981546134820574
love: 0.0006156135456628592
affair: -0.0013547769309030957
ma: -0.07915791916835851
##im: -0.027547992649463837
you: 0.0026281125802477415
all: 0.031026367439432788
too: -0.0021704092748219767
well: 0.014077428293788757
?: 0.0034378083720426255
Relevance of word embeddings:


Relevance of positional embeddings:


Relevance of type embeddings:


Relevance of combined embeddings:


In [64]:
sample = test["verse_text"][76]
output = plot_attr_forward_pass(sample)
plot_lrp_scores(sample, output)

LRP results for sentence Now you mail back my things and I walk home alone
Attr Forward Pass Output:
[[-1.303706  -1.1475993  4.716536  -2.1116943]]
LRP Scores:
Now: -0.045022115162623444
you: 0.010050564956280254
mail: 0.11412395617036492
back: -0.0029099929285758396
my: 0.018664112751538498
things: -0.010082488754857274
and: -0.0046736754090115195
I: 0.019453498003583227
walk: -0.12283465138655994
home: 0.049154334412124905
alone: 0.38974890877967383
Relevance of word embeddings:


Relevance of positional embeddings:


Relevance of type embeddings:


Relevance of combined embeddings:


In [65]:
sample = test["verse_text"][25]
output = plot_attr_forward_pass(sample)
plot_lrp_scores(sample, output)

LRP results for sentence He's gonna say it's love
Attr Forward Pass Output:
[[-2.0716417  2.66965    1.1471639 -1.460388 ]]
LRP Scores:
He: 0.17884644893150842
': 0.047792932278276364
s: 0.11474945947259482
gonna: 0.005484593067935584
say: 1.2531748446033983
it: -0.3093842341171
': -0.020012210592356673
s: 0.04579449521539893
love: -0.6935180079603283
Relevance of word embeddings:


Relevance of positional embeddings:


Relevance of type embeddings:


Relevance of combined embeddings:
