<a href="https://colab.research.google.com/github/simulate111/Textual-Data-Analysis-25/blob/main/Exercise%20task%208.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Model explainability example

* The new code related to model explainability starts half-way through, after the classifier has been trained


# Train BERT text classifier using the Transformers library

This notebook shows how to train a simple text classifier by fine-tuning a pre-trained BERT model using the Hugging Face [Transformers](https://huggingface.co/transformers/) library.

This notebook is based in part on the [Text classification on GLUE](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/text_classification.ipynb) notebook.

**NOTE**: it's recommended to run this notebook on a runtime with a GPU. Select "Runtime" -> "Change runtime type" from the top menu in Colab and set "Hardware accelerator" to "GPU" before starting.

## Install libraries

First, we'll use [`pip`](https://pypi.org/project/pip/) to install two Python libraries that are used in this notebook: [`transformers`](https://huggingface.co/transformers/) and [`datasets`](https://huggingface.co/docs/datasets/).

In [None]:
!pip --quiet install transformers
!pip --quiet install datasets

In [None]:
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments
from transformers import Trainer
from datasets import load_dataset

## Set model, dataset and hyperparameters

Let's then set some global variables such as the name of the pre-trained model and the hyperparameters we'll use for fine-tuning it.

* `MODEL_NAME`: the name of a pretrained model included in the [model repository](https://huggingface.co/models)
* `DATASET`: the path and name of a dataset included in the [dataset repository](https://huggingface.co/datasets)
* `LEARNING_RATE`, `BATCH_SIZE`, and `TRAIN_EPOCHS`: hyperparameters to use for fine-tuning the model. (Try different values here!)

Here, we'll use the [Stanford Sentiment Treebank](https://huggingface.co/datasets/sst) dataset as prepared for the [GLUE](https://huggingface.co/datasets/glue) collection. This is a binary text classification task where the objective is to determine if sentences express a positive or negative sentiment.

In [None]:
!wget http://dl.turkunlp.org/TKO_8964_2023/english-binarized-weighted.model.tgz
!tar -xvzf english-binarized-weighted.model.tgz

--2025-02-06 11:38:41--  http://dl.turkunlp.org/TKO_8964_2023/english-binarized-weighted.model.tgz
Resolving dl.turkunlp.org (dl.turkunlp.org)... 195.148.30.23
Connecting to dl.turkunlp.org (dl.turkunlp.org)|195.148.30.23|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 402134026 (384M) [application/octet-stream]
Saving to: ‘english-binarized-weighted.model.tgz’


2025-02-06 11:39:03 (17.5 MB/s) - ‘english-binarized-weighted.model.tgz’ saved [402134026/402134026]

english-binarized-weighted.model/
english-binarized-weighted.model/training_args.bin
english-binarized-weighted.model/pytorch_model.bin
english-binarized-weighted.model/tokenizer.json
english-binarized-weighted.model/vocab.txt
english-binarized-weighted.model/config.json
english-binarized-weighted.model/special_tokens_map.json
english-binarized-weighted.model/tokenizer_config.json


In [None]:
MODEL_NAME = 'english-binarized-weighted.model'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
#tokenized = tokenizer(text=question, text_pair=context, return_tensors='pt')
#prediction = model(**tokenized)

In [None]:
#MODEL_NAME = 'bert-base-cased'
#DATASET = ('glue', 'sst2')

## Load dataset

We'll first load the dataset with [`load_dataset`](https://huggingface.co/docs/datasets/package_reference/loading_methods.html#datasets.load_dataset) and determine the number of distinct labels in the (training) data.

In [None]:
#dataset = load_dataset(*DATASET)
#num_labels = len(set(dataset['train']['label']))

The loaded dataset is a simple dictionary-like container (`DatasetDict`) holding separate `Dataset` objects for training, development (validation), and test data:

In [None]:
#print(dataset)
#print(f'number of distinct labels: {num_labels}')

To reduce training time a bit, let's just take every 10th item in the `train` subset of the dataset.

In [None]:
# I leave this undone to get the best possible model later for the explainability part
#dataset['train'] = dataset['train'].filter(lambda example, idx: idx % 10 == 0, with_indices=True)
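
The commented-out `filter` call keeps only the examples whose index is divisible by 10. As a minimal sketch of the same selection, here it is on a plain Python list (`toy_train` is made-up data standing in for the real `Dataset` object):

```python
# Toy stand-in for dataset['train']; the real object is a datasets.Dataset.
toy_train = [{"idx": i, "sentence": f"example {i}"} for i in range(35)]

# Keep every 10th item, mirroring `idx % 10 == 0` in the filter above.
subsample = [ex for i, ex in enumerate(toy_train) if i % 10 == 0]

print([ex["idx"] for ex in subsample])  # indices 0, 10, 20, 30
```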

## Load tokenizer and tokenize data

We'll then load an appropriate tokenizer for the pre-trained model we'll be using with [`AutoTokenizer.from_pretrained`](https://huggingface.co/transformers/model_doc/auto.html#transformers.AutoTokenizer.from_pretrained).

In [None]:
#tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

The tokenizer can take a string (or e.g. a list of strings) and perform all the necessary preprocessing steps to prepare its input for use by the model, including splitting input texts into tokens, adding padding and special tokens, and mapping those to integer IDs (`input_ids`). The tokenizer also creates an attention mask (`attention_mask`) that tells the model which positions hold real tokens and which are padding.
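
To make `input_ids` and `attention_mask` concrete, here is a toy sketch of the padding and masking step. It is not how the real tokenizer works internally: it splits on whitespace instead of using WordPiece, and the vocabulary `TOY_VOCAB`, its IDs, and the helper `toy_encode` are all made up for illustration.

```python
# Toy illustration only: a real BERT tokenizer uses WordPiece subwords and a
# much larger vocabulary. The vocabulary and IDs below are made up.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "a": 4, "great": 5, "movie": 6, "boring": 7}

def toy_encode(texts, vocab=TOY_VOCAB):
    """Encode a batch of strings into padded input_ids and attention_mask."""
    encoded = []
    for text in texts:
        tokens = ["[CLS]"] + text.lower().split() + ["[SEP]"]
        encoded.append([vocab.get(tok, vocab["[UNK]"]) for tok in tokens])
    max_len = max(len(ids) for ids in encoded)
    input_ids, attention_mask = [], []
    for ids in encoded:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [vocab["[PAD]"]] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 0 marks padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = toy_encode(["a great movie", "boring"])
print(batch["input_ids"])       # [[2, 4, 5, 6, 3], [2, 7, 3, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

The shorter text is padded up to the length of the longest one in the batch, and the mask has zeros exactly where the padding sits.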

We'll then define an encoding function applying the tokenizer to the text data  (key `"sentence"`) of a [`Dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html?highlight=dataset#datasets.Dataset) object, and use the `map` function of the `DatasetDict` to tokenize the train, development, and test datasets.

In [None]:
def encode_dataset(d):
  return tokenizer(d['sentence'])

#encoded_dataset = dataset.map(encode_dataset)

## Load pre-trained model

Next, we'll load the pre-trained model with support for text classification output using [`AutoModelForSequenceClassification.from_pretrained`](https://huggingface.co/transformers/model_doc/auto.html#transformers.AutoModelForSequenceClassification.from_pretrained).

Note that we need to provide the number of labels in the data when loading the model.

In [None]:
#model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=num_labels)

## Training parameters and metrics
We're almost ready to train. We'll next create a [`TrainingArguments`](https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments) object to hold the hyperparameters and other settings that are appropriate for training on Colab:

* `save_strategy`: set to `"epoch"` so that model checkpoints are saved every epoch; this is feasible here because the checkpoints are written to my own Google Drive
* `evaluation_strategy` and `logging_strategy` set to `"epoch"` so that evaluation and logging are performed once per epoch
* The hyperparameters `LEARNING_RATE`, `BATCH_SIZE` and `TRAIN_EPOCHS` set above are passed to the training process through this object

Finally, we'll define a simple accuracy metric measuring how many predictions match their correct values.
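
As a rough illustration of what such a metric computes, here is a sketch on plain Python lists rather than on the prediction object the `Trainer` passes in. The helper name `accuracy_from_logits` and the toy numbers are made up for this example.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the gold label."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return correct / len(labels)

# Made-up logits for four examples with two classes each.
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.9]]
toy_labels = [0, 1, 0, 0]

print(accuracy_from_logits(toy_logits, toy_labels))  # 3 of 4 correct -> 0.75
```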

In [None]:
'''
def compute_accuracy(pred):
    y_pred = pred.predictions.argmax(axis=1)
    y_true = pred.label_ids
    return { 'accuracy': sum(y_pred == y_true) / len(y_true) }
'''

# Training (fine-tuning)

For fine-tuning the pre-trained model, we'll create a [`Trainer`](https://huggingface.co/transformers/main_classes/trainer.html) object, providing it with the pre-trained model, settings, training and development (validation) data, and the evaluation metric created above.

In [None]:
'''

# Hyperparameters
LEARNING_RATE=1e-4
BATCH_SIZE=128
TRAIN_EPOCHS=4
'''

In [None]:
#from google.colab import drive
#drive.mount('/content/drive')

In [None]:
'''
import wandb
wandb.init(mode="disabled")

train_args = TrainingArguments(
    '/content/drive/MyDrive/WorkStuff/ssd2-sentiment-model',    # output directory for checkpoints and predictions
    load_best_model_at_end=True,
    evaluation_strategy='steps',
    logging_strategy='steps',
    learning_rate=LEARNING_RATE,
    per_device_train_batch_size=BATCH_SIZE,
    num_train_epochs=TRAIN_EPOCHS
)
'''

In [None]:
'''
trainer = Trainer(
      model,
      train_args,
      train_dataset=encoded_dataset['train'],
      eval_dataset=encoded_dataset['validation'],
      tokenizer=tokenizer,
      compute_metrics=compute_accuracy
)
'''

Training is then performed simply by calling the `train` function of the `Trainer` object.

In [None]:
#trainer.train()

## Evaluation and predictions for user input

We can use `trainer.evaluate()` to evaluate the trained model on the `eval_dataset` given to the trainer:

In [None]:
#results = trainer.evaluate()
#print(f'Accuracy: {results["eval_accuracy"]}')

# Model save

* Now that the model is trained, we may want to save it to Google Drive
* That way we don't need to train it every time

In [None]:
#from google.colab import drive
#drive.mount('/content/drive')

In [None]:
# Oh no! Old code, don't save like this! Use the trainer!

import torch
#make sure you make this directory
#torch.save(trainer.model,"/content/drive/MyDrive/WorkStuff/sent_model.pt")

# Model load

* Now we load the model and repeat some of the imports so it is possible to run the notebook from this point onwards

In [None]:
# relevant stuff repeated from above so you can run it from here onwards if you happen to have the model trained
'''
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments
from transformers import Trainer
from datasets import load_dataset

MODEL_NAME = 'bert-base-cased'
DATASET = ('glue', 'sst2')

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
'''

In [None]:
MODEL_NAME = "english-binarized-weighted.model"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

Let's try that out:

In [None]:
def predict_answer_presence(question, context):
    tokenized = tokenizer(text=question, text_pair=context, return_tensors='pt')
    with torch.no_grad():
        logits = model(**tokenized).logits
    probs = torch.nn.functional.softmax(logits, dim=-1)
    return "YES" if probs[0][1] > probs[0][0] else "NO"

In [None]:
question = "When was University of Turku founded?"
context = "The University of Turku (Finnish: Turun yliopisto, in Swedish: Åbo universitet, shortened UTU), located in Turku in southwestern Finland, is the third largest university in the country as measured by student enrollment, after the University of Helsinki and Tampere University. It is a multidisciplinary university with eight faculties. It was established in 1920 and also has facilities at Rauma, Pori, Kevo, and Seili. The university is a member of the Coimbra Group and the European Campus of City - Universities (EC2U)."

prediction = predict_answer_presence(question, context)
print(f"Answer Presence: {prediction}")


Answer Presence: YES


In [None]:
def highlight_answer(context, answer_snippet):
    return context.replace(answer_snippet, f"\033[92m{answer_snippet}\033[0m")  # Green highlight

answer_snippet = "established in 1920"  # Modify this as needed
highlighted_context = highlight_answer(context, answer_snippet)

print(highlighted_context)  # In Colab, color might not display. Verify manually.


The University of Turku (Finnish: Turun yliopisto, in Swedish: Åbo universitet, shortened UTU), located in Turku in southwestern Finland, is the third largest university in the country as measured by student enrollment, after the University of Helsinki and Tampere University. It is a multidisciplinary university with eight faculties. It was [92mestablished in 1920[0m and also has facilities at Rauma, Pori, Kevo, and Seili. The university is a member of the Coimbra Group and the European Campus of City - Universities (EC2U).


In [None]:
answer_snippets = ["established in 1920", "University", "University of", "University of Turku"]

def highlight_multiple_answers(context, snippets):
    for snippet in snippets:
        context = context.replace(snippet, f"\033[92m{snippet}\033[0m")  # Green highlight
    return context

highlighted_context = highlight_multiple_answers(context, answer_snippets)

print(highlighted_context)


The [92mUniversity[0m of Turku (Finnish: Turun yliopisto, in Swedish: Åbo universitet, shortened UTU), located in Turku in southwestern Finland, is the third largest university in the country as measured by student enrollment, after the [92mUniversity[0m of Helsinki and Tampere [92mUniversity[0m. It is a multidisciplinary university with eight faculties. It was [92mestablished in 1920[0m and also has facilities at Rauma, Pori, Kevo, and Seili. The university is a member of the Coimbra Group and the European Campus of City - Universities (EC2U).
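
Note that only "University" and "established in 1920" end up highlighted above: once "University" has been wrapped in escape codes, the longer snippets "University of" and "University of Turku" no longer occur in the string. One possible workaround, sketched here under the assumption that the longest snippet should win, is to do all replacements in a single regex pass (`highlight_longest_first` is a hypothetical helper, not part of the original notebook):

```python
import re

GREEN, RESET = "\033[92m", "\033[0m"

def highlight_longest_first(context, snippets):
    """Highlight snippets in a single pass, preferring the longest match."""
    # Longest alternatives first, so "University of Turku" wins over "University".
    ordered = sorted(set(snippets), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in ordered))
    return pattern.sub(lambda m: f"{GREEN}{m.group(0)}{RESET}", context)

sample = "The University of Turku was established in 1920."
result = highlight_longest_first(
    sample,
    ["established in 1920", "University", "University of", "University of Turku"],
)
print(result)
```

Because the text is scanned only once, already-inserted escape codes are never re-examined, so snippets cannot break each other.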


# Model explainability with the captum library

* captum.ai is a nifty library with many of the explainability algorithms implemented
* we will use it here to try the techniques
* this is not the easiest of code, btw, many of the libraries are bleeding-edge, so you cut yourself here and there (the blood on the edge is yours)
* the code below is a (quite substantial) adaptation of https://captum.ai/tutorials/Bert_SQUAD_Interpret

In [None]:
!pip install captum pandas matplotlib seaborn transformers datasets


In [None]:
#from google.colab import drive
#drive.mount('/content/drive')

In [None]:
import torch
import transformers
#tokenizer=transformers.AutoTokenizer.from_pretrained("bert-base-cased")
#model = torch.load("/content/drive/MyDrive/WorkStuff/sent_model.pt")
model.to('cpu')    # simplifies input placement

sentiment_label = [
    'negative',
    'positive'
]

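def predict_sentiment(string):
    # The model expects a (question, context) pair, so the given text is
    # used as the context for the global `question` defined earlier.
    tokenized = tokenizer(text=question, text_pair=string, return_tensors='pt')
    with torch.no_grad():    # inference only, no gradients needed
        pred = model(**tokenized)
    pred_idx = int(pred.logits.detach().numpy().argmax())
    return sentiment_label[pred_idx]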

In [None]:
example_sentences = ["The University of Turku (Finnish: Turun yliopisto, in Swedish: Åbo universitet, shortened UTU), located in Turku in southwestern Finland, is the third largest university in the country as measured by student enrollment, after the University of Helsinki and Tampere University. It is a multidisciplinary university with eight faculties. It was established in 1920 and also has facilities at Rauma, Pori, Kevo, and Seili. The university is a member of the Coimbra Group and the European Campus of City - Universities (EC2U)."]

for e in example_sentences:
    print(e, '->', predict_sentiment(e))

The University of Turku (Finnish: Turun yliopisto, in Swedish: Åbo universitet, shortened UTU), located in Turku in southwestern Finland, is the third largest university in the country as measured by student enrollment, after the University of Helsinki and Tampere University. It is a multidisciplinary university with eight faculties. It was established in 1920 and also has facilities at Rauma, Pori, Kevo, and Seili. The university is a member of the Coimbra Group and the European Campus of City - Universities (EC2U). -> positive


In [None]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

In [None]:
#Tells the model that it is in evaluation mode, and zeroes out the gradients
model.eval()
model.zero_grad()

# Helper functions

In [None]:
# Forward on the model -> data in, prediction out, nothing fancy really
def predict(inputs, token_type_ids, attention_mask):
    pred=model(inputs, token_type_ids=token_type_ids, attention_mask=attention_mask)
    return pred.logits #return the output of the classification layer

In [None]:
# Given input text, construct a pair of (text input, blank reference input as long as the text itself)
def construct_input_ref_pair(text):
    inp=tokenizer(text,return_tensors="pt")
    #let's have our blank be all padding
    #but how long is the text?
    #one way to do this:
    tokenized=tokenizer(text,add_special_tokens=False)["input_ids"]
    blank_input=" ".join(["[PAD]"]*len(tokenized)) #and now make an input with this many [PAD] tokens
    ref=tokenizer(blank_input,return_tensors="pt")
    return (inp["input_ids"],inp["token_type_ids"],inp["attention_mask"]),\
    (ref["input_ids"],ref["token_type_ids"],ref["attention_mask"])

In [None]:
# Let's try it!

text="When was University of Turku founded? The University of Turku (Finnish: Turun yliopisto, in Swedish: Åbo universitet, shortened UTU), located in Turku in southwestern Finland, is the third largest university in the country as measured by student enrollment, after the University of Helsinki and Tampere University. It is a multidisciplinary university with eight faculties. It was established in 1920 and also has facilities at Rauma, Pori, Kevo, and Seili. The university is a member of the Coimbra Group and the European Campus of City - Universities (EC2U)."

#input:
inp, ref = construct_input_ref_pair(text)
all_tokens = tokenizer.convert_ids_to_tokens(inp[0][0])
print(all_tokens)

['[CLS]', 'When', 'was', 'University', 'of', 'Tu', '##rk', '##u', 'founded', '?', 'The', 'University', 'of', 'Tu', '##rk', '##u', '(', 'Finnish', ':', 'Tu', '##run', 'y', '##lio', '##pis', '##to', ',', 'in', 'Swedish', ':', 'Å', '##bo', 'un', '##ivers', '##ite', '##t', ',', 'shortened', 'U', '##TU', ')', ',', 'located', 'in', 'Tu', '##rk', '##u', 'in', 'southwestern', 'Finland', ',', 'is', 'the', 'third', 'largest', 'university', 'in', 'the', 'country', 'as', 'measured', 'by', 'student', 'enrollment', ',', 'after', 'the', 'University', 'of', 'Helsinki', 'and', 'Tam', '##per', '##e', 'University', '.', 'It', 'is', 'a', 'multi', '##disciplinary', 'university', 'with', 'eight', 'faculties', '.', 'It', 'was', 'established', 'in', '1920', 'and', 'also', 'has', 'facilities', 'at', 'Ra', '##uma', ',', 'Po', '##ri', ',', 'Ke', '##vo', ',', 'and', 'Se', '##ili', '.', 'The', 'university', 'is', 'a', 'member', 'of', 'the', 'Co', '##im', '##bra', 'Group', 'and', 'the', 'European', 'Campus', 'of', 
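
The `##` prefixes in the token list mark pieces that continue a word, so `Turku` becomes `Tu`, `##rk`, `##u`. BERT tokenizers produce these with a greedy longest-match-first WordPiece split. Below is a toy sketch of that idea; the mini vocabulary is hypothetical and chosen only to reproduce one split seen above, and the real tokenizer does additional normalization not shown here.

```python
def wordpiece_split(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into WordPiece tokens."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry ##
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces

# Hypothetical mini vocabulary for illustration.
toy_vocab = {"Tu", "##rk", "##u", "##run", "founded"}
print(wordpiece_split("Turku", toy_vocab))    # ['Tu', '##rk', '##u']
print(wordpiece_split("founded", toy_vocab))  # ['founded']
```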

In [None]:
p=predict(*inp)
print("p=",p)
print("p.shape",p.shape)

p= tensor([[ 0.4985, -1.0528]], grad_fn=<AddmmBackward0>)
p.shape torch.Size([1, 2])
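
The two numbers are raw logits, one per class, not probabilities. A small sketch of how a softmax turns them into probabilities, in plain Python (in the notebook itself `torch.nn.functional.softmax` does the same job):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.4985, -1.0528]  # the values printed above
probs = softmax(logits)
print([round(p, 3) for p in probs])
```

Class 0 gets roughly 0.83 of the probability mass here. Since softmax preserves the ordering of the logits, taking the argmax of the logits directly gives the same predicted class.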


In [None]:
# Yay, now we finally made it to the attribution part
lig = LayerIntegratedGradients(predict, model.bert.embeddings) #attribute the output wrt to embeddings

In [None]:
# inputs: inputs
# baselines: the blank baseline
# target: which of the two classes in the output (pos/neg) to run the prediction against?
attrs, delta = lig.attribute(inputs=inp,
                                  baselines=ref,
                                  return_convergence_delta=True,target=1)
print("attrs shape",attrs.shape)

attrs shape torch.Size([1, 134, 768])
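
To get a feeling for what `lig.attribute` approximates, here is a toy sketch of integrated gradients on a tiny hand-written model, in plain Python. Everything in it (the weights `W`, the bias `B`, the model `f`) is made up for illustration; captum does the equivalent on the BERT embedding layer using autograd. The method averages gradients along a straight path from the baseline to the input and scales them by the input difference. The attributions should then sum to `f(x) - f(baseline)`, and the convergence delta measures how far off the approximation is.

```python
import math

W = [1.5, -2.0, 0.5]   # made-up weights of a toy model
B = 0.1

def f(x):
    """Toy differentiable model: sigmoid of a linear score."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

def grad_f(x):
    """Analytic gradient of f with respect to each input feature."""
    s = f(x)
    return [s * (1.0 - s) * w for w in W]

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            avg_grad[i] += g / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

x = [1.0, 0.5, -1.0]
baseline = [0.0, 0.0, 0.0]   # plays the role of the all-[PAD] reference
attributions = integrated_gradients(x, baseline)
delta = sum(attributions) - (f(x) - f(baseline))
print(attributions, delta)
```

A delta close to zero suggests the number of integration steps is sufficient; a large one is a hint to increase it.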


In [None]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    attributions = attributions / torch.linalg.vector_norm(attributions,dim=0)
    return attributions

attrs_sum = summarize_attributions(attrs)
print("attrs_sum shape",attrs_sum.shape)

attrs_sum shape torch.Size([134])
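
The summarization collapses the 768 embedding dimensions into one score per token and then scales the score vector to unit L2 norm. A plain-Python sketch of the same two steps on made-up numbers (`summarize_toy` is a hypothetical stand-in for `summarize_attributions`):

```python
import math

def summarize_toy(attributions):
    """Sum each token's per-dimension attributions, then L2-normalize."""
    per_token = [sum(dims) for dims in attributions]
    norm = math.sqrt(sum(v * v for v in per_token))
    return [v / norm for v in per_token]

# Made-up attributions: 3 tokens x 2 embedding dimensions.
toy_attrs = [[0.2, 0.1], [-0.5, 0.1], [0.0, 0.0]]
scores = summarize_toy(toy_attrs)
print(scores)  # approximately [0.6, -0.8, 0.0]
```

Because of the normalization, the scores are only comparable within one input: they show which tokens matter relative to each other, not how strong the overall effect is.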


In [None]:
print(attrs_sum)
print(tokenizer.convert_ids_to_tokens(inp[0][0]))

for a,t in zip(attrs_sum,tokenizer.convert_ids_to_tokens(inp[0][0])):
    print(float(a),t)

tensor([ 0.0000, -0.3182, -0.0438, -0.1753, -0.0897, -0.0580,  0.0259, -0.1445,
        -0.1029, -0.0916, -0.0454, -0.1423,  0.1039, -0.0674,  0.0176, -0.0048,
         0.1217, -0.0862,  0.0658,  0.0236, -0.0278,  0.0839,  0.0575,  0.0431,
        -0.0154,  0.0661,  0.0688, -0.0389, -0.1058, -0.0328, -0.0104,  0.0435,
         0.0192,  0.0294,  0.0168,  0.1394,  0.1167,  0.0233,  0.0221,  0.0563,
         0.1624,  0.0531,  0.1353, -0.0226,  0.0111,  0.0138,  0.1702, -0.0349,
         0.0315,  0.1795,  0.0809,  0.0606,  0.0206,  0.0673,  0.0367,  0.1343,
         0.0499,  0.0079,  0.0544,  0.0394,  0.0814, -0.0124, -0.0571,  0.0379,
        -0.0211,  0.0481, -0.0739, -0.0390, -0.0423,  0.0288, -0.0196,  0.0499,
        -0.0362, -0.0779, -0.1541, -0.0502, -0.0554, -0.0481, -0.0653, -0.0564,
         0.0145, -0.0746, -0.0930,  0.0099, -0.1322, -0.0576, -0.0344, -0.1208,
        -0.0430, -0.1076, -0.0139, -0.1037, -0.0005, -0.0699, -0.0051, -0.0017,
        -0.0696, -0.0470, -0.0332, -0.06

Damn, that seems to work!

In [None]:
import captum
from IPython.display import HTML, display    # public location of these helpers
x=captum.attr.visualization.format_word_importances(tokenizer.convert_ids_to_tokens(inp[0][0]),attrs_sum)
HTML(x)

# Almost there...

* Let's wrap this all into a function

In [None]:
def predict_and_explain(model,text):
    model.zero_grad() #to be safe perhaps it's not needed
    inp,ref = construct_input_ref_pair(text)
    all_tokens = tokenizer.convert_ids_to_tokens(inp[0][0])

    lig = LayerIntegratedGradients(predict, model.bert.embeddings)
    prediction=predict(*inp)[0]
    prediction_cls=int(torch.argmax(prediction))
    print("Prediction:", ("negative","positive")[prediction_cls],"Weights:",prediction.tolist())
    for target,classname in enumerate(("negative","positive")):

        attrs, delta = lig.attribute(inputs=inp,
                                  baselines=ref,
                                  return_convergence_delta=True,target=target)
        attrs_sum = summarize_attributions(attrs)

        x=captum.attr.visualization.format_word_importances(all_tokens,attrs_sum)
        print("ATTRIBUTION WITH RESPECT TO",classname)
        display(HTML(x))
        print()

In [None]:
predict_and_explain(model,"When was University of Turku founded? The University of Turku (Finnish: Turun yliopisto, in Swedish: Åbo universitet, shortened UTU), located in Turku in southwestern Finland, is the third largest university in the country as measured by student enrollment, after the University of Helsinki and Tampere University. It is a multidisciplinary university with eight faculties. It was established in 1920 and also has facilities at Rauma, Pori, Kevo, and Seili. The university is a member of the Coimbra Group and the European Campus of City - Universities (EC2U).")

Prediction: negative Weights: [0.49849939346313477, -1.0527673959732056]
ATTRIBUTION WITH RESPECT TO negative



ATTRIBUTION WITH RESPECT TO positive



