In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from ferret import Benchmark


In [3]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("g8a9/bert-base-cased_ami18")

# Instance-wise evaluation

## Explain a single instance

The fastest way to get started with *ferret* is to use the `Benchmark` interface class.

In [4]:
bench = Benchmark(model, tokenizer)

Extracting post-hoc explanations with all the supported methods and standard parameters is as easy as:

In [5]:
explanations = bench.explain("I love your style!", target = 0)

Explainer: 100%|████████████████████████████████████| 4/4 [00:04<00:00,  1.10s/it]


In [6]:
explanations

[Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=array([ 0.        ,  0.20228141,  0.34075188, -0.15254303,  0.26609211,
        -0.03833157,  0.        ]), explainer='Partition SHAP', target=0),
 Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=tensor([-1.3836e-04,  1.3322e-01, -2.9708e-01,  6.5463e-02, -1.0162e-01,
          3.9932e-01,  3.1668e-03]), explainer='Gradient', target=0),
 Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=tensor([ 0.0826,  0.1765, -0.1320,  0.0780, -0.1481,  0.3466, -0.0361],
        dtype=torch.float64), explainer='Integrated Gradient', target=0),
 Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=array([ 0.04105813,  0.18526642, -0.05382237,  0.00660302,  0.2711317 ,
        -0.28812257, -0.15399579]), explainer='LIME', ta

Let's visualize the results.

In [7]:
t = bench.show_table(explanations)
t

Token,[CLS],I,love,your,style,!,[SEP]
Partition SHAP,0.0,0.202281,0.340752,-0.152543,0.266092,-0.038332,0.0
Gradient,-0.000138,0.133215,-0.297076,0.065463,-0.101621,0.399319,0.003167
Integrated Gradient,0.08265,0.176536,-0.131993,0.077973,-0.14809,0.346637,-0.036121
LIME,0.041058,0.185266,-0.053822,0.006603,0.271132,-0.288123,-0.153996


## Evaluate explanation of a single instance

Evaluating explanations with all the supported evaluators is just as easy:

In [8]:
explanation_evaluations = bench.evaluate_explanations(explanations, target = 0)

Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 20.92it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 21.05it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 21.59it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 19.95it/s]


Let's visualize the results.

In [9]:
bench.show_evaluation_table(explanation_evaluations)

Explainer,[CLS],I,love,your,style,!,[SEP],aopc_compr,aopc_suff,taucorr_loo
Partition SHAP,0.0,0.202281,0.340752,-0.152543,0.266092,-0.038332,0.0,0.248723,-0.14359,0.6
Gradient,-0.000138,0.133215,-0.297076,0.065463,-0.101621,0.399319,0.003167,-0.163683,0.254994,-0.6
Integrated Gradient,0.08265,0.176536,-0.131993,0.077973,-0.14809,0.346637,-0.036121,-0.163683,0.254994,-0.8
LIME,0.041058,0.185266,-0.053822,0.006603,0.271132,-0.288123,-0.153996,0.193204,-0.114842,0.6


Area Over the Perturbation Curve (AOPC) comprehensiveness (aopc_compr), AOPC sufficiency (aopc_suff) and correlation with leave-one-out scores (taucorr_loo) are three measures of faithfulness.

**AOPC comprehensiveness**. Comprehensiveness measures the drop in the model probability when the relevant tokens of the explanation are removed. We measure comprehensiveness via the Area Over the Perturbation Curve: we progressively remove the $k$ most important tokens, with $k$ from 1 to #tokens (the default), and average the results. The higher the value, the better the explainer is at selecting the tokens relevant to the prediction.
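
The averaging over $k$ described above can be sketched in plain Python. This is a minimal illustration and not ferret's implementation: `toy_predict` and `TOY_WEIGHTS` are hypothetical stand-ins for a real classifier, and tokens are removed by deleting them from the list, which is only one possible removal strategy.

```python
import math

# Hypothetical per-token weights of a toy "model" (an assumption, not ferret's model).
TOY_WEIGHTS = {"I": 0.2, "love": 1.5, "your": -0.1, "style": 0.8, "!": 0.0}


def toy_predict(tokens):
    """Toy stand-in for a classifier: probability of the target class."""
    logit = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


def aopc_comprehensiveness(tokens, scores, predict):
    """Average probability drop after removing the top-k tokens, k = 1..len(tokens)."""
    full_prob = predict(tokens)
    # Token positions sorted from most to least important.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    drops = []
    for k in range(1, len(tokens) + 1):
        removed = set(ranked[:k])
        reduced = [tok for i, tok in enumerate(tokens) if i not in removed]
        drops.append(full_prob - predict(reduced))
    return sum(drops) / len(drops)


tokens = ["I", "love", "your", "style", "!"]
good_scores = [0.2, 1.5, -0.1, 0.8, 0.0]  # ranks tokens the way the toy model does
bad_scores = [-s for s in good_scores]    # the reversed ranking

compr_good = aopc_comprehensiveness(tokens, good_scores, toy_predict)
compr_bad = aopc_comprehensiveness(tokens, bad_scores, toy_predict)
print(f"good ranking: {compr_good:.3f}, reversed ranking: {compr_bad:.3f}")
```

An explanation that ranks the truly influential tokens first gets the larger score, because the probability falls sooner as tokens are removed.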

**AOPC sufficiency**. Sufficiency captures whether the tokens in the explanation are sufficient for the model to make the prediction. As for comprehensiveness, we use the AOPC score.
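
A matching sketch for sufficiency keeps only the top-$k$ tokens instead of removing them. It assumes the ERASER-style definition, in which the score is the gap between the full-input probability and the probability on the kept tokens, so lower values are better. `toy_predict` and `TOY_WEIGHTS` are again hypothetical and do not come from ferret.

```python
import math

# Hypothetical per-token weights of a toy "model" (an assumption, not ferret's model).
TOY_WEIGHTS = {"I": 0.2, "love": 1.5, "your": -0.1, "style": 0.8, "!": 0.0}


def toy_predict(tokens):
    """Toy stand-in for a classifier: probability of the target class."""
    logit = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


def aopc_sufficiency(tokens, scores, predict):
    """Average gap between the full prediction and the prediction on the top-k tokens only."""
    full_prob = predict(tokens)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    gaps = []
    for k in range(1, len(tokens) + 1):
        kept = set(ranked[:k])
        only_top = [tok for i, tok in enumerate(tokens) if i in kept]
        gaps.append(full_prob - predict(only_top))
    return sum(gaps) / len(gaps)


tokens = ["I", "love", "your", "style", "!"]
good_scores = [0.2, 1.5, -0.1, 0.8, 0.0]  # ranks tokens the way the toy model does
bad_scores = [-s for s in good_scores]    # the reversed ranking

suff_good = aopc_sufficiency(tokens, good_scores, toy_predict)
suff_bad = aopc_sufficiency(tokens, bad_scores, toy_predict)
print(f"good ranking: {suff_good:.3f}, reversed ranking: {suff_bad:.3f}")
```

Under this definition the good ranking yields a gap close to zero, since a few top tokens already recover most of the original probability.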

**Correlation with leave-one-out scores**. We first compute the leave-one-out scores, that is, the prediction difference when one feature at a time is omitted. We then measure the Spearman correlation between these scores and the explanation.
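
The two steps can be sketched as follows, using the Spearman correlation named in the paragraph above. The rank correlation is written out by hand to keep the example dependency-free; `toy_predict` and `TOY_WEIGHTS` are hypothetical stand-ins for a real classifier, and the coefficient that ferret reports may be computed differently.

```python
import math

# Hypothetical per-token weights of a toy "model" (an assumption, not ferret's model).
TOY_WEIGHTS = {"I": 0.2, "love": 1.5, "your": -0.1, "style": 0.8, "!": 0.0}


def toy_predict(tokens):
    """Toy stand-in for a classifier: probability of the target class."""
    logit = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


def leave_one_out_scores(tokens, predict):
    """Prediction difference when each token is omitted in turn."""
    full_prob = predict(tokens)
    return [full_prob - predict(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


tokens = ["I", "love", "your", "style", "!"]
loo = leave_one_out_scores(tokens, toy_predict)

aligned = [0.2, 1.5, -0.1, 0.8, 0.0]       # hypothetical explanation scores
reversed_scores = [-s for s in aligned]    # the opposite ranking

rho_aligned = spearman(aligned, loo)
rho_reversed = spearman(reversed_scores, loo)
print(rho_aligned, rho_reversed)
```

An explanation that orders tokens exactly as the leave-one-out scores do reaches a correlation of 1, and the reversed ordering reaches -1.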

### Plausibility

We can also specify a human rationale and evaluate plausibility:

In [10]:
explanation_evaluations = bench.evaluate_explanations(
    explanations, target=0, human_rationale=[0, 1, 0, 0, 0], top_k_rationale=1
)
bench.show_evaluation_table(explanation_evaluations)

Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 21.57it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 21.71it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 21.58it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 22.04it/s]


Explainer,[CLS],I,love,your,style,!,[SEP],aopc_compr,aopc_suff,taucorr_loo,auprc_plau,token_f1_plau,token_iou_plau
Partition SHAP,0.0,0.202281,0.340752,-0.152543,0.266092,-0.038332,0.0,0.248723,-0.14359,0.6,1.0,1.0,1.0
Gradient,-0.000138,0.133215,-0.297076,0.065463,-0.101621,0.399319,0.003167,-0.163683,0.254994,-0.6,0.1,0.0,0.0
Integrated Gradient,0.08265,0.176536,-0.131993,0.077973,-0.14809,0.346637,-0.036121,-0.163683,0.254994,-0.8,0.1,0.0,0.0
LIME,0.041058,0.185266,-0.053822,0.006603,0.271132,-0.288123,-0.153996,0.193204,-0.114842,0.6,0.1,0.0,0.0


Plausibility evaluates how well the explanations agree with human rationales. We evaluate plausibility via the Area Under the Precision-Recall Curve (AUPRC) (auprc_plau), the token-level f1-score (token_f1_plau), and the average Intersection-Over-Union (IOU) at the token level (token_iou_plau).


**Area Under the Precision Recall curve (AUPRC)** is computed by sweeping a threshold over token scores.

The token-level f1-score and the average Intersection-Over-Union consider discrete rationales.
We derive a discrete rationale by taking the top-K values; in this example, K is set to 1. *

**Token-level f1-score** is the F1 score computed from the token-level precision and recall.

**Intersection-Over-Union (IOU)** is the size of the overlap between the tokens covered by the explanation and by the human rationale, divided by the size of their union.

*When the set of human rationales for the dataset is available, K is set to the average rationale length (as in ERASER).
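
The discretization and the two set-based scores can be sketched as follows. The helper names (`top_k_rationale`, `token_f1`, `token_iou`) and the token scores are hypothetical and chosen for illustration; they are not part of ferret's API.

```python
def top_k_rationale(scores, k):
    """Positions of the k highest-scoring tokens (the discrete rationale)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def token_f1(predicted, human):
    """F1 score from token-level precision and recall between two position sets."""
    overlap = len(predicted & human)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(human)
    return 2 * precision * recall / (precision + recall)


def token_iou(predicted, human):
    """Size of the overlap divided by the size of the union."""
    union = predicted | human
    return len(predicted & human) / len(union) if union else 0.0


human_mask = [0, 1, 0, 0, 0]  # only "love" is marked
human = {i for i, flag in enumerate(human_mask) if flag == 1}

scores_a = [0.20, 0.34, -0.15, 0.27, -0.04]  # hypothetical: "love" ranked first
scores_b = [0.13, -0.30, 0.07, -0.10, 0.40]  # hypothetical: "!" ranked first

for name, scores in [("a", scores_a), ("b", scores_b)]:
    predicted = top_k_rationale(scores, k=1)
    print(name, token_f1(predicted, human), token_iou(predicted, human))

# With K = 2 the first explanation selects "love" and "style".
f1_k2 = token_f1(top_k_rationale(scores_a, k=2), human)
iou_k2 = token_iou(top_k_rationale(scores_a, k=2), human)
print(f1_k2, iou_k2)
```

With K = 1 and a one-token rationale both scores are either 1 or 0; a larger K lowers precision whenever the extra tokens fall outside the human rationale.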

### Per-class measures

We can also evaluate explainers at the *class level*.

We first compute explanations for both target classes (0, 1). 

The **AOPC comprehensiveness at the class level** (aopc_class_compr) evaluates comprehensiveness across classes.
It is computed as the average of the AOPC comprehensiveness for each class, considering only the terms with positive influence. The higher the score, the more faithful the explanation.

In [11]:
class_explanations = []

for target in [0,1]:
    explanations = bench.explain("I love your style!", target=target)
    class_explanations.append(explanations)

Explainer: 100%|████████████████████████████████████| 4/4 [00:03<00:00,  1.01it/s]
Explainer: 100%|████████████████████████████████████| 4/4 [00:04<00:00,  1.01s/it]


In [12]:
target_class = 0

explanation_evaluations = bench.evaluate_explanations(
    class_explanations[target_class],
    target=target_class,
    class_explanations=class_explanations,
)
bench.show_evaluation_table(explanation_evaluations)

Evaluator: 100%|████████████████████████████████████| 7/7 [00:00<00:00, 15.05it/s]
Evaluator: 100%|████████████████████████████████████| 7/7 [00:00<00:00, 14.99it/s]
Evaluator: 100%|████████████████████████████████████| 7/7 [00:00<00:00, 15.08it/s]
Evaluator: 100%|████████████████████████████████████| 7/7 [00:00<00:00, 14.95it/s]


Explainer,[CLS],I,love,your,style,!,[SEP],aopc_compr,aopc_suff,taucorr_loo,aopc_class_compr
Partition SHAP,0.0,0.202281,0.340752,-0.152543,0.266092,-0.038332,0.0,0.248723,-0.14359,0.6,0.187838
Gradient,-0.000138,0.133215,-0.297076,0.065463,-0.101621,0.399319,0.003167,-0.163683,0.254994,-0.6,-0.162136
Integrated Gradient,0.08265,0.176536,-0.131993,0.077973,-0.14809,0.346637,-0.036121,-0.163683,0.254994,-0.8,-0.026865
LIME,0.041058,0.185266,-0.053822,0.006603,0.271132,-0.288123,-0.153996,0.193204,-0.114842,0.6,0.168513


# Dataset-wise evaluation

We can directly load a dataset with rationales:

In [13]:
hatexdata = bench.load_dataset("hatexplain")

Reusing dataset hatexplain (/Users/eliana/.cache/huggingface/datasets/hatexplain/plain_text/1.0.0/df474d8d8667d89ef30649bf66e9c856ad8305bef4bc147e8e31cbdf1b8e0249)
100%|██████████████████████████████████████████████| 3/3 [00:00<00:00, 614.01it/s]


Here is an example of a text and its human rationale:

In [14]:
hatexdata[2]["text"], hatexdata[2]["rationale"]

('<user> <user> why are you repeating yourself are you a little retarded',
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1])

We can generate explanations for the dataset

In [15]:
dataset_explanations = bench.generate_dataset_explanations(hatexdata, n = 20)


And store them

In [16]:
import pathlib
import os
output_dir = os.path.join(os.getcwd(), "explanation_output", "hatexplain")
pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True) 

bench.store_dataset_explanations(dataset_explanations, os.path.join(output_dir, "dataset_explanations"))

We can then evaluate the explainers at the dataset level

In [17]:
dataset_average_evaluation_scores = bench.evaluate_dataset_explanations(dataset_explanations)

Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00,  8.29it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00,  7.53it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00,  7.07it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00,  7.64it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 12.75it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 11.60it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 11.18it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 10.42it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 10.64it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 11.80it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 11.46it/s]
Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 12.04it/s]
Eval

and visualize the evaluation results

In [19]:
bench.show_dataset_evaluation_table(dataset_average_evaluation_scores)

Explainer,aopc_compr,aopc_suff,taucorr_loo,auprc_plau,token_f1_plau,token_iou_plau
Partition SHAP,0.19292,-0.090307,0.288564,0.783211,0.364985,0.231186
Gradient,-0.00305,0.187018,-0.13058,0.61584,0.241805,0.146576
Integrated Gradient,0.054577,0.152678,0.014748,0.669705,0.291573,0.194533
LIME,0.183653,-0.010309,0.173862,0.687404,0.302157,0.197393


# Additional

**Interface to individual explainers**

You can also use individual explainers through an object-oriented interface.

In [20]:
from ferret import SHAPExplainer, LIMEExplainer

In [21]:
exp = LIMEExplainer(model, tokenizer)
exp("hello my friend")

Explanation(text='hello my friend', tokens=['[CLS]', 'hello', 'my', 'friend', '[SEP]'], scores=[-0.03134695829922635, -0.08080184940948797, -0.06166685328207583, 0.00811304319080102, 0.011299155496635654], explainer='LIME', target=1)

In [22]:
exp = SHAPExplainer(model, tokenizer)
exp("hello my friend")

Explanation(text='hello my friend', tokens=['[CLS]', 'hello', 'my', 'friend', '[SEP]'], scores=array([ 0.        , -0.16196123, -0.09804473, -0.13852775,  0.        ]), explainer='Partition SHAP', target=1)

In [23]:
exp = SHAPExplainer(model, tokenizer)
e = exp("I love your style!", target = 0)

In [24]:
bench.show_table([e])

Token,[CLS],I,love,your,style,!,[SEP]
Partition SHAP,0.0,0.067283,0.113341,-0.050739,0.088507,-0.01275,0.0


We can also compute an individual evaluation measure:

In [30]:
from ferret import AOPC_Comprehensiveness_Evaluation
from ferret.evaluators import Model

aopc_compr_eval = AOPC_Comprehensiveness_Evaluation(Model(model), tokenizer)

In [31]:
aopc_compr_eval.compute_evaluation(e, target = 0)

Evaluation(name='aopc_compr', score=0.2487231)

In [34]:
ev = bench.evaluate_explanation(e, target = 0)
bench.show_evaluation_table([ev])

Evaluator: 100%|████████████████████████████████████| 6/6 [00:00<00:00, 22.92it/s]


Explainer,[CLS],I,love,your,style,!,[SEP],aopc_compr,aopc_suff,taucorr_loo
Partition SHAP,0.0,0.067283,0.113341,-0.050739,0.088507,-0.01275,0.0,0.248723,-0.14359,0.6
