In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from ferret import Benchmark
import numpy as np


In [3]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("g8a9/bert-base-cased_ami18")

# Instance-wise evaluation

## Explain a single instance

The fastest way to get started with *ferret* is using the Benchmark interface class.

In [4]:
bench = Benchmark(model, tokenizer)

Extracting post-hoc explanations with all the supported methods and standard parameters is as easy as:

In [5]:
explanations = bench.explain("I love your style!", target=0)

In [6]:
explanations

[Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=array([ 0.        ,  0.20228158,  0.34075237, -0.15254306,  0.2660927 ,
        -0.03833029,  0.        ]), explainer='Partition SHAP', target=0),
 Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=array([ 0.04114738,  0.18552338, -0.05361292,  0.00676673,  0.27143413,
        -0.28780742, -0.15370803]), explainer='LIME', target=0),
 Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=tensor([0.0399, 0.0678, 0.1425, 0.1307, 0.3435, 0.1509, 0.1247]), explainer='Gradient', target=0),
 Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=tensor([-1.3832e-04,  1.3322e-01, -2.9708e-01,  6.5463e-02, -1.0162e-01,
          3.9932e-01,  3.1669e-03]), explainer='Gradient (x Input)', target=0),
 Explanation(text='I love yo

Let's visualize the results.

In [7]:
t = bench.show_table(explanations)
t

Token,I,love,your,style,!
Partition SHAP,0.2,0.34,-0.15,0.27,-0.04
LIME,0.19,-0.05,0.01,0.27,-0.29
Gradient,0.07,0.14,0.13,0.34,0.15
Gradient (x Input),0.13,-0.3,0.07,-0.1,0.4
Integrated Gradient,-0.2,-0.25,-0.08,0.11,-0.22
Integrated Gradient (x Input),0.02,0.18,0.08,0.18,0.55


## Evaluate explanation of a single instance

Evaluating explanations with all the supported evaluators is just as easy:

In [8]:
explanation_evaluations = bench.evaluate_explanations(explanations, target=0)

Let's visualize the results.

In [9]:
bench.show_evaluation_table(explanation_evaluations)

Explainer,aopc_compr,aopc_suff,taucorr_loo
Partition SHAP,0.25,-0.14,0.6
LIME,0.19,-0.11,0.6
Gradient,0.04,-0.04,0.2
Gradient (x Input),-0.16,0.25,-0.6
Integrated Gradient,0.26,-0.07,0.2
Integrated Gradient (x Input),-0.04,0.04,0.0


Area Over the Perturbation Curve (AOPC) comprehensiveness (aopc_compr), AOPC sufficiency (aopc_suff) and correlation with leave-one-out scores (taucorr_loo) are three measures of faithfulness.

**AOPC comprehensiveness**. Comprehensiveness measures the drop in the model's predicted probability when the tokens that the explanation marks as relevant are removed. We measure it via the Area Over the Perturbation Curve: we progressively remove the $k$ most important tokens, with $k$ ranging from 1 to the number of tokens (the default), and average the resulting drops. The higher the value, the better the explainer is at selecting the tokens relevant to the prediction.

**AOPC sufficiency**. Sufficiency captures whether the tokens in the explanation are, on their own, sufficient for the model to make the prediction. As with comprehensiveness, we report it as an AOPC score.
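
The two AOPC measures can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ferret's implementation: `toy_prob` and `TOY_WEIGHTS` are hypothetical stand-ins for the classifier, and sufficiency is taken, following the ERASER convention, as the probability drop when only the top-$k$ tokens are kept (so lower values indicate a more sufficient explanation).

```python
import math

# Hypothetical stand-in for a classifier: the target-class probability is a
# logistic function of fixed per-token weights (this is NOT the ferret model).
TOY_WEIGHTS = {"I": 0.2, "love": 1.5, "your": -0.3, "style": 0.8, "!": 0.0}


def toy_prob(tokens):
    return 1.0 / (1.0 + math.exp(-sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)))


def top_k_indices(scores, k):
    """Indices of the k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def aopc(tokens, scores, prob_fn, keep_top):
    """Average over k = 1..len(tokens) of p(full input) - p(perturbed input).

    keep_top=False removes the top-k tokens (comprehensiveness);
    keep_top=True keeps only the top-k tokens (sufficiency).
    """
    full = prob_fn(tokens)
    drops = []
    for k in range(1, len(tokens) + 1):
        top = top_k_indices(scores, k)
        perturbed = [t for i, t in enumerate(tokens) if (i in top) == keep_top]
        drops.append(full - prob_fn(perturbed))
    return sum(drops) / len(drops)


tokens = ["I", "love", "your", "style", "!"]
scores = [0.2, 0.34, -0.15, 0.27, -0.04]  # Partition SHAP row of the table above

compr = aopc(tokens, scores, toy_prob, keep_top=False)
suff = aopc(tokens, scores, toy_prob, keep_top=True)
print(f"comprehensiveness={compr:.3f}  sufficiency={suff:.3f}")
```

Because the toy model differs from the fine-tuned BERT classifier, the printed values are not expected to match the table; only the procedure is meant to carry over.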

**Correlation with leave-one-out scores**. We first compute leave-one-out scores as the difference in the prediction when one feature at a time is omitted. We then measure the Kendall rank correlation (tau) between these scores and the explanation scores.
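
As a rough sketch of this procedure, the snippet below computes leave-one-out scores with a hypothetical toy classifier (`toy_prob`, not the ferret model) and correlates them with an explanation. The metric name `taucorr_loo` suggests Kendall's tau, so the sketch uses a plain pure-Python tau (tau-a, without tie correction); treat that choice as an assumption rather than a description of ferret's internals.

```python
import math

# Hypothetical toy classifier: logistic function of fixed per-token weights.
TOY_WEIGHTS = {"I": 0.2, "love": 1.5, "your": -0.3, "style": 0.8, "!": 0.0}


def toy_prob(tokens):
    return 1.0 / (1.0 + math.exp(-sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)))


def leave_one_out(tokens, prob_fn):
    """Prediction drop caused by omitting each token in turn."""
    full = prob_fn(tokens)
    return [full - prob_fn(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    balance = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            balance += (prod > 0) - (prod < 0)
    return balance / (n * (n - 1) / 2)


tokens = ["I", "love", "your", "style", "!"]
scores = [0.2, 0.34, -0.15, 0.27, -0.04]  # Partition SHAP row of the table above

loo = leave_one_out(tokens, toy_prob)
tau = kendall_tau(loo, scores)
print(f"tau = {tau:.2f}")
```

With five tokens there are ten token pairs, so a tau of this kind can only take values in steps of 0.2.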

### Plausibility

We can also specify a human rationale (here, a binary mask over the five tokens that marks only "love") and evaluate plausibility:

In [10]:
explanation_evaluations = bench.evaluate_explanations(
    explanations, target=0, human_rationale=[0, 1, 0, 0, 0], top_k_rationale=1
)
bench.show_evaluation_table(explanation_evaluations)

Explainer,aopc_compr,aopc_suff,taucorr_loo,auprc_plau,token_f1_plau,token_iou_plau
Partition SHAP,0.25,-0.14,0.6,1.0,1.0,1.0
LIME,0.19,-0.11,0.6,0.1,0.0,0.0
Gradient,0.04,-0.04,0.2,0.17,0.0,0.0
Gradient (x Input),-0.16,0.25,-0.6,0.1,0.0,0.0
Integrated Gradient,0.26,-0.07,0.2,0.1,0.0,0.0
Integrated Gradient (x Input),-0.04,0.04,0.0,0.17,0.0,0.0


Plausibility evaluates how well the explanations agree with human rationales. We evaluate plausibility via the
Area Under the Precision-Recall Curve (AUPRC) (auprc_plau), the token-level F1 score (token_f1_plau), and the average Intersection-Over-Union (IOU) at the token level (token_iou_plau).


**Area Under the Precision Recall curve (AUPRC)** is computed by sweeping a threshold over token scores.
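
One common way to summarize such a threshold sweep is average precision, sketched below in pure Python. This is an illustrative assumption: the library may integrate the curve differently (for example with the trapezoidal rule), so the values need not match the `auprc_plau` column. The function name `average_precision` is hypothetical.

```python
def average_precision(scores, rationale):
    """Step-wise area under the precision-recall curve.

    Tokens are ranked by score; each rationale token contributes the
    precision at the cutoff where it is first included. Ties are broken
    by token position and are not treated specially.
    """
    n_pos = sum(rationale)
    if n_pos == 0:
        raise ValueError("rationale must mark at least one token")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if rationale[i]:
            hits += 1
            total += hits / rank
    return total / n_pos


rationale = [0, 1, 0, 0, 0]  # only "love" is marked by the human rationale
shap = [0.2, 0.34, -0.15, 0.27, -0.04]  # Partition SHAP row (2-decimal values)
grad = [0.07, 0.14, 0.13, 0.34, 0.15]   # Gradient row (2-decimal values)

ap_shap = average_precision(shap, rationale)
ap_grad = average_precision(grad, rationale)
print(ap_shap, ap_grad)
```

Partition SHAP ranks "love" first, so every cutoff that includes it has perfect precision; Gradient ranks it third, so precision is only 1/3 when it is first recovered.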

The token-level F1 score and the average Intersection-Over-Union consider discrete rationales.
We derive a discrete rationale from an explanation by taking its top-K scoring tokens. In this example, K is set to 1.*

**Token-level F1 score** is the F1 score computed from token-level precision and recall.
**Intersection-Over-Union (IOU)** is the number of tokens shared by the predicted and human rationales divided by the number of tokens in their union.

*When human rationales are available for the whole dataset, K is set to the average rationale length (as in ERASER).
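
The discretization and the two discrete measures can be sketched as follows. The helper names (`top_k_rationale`, `token_f1`, `token_iou`) are hypothetical and chosen for illustration; they are not ferret functions.

```python
def top_k_rationale(scores, k):
    """Binary mask marking the k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(order[:k])
    return [1 if i in chosen else 0 for i in range(len(scores))]


def token_f1(pred, gold):
    """F1 score from token-level precision and recall of two binary masks."""
    tp = sum(p and g for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision = tp / sum(pred)
    recall = tp / sum(gold)
    return 2 * precision * recall / (precision + recall)


def token_iou(pred, gold):
    """Size of the intersection of the two masks divided by their union."""
    inter = sum(p and g for p, g in zip(pred, gold))
    union = sum(p or g for p, g in zip(pred, gold))
    return inter / union if union else 0.0


human = [0, 1, 0, 0, 0]  # only "love" is marked
shap = [0.2, 0.34, -0.15, 0.27, -0.04]  # Partition SHAP row (2-decimal values)
lime = [0.19, -0.05, 0.01, 0.27, -0.29]  # LIME row (2-decimal values)

for name, scores in [("Partition SHAP", shap), ("LIME", lime)]:
    pred = top_k_rationale(scores, k=1)
    print(name, token_f1(pred, human), token_iou(pred, human))
```

With K = 1, Partition SHAP selects "love" and scores 1.0 on both measures, while LIME selects "style" and scores 0.0, which is consistent with the corresponding rows of the table above.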

# Sample-wise evaluation

We can directly load a dataset with rationales:

In [11]:
hatexdata = bench.load_dataset("hatexplain")


Here is an example of a text and its human rationales:

In [12]:
hatexdata[2]["text"], hatexdata[2]["rationale"]

('<user> <user> why are you repeating yourself are you a little retarded',
 [[], [], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]])

We can compute and evaluate explanations for a subset of the dataset's samples.

By default, explanations and their evaluations are computed w.r.t. the predicted class. Alternatively, we can specify the target class via the parameter 'target'.

In [13]:
# Compute and average the evaluation scores over samples of one of the supported datasets
samples = range(5)
sample_evaluations =  bench.evaluate_samples(hatexdata, samples)

We can then visualize the evaluation results.

In [14]:
bench.show_samples_evaluation_table(sample_evaluations)

Explainer,aopc_compr,aopc_suff,taucorr_loo,auprc_plau,token_f1_plau,token_iou_plau
Partition SHAP,0.216063,-0.175298,0.324151,0.465577,0.142857,0.076923
LIME,0.088152,-0.163018,0.221365,0.580364,0.285714,0.166667
Gradient,-0.071673,0.043794,-0.042011,0.739728,0.357143,0.217391
Gradient (x Input),-0.118201,0.005243,-0.120899,0.623282,0.285714,0.166667
Integrated Gradient,-0.039807,-0.057299,-0.048533,0.560043,0.142857,0.076923
Integrated Gradient (x Input),-0.037819,-0.012557,-0.028941,0.733529,0.214286,0.12
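
Assuming the table reports, for each explainer, the mean of every metric over the selected samples (as the code comment above suggests), the aggregation step could look like the sketch below. The function `average_evaluations` and the numbers in `per_sample` are made up for illustration; they are neither ferret API nor real results.

```python
from collections import defaultdict


def average_evaluations(per_sample):
    """Average metric scores per explainer.

    per_sample: one {explainer: {metric: score}} dict for each sample.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for sample in per_sample:
        for explainer, metrics in sample.items():
            for metric, score in metrics.items():
                sums[explainer][metric] += score
                counts[explainer][metric] += 1
    return {
        explainer: {m: sums[explainer][m] / counts[explainer][m] for m in sums[explainer]}
        for explainer in sums
    }


# Made-up per-sample scores for two samples (illustrative only).
per_sample = [
    {"LIME": {"aopc_compr": 0.2, "aopc_suff": -0.1}},
    {"LIME": {"aopc_compr": 0.4, "aopc_suff": -0.3}},
]
averaged = average_evaluations(per_sample)
print(averaged)
```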


# Additional

**Interface to individual explainers**

You can also use individual explainers through an object-oriented interface.

In [15]:
from ferret import SHAPExplainer, LIMEExplainer

In [16]:
exp = LIMEExplainer(model, tokenizer)
exp("hello my friend")

Explanation(text='hello my friend', tokens=['[CLS]', 'hello', 'my', 'friend', '[SEP]'], scores=[-0.03135305106349698, -0.08081952606570875, -0.06169874791393701, 0.008084332170415784, 0.011276823282482272], explainer='LIME', target=1)

In [17]:
exp = SHAPExplainer(model, tokenizer)
exp("hello my friend")

Explanation(text='hello my friend', tokens=['[CLS]', 'hello', 'my', 'friend', '[SEP]'], scores=array([ 0.        , -0.16196124, -0.09804487, -0.13852772,  0.        ]), explainer='Partition SHAP', target=1)

In [18]:
exp = SHAPExplainer(model, tokenizer)
e = exp("I love your style!", target=0)

In [19]:
bench.show_table([e])

Token,I,love,your,style,!
Partition SHAP,0.07,0.11,-0.05,0.09,-0.01


We can also compute a single evaluation measure for an explanation.

In [20]:
from ferret import AOPC_Comprehensiveness_Evaluation
from ferret.evaluators import Model

aopc_compr_eval = AOPC_Comprehensiveness_Evaluation(Model(model), tokenizer)

In [21]:
aopc_compr_eval.compute_evaluation(e, target=0)

Evaluation(name='aopc_compr', score=0.24872339)

In [22]:
ev = bench.evaluate_explanation(e, target=0)
bench.show_evaluation_table([ev])

Explainer,aopc_compr,aopc_suff,taucorr_loo
Partition SHAP,0.25,-0.14,0.6
