### Ferret Explainability Tutorial

This notebook is based on the tutorial provided by the package authors [here](https://github.com/g8a9/ferret).

As with the other notebooks in this repo, we apply it to a set of fictional patient safety examples, but the same steps can easily be applied to any classification task.

In [None]:
# imports for explaining a Hugging Face classifier with ferret

import sys

import numpy as np
import torch
from ferret import Benchmark, LIMEExplainer, SHAPExplainer
from IPython.display import display
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# add the parent directory to sys.path so the local `models` package can be imported
sys.path.append("../")
from models.transformer_plms.hf_transformer_classifier import IncidentModel

__WARNING__ This framework is designed and tested to work with the `AutoModelForSequenceClassification` class, so load your model in that style to avoid issues.

In [None]:
# set model dir for trained models
model_dir = "./model/"

# set cache dir for transformer models
cache_dir = ".cache_dir"

In [None]:
# load in model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    model_dir, cache_dir=cache_dir
)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

In [None]:
model

In [None]:
tokenizer

## Explain a single instance

In [None]:
bench = Benchmark(model, tokenizer)

In [None]:
# example_text = (
# "Patient was left waiting with a very high blood pressure for longer than advised. "
# "Patient seemed very agitated by the experience"
# )
example_text = "The patient fell out of bed and broke their leg"

examples = [
    "The patient fell out of bed and broke their femur",
    "The patient fell out of bed and was helped back up by a nurse",
]

In [None]:
explanations = bench.explain(example_text, target=1)

In [None]:
explanations

In [None]:
t = bench.show_table(explanations)
t

## Evaluate explanation of a single instance

Evaluating explanations with all the supported evaluators is straightforward. Remember to specify the `target` parameter to match the one used during the explanation!

Area Over the Perturbation Curve (AOPC) Comprehensiveness (`aopc_compr`), AOPC Sufficiency (`aopc_suff`) and Correlation with Leave-One-Out scores (`taucorr_loo`) are three measures of faithfulness.

- **AOPC Comprehensiveness**. Comprehensiveness measures the drop in the model probability when the relevant tokens of the explanation are removed. We measure comprehensiveness via the Area Over the Perturbation Curve by progressively considering the $k$ most important tokens, with $k$ ranging from 1 to the number of tokens (the default), and then averaging the result. The higher the value, the better the explainer is at selecting the tokens relevant to the prediction.

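- **AOPC Sufficiency**. Sufficiency captures whether the tokens in the explanation are sufficient for the model to make the prediction. As for comprehensiveness, we use the AOPC score. Here lower values are better, since they mean the explanation's tokens alone nearly preserve the original prediction.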

- **Correlation with Leave-One-Out scores**. We first compute the leave-one-out scores by measuring the prediction difference when one feature at a time is omitted. We then measure the Kendall rank correlation ($\tau$) between these scores and the explanation.
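To make the three faithfulness measures concrete, here is a minimal pure-Python sketch. It assumes a hypothetical toy scorer `toy_prob` in place of the real model's target-class probability, treats "removing" a token as deleting it from the list, and uses Kendall's tau-a as the rank correlation. ferret's own implementation may perturb inputs, handle ties and normalize differently, so treat the numbers as illustrative rather than a reproduction of `aopc_compr`, `aopc_suff` or `taucorr_loo`.

```python
from itertools import combinations

# Hypothetical toy scorer standing in for the model's probability of the
# target class: each "harm" word adds a fixed amount to a small baseline.
WEIGHTS = {"fell": 0.3, "broke": 0.4, "leg": 0.2}


def toy_prob(tokens):
    """Pseudo-probability of the target class, clipped to at most 1.0."""
    return min(1.0, 0.05 + sum(WEIGHTS.get(t, 0.0) for t in tokens))


def top_k_indices(scores, k):
    """Indices of the k highest-scoring tokens."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def aopc_comprehensiveness(tokens, scores, prob_fn):
    """Mean probability drop when the top-k tokens are removed, for k = 1..n."""
    full = prob_fn(tokens)
    drops = []
    for k in range(1, len(tokens) + 1):
        removed = set(top_k_indices(scores, k))
        kept = [t for i, t in enumerate(tokens) if i not in removed]
        drops.append(full - prob_fn(kept))
    return sum(drops) / len(drops)


def aopc_sufficiency(tokens, scores, prob_fn):
    """Mean probability drop when only the top-k tokens are kept (lower is better)."""
    full = prob_fn(tokens)
    drops = []
    for k in range(1, len(tokens) + 1):
        keep = set(top_k_indices(scores, k))
        kept = [t for i, t in enumerate(tokens) if i in keep]
        drops.append(full - prob_fn(kept))
    return sum(drops) / len(drops)


def loo_scores(tokens, prob_fn):
    """Prediction difference when each token is left out in turn."""
    full = prob_fn(tokens)
    return [full - prob_fn(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs; ties count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) / 2)


tokens = "the patient fell out of bed and broke their leg".split()
# Hypothetical explanation scores, one per token (higher = more important)
scores = [0.0, 0.1, 0.3, 0.0, 0.0, 0.05, 0.0, 0.4, 0.0, 0.2]
print(aopc_comprehensiveness(tokens, scores, toy_prob))  # ~0.83
print(aopc_sufficiency(tokens, scores, toy_prob))  # ~0.07
print(kendall_tau_a(loo_scores(tokens, toy_prob), scores))  # ~0.533
```

Because this explanation ranks the three words the toy scorer relies on highest, comprehensiveness is high and sufficiency is close to zero. Note that `scipy.stats.kendalltau` computes tau-b by default, which corrects for ties and would give a different value here.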

In [None]:
explanation_evaluations = bench.evaluate_explanations(explanations, target=1)

In [None]:
bench.show_evaluation_table(explanation_evaluations)

Given ground-truth explanations (the tokens in the passage that a human deems important to the decision), we can calculate some further metrics. The explanations are provided as a vector similar to an attention mask, with 0s for non-important tokens and 1s for important tokens.

The `Benchmark` class appears to expect a human rationale of length `len(tokenizer.encode(text)) - 2`, i.e. the number of tokens minus the BOS and EOS (special start and end) tokens.
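Rather than typing the 0/1 vector by hand, you could derive it from the tokenizer output. The sketch below uses a hypothetical helper, `make_rationale`, that marks every token matching a word you consider important. With BERT- or RoBERTa-style tokenizers, `tokenizer.tokenize(text)` returns the tokens without the special start and end tokens, so its length should match `len(tokenizer.encode(text)) - 2`. The choice of important words is illustrative.

```python
def make_rationale(tokens, important_words):
    """Return a 0/1 list with one entry per token (special tokens excluded).

    Subword prefixes used by common tokenizers ("##" for WordPiece, "Ġ" for
    byte-level BPE, "▁" for SentencePiece) are stripped before a whole-token,
    case-insensitive match. A word split into several pieces will not match,
    so mark those pieces by hand.
    """
    important = {w.lower() for w in important_words}
    rationale = []
    for tok in tokens:
        word = tok.lstrip("#").lstrip("Ġ▁").lower()
        rationale.append(1 if word in important else 0)
    return rationale


# In the notebook this would be called as, for example:
#   human_rationale = make_rationale(tokenizer.tokenize(example_text), ["fell", "broke", "leg"])
# Illustrative tokens for an uncased WordPiece tokenizer:
tokens = ["the", "patient", "fell", "out", "of", "bed", "and", "broke", "their", "leg"]
print(make_rationale(tokens, ["fell", "broke", "leg"]))  # [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]
```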

In [None]:
tokenizer.encode(example_text)

In [None]:
len(tokenizer.encode(example_text))

### Plausibility

In [None]:
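# one entry per token, excluding the special start/end tokens
n_tokens = len(tokenizer.encode(example_text)) - 2
human_rationale = [0] * n_tokens
human_rationale[1] = 1  # mark the second token (index 1) as important
explanation_evaluations = bench.evaluate_explanations(
    explanations, target=1, human_rationale=human_rationale, top_k_rationale=1
)
bench.show_evaluation_table(explanation_evaluations)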

Plausibility evaluates how well the explanation agrees with the human rationale. We evaluate plausibility via the Area Under the Precision Recall curve (AUPRC) (`auprc_plau`), the token-level F1 score (`token_f1_plau`) and the average token-level Intersection-Over-Union (IOU) (`token_iou_plau`).

- **Area Under the Precision Recall curve (AUPRC)** is computed by sweeping a threshold over token scores.
- **Token-level F1 score** and **average Intersection-Over-Union** consider discrete rationales. We derive a discrete rationale by taking the tokens with the top-k scores; in the example above, k is set to 1.
- **Token-level F1 score** is the F1 score derived from token-level precision and recall.
- **Intersection-Over-Union (IOU)** is the number of tokens shared by the discrete explanation and the human rationale divided by the number of tokens in their union.

When human rationales are available for the whole dataset, k is set to the average rationale length (as in ERASER).
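The plausibility measures can be sketched in a few lines of plain Python. This is an illustration under assumptions, not ferret's implementation: the hypothetical `discretize_top_k` keeps the k highest-scoring tokens, and `auprc` sweeps a threshold over the distinct token scores and sums precision times the gain in recall, a step-wise area similar to scikit-learn's `average_precision_score`. The scores and human rationale are made up for the example.

```python
def discretize_top_k(scores, k):
    """1 for the k highest-scoring tokens, 0 elsewhere."""
    top = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return [1 if i in top else 0 for i in range(len(scores))]


def token_f1(pred, gold):
    """F1 from token-level precision and recall of two 0/1 rationales."""
    tp = sum(p * g for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / sum(pred), tp / sum(gold)
    return 2 * precision * recall / (precision + recall)


def token_iou(pred, gold):
    """Tokens in both rationales divided by tokens in either."""
    inter = sum(p * g for p, g in zip(pred, gold))
    union = sum(max(p, g) for p, g in zip(pred, gold))
    return inter / union if union else 0.0


def auprc(scores, gold):
    """Step-wise area under the precision-recall curve from a threshold sweep."""
    n_pos = sum(gold)
    if n_pos == 0:
        return 0.0  # undefined without positives; 0.0 is an arbitrary choice here
    area, prev_recall = 0.0, 0.0
    for t in sorted(set(scores), reverse=True):
        pred = [1 if s >= t else 0 for s in scores]
        tp = sum(p * g for p, g in zip(pred, gold))
        precision, recall = tp / sum(pred), tp / n_pos
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


scores = [0.0, 0.1, 0.3, 0.0, 0.0, 0.05, 0.0, 0.4, 0.0, 0.2]
gold = [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]  # hypothetical human rationale: fell, broke, leg
pred = discretize_top_k(scores, k=1)  # k = 1, matching top_k_rationale=1 above
print(token_f1(pred, gold), token_iou(pred, gold))  # ~0.5 ~0.333
# with k = average rationale length (3 here) all three words are recovered
print(token_f1(discretize_top_k(scores, k=3), gold))  # 1.0
print(auprc(scores, gold))  # ~1.0: every important token outranks the rest
```

With k = 1 the single selected token is correct but two of the three human-marked tokens are missed, which is why F1 and IOU are low even though AUPRC, which looks at the full ranking, is perfect.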

**Interface to individual explainers**

You can also call the individual explainers directly through an object-oriented interface.

### LIME

In [None]:
lime_exp = LIMEExplainer(model, tokenizer)
lime_values = lime_exp(example_text, target=1)

In [None]:
lime_values

### SHAP

In [None]:
shap_exp = SHAPExplainer(model, tokenizer)
shap_values = shap_exp(example_text, target=1)

In [None]:
shap_values