# Tutorial: Sentiment Classification

In this short tutorial, we show how to use *ferret* to run and evaluate post-hoc explanation methods on a Sentiment Classification task.

We will use `distilbert-base-uncased-finetuned-sst-2-english` (DistilBERT fine-tuned on SST-2) as the model checkpoint.

In [1]:
%load_ext autoreload
%autoreload 2

In [3]:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from ferret import (
    Benchmark,
    GradientExplainer,
    IntegratedGradientExplainer,
    LIMEExplainer,
    SHAPExplainer,
)

device = (
    "cuda:0"
    if torch.cuda.is_available()
    else "cpu"
)
device

'cuda:0'

In [4]:
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)


In [5]:
ig = IntegratedGradientExplainer(model, tokenizer, multiply_by_inputs=True)
g = GradientExplainer(model, tokenizer, multiply_by_inputs=True)
s = SHAPExplainer(model, tokenizer)
l = LIMEExplainer(model, tokenizer)

No helper provided. Using default 'text-classification' helper.
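`IntegratedGradientExplainer` and `GradientExplainer` are created with `multiply_by_inputs=True`, so their scores are gradients multiplied by the inputs (the output tables label them "Integrated Gradient (x Input)" and "Gradient (x Input)"). The sketch below illustrates both ideas on a hypothetical three-feature toy model `f(x) = sigmoid(w·x + b)` with an analytic gradient; it is not ferret's implementation. In ferret the inputs are token embeddings, and we assume the per-dimension scores are then aggregated into one score per token. The zero baseline and the 100-step midpoint Riemann sum are our own choices.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy "model": f(x) = sigmoid(w . x + b) over three input features.
W = [1.5, -0.5, 2.0]
B = 0.0


def f(x):
    return sigmoid(B + sum(wi * xi for wi, xi in zip(W, x)))


def grad_f(x):
    # Analytic gradient of the toy model: df/dx_i = s * (1 - s) * w_i
    s = f(x)
    return [s * (1.0 - s) * wi for wi in W]


def gradient_x_input(x):
    # Gradient (x Input): elementwise product of the input and the gradient at the input.
    return [xi * gi for xi, gi in zip(x, grad_f(x))]


def integrated_gradients(x, baseline=None, steps=100):
    # Integrated Gradients: (x_i - x'_i) times the average gradient along the
    # straight path from the baseline x' to x, using a midpoint Riemann sum.
    if baseline is None:
        baseline = [0.0] * len(x)
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            avg_grad[i] += g / steps
    return [(xi - bi) * g for xi, bi, g in zip(x, baseline, avg_grad)]


x = [1.0, 2.0, 0.5]
gxi = gradient_x_input(x)
ig = integrated_gradients(x)
# Completeness check: IG attributions should sum to about f(x) - f(baseline).
print(sum(ig), f(x) - f([0.0, 0.0, 0.0]))
```

Integrated Gradients satisfies completeness: its attributions sum to approximately `f(x) - f(baseline)`, which the last line lets you check. Plain Gradient (x Input) carries no such guarantee, so its scores for the same input sum to a different value.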


In [6]:
bench = Benchmark(model, tokenizer, explainers=[ig, g, s, l])

Overriding helper for explainer <ferret.explainers.gradient.IntegratedGradientExplainer object at 0x7ff5ecf96800>
Overriding helper for explainer <ferret.explainers.gradient.GradientExplainer object at 0x7ff5ee8e6470>
Overriding helper for explainer <ferret.explainers.shap.SHAPExplainer object at 0x7ff374847ee0>
Overriding helper for explainer <ferret.explainers.lime.LIMEExplainer object at 0x7ff3735a1c60>


In [7]:
text = "You are the sweatest person, I wish I had known you before."

In [8]:
# get the prediction from our model
bench.score(text)

{'NEGATIVE': 0.006744264159351587, 'POSITIVE': 0.9932557344436646}
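The two scores sum to 1, which is consistent with a softmax over the model's two output logits; we assume, without having checked ferret's source, that this is how `bench.score` produces them. With two classes the softmax depends only on the logit difference, so a minimal sketch can recover that gap from the reported probabilities as log-odds and feed it back through a softmax:

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Probabilities reported by bench.score above.
p_neg, p_pos = 0.006744264159351587, 0.9932557344436646

# For two classes, softmax depends only on the logit difference (the log-odds).
logit_gap = math.log(p_pos / p_neg)
probs = softmax([0.0, logit_gap])
print(round(logit_gap, 2), probs)
```

Shifting both logits by the same constant leaves the softmax unchanged, which is why only the gap (about 5 in log-odds here) matters for the two-class output.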

In [9]:
# explain the positive class
exp = bench.explain(text, target="POSITIVE", normalize_scores=False)
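Passing `normalize_scores=False` keeps the raw attribution values, which is why the explainers in the token-level table are on different scales (Integrated Gradients reaches 1.75, while Partition SHAP and LIME stay below 0.5). One common normalization divides each score vector by its L1 norm; the sketch below assumes, but does not confirm, that ferret's `normalize_scores=True` behaves this way. `l1_normalize` is a hypothetical helper, applied here to the rounded Integrated Gradient (x Input) row from that table:

```python
def l1_normalize(scores):
    # Hypothetical normalization: divide every score by the sum of absolute values,
    # so magnitudes sum to 1 while signs and the ranking are preserved.
    norm = sum(abs(s) for s in scores)
    if norm == 0:
        return list(scores)
    return [s / norm for s in scores]


# Rounded Integrated Gradient (x Input) scores from the token-level table.
ig_scores = [-0.15, 0.77, 0.0, 0.15, 1.75, 0.4, -0.12, 0.19,
             -0.29, 0.33, 0.19, -0.0, 0.52, -0.32, 0.02]
normalized = l1_normalize(ig_scores)
print([round(s, 3) for s in normalized])
```

Because normalization is a positive rescaling, it does not change which token ranks highest (`##est` here); it only makes scores comparable across explainers.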



In [10]:
# show token-level explanations
bench.show_table(exp)

| Explainer | you_0 | are | the | sweat | ##est | person | , | i_0 | wish | i_1 | had | known | you_1 | before | . |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Integrated Gradient (x Input) | -0.15 | 0.77 | 0.0 | 0.15 | 1.75 | 0.4 | -0.12 | 0.19 | -0.29 | 0.33 | 0.19 | -0.0 | 0.52 | -0.32 | 0.02 |
| Gradient (x Input) | -0.01 | -0.11 | -0.32 | -0.06 | 0.03 | -0.13 | -0.02 | -0.61 | -0.71 | -0.37 | 0.15 | 0.15 | 0.49 | 0.13 | -0.0 |
| Partition SHAP | 0.46 | 0.26 | 0.02 | 0.0 | 0.05 | 0.05 | -0.09 | -0.03 | -0.1 | -0.02 | -0.0 | -0.0 | 0.05 | -0.07 | -0.0 |
| LIME | 0.46 | 0.25 | 0.05 | 0.2 | 0.12 | 0.08 | -0.06 | 0.01 | -0.14 | -0.01 | 0.04 | -0.05 | 0.03 | -0.0 | 0.05 |
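The Partition SHAP and LIME rows both estimate how much each token contributes to the POSITIVE score. SHAP builds on Shapley values, which average a token's marginal contribution over all coalitions of the other tokens. Exact enumeration is exponential in the number of tokens, so the Partition variant named in the table instead computes Owen values over a hierarchical grouping of tokens, a structured approximation. The sketch below computes exact Shapley values for a hypothetical three-token value function (`WEIGHTS` and `value` are illustrative stand-ins for the model, and "removing" a token simply drops it):

```python
import math
from itertools import combinations


def shapley_values(n, value):
    # Exact Shapley values by enumerating every coalition S that excludes player i:
    # phi_i = sum_S |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S))
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical value function: probability of the positive class when only the
# tokens in the coalition are kept (a toy stand-in for the real model).
WEIGHTS = [2.0, 1.0, -1.0]


def value(coalition):
    z = sum(WEIGHTS[j] for j in coalition)
    return 1.0 / (1.0 + math.exp(-z))


phi = shapley_values(len(WEIGHTS), value)
print([round(p, 3) for p in phi])
```

The resulting values satisfy efficiency: they sum to `value(all tokens) - value(no tokens)`, and the token with the negative weight receives a negative attribution.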


In [11]:
# evaluate the explanations with the supported metrics (plausibility metrics also need a human rationale, so only faithfulness metrics are computed here)
evaluations = bench.evaluate_explanations(exp, target="POSITIVE")
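`aopc_compr` (comprehensiveness) measures how much the target probability drops when the tokens an explainer ranks highest are removed, and `aopc_suff` (sufficiency) measures how much it drops when only those tokens are kept; both are averaged over several cut-offs (the "area over the perturbation curve"). Higher comprehensiveness and lower sufficiency indicate a more faithful explanation. The sketch below uses a hypothetical toy model and every cut-off k = 1..n; ferret's actual cut-offs and the way it removes tokens may differ:

```python
import math

# Hypothetical toy model: P(POSITIVE) = sigmoid(sum of weights of the kept tokens).
WEIGHTS = [2.0, 1.0, -1.0]


def prob(kept):
    return 1.0 / (1.0 + math.exp(-sum(WEIGHTS[j] for j in kept)))


def aopc_scores(attributions):
    n = len(attributions)
    full = prob(range(n))
    # Rank tokens from most to least important according to the explanation.
    ranking = sorted(range(n), key=lambda j: attributions[j], reverse=True)
    compr, suff = [], []
    for k in range(1, n + 1):
        top_k = set(ranking[:k])
        rest = [j for j in range(n) if j not in top_k]
        # Comprehensiveness: probability drop after removing the top-k tokens.
        compr.append(full - prob(rest))
        # Sufficiency: probability drop when keeping only the top-k tokens.
        suff.append(full - prob(top_k))
    # AOPC: average over all cut-offs k.
    return sum(compr) / n, sum(suff) / n


faithful = aopc_scores([2.0, 1.0, -1.0])       # ranks tokens like the model does
reversed_expl = aopc_scores([-2.0, -1.0, 1.0])  # ranks them in the opposite order
print(faithful, reversed_expl)
```

The explanation that matches the toy model's weights gets the higher comprehensiveness and the lower sufficiency, which is the pattern the LIME and Partition SHAP rows show in the evaluation table.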

In [12]:
# show the faithfulness metrics computed in the previous cell
bench.show_evaluation_table(evaluations)

| Explainer | aopc_compr | aopc_suff | taucorr_loo |
|---|---|---|---|
| Integrated Gradient (x Input) | 0.44 | 0.11 | 0.03 |
| Gradient (x Input) | -0.0 | 0.34 | -0.2 |
| Partition SHAP | 0.88 | -0.0 | 0.37 |
| LIME | 0.91 | 0.01 | 0.5 |
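`taucorr_loo` is, as we understand it, the Kendall rank correlation between an explanation's token scores and leave-one-out (LOO) scores, where a token's LOO score is the probability drop when that token alone is removed. Values near 1 mean the explanation ranks tokens the way LOO does. The sketch below uses a hypothetical toy model, P(POSITIVE) = sigmoid(sum of kept token weights), and the simple tau-a variant, which counts tied pairs as neither concordant nor discordant; ferret may use a different tie-handling variant:

```python
import math
from itertools import combinations

# Hypothetical toy model: P(POSITIVE) = sigmoid(sum of weights of the kept tokens).
WEIGHTS = [2.0, 1.0, -1.0]


def prob(kept):
    return 1.0 / (1.0 + math.exp(-sum(WEIGHTS[j] for j in kept)))


def leave_one_out(n):
    full = prob(range(n))
    # LOO score of token i = probability drop when only token i is removed.
    return [full - prob([j for j in range(n) if j != i]) for i in range(n)]


def kendall_tau_a(a, b):
    # (concordant - discordant) / number of pairs; tied pairs count as neither.
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        product = (a[i] - a[j]) * (b[i] - b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / n_pairs


loo = leave_one_out(len(WEIGHTS))
print(kendall_tau_a([2.0, 1.0, -1.0], loo), kendall_tau_a([-2.0, -1.0, 1.0], loo))
```

Under this reading, a negative value such as the one for Gradient (x Input) means its token ranking tends to disagree with the leave-one-out ranking.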
