# Tutorial: Sentiment Classification

In this short tutorial, we show how to use *ferret* to generate and evaluate post-hoc explanations for a sentiment classification model.

We will use `distilbert-base-uncased-finetuned-sst-2-english` as the model checkpoint.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from ferret import (
    Benchmark,
    GradientExplainer,
    IntegratedGradientExplainer,
    LIMEExplainer,
    SHAPExplainer,
)

device = (
    "cuda:0"
    if torch.cuda.is_available()
    else "cpu"
)
device

'cpu'

In [3]:
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

In [4]:
ig = IntegratedGradientExplainer(model, tokenizer, multiply_by_inputs=True)
g = GradientExplainer(model, tokenizer, multiply_by_inputs=True)
s = SHAPExplainer(model, tokenizer)
l = LIMEExplainer(model, tokenizer)

No helper provided. Using default 'text-classification' helper.


In [5]:
bench = Benchmark(model, tokenizer, explainers=[ig, g, s, l])

Overriding helper for explainer <ferret.explainers.gradient.IntegratedGradientExplainer object at 0x106bfba60>
Overriding helper for explainer <ferret.explainers.gradient.GradientExplainer object at 0x106bf92d0>
Overriding helper for explainer <ferret.explainers.shap.SHAPExplainer object at 0x106bfbd30>
Overriding helper for explainer <ferret.explainers.lime.LIMEExplainer object at 0x28b8da560>


In [6]:
text = "You are the sweatest person, I wish I had known you before."

In [7]:
# get the prediction from our model
bench.score(text)

{'NEGATIVE': 0.00674427067860961, 'POSITIVE': 0.9932557344436646}
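The two scores above sum to one, which is consistent with a softmax over the classifier's output logits. The sketch below illustrates that conversion in plain Python. It is an assumption about how such scores are typically produced, not a copy of what `bench.score` does internally, and the logits used are hypothetical values chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for (NEGATIVE, POSITIVE); not taken from the run above.
example_logits = [-2.5, 2.5]
labels = ["NEGATIVE", "POSITIVE"]

probs = dict(zip(labels, softmax(example_logits)))
print(probs)
```

A larger gap between the two logits pushes the winning class probability closer to one.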

In [8]:
# explain the positive class
exp = bench.explain(text, target="POSITIVE", normalize_scores=False)

In [9]:
# show token level explanations
bench.show_table(exp)

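| Explainer | [CLS] | you_0 | are | the | sweat | ##est | person | , | i_0 | wish | i_1 | had | known | you_1 | before | . | [SEP] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Integrated Gradient (x Input) | 0.0 | -0.15 | 0.77 | 0.0 | 0.15 | 1.75 | 0.4 | -0.12 | 0.19 | -0.29 | 0.33 | 0.19 | -0.0 | 0.52 | -0.32 | 0.02 | 0.0 |
| Gradient (x Input) | -0.08 | -0.01 | -0.11 | -0.32 | -0.06 | 0.03 | -0.13 | -0.02 | -0.61 | -0.71 | -0.37 | 0.15 | 0.15 | 0.49 | 0.13 | -0.0 | -0.04 |
| Partition SHAP | -0.0 | 0.46 | 0.26 | 0.02 | 0.0 | 0.05 | 0.05 | -0.09 | -0.03 | -0.1 | -0.02 | -0.0 | -0.0 | 0.05 | -0.07 | -0.0 | 0.0 |
| LIME | 0.0 | 0.44 | 0.23 | 0.13 | 0.06 | 0.13 | 0.03 | 0.01 | 0.07 | -0.19 | -0.02 | -0.01 | 0.1 | 0.06 | -0.02 | 0.03 | 0.0 |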
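Each row assigns one attribution score per token: positive values push the prediction toward the target class, negative values push away from it. As a minimal sketch of how to read such a row, the snippet below copies the LIME scores from the table above into plain lists and ranks the tokens. The helper name `top_supporting` is hypothetical and not part of *ferret*.

```python
# Tokens and LIME scores copied from the table above.
tokens = ["[CLS]", "you_0", "are", "the", "sweat", "##est", "person", ",",
          "i_0", "wish", "i_1", "had", "known", "you_1", "before", ".", "[SEP]"]
lime_scores = [0.0, 0.44, 0.23, 0.13, 0.06, 0.13, 0.03, 0.01,
               0.07, -0.19, -0.02, -0.01, 0.1, 0.06, -0.02, 0.03, 0.0]


def top_supporting(tokens, scores, k=3):
    """Return the k (token, score) pairs with the highest attribution."""
    pairs = sorted(zip(tokens, scores), key=lambda p: p[1], reverse=True)
    return pairs[:k]


supporting = top_supporting(tokens, lime_scores)
# The single token that pushes hardest against the target class.
most_opposing = min(zip(tokens, lime_scores), key=lambda p: p[1])

print("Most supportive tokens:", supporting)
print("Most opposing token:", most_opposing)
```

Running the same ranking on the other rows makes it easy to see where the explainers disagree about which tokens matter.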


In [10]:
# evaluate the explanations with all the supported faithfulness and plausibility metrics
evaluations = bench.evaluate_explanations(exp, target="POSITIVE")

In [11]:
# show the faithfulness metrics computed in the previous step
bench.show_evaluation_table(evaluations)

| Explainer | aopc_compr | aopc_suff | taucorr_loo |
|---|---|---|---|
| Integrated Gradient (x Input) | 0.22 | 0.13 | 0.1 |
| Gradient (x Input) | -0.0 | 0.22 | -0.19 |
| Partition SHAP | 0.8 | -0.0 | 0.38 |
| LIME | 0.66 | -0.0 | 0.61 |
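To give some intuition for the `aopc_compr` and `aopc_suff` columns, here is a rough sketch of comprehensiveness and sufficiency on a toy problem. Comprehensiveness measures how much the target probability drops when the top-ranked tokens are removed (higher is better); sufficiency measures how much it drops when only the top-ranked tokens are kept (lower is better). Everything in the snippet is an assumption made for illustration: the toy model, its weights, the removal fractions, and the function names are hypothetical, and *ferret*'s actual implementation may differ in its thresholds and in how tokens are removed.

```python
import math

# Hypothetical toy "model": each token has a fixed weight, and the
# positive-class probability is the sigmoid of the summed weights.
TOY_WEIGHTS = {"great": 2.0, "movie": 0.2, "but": -0.3, "long": -0.8}


def toy_prob(tokens):
    """Positive-class probability of the toy model for a token list."""
    total = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-total))


def top_k_indices(scores, k):
    """Indices of the k highest-scored tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def aopc(tokens, scores, fractions=(0.25, 0.5, 0.75, 1.0)):
    """Average comprehensiveness and sufficiency over several fractions."""
    full = toy_prob(tokens)
    compr, suff = [], []
    for frac in fractions:
        k = max(1, round(frac * len(tokens)))
        top = top_k_indices(scores, k)
        removed = [t for i, t in enumerate(tokens) if i not in top]
        kept = [t for i, t in enumerate(tokens) if i in top]
        compr.append(full - toy_prob(removed))  # drop after removing top-k
        suff.append(full - toy_prob(kept))      # drop when keeping only top-k
    return sum(compr) / len(compr), sum(suff) / len(suff)


tokens = ["great", "movie", "but", "long"]
faithful_scores = [2.0, 0.2, -0.3, -0.8]  # agrees with the toy weights
reversed_scores = [-2.0, -0.2, 0.3, 0.8]  # deliberately wrong ranking

good_compr, good_suff = aopc(tokens, faithful_scores)
bad_compr, bad_suff = aopc(tokens, reversed_scores)
print(f"faithful: compr={good_compr:.2f}, suff={good_suff:.2f}")
print(f"reversed: compr={bad_compr:.2f}, suff={bad_suff:.2f}")
```

In this toy setting the attribution that matches the model's weights gets the higher comprehensiveness and the lower sufficiency, which is the same pattern that separates Partition SHAP and LIME from the gradient-based explainers in the table above.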
