# Tutorial: Natural Language Inference

In this short tutorial, we show how to use *ferret* to generate and evaluate explanations from different explainers (two gradient-based approaches and LIME) on the task of Natural Language Inference (NLI).

We will use `MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli` as the model checkpoint.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from ferret import (
    Benchmark,
    GradientExplainer,
    IntegratedGradientExplainer,
    LIMEExplainer,
    SHAPExplainer,
)

device = "cuda:0" if torch.cuda.is_available() else "cpu"


In [3]:
premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
hypothesis = "The movie was good."
sample = (premise, hypothesis)

In [4]:
model_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

In [5]:
ig = IntegratedGradientExplainer(model, tokenizer, multiply_by_inputs=True)
g = GradientExplainer(model, tokenizer, multiply_by_inputs=True)
l = LIMEExplainer(model, tokenizer)

No helper provided. Using default 'text-classification' helper.
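
To build some intuition for what the two gradient-based explainers compute, here is a minimal, self-contained sketch on a toy logistic model. It is not ferret's implementation: the weights, the input, the all-zero baseline, and the function names (`toy_model`, `gradient_x_input`, `integrated_gradients`) are assumptions made up for illustration. Real explainers operate on token embeddings rather than on a three-number input.

```python
import math

# Hypothetical toy model: f(x) = sigmoid(w . x). The weights are made up.
WEIGHTS = [1.5, -2.0, 0.5]

def toy_model(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))

def toy_gradient(x):
    # Analytic gradient of sigmoid(w . x) with respect to x.
    p = toy_model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]

def gradient_x_input(x):
    # Gradient (x Input): the local gradient scaled by the input value.
    return [g * xi for g, xi in zip(toy_gradient(x), x)]

def integrated_gradients(x, baseline=None, steps=200):
    # Midpoint Riemann approximation of the gradient integrated along the
    # straight line from the baseline to x, scaled by (x - baseline).
    if baseline is None:
        baseline = [0.0] * len(x)
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for b, xi in zip(baseline, x)]
        for i, g in enumerate(toy_gradient(point)):
            totals[i] += g / steps
    return [(xi - b) * t for xi, b, t in zip(x, baseline, totals)]

x = [1.0, 0.5, 2.0]
ig_scores = integrated_gradients(x)
gxi_scores = gradient_x_input(x)

# Completeness: IG attributions sum (approximately) to f(x) - f(baseline).
gap = toy_model(x) - toy_model([0.0, 0.0, 0.0])
print(ig_scores, sum(ig_scores), gap)
print(gxi_scores)
```

In this toy setting, the integrated-gradient scores add up to the change in model output between the baseline and the input, while plain gradient times input only reflects the slope at the input itself.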


In [6]:
bench = Benchmark(model, tokenizer, task_name="nli", explainers=[ig, g, l])

Overriding helper for explainer <ferret.explainers.gradient.IntegratedGradientExplainer object at 0x7fb838729db0>
Overriding helper for explainer <ferret.explainers.gradient.GradientExplainer object at 0x7fb836dd4940>
Overriding helper for explainer <ferret.explainers.lime.LIMEExplainer object at 0x7fb836dd6770>


Let's print the `targets` attribute to see the available target classes.
You can then refer to a class either by its integer index (e.g., `2`) or by its label (e.g., `contradiction`).

In [7]:
bench.targets

{0: 'entailment', 1: 'neutral', 2: 'contradiction'}
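
The mapping between the two forms can be sketched as follows. The helper `resolve_target` is hypothetical and only illustrates the idea; it is not part of ferret's API, and the dictionary is copied from the output above.

```python
# Label mapping copied from the `bench.targets` output above.
id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}

def resolve_target(target, id2label):
    """Return the integer class index for an int or string target (hypothetical helper)."""
    if isinstance(target, int):
        if target not in id2label:
            raise ValueError(f"Unknown target index: {target}")
        return target
    # Build the reverse mapping, ignoring case in the label names.
    label2id = {label.lower(): idx for idx, label in id2label.items()}
    key = str(target).lower()
    if key not in label2id:
        raise ValueError(f"Unknown target label: {target!r}")
    return label2id[key]

print(resolve_target("contradiction", id2label))
print(resolve_target(2, id2label))
```

Both calls resolve to the same class index, so either form identifies the same class.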

Alternatively, you can run the model on a sample and inspect the returned scores to see the available classes. For NLI, the standard classes are `entailment`, `neutral`, and `contradiction`. Note, however, that different model developers may use different label names or a different number of classes.

In [8]:
# get the prediction from our model
bench.score(sample)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


{'entailment': 0.06558243185281754,
 'neutral': 0.17329974472522736,
 'contradiction': 0.7611178755760193}
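
As a small illustration, the most likely class can be read off the returned dictionary with plain Python. The values below are copied from the output above; the variable names are ours.

```python
# Scores copied from the `bench.score(sample)` output above.
scores = {
    "entailment": 0.06558243185281754,
    "neutral": 0.17329974472522736,
    "contradiction": 0.7611178755760193,
}

# The predicted class is the label with the highest probability.
predicted = max(scores, key=scores.get)

# The scores are class probabilities, so they should sum to about 1.
total = sum(scores.values())

print(predicted, round(scores[predicted], 3), round(total, 3))
```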

Let's now explain the prediction for the class *contradiction*, which is the model's most likely output and, intuitively, the correct one.

In [9]:
# explain the contradiction class
exp = bench.explain(sample, target="contradiction")


In [10]:
# show explanations
bench.show_table(exp)

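| Explainer | `▁I_0` | `▁first` | `▁thought_0` | `▁that` | `▁I_1` | `▁liked` | `▁the` | `▁movie_0` | `,` | `▁but` | `▁upon` | `▁second` | `▁thought_1` | `▁it` | `▁was_0` | `▁actually` | `▁disappointing` | `._0` | `[SEP]` | `▁The` | `▁movie_1` | `▁was_1` | `▁good` | `._1` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Integrated Gradient (x Input) | -0.02 | 0.04 | -0.0 | -0.0 | 0.01 | -0.13 | 0.0 | 0.01 | 0.0 | 0.03 | 0.03 | 0.15 | 0.05 | -0.05 | -0.06 | -0.06 | 0.02 | 0.05 | -0.04 | -0.0 | -0.02 | -0.08 | -0.09 | -0.03 |
| Gradient (x Input) | 0.04 | 0.04 | 0.06 | 0.01 | 0.01 | 0.04 | -0.02 | -0.06 | 0.02 | 0.03 | 0.01 | 0.0 | -0.0 | -0.02 | -0.03 | 0.03 | 0.05 | -0.08 | 0.04 | -0.01 | -0.04 | -0.03 | 0.03 | 0.01 |
| LIME | 0.02 | -0.01 | 0.05 | 0.05 | -0.0 | -0.06 | -0.03 | 0.0 | 0.05 | 0.03 | 0.05 | -0.03 | -0.04 | 0.04 | -0.02 | 0.0 | 0.2 | 0.01 | -0.05 | 0.01 | -0.06 | -0.07 | 0.11 | 0.02 |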
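
With 24 tokens per row, the table is easier to read once the tokens are ranked by score. The sketch below does this in plain Python for two of the rows; the token list and scores are copied from the table above, and `top_k` is a hypothetical helper, not a ferret function.

```python
# Tokens and scores copied from the explanation table above.
tokens = [
    "▁I_0", "▁first", "▁thought_0", "▁that", "▁I_1", "▁liked", "▁the",
    "▁movie_0", ",", "▁but", "▁upon", "▁second", "▁thought_1", "▁it",
    "▁was_0", "▁actually", "▁disappointing", "._0", "[SEP]", "▁The",
    "▁movie_1", "▁was_1", "▁good", "._1",
]
ig_scores = [
    -0.02, 0.04, -0.0, -0.0, 0.01, -0.13, 0.0, 0.01, 0.0, 0.03, 0.03, 0.15,
    0.05, -0.05, -0.06, -0.06, 0.02, 0.05, -0.04, -0.0, -0.02, -0.08, -0.09,
    -0.03,
]
lime_scores = [
    0.02, -0.01, 0.05, 0.05, -0.0, -0.06, -0.03, 0.0, 0.05, 0.03, 0.05, -0.03,
    -0.04, 0.04, -0.02, 0.0, 0.2, 0.01, -0.05, 0.01, -0.06, -0.07, 0.11, 0.02,
]

def top_k(tokens, scores, k=3):
    """Return the k (token, score) pairs with the highest attribution."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

print("LIME:", top_k(tokens, lime_scores))
print("Integrated Gradient:", top_k(tokens, ig_scores))
```

For the *contradiction* class, LIME gives its highest score to `▁disappointing`, while Integrated Gradient ranks `▁second` first, so the explainers do not agree on which token matters most.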


In [12]:
# evaluate explanations and show faithfulness metrics
bench.show_evaluation_table(bench.evaluate_explanations(exp, target="contradiction"))

                                                                                                                                                         

| Explainer | aopc_compr | aopc_suff | taucorr_loo |
|---|---|---|---|
| Integrated Gradient (x Input) | 0.43 | 0.54 | -0.03 |
| Gradient (x Input) | 0.73 | 0.47 | 0.39 |
| LIME | 0.6 | -0.23 | 0.53 |
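
As we understand them, the three columns are faithfulness metrics: `aopc_compr` (comprehensiveness) measures how much the target probability drops when the top-attributed tokens are removed, `aopc_suff` (sufficiency) measures the drop when only those tokens are kept, and `taucorr_loo` is the Kendall tau correlation between the attribution scores and leave-one-out scores. The sketch below illustrates these ideas on a toy model. It is not ferret's implementation: the token weights, the attribution scores, the choice of averaging over k = 1 .. n - 1, and all function names are assumptions, and the removal schedule ferret uses may differ.

```python
import math

# Hypothetical toy "model": each token has a made-up weight, and the class
# probability is the sigmoid of the summed weights of the tokens kept.
TOKEN_WEIGHTS = {"not": 2.0, "good": -1.0, "movie": 0.1, "the": 0.0}

def toy_prob(tokens):
    z = sum(TOKEN_WEIGHTS[t] for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))

def top_k_indices(scores, k):
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def aopc(tokens, scores, mode):
    # Average the probability drop over k = 1 .. len(tokens) - 1.
    full = toy_prob(tokens)
    drops = []
    for k in range(1, len(tokens)):
        top = top_k_indices(scores, k)
        if mode == "comprehensiveness":  # remove the top-k tokens
            kept = [t for i, t in enumerate(tokens) if i not in top]
        else:  # sufficiency: keep only the top-k tokens
            kept = [t for i, t in enumerate(tokens) if i in top]
        drops.append(full - toy_prob(kept))
    return sum(drops) / len(drops)

def leave_one_out(tokens):
    # Probability drop caused by removing each token on its own.
    full = toy_prob(tokens)
    return [full - toy_prob(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]

def kendall_tau(a, b):
    # Plain Kendall tau-a: (concordant - discordant) / number of pairs.
    n = len(a)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            total += (prod > 0) - (prod < 0)
    return total / (n * (n - 1) / 2)

tokens = ["the", "movie", "not", "good"]
attributions = [0.0, 0.05, 0.6, -0.3]  # made-up explanation scores

compr = aopc(tokens, attributions, "comprehensiveness")
suff = aopc(tokens, attributions, "sufficiency")
tau = kendall_tau(attributions, leave_one_out(tokens))
print(compr, suff, tau)
```

Under these definitions, a faithful explanation should yield a high comprehensiveness, a low sufficiency, and a high correlation with leave-one-out scores.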
