# Tutorial: Named Entity Recognition

In this short tutorial, we show how to use *ferret* to apply and evaluate post-hoc explanation approaches on the task of Named Entity Recognition (NER).

We will use `Babelscape/wikineural-multilingual-ner` as the model checkpoint.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

from ferret import (
    Benchmark,
    GradientExplainer,
    IntegratedGradientExplainer,
    LIMEExplainer,
    SHAPExplainer,
)

device = (
    "cuda:0"
    if torch.cuda.is_available()
    else "cpu"
)
device

'cpu'

In [3]:
model_name = "Babelscape/wikineural-multilingual-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name).to(device)

In [4]:
ig = IntegratedGradientExplainer(model, tokenizer, multiply_by_inputs=True)
g = GradientExplainer(model, tokenizer, multiply_by_inputs=True)
s = SHAPExplainer(model, tokenizer)
l = LIMEExplainer(model, tokenizer)

No helper provided. Using default 'text-classification' helper.


In [5]:
bench = Benchmark(model, tokenizer, explainers=[ig, g, s, l], task_name="ner")

Overriding helper for explainer <ferret.explainers.gradient.IntegratedGradientExplainer object at 0x104073a30>
Overriding helper for explainer <ferret.explainers.gradient.GradientExplainer object at 0x104072b90>
Overriding helper for explainer <ferret.explainers.shap.SHAPExplainer object at 0x104073c40>
Overriding helper for explainer <ferret.explainers.lime.LIMEExplainer object at 0x28bb82fb0>


In [6]:
example = "My name is John and I live in New York"

In [7]:
bench.score(example, return_dict=True)

{0: ('[CLS]',
  {'O': 0.9994916915893555,
   'B-PER': 2.6597157557262108e-05,
   'I-PER': 0.00017279450548812747,
   'B-ORG': 1.2901900845463388e-05,
   'I-ORG': 6.348708848236129e-05,
   'B-LOC': 9.968533049686812e-06,
   'I-LOC': 0.00011104693840024993,
   'B-MISC': 1.612466985534411e-05,
   'I-MISC': 9.544858039589599e-05}),
 1: ('My',
  {'O': 0.9999566078186035,
   'B-PER': 8.589893695898354e-06,
   'I-PER': 5.9028357100032736e-06,
   'B-ORG': 3.8377056625904515e-06,
   'I-ORG': 2.641122591739986e-06,
   'B-LOC': 3.42381417794968e-06,
   'I-LOC': 3.109362978648278e-06,
   'B-MISC': 7.962959898577537e-06,
   'I-MISC': 7.821053259249311e-06}),
 2: ('name',
  {'O': 0.9999486207962036,
   'B-PER': 2.657777713466203e-06,
   'I-PER': 8.545751370547805e-06,
   'B-ORG': 1.2938110103277722e-06,
   'I-ORG': 6.107018180045998e-06,
   'B-LOC': 8.132710149766353e-07,
   'I-LOC': 6.332852080959128e-06,
   'B-MISC': 2.279778982483549e-06,
   'I-MISC': 2.3321948901866563e-05}),
 3: ('is',
  {'O': 
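
As the output above shows, `bench.score(..., return_dict=True)` returns a dictionary that maps each token position to a `(token, label probabilities)` pair. A minimal sketch of how one might reduce such a dictionary to a single predicted label per token follows. The `scores` literal copies the first three entries of the output above, with probabilities rounded and only a few labels kept (so they do not sum to one), and `top_labels` is a hypothetical helper, not part of *ferret*.

```python
# Illustrative subset of the structure printed above:
# position -> (token, {label: probability}).
# Values are rounded copies of the printed output; only some labels are kept.
scores = {
    0: ("[CLS]", {"O": 0.99949, "B-PER": 2.66e-05, "I-LOC": 1.11e-04}),
    1: ("My", {"O": 0.99996, "B-PER": 8.59e-06, "I-LOC": 3.11e-06}),
    2: ("name", {"O": 0.99995, "B-PER": 2.66e-06, "I-LOC": 6.33e-06}),
}


def top_labels(score_dict):
    """Return (token, best_label, probability) for each position, in order."""
    result = []
    for position in sorted(score_dict):
        token, label_probs = score_dict[position]
        # Pick the label with the highest probability for this token.
        best_label = max(label_probs, key=label_probs.get)
        result.append((token, best_label, label_probs[best_label]))
    return result


for token, label, prob in top_labels(scores):
    print(f"{token:>6}  {label:<6}  {prob:.4f}")
```

In the notebook itself, the same helper could be called on the dictionary returned by `bench.score(example, return_dict=True)`, assuming it has the shape shown above.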

In [8]:
exp = bench.explain(example, target="I-LOC", target_token="York")

In [9]:
exp

[Explanation(text=['My name is John and I live in New York'], tokens=['[CLS]', 'My', 'name', 'is', 'John', 'and', 'I', 'live', 'in', 'New', 'York', '[SEP]'], scores=array([ 0.        ,  0.04970457,  0.54012624,  0.4566606 , -0.37592517,
         0.25174783, -0.42135799,  0.3426674 ,  1.91907408,  4.45603646,
         4.13728231,  0.        ]), explainer='Integrated Gradient (x Input)', target_pos_idx=6, helper_type='token-classification', target_token_pos_idx=10, target='I-LOC', target_token='York'),
 Explanation(text=['My name is John and I live in New York'], tokens=['[CLS]', 'My', 'name', 'is', 'John', 'and', 'I', 'live', 'in', 'New', 'York', '[SEP]'], scores=array([ 0.00748804, -0.00539828,  0.01949534, -0.02977793,  0.00305857,
        -0.05528358,  0.03068757,  0.06589644, -0.07422812, -0.04219029,
        -0.04760945, -0.01724538], dtype=float32), explainer='Gradient (x Input)', target_pos_idx=6, helper_type='token-classification', target_token_pos_idx=10, target='I-LOC', target

In [10]:
bench.show_table(exp)

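| Explainer | [CLS] | My | name | is | John | and | I | live | in | New | York | [SEP] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Integrated Gradient (x Input) | 0.0 | 0.05 | 0.54 | 0.46 | -0.38 | 0.25 | -0.42 | 0.34 | 1.92 | 4.46 | 4.14 | 0.0 |
| Gradient (x Input) | 0.01 | -0.01 | 0.02 | -0.03 | 0.0 | -0.06 | 0.03 | 0.07 | -0.07 | -0.04 | -0.05 | -0.02 |
| Partition SHAP | 0.0 | 0.0 | 0.0 | 0.0 | -0.0 | 0.0 | -0.07 | 0.33 | 0.28 | 0.39 | 0.07 | 0.0 |
| LIME | 0.0 | 0.1 | 0.07 | 0.01 | 0.03 | -0.0 | -0.03 | 0.08 | 0.16 | 0.28 | 0.35 | 0.0 |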
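
To read a row of the table above, it can help to rank the tokens by attribution. The sketch below does this for the Integrated Gradient (x Input) row, using the rounded scores copied from the table. `rank_tokens` is a hypothetical helper, not a *ferret* function.

```python
# Tokens and rounded Integrated Gradient (x Input) scores, copied from the table.
tokens = ["[CLS]", "My", "name", "is", "John", "and", "I", "live", "in", "New", "York", "[SEP]"]
ig_scores = [0.0, 0.05, 0.54, 0.46, -0.38, 0.25, -0.42, 0.34, 1.92, 4.46, 4.14, 0.0]


def rank_tokens(tokens, scores, k=3):
    """Return the k (token, score) pairs with the largest attribution scores."""
    pairs = list(zip(tokens, scores))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


print(rank_tokens(tokens, ig_scores))
# [('New', 4.46), ('York', 4.14), ('in', 1.92)]
```

For the target token "York" and target label `I-LOC`, this explainer assigns the largest scores to "New", "York", and "in".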


In [11]:
evaluations = bench.evaluate_explanations(exp)

NER does not support token removal. 'remove_tokens' set to False
NER does not support token removal. 'remove_tokens' set to False


In [12]:
bench.show_evaluation_table(evaluations)

| Explainer | aopc_compr | aopc_suff | taucorr_loo |
|---|---|---|---|
| Integrated Gradient (x Input) | 1.0 | 1.0 | -0.32 |
| Gradient (x Input) | 0.99 | 1.0 | -0.12 |
| Partition SHAP | 1.0 | 1.0 | -0.44 |
| LIME | 1.0 | 1.0 | -0.29 |
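
The `taucorr_loo` column name suggests a Kendall tau rank correlation between an explainer's scores and leave-one-out importance scores. That reading is an assumption here, so check the *ferret* documentation for the exact definition. The sketch below illustrates the idea with a plain-Python Kendall tau (the tau-a variant, which ignores ties). The attribution values are four of the Integrated Gradient scores from the earlier table, while `loo_scores` is hypothetical data made up for illustration, and `kendall_tau` is not *ferret*'s implementation.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # A pair is concordant when both sequences order it the same way.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Rounded attribution scores for "My", "name", "is", "John" (from the table).
attributions = [0.05, 0.54, 0.46, -0.38]
# HYPOTHETICAL leave-one-out importances, invented for this illustration.
loo_scores = [0.3, 0.1, 0.2, -0.1]

print(kendall_tau(attributions, loo_scores))
```

A value near 1 would mean the explainer ranks tokens the same way as the leave-one-out scores, a value near -1 the opposite way, and a value near 0 means the two rankings are unrelated.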
