In [1]:
%load_ext autoreload
%autoreload 2

In [12]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from ferret import Benchmark

In [3]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("g8a9/bert-base-cased_ami18")

The fastest way to get started with *ferret* is the `Benchmark` interface class, which wraps a model and its tokenizer.

In [5]:
bench = Benchmark(model, tokenizer)

Extracting post-hoc explanations with all supported methods and their default parameters is as easy as:

In [6]:
explanations = bench.explain("I love your style!")

Explainer:   0%|          | 0/4 [00:00<?, ?it/s]
`return_all_scores` is now deprecated, use `top_k=1` if you want similar functionnality
Partition explainer: 2it [00:14, 14.81s/it]               
Explainer: 100%|██████████| 4/4 [00:20<00:00,  5.11s/it]


In [8]:
explanations

[Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=array([ 0.        , -0.20228152, -0.34075298,  0.15254336, -0.26609306,
         0.03832909,  0.        ]), explainer='Partition SHAP'),
 Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=tensor([ 0.0011, -0.1456,  0.2927, -0.0879,  0.0727, -0.3694, -0.0306]), explainer='Gradient'),
 Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=tensor([ 0.1153, -0.1675,  0.0387, -0.2461,  0.1325, -0.2262,  0.0737],
        dtype=torch.float64), explainer='Integrated Gradient'),
 Explanation(text='I love your style!', tokens=['[CLS]', 'I', 'love', 'your', 'style', '!', '[SEP]'], scores=array([-0.04105946, -0.18526636,  0.05381994, -0.00660401, -0.2711323 ,
         0.28812219,  0.15399574]), explainer='LIME')]
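Each explanation pairs one score with each token, so a quick way to read one is to sort the tokens by the magnitude of their score. The sketch below is only an illustration: `Explanation` here is a stand-in namedtuple modelling the fields visible in the output above (it is not ferret's class), `rank_tokens` is a hypothetical helper, and the scores are copied from the LIME entry.

```python
from collections import namedtuple

# Stand-in for the objects printed above: only the four fields shown in the
# output are modelled. This is NOT ferret's own class.
Explanation = namedtuple("Explanation", ["text", "tokens", "scores", "explainer"])

# Values copied from the LIME entry in the output above.
lime = Explanation(
    text="I love your style!",
    tokens=["[CLS]", "I", "love", "your", "style", "!", "[SEP]"],
    scores=[-0.04105946, -0.18526636, 0.05381994, -0.00660401,
            -0.2711323, 0.28812219, 0.15399574],
    explainer="LIME",
)


def rank_tokens(explanation, top_k=3):
    """Return the top_k (token, score) pairs, largest absolute score first."""
    # float() lets the same helper accept NumPy or torch scalars as well.
    pairs = [(tok, float(s)) for tok, s in zip(explanation.tokens, explanation.scores)]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)[:top_k]


top = rank_tokens(lime)
for tok, score in top:
    print(f"{tok:>6} {score:+.4f}")
```

Since the helper only reads `tokens` and `scores`, it should also work on the entries of `explanations` returned by `bench.explain`, assuming they expose those attributes as the printed output suggests.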

Let's visualize the results.

In [9]:
t = bench.show_table(explanations)
t

| Token | [CLS] | I | love | your | style | ! | [SEP] |
|---|---|---|---|---|---|---|---|
| Partition SHAP | 0.0 | -0.202282 | -0.340753 | 0.152543 | -0.266093 | 0.038329 | 0.0 |
| Gradient | 0.001106 | -0.145644 | 0.292689 | -0.08787 | 0.072717 | -0.369399 | -0.030575 |
| Integrated Gradient | 0.11534 | -0.167506 | 0.038743 | -0.246091 | 0.132474 | -0.226191 | 0.073656 |
| LIME | -0.041059 | -0.185266 | 0.05382 | -0.006604 | -0.271132 | 0.288122 | 0.153996 |
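The rows of the table do not always agree on the sign of a token's contribution. As a rough illustration (not a ferret feature), the sketch below copies the values from the table and computes, for every pair of explainers, the fraction of non-special tokens whose scores share the same sign. The names `table`, `sign_agreement`, and `agreement` are made up for this example.

```python
from itertools import combinations

tokens = ["[CLS]", "I", "love", "your", "style", "!", "[SEP]"]

# Scores copied from the table above.
table = {
    "Partition SHAP": [0.0, -0.202282, -0.340753, 0.152543, -0.266093, 0.038329, 0.0],
    "Gradient": [0.001106, -0.145644, 0.292689, -0.08787, 0.072717, -0.369399, -0.030575],
    "Integrated Gradient": [0.11534, -0.167506, 0.038743, -0.246091, 0.132474, -0.226191, 0.073656],
    "LIME": [-0.041059, -0.185266, 0.05382, -0.006604, -0.271132, 0.288122, 0.153996],
}

# Skip [CLS] and [SEP]: Partition SHAP scores them exactly 0, which has no sign.
keep = [i for i, tok in enumerate(tokens) if tok not in ("[CLS]", "[SEP]")]


def sign_agreement(a, b):
    """Fraction of kept tokens where the two score lists have the same sign."""
    same = sum(1 for i in keep if (a[i] > 0) == (b[i] > 0))
    return same / len(keep)


agreement = {
    (x, y): sign_agreement(table[x], table[y])
    for x, y in combinations(table, 2)
}

for (x, y), frac in agreement.items():
    print(f"{x} vs {y}: {frac:.1f}")
```

On this single sentence, Gradient and Integrated Gradient agree on every non-special token, while Partition SHAP agrees with them on only one of five. One sentence is too little to draw conclusions from, but the same check can be run over a larger sample.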


**Interface to individual explainers**

You can also call each explainer on its own through an object-oriented interface.

In [13]:
from ferret import SHAPExplainer, LIMEExplainer

In [14]:
exp = LIMEExplainer(model, tokenizer)
exp("hello my friend")

Explanation(text='hello my friend', tokens=['[CLS]', 'hello', 'my', 'friend', '[SEP]'], scores=[-0.03134701957661651, -0.08080190792106838, -0.06166692253270579, 0.008113092867286986, 0.0112991057803243], explainer='LIME')

In [15]:
exp = SHAPExplainer(model, tokenizer)
exp("hello my friend")

`return_all_scores` is now deprecated, use `top_k=1` if you want similar functionnality


Explanation(text='hello my friend', tokens=['[CLS]', 'hello', 'my', 'friend', '[SEP]'], scores=array([ 0.        , -0.16196121, -0.09804483, -0.13852778,  0.        ]), explainer='Partition SHAP')