# SHAP

A positive SHAP value for a token means that this token increases the likelihood of the positive class, while a negative SHAP value means it decreases that likelihood.

In [1]:
import os
import shap
import transformers
import torch

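# Note: a `!export CUDA_VISIBLE_DEVICES=3` shell command runs in a throwaway subshell
# and does not affect this kernel, so the GPU is selected explicitly below.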

device = torch.device('cuda:3' if torch.cuda.is_available() else 'cpu')
device

Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)


device(type='cuda', index=3)
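
Shell commands prefixed with `!` run in a subshell, so an `export` there cannot change the environment of the notebook kernel. A minimal sketch of one alternative is to set the variable from Python before any library initializes CUDA (safest: before `import torch`). The GPU index 3 is taken from the cell above. The name `device_name` is only illustrative.

```python
import os

# Assumption: physical GPU 3 is the one to use, as in the cell above.
# This has to run before CUDA is initialized, i.e. ideally before `import torch`.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

# With a single GPU visible, that GPU is renumbered to index 0,
# so the device string becomes "cuda:0" rather than "cuda:3".
device_name = "cuda:0"
print(os.environ["CUDA_VISIBLE_DEVICES"], device_name)
```

With this approach the rest of the notebook would use `torch.device(device_name)` instead of `'cuda:3'`.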

In [2]:
# load fine-tuned model
from transformers import AutoTokenizer, AutoModelForSequenceClassification

PROJECT_DIR = os.path.abspath(os.path.join(os.getcwd(), os.pardir))
MODELS_DIR = os.path.join(PROJECT_DIR, "classification/models")

path_to_model = os.path.join(MODELS_DIR, "distilbert-base-uncased_13091207")

model = AutoModelForSequenceClassification.from_pretrained(path_to_model)
tokenizer = AutoTokenizer.from_pretrained(path_to_model)

model.to(device)
model.config.label2id

{'LABEL_0': 0, 'LABEL_1': 1}

In [3]:
from transformers import TextClassificationPipeline

pipe = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    top_k=None,  # return scores for all labels, not only the top one
    device=device,  # run the pipeline on the same device as the model
)

# `return_all_scores` is now deprecated,  if want a similar funcionality use `top_k=None` instead of `return_all_scores=True` or `top_k=1` instead of `return_all_scores=False`.

In [4]:
sample_text = "I am in Finland"
sample_text1 = "seahorses are cool"

from_train = "When someone is learning about science, they are likely to become more educated about scientific concepts and principles. This can lead to excitement as they discover new ideas and understand how the world works. They may also conduct experiments to test their understanding and gain hands-on experience. Accidents are possible during experiments, but they are not a necessary or desirable outcome of learning about science. Loss of interest is also not a typical result of learning about science, as many people find the subject fascinating and engaging. Therefore, the best answer is E. become educated." # gold label is true

from_test = "First, we need to identify the geographic feature mentioned in the question. The only option that fits this description is 'river delta,' which refers to the area where a river meets the ocean and deposits sediment, creating a triangular-shaped landform. \n\nNext, we need to consider which region of the U.S. is known for having many swamps. The answer is 'southern U.S.,' which includes states like Louisiana, Florida, and Georgia. \n\nPutting these two pieces of information together, we can conclude that the correct answer is A. river delta, as many swamps in the southern U.S. are located in the deltas of major rivers like the Mississippi and the Florida Everglades." # gold is true

from_test2 = "First, we know that Sam didn't like the people he met while traveling. This means he wants to distance himself from them. \n\nOption A, coming home, is a possibility. If Sam is really unhappy with the people he's met, he may decide to cut his trip short and return home. \n\nOption B, taking an airplane, could also work. Sam could fly to a different location and start fresh with new people. \n\nOption C, learning from each other, doesn't really address the issue of Sam not liking the people he's met. It's possible that he could learn something from them, but it doesn't solve the problem of him wanting to get away from them. \n\nOption D, going out, could be a way for Sam to meet new people and potentially find some that he likes better. However, it doesn't necessarily mean he'll be able to completely distance himself from the people he didn't like. \n\nOption E, making art, doesn't really address the issue either. While it could be a way for Sam to express himself and potentially meet new people, it doesn't solve the problem of him not liking the people he's currently with. \n\nBased on these options, the best answer would be either A or B, depending on how much Sam wants to continue traveling. If he's really unhappy, he may choose to come home. If he still wants to travel, he could take an airplane to a new location and start fresh." # gold is false

texts = [from_train, from_test, from_test2]
samples = [sample_text, sample_text1]

In [5]:
explainer = shap.Explainer(pipe, seed=1)

In [6]:
shap_values = explainer(samples)

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


In [7]:
len(shap_values)

2

In [8]:
type(shap_values), type(shap_values[0])

(shap._explanation.Explanation, shap._explanation.Explanation)

#### SHAP values

SHAP values express feature importances. Here the features are tokens.

Calling the explainer returns an `Explanation` object with these attributes:
- `values`: for each text, a 2D array in which each row corresponds to a token and each column to a class. Each entry is that token's contribution to the prediction for that class: a positive value means the token increases the likelihood of the class, while a negative value means it decreases it. Because the texts differ in length, the per-text arrays are stored in an object array.
- `base_values`: for each text, an array with one base prediction per class (the prediction that would be made without considering any of the tokens in the text).
- `data`: the tokens of each text.
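
The relationship between the per-token values and the base values can be checked by hand. The sketch below assumes the usual SHAP additivity property holds for this explainer: base value plus the sum of all token contributions gives the model output for each class. The numbers are copied from the `shap_values` output printed in this notebook for the first sample ("I am in Finland"). The variable names are illustrative.

```python
import numpy as np

# Token contributions for "I am in Finland": one row per token, one column per class
# (column 0 = LABEL_0, column 1 = LABEL_1), copied from the notebook output.
values = np.array([
    [ 0.0,         0.0       ],
    [-0.05176722,  0.05176717],
    [-0.0800604,   0.0800604 ],
    [ 0.01499557, -0.01499558],
    [-0.00300445,  0.00300446],
    [ 0.0,         0.0       ],
])
# Base prediction per class for the same sample.
base_values = np.array([0.26197895, 0.73802108])

# Additivity: base value + sum of contributions = class score for this text.
reconstructed = base_values + values.sum(axis=0)
print(reconstructed)        # roughly [0.142, 0.858]
print(reconstructed.sum())  # close to 1, since the two scores are probabilities
```

The two columns mirror each other in sign because, with two classes, whatever pushes one probability up pushes the other down by the same amount.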

In [9]:
shap_values

.values =
array([array([[ 0.        ,  0.        ],
              [-0.05176722,  0.05176717],
              [-0.0800604 ,  0.0800604 ],
              [ 0.01499557, -0.01499558],
              [-0.00300445,  0.00300446],
              [ 0.        ,  0.        ]]),
       array([[ 0.        ,  0.        ],
              [ 0.0450623 , -0.0450623 ],
              [-0.01828007,  0.01828008],
              [-0.04273662,  0.04273665],
              [-0.01962465,  0.01962464],
              [-0.07043307,  0.0704331 ],
              [ 0.        ,  0.        ]])], dtype=object)

.base_values =
array([[0.26197895, 0.73802108],
       [0.25261274, 0.74738723]])

.data =
(array(['', 'I ', 'am ', 'in ', 'Finland', ''], dtype=object), array(['', 'sea', 'horse', 's ', 'are ', 'cool', ''], dtype=object))
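
To read the output above more easily, tokens can be paired with their contributions and sorted. This is a minimal sketch using the numbers for the second sample ("seahorses are cool"). The helper `rank_tokens` is hypothetical and not part of SHAP. Column 1 is assumed to be `LABEL_1`, following the `label2id` mapping shown earlier.

```python
import numpy as np

# Tokens and contributions for "seahorses are cool", copied from the output above.
tokens = ['', 'sea', 'horse', 's ', 'are ', 'cool', '']
values = np.array([
    [ 0.0,         0.0       ],
    [ 0.0450623,  -0.0450623 ],
    [-0.01828007,  0.01828008],
    [-0.04273662,  0.04273665],
    [-0.01962465,  0.01962464],
    [-0.07043307,  0.0704331 ],
    [ 0.0,         0.0       ],
])

def rank_tokens(tokens, values, class_index):
    """Return (token, contribution) pairs, most positive contribution first."""
    contributions = values[:, class_index]
    order = np.argsort(-contributions, kind="stable")
    return [(tokens[i], float(contributions[i])) for i in order]

ranked = rank_tokens(tokens, values, class_index=1)
for token, contribution in ranked:
    print(f"{token!r:>10} {contribution:+.4f}")
```

For this sample, 'cool' contributes most toward `LABEL_1` and 'sea' is the only token that pushes away from it.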

In [13]:
import pickle as pkl

SPLITS_DIR = os.path.join(PROJECT_DIR, "classification/split_datasets/coqa")

with open(os.path.join(SPLITS_DIR, "shap_values.pkl"), "wb") as f:
    pkl.dump(shap_values, f)

In [14]:
# load shap values - sanity check
with open(os.path.join(SPLITS_DIR, "shap_values.pkl"), "rb") as f:
    shap_values = pkl.load(f)
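
Loading the file only shows that it can be unpickled. A stricter sanity check compares the restored arrays with the originals. The sketch below shows the idea on a hypothetical stand-in object with toy numbers (not results from this notebook), written to a temporary directory. For the real `Explanation`, the same comparison would be applied to `.values` and `.base_values`.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for the explanation: per-text arrays of different lengths
# plus one row of base values per text. Toy numbers for illustration only.
original = {
    "values": [
        np.array([[0.0, 0.0], [-0.05, 0.05]]),
        np.array([[0.0, 0.0], [0.04, -0.04], [-0.02, 0.02]]),
    ],
    "base_values": np.array([[0.26, 0.74], [0.25, 0.75]]),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "shap_values.pkl")
    with open(path, "wb") as f:
        pickle.dump(original, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# Compare text by text, because the per-text arrays have different shapes.
values_match = all(
    np.array_equal(a, b) for a, b in zip(original["values"], restored["values"])
)
base_match = np.array_equal(original["base_values"], restored["base_values"])
round_trip_ok = values_match and base_match
print(round_trip_ok)
```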

#### Plots

Trying out a few SHAP text plots.

In [10]:
shap.plots.text(shap_values[:,:,"LABEL_0"])

Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
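
The slice `shap_values[:,:,"LABEL_0"]` keeps all texts and all tokens but only the contributions to `LABEL_0`. As a rough sketch of what that selection means on the raw arrays, the code below picks one class column from each per-text array. The helper `select_class` is hypothetical. The numbers are copied from the `shap_values` output earlier in the notebook.

```python
import numpy as np

label2id = {'LABEL_0': 0, 'LABEL_1': 1}

# One (tokens x classes) array per text; the texts have 6 and 7 tokens.
per_text_values = [
    np.array([[0.0, 0.0], [-0.05176722, 0.05176717], [-0.0800604, 0.0800604],
              [0.01499557, -0.01499558], [-0.00300445, 0.00300446], [0.0, 0.0]]),
    np.array([[0.0, 0.0], [0.0450623, -0.0450623], [-0.01828007, 0.01828008],
              [-0.04273662, 0.04273665], [-0.01962465, 0.01962464],
              [-0.07043307, 0.0704331], [0.0, 0.0]]),
]

def select_class(per_text_values, label):
    """Keep only the column for `label` in every per-text array."""
    column = label2id[label]
    # A plain list is used because the texts have different token counts.
    return [values[:, column] for values in per_text_values]

label0_values = select_class(per_text_values, "LABEL_0")
print([v.shape for v in label0_values])
```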


In [18]:
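def score_and_visualize(text):
    """Print the pipeline's class scores for `text` and plot its token-level SHAP values."""
    # The pipeline takes a list of texts and returns one list of label scores per text
    prediction = pipe([text])
    print(prediction[0])

    # Build an explainer around the pipeline and explain this single text
    explainer = shap.Explainer(pipe)
    shap_values = explainer([text])

    shap.plots.text(shap_values)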

In [19]:
score_and_visualize(from_train)

[{'label': 'LABEL_1', 'score': 0.7286875247955322}, {'label': 'LABEL_0', 'score': 0.2713124454021454}]
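
With `top_k=None` the pipeline returns one entry per label, as in the output above. A small sketch of turning that list into a label-to-score mapping and picking the predicted label follows. The variable names are illustrative and the numbers are the ones printed above.

```python
# Pipeline output for one text, copied from the cell output above.
prediction = [
    {'label': 'LABEL_1', 'score': 0.7286875247955322},
    {'label': 'LABEL_0', 'score': 0.2713124454021454},
]

# Map each label to its score, then take the label with the highest score.
scores = {entry['label']: entry['score'] for entry in prediction}
predicted_label = max(scores, key=scores.get)

print(predicted_label, round(scores[predicted_label], 4))
```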
