<a href="https://colab.research.google.com/github/imusicmash/explainability/blob/main/nlp_transformers_shap.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# natural language explainability with SHAP
# al nevarez

## Natural language example (transformers)

SHAP has specific support for natural language models like those in the Hugging Face transformers library. By adding coalitional rules to traditional Shapley values, we can form games that explain large modern NLP models using very few function evaluations. Using this functionality is as simple as passing a supported transformers pipeline to SHAP:



In [1]:
!pip install shap



In [2]:
import transformers
import shap

In [3]:
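# load a transformers pipeline model
# (return_all_scores=True asks for a score per class; recent transformers
# releases deprecate this argument in favor of top_k=None)
model = transformers.pipeline('sentiment-analysis', return_all_scores=True)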

# explain the model on one sample input
explainer = shap.Explainer(model)
shap_values = explainer(["What a great movie! ...if you have no taste."])

# visualize the first prediction's explanation for the POSITIVE output class
shap.plots.text(shap_values[0, :, "POSITIVE"])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


PartitionExplainer explainer: 2it [00:30, 30.45s/it]               
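
For intuition about what the explainer is estimating, here is a minimal sketch of exact Shapley values on a toy game. Everything in it is hypothetical: `shapley_values` and `toy_score` are illustrative names, and the toy scoring rule is made up rather than taken from the sentiment model. It enumerates every ordering of the players, which is only feasible for a handful of features; the coalitional rules mentioned above exist precisely to avoid this enumeration on real text.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every possible ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_score(words):
    """Hypothetical scoring rule (not the real model): 'great' adds to the
    score, and 'no' together with 'taste' subtracts from it."""
    score = 0.0
    if "great" in words:
        score += 0.6
    if "no" in words and "taste" in words:
        score -= 0.4
    return score


phi = shapley_values(["great", "no", "taste"], toy_score)
for word, contribution in phi.items():
    print(f"{word}: {contribution:+.2f}")
```

In this toy game the interaction between "no" and "taste" is split evenly between the two words, and the contributions add up to the difference between the score of the full word set and the score of the empty set.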


In [4]:
shap_values

.values =
array([[[ 0.        ,  0.        ],
        [ 0.04310006, -0.04310005],
        [-0.12085254,  0.12085254],
        [-0.740122  ,  0.74012197],
        [-0.00951404,  0.00951403],
        [ 0.04705497, -0.04705495],
        [ 0.009709  , -0.009709  ],
        [ 0.02614682, -0.02614681],
        [ 0.02606418, -0.02606419],
        [-0.07404667,  0.07404667],
        [-0.1196875 ,  0.1196875 ],
        [-0.00950683,  0.00950682],
        [ 0.27378987, -0.27378987],
        [ 0.03514242, -0.03514243],
        [ 0.02919792, -0.02919793],
        [ 0.        ,  0.        ]]])

.base_values =
array([[0.58367169, 0.41632834]])

.data =
(array(['', 'What ', 'a ', 'great ', 'movie', '! ', '.', '.', '.', 'if ',
       'you ', 'have ', 'no ', 'taste', '.', ''], dtype=object),)
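
The numbers printed above can be sanity-checked without rerunning the model. The sketch below copies the printed arrays into NumPy and checks additivity: the base value plus the sum of the per-token attributions should reconstruct the model's score for each class. The column order (`NEGATIVE` first, `POSITIVE` second) is an assumption inferred from the strongly positive attribution of `'great '` in the second column.

```python
import numpy as np

# Tokens, SHAP values, and base values copied from the output above.
tokens = ['', 'What ', 'a ', 'great ', 'movie', '! ', '.', '.', '.', 'if ',
          'you ', 'have ', 'no ', 'taste', '.', '']
values = np.array([
    [ 0.        ,  0.        ],
    [ 0.04310006, -0.04310005],
    [-0.12085254,  0.12085254],
    [-0.740122  ,  0.74012197],
    [-0.00951404,  0.00951403],
    [ 0.04705497, -0.04705495],
    [ 0.009709  , -0.009709  ],
    [ 0.02614682, -0.02614681],
    [ 0.02606418, -0.02606419],
    [-0.07404667,  0.07404667],
    [-0.1196875 ,  0.1196875 ],
    [-0.00950683,  0.00950682],
    [ 0.27378987, -0.27378987],
    [ 0.03514242, -0.03514243],
    [ 0.02919792, -0.02919793],
    [ 0.        ,  0.        ],
])
base_values = np.array([0.58367169, 0.41632834])

# Assumption: column 0 is NEGATIVE and column 1 is POSITIVE.
class_names = ["NEGATIVE", "POSITIVE"]

# Additivity: base value + sum of token attributions = score per class.
reconstructed = base_values + values.sum(axis=0)
for name, score in zip(class_names, reconstructed):
    print(f"{name}: {score:.4f}")

# Rank tokens by the magnitude of their contribution to the POSITIVE column.
order = np.argsort(-np.abs(values[:, 1]))
for idx in order[:3]:
    print(f"{tokens[idx]!r}: {values[idx, 1]:+.3f}")
```

The two reconstructed scores sum to approximately 1, which is consistent with the pipeline returning class probabilities, and the ranking gives a plain-text view of the tokens that `shap.plots.text` highlights most strongly.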

In [6]:
shap_values.shape

(1, 16, 2)

In [7]:
# prompt: what are the unique values for the 3rd component of shap_values?

# Note: this loop collects the unique tokens stored in .data (the second axis),
# not the class names on the third axis; those should be available via
# shap_values.output_names.
unique_tokens = set()
for i in range(shap_values.shape[0]):
    for j in range(shap_values.shape[1]):
        unique_tokens.add(shap_values[i,j].data)

print("Unique tokens:", unique_tokens)

Unique tokens: {'', '! ', 'What ', 'great ', 'taste', 'movie', 'no ', 'you ', '.', 'have ', 'a ', 'if '}


In [5]:
shap_values = explainer(["What a great movie! ...if you have no taste."])

# visualize the first prediction's explanation for the NEGATIVE output class
shap.plots.text(shap_values[0, :, "NEGATIVE"])

PartitionExplainer explainer: 2it [00:27, 27.38s/it]               
