# Explaining a language model for sentiment analysis

This notebook shows how we can use `shapiq` to explain the predictions of a sentiment analysis language model. To do so, we will create a custom *game* that is used for the explanation. The benchmark game resulting from this tutorial is available as `shapiq.games.SentimentClassificationGame`.

First, we need to install the required packages alongside `shapiq`. We will use a language model from the `transformers` library, which in turn relies on `torch`.

In [1]:
# Install the required packages
!pip install transformers torch



In [2]:
# Import the required libraries
import numpy as np
from transformers import pipeline

import shapiq

shapiq.__version__

'1.2.3'

### Language model
We will use `lvwerra/distilbert-imdb`, a pre-trained DistilBERT model (a distilled variant of BERT) for sentiment analysis. The model and its tokenizer are loaded with the `transformers` library.

For the test sentence below, the model predicts a **positive** sentiment. For this model (and other sentiment-analysis models), the output is a list of dictionaries, one per input text, each containing the `label` and the `score` of the prediction. The label is either `POSITIVE` or `NEGATIVE`, and the score is the probability the model assigns to that label. The tokenized sentence contains the token ids of the sentence, and the special tokens map lists the special tokens used by the model. We will need the `mask_token` later in the game.

In [3]:
# Load the model and tokenizer
classifier = pipeline(task="sentiment-analysis", model="lvwerra/distilbert-imdb")
tokenizer = classifier.tokenizer

test_sentence = "I love this movie!"
print(f"Classifier output: {classifier(test_sentence)}")

tokenized_sentence = tokenizer(test_sentence)
print(f"Tokenized sentence: {tokenized_sentence}")

special_tokens = tokenizer.special_tokens_map
print(f"Special tokens: {special_tokens}")

mask_token_id = tokenizer.mask_token_id
print(f"Mask token id: {mask_token_id}")

Device set to use mps:0


Classifier output: [{'label': 'POSITIVE', 'score': 0.9951981902122498}]
Tokenized sentence: {'input_ids': [101, 1045, 2293, 2023, 3185, 999, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
Special tokens: {'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}
Mask token id: 103


We can inspect the behavior of the model by checking the output of the classifier for different sentences and by decoding the tokenized sentences with `tokenizer.decode`. The `[CLS]` token marks the beginning of the sentence, and the `[SEP]` token marks its end. Notice that the `!` is also treated as a separate token.

In [4]:
# Test the tokenizer
decoded_sentence = tokenizer.decode(tokenized_sentence["input_ids"])
print(f"Decoded sentence: {decoded_sentence}")

# Remove the start and end tokens
tokenized_input = np.asarray(tokenizer(test_sentence)["input_ids"][1:-1])
decoded_sentence = tokenizer.decode(tokenized_input)
print(
    f"Decoded sentence: {decoded_sentence} - Tokenized input: {tokenized_input} - {len(tokenized_input)} tokens.",
)

Decoded sentence: [CLS] i love this movie! [SEP]
Decoded sentence: i love this movie! - Tokenized input: [1045 2293 2023 3185  999] - 5 tokens.


Since the start and end tokens are always present, they carry no information for our explanation and are left out. To explain this classifier, we need to model its behavior as a cooperative game.

### Treating the language model as a game with a value function
For all Shapley-based feature attribution methods, we need to model the problem as a cooperative game. We need to define a **value function** that assigns a real-valued worth to each coalition of features. In this case, the features are the tokens of the sentence (without the `[CLS]` and `[SEP]` tokens). The value of a coalition is the sentiment score of the sentence in which all tokens outside the coalition are masked or removed (in this tutorial, they are replaced by the `[MASK]` token).

A value function has the following formal definition:
$$v: 2^N \rightarrow \mathbb{R}$$
where $N$ is the set of features (tokens in our case) and $2^N$ denotes the set of all coalitions (subsets) of $N$.
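
To make the domain $2^N$ concrete, the following minimal sketch enumerates every coalition of a small player set as rows of a boolean matrix. The player count of 5 is chosen to match the five tokens of the example sentence; the variable names are illustrative and not part of `shapiq`.

```python
import itertools

import numpy as np

n_players = 5  # assumption: five tokens, as in the example sentence

# Each row is one coalition; True marks a player that takes part in it.
all_coalitions = np.array(
    list(itertools.product([False, True], repeat=n_players)),
    dtype=bool,
)

print(all_coalitions.shape)  # (32, 5), i.e. 2**5 coalitions
print(all_coalitions[0])  # the empty coalition
print(all_coalitions[-1])  # the grand coalition
```

The number of coalitions doubles with every additional token, which is why exhaustive evaluation is only practical for short sentences.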

To capture both `POSITIVE` and `NEGATIVE` sentiments in a single number, we map the output of the classifier to the range $[-1, 1]$: positive predictions keep their score, and negative predictions get their score with a flipped sign. The following function accepts a list of input texts and returns a vector of their sentiments.


In [5]:
# Define the model call function
def model_call(input_texts: list[str]) -> np.ndarray[float]:
    """Calls the sentiment classification model with a list of texts.

    Args:
        input_texts: A list of input texts.

    Returns:
        A vector of the sentiment of the input texts.

    """
    outputs = classifier(input_texts)
    outputs = [
        output["score"] * 1 if output["label"] == "POSITIVE" else output["score"] * -1
        for output in outputs
    ]
    sentiments = np.array(outputs, dtype=float)

    return sentiments


# Test the model call function
print(f"Model call: {model_call(['I love this movie!', 'I hate this movie!'])}")

Model call: [ 0.99519819 -0.95526284]
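
The mapping can also be checked without loading a model. The sketch below applies the same sign rule to hand-written dictionaries that mimic the pipeline's output format; the helper name and the scores are made up for illustration. Assuming the pipeline reports only the more likely of the two labels (so the score is at least 0.5), the signed sentiment lands in $[-1, -0.5] \cup [0.5, 1]$ rather than covering all of $[-1, 1]$.

```python
def to_signed_sentiment(outputs: list[dict]) -> list[float]:
    """Maps pipeline-style outputs to signed scores (positive > 0, negative < 0)."""
    return [
        out["score"] if out["label"] == "POSITIVE" else -out["score"]
        for out in outputs
    ]


# Hypothetical outputs in the pipeline's format (scores are illustrative).
fake_outputs = [
    {"label": "POSITIVE", "score": 0.9},
    {"label": "NEGATIVE", "score": 0.75},
    {"label": "POSITIVE", "score": 0.5},
]
signed = to_signed_sentiment(fake_outputs)
print(signed)  # [0.9, -0.75, 0.5]
```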


With this model call function, we can now define the value function. In `shapiq`, value functions accept boolean NumPy matrices of shape `(n_coalitions, n_players)`, where each row encodes one coalition and `True` marks a participating player.

In [6]:
# Show coalitions
n_players = len(tokenized_sentence["input_ids"]) - 2  # remove [CLS] and [SEP]

empty_coalition = np.zeros((1, n_players), dtype=bool)  # empty coalition
full_coalition = np.ones((1, n_players), dtype=bool)  # full coalition

print(f"Empty coalition: {empty_coalition}")
print(f"Full coalition: {full_coalition}")

Empty coalition: [[False False False False False]]
Full coalition: [[ True  True  True  True  True]]
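
As a rough illustration of how such a boolean row could be turned into model input, the sketch below masks the tokens outside an arbitrarily chosen coalition. The token ids and the mask token id are copied from the outputs above; the coalition itself is an assumption made for the example.

```python
import numpy as np

token_ids = np.array([1045, 2293, 2023, 3185, 999])  # "i love this movie !"
mask_token_id = 103  # id of [MASK], as printed above

# Hypothetical coalition: keep "i", "this", and "movie"; mask "love" and "!".
coalition = np.array([True, False, True, True, False])

masked_ids = token_ids.copy()  # copy so the original ids stay untouched
masked_ids[~coalition] = mask_token_id  # players outside the coalition are masked
print(masked_ids.tolist())  # [1045, 103, 2023, 3185, 103]
```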


With these coalitions, we can now define the value function. However, for most algorithms it is important that the value function is normalized (also known as centered), meaning that the value of the empty coalition is 0. We can achieve this by subtracting the value of the empty coalition from the value of every coalition. `shapiq` does this automatically, but we can also do it by hand here.

Formally, the normalized value function is defined as:
$$v_0(S) := v(S) - v(\emptyset)$$
where $v(S)$ is the value of the coalition $S$ and $v(\emptyset)$ is the value of the empty coalition.
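
The centering step can be sketched on a toy game with two players. The coalition values below are made up for illustration and have nothing to do with the sentiment model.

```python
# Toy game with two players; the values are made up for illustration.
v = {
    frozenset(): 0.5,
    frozenset({0}): 0.75,
    frozenset({1}): 0.25,
    frozenset({0, 1}): 1.0,
}

# Subtract the value of the empty coalition from every coalition.
empty_value = v[frozenset()]
v_0 = {coalition: value - empty_value for coalition, value in v.items()}

print(v_0[frozenset()])  # 0.0
print(v_0[frozenset({0, 1})])  # 0.5
```

Centering shifts every coalition by the same constant, so differences between coalitions, and therefore marginal contributions, are unchanged.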

In [7]:
# Define the value function
def value_function(
    coalitions: np.ndarray[bool],
    tokenized_input: np.ndarray[int],
    normalization_value: float = 0.0,
) -> np.ndarray[float]:
    """Computes the value of the coalitions.

    Args:
        coalitions: A numpy matrix of shape (n_coalitions, n_players).
        tokenized_input: A numpy array of the tokenized input sentence.
        normalization_value: The value of the empty coalition. Default is 0.0 (no normalization).

    Returns:
        A vector of the value of the coalitions.

    """
    texts = []
    for coalition in coalitions:
        tokenized_coalition = tokenized_input.copy()
        # all tokens not in the coalition are set to mask_token_id
        tokenized_coalition[~coalition] = mask_token_id
        coalition_text = tokenizer.decode(tokenized_coalition)
        texts.append(coalition_text)

    # get the sentiment of the texts (call the model as defined above)
    sentiments = model_call(texts)

    # normalize/center the value function
    normalized_sentiments = sentiments - normalization_value

    return normalized_sentiments

We can test the value function without normalization. The output of the value function for the grand coalition (full coalition) should match the output of the classifier. The output for the empty coalition, where every token is masked, is a baseline value of the model that is generally not zero.

In [8]:
# Test the value function without normalization
print(f"Output of the classifier: {classifier(test_sentence)}")

print(
    f"Value function for the full coalition: {value_function(full_coalition, tokenized_input=tokenized_input)[0]}",
)
print(
    f"Value function for the empty coalition: {value_function(empty_coalition, tokenized_input=tokenized_input)[0]}",
)

Output of the classifier: [{'label': 'POSITIVE', 'score': 0.9951981902122498}]
Value function for the full coalition: 0.9951981902122498
Value function for the empty coalition: 0.5192136764526367


If we normalize the value function, the output of the value function for the empty coalition should be zero.

In [9]:
# Test the value function with normalization
normalization_value = float(value_function(empty_coalition, tokenized_input=tokenized_input)[0])
print(
    f"Value function for the full coalition: {value_function(full_coalition, tokenized_input=tokenized_input, normalization_value=normalization_value)[0]}",
)
print(
    f"Value function for the empty coalition: {value_function(empty_coalition, tokenized_input=tokenized_input, normalization_value=normalization_value)[0]}",
)

Value function for the full coalition: 0.47598451375961304
Value function for the empty coalition: 0.0


`shapiq` expects the game to depend only on the coalitions. For this, we can write a small wrapper function:

In [10]:
# Define the game function
def game_fun(coalitions: np.ndarray[bool]) -> np.ndarray[float]:
    """Wrapper function for the value function.

    Args:
        coalitions: A numpy matrix of shape (n_coalitions, n_players).

    Returns:
        A vector of the value of the coalitions.

    """
    return value_function(
        coalitions,
        tokenized_input=tokenized_input,
        normalization_value=normalization_value,
    )


# Test the game function
print(f"Game for the full coalition: {game_fun(full_coalition)[0]}")
print(f"Game for the empty coalition: {game_fun(empty_coalition)[0]}")

Game for the full coalition: 0.47598451375961304
Game for the empty coalition: 0.0
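
An equivalent way to fix the extra arguments, shown here as an optional sketch, is `functools.partial` from the standard library. To keep the example independent of the model, `toy_value_function` is a hypothetical stand-in that simply returns the share of tokens in each coalition; it is not part of the tutorial's code.

```python
from functools import partial

import numpy as np


def toy_value_function(
    coalitions: np.ndarray,
    tokenized_input: np.ndarray,
    normalization_value: float = 0.0,
) -> np.ndarray:
    """Hypothetical stand-in: the worth of a coalition is its share of tokens."""
    sizes = coalitions.sum(axis=1) / len(tokenized_input)
    return sizes - normalization_value


token_ids = np.array([1045, 2293, 2023, 3185, 999])

# Bind everything except the coalitions, mirroring what game_fun does by hand.
toy_game = partial(toy_value_function, tokenized_input=token_ids, normalization_value=0.0)

coalitions = np.array([[False] * 5, [True] * 5])
print(toy_game(coalitions))  # [0. 1.]
```

The explicit wrapper used in the tutorial has the advantage of carrying its own docstring, whereas `partial` is shorter when no documentation is needed.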


We can already use this callable in `shapiq`, but we can also define it as a proper `Game` object, which comes with additional functionality. Notice that `value_function` is now a method of the `SentimentClassificationGame` class, and you do not have to worry about the normalization: the `Game` base class applies it automatically. The base class also implements `__call__`, so instances of the class are callable.

In [11]:
class SentimentClassificationGame(shapiq.Game):
    """The sentiment analysis classifier modeled as a cooperative game.

    Args:
        classifier: The sentiment analysis classifier.
        tokenizer: The tokenizer of the classifier.
        test_sentence: The sentence to be explained.

    """

    def __init__(self, classifier, tokenizer, test_sentence):
        self.classifier = classifier
        self.tokenizer = tokenizer
        self.test_sentence = test_sentence
        self.mask_token_id = tokenizer.mask_token_id
        self.tokenized_input = np.asarray(tokenizer(test_sentence)["input_ids"][1:-1])
        self.n_players = len(self.tokenized_input)

        empty_coalition = np.zeros((1, len(self.tokenized_input)), dtype=bool)
        self.normalization_value = float(self.value_function(empty_coalition)[0])
        super().__init__(n_players=self.n_players, normalization_value=self.normalization_value)

    def value_function(self, coalitions: np.ndarray[bool]) -> np.ndarray[float]:
        """Computes the value of the coalitions.

        Args:
            coalitions: A numpy matrix of shape (n_coalitions, n_players).

        Returns:
            A vector of the value of the coalitions.

        """
        texts = []
        for coalition in coalitions:
            tokenized_coalition = self.tokenized_input.copy()
            # all tokens not in the coalition are set to mask_token_id
            tokenized_coalition[~coalition] = self.mask_token_id
            coalition_text = self.tokenizer.decode(tokenized_coalition)
            texts.append(coalition_text)

        # get the sentiment of the texts (via the _model_call method defined below)
        sentiments = self._model_call(texts)

        return sentiments

    def _model_call(self, input_texts: list[str]) -> np.ndarray[float]:
        """Calls the sentiment classification model with a list of texts.

        Args:
            input_texts: A list of input texts.

        Returns:
            A vector of the sentiment of the input texts.

        """
        outputs = self.classifier(input_texts)
        outputs = [
            output["score"] * 1 if output["label"] == "POSITIVE" else output["score"] * -1
            for output in outputs
        ]
        sentiments = np.array(outputs, dtype=float)

        return sentiments


# Test SentimentClassificationGame
game_class = SentimentClassificationGame(classifier, tokenizer, test_sentence)
print(f"Game for the full coalition: {game_class(full_coalition)[0]}")
print(f"Game for the empty coalition: {game_class(empty_coalition)[0]}")

Game for the full coalition: 0.47598451375961304
Game for the empty coalition: 0.0


### Computing Shapley interactions
We can now use the `game_fun` function or the `SentimentClassificationGame` class to compute the Shapley interactions with methods provided in `shapiq`.

In [12]:
# Compute Shapley interactions with the KernelSHAPIQ approximator for the game function
approximator = shapiq.KernelSHAPIQ(n=n_players, max_order=2, index="k-SII")
sii_values = approximator.approximate(budget=2**n_players, game=game_fun)
sii_values.dict_values

{(): 0.0,
 (0,): 0.09466196418889895,
 (1,): 0.2519671876192255,
 (2,): 0.06853008486648426,
 (3,): 0.06228182818457484,
 (4,): 0.1502293735159068,
 (0, 1): -0.023901337999199194,
 (0, 2): -0.015578424181147861,
 (0, 3): 0.013715559286939426,
 (0, 4): -0.012585067760777176,
 (1, 2): 0.03777686295041179,
 (1, 3): -0.07309907222393518,
 (1, 4): -0.055708334461501,
 (2, 3): 0.01566102726098756,
 (2, 4): -0.06081690989708467,
 (3, 4): 0.022849774106212306}

In [13]:
# Compute Shapley interactions with the KernelSHAPIQ approximator for the game object
approximator = shapiq.KernelSHAPIQ(n=game_class.n_players, max_order=2, index="k-SII")
sii_values = approximator.approximate(budget=2**game_class.n_players, game=game_class)
sii_values.dict_values

{(): 0.0,
 (0,): 0.09466196418889895,
 (1,): 0.2519671876192255,
 (2,): 0.06853008486648426,
 (3,): 0.06228182818457484,
 (4,): 0.1502293735159068,
 (0, 1): -0.023901337999199194,
 (0, 2): -0.015578424181147861,
 (0, 3): 0.013715559286939426,
 (0, 4): -0.012585067760777176,
 (1, 2): 0.03777686295041179,
 (1, 3): -0.07309907222393518,
 (1, 4): -0.055708334461501,
 (2, 3): 0.01566102726098756,
 (2, 4): -0.06081690989708467,
 (3, 4): 0.022849774106212306}
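
A quick sanity check on these numbers: k-SII values up to the chosen order are meant to add up to the value of the grand coalition in the normalized game. The sketch below only assumes this efficiency property. It copies the printed interaction values and compares their sum with the normalized full-coalition value reported earlier.

```python
# Interaction values copied from the output above.
k_sii = {
    (): 0.0,
    (0,): 0.09466196418889895,
    (1,): 0.2519671876192255,
    (2,): 0.06853008486648426,
    (3,): 0.06228182818457484,
    (4,): 0.1502293735159068,
    (0, 1): -0.023901337999199194,
    (0, 2): -0.015578424181147861,
    (0, 3): 0.013715559286939426,
    (0, 4): -0.012585067760777176,
    (1, 2): 0.03777686295041179,
    (1, 3): -0.07309907222393518,
    (1, 4): -0.055708334461501,
    (2, 3): 0.01566102726098756,
    (2, 4): -0.06081690989708467,
    (3, 4): 0.022849774106212306,
}
# Normalized game value of the full coalition, copied from the earlier output.
grand_coalition_value = 0.47598451375961304

total = sum(k_sii.values())
first_order = sum(value for players, value in k_sii.items() if len(players) == 1)

print(f"Sum of all interaction values: {total:.6f}")
print(f"Sum of first-order values:     {first_order:.6f}")
print(f"Value of the grand coalition:  {grand_coalition_value:.6f}")
```

The first-order terms alone overshoot the grand coalition value; the pairwise terms, which are negative in sum, account for the difference.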