# AIPI 590 - XAI | Assignment #4
### Hongxuan Li

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ba_9sGCfP0BW0fcbe3pRLCdxzsckDT6P?usp=sharing)


In [4]:
# import os

# # Remove Colab default sample_data
# !rm -r ./sample_data

# # Clone GitHub files to colab workspace
# repo_name = "/content/AIPI590-XAI" # Change to your repo name
# git_path = 'https://github.com/h0ngxuanli/AIPI590-XAI.git' #Change to your path
# !git clone "{git_path}"


# # Install dependencies from requirements.txt file
# !pip install -r "{os.path.join(repo_name,'assignment4/requirements.txt')}" #Add if using requirements.txt

# # Change working directory to location of notebook
# notebook_dir = 'assignment4/'
# path_to_notebook = os.path.join(repo_name,notebook_dir)
# %cd "{path_to_notebook}"
# %ls


# Dependencies

In [5]:
!pip install shap



In [6]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline
import shap
import numpy as np

# Dataset

Test sentences for sentiment analysis, ranging from straightforward to difficult (mixed sentiment, negation, and idioms).

In [7]:
test_examples = [
    "The software update fixed some issues but introduced new bugs.",
    "Despite the rain, we had a great time at the outdoor festival.",
    "The customer service was helpful, but I still haven't resolved my issue.",
    "I can't recommend this enough.",
    "The performance was out of this world.",
    "This solution is a drop in the bucket."
]

# Load BERT for sentiment analysis

Use BERT for sentiment analysis and examine whether SHAP can capture the essential sentiment words.

In [8]:
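# Load the pre-trained BERT model and its tokenizer.
# Note: this base checkpoint has no fine-tuned sentiment head, so the
# classifier weights are randomly initialized (see the warning in the output).
model_name = "google-bert/bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Move the model to the GPU if one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)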


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

# Inference Pipeline

In [9]:
def inference(texts):
    # SHAP may pass a NumPy array of strings; the tokenizer expects a list of str
    processed_texts = [str(text) for text in texts]

    # Tokenize the batch, padding to the longest text and truncating at 512 tokens
    inputs = tokenizer(
        processed_texts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=512
    ).to(device)

    # Run BERT without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Convert the output logits to class probabilities
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)

    return probs.cpu().numpy()
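
The last step of `inference` turns the two output logits into probabilities with softmax. As a minimal standard-library sketch of that step (the logit values below are hypothetical, and the `[negative, positive]` class order is the same assumption the notebook makes when it reads the prediction):

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # and does not change the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one sentence, ordered [negative, positive]
probs = softmax([-0.3, 1.2])
label = "Positive" if probs[1] > probs[0] else "Negative"
confidence = max(probs)
```

Because softmax only rescales the logits, the predicted label is always the class with the larger logit; the probability is what the notebook reports as confidence.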

# Initialize SHAP

In [13]:
# initialize masker to mask words
masker = shap.maskers.Text(tokenizer)

# create SHAP explainer
explainer = shap.Explainer(inference, masker)

# get SHAP explanation
def analyze_sentiment_with_shap(text):

    # Get sentiment prediction
    prediction = inference([text])[0]

    # Pick the class with the higher probability (index 1 is treated as positive)
    sentiment = "Positive" if prediction[1] > prediction[0] else "Negative"
    confidence = float(max(prediction))

    # Compute SHAP values for every token and both output classes.
    # The Explanation has shape (1, n_tokens, 2), so the caller can index
    # the class column it needs.
    shap_values = explainer([text])

    return sentiment, confidence, shap_values
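
To make the idea behind the masker and explainer concrete, the following is a minimal sketch of exact Shapley values computed by masking tokens. It is not how `shap` works internally (the library approximates these values rather than enumerating every coalition), and `toy_score`, `mask_tokens`, and `shapley_values` are hypothetical names introduced only for illustration. The toy scoring function stands in for the positive-class probability returned by `inference`.

```python
from itertools import combinations
from math import factorial

MASK = "[MASK]"

def toy_score(tokens):
    """Hypothetical stand-in for the model's positive-class probability."""
    score = 0.5
    if "great" in tokens:
        score += 0.3
    if "not" in tokens and "great" in tokens:
        score -= 0.5  # negation reverses the effect of "great"
    return score

def mask_tokens(tokens, keep):
    """Replace every token whose index is not in `keep` with the mask token."""
    return [tok if i in keep else MASK for i, tok in enumerate(tokens)]

def shapley_values(tokens, score_fn):
    """Exact Shapley value of each token, enumerating all coalitions."""
    n = len(tokens)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = score_fn(mask_tokens(tokens, set(subset)))
                with_i = score_fn(mask_tokens(tokens, set(subset) | {i}))
                phi += weight * (with_i - without_i)
        values.append(phi)
    return values

tokens = ["not", "great"]
phi = shapley_values(tokens, toy_score)        # approximately [-0.25, 0.05]
base = toy_score(mask_tokens(tokens, set()))   # score with everything masked
full = toy_score(tokens)                       # score of the unmasked text
```

The values sum to `full - base`, which is the additivity property that lets a prediction be read as a base value plus per-token contributions. The interaction between "not" and "great" is split between the two tokens, so "great" ends up with only a small positive value.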

# Visualize Explanations

In [None]:
for example in test_examples:
    sentiment, confidence, shap_values = analyze_sentiment_with_shap(example)
    print(f"\nText: {example}")
    print(f"Sentiment: {sentiment}")
    print(f"Confidence: {confidence:.4f}")
    shap.plots.text(shap_values[0])


    # Display the most influential words
    # Access the SHAP values for the positive class (index 1)
    shap_values_array = shap_values[0].values[:, 1]

    # Get the tokens
    tokens = shap_values[0].data

    # Pair tokens with their corresponding SHAP values
    token_importance = list(zip(tokens, shap_values_array))

    # Sort tokens by absolute SHAP value (importance)
    token_importance.sort(key=lambda x: np.abs(x[1]), reverse=True)

    print("\nMost influential words (absolute SHAP value):")
    for token, importance in token_importance[:5]:  # Top 5 most influential words
        print(f"{token}: {importance:.4f}")


# Discussion

### Why SHAP
- SHAP can be applied to different layers of BERT, which helps show how token importance evolves through the network.
- It is model-agnostic.
- It assigns a quantitative importance value to each token, which provides evidence for diagnosing why BERT makes wrong predictions.
- It captures interactions between tokens.
- It helps find corner cases that the model fails to handle.

### Strength

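1. Deeper understanding of the model's decision process: SHAP decomposes BERT's prediction into the contributions of individual words, which gives insight into how the model decides and helps identify the key sentiment words it focuses on.
2. Identifying sentiment reversal and complex expressions: for sentences that contain negation, double negation, or irony, SHAP can reveal how these linguistic phenomena affect the sentiment prediction and show whether the model has understood the complex expression correctly.
3. Supporting model tuning and improvement: by analyzing SHAP values, developers can discover possible biases or weaknesses in the model and adjust the training data or model architecture accordingly to improve performance.
4. Supporting fairness and ethical review: SHAP can help identify model bias across different groups or topics, which helps keep predictions fair and in line with ethical standards.
5. Improving the user experience: in applications such as user feedback or social media analysis, explainable predictions can increase users' trust in and satisfaction with the system.
6. Educational and research value: using SHAP to explain BERT's sentiment analysis can help students and researchers understand how deep learning models process natural language, which benefits teaching and academic research.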

### Limitations

- SHAP struggles to fully capture the contextual nature of BERT's embeddings, where the same word can have different representations based on its context.
- The large feature space in longer sentences may lead to sparse and less reliable SHAP values, reducing the credibility of explanations.
- SHAP's effectiveness is contingent on the stability and consistency of BERT's output, which can vary due to factors like fine-tuning, input preprocessing, or even minor changes in input.

### Improvement

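- BERT's use of subword tokenization can lead to unintuitive SHAP attributions, where importance is assigned to partial words rather than complete, meaningful units. Aggregating subword attributions into whole-word scores would make the explanations easier to read.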
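
One possible way to address the subword issue is to sum the attributions of word pieces that belong to the same word, which is consistent with the additive nature of SHAP values. The sketch below assumes WordPiece-style tokens where continuation pieces carry a `##` prefix, as `tokenizer.tokenize` returns for BERT; the tokens stored in a `shap` explanation may be formatted differently, and `merge_wordpieces` as well as the example tokens and values are hypothetical.

```python
def merge_wordpieces(tokens, values):
    """Sum the attributions of WordPiece fragments into whole-word scores."""
    words, scores = [], []
    for token, value in zip(tokens, values):
        if token.startswith("##") and words:
            # Continuation piece: extend the previous word and add its value
            words[-1] += token[2:]
            scores[-1] += value
        else:
            # Start of a new word
            words.append(token)
            scores.append(value)
    return list(zip(words, scores))

# Hypothetical tokens and attribution values, for illustration only
tokens = ["the", "un", "##bel", "##iev", "##able", "show"]
values = [0.01, 0.10, 0.05, 0.20, 0.15, 0.02]
merged = merge_wordpieces(tokens, values)
# merged holds one (word, score) pair each for "the", "unbelievable", "show"
```

Summing rather than averaging keeps the total attribution unchanged, so the merged scores still add up to the same difference between the prediction and the base value.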
