# Local Explanations: Highlights

- 📺 **Video:** [https://youtu.be/ZVElc4CvHpk](https://youtu.be/ZVElc4CvHpk)

## Overview
- Generate local explanations that highlight specific tokens responsible for a model decision.
- Compare gradient-based saliency with perturbation-based scores.

## Key ideas
- **Token-level attribution:** assign importance scores to words.
- **Perturbation tests:** mask or drop tokens to measure confidence changes.
- **Stability:** explanations should change little under small, meaning-preserving edits to the input.
- **Human alignment:** highlighted spans should align with intuitive evidence.
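
One rough way to quantify the last two ideas is to compare the top-k highlighted tokens: between a sentence and a light paraphrase of it (stability), and against a human-marked rationale (human alignment). The sketch below uses Jaccard overlap. The helper names `top_k_tokens` and `jaccard`, the choice of k = 2, and all scores are hypothetical and chosen only for illustration.

```python
def top_k_tokens(tokens, scores, k):
    """Return the set of the k tokens with the highest importance scores."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)
    return {token for token, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard overlap between two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical importance scores for a sentence and a light paraphrase of it.
tokens_a = ["the", "plot", "was", "fresh", "and", "heartfelt"]
scores_a = [0.00, -0.04, 0.00, 0.03, 0.00, 0.03]
tokens_b = ["the", "story", "was", "fresh", "and", "heartfelt"]
scores_b = [0.00, 0.04, 0.00, 0.03, 0.00, 0.02]

top_a = top_k_tokens(tokens_a, scores_a, k=2)
top_b = top_k_tokens(tokens_b, scores_b, k=2)

# Stability: do the two versions of the sentence highlight the same tokens?
stability = jaccard(top_a, top_b)

# Human alignment: do the highlights match a (hypothetical) human rationale?
human_rationale = {"fresh", "heartfelt"}
alignment = jaccard(top_a, human_rationale)

print(f"stability={stability:.2f} alignment={alignment:.2f}")
```

In this toy case the paraphrase displaces one of the two top tokens, so stability falls below 1 while alignment with the rationale stays perfect. Rank correlation over all tokens would be a reasonable alternative to top-k overlap.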

## Demo
Compute occlusion-based importance scores by masking each token and measuring the drop in predicted probability, as in the lecture (https://youtu.be/pru2Pg1usjI).

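    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    import numpy as np

    # Sentence to explain, plus a four-example training set (1 = positive, 0 = negative).
    text = 'The plot was surprisingly fresh and heartfelt'
    training = [
        'Fresh storytelling and great chemistry',
        'Heartfelt performances and witty dialogue',
        'Predictable and boring plot',
        'Flat characters and dull pacing'
    ]
    labels = [1, 1, 0, 0]

    # Fit a TF-IDF + logistic regression classifier.
    vec = TfidfVectorizer()
    X = vec.fit_transform(training)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)

    # Positive-class probability for the unmodified sentence.
    orig_prob = clf.predict_proba(vec.transform([text]))[0, 1]
    tokens = text.split()
    importances = []
    for i in range(len(tokens)):
        # Occlude token i. '[MASK]' is not in the TF-IDF vocabulary,
        # so masking has the same effect as deleting the token.
        masked = tokens.copy()
        masked[i] = '[MASK]'
        prob = clf.predict_proba(vec.transform([' '.join(masked)]))[0, 1]
        # Importance = drop in positive-class probability when the token is hidden.
        importances.append(orig_prob - prob)

    for token, score in zip(tokens, importances):
        print(f"{token:>12s} | importance={score:.3f}")

Output:

             The | importance=0.000
            plot | importance=-0.041
             was | importance=0.000
    surprisingly | importance=0.000
           fresh | importance=0.028
             and | importance=-0.002
       heartfelt | importance=0.028

A positive score means that hiding the token lowers the positive-class probability. Words that never appear in the training set (`The`, `was`, `surprisingly`) score exactly zero because the vectorizer ignores them.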

## Try it
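- Modify the demo: for example, delete each token instead of replacing it with `[MASK]`, or explain a different sentence.
- Add a tiny dataset or a counter-example, and check whether the highlighted tokens still match your intuition.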


## References
- [The Mythos of Model Interpretability](https://arxiv.org/pdf/1606.03490.pdf)
- [Deep Unordered Composition Rivals Syntactic Methods for Text Classification](https://www.aclweb.org/anthology/P15-1162/)
- [Analysis Methods in Neural Language Processing: A Survey](https://arxiv.org/pdf/1812.08951.pdf)
- ["Why Should I Trust You?" Explaining the Predictions of Any Classifier](https://arxiv.org/pdf/1602.04938.pdf)
- [Axiomatic Attribution for Deep Networks](https://arxiv.org/pdf/1703.01365.pdf)
- [BERT Rediscovers the Classical NLP Pipeline](https://arxiv.org/pdf/1905.05950.pdf)
- [What Do You Learn From Context? Probing For Sentence Structure In Contextualized Word Representations](https://arxiv.org/pdf/1905.06316.pdf)
- [Annotation Artifacts in Natural Language Inference Data](https://www.aclweb.org/anthology/N18-2017/)
- [Hypothesis Only Baselines in Natural Language Inference](https://www.aclweb.org/anthology/S18-2023/)
- [Did the Model Understand the Question?](https://www.aclweb.org/anthology/P18-1176/)
- [Swag: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference](https://www.aclweb.org/anthology/D18-1009.pdf)
- [Generating Visual Explanations](https://arxiv.org/pdf/1603.08507.pdf)
- [e-SNLI: Natural Language Inference with Natural Language Explanations](https://arxiv.org/abs/1812.01193)
- [Explaining Question Answering Models through Text Generation](https://arxiv.org/pdf/2004.05569.pdf)
- [Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems](https://arxiv.org/abs/1705.04146)
- [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
- [The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning](https://arxiv.org/abs/2205.03401)
- [Large Language Models are Zero-Shot Reasoners](https://arxiv.org/abs/2205.11916)
- [Complementary Explanations for Effective In-Context Learning](https://arxiv.org/pdf/2211.13892.pdf)
- [PAL: Program-aided Language Models](https://arxiv.org/abs/2211.10435)
- [Measuring and Narrowing the Compositionality Gap in Language Models](https://arxiv.org/abs/2210.03350)


*Links only; we do not redistribute slides or papers.*