# Model Probing

- 📺 **Video:** [https://youtu.be/a6u6WM5wcLQ](https://youtu.be/a6u6WM5wcLQ)

## Overview
This section turns to analyzing what linguistic information is encoded in a model's internal representations (its hidden-state vectors) using probing tasks. One way to interpret a model is to take its hidden states (for example, BERT's token representations from various layers) and train a simple classifier to predict linguistic properties (e.g., POS tag, syntactic chunk, dependency relation, semantic role) from those states. If the classifier predicts a property well, that suggests the model's representation encodes that information.
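
The probing recipe above can be sketched in a few lines. This is a minimal illustration, not the setup used in the video: the "hidden states" are synthetic vectors generated by the hypothetical helper `make_synthetic_states`, in which a binary property shifts one coordinate, and the probe is a plain logistic-regression classifier trained with SGD in pure Python. With a real model you would replace the synthetic vectors with frozen activations extracted from a chosen layer.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_synthetic_states(n, dim, rng):
    # ASSUMPTION: stand-in for frozen hidden states. The binary label
    # (a pretend linguistic property) shifts coordinate 0 by +/- 2.
    examples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += 2.0 if label == 1 else -2.0
        examples.append((vec, label))
    return examples

def train_probe(examples, dim, epochs=20, lr=0.1):
    # Linear probe: logistic regression fit by SGD on fixed vectors.
    # Only the probe's weights are updated; the "model" is never touched.
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for vec, label in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)
            grad = p - label  # derivative of log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, vec)]
            b -= lr * grad
    return w, b

def probe_accuracy(examples, w, b):
    correct = 0
    for vec, label in examples:
        score = sum(wi * xi for wi, xi in zip(w, vec)) + b
        correct += int((score > 0) == (label == 1))
    return correct / len(examples)

rng = random.Random(0)
DIM = 8
train_set = make_synthetic_states(400, DIM, rng)
test_set = make_synthetic_states(200, DIM, rng)  # held-out examples
w, b = train_probe(train_set, DIM)
test_acc = probe_accuracy(test_set, w, b)
print(f"held-out probe accuracy: {test_acc:.2f}")
```

The probe is kept deliberately simple: a low-capacity classifier makes it more plausible that high accuracy reflects information already present in the vectors rather than something the probe learned by itself.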

In [None]:
import os
import random

random.seed(0)  # fix the seed so runs are reproducible
CI = os.environ.get('CI') == 'true'  # True when running under continuous integration

## Key ideas
- The video references Tenney et al. (2019), “BERT Rediscovers the Classical NLP Pipeline”, which found that BERT's layers encode syntactic and then semantic information progressively, in roughly the order of the stages of a traditional pipeline: earlier layers capture POS tags and phrase chunks, middle layers dependencies, and later layers coreference and other semantic information. The video likely visualizes that result: POS tagging, parsing, and NER can be extracted at different depths of BERT, revealing an ordering.
- It also references Tenney et al.'s other 2019 probing paper, “What Do You Learn from Context?”, which explores how sentence-structure information is encoded in contextualized word representations.
- The video might discuss the structural probe (Hewitt & Manning, 2019), which found a linear transformation of BERT's representation space in which distances between word vectors correlate with distances in the parse tree, implying that BERT implicitly learned a syntactic tree geometry.
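
To make the structural-probe idea concrete, the sketch below computes the two quantities it compares: the tree distance between words (number of edges on the path in the parse tree) and the squared distance after a linear map `B`, i.e. `||B(h_i - h_j)||^2`. Everything here is a toy assumption: the five-word sentence and its `heads` array are hypothetical, the vectors come from a hand-built `path_embedding` rather than from BERT, and `B` is simply the identity instead of a learned matrix. The point is only to show that a vector space can encode tree distances exactly in this form.

```python
from collections import deque

def tree_distances(heads):
    # heads[i] is the index of word i's syntactic head, or -1 for the root.
    n = len(heads)
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    dist = []
    for start in range(n):
        d = [-1] * n
        d[start] = 0
        queue = deque([start])
        while queue:  # breadth-first search over the undirected tree
            u = queue.popleft()
            for v in adj[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist

def path_embedding(heads):
    # ASSUMPTION (toy construction, not BERT): give each tree edge its own
    # axis and embed a word as the sum of the axes on its path to the root.
    n = len(heads)
    vectors = []
    for node in range(n):
        vec = [0.0] * n
        cur = node
        while heads[cur] >= 0:
            vec[cur] = 1.0  # axis `cur` stands for the edge cur -> heads[cur]
            cur = heads[cur]
        vectors.append(vec)
    return vectors

def squared_probe_distance(B, h_i, h_j):
    # Squared L2 norm of B applied to the difference vector.
    diff = [a - b for a, b in zip(h_i, h_j)]
    proj = [sum(row[k] * diff[k] for k in range(len(diff))) for row in B]
    return sum(p * p for p in proj)

# Hypothetical parse of "The chef cooked the meal" (root: "cooked").
heads = [1, 2, -1, 4, 2]
gold = tree_distances(heads)
H = path_embedding(heads)
n = len(heads)
B = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]  # identity

for i in range(n):
    row = [squared_probe_distance(B, H[i], H[j]) for j in range(n)]
    print(gold[i], row)
```

In the real probe, `B` would be trained so that the squared distances of actual model representations approximate the gold tree distances; here the match is exact only because the embedding was constructed by hand.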

## Demo

In [None]:
print('Try the exercises below and follow the linked materials.')
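
As a starting point for the exercises, here is a hedged sketch of a per-layer probing sweep, the kind of experiment behind the layer-ordering result described under Key ideas. The data is synthetic and the outcome is built in by construction: `ASSUMED_SIGNAL` hard-codes how strongly a pretend property is expressed at each of four hypothetical layers, so the printed accuracies illustrate only the mechanics of the sweep, not any finding about BERT. For brevity the probe is a nearest-centroid classifier rather than a trained logistic-regression probe.

```python
import random

def make_layer_states(n, dim, signal, rng):
    # ASSUMPTION: synthetic stand-in for one layer's hidden states.
    # The binary label shifts coordinate 0 by +/- signal.
    examples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal if label == 1 else -signal
        examples.append((vec, label))
    return examples

def centroid_probe_accuracy(train, test):
    # Nearest-centroid probe: average the training vectors of each class,
    # then label each test vector by its closest class mean.
    dim = len(train[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for vec, label in train:
        counts[label] += 1
        for k, x in enumerate(vec):
            sums[label][k] += x
    centroids = {c: [s / counts[c] for s in sums[c]] for c in (0, 1)}

    def predict(vec):
        return min((0, 1), key=lambda c: sum(
            (x - m) ** 2 for x, m in zip(vec, centroids[c])))

    return sum(predict(vec) == label for vec, label in test) / len(test)

# Hypothetical per-layer signal strengths (layer 0 weakest, layer 3 strongest).
ASSUMED_SIGNAL = [0.2, 0.8, 1.5, 2.0]

rng = random.Random(0)
layer_acc = []
for layer, signal in enumerate(ASSUMED_SIGNAL):
    train = make_layer_states(300, 8, signal, rng)
    test = make_layer_states(300, 8, signal, rng)
    acc = centroid_probe_accuracy(train, test)
    layer_acc.append(acc)
    print(f"layer {layer}: probe accuracy {acc:.2f}")
```

To adapt this to a real model, replace `make_layer_states` with activations extracted from each layer and keep the rest of the loop unchanged.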

## Try it
- Modify the demo
- Add a tiny dataset or counter-example


## References
- [The Mythos of Model Interpretability](https://arxiv.org/pdf/1606.03490.pdf)
- [Deep Unordered Composition Rivals Syntactic Methods for Text Classification](https://www.aclweb.org/anthology/P15-1162/)
- [Analysis Methods in Neural Language Processing: A Survey](https://arxiv.org/pdf/1812.08951.pdf)
- ["Why Should I Trust You?" Explaining the Predictions of Any Classifier](https://arxiv.org/pdf/1602.04938.pdf)
- [Axiomatic Attribution for Deep Networks](https://arxiv.org/pdf/1703.01365.pdf)
- [BERT Rediscovers the Classical NLP Pipeline](https://arxiv.org/pdf/1905.05950.pdf)
- [What Do You Learn From Context? Probing For Sentence Structure In Contextualized Word Representations](https://arxiv.org/pdf/1905.06316.pdf)
- [Annotation Artifacts in Natural Language Inference Data](https://www.aclweb.org/anthology/N18-2017/)
- [Hypothesis Only Baselines in Natural Language Inference](https://www.aclweb.org/anthology/S18-2023/)
- [Did the Model Understand the Question?](https://www.aclweb.org/anthology/P18-1176/)
- [Swag: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference](https://www.aclweb.org/anthology/D18-1009.pdf)
- [Generating Visual Explanations](https://arxiv.org/pdf/1603.08507.pdf)
- [e-SNLI: Natural Language Inference with Natural Language Explanations](https://arxiv.org/abs/1812.01193)
- [Explaining Question Answering Models through Text Generation](https://arxiv.org/pdf/2004.05569.pdf)
- [Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems](https://arxiv.org/abs/1705.04146)
- [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
- [The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning](https://arxiv.org/abs/2205.03401)
- [Large Language Models are Zero-Shot Reasoners](https://arxiv.org/abs/2205.11916)
- [Complementary Explanations for Effective In-Context Learning](https://arxiv.org/pdf/2211.13892.pdf)
- [PAL: Program-aided Language Models](https://arxiv.org/abs/2211.10435)
- [Measuring and Narrowing the Compositionality Gap in Language Models](https://arxiv.org/abs/2210.03350)


*Links only; we do not redistribute slides or papers.*