# Problems with Reading Comprehension

- 📺 **Video:** [https://youtu.be/tCvAHmrxPvY](https://youtu.be/tCvAHmrxPvY)

## Overview
- Diagnose vulnerabilities in reading comprehension datasets and models.
- Investigate adversarial distractions, shallow cues, and annotation leakage.

## Key ideas
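- **Adversarial questions:** distractor sentences appended to a passage, which share words with the question but do not answer it, can mislead both lexical baselines and trained models.
- **Shortcut learning:** models exploit surface cues such as n-gram overlap between question and passage instead of reasoning about what the passage actually says.
- **Robust evaluation:** adversarial test sets and input perturbations reveal weaknesses that accuracy on a standard held-out set hides.
- **Data augmentation:** training on paraphrased and counterfactual examples can improve robustness.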
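Adversarial evaluation can be sketched as a comparison between accuracy on clean passages and accuracy after a distractor is appended to every passage, loosely in the spirit of AddSent from Jia and Liang (2017). This is a minimal sketch under stated assumptions: the toy dataset, the hand-written distractors, and the helper names `overlap_predict` and `accuracy` are hypothetical, made up for illustration, and the "model" is a simple word-overlap sentence selector rather than a trained reader.

```python
import re


def tokenize(text):
    # Lowercased alphanumeric word types; punctuation is dropped.
    return set(re.findall(r'[a-z0-9]+', text.lower()))


def overlap_predict(question, sentences):
    # Hypothetical baseline: return the sentence sharing the most word types
    # with the question (max() keeps the earliest sentence on ties).
    q = tokenize(question)
    return max(sentences, key=lambda s: len(q & tokenize(s)))


def accuracy(dataset):
    # Fraction of examples where the predicted sentence is the gold sentence.
    hits = sum(overlap_predict(q, sents) == gold for q, sents, gold in dataset)
    return hits / len(dataset)


# Toy examples: (question, passage sentences, gold answer sentence).
clean = [
    ('Where was Marie Curie born?',
     ['Curie was born in Warsaw in 1867.', 'She later moved to Paris.'],
     'Curie was born in Warsaw in 1867.'),
    ('What did Curie discover in 1898?',
     ['Curie discovered polonium in 1898.', 'She won two Nobel Prizes.'],
     'Curie discovered polonium in 1898.'),
]

# Hypothetical distractors: they reuse question words but do not answer it.
distractors = [
    'Marie Curie was not born in Lyon.',
    'Tesla did not discover a new element in 1898.',
]

# Perturbation: append one distractor to the end of each passage.
adversarial = [(q, sents + [d], gold)
               for (q, sents, gold), d in zip(clean, distractors)]

print(f'clean accuracy:       {accuracy(clean):.2f}')
print(f'adversarial accuracy: {accuracy(adversarial):.2f}')
```

On this toy data the baseline picks the gold sentence in both clean examples and the distractor in both perturbed ones, so the gap between the two numbers is the quantity of interest, not either number alone.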

## Demo
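This demo shows how appending an adversarial sentence with misleading lexical cues breaks a word-overlap baseline, as in the lecture (https://youtu.be/26wBqXc9EZI). The baseline scores each passage sentence by how many words it shares with the question and selects the highest-scoring one.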

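```python
import re

# Toy passage: the second sentence is an adversarial distractor. It shares
# words with the question but does not answer it. (The 'Adversarial:' label
# is only for readability and does not affect the scores.)
passage = (
    'Marie Curie studied radioactivity and won the Nobel Prize. '
    'Adversarial: Marie Curie did not study biology.'
)
question = 'What did Marie Curie study?'


def tokenize(text):
    """Lowercase and keep alphanumeric word types, so 'study?' matches 'study'."""
    return set(re.findall(r'[a-z0-9]+', text.lower()))


# Naive sentence split, good enough for this two-sentence passage.
sentences = passage.split('. ')
question_tokens = tokenize(question)

# Score each sentence by how many word types it shares with the question.
scores = []
for sent in sentences:
    score = len(question_tokens & tokenize(sent))
    scores.append((score, sent))

# Highest overlap first: the top sentence is what the baseline would select.
scores.sort(reverse=True)
for score, sent in scores:
    print(f'Score={score} | sentence={sent}')
```

Output:

```
Score=4 | sentence=Adversarial: Marie Curie did not study biology.
Score=2 | sentence=Marie Curie studied radioactivity and won the Nobel Prize
```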
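A natural first fix is to ignore function words such as *what* and *did*. The sketch below, assuming a small hand-picked stopword list (hypothetical, not a standard list), suggests this does not help here: the distractor contains *study* verbatim, while the correct sentence only has the inflected form *studied*, so surface matching still rewards the wrong sentence.

```python
import re

# Hypothetical, hand-picked stopword list for this toy example only.
STOPWORDS = {'what', 'did', 'the', 'and', 'not', 'a', 'an', 'of', 'in'}


def content_tokens(text):
    # Lowercased alphanumeric tokens with stopwords removed.
    return {t for t in re.findall(r'[a-z0-9]+', text.lower()) if t not in STOPWORDS}


# Same two sentences as the demo, with the 'Adversarial:' label dropped.
sentences = [
    'Marie Curie studied radioactivity and won the Nobel Prize',
    'Marie Curie did not study biology.',
]
question = 'What did Marie Curie study?'

q = content_tokens(question)
ranked = sorted(sentences, key=lambda s: len(q & content_tokens(s)), reverse=True)
for sent in ranked:
    print(f'Score={len(q & content_tokens(sent))} | sentence={sent}')
```

The distractor scores 3 (*marie*, *curie*, *study*) against 2 for the correct sentence, which is the shortcut-learning problem in miniature: the score tracks word matches, not whether the sentence answers the question.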


## Try it
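- Modify the demo: for example, reword the distractor so it shares fewer words with the question, and check whether the baseline recovers.
- Add a tiny dataset or counter-example, such as a passage where the answer sentence paraphrases the question instead of repeating its words.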
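One possible counter-example runs in the opposite direction from the demo: with no adversary at all, a lexical-overlap baseline can miss an answer sentence that paraphrases the question while preferring a non-answer sentence that repeats its words. This is a minimal sketch with a made-up two-sentence passage; `rank_by_overlap` is a hypothetical helper, not part of any library.

```python
import re


def tokenize(text):
    # Lowercased alphanumeric word types; punctuation is dropped.
    return set(re.findall(r'[a-z0-9]+', text.lower()))


def rank_by_overlap(question, sentences):
    # Sort sentences by word-type overlap with the question, highest first
    # (sorted() is stable, so ties keep their passage order).
    q = tokenize(question)
    return sorted(sentences, key=lambda s: len(q & tokenize(s)), reverse=True)


# Made-up counter-example: the gold sentence paraphrases the question
# ("birthplace" vs. "born"), while a non-answer sentence repeats its words.
question = 'Which city was Marie Curie born in?'
sentences = [
    'Warsaw was her birthplace.',                              # gold answer
    'Marie Curie did much of her work in the city of Paris.',  # lexical trap
]
gold = sentences[0]

ranking = rank_by_overlap(question, sentences)
print('top sentence:', ranking[0])
print('baseline correct:', ranking[0] == gold)
```

The gold sentence shares only *was* with the question, while the Paris sentence shares *marie*, *curie*, *in*, and *city*, so the baseline answers Paris. Paraphrased training examples target exactly this kind of lexical gap.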


## References
- [MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text](https://www.aclweb.org/anthology/D13-1020.pdf)
- [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://www.aclweb.org/anthology/D16-1264/)
- [Adversarial Examples for Evaluating Reading Comprehension Systems](https://www.aclweb.org/anthology/D17-1215/)
- [Reading Wikipedia to Answer Open-Domain Questions](https://arxiv.org/abs/1704.00051)
- [Latent Retrieval for Weakly Supervised Open Domain Question Answering](https://www.aclweb.org/anthology/P19-1612.pdf)
- [Natural Questions (website)](https://ai.google.com/research/NaturalQuestions)
- [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/pdf/2005.11401.pdf)
- [WebGPT: Browser-assisted question-answering with human feedback](https://arxiv.org/abs/2112.09332)
- [HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering](https://arxiv.org/abs/1809.09600)
- [Understanding Dataset Design Choices for Multi-hop Reasoning](https://www.aclweb.org/anthology/N19-1405/)
- [Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering](https://openreview.net/forum?id=SJgVHkrYDH)
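- [QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs](https://arxiv.org/abs/2205.12665)
- [Wizard of Wikipedia: Knowledge-Powered Conversational Agents](https://arxiv.org/pdf/1811.01241.pdf)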
- [Task-Oriented Dialogue as Dataflow Synthesis](https://arxiv.org/abs/2009.11423)
- [A Neural Network Approach to Context-Sensitive Generation of Conversational Responses](https://arxiv.org/abs/1506.06714)
- [A Diversity-Promoting Objective Function for Neural Conversation Models](https://arxiv.org/abs/1510.03055)
- [Recipes for building an open-domain chatbot](https://arxiv.org/pdf/2004.13637.pdf)
- [BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage](https://arxiv.org/abs/2208.03188)
- [character.ai (website)](https://character.ai)


*Links only; we do not redistribute slides or papers.*