# BERT for QA

- 📺 **Video:** [https://youtu.be/F8hWZ4xaVkA](https://youtu.be/F8hWZ4xaVkA)

## Overview
- Adapt BERT to predict answer spans with start/end token classifiers.
- Fine-tune on QA datasets with span supervision.

## Key ideas
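- **Input packing:** pack the question and passage into one sequence, `[CLS] question [SEP] passage [SEP]`, so self-attention can relate question tokens to passage tokens.
- **Span prediction:** two classifiers over the passage token representations score each token as a possible answer start or answer end; a softmax over positions turns the scores into probabilities.
- **Answer selection:** choose the span (i, j) with i ≤ j that has the highest combined start score at i and end score at j.
- **Fine-tuning:** backpropagate the cross-entropy losses on the gold start and end indices through the classifiers and BERT itself.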
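
To make the input packing concrete, here is a minimal sketch. It assumes whitespace splitting in place of BERT's WordPiece tokenizer, and the helper name `pack_inputs` and the example sentences are hypothetical. The segment ids follow BERT's two-segment convention, which these notes do not spell out.

```python
def pack_inputs(question, passage):
    """Pack a question and a passage into one BERT-style token sequence.

    Whitespace splitting stands in for WordPiece tokenization (an assumption
    made to keep the sketch dependency-free).
    """
    q_tokens = question.lower().split()
    p_tokens = passage.lower().split()
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + p_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the question, and the first [SEP];
    # segment 1 covers the passage and the final [SEP].
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(p_tokens) + 1)
    # Position of the first passage token in the packed sequence.
    passage_start = len(q_tokens) + 2
    return tokens, segment_ids, passage_start


tokens, segment_ids, passage_start = pack_inputs(
    "Who wrote Hamlet ?", "Hamlet was written by William Shakespeare ."
)
print(tokens)
print(segment_ids)
print("first passage token:", tokens[passage_start])
```

Tracking `passage_start` matters because the start/end classifiers predict positions in the packed sequence, which then have to be mapped back to passage tokens.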

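The fine-tuning objective can be sketched in NumPy as the sum of two cross-entropy terms, one for the gold start position and one for the gold end position. The logits and gold indices below are made-up illustration values, and `span_loss` is a hypothetical helper name; some implementations average the two terms instead of summing them.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def span_loss(start_logits, end_logits, gold_start, gold_end):
    """Sum of the cross-entropy losses for the gold start and end positions."""
    start_nll = -log_softmax(start_logits)[gold_start]
    end_nll = -log_softmax(end_logits)[gold_end]
    return start_nll + end_nll


# Hypothetical logits for a 4-token passage (not taken from any real model).
start_logits = np.array([0.2, 2.0, 0.1, -1.0])
end_logits = np.array([-0.5, 0.3, 1.8, 0.0])

# Gold span that the logits already favor (tokens 1..2).
good = span_loss(start_logits, end_logits, gold_start=1, gold_end=2)
# Gold span that the logits score poorly.
bad = span_loss(start_logits, end_logits, gold_start=3, gold_end=0)
print(f"loss when logits agree with the gold span: {good:.3f}")
print(f"loss when logits disagree with the gold span: {bad:.3f}")
```

The loss is small when the logits already peak at the gold indices and grows when they do not, which is the signal backpropagation uses to adjust the classifiers.
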
## Demo
Simulate span scoring by computing dot products between a question vector and passage token vectors, echoing the lecture (https://youtu.be/XpIxA9Ql0AU).

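```python
import numpy as np

# Toy 3-dimensional embedding for the question.
question_vec = np.array([0.6, 0.2, 0.5])
# One row per passage token.
passage_tokens = np.array([
    [0.1, 0.5, 0.2],
    [0.7, 0.1, 0.6],
    [0.2, 0.2, 0.3],
    [0.8, 0.3, 0.7]
])

# Score each passage token by its dot product with the question vector.
start_scores = passage_tokens @ question_vec
# Simplification: the end scores are the start scores reversed, so once the
# argmax is mapped back to the original order the end index equals the start
# index (barring ties). A real model learns a separate end classifier.
end_scores = start_scores[::-1]
start_idx = int(np.argmax(start_scores))
# Undo the reversal to express the end index in original token order.
end_idx = len(passage_tokens) - int(np.argmax(end_scores)) - 1
print('Start index:', start_idx)
print('End index:', end_idx)
```

Output:

```
Start index: 3
End index: 3
```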
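
The demo takes each argmax independently, which can produce an end index before the start index once the two score vectors differ. As a rough sketch of the answer selection rule from Key ideas, the snippet below searches for the best span subject to start ≤ end. The scores are hypothetical values chosen to trigger that failure case, and `best_span` and its optional `max_len` cap are illustrative names, not part of the lecture.

```python
import numpy as np


def best_span(start_scores, end_scores, max_len=None):
    """Return (start, end, score) maximizing start_scores[i] + end_scores[j], i <= j."""
    n = len(start_scores)
    best = (0, 0, -np.inf)
    for i in range(n):
        # Optionally cap the span length at max_len tokens.
        last = n if max_len is None else min(n, i + max_len)
        for j in range(i, last):
            score = start_scores[i] + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Hypothetical scores where the independent argmaxes give an invalid span.
start_scores = np.array([0.1, 0.7, 0.2, 0.9])
end_scores = np.array([0.2, 0.3, 0.8, 0.1])

naive_start = int(np.argmax(start_scores))
naive_end = int(np.argmax(end_scores))
print("independent argmax:", naive_start, naive_end)   # 3 2 (end before start)

start, end, score = best_span(start_scores, end_scores)
print("constrained best span:", start, end)             # 1 2
```

The double loop is quadratic in passage length; a `max_len` cap is one common way to keep the search cheap.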


## Try it
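- Modify the demo: replace the mirrored `end_scores` with scores from a separate end vector and check whether `end_idx` can fall before `start_idx`.
- Add a tiny dataset or counter-example, for instance a passage where picking the start and end argmax independently yields an invalid span.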



