# Refining Grammars

- 📺 **Video:** [https://youtu.be/f1o1_bPWzM0](https://youtu.be/f1o1_bPWzM0)

## Overview
- Improve PCFG accuracy by refining nonterminals with subcategories or split-merge operations.
- Capture lexical or contextual distinctions that coarse treebank categories conflate (for example, subject versus object NPs).

## Key ideas
- **Vertical splitting:** differentiate nonterminals based on parent context.
- **Horizontal splitting:** differentiate nonterminals based on sibling context (horizontal markovization); splitting by lexical head is usually called lexicalization.
- **Merge step:** collapse splits that do not improve likelihood.
- **Latent annotations:** automatically discover refinements via EM.
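
Vertical splitting can be illustrated with parent annotation. The following is a minimal sketch, not the lecture's implementation: it assumes trees are nested tuples of the form `(label, child, ...)` with `(tag, word)` preterminals, and the function name `parent_annotate`, the `^` separator, and the example words are hypothetical choices made for illustration.

```python
def parent_annotate(tree, parent=None):
    """Return a copy of `tree` with each phrasal label annotated by its parent.

    Trees are nested tuples: (label, child, ...) for phrases and
    (tag, word) for preterminals, which are left unchanged in this sketch.
    """
    label, *children = tree
    # Preterminal: a single child that is a plain string (the word).
    if len(children) == 1 and isinstance(children[0], str):
        return tree
    # The root has no parent, so its label stays as it is.
    new_label = label if parent is None else f"{label}^{parent}"
    return (new_label, *(parent_annotate(child, label) for child in children))


# Hypothetical example sentence.
tree = ('S',
        ('NP', ('Det', 'the'), ('N', 'cat')),
        ('VP', ('V', 'chased'),
               ('NP', ('Det', 'the'), ('N', 'mouse'))))

annotated = parent_annotate(tree)
print(annotated)
```

After annotation the subject is labelled `NP^S` and the object `NP^VP`, so a PCFG estimated from annotated trees can give the two positions different expansion probabilities.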

## Demo
Split NP into subject/object subclasses, recompute rule probabilities, and compare the probability of one parse of "the cat sat on the mat" under the base and refined grammars, following the lecture (https://youtu.be/xYjaCB1WH6I).

```python
# Base PCFG: a single NP category for every position.
base_rules = {
    ('S', ('NP', 'VP')): 1.0,
    ('VP', ('V', 'PP')): 0.6,   # the parse below expands VP as V PP
    ('VP', ('VP', 'PP')): 0.4,
    ('NP', ('Det', 'N')): 1.0,
    ('PP', ('P', 'NP')): 1.0,
}
lexicon = {
    ('Det', 'the'): 1.0,
    ('N', 'cat'): 0.5,
    ('N', 'mat'): 0.5,
    ('V', 'sat'): 1.0,
    ('P', 'on'): 1.0,
}

# Refined PCFG: NP is split into subject and object subclasses.
refined_rules = {
    ('S', ('NP_subj', 'VP')): 0.6,
    ('S', ('NP_obj', 'VP')): 0.4,
    ('VP', ('V', 'PP')): 0.7,
    ('VP', ('VP', 'PP')): 0.3,
    ('NP_subj', ('Det', 'N')): 1.0,
    ('NP_obj', ('Det', 'N')): 1.0,
    ('PP', ('P', 'NP_obj')): 1.0,
}

# Parse of "the cat sat on the mat" with refined labels.
# The base parse has the same shape with plain NP labels.
sentence_parse = (
    'S',
    ('NP_subj', ('Det', 'the'), ('N', 'cat')),
    ('VP', ('V', 'sat'),
           ('PP', ('P', 'on'),
                  ('NP_obj', ('Det', 'the'), ('N', 'mat')))),
)

# Parse probability = product of every rule and lexical probability in the tree.
base_prob = (
    base_rules[('S', ('NP', 'VP'))]
    * base_rules[('NP', ('Det', 'N'))] * lexicon[('Det', 'the')] * lexicon[('N', 'cat')]
    * base_rules[('VP', ('V', 'PP'))] * lexicon[('V', 'sat')]
    * base_rules[('PP', ('P', 'NP'))] * lexicon[('P', 'on')]
    * base_rules[('NP', ('Det', 'N'))] * lexicon[('Det', 'the')] * lexicon[('N', 'mat')]
)
refined_prob = (
    refined_rules[('S', ('NP_subj', 'VP'))]
    * refined_rules[('NP_subj', ('Det', 'N'))] * lexicon[('Det', 'the')] * lexicon[('N', 'cat')]
    * refined_rules[('VP', ('V', 'PP'))] * lexicon[('V', 'sat')]
    * refined_rules[('PP', ('P', 'NP_obj'))] * lexicon[('P', 'on')]
    * refined_rules[('NP_obj', ('Det', 'N'))] * lexicon[('Det', 'the')] * lexicon[('N', 'mat')]
)

print('Base parse probability:', base_prob)
print('Refined parse probability:', refined_prob)
print('Refined / base probability ratio:', refined_prob / base_prob)
```

```text
Base parse probability: 0.15
Refined parse probability: 0.105
Refined / base probability ratio: 0.7
```

The ratio is below 1: the refined grammar assigns this one tree a lower probability. Splitting `S` between `NP_subj` (0.6) and `NP_obj` (0.4) costs more than the higher `VP` rule probability (0.7 versus 0.6) gains back.
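
The hand-written products in the demo can also be computed by walking the parse tuple recursively. This is a sketch under stated assumptions rather than the lecture's code: the helper name `tree_prob` is hypothetical, and the small grammar below assumes a `VP -> V PP` rule with probability 0.7 so that the demo tree is derivable.

```python
def tree_prob(tree, rules, lexicon):
    """Probability of one derivation: product of its rule and lexical probabilities.

    Raises KeyError if the tree uses a rule or word missing from the grammar.
    """
    label, *children = tree
    # Preterminal (tag, word): look up the lexical probability.
    if len(children) == 1 and isinstance(children[0], str):
        return lexicon[(label, children[0])]
    # Phrasal node: probability of the rule times the children's probabilities.
    rhs = tuple(child[0] for child in children)
    p = rules[(label, rhs)]
    for child in children:
        p *= tree_prob(child, rules, lexicon)
    return p


# Assumed grammar fragment: only the rules this tree needs.
rules = {
    ('S', ('NP_subj', 'VP')): 0.6,
    ('VP', ('V', 'PP')): 0.7,          # assumption, see the lead-in
    ('NP_subj', ('Det', 'N')): 1.0,
    ('NP_obj', ('Det', 'N')): 1.0,
    ('PP', ('P', 'NP_obj')): 1.0,
}
lexicon = {
    ('Det', 'the'): 1.0,
    ('N', 'cat'): 0.5,
    ('N', 'mat'): 0.5,
    ('V', 'sat'): 1.0,
    ('P', 'on'): 1.0,
}
tree = ('S',
        ('NP_subj', ('Det', 'the'), ('N', 'cat')),
        ('VP', ('V', 'sat'),
               ('PP', ('P', 'on'),
                      ('NP_obj', ('Det', 'the'), ('N', 'mat')))))

p = tree_prob(tree, rules, lexicon)
print('Parse probability:', p)
```

Because the lookup raises `KeyError` on an unknown rule, the walk doubles as a check that a tree is actually licensed by the grammar it is scored with.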


## Try it
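- Modify the demo: change the probabilities of the two `S` rules or the `VP` rules in `refined_rules` and watch how the refined parse probability moves.
- Add a tiny dataset or counter-example, such as a handful of trees in which subject and object NPs expand differently, and check whether the NP split still helps.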
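
One way to start on the dataset exercise is to estimate rule probabilities by relative frequency from a toy treebank, once with the NP split and once with the split merged away. The three trees, the `Pro` tag, and the helper names (`count_rules`, `estimate`, `merge_np`) are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_rules(tree, counts):
    """Add the phrasal rules used in `tree` to `counts` (lexical rules are skipped)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return
    counts[(label, tuple(child[0] for child in children))] += 1
    for child in children:
        count_rules(child, counts)


def estimate(trees):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter()
    for tree in trees:
        count_rules(tree, counts)
    totals = Counter()
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}


def merge_np(tree):
    """Undo the split by mapping NP_subj and NP_obj back to NP."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree
    new_label = 'NP' if label in ('NP_subj', 'NP_obj') else label
    return (new_label, *(merge_np(child) for child in children))


# Hypothetical toy treebank: subjects are mostly pronouns, objects mostly Det N.
treebank = [
    ('S', ('NP_subj', ('Pro', 'she')),
          ('VP', ('V', 'saw'), ('NP_obj', ('Det', 'the'), ('N', 'cat')))),
    ('S', ('NP_subj', ('Pro', 'he')),
          ('VP', ('V', 'saw'), ('NP_obj', ('Det', 'the'), ('N', 'mat')))),
    ('S', ('NP_subj', ('Det', 'the'), ('N', 'cat')),
          ('VP', ('V', 'saw'), ('NP_obj', ('Pro', 'it')))),
]

split_probs = estimate(treebank)
merged_probs = estimate([merge_np(tree) for tree in treebank])

for rule, prob in sorted(split_probs.items()):
    print('split ', rule, round(prob, 3))
for rule, prob in sorted(merged_probs.items()):
    print('merged', rule, round(prob, 3))
```

In this toy data the merged grammar gives `NP -> Pro` and `NP -> Det N` equal probability, while the split grammar records that subjects prefer pronouns and objects prefer `Det N`.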

