# Constituency Parsing

- 📺 **Video:** [https://youtu.be/zDPUKQKDaMM](https://youtu.be/zDPUKQKDaMM)

## Overview
- Combine grammar rules and parsing algorithms to build full parse trees.
- Evaluate parser accuracy with labeled F1 on treebanks.
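
Labeled F1 can be illustrated with a small sketch. It assumes each tree has already been reduced to a set of `(label, start, end)` spans; the helper name `labeled_prf` and the example spans are hypothetical, and standard scoring tools apply extra conventions (for example around punctuation) that are not modeled here.

```python
def labeled_prf(gold_spans, pred_spans):
    """Return (precision, recall, F1) over labeled spans (label, start, end)."""
    gold, pred = set(gold_spans), set(pred_spans)
    matched = len(gold & pred)  # spans whose label and boundaries both agree
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

# Hypothetical gold and predicted spans for a six-word sentence.
gold = {('S', 0, 6), ('NP', 0, 2), ('VP', 2, 6), ('PP', 3, 6), ('NP', 4, 6)}
pred = {('S', 0, 6), ('NP', 0, 2), ('VP', 2, 3), ('PP', 3, 6), ('NP', 4, 6)}

p, r, f = labeled_prf(gold, pred)
print(f'P={p:.2f} R={r:.2f} F1={f:.2f}')  # P=0.80 R=0.80 F1=0.80
```

Four of the five predicted spans match the gold spans, so precision and recall are both 4/5. Using sets means a span that occurs twice with the same label is counted once, which is a simplification.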

## Key ideas
- **Grammar design:** choose rules that capture syntactic phenomena.
- **Inference:** CKY and other chart parsers use dynamic programming to search the space of candidate trees without enumerating each one.
- **Scoring:** PCFG probabilities or discriminative models rank parses.
- **Evaluation:** labeled span precision/recall quantify performance.
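
To make the scoring idea concrete: under a PCFG, the probability of a tree is the product of the probabilities of the rules it uses, or the sum of their logs. The sketch below assumes trees are nested tuples of the form `(label, child, ...)`; the rule table, the word `saw`, and the function name `tree_log_prob` are hypothetical.

```python
import math

# Hypothetical toy PCFG: (parent, children) -> probability.
# Children are a tuple of nonterminals, or a 1-tuple holding a word.
rule_prob = {
    ('S', ('NP', 'VP')): 1.0,
    ('NP', ('Det', 'N')): 1.0,
    ('VP', ('V', 'NP')): 1.0,
    ('Det', ('the',)): 1.0,
    ('N', ('cat',)): 0.5,
    ('N', ('mat',)): 0.5,
    ('V', ('saw',)): 1.0,
}

def tree_log_prob(tree):
    """Sum the log probabilities of every rule used in `tree`."""
    label, *children = tree
    # Preterminal node: its single child is the word itself.
    if len(children) == 1 and isinstance(children[0], str):
        return math.log(rule_prob[(label, (children[0],))])
    child_labels = tuple(child[0] for child in children)
    total = math.log(rule_prob[(label, child_labels)])
    return total + sum(tree_log_prob(child) for child in children)

tree = ('S',
        ('NP', ('Det', 'the'), ('N', 'cat')),
        ('VP', ('V', 'saw'), ('NP', ('Det', 'the'), ('N', 'mat'))))
log_p = tree_log_prob(tree)
print(round(math.exp(log_p), 4))  # 0.25
```

Only the two noun choices have probability below 1 in this table, so the tree's probability is 0.5 × 0.5 = 0.25.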

## Demo
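Run CKY over a toy PCFG to find the log probability of the best (Viterbi) parse of a short sentence, recording back-pointers from which the best tree can be read off, mirroring the lecture (https://youtu.be/VGt0_CZc8mA). Replacing the max with a sum over split points and rules turns the same recursion into the inside algorithm, which computes the total probability of all parses.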

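```python
import math

# Toy PCFG in Chomsky normal form; all scores are log probabilities.
rules = {
    ('S', ('NP', 'VP')): math.log(1.0),
    ('VP', ('V', 'NP')): math.log(0.4),
    ('VP', ('VP', 'PP')): math.log(0.4),
    ('NP', ('Det', 'N')): math.log(0.8),
    ('NP', ('NP', 'PP')): math.log(0.2),
    ('PP', ('P', 'NP')): math.log(1.0),
}
lexicon = {
    ('Det', 'the'): math.log(0.5),
    ('N', 'cat'): math.log(0.5),
    ('N', 'mat'): math.log(0.5),
    ('V', 'sat'): math.log(1.0),
    ('P', 'on'): math.log(1.0),
    # Added so that intransitive "sat" can form a VP on its own. Without it no
    # VP covers "sat on the mat" and the parse fails. The 0.2 is an assumed
    # value, moved from VP -> V NP (originally 0.6) so the VP rules sum to 1.
    ('VP', 'sat'): math.log(0.2),
}

sentence = ['the', 'cat', 'sat', 'on', 'the', 'mat']
n = len(sentence)
# chart[i][j] maps each nonterminal to the best log score over words i..j-1.
chart = [[{} for _ in range(n + 1)] for _ in range(n)]
back = {}  # (i, j, label) -> word, or (split point, left label, right label)

# Width-1 spans: look up each word in the lexicon.
for i, word in enumerate(sentence):
    for (lhs, w), logp in lexicon.items():
        if w == word:
            chart[i][i + 1][lhs] = logp
            back[(i, i + 1, lhs)] = word

# Wider spans: try every split point k and every binary rule, keeping the max.
for span in range(2, n + 1):
    for i in range(n - span + 1):
        j = i + span
        for k in range(i + 1, j):
            left_cell = chart[i][k]
            right_cell = chart[k][j]
            for (lhs, (B, C)), logp in rules.items():
                if B in left_cell and C in right_cell:
                    score = logp + left_cell[B] + right_cell[C]
                    if score > chart[i][j].get(lhs, -math.inf):
                        chart[i][j][lhs] = score
                        back[(i, j, lhs)] = (k, B, C)

def build_tree(i, j, label):
    """Follow back-pointers to rebuild the best tree as a bracketed string."""
    entry = back[(i, j, label)]
    if isinstance(entry, str):  # lexical entry: the word itself
        return f'({label} {entry})'
    k, B, C = entry
    return f'({label} {build_tree(i, k, B)} {build_tree(k, j, C)})'

best = chart[0][n].get('S')
if best is None:
    print('No parse found')
else:
    print(f'Log probability of best parse: {best:.4f}')
    print(build_tree(0, n, 'S'))
```

Output:

```
Log probability of best parse: -5.7446
(S (NP (Det the) (N cat)) (VP (VP sat) (PP (P on) (NP (Det the) (N mat)))))
```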
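
For comparison, the inside algorithm uses the same chart but sums over split points and rules instead of taking the max, so it returns the total probability of all parses. The sketch below is a minimal version that works in probability space rather than log space, which is only reasonable for short sentences. The verb `saw`, the example sentence, and the function name `inside_and_viterbi` are assumptions added for illustration; the sentence is chosen because its PP can attach to either the VP or the object NP.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form, stored as plain probabilities.
rules = {
    ('S', ('NP', 'VP')): 1.0,
    ('VP', ('V', 'NP')): 0.6,
    ('VP', ('VP', 'PP')): 0.4,
    ('NP', ('Det', 'N')): 0.8,
    ('NP', ('NP', 'PP')): 0.2,
    ('PP', ('P', 'NP')): 1.0,
}
lexicon = {
    ('Det', 'the'): 0.5,
    ('N', 'cat'): 0.5,
    ('N', 'mat'): 0.5,
    ('V', 'saw'): 1.0,  # hypothetical transitive verb for this example
    ('P', 'on'): 1.0,
}

def inside_and_viterbi(sentence):
    """Return (total probability of all parses, probability of the best parse)."""
    n = len(sentence)
    inside = defaultdict(float)  # (i, j, A) -> summed probability of subtrees
    best = defaultdict(float)    # (i, j, A) -> probability of the best subtree
    for i, word in enumerate(sentence):
        for (lhs, w), p in lexicon.items():
            if w == word:
                inside[i, i + 1, lhs] += p
                best[i, i + 1, lhs] = max(best[i, i + 1, lhs], p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, (B, C)), p in rules.items():
                    # Inside: accumulate every way of building lhs over (i, j).
                    inside[i, j, lhs] += p * inside[i, k, B] * inside[k, j, C]
                    # Viterbi: keep only the single best way.
                    best[i, j, lhs] = max(best[i, j, lhs],
                                          p * best[i, k, B] * best[k, j, C])
    return inside[0, n, 'S'], best[0, n, 'S']

total, top = inside_and_viterbi('the cat saw the cat on the mat'.split())
print(f'total={total:.5f} best={top:.5f}')  # total=0.00288 best=0.00192
```

The two parses have probabilities 0.00192 (PP attached to the VP) and 0.00096 (PP attached to the object NP), so the inside score is their sum while the Viterbi score is the larger of the two.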


## Try it
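- Modify the demo: change a rule probability or add a lexical entry, then check how the best-parse score changes.
- Add a tiny dataset or counter-example, such as a sentence whose PP can attach through either `NP -> NP PP` or `VP -> VP PP`.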


## References
- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/pdf/1810.04805.pdf)
- [To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks](https://www.aclweb.org/anthology/W19-4302/)
- [GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding](https://arxiv.org/pdf/1804.07461.pdf)
- [What Does BERT Look At? An Analysis of BERT's Attention](https://arxiv.org/abs/1906.04341)
- [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/pdf/1907.11692.pdf)
- [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)
- [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf)
- [UnifiedQA: Crossing Format Boundaries With a Single QA System](https://arxiv.org/abs/2005.00700)
- [Neural Machine Translation of Rare Words with Subword Units](https://arxiv.org/pdf/1508.07909.pdf)
- [Byte Pair Encoding is Suboptimal for Language Model Pretraining](https://arxiv.org/pdf/2004.03720.pdf)
- [Eisenstein 8.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 7.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 7.4](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 7.4.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 7.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [TnT - A Statistical Part-of-Speech Tagger](https://arxiv.org/abs/cs/0003055)
- [Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger](https://www.aclweb.org/anthology/W00-1308/)
- [Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?](https://link.springer.com/chapter/10.1007/978-3-642-19400-9_14)
- [Natural Language Processing with Small Feed-Forward Networks](https://www.aclweb.org/anthology/D17-1309.pdf)
- [Eisenstein 10.1-10.2](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 10.3-10.4](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 10.3.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Accurate Unlexicalized Parsing](https://www.aclweb.org/anthology/P03-1054/)
- [Eisenstein 10.5](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Eisenstein 11.1](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Finding Optimal 1-Endpoint-Crossing Trees](https://www.aclweb.org/anthology/Q13-1002/)
- [Eisenstein 11.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)


*Links only; we do not redistribute slides or papers.*