# Transition-based Dependency Parsing

- 📺 **Video:** [https://youtu.be/ypoaw7lJ6Rk](https://youtu.be/ypoaw7lJ6Rk)

## Overview
- Parse sentences by incrementally building dependency arcs with shift/reduce transitions.
- Use classifiers to predict transitions from parser configurations.

## Key ideas
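- **Stack and buffer:** the stack holds partially processed tokens; the buffer holds the remaining input.
- **Transitions:** SHIFT, LEFT-ARC, and RIGHT-ARC move tokens and add arcs; the arc-eager system also has REDUCE.
- **Feature extraction:** features from the top of the stack and the front of the buffer drive the choice of action.
- **Linear time:** with greedy decoding, a sentence of n words needs at most 2n transitions, so parsing runs in O(n).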
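
The ideas above can be sketched as a greedy parsing loop. This is a minimal sketch, assuming arc-eager transitions, unlabeled arcs, and no precondition checks. The names `extract_features`, `toy_classifier`, and `greedy_parse` are hypothetical, and the hand-written lookup table only stands in for a trained classifier.

```python
from collections import deque

def extract_features(words, stack, buffer):
    """Hypothetical feature template: top of stack (s0), first two buffer tokens (b0, b1)."""
    return {
        's0': words[stack[-1]] if stack else '<none>',
        'b0': words[buffer[0]] if buffer else '<none>',
        'b1': words[buffer[1]] if len(buffer) > 1 else '<none>',
    }

# Stand-in for a trained classifier: a hand-written table keyed on (s0, b0).
TOY_TABLE = {
    ('ROOT', 'She'): 'SHIFT',
    ('She', 'enjoys'): 'LEFT-ARC',
    ('ROOT', 'enjoys'): 'RIGHT-ARC',
    ('enjoys', 'fresh'): 'SHIFT',
    ('fresh', 'coffee'): 'LEFT-ARC',
    ('enjoys', 'coffee'): 'RIGHT-ARC',
}

def toy_classifier(features):
    return TOY_TABLE[(features['s0'], features['b0'])]

def greedy_parse(words, classify):
    """Apply one predicted transition per step until the buffer is empty."""
    stack, buffer = [0], deque(range(1, len(words)))
    arcs, steps = [], 0
    while buffer:
        action = classify(extract_features(words, stack, buffer))
        if action == 'SHIFT':
            stack.append(buffer.popleft())
        elif action == 'LEFT-ARC':       # front of buffer heads top of stack
            arcs.append((buffer[0], stack.pop()))
        elif action == 'RIGHT-ARC':      # top of stack heads front of buffer
            arcs.append((stack[-1], buffer[0]))
            stack.append(buffer.popleft())
        elif action == 'REDUCE':         # pop a token that already has a head
            stack.pop()
        else:
            raise ValueError(f'unknown action: {action}')
        steps += 1
    return arcs, steps

words = ['ROOT', 'She', 'enjoys', 'fresh', 'coffee']
arcs, steps = greedy_parse(words, toy_classifier)
print(arcs)                                  # (head, dependent) index pairs
print(steps, '<=', 2 * (len(words) - 1))     # transition count vs. the 2n bound
```

Each step does a constant amount of work (one feature lookup, one stack or buffer operation), which is where the linear running time comes from. A `deque` is used for the buffer so that removing the front token is constant-time.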

## Demo
Simulate an arc-eager parser on a short sentence to replicate the step-by-step examples from the lecture (https://youtu.be/sIP5F2r6XUc).

In [1]:
sentence = ['ROOT', 'She', 'enjoys', 'fresh', 'coffee']
stack = [0]                             # token indices; starts with ROOT
buffer = list(range(1, len(sentence)))  # indices of the remaining tokens
arcs = []                               # (head, dependent, label) triples

# Arc-eager transition sequence for this sentence.
transitions = [
    ('SHIFT', None),
    ('LEFT-ARC', 'nsubj'),
    ('RIGHT-ARC', 'root'),
    ('SHIFT', None),
    ('LEFT-ARC', 'amod'),
    ('RIGHT-ARC', 'obj')
]

for action, label in transitions:
    if action == 'SHIFT':
        # Move the front of the buffer onto the stack.
        stack.append(buffer.pop(0))
    elif action == 'LEFT-ARC':
        # The front of the buffer heads the top of the stack; pop the dependent.
        dependent = stack.pop()
        head = buffer[0]
        arcs.append((head, dependent, label))
    elif action == 'RIGHT-ARC':
        # The top of the stack heads the front of the buffer; push the dependent.
        head = stack[-1]
        dependent = buffer.pop(0)
        stack.append(dependent)
        arcs.append((head, dependent, label))
    elif action == 'REDUCE':
        # Pop a token that already has a head (not needed for this sentence).
        stack.pop()
    print(f"Action: {action:9s} | Stack: {[sentence[i] for i in stack]} | Buffer: {[sentence[i] for i in buffer]}")

print()

print('Arcs built:')
for head, dep, label in arcs:
    print(f"{sentence[head]} --{label}--> {sentence[dep]}")


Action: SHIFT     | Stack: ['ROOT', 'She'] | Buffer: ['enjoys', 'fresh', 'coffee']
Action: LEFT-ARC  | Stack: ['ROOT'] | Buffer: ['enjoys', 'fresh', 'coffee']
Action: RIGHT-ARC | Stack: ['ROOT', 'enjoys'] | Buffer: ['fresh', 'coffee']
Action: SHIFT     | Stack: ['ROOT', 'enjoys', 'fresh'] | Buffer: ['coffee']
Action: LEFT-ARC  | Stack: ['ROOT', 'enjoys'] | Buffer: ['coffee']
Action: RIGHT-ARC | Stack: ['ROOT', 'enjoys', 'coffee'] | Buffer: []

Arcs built:
enjoys --nsubj--> She
ROOT --root--> enjoys
coffee --amod--> fresh
enjoys --obj--> coffee
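
The demo hard-codes its transition sequence. A static oracle can instead derive a sequence from gold head indices, which is how training examples for the transition classifier are typically produced. The following is a minimal sketch, assuming arc-eager transitions, unlabeled arcs, and a projective gold tree. The name `arc_eager_oracle` and both `heads` lists are illustrative assumptions, not taken from the lecture.

```python
from collections import deque

def arc_eager_oracle(heads):
    """Derive an arc-eager transition sequence for a projective tree.

    heads[i] is the gold head index of token i; heads[0] is None (ROOT).
    """
    stack, buffer = [0], deque(range(1, len(heads)))
    attached = set()   # tokens that have already received their head
    actions = []
    while buffer:
        s, b = stack[-1], buffer[0]
        if heads[s] == b:
            # Front of buffer is the gold head of the stack top.
            actions.append('LEFT-ARC')
            stack.pop()
        elif heads[b] == s:
            # Stack top is the gold head of the front of the buffer.
            actions.append('RIGHT-ARC')
            attached.add(b)
            stack.append(buffer.popleft())
        elif s in attached and any(heads[k] == b or heads[b] == k
                                   for k in stack[:-1]):
            # Stack top is done and a deeper stack token still links to b.
            actions.append('REDUCE')
            stack.pop()
        else:
            actions.append('SHIFT')
            stack.append(buffer.popleft())
    return actions

# ROOT She enjoys fresh coffee (assumed gold heads)
demo_actions = arc_eager_oracle([None, 2, 0, 4, 2])
print(demo_actions)

# ROOT She drinks coffee daily (assumed gold heads; this one needs REDUCE)
reduce_actions = arc_eager_oracle([None, 2, 0, 2, 2])
print(reduce_actions)
```

The second sentence needs REDUCE because `coffee` must leave the stack before `daily` can attach to `drinks`.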


## Try it
- Modify the demo
- Add a tiny dataset or counter-example
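
One possible counter-example is a non-projective tree, that is, a tree with crossing arcs. The arc-eager transitions only ever connect the top of the stack with the front of the buffer, so they cannot build such a tree. Below is a small check to run on a candidate sentence before parsing it. It is a sketch only: `is_projective` is a hypothetical helper, and the head indices for the second sentence are one assumed analysis.

```python
def is_projective(heads):
    """Return True if no two arcs cross.

    heads[i] is the head index of token i; heads[0] is None (ROOT).
    """
    # Represent each arc by its (left, right) endpoints, ignoring direction.
    spans = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if h is not None]
    for l1, r1 in spans:
        for l2, r2 in spans:
            if l1 < l2 < r1 < r2:   # the second arc starts inside and ends outside
                return False
    return True

# ROOT She enjoys fresh coffee
demo_ok = is_projective([None, 2, 0, 4, 2])

# ROOT A hearing is scheduled on the issue today
# Assumed analysis: "issue" attaches to "hearing", crossing the ROOT -> "scheduled" arc.
crossing_ok = is_projective([None, 2, 4, 4, 0, 7, 7, 2, 4])

print(demo_ok, crossing_ok)
```

Under the assumed head indices, the first sentence passes the check and the second does not.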


*Links only; we do not redistribute slides or papers.*