# Author: Zhengxiang (Jack) Wang 
# Date: 2021-08-06, modified on 2021-10-06
# GitHub: https://github.com/jaaack-wang 
# About: Linguistic Structure: Dependency Parsing, Stanford CS224N: NLP with Deep Learning | Winter 2019

# Table of Contents
- [1. Casual takeaways](#1)
- [2. Syntactic structure: constituency and dependency](#2)
    - [2.1 Constituency](#2-1)
    - [2.2 Dependency](#2-2)
    - [2.3 Sources of ambiguity related to dependency](#2-3)
        - [2.3.1 Prepositional phrase attachment ambiguity](#2-3-1)
        - [2.3.2 PP attachment ambiguities multiply](#2-3-2)
        - [2.3.3 Coordination scope ambiguity](#2-3-3)
        - [2.3.4 Adjectival modifier ambiguity](#2-3-4)
        - [2.3.5 Verb phrase (VP) attachment ambiguity](#2-3-5)
- [3. Dependency grammar and dependency structure](#3)
    - [3.1 A bit of history of dependency grammar/parsing](#3-1)
    - [3.2 Dependency grammar and its graphic representation](#3-2)
    - [3.3 Universal dependencies treebanks and annotated data](#3-3)
- [4. Dependency parser](#4)
    - [4.1 Dependency conditioning preferences](#4-1)
    - [4.2 General parsing rules](#4-2)
    - [4.3 Projectivity](#4-3)
    - [4.4 Methods of dependency parsing](#4-4)
- [5. Greedy transition-based parsing](#5)
    - [5.1 Overview](#5-1)
    - [5.2 Arc-standard transition-based parser](#5-2)
    - [5.3 ML transition-based parser](#5-3)
    - [5.4 Evaluation of accuracy](#5-4)
    - [5.5 Problems](#5-5)
- [6. Neural dependency parser](#6)
    - [6.1 General idea](#6-1)
    - [6.2 Preprocessing](#6-2)
    - [6.3 Model architecture](#6-3)
    - [6.4 Comparison](#6-4)
- [7. References](#7)

<a name='1'></a>
# 1. Casual takeaways

- The harder computations in [greedy transition-based parsing](#5) still need to be worked through later. -- 2021-10-06

<a name='2'></a>
# 2. Syntactic structure: constituency and dependency

<a name='2-1'></a>
## 2.1 Constituency

- Also known as **phrase structure grammar**, usually formalized as a **context-free grammar (CFG)**

- Key assumption: **Phrase structure organizes words into nested constituents**
    - Words (basic units, each tagged with a part of speech) $\longrightarrow$ phrases $\longrightarrow$ bigger phrases $\longrightarrow$ ...
- Example:

<img src='../images/5-constituency.png' width='600' height='300'>

<br>
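To make "nested constituents" concrete, here is a minimal sketch that stores a phrase-structure tree as nested tuples `(label, child, ...)`, with words as plain strings. The phrase "the cuddly cat by the door" and its labels (NP, PP, Det, Adj, N, P) are an illustrative analysis assumed for this sketch, not a transcription of the image above.

```python
# A constituency (phrase-structure) tree as nested tuples: (label, child1, child2, ...).
# Leaves are plain word strings. The analysis itself is an illustrative assumption.
tree = (
    "NP",
    ("Det", "the"),
    ("Adj", "cuddly"),
    ("N", "cat"),
    ("PP", ("P", "by"), ("NP", ("Det", "the"), ("N", "door"))),
)


def leaves(node):
    """Return the words covered by a (sub)tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]


def bracket(node):
    """Render the tree in bracket notation, e.g. (NP (Det the) ...)."""
    if isinstance(node, str):
        return node
    return "(" + node[0] + " " + " ".join(bracket(child) for child in node[1:]) + ")"


def constituents(node):
    """List every phrase label together with the words it spans (words -> phrases -> bigger phrases)."""
    if isinstance(node, str):
        return []
    spans = [(node[0], " ".join(leaves(node)))]
    for child in node[1:]:
        spans.extend(constituents(child))
    return spans


print(bracket(tree))
# (NP (Det the) (Adj cuddly) (N cat) (PP (P by) (NP (Det the) (N door))))
```

Each non-leaf tuple is one constituent, so the nesting of the tuples directly mirrors the nesting of phrases.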


<a name='2-2'></a>
## 2.2 Dependency

- Dependency structure shows **which words depend on (modify or are arguments of) which other words**, which closely reflects the semantic relationships between them

- Example (from the internet):


<img src='../images/5-dependency-example.png' width='400' height='200'>

<br>
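A common way to store a dependency structure is one head index per word, with 0 standing for an artificial ROOT (CoNLL-style treebank files use the same idea). The sketch below assumes a UD-style analysis of "the cuddly cat by the door" chosen for illustration; the helper names are hypothetical.

```python
# One head index per word (1-based; 0 = the artificial ROOT) plus one label per word.
# The heads and UD-style labels are an illustrative analysis assumed for this sketch.
words = ["the", "cuddly", "cat", "by", "the", "door"]
heads = [3, 3, 0, 6, 6, 3]
labels = ["det", "amod", "root", "case", "det", "nmod"]


def arcs(words, heads, labels):
    """Return (head_word, label, dependent_word) triples, one per word."""
    triples = []
    for i, (h, lab) in enumerate(zip(heads, labels), start=1):
        head_word = "ROOT" if h == 0 else words[h - 1]
        triples.append((head_word, lab, words[i - 1]))
    return triples


def dependents(head, heads):
    """1-based indices of the words whose head is `head`."""
    return [i for i, h in enumerate(heads, start=1) if h == head]


for head_word, lab, dep_word in arcs(words, heads, labels):
    print(f"{head_word} --{lab}--> {dep_word}")
```

Because each word stores exactly one head, the "which word depends on which" relation is explicit, while phrases are only implicit (a head plus everything below it).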

- Why dependency?
    - We need to understand sentence structure in order to be able to interpret language correctly
    - Humans communicate complex ideas by composing words together into bigger units to convey complex meanings
    - We need to know what is connected to what
    - Otherwise, the interpretation is ambiguous
    

- Dependency paths identify semantic relations
    - e.g.: The result demonstrated that KaiC interacts rhythmically with SasA, KaiA, and KaiB.


<img src='../images/5-dependency-example2.png' width='400' height='200'>
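A dependency path is the chain of arcs connecting two words through their lowest common ancestor in the tree. The sketch below assumes a simplified, UD-like analysis of the sentence above (punctuation dropped, heads chosen for illustration, not read off the image) and recovers the path linking the protein names.

```python
# Simplified dependency analysis assumed for illustration (punctuation omitted).
words = ["The", "result", "demonstrated", "that", "KaiC", "interacts",
         "rhythmically", "with", "SasA", "KaiA", "and", "KaiB"]
heads = [2, 3, 0, 6, 6, 3, 6, 9, 6, 9, 12, 9]   # 1-based head of each word, 0 = ROOT


def path_to_root(i, heads):
    """Word i followed by its head, its head's head, ..., up to the root word."""
    chain = [i]
    while heads[i - 1] != 0:
        i = heads[i - 1]
        chain.append(i)
    return chain


def dependency_path(a, b, heads):
    """Words on the tree path from a up to their lowest common ancestor and down to b."""
    up = path_to_root(a, heads)
    down = path_to_root(b, heads)
    common = next(node for node in up if node in down)   # lowest common ancestor
    return up[: up.index(common) + 1] + list(reversed(down[: down.index(common)]))


path = dependency_path(5, 12, heads)                      # KaiC ... KaiB
print(" -> ".join(words[i - 1] for i in path))
# KaiC -> interacts -> SasA -> KaiB
```

Under this analysis, KaiC reaches each of SasA, KaiA and KaiB through "interacts", which is the kind of path a relation extractor can use to conclude that KaiC interacts with all three.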

    
    
<a name='2-3'></a>
## 2.3 Sources of ambiguity related to dependency 


<a name='2-3-1'></a>
### 2.3.1  Prepositional phrase attachment ambiguity

- e.g.: Scientists count <ins>**whales** from space</ins>; Scientists <ins>**count**</ins> whales <ins>from space</ins>
    
<img src='../images/5-ambiguity-example.png' width='400' height='200'>


<a name='2-3-2'></a>
### 2.3.2 PP attachment ambiguities multiply
- e.g.: The board approved \[its acquisition\] \[by Royal Trustco Ltd.\] \[of Toronto\] \[for \$27 a share\] \[at its monthly meeting\].
    
    
<img src='../images/5-ambiguity-example2.png' width='600' height='300'>


<br>

- **Catalan numbers**, $C_n = \frac{(2n)!}{(n+1)!\,n!}$, describe how fast the number of possible attachment structures grows as more PPs are added.
    - Wikipedia entry: https://en.wikipedia.org/wiki/Catalan_number
    - An exponentially growing series, which arises in many tree-like contexts
        - E.g., the number of possible triangulations of a polygon with $n+2$ sides
            - Turns up in triangulation of probabilistic graphical models (CS228)...
<br>
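A quick way to see the growth is to compute the first few Catalan numbers from the closed form above, using the equivalent form $C_n = \binom{2n}{n}/(n+1)$. This is a minimal sketch; `catalan` is a name chosen here for illustration.

```python
from math import comb


def catalan(n: int) -> int:
    """n-th Catalan number: C_n = (2n)! / ((n + 1)! n!) = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


print([catalan(n) for n in range(8)])
# [1, 1, 2, 5, 14, 42, 132, 429]
```

The ratio between consecutive values approaches 4, which is why each extra ambiguous attachment multiplies, rather than adds to, the number of candidate structures.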

<a name='2-3-3'></a>
### 2.3.3 Coordination scope ambiguity

e.g.: 
- \[Shuttle veteran and longtime NASA executive\] <ins>Fred Gregory</ins> appointed to board
- <ins>Shuttle veteran</ins> and \[longtime NASA executive\] <ins>Fred Gregory</ins>  appointed to board


<img src='../images/5-ambiguity-example3.png' width='600' height='300'>


e.g.: No heart, cognitive issues: 
- No heart issues and no cognitive issues = No heart or cognitive issues 
- No heart, \[but\] cognitive issues



<a name='2-3-4'></a>
### 2.3.4 Adjectival modifier ambiguity

e.g.: Students get first hand job experience
- Students get first-hand job experience
- Students get first \[hand job experience\]

<img src='../images/5-ambiguity-example4.png' width='600' height='300'>


<a name='2-3-5'></a>
### 2.3.5 Verb phrase (VP) attachment ambiguity

e.g.: Mutilated body washes up on Rio beach to be used for Olympics beach volleyball
- <ins>\[Mutilated body\]</ins> washes up on Rio beach <ins>\[to be used for Olympics beach volleyball\]</ins> 
- Mutilated body washes up on <ins>\[Rio beach\]</ins> <ins>\[to be used for Olympics beach volleyball\]</ins> 


<a name='3'></a>
# 3. Dependency grammar and dependency structure

<a name='3-1'></a>
## 3.1 A bit of history of dependency grammar/parsing


<img src='../images/5-dependency-parsing-history.png' width='600' height='300'>

<br>

<a name='3-2'></a>
## 3.2 Dependency grammar and its graphic representation 

Dependency syntax postulates that syntactic structure consists of relations between lexical items, **normally binary asymmetric relations** (“arrows”) called **dependencies**.

There is no single convention for drawing dependency structures. In this notebook, arrows start at the head and point to the dependent, but some authors draw them from the dependent to the head instead. Either way, the following are the two basic graphical representations.


<br>

- Represented by a connected, acyclic, single-head tree 

<img src='../images/5-dependency grammar.png' width='600' height='300'>

<br>

- Represented by dependency arcs (curved arrows) above a sentence
    - A fake ROOT node is usually added so that every word is a dependent of exactly one other node

<img src='../images/5-dependency-grammar-repr.png' width='600' height='300'>

<br>
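With the head-index representation (one head per word, 0 = ROOT), "single-head" holds by construction, so checking for a well-formed tree reduces to range checks, a single ROOT dependent, and the absence of cycles. The sketch below assumes the common constraint that exactly one word attaches to ROOT; `is_well_formed` is a hypothetical helper name.

```python
def is_well_formed(heads):
    """Check that heads[i-1] (the head of word i, 0 = ROOT) describes a dependency tree.

    Assumed conditions for this sketch:
      * every head is ROOT (0) or an existing word (1..n),
      * no word is its own head,
      * exactly one word is attached to ROOT,
      * following heads from any word reaches ROOT, so there are no cycles
        and every word is connected to ROOT.
    """
    n = len(heads)
    if any(h < 0 or h > n for h in heads):
        return False
    if any(h == i for i, h in enumerate(heads, start=1)):
        return False
    if sum(1 for h in heads if h == 0) != 1:
        return False
    for start in range(1, n + 1):
        seen = set()
        i = start
        while i != 0:                 # walk up the heads until ROOT
            if i in seen:
                return False          # came back to a visited word: cycle
            seen.add(i)
            i = heads[i - 1]
    return True


print(is_well_formed([2, 0, 2]))      # "I ate fish": I <- ate -> fish
# True
```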


<a name='3-3'></a>
## 3.3 Universal dependencies treebanks and annotated data

- Universal Dependencies: http://universaldependencies.org/
- The Penn Treebank
    - Related paper: [Marcus et al. 1993. Building a Large Annotated Corpus of English: The Penn Treebank](https://aclanthology.org/J93-2004.pdf)

<img src='../images/5-the-penn-treebank.png' width='600' height='300'>

<br>

- Why spend time building treebanks?
    - Reusability of the labor
    - Many parsers, part-of-speech taggers, etc. can be built on it
    - Valuable resource for linguistics
    - Broad coverage, not just a few intuitions
    - Frequencies and distributional information 
    - A way to evaluate systems
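Universal Dependencies treebanks are distributed as CoNLL-U files: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), `#` comment lines, and a blank line between sentences. The reader below is a minimal sketch; the sample sentence is hand-written for illustration, and multiword-token lines (IDs like `1-2`) and empty nodes (IDs like `1.1`) are simply skipped.

```python
# Hand-written CoNLL-U sample for illustration (tabs separate the ten columns).
SAMPLE = """\
# text = I ate fish
1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_
2\tate\teat\tVERB\tVBD\t_\t0\troot\t_\t_
3\tfish\tfish\tNOUN\tNN\t_\t2\tobj\t_\t_
"""

FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():                 # blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):             # sentence-level comments (text, sent_id, ...)
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]: # multiword tokens / empty nodes: skipped here
            continue
        token = dict(zip(FIELDS, cols))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])   # 0 means the token depends on ROOT
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


for tok in read_conllu(SAMPLE)[0]:
    print(tok["id"], tok["form"], tok["head"], tok["deprel"])
```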



<a name='4'></a>
# 4. Dependency parser

<a name='4-1'></a>
## 4.1 Dependency conditioning preferences

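- What sources of information can a dependency parser condition on? The main ones are bilexical affinities (e.g., discussion $\rightarrow$ issues is plausible), dependency distance (most dependencies are between nearby words), intervening material (dependencies rarely span intervening verbs or punctuation), and the valency of heads (how many dependents, and on which side, are usual for a head).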

<img src='../images/5-dependency-conditioning-preferences.png' width='600' height='300'>



<a name='4-2'></a>
## 4.2 General parsing rules

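- A sentence is parsed by choosing, for each word, which other word (including ROOT) it is a dependent of. The usual constraints are that only one word is a dependent of ROOT and that there are no cycles, so the dependencies form a tree. A word can therefore be both a dependent (an arrow points to it) and a head (an arrow starts from it). The remaining question is whether arrows may cross (see below).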

<img src='../images/5-parsing-general.png' width='600' height='300'>


<a name='4-3'></a>
## 4.3 Projectivity

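- A parse is **projective** if no dependency arcs cross when they are drawn above the sentence in linear word order; a **non-projective** parse contains crossing arcs.
- In the example, the arcs involving "coffee" and "from" are what make the structure non-projective.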

<img src='../images/5-projectivity.png' width='600' height='300'>
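Projectivity can be checked directly on head indices: turn every arc (including the one from ROOT at position 0) into a span and test each pair of spans for strict crossing. The sketch below uses an assumed analysis of "Who did Bill buy the coffee from yesterday?" (punctuation dropped) in which "Who" depends on the stranded preposition "from"; the head choices are for illustration only.

```python
from itertools import combinations


def arc_spans(heads):
    """Each arc as (left, right) word positions; ROOT sits at position 0."""
    return [(min(i, h), max(i, h)) for i, h in enumerate(heads, start=1)]


def crosses(a, b):
    """True if two arcs drawn above the sentence intersect (shared endpoints do not count)."""
    (l1, r1), (l2, r2) = a, b
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1


def is_projective(heads):
    return not any(crosses(a, b) for a, b in combinations(arc_spans(heads), 2))


print(is_projective([2, 0, 2]))                 # "I ate fish"
# True

# Who(1) did(2) Bill(3) buy(4) the(5) coffee(6) from(7) yesterday(8); assumed heads:
coffee_heads = [7, 4, 4, 0, 6, 4, 4, 4]
print(is_projective(coffee_heads))
# False
```

Here the arc between "from" (7) and "Who" (1) crosses both ROOT $\rightarrow$ buy and buy $\rightarrow$ yesterday, so no drawing above the sentence avoids an intersection.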


<a name='4-4'></a>
## 4.4 Methods of dependency parsing

- Transition-based parsing will be introduced below. 

<img src='../images/5-parsing-methods.png' width='600' height='300'>


<a name='5'></a>
# 5. Greedy transition-based parsing

<a name='5-1'></a>
## 5.1 Overview

- Do not quite understand yet...
- Related paper: [Nivre. 2003. An Efficient Algorithm for Projective Dependency Parsing](https://aclanthology.org/W03-3017.pdf)

**Text description**

<img src='../images/5-transition-based-parsing.png' width='600' height='300'>


**Symbolic description**

<img src='../images/5-transition-based-parsing2.png' width='600' height='300'>


<a name='5-2'></a>
## 5.2 Arc-standard transition-based parser

<img src='../images/5-transition-based-parser-processing-example.png' width='600' height='300'>

<img src='../images/5-transition-based-parser-processing-example2.png' width='600' height='300'>
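The arc-standard system can be sketched in a few lines: the stack starts as [ROOT], the buffer holds the words, and the three transitions are SHIFT, LEFT-ARC ($\sigma|w_i|w_j \Rightarrow \sigma|w_j$, adding $w_j \rightarrow w_i$) and RIGHT-ARC ($\sigma|w_i|w_j \Rightarrow \sigma|w_i$, adding $w_i \rightarrow w_j$). To decide which transition to take, the sketch below swaps in a standard *static oracle* that reads the gold tree (LEFT-ARC if the second stack item's gold head is the top; RIGHT-ARC if the top's gold head is the second item and the top already has all its dependents; otherwise SHIFT). The oracle is an assumption added here, not something shown in the images; it is the usual way to turn a treebank into training transitions.

```python
SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"


def apply_transition(t, stack, buffer, arcs):
    """Arc-standard transitions; stack[-1] is the top, arcs are (head, dependent) pairs."""
    if t == SHIFT:
        stack.append(buffer.pop(0))
    elif t == LEFT_ARC:                      # sigma|wi|wj -> sigma|wj, add wj -> wi
        dependent = stack.pop(-2)
        arcs.append((stack[-1], dependent))
    elif t == RIGHT_ARC:                     # sigma|wi|wj -> sigma|wi, add wi -> wj
        dependent = stack.pop()
        arcs.append((stack[-1], dependent))
    else:
        raise ValueError(f"unknown transition {t!r}")


def static_oracle(stack, buffer, arcs, heads):
    """Choose the transition that moves towards the gold tree (heads[i-1] = head of word i)."""
    if len(stack) >= 2:
        second, top = stack[-2], stack[-1]
        if second != 0 and heads[second - 1] == top:
            return LEFT_ARC
        attached = {dep for _, dep in arcs}
        waiting = any(h == top and i not in attached
                      for i, h in enumerate(heads, start=1))
        if heads[top - 1] == second and not waiting:
            return RIGHT_ARC
    if not buffer:
        raise ValueError("gold tree not reachable (arc-standard only builds projective trees)")
    return SHIFT


def oracle_parse(heads):
    """Run from ([ROOT], all words, no arcs) until the stack is [ROOT] and the buffer is empty."""
    stack, buffer, arcs, transitions = [0], list(range(1, len(heads) + 1)), [], []
    while buffer or len(stack) > 1:
        t = static_oracle(stack, buffer, arcs, heads)
        apply_transition(t, stack, buffer, arcs)
        transitions.append(t)
    return transitions, arcs


transitions, arcs = oracle_parse([2, 0, 2])      # "I ate fish": I <- ate -> fish
print(transitions)
# ['SHIFT', 'SHIFT', 'LEFT-ARC', 'SHIFT', 'RIGHT-ARC', 'RIGHT-ARC']
```

Each transition is decided from the current configuration only and never revised, which is what makes the parser greedy and linear in sentence length.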


<a name='5-3'></a>
## 5.3 ML transition-based parser
- Related paper: [Nivre, J., Hall, J., & Nilsson, J. 2005. MaltParser: A Data-Driven Parser-Generator for Dependency Parsing](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.380.8455&rep=rep1&type=pdf)
- This works very well, but it requires extensive feature engineering, which is expensive.

<img src='../images/5-MaltParser.png' width='600' height='300'>

<img src='../images/5-MaltParser2.png' width='600' height='300'>
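In the ML version, a classifier predicts the next transition from features of the current configuration, and those features are hand-designed indicator strings such as "top of stack word = ate" or conjunctions of tags. The sketch below shows what such feature extraction might look like; the templates (s1, s2, b1 words/tags and one tag conjunction) are a small hypothetical subset, not MaltParser's actual feature model.

```python
def extract_features(stack, buffer, words, tags):
    """Sparse indicator features for one configuration (hypothetical templates).

    stack/buffer hold 1-based word positions (0 = ROOT); words/tags are per-word lists.
    """
    def word(i):
        return "ROOT" if i == 0 else words[i - 1]

    def tag(i):
        return "ROOT" if i == 0 else tags[i - 1]

    s1 = stack[-1] if stack else None            # top of stack
    s2 = stack[-2] if len(stack) >= 2 else None  # second item on the stack
    b1 = buffer[0] if buffer else None           # first word in the buffer
    feats = set()
    for name, pos in (("s1", s1), ("s2", s2), ("b1", b1)):
        if pos is None:
            feats.add(f"{name}=NULL")
            continue
        feats.add(f"{name}.w={word(pos)}")
        feats.add(f"{name}.t={tag(pos)}")
        feats.add(f"{name}.wt={word(pos)}/{tag(pos)}")
    if s1 is not None and s2 is not None:        # one conjunction template
        feats.add(f"s2.t+s1.t={tag(s2)}+{tag(s1)}")
    return feats


words, tags = ["I", "ate", "fish"], ["PRP", "VBD", "NN"]
print(sorted(extract_features([0, 1, 2], [3], words, tags)))
```

Real feature models use many more templates and conjunctions, which is where the feature-engineering cost comes from: the resulting vectors are huge and very sparse.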

<a name='5-4'></a>
## 5.4 Evaluation of accuracy
<img src='../images/5-accuracy.png' width='600' height='300'>
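Accuracy is the number of correct dependencies divided by the number of dependencies: UAS (unlabeled attachment score) counts a word as correct if its head is right, LAS (labeled attachment score) additionally requires the right label. The gold and predicted analyses below are made up for illustration.

```python
def attachment_scores(gold, predicted):
    """gold/predicted: one (head, label) pair per word. Returns (UAS, LAS)."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty analyses of the same sentence")
    n = len(gold)
    heads_ok = sum(g[0] == p[0] for g, p in zip(gold, predicted))   # head only
    both_ok = sum(g == p for g, p in zip(gold, predicted))          # head and label
    return heads_ok / n, both_ok / n


# "She saw the video lecture" (illustrative analyses, 0 = ROOT)
gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "compound"), (2, "obj")]
predicted = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "amod"), (2, "obj")]
print(attachment_scores(gold, predicted))
# (0.8, 0.6)
```

Here "the" gets the wrong head (hurting both scores) and "video" gets the right head with the wrong label (hurting only LAS).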

<a name='5-5'></a>
## 5.5 Problems

<img src='../images/5-problems.png' width='600' height='300'>

<a name='6'></a>
# 6. Neural dependency parser

- Related paper: [Chen & Manning. 2014. A Fast and Accurate Dependency Parser using Neural Networks](https://aclanthology.org/D14-1082/)

<a name='6-1'></a>
## 6.1 General idea

<img src='../images/5-neuralparserGeneral.png' width='600' height='300'>

<a name='6-2'></a>
## 6.2 Preprocessing
<img src='../images/5-neuralparserPreprocessing.png' width='600' height='300'>
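Before the network sees a configuration, each selected position is mapped to an integer id so that it can index an embedding table, with special symbols for empty slots, unknown words and ROOT. Chen & Manning (2014) select 18 word, 18 POS-tag and 12 arc-label positions; the sketch below keeps only the top three stack words and first three buffer words, and its special-symbol names are assumptions.

```python
NULL, UNK, ROOT = "<NULL>", "<UNK>", "<ROOT>"   # hypothetical special symbols


def build_vocab(tokens):
    """Map each token type to an integer id, reserving the first ids for special symbols."""
    vocab = {NULL: 0, UNK: 1, ROOT: 2}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def feature_ids(stack, buffer, words, vocab):
    """Ids for s1, s2, s3 and b1, b2, b3 (a simplified subset of the word features)."""
    def tok(i):
        return ROOT if i == 0 else words[i - 1]

    stack_part = [tok(i) for i in reversed(stack[-3:])]   # s1, s2, s3 (top first)
    buffer_part = [tok(i) for i in buffer[:3]]            # b1, b2, b3
    stack_part += [NULL] * (3 - len(stack_part))          # pad missing slots
    buffer_part += [NULL] * (3 - len(buffer_part))
    return [vocab.get(t, vocab[UNK]) for t in stack_part + buffer_part]


words = ["I", "ate", "fish"]
vocab = build_vocab(words)
print(feature_ids([0], [1, 2, 3], words, vocab))       # initial configuration
# [2, 0, 0, 3, 4, 5]
```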

<a name='6-3'></a>
## 6.3 Model architecture

<img src='../images/5-neuralparserModel.png' width='600' height='300'>
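The forward pass can be sketched as: look up and concatenate the embeddings of the feature ids, apply one hidden layer, then a softmax over transitions. This pure-Python version uses tiny random weights and ReLU as assumptions for illustration (Chen & Manning (2014) used a cubic activation, $h = (W_1 x + b_1)^3$); with unlabeled arc-standard there are 3 outputs, and a labeled version would need $2L + 1$ for $L$ labels.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)                                   # subtract max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x, b):
    """Compute W x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(ids, E, W1, b1, W2, b2):
    x = [value for i in ids for value in E[i]]   # embedding lookup + concatenation
    h = relu(matvec(W1, x, b1))                  # hidden layer (ReLU assumed)
    return softmax(matvec(W2, h, b2))            # probabilities of SHIFT / LEFT-ARC / RIGHT-ARC


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
vocab_size, emb_dim, n_features, hidden, n_transitions = 6, 4, 6, 8, 3   # toy sizes
E = rand_matrix(vocab_size, emb_dim)
W1, b1 = rand_matrix(hidden, n_features * emb_dim), [0.0] * hidden
W2, b2 = rand_matrix(n_transitions, hidden), [0.0] * n_transitions

probs = forward([2, 0, 0, 3, 4, 5], E, W1, b1, W2, b2)
print(probs)
```

Because the embeddings are dense and shared across positions, the network replaces the millions of sparse indicator features with a few hundred dense inputs, and the hidden layer learns the feature conjunctions that were previously hand-designed.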

<a name='6-4'></a>
## 6.4 Comparison

<img src='../images/5-neuralParser.png' width='600' height='300'>
<img src='../images/5-neuralParser2.png' width='600' height='300'>

<a name='7'></a>
# 7. References

- [Course website](https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/index.html)

- [Lecture video](https://www.youtube.com/watch?v=nC9_RfjYwqA&list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z&index=6) 

- [Lecture slide](https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/slides/cs224n-2019-lecture05-dep-parsing.pdf)

- [Lecture slide annotated](https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/slides/cs224n-2019-lecture05-dep-parsing-scrawls.pdf)