# Dependency Parsing

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Inline plotting
%matplotlib inline

Dependency parsing is the task of analyzing the structure of sentences in (human) languages.

Why study the structure of sentences? Because we want to be able to interpret language correctly. Humans communicate complex ideas by composing words together into bigger units to convey complex meanings. We need to know what is connected to what.

There are two key tools to solve this problem:
- Constituency Grammar
- Dependency Structure/Grammar (very popular today)

## Constituency Grammar

Constituency grammar is a way to describe the syntax of a language, i.e., the structure of its sentences. The idea is to use phrase structure grammars (or context-free grammars) to organize words into nested constituents. A grammar is a set of rules that allows us to generate valid sentences of a language.

<img src="figures/chomsky.png" width="400" />
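To make "a set of rules that generates sentences" concrete, here is a minimal sketch of a toy context-free grammar. The rules and vocabulary in `TOY_GRAMMAR` are made up for illustration, and `generate` is a hypothetical helper, not part of any library.

```python
import random

# Toy phrase structure rules (hypothetical, for illustration only).
# Each nonterminal maps to a list of possible right-hand sides.
TOY_GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "Adj": [["large"], ["barking"]],
    "N": [["dog"], ["door"], ["crate"]],
    "V": [["sees"], ["opens"]],
}


def generate(symbol, grammar, rng):
    """Expand `symbol` recursively until only words (terminals) remain."""
    if symbol not in grammar:  # terminal: an actual word
        return [symbol]
    expansion = rng.choice(grammar[symbol])  # pick one rule for this symbol
    words = []
    for child in expansion:
        words.extend(generate(child, grammar, rng))
    return words


rng = random.Random(0)  # seeded so repeated runs give the same sentence
sentence = generate("S", TOY_GRAMMAR, rng)
print(" ".join(sentence))
```

Every sentence this grammar produces has the nested shape `S -> NP VP`, which is exactly the constituent structure a constituency parser tries to recover.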





## Dependency Structure

The other tool, dependency structure, describes which words depend on which other words. When one word $w_1$ is dependent on another word $w_2$, then $w_1$ modifies or is an argument of $w_2$.

Suppose we have a sentence:

"Look for the large barking dog by the door in a crate."

- The word *barking* is dependent on *dog* because *barking* modifies *dog*. 
- The word *large* is dependent on *dog* since it modifies *dog*.


Normally, we indicate dependencies by arrows.

<img src="figures/dependency-structure-example.png" width="500" />
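In code, each arrow can be stored as a pair of words. The sketch below records only the two dependencies stated in the text above; `dependents_of` is a hypothetical helper name chosen for this example.

```python
# (dependent, head) pairs stated in the text above. The rest of the
# sentence's analysis is not spelled out there, so it is left out here.
dependencies = [
    ("barking", "dog"),
    ("large", "dog"),
]


def dependents_of(head, deps):
    """Return every word that depends directly on `head`."""
    return [dep for dep, h in deps if h == head]


print(dependents_of("dog", dependencies))  # ['barking', 'large']
```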





### Ambiguous Sentences

An ambiguous sentence has more than one possible dependency structure, and each structure corresponds to a different reading:

<img src="figures/dependency-structure-ambiguity.png" width="500" />
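A minimal sketch of the same idea in code, using a sentence of our own choosing (not necessarily the one in the figure). It assumes a 0-based list of head indices with `-1` marking the root, and it assumes the noun is the head of a prepositional phrase. Both are conventions picked for this example.

```python
tokens = ["I", "saw", "the", "man", "with", "a", "telescope"]

# heads[i] is the index of the head of tokens[i]; -1 marks the root.
# Reading A: "with a telescope" attaches to the verb "saw"
# (the telescope is the instrument of seeing).
heads_verb = [1, -1, 3, 1, 6, 6, 1]
# Reading B: "with a telescope" attaches to the noun "man"
# (the man is the one holding the telescope).
heads_noun = [1, -1, 3, 1, 6, 6, 3]


def differing_attachments(tokens, heads_a, heads_b):
    """List (word, head in A, head in B) wherever the two analyses disagree.

    Assumes both analyses agree on the root, so the differing heads are
    always real token indices.
    """
    return [
        (tok, tokens[a], tokens[b])
        for tok, a, b in zip(tokens, heads_a, heads_b)
        if a != b
    ]


print(differing_attachments(tokens, heads_verb, heads_noun))
# [('telescope', 'saw', 'man')]
```

The two readings differ in a single arrow, which is typical of attachment ambiguities.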









### Universal Dependencies

A Universal Dependencies treebank is a collection of sentences that people have manually annotated with dependency graphs.

<img src="figures/ud-treebank-example.png" width="700" />








Source: [SETS treebank](http://bionlp-www.utu.fi/dep_search/) search maintained by the University of Turku

At first glance, building a treebank seems a lot slower and less useful than writing a grammar that can generate an infinite number of sentences.

However, a treebank has many advantages:
- Reusability of the labor (grammars are hard to reuse because everyone makes up their own)
  - Many parsers, part-of-speech taggers, etc. can be built on it
  - It is a valuable resource for linguistics because it gives examples of how a language is actually used, together with syntactic analyses of the sentences
- Broad coverage, not just a few intuitions
- Frequencies and distributional information
  - E.g. co-occurrence of words
- Most crucially, it provides a way to evaluate systems because it supplies gold-standard data
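To illustrate the evaluation point, the sketch below compares a parser's predicted heads against gold heads. The metric shown is the unlabeled attachment score, a common choice that the text above does not name; the five-word analysis is hypothetical data, and the 0-based indices with `-1` for the root are a convention chosen here.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of words whose predicted head matches the gold head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted analyses must cover the same words")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Hypothetical 5-word sentence: head index per word, -1 marks the root.
gold = [1, -1, 3, 1, 1]
predicted = [1, -1, 1, 1, 3]
print(unlabeled_attachment_score(gold, predicted))  # 0.6
```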



## Dependency Syntax

The theory of dependency syntax postulates that syntactic structure consists of relations between lexical items (i.e., words), normally binary asymmetric relations (i.e., we draw arrows) called dependencies.

The arrows/dependencies are commonly labeled with the names of grammatical relations such as subject, prepositional object, apposition, etc. This is referred to as **typed dependency grammar**.

<img src="figures/dependency-structure-example-2.png" width="400" />
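A typed dependency can be sketched as a (head, relation, dependent) triple. The sentence and its analysis below are a hypothetical example, with relation labels (`nsubj`, `obj`, `det`) borrowed from the Universal Dependencies label set; `Dependency` and `find_by_relation` are names invented for this sketch.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    head: str
    relation: str
    dependent: str


# Hypothetical analysis of "The dog chased a cat".
typed_dependencies = [
    Dependency("chased", "nsubj", "dog"),  # subject of the verb
    Dependency("dog", "det", "The"),       # determiner of "dog"
    Dependency("chased", "obj", "cat"),    # object of the verb
    Dependency("cat", "det", "a"),         # determiner of "cat"
]


def find_by_relation(deps, relation):
    """Return (head, dependent) pairs that carry the given relation label."""
    return [(d.head, d.dependent) for d in deps if d.relation == relation]


print(find_by_relation(typed_dependencies, "nsubj"))  # [('chased', 'dog')]
```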
















The arrow connects a **head** (or governor/superior/regent) with a **dependent** (or modifier/inferior/subordinate).

Usually, dependencies form a tree, which means that they have certain graph-theoretic properties:
- they have a single root node (the head of the whole sentence)
- they are connected
- they are acyclic
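These properties can be checked mechanically. The sketch below assumes a sentence is stored as a list of head indices (0-based, `-1` for the root), so every word has exactly one head by construction; `is_dependency_tree` is a hypothetical helper written for this example.

```python
def is_dependency_tree(heads):
    """Check single root, connectedness, and acyclicity.

    heads[i] is the index of the head of word i, or -1 if word i is the root.
    """
    n = len(heads)
    if n == 0:
        return False
    # Exactly one word may be the root.
    if sum(h == -1 for h in heads) != 1:
        return False
    # Every non-root head must be a valid word index.
    if any(h != -1 and not 0 <= h < n for h in heads):
        return False
    # Follow head links upward from every word. With a single root, reaching
    # it from everywhere means the graph is connected; failing to reach it
    # within n steps means the walk is stuck in a cycle.
    for start in range(n):
        node, steps = start, 0
        while heads[node] != -1:
            node = heads[node]
            steps += 1
            if steps > n:
                return False
    return True


print(is_dependency_tree([1, -1, 3, 1]))  # True: word 1 is the only root
print(is_dependency_tree([1, -1, 3, 2]))  # False: words 2 and 3 form a cycle
print(is_dependency_tree([-1, -1, 0]))    # False: two roots
```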
