# Grammatical dependencies

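**Dependencies** are **binary relations** that link two words. Each link is **directed**: it runs from a head to a dependent. From the point of view of any one word, a link is therefore either **incoming** (the word is the dependent) or **outgoing** (the word is the head).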
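
As a minimal sketch of this idea, a parse can be stored as a list of directed, labeled arcs. The `Arc` tuple and the two helper functions are illustrative names, not part of any library, and the arcs for "Radish likes chicken." are written by hand rather than produced by a parser.

```python
from collections import namedtuple

# One dependency arc: a directed, labeled link from a head word to a dependent.
Arc = namedtuple("Arc", ["head", "relation", "dependent"])

# Hand-written arcs for "Radish likes chicken." (ROOT is an artificial start node).
arcs = [
    Arc("ROOT", "root", "likes"),
    Arc("likes", "nsubj", "Radish"),
    Arc("likes", "obj", "chicken"),
    Arc("likes", "punct", "."),
]


def outgoing(word, arcs):
    """Arcs that start at `word`, i.e. `word` is the head."""
    return [a for a in arcs if a.head == word]


def incoming(word, arcs):
    """Arcs that end at `word`, i.e. `word` is the dependent."""
    return [a for a in arcs if a.dependent == word]


print([a.relation for a in outgoing("likes", arcs)])  # ['nsubj', 'obj', 'punct']
print([a.relation for a in incoming("likes", arcs)])  # ['root']
```

Matching on word strings only works here because no word is repeated; a real representation would use token positions instead.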

# The case for dependency parsing over constituency parsing

One advantage of a **dependency grammar** is that it handles languages that do not behave like English more naturally than a constituency grammar does. For example, languages like Czech, Finnish, or Turkish have highly **flexible word order**, so the same grammatical relations can surface in many different linear orders.

Dependency parses do not represent word order information. Instead, they represent grammatical relations between words (or tokens). 
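
To make this concrete, the sketch below compares two orderings of a short Czech sentence. The sentence pair and its annotation are my own illustration (hand-labeled, not taken from a treebank or a parser), and `relations` is a hypothetical helper. Once the linear positions are dropped, both orderings reduce to the same set of relations.

```python
def relations(tokens, heads, deps):
    """Return the set of (head word, relation, dependent word) triples.

    heads[i] is the position of token i's head, or -1 for the root.
    """
    triples = set()
    for i, (head, dep) in enumerate(zip(heads, deps)):
        head_word = "ROOT" if head == -1 else tokens[head]
        triples.add((head_word, dep, tokens[i]))
    return triples


# "Petr vidí Marii" (subject, verb, object): "Petr sees Marie"
svo = relations(["Petr", "vidí", "Marii"], [1, -1, 1], ["nsubj", "root", "obj"])

# "Marii vidí Petr" (object, verb, subject): same participants, same roles
ovs = relations(["Marii", "vidí", "Petr"], [1, -1, 1], ["obj", "root", "nsubj"])

print(svo == ovs)  # True
```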

# Universal Dependencies

http://universaldependencies.org/docsv1/#en

* a universal inventory of categories and guidelines
* allows for consistent annotation of similar constructions across languages
* permits language-specific changes if necessary

### Guiding principles of Universal Dependencies


1. UD needs to be satisfactory on linguistic analysis grounds for individual languages.
2. UD needs to be good for linguistic typology, i.e., providing a suitable basis for bringing out cross-linguistic parallelism across languages and language families.
3. UD must be suitable for rapid, consistent annotation by a human annotator.
4. UD must be suitable for computer parsing with high accuracy.
5. UD must be easily comprehended and used by a non-linguist, whether a language learner or an engineer with prosaic needs for language processing. We refer to this as seeking a habitable design, and it leads us to favor traditional grammar notions and terminology.
6. UD must support well downstream language understanding tasks (relation extraction, reading comprehension, machine translation, ...).

### Universal dependency relations for English

    acl                  clausal modifier of noun
    acl:relcl            relative clause modifier
    advcl                adverbial clause modifier
    advmod               adverbial modifier
    amod                 adjectival modifier
    appos                appositional modifier
    aux                  auxiliary
    auxpass              passive auxiliary
    case                 case marking
    cc                   coordination
    cc:preconj           preconjunct
    ccomp                clausal complement
    compound             compound
    compound:prt         phrasal verb particle
    conj                 conjunct
    cop                  copula
    csubj                clausal subject
    csubjpass            clausal passive subject
    dep                  dependent
    det                  determiner
    det:predet           predeterminer
    discourse            discourse element
    dislocated           dislocated elements
    dobj                 direct object
    expl                 expletive
    foreign              foreign words
    goeswith             goes with
    iobj                 indirect object
    list                 list
    mark                 marker
    mwe                  multi-word expression
    name                 name
    neg                  negation modifier
    nmod                 nominal modifier
    nmod:npmod           noun phrase as adverbial modifier
    nmod:poss            possessive nominal modifier
    nmod:tmod            temporal modifier
    nsubj                nominal subject
    nsubjpass            passive nominal subject
    nummod               numeric modifier
    parataxis            parataxis
    punct                punctuation
    remnant              remnant in ellipsis
    reparandum           overridden disfluency
    root                 root
    vocative             vocative
    xcomp                open clausal complement
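
The inventory above uses UD v1 names, whereas the parser output elsewhere in this section shows labels such as `obj` and `obl`, which are UD v2 names. The mapping below is a partial, hand-compiled sketch covering only renames I am confident about; `UD_V1_TO_V2` and `to_v2` are illustrative names, not part of any library. Some v2 changes cannot be expressed as a rename at all: for example, v1 `nmod` was split so that nominal dependents of verbs became `obl`, which depends on the head and not just on the label.

```python
# Partial mapping from UD v1 relation names to their UD v2 counterparts.
# Not exhaustive: labels that are missing here are returned unchanged.
UD_V1_TO_V2 = {
    "dobj": "obj",
    "nsubjpass": "nsubj:pass",
    "csubjpass": "csubj:pass",
    "auxpass": "aux:pass",
    "mwe": "fixed",
    "name": "flat",
}


def to_v2(label):
    """Translate a v1 label to v2 where a simple rename exists."""
    return UD_V1_TO_V2.get(label, label)


print(to_v2("dobj"))       # obj
print(to_v2("nsubjpass"))  # nsubj:pass
print(to_v2("amod"))       # amod
```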

# Finding matching syntactic patterns

Sometimes we only want to work with the examples that are relevant to our question. We don't necessarily want to run everything through BERT and hope we magically get the right structure (or spend a lot of time working with embeddings), so we often need to search for matching patterns. Colloquially this is called "grepping", after the command-line tool `grep`.

So far, we have talked about **constituents** and how dependency parsers allow for links to exist between the different elements of a sentence. 

How do we find "chunks" or constituents that we care about? 

Turns out, spaCy gives us that type of tool for free! We are going to use a combination of the spaCy visualizer (`displacy`) and the dependency matcher (`DependencyMatcher`) to extract different types of constituents, with a focus on noun phrases and their semantic roles.

In [None]:
# # If you are using colab you can uncomment this
# !pip install stanza
# !pip install spacy-stanza
# # https://spacy.io/universe/project/spacy-stanza

In [1]:
import stanza
import spacy_stanza
from spacy.matcher import DependencyMatcher
from spacy import displacy

stanza.download("en")
nlp = spacy_stanza.load_pipeline("en")

doc = nlp("Radish likes chicken.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
print(doc.ents)

Radish radish NOUN nsubj 
likes like VERB root 
chicken chicken NOUN obj 
. . PUNCT punct 
()




# Prepositional Datives and Double Object Datives

* PO (prepositional object): Dr. Jacobs gave chicken to Radish.
* DO (double object): Dr. Jacobs gave Radish his chicken.
* DO_2: Dr. Jacobs gave Radish chicken.

In all three sentences the roles are the same:

* Subject: Dr. Jacobs
* Verb: gave
* Direct object: chicken / his chicken
* Indirect object: Radish (in the PO variant: to Radish)
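
As a rough sketch of how the two constructions differ in a UD-style parse: the double object variant has both an `obj` and an `iobj` under the verb, while the prepositional variant has an `obj` plus an oblique (`obl`) introduced by "to". The function `dative_type` and the dictionary layout are hypothetical, and the dependents are written by hand rather than taken from parser output.

```python
def dative_type(dependents):
    """Classify a verb's direct dependents as 'DO', 'PO', or None.

    dependents: list of dicts with keys 'dep' and 'text', plus an optional
    'case' key holding the preposition that marks an oblique.
    """
    labels = {d["dep"] for d in dependents}
    if "obj" in labels and "iobj" in labels:
        return "DO"
    has_to_oblique = any(
        d["dep"] == "obl" and d.get("case") == "to" for d in dependents
    )
    if "obj" in labels and has_to_oblique:
        return "PO"
    return None


# "Dr. Jacobs gave chicken to Radish."
po = [
    {"dep": "nsubj", "text": "Jacobs"},
    {"dep": "obj", "text": "chicken"},
    {"dep": "obl", "text": "Radish", "case": "to"},
]

# "Dr. Jacobs gave Radish his chicken."
do = [
    {"dep": "nsubj", "text": "Jacobs"},
    {"dep": "iobj", "text": "Radish"},
    {"dep": "obj", "text": "chicken"},
]

print(dative_type(po))  # PO
print(dative_type(do))  # DO
```

This only looks at labels, so it inherits any parser mistakes, such as an indirect object that gets attached as part of a compound.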

In [2]:
# PO pattern, e.g. "Dr. Jacobs gave the chicken to Radish"
PO_matcher = DependencyMatcher(nlp.vocab)
PO = [
    # Anchor node: the direct object.
    {
        "RIGHT_ID": "direct_obj",
        "RIGHT_ATTRS": {"DEP": "obj"},  # dobj in spacy
    },
    # A later sibling of the direct object that is an oblique
    # (the "to" phrase that expresses the recipient).
    {
        "LEFT_ID": "direct_obj",
        "REL_OP": "$++",  # right node is a sibling of the left node and comes after it
        "RIGHT_ID": "indirect_obj",
        "RIGHT_ATTRS": {"DEP": "obl"},  # dative in spacy
    },
]
PO_matcher.add("PO_dative", [PO])

PO_doc = nlp("The doctor gave the briefcase to the architect.")
PO_matches = PO_matcher(PO_doc)
print(PO_matches) # [(8853417898123013068, [4, 7])]
match_id, token_ids = PO_matches[0]
# Print the full phrase (subtree) headed by each matched token.
for token_id in token_ids:
    token = PO_doc[token_id]
    print([(x.dep_, x.text) for x in token.subtree])

[(8853417898123013068, [4, 7])]
[('det', 'the'), ('obj', 'briefcase')]
[('case', 'to'), ('det', 'the'), ('obl', 'architect')]
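
To show what the `$++` operator is checking, here is a plain-Python sketch of the same test: two tokens with the requested labels that share a head, with the first preceding the second. `sibling_pairs` is a hypothetical helper, not spaCy's implementation, and the head indices are hand-written to be consistent with the subtrees printed above (the attachment of "The doctor" is my assumption).

```python
def sibling_pairs(heads, deps, left_dep, right_dep):
    """Return (i, j) pairs where token i is labeled left_dep, token j is
    labeled right_dep, both have the same head, and i comes before j."""
    pairs = []
    for i, (head_i, dep_i) in enumerate(zip(heads, deps)):
        if dep_i != left_dep:
            continue
        for j in range(i + 1, len(heads)):
            if deps[j] == right_dep and heads[j] == head_i:
                pairs.append((i, j))
    return pairs


tokens = ["The", "doctor", "gave", "the", "briefcase", "to", "the", "architect", "."]
# heads[i] is the index of token i's head; -1 marks the root.
heads = [1, 2, -1, 4, 2, 7, 7, 2, 2]
deps = ["det", "nsubj", "root", "det", "obj", "case", "det", "obl", "punct"]

print(sibling_pairs(heads, deps, "obj", "obl"))  # [(4, 7)]
```

The pair `(4, 7)` corresponds to "briefcase" and "architect", the same token indices the matcher reported.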


In [3]:
displacy.render(nlp("The doctor gave the briefcase to the architect."), jupyter=True)

In [None]:
displacy.render(nlp("The doctor sent the lemon with the brown spots to the architect."), jupyter=True)

In [4]:
displacy.render(nlp("The doctor gave the lemon that had the brown spots to the architect."), jupyter=True)

In [None]:
displacy.render(nlp("The doctor gave the architect the lemon that had the brown spots."), jupyter=True)

# Relative clause extractions

In [10]:
for token in nlp("The doctor liked the lemon that had the brown spots."):
    # print(' '.join([x.text for x in token.subtree]))
    print(' '.join([x.dep_ for x in token.subtree]))

det
det nsubj
det nsubj root det obj nsubj acl:relcl det amod obj punct
det
det obj nsubj acl:relcl det amod obj
nsubj
nsubj acl:relcl det amod obj
det
amod
det amod obj
punct
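
The output above lists, for each token, the labels in its subtree: the token itself plus everything that depends on it, directly or indirectly, in sentence order. As a sketch of how that can be computed from head indices alone, the hypothetical `subtree` function below reproduces the line for "had". The head indices are hand-derived from the printed subtrees, not read from the parser.

```python
def subtree(index, heads):
    """Indices of token `index` and all of its descendants, in sentence order."""
    members = {index}
    changed = True
    while changed:  # keep sweeping until no new descendants are found
        changed = False
        for i, head in enumerate(heads):
            if head in members and i not in members:
                members.add(i)
                changed = True
    return sorted(members)


tokens = ["The", "doctor", "liked", "the", "lemon", "that", "had",
          "the", "brown", "spots", "."]
# heads[i] is the index of token i's head; -1 marks the root.
heads = [1, 2, -1, 4, 2, 6, 4, 9, 9, 6, 2]
deps = ["det", "nsubj", "root", "det", "obj", "nsubj", "acl:relcl",
        "det", "amod", "obj", "punct"]

had = tokens.index("had")
print(" ".join(deps[i] for i in subtree(had, heads)))
# nsubj acl:relcl det amod obj
print(" ".join(tokens[i] for i in subtree(had, heads)))
# that had the brown spots
```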


In [17]:
# Relative clauses

relcl_doc = nlp("The doctor liked the lemon that had the brown spots.")
matches = [token for token in relcl_doc if token.dep_=='acl:relcl']
for token in matches:
    print([(x.dep_, x.text) for x in token.subtree])

[('nsubj', 'that'), ('acl:relcl', 'had'), ('det', 'the'), ('amod', 'brown'), ('obj', 'spots')]


In [18]:
obj_doc = nlp("Dr. Jacobs gave Radish chicken.")
matches = [token for token in obj_doc if token.dep_=='obj']
for token in matches:
    print([(x.dep_, x.text) for x in token.subtree])

[('compound', 'Radish'), ('obj', 'chicken')]


In [19]:
obj_doc2 = nlp("Dr. Jacobs gave chicken to Radish.")
matches = [token for token in obj_doc2 if token.dep_=='obj']
for token in matches:
    print([(x.dep_, x.text) for x in token.subtree])

[('obj', 'chicken')]


In [20]:
obj_doc = nlp("Dr. Jacobs gave Radish fresh chicken.")
matches = [token for token in obj_doc if token.dep_=='obj']
for token in matches:
    print([(x.dep_, x.text) for x in token.subtree])

[('amod', 'fresh'), ('obj', 'chicken')]


In [24]:
iobj_doc = nlp("Dr. Jacobs gave Radish fresh chicken.")
# for token in iobj_doc:
#     print(' '.join([x.dep_ for x in token.subtree]))
matches = [token for token in iobj_doc if token.dep_=='iobj']
for token in matches:
    print([(x.dep_, x.text) for x in token.subtree])

[('iobj', 'Radish')]
