### What is Dependency Parsing?

Dependency parsing analyzes the grammatical structure of a sentence by identifying which words are related and what type of relationship holds between them.

Each relationship:

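1. Has one head and one dependent that modifies the head.
2. Is labeled according to the nature of the dependency between the head and the dependent. The Universal Dependencies labels are listed at https://universaldependencies.org/u/dep/. Some parsers use their own label set; the spaCy output below, for example, contains `dobj` and `neg`, which are not in that list.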


Source: https://towardsdatascience.com/natural-language-processing-dependency-parsing-cf094bbbe3f7
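
To make the head/dependent/label structure concrete, here is a minimal sketch in plain Python. The `Arc` class, the `dependents_of` helper, and the three-word sentence are illustrative assumptions, not part of any parsing library; the labels `nsubj`, `obj`, and `root` come from the Universal Dependencies label set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One dependency relationship (hypothetical helper, not a library type)."""
    head: str       # the word being modified
    dependent: str  # the word that modifies the head
    label: str      # the type of the relationship


# Hand-written arcs for "She reads books" (illustrative, not parser output).
# By convention, the main verb hangs off an artificial ROOT node.
arcs = [
    Arc(head="ROOT", dependent="reads", label="root"),
    Arc(head="reads", dependent="She", label="nsubj"),
    Arc(head="reads", dependent="books", label="obj"),
]


def dependents_of(head, arcs):
    """Return (dependent, label) pairs for every arc leaving `head`."""
    return [(a.dependent, a.label) for a in arcs if a.head == head]


print(dependents_of("reads", arcs))
```

Every word appears as a dependent exactly once, which is what makes the result a tree rather than a general graph.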

There are several ways to do dependency parsing in Python. Three common options are:

1. Using the spaCy library
2. Using NLTK with Stanford CoreNLP
3. Using stanza

#### Using spaCy

In [1]:
import spacy
from spacy import displacy

In [3]:
nlp = spacy.load("en_core_web_sm")

In [5]:
sentence = "The llama couldn't resist trying the lemonade."

In [6]:
doc = nlp(sentence)

In [12]:
print("{:<15} | {:<8} | {:<15} | {:<20}".format('Token', 'Relation', 'Head', 'Children'))
print("-" * 70)

for token in doc:
    # Print the token, its dependency label, its head, and all of its dependents
    print("{:<15} | {:<8} | {:<15} | {:<20}"
          .format(token.text, token.dep_, token.head.text, str(list(token.children))))

# Use displaCy to visualize the dependency tree
displacy.render(doc, style='dep', jupyter=True, options={'distance': 120})

Token           | Relation | Head            | Children            
----------------------------------------------------------------------
The             | det      | llama           | []                  
llama           | nsubj    | resist          | [The]               
could           | aux      | resist          | []                  
n't             | neg      | resist          | []                  
resist          | ROOT     | resist          | [llama, could, n't, trying, .]
trying          | xcomp    | resist          | [lemonade]          
the             | det      | lemonade        | []                  
lemonade        | dobj     | trying          | [the]               
.               | punct    | resist          | []                  


#### Using stanza (neural-network-based approach)

In [14]:
import stanza
stanza.download('en')

2023-11-03 13:26:48 INFO: Downloading default packages for language: en (English) ...


2023-11-03 13:27:28 INFO: Finished downloading models and saved to /Users/adityasingh/stanza_resources.


In [15]:
nlp = stanza.Pipeline('en', processors = 'tokenize,mwt,pos,lemma,depparse')

2023-11-03 13:27:49 INFO: Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES


2023-11-03 13:27:50 INFO: Loading these models for language: en (English):
| Processor | Package           |
---------------------------------
| tokenize  | combined          |
| pos       | combined_charlm   |
| lemma     | combined_nocharlm |
| depparse  | combined_charlm   |

2023-11-03 13:27:50 INFO: Using device: cpu
2023-11-03 13:27:50 INFO: Loading: tokenize
2023-11-03 13:27:50 INFO: Loading: pos
2023-11-03 13:27:51 INFO: Loading: lemma
2023-11-03 13:27:51 INFO: Loading: depparse
2023-11-03 13:27:51 INFO: Done loading processors!


In [16]:
doc = nlp(sentence)

In [17]:
doc.sentences

[[
   {
     "id": 1,
     "text": "The",
     "lemma": "the",
     "upos": "DET",
     "xpos": "DT",
     "feats": "Definite=Def|PronType=Art",
     "head": 2,
     "deprel": "det",
     "start_char": 0,
     "end_char": 3
   },
   {
     "id": 2,
     "text": "llama",
     "lemma": "llama",
     "upos": "NOUN",
     "xpos": "NN",
     "feats": "Number=Sing",
     "head": 5,
     "deprel": "nsubj",
     "start_char": 4,
     "end_char": 9
   },
   {
     "id": 3,
     "text": "could",
     "lemma": "could",
     "upos": "AUX",
     "xpos": "MD",
     "feats": "VerbForm=Fin",
     "head": 5,
     "deprel": "aux",
     "start_char": 10,
     "end_char": 15
   },
   {
     "id": 4,
     "text": "n't",
     "lemma": "not",
     "upos": "PART",
     "xpos": "RB",
     "head": 5,
     "deprel": "advmod",
     "start_char": 15,
     "end_char": 18
   },
   {
     "id": 5,
     "text": "resist",
     "lemma": "resist",
     "upos": "VERB",
     "xpos": "VB",
     "feats": "VerbForm=Inf",
    

In [18]:
doc.sentences[0].print_dependencies()

('The', 2, 'det')
('llama', 5, 'nsubj')
('could', 5, 'aux')
("n't", 5, 'advmod')
('resist', 0, 'root')
('trying', 5, 'xcomp')
('the', 8, 'det')
('lemonade', 6, 'obj')
('.', 5, 'punct')
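
In this output, each tuple holds the word, the 1-based index of its head, and the relation label; a head of 0 marks the root. As a rough sketch of how those indices can be followed, the plain-Python snippet below copies the tuples printed above into a list and walks from a word up to the root. The `path_to_root` helper is a hypothetical name introduced here, not a stanza function.

```python
# Tuples copied from the print_dependencies() output above: (text, head, deprel).
deps = [
    ("The", 2, "det"),
    ("llama", 5, "nsubj"),
    ("could", 5, "aux"),
    ("n't", 5, "advmod"),
    ("resist", 0, "root"),
    ("trying", 5, "xcomp"),
    ("the", 8, "det"),
    ("lemonade", 6, "obj"),
    (".", 5, "punct"),
]


def path_to_root(word_id, deps):
    """Follow head indices from the 1-based `word_id` up to the root word."""
    path = []
    while word_id != 0:  # a head index of 0 means we have passed the root
        text, head, _ = deps[word_id - 1]  # word ids are 1-based, lists are 0-based
        path.append(text)
        word_id = head
    return path


print(path_to_root(7, deps))  # → ['the', 'lemonade', 'trying', 'resist']
```

Starting from the second "the" (word 7), the walk passes through "lemonade" and "trying" before reaching the root verb "resist".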


In [19]:
print("{:<15} | {:<10} | {:<15} ".format('Token', 'Relation', 'Head'))
print("-" * 50)

# Convert the sentence object to a list of word dictionaries
sent_dict = doc.sentences[0].to_dict()

# Print each token, its relation, and its head (head ids are 1-based; 0 marks the root)
for word in sent_dict:
    head_text = sent_dict[word['head'] - 1]['text'] if word['head'] > 0 else 'ROOT'
    print("{:<15} | {:<10} | {:<15} ".format(word['text'], word['deprel'], head_text))

Token           | Relation   | Head            
--------------------------------------------------
The             | det        | llama           
llama           | nsubj      | resist          
could           | aux        | resist          
n't             | advmod     | resist          
resist          | root       | ROOT            
trying          | xcomp      | resist          
the             | det        | lemonade        
lemonade        | obj        | trying          
.               | punct      | resist          
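
The spaCy and stanza tables agree on the tree structure for this sentence but not on every label, because the two parsers use different label inventories. The sketch below copies the Relation columns from both tables into plain Python lists and prints the tokens where they differ. It only compares the output for this one sentence and should not be read as a general mapping between the two label sets.

```python
tokens = ["The", "llama", "could", "n't", "resist", "trying", "the", "lemonade", "."]

# Relation columns copied from the spaCy and stanza tables above.
spacy_labels = ["det", "nsubj", "aux", "neg", "ROOT", "xcomp", "det", "dobj", "punct"]
stanza_labels = ["det", "nsubj", "aux", "advmod", "root", "xcomp", "det", "obj", "punct"]

# Collect (token, spaCy label, stanza label) wherever the two parsers disagree.
differences = [
    (tok, s, z)
    for tok, s, z in zip(tokens, spacy_labels, stanza_labels)
    if s != z
]

for tok, s, z in differences:
    print("{:<10} | spaCy: {:<6} | stanza: {:<6}".format(tok, s, z))
```

For this sentence the differences are "n't" (`neg` vs. `advmod`), "resist" (`ROOT` vs. `root`), and "lemonade" (`dobj` vs. `obj`).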
