## Assignment: Working with Dependency Graphs (Parses)

The objective of this assignment is to learn how to work with dependency graphs by writing functions that traverse and query them.

Read the [spaCy documentation on the dependency parser](https://spacy.io/api/dependencyparser) to learn which methods it provides.

Define functions to:
- Extract the path of dependency relations from the ROOT to a token
- Extract the subtree of dependents of a given token
- Check if a given list of tokens (segment of a sentence) forms a subtree
- Identify the head of a span, given its tokens
- Extract the sentence's subject, direct object and indirect object spans


In [9]:
import spacy

sentence = 'I saw the man with a telescope.'

nlp = spacy.load('en_core_web_sm') # Load the English model
doc = nlp(sentence)

# spacy.displacy.render(doc) # Shows the parsing result

### 1. Extract a path of dependency relations from the ROOT to a token

In [10]:
def get_paths(sentence):
    doc = nlp(sentence)
    results = []
    for token in doc:
        # token.ancestors runs from the token's head up to the ROOT, so reverse it
        # to get the path from the ROOT down to the token.
        ancestors = [a.dep_ for a in token.ancestors]
        results.append(ancestors[::-1] + [token.dep_])
    return results

# Examples
paths = get_paths(sentence)
for i, path in enumerate(paths):
    print(f'Path from ROOT to "{doc[i]}":  {" -> ".join(path)}')

Path from ROOT to "I":  ROOT -> nsubj
Path from ROOT to "saw":  ROOT
Path from ROOT to "the":  ROOT -> dobj -> det
Path from ROOT to "man":  ROOT -> dobj
Path from ROOT to "with":  ROOT -> dobj -> prep
Path from ROOT to "a":  ROOT -> dobj -> prep -> pobj -> det
Path from ROOT to "telescope":  ROOT -> dobj -> prep -> pobj
Path from ROOT to ".":  ROOT -> punct
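
The same paths can be recovered from nothing more than each token's head index and label, which is roughly what walking `token.ancestors` amounts to. The sketch below is plain Python rather than spaCy: `WORDS`, `HEADS`, `DEPS` and `path_from_root` are hypothetical names, and the parse is transcribed by hand from the output above, assuming the root is its own head.

```python
# Hand-transcribed parse of 'I saw the man with a telescope.' (an assumption,
# copied from the printed paths; no parser is run here).
WORDS = ['I', 'saw', 'the', 'man', 'with', 'a', 'telescope', '.']
HEADS = [1, 1, 3, 1, 3, 6, 4, 1]  # index of each token's head; the root heads itself
DEPS = ['nsubj', 'ROOT', 'det', 'dobj', 'prep', 'det', 'pobj', 'punct']


def path_from_root(i, heads, deps):
    """Return the dependency labels from the root down to token i."""
    path = [deps[i]]
    while heads[i] != i:  # climb until the self-headed root is reached
        i = heads[i]
        path.append(deps[i])
    return path[::-1]  # labels were collected bottom-up, so reverse them


for i, word in enumerate(WORDS):
    print(f'{word}: {" -> ".join(path_from_root(i, HEADS, DEPS))}')
```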


### 2. Extract the subtree of dependents of a given token

In [11]:
def get_subtrees(sentence):
    doc = nlp(sentence)
    results = []
    for token in doc:
        results.append([t.text for t in token.subtree]) # Extract the subtree of the token as a list of strings
    return results

# Examples
subtrees = get_subtrees(sentence)
for i, subtree in enumerate(subtrees):
    print(f'Subtree of "{doc[i]}": {subtree}')
    

Subtree of "I": ['I']
Subtree of "saw": ['I', 'saw', 'the', 'man', 'with', 'a', 'telescope', '.']
Subtree of "the": ['the']
Subtree of "man": ['the', 'man', 'with', 'a', 'telescope']
Subtree of "with": ['with', 'a', 'telescope']
Subtree of "a": ['a']
Subtree of "telescope": ['a', 'telescope']
Subtree of ".": ['.']
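
To see what `token.subtree` collects, it can help to rebuild it from head indices: invert the head array into a children map, then gather every descendant of the chosen token. This is a minimal plain-Python sketch, not spaCy's implementation; `subtree_indices` is a hypothetical helper and the parse is hand-transcribed from the outputs above, assuming the root is its own head.

```python
from collections import defaultdict

# Hand-transcribed parse of 'I saw the man with a telescope.' (an assumption).
WORDS = ['I', 'saw', 'the', 'man', 'with', 'a', 'telescope', '.']
HEADS = [1, 1, 3, 1, 3, 6, 4, 1]  # index of each token's head; the root heads itself


def subtree_indices(i, heads):
    """Return the sorted indices of token i and all of its descendants."""
    children = defaultdict(list)
    for child, head in enumerate(heads):
        if child != head:  # skip the root's self-loop
            children[head].append(child)
    found, stack = [], [i]
    while stack:  # iterative depth-first traversal
        node = stack.pop()
        found.append(node)
        stack.extend(children[node])
    return sorted(found)  # sentence order


print([WORDS[j] for j in subtree_indices(3, HEADS)])
# prints ['the', 'man', 'with', 'a', 'telescope']
```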


### 3. Check if a given list of tokens (segment of a sentence) forms a subtree

In [12]:
def is_subtree(sentence, subtree):
    # `subtree` is a list of token texts in sentence order.
    doc = nlp(sentence)
    for token in doc: # Try the subtree rooted at each token
        if [t.text for t in token.subtree] == subtree: # Exact match on texts and order
            return True
    return False

# Examples
examples = [['saw', 'with', 'telescope'], ['a', 'telescope']]
for e in examples:
    print(f'Is {e} a subtree: {is_subtree(sentence, e)}')

Is ['saw', 'with', 'telescope'] a subtree: False
Is ['a', 'telescope'] a subtree: True
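
Because `is_subtree` compares token texts, a word that occurs twice in a sentence could match at the wrong position. One alternative is to work with token indices and test the defining property directly: exactly one chosen token has its head outside the set, and no token outside the set depends on a chosen one. The sketch below is plain Python; `is_full_subtree` is a hypothetical name and the head array is hand-transcribed from the parse above, assuming the root is its own head.

```python
# Head index of each token in 'I saw the man with a telescope.' (an assumption,
# transcribed by hand; the root 'saw' heads itself).
HEADS = [1, 1, 3, 1, 3, 6, 4, 1]


def is_full_subtree(indices, heads):
    """Return True if the token indices form exactly one complete subtree."""
    chosen = set(indices)
    # Tokens whose head lies outside the set, or that are the sentence root
    tops = [i for i in chosen if heads[i] == i or heads[i] not in chosen]
    # Tokens left out of the set even though their head was chosen
    missing = [i for i in range(len(heads))
               if i not in chosen and heads[i] in chosen]
    return len(tops) == 1 and not missing


print(is_full_subtree([5, 6], HEADS))     # 'a', 'telescope'
print(is_full_subtree([1, 4, 6], HEADS))  # 'saw', 'with', 'telescope'
```

Under these assumptions the first call prints `True` and the second `False`, matching the text-based results above.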


### 4. Identify the head of a span, given its tokens

In [13]:
def get_head(span):
    doc = nlp(span)
    # Return the root of the first sentence. This assumes the input parses as a single
    # sentence; the span is parsed on its own, not in the context of a longer sentence.
    return list(doc.sents)[0].root

# Examples
examples = [sentence, 'The quick brown fox jumps over the lazy dog.']
for span in examples:
    print(f'Head of "{span}": {get_head(span)}')

Head of "I saw the man with a telescope.": saw
Head of "The quick brown fox jumps over the lazy dog.": jumps
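
`get_head` parses the span text by itself. When the span is part of a longer sentence, one option is to pick the span token that sits closest to the sentence root; spaCy documents `Span.root` in similar terms (the token with the shortest path to the root of the sentence). The sketch below is plain Python under that assumption, not spaCy's implementation; `depth` and `span_head` are hypothetical helpers and the parse is hand-transcribed from the outputs above.

```python
# Hand-transcribed parse of 'I saw the man with a telescope.' (an assumption).
WORDS = ['I', 'saw', 'the', 'man', 'with', 'a', 'telescope', '.']
HEADS = [1, 1, 3, 1, 3, 6, 4, 1]  # index of each token's head; the root heads itself


def depth(i, heads):
    """Count the head links between token i and the sentence root."""
    steps = 0
    while heads[i] != i:
        i = heads[i]
        steps += 1
    return steps


def span_head(start, end, heads):
    """Return the index of the token in [start, end) closest to the root."""
    # min() keeps the leftmost token if several share the smallest depth
    return min(range(start, end), key=lambda i: depth(i, heads))


print(WORDS[span_head(2, 7, HEADS)])  # span 'the man with a telescope'
print(WORDS[span_head(4, 7, HEADS)])  # span 'with a telescope'
```

With this parse the two calls print `man` and `with`.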


### 5. Extract sentence subject, direct object and indirect object spans

In [14]:
def extract_deps(sentence):
    doc = nlp(sentence)
    result = {
        'nsubj': [],
        'dobj': [],
        'iobj': []
    }
    for token in doc:
        # Keep tokens labeled nsubj, dobj or iobj. Note: spaCy's English models
        # usually label indirect objects `dative`, so `iobj` may stay empty.
        if token.dep_ in result:
            # The token's subtree, joined as text, is the resulting span
            result[token.dep_].append(' '.join(t.text for t in token.subtree))
    return result

# Example
r = extract_deps(sentence)
for key, value in r.items():
    print(f'{key}: {value}')

nsubj: ['I']
dobj: ['the man with a telescope']
iobj: []
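
The example sentence has no indirect object, so the `iobj` list is empty either way. To illustrate a ditransitive case, the sketch below uses a hand-written parse of 'She gave him a book.' and maps both `iobj` and `dative` to the indirect-object role, on the assumption that spaCy's English models use `dative` for that relation. Everything here is hypothetical and plain Python: the parse, `ROLE_LABELS`, `has_ancestor` and `extract_roles` are illustrative names, not spaCy API.

```python
# Hypothetical, hand-written parse of 'She gave him a book.' (an assumption for
# illustration; a real parser may analyse the sentence differently).
WORDS = ['She', 'gave', 'him', 'a', 'book', '.']
HEADS = [1, 1, 1, 4, 1, 1]  # index of each token's head; the root heads itself
DEPS = ['nsubj', 'ROOT', 'dative', 'det', 'dobj', 'punct']

# Assumed mapping from dependency labels to grammatical roles
ROLE_LABELS = {'nsubj': 'subject', 'dobj': 'direct object',
               'iobj': 'indirect object', 'dative': 'indirect object'}


def has_ancestor(j, i, heads):
    """Return True if token i is token j itself or one of its ancestors."""
    while j != i and heads[j] != j:
        j = heads[j]
    return j == i


def extract_roles(words, heads, deps):
    """Collect the subtree text of every token that carries a role label."""
    roles = {'subject': [], 'direct object': [], 'indirect object': []}
    for i, dep in enumerate(deps):
        if dep in ROLE_LABELS:
            span = [w for j, w in enumerate(words) if has_ancestor(j, i, heads)]
            roles[ROLE_LABELS[dep]].append(' '.join(span))
    return roles


print(extract_roles(WORDS, HEADS, DEPS))
```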
