## Assignment: Working with Dependency Graphs (Parses)

The objective of this assignment is to learn how to work with dependency graphs by defining functions that operate on them.

Read the [spaCy documentation on the dependency parser](https://spacy.io/api/dependencyparser) to learn which methods it provides.

Define functions to:
- Extract the path of dependency relations from the ROOT to a token
- Extract the subtree of dependents of a given token
- Check if a given list of tokens (a segment of a sentence) forms a subtree
- Identify the head of a span, given its tokens
- Extract the sentence's subject, direct object and indirect object spans

## Assignment: Training Transition-Based Dependency Parser (Optional & Advanced)

- Modify [NLTK Transition parser](https://github.com/nltk/nltk/blob/develop/nltk/parse/transitionparser.py)'s `Configuration` class to use better features.
- Evaluate the new features by comparing performance against the original.
- Replace `SVM` classifier with an alternative of your choice.

In [1]:
import spacy

sentence = 'I saw the man with a telescope.'

nlp = spacy.load('en_core_web_sm')

### 1. Extract a path of dependency relations from the ROOT to a token

In [2]:
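def get_paths(sentence):
    doc = nlp(sentence)
    result = {}
    for token in doc:
        # token.ancestors runs from the direct head up to the root, so reverse it
        # to list the ancestors' relations from ROOT downwards.
        ancestor_deps = [a.dep_ for a in token.ancestors][::-1]
        # End the path with the token itself, as in the printed output below.
        result[token.text] = [*ancestor_deps, token.text]
    return result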

# Examples
paths = get_paths(sentence)
for token, path in paths.items():
    print(f'Path from ROOT to "{token}":  {" -> ".join(path)}')

Path from ROOT to "I":  ROOT -> I
Path from ROOT to "saw":  saw
Path from ROOT to "the":  ROOT -> dobj -> the
Path from ROOT to "man":  ROOT -> man
Path from ROOT to "with":  ROOT -> with
Path from ROOT to "a":  ROOT -> prep -> pobj -> a
Path from ROOT to "telescope":  ROOT -> prep -> telescope
Path from ROOT to ".":  ROOT -> .
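The same root-to-token walk can be sketched without spaCy, this time also keeping the token's own relation at the end of the path. This is only a minimal sketch: `root_path`, `heads` and `deps` are hypothetical names, the parse is hard-coded rather than produced by a model, and the `det` and `punct` labels are assumed for illustration. Following spaCy's convention, the root is the token whose head is itself.

```python
def root_path(index, heads, deps):
    """Return the dependency labels on the path from the root down to `index`."""
    path = []
    current = index
    while True:
        path.append(deps[current])
        if heads[current] == current:  # the root is its own head
            break
        current = heads[current]       # climb one level towards the root
    return list(reversed(path))

# Hand-written (assumed) parse of "I saw the man with a telescope."
words = ['I', 'saw', 'the', 'man', 'with', 'a', 'telescope', '.']
heads = [1, 1, 3, 1, 1, 6, 4, 1]  # index of each token's head
deps = ['nsubj', 'ROOT', 'det', 'dobj', 'prep', 'det', 'pobj', 'punct']

for i, word in enumerate(words):
    print(f'{word}: {" -> ".join(root_path(i, heads, deps))}')
```

Working with token indices instead of token texts also keeps the paths apart when the same word occurs twice in a sentence.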


### 2. Extract the subtree of dependents of a given token

In [3]:
def get_subtrees(sentence):
    doc = nlp(sentence)
    result = {}
    for t in doc:
        result[t.text] = list(t.subtree)
    return result

# Examples
subtrees = get_subtrees(sentence)
for token, subtree in subtrees.items():
    print(f'Subtree of "{token}": {subtree}')
    

Subtree of "I": [I]
Subtree of "saw": [I, saw, the, man, with, a, telescope, .]
Subtree of "the": [the]
Subtree of "man": [the, man]
Subtree of "with": [with, a, telescope]
Subtree of "a": [a]
Subtree of "telescope": [a, telescope]
Subtree of ".": [.]


### 3. Check if a given list of tokens (segment of a sentence) forms a subtree

In [4]:
def is_subtree(sentence, subtree):
    doc = nlp(sentence)
    for query_token in subtree: # Check for each possible root of the given subtree
        for t in doc:
            if t.text == query_token and [x.text for x in t.subtree] == subtree:
                return True
    return False

# Examples
examples = [['saw', 'with', 'telescope'], ['a', 'telescope']]
for e in examples:
    print(f'Is {e} a subtree: {is_subtree(sentence, e)}')

Is ['saw', 'with', 'telescope'] a subtree: False
Is ['a', 'telescope'] a subtree: True
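Comparing token texts can give misleading answers when a word is repeated in the sentence. One possible alternative is to compare token positions. The sketch below assumes the parse is available as a plain list of head indices (the hypothetical `heads`, hard-coded here for the example sentence, with the root pointing to itself) and uses two illustrative helpers, `subtree_indices` and `forms_subtree`.

```python
def subtree_indices(index, heads):
    """Collect `index` together with all of its direct and indirect dependents."""
    result = {index}
    for child, head in enumerate(heads):
        if head == index and child != index:  # skip the root's self-loop
            result |= subtree_indices(child, heads)
    return result

def forms_subtree(indices, heads):
    """True if `indices` is exactly the subtree of one of its own tokens."""
    wanted = set(indices)
    return any(subtree_indices(i, heads) == wanted for i in wanted)

# Hand-written (assumed) heads for "I saw the man with a telescope."
heads = [1, 1, 3, 1, 1, 6, 4, 1]

print(forms_subtree([5, 6], heads))     # positions of "a telescope"
print(forms_subtree([1, 4, 6], heads))  # positions of "saw with telescope"
```

With spaCy tokens, the same comparison could be made on `token.i` values instead of `token.text`.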


### 4. Identify the head of a span, given its tokens

In [5]:
def get_head(span):
    doc = nlp(span)
    return next(doc.sents).root

# Examples
examples = ['I saw the man with a telescope.', 'The quick brown fox jumps over the lazy dog.']
for span in examples:
    print(f'Head of "{span}": {get_head(span)}')

Head of "I saw the man with a telescope.": saw
Head of "The quick brown fox jumps over the lazy dog.": jumps
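The function above parses the span as a stand-alone text. When the span is part of a sentence that has already been parsed, its head can instead be read off the existing parse: it is the one token in the span whose head lies outside the span (or which is the sentence root). spaCy exposes this idea as `Span.root`. The sketch below illustrates it on plain head indices; `span_head` and `heads` are hypothetical names and the parse is hard-coded.

```python
def span_head(indices, heads):
    """Return the index of the span's head, or None if the span has no single head."""
    span = set(indices)
    # A candidate head is attached to something outside the span,
    # or is the sentence root (which is its own head).
    candidates = [i for i in sorted(span)
                  if heads[i] not in span or heads[i] == i]
    return candidates[0] if len(candidates) == 1 else None

# Hand-written (assumed) heads for "I saw the man with a telescope."
words = ['I', 'saw', 'the', 'man', 'with', 'a', 'telescope', '.']
heads = [1, 1, 3, 1, 1, 6, 4, 1]

for span in ([2, 3], [4, 5, 6], [2, 5]):
    head = span_head(span, heads)
    text = ' '.join(words[i] for i in span)
    print(f'Head of "{text}": {words[head] if head is not None else None}')
```

Returning `None` for a span with several outward attachments is a design choice of this sketch; such a span is not a connected piece of the tree.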


### 5. Extract the sentence's subject, direct object and indirect object spans

In [6]:
def extract_dep(sentence):
    doc = nlp(sentence)
    result = {
        'nsubj': [],
        'dobj': [],
        'iobj': []
    }
    for token in doc:
        if token.dep_ in result.keys():
            result[token.dep_].append(token.text)
    return result

r = extract_dep(sentence)
for key, value in r.items():
    print(f'{key}: {value}')

nsubj: ['I']
dobj: ['man']
iobj: []
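The result above lists single tokens. To obtain spans, each matching token can be expanded to its full subtree. The following is a minimal sketch on a hard-coded parse, with hypothetical helper names (`subtree_indices`, `argument_spans`). It also looks for the label `dative`, on the assumption that spaCy's English models use it for indirect objects instead of `iobj`; this is worth verifying with `nlp.get_pipe('parser').labels`.

```python
def subtree_indices(index, heads):
    """Collect `index` together with all of its direct and indirect dependents."""
    result = {index}
    for child, head in enumerate(heads):
        if head == index and child != index:  # skip the root's self-loop
            result |= subtree_indices(child, heads)
    return result

def argument_spans(words, heads, deps, labels=('nsubj', 'dobj', 'iobj', 'dative')):
    """Map each requested label to the text of every subtree carrying that label."""
    spans = {label: [] for label in labels}
    for i, dep in enumerate(deps):
        if dep in spans:
            members = sorted(subtree_indices(i, heads))  # restore sentence order
            spans[dep].append(' '.join(words[j] for j in members))
    return spans

# Hand-written (assumed) parse of "I saw the man with a telescope."
words = ['I', 'saw', 'the', 'man', 'with', 'a', 'telescope', '.']
heads = [1, 1, 3, 1, 1, 6, 4, 1]
deps = ['nsubj', 'ROOT', 'det', 'dobj', 'prep', 'det', 'pobj', 'punct']

for label, spans in argument_spans(words, heads, deps).items():
    print(f'{label}: {spans}')
```

With spaCy tokens, the equivalent expansion would use `token.subtree` for each token whose `dep_` is one of the requested labels.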


### 6. Modify NLTK Transition parser's Configuration class to use better features.

In [7]:
from nltk.parse.transitionparser import TransitionParser
from nltk.corpus import dependency_treebank
from nltk.parse import DependencyEvaluator
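As an illustration of what "better features" might look like, the sketch below builds string features from a parser state. It does not use NLTK's classes: the state is assumed to be a stack and a buffer of token indices plus a list of token dictionaries, and the function name, feature names and the distance cap of 5 are all made up for this example. The same ideas (a look-ahead tag, a tag pair, a distance bucket) would have to be ported into the `Configuration` class's feature extraction to take effect in the parser.

```python
def extract_features(stack, buffer, tokens):
    """Return string features for a parser state.

    stack, buffer: lists of token indices (top of stack is stack[-1],
                   next input token is buffer[0])
    tokens:        list of dicts with 'word' and 'tag' keys
    """
    features = []
    s0 = tokens[stack[-1]] if stack else None
    b0 = tokens[buffer[0]] if buffer else None
    b1 = tokens[buffer[1]] if len(buffer) > 1 else None

    if s0:
        features.append(f"STK_0_FORM_{s0['word']}")
        features.append(f"STK_0_POS_{s0['tag']}")
    if b0:
        features.append(f"BUF_0_FORM_{b0['word']}")
        features.append(f"BUF_0_POS_{b0['tag']}")
    if b1:
        # One token of look-ahead beyond the front of the buffer.
        features.append(f"BUF_1_POS_{b1['tag']}")
    if s0 and b0:
        # Combined tag pair and capped linear distance between the two tokens.
        features.append(f"STK_0_POS+BUF_0_POS_{s0['tag']}+{b0['tag']}")
        features.append(f"DIST_{min(buffer[0] - stack[-1], 5)}")
    return features

# Hypothetical state while parsing "I saw the man".
tokens = [
    {'word': 'I', 'tag': 'PRP'},
    {'word': 'saw', 'tag': 'VBD'},
    {'word': 'the', 'tag': 'DT'},
    {'word': 'man', 'tag': 'NN'},
]
print(extract_features(stack=[1], buffer=[2, 3], tokens=tokens))
```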


### 7. Evaluate the new features by comparing performance against the original


In [8]:
'''
train_size = 100
test_size = 10

# Init models
tp_original = TransitionParser('arc-standard') # or 'arc-eager'
tp_custom = TransitionParser('arc-standard') # or 'arc-eager'

# Train
tp_original.train(dependency_treebank.parsed_sents()[:train_size], 'models/tp_original.model')
tp_custom.train(dependency_treebank.parsed_sents()[:train_size], 'models/tp_custom.model')

# Test and evaluation
parses_original = tp_original.parse(dependency_treebank.parsed_sents()[-test_size:], 'models/tp_original.model')
parses_custom = tp_custom.parse(dependency_treebank.parsed_sents()[-test_size:], 'models/tp_custom.model')
las_original, uas_original = DependencyEvaluator(parses_original, dependency_treebank.parsed_sents()[-test_size:]).eval()
las_custom, uas_custom = DependencyEvaluator(parses_custom, dependency_treebank.parsed_sents()[-test_size:]).eval()
print(f'\n\nResults:')
print(f'- Original TransitionParser: LAS={las_original:.04f}, UAS={uas_original:.04f}')
print(f'- Custom TransitionParser:   LAS={las_custom:.04f}, UAS={uas_custom:.04f}')
'''
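For reference, the two scores reported by the evaluation can be sketched in a few lines. UAS is the fraction of tokens whose predicted head is correct, and LAS additionally requires the relation label to be correct. The function name and the toy data below are made up for illustration, and this sketch counts every token, whereas a real evaluator may choose to skip punctuation.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) for two parses given as lists of (head, label) pairs."""
    if not gold or len(gold) != len(predicted):
        raise ValueError('parses must be non-empty and of equal length')
    # UAS: the head matches; LAS: both head and label match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return las_hits / len(gold), uas_hits / len(gold)

# Toy example: one wrong head (token 2) and one wrong label (token 3).
gold = [(1, 'nsubj'), (1, 'ROOT'), (3, 'det'), (1, 'dobj')]
predicted = [(1, 'nsubj'), (1, 'ROOT'), (1, 'det'), (1, 'iobj')]

las, uas = attachment_scores(gold, predicted)
print(f'LAS={las:.04f}, UAS={uas:.04f}')
```

Since a correct label only counts when the head is also correct, LAS can never exceed UAS.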

### 8. Replace SVM classifier with an alternative of your choice.