## Assignment: Working with Dependency Graphs (Parses)

The objective of this assignment is to learn how to work with dependency graphs by defining functions that traverse and query them.

Read the [spaCy documentation on the dependency parser](https://spacy.io/api/dependencyparser) to learn which methods it provides.

Define functions to:
- Extract a path of dependency relations from the ROOT to a token
- Extract the subtree of dependents of a given token
- Check whether a given list of tokens (a segment of a sentence) forms a subtree
- Identify the head of a span, given its tokens
- Extract the sentence's subject, direct object, and indirect object spans

## Assignment: Training a Transition-Based Dependency Parser (Optional & Advanced)

- Modify [NLTK Transition parser](https://github.com/nltk/nltk/blob/develop/nltk/parse/transitionparser.py)'s `Configuration` class to use better features.
- Evaluate the new features by comparing performance against the original.
- Replace `SVM` classifier with an alternative of your choice.
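
To make the transition-based setting concrete, the following is a minimal sketch of an arc-standard oracle in plain Python. It is not NLTK's code: it uses the textbook formulation in which arcs are built between the two topmost stack items, which may differ in detail from NLTK's implementation. The name `arc_standard_oracle` and the head-index encoding (index 0 is an artificial ROOT) are assumptions made for this illustration.

```python
def arc_standard_oracle(heads):
    """Derive SHIFT / LEFT-ARC / RIGHT-ARC transitions for a projective tree.

    heads[i] is the index of the head of token i; index 0 is an artificial
    ROOT node with heads[0] = None (an assumption of this sketch).
    """
    n = len(heads)
    # Number of dependents each token still has to collect
    pending = [0] * n
    for h in heads[1:]:
        pending[h] += 1

    stack, buffer = [0], list(range(1, n))
    transitions, arcs = [], []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            # LEFT-ARC: the top of the stack is the head of the item below it
            if below != 0 and heads[below] == top and pending[below] == 0:
                transitions.append('LEFT-ARC')
                arcs.append((top, below))
                pending[top] -= 1
                stack.pop(-2)
                continue
            # RIGHT-ARC: the item below is the head of the top of the stack
            if heads[top] == below and pending[top] == 0:
                transitions.append('RIGHT-ARC')
                arcs.append((below, top))
                pending[below] -= 1
                stack.pop()
                continue
        if not buffer:
            raise ValueError('tree is not projective')
        stack.append(buffer.pop(0))
        transitions.append('SHIFT')
    return transitions, arcs


# "I saw the man with a telescope ." with "with" attached to "saw"
heads = [None, 2, 0, 4, 2, 2, 7, 5, 2]
transitions, arcs = arc_standard_oracle(heads)
print(transitions)
print(arcs)  # (head, dependent) pairs
```

A trained parser replaces the oracle's lookups into `heads` with a classifier that predicts the next transition from features of the current stack and buffer.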

In [2]:
import spacy

doc_text = 'I saw the man with a telescope.'

nlp = spacy.load('en_core_web_sm')
doc = nlp(doc_text)

### 1. Extract a path of dependency relations from the ROOT to a token

In [3]:
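def get_path(doc, token):
    """Return the tokens on the path from the ROOT to the first token whose text equals `token`."""
    for t in doc:
        if t.text == token:
            # t.ancestors runs from the token's head up to the ROOT, so reverse it
            path = list(t.ancestors)
            path.reverse()
            path.append(t)
            return path
    return None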

# Examples
examples = ['the', 'a']
for e in examples:
    print(f'Path from ROOT to "{e}":', get_path(doc, e))

Path from ROOT to "the": [saw, man, the]
Path from ROOT to "a": [saw, with, telescope, a]
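
The task asks for dependency relations, while the output above lists tokens. A sketch of collecting the relation labels along the same path is shown below on a toy parse stored in plain lists. The labels `det`, `prep`, `pobj`, and `punct` are assumed for illustration and are not copied from the model's output; `relation_path` is a hypothetical name. With spaCy tokens, the analogous step would be to read `t.dep_` for each token on the path.

```python
# Toy parse of "I saw the man with a telescope ." as parallel lists.
# The root points to itself, mirroring spaCy's convention.
words = ['I', 'saw', 'the', 'man', 'with', 'a', 'telescope', '.']
heads = [1, 1, 3, 1, 1, 6, 4, 1]
deps = ['nsubj', 'ROOT', 'det', 'dobj', 'prep', 'det', 'pobj', 'punct']


def relation_path(i):
    """Return (relation, word) pairs from the ROOT down to token i."""
    path = []
    while True:
        path.append((deps[i], words[i]))
        if heads[i] == i:  # reached the root
            break
        i = heads[i]
    return path[::-1]


print(relation_path(words.index('a')))
```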


### 2. Extract the subtree of dependents of a given token

In [4]:
def get_subtree(doc, token):
    for t in doc:
        if t.text == token: 
            return list(t.subtree)
    return None

# Examples
examples = ['saw', 'telescope']
for e in examples:
    print(f'Subtree of "{e}":', get_subtree(doc, e))

Subtree of "saw": [I, saw, the, man, with, a, telescope, .]
Subtree of "telescope": [a, telescope]


### 3. Check if a given list of tokens (segment of a sentence) forms a subtree

In [5]:
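def is_subtree(doc, subtree):
    """Return True if the token texts in `subtree` match, in order, the subtree of some token in `doc`."""
    for query_token in subtree: # Check for each possible root of the given subtree
        for t in doc:
            # Matching is by text, so repeated words in a sentence cannot be told apart
            if t.text == query_token and [x.text for x in t.subtree] == subtree:
                return True
    return False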

# Examples
examples = [['saw', 'with', 'telescope'], ['a', 'telescope']]
for e in examples:
    print(f'Is {e} a subtree:', is_subtree(doc, e))

Is ['saw', 'with', 'telescope'] a subtree: False
Is ['a', 'telescope'] a subtree: True
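
Because the check above compares token texts, it cannot distinguish repeated words. A possible alternative is to work with token positions instead. The sketch below does this on a plain head-index list (root pointing to itself); `subtree_indices` and `forms_subtree` are hypothetical helper names, not spaCy functions.

```python
def subtree_indices(heads, i):
    """Return the set of positions in the subtree rooted at token i."""
    children = {}
    for dep, head in enumerate(heads):
        if head != dep:  # skip the root's self-loop
            children.setdefault(head, []).append(dep)
    result, todo = set(), [i]
    while todo:
        node = todo.pop()
        result.add(node)
        todo.extend(children.get(node, []))
    return result


def forms_subtree(heads, indices):
    """True if `indices` is exactly the subtree of one of its own members."""
    indices = set(indices)
    return any(subtree_indices(heads, i) == indices for i in indices)


# "I saw the man with a telescope ." with "with" attached to "saw"
heads = [1, 1, 3, 1, 1, 6, 4, 1]
print(forms_subtree(heads, [1, 4, 6]))  # saw, with, telescope
print(forms_subtree(heads, [5, 6]))     # a, telescope
```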


### 4. Identify head of a span, given its tokens

In [6]:
def get_head(span):
    return span.root

# Examples
for span in doc.sents:
    print(f'Head of "{span}":', get_head(span))

Head of "I saw the man with a telescope.": saw


### 5. Extract sentence subject, direct object and indirect object spans

In [14]:
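def get_spans(doc):
    """Return the head-token text of the subject, direct object, and indirect object in `doc`.

    Only the head token of each argument is returned, not its full span.
    """
    nsubj, dobj, iobj = None, None, None
    for token in doc:
        if token.dep_ == 'nsubj': nsubj = token.text
        elif token.dep_ == 'dobj': dobj = token.text
        # spaCy's English models label indirect objects 'dative' rather than 'iobj';
        # adjust this check to the label scheme of the loaded model
        elif token.dep_ in ('iobj', 'dative'): iobj = token.text
        if nsubj is not None and dobj is not None and iobj is not None:
            break
    return {
        'Sentence subject (NSUBJ)': nsubj,
        'Direct object (DOBJ)': dobj,
        'Indirect object (IOBJ)': iobj
    }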

r = get_spans(doc)
for key, value in r.items():
    print(f'{key}: {value if value is not None else "NOT AVAILABLE"}')

Sentence subject (NSUBJ): I
Direct object (DOBJ): man
Indirect object (IOBJ): NOT AVAILABLE
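
The output above shows only head tokens. To get full spans, one option is to expand each head to the range covered by its subtree. The sketch below illustrates this on a toy parse in plain lists; the sentence, its labels (including `dative` for the indirect object), and the names `subtree_span` and `argument_spans` are assumptions for illustration. With spaCy, a similar range could be taken from `token.left_edge` and `token.right_edge`.

```python
def subtree_span(heads, i):
    """Return (start, end) covering token i and all of its descendants."""
    members = {i}
    changed = True
    while changed:  # keep adding tokens whose head is already a member
        changed = False
        for dep, head in enumerate(heads):
            if head in members and dep not in members:
                members.add(dep)
                changed = True
    return min(members), max(members) + 1


def argument_spans(words, heads, deps, labels=('nsubj', 'dobj', 'dative')):
    """Map each requested label to the text of its first matching span."""
    spans = {}
    for i, dep in enumerate(deps):
        if dep in labels and dep not in spans:
            start, end = subtree_span(heads, i)
            spans[dep] = ' '.join(words[start:end])
    return spans


# Toy parse; the root points to itself
words = ['She', 'gave', 'him', 'a', 'book', '.']
heads = [1, 1, 1, 4, 1, 1]
deps = ['nsubj', 'ROOT', 'dative', 'det', 'dobj', 'punct']
print(argument_spans(words, heads, deps))
```

Taking the minimum and maximum position yields a contiguous span only when the tree is projective.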


### 6. Modify NLTK Transition parser's Configuration class to use better features.

In [8]:
from nltk.parse.transitionparser import TransitionParser
from nltk.corpus import dependency_treebank
from nltk.parse import DependencyEvaluator


 Number of training examples : 100
 Number of valid (projective) examples : 100
[LibSVM]LAS: 0.7917, UAS: 0.7917
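
NLTK's `Configuration` class builds its features in an `extract_features` method. As an illustration of what "better features" could mean, the sketch below is a standalone, hypothetical analogue that adds a tag-pair feature and a distance feature on top of single-token features. The function name, feature names, and token dictionaries are invented for this example and do not mirror NLTK's internals.

```python
def extract_pair_features(stack, buffer, tokens):
    """Build string features from a parser configuration.

    stack, buffer: lists of token indices; tokens: list of dicts with
    'word' and 'tag' keys (a simplified stand-in for parser nodes).
    """
    feats = []
    s0 = tokens[stack[-1]] if stack else None
    b0 = tokens[buffer[0]] if buffer else None
    b1 = tokens[buffer[1]] if len(buffer) > 1 else None

    # Single-token features
    if s0 is not None:
        feats += ['S0_WORD_' + s0['word'], 'S0_TAG_' + s0['tag']]
    if b0 is not None:
        feats += ['B0_WORD_' + b0['word'], 'B0_TAG_' + b0['tag']]
    if b1 is not None:
        feats.append('B1_TAG_' + b1['tag'])

    # Combined features: tag pair and capped linear distance
    if s0 is not None and b0 is not None:
        feats.append('S0B0_TAGS_' + s0['tag'] + '_' + b0['tag'])
        feats.append('DIST_' + str(min(buffer[0] - stack[-1], 5)))
    return feats


tokens = [
    {'word': 'ROOT', 'tag': 'ROOT'},
    {'word': 'I', 'tag': 'PRP'},
    {'word': 'saw', 'tag': 'VBD'},
    {'word': 'the', 'tag': 'DT'},
]
features = extract_pair_features([0, 1], [2, 3], tokens)
print(features)
```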


### 7. Evaluate the features by comparing performance to the original


In [12]:
train_size = 100
test_size = 10

# Init models
tp_original = TransitionParser('arc-standard') # or 'arc-eager'
# NOTE: tp_custom is built exactly like tp_original here; it only differs
# once a modified Configuration (section 6) is plugged in.
tp_custom = TransitionParser('arc-standard') # or 'arc-eager'

# Train
train_sents = dependency_treebank.parsed_sents()[:train_size]
tp_original.train(train_sents, 'models/tp_original.model')
tp_custom.train(train_sents, 'models/tp_custom.model')

# Test and evaluation: each parser is scored on its own parses
test_sents = dependency_treebank.parsed_sents()[-test_size:]
parses_original = tp_original.parse(test_sents, 'models/tp_original.model')
parses_custom = tp_custom.parse(test_sents, 'models/tp_custom.model')
las_original, uas_original = DependencyEvaluator(parses_original, test_sents).eval()
las_custom, uas_custom = DependencyEvaluator(parses_custom, test_sents).eval()
print('\n\nResults:')
print(f'- Original TransitionParser: LAS={las_original:.04f}, UAS={uas_original:.04f}')
print(f'- Custom TransitionParser:   LAS={las_custom:.04f}, UAS={uas_custom:.04f}')

 Number of training examples : 100
 Number of valid (projective) examples : 100
[LibSVM] Number of training examples : 100
 Number of valid (projective) examples : 100
[LibSVM]

Results:
- Original TransitionParser: LAS=0.7750, UAS=0.7750
- Custom TransitionParser:   LAS=0.7750, UAS=0.7750
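
For reference, LAS and UAS can be computed by hand as in the sketch below. It assumes each sentence is given as a list of `(head, label)` pairs and scores every token; NLTK's `DependencyEvaluator` may treat some tokens (for example punctuation) differently, so the numbers need not match exactly. `attachment_scores` is a hypothetical helper name.

```python
def attachment_scores(gold, pred):
    """Return (LAS, UAS) for parallel lists of sentences.

    Each sentence is a list of (head_index, label) pairs, one per token.
    """
    total = uas_hits = las_hits = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError('sentences differ in length')
        for (g_head, g_label), (p_head, p_label) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                uas_hits += 1          # correct head
                if g_label == p_label:
                    las_hits += 1      # correct head and label
    return las_hits / total, uas_hits / total


gold = [[(2, 'nsubj'), (0, 'ROOT'), (4, 'det'), (2, 'dobj')]]
pred = [[(2, 'nsubj'), (0, 'ROOT'), (2, 'det'), (2, 'iobj')]]
las, uas = attachment_scores(gold, pred)
print(f'LAS={las:.4f}, UAS={uas:.4f}')
```

LAS can never exceed UAS, since a labeled hit requires the head to be correct as well.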


### 8. Replace SVM classifier with an alternative of your choice.
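
One candidate alternative is a multiclass perceptron over the same kind of string features. The sketch below is a minimal, self-contained version in plain Python, assuming features are lists of strings and classes are transition names. The class name `Perceptron` and the toy training data are invented for illustration; wiring such a classifier into `TransitionParser` is left out here, since that depends on how its training and parsing methods call the classifier.

```python
class Perceptron:
    """Minimal multiclass perceptron over sparse string features."""

    def __init__(self, classes):
        self.classes = list(classes)
        self.weights = {}  # (feature, class) -> weight

    def score(self, feats, cls):
        return sum(self.weights.get((f, cls), 0.0) for f in feats)

    def predict(self, feats):
        # Ties are broken in favour of the class listed first
        return max(self.classes, key=lambda c: self.score(feats, c))

    def train(self, data, epochs=5):
        for _ in range(epochs):
            for feats, gold in data:
                guess = self.predict(feats)
                if guess != gold:
                    # Reward the gold class, penalise the wrong guess
                    for f in feats:
                        self.weights[(f, gold)] = self.weights.get((f, gold), 0.0) + 1.0
                        self.weights[(f, guess)] = self.weights.get((f, guess), 0.0) - 1.0


# Toy data: (features of a configuration, transition to take)
data = [
    (['S0_TAG_DT', 'B0_TAG_NN'], 'SHIFT'),
    (['S0_TAG_NN', 'B0_TAG_VB'], 'LEFT-ARC'),
    (['S0_TAG_VB', 'B0_TAG_NN'], 'RIGHT-ARC'),
]
model = Perceptron(['SHIFT', 'LEFT-ARC', 'RIGHT-ARC'])
model.train(data)
print([model.predict(feats) for feats, _ in data])
```

In practice an averaged perceptron or a library classifier would be a more robust choice; this version only shows the update rule.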