# spaCy Dependency Graph



* [Dependency Grammar](https://en.wikipedia.org/wiki/Dependency_grammar)

> Dependency is the notion that linguistic units, e.g. words, are connected to each other by directed links. 
> DGs have treated the syntactic functions (= grammatical functions, grammatical relations) as primitive. They posit an inventory of functions (e.g. subject, object, oblique, determiner, attribute, predicative, etc.). These functions can appear as labels on the dependencies in the tree structures.

> <img src="./image/dependency_grammer.png" align="left" width=500/>

* [Navigating the parse tree](https://spacy.io/usage/linguistic-features#navigating)

In [70]:
import pandas as pd
import spacy
from spacy.symbols import nsubj, VERB

In [90]:
def _to_text(tokens, sep=',') -> str:
    """Join the text of an iterable of tokens into one string."""
    return sep.join(str(token) for token in tokens)

# Model

In [4]:
nlp = spacy.load("en_core_web_sm")

# Document

In [34]:
doc = nlp("Autonomous cars shift insurance liability toward manufacturers.")

# Dependency

* ```.dep_``` attribute is the label of the arc that points to the unit (e.g. a token). It names the syntactic function of the unit (e.g. nsubj, dobj).
* ```.head``` attribute is the source unit of the dependency arc. Every token has exactly one head; the root of the sentence is its own head.
* ```.children``` attribute lists the direct dependents of the unit.
* ```.subtree``` attribute gives the entire hierarchy/tree of the unit (**including the unit itself**).
* ```.left_edge``` attribute points to the left-most token in the subtree.
* ```.right_edge``` attribute points to the right-most token in the subtree.
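The relationships listed above can all be derived from the head links alone. The following is a minimal sketch in plain Python, not spaCy's implementation: `words` and `heads` are hand-copied from the parse of the example sentence used in this notebook, and the helper functions are hypothetical names chosen to mirror the spaCy attributes.

```python
# Tokens of the example sentence and the index of each token's head.
# The root ("shift") is its own head.
words = ["Autonomous", "cars", "shift", "insurance",
         "liability", "toward", "manufacturers", "."]
heads = [1, 2, 2, 4, 2, 2, 5, 2]


def children(i):
    """Direct dependents of token i, in document order."""
    return [j for j, h in enumerate(heads) if h == i and j != i]


def subtree(i):
    """Token i plus all of its descendants, in document order."""
    nodes = [i]
    for child in children(i):
        nodes.extend(subtree(child))
    return sorted(nodes)


def left_edge(i):
    """Left-most token index in the subtree of token i."""
    return min(subtree(i))


def right_edge(i):
    """Right-most token index in the subtree of token i."""
    return max(subtree(i))


def ancestors(i):
    """Follow head links upward until the root is reached."""
    chain = []
    while heads[i] != i:
        i = heads[i]
        chain.append(i)
    return chain


print([words[j] for j in children(2)])          # dependents of "shift"
print([words[j] for j in subtree(5)])           # subtree of "toward"
print(words[left_edge(1)], words[right_edge(1)])  # edges of "cars"
print(words[left_edge(3)], words[right_edge(3)])  # edges of the leaf "insurance"
print([words[j] for j in ancestors(6)])         # ancestors of "manufacturers"
```

A leaf such as "insurance" has only itself in its subtree, so both edges fall on the token itself.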

In [67]:
df = pd.DataFrame([
        {
            "index": token.i,
            "token": token.text,
            "pos"  : token.pos_,
            "dependency": f"{token.dep_} ({spacy.explain(token.dep_)})",
            "head/parent": token.head.text,
            "direct children": _to_text(token.children),
            "n left": token.n_lefts,
            "left children": _to_text(token.lefts),
            "n right": token.n_rights,
            "right children": _to_text(token.rights),
            "subtree": _to_text(token.subtree),
            "ancestors": _to_text(token.ancestors)
        }
        for token in doc
    ]
).set_index('index')
df.index.name = None
df

| | token | pos | dependency | head/parent | direct children | n left | left children | n right | right children | subtree | ancestors |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autonomous | ADJ | amod (adjectival modifier) | cars | | 0 | | 0 | | Autonomous | cars,shift |
| 1 | cars | NOUN | nsubj (nominal subject) | shift | Autonomous | 1 | Autonomous | 0 | | Autonomous,cars | shift |
| 2 | shift | VERB | ROOT (root) | shift | cars,liability,toward,. | 1 | cars | 3 | liability,toward,. | Autonomous,cars,shift,insurance,liability,towa... | |
| 3 | insurance | NOUN | compound (compound) | liability | | 0 | | 0 | | insurance | liability,shift |
| 4 | liability | NOUN | dobj (direct object) | shift | insurance | 1 | insurance | 0 | | insurance,liability | shift |
| 5 | toward | ADP | prep (prepositional modifier) | shift | manufacturers | 0 | | 1 | manufacturers | toward,manufacturers | shift |
| 6 | manufacturers | NOUN | pobj (object of preposition) | toward | | 0 | | 0 | | manufacturers | toward,shift |
| 7 | . | PUNCT | punct (punctuation) | shift | | 0 | | 0 | | . | shift |


### Subtree Left/Right Edge

If a unit has no children, ```.left_edge``` and ```.right_edge``` both point to the unit itself.

In [68]:
insurance = doc[3]
print(f"[{insurance}] left edge [{insurance.left_edge}] right edge [{insurance.right_edge}].")

[insurance] left edge [insurance] right edge [insurance].


# Dependency Tree

In [44]:
spacy.displacy.serve(doc, style="dep", auto_select_port=True, page=False)




Using the 'dep' visualizer
Serving on http://0.0.0.0:5001 ...

Shutting down server on port 5001.


# Finding Subject Noun Phrase

In [121]:
text = """
Autonomous electric cars in Europe shifted insurance liability to manufacturers, 
although the manufacturers did not like it, causing significant spending.
"""
doc = nlp(' '.join(text.split()))

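# Collect the tokens that are the nominal subject of a verb.
subjects = []
for candidate in doc:
    if candidate.dep == nsubj and candidate.head.pos == VERB:
        subjects.append(candidate)

# Merge each subject's subtree (left edge through right edge) into one token.
with doc.retokenize() as retokenizer:
    for subject in subjects:
        span = doc[subject.left_edge.i : subject.right_edge.i + 1]
        retokenizer.merge(span)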
        
for token in doc:
    print(f"{token.i:<4}{token.text:40}{token.pos_:7}{token.dep_}")

0   Autonomous electric cars in Europe      NOUN   nsubj
1   shifted                                 VERB   ROOT
2   insurance                               NOUN   compound
3   liability                               NOUN   dobj
4   to                                      ADP    prep
5   manufacturers                           NOUN   pobj
6   ,                                       PUNCT  punct
7   although                                SCONJ  mark
8   the manufacturers                       NOUN   nsubj
9   did                                     AUX    aux
10  not                                     PART   neg
11  like                                    VERB   advcl
12  it                                      PRON   dobj
13  ,                                       PUNCT  punct
14  causing                                 VERB   advcl
15  significant                             ADJ    amod
16  spending                                NOUN   dobj
17  .                                   
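
Merging with `retokenize()` changes the `Doc` in place and renumbers the tokens. When the original tokens are still needed, the subject phrases can be collected as `Span` objects instead. The following is a sketch under stated assumptions: it assumes spaCy v3, where the `Doc` constructor accepts `heads` (absolute indices), `deps` and `pos`; the annotations are hand-copied from the parse table of the first example sentence so that no trained model is required; `subject_spans` is a hypothetical helper name.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-built parse of the first example sentence (assumes spaCy v3).
words = ["Autonomous", "cars", "shift", "insurance",
         "liability", "toward", "manufacturers", "."]
heads = [1, 2, 2, 4, 2, 2, 5, 2]  # absolute head index per token
deps = ["amod", "nsubj", "ROOT", "compound", "dobj", "prep", "pobj", "punct"]
pos = ["ADJ", "NOUN", "VERB", "NOUN", "NOUN", "ADP", "NOUN", "PUNCT"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps, pos=pos)


def subject_spans(doc):
    """Return the full phrase of every nominal subject of a verb as a Span."""
    return [
        doc[token.left_edge.i : token.right_edge.i + 1]
        for token in doc
        if token.dep_ == "nsubj" and token.head.pos_ == "VERB"
    ]


spans = subject_spans(doc)
print([span.text for span in spans])
print(len(doc))  # the Doc itself is not modified
```

The spans keep references to the original tokens, so attributes such as `span.root` stay available for later processing.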