### Overview
* <a href="#section1">What is a grammar, and how does ambiguity emerge?</a>
* <a href="#section2">What is dependency grammar?</a>
    * Contrast with constituency grammar/CFG
    * The assumptions of DG and the issues they raise
    * A brief look at the shift-reduce parsing method SpaCy uses
* <a href="#section3">Why is it useful?</a>
    * Handling languages with free word order
        * Phrase structure rules (and constituency parsers) need a separate rule for every permitted ordering, so the rule set explodes.
        * E.g. NP -> ADJ NN and NP -> NN ADJ
    * Applications:
        * Grammar checking
        * Chunking for information retrieval:
            * Which words modify the noun or verb?
            * Extract noun phrases, verb phrases, adjective phrases, and prepositional phrases
            * See <a href="#relationextractor">example</a>
        * Question answering
        * Feature engineering for:
            * Coreference resolution:
                * When Bob goes to the movies, [he] likes to buy popcorn with extra butter.
            * Machine translation:
                * "Bob goes" -> subject verb -> "Bob va"
* <a href="#section4">Dependency Grammar with SpaCy:</a>
    * Traversing paths
    * Grabbing entities
    * Exercise:
        * Extract all noun phrases from a sentence

In [1]:
from warnings import warn
def from_scratch():
    try:
        import gensim
    except ModuleNotFoundError:
        !pip install gensim >> ~/gensim.log
    try:
        import spacy
    except ModuleNotFoundError:
        !conda install spacy -y >> ~/spacy.log
    
    import spacy
    nlp = spacy.load('en')
    if nlp.parser is None:
        !python -m spacy download en >> ~/spacy.log
    del nlp
    
    !jupyter nbextension enable --py --sys-prefix widgetsnbextension >> ~/enable-nbe.log
    try:
        import nltk
    except ModuleNotFoundError:
        !conda install nltk -y
    
from_scratch()

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK


In [5]:
del syntax

In [1]:
from IPython.display import HTML
from assets.static_html import syntax
HTML(syntax)

| The | man | went | to | the | store |
|-----|-----|------|----|-----|-------|
| determiner | noun/simple subject | verb | preposition | determiner | object (of preposition) |
| subject phrase/determiner phrase | subject phrase/determiner phrase | | prepositional phrase | prepositional phrase | prepositional phrase |
| | | verb phrase | verb phrase | verb phrase | verb phrase |


In [3]:
from assets.static_html import morphology
HTML(morphology)

| | Effect | Morpheme |
|---|--------|----------|
| un | Negation | Prefix |
| reach | Verb | Root |
| able | To Adjective | Suffix |


In [6]:
from assets.static_html import morph_and_syntax
HTML(morph_and_syntax)

# Models of Grammar

In [5]:
from static_html import valency
HTML(valency)

### Context Free Grammar, production rules

| Root | | Components |
|------|---|------------|
| S | $\rightarrow$ | NP VP |
| NP | $\rightarrow$ | Pron or PropN or Det Nom |
| VP | $\rightarrow$ | V or V NP or V NP NN or V PP |
| PP | $\rightarrow$ | Prep NP |
| etc. | $\rightarrow$ | etc. |
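
The production rules above can be tried out on a sequence of part-of-speech tags. The following is a minimal sketch, not a full parser: the names `GRAMMAR`, `parse_ends`, and `recognizes` are hypothetical, and the rule `Nom -> NN` is an assumption added only so that `Det Nom` can terminate (the table leaves `Nom` under "etc.").

```python
# Production rules from the table, written as symbol -> list of alternatives.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pron"], ["PropN"], ["Det", "Nom"]],
    "VP": [["V"], ["V", "NP"], ["V", "NP", "NN"], ["V", "PP"]],
    "PP": [["Prep", "NP"]],
    "Nom": [["NN"]],  # assumption: not given in the table
}


def parse_ends(symbol, tags, start):
    """Return every position at which `symbol`, started at `start`, can end."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must match the tag at the current position.
        if start < len(tags) and tags[start] == symbol:
            return {start + 1}
        return set()
    ends = set()
    for production in GRAMMAR[symbol]:
        positions = {start}
        for part in production:
            # Advance every partial derivation through the next symbol.
            positions = {e for p in positions for e in parse_ends(part, tags, p)}
        ends |= positions
    return ends


def recognizes(tags):
    """True if the whole tag sequence can be derived from S."""
    return len(tags) in parse_ends("S", tags, 0)


# "The man went to the store", tagged by hand.
sentence_tags = ["Det", "NN", "V", "Prep", "Det", "NN"]
print(recognizes(sentence_tags))
print(recognizes(["V", "Det"]))
```

Because none of these rules is left-recursive, the naive top-down search terminates; a grammar with left-recursive rules would need a chart-based method instead.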

### Constituency and Dependency Grammars

In [6]:
from IPython.display import Image
Image(url='https://upload.wikimedia.org/wikipedia/commons/8/8e/Thistreeisillustratingtherelation%28PSG%29.png')

In [8]:
from assets.static_html import dep_axioms
HTML(dep_axioms)
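
The overview mentions shift-reduce parsing. As a rough illustration of the idea, here is a minimal sketch of the arc-standard transition system. This is an assumption chosen for simplicity: SpaCy's parser uses its own transition set and picks each action with a statistical model, whereas here the action sequence is supplied by hand. The function name `apply_transitions` is hypothetical.

```python
def apply_transitions(words, transitions):
    """Run hand-supplied arc-standard actions; return {dependent index: head index}."""
    stack = []
    buffer = list(range(len(words)))
    heads = {}
    for action in transitions:
        if action == "SHIFT":
            # Move the next word from the buffer onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT":
            # The second-from-top word becomes a dependent of the top word.
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif action == "RIGHT":
            # The top word becomes a dependent of the word beneath it.
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


words = ["Bob", "eats", "cake"]
actions = ["SHIFT", "SHIFT", "LEFT", "SHIFT", "RIGHT"]
heads = apply_transitions(words, actions)
for dependent, head in sorted(heads.items()):
    print(words[dependent], "<-", words[head])
```

The word left on the stack at the end ("eats" here) has no head and serves as the root of the sentence.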

<a name="section3"></a>
### Applications of DG
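
The overview notes that phrase structure rules explode for languages with free word order. A small sketch of why, under the simplifying assumption that every ordering of the right-hand side is grammatical (the helper `free_order_rules` is hypothetical): each ordering needs its own production, so the count grows factorially with the number of constituents.

```python
from itertools import permutations


def free_order_rules(lhs, constituents):
    """List one production per ordering of the right-hand side."""
    return [f"{lhs} -> {' '.join(order)}" for order in permutations(constituents)]


two = free_order_rules("NP", ["ADJ", "NN"])
three = free_order_rules("NP", ["DET", "ADJ", "NN"])
print(two)
print(len(three))
```

A dependency grammar sidesteps this because it records only which word is the head of which, independent of the order in which the words appear.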

<a name="section4"></a>
### DG with SpaCy

In [8]:
from util import displacy
displacy("Harry and little Sally like to eat", width=1500, height=500)

In [9]:
from IPython.display import HTML
from static_html import accessing_dependents
HTML(accessing_dependents)
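
To make the head/dependent structure concrete without loading a model, here is a minimal sketch that stores a parse as a list of head indices. The head assignments for the example sentence are annotated by hand as an assumption, not taken from parser output, and the helpers `children`, `subtree`, and `subtree_text` are hypothetical stand-ins for the idea behind `token.children` and `token.subtree`.

```python
words = ["Harry", "and", "little", "Sally", "like", "to", "eat"]
# heads[i] is the index of the head of words[i]; the root points to itself.
# Hand-annotated assumption: "like" is the root, "Sally" is conjoined to "Harry".
heads = [4, 0, 3, 0, 4, 6, 4]


def children(i):
    """Indices of the immediate dependents of word i."""
    return [j for j, h in enumerate(heads) if h == i and j != i]


def subtree(i):
    """Indices of word i and all of its descendants, in sentence order."""
    indices = [i]
    for child in children(i):
        indices.extend(subtree(child))
    return sorted(indices)


def subtree_text(i):
    """The phrase headed by word i."""
    return " ".join(words[j] for j in subtree(i))


print(children(4))
print(subtree_text(0))
print(subtree_text(6))
```

Taking the subtree of a subject or object head is how the relation extractor later in this notebook recovers whole phrases rather than single tokens.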

### Examples

In [10]:
import spacy
nlp = spacy.load('en')

### Simple Relation Extractor

In [151]:
def is_verb(token):
    return token.pos_ == 'VERB'

def is_nsubj(token):
    return token.dep_ == 'nsubj'

def is_property(token):
    return token.dep_ in ('attr','acomp','dobj','pobj','prep')
    
def subtree(token):
    return list(map(lambda x: x.orth_, token.subtree))

def extract_relations(sentence):
    """Yield one relation per (subject, property) pair attached to each verb."""
    for verb in filter(is_verb, sentence):
        for subject in filter(is_nsubj, verb.children):
            for prop in filter(is_property, verb.children):
                # Slice the doc from a token's left edge to its right edge
                # to get the whole phrase that the token heads.
                property_subtree = prop.doc[prop.left_edge.i:prop.right_edge.i + 1]
                subject_subtree = subject.doc[subject.left_edge.i:subject.right_edge.i + 1]
                yield {'subject root': subject,
                       'subject_subtree': subject_subtree,
                       'property root': prop,
                       'property_subtree': property_subtree,
                       'relation root': verb,
                       }

text = 'The man reluctantly gobbles cake, but the woman eats ice cream.'
doc = nlp(text)
list(extract_relations(doc))

[{'property root': cake,
  'property_subtree': cake,
  'relation root': gobbles,
  'subject root': man,
  'subject_subtree': The man},
 {'property root': cream,
  'property_subtree': ice cream,
  'relation root': eats,
  'subject root': woman,
  'subject_subtree': the woman}]

In [40]:
list(extract_relations(nlp("Jupyter was a cool tool")))

[{'property root': tool,
  'property_subtree': [a, cool, tool],
  'relation root': was,
  'subject root': Jupyter,
  'subject_subtree': [Jupyter]}]

In [14]:
text = """
This coffee is very smooth and flavorful. 
I will be buying more as gifts and more for myself. 
I like this company and will be seeing what other flavors I can check out. 
So glad I took a chance on this coffee.
"""
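# extract_relations takes no subtree flags, so keep only the root tokens here.
[{'subject': r['subject root'], 'relation': r['relation root'], 'property': r['property root']}
 for r in extract_relations(nlp(text))]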

[{'property': smooth, 'relation': is, 'subject': coffee},
 {'property': more, 'relation': buying, 'subject': I},
 {'property': company, 'relation': like, 'subject': I},
 {'property': flavors, 'relation': check, 'subject': I},
 {'property': chance, 'relation': took, 'subject': I}]

### Exercise:
Build a function that collects all descriptions of Python.

##### Method
Collect all dependent clauses of "equivalence" verbs where Python is the subject.
E.g.: Python is <u>great</u>


##### Components:
* a matcher with an entity PYTHON and an associated pattern for PYTHON
* an nlp pipeline that uses the custom matcher
* a function that filters the relations found in the text down to those that make equivalence statements about Python


In [157]:
# Our data:
!pip install wikipedia >> ~/wikilog.txt
import wikipedia
page = wikipedia.page("Python_(programming_language)").content

In [159]:
from spacy.matcher import Matcher
from spacy import attrs

def merge_matches(matcher, doc, i, matches):
    '''
    Merge a phrase. We have to be careful here because we'll change the token indices.
    To avoid problems, merge all the phrases once we're called on the last match.
    '''
    if i != len(matches)-1:
        return None

    spans = [(ent_id, label, doc[start : end]) for ent_id, label, start, end in matches]
    for ent_id, label, span in spans:
        span.merge(label=label, tag='NNP' if label else span.root.tag_)

# This should be a list of dictionaries, where each dictionary is {TOKEN PROPERTY: TOKEN VALUE}.
# PROPERTIES are found in spacy.attrs, e.g. POS.

my_python_pattern = []
matcher = Matcher(nlp.vocab)
matcher.add_entity("PYTHON", on_match = merge_matches)
matcher.add_pattern("PYTHON", my_python_pattern, label='Python')
nlp.pipeline = [nlp.tagger, nlp.parser, matcher, nlp.entity]

equivalence_verbs = ['be']

def get_all_properties_of_python(text):
    """Converts text to document, and extracts all relations that define python equivalences"""
    doc = nlp(text)
    python_properties = []
    for relation in extract_relations(doc):
        ###Add your code here!
        pass
    return python_properties
