We're going to use the [spaCy](https://spacy.io/) grammar and parsing functionality for this first part. That means you need to run the following two lines at the command line: 
```
pip install -U spacy
python -m spacy download en_core_web_sm
```
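Before going further, it can help to confirm that both installs worked. The sketch below is a small optional check. It assumes the model is installed as an ordinary Python package named `en_core_web_sm`, which is what `spacy download` does. It uses `importlib.util.find_spec`, so nothing is actually imported or loaded. The helper name `spacy_setup_status` is hypothetical.

```python
import importlib.util


def spacy_setup_status(model_name: str = "en_core_web_sm") -> dict:
    """Report whether spaCy and a model package can be found (hypothetical helper).

    find_spec only locates the package; it does not import spaCy or load the model.
    """
    return {
        "spacy": importlib.util.find_spec("spacy") is not None,
        model_name: importlib.util.find_spec(model_name) is not None,
    }


status = spacy_setup_status()
for package, installed in status.items():
    print(f"{package:<16}{'installed' if installed else 'missing'}")
```

If either line says `missing`, rerun the corresponding command above.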

We also use the SVO (subject-verb-object) extractor found [here](https://github.com/NSchrading/intro-spacy-nlp), which provides the `findSVOs` function imported below. The entire introductory notebook in that repository is pretty cool.

In [1]:
import nltk
import sqlite3
import spacy
from subject_object_extraction import findSVOs

In [2]:
# Set up our parser
parser = spacy.load('en_core_web_sm')

It works pretty well on simple sentences.

In [3]:
doc = parser("The Republican Party will remain the voice of America")
print(findSVOs(doc))

[('party', 'remain', 'voice')]


The parsed doc also includes a tremendous amount of information about each word. Look at the output below and see if you can figure out what these elements mean.

In [4]:
for word in doc:
    print(f'{word.text:<15}{word.tag_:<5}{word.dep_:<10}{word.pos_:<10}{word.head.text}')

The            DT   det       DET       Party
Republican     NNP  compound  PROPN     Party
Party          NNP  nsubj     PROPN     remain
will           MD   aux       VERB      remain
remain         VB   ROOT      VERB      remain
the            DT   det       DET       voice
voice          NN   attr      NOUN      remain
of             IN   prep      ADP       voice
America        NNP  pobj      PROPN     of
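The `dep_` and `head` columns are what make SVO extraction possible: the subject (`nsubj`) and the attribute (`attr`) both point at the root verb `remain`. The toy sketch below shows that idea using only the rows printed above. It is a simplified illustration, not how `findSVOs` is implemented; the real function handles many more constructions. It also matches heads by their text, which would break if a word repeated. Real code would compare the `Token` objects returned by `word.head`. The names `rows`, `SUBJECT_DEPS`, `OBJECT_DEPS`, and `naive_svos` are hypothetical.

```python
# Rows copied from the table above: (token text, dep_ label, head text).
# In the notebook you could build them with:
#     rows = [(w.text, w.dep_, w.head.text) for w in doc]
rows = [
    ("The", "det", "Party"),
    ("Republican", "compound", "Party"),
    ("Party", "nsubj", "remain"),
    ("will", "aux", "remain"),
    ("remain", "ROOT", "remain"),
    ("the", "det", "voice"),
    ("voice", "attr", "remain"),
    ("of", "prep", "voice"),
    ("America", "pobj", "of"),
]

SUBJECT_DEPS = {"nsubj", "nsubjpass"}  # nominal subjects (active and passive)
OBJECT_DEPS = {"dobj", "attr"}         # direct objects and attributes ("voice" after "remain")


def naive_svos(rows):
    """Pair each ROOT verb's subjects with its objects (a toy SVO extractor)."""
    triples = []
    for verb, dep, _ in rows:
        if dep != "ROOT":
            continue
        # Heads are matched by text here, a simplification of spaCy's token.head
        subjects = [t for t, d, h in rows if d in SUBJECT_DEPS and h == verb]
        objects = [t for t, d, h in rows if d in OBJECT_DEPS and h == verb]
        for subj in subjects:
            for obj in objects:
                # Lowercase to match the style of the findSVOs output above
                triples.append((subj.lower(), verb.lower(), obj.lower()))
    return triples


print(naive_svos(rows))  # [('party', 'remain', 'voice')]
```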


Each element of a parsed doc is a spaCy `Token` object.

In [5]:
type(word)

spacy.tokens.token.Token

Use the cell below to explore the attributes and methods the token has. In Jupyter, type `word.` and press Tab, or run `dir(word)`:

In [7]:
word

America
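If the full `dir(word)` listing is too noisy, a small filter that drops the underscore-prefixed internals can make it easier to scan. This is a minimal sketch with a hypothetical helper name. It keeps names that only *end* in an underscore, which matters because spaCy's string-valued attributes such as `tag_`, `dep_`, and `pos_` follow that convention. A built-in string stands in for the token so the sketch runs without spaCy.

```python
def public_attributes(obj):
    """Return an object's public attribute and method names (hypothetical helper).

    Only names that *start* with an underscore are dropped, so spaCy-style
    string attributes such as ``tag_`` are kept.
    """
    return sorted(name for name in dir(obj) if not name.startswith("_"))


# In the notebook you would call public_attributes(word) on the spaCy Token;
# a plain string stands in here so the example runs without spaCy installed.
names = public_attributes("America")
print(len(names), names[:5])
```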

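The parser has a harder time with more complicated structures. In the output below, for example, it treats the adjective "safe" as an object of "keep" in `('who', 'keep', 'safe')`.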

In [8]:
doc = parser(
    "The Republican Party will remain the voice of the patriotic heroes who keep America safe and salute the American flag."
)
print(findSVOs(doc))

[('party', 'remain', 'voice'), ('who', 'keep', 'america'), ('who', 'keep', 'safe'), ('heroes', 'salute', 'flag')]


It also works on multiple sentences, but again not every SVO is extracted correctly.

In [9]:
doc = parser("""
    They call them peaceful protestors. 
    We’re honored to be joined tonight by his wonderful wife Ann, 
    and beloved family members Brian and [inaudible]. 
    To each of you, we will never forget the heroic legacy of Captain David Dorn. 
    Thank you very much for being here. 
    Thank you. Thank you very much. 
    Great man. Great man. 
    As long as I am President, we will defend the absolute right of every American citizen 
    to live in security, dignity, and peace. 
    If the Democrat Party wants to stand with anarchists, agitators, rioters, 
    looters, and flag burners, that is up to them. 
    But I, as your President, will not be a part of it. 
    The Republican Party will remain the voice of the patriotic 
    heroes who keep America safe and salute the American flag.
""")

In [10]:
print(findSVOs(doc))

[('they', 'call', 'them'), ('they', 'call', 'protestors'), ('we', '!forget', 'legacy'), ('we', 'defend', 'right'), ('party', 'stand', 'anarchists'), ('party', 'remain', 'voice'), ('who', 'keep', 'america'), ('who', 'keep', 'safe'), ('heroes', 'salute', 'flag')]
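The `!` in `'!forget'` appears to mark the negation in "we will never forget". Before analyzing the triples, it may be convenient to turn that prefix into an explicit flag. The sketch below assumes that a leading `!` on the verb is the only negation marker. It works on the list printed above, copied in by hand, and the function name `split_negation` is hypothetical.

```python
# SVO triples copied from the output above
svos = [('they', 'call', 'them'), ('they', 'call', 'protestors'),
        ('we', '!forget', 'legacy'), ('we', 'defend', 'right'),
        ('party', 'stand', 'anarchists'), ('party', 'remain', 'voice'),
        ('who', 'keep', 'america'), ('who', 'keep', 'safe'),
        ('heroes', 'salute', 'flag')]


def split_negation(triples):
    """Turn ('we', '!forget', 'legacy') into ('we', 'forget', 'legacy', True).

    Assumes a leading '!' on the verb marks negation (hypothetical helper).
    """
    result = []
    for subj, verb, obj in triples:
        negated = verb.startswith("!")
        result.append((subj, verb.lstrip("!"), obj, negated))
    return result


for row in split_negation(svos):
    if row[3]:
        print(row)  # prints ('we', 'forget', 'legacy', True)
```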


Notice what `findSVOs` returns: a plain Python list of `(subject, verb, object)` tuples of strings. That makes the results easy to store and analyze.
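As one way to store them, the notebook already imports `sqlite3`, so the triples can go straight into a database table. The sketch below uses an in-memory database so it leaves nothing behind. Pass a file path to `sqlite3.connect` to keep the data. The table name `svo` and its columns are hypothetical choices, and the triples are copied by hand from the output above.

```python
import sqlite3

# SVO triples copied from the multi-sentence output above
svos = [('they', 'call', 'them'), ('they', 'call', 'protestors'),
        ('we', '!forget', 'legacy'), ('we', 'defend', 'right'),
        ('party', 'stand', 'anarchists'), ('party', 'remain', 'voice'),
        ('who', 'keep', 'america'), ('who', 'keep', 'safe'),
        ('heroes', 'salute', 'flag')]

conn = sqlite3.connect(":memory:")  # in-memory DB; use a file path to persist
conn.execute("CREATE TABLE svo (subject TEXT, verb TEXT, object TEXT)")
conn.executemany("INSERT INTO svo VALUES (?, ?, ?)", svos)
conn.commit()

# Which subjects appear most often? Ties are broken alphabetically.
subject_counts = conn.execute(
    "SELECT subject, COUNT(*) AS n FROM svo GROUP BY subject ORDER BY n DESC, subject"
).fetchall()
print(subject_counts)
# [('party', 2), ('they', 2), ('we', 2), ('who', 2), ('heroes', 1)]
```

With many speeches stored this way, the same kind of query could compare which subjects and verbs dominate across documents.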