### spaCy

The spaCy Python library is designed for 'industrial-strength' NLP. Installation instructions are on the [spaCy website](https://spacy.io/). In most environments you can install it with pip or pip3:

```
pip3 install -U spacy
```

If you have a GPU, follow the spaCy installation instructions for linking spaCy with your CUDA library.
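Once CUDA is set up, you can ask spaCy to use the GPU from Python. Here is a minimal sketch: `spacy.prefer_gpu()` activates the GPU when one is usable and returns `False` otherwise. It should be called before `spacy.load()` so the model's weights are allocated on the chosen device.

```python
import spacy

# Try to switch to the GPU; fall back silently to the CPU if none is usable.
# Call this before spacy.load() so the model is created on the right device.
using_gpu = spacy.prefer_gpu()
print("Using GPU:", using_gpu)
```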

After spaCy is installed, download at least one pretrained model. The general-purpose English models come in three sizes (small, medium and large). The small one can be downloaded as follows:

```
python3 -m spacy download en_core_web_sm
```
Change the `sm` at the end to `md` or `lg` for the medium or large model.
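If a notebook may run on machines with different models installed, one option is to try the models in order of preference. The sketch below assumes the small/medium/large English package names given above. `load_first_available` is a hypothetical helper. It relies on `spacy.load` raising `OSError` when a package is not installed, and falls back to `spacy.blank("en")`, which only tokenizes.

```python
import spacy

# Hypothetical preference order: largest model first.
MODEL_NAMES = ["en_core_web_lg", "en_core_web_md", "en_core_web_sm"]


def load_first_available(names):
    """Return the first installed pipeline from `names`, else a blank English one."""
    for name in names:
        try:
            return spacy.load(name)
        except OSError:
            # spacy.load raises OSError when the package is not installed
            continue
    # No pretrained model found: tokenizer-only pipeline (no tags, parse or NER)
    return spacy.blank("en")


nlp = load_first_available(MODEL_NAMES)
print(nlp.pipe_names)
```

A blank pipeline is enough for tokenization, but the tagging, parsing and NER cells below need a pretrained model.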

In [1]:
import spacy

# load the medium model (download it first: python3 -m spacy download en_core_web_md)
nlp = spacy.load('en_core_web_md')

In [2]:
text = "Barack Obama was born in Hawaii.  He was elected president in 2008."

In [3]:
# create a spacy object
doc = nlp(text)

In [4]:
for token in doc:
    print(token, token.lemma_, token.pos_, token.is_alpha, token.is_stop)
    
# other attributes: token.dep_, token.shape_

Barack Barack PROPN True False
Obama Obama PROPN True False
was be AUX True True
born bear VERB True False
in in ADP True True
Hawaii Hawaii PROPN True False
. . PUNCT False False
    SPACE False False
He -PRON- PRON True True
was be AUX True True
elected elect VERB True False
president president NOUN True False
in in ADP True True
2008 2008 NUM False False
. . PUNCT False False
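The comment in the cell above mentions `token.shape_`. Lexical attributes such as `shape_`, `is_alpha` and `is_stop` come from the tokenizer and vocabulary, so they can be inspected without a trained model. This small sketch assumes a blank English pipeline from `spacy.blank("en")`, so no download is needed. The shape maps uppercase letters to `X`, lowercase to `x` and digits to `d`, and caps runs of the same character type at four.

```python
import spacy

# Tokenizer-only pipeline; lexical attributes still work without a model.
nlp_blank = spacy.blank("en")
doc_blank = nlp_blank("Barack Obama was born in Hawaii. He was elected president in 2008.")

for token in doc_blank:
    print(token.text, token.shape_, token.is_alpha, token.is_stop)

# Map each token's text to its shape for quick lookup
shapes = {t.text: t.shape_ for t in doc_blank}
```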


In [5]:
# extract noun phrases
[chunk.text for chunk in doc.noun_chunks]

['Barack Obama', 'Hawaii', 'He']

In [6]:
# get verbs
[token.lemma_ for token in doc if token.pos_ == "VERB"]

['bear', 'elect']

In [7]:
# NER
for entity in doc.ents:
    print(entity.text, entity.label_)

Barack Obama PERSON
Hawaii GPE
2008 DATE
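Each entity is a `Span`, so besides `text` and `label_` it also carries character offsets into the original string. The sketch below groups entity texts by label. To keep it runnable without a model, it assumes a blank pipeline and sets `doc.ents` by hand from token offsets. With `en_core_web_md` loaded, the NER component fills `doc.ents` automatically.

```python
from collections import defaultdict

import spacy
from spacy.tokens import Span

nlp_blank = spacy.blank("en")
doc_ner = nlp_blank("Barack Obama was born in Hawaii. He was elected president in 2008.")

# Assumption for illustration: entities set manually with token offsets
# (start inclusive, end exclusive), mirroring the model output above.
doc_ner.ents = [
    Span(doc_ner, 0, 2, label="PERSON"),  # Barack Obama
    Span(doc_ner, 5, 6, label="GPE"),     # Hawaii
    Span(doc_ner, 12, 13, label="DATE"),  # 2008
]

# Group entity texts by label and show where each one sits in the string
by_label = defaultdict(list)
for ent in doc_ner.ents:
    by_label[ent.label_].append(ent.text)
    print(ent.text, ent.label_, ent.start_char, ent.end_char)

print(dict(by_label))
```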


In [10]:
# print each token's dependency label, its head, the head's POS tag, and its children

for token in doc:
    print(token, token.dep_, token.head.text, token.head.pos_,
         [child for child in token.children])
    

Barack compound Obama PROPN []
Obama nsubjpass born VERB [Barack]
was auxpass born VERB []
born ROOT born VERB [Obama, was, in, .]
in prep born VERB [Hawaii]
Hawaii pobj in ADP []
. punct born VERB [ ]
   . PUNCT []
He nsubjpass elected VERB []
was auxpass elected VERB []
elected ROOT elected VERB [He, was, president, in, .]
president oprd elected VERB []
in prep elected VERB [2008]
2008 pobj in ADP []
. punct elected VERB []
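The output above describes a tree: each token points up to its `head`, and `children` point back down. The sketch below walks that tree in both directions with `Token.ancestors` and `Token.subtree`. It assumes spaCy v3, whose `Doc` constructor accepts `heads` (absolute token indices, with the ROOT pointing to itself) and `deps`. That lets the parse of the first sentence be rebuilt by hand without loading a model. With a model, the same attributes work on `nlp(text)`.

```python
import spacy
from spacy.tokens import Doc

# Assumption: spaCy v3 Doc constructor with heads/deps
nlp_blank = spacy.blank("en")
words = ["Barack", "Obama", "was", "born", "in", "Hawaii", "."]
spaces = [True, True, True, True, True, False, False]
heads = [1, 3, 3, 3, 3, 4, 3]  # index of each token's head; ROOT (born) points to itself
deps = ["compound", "nsubjpass", "auxpass", "ROOT", "prep", "pobj", "punct"]
parsed = Doc(nlp_blank.vocab, words=words, spaces=spaces, heads=heads, deps=deps)

# ancestors walks up the head chain: Hawaii -> in -> born
ancestors = [a.text for a in parsed[5].ancestors]
# subtree yields a token together with all of its descendants, in document order
in_subtree = [t.text for t in parsed[4].subtree]
print(ancestors, in_subtree)
```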


In [39]:
# visualization (renders inline in a Jupyter notebook)
from spacy import displacy

displacy.render(doc, style='dep')

In [40]:
# render a separate dependency tree for each sentence
for sentence in doc.sents:
    displacy.render(sentence, style='dep')
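Outside a notebook, `displacy.render` can return the markup instead of displaying it. The sketch below passes `jupyter=False` to get the SVG as a string and writes it to a file. The file name `dependency.svg` is arbitrary. As above, it assumes spaCy v3 and a hand-built parse so no model is required.

```python
from pathlib import Path

import spacy
from spacy import displacy
from spacy.tokens import Doc

# Assumption: spaCy v3; parse of the first sentence rebuilt by hand
nlp_blank = spacy.blank("en")
parsed = Doc(
    nlp_blank.vocab,
    words=["Barack", "Obama", "was", "born", "in", "Hawaii", "."],
    spaces=[True, True, True, True, True, False, False],
    heads=[1, 3, 3, 3, 3, 4, 3],
    deps=["compound", "nsubjpass", "auxpass", "ROOT", "prep", "pobj", "punct"],
)

# jupyter=False returns the SVG markup as a string instead of displaying it
svg = displacy.render(parsed, style="dep", jupyter=False, options={"compact": True})
Path("dependency.svg").write_text(svg, encoding="utf-8")
```

In a plain script, `displacy.serve(doc, style="dep")` is the alternative: it starts a small local web server that shows the visualization in a browser.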

### Find the subject and predicate of a sentence

The next few blocks of code show how to extract the subject and predicate of a sentence. This code assumes that `doc` contains a single sentence, so `word.i` (a token's position in the whole doc) is also its position in the sentence. If the doc contains multiple sentences, the position of `word` within its sentence `sent` is:

```
word.i - sent.start
```
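To see the difference between the two indices on a two-sentence text, here is a sketch. It assumes spaCy v3, where the rule-based `sentencizer` is added by name with `nlp.add_pipe("sentencizer")`. That component splits sentences on punctuation, so no trained model is needed.

```python
import spacy

# Assumption: spaCy v3 add_pipe API; sentencizer splits on sentence punctuation
nlp_blank = spacy.blank("en")
nlp_blank.add_pipe("sentencizer")
doc_multi = nlp_blank("Barack Obama was born in Hawaii. He was elected president in 2008.")

positions = []
for sent in doc_multi.sents:
    for word in sent:
        # word.i counts from the start of the doc; subtracting sent.start
        # gives the position inside the current sentence
        positions.append((word.text, word.i, word.i - sent.start))
        print(word.text, word.i, word.i - sent.start)
```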

Alternatively, iterate over `doc.sents` and work with each sentence as a `Span` (see the spaCy [Span documentation](https://spacy.io/api/span)).
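With the `Span` approach, `Span.root` gives each sentence's syntactic root directly, so no index bookkeeping is needed. The sketch below shows this. It assumes spaCy v3 and rebuilds the two-sentence parse from the dependency output above by hand, so it runs without a model.

```python
import spacy
from spacy.tokens import Doc

# Assumption: spaCy v3 Doc constructor; parse copied from the output above
nlp_blank = spacy.blank("en")
parsed = Doc(
    nlp_blank.vocab,
    words=["Barack", "Obama", "was", "born", "in", "Hawaii", ".",
           "He", "was", "elected", "president", "in", "2008", "."],
    spaces=[True, True, True, True, True, False, True,
            True, True, True, True, True, False, False],
    heads=[1, 3, 3, 3, 3, 4, 3, 9, 9, 9, 9, 9, 11, 9],
    deps=["compound", "nsubjpass", "auxpass", "ROOT", "prep", "pobj", "punct",
          "nsubjpass", "auxpass", "ROOT", "oprd", "prep", "pobj", "punct"],
)

results = []
for sent in parsed.sents:
    pred = sent.root  # the sentence's ROOT token
    subjects = [c.text for c in pred.children if c.dep_.startswith("nsubj")]
    results.append((pred.text, subjects))
    print(sent.text, "->", pred.text, subjects)
```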

In [59]:
sentence = "Barack Obama was born in Hawaii."
doc = nlp(sentence)

In [60]:
subject = None
pred = None

In [61]:
# pair each token with its text and its index in the doc
token_i = [(token, token.text, token.i) for token in doc]
print(token_i[0])

(Barack, 'Barack', 0)


In [62]:
# index of the token labelled ROOT, or -1 if the parse has none
root = [t[2] for t in token_i if t[0].dep_ == 'ROOT']
if root:
    root = root[0]
else:
    root = -1

In [63]:
# find predicate
if root >= 0:
    pred = token_i[root]
    print('Sentence predicate:', pred[1])
else:
    print('This sentence did not have a root dependency item')

Sentence predicate: born


In [64]:
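# find the subject among the root's children
# (.children is a generator, so it cannot be tested for emptiness directly;
#  also skip the lookup when root == -1, which would index the last token)
subj_list = []
if root >= 0:
    subj_list = [child for child in token_i[root][0].children
                 if child.dep_.startswith('nsubj')]
subj_list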

[Obama]

In [65]:
if subj_list:
    # take first item as the subject
    subject = subj_list[0]
else:
    print('This sentence did not have a subject')

In [66]:
if subject is not None and pred is not None:
    print('Sentence subject and predicate:', subject.text, pred[1])

Sentence subject and predicate: Obama born
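The steps above return only the head word of the subject (`Obama`). One way to package them into a reusable function, and to recover the whole subject phrase, is to take the span from the subject's `left_edge` to its `right_edge`. Here is a sketch. `subject_and_predicate` is a hypothetical helper name, and the example assumes spaCy v3 with the parse rebuilt by hand so it runs without a model.

```python
import spacy
from spacy.tokens import Doc


def subject_and_predicate(sent):
    """Return (subject phrase or None, predicate text) for one sentence Span."""
    pred = sent.root
    subjects = [c for c in pred.children if c.dep_.startswith("nsubj")]
    if not subjects:
        return None, pred.text
    subj = subjects[0]  # take the first subject, as in the cells above
    # left_edge/right_edge bound the subject's subtree, e.g. "Barack Obama"
    phrase = sent.doc[subj.left_edge.i : subj.right_edge.i + 1].text
    return phrase, pred.text


# Assumption: spaCy v3 Doc constructor; same parse as the model output above
nlp_blank = spacy.blank("en")
parsed = Doc(
    nlp_blank.vocab,
    words=["Barack", "Obama", "was", "born", "in", "Hawaii", "."],
    spaces=[True, True, True, True, True, False, False],
    heads=[1, 3, 3, 3, 3, 4, 3],
    deps=["compound", "nsubjpass", "auxpass", "ROOT", "prep", "pobj", "punct"],
)

first_sentence = list(parsed.sents)[0]
print(subject_and_predicate(first_sentence))
```

With a loaded model, the same function can be applied to every sentence via `[subject_and_predicate(s) for s in nlp(text).sents]`.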
