# spaCy

The spaCy Python library is designed for 'industrial-strength' NLP. Installation instructions are available [here](https://spacy.io/usage). You should be able to install it with pip or pip3:

```
pip3 install -U spacy
```

spaCy can also be installed with conda or compiled from source. If you have a GPU, read the instructions for linking spaCy with your CUDA library.
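
Once spaCy is installed, one quick way to check whether it can use a GPU is `spacy.prefer_gpu()`, which returns `True` if a GPU was activated and `False` if spaCy will run on the CPU. This is only a minimal sketch; it assumes the call is made before any model is loaded, and the variable name `using_gpu` is illustrative.

```python
import spacy

# prefer_gpu() tries to activate a GPU and returns a bool;
# if no GPU is available, spaCy simply keeps using the CPU
using_gpu = spacy.prefer_gpu()
print("GPU enabled:", using_gpu)
```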

After spaCy is installed, you should download at least one pretrained model. The English model comes in three sizes (small, medium, and large) and can be downloaded as follows:

```
python3 -m spacy download en_core_web_sm
```

Simply change the 'sm' at the end to 'md' or 'lg' for the medium or large model.
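
If you are not sure whether a model has been downloaded yet, a check along the following lines may help. It is a sketch, not part of the original notebook: `load_pipeline` is a hypothetical helper name, and falling back to a blank English pipeline (tokenizer only, no tagger, parser, or NER) is an assumption made so that the code always returns something usable.

```python
import spacy
from spacy.util import is_package


def load_pipeline(name="en_core_web_sm"):
    """Load a pretrained pipeline if it is installed, else a blank English one."""
    if is_package(name):
        return spacy.load(name)
    # Fallback (assumption): tokenizer only, so lemma_, pos_, ents, etc. will be empty
    print(f"{name} is not installed; run: python3 -m spacy download {name}")
    return spacy.blank("en")


pipeline = load_pipeline()
print(pipeline.pipe_names)  # components available in the loaded pipeline
```

Calling `load_pipeline("en_core_web_md")` or `load_pipeline("en_core_web_lg")` would try the medium or large model in the same way.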

In [1]:
import spacy

# load a model
nlp = spacy.load('en_core_web_sm')

In [2]:
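# sample text
# note: the backslash continues the string literal, so the leading spaces on the
# next line become part of the text (hence the SPACE token in the output below)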

text = "Since turning cautious Friday morning, the DJIA\
   has dropped approximately 1,700 from peak to trough."

In [3]:
# create a spaCy Doc object by running the pipeline on the text
doc = nlp(text)
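
The `Doc` returned by `nlp(...)` behaves like a sequence of tokens and keeps the original text intact. The sketch below illustrates this with a blank English pipeline (tokenizer only), which is an assumption made so that it runs without a downloaded model; `nlp_blank`, `sample`, and `sample_doc` are illustrative names, and the sample sentence is the one above with the extra spaces removed.

```python
import spacy

# A blank pipeline contains only the tokenizer, so no model download is needed
nlp_blank = spacy.blank("en")
sample = ("Since turning cautious Friday morning, the DJIA "
          "has dropped approximately 1,700 from peak to trough.")
sample_doc = nlp_blank(sample)

print(len(sample_doc))        # number of tokens
print(sample_doc[0].text)     # index a single Token
print(sample_doc[3:5].text)   # slice by token index to get a Span

# Tokenization is non-destructive: the original string can be recovered
print(sample_doc.text == sample)
```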

In [4]:
for token in doc:
    print(token, token.lemma_, token.pos_, token.is_alpha, token.is_stop)

# other useful attributes: token.dep_ (dependency label), token.shape_ (orthographic shape)

Since since SCONJ True True
turning turn VERB True False
cautious cautious ADJ True False
Friday Friday PROPN True False
morning morning NOUN True False
, , PUNCT False False
the the DET True True
DJIA DJIA PROPN True False
      SPACE False False
has have AUX True True
dropped drop VERB True False
approximately approximately ADV True False
1,700 1,700 NUM False False
from from ADP True True
peak peak NOUN True False
to to ADP True True
trough trough NOUN True False
. . PUNCT False False
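
A common use of these flags is to filter a text down to its content words. The following is a minimal sketch under the assumption that a blank English pipeline is enough: `is_alpha`, `is_stop`, and `shape_` are lexical attributes and do not need a trained model, whereas `lemma_` and `pos_` do. The names `nlp_blank`, `sample_doc`, and `content_words` are illustrative.

```python
import spacy

nlp_blank = spacy.blank("en")
sample_doc = nlp_blank(
    "Since turning cautious Friday morning, the DJIA "
    "has dropped approximately 1,700 from peak to trough."
)

# keep alphabetic tokens that are not stop words
content_words = [t.text for t in sample_doc if t.is_alpha and not t.is_stop]
print(content_words)

# shape_ abstracts the spelling of a token (letters, digits, punctuation)
print([(t.text, t.shape_) for t in sample_doc if not t.is_alpha])
```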


In [5]:
# extract noun phrases
[chunk.text for chunk in doc.noun_chunks]

['the DJIA', 'peak', 'trough']

In [6]:
# get the lemmas of all verbs
[token.lemma_ for token in doc if token.pos_ == "VERB"]

['turn', 'drop']

In [7]:
# named entity recognition (NER)
for entity in doc.ents:
    print(entity.text, entity.label_)

Friday DATE
morning TIME
approximately 1,700 CARDINAL


In [8]:
# visualize the dependency parse

from spacy import displacy

In [9]:
doc = nlp("The quick brown fox jumped over the lazy river.")

In [10]:
for token in doc:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
          list(token.children))

The det fox NOUN []
quick amod fox NOUN []
brown amod fox NOUN []
fox nsubj jumped VERB [The, quick, brown]
jumped ROOT jumped VERB [fox, over, .]
over prep jumped VERB [river]
the det river PROPN []
lazy amod river PROPN []
river pobj over ADP [the, lazy]
. punct jumped VERB []
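
The `head` and `children` attributes make it possible to walk the parse tree. The sketch below rebuilds the parse printed above by hand, so that it runs without a downloaded model; passing `heads` as absolute token indices assumes spaCy v3. In the notebook you would use the parsed `doc` directly. The names `parsed`, `root`, `subject`, and `path` are illustrative.

```python
import spacy
from spacy.tokens import Doc

# Hand-built copy of the parse shown in the output above (assumes spaCy v3)
words = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "river", "."]
heads = [3, 3, 3, 4, 4, 4, 8, 8, 5, 4]  # absolute index of each token's head
deps = ["det", "amod", "amod", "nsubj", "ROOT", "prep", "det", "amod", "pobj", "punct"]
parsed = Doc(spacy.blank("en").vocab, words=words, heads=heads, deps=deps)

# the root of the sentence carries the ROOT dependency label
root = next(t for t in parsed if t.dep_ == "ROOT")

# the subject is the root's child labelled nsubj; subtree gives its full phrase
subject = next(t for t in root.children if t.dep_ == "nsubj")
subject_phrase = " ".join(t.text for t in subject.subtree)
print(root.text, "<-", subject_phrase)

# walk from a token up to the root by following head links
token = parsed[8]  # "river"
path = [token.text]
while token.head.i != token.i:  # the root is its own head
    token = token.head
    path.append(token.text)
print(" -> ".join(path))
```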


In [11]:
displacy.render(doc, style='dep')
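
Inside a notebook, `displacy.render` draws the parse inline. Outside a notebook, passing `jupyter=False` makes it return the SVG markup as a string, which can then be written to a file. This is a sketch under a few assumptions: the parse is rebuilt by hand (spaCy v3 `Doc` constructor) so that no model is required, and the output location `out_path` in the system temp directory is purely illustrative. In the notebook you would pass `doc` instead of `parsed`.

```python
import tempfile
from pathlib import Path

import spacy
from spacy import displacy
from spacy.tokens import Doc

# Hand-built copy of the fox sentence's parse (assumes spaCy v3)
words = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "river", "."]
heads = [3, 3, 3, 4, 4, 4, 8, 8, 5, 4]
deps = ["det", "amod", "amod", "nsubj", "ROOT", "prep", "det", "amod", "pobj", "punct"]
parsed = Doc(spacy.blank("en").vocab, words=words, heads=heads, deps=deps)

# jupyter=False forces render() to return the markup instead of displaying it
svg = displacy.render(parsed, style="dep", jupyter=False)

# hypothetical output path; any writable location works
out_path = Path(tempfile.gettempdir()) / "parse.svg"
out_path.write_text(svg, encoding="utf-8")
print("wrote", out_path)
```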