# spaCy Objects
In the cell below we import the spacy module, load a model, and name it nlp.
Next we create a Doc object by applying the model to our text, and name it doc.
spaCy also builds a companion Vocab object.
The Doc object, which holds the processed text, is our focus here.

In [None]:
# Import spaCy and load the language library
import spacy
nlp = spacy.load('en_core_web_sm')

# Create a Doc object
doc = nlp(u'Tesla is looking at buying U.S. startup for $6 million')

In [None]:
# Print each token separately
for token in doc:
    print(token.text)
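
The link between a Doc and its companion Vocab can be sketched as follows. This is a minimal illustration, assuming a tokenizer-only pipeline built with spacy.blank('en') so that no model download is needed; the names blank_nlp, blank_doc and first_id are made up for this example.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) stands in for the loaded model
blank_nlp = spacy.blank('en')
blank_doc = blank_nlp('Tesla is looking at buying U.S. startup for $6 million')

# The Doc shares the pipeline's Vocab rather than keeping its own copy
print(type(blank_doc))
print(blank_doc.vocab is blank_nlp.vocab)

# Each token stores an integer ID; the Vocab's string store maps it back to text
first_id = blank_doc[0].orth
print(first_id, blank_nlp.vocab.strings[first_id])
```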

# spaCy Pipeline

In [None]:
# Check which components currently live in the pipeline
print(nlp.pipeline)
print(nlp.pipe_names)

# Tokenization
The first step in processing text is to split up all the component parts (words & punctuation) into "tokens". These tokens are annotated inside the Doc object to contain descriptive information.

** Detailed notes in other books

In [None]:
doc2 = nlp(u"Tesla isn't   looking into startups anymore.")

for token in doc2:
    print(token.text)

Notice how "isn't" has been split into two tokens: spaCy recognizes both the root verb "is" and the negation "n't" attached to it. Notice also that both the extended whitespace and the period at the end of the sentence are assigned their own tokens.
It's important to note that even though doc2 contains processed information about each token, it also retains the original text:


In [None]:
doc2

In [None]:
doc2[0]

In [None]:
type(doc2)
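
One way to check that a Doc keeps the original text is to rebuild the string from its tokens. The sketch below assumes a blank English pipeline (tokenization does not need a trained model); tok_nlp, tok_doc and rebuilt are illustrative names.

```python
import spacy

# Assumption: a tokenizer-only pipeline is enough to show tokenization
tok_nlp = spacy.blank('en')
text = "Tesla isn't   looking into startups anymore."
tok_doc = tok_nlp(text)

# repr() makes the whitespace-only token and each trailing space visible
for token in tok_doc:
    print(repr(token.text), repr(token.whitespace_))

# Joining each token with its trailing whitespace rebuilds the original string
rebuilt = ''.join(token.text_with_ws for token in tok_doc)
print(rebuilt == text)
```

Because every token records its trailing whitespace, tokenization in spaCy is non-destructive.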

# Part-of-Speech Tagging (POS)
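After tokenization, the next step is to assign a part of speech to each token.
spaCy predicts these tags with a statistical model that takes context into account; for example, words that follow "the" are typically nouns.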
For a full list of POS Tags visit https://spacy.io/api/annotation#pos-tagging

** Detailed notes in other books

In [None]:
doc2[0].pos_

![image.png](attachment:image.png)

# Dependencies
We can also look at the syntactic dependencies assigned to each token. Here, Tesla is identified as an nsubj, the nominal subject of the sentence.
For a full list of Syntactic Dependencies visit https://spacy.io/api/annotation#dependency-parsing 
A good explanation of typed dependencies can be found here

In [None]:
doc2[0].dep_

In [None]:
# To see the full name of a tag, use spacy.explain(tag)
print(spacy.explain('PROPN'))
print(spacy.explain('nsubj'))

# Additional Token Attributes
Some of the other information that spaCy assigns to tokens:

![image.png](attachment:image.png)

** Detailed notes in other books

In [None]:
# Lemmas (the base form of the word):
print(doc2[4].text)
print(doc2[4].lemma_)

In [None]:
# Simple Parts-of-Speech & Detailed Tags:
print(doc2[4].pos_)
print(doc2[4].tag_ + ' / ' + spacy.explain(doc2[4].tag_))

In [None]:
# Word Shapes:
print(doc2[0].text + ': ' + doc2[0].shape_)
# Note: this line uses doc (the first Doc), where token 5 is 'U.S.'
print(doc[5].text + ': ' + doc[5].shape_)

In [None]:
# Boolean Values:
print(doc2[0].is_alpha)
print(doc2[0].is_stop)
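
Several of these attributes are purely lexical, so they can be inspected without a trained model. The sketch below assumes a blank English pipeline and prints a few of them side by side; attr_nlp, attr_doc and shapes are illustrative names.

```python
import spacy

# Assumption: lexical attributes (shape, is_alpha, is_stop, like_num) need no trained model
attr_nlp = spacy.blank('en')
attr_doc = attr_nlp('Tesla is looking at buying U.S. startup for $6 million')

# Columns: text, word shape, is_alpha, is_stop, like_num
for t in attr_doc:
    print(f'{t.text:10} {t.shape_:8} {t.is_alpha!s:6} {t.is_stop!s:6} {t.like_num!s:6}')

# Collect the shapes by token text for quick lookup
shapes = {t.text: t.shape_ for t in attr_doc}
print(shapes['Tesla'], shapes['U.S.'])
```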

# Spans
Large Doc objects can be hard to work with at times. A span is a slice of a Doc object in the form Doc[start:stop].

** Detailed notes in other books

In [None]:
doc3 = nlp(u'Although commonly attributed to John Lennon from his song "Beautiful Boy", \
the phrase "Life is what happens to us while we are making other plans" was written by \
cartoonist Allen Saunders and published in Reader\'s Digest in 1957, when Lennon was 17.')

In [None]:
life_quote = doc3[16:30]
print(life_quote)

In [None]:
type(life_quote)
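
A Span is a view into its parent Doc, not a copy. The sketch below uses a shorter, made-up sentence and a blank English pipeline (both assumptions for illustration) to show how a span's offsets relate to the Doc; span_nlp, span_doc and quote are illustrative names.

```python
import spacy
from spacy.tokens import Span

# Assumption: a tokenizer-only pipeline is enough to demonstrate slicing
span_nlp = spacy.blank('en')
span_doc = span_nlp('The phrase "Life is what happens" is well known.')

# Slice by token index; the stop index is exclusive, as with Python lists
quote = span_doc[2:8]
print(quote.text)
print(isinstance(quote, Span))

# start and end are token offsets into the parent Doc
print(quote.start, quote.end, len(quote))

# Tokens inside the span keep their position in the parent Doc
print(quote[1].text, quote[1].i)
print(quote.doc is span_doc)
```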

# Sentences
Certain tokens inside a Doc object may also receive a "start of sentence" tag. While this doesn't immediately build a list of sentences, these tags enable the generation of sentence segments through Doc.sents. Later we'll write our own segmentation rules.

** Detailed notes in other books

In [None]:
doc4 = nlp(u'This is the first sentence. This is another sentence. This is the last sentence.')

In [None]:
for sent in doc4.sents:
    print(sent)

In [None]:
doc4[6].is_sent_start
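
Doc.sents is a generator, so it has to be turned into a list before it can be indexed. The sketch below shows this and collects the "start of sentence" flags. Note the swap: instead of the loaded model it uses a blank pipeline with spaCy's rule-based sentencizer component, and it assumes spaCy v3, where built-in components are added by name. The names sent_nlp, sent_doc, sentences and starts are illustrative.

```python
import spacy

# Assumption: spaCy v3, where built-in components are added by their string name
sent_nlp = spacy.blank('en')
sent_nlp.add_pipe('sentencizer')  # rule-based splitter, used here in place of the parser
sent_doc = sent_nlp('This is the first sentence. This is another sentence. This is the last sentence.')

# Doc.sents is a generator, so materialize it before indexing
sentences = list(sent_doc.sents)
print(len(sentences))
print(sentences[1].text)

# Indices of the tokens flagged as sentence starts
starts = [t.i for t in sent_doc if t.is_sent_start]
print(starts)
```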