# spaCy Objects

## NLP Object
* Contains the processing pipeline  
* Holds the language-specific rules for tokenization  

In [3]:
import spacy
# English language class
from spacy.lang.en import English
nlp = English()
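
As a quick check of what the `nlp` object holds, here is a minimal sketch that builds a blank English pipeline two ways. `spacy.blank("en")` does not appear in the notes above; it is included as a shortcut that should give the same kind of object as `English()`. The sentence is an arbitrary example.

```python
import spacy
from spacy.lang.en import English

# Two ways to get a blank English pipeline (tokenizer only, no trained components)
nlp_from_class = English()
nlp_from_blank = spacy.blank("en")

# A blank pipeline has no components, but it still carries the tokenization rules
print(nlp_from_class.lang)        # language code
print(nlp_from_class.pipe_names)  # empty list for a blank pipeline

doc = nlp_from_blank("Hello world!")
print([token.text for token in doc])
```

Because the tokenization rules belong to the language class, the text is split into tokens even though no model has been loaded.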

## Doc object
* Created when the `nlp` object is called on a string of text.

In [24]:
doc = nlp("It costs $5.")

In [25]:
for token in doc:
    print(token.text)

It
costs
$
5
.


## Token object
* A `Doc` is made of tokens  
* Example: a word or a punctuation character

In [26]:
token = doc[1]
token.text

'costs'

## Span Object
A slice of the document consisting of one or more tokens. The end index is exclusive, so `doc[1:4]` covers tokens 1 through 3.

In [27]:
span = doc[1:4]
print(span.text)

costs $5
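
To make the slicing behaviour concrete, the sketch below inspects the same span and rebuilds it from token offsets. It assumes a blank `English()` pipeline; the `Span` constructor is not used in the notes above and is shown only as an alternative way to get the same slice.

```python
from spacy.lang.en import English
from spacy.tokens import Span

nlp = English()
doc = nlp("It costs $5.")

span = doc[1:4]              # tokens 1, 2 and 3; index 4 is excluded
print(span.start, span.end)  # token offsets of the slice within the doc
print(len(span))             # number of tokens in the span
print([token.text for token in span])

# Building the same span explicitly from token offsets
same_span = Span(doc, 1, 4)
print(same_span.text == span.text)
```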


In [28]:
# Lexical Attributes are part of tokens
print(f"Index of token in doc: {[token.i for token in doc]}")
print(f"word of token in doc: {[token.text for token in doc]}")
print(f"Alphabetical? of token in doc: {[token.is_alpha for token in doc]}")
print(f"Punctuation? of token in doc: {[token.is_punct for token in doc]}")
print(f"Like Number? of token in doc: {[token.like_num for token in doc]}")  # True for digits ("10") and number words ("ten")

Index of token in doc: [0, 1, 2, 3, 4]
word of token in doc: ['It', 'costs', '$', '5', '.']
Alphabetical? of token in doc: [True, True, False, False, False]
Punctuation? of token in doc: [False, False, False, False, True]
Like Number? of token in doc: [False, False, False, True, False]
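
Lexical attributes need no trained model, so they can be used for simple filtering on a blank pipeline. A small sketch, using an example sentence that is not from the notes above:

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("I have ten apples and 10 pears.")

# like_num matches number words as well as digits
number_like = [token.text for token in doc if token.like_num]
print(number_like)

# is_alpha keeps tokens made only of alphabetic characters
words_only = [token.text for token in doc if token.is_alpha]
print(words_only)
```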


In [1]:
# Getting the next token (requires `doc` and `token` from the cells above;
# running this cell first raises NameError: name 'doc' is not defined)
doc[token.i + 1]

$
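
Looking up the next token by index fails with an `IndexError` on the last token of the document, so a bounds check is useful. The sketch below is self-contained and assumes a blank `English()` pipeline. `next_token` is a hypothetical helper written for this illustration, not part of spaCy; `Token.nbor` is spaCy's built-in neighbour lookup.

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("It costs $5.")

def next_token(doc, token):
    """Return the token after `token`, or None at the end of the doc (hypothetical helper)."""
    if token.i + 1 < len(doc):
        return doc[token.i + 1]
    return None

token = doc[1]
print(next_token(doc, token))    # token following "costs"
print(next_token(doc, doc[-1]))  # None: "." is the last token
print(token.nbor(1))             # built-in alternative for the same neighbour lookup
```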

# Statistical models
* Predict linguistic attributes in context  
  * Part-of-speech tags  
  * Syntactic dependencies  
  * Named Entities
* Trained on labeled example texts
* Can be updated with more examples to fine-tune predictions

## Model Packages
* `en_core_web_sm`: small English model that supports all core capabilities and is trained on web text
* Contains:  
  * Binary weights that enable the model to make predictions
  * Vocabulary
  * Meta information (language, pipeline)

In [4]:
nlp = spacy.load('en_core_web_sm')
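
`spacy.load` raises an `OSError` when the model package is not installed. The sketch below shows one possible way to handle that and to inspect the meta information mentioned above. `load_pipeline` is a hypothetical helper, and falling back to a blank pipeline is an assumption made for illustration: the fallback can tokenize but cannot predict tags, dependencies, or entities.

```python
import spacy

def load_pipeline(name="en_core_web_sm"):
    """Load a model package, falling back to a blank English pipeline (hypothetical helper)."""
    try:
        return spacy.load(name)
    except OSError:
        # Raised when the package is not installed; install it with:
        #   python -m spacy download en_core_web_sm
        print(f"Model '{name}' not found, using a blank English pipeline instead")
        return spacy.blank("en")

nlp = load_pipeline()
print(nlp.meta["lang"])  # language recorded in the package meta
print(nlp.pipe_names)    # pipeline components; empty for the blank fallback
```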

## Predict part of speech tags

Attributes that return strings usually end in an underscore (`_`).  
The same attribute without the underscore returns an integer ID.  
`token.pos_`: part of speech  
`token.dep_`: syntactic dependency label  
`token.head`: parent token that the word is attached to
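
The underscore convention can be seen without a trained model. The sketch below uses `token.orth` and `token.orth_` (the token text), which are not mentioned in the notes above and are chosen here only because they work on a blank pipeline. The same pattern applies to `token.pos` and `token.pos_` once a model is loaded.

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("She ate the pizza")
token = doc[3]

print(token.orth_)                    # string form of the token text
print(token.orth)                     # integer ID (a hash) for the same text
print(nlp.vocab.strings[token.orth])  # the vocabulary maps the ID back to the string
```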


In [5]:
doc = nlp('She ate the pizza')

In [6]:
for token in doc:
    print(token.text, token.pos_)

She PRON
ate VERB
the DET
pizza NOUN


In [7]:
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

She PRON nsubj ate
ate VERB ROOT ate
the DET det pizza
pizza NOUN dobj ate


`nsubj`: nominal subject (example: She)  
`dobj`: direct object (example: pizza)  
`det`: determiner (article) (example: the)  

## Named Entities
`doc.ents`: iterable of `Span` objects, one per named entity predicted by the model  
`Apple`(ORG) is looking at buying `U.K.`(GPE) startup for `$1 billion` (MONEY)

In [9]:
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.label_)

Apple ORG
U.K. GPE
$1 billion MONEY
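
Each entity in `doc.ents` is a `Span` that carries a label. To show this without a trained model, the sketch below sets one entity by hand on a blank pipeline. Assigning to `doc.ents` manually is an assumption made for illustration; with `en_core_web_sm` the entity recognizer fills it in, as in the output above.

```python
from spacy.lang.en import English
from spacy.tokens import Span

nlp = English()  # blank pipeline: no entity recognizer, so doc.ents starts empty
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
print(doc.ents)

# Label the first token by hand (a trained model would predict this instead)
doc.ents = [Span(doc, 0, 1, label="ORG")]

for ent in doc.ents:
    # start_char and end_char are character offsets into the original text
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
```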


Tip: `spacy.explain()` returns a short description of a tag or label.

In [11]:
spacy.explain('GPE')

'Countries, cities, states'
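
The same lookup works for the dependency and entity labels that appear in these notes. A short sketch that prints a small glossary; the list of labels is just the ones used above:

```python
import spacy

# Labels taken from the examples in these notes
labels = ["nsubj", "dobj", "det", "ORG", "GPE", "MONEY"]

for label in labels:
    print(f"{label}: {spacy.explain(label)}")
```

No model has to be loaded for this, since the descriptions come from spaCy's built-in glossary.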