### Tokenization 
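Tokens are the basic building blocks of a spaCy Doc object. The tokenizer first splits text on whitespace, then applies the following rule types to each chunk.

Types
* Prefix - characters split off the start of a chunk, examples: $ ( "
* Suffix - characters split off the end of a chunk, examples: km ) , . ! "
* Infix - characters split out of the middle of a chunk, examples: - -- / ...
* Exception - special-case rule that decides whether and how a string is split into tokens, examples: let's, U.S.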
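
The four rule types above can be illustrated with a toy tokenizer. This is a minimal sketch in plain Python, not spaCy's actual implementation: the rule tables (`EXCEPTIONS`, `PREFIXES`, `SUFFIXES`, `INFIX_RE`) and the function `toy_tokenize` are hypothetical names that cover only the examples listed above.

```python
import re

# Toy rule tables (illustrative assumptions, far smaller than spaCy's real ones).
EXCEPTIONS = {"Let's": ["Let", "'s"], "U.S.": ["U.S."], "L.A.": ["L.A."]}
PREFIXES = ('$', '(', '"')
SUFFIXES = ('km', ')', ',', '.', '!', '"')
INFIX_RE = re.compile(r'(--|\.\.\.|-|/)')


def toy_tokenize(text):
    """Split on whitespace, then peel prefixes/suffixes and split infixes."""
    tokens = []
    for chunk in text.split():
        prefixes, suffixes = [], []
        # Keep peeling until the chunk is a known exception or nothing matches.
        while chunk and chunk not in EXCEPTIONS:
            if chunk.startswith(PREFIXES):
                prefixes.append(chunk[0])
                chunk = chunk[1:]
                continue
            matched = next((s for s in SUFFIXES if chunk.endswith(s)), None)
            if matched:
                suffixes.insert(0, matched)
                chunk = chunk[:-len(matched)]
                continue
            break
        if chunk in EXCEPTIONS:
            middle = EXCEPTIONS[chunk]
        else:
            # The capturing group keeps the infix itself as a token.
            middle = [part for part in INFIX_RE.split(chunk) if part]
        tokens.extend(prefixes + middle + suffixes)
    return tokens


print(toy_tokenize('"Let\'s visit L.A.!"'))
print(toy_tokenize('A 5km snail-mail ($10)'))
```

The exception check runs before any suffix is peeled, which is why L.A. keeps its final period while the ! and closing quote after it are split off.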

In [1]:
import spacy

# Load spaCy's small English model, trained on web text
nlp = spacy.load('en_core_web_sm')

In [2]:
mystring = "\"We're moving to L.A.!\""
doc = nlp(mystring)

In [4]:
for token in doc:
    print(token)

"
We
're
moving
to
L.A.
!
"


Note that
* snail-mail is split into 3 tokens (snail, -, mail)
* email addresses and websites are each kept as 1 token
* 5km is split into 2 tokens (5 and km)
* \$10.30 is split into 2 tokens (\$ and 10.30)

In [8]:
doc2 = nlp('We\'re here to help! Send snail-mail, email support@oursite.com or visit us at http://www.oursite.com!')

for t in doc2:
    print(t)

We
're
here
to
help
!
Send
snail
-
mail
,
email
support@oursite.com
or
visit
us
at
http://www.oursite.com
!


In [9]:
doc3 = nlp('A 5km NYC cab ride costs $10.30')
for t in doc3:
    print(t)

A
5
km
NYC
cab
ride
costs
$
10.30


In [11]:
doc4 = nlp(u'Let\'s visit St. Louis in the U.S. next year')
for t in doc4:
    print(t)

Let
's
visit
St.
Louis
in
the
U.S.
next
year


In [13]:
# Number of entries currently in the shared vocabulary; the count grows as more text is processed
len(doc4.vocab)

794

In [14]:
doc5 = nlp(u'It is better to give than receive.')
# Slicing a Doc returns a Span (here tokens 2, 3 and 4)
doc5[2:5]

better to give

In [16]:
doc8 = nlp(u'Apple to build a Hong Kong factory for $6 million')
for token in doc8:
    print(token.text, end=' | ')

Apple | to | build | a | Hong | Kong | factory | for | $ | 6 | million | 

### Entity
* spaCy can recognize a number of common named entities, from organizations to monetary values to multi-word locations

In [26]:
for entity in doc8.ents:
    print(entity, entity.label_, '(Entity: {})\n'.format(spacy.explain(entity.label_)))

Apple ORG (Entity: Companies, agencies, institutions, etc.)

Hong Kong GPE (Entity: Countries, cities, states)

$6 million MONEY (Entity: Monetary values, including unit)



In [27]:
doc9 = nlp(u'Autonomous cars shift insurance liability toward manufacturers.')
for chunk in doc9.noun_chunks:
    print(chunk)

Autonomous cars
insurance liability
manufacturers


### 3.18.p2

In [28]:
from spacy import displacy

In [29]:
doc = nlp(u'Apple is going to build a U.K. factory for $6 million.')

In [38]:
displacy.render(doc, style='dep', jupyter=True, options={'distance': 100})

In [39]:
doc = nlp(u'Over the last quarter Apple sold nearly 20 thousand iPhones for a profit of $6 million')

In [41]:
displacy.render(doc, style='ent', jupyter=True)

In [None]:
doc = nlp(u'This is a sentence')
displacy.serve(doc, style='dep')

Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...
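
displacy.serve blocks the cell until the server is stopped. Outside Jupyter it can be more convenient to get the markup back as a string. The sketch below assumes displaCy's manual mode, where the entity spans are supplied by hand as character offsets instead of coming from a loaded model. The sentence and the `span` helper are made up for illustration.

```python
from spacy import displacy

text = "Apple sold iPhones for $6 million"


def span(substring, label):
    """Build a manual entity dict from a substring of text (hypothetical helper)."""
    start = text.index(substring)
    return {"start": start, "end": start + len(substring), "label": label}


# Hand-written entities; no trained pipeline is involved here.
example = {
    "text": text,
    "ents": [span("Apple", "ORG"), span("$6 million", "MONEY")],
    "title": None,
}

# jupyter=False returns the HTML as a string instead of displaying it.
html = displacy.render(example, style="ent", manual=True, page=True, jupyter=False)
print(len(html), "characters of HTML")
```

The returned `html` is an ordinary string, so it can be written to a file and opened in a browser without running a server.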

