In [1]:
import spacy
spacy.__version__

'2.3.2'

## POS Tagging

In [3]:
# example 1
nlp = spacy.load("en")
doc = nlp(u"I am learning how to build chatbots")
for token in doc:
    print(token.text, token.pos_)

I PRON
am AUX
learning VERB
how ADV
to PART
build VERB
chatbots NOUN


In [5]:
# example 2
doc = nlp(u'I am going to London next week for a meeting.')
for token in doc:
    print(token.text, token.pos_)

I PRON
am AUX
going VERB
to ADP
London PROPN
next ADJ
week NOUN
for ADP
a DET
meeting NOUN
. PUNCT


In [9]:
# example 3

doc = nlp(u"Google release 'Movie Mirror' AI experiment taht matches your pose from 80,000 images")
          
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop)

Google Google PROPN NNP compound Xxxxx True False
release release NOUN NN poss xxxx True False
' ' PUNCT `` punct ' False False
Movie Movie PROPN NNP compound Xxxxx True False
Mirror Mirror PROPN NNP poss Xxxxx True False
' ' PUNCT '' case ' False False
AI AI PROPN NNP compound XX True False
experiment experiment NOUN NN compound xxxx True False
taht taht NOUN NN nsubj xxxx True False
matches match VERB VBZ ROOT xxxx True False
your -PRON- DET PRP$ poss xxxx True True
pose pose NOUN NN dobj xxxx True False
from from ADP IN prep xxxx True True
80,000 80,000 NUM CD nummod dd,ddd False False
images image NOUN NNS pobj xxxx True False
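
The `shape_` column in the output above abstracts each token's spelling: uppercase letters become `X`, lowercase letters become `x`, digits become `d`, and long runs of the same symbol are cut short. The function below is a minimal pure-Python approximation of that behaviour, written to match the shapes printed above. The name `word_shape_sketch` and the cap of four repeated symbols are assumptions for illustration, not spaCy's actual implementation.

```python
def word_shape_sketch(text, max_run=4):
    """Approximate a token's orthographic shape (hypothetical helper)."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            symbol = "X" if char.isupper() else "x"
        elif char.isdigit():
            symbol = "d"
        else:
            # Punctuation and other characters are kept unchanged.
            symbol = char
        # Track how many times the same symbol has repeated in a row.
        if symbol == last:
            run += 1
        else:
            run = 1
            last = symbol
        # Keep at most max_run copies of a repeated symbol.
        if run <= max_run:
            shape.append(symbol)
    return "".join(shape)


for word in ["Google", "release", "AI", "80,000"]:
    print(word, word_shape_sketch(word))
```

Shapes like these let a chatbot treat `80,000` and `12,500` as the same kind of token without looking at the exact digits.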


## Named-Entity Recognition

Named-entity recognition (NER) is the process of finding named entities in a text and classifying them into predefined categories such as people, organizations, and locations.

spaCy ships with a fast entity recognition model that identifies entity phrases in a given document.

In [11]:
# Example 1

my_string = u"Google has its headquarters in Muntain View, Californina having revenue amounted to 109.65 billion US dollars"
doc = nlp(my_string)

for ent in doc.ents:
    print(ent.text, ent.label_)

Google ORG
Muntain View GPE
Californina GPE
109.65 billion US dollars MONEY


In [12]:
# Example 2

my_string = u"Mark Zuckerberg was born May 14, 1984 in New York is an American technology entrepreneur and philanthropist best known for co-founding and leading Facebook as its chairman and CEO."

doc = nlp(my_string)

for ent in doc.ents:
    print(ent.text, ent.label_)

Mark Zuckerberg PERSON
May 14, 1984 DATE
New York GPE
American NORP
Facebook ORG


## Dependency Parsing

This feature gives you a parse tree that describes the parent-child relationships between words or phrases, independent of the order in which the words occur.

In [13]:
## Example 1

doc = nlp(u'Book me a flight from Bangalore to Goa')
blr, goa = doc[5], doc[7]
list(blr.ancestors)

[from, flight, Book]

The output above tells us that the user wants to book a flight from Bangalore.

In [15]:
list(goa.ancestors)

[to, flight, Book]
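
The two `ancestors` lists above come from following each token's head pointer up to the root of the parse tree. The sketch below reproduces that walk in plain Python without spaCy. The `heads` list is an assumption: the entries for "from", "Bangalore", "to", "Goa" and "flight" follow from the outputs above, while the heads for "me" and "a" are guesses that do not affect the result. The root is assumed to point to itself.

```python
words = ["Book", "me", "a", "flight", "from", "Bangalore", "to", "Goa"]

# heads[i] is the index of the parent of words[i]; the root points to itself.
# Assumed parse, reconstructed from the ancestor lists shown above.
heads = [0, 0, 3, 0, 3, 4, 3, 6]


def ancestors(index):
    """Return the words on the path from a token's parent up to the root."""
    chain = []
    while heads[index] != index:
        index = heads[index]
        chain.append(words[index])
    return chain


print(ancestors(5))  # path for "Bangalore"
print(ancestors(7))  # path for "Goa"
```

The first ancestor is what separates the two cities: "from" marks Bangalore as the origin and "to" marks Goa as the destination, even though both attach to the same "flight".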

## What is the use of dependency parsing in chatbots?

Dependency parsing is one of the most important parts of building chatbots.

* It helps find relationships between the words of grammatically correct sentences
* It can be used for sentence boundary detection
* It is quite useful for finding out whether the user is talking about more than one context simultaneously

## Noun Chunks

Noun chunks are "base noun phrases": flat phrases that have a noun as their head, such as "robot dogs".

In [17]:
## example 1:
doc = nlp(u"Boston Dynamics is gearing up to produce thousands of robot dogs")
list(doc.noun_chunks)

[Boston Dynamics, thousands, robot dogs]

Although the noun chunks of a sentence are helpful on their own, spaCy provides other attributes that can be useful too. Let's explore some of them.

In [18]:
## example 2

doc = nlp(u"Deep learning cracks the code of the messenger RNAs and protein-coding potential")
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text)

Deep learning learning nsubj cracks
the code code dobj cracks
the messenger RNAs RNAs pobj of
protein-coding potential potential conj RNAs


## Finding Similarity

GloVe is an unsupervised learning algorithm for obtaining vector representations of words. The GloVe algorithm trains its model on aggregated global word-word co-occurrence statistics from a corpus.
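
To make "word-word co-occurrence statistics" concrete, here is a minimal sketch that counts how often two words appear within a fixed window of each other. The function name `cooccurrence_counts`, the whitespace tokenizer, and the toy corpus are assumptions for illustration. Actual GloVe training also weights pairs by distance and then fits vectors to these counts, which is not shown here.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count (word, context_word) pairs that occur within `window` tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenizer
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


# Toy corpus, made up for this example.
corpus = ["I like deep learning", "I like chatbots"]
counts = cooccurrence_counts(corpus, window=1)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Words that share many of the same neighbours end up with similar rows in this table, which is the signal that vector training exploits.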

In [20]:
doc = nlp(u"How are you doing today?")
for token in doc:
    print(token.text, token.vector[:5])

How [ 0.6124835   1.4812975  -0.72636694  0.06934005  4.5061274 ]
are [-3.683414   1.6739068  0.8058587 -1.5030262 -1.456842 ]
you [-1.982389   0.1802355 -1.5821393 -0.9292287  3.5329905]
doing [ 1.4059318  -3.2353983  -0.499872   -1.289648   -0.42563045]
today [ 1.0131264 -3.218736  -2.291664  -1.326089  -0.6509166]
? [ 1.5639621  -1.0553905  -1.8733073  -1.7224176  -0.41755062]
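
Once every token has a vector like the ones printed above, similarity between two words is usually measured as the cosine of the angle between their vectors, which is also the default estimate behind spaCy's `similarity` methods. The sketch below computes it in plain Python. The three-dimensional vectors and their names are made up for illustration and are not real spaCy or GloVe vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so report no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not taken from any trained model.
king = [0.9, 0.1, 0.4]
queen = [0.8, 0.2, 0.5]
banana = [-0.3, 0.9, -0.2]

print(cosine_similarity(king, queen))
print(cosine_similarity(king, banana))
```

Values close to 1 mean the vectors point in nearly the same direction, values near 0 mean they are unrelated, and negative values mean they point in opposing directions.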
