# Input Parsing with `spaCy`
Let's look at some methods for parsing user input and extracting meaning from it, using the `spaCy` package.

In [1]:
import spacy
from spacy import displacy

In [2]:
nlp = spacy.load('en_core_web_sm')

One thing we need to distinguish is whether user input is a question, which our chatbot should answer, or a statement, which our chatbot should follow up with a question of its own.  
I think we can get this information from dependency parsing.

In [3]:
qdoc = nlp(u"How can I add a new line to my cell phone plan")

In [4]:
displacy.render(qdoc, style='dep', options={'compact': True})

In [5]:
for chunk in qdoc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_,
            chunk.root.head.text)

I I nsubj add
a new line line dobj add
my cell phone plan plan pobj to


In [6]:
for token in qdoc:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
            [child for child in token.children])

How advmod add VERB []
can aux add VERB []
I nsubj add VERB []
add ROOT add VERB [How, can, I, line, to]
a det line NOUN []
new amod line NOUN []
line dobj add VERB [a, new]
to prep add VERB [plan]
my poss plan NOUN []
cell compound phone NOUN []
phone compound plan NOUN [cell]
plan pobj to ADP [my, phone]


In [7]:
[token for token in qdoc[3].subtree]

[How, can, I, add, a, new, line, to, my, cell, phone, plan]

In [8]:
[token.is_stop for token in qdoc]

[True, True, True, False, True, False, False, True, True, False, False, False]

In [9]:
[token.lemma_ for token in qdoc if not token.is_stop]

['add', 'new', 'line', 'cell', 'phone', 'plan']

In [25]:
spacy.explain('advmod')

'adverbial modifier'

In [18]:
sdoc = nlp(u"I know how to add a new line to my cell phone plan")

In [19]:
displacy.render(sdoc, style='dep', options={'compact': True})

In [30]:
for chunk in sdoc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_,
            chunk.root.head.text)

I I nsubj know
a new line line dobj add
my cell phone plan plan pobj to


In [26]:
for token in sdoc:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
            [child for child in token.children])

I nsubj know VERB []
know ROOT know VERB [I, add]
how advmod add VERB []
to aux add VERB []
add xcomp know VERB [how, to, line, to]
a det line NOUN []
new amod line NOUN []
line dobj add VERB [a, new]
to prep add VERB [plan]
my poss plan NOUN []
cell compound phone NOUN []
phone compound plan NOUN [cell]
plan pobj to ADP [my, phone]
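
The two listings above differ in where the wh-word attaches: in the question, "How" modifies the ROOT verb, while in the statement, "how" modifies an embedded `xcomp` verb. Here is a minimal sketch of that check, written against rows copied from the printed output rather than a live `Doc`. The function name `wh_modifies_root` is hypothetical, and matching the head by its text is a simplification; with a real parse, one would compare `tok.head.dep_ == "ROOT"` instead.

```python
# Rows copied from the two token listings above: (text, dep, head text).
# Only the tokens relevant to the comparison are included.
QUESTION = [
    ("How", "advmod", "add"), ("can", "aux", "add"),
    ("I", "nsubj", "add"), ("add", "ROOT", "add"),
]
STATEMENT = [
    ("I", "nsubj", "know"), ("know", "ROOT", "know"),
    ("how", "advmod", "add"), ("to", "aux", "add"),
    ("add", "xcomp", "know"),
]

WH_WORDS = {"what", "where", "when", "which", "who",
            "whom", "whose", "why", "how"}


def wh_modifies_root(rows):
    """Return True if any wh-word's head is the ROOT token (hypothetical helper)."""
    # Find the text of the ROOT token; matching heads by text is an assumption
    # that only holds when the root word is not repeated in the sentence.
    root_text = next(text for text, dep, _ in rows if dep == "ROOT")
    return any(
        text.lower() in WH_WORDS and head == root_text
        for text, _, head in rows
    )


print(wh_modifies_root(QUESTION))   # True
print(wh_modifies_root(STATEMENT))  # False
```

This only separates these two example sentences; it is not a general question detector.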


### Some Other Tests

In [15]:
d1 = nlp("The man bit the dog.")

In [19]:
d2 = nlp("The dog bit the man.")

In [20]:
displacy.render(d1, style='dep', options={'compact': True})

In [21]:
displacy.render(d2, style='dep', options={'compact': True})

## Using Parsing in a Chatbot

In [30]:
doc = nlp("How did you get here?")

In [42]:
for token in doc:
    print(token.text, token.dep_, token.pos_)

How advmod ADV
did aux VERB
you nsubj PRON
get ROOT VERB
here advmod ADV
? punct PUNCT


In [48]:
spacy.explain('nsubj')

'nominal subject'

In [32]:
d2 = nlp("Is this the best option?")

In [83]:
from spacy.symbols import nsubj, aux

Here is a useful reference on the rules for forming questions in English:
https://www.englishclub.com/grammar/questions.htm

In [270]:
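def user_question(user_input):
    """Heuristically decide whether the input is a question.

    Returns True if the input opens with a wh-word, opens with
    'is', 'was' or 'do', or has an auxiliary placed before the
    subject of its verb (subject-auxiliary inversion).
    """
    doc = nlp(user_input)
    if len(doc) == 0:
        return False
    wh_words = {'what', 'where', 'when', 'which', 'who', 'whom', 'whose', 'why', 'how'}
    # A leading wh-word or a leading 'is'/'was'/'do' marks a question
    if doc[0].lower_ in wh_words or doc[0].lower_ in {'is', 'was', 'do'}:
        return True
    # Otherwise look for an auxiliary that precedes its verb's subject
    for tok in doc:
        if tok.dep == aux:
            subjects = [child for child in tok.head.children if child.dep == nsubj]
            if subjects and tok.i < subjects[0].i:
                return True
    return False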

In [271]:
user_question("is this the place?")

True

In [272]:
user_question('is this money')

True

In [273]:
user_question("what is a car")

True

In [274]:
user_question("why can you understand me?")

True

In [275]:
user_question("Can I know if this is the best option?")

True

In [276]:
user_question("Give me?")

False

In [277]:
user_question("Is this the best option?")

True

In [278]:
user_question("This is the best option?")

False

In [279]:
user_question("what can you understand")

True

In [259]:
for token in nlp("what is a car"):
    print(token.text, token.dep_, token.tag_)

what attr WP
is ROOT VBZ
a det DT
car nsubj NN


In [236]:
for token in nlp("what can you understand?"):
    print(token.text, token.dep_, token.tag_)

what dobj WP
can aux MD
you nsubj PRP
understand ROOT VB
? punct .


In [53]:
for tok in d2:
    if tok.dep == nsubj:
        print(tok.i)

1


In [175]:
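# sentence_dep is not defined in the cells shown here; judging by the
# output, it returns the dependency label of each token in the input.
sentence_dep("Are you?")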

['ROOT', 'nsubj', 'punct']

In [33]:
for token in d2:
    print(token.text, token.dep_)

Is ROOT
this nsubj
the det
best amod
option attr
? punct


In [36]:
d3 = nlp("This is the best option.")

In [37]:
for token in d3:
    print(token.text, token.dep_)

This nsubj
is ROOT
the det
best amod
option attr
. punct
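
The two listings above suggest a simple cue for copular yes/no questions: the ROOT comes before the `nsubj` in the question and after it in the statement. Below is a minimal sketch of that comparison on plain lists of dependency labels. The function name `root_precedes_subject` is hypothetical; with a real `Doc`, the labels could be collected as `[tok.dep_ for tok in doc]`.

```python
def root_precedes_subject(deps):
    """Return True if the ROOT label appears before the first nsubj label."""
    # Without both labels there is nothing to compare.
    if "ROOT" not in deps or "nsubj" not in deps:
        return False
    return deps.index("ROOT") < deps.index("nsubj")


# Label sequences copied from the two listings above.
question_deps = ["ROOT", "nsubj", "det", "amod", "attr", "punct"]
statement_deps = ["nsubj", "ROOT", "det", "amod", "attr", "punct"]

print(root_precedes_subject(question_deps))   # True
print(root_precedes_subject(statement_deps))  # False
```

This cue alone misses questions built with an auxiliary: in the "How did you get here?" listing earlier, the `nsubj` comes before the ROOT, so the check would return False there.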


In [54]:
for token in nlp('what is your name'):
    print(token.text, token.dep_)

what attr
is ROOT
your poss
name nsubj


## Forming a Response
To respond naturally, we need to know which tense to reply in. How can we tell what tense the user input is in?

In [32]:
pdoc = nlp(u"I go")

In [33]:
for tok in pdoc:
    print(tok.tag_)
    #print(nlp.vocab.morphology.tag_map[tok.tag_])

PRP
VBP


In [43]:
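def get_tense(uinput):
    """Return 'PAST', 'PRESENT' or 'UNKNOWN' based on the verb tags in the input."""
    doc = nlp(uinput)
    dtl = []
    for tok in doc:
        if tok.tag_ in ['VBD', 'VBN']:  # past tense, past participle
            dtl.append('PAST')
        elif tok.tag_ in ['VBG', 'VBP', 'VBZ']:  # gerund, present-tense forms
            dtl.append('PRESENT')
        else:
            dtl.append('UNKNOWN')
    # A single past-tagged token marks the whole input as past
    if 'PAST' in dtl:
        doc_tense = 'PAST'
    elif 'PRESENT' in dtl:
        doc_tense = 'PRESENT'
    else:
        doc_tense = 'UNKNOWN'
    return doc_tense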

In [48]:
get_tense('were you gone?')

'PAST'
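
Because `get_tense` counts participles, any `VBN` token marks the whole input as past, including present-perfect input. One possible alternative, sketched below, looks only at the first finite verb tag. It works on hand-written `(text, tag)` pairs so that it runs without a model; the name `tense_from_tags` and the example tags are assumptions, and with a real `Doc` the pairs could be built as `[(tok.text, tok.tag_) for tok in doc]`.

```python
# Finite verb tags only: participles (VBN, VBG) and the base form (VB) are ignored.
FINITE_TAG_TENSE = {"VBD": "PAST", "VBP": "PRESENT", "VBZ": "PRESENT"}


def tense_from_tags(tagged):
    """Return the tense of the first finite verb in (text, tag) pairs."""
    for _text, tag in tagged:
        if tag in FINITE_TAG_TENSE:
            return FINITE_TAG_TENSE[tag]
    return "UNKNOWN"


# Tags for "I go" match the output shown earlier; the others are hand-written.
print(tense_from_tags([("I", "PRP"), ("go", "VBP")]))                     # PRESENT
print(tense_from_tags([("I", "PRP"), ("have", "VBP"), ("gone", "VBN")]))  # PRESENT
print(tense_from_tags([("I", "PRP"), ("went", "VBD")]))                   # PAST
```

Whether present-perfect input should count as present or past depends on how the chatbot phrases its replies, so this is a design choice rather than a correction.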

In [51]:
hash('this')

1884303347816098938