# Natural Language Understanding

The module starts with some more useful regex patterns:

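`r"\bme\b"` will match only the whole word "me", not the "me" inside "home".  
`[A-Z]{1}[a-z]*` will match any title case word: one capital letter followed by any number of lowercase letters (the `{1}` is redundant).  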

If you're going to use a pattern several times, then store it with `re.compile()`.

Use the pipe operator (`|`) within a pattern to match any one of several alternatives, and use `pattern.findall()` to collect multiple matches within a sentence.
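
A quick sketch of the points above, using only the standard `re` module. The sample sentences and the variable names (`me_pat`, `greet_pat`, `title_pat`) are made up for illustration.

```python
import re

# \b marks a word boundary, so the "me" inside "home" is ignored
me_pat = re.compile(r"\bme\b")
print(me_pat.findall("Meet me at home, then text me."))

# Pipe-separated alternatives: any one of them counts as a match
greet_pat = re.compile(r"\bhi\b|hola|heya")
print(greet_pat.findall("hi there, hola amigo, this is fine"))

# Title case: one capital followed by zero or more lowercase letters
title_pat = re.compile(r"[A-Z][a-z]*")
print(title_pat.findall("Tell John Snow I said hello"))
```

Note that the title case pattern also picks up the sentence-initial "Tell" and the lone "I", which matters for the name matching later on.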

## Flexibly match intents

In [1]:
import re
import time

In [2]:
intent_dict = {
    'goodbye': ['see ya', 'bye'],
    'greet': [r'\bhi\b', 'hola', 'heya'],
    'thankyou': ['appreciate', 'thank', r'\bta\b']
 }

In [3]:
# compile a dict of regex patterns that can look for any of the above pattern matches
intent_patterns = {}
for key, values in intent_dict.items():
    multi_pat = "|".join(values)
    compiled_pat = re.compile(multi_pat)
    # label this flexible pattern with the intent key it came with
    intent_patterns[key] = compiled_pat
intent_patterns

{'goodbye': re.compile(r'see ya|bye', re.UNICODE),
 'greet': re.compile(r'\bhi\b|hola|heya', re.UNICODE),
 'thankyou': re.compile(r'appreciate|thank|\bta\b', re.UNICODE)}

In [4]:
# Define a function to find the intent of a message
def find_intent(pat_dict, some_input):
    matched = None
    for intent, patterns in pat_dict.items():
        # Record the intent if its pattern matches (a later match overwrites an earlier one)
        if patterns.search(some_input):
            matched = intent
    return matched

In [5]:
print(find_intent(intent_patterns, "hola! Como estas"))
print(find_intent(intent_patterns, "thankee sai"))
print(find_intent(intent_patterns, "see ya bud!"))

greet
thankyou
goodbye
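
`find_intent` keeps looping after a hit, so when a message matches several intents, the last one in the dict wins. Below is a rough sketch of two alternatives, assuming you would rather see every match or stop at the first one. `find_all_intents` and `find_first_intent` are hypothetical helpers, not part of the course code, and the patterns are rebuilt here only so the block runs on its own.

```python
import re

# Same patterns as intent_patterns, rebuilt so this block is self-contained
patterns = {
    "goodbye": re.compile(r"see ya|bye"),
    "greet": re.compile(r"\bhi\b|hola|heya"),
    "thankyou": re.compile(r"appreciate|thank|\bta\b"),
}

def find_all_intents(pat_dict, some_input):
    """Hypothetical helper: list every intent whose pattern matches."""
    return [intent for intent, pat in pat_dict.items() if pat.search(some_input)]

def find_first_intent(pat_dict, some_input):
    """Hypothetical helper: return the first matching intent in dict order, else None."""
    for intent, pat in pat_dict.items():
        if pat.search(some_input):
            return intent
    return None

print(find_all_intents(patterns, "hi, thanks and bye"))
print(find_first_intent(patterns, "hi, thanks and bye"))
print(find_first_intent(patterns, "what time is it?"))
```

Which behaviour is best depends on the bot: returning all matches lets the caller decide on a priority, at the cost of a slightly more involved response step.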


## Respond to the intent

In [6]:
response_dict = {
    'default': '...',
    'goodbye': 'Have a great day',
    'greet': 'Hi there',
    'thankyou': "no problem, that's my job"
 }

In [7]:
def answer(string_input, pat_dict, resps):
    # get the matched intent
    intent = find_intent(pat_dict, string_input.lower())
    # Use default as the fallback value
    key = "default"
    if intent in resps:
        key = intent
    return resps[key]

In [8]:
answer("See ya", intent_patterns, response_dict)

'Have a great day'

Update the wrapper function from module 1, which uses a nice display template, so that it calls the new flexible matching functions.

In [9]:
# update params to include the lookup dicts as default
def user_speaks(user_input, pat_dict=intent_patterns, resps=response_dict, user_format="USER:", bot_format="BOT:"):
    """Passes the user's input to response handler."""
    time.sleep(0.6)
    print(f"{user_format} {user_input}")
    # update the line below to use the new flexible match functions
    resp = answer(user_input, pat_dict, resps)
    time.sleep(0.6)
    return f"{bot_format} {resp}"

In [10]:
print(user_speaks("Hi hi cherry pie!"))
print(user_speaks("Ta very much my lovely..."))
print(user_speaks("Gotta go. Bye bye hunny pie."))

USER: Hi hi cherry pie!
BOT: Hi there
USER: Ta very much my lovely...
BOT: no problem, that's my job
USER: Gotta go. Bye bye hunny pie.
BOT: Have a great day


## Basic NER

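NER stands for Named Entity Recognition. Here it is approximated with two regex patterns: one to spot that a name is being discussed, one to pull out capitalised words.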

In [11]:
def get_names(string_input):
    """Search a string for an indication that a name is being discussed, then search
    for proper nouns and return them as one string, or None if nothing is found."""
    # ensure None is returned if no match is found
    entity = None
    name_pat = re.compile("name|call")
    proper_noun_pat = re.compile("[A-Z]{1}[a-z]*")
    # look for a sentence about a named entity:
    if name_pat.search(string_input):
        hits = proper_noun_pat.findall(string_input)
        # only overwrite None when something was found, so an empty list is never returned
        if len(hits) > 0:
            # several hits means we need to concatenate values
            entity = " ".join(hits)
    return entity

In [12]:
print(get_names("my name is Jimmy."))
print(get_names("My name is Jimmy."))
# the sentence-initial "My" is picked up as a proper noun, and lowercasing the
# input would remove every capital, so this approach is limited

Jimmy
My Jimmy
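
One way to sidestep the "My Jimmy" problem is to search for proper nouns only in the text that follows the trigger word. This is a minimal sketch under that assumption, not the course's solution. `get_names_after_trigger` is a hypothetical variant of `get_names`, and it uses `[a-z]+` rather than `[a-z]*` so that a lone "I" is not treated as a name.

```python
import re

# Assumption: the name always appears somewhere after the trigger word
trigger_pat = re.compile(r"\b(?:name|called|call)\b")
proper_noun_pat = re.compile(r"\b[A-Z][a-z]+\b")

def get_names_after_trigger(string_input):
    """Hypothetical variant of get_names: only look to the right of the trigger."""
    trigger = trigger_pat.search(string_input)
    if trigger is None:
        return None
    # Slice off everything up to and including the trigger word
    hits = proper_noun_pat.findall(string_input[trigger.end():])
    return " ".join(hits) if hits else None

print(get_names_after_trigger("My name is Jimmy."))
print(get_names_after_trigger("i am called John Snow"))
print(get_names_after_trigger("my name is jimmy"))
```

It is still easy to fool: any other capitalised word after the trigger is joined onto the name, and lowercase names are missed entirely.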


In [13]:
# Define answer_name()
def answer_name(str_input):
    name = get_names(str_input)
    if name is None:
        return "You're mysterious, tell me your name."
    else:
        return f"Hello, {name}!"

In [14]:
# update params to include the lookup dicts as default
def user_speaks(user_input, pat_dict=intent_patterns, resps=response_dict, user_format="USER:", bot_format="BOT:"):
    """Passes the user's input to response handler."""
    time.sleep(0.6)
    print(f"{user_format} {user_input}")
    # update the line below to use the name retrieval funcs
    resp = answer_name(user_input)
    time.sleep(0.6)
    return f"{bot_format} {resp}"

In [15]:
print(user_speaks("i am called John Snow"))
print(user_speaks("my name is Spartacus"))
print(user_speaks("My name is Spartacus"))

USER: i am called John Snow
BOT: Hello, John Snow!
USER: my name is Spartacus
BOT: Hello, Spartacus!
USER: My name is Spartacus
BOT: Hello, My Spartacus!


## Wordvec with spaCy

A great little intro to word vectors, where a vector of floats is assigned to each word, word part, letter or sentence. These vectors can then be used within ML workflows. spaCy makes several models with word vectors available. Here we are using `en_core_web_md`, whose vectors were trained on a large corpus with the GloVe algorithm.

Word vectors can be compared with each other using their cosine similarity:

* Vectors point in the same direction = 1
* Vectors are perpendicular = 0
* Vectors point in opposite directions = -1
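
The three cases above can be checked with a small hand-rolled function. This is a plain Python sketch of the standard formula (dot product divided by the product of the vector lengths); the `cosine_similarity` name and the 2D toy vectors are illustrative, not something taken from spaCy.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1.0, 0.0], [3.0, 0.0])
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, perpendicular, opposite)
```

Only the direction matters, not the length: `[1.0, 0.0]` and `[3.0, 0.0]` count as identical here.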

In [22]:
import spacy
nlp = spacy.load('en_core_web_md')

In [24]:
nlp.vocab.vectors_length

300