# Session 2, part II

- Representing words and meanings
- Language modeling

<img src="images/_99.jpg" width="100%">

How to represent the meaning of a word?
=====================================

A machine's perspective
----------------------------------

Human beings are immersed in the symbols and social categories of natural language; machines are not. As a result, a machine cannot associate meanings with lexemes on its own.

This explains why analyzing massive natural-language datasets has traditionally been taxing, if not impossible. Human beings are very good at making sense of language but slow at computing, so hand-curated workflows do not scale. Machines, by contrast, are very good at computing but have no grasp of meaning, so their computational capacity needs a workflow that lets it scale up.

There are two main strategies through which machines can handle meanings:

+ human beings can provide machines with 'pattern-matching' rules that induce meaningful responses vis-à-vis natural language inputs
+ with the aid of statistical frameworks (e.g. Distributional Representations), machines can discover/learn the meanings of words from text

Pattern-matching route
===================

Two prominent natural language tools that draw on pattern matching:

+ regular expressions
+ WordNet (an example of an annotated dataset)

Regular expressions
=================

Regular expressions use a special kind (class) of formal language grammar called a regular grammar.

Regular grammars have predictable, provable behavior, and yet are flexible enough to power some of the most sophisticated dialog engines and chatbots on the market. Amazon Alexa and Google Now are mostly pattern-based engines that rely on regular grammars.

Deep, complex regular grammar rules can often be expressed in a single line of code called a regular expression. There are successful chatbot frameworks in Python, like Will, that rely exclusively on this kind of language to produce some useful and interesting behavior.
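
To make the link between a regular grammar and a one-line regular expression concrete, here is a minimal sketch. The toy language (one or more repetitions of "ha") and the names `LAUGH`, `TRANSITIONS`, `fsm_accepts` and `regex_accepts` are assumptions made up for illustration; they do not come from the tools mentioned above. The sketch recognizes the same language twice: once with a hand-written finite-state machine, and once with a single pattern.

```python
import re

# Hypothetical toy language: "ha", "haha", "hahaha", ...
# The right-linear rule  S -> "ha" S | "ha"  collapses into this one-line pattern.
LAUGH = re.compile(r"(ha)+")

# The same language as an explicit finite-state machine (transition table).
TRANSITIONS = {
    (0, "h"): 1,  # start state: expect "h"
    (1, "a"): 2,  # after "h": expect "a"
    (2, "h"): 1,  # after a full "ha": another "h" starts a new repetition
}
ACCEPTING = {2}


def fsm_accepts(text):
    """Return True if the hand-written automaton accepts the whole string."""
    state = 0
    for char in text:
        state = TRANSITIONS.get((state, char))
        if state is None:  # no transition defined for this character: reject
            return False
    return state in ACCEPTING


def regex_accepts(text):
    """Return True if the regular expression matches the whole string."""
    return LAUGH.fullmatch(text) is not None


# Both recognizers should agree on every sample.
for sample in ["ha", "hahaha", "hah", "", "hello"]:
    print(repr(sample), fsm_accepts(sample), regex_accepts(sample))
```

Because the automaton has a finite transition table, its behavior can be checked exhaustively, which is what the "predictable, provable" claim above refers to.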

<img src="images/_9.png" width="100%">

Examples of home assistant products.

Regular expressions: A minimal chatbot (1/3)
=================================

In [1]:
# Credits to Lane, Howard & Hapke (2019)

# load re module
import re

# greeting matcher (raw string, so backslashes would not need escaping)
r = r"(hi|hello|hey)[ ]*([a-z]*)"

# matcher in action
m0 = re.match(r, 'Hello Rosa', flags=re.IGNORECASE)
m1 = re.match(r, "hi ho, hi ho, it's off to work ...", flags=re.IGNORECASE)
m2 = re.match(r, "hey, what's up", flags=re.IGNORECASE)

print("""
m0 : {}

m1 : {}

m2 : {}
""".format(m0, m1, m2))


m0 : <re.Match object; span=(0, 10), match='Hello Rosa'>

m1 : <re.Match object; span=(0, 5), match='hi ho'>

m2 : <re.Match object; span=(0, 3), match='hey'>
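
The match objects above can be taken apart to see what each pair of parentheses captured. The following is a small sketch that reuses the same pattern; the variable names `pattern`, `pairs` and `late`, and the sentence "Well, hello Rosa", are assumptions added for illustration. It also shows a limitation of `re.match`, which only tries the beginning of the string.

```python
import re

# Same pattern as in the cell above, written as a raw string.
pattern = r"(hi|hello|hey)[ ]*([a-z]*)"

texts = ["Hello Rosa", "hi ho, hi ho, it's off to work ...", "hey, what's up"]

pairs = []
for text in texts:
    m = re.match(pattern, text, flags=re.IGNORECASE)
    # group(1) is the greeting word, group(2) the word that follows (possibly empty)
    pairs.append((m.group(1), m.group(2)))
    print(repr(m.group(1)), repr(m.group(2)))

# re.match only tries position 0, so a greeting that is not the first
# token is missed; re.search scans the whole string instead.
late = "Well, hello Rosa"
missed = re.match(pattern, late, flags=re.IGNORECASE)
found = re.search(pattern, late, flags=re.IGNORECASE)
print(missed)
print(found.group(0))
```

Whether to use `match` or `search` depends on whether the bot should react only to utterances that open with a greeting.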



Regular expressions: A minimal chatbot (2/3)
=================================

In [2]:
# let's expand the greeting matcher
r = r"[^a-z]*([y]o|[h']?ello|ok|hey|(good[ ])?(morn[gin']{0,3}|"\
    r"afternoon|even[gin']{0,3}))[\s,;:]{1,3}([a-z]{1,20})"

# ... and ignore the case of text
re_greeting = re.compile(r, flags=re.IGNORECASE)

# matcher in action (a notebook displays only the value of the last expression)
re_greeting.match('Hello Rosa')
re_greeting.match('Hello Rosa').groups()
re_greeting.match("Good morning Rosa")
re_greeting.match("Good Manning Rosa")
re_greeting.match('Good evening Rosa Parks').groups() 
re_greeting.match("Good Morn'n Rosa")
re_greeting.match("yo Rosa")

<re.Match object; span=(0, 7), match='yo Rosa'>
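
Since the notebook shows only the result of the last expression, the behavior of the expanded matcher on the other phrases stays hidden. A minimal sketch that prints one line per phrase follows; the pattern is copied from the cell above so the block runs on its own, and the names `pattern`, `phrases` and `results` are introduced here for illustration.

```python
import re

# Pattern copied from the cell above so that the sketch is self-contained.
pattern = (
    r"[^a-z]*([y]o|[h']?ello|ok|hey|(good[ ])?(morn[gin']{0,3}|"
    r"afternoon|even[gin']{0,3}))[\s,;:]{1,3}([a-z]{1,20})"
)
re_greeting = re.compile(pattern, flags=re.IGNORECASE)

phrases = [
    "Hello Rosa",
    "Good morning Rosa",
    "Good Manning Rosa",  # not a greeting: no alternative matches "Manning"
    "Good evening Rosa Parks",
    "Good Morn'n Rosa",
    "yo Rosa",
]

results = {}
for phrase in phrases:
    m = re_greeting.match(phrase)
    # the last capture group holds the name being greeted; None means no match
    results[phrase] = m.groups()[-1] if m else None
    print("{!r:28} -> {}".format(phrase, results[phrase]))
```

Only "Good Manning Rosa" fails to match, which shows both the tolerance of the pattern (`Morn'n`) and its rigidity (anything outside the listed alternatives is rejected).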

Regular expressions: A minimal chatbot (3/3)
=================================

In [6]:
# set of names for the bot
my_names = {'rosa', 'rose', 'chatty', 'chatbot', 'bot', 'chatterbot'}

# possible curt names to use in the conversation
curt_names = {'hal', 'you', 'u'}

# name of the conversant (we pretend to know her/him)
greeter_name = 'Simone'

# let's recycle the matcher
match = re_greeting.match(input())

# conditional statement that initiates the conversation (run and type a greeting)
if match:
    # lowercase the captured name once, so both lookups ignore case
    at_name = match.groups()[-1].lower()
    if at_name in curt_names:
        print("Good one.")
    elif at_name in my_names:
        print("Hi {}, How are you?".format(greeter_name))

hello bot
Hi Simone, How are you?
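
The cell above depends on `input()`, which makes it awkward to try several utterances in a row. One possible variant, sketched here under the assumption that the same rules apply, wraps the logic in a function. The name `reply`, its `greeter_name` default, and the upper-case constants are hypothetical choices for this example, not part of the original chatbot.

```python
import re

# Pattern copied from the expanded greeting matcher so the sketch runs on its own.
pattern = (
    r"[^a-z]*([y]o|[h']?ello|ok|hey|(good[ ])?(morn[gin']{0,3}|"
    r"afternoon|even[gin']{0,3}))[\s,;:]{1,3}([a-z]{1,20})"
)
re_greeting = re.compile(pattern, flags=re.IGNORECASE)

MY_NAMES = {"rosa", "rose", "chatty", "chatbot", "bot", "chatterbot"}
CURT_NAMES = {"hal", "you", "u"}


def reply(text, greeter_name="Simone"):
    """Return the bot's answer to `text`, or None when no rule fires."""
    match = re_greeting.match(text)
    if not match:
        return None
    # the last capture group is the name being addressed
    at_name = match.groups()[-1].lower()
    if at_name in CURT_NAMES:
        return "Good one."
    if at_name in MY_NAMES:
        return "Hi {}, How are you?".format(greeter_name)
    return None


for utterance in ["hello bot", "Hey you", "Good morning Rosa", "What time is it?"]:
    print(utterance, "->", reply(utterance))
```

Returning `None` for unmatched input makes the gap in the rule set explicit: every utterance the rules do not anticipate goes unanswered, which is the main scalability limit of the pattern-matching route.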
