# Capstone 3: Modeling #

This is the stage of the project where we finally build a chatbot! Actually, we are going to build two and compare how well they work. We will use ChatterBot, a relatively easy-to-use Python library for building chatbots.

During my time as a teacher, one of the most common requests I heard from my students was to have their errors corrected. This was especially true of simple errors involving material the student had already studied. While researching this project, I tried a few commercial ESL chatbots. To my surprise, none of them performed error correction. In addition, most of them simply respond to the user's input without asking follow-up questions to keep the conversation going. The chatbots we build will try to solve both of these problems.

In [1]:
# Import libraries
import pandas as pd
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
from chatterbot.trainers import ChatterBotCorpusTrainer
from gingerit.gingerit import GingerIt
from PyDictionary import PyDictionary
from chatterbot.comparisons import SpacySimilarity
from chatterbot.comparisons import LevenshteinDistance
from chatterbot.response_selection import get_random_response
from chatterbot.response_selection import get_most_frequent_response
import re
import random

In [2]:
# Import starter_questions.csv and convert to list
starter_questions = pd.read_csv('starter_questions.csv')
question_list = starter_questions['0'].to_list()
question_list[:10]

['Do you have any pets?',
 'What was the last book you read?',
 'Do you like to cook?',
 "What's your favorite food?",
 'Are you good at cooking/swimming/etc?',
 'Are you married or single?',
 'Do you like baseball?',
 'Do you live alone?',
 'Do you live in a house or an apartment?',
 'Have you ever lived in another country?']

In [3]:
# Add a few more questions to the list
print("before:", len(question_list))

extra_questions = [
    "Have you ever bought a car?",
    "How do you get to work or school?",
    "What did you do yesterday?",
    "What did you do last weekend?",
    "Where did you go on your last vacation?",
    "What are your plans for next year?",
    "How are you feeling today?",
    "What is your favorite movie?",
    "What is your city like?"
]

question_list.extend(extra_questions)

print("after:", len(question_list))

before: 58
after: 67


Now it is time to create the first chatbot, named "Alan" after Alan Turing. Both of our bots use ChatterBot's Best Match logic adapter. This adapter finds the statement in the training data that most closely resembles the input statement and replies with a known response to it.

For Alan, the comparison uses the `SpacySimilarity` function, which relies on `spaCy`, a popular NLP library, to measure how similar two strings are. It does this by comparing word vectors, which are high-dimensional embeddings that capture the meanings of words. With `spaCy`, two words like "cat" and "feline" get a high similarity score, despite not sharing any letters.

Once a match is found, there may be several possible responses, especially if the line appears more than once in the training data. Alan uses the `get_random_response` function to pick one of them at random.

The `maximum_similarity_threshold` of 0.95 (ChatterBot's default) controls the search. The adapter stops looking as soon as it finds a statement that is at least 95% similar to the input; otherwise it uses the closest match it found. If the adapter cannot produce a response, it falls back to its `default_response`. Because we pass `question_list` there, the bot replies with a randomly chosen question from that list.
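The Best Match behaviour described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not ChatterBot's implementation. It scans an in-memory list of (statement, response) pairs instead of a database. It also uses difflib's `SequenceMatcher` ratio as a stand-in similarity, so the example runs without a spaCy model. `best_match_response`, `similarity` and `most_frequent` are hypothetical names.

```python
import random
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in comparison: difflib's 0-1 ratio (NOT spaCy word vectors)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match_response(text, pairs, default_responses,
                        threshold=0.95, select=random.choice):
    """Toy best-match lookup over (statement, response) pairs."""
    best_statement, best_score = None, 0.0
    for statement, _ in pairs:
        score = similarity(text, statement)
        if score > best_score:
            best_statement, best_score = statement, score
        # Stop searching once a statement is "close enough"
        if best_score >= threshold:
            break

    # Every recorded reply to the closest statement is a candidate
    candidates = [resp for stmt, resp in pairs if stmt == best_statement]
    if not candidates:
        # Nothing to say: fall back to a default (here, a starter question)
        return select(default_responses)
    return select(candidates)


def most_frequent(candidates):
    """Pick the reply that occurs most often (ties go to the first seen)."""
    return Counter(candidates).most_common(1)[0][0]


pairs = [
    ("Where do you live?", "I live in Pasadena."),
    ("Do you like pizza?", "Yes, I love it."),
    ("Do you like pizza?", "Not really."),
    ("Do you like pizza?", "Yes, I love it."),
]
questions = ["Do you have any pets?", "Do you like to cook?"]

print(best_match_response("where do you live", pairs, questions))
print(best_match_response("Do you like pizza?", pairs, questions,
                          select=most_frequent))
```

The default `random.choice` plays the role of `get_random_response`. Passing `select=most_frequent` mirrors the idea behind `get_most_frequent_response`.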

In [4]:
# Create instance of ChatBot
alan = ChatBot(
    'Alan',
    logic_adapters=[
        {
            'import_path': 'chatterbot.logic.BestMatch',
            'statement_comparison_function': SpacySimilarity,
            'response_selection_method': get_random_response,
            'default_response': question_list,
            'maximum_similarity_threshold': 0.95,
        }
    ]
)

Now that the bot has been created, we will train it, first with the Chatterbot English Corpus, and then with the ESL dialogues that were acquired in an earlier stage of this project.
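ChatterBot's documentation describes `ListTrainer` as treating the list as a single conversation, where each item is a response to the item before it. The hypothetical helper below (not ChatterBot code) shows which (statement, response) pairs that reading produces. Answers also become prompts: "I live in Pasadena." is stored as something that can be followed by "Where is Pasadena?". This is one reason the bot can end up asking follow-up questions.

```python
def to_statement_pairs(lines):
    """Pair each line with the line that follows it.

    Sketch of reading a flat list as a conversation: every line is
    treated as the reply to the line before it.
    """
    return list(zip(lines, lines[1:]))


sample = [
    "Where do you live?",
    "I live in Pasadena.",
    "Where is Pasadena?",
    "It's in California.",
]

for prompt, reply in to_statement_pairs(sample):
    print(f"{prompt!r} -> {reply!r}")
```

One caveat follows from this reading. If `all_dialogues.csv` concatenates many separate dialogues, the last line of one dialogue would presumably be stored as a prompt for the first line of the next.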

In [5]:
# Train bot on Chatterbot English Corpus
trainer = ChatterBotCorpusTrainer(alan)
trainer.train("chatterbot.corpus.english")

Training ai.yml: [####################] 100%
Training botprofile.yml: [####################] 100%
Training computers.yml: [####################] 100%
Training conversations.yml: [####################] 100%
Training emotion.yml: [####################] 100%
Training food.yml: [####################] 100%
Training gossip.yml: [####################] 100%
Training greetings.yml: [####################] 100%
Training health.yml: [####################] 100%
Training history.yml: [####################] 100%
Training humor.yml: [####################] 100%
Training literature.yml: [####################] 100%
Training money.yml: [####################] 100%
Training movies.yml: [####################] 100%
Training politics.yml: [####################] 100%
Training psychology.yml: [####################] 100%
Training science.yml: [####################] 100%
Training sports.yml: [####################] 100%
Training trivia.yml: [####################] 100%


In [6]:
# Import dialogue dataset and convert the text column to list
dialogues = pd.read_csv('all_dialogues.csv')
dialogue_list = dialogues['dialogue_line'].to_list()
dialogue_list[:10]

['Where do you live?',
 'I live in Pasadena.',
 'Where is Pasadena?',
 "It's in California.",
 'Is it in northern California?',
 "No. It's in southern California.",
 'Is Pasadena a big city?',
 "It's pretty big.",
 'How big is "pretty big"?',
 'It has about 140,000 people.']

In [49]:
# Add a few extra lines to the dialogue list
#print("before:", len(dialogue_list))

#extra_lines = [
#    "Hi there!",
#    "How are you?",
#    "Fine, thanks. And you?",
#    "I'm fine, thanks for asking.",
#    "What's your name?",
#    "My name is Alan.",
#    "Nice to meet you.",
#    "Nice to meet you too."
#]

#dialogue_list.extend(extra_lines)

#print("after:", len(dialogue_list))

before: 25556
after: 25563


In [7]:
# Train bot on dialogue list
trainer = ListTrainer(alan)
trainer.train(dialogue_list)

List Trainer: [####################] 100%


The bot is trained and almost ready to use. Next, we create an instance of `PyDictionary` to look up the definitions of words. Below, we wrap it in a function that prints the definition in a more readable, bot-friendly format. We also create an instance of `GingerIt`, which the chat loop uses to check the user's spelling and grammar. `GingerIt` does not catch every error, but it does a pretty good job.
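`look_up_word` mixes the dictionary lookup with printing. The notebook also defines it twice, changing only the bot's name. One possible alternative is to move the formatting into a pure function that takes the bot name as a parameter. That function can be tested without calling `PyDictionary` at all. `format_definition` and its `bot_name` and `per_pos` parameters are hypothetical. The sample dictionary is hand-made in the shape `look_up_word` expects (part of speech mapped to a list of meanings); it is not real `PyDictionary` output.

```python
def format_definition(word, meanings, bot_name="Alan", per_pos=3):
    """Build the lines look_up_word prints, without doing any lookup.

    meanings: dict mapping part of speech -> list of definitions,
    or None (treated, like an empty dict, as "word not found").
    """
    if not meanings:
        return [f"{bot_name}: Sorry, I don't know that word."]
    lines = [f'{bot_name}: "{word}" means...']
    for definitions in meanings.values():
        # At most per_pos meanings for each part of speech
        lines.extend(f"\t- {d}" for d in definitions[:per_pos])
    return lines


# Hand-made sample in the expected shape (not real PyDictionary output)
sample = {
    "Noun": ["first meaning", "second meaning", "third meaning", "fourth meaning"],
    "Verb": ["a verb meaning"],
}
print("\n".join(format_definition("example", sample)))
print("\n".join(format_definition("xyz", None, bot_name="Alma")))
```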

In [8]:
# Create instance of PyDictionary to look up definitions of words
dictionary = PyDictionary()
# Create instance of GingerIt to perform spelling and grammar checks
ginger_parser = GingerIt()

In [9]:
# Function to print definition of word in readable format
def look_up_word(word):
    definition = dictionary.meaning(word)
    if definition is None:
        print("Alan: Sorry, I don't know that word.")
    else:
        print('Alan: "{}" means...'.format(word))
        # Show at most three meanings for each part of speech
        for meanings in definition.values():
            for meaning in meanings[:3]:
                print('\t-', meaning)

Time to create the while loop in which the bot runs. The code starts by printing a short introduction with some instructions, and then Alan asks a random question to kick off the conversation. Each user input is checked with `GingerIt`. If it finds errors, Alan shows the corrected sentence between asterisks before replying. It is also possible to look up the definition of a word by typing "define" plus the word. While chatting with the bot, I will occasionally make spelling and grammar errors on purpose to test its ability to correct them.
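The decision the loop makes for each input can be pulled out into a small pure function. That makes the decision easy to test without `GingerIt`, `PyDictionary`, or a trained bot. `route_input` is a hypothetical helper, not part of the notebook. In the loop itself, the "bye" check runs on GingerIt's corrected text, while the "define" check runs on the raw input. Requiring "define" to be a whole first word keeps input like "Definitely not" from being treated as a lookup.

```python
def route_input(text):
    """Decide what the chat loop should do with one line of user input.

    Returns ("bye", None), ("define", word) or ("chat", text).
    """
    cleaned = text.strip()
    # Accept "bye", "Bye", "bye." and "bye!" alike
    if cleaned.lower().rstrip(".!") == "bye":
        return ("bye", None)
    words = cleaned.split()
    # "define" must be the whole first word, so "Definitely not" is chat
    if len(words) >= 2 and words[0].lower() == "define":
        # Like the notebook, look up the last word typed
        return ("define", words[-1])
    return ("chat", cleaned)


for line in ["Bye!", "define luxury", "Definitely not", "What's for dinner?"]:
    print(repr(line), "->", route_input(line))
```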

In [10]:
# The chatbot runs in a while loop 
print(
    """
    Hi, my name is Alan. I'm here to help you practice English. If you want to see the definition of a word, 
    just type 'define' plus the word. Say "bye" when you are ready to finish.
    """
)

print("Alan: ", random.choice(question_list))

while True:
    text = input('>>user: ')
    correction = ginger_parser.parse(text)
    request = correction['result']
    words = text.split()
    # End the session on "bye" (any case, ignoring a trailing "." or "!")
    if request.strip().lower().rstrip('.!') == 'bye':
        print('Alan: Bye')
        break
    # "define <word>" looks up the last word; "definitely" does not trigger it
    elif len(words) >= 2 and words[0].lower() == 'define':
        look_up_word(words[-1])
    else:
        response = alan.get_response(request)
        # Show the corrected sentence only if GingerIt changed something
        if correction['result'] != correction['text']:
            print("Alan: ** {} **".format(correction['result']))
        print('Alan: ', response)


    Hi, my name is Alan. I'm here to help you practice English. If you want to see the definition of a word, 
    just type 'define' plus the word. Say "bye" when you are ready to finish.
    
Alan:  Have you ever bought a car?
>>user: Yes, I buyed a Cadillac.
Alan: ** Yes, I bought a Cadillac. **
Alan:  A luxury car.
>>user: define luxury
Alan: "luxury" means...
	- something that is an indulgence rather than a necessity
	- the quality possessed by something that is excessively expensive
	- wealth as evidenced by sumptuous living
>>user: What's for dinner?
Alan:  I'm not sure.
>>user: How about a pizza?
Alan:  You had pizza for lunch.
>>user: But I love pizza.
Alan:  I had pizza for lunch yesterday.
>>user: i did'nt know you could eat
Alan: ** I didn't know you could eat **
Alan:  hard to tell, i have never tried anything but electricity
>>user: define electricity
Alan: "electricity" means...
	- a physical phenomenon associated with stationary or moving electrons and protons
	- energy

So Alan works pretty well, although not perfectly. The parameters used to create Alan are the ones I believe will produce the most effective chatbot. However, it is interesting to create a second chatbot with different parameters and then compare performance. The new chatbot will be named Alma.

Instead of `SpacySimilarity`, Alma uses the `LevenshteinDistance` comparison function. Levenshtein distance measures the difference between two strings as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to change one string into the other. By this metric, "cat" and "cart" are very similar, despite having completely different meanings. Meanwhile, "cat" and "feline" are very dissimilar, even though the words are related.

On the other hand, this metric may do a better job of capturing meaning carried by word order. `SpacySimilarity` compares sentences by averaging their word vectors, which discards word order, much like a bag of words. As a result, "the boy ate chicken" and "a chicken ate the boy" would score as very similar, despite having different meanings. Levenshtein distance is sensitive to word order, so it would do a better job of capturing the difference.

For response selection, Alma uses the `get_most_frequent_response` function, which should make her a little more predictable than Alan.
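To make the contrast concrete, below is a plain dynamic-programming Levenshtein distance next to a word-count "bag of words". This sketch is for illustration only. ChatterBot's `LevenshteinDistance` comparator turns string differences into a similarity score between 0 and 1, and exactly how it does so may vary between versions, so check the source of the version you install. Likewise, `bag_of_words` is a simplified stand-in for spaCy's averaged word vectors, which are also blind to word order.

```python
from collections import Counter


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def bag_of_words(sentence):
    """Word counts with word order thrown away."""
    return Counter(sentence.lower().split())


print(levenshtein("cat", "cart"))    # 1: insert "r"
print(levenshtein("cat", "feline"))  # 6: no letters in common

s1, s2 = "the boy ate the chicken", "the chicken ate the boy"
print(bag_of_words(s1) == bag_of_words(s2))  # True: same words, same counts
print(levenshtein(s1, s2) > 0)               # True: the strings differ
```

The bag-of-words view cannot tell the two sentences apart, while the edit distance can. The edit distance, in turn, knows nothing about "cat" and "feline" being related.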

In [11]:
# Create instance of ChatBot
alma = ChatBot(
    'Alma',
    logic_adapters=[
        {
            'import_path': 'chatterbot.logic.BestMatch',
            'statement_comparison_function': LevenshteinDistance,
            'response_selection_method': get_most_frequent_response,
            'default_response': question_list,
            'maximum_similarity_threshold': 0.95,
        }
    ]
)

In [12]:
# Train bot on Chatterbot English Corpus
trainer = ChatterBotCorpusTrainer(alma)
trainer.train("chatterbot.corpus.english")

Training ai.yml: [####################] 100%
Training botprofile.yml: [####################] 100%
Training computers.yml: [####################] 100%
Training conversations.yml: [####################] 100%
Training emotion.yml: [####################] 100%
Training food.yml: [####################] 100%
Training gossip.yml: [####################] 100%
Training greetings.yml: [####################] 100%
Training health.yml: [####################] 100%
Training history.yml: [####################] 100%
Training humor.yml: [####################] 100%
Training literature.yml: [####################] 100%
Training money.yml: [####################] 100%
Training movies.yml: [####################] 100%
Training politics.yml: [####################] 100%
Training psychology.yml: [####################] 100%
Training science.yml: [####################] 100%
Training sports.yml: [####################] 100%
Training trivia.yml: [####################] 100%


In [13]:
# Train bot on dialogue list
trainer = ListTrainer(alma)
trainer.train(dialogue_list)

List Trainer: [####################] 100%


In [14]:
# Function to print definition of word in readable format
def look_up_word(word):
    definition = dictionary.meaning(word)
    if definition is None:
        print("Alma: Sorry, I don't know that word.")
    else:
        print('Alma: "{}" means...'.format(word))
        # Show at most three meanings for each part of speech
        for meanings in definition.values():
            for meaning in meanings[:3]:
                print('\t-', meaning)

In [15]:
# The chatbot runs in a while loop 
print(
    """
    Hi, my name is Alma. I'm here to help you practice English. If you want to see the definition of a word, 
    just type 'define' plus the word. Say "bye" when you are ready to finish.
    """
)

print("Alma: ", random.choice(question_list))

while True:
    text = input('>>user: ')
    correction = ginger_parser.parse(text)
    request = correction['result']
    words = text.split()
    # End the session on "bye" (any case, ignoring a trailing "." or "!")
    if request.strip().lower().rstrip('.!') == 'bye':
        print('Alma: Bye')
        break
    # "define <word>" looks up the last word; "definitely" does not trigger it
    elif len(words) >= 2 and words[0].lower() == 'define':
        look_up_word(words[-1])
    else:
        response = alma.get_response(request)
        # Show the corrected sentence only if GingerIt changed something
        if correction['result'] != correction['text']:
            print("Alma: ** {} **".format(correction['result']))
        print('Alma: ', response)


    Hi, my name is Alma. I'm here to help you practice English. If you want to see the definition of a word, 
    just type 'define' plus the word. Say "bye" when you are ready to finish.
    
Alma:  Where do you live?
>>user: I live in Pasadena.
Alma:  Where is Pasadena?
>>user: Its in California.
Alma: ** It's in California. **
Alma:  Is it in northern California?
>>user: No, southern California.
Alma:  Is Pasadena a big city?
>>user: Its prety big.
Alma: ** It's pretty big. **
Alma:  How big is "pretty big"?
>>user: I'm not sure
Alma:  You should know that already.
>>user: Whats you problem?
Alma: ** What's your problem? **
Alma:  I owe fees, but a bank statement never came in the mail about it.
>>user: Oh no!
Alma:  I had to blow my nose.
>>user: Were you sick?
Alma:  Even though I'm a construct I do have a subjective experience of the universe, as simplistic as it may be.
>>user: define simplistic
Alma: "simplistic" means...
	- characterized by extreme and often misleading simpli

Alma also works pretty well, even though it (or she) uses different functions to compare statements and select responses. Both bots stumble from time to time. Sometimes they give nonsensical responses, and the `GingerIt` parser sometimes misses grammatical errors or suggests bad corrections. In general, however, they work well, especially considering how simple they are. The last thing we will do is have the bots chat with each other a little. Below is a conversation between Alan and Alma.

In [16]:
alma_message = "Where are you going?"
print("Alma: {}".format(alma_message))

for i in range(30):
    alan_message = alan.get_response(alma_message)
    print("Alan: {}".format(alan_message))
    alma_message = alma.get_response(alan_message)
    print("Alma: {}".format(alma_message))

Alma: Where are you going?
Alan: I have to walk the dog.
Alma: What kind of dog do you have?
Alan: So she likes poodles.
Alma: She says they're good watchdogs.
Alan: Can I borrow $5?
Alma: Sure. Why do you need it?
Alan: I like this magazine.
Alma: So do I.
Alan: I read it once, and I subscribed.
Alma: It gives you all the news.
Alan: Will you be voting?
Alma: Yes, I am going to vote.
Alan: Make sure you make it to your polling place early on Election Day.
Alma: Thank you and good luck.
Alan: Could I speak to you for a minute?
Alma: What do you want to speak to me about?
Alan: Do you know where your nearest emergency room is?
Alma: It is near the bridge that crosses over the river.
Alan: I am driving to you even as we speak. Hang in there!
Alma: You don't look so good.
Alan: It's a lot better than soda.
Alma: My throat is really dry.
Alan: I wonder if they left it sitting around too long before they served it.
Alma: That's exactly what I was thinking had happened.
Alan: Are your vegeta