# Meet Monty The Professional Python Chatbot

[![chatbot](img/chatbot.png)](img/chatbot.png)

## The problem:
As online users have become used to a fast-paced, microwave lifestyle, meeting their immediate needs and holding their attention is of the utmost importance. According to research by the Nielsen Norman Group, “Users often leave Web pages in 10–20 seconds, but pages with a clear value proposition can hold people's attention for much longer.” (Nielsen 2011)

[![weibull.png](img/weibull.png)](img/weibull.png)

“It's clear from the chart that the first 10 seconds of the page visit are critical for users' decision to stay or leave. The probability of leaving is very high during these first few seconds because users are extremely skeptical, having suffered countless poorly designed web pages in the past. People know that most web pages are useless, and they behave accordingly to avoid wasting more time than absolutely necessary on bad pages.” (Nielsen 2011)

## The question:
So what can we do to engage users in a way that does not leave them skeptical or give the impression that the website is poorly designed?

## The solution:
A chatbot! Yes, “chatbots are beneficial for both parties: developing chatbots is cheaper than training and hiring human customer service agents for the company, and customers often prefer a brisk mobile interaction over talking with someone in person or with the call center. Consider this statistic from Gartner, that artificial intelligence will amount for 85% of customer relationships by 2020.” (Morgan 2017)

## Purpose:
The purpose of this project is to provide an **easy-to-implement chatbot that needs only a CSV file of questions and responses**. It can enhance the customer experience by answering frequently asked questions immediately on any website, and the model can be implemented quickly for any online business.

## Benefits:
This benefits the company through immediate customer engagement: as the chart above shows, users who stay past the first 20 seconds are more likely to stay much longer. In addition, it adds efficiency and lowers employee overhead costs for the company.

In [1]:
# import all dependencies
from numpy import array
from numpy import argmax
from numpy import array_equal
from keras.models import Sequential
from keras.layers import LSTM
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.models import Model
from attention_decoder import AttentionDecoder
from nltk.stem import PorterStemmer
from tqdm import tqdm
import string
import pandas as pd

Using TensorFlow backend.


[![chatbot](img/cogs.jpeg)](img/cogs.jpeg)

## COGS:
The following helper functions are the COGS that make this implementation run smoothly. Please read each description and the code for details on the purpose of each function. These cogs are literally what make Monty tick (pun intended).

In [2]:
# return all sentences from the 2 list (q&a)
def return_sents(df_col1, df_col2):
    return [sent for sent in df_col1] + [sent for sent in df_col2]

In [3]:
# return unique words from list of sentences
def return_unique_words(all_sents):
    table = str.maketrans({key: None for key in string.punctuation})
    all_words = [words.split() for words in all_sents] 
    word_list = [word.lower() for sublist in all_words for word in sublist]
    word_list = [word.translate(table) for word in word_list] 

# *** removed stemming to enhance Monty's response
#    word_list = [ps.stem(word) for word in word_list]
    return list(set(word_list))

In [4]:
# return a dataframe of words from 2 list, this data frame is used as a form of hash table
# key would be the index and the value would be the word
def df_to_df(df_col1, df_col2):
    all_sent = return_sents(df_col1, df_col2)
    word_list = return_unique_words(all_sent)
    word_list.insert(0, ' ')
    t_df = pd.DataFrame()
    t_df['word'] = word_list
    t_df['idx'] = t_df.index
    return t_df

In [5]:
# return the cleaned (lowercased, punctuation-free) words of a single sentence, in order
def return_unique_words_single(sent):
    table = str.maketrans({key: None for key in string.punctuation})
    all_words = sent.split()
    word_list = [word.lower() for word in all_words]
    word_list = [word.translate(table) for word in word_list] 

# *** removed stemming to enhance Monty's response
#    word_list = [ps.stem(word) for word in word_list]
    return word_list

In [6]:
# function takes the sentences and the hash table of word index and returns the array 
# equivalent index of the words in each sentence
def word_to_array(sents, t_df):
    l = []
    l2 = []
    for sent in sents:
        b = []
        a = return_unique_words_single(sent)
        for w in a:
            try:
                b.append(t_df.loc[t_df.word == w, 'idx'].iloc[0])
            except IndexError:
                # word is not in the vocabulary, so fall back to index 0
                b.append(0)
        l.append(a)
        l2.append(b)
    return l, l2

In [7]:
# decodes the array back into string so humans can understand what Monty is saying
def array_to_string(ar, t_df):
    c = [t_df.loc[t_df.idx == i, 'word'].iloc[0] for i in ar]
    s = ' '.join(c)
    return s

In [8]:
# one hot encode the array sequence
def one_hot_encode(sequence, n_unique):
    encoding = list()
    for value in sequence:
        vector = [0 for _ in range(n_unique)]
        vector[value] = 1
        encoding.append(vector)
    return array(encoding)

In [9]:
# decode a one hot encoded array sequence
def one_hot_decode(encoded_seq):
    return [argmax(vector) for vector in encoded_seq]
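
As a quick sanity check of the two helpers above, the following sketch round-trips a toy index sequence through one-hot encoding and back. The names `encode_demo` and `decode_demo` are hypothetical stand-ins that restate the same idea with NumPy indexing so the snippet runs on its own; they are not the notebook's functions.

```python
import numpy as np

def encode_demo(sequence, n_unique):
    # one row per time step, with a single 1 at the word's index
    encoding = np.zeros((len(sequence), n_unique), dtype=int)
    encoding[np.arange(len(sequence)), sequence] = 1
    return encoding

def decode_demo(encoded_seq):
    # the argmax of each row recovers the original word index
    return [int(np.argmax(vector)) for vector in encoded_seq]

sequence = [3, 1, 0, 0]                      # toy index sequence, padded with 0
encoded = encode_demo(sequence, n_unique=5)
print(encoded.shape)                         # (4, 5)
print(decode_demo(encoded))                  # [3, 1, 0, 0]
```

Each sequence therefore becomes a matrix with one row per position and one column per vocabulary word, which is the shape the model consumes once a batch dimension is added.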

In [10]:
# transform the X & y into one hot format and reshape it into proper input shape
def transform_xy(sequence_in, sequence_out, n_features):
    X = one_hot_encode(sequence_in, n_features)
    y = one_hot_encode(sequence_out, n_features)
    X = X.reshape((1, X.shape[0], X.shape[1]))
    y = y.reshape((1, y.shape[0], y.shape[1]))
    return X,y

In [11]:
# for user interaction purposes Monty breaks a single sentence into words and looks
# up the index of each word (0 if the word is unknown)
def sent_to_array(sent, t_df):
    a = []
    b = []
    a = return_unique_words_single(sent)
    for w in a:
        try:
            b.append(t_df.loc[t_df.word == w, 'idx'].iloc[0])
        except IndexError:
            # word is not in the vocabulary, so fall back to index 0
            b.append(0)
    return a, b

In [12]:
# for user interaction purposes Monty breaks down each word from the user input and encode
# and shapes it into something the model can use to predict
def transform_x(sequence_in, n_features):
    X = one_hot_encode(sequence_in, n_features)
    X = X.reshape((1, X.shape[0], X.shape[1]))
    return X

In [13]:
# for ease of use this function was created to make it easier to interact with Monty
# it takes the user input and returns a response
def get_response(sent, t_df, max_length, n_features, model):
    w, q = sent_to_array(sent, t_df)
    q_pad = pad_sequences([q], maxlen=max_length, padding='post')
    X2 = transform_x(q_pad[0], n_features)
    yhat2 = model.predict(X2, verbose=0)
    return array_to_string(one_hot_decode(yhat2[0]), t_df)

## End of COGS and the start of Monty's professional career.
[![chatbot](img/start.png)](img/start.png)

## Monty's requirement:
Monty cannot understand human words, but he does understand numbers. Luckily for us, Monty comes with a translator, which is part of the COGS functions listed above. The translator converts each word to a specific index number, much like a hash table with a key : value mapping.
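
To make the hash-table analogy concrete, here is a minimal sketch of the same word-to-index translation using plain Python dictionaries instead of a DataFrame. The sentences, the `clean_words` helper, and the dictionary names are illustrative assumptions, not part of the notebook.

```python
import string

def clean_words(sentence):
    # lowercase and strip punctuation, mirroring the notebook's cleaning step
    table = str.maketrans('', '', string.punctuation)
    return [w.lower().translate(table) for w in sentence.split()]

sentences = ["how are you?", "good, you?"]   # toy data, not the real qna.csv

# index 0 is reserved for the blank word, as in the notebook
vocab = [' '] + sorted({w for s in sentences for w in clean_words(s)})
word_to_idx = {word: idx for idx, word in enumerate(vocab)}
idx_to_word = {idx: word for word, idx in word_to_idx.items()}

# unknown words ("monty" here) fall back to index 0
encoded = [word_to_idx.get(w, 0) for w in clean_words("How are you, Monty?")]
decoded = ' '.join(idx_to_word[i] for i in encoded)
print(word_to_idx)   # {' ': 0, 'are': 1, 'good': 2, 'how': 3, 'you': 4}
print(encoded)       # [3, 1, 4, 0]
print(decoded)
```

A dictionary gives constant-time lookups in both directions, whereas the DataFrame filter scans every row; for a vocabulary of a few hundred words either approach is fast enough.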

[![chatbot](img/monty.png)](img/monty.png)

In [14]:
# load csv into pandas dataframe
ps = PorterStemmer()
df = pd.read_csv('qna.csv')
df.head(10)

|  | question | answer |
|---|---|---|
| 0 | hello | HI |
| 1 | hey | Hello |
| 2 | hi | Hey |
| 3 | how are you? | good, you? |
| 4 | how is it going? | great |
| 5 | good | same here |
| 6 | great | that is good to hear |
| 7 | what color is the sky | blue |
| 8 | bye | bye |
| 9 | goodbye | goodbye |


In [15]:
# create an index of words, please note that index 0 is set to an empty space
t_df = df_to_df(df.question, df.answer)
t_df.head()

|  | word | idx |
|---|---|---|
| 0 |  | 0 |
| 1 | single | 1 |
| 2 | something | 2 |
| 3 | well | 3 |
| 4 | interesting | 4 |


In [16]:
# transform questions & answers to a word array and a sequence array
# this is our translator so that monty will understand what we're saying
q_list, q_as_array = word_to_array(df.question, t_df)
a_list, a_as_array = word_to_array(df.answer, t_df)

# print the first 5 entries of each list
print('Question word list:\n', q_list[:5], '\n'*2,'Question array list:\n', q_as_array[:5], '\n'*2)
print('Answer word list:\n', a_list[:5],'\n'*2, 'Answer array list:\n', a_as_array[:5],'\n')

Question word list:
 [['hello'], ['hey'], ['hi'], ['how', 'are', 'you'], ['how', 'is', 'it', 'going']] 

 Question array list:
 [[82], [37], [22], [52, 100, 61], [52, 54, 6, 66]] 


Answer word list:
 [['hi'], ['hello'], ['hey'], ['good', 'you'], ['great']] 

 Answer array list:
 [[22], [82], [37], [77, 61], [84]] 



In [17]:
# use the length of the index of the word matrix as the vocabulary size
vocab_size = len(t_df)
print('Vocab Size: ', vocab_size)

# set max features(vocab size) equal to vocab size
n_features = vocab_size
print('Number of features: ', n_features, '\n')

# find the max length of question & answer
max_q_l = len(max(q_as_array,key=len))
max_a_l = len(max(a_as_array,key=len))
max_l = max(max_q_l, max_a_l)
print('Max Length of Question: ', max_q_l)
print('Max Length of Answer: ', max_a_l)

# set max length equal to max length + 3 to ensure ample padding
max_length = max_l + 3
print('Max Padded Length: ', max_length)

Vocab Size:  137
Number of features:  137 

Max Length of Question:  8
Max Length of Answer:  9
Max Padded Length:  12


[![chatbot](img/limit.jpg)](img/limit.jpg)

## Monty's Limit:
Monty, like human beings, has limits.

1. He can only respond with what you teach him. His knowledge is limited to his vocabulary: if you teach him 137 words, he knows only 137 words. The vocabulary is derived from the CSV file.
2. Questions and responses cannot exceed the maximum padded length, which is the length of the longest question or answer plus some padding. This example uses a padding of 3.
3. Monty's learning speed is determined more by the maximum sentence length and the vocabulary size than by the overall size of the document.
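
The first two limits can be illustrated with a small pure-Python sketch. It assumes a hypothetical toy vocabulary and a `pad_post` helper that imitates `pad_sequences(..., padding='post')` with Keras' default `truncating='pre'`, so overly long inputs lose their first words; it is a stand-in for illustration, not the Keras implementation.

```python
def pad_post(seq, max_length, value=0):
    # keep only the last max_length items (like truncating='pre'),
    # then fill the right-hand side with the padding value (padding='post')
    kept = list(seq)[-max_length:]
    return kept + [value] * (max_length - len(kept))

word_to_idx = {' ': 0, 'how': 1, 'are': 2, 'you': 3}   # hypothetical toy vocabulary

def encode(sentence):
    # words outside the vocabulary fall back to index 0
    return [word_to_idx.get(w, 0) for w in sentence.lower().split()]

short = pad_post(encode("how are you"), 5)
unknown = pad_post(encode("how are you today"), 5)
too_long = pad_post(encode("hey how are you today my friend"), 5)
print(short)      # [1, 2, 3, 0, 0]
print(unknown)    # [1, 2, 3, 0, 0]  ("today" is unknown, so it looks like padding)
print(too_long)   # [2, 3, 0, 0, 0]  (the start of the question is cut off)
```

In this sketch an unknown word is indistinguishable from padding, and a question longer than the padded length reaches the model without its opening words.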

In [18]:
# using the keras function pad_sequences, we pad with the default value of 0 up to the max length of any q&a

# pad questions to max length
padded_q_docs = pad_sequences(q_as_array, maxlen=max_length, padding='post')
print('Padded questions array:\n', padded_q_docs[:10])

# pad answers to max length
padded_a_docs = pad_sequences(a_as_array, maxlen=max_length, padding='post')
print('\nPadded answers array:\n', padded_a_docs[:10])

Padded questions array:
 [[ 82   0   0   0   0   0   0   0   0   0   0   0]
 [ 37   0   0   0   0   0   0   0   0   0   0   0]
 [ 22   0   0   0   0   0   0   0   0   0   0   0]
 [ 52 100  61   0   0   0   0   0   0   0   0   0]
 [ 52  54   6  66   0   0   0   0   0   0   0   0]
 [ 77   0   0   0   0   0   0   0   0   0   0   0]
 [ 84   0   0   0   0   0   0   0   0   0   0   0]
 [ 98 111  54  60   8   0   0   0   0   0   0   0]
 [103   0   0   0   0   0   0   0   0   0   0   0]
 [ 88   0   0   0   0   0   0   0   0   0   0   0]]

Padded answers array:
 [[ 22   0   0   0   0   0   0   0   0   0   0   0]
 [ 82   0   0   0   0   0   0   0   0   0   0   0]
 [ 37   0   0   0   0   0   0   0   0   0   0   0]
 [ 77  61   0   0   0   0   0   0   0   0   0   0]
 [ 84   0   0   0   0   0   0   0   0   0   0   0]
 [ 80 132   0   0   0   0   0   0   0   0   0   0]
 [ 26  54  77 117  39   0   0   0   0   0   0   0]
 [120   0   0   0   0   0   0   0   0   0   0   0]
 [103   0   0   0   0   0   0   

In [19]:
# define model
model = Sequential()
model.add(LSTM(150, input_shape=(max_length, n_features), return_sequences=True))
model.add(AttentionDecoder(150, n_features))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 12, 150)           172800    
_________________________________________________________________
AttentionDecoder (AttentionD (None, 12, 137)           324906    
Total params: 497,706
Trainable params: 497,706
Non-trainable params: 0
_________________________________________________________________
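
The parameter count reported for the LSTM layer can be checked by hand. The sketch below uses the standard count for an LSTM layer with biases (four gates, each with input weights, recurrent weights and a bias vector); the function name is a hypothetical helper, and the `AttentionDecoder` count is taken from the summary rather than derived.

```python
def lstm_param_count(input_dim, units):
    # four gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

n_features = 137   # vocabulary size reported above
units = 150        # LSTM width used in the model definition

lstm_params = lstm_param_count(n_features, units)
print(lstm_params)             # 172800
print(lstm_params + 324906)    # 497706, the total in the summary
```

Because the input dimension equals the vocabulary size, every extra word in the CSV adds 4 × 150 = 600 weights to this layer alone.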


## The Model Explained:
"The Encoding/decoding model of communication was first developed by cultural studies scholar Stuart Hall in 1973. Titled 'Encoding and Decoding in the Television Discourse', Hall's essay offers a theoretical approach of how media messages are produced, disseminated, and interpreted.[1] As an important member of the Birmingham School of Cultural Studies, Hall had a major influence on media studies. His model claims that television and other media audiences are presented with messages that are decoded, or interpreted in different ways depending on an individual's cultural background, economic standing, and personal experiences. In contrast to other media theories that disempower audiences, Hall proposed that audience members can play an active role in decoding messages as they rely on their own social contexts, and might be capable of changing messages themselves through collective action. In simpler terms, encoding/decoding is the translation of a message that is easily understood. When you decode a message, you extract the meaning of that message in ways that make sense to you. Decoding has both verbal and non-verbal forms of communication: Decoding behavior without using words means observing body language and its associated emotions. For example, some body language signs for when someone is upset, angry, or stressed would be a use of excessive hand/arm movements, red in the face, crying, and even sometimes silence. Sometimes when someone is trying to get a message across to someone, the message can be interpreted differently from person to person. Decoding is all about the understanding of what someone already knows, based on the information given throughout the message being received. Whether there is a large audience or exchanging a message to one person, decoding is the process of obtaining, absorbing, understanding, and sometimes using the information that was given throughout a verbal or non-verbal message." 
~ Wikipedia (https://en.wikipedia.org/wiki/Encoding/decoding_model_of_communication)

[![chatbot](img/seq2seq.png)](img/seq2seq.png)

In [20]:
# train for roughly 40% of n_features passes over the data (137 // 10 * 4 = 52 passes here)
for a in tqdm(range(0, n_features//10*4)):
    for n in range(0, len(padded_q_docs)):
        # transform xy
        X,y = transform_xy(padded_q_docs[n], padded_a_docs[n], n_features)
        
        # fit model for one epoch on this sequence
        model.fit(X, y, epochs=1, verbose=0)

100%|██████████| 52/52 [00:56<00:00,  1.09s/it]


# It took Monty about a minute to learn...

[![chatbot](img/study.jpg)](img/study.jpg)

In [21]:
# print 3 sets of questions, expected response and predicted response
for n in range(12, 15):
    X,y = transform_xy(padded_q_docs[n], padded_a_docs[n], n_features)
    yhat = model.predict(X, verbose=0)
    print('Set #{}'.format(n))
    print('Question Array:', one_hot_decode(X[0]),'\nQuestion :', array_to_string(one_hot_decode(X[0]), t_df), '\n')
    print('Expected Response Array:', one_hot_decode(y[0]), '\nExpected Response:', array_to_string(one_hot_decode(y[0]), t_df), '\n')
    print('Predicted Response Array:', one_hot_decode(yhat[0]), '\nPredicted Response:', array_to_string(one_hot_decode(yhat[0]), t_df), '\n')

# print accuracy of model
total, correct = len(padded_q_docs), 0
for n in range(total):
    X,y = transform_xy(padded_q_docs[n], padded_a_docs[n], n_features)
    yhat = model.predict(X, verbose=0)
    if array_equal(one_hot_decode(y[0]), one_hot_decode(yhat[0])):
        correct += 1
print('Total Training Accuracy: %.2f%%' % (float(correct)/float(total)*100.0))

Set #12
Question Array: [98, 54, 27, 108, 0, 0, 0, 0, 0, 0, 0, 0] 
Question : what is your name                 

Expected Response Array: [64, 96, 79, 61, 118, 61, 100, 58, 45, 0, 0, 0] 
Expected Response: i cannot tell you because you are a stranger       

Predicted Response Array: [64, 96, 79, 61, 118, 61, 100, 58, 45, 0, 0, 0] 
Predicted Response: i cannot tell you because you are a stranger       

Set #13
Question Array: [52, 102, 100, 61, 0, 0, 0, 0, 0, 0, 0, 0] 
Question : how old are you                 

Expected Response Array: [26, 54, 106, 2, 61, 87, 58, 129, 0, 0, 0, 0] 
Expected Response: that is not something you ask a lady         

Predicted Response Array: [26, 54, 106, 2, 61, 87, 58, 129, 0, 0, 0, 0] 
Predicted Response: that is not something you ask a lady         

Set #14
Question Array: [98, 25, 61, 25, 0, 0, 0, 0, 0, 0, 0, 0] 
Question : what do you do                 

Expected Response Array: [17, 119, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
Expected Response: profes

In [22]:
test_df = pd.read_csv('test.csv')
test_q_list, test_q_as_array = word_to_array(test_df.question, t_df)
test_a_list, test_a_as_array = word_to_array(test_df.answer, t_df)

test_padded_q_docs = pad_sequences(test_q_as_array, maxlen=max_length, padding='post')
test_padded_a_docs = pad_sequences(test_a_as_array, maxlen=max_length, padding='post')

# print every test question with its expected and predicted response
for n in range(len(test_padded_q_docs)):
    X,y = transform_xy(test_padded_q_docs[n], test_padded_a_docs[n], n_features)
    yhat = model.predict(X, verbose=0)
    print('Set #{}'.format(n))
    print('Question Array:', one_hot_decode(X[0]), '\nQuestion :', array_to_string(one_hot_decode(X[0]), t_df), '\n')
    print('Expected Response Array:', one_hot_decode(y[0]), '\nExpected Response:', array_to_string(one_hot_decode(y[0]), t_df), '\n')
    print('Predicted Response Array:', one_hot_decode(yhat[0]), '\nPredicted Response:', array_to_string(one_hot_decode(yhat[0]), t_df), '\n')

# print accuracy of model
total, correct = len(test_padded_q_docs), 0
for n in range(total):
    X,y = transform_xy(test_padded_q_docs[n], test_padded_a_docs[n], n_features)
    yhat = model.predict(X, verbose=0)
    if array_equal(one_hot_decode(y[0]), one_hot_decode(yhat[0])):
        correct += 1
print('Total Test Accuracy: %.2f%%' % (float(correct)/float(total)*100.0), '\n') 

Set #0
Question Array: [82, 91, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
Question : hello there                     

Expected Response Array: [22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
Expected Response: hi                       

Predicted Response Array: [22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
Predicted Response: hi                       

Set #1
Question Array: [52, 54, 6, 66, 0, 0, 0, 0, 0, 0, 0, 0] 
Question : how is it going                 

Expected Response Array: [84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
Expected Response: great                       

Predicted Response Array: [84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
Predicted Response: great                       

Set #2
Question Array: [52, 100, 61, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
Question : how are you                   

Expected Response Array: [77, 61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
Expected Response: good you                     

Predicted Response Array: [77, 61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
Predicted Response: good you                
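
The accuracy loops in the two cells above count a prediction as correct only when the entire decoded sequence, padding included, matches the expected one. The sketch below restates that metric on toy values and contrasts it with a per-position score; both function names and the sample arrays are assumptions made for illustration.

```python
import numpy as np

def exact_match_accuracy(expected, predicted):
    # a pair counts as correct only if every position matches
    hits = sum(np.array_equal(e, p) for e, p in zip(expected, predicted))
    return 100.0 * hits / len(expected)

def token_accuracy(expected, predicted):
    # share of individual positions that match, padding included
    return 100.0 * float(np.mean(np.array(expected) == np.array(predicted)))

expected  = [[22, 0, 0], [77, 61, 0], [84, 0, 0]]
predicted = [[22, 0, 0], [77, 0, 0], [84, 0, 0]]   # toy values, second answer is one word off

print('Exact match: %.2f%%' % exact_match_accuracy(expected, predicted))   # 66.67%
print('Per token:   %.2f%%' % token_accuracy(expected, predicted))         # 88.89%
```

Exact match is the stricter of the two: a single wrong word makes the whole response count as a miss.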

In [23]:
# create test sentences, most of which are not in the list of questions and answers
sent0 = "hello"
sent1 = "how goes it?"
sent2 = "what do you do for work?"
sent3 = "can you tell me your name?"
sent4 = "you're a mean robot"
sent5 = "bye"

# iterate over the sentences directly instead of building names for eval()
for sent in (sent0, sent1, sent2, sent3, sent4, sent5):
    print('User: ', sent,
          '\nMonty: ', get_response(sent, t_df, max_length, n_features, model))

User:  hello 
Monty:  hi                      
User:  how goes it? 
Monty:  great                      
User:  what do you do for work? 
Monty:  talking to interesting people someone thats not you        
User:  can you tell me your name? 
Monty:  i cannot tell you because you are a stranger      
User:  you're a mean robot 
Monty:  hahahahaha                      
User:  bye 
Monty:  goodbye                      


[![chatbot](img/thankyou.jpg)](img/thankyou.jpg)
# ... for chatting with Monty
<br/><br/><br/>
# Works Cited
Morgan, Blake. Forbes. 03 21, 2017. https://www.forbes.com/sites/blakemorgan/2017/03/21/how-chatbots-will-transform-customer-experience-an-infographic/#646faa017fb4 (accessed 03 22, 2018).

Nielsen, Jakob. Nielsen Norman Group. 09 12, 2011. https://www.nngroup.com/articles/how-long-do-users-stay-on-web-pages/ (accessed 03 22, 2018).