In [15]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
import spacy
import json
import string
import re
from nltk.tokenize import word_tokenize
from gensim.corpora import Dictionary
from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Embedding, LSTM, Dense, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Introduction
Chatbots are nearly ubiquitous now, performing tasks like answering customer service questions on websites. Two basic tasks that a chatbot must perform are intent classification and named entity recognition (NER). Consider an example. Suppose a user visits a clothing retailer's website and asks the chatbot, "Do you have any red dresses?" First, the bot must figure out what the user wants to do, that is, their intent. Here the intent is something like ItemLookup. But what does the user want to look up? The bot must parse the text to pull out the relevant details (the item, dresses, and its color, red). This is named entity recognition.

In this notebook, I build a simple stateless chatbot that performs intent classification and basic named entity recognition. I use an LSTM to classify intents and spaCy to perform named entity recognition. Note that there are MANY limitations on this notebook, largely due to the small corpus. For example, I do not split the data into training and test sets, and the model is unlikely to generalize well. These are known limitations, not methodological oversights.

# Loading the Data
The data is stored in JSON format. It consists of intents, each with associated input text and responses. At a high level, the LSTM is trained on the input text to predict intents, and each predicted intent is then used to randomly select an appropriate response.
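To make the layout concrete, here is a minimal sketch using a tiny, made-up sample. Only the field names (`intents`, `intent`, `text`, `responses`) come from the notebook's `intent.json`; the sample content and the helper names `load_intents` and `load_responses` are hypothetical. It shows how the nested structure flattens into (input text, intent) training pairs plus a response table keyed by intent.

```python
import json

# Hypothetical miniature version of intent.json, using the same field names.
SAMPLE_JSON = """
{
  "intents": [
    {"intent": "Greeting",
     "text": ["Hi", "Hello"],
     "responses": ["Hello human"]},
    {"intent": "GoodBye",
     "text": ["Bye", "See you later"],
     "responses": ["See you later", "Bye! Come back again soon."]}
  ]
}
"""

def load_intents(raw):
    """Parse the JSON text and return (input_text, intent) training pairs."""
    data = json.loads(raw)
    pairs = []
    for intent in data["intents"]:
        for text in intent["text"]:
            pairs.append((text, intent["intent"]))
    return pairs

def load_responses(raw):
    """Map each intent name to its list of candidate responses."""
    data = json.loads(raw)
    return {intent["intent"]: list(intent["responses"]) for intent in data["intents"]}

pairs = load_intents(SAMPLE_JSON)
responses = load_responses(SAMPLE_JSON)
print(pairs)
```

When reading from disk instead, `with open('intent.json') as f: data = json.load(f)` closes the file automatically.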

In [16]:
with open('intent.json', 'r') as f:  # The context manager closes the file after loading
    intents = json.load(f)

Examine all of the intents.

In [17]:
for intent in intents['intents']:
    print(intent['intent'])

Greeting
GreetingResponse
CourtesyGreeting
CourtesyGreetingResponse
CurrentHumanQuery
NameQuery
RealNameQuery
TimeQuery
Thanks
NotTalking2U
UnderstandQuery
Shutup
Swearing
GoodBye
CourtesyGoodBye
WhoAmI
Clever
Gossip
Jokes
PodBayDoor
PodBayDoorResponse
SelfAware


I want to know the structure of each intent in the JSON so that I can iterate through it to build my inputs and targets. To do so, I examine a single intent (the one at index 10).

In [18]:
intents['intents'][10]

{'intent': 'UnderstandQuery',
 'text': ['Do you understand what I am saying',
  'Do you understand me',
  'Do you know what I am saying',
  'Do you get me',
  'Comprendo',
  'Know what I mean'],
 'responses': ['Well I would not be a very clever AI if I did not would I?',
  'I read you loud and clear!',
  'I do in deed!'],
 'extension': {'function': '', 'entities': False, 'responses': []},
 'context': {'in': '', 'out': '', 'clear': False},
 'entities': []}

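I want to do NER myself (with spaCy) rather than rely on the JSON's `entities` field, so I only need the intents and their texts for training. I will also need the responses later, so I store them in a lookup table keyed by intent.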

In [20]:
X = []
y = []
response_lookup = {}

for intent in intents['intents']:
    for input_text in intent['text']:
        # Lowercase and remove punctuation
        input_no_punct = ''.join([char.lower() for char in input_text if char not in string.punctuation])
        # Replace "hi" (the first word in the corpus) with an UNK token, so the model learns
        # an embedding it can reuse for out-of-vocabulary (OOV) words at inference time
        input_with_unk = re.sub(r'\bhi\b', 'UNK', input_no_punct)
        input_tokens = word_tokenize(input_with_unk)  # Tokenize

        X.append(input_tokens)
        y.append(intent['intent'])

    # Store the candidate responses for this intent
    response_lookup[intent['intent']] = list(intent['responses'])
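The same preprocessing can be sketched more compactly with `str.translate`, which deletes all punctuation in one pass. This is an illustrative alternative, not the notebook's code: the function name `preprocess` is hypothetical, and it uses plain `str.split()` in place of NLTK's `word_tokenize`, which may split some tokens differently. The `\b` word boundaries ensure that only the standalone word "hi" becomes UNK, not the "hi" inside words like "this".

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def preprocess(text, unk_words=("hi",)):
    """Lowercase, strip punctuation, map selected whole words to UNK, split on whitespace."""
    cleaned = text.lower().translate(_STRIP_PUNCT)
    for word in unk_words:
        # \b...\b matches the word only when it stands alone
        cleaned = re.sub(rf"\b{re.escape(word)}\b", "UNK", cleaned)
    # str.split() stands in for nltk's word_tokenize in this sketch
    return cleaned.split()

print(preprocess("Hi, how are you?"))
```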

In [93]:
len(X)

143

Note that there are very few input strings. If you examine the JSON, you'll see that they are disproportionately associated with intents like Jokes rather than Greeting. The small number of input strings is why I do not use a train/test/validation split, despite the obvious drawbacks.
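The imbalance can be checked directly by counting examples per intent with `collections.Counter`. The label list below is made up for illustration; calling the hypothetical helper `intent_counts(y)` on the notebook's `y` would show the actual distribution.

```python
from collections import Counter

def intent_counts(labels):
    """Return (intent, number_of_examples) pairs, most frequent first."""
    return Counter(labels).most_common()

# Hypothetical labels; in the notebook, pass the real list `y` instead.
labels = ["Jokes", "Jokes", "Jokes", "Greeting", "GoodBye", "GoodBye"]
for intent, n in intent_counts(labels):
    print(f"{intent:<10} {n}")
```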

# Vectorize Text and Labels
The LSTM takes the input text not as strings but as lists of integers, where each integer is the index of a word in a vocabulary.
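Conceptually, vectorization means building a word-to-index mapping and then padding every sequence to the same length. The sketch below does this in plain Python. It is not gensim's implementation: `Dictionary` may number tokens in a different order, and the helper names are hypothetical. One deliberate design choice here is starting ids at 1 so that 0 is reserved for padding and never collides with a real word.

```python
def build_vocab(docs):
    """Assign each distinct token an id, in order of first appearance.

    Ids start at 1 so that 0 can be reserved for padding.
    """
    token2id = {}
    for doc in docs:
        for tok in doc:
            if tok not in token2id:
                token2id[tok] = len(token2id) + 1
    return token2id

def vectorize(docs, token2id, pad_value=0):
    """Convert token lists to equal-length id lists, right-padding with pad_value."""
    pad_length = max(len(doc) for doc in docs)
    rows = []
    for doc in docs:
        ids = [token2id[tok] for tok in doc]
        rows.append(ids + [pad_value] * (pad_length - len(ids)))
    return rows

docs = [["UNK", "how", "are", "you"], ["bye"]]
token2id = build_vocab(docs)
print(vectorize(docs, token2id))
```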

In [21]:
# The dictionary maps words to integers
input_dictionary = Dictionary(documents=X)

In [22]:
# Convert lists of tokens to lists of indices and pad them to a common length
pad_length = max(len(sent) for sent in X)

X_vecs = np.zeros((len(X), pad_length), dtype='int32')  # Shorter sequences are right-padded with 0s

for i, sent in enumerate(X):
    vectorized_sent = input_dictionary.doc2idx(sent)
    X_vecs[i, :len(vectorized_sent)] = vectorized_sent

Next, I one-hot encode the labels.
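One-hot encoding turns each label into a vector with a single 1 in the column for its class. A minimal pure-Python sketch (the function name `one_hot` is hypothetical) is below. Like `LabelBinarizer`, it orders the columns by sorted class name. One difference worth knowing: with exactly two classes, `LabelBinarizer` returns a single column, whereas this sketch always returns one column per class.

```python
def one_hot(labels):
    """One-hot encode string labels; columns follow sorted class order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1  # mark the column for this label's class
        rows.append(row)
    return classes, rows

classes, Y = one_hot(["Greeting", "GoodBye", "Greeting"])
print(classes)
print(Y)
```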

In [23]:
enc = LabelBinarizer()
y_enc = enc.fit_transform(y)

# Train model
The model is simple: trainable embeddings, followed by an LSTM, followed by two dense layers. The first dense layer uses a ReLU activation and the second uses a softmax for classification.

In [25]:
epochs = 100
vocab_size = len(input_dictionary)
embed_dim = 100
units = 256
output_size = y_enc.shape[1]

In [26]:
inputs = Input(shape=(X_vecs.shape[1],))  # shape must be a tuple: (sequence length,)
x = Embedding(vocab_size, embed_dim)(inputs)
x = LSTM(units)(x)
x = Dense(units, activation='relu')(x)
outputs = Dense(output_size, activation='softmax')(x)

model = Model(inputs, outputs)
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 9)]               0         
                                                                 
 embedding (Embedding)       (None, 9, 100)            11700     
                                                                 
 lstm (LSTM)                 (None, 256)               365568    
                                                                 
 dense (Dense)               (None, 256)               65792     
                                                                 
 dense_1 (Dense)             (None, 22)                5654      
                                                                 
Total params: 448,714
Trainable params: 448,714
Non-trainable params: 0
_________________________________________________________________
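The parameter counts in the summary can be reproduced by hand, which is a useful check on the architecture. The sketch assumes the standard Keras layer formulas: an LSTM has four gates, each with input weights, recurrent weights and a bias. The vocabulary size of 117 is inferred from the summary (11,700 embedding parameters / 100 dimensions), not printed anywhere in the notebook.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-length vector per vocabulary entry
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output unit
    return input_dim * output_dim + output_dim

vocab_size, embed_dim, units, n_classes = 117, 100, 256, 22  # vocab_size inferred from the summary

layers = {
    "embedding": embedding_params(vocab_size, embed_dim),
    "lstm": lstm_params(embed_dim, units),
    "dense": dense_params(units, units),
    "dense_1": dense_params(units, n_classes),
}
total = sum(layers.values())
print(layers, total)
```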


In [27]:
opt = Adam()
# Stop when the training loss fails to improve by at least 0.001 for 5 consecutive epochs
es = EarlyStopping(monitor='loss', patience=5, min_delta=0.001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_vecs, y_enc, epochs=epochs, callbacks=[es])

Epoch 1/100
Epoch 2/100
...
Epoch 47/100
Epoch 48/100


<keras.callbacks.History at 0x20d03fcc910>
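Training ended at epoch 48 of 100 because of the `EarlyStopping` callback. Its rule can be sketched as follows. This is a simplified illustration, not Keras's actual code, which supports further options such as `baseline` and `restore_best_weights`. The function name and the loss values in the usage example are hypothetical. Because `monitor='loss'` tracks the training loss, stopping says nothing about how well the model generalizes.

```python
import math

def early_stopping_epoch(losses, patience=5, min_delta=0.001):
    """Return the 1-based epoch at which training stops, or None if it never stops early.

    An epoch counts as an improvement only if its loss beats the best loss so far
    by more than min_delta; training stops after `patience` consecutive epochs
    without improvement.
    """
    best = math.inf
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical loss curve that plateaus after epoch 3
print(early_stopping_epoch([1.0, 0.5, 0.2, 0.1999, 0.1998, 0.1997, 0.1996, 0.1995]))
```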

# Predictions, NER and Chatbot
The following functions process input text, predict an intent from the processed input, perform NER when necessary, and run the chatbot.

In [94]:
# Need to know the vocab to decide whether to replace tokens with the UNK token.
# A set makes the membership test fast.
vocab = set(input_dictionary.values())

The function below processes the input text the same way as the text used to train the model, with one exception: words that are not in the training corpus are replaced with the unknown token.
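The inference-time steps can be sketched without any library: replace OOV tokens with UNK, map tokens to ids, then truncate or pad to the fixed length the model expects. Truncation matters because the `Input` layer has a fixed sequence length, so a message longer than the longest training example would not fit. The vocabulary below is hypothetical, the function name `encode_message` is illustrative, and `str.split()` stands in for `word_tokenize`.

```python
import string

_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def encode_message(message, token2id, pad_length, unk="UNK", pad_id=0):
    """Map a raw message to a fixed-length list of token ids (assumes unk is in token2id)."""
    tokens = message.lower().translate(_STRIP_PUNCT).split()
    tokens = [tok if tok in token2id else unk for tok in tokens]  # OOV -> UNK
    ids = [token2id[tok] for tok in tokens][:pad_length]          # truncate long messages
    return ids + [pad_id] * (pad_length - len(ids))               # right-pad short ones

token2id = {"UNK": 1, "how": 2, "are": 3, "you": 4}  # hypothetical vocabulary
print(encode_message("Hi, how are you?", token2id, pad_length=6))
```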

In [59]:
pad_token = 0  # Change in future. Not ideal, since 0 is also the index of a real vocabulary token

def process_message(message, input_dictionary=input_dictionary, pad_length=pad_length):
    message_no_punct = ''.join([char.lower() for char in message if char not in string.punctuation])  # Lowercase and remove punctuation
    message_tokens = word_tokenize(message_no_punct)  # Tokenize
    message_unk_replace = [tok if tok in vocab else 'UNK' for tok in message_tokens]  # Replace OOV tokens with UNK

    # Truncate messages longer than the training inputs so the shape matches the model's Input layer
    message_vectorized = input_dictionary.doc2idx(message_unk_replace)[:pad_length]

    # Right-pad shorter messages, as during training
    while len(message_vectorized) < pad_length:
        message_vectorized.append(pad_token)

    return np.array([message_vectorized])

Next, I write a function that takes in a message and returns a predicted intent. This is simple: the processed message is fed into the model, which returns a probability distribution over intents. The argmax of the distribution is used to index the list of classes.
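The argmax lookup amounts to the following pure-Python sketch (the name `predict_label` is hypothetical). One subtlety in the notebook's version: `model.predict` returns an array of shape `(1, number of intents)`, and `np.argmax` without an `axis` argument returns an index into the flattened array. That index equals the class index only because the batch contains a single row.

```python
def predict_label(probabilities, classes):
    """Return the class whose predicted probability is highest (first one on ties)."""
    best_idx = max(range(len(probabilities)), key=probabilities.__getitem__)
    return classes[best_idx]

print(predict_label([0.1, 0.7, 0.2], ["GoodBye", "Greeting", "Jokes"]))
```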

In [80]:
intent_classes = enc.classes_  # Class names in column order of y_enc (avoids overwriting the `intents` JSON)
test_greetresponse = 'my user is patrick'
test_greet = 'hi, how are you'
test_bye = 'bye'

def predict_intent(message):
    processed_message = process_message(message)
    pred_dist = model.predict(processed_message)  # Shape: (1, number of intents)
    pred_idx = np.argmax(pred_dist)
    pred_intent = intent_classes[pred_idx]
    return pred_intent

print(predict_intent(test_greetresponse))
print(predict_intent(test_greet))
print(predict_intent(test_bye))

GreetingResponse
CourtesyGreeting
GoodBye


The next step is to take a predicted intent and have the bot respond to it. Usually this means printing a random response associated with the intent. There are two exceptions. First, when a user gives the bot their user name, I would like the bot to respond using the name. Second, when a user says goodbye, the bot should end the conversation.

The first is a named entity recognition task: when a user gives their name, I extract it with spaCy so that the bot can repeat it back. To handle the second, the function also returns a boolean indicating whether the conversation should continue.
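The response logic can be separated from spaCy and tested on its own. In this sketch, entities are passed as `(text, label)` pairs, which is what `[(ent.text, ent.label_) for ent in doc.ents]` would produce from a spaCy `Doc`. The function name `fill_response` and the fallback name `"human"` (used when no PERSON entity is found) are assumptions, not part of the original code. The `<HUMAN>` placeholder and the intent lists come from the notebook.

```python
NER_INTENTS = {"GreetingResponse", "CourtesyGreetingResponse"}
QUIT_INTENTS = {"GoodBye", "CourtesyGoodBye"}

def fill_response(intent, response, entities, fallback="human"):
    """Return (response_text, to_continue) for a predicted intent.

    entities: list of (text, label) pairs, e.g. from a spaCy Doc's ents.
    """
    if intent in NER_INTENTS:
        names = [text for text, label in entities if label == "PERSON"]
        name = names[0] if names else fallback  # hypothetical fallback if no name is found
        return response.replace("<HUMAN>", name), True
    # Any quit intent ends the conversation; everything else continues it
    return response, intent not in QUIT_INTENTS

print(fill_response("GreetingResponse", "Hello <HUMAN>!", [("patrick", "PERSON")]))
```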

In [86]:
nlp = spacy.load('en_core_web_md')

def predict_and_respond(message):
    to_continue = True
    ner_intents = ['GreetingResponse', 'CourtesyGreetingResponse']
    quit_intents = ['GoodBye', 'CourtesyGoodBye']
    pred_intent = predict_intent(message)

    response = np.random.choice(response_lookup[pred_intent])

    # Find entities, if necessary
    if pred_intent in ner_intents:
        doc = nlp(message)
        user = 'human'  # Fallback so `user` is always defined if spaCy finds no PERSON entity
        for ent in doc.ents:
            if ent.label_ == 'PERSON':
                user = ent.text
                break  # Use the first name found
        response = response.replace('<HUMAN>', user)
    # Set to_continue to False if necessary
    elif pred_intent in quit_intents:
        to_continue = False

    return response, to_continue

def test_response(message):
    '''
    A function to test the prediction and response function concisely.
    '''
    print(f'Input message: {message}')
    response, to_continue = predict_and_respond(message)
    print(f'Predicted response: {response}')
    print(f'Whether to continue chatting: {to_continue}')

In [87]:
test_response(test_greet)
test_response(test_greetresponse)
test_response(test_bye)

Input message: hi, how are you
Predicted response: Hi, I am great, how are you? Please tell me your GeniSys user
Whether to continue chatting: True
Input message: my user is patrick
Predicted response: Cool! Hello patrick, what can I do for you?
Whether to continue chatting: True
Input message: bye
Predicted response: See you later
Whether to continue chatting: False


Finally, I create the chatbot, which is simply a while loop. As long as to_continue is True, the user can enter text. When the user says goodbye, to_continue is set to False and the loop ends after printing the farewell.
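Because `chat()` reads from `input()`, it is awkward to test. One option, sketched here, is to drive the same loop from any iterable of messages and a response function that returns `(response, to_continue)`. The names `run_chat` and `toy_respond` are hypothetical; in the notebook, `predict_and_respond` could be passed in place of `toy_respond`.

```python
def run_chat(messages, respond):
    """Feed messages to respond() until it signals that the conversation is over.

    respond(message) must return (response_text, to_continue).
    Returns the transcript as a list of (message, response) pairs.
    """
    transcript = []
    for message in messages:
        response, to_continue = respond(message)
        transcript.append((message, response))
        if not to_continue:
            break
    return transcript

def toy_respond(message):
    # Hypothetical stand-in for predict_and_respond: quits on "bye"
    if message == "bye":
        return "Bye! Come back again soon.", False
    return "Tell me more.", True

for msg, reply in run_chat(["hi", "bye", "still there?"], toy_respond):
    print(f"> {msg}\n{reply}")
```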

In [91]:
def chat():
    to_continue = True
    while to_continue:
        message = input('> ')
        print(message)
        response, cont = predict_and_respond(message)
        
        print(response)
        to_continue = cont
chat()

hi
Hola human, please tell me your GeniSys user
my user is patrick
Good! Hi patrick, how can I help you?
tell me a joke
A woman goes to the doctor and says, 'Doctor, my husband limps because his left leg is an inch shorter than his right leg. What would you do in his case?' 'Probably limp, too', says the doc.
bye
Bye! Come back again soon.


# Conclusion
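Subject to the limitations discussed in the introduction, the bot works: it classifies intents with the LSTM, extracts the user's name with spaCy, and ends the conversation when the user says goodbye.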