<h1>Example of using TensorFlow and a DNN to create an intent-based chatbot</h1>

<h3>Import NLP libraries and set stemmer to Lancaster Stemmer</h3>

In [1]:
import nltk
from nltk.stem.lancaster import LancasterStemmer
stemmer = LancasterStemmer()

<h3>Import libraries for Tensorflow and DNN</h3>

In [2]:
import numpy as np
import tflearn
import tensorflow as tf
import random

curses is not supported on this machine (please install/reinstall curses for an optimal experience)


<h3>Import json library and read data into memory</h3>

In [3]:
import json
with open('intents.json') as json_data:
    intents = json.load(json_data)
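
The contents of intents.json are not shown in this notebook. Judging only from the keys the later cells read ('intents', 'tag', 'patterns', 'responses'), the file presumably has a shape like the sketch below. The two tags and the two 'payments' responses appear in the outputs further down; every pattern string and the greeting response are invented placeholders.

```python
import json

# Hypothetical sample: the real intents.json is not shown in this notebook.
sample = {
    "intents": [
        {
            "tag": "greeting",
            "patterns": ["Hi", "Hello", "Is anyone there?"],   # assumed
            "responses": ["Hello, thanks for visiting"],        # assumed
        },
        {
            "tag": "payments",
            "patterns": ["Do you take credit cards?"],          # assumed
            "responses": [
                "We accept VISA, Mastercard and AMEX",
                "We accept most major credit cards",
            ],
        },
    ]
}

# Round-trip through JSON text, as json.load would do with a file.
loaded = json.loads(json.dumps(sample))

# Every intent needs the three keys that the later cells rely on.
required = {"tag", "patterns", "responses"}
for intent in loaded["intents"]:
    missing = required - intent.keys()
    assert not missing, "intent is missing keys: %s" % sorted(missing)

tags = [intent["tag"] for intent in loaded["intents"]]
print(tags)
```

A check like this, run once after loading, can surface a malformed intent before it causes a KeyError in a later cell.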

<h3>Create a list of tags and, for each pattern, a document pairing its tokens with its tag</h3>

In [4]:
words = []
classes = []
documents = []
ignore_words = ['?']
for intent in intents['intents']:
    for pattern in intent['patterns']:
        w = nltk.word_tokenize(pattern)
        words.extend(w)
        documents.append((w, intent['tag']))
        if intent['tag'] not in classes:
            classes.append(intent['tag'])

<h3>Lowercase and stem each word in 'words', dropping the ignore words. Deduplicate and sort the words and classes.</h3>

In [5]:
words = [stemmer.stem(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))

classes = sorted(list(set(classes)))

print (len(documents), "documents")
print (len(classes), "classes", classes)
print (len(words), "unique stemmed words", words)

27 documents
9 classes ['goodbye', 'greeting', 'hours', 'mopeds', 'opentoday', 'payments', 'rental', 'thanks', 'today']
48 unique stemmed words ["'d", "'s", 'a', 'acceiv', 'anyon', 'ar', 'bye', 'can', 'card', 'cash', 'credit', 'day', 'do', 'doe', 'good', 'goodby', 'hav', 'hello', 'help', 'hi', 'hour', 'how', 'i', 'is', 'kind', 'lat', 'lik', 'mastercard', 'mop', 'of', 'on', 'op', 'rent', 'see', 'tak', 'thank', 'that', 'ther', 'thi', 'to', 'today', 'we', 'what', 'when', 'which', 'work', 'yo', 'you']


<h3>Create a bag of stemmed words for each sentence and a one-hot vector for its tag, replacing words with a series of 0s and 1s</h3>

In [6]:
training = []
output = []
output_empty = [0] * len(classes)

for doc in documents:
    bag = []
    pattern_words = doc[0]
    pattern_words = [stemmer.stem(word.lower()) for word in pattern_words]
    for w in words:
        bag.append(1 if w in pattern_words else 0)
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1
    training.append([bag, output_row])
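
To make the encoding concrete, here is a minimal pure-Python sketch of the same idea on a toy vocabulary. The names `vocab`, `labels` and `encode` are hypothetical and the tokens are assumed to be stemmed already; it is an illustration, not the cell above.

```python
# Toy vocabulary and classes (hypothetical, already stemmed and sorted).
vocab = ["card", "cash", "hello", "tak", "you"]
labels = ["greeting", "payments"]

def encode(stemmed_tokens, tag):
    """Return (bag, one_hot) for one tokenized, stemmed pattern."""
    present = set(stemmed_tokens)
    # 1 where the vocabulary word occurs in the pattern, else 0
    bag = [1 if w in present else 0 for w in vocab]
    # one-hot vector with a single 1 at the position of the tag
    one_hot = [0] * len(labels)
    one_hot[labels.index(tag)] = 1
    return bag, one_hot

bag, one_hot = encode(["do", "you", "tak", "cash"], "payments")
print(bag)      # [0, 1, 0, 1, 1]
print(one_hot)  # [0, 1]
```

Tokens that are not in the vocabulary (here "do") simply leave no trace in the bag, and word order is discarded.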

<h3>Create training x and y datasets</h3>

In [7]:
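# dtype=object keeps the [bag, output_row] pairs intact; the two lists differ in length
training = np.array(training, dtype=object)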
train_x = list(training[:,0])
train_y = list(training[:,1])
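
The split into inputs and targets does not strictly need a NumPy array. As an alternative sketch, assuming `training` is still the plain list of [bag, output_row] pairs, `zip` can transpose it directly. The `pairs` data below is a made-up stand-in.

```python
# Hypothetical [bag, output_row] pairs shaped like the `training` list.
pairs = [
    [[0, 1, 0], [1, 0]],
    [[1, 1, 0], [0, 1]],
]

# zip(*pairs) groups all first elements together and all second elements together.
xs, ys = (list(column) for column in zip(*pairs))
print(xs)  # [[0, 1, 0], [1, 1, 0]]
print(ys)  # [[1, 0], [0, 1]]
```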

<h3>Build a DNN with two 8-unit hidden layers, a softmax output layer and a regression (training) layer</h3>

In [8]:
tf.reset_default_graph()
net = tflearn.input_data(shape=[None, len(train_x[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(train_y[0]), activation='softmax')
net = tflearn.regression(net)

Instructions for updating:
keep_dims is deprecated, use keepdims instead
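
The softmax output layer turns the network's raw scores into one probability per tag. The sketch below shows that computation in plain Python, together with categorical cross-entropy for a one-hot target. Cross-entropy is assumed here to be the loss that `tflearn.regression` uses by default, which is worth checking against the TFLearn documentation. The function names and the sample scores are illustrative only.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot):
    """Categorical cross-entropy for a single one-hot target."""
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot) if t)

probs = softmax([2.0, 1.0, 0.1])   # made-up scores for three tags
loss = cross_entropy(probs, [1, 0, 0])
print(probs, loss)
```

The larger the probability assigned to the correct tag, the closer the loss gets to zero, which is the trend the training log reports.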


<h3>Set up TensorBoard logging and train the model for 1000 epochs</h3>

In [9]:
model = tflearn.DNN(net, tensorboard_dir='tflearn_logs')
model.fit(train_x, train_y, n_epoch=1000, batch_size=8, show_metric=True)

Training Step: 3999  | total loss: 0.00778 | time: 0.015s
| Adam | epoch: 1000 | loss: 0.00778 - acc: 1.0000 -- iter: 24/27
Training Step: 4000  | total loss: 0.00819 | time: 0.019s
| Adam | epoch: 1000 | loss: 0.00819 - acc: 1.0000 -- iter: 27/27
--
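
The step count in the log follows from the numbers already shown: 27 documents in batches of 8 give four steps per epoch, the last one smaller. A short sketch of that arithmetic, with variable names chosen here for illustration:

```python
import math

n_documents = 27   # from the "27 documents" output
batch_size = 8     # from model.fit(..., batch_size=8)
n_epoch = 1000     # from model.fit(..., n_epoch=1000)

steps_per_epoch = math.ceil(n_documents / batch_size)
total_steps = steps_per_epoch * n_epoch

# Samples seen after each step of one epoch; the final batch holds the remainder.
progress = [min(step * batch_size, n_documents)
            for step in range(1, steps_per_epoch + 1)]
print(steps_per_epoch, total_steps, progress)  # 4 4000 [8, 16, 24, 27]
```

This matches the final log lines: step 3999 ends at iter 24/27 and step 4000 at iter 27/27.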


<h3>Create function to tokenize and stem a sentence</h3>

In [10]:
def clean_up_sentence(sentence):
    sentence_words = nltk.word_tokenize(sentence)
    sentence_words = [stemmer.stem(word.lower()) for word in sentence_words]
    return sentence_words

<h3>Create function to build a bag-of-words vector for a sentence, ignoring words not in the training vocabulary</h3>

In [11]:
def bow(sentence, words, show_details=False):
    sentence_words = clean_up_sentence(sentence)
    bag = [0]*len(words)
    for s in sentence_words:
        for i,w in enumerate(words):
            if w == s:
                bag[i] = 1
                if show_details:
                    print ("found in bag: %s" % w)
    return np.array(bag)

<h3>Define an error threshold: a tag is only considered if its predicted probability is greater than 0.25</h3>

In [12]:
ERROR_THRESHOLD = 0.25

<h3>Define a function that uses the model to predict a probability for each tag, excluding those not above the threshold and sorting the rest from most to least probable</h3>

In [13]:
def classify(sentence):
    results = model.predict([bow(sentence, words)])[0]
    results = [[i,r] for i,r in enumerate(results) if r>ERROR_THRESHOLD]
    results.sort(key=lambda x: x[1], reverse=True)
    return_list = []
    for r in results:
        return_list.append((classes[r[0]], r[1]))
    return return_list
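
The filtering and sorting step can be tried without a trained model. The sketch below applies the same threshold rule to made-up probabilities; `labels`, `probs` and `rank` are hypothetical stand-ins for `classes`, the output of `model.predict`, and the body of `classify`.

```python
THRESHOLD = 0.25

labels = ["goodbye", "greeting", "payments"]   # stand-in for `classes`
probs = [0.05, 0.30, 0.65]                     # made-up model output

def rank(probs, labels, threshold=THRESHOLD):
    """Keep (label, probability) pairs above the threshold, most probable first."""
    kept = [(labels[i], p) for i, p in enumerate(probs) if p > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

ranked = rank(probs, labels)
print(ranked)  # [('payments', 0.65), ('greeting', 0.3)]
```

If no probability exceeds the threshold the result is an empty list, which callers need to handle.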

<h3>Examples of classify function output</h3>

In [14]:
classify('is your shop going to be open today?')

[('payments', 0.86203825)]

In [15]:
classify('What are your hours today?')

[('opentoday', 0.9971852)]

<h3>Print the results of the classify function and a random response for the best-matching tag</h3>

In [16]:
def response(sentence):
    results = classify(sentence)
    # show the (tag, probability) pairs returned by classify
    print(results)
    # if we have a classification then find the matching intent tag
    if results:
        # loop as long as there are matches to process
        while results:
            for i in intents['intents']:
                # find a tag matching the first result
                if i['tag'] == results[0][0]:
                    # print one random response from the intent, then return a second,
                    # independent random draw, so the printed and returned text can differ
                    print(random.choice(i['responses']))
                    return (random.choice(i['responses']))
            # no intent matched this tag, so try the next result
            results.pop(0)
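
The cell above calls `random.choice` twice, once for the printed text and once for the return value, so the two can differ (compare the printed and returned strings in the first example below). A possible variant that draws once and leaves printing to the caller is sketched here. The `pick_response` name and the inline `intents` data are hypothetical; it takes the classification results as an argument so it can run without the model.

```python
import random

# Hypothetical stand-in for the notebook's `intents` data.
intents = {
    "intents": [
        {
            "tag": "payments",
            "responses": [
                "We accept VISA, Mastercard and AMEX",
                "We accept most major credit cards",
            ],
        }
    ]
}

def pick_response(results, intents, rng=random):
    """Return one random response for the best-ranked tag that has an intent."""
    responses_by_tag = {i["tag"]: i["responses"] for i in intents["intents"]}
    for tag, _probability in results:
        if tag in responses_by_tag:
            # single draw, so whatever the caller prints is what was returned
            return rng.choice(responses_by_tag[tag])
    return None  # nothing classified above the threshold, or no matching intent

reply = pick_response([("payments", 0.98)], intents)
print(reply)
```

Returning `None` when nothing matches lets the caller decide on a fallback message.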

<h3>Examples of response function output</h3>

In [17]:
response('do you take cash?')

[('payments', 0.9801344)]
We accept VISA, Mastercard and AMEX


'We accept most major credit cards'

In [18]:
response('we want to rent a moped')

[('rental', 0.9931359)]
Are you looking to rent today or later this week?


'Are you looking to rent today or later this week?'