# Easy Chat Bot
Let's import some useful tools

```python
import json
from collections import defaultdict
import string
import numpy as np
from tensorflow import keras
from random import randint
```

The data is stored in the file `intents.json`; I'm not sure where I originally found it.

```python
with open('intents.json') as fp:
    contents = json.load(fp)['intents']
```

Let's create a function for parsing a string into separate words

```python
# Translation table that deletes punctuation and digits
translator = str.maketrans('', '', string.punctuation + string.digits)

# Function for parsing: remove punctuation and numbers, map to lowercase and split on whitespace
def parse(sentence):
    return sentence.translate(translator).lower().split()
```
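To see what `parse` does to a typical sentence, here is a self-contained copy of the same translation table and function (it uses `string.digits`, which equals `'0123456789'`) applied to a made-up input:

```python
import string

# Same idea as above: a table that deletes punctuation and digits
translator = str.maketrans('', '', string.punctuation + string.digits)

def parse(sentence):
    """Strip punctuation and digits, lowercase, and split on whitespace."""
    return sentence.translate(translator).lower().split()

print(parse("Hi there! Is it 5 o'clock?"))
# → ['hi', 'there', 'is', 'it', 'oclock']
print(parse("I'll"))
# → ['ill']
```

Note that deleting apostrophes merges contractions, so "I'll" becomes "ill" and collides with the unrelated word "ill". With a small training set such collisions are probably rare, but they are a side effect of this tokenizer.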

```python
unique_words = set()  # set of every word occurring in the training data
```

We'll prepare the training data

```python
tokenized_sentences = []  # for each class: a list of tokenized input sentences
class_responses = []      # for each class: a list of possible responses

for i, text_data in enumerate(contents):
    tokenized_sentences.append([])
    class_responses.append([])

    for sentence in text_data['patterns']:
        tok_sentence = parse(sentence)
        tokenized_sentences[i].append(tok_sentence)
        unique_words.update(tok_sentence)  # remember every word seen in training

    # the i-th list of responses belongs to the i-th class
    class_responses[i].extend(text_data['responses'])
```
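The loop above relies on only three keys of each intent: `tag`, `patterns` and `responses`. Since the origin of `intents.json` is unknown, the sketch below assumes a file of that shape and uses a hypothetical two-intent sample inline to show what `tokenized_sentences`, `class_responses` and `unique_words` end up holding. It rewrites the loop as comprehensions, which should be equivalent:

```python
import json
import string

# Hypothetical miniature intents file; the real 'intents.json' is assumed to
# have the same shape, since the code above only reads these keys.
SAMPLE = """
{"intents": [
  {"tag": "greeting", "patterns": ["Hi", "Hello there"], "responses": ["Hello!"]},
  {"tag": "thanks", "patterns": ["Thanks!", "Thank you"], "responses": ["Any time!"]}
]}
"""

translator = str.maketrans('', '', string.punctuation + string.digits)

def parse(sentence):
    return sentence.translate(translator).lower().split()

contents = json.loads(SAMPLE)['intents']

# One list of tokenized patterns and one list of responses per intent,
# in the same order, so index i always refers to the same class.
tokenized_sentences = [[parse(p) for p in intent['patterns']] for intent in contents]
class_responses = [list(intent['responses']) for intent in contents]
unique_words = {w for sentences in tokenized_sentences for s in sentences for w in s}

print(tokenized_sentences)
# → [[['hi'], ['hello', 'there']], [['thanks'], ['thank', 'you']]]
print(sorted(unique_words))
# → ['hello', 'hi', 'thank', 'thanks', 'there', 'you']
```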

To create a bag-of-words vector, we need to assign an index to each word occurring in the training input data.

```python
unique_word_list = sorted(unique_words)
indexing = defaultdict(lambda: -1)  # unknown words map to -1
for i, word in enumerate(unique_word_list):
    indexing[word] = i
```
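The `defaultdict(lambda: -1)` returns -1 for unseen words, but a subscript lookup also stores the missing key, which silently grows `len(indexing)`. `encode_bow` avoids this by checking membership before indexing. The sketch below, on a hypothetical two-word vocabulary, shows the side effect and a plain-dict alternative:

```python
from collections import defaultdict

indexing = defaultdict(lambda: -1)
for i, word in enumerate(['hello', 'hi']):
    indexing[word] = i

print(len(indexing))          # → 2
print('bye' in indexing)      # membership tests do not insert → False
print(indexing['bye'])        # a subscript lookup returns the default → -1
print(len(indexing))          # ...but it also inserted 'bye' → 3

# A plain dict has no such side effect; dict.get supplies the fallback value.
plain = {word: i for i, word in enumerate(['hello', 'hi'])}
print(plain.get('bye', -1), len(plain))   # → -1 2
```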

```python
# Function for bag-of-words encoding: count how often each known word occurs
def encode_bow(string_list, indexing, unique):
    vector = np.zeros(len(unique))  # one slot per vocabulary word
    for word in string_list:
        if word in unique:  # skip words never seen in training
            vector[indexing[word]] += 1
    return vector
```
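A small worked example of the bag-of-words encoding, using a hypothetical five-word vocabulary. For brevity this sketch drops the third `unique` argument and tests membership against `indexing` directly, which is a plain dict here:

```python
import numpy as np

vocab = ['hello', 'hi', 'thank', 'there', 'you']          # hypothetical vocabulary
indexing = {word: i for i, word in enumerate(vocab)}

def encode_bow(tokens, indexing):
    """Count how often each known word occurs; unknown words are ignored."""
    vector = np.zeros(len(indexing))
    for word in tokens:
        if word in indexing:
            vector[indexing[word]] += 1
    return vector

print(encode_bow(['hi', 'there', 'hi', 'robot'], indexing))
# → [0. 2. 0. 1. 0.]
```

"hi" appears twice, so its slot holds 2; "robot" is outside the vocabulary and contributes nothing.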

Now we only need to vectorize the sentences ...

```python
X, Y = [], []

for i, sentences in enumerate(tokenized_sentences):
    y_vector = np.zeros(len(class_responses))  # one-hot label for class i
    y_vector[i] = 1
    for sentence in sentences:
        X.append(encode_bow(sentence, indexing, unique_words))
        Y.append(y_vector)

# Shuffle inputs and labels with the same permutation so the pairs stay aligned
order = np.random.permutation(len(X))
X, Y = np.array(X)[order], np.array(Y)[order]
```
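The labels are one-hot vectors, and `X` and `Y` are shuffled with the same permutation. The sketch below, on hypothetical toy data, builds the one-hot rows with `np.eye` (an alternative to filling `np.zeros` by hand) and checks that the shared permutation keeps every sample paired with its label:

```python
import numpy as np

# Hypothetical: 2 classes with 3 and 2 training sentences respectively.
sentences_per_class = [3, 2]
n_classes = len(sentences_per_class)

labels = [c for c, n in enumerate(sentences_per_class) for _ in range(n)]  # [0, 0, 0, 1, 1]
Y = np.eye(n_classes)[labels]               # row c of the identity is the one-hot vector for class c
X = np.arange(len(labels)).reshape(-1, 1)   # stand-in features: each sample's original row number

rng = np.random.default_rng(0)
order = rng.permutation(len(X))
X_shuffled, Y_shuffled = X[order], Y[order]

# The same permutation is applied to both arrays, so every row still
# carries its own label after shuffling.
assert (Y_shuffled == Y[X_shuffled[:, 0]]).all()
```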

... initialize a model ...

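```python
model = keras.Sequential()
model.add(keras.Input(shape=(X.shape[1],)))  # one input per vocabulary word
model.add(keras.layers.Dense(200, activation='relu'))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(Y.shape[1], activation='softmax'))  # one output per class

# Time-based decay lr = 0.01 / (1 + 1e-6 * step); newer Keras optimizers reject `decay=`
lr_schedule = keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.01, decay_steps=1, decay_rate=1e-6)
optimizer = keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.6, nesterov=True)

model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
```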
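The optimizer's decay of `10**-6` is a time-based decay of the learning rate. Assuming Keras's time-based formula, lr = lr0 / (1 + decay · step), and the SGD default lr0 = 0.01 (no learning rate is passed explicitly), this sketch shows how slowly the rate shrinks. In recent Keras versions the same schedule can be expressed as `keras.optimizers.schedules.InverseTimeDecay(0.01, decay_steps=1, decay_rate=1e-6)`.

```python
def time_based_lr(step, initial_lr=0.01, decay=1e-6):
    """Learning rate after `step` parameter updates: initial_lr / (1 + decay * step)."""
    return initial_lr / (1 + decay * step)

for step in (0, 10_000, 1_000_000):
    print(step, time_based_lr(step))
```

With `epochs=300` and `batch_size=5`, training makes about 60 updates per training sentence (300 × N/5), so a dataset of a few hundred sentences reaches only tens of thousands of updates and the rate barely moves from 0.01; it halves only after about a million updates.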

... and train it

```python
model.fit(X, Y, epochs=300, batch_size=5, verbose=0)
```


### Output
To choose a response, we first need to categorize the user's input, so our model classifies the input text.

```python
def class_predict(sentence, model, indexing, unique_words):
    vector = encode_bow(parse(sentence), indexing, unique_words)
    prediction = model.predict(vector.reshape(1, -1), verbose=0).squeeze()
    if prediction.max() < 0.2:  # no class is likely enough: treat the input as unrecognized
        return None
    return prediction.argmax()
```
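The rejection rule in `class_predict` looks only at the largest softmax probability. Isolated as a small pure function (the name `pick_class` is hypothetical), it can be checked without a trained model:

```python
import numpy as np

def pick_class(probabilities, threshold=0.2):
    """Return the index of the most likely class, or None if no class reaches the threshold."""
    probabilities = np.asarray(probabilities)
    if probabilities.max() < threshold:
        return None
    return int(probabilities.argmax())

print(pick_class([0.05, 0.85, 0.10]))                     # → 1
print(pick_class([0.15, 0.18, 0.17, 0.16, 0.19, 0.15]))   # → None
```

Because softmax probabilities sum to 1, the largest of K probabilities is at least 1/K, so a threshold of 0.2 can only reject an input when the model has more than five classes.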

For each input class, we'll choose a random sentence from that class's responses.

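```python
def get_response(class_index, class_responses):
    if class_index is None:  # class_predict found no confident class
        return "Sorry, I didn't understand that."  # placeholder fallback reply, not from the dataset
    responses = class_responses[class_index]
    n = len(responses)
    return responses[randint(0, n - 1)]  # randint includes both endpoints
```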
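`randint(0, n-1)` includes both endpoints, so every response is equally likely. The standard library's `random.choice` does the same in one call, and a seeded `random.Random` instance makes the picks repeatable, which can help when testing replies. A sketch with a hypothetical response list:

```python
import random

responses = ["Hello!", "Hi there, how can I help?", "Good to see you again!"]  # hypothetical

# random.choice(seq) picks a uniformly random element, like seq[randint(0, len(seq) - 1)].
rng = random.Random(42)      # seeded generator: the same picks on every run
picks = [rng.choice(responses) for _ in range(5)]
print(picks)
```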

Let's see how it works

```python
v = class_predict("hello there", model, indexing, unique_words)
print(get_response(v, class_responses))
```

Hi there, how can I help?


```python
v2 = class_predict("can you help me out please?", model, indexing, unique_words)
print(get_response(v2, class_responses))
```


Offering support for Adverse drug reaction, Blood pressure, Hospitals and Pharmacies


```python
v3 = class_predict("Okay thank you I'll need to develop you further in the future", model, indexing, unique_words)
print(get_response(v3, class_responses))
```

Any time!


Unsurprisingly, such a small dataset won't let us build a complex chatbot. Moreover, the bag-of-words method isn't perfect (for example, it discards word order, which is important information). I'll need a better grasp of natural language processing to develop this project further.
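To make the word-order limitation concrete: two sentences with opposite meanings can produce identical bags of words. Counting adjacent word pairs (bigrams) is one common partial remedy; the sketch below is an illustration using `collections.Counter`, not something this notebook implements:

```python
from collections import Counter

def bag_of_words(tokens):
    """Count each word, ignoring order."""
    return Counter(tokens)

def bag_of_bigrams(tokens):
    """Count adjacent word pairs, which keeps some local word order."""
    return Counter(zip(tokens, tokens[1:]))

a = "the dog bit the man".split()
b = "the man bit the dog".split()

print(bag_of_words(a) == bag_of_words(b))      # → True: same words, same counts
print(bag_of_bigrams(a) == bag_of_bigrams(b))  # → False: ('dog', 'bit') vs ('man', 'bit')
```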