![title](chat_bot_2.PNG)

# Build a Chatbot Using a Deep Neural Network

### Screenshot of the chatbot data

![title](chat_bot.PNG)

## Preprocessing steps

We want the data in the following format:

    patterns (X)                tags(y)
    (1)'hi'                    'greetings'
    (2)'How are you'           'greetings'
    (3)'Is anyone there'       'greetings'
    (4)'Bye'                   'goodbye'
    (5)'See you later'         'goodbye'

The patterns are the training inputs (X) and the tags are the target variable (y), i.e. what we want to predict.
Once a tag has been predicted, we select a response associated with that tag.
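A minimal sketch of how the tags become targets, and how a prediction maps back to a tag, using the five example rows above. It assumes the tags are one-hot encoded over the alphabetically sorted unique tags (the training code below does the same with `output_row`); `one_hot_tags` and `decode` are hypothetical helper names used only for illustration.

```python
def one_hot_tags(tags):
    """Encode tag strings as one-hot rows over the sorted unique tags."""
    labels = sorted(set(tags))
    rows = []
    for tag in tags:
        row = [0] * len(labels)
        row[labels.index(tag)] = 1
        rows.append(row)
    return labels, rows


def decode(probabilities, labels):
    """Return the tag whose predicted probability is highest (an argmax)."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]


tags = ["greetings", "greetings", "greetings", "goodbye", "goodbye"]
labels, y = one_hot_tags(tags)
print(labels)                      # ['goodbye', 'greetings']
print(y[0], y[3])                  # [0, 1] [1, 0]
print(decode([0.1, 0.9], labels))  # greetings
```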

To prepare the inputs, we tokenize each sentence, lowercase and stem each word, and finally encode each sentence as a bag-of-words vector.
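The same steps on the five example patterns, as a dependency-free sketch. It assumes a simple regular expression in place of `nltk.word_tokenize` and skips stemming, so the vocabulary is not exactly what the Lancaster stemmer would produce; `tokenize`, `build_vocabulary` and `bag_of_words` are hypothetical helpers.

```python
import re


def tokenize(sentence):
    """Rough stand-in for nltk.word_tokenize: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def build_vocabulary(sentences):
    """Lowercase every token, drop '?', and return the sorted unique words."""
    words = {w.lower() for s in sentences for w in tokenize(s) if w != "?"}
    return sorted(words)


def bag_of_words(sentence, vocabulary):
    """1 if the vocabulary word occurs in the sentence, else 0."""
    tokens = {w.lower() for w in tokenize(sentence)}
    return [1 if w in tokens else 0 for w in vocabulary]


patterns = ["Hi", "How are you", "Is anyone there?", "Bye", "See you later"]
vocab = build_vocabulary(patterns)
print(vocab)  # ['anyone', 'are', 'bye', 'hi', 'how', 'is', 'later', 'see', 'there', 'you']
print(bag_of_words("how are YOU?", vocab))  # [0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
```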

## Load Libraries

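```python
import json
import pickle
import random

import nltk
import numpy
from nltk.stem.lancaster import LancasterStemmer

# nltk.word_tokenize needs the Punkt tokenizer models; download them once with
# nltk.download("punkt")  (recent NLTK releases use "punkt_tab")
stemmer = LancasterStemmer()
```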

## Bag of words coded from scratch

```python
# load the intents file (patterns, tags and responses)
with open("chatbot_data.json") as file:
    data = json.load(file)

try:
    # reuse the preprocessed data cached by a previous run
    # (delete data.pickle whenever chatbot_data.json changes)
    with open("data.pickle", "rb") as f:
        words, labels, training, output = pickle.load(f)
except FileNotFoundError:
    words = []
    labels = []
    docs_x = []
    docs_y = []

    for intent in data["intents"]:
        for pattern in intent["patterns"]:
            # tokenize each pattern into words
            wrds = nltk.word_tokenize(pattern)

            # collect all words into a single list
            words.extend(wrds)

            # store the tokenized pattern as one document in docs_x
            docs_x.append(wrds)

            # store the tag of that document in docs_y
            docs_y.append(intent["tag"])

        if intent["tag"] not in labels:
            # keep each tag only once
            labels.append(intent["tag"])

    # lowercase and stem each word, drop "?", keep unique words in sorted order
    words = [stemmer.stem(w.lower()) for w in words if w != "?"]
    words = sorted(set(words))

    labels = sorted(labels)

    training = []
    output = []

    out_empty = [0 for _ in range(len(labels))]

    for x, doc in enumerate(docs_x):
        # lowercase and stem each word of the document, as for the vocabulary
        wrds = [stemmer.stem(w.lower()) for w in doc]

        # bag of words: 1 if the vocabulary word occurs in the document, else 0
        bag = [1 if w in wrds else 0 for w in words]

        # one-hot encode the document's tag
        output_row = out_empty[:]
        output_row[labels.index(docs_y[x])] = 1

        training.append(bag)
        output.append(output_row)

    training = numpy.array(training)
    output = numpy.array(output)

    with open("data.pickle", "wb") as f:
        pickle.dump((words, labels, training, output), f)
```
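The caching pattern in this cell can be factored into a small helper. A sketch assuming a hypothetical `load_or_build` function: it catches only `FileNotFoundError`, so a genuine error inside the preprocessing is raised instead of being silently swallowed. Like the cell itself, it cannot tell when `chatbot_data.json` has changed, so the pickle still has to be deleted by hand in that case.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build):
    """Return the pickled object at cache_path, or call build() and cache its result."""
    try:
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        result = build()
        with open(cache_path, "wb") as f:
            pickle.dump(result, f)
        return result


calls = []


def build_features():
    calls.append(1)  # count how often the expensive step actually runs
    return (["hi", "bye"], ["goodbye", "greetings"])


path = os.path.join(tempfile.mkdtemp(), "data.pickle")
first = load_or_build(path, build_features)   # builds and writes the cache
second = load_or_build(path, build_features)  # reads the cache
print(first == second, len(calls))  # True 1
```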



## Bag of words with the Keras API
#### (Alternative method)

```python
# collect every pattern sentence from the intents file
with open("chatbot_data.json") as file:
    data = json.load(file)

all_pattern_sentence = []
for intent in data["intents"]:
    for pattern in intent["patterns"]:
        all_pattern_sentence.append(pattern)
```

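```python
# Tokenizer belongs to the legacy Keras preprocessing API
# (deprecated in favour of the TextVectorization layer)
from keras.preprocessing.text import Tokenizer

tran = Tokenizer()
# build the word -> index vocabulary from all patterns
tran.fit_on_texts(all_pattern_sentence)

# binary bag-of-words matrix: one row per sentence, one column per word index
a = tran.texts_to_matrix(all_pattern_sentence)
```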


Using TensorFlow backend.


```python
all_pattern_sentence[:5]
```

```
['Hi', 'How are you', 'Is anyone there?', 'Hello', 'Good day']
```

### Compare the results of both bag-of-words methods

```python
a[:5]  # bag of words from the Keras API
```

```
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.],
       [0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0.]])
```

```python
training[:5]  # bag of words from the manual code
```

```
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
```

***The example above shows that the bag-of-words encoding can also be produced with a library.***

***However, in this walkthrough we use the bag-of-words (BOW) code written from scratch.***
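The two matrices differ in shape and column order: in the outputs above, each Keras row has 33 entries, each manual row has 32, and the first Keras column is 0 in every row. As far as we understand the legacy Keras `Tokenizer`, it numbers words from 1 (leaving column 0 unused) in order of decreasing frequency and does not stem, while the manual version sorts the stemmed words alphabetically. The pure-Python sketch below approximates that Keras behaviour so the difference can be inspected without Keras; `keras_like_matrix` is a hypothetical helper, and the filter characters are assumed to match Keras's defaults.

```python
from collections import Counter

# assumed to mirror the default filters of the legacy Keras Tokenizer
KERAS_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def split_words(text):
    """Lowercase, turn filter characters into spaces, then split on whitespace."""
    text = text.lower()
    for ch in KERAS_FILTERS:
        text = text.replace(ch, " ")
    return text.split()


def keras_like_matrix(texts):
    """Approximate Tokenizer.fit_on_texts followed by texts_to_matrix(mode="binary")."""
    counts = Counter()
    for text in texts:
        counts.update(split_words(text))
    # most frequent first; sorted() is stable, so ties keep first-seen order
    ordered = sorted(counts, key=lambda w: counts[w], reverse=True)
    index = {w: i + 1 for i, w in enumerate(ordered)}  # index 0 is never assigned
    matrix = []
    for text in texts:
        row = [0.0] * (len(index) + 1)  # +1 for the reserved column 0
        for w in split_words(text):
            row[index[w]] = 1.0
        matrix.append(row)
    return index, matrix


texts = ["Hi", "How are you", "Hi there"]
index, m = keras_like_matrix(texts)
print(index)  # {'hi': 1, 'how': 2, 'are': 3, 'you': 4, 'there': 5}
print(m[1])   # [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
```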

## Building the deep learning model using Keras

```python
from keras.layers import Dense
from keras.models import Sequential
```

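```python
import os

from keras.layers import Dense
from keras.models import Sequential, load_model

if os.path.exists("model.h5"):
    # reuse the model trained and saved by a previous run
    model = load_model("model.h5")
else:
    model = Sequential()
    # first layer: 5 units with no activation (linear)
    model.add(Dense(5, input_dim=len(training[0])))
    model.add(Dense(8, activation="relu"))
    model.add(Dense(8, activation="relu"))
    # one output unit per tag; softmax turns the scores into probabilities
    model.add(Dense(len(output[0]), activation="softmax"))
    # model.summary()
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

    model.fit(training, output, epochs=170, batch_size=8)
    model.save("model.h5")
```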
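To see what the `Dense` stack computes, here is a NumPy forward pass with the same layer sizes. The weights are random and untrained, so the probabilities themselves mean nothing; the sketch only shows how one bag-of-words row flows through a linear 5-unit layer, two ReLU layers and a softmax output that sums to 1. The number of tags (6) is a made-up value, and `dense`, `relu` and `softmax` are hypothetical helpers, not Keras functions.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, w, b, activation=None):
    """One fully connected layer: x @ w + b, optionally followed by an activation."""
    z = x @ w + b
    return activation(z) if activation else z


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)


n_words, n_tags = 32, 6  # 32 matches len(training[0]) above; 6 tags is a made-up example
sizes = [n_words, 5, 8, 8, n_tags]
params = [(rng.normal(size=(i, o)), np.zeros(o)) for i, o in zip(sizes[:-1], sizes[1:])]
activations = [None, relu, relu, softmax]  # first layer is linear, as in the model

x = np.zeros((1, n_words))
x[0, [3, 16, 31]] = 1  # the second row of training[:5] shown above
h = x
for (w, b), act in zip(params, activations):
    h = dense(h, w, b, act)

print(h.shape)            # (1, 6)
print(round(h.sum(), 6))  # 1.0
```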


### Prediction

```python
# chat loop using the Keras model
print("Start talking with the bot (type quit to stop)!")
while True:
    inp = input("You: ")
    if inp.lower() == "quit":
        break

    # start with a vector of zeros, one entry per vocabulary word
    bag = [0 for _ in range(len(words))]

    # tokenize, lowercase and stem the input the same way as the training data
    s_words = nltk.word_tokenize(inp)
    s_words = [stemmer.stem(word.lower()) for word in s_words]

    # bag of words: mark each vocabulary word that occurs in the input
    for se in s_words:
        for i, w in enumerate(words):
            if w == se:
                bag[i] = 1

    # reshape to a single row of shape (1, len(words)) and predict the tag
    bag_of_words = numpy.array(bag).reshape(1, -1)
    results = model.predict(bag_of_words, verbose=0)
    results_index = numpy.argmax(results)
    tag = labels[results_index]

    # find the responses associated with the predicted tag
    for tg in data["intents"]:
        if tg["tag"] == tag:
            responses = tg["responses"]
            break

    # reply with one of them at random
    print("bot :", random.choice(responses))
```
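The nested loop in the chat code compares every input word with every vocabulary word, and the response lookup scans all intents for each message. A sketch of the same two steps with dictionaries, assuming the stemmed tokens and the class probabilities are already available; the vocabulary, intents and probabilities below are made up, and `make_bag` and `pick_response` are hypothetical names.

```python
import random


def make_bag(tokens, word_index):
    """Bag of words via a word -> column dict instead of scanning the whole vocabulary."""
    bag = [0] * len(word_index)
    for token in tokens:
        i = word_index.get(token)
        if i is not None:  # ignore words that are not in the vocabulary
            bag[i] = 1
    return bag


def pick_response(probabilities, labels, responses_by_tag, rng=random):
    """Choose a random response for the most probable tag."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return rng.choice(responses_by_tag[labels[best]])


words = ["are", "hi", "how", "you"]  # hypothetical stemmed vocabulary
word_index = {w: i for i, w in enumerate(words)}
print(make_bag(["how", "are", "you", "?"], word_index))  # [1, 0, 1, 1]

intents = [  # same shape as data["intents"], with made-up content
    {"tag": "goodbye", "responses": ["See you later"]},
    {"tag": "greetings", "responses": ["Hello!", "Hi there"]},
]
responses_by_tag = {it["tag"]: it["responses"] for it in intents}
labels = ["goodbye", "greetings"]
print(pick_response([0.2, 0.8], labels, responses_by_tag))  # "Hello!" or "Hi there"
```

Building `word_index` and `responses_by_tag` once, outside the chat loop, keeps the per-message work proportional to the length of the input rather than the size of the vocabulary.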


```
Start talking with the bot (type quit to stop)!
You: hi
bot : Hello, thanks for visiting
You: hello
bot : Hi there, how can I help?
You: when do you open
bot : Our hours are 9am-9pm every day
You: thank you
bot : Happy to help!
You: bye
bot : See you later, thanks for visiting
You: quit
```
