#### What is a Chatbot?
A chatbot is AI-based software designed to interact with humans in their natural languages. Chatbots usually converse through audio or text, and they can mimic human language well enough to communicate in a human-like manner. A chatbot is arguably one of the best applications of natural language processing.

Chatbots can be categorized into two primary variants – Rule-Based and Self-learning.

#### Rule-Based Conversational Chatbot
The rule-based approach configures a chatbot to answer questions from a set of predetermined rules. These rules can be either very simple or very complex. While rule-based chatbots handle simple queries quite well, they usually fail on more complicated queries or requests, because they cannot respond to input that no rule covers.
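As a rough illustration of the rule-based idea, the sketch below matches a message against a short list of regular-expression rules and returns the first canned reply that fits. The rules, replies, and the name `rule_based_reply` are hypothetical and chosen only for this example.

```python
import re

# Hypothetical rules: each pairs a regular expression with a canned reply.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I help?"),
    (re.compile(r"\b(hours?|open|timings?)\b", re.IGNORECASE), "We are open 9am to 5pm."),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "Goodbye!"),
]
FALLBACK = "Sorry, I don't have a rule for that."


def rule_based_reply(message):
    """Return the reply of the first rule whose pattern matches the message."""
    for pattern, reply in RULES:
        if pattern.search(message):
            return reply
    # No rule matched: a rule-based bot has nothing else to fall back on.
    return FALLBACK


print(rule_based_reply("Hello there"))
print(rule_based_reply("What are your store timings?"))
print(rule_based_reply("Can I get a loan if I move abroad next year?"))
```

The last query falls through to the fallback, which is the typical failure mode of a rule-based bot on requests its rules do not cover.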

#### Self-Learning Chatbots

Self-learning bots are chatbots that can learn on their own. They use Artificial Intelligence and Machine Learning to train themselves from examples and observed behaviour. As a result, they can usually handle a wider range of inputs than rule-based bots.

Self-learning bots can be further divided into two categories – **Retrieval Based and Generative.**

### 1. Retrieval-based Chatbots

A retrieval-based chatbot is one that functions on predefined input patterns and set responses. Once the question/pattern is entered, the chatbot uses a heuristic approach to deliver the appropriate response. The retrieval-based model is extensively used to design goal-oriented chatbots with customized features like the flow and tone of the bot to enhance the customer experience.

Retrieval-based bots work on the principle of directed flows or graphs. The bot is trained to rank the best response from a finite set of predefined responses. The responses are either entered manually or drawn from a knowledge base of pre-existing information.

E.g. What are your store timings?
Answer: 9 am to 5 pm

These systems can also be extended to integrate with third-party systems.

E.g. Where is my order?
Answer: It’s on its way and should reach you in 10 minutes
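The ranking step described above can be sketched with a simple word-overlap score. This is a minimal illustration, not the approach used later in this notebook (which trains a neural classifier); the knowledge base, the Jaccard scoring, the `min_score` cut-off, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical knowledge base: predefined questions mapped to fixed answers.
KNOWLEDGE_BASE = {
    "What are your store timings?": "9 am to 5 pm",
    "Where is my order?": "It's on its way and should reach you in 10 minutes",
}


def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap score between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, min_score=0.2):
    """Score every stored question against the query and return the best answer."""
    query_tokens = tokens(query)
    scored = [
        (jaccard(query_tokens, tokens(question)), answer)
        for question, answer in KNOWLEDGE_BASE.items()
    ]
    best_score, best_answer = max(scored)
    # Below the cut-off, no stored response is considered a good enough match.
    return best_answer if best_score >= min_score else None


print(retrieve("where is the order"))
print(retrieve("store timings please"))
print(retrieve("tell me a joke"))
```

Because the bot can only return answers that are already stored, the designer keeps full control over wording and tone.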

Retrieval-based bots are the most common type of chatbot in use today. They let bot developers and UX designers control the experience and match it to customers' expectations. They work best for goal-oriented bots in customer support, lead generation and feedback. The tone of the bot and the overall experience can be designed with the customer’s brand and reputation in mind.

### 2. Generative Chatbots
Unlike retrieval-based chatbots, generative chatbots are not based on predefined responses: they leverage seq2seq neural networks. The idea comes from machine translation, where a source sentence is translated from one language into another. In the seq2seq approach, the input sequence (the user's message) is transformed into an output sequence (the reply).

### How To Make A Chatbot In Python?

To build a chatbot in Python, we first import the necessary packages and initialize the variables used in the chatbot project. Also, remember that when working with text data, we need to preprocess the dataset before designing an ML model.

This is where tokenizing helps with text data: it breaks the text into smaller units such as words. Once that is done, we can apply lemmatization, which reduces each word to its lemma (dictionary form). We then create a pickle file to store the Python objects (here, the vocabulary and the class list) that are used for predicting the responses of the bot.
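A minimal, dependency-free sketch of these three steps follows. It assumes a regular-expression tokenizer and a tiny hand-written lemma table as stand-ins for `nltk.word_tokenize` and `WordNetLemmatizer`, which the notebook uses later; the sentences and the `LEMMAS` table are hypothetical.

```python
import pickle
import re

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"hours": "hour", "accounts": "account", "rates": "rate"}


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def lemmatize(token):
    """Map a token to its lemma when the table knows it, else return it unchanged."""
    return LEMMAS.get(token, token)


sentences = ["What are your hours?", "Which accounts have the best rates?"]

# Build a sorted vocabulary of unique lemmas across all sentences.
vocabulary = sorted({lemmatize(t) for s in sentences for t in tokenize(s)})

# Serialize the vocabulary to bytes and restore it, as a pickle file on disk would.
restored = pickle.loads(pickle.dumps({"words": vocabulary}))
print(restored["words"])
```

Pickling the vocabulary matters because the model's input vector is indexed by word position, so prediction code must reuse exactly the same word order as training.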

Another vital part of the chatbot development process is creating the training and testing datasets.
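One simple way to create the two datasets is to shuffle the labelled samples and hold a fraction back for testing. The sketch below assumes an 80/20 split and a fixed seed; the helper `train_test_split` and the sample data are hypothetical, and the notebook cells further down train on all patterns without a held-out set.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split it into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Always keep at least one sample for testing.
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (pattern, tag) pairs.
samples = [
    ("hi", "greeting"), ("hello", "greeting"), ("hey", "greeting"),
    ("bye", "goodbye"), ("see you later", "goodbye"),
    ("when are you open", "hours"), ("what are your hours", "hours"),
    ("where are you located", "location"), ("what is your address", "location"),
    ("thanks", "thanks"),
]

train, test = train_test_split(samples)
print(len(train), "training samples,", len(test), "test samples")
```

Evaluating on the held-out samples gives a more honest accuracy estimate than measuring on the data the model was trained on.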

### Building a Retrieval-based Chatbot with Natural Language Processing (NLP) Using NLTK and Deep Learning



In [None]:
# Libraries needed for NLP
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
nltk.download('punkt')

# Libraries needed for TensorFlow/Keras processing
# (all Keras imports come from tensorflow.keras to avoid mixing two Keras packages)
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD
import random
import json

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [None]:
from google.colab import files
files.upload()

# import our chat-bot intents file
with open('intents.json') as json_data:
    intents = json.load(json_data)

Saving intents.json to intents.json
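The code in the following cells reads the keys `intents`, `tag`, `patterns` and `responses` from this file. The sketch below shows a file shape consistent with that usage; the two entries are illustrative only and are not the actual contents of `intents.json`.

```python
import json

# Illustrative file contents; only the key names are taken from the notebook code.
EXAMPLE_INTENTS = """
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello", "Is anyone there?"],
     "responses": ["Hi there, how can I help?"]},
    {"tag": "hours",
     "patterns": ["What hours are you open?", "When are you open?"],
     "responses": ["Our hours are 9am-4pm every day except friday 9am-2pm"]}
  ]
}
"""

example = json.loads(EXAMPLE_INTENTS)
for intent in example["intents"]:
    print(intent["tag"], "-", len(intent["patterns"]), "patterns,",
          len(intent["responses"]), "responses")
```

Each pattern becomes one training document labelled with its tag, and the bot later picks its reply at random from the `responses` of the predicted tag.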


In [None]:
words = []
classes = []
documents = []
ignore = ['?',"'s",'!','.']
# loop through each sentence in the intent's patterns
for intent in intents['intents']:
    for pattern in intent['patterns']:
        # tokenize each and every word in the sentence
        w = nltk.word_tokenize(pattern)
        # add word to the words list
        words.extend(w)
        # add word(s) to documents
        documents.append((w, intent['tag']))
        # add tags to our classes list
        if intent['tag'] not in classes:
            classes.append(intent['tag'])

# Lemmatize and lowercase each word, skip ignored tokens, and remove duplicates
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore]
words = sorted(set(words))

# remove duplicate classes
classes = sorted(set(classes))

print (len(documents), "documents")
print (len(classes), "classes", classes)
print (len(words), "unique stemmed words", words)

73 documents
15 classes ['CertificateofDeposits', 'CheckingAccount', 'MoneyMarketAccount', 'SavingAccount', 'accounts', 'accountstype', 'fun', 'goodbye', 'greeting', 'hours', 'location', 'noanswer', 'operateoption', 'options', 'thanks']
85 unique stemmed words ['123', '555', 'a', 'account', 'address', 'again', 'an', 'anyone', 'are', 'bank', 'be', 'bye', 'byeee', 'can', 'certificate', 'checking', 'close', 'could', 'current', 'day', 'deposit', 'different', 'do', 'ffff', 'go', 'good', 'goodbye', 'hello', 'help', 'helpful', 'hey', 'hi', 'hour', 'how', 'i', 'in', 'individual', 'infomation', 'interest', 'is', 'it', 'later', 'located', 'location', 'market', 'me', 'meant', 'meet', 'money', 'new', 'nice', 'nnnn', 'of', 'offered', 'open', 'operate', 'operating', 'provide', 'rate', 'real', 'really', 'restaurant', 's', 'saving', 'see', 'situated', 'sup', 'support', 'thank', 'thanks', 'that', 'the', 'there', 'to', 'type', 'u', 'up', 'way', 'what', 'when', 'where', 'will', 'yo', 'you', 'your']


In [None]:
# create training data
training = []
output = []
# create an empty array for output
output_empty = [0] * len(classes)

# create training set, bag of words for each sentence
for doc in documents:
    # initialize bag of words
    bag = []
    # list of tokenized words for the pattern
    pattern_words = doc[0]
    # lemmatize and lowercase each word
    pattern_words = [lemmatizer.lemmatize(w.lower()) for w in pattern_words]
    # create bag of words array: 1 if the vocabulary word occurs in the pattern, else 0
    for w in words:
        bag.append(1 if w in pattern_words else 0)

    # output is '1' for current tag and '0' for rest of other tags
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1

    training.append([bag, output_row])

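# shuffle the samples, then convert to an object array: each row pairs a bag
# (len(words) entries) with an output row (len(classes) entries), so the rows are ragged
random.shuffle(training)
training = np.array(training, dtype=object)

# split into features (train_x) and labels (train_y)
train_x = list(training[:,0])
train_y = list(training[:,1])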
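To make the encoding in the cell above concrete, here is a small worked example in plain Python. The five-word vocabulary and the two tags are hypothetical; the helper names `bag_of_words` and `one_hot` are introduced only for this illustration.

```python
def bag_of_words(tokens, vocabulary):
    """Return one 0/1 entry per vocabulary word, in vocabulary order."""
    return [1 if word in tokens else 0 for word in vocabulary]


def one_hot(tag, tags):
    """Return a row of zeros with a single 1 at the position of the given tag."""
    row = [0] * len(tags)
    row[tags.index(tag)] = 1
    return row


# Hypothetical sorted vocabulary and class list.
vocabulary = ["are", "hello", "hour", "open", "you"]
tags = ["greeting", "hours"]

print(bag_of_words(["are", "you", "open"], vocabulary))  # [1, 0, 0, 1, 1]
print(one_hot("hours", tags))                            # [0, 1]
```

Word order and repetition are discarded: the network only sees which vocabulary words are present, which is why the vocabulary must stay in the same order at training and prediction time.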



In [None]:
# Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and 3rd output layer contains number of neurons
# equal to number of intents to predict output intent with softmax
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
# Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
# (`decay` is left out because recent Keras optimizers no longer accept it)
sgd = SGD(learning_rate=0.001, momentum=0.9, nesterov=True)

# pass the configured optimizer object; the string 'sgd' would select default settings instead
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# fitting and saving the model
hist = model.fit(np.array(train_x), np.array(train_y), epochs=1000, batch_size=5, verbose=1)
model.save('chatbot_model.h5')
print("model created")
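The output layer and the loss used above can be illustrated without any framework. The sketch below is a plain-Python version of softmax and categorical cross-entropy for a single sample; the logit values are made up, and Keras computes the same quantities in batched, numerically safeguarded form.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing without changing the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities) if t)


# Hypothetical raw scores for three intents; the true intent is the first one.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print(round(categorical_crossentropy([1, 0, 0], probs), 3))
```

The loss shrinks toward zero as the probability of the true intent approaches 1, which is what training pushes the network toward.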

In [None]:
print(hist.history['loss'][-1],hist.history['accuracy'][-1])


0.046818431466817856 0.9863013625144958


In [None]:
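import pickle
# save the vocabulary and class list so they can be reloaded at prediction time
with open("training_data", "wb") as f:
    pickle.dump({'words': words, 'classes': classes}, f)
# restoring all the data structures
with open("training_data", "rb") as f:
    data = pickle.load(f)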
words = data['words']
classes = data['classes']

with open('intents.json') as json_data:
    intents = json.load(json_data)

In [None]:
def clean_up_sentence(sentence):
    # tokenizing the pattern
    sentence_words = nltk.word_tokenize(sentence)
    # lemmatize and lowercase each word
    sentence_words = [lemmatizer.lemmatize(w.lower()) for w in sentence_words]
    return sentence_words

# return a bag-of-words array: 1 for each vocabulary word that occurs in the sentence, else 0
def bow(sentence, words, show_details=False):
    # tokenize and lemmatize the sentence
    sentence_words = clean_up_sentence(sentence)
    # generating bag of words
    bag = [0]*len(words)

    for s in sentence_words:
        for i,w in enumerate(words):
            if w == s:
                bag[i] = 1
                if show_details:
                    print ("found in bag: %s" % w)

    return np.array(bag)

In [None]:
from tensorflow.keras.models import load_model

# predictions with a probability at or below this value are discarded
ERROR_THRESHOLD = 0.30
model = load_model('chatbot_model.h5')

def classify(sentence):
    # generate probabilities from the model
    p = bow(sentence, words)
    results = model.predict(np.array([p]))[0]
    # filter out predictions below a threshold
    results = [[i,r] for i,r in enumerate(results) if r>ERROR_THRESHOLD]
    # sort by strength of probability
    results.sort(key=lambda x: x[1], reverse=True)
    # return a list of (intent, probability) tuples
    return [(classes[i], prob) for i, prob in results]

def response(sentence, show_details=False):
    results = classify(sentence)
    # try each classification in order of probability
    for tag, _ in results:
        for i in intents['intents']:
            # find the intent whose tag matches this result
            if i['tag'] == tag:
                # a random response from the intent
                return random.choice(i['responses'])
    # no classification above the threshold, or no intent matched
    return "Sorry, can't understand you"

In [None]:
classify('What are you hours of operation?')

[('hours', 0.98955965)]

In [None]:
classify(' ')

[('noanswer', 0.9451791)]
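The filtering and sorting that `classify` applies to the model's output can be tried in isolation. The sketch below uses made-up probabilities and labels; `filter_and_rank` is a hypothetical helper that mirrors the thresholding logic with the same 0.30 cut-off.

```python
def filter_and_rank(probabilities, labels, threshold=0.30):
    """Keep (label, probability) pairs strictly above the threshold, most likely first."""
    kept = [(labels[i], p) for i, p in enumerate(probabilities) if p > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical class labels and model outputs.
labels = ["greeting", "hours", "location"]

print(filter_and_rank([0.05, 0.60, 0.35], labels))  # [('hours', 0.6), ('location', 0.35)]
print(filter_and_rank([0.30, 0.30, 0.40], labels))  # [('location', 0.4)]
```

Because the comparison is strict, a probability of exactly 0.30 is dropped. If every class falls at or below the threshold the list is empty, and the caller needs a fallback reply.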

In [None]:
response('What are you hours of operation?')

"We're open every day 9am-4pm except friday 9am-2pm"

In [None]:
response(' ')

"Sorry, can't understand you"

In [None]:
print("My name is Chatterbot and I'm a chatbot. If you want to exit, type Bye!")

while True:
    # normalize the input so that "Bye" and " bye " both end the chat
    user_response = input("You- ").strip().lower()
    if user_response in ['bye','shutdown','exit', 'quit']:
        print("Chatterbot : Bye!!! ")
        break
    if user_response in ['thanks', 'thank you']:
        print("Chatterbot : You are welcome..")
        break
    print(response(user_response))

My name is Chatterbot and I'm a chatbot. If you want to exit, type Bye!
You- hi
Hi there, how can I help?
You- Are you real?
I'm as real as you believe I'm
You- real?
I'm as real as you believe I'm
You- ccss
Sorry, can't understand you
You-  help you provide
I can guide you through Account, hours are we open, home delivery options
You- What hours are you open?
Our hours are 9am-4pm every day except friday 9am-2pm
You- open?
Our hours are 9am-4pm every day except friday 9am-2pm
You- your location?
We are on the intersection of London Alley and Bridge Avenue.
You- accounts types in banks?
The types of accounts are Checking Account, Saving Account, Money Market Account and CD (Certificate of Deposits) Account
You- accounts types 
The types of accounts are Checking Account, Saving Account, Money Market Account and CD (Certificate of Deposits) Account
You- accounts
Saving Account: You can save your money in such account and also earn interest(5.05%) on it. The number of withdrawal is limite

In [None]:
print("My name is Chatterbot and I'm a chatbot. If you want to exit, type Bye!")

while True:
    # normalize the input so that "Bye" and " bye " both end the chat
    user_response = input("You- ").strip().lower()
    if user_response in ['bye','shutdown','exit', 'quit']:
        print("Chatterbot : Bye!!! ")
        break
    if user_response in ['thanks', 'thank you']:
        print("Chatterbot : You are welcome..")
        break
    print(response(user_response))

My name is Chatterbot and I'm a chatbot. If you want to exit, type Bye!
Sorry, can't understand you
Please give me more info
Sorry, can't understand you
Not sure I understand
Not sure I understand
Not sure I understand
Saving Account: You can save your money in such account and also earn interest(5.05%) on it. The number of withdrawal is limited and need to maintain the minimum amount of balance in the account to remain active.
Not sure I understand
Please give me more info
Please give me more info
Sorry, can't understand you
Not sure I understand
Saving Account: You can save your money in such account and also earn interest(5.05%) on it. The number of withdrawal is limited and need to maintain the minimum amount of balance in the account to remain active.
Saving Account: You can save your money in such account and also earn interest(5.05%) on it. The number of withdrawal is limited and need to maintain the minimum amount of balance in the account to remain active.
Sorry, can't underst