<center><img src="https://github.com/insaid2018/Term-1/blob/master/Images/INSAID_Full%20Logo.png?raw=true" width="25%" /></center>

# <center><b>Making of a Simple Interactive Chatbot</b></center>

# **Table of Contents**
---

**1.** [**Problem Statement**](#Section1)<br>
**2.** [**Importing Libraries**](#Section2)<br>
**3.** [**Data-Corpus Formation**](#Section3)<br>
**4.** [**Preprocessing**](#Section4)<br>
**5.** [**Greetings**](#Section5)<br>
**6.** [**Generating Responses**](#Section6)<br>
**7.** [**Applications**](#Section7)<br>
**8.** [**Limitations**](#Section8)<br>
**9.** [**Conclusion**](#Section9)<br>

---
<a name = Section1></a>
# **1. Problem Statement:**
---
- A **chatbot or chatterbot** is a **software application** used to **conduct an online chat conversation** via **text or text-to-speech**, in lieu of providing direct contact with a live human agent.

- The goal of this project is to develop a **simple chatbot** from scratch.

<br> 
<center><img src="https://trymondo.com/wp-content/uploads/2020/11/Chatbot.gif" /></center>

### **Scenario:**

- You are working for a tech start-up that wants to launch its own chatbot.

- They want you to develop this **in-house chatbot**.

- The bot should be able to return responses for a given set of input(s).



---
<a name = Section2></a>
# **2. Importing Libraries:**
---

In [1]:
import nltk                                                     # importing nltk
from nltk.stem import WordNetLemmatizer                         # importing WordNetLemmatizer
lemmatizer = WordNetLemmatizer()                                # creating a lemmatizer instance
import json                                                     # importing json
import pickle                                                   # importing pickle
import numpy as np                                              # importing numpy
import random                                                   # importing random
import tensorflow                                               # importing tensorflow
from tensorflow.keras import Sequential                         # importing the Sequential model
from tensorflow.keras.layers import Dense, Activation, Dropout  # importing the layers
from tensorflow.keras.optimizers import SGD                     # importing the SGD optimizer
nltk.download('punkt')                                          # downloading the punkt tokenizer data
nltk.download('wordnet')                                        # downloading the WordNet data
import string                                                   # importing string
from sklearn.feature_extraction.text import TfidfVectorizer     # importing the TF-IDF vectorizer
from sklearn.metrics.pairwise import cosine_similarity          # importing cosine similarity

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


---
<a name = Section3></a>
# **3. Data-Corpus Formation:**
---
- Here, we take a .txt file whose content comes from <a href="https://en.wikipedia.org/wiki/Chatbot">here</a>
and build the **data corpus** out of it.

- A **data corpus** is a collection of linguistic data.

- We then split the corpus into **sentence tokens** and **word tokens**.
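To make the difference between the two kinds of tokens concrete, here is a rough sketch on a made-up two-sentence string. It uses naive regular expressions as a stand-in, so it runs without any downloads; the notebook itself relies on `nltk.sent_tokenize` and `nltk.word_tokenize`, which handle abbreviations and quotes far better. The names `demo_text`, `demo_sentences` and `demo_words` are illustrative only.

```python
import re

# Made-up sample text, already lowercased like the corpus in this notebook.
demo_text = "a chatbot is a software application. it conducts a conversation via text."

# Naive sentence split: break after '.', '!' or '?' when followed by whitespace.
demo_sentences = re.split(r'(?<=[.!?])\s+', demo_text.strip())

# Naive word split: runs of word characters, or single punctuation marks.
demo_words = re.findall(r"\w+|[^\w\s]", demo_text)

print(demo_sentences)   # two sentence strings
print(demo_words[:7])   # ['a', 'chatbot', 'is', 'a', 'software', 'application', '.']
```

Sentence tokens are what the bot later returns as answers, while word tokens are the units that get lemmatized and counted.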

In [3]:
# Reading the data from the text file (the with-block closes the file afterwards)
with open('/content/AI Content for chatbot.txt', 'r', errors='ignore') as data:
    raw = data.read()

# Making a list for all the words
all_words=[]

# Making a list for all the classes
all_classes = []

# Making a list for all documents
all_documents = []

# Ignoring ? and !
ignore_words = ['?', '!']

In [4]:
# Lowercasing every word of the corpus
raw = raw.lower()

In [5]:
# Raw data to sentence
sent_tokens =  nltk.sent_tokenize(raw)

In [6]:
sent_tokens

['artificial intelligence (ai) is intelligence demonstrated by machines, as opposed to natural intelligence displayed by animals including humans.',
 'leading ai textbooks define the field as the study of "intelligent agents": any system that perceives its environment and takes actions that maximize its chance of achieving its goals.',
 '[a] some popular accounts use the term "artificial intelligence" to describe machines that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving", however, this definition is rejected by major ai researchers.',
 '[b]\n\nai applications include advanced web search engines (e.g., google), recommendation systems (used by youtube, amazon and netflix), understanding human speech (such as siri and alexa), self-driving cars (e.g., tesla), automated decision-making and competing at the highest level in strategic game systems (such as chess and go).',
 '[2][citation needed] as machines become increasingly

In [7]:
# Raw data to list of words
word_tokens = nltk.word_tokenize(raw)

In [8]:
word_tokens

['artificial',
 'intelligence',
 '(',
 'ai',
 ')',
 'is',
 'intelligence',
 'demonstrated',
 'by',
 'machines',
 ',',
 'as',
 'opposed',
 'to',
 'natural',
 'intelligence',
 'displayed',
 'by',
 'animals',
 'including',
 'humans',
 '.',
 'leading',
 'ai',
 'textbooks',
 'define',
 'the',
 'field',
 'as',
 'the',
 'study',
 'of',
 '``',
 'intelligent',
 'agents',
 "''",
 ':',
 'any',
 'system',
 'that',
 'perceives',
 'its',
 'environment',
 'and',
 'takes',
 'actions',
 'that',
 'maximize',
 'its',
 'chance',
 'of',
 'achieving',
 'its',
 'goals',
 '.',
 '[',
 'a',
 ']',
 'some',
 'popular',
 'accounts',
 'use',
 'the',
 'term',
 '``',
 'artificial',
 'intelligence',
 "''",
 'to',
 'describe',
 'machines',
 'that',
 'mimic',
 '``',
 'cognitive',
 "''",
 'functions',
 'that',
 'humans',
 'associate',
 'with',
 'the',
 'human',
 'mind',
 ',',
 'such',
 'as',
 '``',
 'learning',
 "''",
 'and',
 '``',
 'problem',
 'solving',
 "''",
 ',',
 'however',
 ',',
 'this',
 'definition',
 'is',
 'r

---
<a name = Section4></a>
# **4. Preprocessing:**
---

In [9]:
# Creating a lemmatizer object
lemmer = nltk.stem.WordNetLemmatizer()

In [10]:
def Lemtokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

In [11]:
# Removing punctuation: map every punctuation character to None
remove_punc = dict((ord(x), None) for x in string.punctuation)

In [12]:
def LemNormalize(text):
    return Lemtokens(nltk.word_tokenize(text.lower().translate(remove_punc)))
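
The `remove_punc` table maps each punctuation code point to `None`, so `str.translate` deletes those characters before the text is tokenized. The sketch below isolates that step using only the standard library; lemmatization is left out, and the helper name `strip_punctuation` and the sample sentence are made up for illustration.

```python
import string

# Same construction as in the notebook: every punctuation code point maps to None.
remove_punc = dict((ord(ch), None) for ch in string.punctuation)

def strip_punctuation(text):
    """Lowercase the text and delete all ASCII punctuation characters."""
    return text.lower().translate(remove_punc)

sample = "Hello, World! What's a chatbot?"
cleaned = strip_punctuation(sample)
print(cleaned)          # hello world whats a chatbot
print(cleaned.split())  # ['hello', 'world', 'whats', 'a', 'chatbot']
```

Note that apostrophes are removed too, so `what's` becomes `whats` before lemmatization.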

---
<a name = Section5></a>
# **5. Greetings:**
---

- In this section we will build the greeting function.
- The function takes a user input such as **Hello**.
- It then returns a matching greeting message such as **Hello there!**
- Only the few greetings listed in `greeting_input_list` are recognised, so the range of responses is limited as well.

In [13]:
greeting_input_list = ["hello","hi","greetings","what's up","hey"]

In [14]:
greeting_response_list = ['Hello there!', 'how are you', 'greetings to you too','hello!']

In [15]:
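# Returns a random greeting reply if the input contains a known greeting, else None
def greet(sentence):
    text = sentence.lower()
    words = text.split()
    for greeting in greeting_input_list:
        # Single words are matched per token; multi-word greetings (e.g. "what's up") as a phrase
        if greeting in words or (' ' in greeting and greeting in text):
            return random.choice(greeting_response_list)
    return None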
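
Because the greeting check works on whitespace-separated words, input with attached punctuation such as `Hello!` is not recognised as a greeting. One possible variant, sketched below, normalises both the input and the known greetings before comparing them. The names `GREETING_INPUTS`, `GREETING_RESPONSES`, `normalize` and `greet_normalized` are hypothetical and not part of the notebook.

```python
import random
import string

GREETING_INPUTS = ["hello", "hi", "greetings", "what's up", "hey"]
GREETING_RESPONSES = ['Hello there!', 'how are you', 'greetings to you too', 'hello!']

# Translation table that deletes every punctuation character.
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)

def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    return ' '.join(text.lower().translate(_STRIP_PUNCT).split())

def greet_normalized(sentence):
    """Return a random greeting reply, or None if no greeting is found."""
    padded = f' {normalize(sentence)} '
    for greeting in GREETING_INPUTS:
        # Padding with spaces makes "hi" match the word "hi" but not "this".
        if f' {normalize(greeting)} ' in padded:
            return random.choice(GREETING_RESPONSES)
    return None

print(greet_normalized("Hello!"))          # one of the four replies
print(greet_normalized("So, what's up?"))  # one of the four replies
print(greet_normalized("This is a test"))  # None
```

Matching on space-padded strings keeps the check a single substring test while still respecting word boundaries.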

---
<a name = Section6></a>
# **6. Generating Responses:**
---
- In this section we feed user inputs to the bot and generate responses accordingly.

- If no sentence in the corpus resembles the input, the bot returns a fallback apology stating that it **could not understand** the input.



In [16]:
def responses(user_input):
    # Temporarily add the query so it is vectorized together with the corpus
    sent_tokens.append(user_input)
    vect = TfidfVectorizer(tokenizer=LemNormalize,
                           stop_words = 'english')

    tfidf = vect.fit_transform(sent_tokens)

    # Similarity of the query (last row) to every sentence, itself included
    vals = cosine_similarity(tfidf[-1],
                             tfidf)
    # The highest score is the query itself, so the second highest is the best corpus match
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]
    if req_tfidf == 0:
        # No sentence shares any term with the query
        return 'Aplogies! Could not understand that'
    return sent_tokens[idx]
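
The retrieval idea, TF-IDF vectors compared by cosine similarity, can be tried on a tiny made-up corpus. The sketch below is a simplified variant rather than the notebook's function: it uses scikit-learn's default tokenizer instead of `LemNormalize`, and it compares the query against the corpus rows only (`tfidf[:-1]`) instead of picking the second-highest score. `demo_corpus` and `best_match` are hypothetical names.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up three-sentence corpus, for illustration only.
demo_corpus = [
    "a chatbot conducts a conversation through text",
    "machine learning models learn patterns from data",
    "self-driving cars rely on artificial intelligence",
]

def best_match(query, sentences):
    """Return the sentence most similar to query, or None if nothing overlaps."""
    docs = sentences + [query]          # new list, so the corpus is not mutated
    tfidf = TfidfVectorizer(stop_words='english').fit_transform(docs)
    # Compare the query (last row) with the corpus rows only.
    scores = cosine_similarity(tfidf[-1], tfidf[:-1]).flatten()
    idx = scores.argmax()
    if scores[idx] == 0:                # no shared non-stop-word term
        return None
    return sentences[idx]

print(best_match("what is a chatbot", demo_corpus))  # the first corpus sentence
print(best_match("pizza recipe", demo_corpus))       # None
```

Building a new list for each query avoids having to append the input to the corpus and remove it again afterwards.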

In [18]:
flag = True
print('MyBOT: Hello there! I am the MyBOT and I am there to help you out and if you wanna quit simply type thank you or Bye')

while flag:
    # Lowercase the input once so all comparisons below are case-insensitive
    user_input = input().lower()
    if user_input == 'bye':
        flag = False
        print('MyBOT: It was great talking to you. Goodbye')
    elif user_input in ('thanks', 'thank you'):
        flag = False
        print('MyBOT: Welcome')
    else:
        # Call greet() once so that a single reply is chosen
        greeting = greet(user_input)
        if greeting is not None:
            print('MyBOT: ' + greeting)
        else:
            print('MyBOT: ', end='')
            print(responses(user_input))
            # Undo the append made inside responses()
            sent_tokens.remove(user_input)

MyBOT: Hello there! I am the MyBOT and I am there to help you out and if you wanna quit simply type thank you or Bye
Hello
MyBOT: greetings to you too
How are you ?
MyBOT: Aplogies! Could not understand that
Ok
MyBOT: Aplogies! Could not understand that
Thank you
MyBOT: Welcome


---
<a name = Section7></a>
# **7. Applications:**
---

- If built on a much larger data corpus, this chatbot could serve as a **general purpose chatbot**.

- Sectors such as **BFSI, Retail and Ecommerce** offer a wide range of applications for such a chatbot in customer interactions.

---
<a name = Section8></a>
# **8. Limitations:**
---

- Because this is a **prototype**, it may throw errors when a large amount of data has to be processed.

- The **accuracy** of this bot is relatively low because of **the limited size of the training data corpus**.

---
<a name = Section9></a>
# **9. Conclusion:**
---

- In this project we have successfully developed a chatbot that can respond to **human input.**

- It can be extended further, for example with a **human handoff** or with a framework such as **RASA NLU**.

- Because this is a prototype with a small **training data corpus**, the chatbot has low accuracy overall and answers reliably only on topics covered by the data it has been fed.