## Creating a Chatbot Using Natural Language Processing in Python

### What is a chatbot?

##### A chatbot is a computer program that is designed to simulate conversation with human users, typically over messaging platforms, such as Facebook Messenger, WhatsApp, or Slack. Chatbots use Natural Language Processing (NLP) and Artificial Intelligence (AI) to understand and respond to user queries in a human-like manner.

##### Chatbots can be designed to perform a variety of tasks, such as answering customer queries, providing product recommendations, scheduling appointments, or even just engaging in casual conversation. Some chatbots are rule-based, meaning they respond based on pre-programmed rules, while others are more advanced and use machine learning algorithms to learn from user interactions and improve their responses over time.


Here are some of the major fields of Natural Language Processing (NLP):

- Morphology: This field deals with the study of the internal structure of words and how they are formed.

- Syntax: This field deals with the study of the structure of sentences and the rules governing the arrangement of words and phrases.

- Semantics: This field deals with the study of meaning in language and how words and sentences are used to convey meaning.

- Discourse Analysis: This field deals with the study of how language is used in a given context, including the relationship between speakers, the intended audience, and the purpose of communication.

- Text Mining: This field involves the use of statistical and machine learning techniques to analyze and extract information from large collections of text.

- Sentiment Analysis: This field involves the use of NLP techniques to automatically determine the sentiment or emotion expressed in a piece of text.

- Named Entity Recognition: This field involves the use of NLP techniques to automatically identify and classify named entities such as people, organizations, and locations in text.

- Machine Translation: This field involves the use of NLP techniques to automatically translate text from one language to another.

- Speech Recognition: This field involves the use of NLP techniques to automatically transcribe spoken language into text.

- Text-to-Speech: This field involves the use of NLP techniques to convert text into spoken language.

#### Note:

- In Natural Language Processing (NLP), we often work with JSON data because JSON is a lightweight and flexible data format that is easy to read and write for both humans and machines. JSON stands for "JavaScript Object Notation" and it is a text-based format for representing data in the form of key-value pairs.

- JSON is widely used in web development and API (Application Programming Interface) design, and many NLP tools and platforms also use JSON to represent and exchange data. For example, in NLP, we may use JSON to represent text data, such as sentences or documents, along with metadata such as author, date, or source.

- JSON data can be easily parsed and manipulated by programming languages such as Python, making it a popular choice for NLP tasks that involve data preprocessing, analysis, or modeling. JSON also allows for hierarchical and nested data structures, which can be useful for representing complex linguistic data such as parse trees or dependency graphs.
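
As an illustration of the nested structure described above, here is a minimal sketch of what an intents file for this notebook might look like. The key names (ourIntents, tag, patterns, responses) are the ones the notebook code reads; the sample tags, patterns, and responses are invented for the example and are not the actual contents of the dataset.

```python
import json

# Hypothetical sample data: the key names match what the notebook code reads,
# but the tags, patterns, and responses are invented for illustration.
raw = """
{
  "ourIntents": [
    {
      "tag": "greeting",
      "patterns": ["Hi", "Hello there"],
      "responses": ["Hello!", "Hi, how can I help?"]
    },
    {
      "tag": "goodbye",
      "patterns": ["Bye", "See you later"],
      "responses": ["Goodbye!"]
    }
  ]
}
"""

sample = json.loads(raw)  # parse the JSON text into Python dicts and lists

# Nested access: list of intents -> one intent dict -> its fields
for intent in sample["ourIntents"]:
    print(intent["tag"], "->", len(intent["patterns"]), "patterns")

tags = [intent["tag"] for intent in sample["ourIntents"]]
```

Each intent groups the example user messages (patterns) with the replies the bot may give (responses) under a single label (tag).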

### 1 - Importing libraries

- json: Reads and writes JSON data.

- string: Provides useful constants, such as the set of punctuation characters.

- random: Implements pseudo-random number generators for various distributions.

- NumPy: Provides the multidimensional arrays that hold the training data.

- WordNetLemmatizer: Performs lemmatization, the process of reducing a word to its base or dictionary form. It is available through the NLTK (Natural Language Toolkit) library.

- TensorFlow: An open-source machine learning library. Here it provides the Keras API used to build and train the neural network.

- Sequential: Groups a linear stack of layers into a tf.keras.Model.

- Dense and Dropout: The layer types used in the model (fully connected layers and dropout regularization).

In [None]:
import json
import string
import random
import nltk
import numpy as num
from nltk.stem import WordNetLemmatizer # It has the ability to lemmatize.
import tensorflow as tensorF # Machine learning library that provides the Keras API.
from tensorflow.keras import Sequential # Sequential groups a linear stack of layers into a tf.keras.Model
from tensorflow.keras.layers import Dense, Dropout

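nltk.download("punkt")      # tokenizer data used by nltk.word_tokenize
nltk.download("punkt_tab")  # newer NLTK releases load the tokenizer data from this package instead
nltk.download("wordnet")    # word database used by WordNetLemmatizer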

In [None]:
# Loading the dataset: the intents file, stored here as Data.json

with open('/content/Data.json') as data_file:  # the file is closed automatically
    data = json.load(data_file)

data

### Processing data

In [None]:
lm = WordNetLemmatizer() #reducing words to their base or dictionary form

ourClasses = []
newWords = []
documentX = []
documentY = []
# Each intent is tokenized into words and the patterns and their associated tags are added to their respective lists.
for intent in data["ourIntents"]:
    for pattern in intent["patterns"]:
        ournewTkns = nltk.word_tokenize(pattern)
        newWords.extend(ournewTkns)
        documentX.append(pattern)
        documentY.append(intent['tag'])
    if intent["tag"] not in ourClasses:
        ourClasses.append(intent["tag"])

newWords = [lm.lemmatize(word.lower()) for word in newWords if word not in string.punctuation]
newWords = sorted(set(newWords))
ourClasses = sorted(set(ourClasses))

This step prepares the data used to train the model that recognizes intents. newWords becomes the vocabulary: the sorted set of unique, lowercased, lemmatized words that appear in the training patterns. documentX and documentY hold the patterns and their tags, which serve as the model's inputs and expected outputs. ourClasses is the sorted list of intents that the chatbot can recognize.
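
To make the contents of these lists concrete, here is a small sketch that runs the same steps on a tiny dataset. It makes several assumptions so that it runs without NLTK: the two intents are invented, a regular expression stands in for nltk.word_tokenize, and lowercasing stands in for lemmatization. The variable names are shortened for the example.

```python
import re
import string

# Hypothetical two-intent dataset, invented for this example.
sample = {
    "ourIntents": [
        {"tag": "greeting", "patterns": ["Hello there!", "Hi"]},
        {"tag": "goodbye", "patterns": ["Bye", "See you, bye!"]},
    ]
}

def simple_tokenize(text):
    # Stand-in for nltk.word_tokenize: words and punctuation become separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)

words, classes, doc_x, doc_y = [], [], [], []
for intent in sample["ourIntents"]:
    for pattern in intent["patterns"]:
        words.extend(simple_tokenize(pattern))
        doc_x.append(pattern)
        doc_y.append(intent["tag"])
    if intent["tag"] not in classes:
        classes.append(intent["tag"])

# Lowercasing stands in for lemmatization here; punctuation tokens are dropped.
vocab = sorted({w.lower() for w in words if w not in string.punctuation})

print(vocab)                    # the vocabulary (newWords in the notebook)
print(classes)                  # the intent labels (ourClasses)
print(list(zip(doc_x, doc_y)))  # pattern/tag pairs (documentX, documentY)
```

Note that "bye" appears in two patterns but only once in the vocabulary, because the vocabulary is built from a set.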

### Designing a neural network model

The code below turns our data into numerical values using bag-of-words (BoW) encoding:

In [None]:
trainingData  = []
outEmpty = [0] * len(ourClasses)

for idx, doc in enumerate(documentX):
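    # Tokenize and lemmatize the pattern so words are matched whole, not as substrings
    tokens = [lm.lemmatize(w.lower()) for w in nltk.word_tokenize(doc)]
    # 1 if the vocabulary word occurs in the pattern, otherwise 0
    bag0words = [1 if word in tokens else 0 for word in newWords]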

    outputRow = list(outEmpty)
    outputRow[ourClasses.index(documentY[idx])] = 1 
    trainingData.append([bag0words, outputRow])

random.shuffle(trainingData)
trainingData = num.array(trainingData, dtype=object)

x = num.array(list(trainingData[:,0]))
y = num.array(list(trainingData[:,1]))

- documentX contains the list of documents (the text patterns) to be classified.

- lm.lemmatize is the method that performs lemmatization on the text data.

- Lemmatization is the process of reducing a word to its base form (e.g., "running" to "run").

- newWords is the vocabulary: the list of words the model uses to create the bag-of-words representation of the text data.

- For each document, the code creates a bag0words list that contains 1's and 0's to indicate whether each word in newWords appears in the document. This is the bag-of-words representation.

- ourClasses is the list of classes (or labels) that the model will be trained to predict.

- documentY contains the list of labels corresponding to each document in documentX.

- outEmpty is a list of 0's with length equal to the number of classes.

- For each document, the code creates an outputRow list that is initialized as a copy of outEmpty, and then sets the corresponding index to 1 to indicate the correct class for that document. This is a one-hot encoding of the label.

- The trainingData list is created by appending each bag0words list and outputRow list as a pair.

- The trainingData list is shuffled to randomize the order of the pairs.

- num.array is used to create arrays from the bag0words and outputRow lists in trainingData, which are assigned to x and y, respectively.
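
The two encodings in the list above can be sketched on a toy example. The vocabulary, class list, and tokens below are invented for illustration, and the helper names bag_of_words and one_hot are hypothetical (they do not exist in the notebook).

```python
vocab = ["bye", "hello", "hi", "see", "there", "you"]  # hypothetical sorted vocabulary
classes = ["goodbye", "greeting"]                      # hypothetical sorted class list

def bag_of_words(tokens, vocab):
    # 1 if the vocabulary word occurs among the tokens, else 0
    return [1 if word in tokens else 0 for word in vocab]

def one_hot(label, classes):
    # All zeros except a 1 at the position of the correct class
    row = [0] * len(classes)
    row[classes.index(label)] = 1
    return row

tokens = ["see", "you", "bye"]  # an already tokenized, lowercased pattern
features = bag_of_words(tokens, vocab)
target = one_hot("goodbye", classes)

print(features)  # [1, 0, 0, 1, 0, 1]
print(target)    # [1, 0]

# Matching against a raw string instead of a token list matches substrings too:
substring_hit = "hi" in "this is it"                # True, although "hi" is not a word here
token_hit = "hi" in ["this", "is", "it"]            # False
```

The last two lines show why the lookup is best done against a list of tokens: a test against the raw string would count "hi" as present in any sentence containing "this".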

#### Defining and training a neural network model 


In [None]:
iShape = (len(x[0]),)
oShape = len(y[0])

Model = Sequential()

Model.add(Dense(128, activation="relu" , input_shape=iShape))

Model.add(Dropout(0.5)) #Dropout is a regularization technique that randomly drops out (sets to zero) some of the inputs to a layer during training to prevent overfitting.

Model.add(Dense(64, activation="relu"))

Model.add(Dropout(0.3))

Model.add(Dense(oShape, activation='softmax'))

md = tensorF.keras.optimizers.Adam(learning_rate= 0.01)

Model.compile(optimizer=md, loss='categorical_crossentropy', metrics=['accuracy'])

Model.summary()  # summary() prints the table itself and returns None

Model.fit(x,y, epochs=200, verbose=1)
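
The output layer uses a softmax activation, and the model is trained with categorical cross-entropy loss. As a rough illustration of what these two pieces compute (a plain-Python sketch, not how Keras implements them internally), consider three hypothetical raw scores for three intents. The function names and the input values are assumptions made for the example.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true is one-hot, so only the true class contributes to the sum.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

logits = [2.0, 1.0, 0.1]   # hypothetical raw scores for three intents
probs = softmax(logits)    # non-negative values that sum to 1
loss = categorical_crossentropy([1, 0, 0], probs)

print([round(p, 3) for p in probs])
print(round(loss, 3))
```

Softmax turns the scores into a probability distribution over the intents, and the loss is small when the probability assigned to the correct intent is close to 1.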

#### Building helper functions

In [None]:
def ourText(text):
  newtkns = nltk.word_tokenize(text)
  newtkns = [lm.lemmatize(word.lower()) for word in newtkns]  # lowercase to match the training vocabulary
  return newtkns

def wordBag(text, vocab):
  newtkns = ourText(text)
  bagOwords = [0] * len(vocab)
  for w in newtkns:
    for idx, word in enumerate(vocab):
      if word == w:
        bagOwords[idx] = 1
  return num.array(bagOwords)

def Pclass(text, vocab, labels):
  bagOwords = wordBag(text, vocab)
  ourResult = Model.predict(num.array([bagOwords]))[0]
  newThresh = 0.2
  yp = [[idx, res] for idx, res in enumerate(ourResult) if res > newThresh]

  yp.sort(key=lambda x: x[1], reverse=True)
  newList = []
  for r in yp:
    newList.append(labels[r[0]])
  return newList

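def getRes(firstlist, fJson):
  # No intent passed the confidence threshold: return a fallback message
  # (the wording of this message is an assumption, adjust as needed)
  if not firstlist:
    return "Sorry, I didn't understand that."
  tag = firstlist[0]
  ourResult = None
  for i in fJson["ourIntents"]:
    if i["tag"] == tag:
      ourResult = random.choice(i["responses"])
      break
  return ourResult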
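
The thresholding and ranking done in Pclass can be tried without a trained model. The sketch below applies the same logic to hand-written probability vectors; the function name rank_labels, the label names, and the probabilities are all invented for illustration.

```python
def rank_labels(probs, labels, threshold=0.2):
    # Keep the classes whose probability exceeds the threshold, most probable first.
    kept = [(i, p) for i, p in enumerate(probs) if p > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [labels[i] for i, _ in kept]

# Hypothetical labels and model outputs (each probability vector sums to 1).
labels = ["age", "goodbye", "greeting", "hours", "name", "thanks"]
confident = [0.01, 0.25, 0.70, 0.02, 0.01, 0.01]
uncertain = [0.18, 0.17, 0.17, 0.16, 0.16, 0.16]

ranked_confident = rank_labels(confident, labels)
ranked_uncertain = rank_labels(uncertain, labels)

print(ranked_confident)  # ['greeting', 'goodbye']
print(ranked_uncertain)  # []
```

The second case shows that the returned list can be empty when no class clears the threshold, so any code that reads its first element should handle that case.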

In [None]:
while True:
    newMessage = input("")
    intents = Pclass(newMessage, newWords, ourClasses)
    ourResult = getRes(intents, data)
    print(ourResult)

## And yeah!! This is our little chatbot *_*

#### At the end of our notebook, I want to mention that this is a simple chatbot. It uses a small neural network to classify the intent of a message, but its replies are fixed, like those of a rule-based chatbot: it can only answer questions covered by the JSON file, and if we ask a new question that is not in the file, it will give inaccurate answers. This is in contrast to more advanced AI-based chatbots, which learn from data and adapt their responses over time. Such chatbots are trained on large datasets of conversations, which helps them understand the nuances of language and provide more personalized responses. They can handle a wider range of inputs and can provide more intelligent and sophisticated responses.