## Creating a Chatbot Using Natural Language Processing in Python

### What is a chatbot?

##### A chatbot is a computer program that is designed to simulate conversation with human users, typically over messaging platforms, such as Facebook Messenger, WhatsApp, or Slack. Chatbots use Natural Language Processing (NLP) and Artificial Intelligence (AI) to understand and respond to user queries in a human-like manner.

##### Chatbots can be designed to perform a variety of tasks, such as answering customer queries, providing product recommendations, scheduling appointments, or even just engaging in casual conversation. Some chatbots are rule-based, meaning they respond based on pre-programmed rules, while others are more advanced and use machine learning algorithms to learn from user interactions and improve their responses over time.

##### Chatbots have become increasingly popular in recent years, as they provide a cost-effective and scalable way for businesses to provide customer support and interact with users. They can be accessed 24/7, and can handle a high volume of inquiries without the need for human intervention.

Here are some of the major fields of Natural Language Processing (NLP):

- Morphology: This field deals with the study of the internal structure of words and how they are formed.

- Syntax: This field deals with the study of the structure of sentences and the rules governing the arrangement of words and phrases.

- Semantics: This field deals with the study of meaning in language and how words and sentences are used to convey meaning.

- Discourse Analysis: This field deals with the study of how language is used in a given context, including the relationship between speakers, the intended audience, and the purpose of communication.

- Text Mining: This field involves the use of statistical and machine learning techniques to analyze and extract information from large collections of text.

- Sentiment Analysis: This field involves the use of NLP techniques to automatically determine the sentiment or emotion expressed in a piece of text.

- Named Entity Recognition: This field involves the use of NLP techniques to automatically identify and classify named entities such as people, organizations, and locations in text.

- Machine Translation: This field involves the use of NLP techniques to automatically translate text from one language to another.

- Speech Recognition: This field involves the use of NLP techniques to automatically transcribe spoken language into text.

- Text-to-Speech: This field involves the use of NLP techniques to convert text into spoken language.

###### Note:

- In Natural Language Processing (NLP), we often work with JSON data because JSON is a lightweight and flexible data format that is easy to read and write for both humans and machines. JSON stands for "JavaScript Object Notation" and it is a text-based format for representing data in the form of key-value pairs.

- JSON is widely used in web development and API (Application Programming Interface) design, and many NLP tools and platforms also use JSON to represent and exchange data. For example, in NLP, we may use JSON to represent text data, such as sentences or documents, along with metadata such as author, date, or source.

- JSON data can be easily parsed and manipulated by programming languages such as Python, making it a popular choice for NLP tasks that involve data preprocessing, analysis, or modeling. JSON also allows for hierarchical and nested data structures, which can be useful for representing complex linguistic data such as parse trees or dependency graphs.
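
As a small illustration of the points above, the sketch below parses a JSON string that holds a sentence with metadata and a nested token list. The record and its field names (text, author, date, source, tokens) are hypothetical and serve only as an example.

```python
import json

# Hypothetical record: a sentence plus metadata and a nested token list.
raw = """
{
  "text": "Chatbots answer questions.",
  "author": "unknown",
  "date": "2023-01-01",
  "source": "example",
  "tokens": [
    {"word": "Chatbots", "pos": "NOUN"},
    {"word": "answer", "pos": "VERB"},
    {"word": "questions", "pos": "NOUN"}
  ]
}
"""

record = json.loads(raw)                           # JSON text -> Python dict
words = [tok["word"] for tok in record["tokens"]]  # walk the nested list
round_trip = json.dumps(record, indent=2)          # Python dict -> JSON text

print(record["author"])
print(words)
```

Objects map to Python dictionaries and arrays map to lists, so nested structures can be traversed with ordinary indexing and loops.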

### 1 - Importing libraries

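- json: Reads and writes JSON data.

- string: Provides useful constants such as string.punctuation, which is used later to filter out punctuation tokens.

- random: Implements pseudo-random number generators for various distributions.

- NumPy: Provides multidimensional arrays and numerical routines. It is imported here under the alias num.

- WordNetLemmatizer: Performs lemmatization, the process of reducing a word to its base or dictionary form (for example, "cats" becomes "cat"). WordNetLemmatizer is available through the NLTK (Natural Language Toolkit) library.

- TensorFlow: An open-source machine learning library, used to build and train the neural network.

- Sequential: Sequential groups a linear stack of layers into a tf.keras.Model.

- Dense, Dropout: Keras layers. Dense is a fully connected layer, and Dropout randomly disables a fraction of units during training to reduce overfitting.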

In [None]:
import json
import string
import random
import nltk
import numpy as num
from nltk.stem import WordNetLemmatizer  # lemmatizer backed by the WordNet database

# TensorFlow imports (commented out here; uncomment them to build the neural network model)
#import tensorflow as tensorF  # machine learning library
#from tensorflow.keras import Sequential  # groups a linear stack of layers into a tf.keras.Model
#from tensorflow.keras.layers import Dense, Dropout  # fully connected and dropout layers

nltk.download("punkt")  # tokenizer models used by nltk.word_tokenize
nltk.download("wordnet")  # WordNet database used by WordNetLemmatizer
# Note: newer NLTK releases (3.9 and later) may also require nltk.download("punkt_tab").

##### Tokenization is the process of breaking down a text into smaller units, called tokens. In Natural Language Processing (NLP), tokenization is often the first step in processing text data. The resulting tokens can then be used for various NLP tasks such as text classification, sentiment analysis, and machine translation.

##### A token can be defined as a sequence of characters that represents a unit of meaning. The most common type of tokenization involves breaking down text into words, also known as word tokenization. However, other types of tokenization can involve breaking down text into individual characters, phrases, or sentences.

##### In word tokenization, the input text is typically split into words based on whitespace and punctuation marks. For example, the sentence "I love natural language processing!" could be tokenized into the following words: "I", "love", "natural", "language", "processing". Tokenizers such as NLTK's word_tokenize also return the punctuation mark "!" as a separate token, which is why punctuation is filtered out later. Some tokenization algorithms may also take into account additional factors such as capitalization and context.

##### Tokenization can be performed using a variety of tools and libraries in NLP, such as NLTK (Natural Language Toolkit) and spaCy in Python. These tools provide pre-trained models and functions for performing tokenization, as well as options for customizing the tokenization process based on specific needs.

##### Tokenization is a fundamental step in many NLP tasks, as it enables the computer to process and analyze text data at a more granular level. By breaking down text into smaller units, tokenization helps to reduce the complexity of the data and makes it easier for NLP algorithms to extract meaningful information from the text.
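
To make the idea concrete, here is a minimal word tokenizer built with a regular expression from Python's standard library. It is a simplified stand-in, not the algorithm used by NLTK's word_tokenize, and the function name simple_tokenize is hypothetical.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("I love natural language processing!")
print(tokens)

# Keep only the word tokens by discarding pure punctuation.
words = [t for t in tokens if re.fullmatch(r"\w+", t)]
print(words)
```

Real tokenizers handle contractions, abbreviations, and similar cases that this pattern splits naively, which is one reason to rely on a library such as NLTK or spaCy in practice.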

In [None]:
# Load the intents dataset from Data.json
with open("Data.json") as data_file:
    data = json.load(data_file)  # parse the JSON file into a Python dict

data
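
The contents of Data.json are not shown here. Judging from the processing code that follows, the file has a top-level "ourIntents" list whose entries carry a "tag" and a list of "patterns". The "responses" field and all sample values below are assumptions made for illustration. A quick structural check, using the hypothetical helper check_intents, might look like this:

```python
import json

# Hypothetical stand-in for the contents of Data.json.
sample_json = """
{
  "ourIntents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello there"],
     "responses": ["Hello!"]},
    {"tag": "goodbye",
     "patterns": ["Bye", "See you later"],
     "responses": ["Goodbye!"]}
  ]
}
"""

def check_intents(data):
    """Return a list of problems found in the intents structure."""
    problems = []
    for i, intent in enumerate(data.get("ourIntents", [])):
        if not isinstance(intent.get("tag"), str):
            problems.append(f"intent {i}: missing or non-string 'tag'")
        patterns = intent.get("patterns")
        if not isinstance(patterns, list) or not patterns:
            problems.append(f"intent {i}: 'patterns' must be a non-empty list")
    if "ourIntents" not in data:
        problems.append("missing top-level 'ourIntents' key")
    return problems

sample = json.loads(sample_json)
print(check_intents(sample))  # an empty list means no problems were found
```

Running a check like this right after loading makes a malformed file fail early, rather than partway through the processing loop.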

### Processing data

In [None]:
lm = WordNetLemmatizer()  # reduces words to their base or dictionary form

ourClasses = []  # unique intent tags
newWords = []    # all tokens seen in the patterns
documentX = []   # patterns
documentY = []   # intent tag of each pattern

# Tokenize each pattern and record the pattern together with its intent tag.
for intent in data["ourIntents"]:
    for pattern in intent["patterns"]:
        ournewTkns = nltk.word_tokenize(pattern)
        newWords.extend(ournewTkns)
        documentX.append(pattern)
        documentY.append(intent["tag"])
    if intent["tag"] not in ourClasses:
        ourClasses.append(intent["tag"])

# Lowercase and lemmatize every token, dropping punctuation tokens.
newWords = [lm.lemmatize(word.lower()) for word in newWords if word not in string.punctuation]
# Deduplicate and sort to get a stable vocabulary and class list.
newWords = sorted(set(newWords))
ourClasses = sorted(set(ourClasses))

This code prepares the data used to train an NLP model that recognizes intents so the chatbot can choose an appropriate response. After processing, newWords holds the sorted vocabulary of unique lemmatized words found in the training patterns. documentX and documentY hold each pattern and its intent tag, which form the basis of the model's input and output data. ourClasses holds the sorted set of intents the chatbot can recognize.
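
To see what these lists end up containing, the sketch below runs the same loop on a tiny hypothetical dataset. To stay runnable without NLTK downloads, it swaps in a regular-expression tokenizer and plain lowercasing in place of nltk.word_tokenize and the WordNet lemmatizer, so the exact vocabulary can differ from what the notebook produces. All variable names and sample values are assumptions.

```python
import re
import string

# Hypothetical dataset with the same shape the loop above expects.
toy_data = {
    "ourIntents": [
        {"tag": "greeting", "patterns": ["Hi there!", "Hello"]},
        {"tag": "goodbye", "patterns": ["Bye", "See you there"]},
    ]
}

vocab, patterns, tags, classes = [], [], [], []
for intent in toy_data["ourIntents"]:
    for pattern in intent["patterns"]:
        # Stand-in tokenizer: words and single punctuation characters.
        vocab.extend(re.findall(r"\w+|[^\w\s]", pattern))
        patterns.append(pattern)
        tags.append(intent["tag"])
    if intent["tag"] not in classes:
        classes.append(intent["tag"])

# Lowercasing stands in for lemmatization; punctuation tokens are dropped.
vocab = sorted({w.lower() for w in vocab if w not in string.punctuation})
classes = sorted(classes)

print(vocab)
print(classes)
print(list(zip(patterns, tags)))
```

Each pattern is paired with its tag by position, so patterns and tags always have the same length, while the vocabulary and class list contain each entry only once.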

### Designing a neural network model