## Code Explanation

1. **Importing necessary libraries:**
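   - The first cell imports the standard-library modules `io`, `random`, `string`, and `warnings`, along with `numpy` and two names from scikit-learn: `TfidfVectorizer` and `cosine_similarity`. It then calls `warnings.filterwarnings('ignore')` to hide all warnings. `nltk` is imported in the next cell, and `io` and `numpy` are not used later in the notebook.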

```python
# Import necessary libraries
import io
import random
import string  # To process standard Python strings
import warnings
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
warnings.filterwarnings('ignore')  # Suppress all warnings for the rest of the session
```

2. **Downloading NLTK packages:**
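   - `nltk.download('popular', quiet=True)` fetches NLTK's "popular" data collection, which includes the `punkt` sentence-tokenizer models and the `wordnet` database used later. Packages that are already installed and up to date are skipped, so the cell is cheap to re-run.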

```python
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('popular', quiet=True)  # Downloads the "popular" data collection; returns True on success

# Alternatively, download only the packages this notebook uses:
# nltk.download('punkt')
# nltk.download('wordnet')
```

```
True
```

3. **Reading the corpus:**
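   - The code opens `chatbot.txt` (here at a local Windows path) and reads its entire contents into the string `raw`. With `errors='ignore'`, any bytes that are not valid UTF-8 are silently dropped. The corpus is *not* lowercased at this point; lowercasing happens later inside `LemNormalize()`, which is why the bot's replies keep the corpus's original capitalization.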

```python
# Reading in the corpus
with open('C:\\Users\\Larissa Lorenzi\\Documents\\chatbot.txt', 'r', encoding='utf8', errors='ignore') as fin:
    raw = fin.read()
```
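To make the effect of `errors='ignore'` concrete, here is a small sketch that wraps the same `open()` call in a hypothetical helper, `load_corpus()`, and reads a temporary file containing one invalid UTF-8 byte. The helper name and its `lowercase` flag are illustrative assumptions, not part of the notebook.

```python
import os
import tempfile

def load_corpus(path, lowercase=False):
    """Hypothetical helper: read a text file as UTF-8, silently dropping undecodable bytes."""
    with open(path, 'r', encoding='utf8', errors='ignore') as fin:
        raw = fin.read()
    # The notebook itself does not lowercase here; LemNormalize() does that later
    return raw.lower() if lowercase else raw

# Write a small file whose bytes include 0xFF, which is never valid in UTF-8
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'Chat\xffbot Design.')
    demo_path = tmp.name

try:
    print(load_corpus(demo_path))                  # Chatbot Design.
    print(load_corpus(demo_path, lowercase=True))  # chatbot design.
finally:
    os.remove(demo_path)
```

With the default `errors='strict'`, the same read would raise `UnicodeDecodeError` instead; `'ignore'` trades that error for silently lost characters.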

4. **Tokenization:**
   - The corpus is tokenized into sentences using `nltk.sent_tokenize()`, and the result is stored in the `sent_tokens` list. These sentences are the candidate replies the bot can return.
   - The corpus is also tokenized into words using `nltk.word_tokenize()`, and the result is stored in the `word_tokens` list, which is not used again in this notebook.

```python
# Tokenization
sent_tokens = nltk.sent_tokenize(raw)  # Converts the corpus into a list of sentences
word_tokens = nltk.word_tokenize(raw)  # Converts the corpus into a list of words
```
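NLTK's `sent_tokenize()` relies on a trained Punkt model, so it cannot be reproduced in a few lines. The sketch below is only a rough, regex-based approximation (an assumption for illustration, not NLTK's algorithm) that shows the shape of the two results: a list of sentence strings and a list of word and punctuation tokens. It also never splits at a period that is not followed by whitespace, similar to the session output at the end of this notebook, where `export.An important part...` stays inside a single reply.

```python
import re

def naive_sent_tokenize(text):
    """Rough stand-in for nltk.sent_tokenize: split after '.', '!' or '?' when whitespace follows."""
    return re.split(r'(?<=[.!?])\s+', text.strip())

def naive_word_tokenize(text):
    """Rough stand-in for nltk.word_tokenize: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

corpus = "A chatbot answers questions. Is design important? Yes!"
print(naive_sent_tokenize(corpus))
# ['A chatbot answers questions.', 'Is design important?', 'Yes!']
print(naive_word_tokenize("Hello, world!"))
# ['Hello', ',', 'world', '!']
print(naive_sent_tokenize("video export.An important part."))
# ['video export.An important part.']
```

The regex version would wrongly split "Dr. Smith" into two sentences; avoiding mistakes like that around abbreviations is what Punkt's trained model is for.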

5. **Preprocessing:**
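   - `LemTokens()` lemmatizes a list of tokens with `WordNetLemmatizer`. Because `lemmatize()` is called without a part-of-speech argument, every token is treated as a noun (for example, `sentences` becomes `sentence`).
   - `remove_punct_dict` maps the Unicode code point of each ASCII punctuation character in `string.punctuation` to `None`; passed to `str.translate()`, it deletes those characters.
   - `LemNormalize()` lowercases the text, strips punctuation with that table, splits it into words with `nltk.word_tokenize()`, and lemmatizes the result.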

```python
# Preprocessing
lemmer = WordNetLemmatizer()

def LemTokens(tokens):
    '''Lemmatizes tokens'''
    return [lemmer.lemmatize(token) for token in tokens]

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def LemNormalize(text):
    '''Normalizes and lemmatizes text'''
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
```
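The punctuation table can be checked with the standard library alone. This sketch rebuilds `remove_punct_dict` exactly as the notebook does, confirms that `str.maketrans('', '', string.punctuation)` produces the same table, and applies the lowercase-and-strip step of `LemNormalize()`. Whitespace splitting stands in for `nltk.word_tokenize()` and lemmatization is omitted, so the hypothetical `normalize_without_lemmas()` covers only the first half of the pipeline.

```python
import string

# Built exactly as in the notebook: code point of each ASCII punctuation character -> None
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

# The standard-library helper builds an identical deletion table
assert str.maketrans('', '', string.punctuation) == remove_punct_dict

def normalize_without_lemmas(text):
    """Lowercase and strip ASCII punctuation as LemNormalize() does; str.split() stands in for word_tokenize."""
    return text.lower().translate(remove_punct_dict).split()

print(normalize_without_lemmas("What's up, Robot?"))  # ['whats', 'up', 'robot']
# string.punctuation is ASCII-only, so a typographic apostrophe (U+2019) survives
print(normalize_without_lemmas("It\u2019s fine."))     # ['it’s', 'fine']
```

Because apostrophes are deleted before tokenizing, contractions such as `what's` are merged into `whats` rather than split.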

6. **Keyword Matching:**
   - `GREETING_INPUTS` is a tuple of greeting words and phrases, and `GREETING_RESPONSES` is a list of possible replies. The two are not paired: any reply can follow any greeting.
   - The function `greeting()` checks the user's input against `GREETING_INPUTS` (case-insensitively) and, on a match, returns a randomly selected reply from `GREETING_RESPONSES`; otherwise it returns `None`.

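```python
# Keyword Matching
GREETING_INPUTS = ('hello', 'hi', 'greetings', 'sup', "what's up", 'hey')
GREETING_RESPONSES = ['Hi!', 'Hey!', '*nods*', 'Hi, there!', 'Hello!', 'I am glad! You are talking to me.']

def greeting(sentence):
    """If the user's input is a greeting, return a greeting response; otherwise return None."""
    # Compare the whole input first: a multi-word entry such as "what's up" never equals a single word
    if sentence.lower().strip() in GREETING_INPUTS:
        return random.choice(GREETING_RESPONSES)
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
    return None
```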
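Comparing whitespace-separated tokens against `GREETING_INPUTS` has two blind spots: a token with attached punctuation such as `hello!` is not recognized, and a multi-word entry such as `what's up` can never equal a single token. The sketch below is a hypothetical alternative, `greeting_tolerant()` (not part of the notebook), that replaces most punctuation with spaces and then looks for each greeting as a whole word or phrase.

```python
import random
import re

GREETING_INPUTS = ('hello', 'hi', 'greetings', 'sup', "what's up", 'hey')
GREETING_RESPONSES = ['Hi!', 'Hey!', '*nods*', 'Hi, there!', 'Hello!', 'I am glad! You are talking to me.']

def greeting_tolerant(sentence):
    """Hypothetical variant of greeting(): match greeting words or phrases despite punctuation."""
    # Replace everything except word characters, whitespace and apostrophes with a space
    cleaned = re.sub(r"[^\w\s']", ' ', sentence.lower())
    # Collapse whitespace and pad with spaces so each entry only matches as a whole word or phrase
    padded = ' ' + ' '.join(cleaned.split()) + ' '
    for phrase in GREETING_INPUTS:
        if ' ' + phrase + ' ' in padded:
            return random.choice(GREETING_RESPONSES)
    return None

print(greeting_tolerant("Hey, what's up?"))               # a random entry from GREETING_RESPONSES
print(greeting_tolerant('Tell me about chatbot design'))  # None
```

Padding with spaces is a cheap way to get whole-word matching: `hi` is not found inside `this` or `chip`, and `sup` is not found inside `super`.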

7. **Generating response:**
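   - The function `response()` returns the corpus sentence most similar to the user's input, using TF-IDF (Term Frequency-Inverse Document Frequency) vectors and cosine similarity.
   - The user's input is temporarily appended to `sent_tokens` so that it is vectorized with the same vocabulary and IDF weights as the corpus; it becomes the last row of the TF-IDF matrix.
   - `TfidfVectorizer`, configured with `LemNormalize` as its tokenizer and with English stop words removed, converts every sentence into a TF-IDF vector.
   - `cosine_similarity(tfidf[-1], tfidf)` compares the input's vector with every row, including the input itself.
   - Since the input is identical to itself, the largest score (1.0) is normally its self-similarity, so the code takes the *second*-largest score: `vals.argsort()[0][-2]` gives that sentence's index `idx`, and the second-to-last value of the sorted scores gives its similarity `req_tfidf`.
   - If `req_tfidf` is 0, no corpus sentence shares a (non-stop-word) term with the input, and the bot replies "I am sorry! I don't understand you"; otherwise the reply is `sent_tokens[idx]`.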
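The steps above can be reproduced without scikit-learn to show what the numbers mean. This is a simplified sketch under stated assumptions: tokenization is plain lowercasing, punctuation stripping and whitespace splitting (no stop words, no lemmatization), while the weighting follows `TfidfVectorizer`'s default settings: raw term counts, smoothed `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, and L2-normalized rows. The helper names are hypothetical, and instead of the `[-2]` indexing trick the sketch simply leaves the query's self-similarity out.

```python
import math
import string
from collections import Counter

def tokenize(text):
    """Simplified stand-in for LemNormalize: lowercase, drop ASCII punctuation, split on whitespace."""
    return text.lower().translate(str.maketrans('', '', string.punctuation)).split()

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors: raw counts, smoothed idf, L2-normalized (TfidfVectorizer's default weighting)."""
    token_lists = [tokenize(doc) for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in token_lists:
        weights = {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()} if norm else {})
    return vectors

def cosine(u, v):
    """For L2-normalized vectors, cosine similarity reduces to the dot product."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())

def best_match(corpus, query):
    """Return (sentence, score) for the corpus sentence most similar to the query."""
    vectors = tfidf_vectors(corpus + [query])  # the query is vectorized with the corpus, as the last row
    query_vec = vectors[-1]
    scores = [cosine(query_vec, vec) for vec in vectors[:-1]]  # skip the query's self-similarity
    best = max(range(len(scores)), key=scores.__getitem__)
    return corpus[best], scores[best]

corpus = ['Chatbots answer questions.', 'Design defines the interaction.', 'Testing involves real users.']
print(best_match(corpus, 'How do chatbots answer?')[0])  # Chatbots answer questions.
print(best_match(corpus, 'xyz')[1])                      # 0.0, which response() would treat as "don't understand"
```

Because every term present in a sentence gets a positive weight, a score of exactly zero means the query and the sentence share no terms at all.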

```python
# Generating response
def response(user_response):
    """Returns the corpus sentence most similar to the user's input."""
    sent_tokens.append(user_response)  # Temporarily add the input so it is vectorized together with the corpus
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)  # One TF-IDF row per sentence; the input is the last row
    vals = cosine_similarity(tfidf[-1], tfidf)  # Similarity of the input to every row, including itself
    idx = vals.argsort()[0][-2]  # The top score is normally the input's self-similarity, so take the runner-up
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]  # Similarity score of that best-matching corpus sentence
    if req_tfidf == 0:
        return "I am sorry! I don't understand you"  # No shared terms with any corpus sentence
    return sent_tokens[idx]  # The most similar corpus sentence becomes the reply
```
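The index arithmetic in `response()` is easiest to see on a hand-made score row. In this sketch the values are invented for illustration (they are not output from the notebook); the last entry plays the role of the input's similarity with itself. It also shows an equivalent formulation that drops the last column and uses `argmax()`, which may read more clearly.

```python
import numpy as np

# One row of cosine similarities, shaped (1, n) like cosine_similarity(tfidf[-1], tfidf);
# the final entry stands for the user's input compared with itself
vals = np.array([[0.2, 0.0, 0.5, 1.0]])

idx = vals.argsort()[0][-2]  # argsort -> [[1, 0, 2, 3]]; position -2 holds index 2
flat = vals.flatten()
flat.sort()
req_tfidf = flat[-2]         # sorted scores [0.0, 0.2, 0.5, 1.0]; the second largest is 0.5
print(idx, req_tfidf)        # 2 0.5

# Equivalent without the [-2] trick: ignore the self-similarity column, then take the maximum
scores = vals[0, :-1]
best = int(scores.argmax())
print(best, scores[best])    # 2 0.5
```

One difference: `argsort` does not guarantee an order among equal scores, so if a corpus sentence normalizes to exactly the same vector as the input, `[-2]` can land on the input's own row; excluding the last column avoids that case.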

8. **Main conversation loop:**
   - `flag` is set to `True`, and the loop keeps running until it is set to `False`.
   - The program prints an introductory message.
   - Each iteration reads a line with `input()` and converts it to lowercase.
   - If the input is exactly "bye", the bot says goodbye and the loop ends.
   - If the input is exactly "thanks" or "thank you", the bot replies "You are welcome." and the loop also ends.
   - If `greeting()` recognizes a greeting, the bot prints a random reply from `GREETING_RESPONSES`.
   - Otherwise, it prints the reply from `response()` and then removes the user's input from `sent_tokens` again, so later queries are matched only against the corpus.

```python
flag = True
print('ROBOT: My name is Robot. I will answer your queries about Chatbots. If you want to exit, type Bye!')
while flag:
    user_response = input().lower()
    if user_response != 'bye':
        if user_response in ('thanks', 'thank you'):
            flag = False
            print('ROBOT: You are welcome.')
        else:
            greeting_reply = greeting(user_response)  # Call once so the reply that was checked is the one printed
            if greeting_reply is not None:
                print('ROBOT: ' + greeting_reply)  # Responds with a greeting if the user's input is a greeting
            else:
                print('ROBOT: ', end='')
                print(response(user_response))  # Generates and prints a response based on the user's input
                sent_tokens.pop()  # Drops the input that response() appended (always the last element)
    else:
        flag = False
        print('ROBOT: Bye! Take care...')  # Says goodbye and exits the conversation loop
```

```
ROBOT: My name is Robot. I will answer your queries about Chatbots. If you want to exit, type Bye!
ROBOT: Hi!
ROBOT: In order to speed up this process, designers can use dedicated chatbot design tools, that allow for immediate preview, team collaboration and video export.An important part of the chatbot design is also centered around user testing.
ROBOT: Design
The chatbot design is the process that defines the interaction between the user and the chatbot.The chatbot designer will define the chatbot personality, the questions that will be asked to the users, and the overall interaction.It can be viewed as a subset of the conversational design.
ROBOT: Bye! Take care...
```
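Because the loop reads from `input()`, it is awkward to test. One possible refactor, sketched here as an assumption rather than the notebook's design, moves the decision logic for a single turn into a hypothetical `reply()` function and drives it from a list of inputs with a hypothetical `chat()`. The `answer` parameter is a stand-in for `response()` so the sketch runs without the corpus or NLTK, and `greeting()` uses the word-by-word check from the keyword-matching cell.

```python
import random

GREETING_INPUTS = ('hello', 'hi', 'greetings', 'sup', "what's up", 'hey')
GREETING_RESPONSES = ['Hi!', 'Hey!', '*nods*', 'Hi, there!', 'Hello!', 'I am glad! You are talking to me.']

def greeting(sentence):
    """Word-by-word greeting check, as in the keyword-matching cell."""
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
    return None

def reply(user_input, answer):
    """Hypothetical single turn: return (robot_text, keep_going) for one line of user input."""
    text = user_input.lower()
    if text == 'bye':
        return 'Bye! Take care...', False
    if text in ('thanks', 'thank you'):
        return 'You are welcome.', False
    greet = greeting(text)
    if greet is not None:
        return greet, True
    return answer(text), True  # answer stands in for response()

def chat(inputs, answer):
    """Run the conversation over an iterable of inputs instead of input(); return the transcript lines."""
    transcript = ['ROBOT: My name is Robot. I will answer your queries about Chatbots. If you want to exit, type Bye!']
    for user_input in inputs:
        text, keep_going = reply(user_input, answer)
        transcript.append('ROBOT: ' + text)
        if not keep_going:
            break
    return transcript

# Usage with a dummy answer function in place of response()
for line in chat(['Hello', 'What is chatbot design?', 'Bye'], lambda text: '(best-matching corpus sentence)'):
    print(line)
```

Returning `keep_going` instead of mutating a `flag` keeps each turn a self-contained decision, which is what makes the list-driven test possible. Plugging in the real `response()` would still require removing the user's input from `sent_tokens` after each call.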
