### Types of Chatbots:

There are broadly two kinds of chatbots: rule-based and self-learning.

    1. In a rule-based approach, the bot answers questions by following a set of predefined rules it has been given (the rules are written by hand, not learned). The rules can range from very simple to very complex. Such bots handle simple, anticipated queries well but fail on complex or unexpected ones.
    
    2. Self-learning bots use machine-learning-based approaches, which typically makes them more flexible than rule-based bots. They come in two further types: retrieval-based and generative.


    2.a) In retrieval-based models, the chatbot uses a heuristic to select a response from a library of predefined responses. It uses the message and the context of the conversation to choose the best response from a predefined list of bot messages. The context can include the current position in the dialogue tree, all previous messages in the conversation, and previously saved variables (e.g., the username). Heuristics for selecting a response can be engineered in many different ways, from rule-based if-else conditional logic to machine learning classifiers.

    2.b) Generative bots compose their own answers instead of always replying with one answer from a fixed set. This makes them more flexible: they process the query and generate the answer word by word.
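To make the rule-based idea concrete, here is a minimal sketch. The rules, the replies, and the function name `rule_based_reply` are hypothetical illustrations, not part of any library; the regular-expression rules stand in for whatever rule set a real bot would use.

```python
import re

# Hypothetical hand-written rules: (pattern searched for in the lowercased message, fixed reply).
# Nothing here is learned from data; the bot's behaviour is exactly this list.
RULES = [
    (r"\b(hi|hello|hey)\b", "Hello! How can I help you?"),
    (r"\bopening hours\b|\bopen\b", "The library is open from 9 AM to 5 PM on weekdays."),
    (r"\b(bye|goodbye)\b", "Goodbye!"),
]
FALLBACK = "Sorry, I don't understand that yet."


def rule_based_reply(message: str) -> str:
    """Return the reply of the first rule whose pattern matches, else a fallback."""
    text = message.lower()
    for pattern, reply in RULES:
        if re.search(pattern, text):
            return reply
    return FALLBACK


print(rule_based_reply("Hello there"))
print(rule_based_reply("When are you open?"))
print(rule_based_reply("Can you recommend a novel?"))
```

The fallback branch shows the limitation described in point 1: any query that no rule anticipates gets the same canned answer.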

### Examples

Retrieval-Based Chatbot Example:


Predefined Responses:

    "The library is open from 9 AM to 5 PM on weekdays."
    "You can borrow up to 5 books at a time."
    "To renew a book, please visit the library's website or contact the help desk."

Conversation:
User: "What are the library's opening hours?"
Bot: "The library is open from 9 AM to 5 PM on weekdays."

How it works:

    Message: "What are the library's opening hours?"
    Heuristic: The bot matches the user's message to the closest predefined response using keywords or patterns (e.g., "opening hours").
    Selected Response: "The library is open from 9 AM to 5 PM on weekdays."
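The keyword-matching heuristic above can be sketched as follows. This is a minimal sketch under assumptions: the keyword set attached to each predefined response is hand-picked for illustration, and `retrieve_response` is a hypothetical helper. The score is simply the number of keywords the response shares with the user's message; a real bot might use TF-IDF similarity or a trained classifier instead.

```python
import re

# Predefined responses from the example, each tagged with hypothetical keywords.
RESPONSES = {
    "The library is open from 9 AM to 5 PM on weekdays.": {"open", "opening", "hours", "time"},
    "You can borrow up to 5 books at a time.": {"borrow", "books", "limit", "many"},
    "To renew a book, please visit the library's website or contact the help desk.": {"renew", "extend", "book"},
}


def tokenize(text: str) -> set:
    """Lowercase the text and keep only alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_response(message: str, fallback: str = "Sorry, I can't answer that.") -> str:
    words = tokenize(message)
    # Heuristic: pick the response sharing the most keywords (max() keeps the first on ties).
    best = max(RESPONSES, key=lambda resp: len(words & RESPONSES[resp]))
    # If nothing overlaps at all, do not pretend to have an answer.
    return best if words & RESPONSES[best] else fallback


print(retrieve_response("What are the library's opening hours?"))
print(retrieve_response("How many books can I borrow?"))
```

The bot can only ever return one of the three stored sentences (or the fallback), which is exactly the "limited to the responses it has been given" property listed under Key Differences.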

Generative Chatbot Example:

Imagine a more advanced chatbot that can generate responses on the fly.

Conversation:
User: "What are the library's opening hours?"
Bot: "The library is open from 9 AM to 5 PM on weekdays, but it is closed on weekends."

How it works:

    Message: "What are the library's opening hours?"
    Generative Model: The bot processes the input using a neural network that has been trained on a large dataset of conversational text. It generates a response word by word.
    Generated Response: "The library is open from 9 AM to 5 PM on weekdays, but it is closed on weekends."
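The word-by-word generation loop can be illustrated with a toy stand-in. Note the swap: instead of a trained neural network, a hypothetical hand-written table gives next-word probabilities conditioned only on the previous word (a bigram model), and the loop greedily picks the most likely word until it reaches an end marker. A real neural model conditions on the user's message and on all previously generated words, but the outer loop (predict a word, append it, repeat) has the same shape.

```python
# Hypothetical next-word probabilities standing in for a trained model.
NEXT_WORD_PROBS = {
    "<start>": {"the": 0.9, "sorry": 0.1},
    "the": {"library": 0.8, "desk": 0.2},
    "library": {"is": 0.7, "closes": 0.3},
    "is": {"open": 0.9, "closed": 0.1},
    "open": {"on": 0.6, "daily": 0.4},
    "on": {"weekdays": 0.7, "weekends": 0.3},
    "weekdays": {"<end>": 1.0},
}


def generate(max_words: int = 10) -> str:
    """Greedy autoregressive generation: each chosen word becomes the next context."""
    words, current = [], "<start>"
    for _ in range(max_words):
        candidates = NEXT_WORD_PROBS.get(current)
        if not candidates:  # no known continuation: stop
            break
        current = max(candidates, key=candidates.get)  # greedy: most probable next word
        if current == "<end>":
            break
        words.append(current)
    return " ".join(words)


print(generate())
```

Unlike the retrieval sketch, the sentence is assembled one word at a time rather than looked up whole.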

Key Differences:

    Retrieval-Based Bot:
        Response Source: Predefined responses.
        Selection Method: Heuristics like keyword matching or pattern recognition.
        Flexibility: Limited to the responses it has been given.

    Generative Bot:
        Response Source: Generates responses dynamically.
        Selection Method: Uses machine learning models to create a response based on the input.
        Flexibility: More adaptable; can handle a wider range of queries with nuanced answers.

In [None]:
# pip install nltk

### Text Pre-processing with NLTK

Text data arrives as strings, but machine learning algorithms need numerical feature vectors to work with. So before starting any NLP project, we pre-process the text into a clean, consistent form that can then be converted into numbers. Basic text pre-processing includes:

    1. Lowercasing: converting the entire text to lowercase (or uppercase) so that the algorithm does not treat the same word in different cases as different words.

    2. Tokenization: converting plain text strings into a list of tokens, i.e., the units (usually words) we want to work with. A sentence tokenizer splits text into a list of sentences, and a word tokenizer splits it into a list of words.

In [None]:
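# Lowercasing
import nltk
nltk.download('punkt')  # Downloading the punkt tokenizer models, used by the tokenizers below
nltk.download('punkt_tab')  # recent NLTK releases look up this newer version of punkt instead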

text = "Natural Language Processing with NLTK is Fun!"
text_lowercase = text.lower()
print(text_lowercase)


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


natural language processing with nltk is fun!


### Tokenization:

Purpose: Breaking down the text into smaller pieces like sentences or words.

In [None]:
#sentence tokenization
from nltk.tokenize import sent_tokenize

text = "Hello World. Natural Language Processing with NLTK is Fun!"
sentences = sent_tokenize(text)
print(sentences)


['Hello World.', 'Natural Language Processing with NLTK is Fun!']


In [None]:
#word tokenization
from nltk.tokenize import word_tokenize

text = "Natural Language Processing with NLTK is Fun!"
words = word_tokenize(text)
print(words)


['Natural', 'Language', 'Processing', 'with', 'NLTK', 'is', 'Fun', '!']


## Term Frequency (TF) and Inverse Document Frequency (IDF):

Term Frequency (TF) and Inverse Document Frequency (IDF) are fundamental concepts in Natural Language Processing (NLP) used to measure the importance of a word in a document relative to a collection of documents (corpus).

a) Term Frequency (TF):
TF measures how frequently a term appears in a document. It is the ratio of the number of times a term appears in a document to the total number of terms in the document.

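Formula:
$$\text{TF}(t,d) = \frac{\text{number of times term } t \text{ appears in document } d}{\text{total number of terms in document } d}$$

Example: if a term appears 3 times in a 100-word document, $\text{TF}(t,d) = 3/100 = 0.03$.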

b) Inverse Document Frequency (IDF):
IDF measures how informative a term is across the corpus. While computing TF, all terms are considered equally important. However, terms like "is", "of", and "that" appear frequently but carry little meaning. IDF down-weights terms that occur in many documents and scales up rare ones.

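Formula:
$$\text{IDF}(t,D) = \log\frac{N}{\text{number of documents in } D \text{ that contain } t}$$
where $N$ is the total number of documents in the corpus $D$.

Example: with $N = 1000$ documents, 10 of which contain the term, and a base-10 logarithm, $\text{IDF}(t,D) = \log_{10}(1000/10) = \log_{10}(100) = 2$. The code below uses the natural logarithm (`math.log`), which would give $\ln(100) \approx 4.61$ instead.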

c) TF-IDF:
TF-IDF is a numerical statistic that is intended to reflect how important a word is to a document in a corpus.

Formula:
$$\text{TF-IDF}(t,d,D) = \text{TF}(t,d) \times \text{IDF}(t,D)$$

Example (using the base-10 IDF from above): $\text{TF-IDF}(t,d,D) = 0.03 \times 2 = 0.06$.
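The worked example can be checked in a few lines of Python. The only subtlety is the logarithm base: the hand calculation uses base 10, while the TF-IDF code cell uses `math.log`, the natural logarithm, so its IDF values are larger by a factor of ln(10) ≈ 2.30.

```python
import math

# Worked example from the text: a term appears 3 times in a 100-word document,
# and in 10 of the 1,000 documents in the corpus.
tf = 3 / 100
idf_log10 = math.log10(1000 / 10)  # base-10 log, as in the hand calculation
idf_ln = math.log(1000 / 10)       # natural log, as math.log computes it

print(f"TF             = {tf:.2f}")              # 0.03
print(f"IDF (log10)    = {idf_log10:.2f}")       # 2.00
print(f"TF-IDF (log10) = {tf * idf_log10:.2f}")  # 0.06
print(f"IDF (ln)       = {idf_ln:.4f}")          # 4.6052
print(f"TF-IDF (ln)    = {tf * idf_ln:.4f}")     # 0.1382
```

Either base gives the same ranking of terms; only the absolute scores change.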

In [None]:
import nltk
import math
from collections import Counter

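# Download necessary resources: the punkt tokenizer models
nltk.download('punkt')
nltk.download('punkt_tab')  # recent NLTK releases look up this newer version of punkt instead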

# Example documents
documents = [
    "Natural, Language, Processing with NLTK is fun.",
    "Natural Language Processing and machine learning are closely related.",
    "Text processing with NLTK and Python is powerful."
]

# Step 1: Convert to lowercase and tokenize the text
tokenized_documents = [nltk.word_tokenize(doc.lower()) for doc in documents]

# Step 2: Calculate Term Frequency (TF)
def compute_tf(word_dict, doc):
    tf_dict = {}
    doc_count = len(doc)
    for word, count in word_dict.items():
        tf_dict[word] = count / float(doc_count)
    return tf_dict

# Compute TF for each document
tf_documents = []
for doc in tokenized_documents:
    word_counts = Counter(doc)
    tf_documents.append(compute_tf(word_counts, doc))

# Step 3: Calculate Inverse Document Frequency (IDF)
def compute_idf(documents):
    N = len(documents)
    unique_words = set(word for doc in documents for word in doc)
    idf_dict = dict.fromkeys(unique_words, 0)  # document frequency counter for every word in the corpus
    for doc in documents:
        for word in set(doc):
            idf_dict[word] += 1
    for word, val in idf_dict.items():
        idf_dict[word] = math.log(N / float(val))
    return idf_dict

# Compute IDF
idf_dict = compute_idf(tokenized_documents)

# Step 4: Calculate TF-IDF
def compute_tfidf(tf_doc, idf_dict):
    tfidf_dict = {}
    for word, tf_val in tf_doc.items():
        tfidf_dict[word] = tf_val * idf_dict[word]
    return tfidf_dict

# Compute TF-IDF for each document
tfidf_documents = [compute_tfidf(tf_doc, idf_dict) for tf_doc in tf_documents]

# Print results
for i, doc in enumerate(tfidf_documents):
    print(f"\nDocument {i+1} TF-IDF scores:")
    for word, score in doc.items():
        print(f"{word}: {score:.4f}")



Document 1 TF-IDF scores:
natural: 0.0405
,: 0.2197
language: 0.0405
processing: 0.0000
with: 0.0405
nltk: 0.0405
is: 0.0405
fun: 0.1099
.: 0.0000

Document 2 TF-IDF scores:
natural: 0.0405
language: 0.0405
processing: 0.0000
and: 0.0405
machine: 0.1099
learning: 0.1099
are: 0.1099
closely: 0.1099
related: 0.1099
.: 0.0000

Document 3 TF-IDF scores:
text: 0.1221
processing: 0.0000
with: 0.0451
nltk: 0.0451
and: 0.0451
python: 0.1221
is: 0.0451
powerful: 0.1221
.: 0.0000


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [None]:
# pip install torch

In [None]:
import torch

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using Device: {device}")

    device_no = torch.cuda.current_device()
    print(f"Current device number is: {device_no}")

    device_name = torch.cuda.get_device_name(device_no)
    print(f"GPU name is: {device_name}")
else:
    device = torch.device("cpu")
    print("CUDA is not available, using the CPU")

Using Device: cuda
Current device number is: 0
GPU name is: Tesla T4


In [None]:
import torch #an open source ML library used for creating deep neural networks
import torch.nn as nn # A module in PyTorch that provides classes and functions to build neural networks
import torch.optim as optim # A module in PyTorch that provides various optimization algorithms for training neural networks
import random # A module that implements pseudo-random number generators for various distributions
import re # A module for working with regular expressions to match and manipulate strings
import numpy as np

# Sample dataset
data = [
    ("hello", "hi"),
    ("how are you", "I'm good, how about you?"),
    ("what is your name", "I'm a chatbot"),
    ("bye", "goodbye"),
]

# Preprocessing
def preprocess(text):
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text) # removes all characters from the input text that are not word characters or whitespace
    return text

# Vocabulary
all_words = []
for (pattern, response) in data:
    pattern = preprocess(pattern)
    response = preprocess(response)
    words = pattern.split() + response.split() # split on whitespace, e.g. "hello ameer" -> ["hello", "ameer"]
    all_words.extend(words) #appends the elements of the list words to the end of the list all_words
all_words = sorted(set(all_words))


# Word to index mapping
word_to_idx = {word: idx for idx, word in enumerate(all_words)} #enumerate iterates on pairs of (index, value) tuples.

# Encode patterns and responses
def encode(text):
    text = preprocess(text)
    return [word_to_idx[word] for word in text.split() if word in word_to_idx]

encoded_data = [(encode(pattern), encode(response)) for (pattern, response) in data]

# Pad sequences: pad_sequence makes all sequences in the dataset the same length.
# seq: the sequence to pad; max_len: the target length; padding_value: the value appended to fill it.
# NOTE: with the default padding_value=0, padding shares index 0 with the word "a" in word_to_idx;
# reserving a dedicated padding index would keep the two apart.
def pad_sequence(seq, max_len, padding_value=0):
    # If seq is shorter than max_len, append padding_value until it reaches max_len
    # (longer sequences are returned unchanged, not truncated).
    return seq + [padding_value] * (max_len - len(seq))

# Determine the maximum length of patterns and responses
max_pattern_len = max(len(pattern) for pattern, response in encoded_data)
max_response_len = max(len(response) for pattern, response in encoded_data)
max_len = max(max_pattern_len, max_response_len)

# Pad patterns and responses
padded_patterns = [pad_sequence(pattern, max_pattern_len) for pattern, response in encoded_data]
padded_responses = [pad_sequence(response, max_response_len) for pattern, response in encoded_data]

# Convert to tensors
# Building tensors from the unpadded encodings fails, because the sequences have
# different lengths and torch.tensor needs a rectangular (non-ragged) nested list:
# patterns = torch.tensor([pattern for pattern, response in encoded_data], dtype=torch.long)
# responses = torch.tensor([response for pattern, response in encoded_data], dtype=torch.long)

# The padded lists work instead (this conversion is done in a later cell):
# patterns = torch.tensor(padded_patterns, dtype=torch.long)
# responses = torch.tensor(padded_responses, dtype=torch.long)

### Tensors

A tensor is a multi-dimensional array, a generalization of vectors (1-dimensional tensors) and matrices (2-dimensional tensors) to potentially higher dimensions.

- Tensors are a fundamental data structure in PyTorch and other deep learning frameworks because they allow for efficient computation (including on GPUs) and automatic differentiation.

- Scalar: a single number (0-dimensional tensor).
- Vector: a list of numbers (1-dimensional tensor).
- Matrix: a table of numbers (2-dimensional tensor).
- Tensor: in the narrower sense, an array with 3 or more dimensions.

- Data representation: tensors are used to represent various types of data, for example:
    - Images: 3D tensors (height, width, color channels)
    - Text: 2D tensors (sequence length, embedding dimension)
    - Audio: 3D tensors (time, frequency, channel), e.g. a spectrogram
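The ranks listed above can be checked directly in PyTorch with `Tensor.dim()` and `.shape`. This is a minimal sketch with arbitrary example values; note that PyTorch's convolution layers conventionally expect images as (channels, height, width), whereas image files and NumPy arrays usually store them as (height, width, channels).

```python
import torch

scalar = torch.tensor(3.14)              # 0-D: a single number
vector = torch.tensor([1.0, 2.0, 3.0])   # 1-D: a list of numbers
matrix = torch.tensor([[1, 2], [3, 4]])  # 2-D: a table of numbers
image = torch.zeros(3, 32, 32)           # 3-D: one RGB image as (channels, height, width)
batch = torch.zeros(16, 3, 32, 32)       # 4-D: a batch of 16 such images

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix),
                ("image", image), ("batch", batch)]:
    print(f"{name:7s} dim={t.dim()} shape={tuple(t.shape)}")
```

The `patterns` tensor built below is a 2-D tensor of this kind: one row per sentence, one column per token position.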


In [None]:
# Convert the padded lists to tensors of word indices (dtype long, as nn.Embedding expects),
# ready for efficient, GPU-capable computation in the PyTorch ecosystem.
patterns = torch.tensor(padded_patterns, dtype=torch.long)
responses = torch.tensor(padded_responses, dtype=torch.long)


In [None]:
patterns  # a 2-D tensor of integers: 4 rows (one per pattern) x 4 columns (max_pattern_len), i.e. a 4x4 matrix

tensor([[ 7,  0,  0,  0],
        [ 9,  2, 14,  0],
        [13, 11, 15, 12],
        [ 3,  0,  0,  0]])

In [None]:
class ChatbotModel(nn.Module):  # a basic feed-forward chatbot model in PyTorch
    def __init__(self, vocab_size, embed_size, hidden_size, output_size):  # build the layers from the hyperparameters
        super(ChatbotModel, self).__init__()
        # Embedding layer: converts word indices into dense vectors. Word indices are the integer IDs
        # from the vocabulary, e.g. vocabulary ["ameer", "rai"] -> indices "ameer": 0, "rai": 1.
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # First fully connected layer; it expects each input to be max_len tokens long.
        # NOTE: `patterns` is padded to max_pattern_len (4), not max_len (5), so its flattened
        # size is 4 * 8 = 32, not 40; pad and size with the same length before calling the model.
        self.fc1 = nn.Linear(embed_size * max_len, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)  # second fully connected layer (output layer)

    def forward(self, x):  # forward pass; x is a (batch, seq_len) tensor of word indices
        # Replace each index by its embedding: (batch, seq_len) -> (batch, seq_len, embed_size).
        # Embeddings are dense vectors that capture semantic and syntactic information about words,
        # e.g. "ameer" might map to [0.23, -0.15, 0.42, ...].
        x = self.embedding(x)
        x = x.view(x.size(0), -1)    # flatten to a 2-D tensor: (batch, seq_len * embed_size)
        x = torch.relu(self.fc1(x))  # first fully connected layer followed by a ReLU activation
        x = self.fc2(x)              # output layer: output_size raw scores per input
        return x

vocab_size = 15  # rows in the embedding table (valid indices 0-14). NOTE: all_words holds 16 words (indices 0-15), so len(all_words) is the safe value
embed_size = 8   # dimensionality of each word embedding
hidden_size = 8  # number of neurons in the hidden layer
output_size = max_response_len  # size of the output layer: the longest encoded response (5 words)

model = ChatbotModel(vocab_size, embed_size, hidden_size, output_size)  # simple feed-forward neural network
model


ChatbotModel(
  (embedding): Embedding(15, 8)
  (fc1): Linear(in_features=40, out_features=8, bias=True)
  (fc2): Linear(in_features=8, out_features=5, bias=True)
)


### Embedding(15, 8):
The embedding table has 15 rows (one per word index, so valid indices are 0-14) and 8 columns: each word is represented by an 8-dimensional vector. Note that the sample data yields 16 distinct words (indices 0-15), one more than this table can hold.

### Linear(in_features=40, out_features=8, bias=True):
This is the first fully connected (FC) layer. It takes a 40-dimensional input, the flattened embedding output (embed_size × max_len = 8 × 5 = 40), and produces an 8-dimensional output. bias=True means the layer adds a learnable bias term.

### Linear(in_features=8, out_features=5, bias=True):
This is the second FC layer. It takes an 8-dimensional input (output from the first FC layer) and produces a 5-dimensional output. Again, it uses a bias term.
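Tracing the tensor shapes through the layers makes the in_features numbers concrete. The sketch below is a standalone re-creation, not the notebook's model: it assumes vocab_size = 16 (the number of distinct words the sample data produces, indices 0-15) and a sequence length of 4 (max_pattern_len, the length `patterns` was padded to), and it copies the `patterns` tensor printed earlier. With these values fc1 needs 8 × 4 = 32 inputs. By comparison, the model printed above was built with 15 embedding rows and embed_size * max_len = 40 input features, so passing it `patterns` would fail: index 15 ("your") has no embedding row, and the flattened input has 32 features instead of 40.

```python
import torch
import torch.nn as nn

# Assumed sizes, chosen to match the data rather than the hardcoded values above.
vocab_size = 16   # distinct words in the four sample pairs (indices 0-15)
embed_size = 8
hidden_size = 8
seq_len = 4       # max_pattern_len: patterns are padded to 4 tokens
output_size = 5   # max_response_len

# Same padded patterns as the `patterns` tensor shown earlier.
patterns = torch.tensor([[7, 0, 0, 0],
                         [9, 2, 14, 0],
                         [13, 11, 15, 12],
                         [3, 0, 0, 0]], dtype=torch.long)

embedding = nn.Embedding(vocab_size, embed_size)
fc1 = nn.Linear(embed_size * seq_len, hidden_size)  # 8 * 4 = 32 input features
fc2 = nn.Linear(hidden_size, output_size)

x = embedding(patterns)     # (4, 4)    -> (4, 4, 8): one 8-d vector per token
print(x.shape)
x = x.view(x.size(0), -1)   # (4, 4, 8) -> (4, 32):   flatten each sentence
print(x.shape)
x = torch.relu(fc1(x))      # (4, 32)   -> (4, 8)
print(x.shape)
x = fc2(x)                  # (4, 8)    -> (4, 5):    five raw scores per input
print(x.shape)
```

The final (4, 5) shape is 5 raw scores per input pattern, not 5 word indices.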

### Implications:
- Small vocabulary: with a vocabulary of only 15 words, the chatbot can understand and produce very little.

- Shallow architecture: the model consists of only two linear layers, which limits how complex a mapping it can learn.

- Fixed output size: the output layer produces 5 numbers per input, matching the 5-word maximum response length. These are 5 raw scores, not 5 words; predicting a word at each of the 5 positions would require a score for every vocabulary word at every position (5 × vocab_size outputs).