### Types of Chatbots:

There are broadly two variants of chatbots: Rule-Based and Self-learning.

    1. In a rule-based approach, the bot answers questions according to a set of predefined rules. The rules can range from very simple to very complex. Such bots handle simple queries but tend to fail on complex ones.
    
    2. Self-learning bots use machine learning approaches and are generally more flexible than rule-based bots. They come in two further types: retrieval-based and generative.


    2.a) In retrieval-based models, a chatbot uses some heuristic to select a response from a library of predefined responses. The chatbot uses the message and the context of the conversation to choose the best response from a predefined list of bot messages. The context can include the current position in the dialogue tree, all previous messages in the conversation, and previously saved variables (e.g., username). Heuristics for selecting a response can be engineered in many different ways, from rule-based if-else conditional logic to machine learning classifiers.

    2.b) Generative bots generate their answers instead of always replying with one answer from a fixed set. They process the query word by word and produce the answer themselves, which makes them more flexible than retrieval-based bots.

### Examples

Retrieval-Based Chatbot Example:


Predefined Responses:

    "The library is open from 9 AM to 5 PM on weekdays."
    "You can borrow up to 5 books at a time."
    "To renew a book, please visit the library's website or contact the help desk."

Conversation:
User: "What are the library's opening hours?"
Bot: "The library is open from 9 AM to 5 PM on weekdays."

How it works:

    Message: "What are the library's opening hours?"
    Heuristic: The bot matches the user's message to the closest predefined response using keywords or patterns (e.g., "opening hours").
    Selected Response: "The library is open from 9 AM to 5 PM on weekdays."
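
The keyword heuristic described above can be sketched in a few lines of Python. This is a minimal illustration, not a production matcher: the keyword sets, the `FALLBACK` message, and the function names `tokenize` and `select_response` are all assumptions made for this example.

```python
import re

# Predefined responses from the example above, each paired with
# hypothetical keywords chosen for this sketch.
RESPONSES = [
    ({"open", "opening", "hours", "time"},
     "The library is open from 9 AM to 5 PM on weekdays."),
    ({"borrow", "many", "books"},
     "You can borrow up to 5 books at a time."),
    ({"renew", "renewal", "extend"},
     "To renew a book, please visit the library's website or contact the help desk."),
]

FALLBACK = "Sorry, I don't have an answer for that."  # hypothetical fallback reply


def tokenize(message):
    """Lowercase the message and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", message.lower()))


def select_response(message):
    """Return the predefined response whose keywords overlap the message most."""
    words = tokenize(message)
    best_score, best_response = 0, FALLBACK
    for keywords, response in RESPONSES:
        score = len(words & keywords)  # number of shared keywords
        if score > best_score:
            best_score, best_response = score, response
    return best_response


print(select_response("What are the library's opening hours?"))
```

The bot can only ever return one of the stored strings (or the fallback), which is the limitation listed under "Flexibility" in the comparison further down.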

Generative Chatbot Example:

Imagine a more advanced chatbot that can generate responses on the fly.

Conversation:
User: "What are the library's opening hours?"
Bot: "The library is open from 9 AM to 5 PM on weekdays, but it is closed on weekends."

How it works:

    Message: "What are the library's opening hours?"
    Generative Model: The bot processes the input using a neural network that has been trained on a large dataset of conversational text. It generates a response word by word.
    Generated Response: "The library is open from 9 AM to 5 PM on weekdays, but it is closed on weekends."

Key Differences:

    Retrieval-Based Bot:
        Response Source: Predefined responses.
        Selection Method: Heuristics like keyword matching or pattern recognition.
        Flexibility: Limited to the responses it has been given.

    Generative Bot:
        Response Source: Generates responses dynamically.
        Selection Method: Uses machine learning models to create a response based on the input.
        Flexibility: More adaptable and can handle a wider range of queries with nuanced answers

In [None]:
# pip install nltk

Text Pre-Processing with NLTK

The main issue with text data is that it is all in text format (strings), whereas machine learning algorithms need some sort of numerical feature vector to perform their task. So before starting any NLP project, we need to pre-process the text to make it suitable for the work. Basic text pre-processing includes:

    Converting the entire text to uppercase or lowercase, so that the algorithm does not treat the same word in different cases as different words.

    Tokenization: converting normal text strings into a list of tokens, i.e., the words we want. A sentence tokenizer can be used to find the list of sentences, and a word tokenizer can be used to find the list of words in a string.

In [None]:
#lowercase
import nltk
nltk.download('punkt')  # Downloading the punkt tokenizer models

text = "Natural Language Processing with NLTK is Fun!"
text_lowercase = text.lower()
print(text_lowercase)


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


natural language processing with nltk is fun!


### Tokenization:

Purpose: Breaking down the text into smaller pieces like sentences or words.

In [None]:
#sentence tokenization
from nltk.tokenize import sent_tokenize

text = "Hello World. Natural Language Processing with NLTK is Fun!"
sentences = sent_tokenize(text)
print(sentences)


['Hello World.', 'Natural Language Processing with NLTK is Fun!']


In [None]:
#word tokenization
from nltk.tokenize import word_tokenize

text = "Natural Language Processing with NLTK is Fun!"
words = word_tokenize(text)
print(words)


['Natural', 'Language', 'Processing', 'with', 'NLTK', 'is', 'Fun', '!']


## Term Frequency (TF) and Inverse Document Frequency (IDF):

Term Frequency (TF) and Inverse Document Frequency (IDF) are fundamental concepts in Natural Language Processing (NLP) used to measure the importance of a word in a document relative to a collection of documents (corpus).

a) Term Frequency (TF):
TF measures how frequently a term appears in a document. It is the ratio of the number of times a term appears in a document to the total number of terms in the document.

Formula:
TF(t,d) = Number of times term t appears in document d / Total number of terms in document d

Example: if a term appears 3 times in a document of 100 terms, TF(t,d) = 3/100 = 0.03

b) Inverse Document Frequency (IDF):
IDF measures how important a term is. While computing TF, all terms are considered equally important. However, certain terms like "is", "of", and "that" may appear frequently but have little importance. IDF weighs down the frequent terms while scaling up the rare ones.

Formula:
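IDF(t,D) = log(Total number of documents (N) / Number of documents containing term t)

Example: if 10 out of 1000 documents contain the term, then with a base-10 logarithm IDF(t,D) = log10(1000/10) = log10(100) = 2. The code below uses the natural logarithm (math.log), for which the same ratio gives ln(100) ≈ 4.605.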

c) TF-IDF:
TF-IDF is a numerical statistic that is intended to reflect how important a word is to a document in a corpus.

Formula:
TF-IDF(t,d,D)= TF(t,d) × IDF(t,D)

Example: TF-IDF(t,d,D)=0.03×2=0.06
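
The three worked examples above can be checked numerically. The sketch below assumes a base-10 logarithm, since that is what reproduces the value 2 in the IDF example; the notebook's own implementation uses `math.log` (natural logarithm), so its scores are on a different scale. The helper names `tf` and `idf` are illustrative only.

```python
import math


def tf(term_count, total_terms):
    """Term frequency: share of the document's terms that are this term."""
    return term_count / total_terms


def idf(num_documents, docs_with_term, log=math.log10):
    """Inverse document frequency; the log base is a parameter of this sketch."""
    return log(num_documents / docs_with_term)


tf_value = tf(3, 100)                        # term appears 3 times in 100 terms
idf_base10 = idf(1000, 10)                   # base-10 log, as in the worked example
idf_natural = idf(1000, 10, log=math.log)    # natural log, as in math.log

print(f"TF            = {tf_value:.3f}")
print(f"IDF (log10)   = {idf_base10:.3f}")
print(f"TF-IDF        = {tf_value * idf_base10:.3f}")
print(f"IDF (ln)      = {idf_natural:.3f}")
print(f"TF-IDF (ln)   = {tf_value * idf_natural:.3f}")
```

The choice of base only rescales every score by the same constant, so the ranking of terms is unchanged.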

In [None]:
import nltk
import math
from collections import Counter

# Download necessary resources
nltk.download('punkt')

# Example documents
documents = [
    "Natural, Language, Processing with NLTK is fun.",
    "Natural Language Processing and machine learning are closely related.",
    "Text processing with NLTK and Python is powerful."
]

# Step 1: Convert to lowercase and tokenize the text
tokenized_documents = [nltk.word_tokenize(doc.lower()) for doc in documents]

# Step 2: Calculate Term Frequency (TF)
def compute_tf(word_dict, doc):
    tf_dict = {}
    doc_count = len(doc)
    for word, count in word_dict.items():
        tf_dict[word] = count / float(doc_count)
    return tf_dict

# Compute TF for each document
tf_documents = []
for doc in tokenized_documents:
    word_counts = Counter(doc)
    tf_documents.append(compute_tf(word_counts, doc))

# Step 3: Calculate Inverse Document Frequency (IDF)
def compute_idf(documents):
    N = len(documents)
    unique_words = set(word for doc in documents for word in doc)
    idf_dict = dict.fromkeys(unique_words, 0)
    # idf_dict = dict.fromkeys(documents[0], 0)
    for doc in documents:
        for word in set(doc):
            idf_dict[word] += 1
    for word, val in idf_dict.items():
        idf_dict[word] = math.log(N / float(val))
    return idf_dict

# Compute IDF
idf_dict = compute_idf(tokenized_documents)

# Step 4: Calculate TF-IDF
def compute_tfidf(tf_doc, idf_dict):
    tfidf_dict = {}
    for word, tf_val in tf_doc.items():
        tfidf_dict[word] = tf_val * idf_dict[word]
    return tfidf_dict

# Compute TF-IDF for each document
tfidf_documents = [compute_tfidf(tf_doc, idf_dict) for tf_doc in tf_documents]

# Print results
for i, doc in enumerate(tfidf_documents):
    print(f"\nDocument {i+1} TF-IDF scores:")
    for word, score in doc.items():
        print(f"{word}: {score:.4f}")



Document 1 TF-IDF scores:
natural: 0.0405
,: 0.2197
language: 0.0405
processing: 0.0000
with: 0.0405
nltk: 0.0405
is: 0.0405
fun: 0.1099
.: 0.0000

Document 2 TF-IDF scores:
natural: 0.0405
language: 0.0405
processing: 0.0000
and: 0.0405
machine: 0.1099
learning: 0.1099
are: 0.1099
closely: 0.1099
related: 0.1099
.: 0.0000

Document 3 TF-IDF scores:
text: 0.1221
processing: 0.0000
with: 0.0451
nltk: 0.0451
and: 0.0451
python: 0.1221
is: 0.0451
powerful: 0.1221
.: 0.0000


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [None]:
# pip install torch

In [None]:
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using Device: {device}")

    device_no = torch.cuda.current_device()
    print(f"Current device number is: {device_no}")

    device_name = torch.cuda.get_device_name(device_no)
    print(f"GPU name is: {device_name}")
else:
    print("CUDA is not available")

Using Device: cuda
Current device number is: 0
GPU name is: Tesla T4


In [None]:
import torch #an open source ML library used for creating deep neural networks
import torch.nn as nn # A module in PyTorch that provides classes and functions to build neural networks
import torch.optim as optim # A module in PyTorch that provides various optimization algorithms for training neural networks
import random # A module that implements pseudo-random number generators for various distributions
import re # A module for working with regular expressions to match and manipulate strings
import numpy as np

# Sample dataset
data = [
    ("hello", "hi"),
    ("how are you", "I'm good, how about you?"),
    ("what is your name", "I'm a chatbot"),
    ("bye", "goodbye"),
]

# Preprocessing
def preprocess(text):
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text) # removes all characters from the input text that are not word characters or whitespace
    return text

# Vocabulary
all_words = []
for (pattern, response) in data:
    pattern = preprocess(pattern)
    response = preprocess(response)
    words = pattern.split() + response.split()  # split on whitespace into a list of words, e.g. "hello Ameer" -> ["hello", "Ameer"]
    all_words.extend(words) #appends the elements of the list words to the end of the list all_words
all_words = sorted(set(all_words))


# Word to index mapping
word_to_idx = {word: idx for idx, word in enumerate(all_words)} #enumerate iterates on pairs of (index, value) tuples.

# Encode patterns and responses
def encode(text):
    text = preprocess(text)
    return [word_to_idx[word] for word in text.split() if word in word_to_idx]

encoded_data = [(encode(pattern), encode(response)) for (pattern, response) in data]

# Pad sequences: pad_sequence ensures that all sequences in a dataset have the same length.
# Note: the default padding_value of 0 is also the index of the word 'a' in word_to_idx,
# so padded positions cannot be told apart from that word.
def pad_sequence(seq, max_len, padding_value=0):
    # seq: input sequence to pad; max_len: desired length for all sequences;
    # padding_value: value used to fill the sequence up to max_len.
    # If seq is shorter than max_len, padding_value is appended until it reaches max_len.
    return seq + [padding_value] * (max_len - len(seq))

# Determine the maximum length of patterns and responses
max_pattern_len = max(len(pattern) for pattern, response in encoded_data)
max_response_len = max(len(response) for pattern, response in encoded_data)
max_len = max(max_pattern_len, max_response_len)

# Pad patterns and responses
# padded_patterns = [pad_sequence(pattern, max_pattern_len) for pattern, response in encoded_data]
# padded_responses = [pad_sequence(response, max_response_len) for pattern, response in encoded_data]

padded_patterns = [pad_sequence(pattern, max_len) for pattern, response in encoded_data]
padded_responses = [pad_sequence(response, max_len) for pattern, response in encoded_data]

print(f"Max Pattern Length: {max_pattern_len}")
print(f"Max Response Length: {max_response_len}")
# print(f"Max Length: {max_len}")
print(f"Padded Pattern: {padded_patterns}")
print(f"Padded Response: {padded_responses}")

# Additional debugging:
print("Example padded pattern:", padded_patterns[0])
print("Example padded response:", padded_responses[0])

# Convert to tensors

# These two lines raised an error because the encoded sequences have different lengths:
# patterns = torch.tensor([pattern for pattern, response in encoded_data], dtype=torch.long)
# responses = torch.tensor([response for pattern, response in encoded_data], dtype=torch.long)

# Corrected version, which uses the padded sequences instead:
# patterns = torch.tensor(padded_patterns, dtype=torch.long)
# responses = torch.tensor(padded_responses, dtype=torch.long)

Max Pattern Length: 4
Max Response Length: 5
Padded Pattern: [[7, 0, 0, 0, 0], [9, 2, 14, 0, 0], [13, 11, 15, 12, 0], [3, 0, 0, 0, 0]]
Padded Response: [[8, 0, 0, 0, 0], [10, 5, 9, 1, 14], [10, 0, 4, 0, 0], [6, 0, 0, 0, 0]]
Example padded pattern: [7, 0, 0, 0, 0]
Example padded response: [8, 0, 0, 0, 0]




# Tensors

### A tensor is a multi-dimensional array, a generalization of vectors (1D tensors) and matrices (2D tensors) to potentially higher dimensions.

--Tensors are a fundamental data structure in PyTorch and other deep learning frameworks because they allow for efficient computation and automatic differentiation.

--Scalar: A single number (0-dimensional tensor).
Vector: A list of numbers (1-dimensional tensor).
Matrix: A table of numbers (2-dimensional tensor).
Tensor: A multi-dimensional array of numbers (3 or more dimensions).

--Data Representation: Tensors are used to represent various types of data:
Images: 3D tensors (height, width, color channels)
Text: 2D tensors (sequence length, embedding dimension)
Audio: 3D tensors (time, frequency, channel)
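
The dimensionality terms above can be inspected directly in PyTorch. This is a small sketch; the concrete values and the 32x32x3 image size are arbitrary choices for illustration.

```python
import torch

scalar = torch.tensor(3.14)                # 0-dimensional tensor
vector = torch.tensor([1.0, 2.0, 3.0])     # 1-dimensional tensor
matrix = torch.tensor([[1, 2], [3, 4]])    # 2-dimensional tensor
image = torch.zeros(32, 32, 3)             # 3D: height, width, color channels (sizes are illustrative)

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image", image)]:
    # ndim is the number of dimensions, shape is the size along each one
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```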


In [None]:
# Convert to tensors to prepare the data for training: tensors allow efficient computation, can use GPU acceleration, and integrate with the rest of PyTorch.
patterns = torch.tensor(padded_patterns, dtype=torch.long)
responses = torch.tensor(padded_responses, dtype=torch.long)

# patterns = torch.tensor([pattern for pattern in padded_patterns], dtype=torch.long)
# responses = torch.tensor([response for response in padded_responses], dtype=torch.long)


In [None]:
patterns  # a 2D integer tensor with 4 rows and 5 columns, i.e. a 4x5 matrix
# responses

tensor([[ 7,  0,  0,  0,  0],
        [ 9,  2, 14,  0,  0],
        [13, 11, 15, 12,  0],
        [ 3,  0,  0,  0,  0]])

In [None]:
class ChatbotModel(nn.Module):  # defines a basic chatbot model architecture using PyTorch
    def __init__(self, vocab_size, embed_size, hidden_size, output_size, max_len):  # initializes the model with hyperparameters
        super(ChatbotModel, self).__init__()
        self.max_len = max_len  # stored so that forward() does not depend on a global variable
        # Embedding layer: converts word indices to dense vectors. Word indices are numerical
        # representations of words in a vocabulary, e.g. vocabulary ["ameer", "rai"] -> {"ameer": 0, "rai": 1}.
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Feed-forward variant (commented out):
        # self.fc1 = nn.Linear(embed_size * max_len, hidden_size)  # the first fully connected layer
        # self.fc2 = nn.Linear(hidden_size, output_size)  # the second fully connected layer (output layer)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size * max_len, output_size * max_len)

    # Feed-forward variant of the forward pass (commented out):
    # def forward(self, x):  # defines the forward pass of the model; x: input tensor of word indices
    #     x = self.embedding(x)  # converts the word indices in x into dense embedding vectors. Embeddings capture semantic and syntactic information about words; the word "Ameer" might have an embedding like [0.23, -0.15, 0.42, ...]
    #     x = x.view(x.size(0), -1)  # reshapes the tensor into a 2D tensor
    #     x = torch.relu(self.fc1(x))  # applies a ReLU activation to the output of the first fully connected layer
    #     x = self.fc2(x)  # passes the output through the second fully connected layer
    #     return x

    def forward(self, x):
        x = self.embedding(x)  # (batch_size, max_len) -> (batch_size, max_len, embed_size)
        x, _ = self.lstm(x)  # -> (batch_size, max_len, hidden_size)
        x = x.contiguous().view(x.size(0), -1)  # flatten the output for the linear layer
        x = self.fc(x)  # -> (batch_size, max_len * output_size)
        return x.view(x.size(0), self.max_len, -1)  # reshape to (batch_size, max_len, output_size)

# Initialize model
vocab_size = len(all_words)
# vocab_size = len(word_to_idx)  # total number of unique words in the vocabulary; each unique word is assigned a unique integer index
embed_size = 10  # dimensionality of the word embeddings
hidden_size = 20  # number of units in the LSTM hidden state
# output_size = max_response_len  # size of the output layer (the maximum length of a response), as used in the feed-forward variant
output_size = vocab_size  # output size should match the vocabulary size


max_len = max(max_pattern_len, max_response_len)
# max_len = max(max(len(p) for p in padded_patterns), max(len(r) for r in padded_responses))
model = ChatbotModel(vocab_size, embed_size, hidden_size, output_size, max_len)  # embedding + LSTM + linear model for generating chatbot responses
print(model)

print(f"Vocabulary Size: {vocab_size}")
print(f"Output Size: {output_size}")
print(f"Word to Index Mapping: {word_to_idx}")
print(f"Patterns Tensor: {patterns}")
print(f"Max Length: {max_len}")

for pattern in patterns:
    for idx in pattern:
        if idx >= vocab_size:
            print(f"Invalid Index: {idx}")


ChatbotModel(
  (embedding): Embedding(16, 10)
  (lstm): LSTM(10, 20, batch_first=True)
  (fc): Linear(in_features=100, out_features=80, bias=True)
)
Vocabulary Size: 16
Output Size: 16
Word to Index Mapping: {'a': 0, 'about': 1, 'are': 2, 'bye': 3, 'chatbot': 4, 'good': 5, 'goodbye': 6, 'hello': 7, 'hi': 8, 'how': 9, 'im': 10, 'is': 11, 'name': 12, 'what': 13, 'you': 14, 'your': 15}
Patterns Tensor: tensor([[ 7,  0,  0,  0,  0],
        [ 9,  2, 14,  0,  0],
        [13, 11, 15, 12,  0],
        [ 3,  0,  0,  0,  0]])
Max Length: 5


ChatbotModel(
  (embedding): Embedding(15, 8)
  (fc1): Linear(in_features=40, out_features=8, bias=True)
  (fc2): Linear(in_features=8, out_features=5, bias=True)
)

## Embedding(15, 8):
The second printout above, with the fc1 and fc2 layers, matches the feed-forward variant that is commented out in the class definition. Its embedding layer has a vocabulary size of 15 words (hence 15 rows), and each word is represented by an 8-dimensional vector (hence 8 columns).

## Linear(in_features=40, out_features=8, bias=True):
This is the first fully connected (FC) layer. It takes a 40-dimensional input (likely the flattened embedding output) and produces an 8-dimensional output. The bias=True indicates that the layer uses a bias term.

## Linear(in_features=8, out_features=5, bias=True):
This is the second FC layer. It takes an 8-dimensional input (the output of the first FC layer) and produces a 5-dimensional output. Again, it uses a bias term.

# Implications (suggestions and analysis):
--Small Vocabulary: With a vocabulary size of 15, the chatbot might have limited conversational capabilities.

--Shallow Architecture: The model consists of only two linear layers, which might limit its complexity and performance.

--Fixed Output Size: The output size of 5 suggests a fixed response length of 5 words.

#### Neural network model

#### Neural network model designed to map input sequences (patterns) to output sequences (responses). It consists of an embedding layer followed by two fully connected (linear) layers
--Imagine you want to teach a computer to have a conversation. This model is like a basic language learner

--Components of the Model

    Embedding Layer (nn.Embedding):
        This layer converts input word indices into dense vectors of fixed size (embed_size).
        Input: (batch_size, sequence_length) # Input shape represents the encoded text (word indices), organized into batches and sequences.
        Output: (batch_size, sequence_length, embed_size) # Output shape represents the transformed text data, where each word is converted into a fixed-size vector (embedding), maintaining the original batch and sequence structure.

        --batch_size: number of samples or sequences processed together in a single pass. For example, if you have 32 sentences, and you process them in batches of 4, the batch_size would be 4.
        --sequence_length: This is the length of each individual sequence (or sentence) in the batch. For instance, if the longest sentence in your batch has 10 words, the sequence_length would be 10.

        --input(4,10): If you have a batch of 4 sentences, each with a maximum of 10 words, the input shape would be (4, 10).

        --embed_size: This is the dimensionality of the embedding vector produced for each word in the sequence. It's the size of the dense vector representing a word.

        --output(4,10,64): if each word is represented by a 64-dimensional embedding, the output shape would be (4, 10, 64).

       --This is like teaching the computer what words mean. It turns words into numbers that the computer can understand. For example, the word "hello" becomes a list of numbers.

    First Fully Connected Layer (nn.Linear):
        This layer transforms the concatenated embeddings into a hidden representation.
        Input: (batch_size, embed_size * sequence_length)
        Output: (batch_size, hidden_size)

        --This part helps the computer understand the meaning of a whole sentence by combining the meanings of individual words. It's like understanding the difference between "i know you" and "you know me".

    Second Fully Connected Layer (nn.Linear):
        This layer maps the hidden representation to the output size.
        Input: (batch_size, hidden_size)
        Output: (batch_size, output_size)

        --This part decides what to say back. It takes the understanding of the sentence and turns it into a new sentence.
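
The shapes listed for the three components can be traced with a short PyTorch sketch. It reuses the illustrative sizes from the text (a batch of 4 sentences, 10 words each, 64-dimensional embeddings); `vocab_size`, `hidden_size`, and `output_size` are assumed values chosen only for this example, and random indices stand in for real encoded sentences.

```python
import torch
import torch.nn as nn

# Sizes from the text above; vocab_size, hidden_size and output_size
# are assumptions made for this sketch.
batch_size, sequence_length, embed_size = 4, 10, 64
vocab_size, hidden_size, output_size = 100, 32, 5

embedding = nn.Embedding(vocab_size, embed_size)
fc1 = nn.Linear(embed_size * sequence_length, hidden_size)
fc2 = nn.Linear(hidden_size, output_size)

# Random word indices standing in for 4 encoded, padded sentences.
x = torch.randint(0, vocab_size, (batch_size, sequence_length))
print("input:   ", tuple(x.shape))         # (batch_size, sequence_length)

embedded = embedding(x)
print("embedded:", tuple(embedded.shape))  # (batch_size, sequence_length, embed_size)

flat = embedded.view(batch_size, -1)       # concatenate the word vectors of each sentence
hidden = torch.relu(fc1(flat))
print("hidden:  ", tuple(hidden.shape))    # (batch_size, hidden_size)

out = fc2(hidden)
print("output:  ", tuple(out.shape))       # (batch_size, output_size)
```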

## Forward Pass


### Embedding
x = self.embedding(x)
Converts input indices to dense vectors.
Shape: (batch_size, sequence_length, embed_size)

### Flattening:
x = x.view(x.size(0), -1)
Reshapes the tensor by flattening the last two dimensions.
Shape: (batch_size, sequence_length * embed_size)

### First Linear Layer with ReLU Activation:
x = torch.relu(self.fc1(x))
Applies a linear transformation followed by a ReLU activation, f(x) = max(0, x).
Shape: (batch_size, hidden_size)

### Second Linear Layer:
x = self.fc2(x)
Applies another linear transformation.
Shape: (batch_size, output_size)

#### ReLU Activation Function: ReLU stands for Rectified Linear Unit. It's a mathematical function commonly used as an activation function in artificial neural networks.
++How it works:
--If the input is positive, the output is the input itself.
--If the input is negative, the output is zero.
--ReLU(x) = max(0, x)
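
As a quick check of the definition, here is ReLU written in plain Python and applied to a few sample values (the input values are arbitrary). In the model itself the same operation is applied element-wise by `torch.relu`.

```python
def relu(x):
    """Rectified Linear Unit: pass positive inputs through, clamp the rest to zero."""
    return max(0.0, x)


inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]
outputs = [relu(v) for v in inputs]
print(outputs)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```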

### Loss Function
A loss function quantifies the error between the model's predicted output and the true output. It's essentially a measure of how well the model is performing.

### Optimizer
An optimizer is an algorithm or method used to adjust the model's parameters (weights and biases) in order to minimize the loss function.
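
To make the idea concrete, the sketch below minimizes a one-parameter loss with plain gradient descent. This is deliberately simpler than Adam, which additionally adapts the step size for each parameter; the loss function, starting value, and learning rate are assumptions chosen for illustration.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


w = 0.0               # initial parameter value
learning_rate = 0.1   # step size

for step in range(50):
    # Move the parameter a small step against the gradient.
    w -= learning_rate * gradient(w)

print(f"w = {w:.4f}, loss = {loss(w):.6f}")
```

Each step shrinks the distance to the minimum, which is the same role `optimizer.step()` plays for the model's weights and biases.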

## Loss Function and Optimizer

    - Loss Function:
        criterion = nn.CrossEntropyLoss() defines the loss function. Cross-entropy loss is commonly used for classification tasks. It measures the difference between the predicted probabilities and the true labels.
     - Optimizer:
        optimizer = optim.Adam(model.parameters(), lr=0.001) initializes the Adam optimizer with the model's parameters and a learning rate of 0.001. Adam is an adaptive learning rate optimization algorithm that's popular for training deep learning models.
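
For a single prediction, cross-entropy is the negative log of the probability that the model assigns to the correct class. The following plain-Python sketch works that out by hand for made-up scores; `nn.CrossEntropyLoss` performs the same computation on raw scores (logits) and, by default, averages it over all positions.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability of the correct class."""
    return -math.log(softmax(logits)[target_index])


logits = [2.0, 0.5, -1.0]   # hypothetical scores for three classes

print(f"target = class 0: {cross_entropy(logits, 0):.4f}")  # model favours class 0: low loss
print(f"target = class 2: {cross_entropy(logits, 2):.4f}")  # model disfavours class 2: high loss
```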

## Training Loop

    - Number of Epochs:
        num_epochs = 1000 sets the number of times the entire dataset is passed through the network. (The training cell below sets num_epochs = 10.)

    - Loop Over Epochs:
        for epoch in range(num_epochs): iterates over the number of epochs.

    - Zero the Parameter Gradients:
        optimizer.zero_grad() clears old gradients, otherwise they would accumulate with each iteration (which is not desired).

    - Forward Pass:
        outputs = model(patterns) computes the model's output (predictions) for the given input patterns. Here, patterns is the input tensor containing encoded and padded sequences.

    - Compute Loss:
        loss = criterion(outputs.view(-1, output_size), responses.view(-1)) calculates the loss between the predicted outputs and the true responses. The view(-1, output_size) reshapes the output tensor to match the expected shape for the loss function, and view(-1) flattens the responses tensor.

    - Backward Pass:
        loss.backward() computes the gradient of the loss with respect to the model's parameters. These gradients are used to update the model parameters during optimization.

     - Optimizer Step:
        optimizer.step() updates the model parameters using the computed gradients. This step applies the optimization algorithm (Adam) to adjust the parameters and minimize the loss.

    - Print Loss:
        if (epoch + 1) % 100 == 0: checks if the current epoch is a multiple of 100.
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}') prints the epoch number and the current loss value. This helps monitor the training process.

In [None]:
# Define loss function and optimizer :
criterion = nn.CrossEntropyLoss() #commonly used loss function for classification problems. It measures the difference between the predicted probability distribution and the actual distribution
optimizer = torch.optim.Adam(model.parameters(), lr=0.001) #popular optimization algorithm that combines the advantages of Adagrad and RMSprop. lr parameter (learning rate) determines the step size the optimizer will take when updating parameters.

# Additional debugging:
print("Example padded pattern:", padded_patterns[0])
print("Example padded response:", padded_responses[0])

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    optimizer.zero_grad()  # Clear the gradients (the rate of change of the loss with respect to the model's parameters) accumulated in the previous step
    outputs = model(patterns)  # Forward pass: compute predicted outputs by passing inputs to the model

    # Reshape outputs and responses for loss computation to match the expected input format
    outputs = outputs.view(-1, output_size)  # Shape: (batch_size, sequence_length, output_size) to (batch_size * sequence_length, output_size)
    responses = responses.view(-1)  # Shape: (batch_size, sequence_length) to (batch_size * sequence_length); note that this overwrites the global responses tensor with its flattened form

    print(f"Epoch {epoch+1}: Outputs shape: {outputs.shape}, Responses shape: {responses.shape}")  # Debug print

    # Additional debugging:
    print("Outuput shape:", outputs.shape[0])
    print("Response shape:", responses.shape[0])

    #  Check for size mismatch (optional, but helpful for debugging)
    if outputs.shape[0] != responses.shape[0]:
        print(f"WARNING: Mismatch in output and response sizes. Outputs: {outputs.shape}, Responses: {responses.shape}")

    # Proceed with loss calculation only if sizes match
    else:
        loss = criterion(outputs, responses)  # Compute the loss between predicted and actual outputs
        loss.backward()  # Backward pass: compute the gradient of the loss with respect to model parameters
        optimizer.step()  # Update model parameters based on the computed gradients
        # To print the loss only every 100 epochs (e.g. when num_epochs = 1000), guard the print with:
        # if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')


Example padded pattern: [7, 0, 0, 0, 0]
Example padded response: [8, 0, 0, 0, 0]
Epoch 1: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Outuput shape: 20
Response shape: 20
Epoch [1/10], Loss: 2.7810
Epoch 2: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Outuput shape: 20
Response shape: 20
Epoch [2/10], Loss: 2.7601
Epoch 3: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Outuput shape: 20
Response shape: 20
Epoch [3/10], Loss: 2.7396
Epoch 4: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Outuput shape: 20
Response shape: 20
Epoch [4/10], Loss: 2.7194
Epoch 5: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Outuput shape: 20
Response shape: 20
Epoch [5/10], Loss: 2.6994
Epoch 6: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Outuput shape: 20
Response shape: 20
Epoch [6/10], Loss: 2.6795
Epoch 7: Outputs shape: torch.Size([20, 16]), Responses

In [None]:
#Debugging

# After forward pass
outputs = model(patterns)  # Assuming `outputs` is the model's output
print("Epoch {}: Outputs shape: {}".format(epoch, outputs.shape))

# If needed, reshaping outputs for loss calculation
outputs_reshaped = outputs.view(-1, outputs.size(-1))
print("Epoch {}: Reshaped Outputs shape: {}".format(epoch, outputs_reshaped.shape))

# Print the target shape
print("Epoch {}: Responses shape: {}".format(epoch, responses.shape))

# Ensure outputs and targets have matching dimensions
print("Epoch {}: Output size: {}".format(epoch, outputs.size()))
print("Epoch {}: Target size: {}".format(epoch, responses.size()))

# Example padded pattern and response
print("Example padded pattern: {}".format(patterns[0].tolist()))
print("Example padded response: {}".format(responses[0].tolist()))

# Assuming vocab_size is defined somewhere in your code
print("Vocabulary Size: {}".format(vocab_size))

# Example with CrossEntropyLoss
criterion = nn.CrossEntropyLoss()

# Debug reshaping logic
print("Before reshaping Outputs shape: {}".format(outputs.shape))
print("After reshaping Outputs shape: {}".format(outputs_reshaped.shape))



Epoch 9: Outputs shape: torch.Size([4, 5, 16])
Epoch 9: Reshaped Outputs shape: torch.Size([20, 16])
Epoch 9: Responses shape: torch.Size([20])
Epoch 9: Output size: torch.Size([4, 5, 16])
Epoch 9: Target size: torch.Size([20])
Example padded pattern: [7, 0, 0, 0, 0]
Example padded response: 8
Vocabulary Size: 16
Before reshaping Outputs shape: torch.Size([4, 5, 16])
After reshaping Outputs shape: torch.Size([20, 16])


In [None]:
#prediction
def predict_response(input_text):  # processes the input text, feeds it to the model, and generates a response
    input_text = preprocess(input_text)  # preprocessing: lowercases the text and removes punctuation
    input_pattern = encode(input_text)  # encoding: converts the preprocessed text into a list of word indices (unknown words are dropped)
    input_pattern = pad_sequence(input_pattern, max_len)  # padding: extends the pattern to max_len with the padding value
    input_pattern = torch.tensor(input_pattern, dtype=torch.long).unsqueeze(0)  # tensor conversion: a torch.long tensor with a leading batch dimension of 1

    with torch.no_grad():  # inference only, so gradients are not tracked
        output = model(input_pattern)  # model prediction: the prepared input tensor is fed into the network
    _, predicted = torch.max(output, dim=2)  # index of the highest-scoring word at each time step
    predicted = predicted.squeeze(0).numpy()  # dimension reduction: drop the batch dimension and convert to a NumPy array

    response_words = [all_words[idx] for idx in predicted if idx in word_to_idx.values()]  # decoding: map predicted indices back to words using all_words
    response_text = ' '.join(response_words)  # response formation: join the predicted words into the final response text
    return response_text

# Chat with the bot
print("Start chatting with the bot (type 'quit' to stop)!")
while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break
    response = predict_response(user_input)
    print(f"Bot: {response}")


Start chatting with the bot (type 'quit' to stop)!
You: hello
Bot: hi good a a a
You: how are yo
Bot: hi good a a a
You: how are you
Bot: hi good how a a
You: what is your name
Bot: hi good chatbot a a
You: bye
Bot: hi a a a a
You: bye
Bot: hi a a a a
You: good bye
Bot: hi good a hello a
You: what 
Bot: hi a a a a
You: hello
Bot: hi good a a a
You: how are you?
Bot: hi good how a a
You: quit


#### predict_response Function:
-- The predict_response function takes the user's input, processes it, feeds it to the trained model, and turns the model's output into a text response. It is the inference step of the chatbot.

#### Preprocesses and encodes the input text:
-- Preprocessing: the input text is cleaned before it is encoded.

-- Encoding: each token is mapped to its numerical index in a predefined vocabulary.
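
To make the encoding step concrete, here is a minimal sketch. It assumes the four-pair dataset used in this notebook and an `encode` that skips words missing from the vocabulary. Under that assumption, punctuation and unknown words do not change the encoded sequence.

```python
import re

# Assumed to match the notebook's four-pair dataset.
data = [
    ("hello", "hi"),
    ("how are you", "I'm good, how about you?"),
    ("what is your name", "I'm a chatbot"),
    ("bye", "goodbye"),
]

def preprocess(text):
    # Lowercase, then drop everything that is not a word character or whitespace.
    return re.sub(r"[^\w\s]", "", text.lower())

# Sorted, de-duplicated vocabulary built from both patterns and responses.
vocab = sorted({w for pair in data for part in pair for w in preprocess(part).split()})
word_to_idx = {w: i for i, w in enumerate(vocab)}

def encode(text):
    # Words that are not in the vocabulary are silently skipped.
    return [word_to_idx[w] for w in preprocess(text).split() if w in word_to_idx]

print(encode("how are you?"))  # [9, 2, 14]
print(encode("how are you"))   # [9, 2, 14]
print(encode("how are yo"))    # [9, 2]
print(encode("dont say hi"))   # [8]
```

Because unknown words are dropped rather than mapped to a dedicated token, two different inputs can reach the model as the same sequence.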

#### Pads the sequence to a fixed length:
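-- The encoded sequence is padded to the fixed length (max_len) required by the model, so that every input has the same shape. Note that pad_sequence as defined in this notebook only pads; it does not truncate a sequence that is longer than max_len.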
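
If inputs longer than the fixed length need to be handled, padding can be combined with truncation. This is a minimal sketch; `pad_or_truncate` is a hypothetical helper and is not part of the notebook.

```python
def pad_or_truncate(seq, max_len, padding_value=0):
    """Return a list of exactly max_len items: cut long inputs, pad short ones."""
    clipped = seq[:max_len]  # Slicing leaves shorter sequences unchanged.
    return clipped + [padding_value] * (max_len - len(clipped))

print(pad_or_truncate([7], 5))                   # [7, 0, 0, 0, 0]
print(pad_or_truncate([9, 2, 14, 9, 2, 14], 5))  # [9, 2, 14, 9, 2]
```

Truncating from the end is only one possible choice; which tokens to drop depends on the task.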

#### Converts the sequence to a tensor and passes it through the model:
-- The padded sequence is converted into a PyTorch tensor (a data structure optimized for numerical computation), given a batch dimension with unsqueeze(0), and passed through the model.

#### Prediction:
-- Index to Word Mapping: takes the highest-scoring index at each time step and converts it back to a word using the vocabulary.

-- Text Generation: joins the words to form the final response text.
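
The prediction step amounts to an argmax per time step followed by a vocabulary lookup. Below is a minimal sketch in plain Python; the four-word vocabulary and the score values are made up for illustration, and `argmax` is a hypothetical helper standing in for `torch.max`.

```python
# Hypothetical vocabulary and scores; real scores would come from the model.
vocab = ["a", "good", "hi", "how"]

# One row per time step, one score per vocabulary word.
scores = [
    [0.1, 0.2, 2.5, 0.3],  # highest score at index 2 -> "hi"
    [0.4, 1.9, 0.2, 0.1],  # highest score at index 1 -> "good"
    [1.2, 0.3, 0.1, 0.8],  # highest score at index 0 -> "a"
]

def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)

predicted = [argmax(row) for row in scores]
response_text = " ".join(vocab[idx] for idx in predicted)

print(predicted)      # [2, 1, 0]
print(response_text)  # hi good a
```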

# Solved Working Chatbot
## But not as expected!

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import random
import re
import numpy as np

# Sample dataset
data = [
    ("hello", "hi"),
    ("how are you", "I'm good, how about you?"),
    ("what is your name", "I'm a chatbot"),
    ("bye", "goodbye"),
]

# Preprocessing
def preprocess(text):
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text) # removes all characters from the input text that are not word characters or whitespace
    return text

# Vocabulary
all_words = []
for (pattern, response) in data:
    pattern = preprocess(pattern)
    response = preprocess(response)
    words = pattern.split() + response.split() #splits it into a list of words, using whitespace as the delimiter: "hello Ameer" to ["hello","Ameer"]
    all_words.extend(words) #appends the elements of the list words to the end of the list all_words
all_words = sorted(set(all_words))

# Word to index mapping
word_to_idx = {word: idx for idx, word in enumerate(all_words)} #enumerate iterates on pairs of (index, value) tuples.

# Encode patterns and responses
def encode(text):
    text = preprocess(text)
    return [word_to_idx[word] for word in text.split() if word in word_to_idx]

encoded_data = [(encode(pattern), encode(response)) for (pattern, response) in data]

# Pad sequences: pad_sequence function is used to ensure that all sequences in a dataset have the same length
def pad_sequence(seq, max_len, padding_value=0): #seq is input sequence that needs to be padded, max_len is desired length for all sequences, padding_val is value used to fill the sequence to reach the maximum length
    return seq + [padding_value] * (max_len - len(seq)) #If the input sequence is shorter than the max_len, it appends padding_value to the end of the sequence until it reaches the desired length.

# Determine the maximum length of patterns and responses
max_pattern_len = max(len(pattern) for pattern, response in encoded_data)
max_response_len = max(len(response) for pattern, response in encoded_data)
max_len = max(max_pattern_len, max_response_len)

# Pad patterns and responses
padded_patterns = [pad_sequence(pattern, max_len) for pattern, response in encoded_data]
padded_responses = [pad_sequence(response, max_len) for pattern, response in encoded_data]

print(f"Max Pattern Length: {max_pattern_len}")
print(f"Max Response Length: {max_response_len}")
print(f"Max Length: {max_len}")
print(f"Padded Pattern: {padded_patterns}")
print(f"Padded Response: {padded_responses}")

# Additional debugging:
print("Example padded pattern:", padded_patterns[0])
print("Example padded response:", padded_responses[0])

# Convert to tensors: prepares the data for training, allows efficient (optionally GPU-accelerated) computation, and lets the data be passed directly to PyTorch modules.
patterns = torch.tensor(padded_patterns, dtype=torch.long)
responses = torch.tensor(padded_responses, dtype=torch.long)

class ChatbotModel(nn.Module): # Defines a basic chatbot model architecture using PyTorch
    def __init__(self, vocab_size, embed_size, hidden_size, output_size, max_len): # Initializes the embedding, LSTM, and linear layers
        super(ChatbotModel, self).__init__()
        self.max_len = max_len  # Stored so that forward() does not rely on a global variable
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size * max_len, output_size * max_len)

    def forward(self, x):
        x = self.embedding(x)
        x, _ = self.lstm(x)
        x = x.contiguous().view(x.size(0), -1)  # Flatten the output for the linear layer
        x = self.fc(x)
        return x.view(x.size(0), self.max_len, -1)  # Reshape to (batch_size, max_len, output_size)

# Initialize model
vocab_size = len(all_words)
embed_size = 10
hidden_size = 20
output_size = vocab_size  # Output size should match the vocabulary size
model = ChatbotModel(vocab_size, embed_size, hidden_size, output_size, max_len)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss() #commonly used loss function for classification problems. It measures the difference between the predicted probability distribution and the actual distribution
optimizer = torch.optim.Adam(model.parameters(), lr=0.001) #popular optimization algorithm that combines the advantages of Adagrad and RMSprop. lr parameter (learning rate) determines the step size the optimizer will take when updating parameters.

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    optimizer.zero_grad()  # Clear the gradients left over from the previous step (a gradient is the rate of change of the loss with respect to a model parameter); otherwise PyTorch accumulates them across epochs.
    outputs = model(patterns)  # Forward pass: Compute predicted outputs by passing inputs to the model

    # Reshape outputs and responses for loss computation to match the expected input format
    outputs = outputs.view(-1, output_size)  # Shape: (batch_size * sequence_length, output_size)
    responses = responses.view(-1)  # Shape: (batch_size * sequence_length)

    print(f"Epoch {epoch+1}: Outputs shape: {outputs.shape}, Responses shape: {responses.shape}")  # Debug print

    # Check for size mismatch (optional, but helpful for debugging)
    if outputs.shape[0] != responses.shape[0]:
        print(f"WARNING: Mismatch in output and response sizes. Outputs: {outputs.shape}, Responses: {responses.shape}")

    # Proceed with loss calculation only if sizes match
    else:
        loss = criterion(outputs, responses)  # Compute the loss, which measures the difference between predicted and actual outputs.
        loss.backward()  # Backward pass: Compute gradient of the loss with respect to model parameters
        optimizer.step()  # Update model parameters based on computed gradients
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')


Max Pattern Length: 4
Max Response Length: 5
Max Length: 5
Padded Pattern: [[7, 0, 0, 0, 0], [9, 2, 14, 0, 0], [13, 11, 15, 12, 0], [3, 0, 0, 0, 0]]
Padded Response: [[8, 0, 0, 0, 0], [10, 5, 9, 1, 14], [10, 0, 4, 0, 0], [6, 0, 0, 0, 0]]
Example padded pattern: [7, 0, 0, 0, 0]
Example padded response: [8, 0, 0, 0, 0]
Epoch 1: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Epoch [1/10], Loss: 2.7511
Epoch 2: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Epoch [2/10], Loss: 2.7322
Epoch 3: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Epoch [3/10], Loss: 2.7132
Epoch 4: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Epoch [4/10], Loss: 2.6941
Epoch 5: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Epoch [5/10], Loss: 2.6748
Epoch 6: Outputs shape: torch.Size([20, 16]), Responses shape: torch.Size([20])
Epoch [6/10], Loss: 2.6551
Epoch 7: Outputs shape: torch.Size([20,
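
A useful reference point for the logged losses is the cross-entropy of a model that assigns equal scores to all 16 vocabulary words. The sketch below computes that baseline in plain Python; `cross_entropy` is a hypothetical single-example helper written for illustration, not the PyTorch implementation.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one example: log-sum-exp of the logits minus the target logit."""
    m = max(logits)  # Subtract the maximum for numerical stability.
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

vocab_size = 16  # Matches the vocabulary size in this notebook.
uniform_logits = [0.0] * vocab_size

baseline = cross_entropy(uniform_logits, target=8)
print(f"{baseline:.4f}")  # 2.7726, i.e. ln(16)
```

The logged values, from 2.7511 at epoch 1 to 2.6551 at epoch 6, are only slightly below this baseline, which suggests that the run shown here has moved the model only a little away from uniform guessing.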

In [None]:
def predict_response(input_text):
    input_text = preprocess(input_text)
    input_pattern = encode(input_text)
    input_pattern = pad_sequence(input_pattern, max_len)
    input_pattern = torch.tensor(input_pattern, dtype=torch.long).unsqueeze(0)  # Add batch dimension

    with torch.no_grad():  # No need to compute gradients for inference
        output = model(input_pattern)  # Predict output

    output = output.view(-1, output_size)  # Flatten output to (batch_size * sequence_length, output_size)
    _, predicted = torch.max(output, dim=1)  # Find the index of the most probable response

    predicted = predicted.numpy()  # Convert to NumPy array

    response_words = [all_words[idx] for idx in predicted if idx < len(all_words)]  # Decode predicted indices
    response_text = ' '.join(response_words)  # Form response text
    return response_text

# Chat with the bot
print("Start chatting with the bot (type 'quit' to stop)!")
while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break
    response = predict_response(user_input)
    print(f"Bot: {response}")


Start chatting with the bot (type 'quit' to stop)!
You: hi
Bot: hi good a are a
You: how are you
Bot: hi good how a a
You: what is your name
Bot: hi good chatbot a a
You: name
Bot: hi a a a a
You: name
Bot: hi a a a a
You: name what
Bot: hi a a a a
You: hi
Bot: hi good a are a
You: bye
Bot: hi a a a a
You: how are you doing bot
Bot: hi good how a a
You: what is your name
Bot: hi good chatbot a a
You: dont say hi
Bot: hi good a are a
You: quit
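
The trailing `a` tokens in the transcript are consistent with padding being decoded as a word: in the sorted vocabulary, `a` has index 0, which is also the padding value. The sketch below contrasts that shared mapping with one that reserves index 0 for padding. The `<pad>` token and the `decode` helper are assumptions for illustration and do not appear in the notebook.

```python
PAD = "<pad>"  # Hypothetical reserved token, not part of the notebook.

# Sorted vocabulary as built by the notebook's preprocessing.
words = ["a", "about", "are", "bye", "chatbot", "good", "goodbye", "hello",
         "hi", "how", "im", "is", "name", "what", "you", "your"]

# Notebook-style mapping: "a" gets index 0, the same value used for padding.
idx_to_word_shared = dict(enumerate(words))

# Alternative: reserve index 0 for padding and shift real words up by one.
idx_to_word_reserved = dict(enumerate([PAD] + words))

def decode(indices, idx_to_word):
    """Map indices back to words, skipping the reserved padding token."""
    return " ".join(idx_to_word[i] for i in indices if idx_to_word[i] != PAD)

print(decode([8, 0, 0, 0, 0], idx_to_word_shared))    # hi a a a a
print(decode([9, 0, 0, 0, 0], idx_to_word_reserved))  # hi
```

If the vocabulary is changed this way, the model's vocabulary size grows by one. PyTorch also offers `padding_idx` on `nn.Embedding` and `ignore_index` on `nn.CrossEntropyLoss`, which may help keep padded positions from influencing training.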
