<a href="https://colab.research.google.com/github/sneha3322/Automated-FAQ-system-/blob/main/Automated_FAQ_System_using_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **TITLE: Automated FAQ System Using Retrieval-Augmented Generation (RAG)**

## **Install Required Libraries**

Here, I install the libraries used throughout the project.


*   'transformers' provides pre-trained models and tokenizers for NLP tasks.
*   'pandas' is used for data manipulation and analysis.
*   'torch' is the PyTorch package, the deep learning framework that the models run on.
*   'sentence-transformers' is specifically for creating sentence embeddings.












In [None]:
!pip install transformers
!pip install pandas
!pip install torch
!pip install sentence-transformers




## **Import Libraries**

This cell imports the libraries that I will be using in this project.


*   'pandas' is essential for handling the FAQ dataset.
*   'pipeline' from 'transformers' allows me to use pre-trained models for various tasks, although the cells below do not call it.
*   'SentenceTransformer' and 'util' from 'sentence-transformers' are used for generating question embeddings and computing their cosine similarity.
*   The 'logging' library is used for tracking user interactions.
*   The 're' library provides the regular expressions used for text cleaning.

In [None]:
import pandas as pd
from transformers import pipeline
from sentence_transformers import SentenceTransformer, util
import logging
import re


# **Load and Explore the Dataset**

In this cell, I load the FAQ dataset from a CSV file using pandas. The dataset was downloaded from Kaggle and contains Tesco's online grocery FAQs: the questions and answers that the chatbot draws on for its responses.

I also display the first few rows to verify that the data has loaded correctly.






In [None]:
file_path = '/content/faqs.csv'  # Path to the FAQ CSV in the Colab working directory
faq_data = pd.read_csv(file_path)

# Displaying the first few rows of the dataset to ensure it's loaded correctly
print("Loaded FAQ Data (first 5 rows):")
print(faq_data.head())


Loaded FAQ Data (first 5 rows):
    ID                Topic                        Subtopic  \
0  1.0  Online Grocery FAQs  Delivery and collection basics   
1  2.0  Online Grocery FAQs  Delivery and collection basics   
2  3.0  Online Grocery FAQs  Delivery and collection basics   
3  4.0  Online Grocery FAQs  Delivery and collection basics   
4  5.0  Online Grocery FAQs                 Ordering online   

                            Question  \
0            Where Tesco delivers to   
1  Delivery and Click+Collect prices   
2                Minimum order value   
3                  Returning an item   
4             Slot times and options   

                                              Answer  
0  We deliver to most UK residential addresses. T...  
1  The standard delivery charge is between £3–£7,...  
2  A £5 minimum basket charge will be added to de...  
3                     Please see our returns policy.  
4  You can choose to get your shopping delivered ...  


# **Data Cleaning and Preprocessing**

In this cell, I perform data cleaning and preprocessing to ensure that the text data is consistent and ready for analysis.

This involves several steps:
*   Convert all text to lowercase to maintain uniformity.
*   Remove extra spaces to avoid issues during tokenization.
*   Remove special characters to focus on alphanumeric content, which is more relevant for our FAQ matching task.






In [None]:
def clean_text(text):
    text = text.lower()  # Convert to lowercase to ensure uniformity
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)  # Remove special characters
    # Collapse whitespace last, so gaps left by removed characters do not remain as double spaces
    text = re.sub(r'\s+', ' ', text).strip()
    return text


# Checking for missing values
# Before proceeding with text cleaning, I check for any missing values in the dataset.
# Missing values can cause issues during text processing and model training, so they need to be addressed.
print("Checking for missing values:")
print(faq_data.isnull().sum())


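# Drop rows with a missing question or answer
# The check above reports missing values only in 'ID', which is not used for matching,
# so I restrict the drop to the text columns instead of discarding those rows.
faq_data.dropna(subset=['Question', 'Answer'], inplace=True)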


# Applying text cleaning
# Apply the 'clean_text' function to both the 'Question' and 'Answer' columns.
# This ensures that both the input (questions) and output (answers) are preprocessed consistently.
faq_data['Question'] = faq_data['Question'].apply(clean_text)
faq_data['Answer'] = faq_data['Answer'].apply(clean_text)


# Verifying the data after cleaning
# Display the first few rows of the cleaned dataset to verify the cleaning process.
print("Data after cleaning (first 5 rows):")
print(faq_data.head())


Checking for missing values:
ID          50
Topic        0
Subtopic     0
Question     0
Answer       0
dtype: int64
Data after cleaning (first 5 rows):
    ID                Topic                        Subtopic  \
0  1.0  Online Grocery FAQs  Delivery and collection basics   
1  2.0  Online Grocery FAQs  Delivery and collection basics   
2  3.0  Online Grocery FAQs  Delivery and collection basics   
3  4.0  Online Grocery FAQs  Delivery and collection basics   
4  5.0  Online Grocery FAQs                 Ordering online   

                           Question  \
0           where tesco delivers to   
1  delivery and clickcollect prices   
2               minimum order value   
3                 returning an item   
4            slot times and options   

                                              Answer  
0  we deliver to most uk residential addresses to...  
1  the standard delivery charge is between 37 dep...  
2  a 5 minimum basket charge will be added to del...  
3            
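
To make the effect of the cleaning steps concrete, here is a small self-contained sketch. The function name `clean_text_demo` is only for illustration; it lowercases, removes special characters, and then collapses whitespace. The sample strings are based on the dataset preview above.

```python
import re

def clean_text_demo(text):
    """Lowercase, drop non-alphanumeric characters, then collapse whitespace."""
    text = text.lower()
    text = re.sub(r'[^a-z0-9\s]', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

samples = [
    "Delivery and Click+Collect prices",
    "The standard delivery charge is between £3–£7",
]
for sample in samples:
    print(repr(clean_text_demo(sample)))
```

The second sample shows a side effect worth keeping in mind: the price range £3–£7 collapses to `37`, so cleaned answers can lose meaning. One possible way around this is to clean only the text used for matching and keep the original answer text for display.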

# **Initialize the SentenceTransformer Model for Embedding**

Here, I initialize the SentenceTransformer model, which will be used to generate embeddings for the FAQ questions and user queries.

I use the 'all-MiniLM-L6-v2' model, which is efficient for producing high-quality sentence embeddings.

In [None]:
model = SentenceTransformer('all-MiniLM-L6-v2')
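
The model turns each sentence into a fixed-length vector, and two sentences are compared by the cosine of the angle between their vectors. As a rough illustration of that comparison, here is a plain-Python sketch. The three-dimensional vectors are made up for readability (real embeddings from this model are much longer), and `cosine_similarity` is an illustrative helper, not part of the library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 2.0, 0.0]
similar = [2.0, 4.0, 0.0]      # Same direction as the query, different length
unrelated = [0.0, 0.0, 3.0]    # Orthogonal to the query

print(cosine_similarity(query, similar))    # Very close to 1.0
print(cosine_similarity(query, unrelated))  # 0.0
```

Because only the direction of the vectors matters, a score near 1 means the two sentences point the same way in embedding space, while a score near 0 means they are unrelated.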




# **Configure Logging**

In this cell, I set up logging to track user interactions with the chatbot.

This will help me understand user behavior and improve the FAQ dataset in the future.

The log file will store user queries and the corresponding answers provided by the chatbot.

In [None]:
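# force=True replaces any handlers the notebook environment has already attached to the
# root logger; without it, basicConfig can silently do nothing (requires Python 3.8+)
logging.basicConfig(filename='user_interactions.log', level=logging.INFO, force=True)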

def log_interaction(user_query, answer):
    logging.info(f"User Query: {user_query}")  # Log the user's query
    logging.info(f"Answer: {answer}")          # Log the corresponding answer
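
As an alternative sketch, the interactions could go through a dedicated logger with timestamps, which does not depend on how the root logger is configured. The logger name `faq_chatbot`, the helper `build_logger`, and the format string are assumptions chosen for this example, and the log is written to a temporary directory so the snippet can run anywhere.

```python
import logging
import os
import tempfile

def build_logger(log_path):
    """Return a logger that appends timestamped records to log_path."""
    logger = logging.getLogger("faq_chatbot")  # Hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # Keep records out of the root logger
    if not logger.handlers:   # Avoid duplicate handlers when the cell is re-run
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger

log_path = os.path.join(tempfile.mkdtemp(), "user_interactions.log")
logger = build_logger(log_path)
logger.info("User Query: %s", "returning an item")
logger.info("Answer: %s", "please see our returns policy")

for handler in logger.handlers:
    handler.flush()
with open(log_path, encoding="utf-8") as f:
    log_text = f.read()
print(log_text)
```

The timestamps make it possible to see when each question was asked, which helps when reviewing the log later to find gaps in the FAQ dataset.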


# **Define a Function for Retrieval-Augmented Generation with Debugging**

In this cell, I define a function that processes user queries to find the most relevant FAQ answer.

This function uses the SentenceTransformer model to encode the user query and compares it against the encoded FAQ questions using cosine similarity. If the best score falls below 0.5, the function returns a request to rephrase instead of a weak match.

The function also includes debugging prints that show the best-matching question and its similarity score.


In [None]:
# Encode the FAQ questions once, so each query only has to encode its own text
faq_embeddings = model.encode(faq_data['Question'].tolist(), convert_to_tensor=True)

SIMILARITY_THRESHOLD = 0.5  # Best matches scoring below this are treated as "no good match"

def get_faq_answer(user_query):
    try:
        query_embedding = model.encode(user_query, convert_to_tensor=True)   # Encode the user query

        # Compute cosine similarities between the query and every FAQ question
        similarities = util.pytorch_cos_sim(query_embedding, faq_embeddings)[0]

        best_idx = similarities.argmax().item()         # Get the index of the most similar FAQ
        best_score = similarities[best_idx].item()      # Similarity score of that FAQ

        # Retrieve the corresponding answer
        best_answer = faq_data['Answer'].iloc[best_idx]

        # These prints display the user query, the best matching FAQ question, and the similarity score.
        print(f"\nQuery: {user_query}")
        print(f"Best match: {faq_data['Question'].iloc[best_idx]}")
        print(f"Similarity score: {best_score}\n")

        if best_score < SIMILARITY_THRESHOLD:           # Fall back when the match is too weak
            best_answer = "I'm not sure about the answer to that. Can you try rephrasing?"

        log_interaction(user_query, best_answer)        # Log the interaction
        return best_answer

    except Exception as e:
        return f"An error occurred: {str(e)}"           # Return error message
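
The selection step of the function (take the highest score, then apply the threshold) can be illustrated without the model. The sketch below uses hand-written scores; 0.59 and 0.41 echo the scores recorded in this notebook's outputs, while the remaining scores, the placeholder answers, and the helper name `pick_answer` are made up for illustration.

```python
FALLBACK = "I'm not sure about the answer to that. Can you try rephrasing?"

def pick_answer(similarities, answers, threshold=0.5):
    """Return (answer, index) for the highest score, or the fallback if it is too low."""
    best_idx = max(range(len(similarities)), key=lambda i: similarities[i])
    if similarities[best_idx] < threshold:
        return FALLBACK, best_idx
    return answers[best_idx], best_idx

answers = ["answer about addresses", "answer about returns", "answer about slots"]

confident = pick_answer([0.59, 0.21, 0.10], answers)
uncertain = pick_answer([0.41, 0.30, 0.12], answers)

print(confident)  # ('answer about addresses', 0)
print(uncertain)  # The fallback message, still paired with index 0
```

Both calls pick the same FAQ entry, but only the first clears the threshold. Raising the threshold makes the chatbot more cautious, while lowering it returns more answers at the cost of weaker matches.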


# **Example Questions to Demonstrate Functionality**

This cell provides a set of example questions to showcase how the FAQ system works.

It helps to understand how the chatbot responds to common queries without needing to interact manually.

In [None]:
example_questions = [
    "How do I change my delivery address?",
    "What is the return policy?",
    "How can I track my order?"
]

for question in example_questions:
    answer = get_faq_answer(question)
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")



Query: How do I change my delivery address?
Best match: updating my delivery and clickcollect addresses
Similarity score: 0.5946130156517029

Question: How do I change my delivery address?
Answer: to update an address select my account then phone  address book all of your address details should be displayed select add new address add your new details and select save address to update your new address details will be saved in your address book please make sure you select the correct address from your address book when you check out if youre using the tesco grocery  clubcard app select add new address within the book a slot section enter the details required then select use this address choose a convenient delivery slot and then tap begin shopping remember changing your delivery address or preferred clickcollect store may mean that your chosen delivery or clickcollect slot is no longer available products in your shopping basket are no longer available different promotions are available 

# **User Interaction Loop**

This function starts a loop that allows users to interact with the chatbot.

Users can type their questions, and the chatbot will respond with the most relevant FAQ answer.

Type 'exit' to end the interaction.

In [None]:
def start_chatbot():
    print("Welcome to the FAQ chatbot! Type 'exit' to end the session.\n")
    while True:
        user_query = input("You: ")
        if user_query.strip().lower() == 'exit':  # Ignore surrounding spaces and letter case
            print("Chatbot: Goodbye!")
            break
        answer = get_faq_answer(user_query)
        print(f"Chatbot: {answer}\n")

start_chatbot()


Welcome to the FAQ chatbot! Type 'exit' to end the session.

You: What to expect when my order is delivered?

Query: What to expect when my order is delivered?
Best match: what to expect when my order is delivered
Similarity score: 0.9929834604263306

Chatbot: if your order contains an age restricted product you may be asked for id regardless of your age the courier will record your date of birth and deliver the goods proof of delivery is required for all orders there are two methods that will be used depending on what your order contains if your order contains an age restricted product then a signature will be required if the order does not contain an age restricted product a photo will be taken of the whoosh bags on the doorstep

You: how to track my address?

Query: how to track my address?
Best match: updating my delivery and clickcollect addresses
Similarity score: 0.4128359258174896

Chatbot: I'm not sure about the answer to that. Can you try rephrasing?

You: How to check items 
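
Because the loop above reads from `input()`, it is awkward to test automatically. One possible variation, sketched here, takes the queries as a list and the answering function as a parameter. The names `run_chat` and `echo_answer` are hypothetical; `echo_answer` only stands in for `get_faq_answer` so that the snippet runs without the model.

```python
def run_chat(queries, answer_fn):
    """Answer each query with answer_fn until 'exit' is seen; return the transcript."""
    transcript = []
    for raw_query in queries:
        query = raw_query.strip()
        if query.lower() == "exit":
            break
        transcript.append((query, answer_fn(query)))
    return transcript

def echo_answer(query):
    # Stand-in for get_faq_answer, used only for this illustration
    return f"(answer for: {query})"

transcript = run_chat(
    ["How can I track my order?", " EXIT ", "never reached"],
    echo_answer,
)
for question, answer in transcript:
    print(f"You: {question}")
    print(f"Chatbot: {answer}")
```

Only the first query is answered, since the loop stops at the second entry. Passing the real answering function in place of `echo_answer` would give the same control flow as the interactive version.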

# **CONCLUSION**



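In this project, I built an **Automated FAQ System Using Retrieval-Augmented Generation (RAG)** to create a chatbot that can answer user questions. Using Tesco's online grocery FAQ dataset, I cleaned the data, generated sentence embeddings, and used cosine similarity to match each user question with the closest FAQ question. In its current form, the system returns the stored answer of that best match directly.

In the recorded sessions, a query that matched an FAQ question almost word for word scored 0.99, and a paraphrased question about changing a delivery address scored 0.59; both returned the matching answer. An ambiguous query scored 0.41, below the 0.5 threshold, and triggered the request to rephrase. Printing the best match and its similarity score for each query made these decisions easy to inspect.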

This project shows my ability to use open-source libraries and write good code to solve real problems.

In the future, I plan to make the system even better by using machine learning to improve answer accuracy. I also want to expand the dataset with more questions and answers to handle a wider range of user queries.

Overall, this project demonstrates how AI can help automate customer support, making it more efficient and scalable.
