# Required Libraries
We need these libraries to load the dataset and run the model.

In [1]:
!pip install transformers pandas scikit-learn torch




# Loading the Dataset
We have carefully curated an FAQ dataset of 1000+ records, with features extracted into Category, Question, and Answer columns, which we will enhance further.

In [2]:
import pandas as pd

# Loading the FAQ dataset from CSV
faq_df = pd.read_csv('faq_dataset.csv')  # Replace 'faq_dataset.csv' with your file path

# Converting the dataset to a list of dictionaries
faq_dataset = faq_df.to_dict(orient="records")

# Display the first few records to ensure proper loading
print(faq_df.head())


   Category                                           Question  \
0  Services      What services does your software house offer?   
1  Services             Do you provide mobile app development?   
2  Services  Can you handle enterprise-level software devel...   
3  Services      Do you offer e-commerce development services?   
4  Services        Do you provide web application development?   

                                              Answer  
0  We offer custom software development, mobile a...  
1  Yes, we develop native and cross-platform mobi...  
2  Yes, we specialize in building scalable and se...  
3  Yes, we create custom e-commerce platforms and...  
4  Yes, we develop responsive and robust web appl...  
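Before passing records to the model, it can help to drop rows with missing or blank fields. The following is a minimal sketch, not part of the original notebook: `clean_faq_records` and `REQUIRED_FIELDS` are hypothetical names, and the sample rows are invented for illustration. It works on a list of dictionaries in the shape produced by `to_dict(orient="records")`.

```python
# Hypothetical helper: the column names match the dataset shown above,
# but the function and the sample rows are illustrative assumptions.
REQUIRED_FIELDS = ("Category", "Question", "Answer")

def clean_faq_records(records):
    """Keep only records whose required fields are non-empty strings."""
    cleaned = []
    for record in records:
        values = [record.get(field) for field in REQUIRED_FIELDS]
        # pandas stores missing CSV cells as float NaN, so the type check drops them
        if all(isinstance(value, str) and value.strip() for value in values):
            cleaned.append({field: record[field].strip() for field in REQUIRED_FIELDS})
    return cleaned

# Invented sample rows: one valid, one with a missing answer, one with a blank question
sample_records = [
    {"Category": "Services", "Question": "Do you build mobile apps?", "Answer": " Yes, we do. "},
    {"Category": "Services", "Question": "Do you offer support?", "Answer": float("nan")},
    {"Category": "Services", "Question": "   ", "Answer": "Yes."},
]
cleaned_records = clean_faq_records(sample_records)
print(len(cleaned_records))
```

In the notebook this could be applied as `faq_dataset = clean_faq_records(faq_dataset)` right after the conversion step.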


# BERT Model and Tokenizer Set Up
We load the pre-trained BERT model `bert-large-uncased-whole-word-masking-finetuned-squad` and its tokenizer. This checkpoint is fine-tuned on SQuAD for extractive question answering. It is a large model, with roughly 340M parameters, so the first run has to download a sizable set of weights.

In [3]:
from transformers import BertTokenizer, BertForQuestionAnswering

# Loading the pre-trained BERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

# Confirming that the model and tokenizer are loaded successfully
print("BERT Model and Tokenizer Loaded.")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


BERT Model and Tokenizer Loaded.


# Implementing Question Answering Pipeline
We use the BERT model to extract an answer to a question from a context passage, here an FAQ entry. The quick test in this cell uses the first FAQ answer as the context; later, a context is generated dynamically for each entry in the dataset.

In [5]:
from transformers import pipeline

# Initializing the Question Answering pipeline
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)

# Example question and context from the dataset
test_context = faq_dataset[0]["Answer"]
test_question = "What services do you offer?"
answer = qa_pipeline(question=test_question, context=test_context)

# Display the answer
print(f"Answer: {answer['answer']}")


Answer: custom software development, mobile and web app development,


# Function to Retrieve Answers
The `get_answer` function runs the user query against every FAQ entry, keeps the answer with the highest confidence score, and returns a generic fallback message when that score is below a threshold.

In [14]:
# Function to get the most relevant answer from the FAQ dataset
def get_answer(user_query):
    # List to store possible answers with their confidence scores
    answers = []

    # Loop through each FAQ entry in the dataset
    for faq in faq_dataset:
        context = f"Category: {faq['Category']} Question: {faq['Question']} Answer: {faq['Answer']}"

        # Use BERT to get the answer to the user's query based on the context
        result = qa_pipeline(question=user_query, context=context)
        answers.append((faq['Question'], result['answer'], result['score']))  # Store question, answer, and score

    # Guard against an empty dataset, which would leave nothing to rank
    if not answers:
        return "I'm sorry, I couldn't find an answer to your question."

    # Pick the candidate with the highest confidence score
    best_question, best_answer, best_score = max(answers, key=lambda x: x[2])

    # If the highest score is below a certain threshold, return a generic response
    if best_score < 0.5:  # You can adjust this threshold based on testing
        return "I'm sorry, I couldn't find an answer to your question."

    # Return the most relevant answer
    return f"Question: {best_question}\nAnswer: {best_answer}"
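Because `get_answer` calls the model once per FAQ entry, a dataset of 1000+ records means 1000+ forward passes of a large model for every query. One possible way to reduce this, sketched here as an assumption and not as part of the original notebook, is to shortlist a few entries with a cheap lexical score and run the QA pipeline only on those. The helper names `tokenize` and `shortlist_faqs`, the `top_k` parameter, and the sample entries are all hypothetical, and Jaccard token overlap is a deliberately simple stand-in for a proper retriever.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def shortlist_faqs(user_query, faqs, top_k=5):
    """Return up to top_k FAQ entries ranked by Jaccard overlap with the query."""
    query_tokens = tokenize(user_query)
    scored = []
    for faq in faqs:
        faq_tokens = tokenize(faq["Question"])
        union = query_tokens | faq_tokens
        overlap = len(query_tokens & faq_tokens) / len(union) if union else 0.0
        scored.append((overlap, faq))
    # Sort on the score only (highest first); ties keep their dataset order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop entries that share no token with the query
    return [faq for overlap, faq in scored[:top_k] if overlap > 0]

# Invented sample entries for illustration
sample_faqs = [
    {"Question": "Do you provide mobile app development?"},
    {"Question": "Do you offer e-commerce development services?"},
    {"Question": "What are your working hours?"},
]
candidates = shortlist_faqs("Can you build a mobile app?", sample_faqs, top_k=2)
print([faq["Question"] for faq in candidates])
```

Inside `get_answer`, the loop could then iterate over `shortlist_faqs(user_query, faq_dataset)` in place of the full `faq_dataset`. The trade-off is that a query phrased with entirely different words from the stored question would be filtered out before the model sees it.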



# Interactive Chat Loop
This loop lets the user type queries; the chatbot responds to each one with the most relevant answer until the user types 'exit'.

In [None]:
# Interactive loop for user to query
print("Welcome to the FAQ Chatbot! Type your question or 'exit' to quit.")
while True:
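    # Take user input and trim surrounding whitespace
    user_query = input("You: ").strip()

    # Skip empty input rather than sending it to the model
    if not user_query:
        continue

    # Exit condition
    if user_query.lower() == "exit":
        print("Goodbye!")
        break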

    # Get the most relevant answer from the FAQ dataset
    answer = get_answer(user_query)
    print(f"Chatbot: {answer}")


Welcome to the FAQ Chatbot! Type your question or 'exit' to quit.
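The interactive loop is hard to check automatically because it blocks on `input()`. A minimal sketch of a testable variant follows, assuming the answer function is passed in as a parameter. The names `run_chat` and `echo_answer` are hypothetical, and the stub stands in for `get_answer` so the example runs without loading the model.

```python
def run_chat(queries, answer_fn):
    """Feed queries to answer_fn until 'exit' is seen; return the transcript lines."""
    transcript = []
    for raw_query in queries:
        query = raw_query.strip()
        if not query:
            continue  # ignore blank input
        if query.lower() == "exit":
            transcript.append("Goodbye!")
            break
        transcript.append(f"Chatbot: {answer_fn(query)}")
    return transcript

def echo_answer(query):
    """Stub standing in for get_answer, so no model is needed here."""
    return f"(stub) you asked: {query}"

lines = run_chat(["What services do you offer?", "  ", "EXIT", "ignored"], echo_answer)
for line in lines:
    print(line)
```

With the model loaded, `run_chat(queries, get_answer)` would exercise the real pipeline on a fixed list of queries.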
