#BAX 493A Python LLM - Final Project: A Retrieval-Augmented Product-Facing Assistant for Your Daily ‘How-To’ Queries

Submitted by: Rashmila Mitra

---

This project explores a lightweight, product-facing Retrieval-Augmented Generation (RAG) chatbot built on WikiHow's instructional Q&A dataset. The goal is to improve the accuracy and specificity of responses to everyday how-to questions, such as cleaning, household tasks, or basic DIY, by grounding outputs in real crowd-curated content. The implementation uses FAISS for retrieval, MiniLM (`all-MiniLM-L6-v2`) for embeddings, and FLAN-T5 (`google/flan-t5-base`) for generation. It is fully executable, requires no API keys, and uses a simple RAG architecture to improve factual reliability for consumer use cases.

#1. Setting up the Environment and Loading the Data

In [1]:
# Installing dependencies
!pip install -q transformers
!pip install -q datasets
!pip install -q sentence-transformers
!pip install -q faiss-cpu


[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m3.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.8/13.8 MB[0m [31m35.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.6/24.6 MB[0m [31m30.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m883.7/883.7 kB[0m [31m34.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m664.8/664.8 MB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m211.5/211.5 MB[0m [31m8.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m56.3/56.3 MB[0m [31m14.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m127.9/127.9 MB[0m [31m8.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [2]:
# Importing required libraries
from datasets import load_dataset
import pandas as pd

# Loading the WikiHow Non-Factoid QA dataset from Hugging Face (training split)
dataset = load_dataset("Lurunchik/WikiHowNFQA", split="train")

# Converting to a DataFrame
df = pd.DataFrame(dataset)

# Viewing sample questions and answers
df[["question", "answer"]].head()



|   | question | answer |
|---|---|---|
| 0 | How To Cook Pork Tenderloin | To cook pork tenderloin, put it in a roasting ... |
| 1 | How To Prevent Skin Peeling After Sunburn | To prevent your skin from peeling after a sunb... |
| 2 | How To Grill Sweet Potatoes | Before baking sweet potatoes on the grill, pre... |
| 3 | How To Find a Job Working from Home | To find a job working from home, browse and ap... |
| 4 | How To Lactate | If you want to start lactating, talk to your d... |


In [3]:
!pip install -q datasets pandas

#2. Data Preprocessing and Cleaning

In [4]:
# Reimporting required libraries after Colab reset
from datasets import load_dataset
import pandas as pd

# Loading the WikiHow Non-Factoid QA dataset
dataset = load_dataset("Lurunchik/WikiHowNFQA", split="train")
df = pd.DataFrame(dataset)

# Keeping only rows with both question and answer fields filled
df = df[df['question'].notna() & df['answer'].notna()]

# Removing duplicate questions (pandas keeps the first occurrence)
df = df.drop_duplicates(subset="question")

# Resetting the index and keeping only the first 500 rows so the demo stays lightweight
df = df.reset_index(drop=True)
df = df.iloc[:500]

# Cleaned data
df[["question", "answer"]].head(10)


|   | question | answer |
|---|---|---|
| 0 | How To Cook Pork Tenderloin | To cook pork tenderloin, put it in a roasting ... |
| 1 | How To Prevent Skin Peeling After Sunburn | To prevent your skin from peeling after a sunb... |
| 2 | How To Grill Sweet Potatoes | Before baking sweet potatoes on the grill, pre... |
| 3 | How To Find a Job Working from Home | To find a job working from home, browse and ap... |
| 4 | How To Lactate | If you want to start lactating, talk to your d... |
| 5 | How To Strip Cloth Diapers | Before you strip cloth diapers, wash and dry t... |
| 6 | How To Use Rosewater | To use rosewater, try applying a mask of rosew... |
| 7 | How To Prepare a Stall for a Pregnant Mare | To prepare a stall for a pregnant mare, start ... |
| 8 | How To Clean a Hermit Crab Tank | Before you clean your hermit crab tank, remove... |
| 9 | How To Make Your Own Tortillas | To make your own flour tortillas, start by mix... |
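The cleaning cell above drops rows with a missing question or answer, removes duplicate questions, and keeps the first 500 rows. Because `drop_duplicates(subset="question")` only matches identical strings, two titles that differ only in case or spacing would both survive. Here is a minimal sketch of the same steps in plain Python with a normalized de-duplication key; it assumes rows are dicts with `question` and `answer` keys, and `normalize` and `clean_records` are hypothetical helpers, not part of the notebook.

```python
def normalize(text):
    """Lower-case and collapse whitespace so near-identical titles compare equal."""
    return " ".join(text.lower().split())


def clean_records(records, limit=500):
    """Drop incomplete rows, de-duplicate on the normalized question, keep the first `limit`."""
    seen = set()
    cleaned = []
    for row in records:
        question, answer = row.get("question"), row.get("answer")
        # Mirrors df['question'].notna() & df['answer'].notna()
        if question is None or answer is None:
            continue
        key = normalize(question)
        # Mirrors drop_duplicates(subset="question"), which keeps the first occurrence
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"question": question, "answer": answer})
        # Mirrors df.iloc[:500]
        if len(cleaned) == limit:
            break
    return cleaned


# Toy rows (hypothetical data, not taken from the dataset)
rows = [
    {"question": "How To Lactate", "answer": "answer 1"},
    {"question": "How to  lactate", "answer": "answer 2"},
    {"question": "How To Grill Sweet Potatoes", "answer": None},
    {"question": "How To Use Rosewater", "answer": "answer 3"},
]
print([r["question"] for r in clean_records(rows)])
# ['How To Lactate', 'How To Use Rosewater']
```

On the real data this only matters if the WikiHow titles contain such near-duplicates; exact matching, as in the notebook, is the safer default when titles are already canonical.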


#3. Embedding & FAISS Indexing

In [5]:
# Embedding the answers and building a FAISS index

from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# Loading the MiniLM sentence embedding model
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Getting the list of answer texts
answer_texts = df['answer'].tolist()

# Converting answers to embeddings
embeddings = embedder.encode(answer_texts, convert_to_numpy=True, show_progress_bar=True)

# Creating a FAISS index
embedding_dim = embeddings.shape[1]
index = faiss.IndexFlatL2(embedding_dim)
index.add(embeddings)  # Add all answer vectors to the index

# Basic confirmation
print(f"FAISS index created with {index.ntotal} entries.")




FAISS index created with 500 entries.
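`IndexFlatL2` performs exact, brute-force search: `index.search(query_embedding, k)` compares the query against every stored vector and returns the `k` smallest squared Euclidean distances `D` with their row positions `I`, which is why `I[0]` can index straight into `answer_texts`. The dependency-free sketch below reproduces that computation on toy 3-d vectors standing in for the 384-d MiniLM embeddings; `flat_l2_search` and the other helpers are hypothetical, not the FAISS API. It also checks an identity that makes L2 a reasonable metric for sentence embeddings: for unit-length vectors, ||a - b||² = 2 - 2·cos(a, b), so ranking by L2 distance gives the same order as ranking by cosine similarity (assuming the embeddings are unit-normalized, for example via `normalize_embeddings=True` in `SentenceTransformer.encode`).

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance: the quantity IndexFlatL2 reports in D."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(database, query, k):
    """Exact k-NN over every stored vector; returns (distances, indices) for one query."""
    scored = sorted((squared_l2(vec, query), i) for i, vec in enumerate(database))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


def unit(v):
    """Rescale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy 3-d unit vectors standing in for 384-d MiniLM answer embeddings
database = [unit(v) for v in ([1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0])]
query = unit([1.0, 0.2, 0.0])

D, I = flat_l2_search(database, query, k=2)
print(I)  # [0, 1]

# For unit vectors, squared L2 distance equals 2 - 2 * cosine similarity
for d, i in zip(D, I):
    print(i, round(d, 4), round(2 - 2 * cosine(database[i], query), 4))
```

Exact search costs one distance computation per stored vector, which is trivial for 500 entries; approximate FAISS indexes only start to pay off at much larger corpus sizes.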


In [6]:
!pip install -q transformers

#4. Loading FLAN-T5 and Test Generation

In [7]:
# Installing transformers
!pip install -q transformers

# Loading a Hugging Face generation model for answer generation
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Using FLAN-T5 base
model_name = "google/flan-t5-base"

# Loading tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Creating generation pipeline
generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

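# Testing with a basic prompt
test_prompt = "Question: How to boil eggs?\nAnswer:"
# Note: the pipeline's max_new_tokens (=256) takes precedence over max_length,
# as the warning in the output below reports
response = generator(test_prompt, max_length=100, do_sample=False)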

# Results
print("Sample output:", response[0]['generated_text'])



Device set to use cuda:0
Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Sample output: Place the eggs in a pot of water and bring to a boil.


#5. Answering Questions with Retrieved Context

In [8]:
# Retrieval-Augmented Generation pipeline

def answer_question(user_query, k=3, max_length=100):
    """
    Given a user question, retrieve top-k relevant WikiHow answers and generate a response using FLAN-T5.
    """
    # Embedding the user query
    query_embedding = embedder.encode([user_query], convert_to_numpy=True)

    # Searching for top-k relevant answer chunks
    D, I = index.search(query_embedding, k)  # I contains indices of top-k retrieved answers
    retrieved_answers = [answer_texts[i] for i in I[0]]

    # Constructing prompt with retrieved context
    context = "\n".join(retrieved_answers)
    prompt = f"Context:\n{context}\n\nQuestion: {user_query}\nAnswer:"

    # Generating answer using FLAN-T5 (greedy decoding)
    # Note: the pipeline's max_new_tokens (=256) overrides max_length here
    output = generator(prompt, max_length=max_length, do_sample=False)

    # Returning the answer
    return output[0]["generated_text"]

# Testing with an actual query
test_q = "How can I clean a hermit crab tank?"
response = answer_question(test_q)
print("Generated Answer:", response)




Generated Answer: Remove the crabs and place them in a separate container with high walls. Wipe down the tank with a 3% bleach solution, then rinse out the tank thoroughly with clean water. Add live plants, such as moss and fern to the bottom of the tank, and use pH test strips to check that your water has a pH between 6.0 and 7.5 before filling your tank. Clean deck wood with vinegar and water.
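The generated answer above ends with "Clean deck wood with vinegar and water.", which has nothing to do with hermit crabs and most likely comes from one of the other k=3 retrieved passages. Two small guards could reduce this kind of leakage: drop retrieved passages whose distance is much worse than the best hit, and cap the total context length. A minimal sketch, assuming `D[0]` and `I[0]` come from `index.search` as in `answer_question` (FAISS returns them sorted from nearest to farthest); `select_passages`, `build_prompt`, `max_gap=0.25` and `max_words=300` are hypothetical names and untuned values, and word count is only a rough proxy for FLAN-T5 tokens.

```python
def select_passages(distances, indices, texts, max_gap=0.25):
    """Keep retrieved passages whose squared-L2 distance is within max_gap of the best hit."""
    best = distances[0]  # results are sorted, so the first one is the nearest
    return [texts[i] for d, i in zip(distances, indices) if d - best <= max_gap]


def build_prompt(question, passages, max_words=300):
    """Fill the notebook's Context/Question/Answer template, truncating the context to max_words."""
    kept, budget = [], max_words
    for passage in passages:
        if budget <= 0:
            break
        words = passage.split()[:budget]
        kept.append(" ".join(words))
        budget -= len(words)
    context = "\n".join(kept)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy retrieval result (hypothetical): the second passage is much farther away than the first
toy_texts = ["Passage about cleaning hermit crab tanks.", "Passage about deck wood."]
passages = select_passages([0.40, 0.95], [0, 1], toy_texts)
print(passages)  # ['Passage about cleaning hermit crab tanks.']
print(build_prompt("How can I clean a hermit crab tank?", passages, max_words=4))
```

Inside `answer_question`, these would replace the list comprehension and the prompt f-string, e.g. `passages = select_passages(D[0], I[0], answer_texts)` followed by `prompt = build_prompt(user_query, passages)`. The gap threshold depends on the distance scale of this particular index and would need checking against a few real queries.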


#6. Evaluation

In [9]:
# Evaluating the RAG pipeline against a baseline

# Sample questions to test performance
test_questions = [
    "How to clean a hermit crab tank?",
    "How to prepare a stall for a pregnant mare?",
    "How to strip cloth diapers?"
]

# Looping through each question and comparing responses
for i, q in enumerate(test_questions, 1):
    print("="*100)
    print(f"Test Question {i}: {q}\n")

    # RAG answer (with context retrieved from WikiHow answers)
    rag_answer = answer_question(q)
    print(" RAG Answer (retrieval + generation):")
    print(rag_answer, "\n")

    # Baseline answer (direct question to the model)
    baseline_prompt = f"Question: {q}\nAnswer:"
    baseline_answer = generator(baseline_prompt, max_length=100, do_sample=False)[0]['generated_text']
    print("Baseline Answer (no retrieval, LLM only):")
    print(baseline_answer, "\n")




Test Question 1: How to clean a hermit crab tank?





 RAG Answer (retrieval + generation):
Remove the crabs and place them in a separate container with high walls. Wipe down the tank with a 3% bleach solution, then rinse out the tank thoroughly with clean water. Add live plants, such as moss and fern to the bottom of the tank, and use pH test strips to check that your water has a pH between 6.0 and 7.5 before filling your tank. Clean deck wood with vinegar and water. 





Baseline Answer (no retrieval, LLM only):
Clean the tank with a hose and a bucket of water. 

Test Question 2: How to prepare a stall for a pregnant mare?





 RAG Answer (retrieval + generation):
Pick a quiet area of your barn. Clean and disinfect the stall. Look for signs of surrogate mothering. Talk to your veterinarian if you want to start lactating. 





Baseline Answer (no retrieval, LLM only):
Place the stall in a well ventilated area and place the stall in a well ventilated area. Place the stall in a well ventilated area and place the stall in a well ventilated area. Place the stall in a well ventilated area and place the stall in a well ventilated area. Place the stall in a well ventilated area and place the stall in a well ventilated area. Place the stall in a well ventilated area and place the stall in a well ventilated area. Place the stall in a well ventilated area and place the stall in a well ventilated area. Place the stall in a well ventilated area and place the stall in a well ventilated area. Place the stall in a well ventilated area and place the stall in a well ventilated area. Place the stall in a well ventilated area and place the stall in a well ventilated area. Place the stall in a well ventilated area and place the stall in a well ventilat 

Test Question 3: How to strip cloth diapers?





 RAG Answer (retrieval + generation):
Fill a tub or container with hot water, and add a stripping agent to the water. Soak the diapers in a mixture of baking soda, hydrogen peroxide, and water. Soak the diapers in a solution of baking soda, hydrogen peroxide, and water. Soak the diapers in a solution of baking soda, hydrogen peroxide, and water. Soak the diapers in a solution of baking soda, hydrogen peroxide, and water. Soak the diapers in a solution of baking soda, hydrogen peroxide, and water. Soak the diapers in a solution of baking soda, hydrogen peroxide, and water. Soak the diapers in a solution of baking soda, hydrogen peroxide, and water. Soak the diapers in a solution of baking soda, hydrogen peroxide, and water. Soak the diapers in a solution of baking soda, hydrogen peroxide, and water. Soak the diapers in a solution of baking soda, hydrogen peroxide, and water. Soak the diapers in a solution of baking soda, hydrogen peroxide, and water. Soak the diaper 

Baseline Answer (n

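On these three test questions, the RAG pipeline produced more specific, multi-step answers than the LLM-only baseline, grounded in WikiHow content (for example, removing the crabs, wiping the tank with a 3% bleach solution, and checking the water pH). The baseline answers were generic or degenerate: a one-line hose-and-bucket answer for the hermit crab tank, a single sentence repeated until cut off for the pregnant mare, and hallucinated generic advice such as "Use scissors to cut diaper". Retrieval did not remove every error, however. Off-topic sentences from other retrieved passages leaked into two RAG answers ("Clean deck wood with vinegar and water." and "Talk to your veterinarian if you want to start lactating."), and the cloth-diaper RAG answer repeated one sentence until generation was cut off. All three questions are also among the 500 indexed entries, so the retriever always had the matching WikiHow answer available. Taken together, these examples illustrate the value of retrieval-based augmentation for factual how-to tasks, but a larger, held-out question set would be needed to measure the size of the improvement.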
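The comparison above rests on three hand-read examples. One way to make it repeatable is to score each answer automatically. Below is a minimal sketch with two swapped-in metrics that are not part of the original notebook: SQuAD-style unigram F1 against the reference WikiHow `answer` for the same title (it rewards token overlap, not correctness), and a distinct-2 ratio that drops toward 0 for looping output like the pregnant-mare baseline. `dedupe_sentences` is a hypothetical post-processing step that removes repeated sentences; passing generation arguments such as `no_repeat_ngram_size` or `repetition_penalty` to `generator(...)` is an alternative that acts during decoding instead. All helper names are illustrative, and tokenization is a simple lower-cased regex.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(prediction, reference):
    """SQuAD-style token-overlap F1 between a generated answer and a reference answer."""
    pred, ref = tokenize(prediction), tokenize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def distinct_n(text, n=2):
    """Share of n-grams that are unique; values near 0 flag looping output."""
    tokens = tokenize(text)
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def dedupe_sentences(text):
    """Drop sentences that repeat an earlier one (case-insensitive), keeping the first."""
    seen, kept = set(), []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        key = sentence.lower().rstrip(".!? ")
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)


looping = "Place the stall in a well ventilated area. " * 4
print(round(distinct_n(looping), 3))  # 0.258
print(dedupe_sentences(looping))  # Place the stall in a well ventilated area.
print(round(unigram_f1("Clean the tank with bleach.", "Wipe the tank with a bleach solution."), 3))  # 0.667
```

In the evaluation loop, each `rag_answer` and `baseline_answer` could be scored with `unigram_f1` against the dataset answer for the matching title and with `distinct_n` on its own, giving per-question numbers to average instead of a qualitative judgment.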