In [1]:
# https://www.google.com/url?sa=i&url=https%3A%2F%2Fblog.stackademic.com%2Fmastering-retrieval-augmented-generation-rag-architecture-unleash-the-power-of-large-language-a1d2be5f348c&psig=AOvVaw0DujeOP2NwkAn1ZC_E9waE&ust=1753624406384000&source=images&cd=vfe&opi=89978449&ved=0CBUQjRxqFwoTCNj_uZTW2o4DFQAAAAAdAAAAABAE

In [2]:
# Problem Statement
# An enterprise HR department stores employee policies, benefits, leave rules, and compliance documents in lengthy PDFs. Employees frequently ask repetitive questions like “How many remote days are allowed?” or “What’s the intern stipend?” — and HR struggles to scale this support efficiently.

# The goal is to build a Naive Retrieval-Augmented Generation (RAG) assistant that:

# Retrieves relevant information from static PDF documents.
# Uses a lightweight open-source language model to generate accurate answers.
# Works without model fine-tuning or cloud APIs (i.e., fully local and lightweight).


# Objectives
# By the end of this lab, you will be able to:

# Parse and chunk a PDF document intelligently.
# Embed the text using Sentence Transformers.
# Index and search using FAISS for vector similarity.
# Retrieve relevant chunks based on user queries.
# Use FLAN-T5, an instruction-tuned model, to generate answers.
# Simulate a basic Naive RAG pipeline using open-source tools.


# Outcomes
# After completing this, you will:

# Understand how to build the simplest form of RAG from scratch.
# Be able to embed, retrieve, and generate without any external database or APIs.
# Build a baseline QA system over any internal document.
# Lay the foundation for more advanced RAG types like reranking, hybrid, or agentic RAG.


In [3]:
!pip install transformers sentence-transformers faiss-cpu PyPDF2 -q

# transformers - for using FLAN-T5 model since we are using it for generation
# sentence-transformers - for embedding the text chunks. T5 means "Text-to-Text Transfer Transformer" and is used for generating embeddings.
# faiss-cpu - for indexing and searching the embeddings
# PyPDF2 - for parsing PDF documents
# Note: The -q flag is used to suppress output during installation.
# This is useful for keeping the notebook clean and focused on the code and results.

In [4]:
!pip install tf-keras -q

In [5]:
# Imports
import PyPDF2 # used for reading PDF files and extracting text
import os # used for file and directory operations
import faiss # used for efficient similarity search and clustering of dense vectors
import numpy as np

# update below code to import the necessary libraries
from sentence_transformers import SentenceTransformer # used for generating embeddings from text
from transformers import pipeline # used for using pre-trained models for text generation

In [6]:
# Step 1: Load and split your PDF into chunks
def load_pdf_chunks(pdf_path, chunk_size=300): # chunk_size=300 means each chunk will have a maximum of 300 characters i.e. spill the text into chunks of 300 characters
    reader = PyPDF2.PdfReader(pdf_path) # create a PDF reader object
    print(f"Loaded {len(reader.pages)} pages.")
    text = "" # Initialize an empty string to hold the text
    # Iterate through each page in the PDF and extract text
    # This will concatenate all the text from the PDF into a single string
    # Note: PyPDF2 may not extract text perfectly from all PDFs, especially if they contain complex formatting or images
    # For more complex PDFs, consider using libraries like pdfplumber or PyMuPDF
    # However, for this simple example, PyPDF2 should suffice
    for page in reader.pages:
        text += page.extract_text()

    # Clean and chunk
    text = text.replace("\n", " ") # Replace newlines with spaces to avoid breaking words
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks

chunks = load_pdf_chunks("company_policy.pdf", chunk_size=300)
print(f"Loaded {len(chunks)} chunks.")

# Print the 3rd chunk to verify
print(f"Chunk 3: {chunks[2]}")

# Why is Chunking Important?
# Chunking is crucial for several reasons:
# 1. **Memory Efficiency**: Large documents can exceed memory limits. Chunking allows processing smaller parts sequentially.
# 2. **Focused Retrieval**: Smaller chunks improve retrieval accuracy. Instead of searching through a massive document, the system can quickly find relevant sections.
# 3. **Improved Context**: Smaller chunks provide better context for language models, leading to more accurate and relevant responses.
# 4. **Scalability**: Chunking allows the system to scale better. As the document grows, the system can handle more data without significant performance degradation.
# # 5. **Parallel Processing**: Smaller chunks can be processed in parallel, speeding up the retrieval and generation process.
# 6. **Flexibility**: Different chunks can be used for different queries, allowing for more tailored responses based on user needs.
# 7. **Reduced Latency**: Smaller chunks can be retrieved and processed faster, improving the overall user experience.
## 8. **Better Model Performance**: Language models often perform better with shorter, more focused text inputs, leading to higher quality responses.
# 9. **Easier Debugging**: Smaller chunks make it easier to identify and fix issues in the text processing pipeline.
# 10. **Enhanced User Experience**: Users can quickly find relevant information without wading through large blocks of text
# 11. **Improved Searchability**: Smaller chunks can be indexed more effectively, allowing for faster and more accurate search results.
# 12. **Better Handling of Complex Documents**
#    : Complex documents with multiple sections, tables, or images can be chunked to maintain context and structure, making it easier to retrieve relevant information.

Loaded 24 pages.
Loaded 152 chunks.
Chunk 3: is the confidential property of Sirca Paints India Limited (SPIL)    and any use, distributing, copying or disclosure by any person to outsiders without any proper  authorization is strictly prohibited.     Any query or doubt concerning the content of the EHB should be forwarded to the Human  Resour


In [7]:
# Step 2: Embed chunks using SentenceTransformer
embedder = SentenceTransformer("all-MiniLM-L6-v2") # create an instance of the SentenceTransformer model
# What is SentenceTransformer?
# SentenceTransformer is a library that provides pre-trained models for generating sentence embeddings. These embeddings are dense vector representations of sentences that capture their semantic meaning. The "all-MiniLM-L6-v2" model is a lightweight transformer model that produces high-quality embeddings suitable for various NLP tasks, including semantic search, clustering, and classification. It is designed to be efficient and fast while maintaining good performance
# on a wide range of text data.

chunk_embeddings = embedder.encode(chunks, convert_to_tensor=False, show_progress_bar=True)
# convert_to_tensor=False means that the embeddings will be returned as NumPy arrays instead of PyTorch tensors. This is useful if you want to work with the embeddings in a NumPy-based environment or if you don't need the additional functionality provided by PyTorch tensors.
# show_progress_bar=True means that a progress bar will be displayed during the embedding process, which can be helpful for tracking the progress of the operation, especially when processing large datasets or many chunks.
print(f"Generated embeddings for {len(chunk_embeddings)} chunks.")

Batches:   0%|          | 0/5 [00:00<?, ?it/s]

  return forward_call(*args, **kwargs)


Generated embeddings for 152 chunks.


In [8]:
# Print the embedding for the 3rd chunk to verify
# print(f"Embedding for Chunk 3: {chunk_embeddings[2]}")
# Print the length of the embedding vector to verify
print(f"Embedding dimension: {len(chunk_embeddings[2])}")

Embedding dimension: 384


In [9]:
# Print the length of the first 10 embedding vector to verify
for i in range(10):
    print(f"Embedding dimension for chunk {i}: {len(chunk_embeddings[i])}")


Embedding dimension for chunk 0: 384
Embedding dimension for chunk 1: 384
Embedding dimension for chunk 2: 384
Embedding dimension for chunk 3: 384
Embedding dimension for chunk 4: 384
Embedding dimension for chunk 5: 384
Embedding dimension for chunk 6: 384
Embedding dimension for chunk 7: 384
Embedding dimension for chunk 8: 384
Embedding dimension for chunk 9: 384


In [10]:
# Step 3: Create FAISS index
dimension = len(chunk_embeddings[0]) # Get the dimension of the embeddings (i.e., the length of the vector representation)
print(f"Embedding dimension: {dimension}")
# FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors. It is particularly useful for large-scale similarity search tasks, such as finding similar items in high-dimensional spaces.
index = faiss.IndexFlatL2(dimension) # L2 means that the index will use the L2 (Euclidean) distance metric for similarity search. This is a common choice for dense vector representations, as it measures the straight-line distance between two points in the embedding space.
# The IndexFlatL2 class is a simple index that stores vectors in a flat structure and allows for exact nearest neighbor search using the L2 distance metric. It is suitable for small to medium-sized datasets where exact search is required.
index.add(np.array(chunk_embeddings)) # Add the embeddings to the FAISS index
# This step builds the index, allowing for efficient similarity search later on.
print("FAISS index built.")

Embedding dimension: 384
FAISS index built.


In [11]:
# Step 4: Retrieval function
def retrieve_top_k(query, k=3):
    query_embedding = embedder.encode([query]) # this will convert the query into an embedding using the same model used for the chunks
    distances, indices = index.search(np.array(query_embedding), k) # search the index for the top k most similar chunks
    # distances will contain the distances of the top k chunks from the query embedding
    # indices will contain the indices of the top k chunks in the original chunks list
    # This means that indices[0] will give us the indices of the top k chunks
    # We can then use these indices to retrieve the corresponding chunks from the original chunks list
    return [chunks[i] for i in indices[0]]

In [12]:
# Step 5: Use FLAN-T5 for instruction-style generation
qa_pipeline = pipeline("text2text-generation", model="google/flan-t5-base", max_length=256)
# This pipeline will take a query and return a generated answer based on the retrieved chunks.
# max_length=256 means that the generated answer will have a maximum length of 256 tokens. This is useful to control the length of the generated text and prevent excessively long responses.

Device set to use mps:0


In [13]:
# Step 6: Answer function using Naive RAG
def answer_query(query): # query= "How many work-from-home days are allowed per month?"
    context_chunks = retrieve_top_k(query) # Retrieve the top 3 chunks relevant to the query
    context = "\n".join(context_chunks) # Join the retrieved chunks into a single context string
    prompt = f"Context: {context}\n\nQuestion: {query}\n\nUnderstand the intent of the user and do a perfect mapping between Question and Context.Note: If the questions answer is not found in the chunk with confidence of atleast 80% and there is no relevant context, politely reply with a No\n\nAnswer:"
    print(f"\nPrompt:\n {prompt}")
    response = qa_pipeline(prompt)[0]['generated_text']
    return response.strip()

In [14]:
# Test it
question = "How many work-from-home days are allowed per month?"
response = answer_query(question)
print(f"\n🧠 Assistant Response:\n{response}")


Prompt:
 Context: considered as Half Day.   Extra Ordinary Leave and Work from Home   On the basis of Medical Ground (i.e. Any Accident, Fracture or any other problem). The employee  can avail the 7 days leave and the same will be considered on Special Leave. To avail the said  leave the employee has to subm it the p
h. The leave given to the employee will be strictly based on the Date of  Joining (i.e. Pro -Rata Basis). Employee who joins on the following dates will be eligible for the  leave.     1st to 7th of every month  : 2    8th to 14th of every month  : 1.5     15th to 21st of every month  : 1    22nd to
tendance of their respective team members/subordinate.     m) Employee who will take the leave on Saturday and Monday, then the Sunday of that  particular week wi ll also be included.     n) It is clear state that company follows the rule “NO WORK NO PAY” and unauthorized absent  will be treated as 

Question: How many work-from-home days are allowed per month?

Understand the

In [17]:
# Test it
question = "Summarize PDF in 10 lines"
response = answer_query(question)
print(f"\n🧠 Assistant Response:\n{response}")


Prompt:
 Context: ration   Marriage Anniversary   Wedding Ceremony   Employee Children Merit Reward   Salary Advance   Maternity Leave   Paternity Leave   Compensatory Leave   Extraordinary Leave & Work from Home   Management Trainee Hiring     SPIL Corporate HR Policies         Flexi Working Schedule for Women   Mea
nd to make happiness with the part ners,  management had initiate with the couple movie tickets to the employees. Employee can take the  reimbursement for the movie tickets, the amount of Rs 600/ - will be given against the  reimbursement.     Wedding Ceremony     A wedding is the ceremony where tw 
artment after proper submission of bill.   Section 13: Disciplinary Pro cedures     The Disciplinary Procedure will be used only when necessary and as LAST RESORT. Where  possible, informal or formal counseling or other good management practice will resolve the  matters prior to disciplinary action 

Question: Summarize PDF in 10 lines

Answer:

🧠 Assistant Response:
Employee 

In [18]:
# Test it
question = "What is the salary offered to the interns?"
response = answer_query(question)
print(f"\n🧠 Assistant Response:\n{response}")


Prompt:
 Context: be taken to process the Internship programme. The monetary  benefi t for the interns amounting Rs 7 000/ - will be considered as Stipend.   After completion of the Internship programme, the intern has to submit the detailed project  report and the pr oper review from the respective department will b
al from the Managing  Director/Director. Salary advance will be given upto a maximum limit of three months of Gross  Pay and the employee who had completed the successfully period of 1 year in the organisation  may be availed.   The same will be recover on the installment of 3/6/9/12  months. The to
he slab of Rs 250000 to 350000 per  annum. The amount will be strictly based upon the Interview’s Performance.   And based upon the performance of the candidate after a year, the Management  Trainee/Graduate Engineer Trainee /Junior Executives will be promoted on the “Executive”  Grade.   Induction:

Question: What is the salary offered to the interns?

Answer:

🧠 Assistant Re

In [19]:
# Test it
question = "if my kid gets 90% marks in fifth standard, is he eligible for any benefits?"
response = answer_query(question)
print(f"\n🧠 Assistant Response:\n{response}")


Prompt:
 Context: e child w ill  secured the marks above 85%.     Class VIII:  Rs 3500/ - will be given as a CASH REWARD, if the child will  secured the marks above 80%         SPIL Corporate HR Policies       Class X:  Rs 5000/ - will be given as a CASH REWARD, if the child will  secured the marks above 80%     Clas
e of India, and to boost and motivate them the Employee Children Merit  Award will be introduced. This reward will be given to employee who secures  the highest  aggregate marks in School/Colleges. The details of the award is given below     Class V:  Rs 2500/ - will be given as a CASH REWARD, if th
 of the employee to  the HR Department.     d) Employee has to earn the minimum category of “Average” and maximum category of  “Excellent” for the confirmation.     e) Depend upon the performance of the probationers and discretion of the management, the  probationer’s  compensation, grade, designati

Question: if my kid gets 90% marks in fifth standard, is he eligible for any 

In [26]:
# Test it
question = "can I buy a shoes in the shop?"
response = answer_query(question)
print(f"\n🧠 Assistant Response:\n{response}")

  return forward_call(*args, **kwargs)



Prompt:
 Context:   The Company assumes no liabilities for any loss or damage  to personal belonging and  property.     e) The office space, equipment , material and other properties shall be used only for Sirca Paints  India Limited (SPIL). Employee who uses the company property like : Laptop, Mobile Phones,  Camera
, Projectors and other material are responsible for the sake keeping of these  equipment .     f) To make the safety and security of the employees of Sirca, only authorize vendors are allowed  to visit to the workplace. All vendors are authorized to enter through the main lift area and  wait at the 
for the public transport  (Bus/Auto/Metro’s) or any private cab for the long route (OLA/UBER).     e) Lunch Charges will be Rs 100 to Rs 250 based upon the hierarchy level. Expenses of Business  Promotion will be reimbursed as actual. All the bills of Business Promotion will be approved by  the Nati

Question: can I buy a shoes in the shop?

Note: If the questions answer is no

In [29]:
# Test it
question = "please recommend a nearby restaurant?"
response = answer_query(question)
print(f"\n🧠 Assistant Response:\n{response}")

  return forward_call(*args, **kwargs)



Prompt:
 Context: A+ City, A City, B City and C City. The description  of the hotel booking will be given in the Travel Policy.     h) Employee with the same gender is travelling for one location and then  the twin sharing room  will be booked.     i) If the hotel is in the outskirts of the location, then the employe
rking hours :   9:30 AM to  6:00 PM   10:00 to 6:30 PM   Employee can avail the above services on daily basis and must be ensure that work or the project  assigned will not be hamper and the productivity and profitability will be considered as our main  source.   Meal and Conveyance Charges   Meal C
for the public transport  (Bus/Auto/Metro’s) or any private cab for the long route (OLA/UBER).     e) Lunch Charges will be Rs 100 to Rs 250 based upon the hierarchy level. Expenses of Business  Promotion will be reimbursed as actual. All the bills of Business Promotion will be approved by  the Nati

Question: please recommend a nearby restaurant?

Note: If the questions answe

In [32]:
# Test it
question = "When is my Manager resigning?"
response = answer_query(question)
print(f"\n🧠 Assistant Response:\n{response}")

  return forward_call(*args, **kwargs)



Prompt:
 Context: resigns from the services by giving 10 days of notice.     b) On Confirmation, the employee can resign from the services by giving the notice of 30 days  (i.e. one month).     c) The employee has  to fill the Exit Interview Form and Clearance Form when resigned from  the services. Exit Interview wil
d other  details ).     f) After verbal discussion with the employee, Department Heads will accept or reject the  Resignation Letter within 5 working days from the Date of Resignation.     g) The full and final process will take around 30 days to complete. In case, if the employee had  no dues with 
Letter, the candidate has  to provide the acceptance of the same  and submit the resign ation copy to the HR Department.      The documents given in the Offer Letter will be submitted to the HR Department on the Date  of Joining mention in the letter. HR Representative ensure that the all statutory

Question: When is my Manager resigning?

Understand the intent of the user an

# Happy Learning