# Finance Bill 2025 - RAG Chatbot
**Tools:** LangChain · Groq (LLaMA 3.1) · ChromaDB · Sentence Transformers · Google Colab

---

## What is RAG?
RAG stands for **Retrieval-Augmented Generation**. It works in three steps:

| Step | Name | What it does |
|------|------|--------------|
| 1 | **Retrieval** | Searches a document for the most relevant chunks based on your question |
| 2 | **Augmented** | Combines those chunks into a structured context for the LLM |
| 3 | **Generation** | The LLM reads the context and generates a precise, grounded answer |

> **Why RAG instead of just asking an LLM?**  
> LLMs are trained on general data and have a knowledge cutoff date.  
> RAG lets them answer questions about *specific documents* (like the Finance Bill 2025)  
> by grounding each answer in retrieved text, which reduces (but does not eliminate) hallucination and guessing.
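The three steps in the table can be sketched in plain Python. This is a toy under stated assumptions: retrieval scores chunks by shared words instead of embeddings, the chunks are made-up strings rather than text from the bill, and `generate()` is a hypothetical placeholder for the real LLM call.

```python
# Toy retrieval-augmented generation loop (illustrative only).
# Assumptions: word overlap replaces embedding search, and generate()
# is a hypothetical stand-in for an LLM call.
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Step 1 (Retrieval): return the k chunks sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def augment(question: str, retrieved: list[str]) -> str:
    """Step 2 (Augmented): pack the retrieved chunks into a single prompt."""
    context = "\n\n".join(retrieved)
    return f"Use ONLY this context.\n\nContext:\n{context}\n\nQuestion: {question}"


def generate(prompt: str) -> str:
    """Step 3 (Generation): hypothetical placeholder for an LLM call."""
    return f"[an LLM would answer from a {len(prompt)}-character prompt]"


# Made-up chunks, not text from the Finance Bill
chunks = [
    "Clause A amends the rate of excise duty on spirits.",
    "Clause B sets the rate of tax for digital service tax.",
    "Clause C changes the filing deadline for returns.",
]
question = "What is the rate of digital service tax?"
top = retrieve(question, chunks)
print(top[0])
print(generate(augment(question, top)))
```

In the real notebook, ChromaDB and the BGE embeddings replace `retrieve()`, the prompt template replaces `augment()`, and Groq's LLaMA model replaces `generate()`.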

---

## Project Goal
Build a chatbot that can answer questions about the **Kenya Finance Bill 2025**  
using only the contents of the official PDF, with memory across follow-up questions.

## Step 1: Install Dependencies

In [1]:
# Install all required packages
# - langchain: core framework (prompts, chains)
# - langchain-community: document loaders and vector store integrations
# - langchain-groq: connects LangChain to Groq's LLaMA models
# - pypdf: reads and parses PDF files
# - chromadb: our vector database that stores embeddings
# - sentence-transformers: converts text into numerical vectors (embeddings)

!pip install numpy==1.26.4 -q
!pip install langchain==0.3.0 langchain-community==0.3.0 langchain-groq pypdf chromadb sentence-transformers -q

## Step 2: Environment Setup

In [2]:
import os

# Turn off Chroma's anonymized telemetry to keep our output clean
# (this does not affect how the RAG works).
# Chroma reads this setting from the ANONYMIZED_TELEMETRY environment variable.
os.environ["ANONYMIZED_TELEMETRY"] = "False"

## Step 3: Mount Google Drive & Load API Key

In [3]:
# Mount Google Drive so we can access the Finance Bill PDF stored there
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
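# Load the Groq API key securely from Colab Secrets
# (we use Colab Secrets instead of hardcoding the key to keep it out of the notebook)
from google.colab import userdata

# userdata.get() raises an error (rather than returning None) when the secret
# is missing, so we catch it and fall through to our own error message
try:
    api_key = userdata.get('Groq_api_key')
except userdata.SecretNotFoundError:
    api_key = None

if not api_key:
    raise ValueError("API key not found. Add 'Groq_api_key' in Colab Secrets.")

print("API key loaded successfully.")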

API key loaded successfully.
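If you want the same notebook to run outside Colab (locally or in CI), one option is to fall back to an environment variable. This is a minimal sketch, assuming the variable is called `GROQ_API_KEY` (the name the official Groq SDK reads by default); `load_groq_key` is a hypothetical helper, not part of any library.

```python
# Hypothetical helper: read the Groq key from Colab Secrets, else from an env var.
# Assumption: outside Colab, the key is exported as GROQ_API_KEY.
import os


def load_groq_key(secret_name: str = "Groq_api_key",
                  env_var: str = "GROQ_API_KEY") -> str:
    """Return the API key from Colab Secrets or, failing that, from env_var."""
    key = None
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(secret_name)
    except Exception:  # not running in Colab, or the secret is missing / not shared
        pass
    key = key or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key in Colab Secrets ('{secret_name}') or ${env_var}.")
    return key
```

Usage would be `api_key = load_groq_key()`, replacing the Colab-only lookup.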


## Step 4: Import Libraries

In [5]:
# LangChain components
from langchain_community.document_loaders import PyPDFLoader         # loads PDF pages as Document objects
from langchain_text_splitters import RecursiveCharacterTextSplitter  # splits text into overlapping chunks
from langchain_community.vectorstores import Chroma                  # vector database which stores and searches embeddings
from langchain.prompts import PromptTemplate                         # structures the instructions we send to the LLM
from langchain.chains import LLMChain                                # links a prompt template to an LLM
from langchain_core.messages import HumanMessage                     # formats messages for the LLM

# Groq LLM
from langchain_groq import ChatGroq  # connects to Groq's fast LLaMA inference engine

# Embedding model
from sentence_transformers import SentenceTransformer  # converts text to vectors that capture their meaning

## Step 5: Load the Finance Bill PDF

`loader.load()` reads the PDF page by page and returns a list of *Document* objects, one per page of the bill.
Each Document holds the page's text (`page_content`) and metadata such as the source file and page number.

In [7]:
# Path to the Finance Bill PDF in Google Drive
pdf_path = "/content/drive/MyDrive/The Finance Bill 2025.pdf"

# PyPDFLoader reads the PDF and converts each page into a Document object
loader = PyPDFLoader(pdf_path)
docs = loader.load()

print(f"PDF loaded successfully. Total pages: {len(docs)}")

PDF loaded successfully. Total pages: 135
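pypdf extracts only the text layer stored in a PDF (it does no OCR), so a scanned page comes back with little or no text. A quick sanity check after loading can flag such pages. The helper below is hypothetical and works on plain strings; with the notebook's objects you would call it as `find_sparse_pages([d.page_content for d in docs])`. The 50-character threshold is an arbitrary assumption.

```python
# Hypothetical sanity check: flag pages whose extracted text is (almost) empty.


def find_sparse_pages(page_texts: list[str], min_chars: int = 50) -> list[int]:
    """Return 1-based page numbers with fewer than min_chars non-whitespace characters."""
    sparse = []
    for number, text in enumerate(page_texts, start=1):
        visible = "".join(text.split())  # drop spaces, tabs and newlines
        if len(visible) < min_chars:
            sparse.append(number)
    return sparse


# Made-up page texts for illustration
pages = ["A full page of extracted text. " * 5, "", "  \n  ", "Short."]
print(find_sparse_pages(pages))
```

An empty result suggests every page has a usable text layer; flagged pages may need OCR before they can be chunked and embedded.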


## Step 6: Split the Document into Chunks

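Embedding models have a limited input length (512 tokens for `bge-base-en`), and both retrieval and the LLM work better with focused passages than with a whole PDF.  
We therefore split the text into smaller, overlapping pieces called **chunks**.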

We use a **Parent-Child splitting strategy**:
- **Child chunks** (smaller) → used for *searching*. Small = more precise matches
- **Parent chunks** (larger) → used for *answering*. Large = more context for the LLM

Think of it like a book index:
- The index entry (child) points you to the right page
- But you read the full paragraph (parent) to get the complete answer

In [8]:
# Parent splitter: larger chunks that give the LLM enough context to answer well
parent_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,   # each parent chunk is up to 1000 characters
    chunk_overlap=100  # 100-character overlap so text cut at a boundary also appears at the start of the next chunk
)

# Child splitter: smaller chunks for precise similarity search
# (defined for the parent-child strategy; the simple retriever in Step 8 does not use it yet)
child_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,    # smaller chunks = more focused search
    chunk_overlap=20
)

# Split the PDF into parent chunks and count them
parent_documents = parent_splitter.split_documents(docs)
print(f"Parent chunks created: {len(parent_documents)}")

Parent chunks created: 356
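To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter tries to break on paragraph breaks, then newlines, then spaces, so its chunks usually end on natural boundaries and can be shorter than `chunk_size`. This sketch only shows the character arithmetic: each new window starts `chunk_size - chunk_overlap` characters after the previous one.

```python
# Simplified fixed-window chunker (illustrative only, not LangChain's algorithm).


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # the last window reached the end
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"  # 26 characters
for chunk in sliding_chunks(text, chunk_size=10, chunk_overlap=3):
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

With the notebook's settings (1000 and 100), each parent chunk would start 900 characters after the previous one in this simplified model.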


## Step 7: Set Up the Embedding Model

An **embedding model** converts text into a list of numbers (a vector)  
that captures the *meaning* of the text — not just the words.

For example:
- "tax on imported goods" and "levy on foreign products" would get *similar* vectors, even though they use different words

This is what allows our retriever to find relevant chunks even when  
the user's question is worded differently from the document text.

We use `BAAI/bge-base-en` — a lightweight, high-quality open-source embedding model.

In [9]:
# Load the pre-trained embedding model
bge_model = SentenceTransformer("BAAI/bge-base-en")

# Wrap it in a class so LangChain can use it.
# LangChain expects an Embeddings object with embed_documents() and embed_query() methods.
from langchain_core.embeddings import Embeddings

class BGEEmbeddings(Embeddings):
    def embed_documents(self, texts):
        """Embed a batch of document chunks (used when building the vector store)"""
        return bge_model.encode(
            texts,
            batch_size=8,
            normalize_embeddings=True  # unit-length vectors, as BGE recommends for similarity search
        ).tolist()

    def embed_query(self, text):
        """Embed a single user question (used at query time)"""
        return bge_model.encode(
            [text],
            normalize_embeddings=True
        ).tolist()[0]

embedding_function = BGEEmbeddings()
print("Embedding model ready.")

Embedding model ready.
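The `normalize_embeddings=True` flag matters because, for unit-length vectors, the dot product equals cosine similarity and squared Euclidean distance equals `2 - 2 * cosine`, so all three rank chunks the same way. That is convenient because Chroma's default distance is, as far as we know, squared L2. The sketch below checks these identities on made-up 3-D vectors (real `bge-base-en` vectors have 768 dimensions).

```python
# Why normalization helps: identities between dot product, cosine and L2 distance.
# The vectors are made up for illustration, not real BGE output.
import math


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = normalize([1.0, 2.0, 0.5])
doc_close = normalize([2.0, 3.9, 1.1])   # points in almost the same direction
doc_far = normalize([-1.0, 0.2, 3.0])    # points in a different direction

for name, doc in [("close", doc_close), ("far", doc_far)]:
    # For unit vectors: dot == cosine, and squared L2 == 2 - 2 * cosine
    print(name, round(dot(query, doc), 4), round(squared_l2(query, doc), 4))
```

A higher dot product always pairs with a smaller squared distance here, which is why normalized embeddings give consistent rankings whichever metric the vector store uses.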


## Step 8: Build the Vector Store and Retriever

Now we embed all our chunks and store them in **ChromaDB** — our vector database.

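The parent-child strategy from Step 6 is what LangChain's **ParentDocumentRetriever** implements:
1. A user asks a question → it gets embedded into a vector
2. ChromaDB finds the *child chunks* most similar to that vector
3. The retriever then returns the corresponding *parent chunks* (with more context)
4. Those parent chunks are passed to the LLM as context

The cell below uses a simpler setup: it embeds the **parent chunks directly** and returns the 5 most similar ones, so the child splitter is not used yet.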

In [10]:
import os, shutil

# Delete the store from a previous run; otherwise re-running this cell
# adds the same chunks again and retrieval returns duplicates
if os.path.exists("./chroma_store"):
    shutil.rmtree("./chroma_store")
    print("Old vector store deleted.")

# Embed all parent chunks with our BGE function and store them in ChromaDB
vectorstore = Chroma.from_documents(
    documents=parent_documents,    # the chunks we created from the PDF
    embedding=embedding_function,  # the BGE model that converts text to vectors
    persist_directory="./chroma_store"  # saved to disk; from_documents() still re-embeds on every run
)

# Build a simple retriever directly from the vector store
# as_retriever() turns ChromaDB into a retriever object that LangChain can use
# When a question comes in, it searches for the 5 most similar chunks to that question
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5}  # k=5 means return the top 5 most relevant chunks
)

print("Vector store and retriever ready.")

Old vector store deleted.
Vector store and retriever ready.
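The retriever above searches the parent chunks directly. The parent-child idea from Step 6 (search small chunks, return their larger parents) can be sketched in plain Python. Assumptions: word overlap stands in for embedding similarity, and each parent is split into sentence-sized children with a regex instead of `child_splitter`. In LangChain the same pattern is provided by `ParentDocumentRetriever` (in `langchain.retrievers`), which takes `vectorstore`, `docstore`, `child_splitter` and `parent_splitter` arguments; this toy does not reproduce its internals.

```python
# Toy parent-child retrieval: score small child chunks, return their parent chunks.
# Assumption: word overlap stands in for embedding similarity.
import re


def words(text: str) -> set[str]:
    """Lowercase word set used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_children(parents: list[str]) -> list[tuple[str, int]]:
    """Split each parent into sentence-sized children, remembering each child's parent id."""
    children = []
    for parent_id, parent in enumerate(parents):
        for sentence in re.split(r"(?<=\.)\s+", parent):
            if sentence:
                children.append((sentence, parent_id))
    return children


def retrieve_parents(question: str, parents: list[str],
                     children: list[tuple[str, int]], k: int = 2) -> list[str]:
    """Rank children against the question, then return their parents without duplicates."""
    q = words(question)
    ranked = sorted(children, key=lambda child: len(q & words(child[0])), reverse=True)
    seen, results = set(), []
    for _, parent_id in ranked[:k]:
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
    return results


# Made-up parent chunks (not text from the Finance Bill)
parents = [
    "Section 1 defines terms. Section 2 sets the penalty for late filing.",
    "Section 3 sets the rate for excise duty. Section 4 lists exempt goods.",
]
children = build_children(parents)
# The two best-matching children both belong to parent 0, so it is returned once
print(retrieve_parents("late filing penalty and defined terms", parents, children, k=2))
```

The deduplication step matters: several precise child matches often come from the same parent, and the LLM should see that parent once.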


## Step 9: Set Up the LLM (Groq + LLaMA 3.1)

The LLM is the **brain** of our RAG — it reads the retrieved context  
and generates a natural language answer.

We use **Groq's LLaMA 3.1** because:
- It's fast (Groq's LPU hardware is purpose-built for inference)
- It has a free tier on the Groq API (with rate limits)
- LLaMA 3.1 is a strong open-source model for factual Q&A tasks

`temperature=0.2` keeps answers factual and consistent:  
lower values make the model favour its most likely words, higher values make the output more varied.

In [11]:
llm = ChatGroq(
    api_key=api_key,
    model_name="llama-3.1-8b-instant",
    temperature=0.2  # low temperature = factual, consistent answers
)

print("LLM ready.")

LLM ready.
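Temperature rescales the model's next-token scores (logits) before they become probabilities: each logit is divided by the temperature and a softmax is applied. A low value such as 0.2 concentrates probability on the top candidate. This sketch uses made-up logits for three tokens; real models apply the same idea over the whole vocabulary, and providers may combine it with other sampling settings.

```python
# Temperature scaling of next-token probabilities (toy logits, not LLaMA's real values).
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        # Many APIs treat temperature=0 as greedy decoding instead of dividing by zero
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # scores for three candidate next tokens
for t in (0.2, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.2 almost all probability lands on the first token, which is why low temperatures give repeatable, factual-sounding answers.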


## Step 10: Debugging — Testing the LLM in Isolation

Before building the full RAG pipeline, it's good practice to test each  
component separately.

Here we test whether the LLM itself can answer correctly  
when given a manually written context — bypassing retrieval entirely.

> **Finding:** The LLM answered correctly when given clear context.  
> So when the full pipeline gave poor answers, the problem was in *retrieval*, not in the LLM.

In [12]:
# Manually written context to test the LLM directly
# (illustrative text only, not quoted from the Finance Bill)
test_context = """
The Finance Bill 2025 introduces a 5% digital service tax on revenue earned
by non-resident persons providing digital services in Kenya.
It also proposes a 10% environmental levy on plastic packaging materials.
There will be a 3% luxury tax on imported vehicles valued above KES 5 million.
"""

test_question = "What new environmental taxes are introduced in the bill?"

test_prompt = f"""Use the context below to answer the question.

Context:
{test_context}

Question: {test_question}
"""

response = llm.invoke([HumanMessage(content=test_prompt)])
print("LLM Test Answer:")
print(response.content)

LLM Test Answer:
According to the context, the new environmental tax introduced in the Finance Bill 2025 is a 10% environmental levy on plastic packaging materials.
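The test above checks the LLM with retrieval bypassed. The complementary check tests retrieval on its own: for questions where you know which words the relevant passage contains, does that text appear in the top-k results? The helpers below are hypothetical (not LangChain APIs) and take any function that maps a question to a list of texts; here a fixed stand-in retriever is used.

```python
# Hypothetical retrieval check (helper names are ours, not LangChain's).
from typing import Callable


def retrieval_hit(retrieve: Callable[[str], list[str]], question: str,
                  keyword: str, k: int = 5) -> bool:
    """True if keyword appears (case-insensitively) in any of the first k retrieved texts."""
    return any(keyword.lower() in text.lower() for text in retrieve(question)[:k])


def hit_rate(retrieve: Callable[[str], list[str]],
             cases: list[tuple[str, str]], k: int = 5) -> float:
    """Fraction of (question, expected keyword) cases where retrieval found the keyword."""
    hits = sum(retrieval_hit(retrieve, question, keyword, k) for question, keyword in cases)
    return hits / len(cases)


def fake_retrieve(question: str) -> list[str]:
    """Stand-in retriever that ignores the question and returns fixed chunks."""
    return ["Clause on excise duty.", "Clause on digital service tax."]


cases = [
    ("What is the digital service tax rate?", "digital service tax"),
    ("What is the penalty for late filing?", "late filing"),
]
print(hit_rate(fake_retrieve, cases))  # 0.5: only the first keyword is retrieved
```

With the notebook's retriever, `retrieve` could be `lambda q: [d.page_content for d in retriever.invoke(q)]`, using keywords you have confirmed appear in the relevant section of the bill.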


## Step 11: Build the Final RAG Function with Memory

The default LangChain `RetrievalQA` chain passes context to the LLM in a way  
that doesn't always work well for specific structured questions.

We fix this by:
1. **Custom prompt template** — we explicitly tell the LLM to use only the provided context
2. **Custom chain** — we control exactly what gets passed to the LLM
3. **Memory** — we store the conversation history so the LLM can handle follow-up questions

> **Memory analogy:** Without memory, each question is like calling a new person.  
> With memory, it's like continuing a conversation with someone who remembers  
> everything you've discussed.

In [13]:
# Initialise chat history (this list grows as the conversation progresses)
chat_history = []

# Step 1: Define a custom prompt template
# {context} = the retrieved chunks from the PDF
# {question} = the user's question
# {chat_history} = previous Q&A pairs so the LLM can handle follow-ups
custom_prompt = PromptTemplate(
    input_variables=["context", "question", "chat_history"],
    template="""You are a helpful assistant that answers questions about the Kenya Finance Bill 2025.
Use ONLY the context provided below to answer the question.
If the answer is not in the context, say "I could not find that information in the Finance Bill 2025."

Previous conversation:
{chat_history}

Context from the Finance Bill:
{context}

Question: {question}

Answer:"""
)

# Step 2: Build the chain with LangChain Expression Language: prompt -> LLM -> text.
# (This replaces LLMChain, which is deprecated and triggered a warning.)
from langchain_core.output_parsers import StrOutputParser

custom_chain = custom_prompt | llm | StrOutputParser()

# Step 3: Define the main RAG function
def rag_with_memory(question):
    """
    Ask a question about the Finance Bill 2025.
    The function retrieves relevant context from the PDF,
    builds a prompt with conversation history, and returns the LLM's answer.
    """
    # Retrieve the top 5 most relevant chunks from the vector store
    # (.invoke() replaces the deprecated get_relevant_documents())
    retrieved_docs = retriever.invoke(question)

    # Combine the retrieved chunks into one context string
    context = "\n\n".join(doc.page_content for doc in retrieved_docs)

    # Format the conversation history for the prompt
    past_conversation = "\n".join(chat_history) if chat_history else "No previous conversation."

    # Run the chain; StrOutputParser returns the answer as a plain string
    answer = custom_chain.invoke({
        "context": context,
        "question": question,
        "chat_history": past_conversation
    })

    # Save this Q&A to memory for future follow-up questions
    chat_history.append(f"User: {question}")
    chat_history.append(f"Assistant: {answer}")

    return answer

print("RAG function ready. You can now ask questions.")

RAG function ready. You can now ask questions.
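Two limits of this memory are worth knowing. First, `chat_history` grows without bound, so a long session will eventually exceed the model's context window. Second, the retriever only ever sees the current question, so a follow-up such as "Who is required to pay it?" gives it no topic to search for. A minimal sketch of both mitigations, assuming a fixed window of recent turns and naive query condensation (prepending the previous question); `WindowMemory` is a hypothetical class, not a LangChain one.

```python
# Hypothetical memory helper (illustrative; WindowMemory is our name, not a LangChain class).
from collections import deque


class WindowMemory:
    """Keep only the last max_turns question/answer pairs."""

    def __init__(self, max_turns: int = 3):
        # A deque with maxlen silently drops the oldest turn once it is full
        self.turns = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_text(self) -> str:
        """Format history like rag_with_memory does ("User: ..." / "Assistant: ...")."""
        if not self.turns:
            return "No previous conversation."
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)

    def retrieval_query(self, question: str) -> str:
        """Prepend the previous user question so a follow-up keeps its topic for retrieval."""
        if not self.turns:
            return question
        previous_question = self.turns[-1][0]
        return f"{previous_question} {question}"


memory = WindowMemory(max_turns=2)
memory.add("What is the digital service tax?", "(answer to Q1)")
print(memory.retrieval_query("Who is required to pay it?"))
# What is the digital service tax? Who is required to pay it?
```

Inside `rag_with_memory` you could retrieve with `retriever.invoke(memory.retrieval_query(question))` and pass `memory.as_text()` as the chat history. A more robust variant asks the LLM to rewrite the follow-up as a standalone question; LangChain's `create_history_aware_retriever` is built around that idea.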




## Step 12: Ask Questions

Now we can query the Finance Bill 2025 in natural language.  
The chatbot will remember previous questions in the same session.

In [14]:
# Question 1
answer1 = rag_with_memory("What is the digital service tax introduced in the Finance Bill 2025?")
print("Q1:", answer1)


Q1: The digital service tax is not explicitly defined in the provided context. However, it is mentioned in Section 42B of the Tax Procedures Act (Cap. 469B) that the Commissioner may appoint an agent for the purpose of collection and remittance of digital service tax to the Commissioner. 

However, the rate of tax in respect of digital service tax is mentioned in Section 12 of the Finance Bill 2025, which states that the rate of tax in respect of digital service tax shall be five percent of the gross amount.


In [15]:
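# Question 2, follow-up. The LLM sees Q1 in the chat history, so it can read "it" as the digital service tax.
# Note that the retriever itself only receives the follow-up question "Who is required to pay it?"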
answer2 = rag_with_memory("Who is required to pay it?")
print("Q2:", answer2)

Q2: According to Section 2 of the provided context, an agent is required to pay the amount specified in the notice to the Commissioner. An agent is defined as a person who owes or may subsequently owe money to the taxpayer, or who holds or may subsequently hold money for or on account of the taxpayer.


In [16]:
# Question 3
answer3 = rag_with_memory("What new environmental taxes are introduced in the bill?")
print("Q3:", answer3)

Q3: I could not find that information in the Finance Bill 2025.


In [17]:
# Question 4
answer4 = rag_with_memory("List at least 3 new taxes introduced in the Finance Bill 2025.")
print("Q4:", answer4)

Q4: Based on the provided context, I could not find a comprehensive list of new taxes introduced in the Finance Bill 2025. However, I can identify a few new taxes or amendments related to taxes:

1. Digital service tax: The rate of tax in respect of digital service tax is mentioned in Section 12 of the Finance Bill 2025, which states that the rate of tax in respect of digital service tax shall be five percent of the gross amount.

2. Electronic tax invoices: The Finance Bill 2025 proposes to amend the Tax Procedures Act to require a person who carries on business to issue an electronic tax invoice through a system established by the Commissioner.

3. Amendments to the Income Tax Act: The Finance Bill 2025 proposes various amendments to the Income Tax Act, including changes to the definitions of "gross investment receipts" and the insertion of new subsections related to tax deductions and assessments.

However, I could not find a comprehensive list of new taxes introduced in the Finance

In [21]:
print(rag_with_memory("What is the purpose of the Finance Bill 2025?"))

The purpose of the Finance Bill 2025 is to formulate proposals relating to revenue raising measures including liability to, and collection of taxes. It proposes to amend various laws relating to taxes and duties.
