## Testing implementation of a simple RAG system

This notebook is a scratch pad for experimenting with a simple RAG system and several advanced RAG techniques, trying different models, chunking strategies, and embeddings.

In [3]:
from langchain_community.document_loaders import PyPDFLoader
from pathlib import Path
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from langchain_core.runnables import RunnableLambda
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.schema import BaseRetriever
from typing import List, Any
import os
from dotenv import load_dotenv


## Run 

In [2]:
# Load PDFs
pdf_folder = Path("pdfs/")

all_docs = []
for pdf_file in pdf_folder.glob("*.pdf"):
    loader = PyPDFLoader(str(pdf_file))
    all_docs.extend(loader.load())

Ignoring wrong pointing object 37 0 (offset 0)
Ignoring wrong pointing object 42 0 (offset 0)
Ignoring wrong pointing object 91 0 (offset 0)
Ignoring wrong pointing object 9 0 (offset 0)
Ignoring wrong pointing object 11 0 (offset 0)
Ignoring wrong pointing object 9 0 (offset 0)
Ignoring wrong pointing object 11 0 (offset 0)
Ignoring wrong pointing object 37 0 (offset 0)


In [3]:
# Chunk the data using recursive chunking
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=50
)

chunks = splitter.split_documents(all_docs)
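
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a plain sliding-window chunker. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally: the real splitter prefers to break on separators such as paragraphs, lines, and spaces, while this hypothetical `sliding_window_chunks` helper only illustrates how consecutive chunks share `chunk_overlap` characters.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows of at most chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, which is the role the 50-character overlap plays in the cell above.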

In [4]:
# Load embeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [5]:
# Store the embeddings locally in ChromaDB

vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

vectordb.persist()
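
One thing to watch when re-running this cell: `Chroma.from_documents` generates fresh IDs by default, so running it again against the same `persist_directory` can add the same chunks a second time. A possible workaround, sketched below under assumptions, is to derive a deterministic ID from each chunk's source, page, and text. The `Doc` class is a hypothetical stand-in for a LangChain `Document`, and `stable_chunk_id` and `unique_by_id` are illustrative helper names, not library functions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def stable_chunk_id(doc) -> str:
    """Hash source, page, and text so the same chunk always maps to the same ID."""
    key = "|".join([
        str(doc.metadata.get("source", "")),
        str(doc.metadata.get("page", "")),
        doc.page_content,
    ])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def unique_by_id(docs):
    """Return (ids, docs) with repeated chunks dropped, keeping first occurrences."""
    seen = {}
    for doc in docs:
        seen.setdefault(stable_chunk_id(doc), doc)
    return list(seen.keys()), list(seen.values())


demo = [
    Doc("Feathers insulate.", {"source": "a.pdf", "page": 9}),
    Doc("Feathers insulate.", {"source": "a.pdf", "page": 9}),  # exact repeat
    Doc("Feathers aid flight.", {"source": "a.pdf", "page": 9}),
]
ids, docs = unique_by_id(demo)
print(len(ids))  # 2
```

The resulting lists could then be passed as `documents=docs, ids=ids` to `Chroma.from_documents`. How an existing ID is handled on a second run (overwritten or rejected) may depend on the installed Chroma version, so that part is worth checking locally.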



In [9]:
# Load model and tokenizer
model_name = "google/flan-t5-small"  # Or try flan-t5-base if M2 can handle it

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Create text2text pipeline (note: temperature may be ignored)
hf_pipeline = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    device=-1
)

# Wrap in LangChain HuggingFace LLM (new path!)
hf_llm = HuggingFacePipeline(pipeline=hf_pipeline)

Device set to use cpu


In [10]:
retriever = vectordb.as_retriever()

qa_chain = RetrievalQA.from_chain_type(
    llm=hf_llm,
    retriever=retriever,
    chain_type="stuff"
)

In [11]:
response = hf_llm.invoke("What is ornithology?")
print(response)

ornithology ornithology


In [12]:
query = "What is ornithology?"

# ✅ RAG response (using LangChain's new `.invoke()` method)
response_rag = qa_chain.invoke(query)

# ✅ LLM-only response
ragless_prompt = f"Answer the following question:\n{query}"
response_llm_only = hf_llm.invoke(ragless_prompt)

# 📊 Display the results
print("🔹 LLM-Only Response:\n", response_llm_only)
print("\n" + "="*80 + "\n")
print("🔸 RAG-Augmented Response:\n", response_rag)

Token indices sequence length is longer than the specified maximum sequence length for this model (541 > 512). Running this sequence through the model will result in indexing errors


🔹 LLM-Only Response:
 Ornithology


🔸 RAG-Augmented Response:
 {'query': 'What is ornithology?', 'result': 'Ornithology is a branch of science devoted to the study of birds. Ornithology encompasses all types of research involving birds, such as habitat studies and migration patterns. Ornithology encompasses all types of research involving birds, such as habitat studies and migration patterns. Ornithology encompasses all types of research involving birds, such as habitat studies and migration patterns. Ornithology encompasses all types of research involving birds, such as habitat studies and migration patterns. Ornithology encompasses all types of research involving birds, such as habitat studies and migration patterns. Ornithology encompasses all types of research involving birds, such as habitat studies and migration patterns. Ornithology encompasses all types of research involving birds, such as habitat studies and migration patterns. Ornithology encompasses all types of research inv
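
The warning above (541 > 512) suggests that the stuffed prompt was longer than the model's 512-token input window. One way to avoid that is to keep only as many retrieved chunks as fit a token budget. The sketch below is an assumption-laden illustration: `fit_chunks_to_budget` and `reserved_for_prompt` are hypothetical names, and the default token counter is a crude whitespace split rather than the model's tokenizer.

```python
def fit_chunks_to_budget(chunks, question, max_tokens=512, reserved_for_prompt=32, count_tokens=None):
    """Keep the highest-ranked chunks whose combined token count fits the budget.

    chunks must already be ordered from most to least relevant.
    """
    if count_tokens is None:
        # Crude stand-in: one token per whitespace-separated word.
        count_tokens = lambda text: len(text.split())
    budget = max_tokens - reserved_for_prompt - count_tokens(question)
    kept = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost > budget:
            break  # stop at the first chunk that no longer fits
        kept.append(chunk)
        budget -= cost
    return kept


ranked = ["a b c", "d e f g", "h i"]
kept = fit_chunks_to_budget(ranked, "what is x", max_tokens=12, reserved_for_prompt=2)
print(kept)
# ['a b c', 'd e f g']
```

With the tokenizer loaded earlier in this notebook, a more faithful counter would be `count_tokens=lambda t: len(tokenizer.encode(t))`.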

### Test with OpenAI LLM

In [6]:
load_dotenv()  # take environment variables from .env.
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY is not set. Please set it in your .env file.")

In [16]:
# A prompt template with RAG context
prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template='''
    Given the context below, answer the question to the best of your ability.
    If the context doesn't contain the answer, say "I don't know".

    Context: {context}

    Question: {question}

    Answer:
    '''
)
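
As a rough picture of what the `stuff` chain type does with this template: the retrieved chunks are joined into one string, which fills `{context}`, and the user question fills `{question}`. The sketch below reproduces that with plain string formatting. `build_stuff_prompt` is a hypothetical helper, the two chunk texts are made up for illustration, and the `"\n\n"` separator is assumed to match the chain's default.

```python
TEMPLATE = """Given the context below, answer the question to the best of your ability.
If the context doesn't contain the answer, say "I don't know".

Context: {context}

Question: {question}

Answer:"""


def build_stuff_prompt(chunk_texts, question, separator="\n\n"):
    """Join all retrieved chunks into one context block and fill the template."""
    context = separator.join(chunk_texts)
    return TEMPLATE.format(context=context, question=question)


prompt = build_stuff_prompt(
    ["Feathers insulate birds.", "Flight feathers generate lift."],  # made-up chunks
    "What are the main functions of bird feathers?",
)
print(prompt)
```

Because every chunk is pasted into a single prompt, the prompt length grows with `k` and `chunk_size`, which is why small models can overflow their input window.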


In [23]:
# ✅ LLM Setup
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [24]:
# ✅ Retrieval setup (RAG)
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt_template}
)

In [None]:

# ✅ Query to compare
query = "What are the main functions of bird feathers?"

# 🔸 1. LLM-Only (No RAG)
llm_only_prompt = f"Answer the following question clearly and concisely:\n\n{query}"
llm_only_response = llm.invoke(llm_only_prompt)

# 🔹 2. RAG-Based (uses retrieved context)
rag_response = rag_chain.invoke({"query": query})

# 🖥️ Print Results
print("🔸 LLM-Only Response:\n", llm_only_response.content)
print("\n" + "="*80 + "\n")
print("🔹 RAG-Augmented Response:\n", rag_response['result'])
print("\nSources:")
for doc in rag_response["source_documents"]:
    print(f"- {doc.metadata.get('source', 'Unknown source')}, page {doc.metadata.get('page', 'N/A')}")




🔸 LLM-Only Response:
 The main functions of bird feathers are insulation, flight, protection, and display.


🔹 RAG-Augmented Response:
 The main functions of bird feathers include insulation, flight, waterproofing, communication, and camouflage.

Sources:
- pdfs/Chapter_2_TopographyFeathersandFlight.pdf, page 9
- pdfs/Chapter_2_TopographyFeathersandFlight.pdf, page 9
- pdfs/Chapter_2_TopographyFeathersandFlight.pdf, page 9
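
The same source line appears three times, most likely because several retrieved chunks come from the same page (or, possibly, because the store holds repeated chunks). A small helper can collapse the list before printing. This is a sketch: `summarize_sources` is a hypothetical name, and it takes the `metadata` dicts of the returned documents as input.

```python
def summarize_sources(metadatas):
    """Collapse repeated (source, page) pairs and report how many chunks each contributed."""
    counts = {}
    for meta in metadatas:
        key = (meta.get("source", "Unknown source"), meta.get("page", "N/A"))
        counts[key] = counts.get(key, 0) + 1  # dicts keep first-seen order
    return [
        f"- {source}, page {page} ({n} chunk{'s' if n != 1 else ''})"
        for (source, page), n in counts.items()
    ]


retrieved = [{"source": "pdfs/Chapter_2_TopographyFeathersandFlight.pdf", "page": 9}] * 3
lines = summarize_sources(retrieved)
for line in lines:
    print(line)
# - pdfs/Chapter_2_TopographyFeathersandFlight.pdf, page 9 (3 chunks)
```

In the cells here it could be called as `summarize_sources([doc.metadata for doc in rag_response["source_documents"]])`.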


In [18]:
# ✅ Query to compare
query = "Why are songbirds so diverse? Be brief."

# 🔸 1. LLM-Only (No RAG)
llm_only_prompt = f"Answer the following question clearly and concisely:\n\n{query}"
llm_only_response = llm.invoke(llm_only_prompt)

# 🔹 2. RAG-Based (uses retrieved context)
rag_response = rag_chain.invoke({"query": query})

# 🖥️ Print Results
print("🔸 LLM-Only Response:\n", llm_only_response.content)
print("\n" + "="*80 + "\n")
print("🔹 RAG-Augmented Response:\n", rag_response['result'])
print("\nSources:")
for doc in rag_response["source_documents"]:
    print(f"- {doc.metadata.get('source', 'Unknown source')}, page {doc.metadata.get('page', 'N/A')}")




🔸 LLM-Only Response:
 Songbirds are diverse due to a combination of factors such as their ability to adapt to various habitats, their diverse diets, and their unique vocalizations for communication and mating purposes.


🔹 RAG-Augmented Response:
 Songbirds are diverse due to their ability to learn songs and other mate-attracting or territory-protecting behaviors, which are potential key innovations that drive diversity in this group of birds.

Sources:
- pdfs/Chapter_1_FundamentalsofOrnithology.pdf, page 13
- pdfs/Chapter_1_FundamentalsofOrnithology.pdf, page 13
- pdfs/Chapter_1_FundamentalsofOrnithology.pdf, page 13


In [20]:
# ✅ Query to compare
query = "How are bird bones similar and different to that of reptiles? Answer in a tabular form"

# 🔸 1. LLM-Only (No RAG)
llm_only_prompt = f"Answer the following question clearly and concisely:\n\n{query}"
llm_only_response = llm.invoke(llm_only_prompt)

# 🔹 2. RAG-Based (uses retrieved context)
rag_response = rag_chain.invoke({"query": query})

# 🖥️ Print Results
print("🔸 LLM-Only Response:\n", llm_only_response.content)
print("\n" + "="*80 + "\n")
print("🔹 RAG-Augmented Response:\n", rag_response['result'])
for doc in rag_response["source_documents"]:
    print(f"- {doc.metadata.get('source', 'Unknown source')}, page {doc.metadata.get('page', 'N/A')}")




🔸 LLM-Only Response:
 | Aspect          | Bird Bones                  | Reptile Bones               |
|-----------------|-----------------------------|-----------------------------|
| Structure       | Hollow and lightweight      | Solid and heavier           |
| Composition     | Mostly made of calcium      | Mostly made of calcium      |
| Mobility        | More flexible and mobile    | Less flexible and mobile    |
| Functionality  | Adapted for flight          | Adapted for crawling and swimming |
| Growth          | Continuously grow throughout life | Stop growing once maturity is reached |


🔹 RAG-Augmented Response:
 | Aspect          | Birds                  | Reptiles               |
|-----------------|------------------------|------------------------|
| Skull           | Single ball and socket | Single ball and socket |
| Ear             | One ear bone (stapes)  | One ear bone (stapes)  |
| Lower jaw bones | Five or six            | Five or six            |
| Ankle           

### Advanced RAG Fusion techniques

In [31]:
# 1. Create a query transformation prompt
query_transform_prompt = PromptTemplate(
    input_variables=["original_query"],
    template="""You are an expert assistant that transforms user questions into clear, unambiguous search queries to help retrieve the most relevant information from a knowledge base.

    Your job is to:
    - Clarify vague or incomplete questions.
    - Remove irrelevant words or noise.
    - Preserve the user’s intent and specificity.
    - Output a standalone, well-formed query optimized for semantic retrieval.

    Follow these guidelines:
    - Be concise and specific.
    - Remove filler phrases (e.g., "can you tell me", "I want to know about...").
    - Expand acronyms or shorthand when appropriate.
    - Include relevant context if mentioned.
    - Do not answer the question; only rewrite it.

    Original query: {original_query}
    Transformed query:"""
)

# 2. Create a query transformation chain
query_transformer = LLMChain(
    llm=llm,
    prompt=query_transform_prompt
)

# 3. Create the RAG chain (the retriever is unchanged; the query is rewritten in step 4)
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
    chain_type_kwargs={
        "prompt": prompt_template,
        "document_prompt": PromptTemplate(
            input_variables=["page_content"],
            template="{page_content}"
        )
    }
)

# 4. Create a wrapper function that handles the query transformation
def get_rag_response(query):
    # Transform the query
    transformed_query = query_transformer.run(original_query=query).strip('"\' \n')
    print(f"Original query: {query}")
    print(f"Transformed query: {transformed_query}")
    
    # RetrievalQA reads only the "query" key, so the transformed query is used
    # for both retrieval and final answer generation.
    result = rag_chain.invoke({"query": transformed_query})
    return result

# Example usage:
# response = get_rag_response("What do birds use to fly?")
# print(response["result"])
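
`RetrievalQA` takes a single `query` input, so using one string for retrieval and a different one for answer generation means calling the two steps separately. The sketch below shows that control flow with stub callables. `answer_with_rewritten_retrieval` and the three `fake_*` functions are hypothetical placeholders, not LangChain APIs.

```python
def answer_with_rewritten_retrieval(question, rewrite, retrieve, generate):
    """Retrieve with a rewritten query, but generate the answer from the original question."""
    search_query = rewrite(question).strip('"\' \n')  # same cleanup as in get_rag_response
    docs = retrieve(search_query)                     # retrieval sees the rewritten query
    answer = generate(question, docs)                 # generation sees the user's wording
    return {
        "query": question,
        "search_query": search_query,
        "result": answer,
        "source_documents": docs,
    }


# Stubs standing in for the LLM rewriter, the vector store, and the answering LLM.
def fake_rewrite(q):
    return '"How do birds fly?"\n'


def fake_retrieve(q):
    return [f"chunk matching: {q}"]


def fake_generate(q, docs):
    return f"answer to '{q}' using {len(docs)} chunk(s)"


out = answer_with_rewritten_retrieval(
    "What do birds use to fly?", fake_rewrite, fake_retrieve, fake_generate
)
print(out["search_query"])  # How do birds fly?
print(out["result"])        # answer to 'What do birds use to fly?' using 1 chunk(s)
```

With the objects in this notebook, `rewrite` could wrap `query_transformer`, `retrieve` could wrap the retriever's `invoke` method, and `generate` could wrap the chain's `combine_documents_chain`. Those mappings are assumptions and would need checking against the installed LangChain version.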

In [32]:
response = get_rag_response("What do birds use to fly?")
print(response["result"])

Original query: What do birds use to fly?
Transformed query: How do birds fly?




Birds fly by flapping their wings and guiding with their tails.


In [34]:
response = get_rag_response("How are bird bones similar and different to that of reptiles?")
print(response["result"])

Original query: How are bird bones similar and different to that of reptiles?
Transformed query: Differences and similarities between bird and reptile bones.




Similarities between bird and reptile bones include having a single ball and socket system connecting their skulls to the first vertebra, having a primary middle ear with only one ear bone (the stapes), and having five or six bones on each side of their lower jaws.

Differences between bird and reptile bones include birds having ankles located in the tarsal bones rather than the long lower leg bones (tibia and tarsi) like mammals have.


In [35]:
query = "Why are songbirds also called passerines so diverse like there are literally so many of them, and they are so colourful and have such interesting vocal abilities. How did they come to be so diverse?"
response = get_rag_response(query)
print(response["result"])


Original query: Why are songbirds also called passerines so diverse like there are literally so many of them, and they are so colourful and have such interesting vocal abilities. How did they come to be so diverse?
Transformed query: What factors contribute to the diversity of passerine songbirds and how did this diversity evolve?




The factors that contribute to the diversity of passerine songbirds include their small size, sperm structure, vocal abilities, perching foot, and high metabolism. The evolution of this diversity is attributed to the rich 65-million-year history of evolution, expansions, and reductions of new taxa.
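
The section heading mentions RAG Fusion, while the cells in this section rewrite a single query. RAG Fusion is usually described as generating several query variants, retrieving for each one, and merging the ranked results with reciprocal rank fusion (RRF). A minimal sketch of the merging step follows. The chunk IDs are made up, and `k=60` is a commonly used constant rather than a value taken from this notebook.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of chunk IDs; each hit scores 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# One ranked list per query variant (hypothetical chunk IDs).
rankings = [
    ["c1", "c2", "c3"],
    ["c2", "c1", "c4"],
    ["c2", "c5", "c1"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused[:3])
# ['c2', 'c1', 'c5']
```

Chunks that rank well across several query variants rise to the top, even if no single variant ranked them first.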
