# Overview: Adaptive RAG

This project explores the design and implementation of an adaptive Retrieval-Augmented Generation (RAG) system enhanced with self-reflection capabilities. The goal is to reduce hallucinations in large language model (LLM) outputs by combining contextual retrieval with iterative reasoning.

The system is built using LangChain, OpenAI language models, and Tavily Search API for web retrieval, with FAISS as the underlying vector store. It simulates an AI agent that can:

- Retrieve relevant information from both preloaded documents and live web search

- Generate answers using LLMs while referencing retrieved context

- Evaluate its own response for relevance and confidence

- Adaptively re-query or revise its answer if reflection indicates hallucination

This prototype serves as a foundational experiment in building hallucination-aware agents, demonstrating how structured reasoning and adaptive behavior can be layered on top of RAG to make AI outputs more trustworthy and factually grounded.

In [79]:
# API keys
import os
from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get('OPEN_AI')
os.environ['TAVILY_API_KEY'] = userdata.get('TAVILY_API_KEY')

In [None]:
## Intalling libraries
!pip install -q langchain_community langchain_openai faiss-cpu langgraph

### Preparing the Vector Database

In this step, we load documents from selected Medium articles and convert them into vector embeddings for efficient retrieval. The process involves:

1. **Loading documents** from the given URLs using `WebBaseLoader`.
2. **Generating embeddings** for the text using OpenAI's embedding model.
3. **Storing embeddings** in a FAISS vector store to enable fast similarity search during retrieval.

This forms the knowledge base that our Adaptive RAG agent will later use to ground its responses in reliable, retrievable context.


In [None]:
## Working on Retriever and Vector Store
## Import Libraries
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

In [None]:
# Set Embeddings
embeddings = OpenAIEmbeddings()

In [80]:
# Documents to Index
urls = [
    "https://medium.com/codex/what-are-ai-agents-your-step-by-step-guide-to-build-your-own-df54193e2de3",  ## AI Agents
    "https://medium.com/@yaduvanshineelam09/introduction-to-fastapi-123c0b2778a5",  ## Fast API
    "https://medium.com/@tuhinsharma121/understanding-adaptive-rag-smarter-faster-and-more-efficient-retrieval-augmented-generation-38490b6acf88", ## Adaptive RAG
    'https://medium.com/@kelseyywang/a-comprehensive-guide-to-llm-temperature-%EF%B8%8F-363a40bbc91f', ## LLM Temperature,
    'https://medium.com/data-science-at-microsoft/how-large-language-models-work-91c362f5b78f' ## LLM fundamentals

]

In [81]:
# Load
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

In [82]:
docs_list[0].page_content

'What Are AI Agents? A Short Intro And A Step-by-Step Guide to Build Your Own. | by Maximilian Vogel | CodeX | MediumSitemapSign upSign inMedium LogoWriteSign upSign inCodeX·Everything connected with Tech & Code. Follow to join our 1M+ monthly readersWhat Are AI Agents? A Short Intro And A Step-by-Step Guide to Build Your Own.Maximilian VogelFollow8 min read·Dec 28, 2024--70ListenShareThe next big thing? Gartner believes AI agents are the future. OpenAI, Nvidia and Microsoft are betting on it — as are companies such as Salesforce, which have so far been rather inconspicuous in the field of AI.And there’s no doubt that the thing is really taking off right now.„AI Agents“ on Google Trends (trends.google.com)Wow.So, what is really behind the trend? The key to understanding agents is agency.Unlike traditional generative AI systems, agents don’t just respond to user input. Instead, they can process a complex problem such as an insurance claim from start to finish. This includes understandin

In [83]:
# Split Text
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size = 500, chunk_overlap = 50)

docs_split = text_splitter.split_documents(docs_list)

In [84]:
# Add to vectorstore
vectorstore = FAISS.from_documents(
    documents=docs_split,
    embedding=embeddings
)

In [85]:
## Create Retriever
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 2})

### Query Analysis & Routing

To improve the reliability and efficiency of our system, we introduce a lightweight **query classification module** that decides how each user query should be handled.

- If the query is **factual or requires recent information**, it gets routed to a **web search retriever** using the Tavily API.
- If the query is more **general, conceptual, or covered by our indexed documents**, it is routed to the **local FAISS vector store**.

This step acts like a **router** that analyzes the query’s intent and dynamically selects the most appropriate source of knowledge.


In [87]:
## Router

from typing import Literal
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field   # Validation

# Data Model - Pydantic class for data validation
class RouteQuery(BaseModel):
  """
  Route the user query to the most relevant datasource
  """
  datasource: Literal['vectorstore', 'web_search'] = Field(
      ...,
      description = "Given a user question, decide whether to route it to web search or vectorstore"
  )

In [88]:
## LLM with function call
llm = ChatOpenAI(
    model = 'gpt-4o-mini', temperature = 0
)
structured_llm_router = llm.with_structured_output(RouteQuery)    ## LLM will give the output whether websearhc or vectorstore

In [89]:
## Prompt
system = """
You are an expert at routing a user question to a vectorstore or web search.
The vectorstore contains documents related to AI agents, Adaptive RAG, LLM fundamentals, LLM Temperature and FastAPI.
Use the vectorstore for questions on these topics. Else, use web-search
"""

route_prompt = ChatPromptTemplate.from_messages(
    [('system', system),
     ('human', "{question}")]
)

question_router = route_prompt | structured_llm_router

In [93]:
## Test question
print(
    question_router.invoke(
        {"question": "how temperature influences randomness of sampling in llms"}
    )
)

datasource='vectorstore'


### Retrieval Grader

This module evaluates the relevance of retrieved documents to the user query and filters out low-quality or off-topic results before answer generation.


In [15]:
# Data Model
class GradeDocuments(BaseModel):
  """Binary score for relevance check on retrieved documents."""

  binary_score: str = Field(
      description = "Documents are relevant to the question, 'yes' or 'no'"
  )

In [68]:
## LLM With Function Call
llm = ChatOpenAI(model = 'gpt-4o-mini', temperature = 0)
structured_llm_grader = llm.with_structured_output(GradeDocuments)

In [69]:
# Prompt
system = """
You are a grader assessing relevance of a retrieved document to a user question.
If the document contains keywords or semantic meaning related to the user question, grade it as relevant.
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
Give a binary score 'yes' or 'no' score to indicate whether the document is relevant to the question
"""

grade_prompt = ChatPromptTemplate.from_messages(
    [('system', system),
     ('human', "Retrieved document: \n\n{document} \n\n User question: {question}")]
)

retrieval_grader = grade_prompt | structured_llm_grader

In [70]:
question = "Adaptive RAG has query classification and self correction mechanism"
docs = retriever.invoke(question)
doc_txt = docs[0].page_content
print(retrieval_grader.invoke({'question': question, 'document': doc_txt}))

binary_score='yes'


### Generate Node

This node uses the language model to generate an answer based on the top-ranked retrieved documents, ensuring the response is grounded in relevant context.


In [19]:
## Generate
from langchain import hub ## We will pull predefined templates from here
from langchain_core.output_parsers import StrOutputParser

#Prompt
prompt = hub.pull('rlm/rag-prompt')

# LLM
llm = ChatOpenAI(model_name = 'gpt-4o-mini', temperature = 0)

# Post-processing - combining all retrieved text to be sent to RAG
def format_docs(docs):
  return "\n\n".join(doc.page_content for doc in docs)


# Chain
rag_chain = prompt | llm | StrOutputParser()



In [71]:
# Run
generation = rag_chain.invoke({'context': format_docs(docs), 'question': question})
print(generation)

Yes, Adaptive-RAG includes a query classification mechanism that determines the complexity of a query and selects an appropriate retrieval strategy. It also features a self-correction mechanism that allows it to refine its responses based on the retrieved information. This combination enhances the system's efficiency and accuracy in answering queries.


In [72]:
# Let's see the prompt we used for RAG
print(prompt.messages[0].prompt.template)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:


### Hallucination Check Node

This node reviews the generated answer to identify potential hallucinations by checking factual consistency and alignment with the retrieved context.


In [22]:
# Data Model
class GradeHallucinations(BaseModel):
  """
  Binary score for hallucination present in generation answer.
  """

  binary_score: str = Field(
      description="Answer is grounded in facts, 'yes' or 'no'"
  )


In [73]:
# LLM with function call
llm = ChatOpenAI(model = 'gpt-4o-mini', temperature = 0)
structured_llm_grader = llm.with_structured_output(GradeHallucinations)

In [74]:
# Prompt
system = """
You are a grader assessing whether an LLM generation is grounded in or supported by a set of retrieved facts.
Give a binary score 'yes' or 'no'. 'Yes' means the answer is grounded in or supported by set of facts
"""

hallucination_prompt = ChatPromptTemplate.from_messages(
    [
        ('system', system),
        ('human', "Set of facts: {documents} LLM generation: {generation}")
    ]
)

hallucination_grader = hallucination_prompt | structured_llm_grader

In [75]:
hallucination_grader.invoke({'documents': format_docs(docs), 'generation': generation})


GradeHallucinations(binary_score='yes')

### Answer Grader

This component assesses the final answer’s quality, fluency, and factual accuracy, helping determine if the response is ready to return or needs refinement.


In [26]:
# Data Model
class GradeAnswer(BaseModel):
  """
  Binary score to assess answer addresses qeustion.
  """

  binary_score: str = Field(
      description = "Answer addresses the question, 'yes' or 'no'"
  )

In [27]:
# LLM with function call
llm = ChatOpenAI(model = 'gpt-4o-mini', temperature = 0)
structured_llm_grader = llm.with_structured_output(GradeAnswer)

In [76]:
# Prompt
system = """
You are a grader assessing whether an answer addresses and/or resolves a question.
Give a binary score 'yes' or 'no'. 'Yes' means that the answer resolves the question.
"""

answer_prompt = ChatPromptTemplate.from_messages(
    [
        ('system',system),
        ('human', 'User Question: {question}, LLM Answer: {generation}')
    ]
)

answer_grader = answer_prompt | structured_llm_grader

In [77]:
answer_grader.invoke({'question': question, 'generation': generation})

GradeHallucinations(binary_score='yes')

### Question Re-Writer

If the initial response is found lacking or hallucinated, this node rephrases the original question to improve retrieval and regenerate a more accurate answer.


In [52]:
from langchain_core.output_parsers import StrOutputParser

In [55]:
parser =  StrOutputParser()

In [30]:
# LLM
llm = ChatOpenAI(model = 'gpt-4o-mini', temperature = 0)

In [57]:
# Prompt
system = """
You are a question re-writer that converts an input question to a better version that is optimized for vectorstore retrieval.
Look at the input and try to reason about the underlying semantic intent and/or meaning.
"""

re_write_prompt = ChatPromptTemplate.from_messages(
    [
        ('system', system),
        ('human', "Here is the initial question: {question}. Formulate an improved question")
    ]
)

question_rewriter = re_write_prompt | llm | parser

In [58]:
question_rewriter.invoke({'question': question})

'What are the query classification and self-correction mechanisms in Adaptive RAG?'

## Integrating the WEB SEARCH tool

In [34]:
from langchain_community.tools.tavily_search import TavilySearchResults

web_search_tool = TavilySearchResults(k=3)

  web_search_tool = TavilySearchResults(k=3)


### Graph State

This section defines the structure and flow of the LangGraph agent by connecting all the nodes—retrieval, generation, grading, and re-writing—into a coherent execution graph.


In [35]:
from typing import List
from typing_extensions import TypedDict

class GraphState(TypedDict):
  """
  Represents the state of our graph.

  Attributes:
    question: quesion
    generation: LLM generation
    documents: List of documents
  """

  question: str
  generation: str
  documents: List[str]

### Combining All Functionalities

In this step, we bring together all the components—query router, retrievers, generator, hallucination checker, graders, and question re-writer—into a unified LangGraph workflow. This creates an adaptive, agentic RAG system capable of self-evaluating and improving its responses in real time.


In [36]:
from langchain.schema import Document


def retrieve(state):
    """
    Retrieve documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE---")
    question = state["question"]

    # Retrieval
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}


def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]

    # RAG generation
    generation = rag_chain.invoke({"context": documents, "question": question})
    return {"documents": documents, "question": question, "generation": generation}


def grade_documents(state):
    """
    Determines whether the retrieved documents are relevant to the question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with only filtered relevant documents
    """

    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]

    # Score each doc
    filtered_docs = []
    for d in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": d.page_content}
        )
        grade = score.binary_score
        if grade == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            continue
    return {"documents": filtered_docs, "question": question}


def transform_query(state):
    """
    Transform the query to produce a better question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates question key with a re-phrased question
    """

    print("---TRANSFORM QUERY---")
    question = state["question"]
    documents = state["documents"]

    # Re-write question
    better_question = question_rewriter.invoke({"question": question})
    return {"documents": documents, "question": better_question}


def web_search(state):
    """
    Web search based on the re-phrased question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with appended web results
    """

    print("---WEB SEARCH---")
    question = state["question"]

    # Web search
    docs = web_search_tool.invoke({"query": question})
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)

    return {"documents": web_results, "question": question}


### Edges ###


def route_question(state):
    """
    Route question to web search or RAG.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("---ROUTE QUESTION---")
    question = state["question"]
    source = question_router.invoke({"question": question})
    if source.datasource == "web_search":
        print("---ROUTE QUESTION TO WEB SEARCH---")
        return "web_search"
    elif source.datasource == "vectorstore":
        print("---ROUTE QUESTION TO RAG---")
        return "vectorstore"


def decide_to_generate(state):
    """
    Determines whether to generate an answer, or re-generate a question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Binary decision for next node to call
    """

    print("---ASSESS GRADED DOCUMENTS---")
    state["question"]
    filtered_documents = state["documents"]

    if not filtered_documents:
        # All documents have been filtered check_relevance
        # We will re-generate a new query
        print(
            "---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---"
        )
        return "transform_query"
    else:
        # We have relevant documents, so generate answer
        print("---DECISION: GENERATE---")
        return "generate"


def grade_generation_v_documents_and_question(state):
    """
    Determines whether the generation is grounded in the document and answers question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Decision for next node to call
    """

    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    score = hallucination_grader.invoke(
        {"documents": documents, "generation": generation}
    )
    grade = score.binary_score

    # Check hallucination
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader.invoke({"question": question, "generation": generation})
        grade = score.binary_score
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"

### Create State Graph

This step builds the full LangGraph state machine by defining nodes and edges based on our adaptive RAG logic. The graph governs how queries flow through retrieval, generation, self-reflection, and correction steps.


In [37]:
from langgraph.graph import END, START, StateGraph

In [38]:
workflow = StateGraph(GraphState)

In [39]:
# Define the nodes
workflow.add_node("web_search", web_search)  # web search
workflow.add_node("retrieve", retrieve)  # retrieve
workflow.add_node("grade_documents", grade_documents)  # grade documents
workflow.add_node("generate", generate)  # generate
workflow.add_node("transform_query", transform_query)  # transform_query

<langgraph.graph.state.StateGraph at 0x7a097b536710>

In [40]:
# Build Graph
workflow.add_conditional_edges(
    START,
    route_question,
    {
        'web_search':'web_search',
        "vectorstore":"retrieve"
    }
)

workflow.add_edge('web_search', 'generate')
workflow.add_edge('retrieve', 'grade_documents')
workflow.add_conditional_edges(
    'grade_documents',
    decide_to_generate,
    {
        'transform_query':'transform_query',
        'generate':'generate'
    }
)

workflow.add_edge('transform_query','retrieve')
workflow.add_conditional_edges(
    'generate',
    grade_generation_v_documents_and_question,
    {
        'not supported': 'generate',
        'useful': END,
        'not useful': 'transform_query'
    }
)



<langgraph.graph.state.StateGraph at 0x7a097b536710>

In [41]:
# Compile
app = workflow.compile()

In [95]:
result = app.invoke({'question': "Does Adaptive RAG reduce hallucinations?"})

---ROUTE QUESTION---
---ROUTE QUESTION TO RAG---
---RETRIEVE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---
---TRANSFORM QUERY---
---RETRIEVE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---
---TRANSFORM QUERY---
---RETRIEVE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---
---TRANSFORM QUERY---
---RETRIEVE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
---G

In [98]:
result['generation']

'Adaptive RAG minimizes hallucinations by intelligently determining the retrieval strategy based on the complexity of a query. It uses a classifier to assess whether a query is simple, moderate, or complex, allowing it to choose the appropriate level of document retrieval. This targeted approach ensures that the model relies on external information when necessary, reducing the likelihood of generating inaccurate responses.'

## Benchmarking with a Traditional RAG

In [94]:
rag_vs_adaptive_rag_questions = [
    # RAG-based questions
    "What are the three core components of an AI agent according to the Medium article by codex?",
    "How does FastAPI improve the performance of API servers compared to Flask?",
    "What role does the planner node play in an adaptive RAG system?",
    "Explain the concept of temperature in LLMs and its effect on generation diversity.",
    "What makes Adaptive RAG more efficient than Traditional RAG according to Tuhin Sharma’s article?",
    "What is the purpose of tools like retriever and re-ranker in an adaptive RAG setup?",
    "How do attention mechanisms in LLMs contribute to contextual understanding?",
    "In the context of AI agents, what does it mean for an agent to be \"reactive\"?",
    "What does Kelsey Wang say about the trade-off between temperature and coherence in generation?",
    "According to the Microsoft article, how do transformers scale with data and model size?",

    # Web-search-based questions
    "What’s the latest open-source implementation of Adaptive RAG released in 2024?",
    "Who are the major contributors to LangGraph's development in 2025?",
    "What benchmark datasets are used to evaluate hallucination rates in RAG pipelines?",
    "What is the most recent evaluation metric added to the Hugging Face leaderboard for factual grounding?",
    "What are the latest best practices for setting LLM temperature when used in agentic systems?"
]


In [99]:
results_adaptive_rag = []
for query in rag_vs_adaptive_rag_questions:
  result = app.invoke({'question': query})
  results_adaptive_rag.append(result['generation'])

---ROUTE QUESTION---
---ROUTE QUESTION TO WEB SEARCH---
---WEB SEARCH---
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
---TRANSFORM QUERY---
---RETRIEVE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---
---TRANSFORM QUERY---
---RETRIEVE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---
---TRANSFORM QUERY---
---RETRIEVE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTI

GraphRecursionError: Recursion limit of 25 reached without hitting a stop condition. You can increase the limit by setting the `recursion_limit` config key.
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/GRAPH_RECURSION_LIMIT