# Agentic RAG: A Minimal Implementation

This notebook demonstrates an **Agentic RAG** system. Unlike traditional RAG (which always retrieves, then generates), an Agentic RAG system can:
1.  **Decide** whether to retrieve information or not.
2.  **Refine** its search queries based on initial findings.
3.  **Synthesize** information from multiple steps.
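The three capabilities above can be sketched as a small control loop. This is a dependency-free illustration, not how LangGraph implements it: `needs_retrieval` and `refine_query` are hypothetical callables standing in for decisions an LLM would make, and `search` is a toy keyword matcher in place of a vector store.

```python
import re
from typing import Callable

# Hypothetical toy corpus; in this notebook the Chroma store plays this role.
CORPUS = [
    "Nostra is a prediction market platform running on Arbitrum.",
    "The native token of Nostra is NST, used for governance and fee rebates.",
]
STOPWORDS = {"what", "which", "is", "the", "of", "a", "an", "does", "do", "for", "on", "it", "its", "how"}


def terms(text: str) -> set[str]:
    """Lowercased words minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def search(query: str) -> list[str]:
    """Toy keyword retriever: documents sharing at least one term with the query."""
    wanted = terms(query)
    return [doc for doc in CORPUS if wanted & terms(doc)]


def agentic_answer(
    question: str,
    needs_retrieval: Callable[[str], bool],  # stands in for the LLM's "should I search?" decision
    refine_query: Callable[[str], str],      # stands in for the LLM rewriting a failed query
    max_searches: int = 2,
) -> str:
    # 1. Decide: answer directly when retrieval is not needed.
    if not needs_retrieval(question):
        return "(answered without retrieval)"
    query = question
    evidence: list[str] = []
    for _ in range(max_searches):
        evidence = search(query)
        if evidence:
            break
        # 2. Refine: rewrite the query and try again when nothing matched.
        query = refine_query(query)
    # 3. Synthesize: plain concatenation here; an LLM would write the final answer.
    return " ".join(evidence) if evidence else "(no grounded answer found)"


print(agentic_answer("What is NST used for?", lambda q: True, lambda q: q))
print(agentic_answer("Which chain hosts it?", lambda q: True, lambda q: "prediction market chain"))
```

The second call finds nothing on its first search, so the loop falls through to the refined query, which is the "refine" step in miniature.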

### Reflection vs Reflexion vs Agentic RAG
- **Reflection agent**: a general pattern where an agent critiques its own draft answer (often with a second pass of the same model) before responding. Use this when accuracy matters more than latency and you want self-review without extra tools. Great for short-form Q&A, summaries, or emails.
- **Reflexion agent** (per the Reflexion paper): extends reflection by storing critiques in long-term memory so future attempts improve. Use it for tasks that need multiple tries with learning across attempts (coding contests, math puzzles, iterative planning).
- **Agentic RAG**: blends tool use + retrieval. The agent plans which tool calls to make (e.g., vector search, web search) before composing the answer. Use it when fresh or grounded knowledge is required or when the corpus is too large to preload into context.
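The practical difference between Reflection and Reflexion is where the critique goes: Reflection uses it immediately to revise one draft, while Reflexion stores it and feeds it into later attempts. In the simplified sketch below, `draft`, `critique`, and `evaluate` are hypothetical callables standing in for LLM calls and an external success check (such as unit tests); the setup in the Reflexion paper has more moving parts.

```python
from typing import Callable

# Hypothetical signatures standing in for LLM calls.
Draft = Callable[[str, list[str]], str]   # (task, notes) -> answer
Critique = Callable[[str, str], str]      # (task, answer) -> critique, "" if no issues


def reflection(task: str, draft: Draft, critique: Critique) -> str:
    """Single attempt: draft, self-critique, and revise once if the critique found issues."""
    answer = draft(task, [])
    feedback = critique(task, answer)
    return draft(task, [feedback]) if feedback else answer


def reflexion(
    task: str,
    draft: Draft,
    critique: Critique,
    evaluate: Callable[[str], bool],  # external success signal, e.g. passing tests
    max_trials: int = 3,
) -> tuple[str, list[str]]:
    """Repeated trials: each failed trial's critique is stored and shown to later trials."""
    memory: list[str] = []
    answer = ""
    for _ in range(max_trials):
        answer = draft(task, memory)
        if evaluate(answer):
            break
        memory.append(critique(task, answer))
    return answer, memory


# Toy stand-ins: the "model" improves by one step for every note it has seen.
def toy_draft(task: str, notes: list[str]) -> str:
    return str(len(notes))


def toy_critique(task: str, answer: str) -> str:
    return f"attempt {answer} was wrong"


print(reflection("count", toy_draft, toy_critique))
print(reflexion("count", toy_draft, toy_critique, evaluate=lambda a: a == "2"))
```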

**Which one to choose?** Agentic RAG is more flexible but also heavier: it requires tool wiring, adds retrieval latency, and consumes more tokens. Prefer it when factual grounding matters. Stick to Reflection when you just need a quick double-check, and reach for Reflexion when the agent must improve across repeated attempts. For many apps, combining techniques (e.g., Reflection + Agentic RAG) gives the best trade-off.

We will use `LangChain` and `LangGraph` for this demonstration.


In [98]:
# Install necessary packages
%pip install -q -U langchain langchain-openai langchain-community chromadb langgraph python-dotenv




In [99]:
import os

from dotenv import load_dotenv

# Load variables from a .env file, if one exists, into os.environ
load_dotenv()


def ensure_env_var(key: str) -> str:
    """Return the value of a required environment variable or fail fast."""
    value = os.environ.get(key)
    if value:
        return value
    raise EnvironmentError(f"Missing required environment variable: {key}")


# OpenAIEmbeddings and ChatOpenAI read OPENAI_API_KEY from the environment
ensure_env_var("OPENAI_API_KEY")


## 1. Setup Vector Store (The "Knowledge Base")
We'll create a simple in-memory vector store with some dummy data about a fictional company "Nostra".
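Conceptually, a vector store embeds each document once, embeds the query at search time, and returns the documents whose vectors are most similar to it. The sketch below does this in pure Python with a toy bag-of-words "embedding" in place of `OpenAIEmbeddings` and cosine similarity as the score; Chroma's actual embeddings, index, and distance settings differ. The default `k=4` mirrors the number of results LangChain's Chroma wrapper returns by default, which with this four-document corpus means every search returns every document, only in a different order.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter[str]:
    """Toy 'embedding': word counts. A real store would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    def __init__(self, docs: list[str]):
        self.docs = docs
        self.vectors = [embed(d) for d in docs]  # embed each document once, at index time

    def search(self, query: str, k: int = 4) -> list[str]:
        """Embed the query and return the k most similar documents, best first."""
        q = embed(query)
        order = sorted(range(len(self.docs)), key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.docs[i] for i in order[:k]]


DOCS = [
    "Nostra is a prediction market platform running on Arbitrum.",
    "Nostra allows users to trade on future events like Sports and Politics.",
    "The native token of Nostra is NST, used for governance and fee rebates.",
    "Nostra uses a CTF (Conditional Token Framework) for its market resolution.",
]
store = ToyVectorStore(DOCS)
print(store.search("native token NST", k=1))
```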

In [100]:
from dataclasses import dataclass, field
from typing import Iterable, Mapping, Sequence

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document


@dataclass
class CorpusConfig:
    collection_name: str = "nostra_docs"
    default_metadata: Mapping[str, str] = field(default_factory=dict)


def _normalize_docs(raw_docs: Iterable[Mapping[str, str]], default_meta: Mapping[str, str]) -> Sequence[Document]:
    docs: list[Document] = []
    for item in raw_docs:
        content = item.get("page_content") or item.get("content")
        if not content:
            continue
        metadata = {**default_meta, **item.get("metadata", {})}
        docs.append(Document(page_content=content, metadata=metadata))
    if not docs:
        raise ValueError("At least one document with page_content is required")
    return docs


def build_retriever(raw_docs: Iterable[Mapping[str, str]], config: CorpusConfig | None = None):
    config = config or CorpusConfig()
    docs = _normalize_docs(raw_docs, config.default_metadata)
    embeddings = OpenAIEmbeddings()
    # No persist_directory is given, so the collection lives in memory for this session
    vectorstore = Chroma.from_documents(docs, embeddings, collection_name=config.collection_name)
    # Default retriever: similarity search returning the top k=4 documents
    return vectorstore.as_retriever()


DEFAULT_DOCS = [
    {"page_content": "Nostra is a prediction market platform running on Arbitrum.", "metadata": {"source": "overview"}},
    {"page_content": "Nostra allows users to trade on future events like Sports and Politics.", "metadata": {"source": "features"}},
    {"page_content": "The native token of Nostra is NST, used for governance and fee rebates.", "metadata": {"source": "tokenomics"}},
    {"page_content": "Nostra uses a CTF (Conditional Token Framework) for its market resolution.", "metadata": {"source": "tech"}},
]

retriever = build_retriever(DEFAULT_DOCS)


## 2. Define Tools
The agent needs a tool to access the vector store.

In [101]:
from langchain_core.tools import Tool

def make_search_tool(retriever, *, name: str = "search_docs", description: str | None = None):
    desc = description or "Retrieves grounded knowledge snippets for the agent."

    def _search(query: str):
        return retriever.invoke(query)

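    # A plain Tool passes the model's input as a single string (logged as `__arg1` in the traces below)
    return Tool(name=name, description=desc, func=_search)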


tools = [make_search_tool(retriever, description="Searches the configured Nostra corpus.")]
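`_search` returns a list of `Document` objects rather than text. LangChain stringifies non-text tool results before passing them back to the model, so the model likely sees the raw object representation, metadata included. A common refinement is to format the snippets yourself. In the dependency-free sketch below, `Doc` is a hypothetical stand-in for `langchain_core.documents.Document` with the same `page_content` and `metadata` attributes, and the numbered `[source] text` layout is just one reasonable choice.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict[str, str] = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    """Render retrieved docs as numbered '[source] text' lines for the model."""
    if not docs:
        return "No matching documents found."
    lines = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        lines.append(f"{i}. [{source}] {doc.page_content}")
    return "\n".join(lines)


print(format_docs([
    Doc("Nostra is a prediction market platform running on Arbitrum.", {"source": "overview"}),
    Doc("The native token of Nostra is NST, used for governance and fee rebates.", {"source": "tokenomics"}),
]))
```

Because `format_docs` only reads `page_content` and `metadata`, it should also accept real `Document` objects, e.g. `return format_docs(retriever.invoke(query))` inside `_search`. Doing so would change the tool outputs recorded later in this notebook.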


## 3. Build the Agent (Using LangGraph)
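We use LangChain's `create_agent`, which returns a prebuilt ReAct-style agent implemented as a LangGraph graph: a model node decides whether to call tools, and a tool node runs them, until the model answers directly.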
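A ReAct-style agent alternates between a model step and a tool step until the model replies without requesting a tool. The sketch below shows that loop with plain dicts; the message shape is simplified (not LangChain's message classes), and `scripted_model` is a hypothetical fake LLM that replays canned replies so the loop can run without an API key.

```python
from typing import Any, Callable

Message = dict[str, Any]  # simplified message: {"role", "content", optional "tool_calls"}


def react_loop(
    model: Callable[[list[Message]], Message],
    tools: dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> list[Message]:
    """Alternate model and tool steps until the model answers without a tool call."""
    messages: list[Message] = [{"role": "human", "content": question}]
    for _ in range(max_steps):
        reply = model(messages)  # model step: answer, or request one or more tools
        messages.append(reply)
        calls = reply.get("tool_calls", [])
        if not calls:
            return messages  # no tool requested, so this reply is the final answer
        for call in calls:  # tool step: run each requested tool and feed the result back
            result = tools[call["name"]](call["query"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("Agent did not finish within max_steps")


def scripted_model(replies: list[Message]) -> Callable[[list[Message]], Message]:
    """Hypothetical fake LLM that ignores its input and replays canned replies in order."""
    remaining = iter(replies)
    return lambda messages: next(remaining)


SNIPPET = "Nostra uses a CTF (Conditional Token Framework) for its market resolution."
model = scripted_model([
    {"role": "ai", "content": "", "tool_calls": [{"name": "search_docs", "query": "Nostra market resolution"}]},
    {"role": "ai", "content": "Nostra resolves markets with the Conditional Token Framework (CTF)."},
])
trace = react_loop(model, {"search_docs": lambda q: SNIPPET}, "What framework does Nostra use?")
print([m["role"] for m in trace])
```

The `max_steps` cap matters in practice: without it, a model that keeps requesting tools would loop forever.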

In [102]:
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI


def build_agent(*, model: str = "gpt-4o-mini", temperature: float = 0.0, tools=None):
    llm = ChatOpenAI(model=model, temperature=temperature)
    return create_agent(llm, tools or [])


agent_executor = build_agent(tools=tools)


## 4. Run the Agent
Let's ask a question that requires retrieval.

In [103]:
def run_agent_query(agent, query: str):
    print(f"User: {query}\n")
    for chunk in agent.stream({"messages": [("human", query)]}):
        print(chunk)
        print("----")


run_agent_query(agent_executor, "What framework does Nostra use for market resolution?")


User: What framework does Nostra use for market resolution?

{'model': {'messages': [AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 20, 'prompt_tokens': 61, 'total_tokens': 81, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CgSrOeyIh4fwvKfF5Ksn3dyKHbo8K', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--70058f98-b8d6-48d1-b528-28f1ba6ee003-0', tool_calls=[{'name': 'search_docs', 'args': {'__arg1': 'Nostra market resolution framework'}, 'id': 'call_O8XrZs61FOjGRyeSOQV8v6Z1', 'type': 'tool_call'}], usage_metadata={'input_tokens': 61, 'output_tokens': 20, 'total_tokens': 81, 'input_token_details': {'audio'

## 5. Advanced: Inspecting the Trace
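The streamed output above (truncated here) records the agent's steps:
1.  It decides it needs to search.
2.  It calls the `search_docs` tool (see the `tool_calls` entry with the query it generated).
3.  It receives the retrieved context as a tool message.
4.  It synthesizes the final answer.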
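Printing raw chunks buries these steps in token-usage metadata. A small helper can reduce the stream to one line per message. This sketch assumes, as in the output above, that each chunk maps a node name to `{"messages": [...]}`, and it only reads `content` and `tool_calls` via `getattr`; the `SimpleNamespace` messages below are made-up stand-ins for LangChain messages.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def summarize_chunks(chunks: Iterable[dict[str, Any]], width: int = 80) -> list[str]:
    """Reduce streamed {node: {"messages": [...]}} chunks to one readable line per event."""
    lines: list[str] = []
    for chunk in chunks:
        for node, update in chunk.items():
            for msg in (update or {}).get("messages", []):
                calls = getattr(msg, "tool_calls", None) or []
                for call in calls:
                    lines.append(f"{node}: call {call['name']}({call['args']})")
                if not calls:
                    lines.append(f"{node}: {str(getattr(msg, 'content', ''))[:width]}")
    return lines


# Made-up chunks shaped like the streamed output above.
chunks = [
    {"model": {"messages": [SimpleNamespace(content="", tool_calls=[{"name": "search_docs", "args": {"__arg1": "NST token purpose"}}])]}},
    {"tools": {"messages": [SimpleNamespace(content="The native token of Nostra is NST, used for governance and fee rebates.")]}},
    {"model": {"messages": [SimpleNamespace(content="NST is used for governance and fee rebates.", tool_calls=[])]}},
]
print("\n".join(summarize_chunks(chunks)))
```

To use it on the real agent, pass `agent_executor.stream({"messages": [("human", query)]})` in place of `chunks`.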

In [104]:
# Try another query to demonstrate reuse
run_agent_query(agent_executor, "What is the purpose of the NST token?")


User: What is the purpose of the NST token?

{'model': {'messages': [AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 18, 'prompt_tokens': 60, 'total_tokens': 78, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_50906f2aac', 'id': 'chatcmpl-CgSrRO7okmacuWYkPrkKLbMI7INlv', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--5c205be3-c834-41db-b548-f21199685503-0', tool_calls=[{'name': 'search_docs', 'args': {'__arg1': 'NST token purpose'}, 'id': 'call_XM7XC8eyw6jS8qVu2vEkuYrq', 'type': 'tool_call'}], usage_metadata={'input_tokens': 60, 'output_tokens': 18, 'total_tokens': 78, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_to

## 6. Bonus Example: Multi-hop question
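Ask something broader so the agent may need to combine several snippets before answering. With only four documents in the corpus and the retriever returning the top `k=4` matches by default, a single search already returns every document, so the agent may get by with one tool call.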
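When one search is not enough, a common pattern is to split the question into sub-queries, search each, and merge the hits without duplicates before answering. In the sketch below the sub-queries are hand-written (an agent would have the LLM propose them), and `keyword_search` is a hypothetical toy retriever that returns only the best match per query so the merge step is visible.

```python
import re
from typing import Callable

DOCS = [
    "Nostra is a prediction market platform running on Arbitrum.",
    "Nostra allows users to trade on future events like Sports and Politics.",
    "The native token of Nostra is NST, used for governance and fee rebates.",
    "Nostra uses a CTF (Conditional Token Framework) for its market resolution.",
]


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query: str, k: int = 1) -> list[str]:
    """Toy retriever: top-k docs by number of shared words (docs with no overlap are dropped)."""
    q = words(query)
    scored = [(len(q & words(doc)), doc) for doc in DOCS]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in hits[:k]]


def multi_hop_retrieve(sub_queries: list[str], search: Callable[[str], list[str]]) -> list[str]:
    """Search each sub-query and merge the hits in first-seen order, without duplicates."""
    merged: list[str] = []
    for query in sub_queries:
        for doc in search(query):
            if doc not in merged:
                merged.append(doc)
    return merged


sub_queries = ["what platform is nostra", "what can users trade on nostra", "nostra native token", "nostra platform"]
for doc in multi_hop_retrieve(sub_queries, keyword_search):
    print(doc)
```

The last sub-query hits the overview document again, and the merge step drops the duplicate instead of sending it to the model twice.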


In [105]:
# Broader query that requires combining multiple documents
run_agent_query(agent_executor, "Give me a quick overview of Nostra and mention its key features.")


User: Give me a quick overview of Nostra and mention its key features.

{'model': {'messages': [AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 65, 'total_tokens': 86, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CgSrUG4jbISM7wCWXXSs2xTICoqiP', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--43272bd6-a5bd-4806-be38-0c61c8d44a9f-0', tool_calls=[{'name': 'search_docs', 'args': {'__arg1': 'Nostra overview and key features'}, 'id': 'call_K3wmA2RghAu610vGg2Owq5tt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 65, 'output_tokens': 21, 'total_tokens': 86, 'input_token_details':

## 7. Reflection + Agentic RAG
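Use a lightweight reflection pass after the agent responds to catch obvious issues without rerunning retrieval. First the agent executes its tool-augmented RAG plan; then a reviewer model critiques or amends the answer before you surface it to users. Note that the reviewer below receives only the question and the answer, not the retrieved documents, so it can only check the answer against its own background knowledge.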


In [106]:
from langchain_core.messages import HumanMessage, SystemMessage

def collect_agent_answer(agent, query: str) -> str:
    final_answer = None
    for event in agent.stream({"messages": [("human", query)]}):
        model_event = event.get("model")
        if model_event:
            ai_message = model_event["messages"][-1]
            final_answer = getattr(ai_message, "content", None)
    if not final_answer:
        raise RuntimeError("Agent did not return an answer; check the logs above.")
    return final_answer


def run_reflective_agent_query(agent, query: str, *, reviewer_model: str = "gpt-4o-mini"):
    initial_answer = collect_agent_answer(agent, query)
    reviewer = ChatOpenAI(model=reviewer_model, temperature=0.1)
    critique = reviewer.invoke([
        SystemMessage(content="You double-check agent answers for factual accuracy."),
        HumanMessage(content=f"Question: {query}\nAnswer: {initial_answer}\nProvide a short verification or correction."),
    ])
    print("Initial answer:")
    print(initial_answer)
    print("\nReflection:")
    print(critique.content)


run_reflective_agent_query(agent_executor, "Summarize Nostra and mention its token utility.")


Initial answer:
Nostra is a prediction market platform that operates on the Arbitrum blockchain. However, there is no specific information available regarding the utility of its token in the provided documents. If you have any other questions or need further details, feel free to ask!

Reflection:
The answer contains some inaccuracies. Nostra is indeed a prediction market platform, but it operates on the Bitcoin network using the Nostr protocol, not specifically on the Arbitrum blockchain. Additionally, the utility of its token, if applicable, typically includes functions such as governance, staking, or participation in the prediction markets, but specific details about the token's utility were not provided in the answer. For accurate information, it's best to refer to official sources or documentation related to Nostra.
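Both passes in this run went wrong. The corpus does state NST's utility (governance and fee rebates), yet the initial answer says the documents contain none. The reviewer then "corrects" Arbitrum to Bitcoin and the Nostr protocol, contradicting the corpus, most likely because it has no evidence to check against. One mitigation is to give the reviewer the same evidence and instruct it to judge only against that. `build_review_prompt` below is a hypothetical helper sketching such a prompt; the wording is an assumption, not a tested recipe.

```python
def build_review_prompt(question: str, answer: str, snippets: list[str]) -> str:
    """Build a reviewer prompt that restricts fact-checking to the retrieved evidence."""
    evidence = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1)) or "(no evidence retrieved)"
    return (
        "Check the answer strictly against the evidence below. "
        "Do not use outside knowledge. Flag any claim the evidence does not support, "
        "and note any relevant evidence the answer missed.\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with VERIFIED or a short list of corrections, citing evidence numbers."
    )


print(build_review_prompt(
    "Summarize Nostra and mention its token utility.",
    "Nostra is a prediction market platform that operates on the Arbitrum blockchain.",
    [
        "Nostra is a prediction market platform running on Arbitrum.",
        "The native token of Nostra is NST, used for governance and fee rebates.",
    ],
))
```

In `run_reflective_agent_query`, you could fetch `snippets = [d.page_content for d in retriever.invoke(query)]` and send `build_review_prompt(query, initial_answer, snippets)` as the `HumanMessage` content.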
