# Agentic RAG

## Dependencies

In [21]:
!pip install -q openai sentence-transformers datasets

In [10]:
import os
import json
import re
import pprint
from getpass import getpass
from typing import List, Dict

import numpy as np
from datasets import load_dataset
from openai import OpenAI
from sentence_transformers import SentenceTransformer

## Overview

In this agentic RAG system, a user question first goes through a planner agent, which decides whether to answer directly or to retrieve information from external open-source knowledge bases. The system uses two vector-search knowledge bases and one tool:
- AG News (real-world news)
- SQuAD Wikipedia passages (factual knowledge)
- an external calculator tool for math expressions

Based on the plan, a research agent retrieves relevant text chunks from one or both KBs and may call the calculator. The answer agent then generates a grounded response from the retrieved context. A judge agent evaluates whether the answer is accurate and supported by the evidence; if not, it returns a revised answer that replaces the original. Together, these agents form a flexible, multi-source, self-correcting pipeline that illustrates what agentic RAG adds over standard RAG: routing, tool use, and self-checking.

## OpenAI API key initialization

`OpenAI()` reads the key from the `OPENAI_API_KEY` environment variable, so set it before running this cell.

In [None]:
client = OpenAI()

## Chunking

In [4]:
def simple_chunk(text: str, max_words: int = 120) -> List[str]:
    """Split a long text into chunks of at most max_words words."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), max_words):
        chunks.append(" ".join(words[i:i + max_words]))
    return chunks
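
Fixed-size word windows can cut a sentence in half at a chunk boundary. One common mitigation is to let consecutive chunks share a few words. The sketch below is a hypothetical variant (`chunk_with_overlap` and its `overlap` parameter are not part of this notebook), shown only to illustrate the idea:

```python
from typing import List


def chunk_with_overlap(text: str, max_words: int = 120, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most max_words words, where each chunk
    repeats the last `overlap` words of the previous one (hypothetical helper)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_with_overlap(sample, max_words=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

With `overlap=0` this produces the same chunks as `simple_chunk`. Overlap increases the number of chunks to embed, so it trades index size for context at the boundaries.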

## Data loading

### 1) AG News dataset

AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity.

In [6]:
raw_news = load_dataset("ag_news", split="train[:200]")


In [8]:
news_docs: List[Dict] = []

for i, row in enumerate(raw_news):
    chunks = simple_chunk(row["text"], max_words=100)
    for j, ch in enumerate(chunks):
        news_docs.append(
            {
                "kb": "news",        # which knowledge base it belongs to
                "doc_id": i,         # original article id
                "chunk_id": j,       # chunk index inside article
                "label": int(row["label"]),
                "text": ch,          # the chunk text
            }
        )

In [11]:
for example in news_docs[:3]:
    pprint.pprint(example)
    print("\n" + "="*80 + "\n")

{'chunk_id': 0,
 'doc_id': 0,
 'kb': 'news',
 'label': 2,
 'text': 'Wall St. Bears Claw Back Into the Black (Reuters) Reuters - '
         "Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are "
         'seeing green again.'}


{'chunk_id': 0,
 'doc_id': 1,
 'kb': 'news',
 'label': 2,
 'text': 'Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - '
         'Private investment firm Carlyle Group,\\which has a reputation for '
         'making well-timed and occasionally\\controversial plays in the '
         'defense industry, has quietly placed\\its bets on another part of '
         'the market.'}


{'chunk_id': 0,
 'doc_id': 2,
 'kb': 'news',
 'label': 2,
 'text': "Oil and Economy Cloud Stocks' Outlook (Reuters) Reuters - Soaring "
         'crude prices plus worries\\about the economy and the outlook for '
         'earnings are expected to\\hang over the stock market next week '
         'during the depth of the\\summer doldrums.'}




### 2) SQuAD (open-source Wikipedia QA)

In [12]:
raw_squad = load_dataset("squad", split="train[:200]")


In [13]:
squad_docs: List[Dict] = []

for i, row in enumerate(raw_squad):
    context = row["context"]
    question = row["question"]

    base_text = f"Q: {question}\nCONTEXT: {context}" # using both questions and context

    chunks = simple_chunk(base_text, max_words=100)
    for j, ch in enumerate(chunks):
        squad_docs.append(
            {
                "kb": "squad",      # name of second KB
                "doc_id": i,
                "chunk_id": j,
                "label": None,
                "text": ch,
            }
        )

In [14]:
print(f"SQuAD: {len(squad_docs)} chunks from {len(raw_squad)} examples")

SQuAD: 434 chunks from 200 examples


In [15]:
for example in squad_docs[:3]:
    pprint.pprint(example)
    print("\n" + "="*80 + "\n")

{'chunk_id': 0,
 'doc_id': 0,
 'kb': 'squad',
 'label': None,
 'text': 'Q: To whom did the Virgin Mary allegedly appear in 1858 in Lourdes '
         'France? CONTEXT: Architecturally, the school has a Catholic '
         "character. Atop the Main Building's gold dome is a golden statue of "
         'the Virgin Mary. Immediately in front of the Main Building and '
         'facing it, is a copper statue of Christ with arms upraised with the '
         'legend "Venite Ad Me Omnes". Next to the Main Building is the '
         'Basilica of the Sacred Heart. Immediately behind the basilica is the '
         'Grotto, a Marian place of prayer and reflection. It is a replica of '
         'the grotto at Lourdes, France where the'}


{'chunk_id': 1,
 'doc_id': 0,
 'kb': 'squad',
 'label': None,
 'text': 'Virgin Mary reputedly appeared to Saint Bernadette Soubirous in '
         '1858. At the end of the main drive (and in a direct line that '
         'connects through 3 statues and the Gold D

## Build embeddings

Model for embeddings: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2.

In [18]:
EMBED_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
embedder = SentenceTransformer(EMBED_MODEL_NAME)


In [20]:
def build_embeddings(docs: List[Dict]):
    texts = [d["text"] for d in docs]
    embs = embedder.encode(
        texts,
        convert_to_numpy=True,
        show_progress_bar=True
    )
    norms = np.linalg.norm(embs, axis=1, keepdims=True) + 1e-12
    embs = embs / norms
    return embs

In [19]:
news_embeddings = build_embeddings(news_docs)
squad_embeddings = build_embeddings(squad_docs)


Now, our knowledge base consists of pairs of texts and their corresponding embeddings.

In [21]:
KB = {
    "news": {
        "docs": news_docs,
        "embeddings": news_embeddings,
    },
    "squad": {
        "docs": squad_docs,
        "embeddings": squad_embeddings,
    },
}

for name, kb in KB.items():
    print(name, "->", len(kb["docs"]), "docs/chunks")


news -> 215 docs/chunks
squad -> 434 docs/chunks


## Retrieval

Given a `kb_name` and a query, find the top-k most similar chunks with cosine similarity.

In [22]:
def retrieve_from_kb(kb_name: str, query: str, k: int = 5) -> List[Dict]:
    """Vector search in the chosen knowledge base."""
    kb = KB[kb_name]
    docs = kb["docs"]
    embs = kb["embeddings"]

    q_emb = embedder.encode([query], convert_to_numpy=True)
    q_emb = q_emb / (np.linalg.norm(q_emb, axis=1, keepdims=True) + 1e-12)

    # cosine similarity via dot product (since vectors are normalized)
    scores = np.dot(embs, q_emb[0])
    top_idx = np.argsort(scores)[::-1][:k]

    results = []
    for idx in top_idx:
        d = docs[idx]
        results.append(
            {
                "score": float(scores[idx]),
                "kb": kb_name,
                "doc_id": d["doc_id"],
                "chunk_id": d["chunk_id"],
                "text": d["text"],
            }
        )
    return results
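
To see why a plain dot product is enough here, the following toy example (made-up 2-D vectors, not real model embeddings) normalizes the vectors first and then ranks them the same way `retrieve_from_kb` does:

```python
import numpy as np

# Toy 2-D "embeddings"; the real ones come from the sentence-transformers model.
doc_embs = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
query = np.array([2.0, 0.0])

# L2-normalize, as build_embeddings does for the knowledge bases.
doc_unit = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)

scores = doc_unit @ query_unit            # cosine similarities
top_idx = np.argsort(scores)[::-1][:2]    # indices of the 2 best matches

print(scores)
print(top_idx)
```

The first document points in the same direction as the query (similarity 1), the second is orthogonal (similarity 0), and the third sits at 45 degrees, so the ranking is document 0, then document 2.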


In [26]:
print("News retrieval example:")
for r in retrieve_from_kb("news", "stock markets and oil prices", k=3):
    print(r["kb"], r["score"], "->", r["text"][:120], "...\n")

News retrieval example:
news 0.6110310554504395 -> Oil and Economy Cloud Stocks' Outlook NEW YORK (Reuters) - Soaring crude prices plus worries about the economy and the o ...

news 0.5996482968330383 -> Oil and Economy Cloud Stocks' Outlook (Reuters) Reuters - Soaring crude prices plus worries\about the economy and the ou ...



In [32]:
print("SQuAD retrieval example:")
for r in retrieve_from_kb("squad", "Where is the Eiffel Tower located?", k=3):
    print(r["kb"], r["score"], "->", r["text"][:120], "...\n")


SQuAD retrieval example:
squad 0.43769073486328125 -> located in Beijing, Chicago, Dublin, Jerusalem and Rome. ...

squad 0.43769073486328125 -> located in Beijing, Chicago, Dublin, Jerusalem and Rome. ...

squad 0.40566298365592957 -> Q: How large in square feet is the LaFortune Center at Notre Dame? CONTEXT: A Science Hall was built in 1883 under the d ...
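
The first two hits are identical, most likely because SQuAD pairs several questions with the same context passage, so the same trailing chunk gets indexed more than once. A minimal sketch of a post-retrieval filter follows; `dedupe_chunks` is a hypothetical helper, not part of this notebook, and the hit texts are shortened toy data:

```python
from typing import Dict, List


def dedupe_chunks(hits: List[Dict]) -> List[Dict]:
    """Keep the best-scoring hit per distinct text (hypothetical helper)."""
    best: Dict[str, Dict] = {}
    for hit in hits:
        seen = best.get(hit["text"])
        if seen is None or hit["score"] > seen["score"]:
            best[hit["text"]] = hit
    # Return the survivors ordered from most to least similar.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


toy_hits = [
    {"score": 0.44, "text": "located in Beijing"},
    {"score": 0.44, "text": "located in Beijing"},
    {"score": 0.41, "text": "Q: How large"},
]
print(len(dedupe_chunks(toy_hits)))  # 2
```

Deduplicating at indexing time (before embedding) would also work and keeps the index smaller; filtering after retrieval is simply the smaller change.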



Small tool for calculations:

In [23]:
def calculator_tool(expression: str) -> str:
    """
    A tiny math tool.
    Only allows digits, decimal points, +, -, *, /, parentheses, and whitespace
    (so `**` and `//` also pass the filter).
    """
    if not re.fullmatch(r"[0-9+\-*/().\s]+", expression):
        return "Calculator error: unsupported characters."

    try:
        result = eval(expression, {"__builtins__": {}})
        return f"{expression} = {result}"
    except Exception as e:
        return f"Calculator error: {e}"


Quick check of tool usage:

In [25]:
print("Calculator test:", calculator_tool("3*(5+2) - 4"))

Calculator test: 3*(5+2) - 4 = 17
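
The character filter keeps names and function calls out of `eval`, but an expression such as `9**9**9**9` still passes it and can take a very long time to evaluate. One alternative is to walk the parsed syntax tree and allow only the four basic operators. The sketch below uses the standard-library `ast` module; `safe_eval_arithmetic` is a hypothetical replacement, not the notebook's own tool:

```python
import ast
import operator

_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval_arithmetic(expression: str):
    """Evaluate +, -, *, / and parentheses only; anything else raises ValueError."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression.strip(), mode="eval"))


print(safe_eval_arithmetic("3*(5+2) - 4"))  # 17
```

Because the power operator is not in `_BIN_OPS`, `safe_eval_arithmetic("2**10")` raises `ValueError` instead of being evaluated. Malformed input raises `SyntaxError` from `ast.parse`, so a caller would still wrap the call in `try`/`except` as `calculator_tool` does.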


## LLM helper + Planner agent

### LLM helper function

A small wrapper around the OpenAI Chat Completions API, to reduce repetition.

In [33]:
def call_llm(system: str, user: str, model: str = "gpt-4o-mini") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

### Planner agent: decide mode, sources, and tool use

The planner reads the question and outputs JSON with:

- `mode`: "DIRECT" or "RAG"

- `sources`: which knowledge bases to use (`news`, `squad`, both, or none)

- `use_calculator`: whether to call the math tool

In [36]:
def plan_agent(question: str) -> dict:
    system = "You are a routing agent that outputs ONLY valid JSON, no explanation."
    user = f"""
You help decide how to answer user questions using tools.

TOOLS AND SOURCES:
- "news": vector search over news articles (AG News dataset).
- "squad": vector search over Wikipedia-style QA contexts (SQuAD dataset).
- "calculator": a math tool for expressions like "3*(5+2) - 4".

INSTRUCTIONS:
1. If the question is about current or past events, business, finance, politics, or real-world news,
   set mode to "RAG" and include "news" in sources.

2. If the question looks like a factual 'what/when/where/why/how' question that a Wikipedia article
   might answer, set mode to "RAG" and include "squad" in sources.

3. If the question contains a math expression that can be evaluated, set "use_calculator": true.

4. If the question is simple chit-chat or general knowledge, set mode to "DIRECT"
   and use an empty sources list.

Return ONLY valid JSON with keys:
- "mode": "DIRECT" or "RAG"
- "sources": an array from ["news", "squad"]
- "use_calculator": true or false

Example output:
{{"mode": "RAG", "sources": ["news", "squad"], "use_calculator": false}}

Now decide for this question:
"{question}"
""".strip()

    raw = call_llm(system, user)

    try:
        plan = json.loads(raw)
    except json.JSONDecodeError:
        m = re.search(r"\{.*\}", raw, re.DOTALL)
        if m:
            plan = json.loads(m.group(0))
        else:
            plan = {"mode": "DIRECT", "sources": [], "use_calculator": False}

    return plan
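
Models sometimes wrap JSON in a code fence or return an object with unexpected values, and the fallback `json.loads` call can itself raise. A more defensive variant is sketched below; `parse_json_object`, `normalize_plan`, and `DEFAULT_PLAN` are hypothetical names introduced for illustration, not part of this notebook:

```python
import json
import re

DEFAULT_PLAN = {"mode": "DIRECT", "sources": [], "use_calculator": False}


def parse_json_object(raw: str, default: dict) -> dict:
    """Parse a JSON object from LLM output, returning a copy of default on failure."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, e.g. inside a code fence.
        m = re.search(r"\{.*\}", raw, re.DOTALL)
        if not m:
            return dict(default)
        try:
            parsed = json.loads(m.group(0))
        except json.JSONDecodeError:
            return dict(default)
    return parsed if isinstance(parsed, dict) else dict(default)


def normalize_plan(plan: dict) -> dict:
    """Coerce a parsed plan into the three expected keys with valid values."""
    mode = plan.get("mode") if plan.get("mode") in ("DIRECT", "RAG") else "DIRECT"
    sources = plan.get("sources")
    if not isinstance(sources, list):
        sources = []
    return {
        "mode": mode,
        "sources": [s for s in sources if s in ("news", "squad")],
        "use_calculator": plan.get("use_calculator") is True,
    }


raw = 'Sure!\n```json\n{"mode": "RAG", "sources": ["news", "web"], "use_calculator": true}\n```'
print(normalize_plan(parse_json_object(raw, DEFAULT_PLAN)))
```

Here the unknown source `"web"` is dropped and the surrounding text is ignored. The same `parse_json_object` helper could serve any other agent that expects JSON back from the model.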


Planner quick test:

In [37]:
for q in [
    "Explain the idea of Hamiltonian oscillations.",
    "What happened in the stock market and oil prices recently?",
    "Compute 3*(7+5).",
]:
    print("QUESTION:", q)
    print(plan_agent(q))
    print("-" * 60)

QUESTION: Explain the idea of Hamiltonian oscillations.
{'mode': 'RAG', 'sources': ['squad'], 'use_calculator': False}
------------------------------------------------------------
QUESTION: What happened in the stock market and oil prices recently?
{'mode': 'RAG', 'sources': ['news'], 'use_calculator': False}
------------------------------------------------------------
QUESTION: Compute 3*(7+5).
{'mode': 'DIRECT', 'sources': [], 'use_calculator': True}
------------------------------------------------------------


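The routing looks reasonable: the news question goes to `news`, and the math expression sets `use_calculator` to true.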

### Convert research output into a text context

Format the retrieved chunks and tool outputs into a single text block for the answer agent.

In [39]:
def build_context_snippet(research_output: dict) -> str:
    """Pretty-print context from knowledge base chunks and tool outputs."""
    parts = []

    for i, ch in enumerate(research_output["chunks"]):
        parts.append(
            f"[KB {ch['kb']} | doc {ch['doc_id']} | chunk {ch['chunk_id']} | score={ch['score']:.3f}]\n"
            + ch["text"]
        )

    for t in research_output["tools"]:
        parts.append(
            f"[TOOL {t['tool']}]\ninput: {t['input']}\noutput: {t['output']}"
        )

    return "\n\n".join(parts)


## Research agent: execute the plan

In [44]:
def research_agent(question: str, plan: dict, k_per_source: int = 4) -> dict:
    context_chunks = []

    # 1) vector retrieval from each selected knowledge base
    for source in plan.get("sources", []):
        if source in KB:
            hits = retrieve_from_kb(source, question, k=k_per_source)
            context_chunks.extend(hits)

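    # 2) optional calculator
    tool_outputs = []
    if plan.get("use_calculator"):
        # The question may contain words ("Compute 3*(7+5)."), which the calculator
        # rejects, so pass only the longest arithmetic-looking span.
        candidates = re.findall(r"[0-9(][0-9+\-*/().\s]*[0-9)]", question)
        expression = max(candidates, key=len).strip() if candidates else question
        tool_outputs.append(
            {
                "tool": "calculator",
                "input": expression,
                "output": calculator_tool(expression),
            }
        )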

    return {
        "question": question,
        "plan": plan,
        "chunks": context_chunks,
        "tools": tool_outputs,
    }




## Answer agent + Judge agent

### Answer agent

If `mode = DIRECT` and no tool was called, answer from general knowledge.
Otherwise, answer using the context (knowledge base chunks + tool outputs).

In [40]:
def answer_agent(research_output: dict) -> str:
    question = research_output["question"]
    plan = research_output["plan"]
    mode = plan.get("mode", "DIRECT")

    # DIRECT mode with no tool output: plain LLM call, no context.
    # A DIRECT plan that used the calculator falls through so the tool result is not dropped.
    if mode == "DIRECT" and not research_output.get("tools"):
        system = "You are a helpful assistant. Answer concisely and clearly."
        user = f"Question: {question}"
        return call_llm(system, user)

    # RAG mode
    context = build_context_snippet(research_output)
    system = "You are a careful RAG assistant. Use only the given context when possible."
    user = f"""
You are given:

QUESTION:
{question}

CONTEXT (from multiple knowledge bases and tools):
{context}

INSTRUCTIONS:
- Use the context to answer in 3–6 sentences.
- If something is not supported by the context, say you are not sure.
- At the end, list which KBs or tools you relied on, e.g. "Sources: news, squad, calculator".
""".strip()

    return call_llm(system, user)

### Judge agent (self-check and refinement)

The judge checks if the answer is grounded in the context.
If not, it suggests a revised answer.

In [41]:
def judge_agent(question: str, answer: str, research_output: dict) -> dict:
    """
    Judge checks grounding, coherence, and suggests improvement.
    Returns dict with:
      - verdict: "OK" or "REVISE"
      - comment: explanation
      - improved_answer: optional revised answer
    """
    context = build_context_snippet(research_output)
    system = "You are a strict but fair evaluator of RAG answers."
    user = f"""
You evaluate an answer produced by a RAG system.

QUESTION:
{question}

CONTEXT (from KBs and tools):
{context}

ANSWER TO EVALUATE:
{answer}

TASK:
1. Decide if the answer is well grounded in the context and factually consistent.
2. If it is mostly correct and grounded, set "verdict" to "OK".
3. If it misses key information, hallucinates, or contradicts the context, set "verdict" to "REVISE".
4. If verdict is "REVISE", also provide a corrected answer in "improved_answer".
5. Output ONLY JSON with keys: "verdict", "comment", "improved_answer".

Example:
{{"verdict": "OK", "comment": "Well grounded.", "improved_answer": ""}}
""".strip()

    raw = call_llm(system, user)
    try:
        j = json.loads(raw)
    except json.JSONDecodeError:
        m = re.search(r"\{.*\}", raw, re.DOTALL)
        if m:
            j = json.loads(m.group(0))
        else:
            j = {"verdict": "OK", "comment": "Failed to parse judge output.", "improved_answer": ""}
    return j


## Agentic RAG controller (full pipeline)

The full flow, end to end:

1. Plan

2. Research

3. Initial answer

4. Judge + possible revision

In [42]:
def agentic_rag(question: str) -> dict:
    plan = plan_agent(question)

    research_out = research_agent(question, plan)

    answer1 = answer_agent(research_out)

    judgment = judge_agent(question, answer1, research_out)

    final_answer = answer1
    if judgment.get("verdict") == "REVISE" and judgment.get("improved_answer"):
        final_answer = judgment["improved_answer"]

    return {
        "question": question,
        "plan": plan,
        "research": research_out,
        "initial_answer": answer1,
        "judgment": judgment,
        "final_answer": final_answer,
    }
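
The controller above accepts the judge's `improved_answer` without checking it again. If a second opinion on the revision is wanted, the judge can be called in a bounded loop. The sketch below is one hypothetical way to do that: `refine_until_ok` and `stub_judge` are illustrative names, and the stub stands in for a real LLM judge so the example runs offline:

```python
from typing import Callable, Dict


def refine_until_ok(answer: str, judge: Callable[[str], Dict], max_rounds: int = 2) -> Dict:
    """Re-judge each revised answer until the verdict is "OK" or rounds run out."""
    history = []
    for _ in range(max_rounds):
        judgment = judge(answer)
        history.append(judgment)
        improved = judgment.get("improved_answer")
        if judgment.get("verdict") != "REVISE" or not improved:
            break  # accepted, or nothing better was offered
        answer = improved
    return {"final_answer": answer, "judgments": history}


# Stub judge for illustration: rejects any answer that lacks a "Sources:" line.
def stub_judge(answer: str) -> Dict:
    if "Sources:" in answer:
        return {"verdict": "OK", "comment": "Cites sources.", "improved_answer": ""}
    return {
        "verdict": "REVISE",
        "comment": "No sources listed.",
        "improved_answer": answer + "\nSources: news",
    }


result = refine_until_ok("Oil prices rose.", stub_judge)
print(result["final_answer"])
print(len(result["judgments"]))  # 2
```

In the pipeline, the `judge` argument could be `lambda a: judge_agent(question, a, research_out)`. Each extra round costs one more LLM call, which is why `max_rounds` is kept small.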


## Inference

In [46]:
test_questions = [
    "Explain how the Hamiltonian cycle in a graph can be found.",
    "What are some topics in the AG News dataset about the stock market or oil?",
    "3*(7+5) - 4",
]

In [48]:
for q in test_questions:
    result = agentic_rag(q)
    print("======================================================")
    print("QUESTION:", q)
    print("PLAN:", result["plan"])
    print("\nINITIAL ANSWER:\n", result["initial_answer"])
    print("\nJUDGMENT:", result["judgment"])
    print("\nFINAL ANSWER:\n", result["final_answer"])
    print("\nUsed knowledge base:", [c["kb"] for c in result["research"]["chunks"]][:5])
    print("Tools:", [t["tool"] for t in result["research"]["tools"]])


QUESTION: Explain how the Hamiltonian cycle in a graph can be found.
PLAN: {'mode': 'RAG', 'sources': ['squad'], 'use_calculator': False}

INITIAL ANSWER:
 I'm not sure. The context provided does not contain information about how to find a Hamiltonian cycle in a graph. 

Sources: KB squad

JUDGMENT: {'verdict': 'REVISE', 'comment': 'The answer correctly identifies that the context does not provide information about Hamiltonian cycles, but it does not attempt to explain how to find a Hamiltonian cycle based on common knowledge. Therefore, the answer lacks completeness.', 'improved_answer': 'To find a Hamiltonian cycle in a graph, one can use several approaches, including backtracking algorithms, dynamic programming, or heuristic methods such as genetic algorithms. The backtracking approach involves exploring all possible paths in the graph, attempting to construct the cycle by adding one vertex at a time and checking if it leads to a complete cycle. Dynamic programming can be employed t