# Agentic RAG (Single Agent) — ReAct + Tavily + Memory + Planning (Beginner Notebook)

This notebook demonstrates a **beginner-friendly Agentic RAG system** using a **single ReAct agent**.

You will see:
- **ReAct framework**: the agent alternates between *reasoning* and *tool use* (search, memory).
- **Planning**: the system creates a plan before running the agent.
- **Self-reflection**: the system checks and improves the answer after the first draft.
- **Managing tool inventory**: tools are registered in one place and passed to the agent.
- **Managing memory**:
  - Short-term: chat history (ConversationBufferMemory)
  - Long-term: a tiny “Notes Memory” tool the agent can write/read

> ⚠️ Disclaimer: This is for learning and audit-style research support only — **NOT legal/tax advice**.


## What is “Agentic RAG”?

Classic RAG is usually: **retrieve → generate** (fixed pipeline).

**Agentic RAG** is: **agent decides** when to retrieve, what to retrieve, how many times, and how to combine evidence — using tools.
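
The difference in control flow can be sketched without any LLM. The block below is a toy illustration only: `CORPUS`, `retrieve`, and `scripted_policy` are hypothetical stand-ins (a real agent would let the model choose the next query), and nothing here is part of LangChain or Tavily.

```python
from typing import Callable, List, Optional

# Toy corpus standing in for a real search backend (assumption for illustration).
CORPUS = {
    "vat": "VAT registration is mandatory above a turnover threshold.",
    "penalty": "Late registration can trigger an administrative penalty.",
}

def retrieve(query: str) -> List[str]:
    """Return every corpus entry whose key appears in the query."""
    q = query.lower()
    return [text for key, text in CORPUS.items() if key in q]

def classic_rag(question: str) -> List[str]:
    """Fixed pipeline: exactly one retrieval using the raw question."""
    return retrieve(question)

def agentic_rag(
    question: str,
    next_query: Callable[[str, List[str]], Optional[str]],
    max_steps: int = 4,
) -> List[str]:
    """Loop: the policy decides whether to search again and with which query."""
    evidence: List[str] = []
    for _ in range(max_steps):
        query = next_query(question, evidence)
        if query is None:  # the policy decided it has enough evidence
            break
        for hit in retrieve(query):
            if hit not in evidence:
                evidence.append(hit)
    return evidence

def scripted_policy(question: str, evidence: List[str]) -> Optional[str]:
    """Hypothetical stand-in for the LLM: search 'vat', then 'penalty', then stop."""
    follow_ups = ["vat", "penalty"]
    return follow_ups[len(evidence)] if len(evidence) < len(follow_ups) else None

question = "What happens if a company registers late for VAT?"
classic_evidence = classic_rag(question)
agentic_evidence = agentic_rag(question, scripted_policy)
```

With this toy data the fixed pipeline finds only the passage that matches the literal question, while the loop issues a second query and collects both passages.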


## 0) Install dependencies (single cell)

If you re-run the notebook, you can re-run this cell safely.


In [None]:
!pip -q install -U tavily-python gradio langchain langchain-tavily langchain-openai langchain-community

## 1) Set API keys (environment variables only)

We use:
- `OPENAI_API_KEY`
- `OPENAI_BASE_URL` (optional; default OpenAI endpoint)
- `TAVILY_API_KEY`
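
Outside Colab there is no `userdata` store, so one option is to read each key from the environment and prompt only when it is missing. This is a minimal sketch; `load_key` is a hypothetical helper name, not part of any library used in this notebook.

```python
import getpass
import os
from typing import Callable

def load_key(name: str, prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Return the key from the environment, prompting once if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        value = prompt(f"Enter {name}: ").strip()
        if not value:
            raise ValueError(f"{name} is required")
        # Export it so libraries that read the environment can find it.
        os.environ[name] = value
    return value
```

Calling `load_key("OPENAI_API_KEY")` and `load_key("TAVILY_API_KEY")` at the top of a session would keep the keys out of the notebook source. The `prompt` parameter exists only so the function can be exercised without an interactive terminal.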


In [None]:
# Cell 2: read API keys from Colab secrets and export them as environment variables
import os
from google.colab import userdata

# Custom OpenAI-compatible endpoint; passed to ChatOpenAI as base_url in a later cell.
OPENAI_BASE_URL = "https://aibe.mygreatlearning.com/openai/v1"

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["TAVILY_API_KEY"] = userdata.get('TAVILY_API_KEY')

## 2) Tools: Tavily Web Search + Memory Tools

We will build a **small tool inventory**:
1) `tavily_search_results_json`: Tavily web search (snippets + URLs, no scraping); this is the name the LangChain tool registers under
2) `write_note`: save a fact to long-term notes
3) `read_notes`: retrieve saved notes
4) `clear_notes`: clear notes for a new session

This shows how to manage a tool inventory in one place.
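
The notes tools in this notebook keep notes in a Python list, so they disappear when the kernel restarts. If notes should survive across sessions, one option is a small file-backed store. The sketch below assumes a local JSON file is acceptable; `NotesStore` is a hypothetical helper, not a LangChain class.

```python
import json
from pathlib import Path
from typing import List

class NotesStore:
    """Notes list that is rewritten to a JSON file after every change."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.notes: List[str] = []
        if self.path.exists():
            # Reload notes written by an earlier session.
            self.notes = json.loads(self.path.read_text(encoding="utf-8"))

    def write(self, note: str) -> str:
        note = (note or "").strip()
        if not note:
            return "No note provided."
        self.notes.append(note)
        self._save()
        return f"Saved note #{len(self.notes)}."

    def read(self, _: str = "") -> str:
        if not self.notes:
            return "No notes saved yet."
        return "\n".join(f"- {n}" for n in self.notes)

    def clear(self, _: str = "") -> str:
        self.notes.clear()
        self._save()
        return "Notes cleared."

    def _save(self) -> None:
        self.path.write_text(
            json.dumps(self.notes, ensure_ascii=False), encoding="utf-8"
        )
```

The methods keep the same one-string-in, one-string-out shape as the notebook's `write_note`, `read_notes`, and `clear_notes`, so a bound method such as `store.write` could be passed as the `func` of a `Tool` in their place.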


In [None]:
from typing import Dict, List
import time

from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.tools import Tool

# 1) Tavily tool (LangChain community tool)
tavily_search_tool = TavilySearchResults(max_results=5)

# 2) Simple long-term "Notes Memory" (beginner-friendly)
NOTES_MEMORY: List[str] = []

def write_note(note: str) -> str:
    """Store a short fact/decision for later."""
    note = (note or "").strip()
    if not note:
        return "No note provided."
    NOTES_MEMORY.append(note)
    return f"Saved note #{len(NOTES_MEMORY)}."

def read_notes(_: str = "") -> str:
    """Read all stored notes."""
    if not NOTES_MEMORY:
        return "No notes saved yet."
    return "\n".join([f"- {n}" for n in NOTES_MEMORY])

def clear_notes(_: str = "") -> str:
    """Clear the notes memory."""
    NOTES_MEMORY.clear()
    return "Notes cleared."

notes_write_tool = Tool(
    name="write_note",
    func=write_note,
    description="Write a short fact/decision to long-term notes memory. Input: a single note string."
)

notes_read_tool = Tool(
    name="read_notes",
    func=read_notes,
    description="Read the long-term notes memory. Input can be empty."
)

notes_clear_tool = Tool(
    name="clear_notes",
    func=clear_notes,
    description="Clear all long-term notes. Input can be empty."
)

TOOLS = [tavily_search_tool, notes_write_tool, notes_read_tool, notes_clear_tool]

print("✅ Tools loaded:", [t.name for t in TOOLS])


✅ Tools loaded: ['tavily_search_results_json', 'write_note', 'read_notes', 'clear_notes']


## 3) LLM + Short-term memory (chat history)

We use:
- `ChatOpenAI` as the model
- `ConversationBufferMemory` for short-term memory (what the user and agent said)

For beginner demos, buffer memory is the easiest to understand: it stores every message verbatim and replays the full history on each call.
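
To make the idea concrete, here is a toy stand-in for a conversation buffer. It is not LangChain's implementation; `BufferMemory` and its `last_n` window are assumptions added for illustration, since an unbounded buffer makes every prompt longer than the last.

```python
from typing import List, Optional, Tuple

class BufferMemory:
    """Toy conversation buffer: stores every turn in order."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def save(self, user_text: str, ai_text: str) -> None:
        """Record one exchange as a human message followed by an AI message."""
        self.turns.append(("human", user_text))
        self.turns.append(("ai", ai_text))

    def as_text(self, last_n: Optional[int] = None) -> str:
        """Render the history; last_n limits it to the most recent messages."""
        if last_n is None:
            turns = self.turns
        elif last_n > 0:
            turns = self.turns[-last_n:]
        else:
            turns = []
        return "\n".join(f"{role}: {text}" for role, text in turns)

memory = BufferMemory()
memory.save("What is VAT?", "A consumption tax.")
memory.save("Who must register?", "Businesses above the threshold.")
recent = memory.as_text(last_n=2)
```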


In [None]:
!pip -q uninstall -y pydantic pydantic-core langchain langchain-core langchain-community langchain-openai
!pip -q install -U \
  "pydantic>=2.7,<3" "pydantic-core>=2.18,<3" \
  "langchain==0.2.16" "langchain-core==0.2.38" "langchain-community==0.2.16" "langchain-openai==0.1.23"

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following d

In [None]:
# Keep the pins from the previous cell. Installing langchain-classic or an unpinned
# langchain-core here most likely upgrades langchain-core past the pinned 0.2.38; the
# pinned langchain 0.2.16 then fails on `from langchain.memory import ...` with
# "ModuleNotFoundError: No module named 'langchain_core.memory'".
# After changing package versions in Colab, restart the runtime before importing.
!pip install -q "numpy==1.26.4"



In [None]:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.1,
    base_url=OPENAI_BASE_URL,
    timeout=60,
    max_retries=2)

short_term_memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

print("✅ LLM + short-term memory ready.")

## 4) Planning step (before the agent runs)

To make agent behavior **predictable for beginners**, we do a small planning step:
- Restate the problem
- Decide 2–4 search queries
- Decide what to store in memory

Then we pass that plan into the agent prompt.

This is *planning in a single-agent system* (still 1 agent, but structured).
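
Models do not always return bare JSON; the reply may be wrapped in a Markdown fence or carry extra prose. A defensive parser is one way to handle that. The sketch below uses the three plan keys named in this section; `parse_plan` and `REQUIRED_KEYS` are hypothetical names introduced for illustration.

```python
import json
import re
from typing import Any, Dict, Optional

REQUIRED_KEYS = ("problem_restate", "search_queries", "memory_notes_to_save")

def parse_plan(text: str) -> Optional[Dict[str, Any]]:
    """Extract the plan dict from a model reply, or return None if unusable."""
    # Take the span from the first "{" to the last "}" so that surrounding
    # prose or a Markdown fence does not break json.loads.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        plan = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(plan, dict) or any(k not in plan for k in REQUIRED_KEYS):
        return None
    queries = plan["search_queries"]
    if not isinstance(queries, list) or not queries:
        return None
    # Cap at 4 queries, the upper bound the planning step asks for.
    plan["search_queries"] = [str(q) for q in queries][:4]
    return plan
```

When `parse_plan` returns `None`, the caller can fall back to a default plan built from the raw question.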


In [None]:
import json

def make_plan(question: str, jurisdiction: str = "General") -> Dict:
    """Simple planner: generate search queries + what to remember."""
    prompt = f"""You are a planner for an Agentic RAG system.
Return STRICT JSON with keys:
- problem_restate: string
- search_queries: list of 2 to 4 short queries
- memory_notes_to_save: list of 1 to 3 notes to save after we find evidence
Jurisdiction: {jurisdiction}
User question: {question}
"""
    resp = llm.invoke(prompt)
    # Try parse; if parsing fails, fall back safely.
    try:
        return json.loads(resp.content)
    except Exception:
        return {
            "problem_restate": question,
            "search_queries": [f"{jurisdiction} {question}"],
            "memory_notes_to_save": ["Remember to cite official sources and state limits."]
        }

plan = make_plan("UAE VAT late registration penalties and key obligations", jurisdiction="UAE")
plan


## 5) ReAct Agent (single agent)

We use LangChain’s ReAct agent pattern:
- The prompt encourages the agent to use tools when needed (search, memory).
- The AgentExecutor runs the loop until it finishes.
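
The loop itself is small enough to sketch by hand. The block below is a simplified illustration of the Thought/Action/Observation cycle, not LangChain's actual implementation; `react_loop`, `scripted_llm`, and the stub `search` tool are hypothetical names, and the "LLM" is a scripted function so the example runs offline.

```python
import re
from typing import Callable, Dict

# Matches "Action: <tool>" followed by "Action Input: <text>" on the next line.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", re.DOTALL)

def react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_iterations: int = 8,
) -> str:
    """Alternate model replies and tool calls until a Final Answer appears."""
    scratchpad = ""
    for _ in range(max_iterations):
        reply = llm(f"Question: {question}\n{scratchpad}")
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            # Feed the formatting problem back instead of crashing.
            scratchpad += f"{reply}\nObservation: could not parse an action.\n"
            continue
        name, arg = match.group(1).strip(), match.group(2).strip()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"Unknown tool: {name}"
        scratchpad += f"{reply}\nObservation: {observation}\n"
    return "Stopped: iteration limit reached."

def scripted_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat model: search once, then answer."""
    if "Observation:" not in prompt:
        return "Thought: I should search.\nAction: search\nAction Input: late VAT registration"
    return "Thought: I have evidence.\nFinal Answer: See the search result above."

tools = {"search": lambda q: f"(stub result for '{q}')"}
answer = react_loop(scripted_llm, tools, "What is the penalty?")
```

The iteration cap is what keeps a confused model from looping forever.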


In [None]:
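from langchain.agents import create_react_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

SYSTEM_GUARDRAILS = """
You are an audit-style research assistant.
Non-negotiable guardrails:
- NOT legal/tax advice.
- Do NOT provide filing instructions or tax planning/avoidance.
- Use web search snippets only; if information is missing, say so.
- Cite sources with [1], [2] ... and include a citations list at the end.
- If authoritative sources are insufficient, explicitly say "Insufficient authoritative guidance found."
""".strip()

# create_react_agent requires {tools}, {tool_names} and {agent_scratchpad} in the prompt.
# It fills the first two from TOOLS; the executor fills the scratchpad on every step.
REACT_FORMAT = """
You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
""".strip()

# ReAct prompt template: guardrails + format, then the plan, chat history, and the question.
react_prompt = ChatPromptTemplate.from_messages([
    ("system", SYSTEM_GUARDRAILS + "\n\n" + REACT_FORMAT),
    ("system", "Here is the PLAN (follow it, but you may adjust if evidence suggests):\n{plan_json}"),
    MessagesPlaceholder(variable_name="chat_history"),  # short-term chat history
    ("human", "Question: {input}\nThought:{agent_scratchpad}"),
])

# The executor receives two inputs (input, plan_json); tell the memory which one is the user turn.
short_term_memory.input_key = "input"

agent = create_react_agent(llm=llm, tools=TOOLS, prompt=react_prompt)

agent_executor = AgentExecutor(
    agent=agent,
    tools=TOOLS,
    memory=short_term_memory,
    verbose=True,  # set False if you don't want to see the ReAct loop
    handle_parsing_errors=True,
    max_iterations=8
)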

print("✅ ReAct agent ready.")


## 6) Run the Agentic RAG query

Tip for teaching:
- Start with a question that clearly needs web search.
- Watch the agent decide to call `tavily_search_results_json` (the Tavily search tool).
- Then watch it write notes via `write_note`.


In [None]:
# Optional: reset long-term notes before demo
clear_notes()

question = "In UAE, what penalties are mentioned for late VAT registration and what are common obligations an auditor checks?"
inputs = {
    "input": question,
    "plan_json": json.dumps(make_plan(question, jurisdiction="UAE"), ensure_ascii=False)
}
result = agent_executor.invoke(inputs)
result["output"][:1200]


## 7) Self-reflection (quality check + revision)

A very simple “self-reflecting mechanism”, done in a single model call:
1) Ask the model to critique the draft against the guardrails
2) Ask it to return a revised answer that fixes the issues it found

This is a common beginner pattern to show self-correction.
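
The same idea can be run as a bounded loop: check, revise, check again. The sketch below swaps the LLM critique for two cheap rule-based checks so that it runs offline; `find_issues`, `reflect`, `stub_revise`, and `REQUIRED_MARKERS` are hypothetical names, and a real system would pass an LLM-backed function as `revise`.

```python
from typing import Callable, List

# Strings a finished answer is expected to contain (assumption for illustration).
REQUIRED_MARKERS = ["NOT legal/tax advice", "Citations"]

def find_issues(draft: str) -> List[str]:
    """Return one issue per required marker that the draft lacks."""
    lowered = draft.lower()
    return [f"missing: {m}" for m in REQUIRED_MARKERS if m.lower() not in lowered]

def reflect(
    draft: str,
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 2,
) -> str:
    """Revise until the checks pass or the round budget is spent."""
    for _ in range(max_rounds):
        issues = find_issues(draft)
        if not issues:
            break
        draft = revise(draft, issues)
    return draft

def stub_revise(draft: str, issues: List[str]) -> str:
    """Hypothetical stand-in for an LLM revision call: append what is missing."""
    for issue in issues:
        draft += "\n" + issue.split(": ", 1)[1]
    return draft

final = reflect("Penalties apply [1].", stub_revise)
```

Bounding the number of rounds matters because a reviewer that never approves would otherwise call the model indefinitely.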


In [None]:
def reflect_and_improve(question: str, draft: str) -> str:
    critique_prompt = f"""You are a reviewer for an audit-style assistant.
Check the DRAFT for:
- missing disclaimer
- missing citations list
- claims not supported by snippets
- tax planning/avoidance or filing steps (must be absent)
- missing explicit limitations

Return:
1) A short critique (bullets)
2) A revised answer that fixes issues (keep it concise)
Question: {question}
DRAFT:
{draft}
"""
    resp = llm.invoke(critique_prompt)
    return resp.content

draft = result["output"]
review = reflect_and_improve(question, draft)
review[:1600]


## 8) Managing memory (demo)

We show both:
- **Short-term memory**: the conversation buffer
- **Long-term notes**: what the agent wrote via `write_note`

This helps beginners see that memory can be *multiple layers*.


In [None]:
print("=== Long-term Notes Memory ===")
print(read_notes())

print("\n=== Short-term Chat History (last few messages) ===")
hist = short_term_memory.load_memory_variables({})["chat_history"]
for msg in hist[-4:]:
    print(f"{msg.type.upper()}: {msg.content[:200]}")


## 9) Managing large tool inventories (beginner view)

In real systems, you can have dozens of tools. Beginners get overwhelmed.
A simple strategy:
- Maintain a **Tool Registry** (one list)
- Group tools by category
- Keep “safe defaults” small (in this notebook: search + memory; a calculator is a common addition)
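
One way to group tools by category is a small registry that hands the agent only the categories it needs. This is a sketch under stated assumptions: `ToolSpec`, `ToolRegistry`, and the category labels are hypothetical and independent of LangChain's `Tool` class.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class ToolSpec:
    name: str
    category: str
    description: str
    func: Callable[[str], str]

class ToolRegistry:
    """Single place where tools are registered and selected by category."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        if spec.name in self._tools:
            raise ValueError(f"duplicate tool name: {spec.name}")
        self._tools[spec.name] = spec

    def select(self, categories: List[str]) -> List[ToolSpec]:
        """Return the tools in the requested categories, in registration order."""
        return [t for t in self._tools.values() if t.category in categories]

registry = ToolRegistry()
registry.register(ToolSpec("web_search", "search", "Search the web.", lambda q: f"results for {q}"))
registry.register(ToolSpec("write_note", "memory", "Save a note.", lambda n: "saved"))
registry.register(ToolSpec("read_notes", "memory", "Read notes.", lambda _: "notes"))

# Safe defaults: hand the agent only the search and memory categories.
safe_defaults = registry.select(["search", "memory"])
```

Rejecting duplicate names at registration time avoids the harder-to-debug case where the agent calls a tool by name and gets the wrong one.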

Below we print the tool registry and show short descriptions.


In [None]:
print("=== Tool Registry ===")
for t in TOOLS:
    # Some tools have different fields; keep printing robust
    desc = getattr(t, "description", "") or getattr(t, "args", "")
    print(f"- {t.name}: {desc}")


## End Summary

### What you built
A **single-agent Agentic RAG system** that:
- Plans first (simple JSON plan)
- Uses **ReAct** to decide when to call tools
- Retrieves evidence via **LangChain Tavily** (no scraping)
- Uses **memory**:
  - short-term chat history
  - long-term notes memory tools
- Uses **self-reflection** to critique and improve the answer
- Handles a **tool inventory** via a registry

### Why it matters
This pattern scales to real audit/compliance assistants:
- Auditors ask scenarios → agent searches → extracts evidence → produces structured notes
- Guardrails reduce hallucinations and unsafe advice
- Memory helps carry context across multi-turn investigations
