# Tutorial 6 - The ReAct Agent (Reason + Act)

## Where You Are in the Learning Journey

```
 Tutorials 1-5      Tutorial 6         Tutorial 7         Tutorial 8
 RAG Fundamentals   ReAct Agent        Reflection         State
 (the retrieval     (you are here)     Self-Correction    Management
  pipeline)                            (T7)               (T8)
```

**What this tutorial adds:** a ReAct (Reason + Act) agent that uses the RAG
retrieval system built in Tutorials 1-5 as a *tool*.

Instead of directly retrieving and answering in one fixed pipeline, the agent
decides at each step *whether* to retrieve, *what* to search for, and *when*
it has enough information to answer.

**What you will learn in this tutorial:**
- What an AI agent is and how it differs from a fixed pipeline
- What the ReAct pattern is (Reason + Act)
- What a tool call is and how the agent chooses which tool to use
- How to trace a Thought-Action-Observation loop step by step
- When an agent approach is better than a fixed RAG pipeline

**Prerequisites:** Tutorials 1-4 (understand RAG retrieval). Python basics.

```mermaid
flowchart TD
    Q[User Question] --> T[Thought: what do I need?]
    T --> A[Action: call a tool]
    A --> O[Observation: tool result]
    O --> T2{Have enough info?}
    T2 -- No --> T
    T2 -- Yes --> F[Final Answer]
```


## What Is an AI Agent?

### The Difference Between a Pipeline and an Agent

A **pipeline** is a fixed sequence of steps. Every question goes through the
same steps in the same order, whether or not all steps are needed.

```
Fixed RAG Pipeline (Tutorials 1-5):
  Question -> Embed -> Retrieve top-5 -> Generate answer
  (always exactly these 3 steps, always top-5, always one retrieval)
```

An **agent** is a system that decides what to do next at each step. It has:
- A set of **tools** it can call (e.g., retrieve, search, calculate)
- A reasoning loop that chooses which tool to use and when to stop

```
ReAct Agent (this tutorial):
  Question -> Think -> Maybe retrieve once -> Think -> Maybe retrieve again
           -> Think -> Answer (stops when it decides it has enough)
```

### Why 'ReAct'?

ReAct stands for **Reason + Act**. The agent alternates between:
- **Reasoning**: generating a thought about what to do next
- **Acting**: calling a tool and observing the result

This was introduced in the 2022 paper 'ReAct: Synergizing Reasoning and Acting
in Language Models' (Yao et al.).

### The Three Things in Each Step

Each ReAct cycle produces three things:

| Step | What it is | Example |
|------|-----------|--------|
| Thought | The agent's reasoning about what to do | 'I need to find the leave policy.' |
| Action  | Which tool to call and what input to give | retrieve('annual leave entitlement') |
| Observation | What the tool returned | 'Employees get 25 days per year...' |

After the observation, the agent thinks again and decides: do I have enough to
answer the question, or do I need to call another tool?
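
The cycle in the table can be sketched as a small loop. This is a minimal illustration, not the implementation behind `run_react_loop`: the `Step` dataclass, the `react_loop` function, the `"finish"` action name, and the scripted policy that stands in for the language model are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    """One Thought-Action-Observation cycle (hypothetical structure)."""
    thought: str
    action: str
    action_input: str
    observation: str


def react_loop(
    question: str,
    tools: Dict[str, Callable[[str], str]],
    policy: Callable[[str, List[Step]], Tuple[str, str, str]],
    max_steps: int = 5,
) -> Tuple[str, List[Step]]:
    """Alternate between reasoning (policy) and acting (tool call)."""
    steps: List[Step] = []
    for _ in range(max_steps):
        # Reason: the policy plays the role of the LLM and picks the next move.
        thought, action, action_input = policy(question, steps)
        if action == "finish":
            # The policy decided it has enough information to answer.
            return action_input, steps
        # Act: call the chosen tool, then record what it returned.
        observation = tools[action](action_input)
        steps.append(Step(thought, action, action_input, observation))
    return "No answer within the step budget.", steps


def scripted_policy(question: str, steps: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the model: retrieve once, then answer."""
    if not steps:
        return ("I need to find the leave policy.", "retrieve", "annual leave entitlement")
    return ("The observation states the entitlement.", "finish", "Employees get 25 days per year.")


def fake_retrieve(query: str) -> str:
    """Stand-in for the RAG retriever."""
    return "Employees get 25 days per year..."


answer, steps = react_loop(
    "How many days of annual leave do employees get?",
    {"retrieve": fake_retrieve},
    scripted_policy,
)
for step in steps:
    print(f"Thought: {step.thought}")
    print(f"Action: {step.action}({step.action_input!r})")
    print(f"Observation: {step.observation}")
print("Final answer:", answer)
```

Replacing `scripted_policy` with a call to a language model is what turns this skeleton into a real agent; the loop structure itself stays the same.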


## What Is a Tool?

A **tool** is any Python callable that accepts a plain string as input and
returns a plain string as output. The agent receives a dictionary that maps each
tool name to its callable.

In this tutorial the tool is the RAG retriever from Tutorials 1-5:

```
Tool name   : 'retrieve'
Input       : a search query (string)
Output      : the top-3 retrieved chunks joined as a single string

Example:
  Input  -> 'annual leave entitlement'
  Output -> 'Chunk 1: Employees are entitled to 25 days...
             Chunk 2: Leave accrues monthly...
             Chunk 3: Unused leave may be carried over...'
```

By packaging retrieval as a tool, the agent can:
- Decide when to retrieve (not every question needs retrieval)
- Choose what to search for (reformulate the query if the first one misses)
- Make multiple retrieval calls for complex questions
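
The string-in, string-out contract can be sketched with a toy registry. Everything here is an assumption made for illustration: the in-memory `HANDBOOK` list, the word-overlap `toy_retrieve` function (a stand-in for the dense retriever), and the `call_tool` helper are hypothetical and are not part of `rag_tutorials`.

```python
import re
from typing import Callable, Dict

Tool = Callable[[str], str]

# Tiny stand-in corpus so the example runs without an index or API key.
HANDBOOK = [
    "Employees are entitled to 25 days of annual leave per year.",
    "Leave accrues monthly.",
    "Unused leave may be carried over.",
    "A VPN is required for remote work.",
]


def _words(text: str) -> set:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def toy_retrieve(query: str, top_k: int = 2) -> str:
    """Rank chunks by word overlap with the query and join the best as one string."""
    query_words = _words(query)
    scored = [(len(query_words & _words(text)), text) for text in HANDBOOK]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    hits = [text for score, text in ranked if score > 0]
    if not hits:
        return "No relevant chunks found."
    return "\n\n".join(f"Chunk {i}: {text}" for i, text in enumerate(hits[:top_k], start=1))


def call_tool(tools: Dict[str, Tool], name: str, tool_input: str) -> str:
    """Dispatch by name; an unknown name becomes an observation, not a crash."""
    if name not in tools:
        return f"Unknown tool '{name}'. Available tools: {', '.join(sorted(tools))}"
    return str(tools[name](tool_input))


TOOLS: Dict[str, Tool] = {"retrieve": toy_retrieve}

print(call_tool(TOOLS, "retrieve", "annual leave entitlement"))
print(call_tool(TOOLS, "search", "annual leave entitlement"))
```

Returning an error string for an unknown tool name is one possible design choice: it lets the agent read the problem as an observation and pick a valid tool on its next step.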


In [None]:
import importlib.util  # find_spec lives in the submodule; `import importlib` alone may not load it
import os
from pathlib import Path
import shutil
import subprocess
import sys

import pandas as pd
from dotenv import load_dotenv

if shutil.which("uv") is None:
    print("uv not found. Installing with pip...")
    subprocess.run([sys.executable, "-m", "pip", "install", "uv"], check=True)

cwd = Path.cwd().resolve()
repo_root = next(
    (path for path in [cwd, *cwd.parents] if (path / "pyproject.toml").exists() and (path / "src").exists()),
    cwd,
)
os.chdir(repo_root)
src_path = repo_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

REQUIRED_PACKAGES = ["openai", "chromadb", "numpy", "pandas", "rank_bm25", "sentence_transformers", "dotenv"]
PIP_NAME_MAP = {"rank_bm25": "rank-bm25", "sentence_transformers": "sentence-transformers", "dotenv": "python-dotenv"}

def find_missing(packages):
    importlib.invalidate_caches()
    return [pkg for pkg in packages if importlib.util.find_spec(pkg) is None]

missing = find_missing(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", missing)
    subprocess.run(["uv", "sync"], check=True)

missing_after_sync = find_missing(REQUIRED_PACKAGES)
if missing_after_sync:
    pip_targets = [PIP_NAME_MAP.get(pkg, pkg) for pkg in missing_after_sync]
    subprocess.run([sys.executable, "-m", "pip", "install", *pip_targets], check=True)

final_missing = find_missing(REQUIRED_PACKAGES)
if final_missing:
    raise ImportError(f"Dependencies still missing: {final_missing}")

from rag_tutorials.io_utils import load_handbook_documents, load_queries
from rag_tutorials.chunking import semantic_chunk_documents
from rag_tutorials.pipeline import build_dense_retriever
from rag_tutorials.qa import answer_with_context

load_dotenv()
if not os.getenv("OPENAI_API_KEY"):
    raise EnvironmentError("OPENAI_API_KEY is required")

embedding_model = os.getenv("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small")
chat_model = os.getenv("OPENAI_CHAT_MODEL", "gpt-4.1-mini")

handbook_path = Path("data/handbook_manual.txt")
queries_path = Path("data/queries.jsonl")
if not handbook_path.exists() or not queries_path.exists():
    raise FileNotFoundError("Run: uv run python scripts/generate_data.py")

documents = load_handbook_documents(handbook_path)
queries = load_queries(queries_path)
chunks = semantic_chunk_documents(documents)
dense_retriever, _ = build_dense_retriever(
    chunks=chunks,
    collection_name="agent_tutorial_dense",
    embedding_model=embedding_model,
)

In [None]:
# Build the retrieve tool from the dense retriever (the Tutorials 1-4 pipeline).
# The tool returns the retriever output as a plain string so the agent can read it.

from rag_tutorials.agent_loop import run_react_loop

TOP_K = 3

def retrieve_tool(query: str) -> str:
    """Retrieve the TOP_K most relevant chunks and return them as one formatted string."""
    results = dense_retriever(query, top_k=TOP_K)
    if not results:
        return "No relevant chunks found."
    parts = [f"Chunk {i+1} [{r.chunk_id}]: {r.text}" for i, r in enumerate(results)]
    return "\n\n".join(parts)

tools = {"retrieve": retrieve_tool}

print("Tool registered: retrieve")
print("Quick smoke test:")
sample_output = retrieve_tool("remote work VPN policy")
print(sample_output[:300], "...")

## Novice Trace: Watching the Agent Think

Before running a full evaluation, let us trace a single question step by step.
Each printed block shows the Thought, Action, and Observation recorded for one step.

**Question:** 'What is the maximum number of days an employee can work
internationally before needing a Global Mobility case?'

What to watch for in the trace:
- The agent produces a **Thought** explaining why it is calling the tool
- The **Action** is the tool name and the search query the agent chose
- The **Observation** is what the retriever returned
- The agent may iterate several times before it is confident enough to answer


In [None]:
# Single-question agent trace

question = "What is the maximum number of days an employee can work internationally before needing a Global Mobility case?"

result = run_react_loop(
    question=question,
    tools=tools,
    model=chat_model,
    max_steps=5,
)

print("QUESTION:", result.question)
print("="*70)

for i, step in enumerate(result.steps, start=1):
    print(f"\n--- Step {i} ---")
    print(f"Thought     : {step.thought}")
    print(f"Action      : {step.action}({step.action_input!r})")
    # Truncate long observations so the trace stays readable
    observation = step.observation
    if len(observation) > 200:
        observation = observation[:200] + "..."
    print(f"Observation : {observation}")

print("\n" + "="*70)
print("FINAL ANSWER:", result.answer)
print(f"\nSteps taken : {len(result.steps)}")

## Evaluating Agent Answers Across Multiple Questions

Now we run the agent on several questions from the shared query set and collect
per-question statistics.

We measure:
- **Steps**: number of Thought-Action-Observation cycles per question, and the
  average across questions
- **Answer length**: word count of the answer (the `answer_tokens` column), a
  rough proxy for whether the agent produced a complete response
- **Latency**: wall-clock time per question in milliseconds

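Note: The `recall_at_k` / `mrr` metrics used in Tutorials 1-5 do not apply
directly here because the agent writes its own retrieval queries and decides how
many retrieval calls to make, so there is no single ranked list per question to
score. (The tool's top-k itself stays fixed at `TOP_K = 3`.)
Tutorial 8 (State Management) shows how to inspect individual agent runs in detail.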
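
Step counts and tool-call totals can be derived from the recorded traces. The sketch below uses hypothetical `Step` and `Run` dataclasses that only mirror the attribute names used in this notebook (`steps`, `answer`, `action`, `action_input`); the real result objects returned by `run_react_loop` may carry more fields, and the sample runs are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class Step:
    action: str        # tool name that was called
    action_input: str  # query passed to the tool


@dataclass
class Run:
    answer: str
    steps: List[Step] = field(default_factory=list)


def summarize(runs: List[Run]) -> Dict[str, object]:
    """Aggregate step counts, per-tool call totals, and answer word counts."""
    tool_calls = Counter(step.action for run in runs for step in run.steps)
    return {
        "avg_steps": mean(len(run.steps) for run in runs),
        "tool_calls": dict(tool_calls),
        "avg_answer_words": mean(len(run.answer.split()) for run in runs),
    }


# Made-up traces, standing in for real agent results.
runs = [
    Run(
        answer="Employees get 25 days per year.",
        steps=[Step("retrieve", "annual leave entitlement")],
    ),
    Run(
        answer="VPN is required for all remote work.",
        steps=[
            Step("retrieve", "remote work policy"),
            Step("retrieve", "VPN requirement"),
        ],
    ),
]

summary = summarize(runs)
print(summary)
```

Counting by `step.action` keeps the summary valid if more tools are registered later, since each tool name gets its own total.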


In [None]:
# Run the ReAct agent on the first 5 queries and collect step statistics

import time

rows = []
eval_queries = queries[:5]

for q in eval_queries:
    start = time.perf_counter()
    agent_result = run_react_loop(
        question=q.question,
        tools=tools,
        model=chat_model,
        max_steps=5,
    )
    elapsed_ms = (time.perf_counter() - start) * 1000
    rows.append({
        "query_id": q.query_id,
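        # Append an ellipsis only when the question was actually truncated
        "question": q.question[:60] + ("..." if len(q.question) > 60 else ""),
        "steps": len(agent_result.steps),
        # Whitespace-separated word count, not model tokens
        "answer_tokens": len(agent_result.answer.split()),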
        "latency_ms": round(elapsed_ms, 1),
    })

df = pd.DataFrame(rows)
print(df.to_string(index=False))
print(f"\nAverage steps   : {df['steps'].mean():.1f}")
print(f"Average latency : {df['latency_ms'].mean():.0f} ms")

## Learning Checkpoint: ReAct Agent

### What Works

- The agent decides what to search for rather than using the raw user question
  directly. This means it can reformulate a vague question into a better search
  query.
- The agent can make multiple retrieval calls if one is not enough, which a
  fixed pipeline cannot do.
- The loop ends when the agent decides it has enough context, with `max_steps`
  (5 in this notebook) as an upper bound on the number of steps.

### What Does Not Work Well

- The agent accepts its first answer without checking whether it is accurate.
  If the retrieved context was misleading, the answer may be wrong and the agent
  will not notice.
- There is no quality gate: the agent cannot self-critique or revise.

### Why Move to Tutorial 7?

Tutorial 7 adds a **Critic** agent that reviews every Worker answer before it is
returned to the user. If the answer is incomplete or inaccurate, the Critic sends
feedback back to the Worker for revision.
