# ReAct Agent: Reason + Act Loop Explained (Demo Notebook)

This notebook is built to **teach** how a ReAct-style agent works using the structure from `ReAct-Agent-from-Scratch` (tools, prompts, loop with Thought → Action → Observation).

We will do 5 things:
1. Understand why ReAct exists (what problem it solves)
2. Understand the loop: **Thought → Action → Observation → repeat**
3. Map that loop to code structure in the repo (agent + tools + prompts + UI)
4. Run a tiny local demo loop
5. Show how to extend the agent with your own tool


## 0. Why do we even need ReAct?

LLMs are good at *thinking in text*, but without tools they are bad at:
- doing math reliably
- searching the web live
- fetching "today's weather in Hyderabad"
- looking up factual data from Wikipedia

ReAct fixes this by letting the model **think, then call tools** in a deliberate loop.

The agent does:
1. `Thought:` "I should look up the weather in Hyderabad."
2. `Action:` call `weather_tool(city="Hyderabad")`
3. `Observation:` tool returns "30°C, mostly cloudy"
4. `Thought:` "Now I can answer the user with accurate info."

This means:
- the model grounds its answer in tool output instead of guessing
- it plans which step to take next
- it checks its reasoning against real tool results


## 1. High-level loop

The standard ReAct loop looks like this (simplified):

```text
User Question
↓
LLM produces:
  Thought: what do I need?
  Action: which tool should I call and with what arguments?
PAUSE
Tool is executed in Python
↓
We capture:
  Observation: tool result
↓
LLM is called again with the full transcript so far
  (Thought, Action, Observation history)
↓
Either:
  - pick another tool
  - OR produce final answer for the user
```

So the agent is literally *thinking, acting, reading the result, thinking again*.
That's why it's called **ReAct**: **Re**ason + **Act**.
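
The "full transcript so far" step is the easiest one to miss: on every iteration the prompt is rebuilt from the question plus every earlier Thought / Action / Observation. Here is a minimal sketch of that idea, assuming a plain-text prompt. `Step` and `build_prompt` are hypothetical names used only for illustration; they are not taken from the repo.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    """One completed Thought -> Action -> Observation cycle (hypothetical)."""
    thought: str
    action: str
    observation: str


def build_prompt(question: str, history: List[Step]) -> str:
    """Rebuild the text sent to the LLM from the question and all prior steps."""
    lines = [f"Question: {question}"]
    for step in history:
        lines.append(f"Thought: {step.thought}")
        lines.append(f"Action: {step.action}")
        lines.append(f"Observation: {step.observation}")
    # End with an open "Thought:" so the model continues from here.
    lines.append("Thought:")
    return "\n".join(lines)


history = [
    Step(
        thought="I should look up the weather in Hyderabad.",
        action='weather_tool(city="Hyderabad")',
        observation="30°C, mostly cloudy",
    )
]
print(build_prompt("What is the weather in Hyderabad?", history))
```

Because the whole history is resent each time, the prompt grows with every step, which is one reason real agents cap the number of iterations.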


## 2. Repository mental model

In `ReAct-Agent-from-Scratch`, the repo is organized something like this (based on the GitHub listing):

```text
ReAct-Agent-from-Scratch/
    agent.py          <- the main agent loop (ReAct core logic)
    prompts/          <- system prompts / instruction templates
    tools/            <- python tools: calculator, web search, weather, wikipedia
    utils/            <- helpers (parsing, formatting, etc.)
    web_app.py        <- Streamlit UI for interactive demo
    test_queries.txt  <- example queries
    requirements.txt  <- deps: streamlit, requests, openai/llm client, etc.
```

Conceptually:
- `agent.py` = the brain
- `tools/` = hands + senses
- `web_app.py` = face / UI

You can teach this analogy in leadership talks. It lands.


## 3. Let's build a *mini* ReAct Agent locally in this notebook

We'll implement:
- A **Tool Registry** (a dict of callable tools)
- A tiny **LLM policy stub** (pretend LLM that decides which tool to call)
- A **ReAct loop runner**

In your actual repo, the model would be OpenAI / DeepSeek / Gemini etc., and it would generate Thought / Action text.  Here we will simulate that behavior in a controlled way so it's 100% offline and reproducible for teaching.


In [None]:
from typing import Dict, Callable, Any, Tuple

# -----------------------------
# 1. Tools
# -----------------------------

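import ast
import operator

# Only these binary operators are allowed in calculator expressions.
_CALC_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def _eval_node(node: ast.AST) -> float:
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _CALC_OPS:
        return _CALC_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand)
    raise ValueError("unsupported expression")

def tool_calculator(expression: str) -> str:
    """
    Very dumb calculator for demo.
    Supports +, -, *, / on numbers, including parentheses.
    Parses the expression instead of calling eval(), so arbitrary code cannot run.
    """
    try:
        result = _eval_node(ast.parse(expression, mode="eval").body)
        return f"{result}"
    except Exception as e:
        return f"ERROR: {e}"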

def tool_wikipedia_stub(query: str) -> str:
    """
    Offline fake wikipedia tool.
    In the real repo, this would call Wikipedia and retrieve summary text.
    """
    db = {
        "hyderabad": "Hyderabad is a major tech hub in India, known for HITEC City and rich food culture.",
        "react agent": "ReAct is a prompting / agent pattern combining reasoning steps and tool usage."
    }
    return db.get(query.lower(), "No offline summary found.")

# Tool registry – like tools/ folder in repo
TOOLS: Dict[str, Callable[..., str]] = {
    "calculator": tool_calculator,
    "wikipedia": tool_wikipedia_stub,
}
TOOLS
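
An LLM can only choose tools it has been told about, so a registry like this is usually paired with a short text description of each tool that goes into the prompt. The sketch below shows one possible way to derive that text from docstrings. `describe_tools` and the two stand-in tools are hypothetical and are not taken from the repo.

```python
import inspect
from typing import Callable, Dict


def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression."""
    return "not implemented in this sketch"


def wikipedia(query: str) -> str:
    """Return a short summary for a topic."""
    return "not implemented in this sketch"


def describe_tools(tools: Dict[str, Callable[..., str]]) -> str:
    """Build one '- name: summary' line per tool from the first docstring line."""
    lines = []
    for name, fn in tools.items():
        doc = inspect.getdoc(fn) or "No description available."
        lines.append(f"- {name}: {doc.splitlines()[0]}")
    return "\n".join(lines)


print(describe_tools({"calculator": calculator, "wikipedia": wikipedia}))
```

Keeping the description next to the function means a newly registered tool shows up in the prompt without a second place to edit.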

### 3.1 The 'LLM policy'
In the real repo, the agent asks the LLM:
- Given conversation so far,
- Should I answer now,
- or should I call a tool? Which one? With what input?

The LLM responds in a standardized format like:

```text
Thought: I should look up X
Action: wikipedia
Action Input: "Hyderabad"
```

Here in the notebook, we'll *fake* the model policy using handwritten rules.  
This keeps the logic transparent for students.
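
With a real model, the agent has to turn that text back into something Python can dispatch on. The sketch below is one way to do it with a regular expression, assuming the model follows the three-label format shown above exactly. `parse_llm_output` is a hypothetical helper, not the repo's parser.

```python
import re
from typing import Optional, Tuple

# Matches the Thought / Action / Action Input layout shown above.
ACTION_RE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*"
    r"Action:\s*(?P<action>\w+)\s*"
    r"Action Input:\s*(?P<input>.*)",
    re.DOTALL,
)


def parse_llm_output(text: str) -> Optional[Tuple[str, str, str]]:
    """Return (thought, action_name, action_input), or None if the format is off."""
    match = ACTION_RE.search(text)
    if match is None:
        return None
    # Drop surrounding whitespace and the optional double quotes around the input.
    action_input = match.group("input").strip().strip('"')
    return match.group("thought"), match.group("action"), action_input


sample = 'Thought: I should look up X\nAction: wikipedia\nAction Input: "Hyderabad"'
print(parse_llm_output(sample))
```

Returning `None` on a mismatch lets the caller decide what to do when the model drifts from the format, for example by feeding an error back as the next Observation.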


In [None]:
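def simple_policy(user_message: str, scratchpad: str) -> Tuple[str, str, str]:
    """
    This simulates what the LLM would output.
    Returns (thought, action_name, action_input).

    Rules for demo:
    - if the scratchpad already holds an Observation -> answer with it
    - if user asks arithmetic -> call calculator
    - if user asks 'who/what is ...' -> call wikipedia
    - else -> answer directly (no tool)
    """
    # A tool has already run: finalize with its result instead of calling it again.
    # Without this check the loop would repeat the same tool call until max_steps.
    if "Observation:" in scratchpad:
        last_observation = scratchpad.rsplit("Observation:", 1)[1].strip()
        return (
            "I now have the information I need to answer.",
            "final",
            last_observation,
        )

    lower_msg = user_message.lower()

    # Require a digit too, so hyphenated words are not mistaken for arithmetic.
    has_digit = any(ch.isdigit() for ch in lower_msg)
    if has_digit and any(op in lower_msg for op in ["+", "-", "*", "/"]):
        return (
            "I should evaluate this math expression using calculator.",
            "calculator",
            user_message,
        )
    if any(x in lower_msg for x in ["who is", "what is", "tell me about"]):
        topic = (
            lower_msg
            .replace("who is", "")
            .replace("what is", "")
            .replace("tell me about", "")
            .strip(" ?")
        )
        return (
            f"I should look up {topic} on wikipedia.",
            "wikipedia",
            topic,
        )
    return (
        "I can answer directly without using a tool.",
        "final",
        user_message,
    )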

simple_policy("What is Hyderabad?", scratchpad="")

### 3.2 The ReAct loop runner

This is the heart.  
It mirrors what `agent.py` does in ReAct-Agent-from-Scratch style repos:
1. Show the current transcript (chat history, thoughts, observations)
2. Ask policy (LLM) what to do next
3. If policy says "final", we return answer
4. Else we execute the tool, record `Observation`, and continue

We will also build a human-readable trace so you can show students EXACTLY how the agent is thinking.


In [None]:
def react_loop(user_message: str, max_steps: int = 3) -> Dict[str, Any]:
    reasoning_trace = ""  # scratchpad that is fed back to the policy on every step
    final_answer = None

    for step in range(max_steps):
        thought, action_name, action_input = simple_policy(user_message, reasoning_trace)

        reasoning_trace += f"Thought: {thought}\n"

        if action_name == "final":
            final_answer = f"Answer: {action_input}"
            reasoning_trace += final_answer + "\n"
            break

        tool_fn = TOOLS.get(action_name)
        if tool_fn is None:
            observation = f"ERROR: tool {action_name} not found"
        else:
            observation = tool_fn(action_input)

        reasoning_trace += f"Action: {action_name}({action_input})\n"
        reasoning_trace += f"Observation: {observation}\n"

    return {
        "trace": reasoning_trace,
        "final_answer": final_answer,
    }

demo_result = react_loop("What is Hyderabad?")
print(demo_result["trace"])

### 3.3 Interpreting the trace

In the trace we captured:
- `Thought:` model deciding what it needs to do
- `Action:` which tool it chose
- `Observation:` what came back from the tool
- possibly `Answer:` if it can finalize now

This is **auditability**:
- Compliance can review why a tool was called
- We can show that a number came from the calculator or a database rather than from the model's guess
- We can debug reasoning loops when something goes wrong

This is a major reason enterprises like ReAct.
It is not just "chatbot"; it is **controlled reasoning + transparent tool usage**.
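
For review purposes, it can help to turn the text trace into structured records. The sketch below assumes each trace entry sits on a single line with one of the four labels used in this notebook. `trace_to_events`, `tools_called`, and the hand-written `sample_trace` are hypothetical and only illustrate the idea.

```python
from typing import Dict, List

KNOWN_LABELS = {"Thought", "Action", "Observation", "Answer"}


def trace_to_events(trace: str) -> List[Dict[str, str]]:
    """Split a trace into {'kind', 'content'} records, one per labelled line."""
    events = []
    for line in trace.splitlines():
        kind, sep, content = line.partition(": ")
        if sep and kind in KNOWN_LABELS:
            events.append({"kind": kind, "content": content})
    return events


def tools_called(events: List[Dict[str, str]]) -> List[str]:
    """List tool names in call order, taken from 'Action: name(input)' lines."""
    return [e["content"].split("(", 1)[0] for e in events if e["kind"] == "Action"]


# Hand-written example in the same layout as the notebook's trace.
sample_trace = (
    "Thought: I should evaluate this math expression using calculator.\n"
    "Action: calculator(123 * 7 + 10)\n"
    "Observation: 871\n"
    "Answer: 871\n"
)
events = trace_to_events(sample_trace)
print(tools_called(events))
```

Records like these are easier to store, filter, and count than raw text, for example to check which tools were used for a given answer.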


## 4. Let's test with math

We'll ask the agent:
> `123 * 7 + 10`

It should:
1. Decide to call the calculator tool
2. Get the observation back (the math result)
3. Produce an answer


In [None]:
math_demo = react_loop("123 * 7 + 10")
print("=== TRACE ===")
print(math_demo["trace"])
print()
print("=== FINAL ANSWER ===")
print(math_demo["final_answer"])

## 5. Adding your own tool (critical teaching moment)

In `ReAct-Agent-from-Scratch`, there are multiple tools defined: Calculator, Wikipedia, Web Search, Weather, etc.  
You can add *any* tool that is just a Python function.

Example: `hotel_recommender` that returns a curated business hotel in Hyderabad.
This turns the agent from "generic Q&A" into "domain assistant with policy".


In [None]:
def tool_hotel_recommender(city: str) -> str:
    if city.lower() == "hyderabad":
        return (
            "MetroLink Executive Suites (~₹5400/night). Walkable to HITEC City offices. "
            "Breakfast+gym included. Good for business stays."
        )
    return "No curated business-friendly pick for that city yet."

TOOLS["hotel_recommender"] = tool_hotel_recommender

def business_policy(user_message: str, scratchpad: str) -> Tuple[str, str, str]:
    lower_msg = user_message.lower()
    if "business trip" in lower_msg and "hyderabad" in lower_msg:
        return (
            "I should fetch a curated business hotel for Hyderabad.",
            "hotel_recommender",
            "Hyderabad",
        )
    return simple_policy(user_message, scratchpad)

def react_loop_business(user_message: str, max_steps: int = 3):
    reasoning_trace = ""
    final_answer = None

    for step in range(max_steps):
        thought, action_name, action_input = business_policy(user_message, reasoning_trace)
        reasoning_trace += f"Thought: {thought}\n"

        if action_name == "final":
            final_answer = f"Answer: {action_input}"
            reasoning_trace += final_answer + "\n"
            break

        tool_fn = TOOLS.get(action_name)
        if tool_fn is None:
            observation = f"ERROR: tool {action_name} not found"
        else:
            observation = tool_fn(action_input)

        reasoning_trace += f"Action: {action_name}({action_input})\n"
        reasoning_trace += f"Observation: {observation}\n"

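        # Demo shortcut: answer with the first tool result instead of looping again,
        # since business_policy does not inspect the scratchpad.
        final_answer = f"Answer: {observation}"
        break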

    return {
        "trace": reasoning_trace,
        "final_answer": final_answer,
    }

biz_demo = react_loop_business("Suggest a hotel for my business trip to Hyderabad.")
print("=== TRACE ===")
print(biz_demo["trace"])
print()
print("=== FINAL ANSWER ===")
print(biz_demo["final_answer"])

### Why this matters

Now it's not just Q&A. It's policy-aware action.

- You can inject company rules (approved hotels, max nightly rate)
- You can pull LIVE internal data (inventory, budget, compliance)
- You still keep the transparent `Thought / Action / Observation` log to prove why the agent said what it said
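
As an illustration of the first bullet, company rules can live inside the tool itself, so the agent only ever sees compliant options. The sketch below is hypothetical throughout: the `Hotel` record, the catalogue entries, the "Example ..." hotels, and the ₹8000 cap are made-up assumptions, not real policy or repo code.

```python
from typing import List, NamedTuple


class Hotel(NamedTuple):
    name: str
    city: str
    nightly_rate_inr: int
    approved: bool


# Hypothetical catalogue; in practice this would come from an internal system.
CATALOGUE: List[Hotel] = [
    Hotel("MetroLink Executive Suites", "hyderabad", 5400, True),
    Hotel("Example Grand Palace", "hyderabad", 12000, True),
    Hotel("Example Budget Inn", "hyderabad", 2100, False),
]

MAX_NIGHTLY_RATE_INR = 8000  # assumed company cap, for illustration only


def tool_policy_hotels(city: str) -> str:
    """Return only hotels that are approved and within the nightly rate cap."""
    allowed = [
        h for h in CATALOGUE
        if h.city == city.lower()
        and h.approved
        and h.nightly_rate_inr <= MAX_NIGHTLY_RATE_INR
    ]
    if not allowed:
        return "No policy-compliant hotel found for that city."
    return "; ".join(f"{h.name} (₹{h.nightly_rate_inr}/night)" for h in allowed)


print(tool_policy_hotels("Hyderabad"))
```

Because the filter runs in Python rather than in the prompt, the rule holds even if the model's reasoning goes off track.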

This is how you sell "agentic AI" to leadership:
1. It can reason.
2. It can act.
3. It can explain itself.


## 6. How this aligns with the original repo

In the repo (`agent.py`, `tools/`, `web_app.py`) you referenced:

- `web_app.py` (Streamlit) is the front-end chat interface.
- `tools/` is like the `TOOLS` dict we built (calculator, wikipedia, weather, etc.).
- `agent.py` is effectively a richer version of `react_loop`, where:
  - The LLM (OpenAI / Gemini / etc.) generates the Thought / Action / Action Input.
  - Python executes that tool.
  - The Observation goes back into the LLM's context.
  - The loop continues until the LLM decides to answer the user.
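
The four steps above can be sketched with the model passed in as a plain function. This is not the repo's `agent.py`: `run_agent`, `scripted_llm`, the `lookup` tool, and the `Final Answer:` label are all assumptions made for the sketch, and the scripted function stands in for a real LLM client so the example runs offline.

```python
import re
from typing import Callable, Dict

ACTION_RE = re.compile(r"Action:\s*(\w+)\s*Action Input:\s*(.*)", re.DOTALL)
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def run_agent(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Call the model, run the tool it names, feed the result back, repeat."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        match = ACTION_RE.search(reply)
        if match is None:
            observation = "ERROR: could not parse an action"
        else:
            name = match.group(1)
            arg = match.group(2).strip().strip('"')
            tool = tools.get(name)
            observation = tool(arg) if tool else f"ERROR: tool {name} not found"
        # The observation goes back into the context for the next model call.
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached without a final answer."


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: one tool call, then a final answer."""
    if "Observation:" not in transcript:
        return 'Thought: I need to look this up.\nAction: lookup\nAction Input: "hyderabad"'
    last = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have what I need.\nFinal Answer: {last}"


tools = {"lookup": lambda query: f"stub result for {query}"}
print(run_agent("Tell me about Hyderabad", scripted_llm, tools))
```

Swapping `scripted_llm` for a function that calls a hosted model is the only change needed to move from this offline sketch toward a live agent. The step limit guards against a model that never produces a final answer.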

This notebook is your teaching companion:
- It explains ReAct to students and stakeholders
- It demonstrates the reasoning trace live
- It shows exactly where to plug in company logic (like hotel policy tools)

This is how you go from "react agent from scratch" to "agentic AI product for business".
