# Day 2 — Exercise 5: Agent Planning and Tool Integration (Plan–Act–Observe)

## Background & Plan
**Why this matters.** Real-world tasks rarely fit in one prompt. Agents follow a **Plan → Act → Observe** loop to break work into steps and call external tools (search, calculator, code) to extend what an LLM can do.

**What we’ll build (final objective).** A minimal yet capable **Plan–Act–Observe agent** that:
- Plans the next action given the current goal and scratchpad.
- Invokes tools: **DuckDuckGo web search**, a **safe calculator**, and a **restricted Python executor**.
- Observes results, updates its scratchpad, and iterates until it can produce a final answer, with brief citations when web search is used.

**How steps unfold (basic → intermediate → advanced).**
1. **Stage A (Core loop):** Prompt, scratchpad, and parsing of Thought/Action/Action Input/Observation.
2. **Stage B (Tools):** Plug in three tools (search, calc, code). Add a simple tool router.
3. **Stage C (Control & Safety):** Timeouts, step limits, and result formatting with optional citations.
4. **Stage D (Demos & Tests):** Three diverse tasks that trigger multiple tool calls.

**Requirements.**
- **Python:** 3.10+
- **Libraries (pinned):**
  - `openai==1.40.2` (skipped when `LITELLM_BASE_URL` is set; that path calls the endpoint with `requests`, which must then be installed)
  - `duckduckgo-search==5.3.1`
  - `tiktoken==0.7.0` (token counting, optional)
- **Env vars:**
  - `OPENAI_API_KEY` (or set `LITELLM_BASE_URL` and `LITELLM_MODEL` if routing through LiteLLM; keep `OPENAI_API_KEY` empty in that case.)

> ⚠️ **No secrets in code.** Read keys from environment variables only.

## Plan Outline
- **Stage A → Core Plan–Act–Observe Loop**
- **Stage B → Add Tools (Search, Calculator, Python Exec)**
- **Stage C → Safety, Limits, and Answer Formatting**
- **Stage D → Testing with 3 Tasks**
- **Notes + Wrap-Up**

## Stage A — Core Plan–Act–Observe Loop
*Goal:* Implement a minimal loop: the LLM proposes a `Thought` and, optionally, a tool `Action` with an `Action Input`; we run the tool, capture an `Observation`, append the step to the scratchpad, and continue.

In [2]:
import os
# Read the key from the environment only; never hard-code it or print its value.
if not os.getenv("OPENAI_API_KEY") and not os.getenv("LITELLM_BASE_URL"):
    raise RuntimeError("Set OPENAI_API_KEY (or LITELLM_BASE_URL and LITELLM_MODEL) before running this notebook.")
print("OPENAI_API_KEY set:", bool(os.getenv("OPENAI_API_KEY")))


### A1. System & helper utilities

In [3]:
# A1: Imports and environment
import os, time, json, re, math, textwrap
from dataclasses import dataclass, field
from typing import Dict, List, Callable, Optional, Any

# Model selection: default to OpenAI; allow LiteLLM-style routing via envs
USE_LITELLM = bool(os.getenv("LITELLM_BASE_URL"))
MODEL_DEFAULT = os.getenv("LITELLM_MODEL", "gpt-4o-mini") if USE_LITELLM else os.getenv("OPENAI_MODEL", "gpt-4o-mini")

if USE_LITELLM:
    import requests
    def llm_chat(messages, model=MODEL_DEFAULT, temperature=0.2):
        url = os.environ["LITELLM_BASE_URL"].rstrip("/") + "/chat/completions"
        headers = {"Content-Type": "application/json"}
        payload = {"model": model, "messages": messages, "temperature": temperature, "stream": False}
        r = requests.post(url, headers=headers, json=payload, timeout=60)
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]
else:
    from openai import OpenAI
    _client = OpenAI()
    def llm_chat(messages, model=MODEL_DEFAULT, temperature=0.2):
        resp = _client.chat.completions.create(model=model, messages=messages, temperature=temperature)
        return resp.choices[0].message.content

SYSTEM_PROMPT = """
You are a careful, tool-using assistant that follows a Plan–Act–Observe loop.
When you need to use a tool, reply ONLY in this format:

Thought: <your reasoning, brief>
Action: <tool_name>
Action Input: <json-serialized input for that tool>

If you have enough information to answer, reply ONLY in this format:

Final Answer: <your concise answer with brief citations if web was used>
""".strip()

**Explanation.** We provide a system prompt and an abstraction `llm_chat` that can call either OpenAI or a LiteLLM-compatible endpoint. The prompt constrains the agent to a predictable schema.
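For offline testing it can help to swap `llm_chat` for a scripted stand-in with the same call shape. The sketch below is an assumption, not part of the exercise: `make_scripted_llm` is a hypothetical helper that replays canned replies in order and keeps a log of the messages it was sent.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def make_scripted_llm(replies: List[str]) -> Callable[..., str]:
    """Return a function with the same call shape as llm_chat(messages, model, temperature)."""
    queue = list(replies)
    calls: List[List[Message]] = []

    def scripted_llm(messages: List[Message], model: str = "scripted", temperature: float = 0.0) -> str:
        calls.append(messages)  # record what the agent sent, for inspection in tests
        if not queue:
            raise RuntimeError("scripted LLM ran out of replies")
        return queue.pop(0)

    scripted_llm.calls = calls  # expose the call log as a function attribute
    return scripted_llm

# Two canned replies that follow the schema in SYSTEM_PROMPT
fake_chat = make_scripted_llm([
    'Thought: I need arithmetic.\nAction: calculator\nAction Input: {"expression": "2+2"}',
    "Final Answer: 4",
])
first = fake_chat([{"role": "user", "content": "Goal: add 2 and 2"}])
second = fake_chat([{"role": "user", "content": "Goal: add 2 and 2"}])
print(first.splitlines()[1])  # Action: calculator
print(second)                 # Final Answer: 4
```

Because the stand-in is deterministic, parser and loop behavior can be checked without network access or API cost.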

### A2. Scratchpad + parser for Thought/Action blocks

In [4]:
# A2: Parse model output into {thought, action, action_input, final}
ACTION_RE = re.compile(r"^Thought:\s*(?P<thought>.*?)\nAction:\s*(?P<action>[\w_\-]+)\nAction Input:\s*(?P<input>.*)\s*\Z", re.DOTALL)
FINAL_RE = re.compile(r"^Final Answer:\s*(?P<final>.*)\Z", re.DOTALL)

@dataclass
class Step:
    thought: str
    action: Optional[str] = None
    action_input: Optional[dict] = None
    observation: Optional[str] = None

@dataclass
class AgentState:
    goal: str
    scratchpad: List[Step] = field(default_factory=list)
    used_web: bool = False

def parse_agent_reply(text: str) -> Dict[str, Any]:
    m_final = FINAL_RE.match(text.strip())
    if m_final:
        return {"final": m_final.group("final").strip()}
    m_action = ACTION_RE.match(text.strip())
    if m_action:
        raw_input = m_action.group("input").strip()
        try:
            action_input = json.loads(raw_input) if raw_input else {}
        except json.JSONDecodeError:
            # fallback: treat as plain string
            action_input = {"query": raw_input}
        return {
            "thought": m_action.group("thought").strip(),
            "action": m_action.group("action").strip(),
            "action_input": action_input,
        }
    # If neither matches, coerce into a Thought with no Action.
    return {"thought": text.strip(), "action": None, "action_input": None}

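**Explanation.** The agent may emit either a `Final Answer` or a `Thought/Action/Action Input` triple. `parse_agent_reply` returns a dict for either case, and the loop in A3 wraps each non-final reply in a `Step`. Both patterns are anchored at the start of the reply, so any text before `Thought:` or `Final Answer:` sends the reply to the last branch, where it is recorded as a Thought with no Action.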
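Models sometimes add a sentence before the schema, wrap the JSON in a code fence, or keep writing after `Action Input`. A more tolerant parser is sketched below under those assumptions; `parse_reply_tolerant` and the `*_ANYWHERE` patterns are hypothetical names, not part of the notebook.

```python
import json
import re
from typing import Any, Dict

# search() instead of an anchored match, so leading chatter is ignored.
ACTION_ANYWHERE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*\nAction:\s*(?P<action>[\w\-]+)\s*\nAction Input:\s*(?P<input>.*)",
    re.DOTALL,
)
FINAL_ANYWHERE = re.compile(r"Final Answer:\s*(?P<final>.*)", re.DOTALL)
FENCE = re.compile(r"^```(?:json)?\s*|\s*```$")  # leading/trailing code fence

def parse_reply_tolerant(text: str) -> Dict[str, Any]:
    text = text.strip()
    # Check for an action first: a reply that requests a tool should run it
    # even if the model also wrote a premature "Final Answer" afterwards.
    m = ACTION_ANYWHERE.search(text)
    if m:
        raw = FENCE.sub("", m.group("input").strip())
        if not raw:
            action_input: Dict[str, Any] = {}
        else:
            try:
                # raw_decode parses the first JSON value and ignores trailing text
                value, _ = json.JSONDecoder().raw_decode(raw)
            except json.JSONDecodeError:
                value = None
            action_input = value if isinstance(value, dict) else {"query": raw}
        return {"thought": m.group("thought").strip(),
                "action": m.group("action"),
                "action_input": action_input}
    m = FINAL_ANYWHERE.search(text)
    if m:
        return {"final": m.group("final").strip()}
    return {"thought": text, "action": None, "action_input": None}

fenced = 'Sure, here is my plan.\nThought: need math\nAction: calculator\nAction Input: ```json\n{"expression": "2+2"}\n```'
print(parse_reply_tolerant(fenced)["action_input"])  # {'expression': '2+2'}
```

The trade-off is that a looser parser can also accept replies the strict one would have flagged, so logging the raw reply alongside the parsed result is a reasonable companion habit.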

### A3. The core Plan–Act–Observe loop (no tools yet)

In [5]:
# A3: Core loop skeleton

def run_agent(goal: str, tools: Dict[str, Callable[[dict], str]], max_steps: int = 6, temperature: float = 0.2) -> str:
    state = AgentState(goal=goal)
    for step_idx in range(1, max_steps + 1):
        # Build messages: system + user(goal) + scratchpad transcript
        messages = [{"role": "system", "content": SYSTEM_PROMPT}]
        user_block = f"Goal: {goal}\n\nScratchpad so far:\n"
        for i, s in enumerate(state.scratchpad, 1):
            user_block += f"Step {i}\nThought: {s.thought}\n"
            if s.action:
                user_block += f"Action: {s.action}\nAction Input: {json.dumps(s.action_input, ensure_ascii=False)}\n"
            if s.observation:
                user_block += f"Observation: {s.observation}\n"
            user_block += "\n"
        messages.append({"role": "user", "content": user_block})

        reply = llm_chat(messages, temperature=temperature)
        parsed = parse_agent_reply(reply)

        # Final?
        if "final" in parsed:
            return parsed["final"]

        # Otherwise execute action (if any)
        thought = parsed.get("thought", "")
        action = parsed.get("action")
        action_input = parsed.get("action_input") or {}

        observation = "(no action taken)"
        if action:
            tool = tools.get(action)
            if tool is None:
                observation = f"Error: Unknown tool '{action}'. Available: {list(tools)}"
            else:
                try:
                    observation = tool(action_input)
                except Exception as e:
                    observation = f"ToolError: {type(e).__name__}: {e}"
            if action == "web_search" and "Error" not in observation:
                state.used_web = True

        state.scratchpad.append(Step(thought=thought, action=action, action_input=action_input, observation=observation))

    # Fallback if max_steps reached
    tail_note = "\n\n(Note: step limit reached; consider increasing max_steps.)"
    return ("Partial result after steps: \n" + "\n".join(
        [f"- {s.observation}" for s in state.scratchpad if s.observation]) + tail_note)

**Explanation.** The loop builds a message with the current transcript, asks the LLM for the next move, executes the chosen action if valid, appends the observation, and repeats. When the model believes it’s done, it returns a **Final Answer**. If `max_steps` runs out first, `run_agent` returns the observations collected so far with a note about the step limit.
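The transcript-building part of the loop can be factored into a helper. The sketch below also truncates long observations, since a web search observation is a full JSON payload and is resent on every step. `render_scratchpad` and `max_obs_chars` are hypothetical names, and the `Step` class is redefined here only so the block runs on its own.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Step:  # mirrors the Step dataclass from A2
    thought: str
    action: Optional[str] = None
    action_input: Optional[dict] = None
    observation: Optional[str] = None

def render_scratchpad(goal: str, steps: List[Step], max_obs_chars: int = 500) -> str:
    """Build the user message: the goal plus one block per completed step."""
    lines = [f"Goal: {goal}", "", "Scratchpad so far:"]
    for i, s in enumerate(steps, 1):
        lines.append(f"Step {i}")
        lines.append(f"Thought: {s.thought}")
        if s.action:
            lines.append(f"Action: {s.action}")
            lines.append(f"Action Input: {json.dumps(s.action_input, ensure_ascii=False)}")
        if s.observation:
            obs = s.observation
            if len(obs) > max_obs_chars:
                obs = obs[:max_obs_chars] + " ...[truncated]"
            lines.append(f"Observation: {obs}")
        lines.append("")  # blank line between steps
    return "\n".join(lines)

steps = [Step("Compute the sum.", "calculator", {"expression": "2+2"}, "4")]
print(render_scratchpad("Add 2 and 2", steps))
```

Truncation trades completeness for a smaller prompt; the cut-off of 500 characters is an arbitrary illustration and would need tuning for search-heavy tasks.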

## Stage B — Add Tools (Search, Calculator, Python Exec)
*Goal:* Register three tools and simple adapters.

### B1. Tool: DuckDuckGo web search (top results + snippets)

In [6]:
# B1: Web search via DuckDuckGo
from duckduckgo_search import DDGS

# Simple cache to avoid repeated IO in a single run
_CACHE: Dict[str, Any] = {}

def web_search_tool(args: dict) -> str:
    """Args: {"query": str, "max_results": int=5, "region": "wt-wt", "safesearch": "moderate"}
    Returns a JSON string with results: [{title, href, body}]."""
    q = args.get("query") or args.get("q")
    if not q: return "Error: provide {'query': '<text>'}"
    k = int(args.get("max_results", 5))
    region = args.get("region", "wt-wt")
    safesearch = args.get("safesearch", "moderate")
    cache_key = json.dumps([q, k, region, safesearch])
    if cache_key in _CACHE:
        return _CACHE[cache_key]
    with DDGS() as ddgs:
        results = list(ddgs.text(q, region=region, safesearch=safesearch, max_results=k))
    payload = json.dumps(results[:k], ensure_ascii=False)
    _CACHE[cache_key] = payload
    return payload

**Explanation.** We use `duckduckgo-search` to fetch a few fresh results. The agent will summarize and include brief citations.
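The tool returns raw JSON, which the model has to read as-is. One option, sketched here as an assumption rather than something the notebook does, is to render the results as numbered lines so that citations are easier to produce. `format_results` is a hypothetical helper, and the sample data is placeholder text in the `{title, href, body}` shape, not real search output.

```python
import json
from typing import Dict, List

def format_results(payload: str, max_body_chars: int = 160) -> str:
    """Turn the JSON string returned by web_search_tool into numbered, citable lines."""
    results: List[Dict[str, str]] = json.loads(payload)
    lines = []
    for i, r in enumerate(results, 1):
        body = (r.get("body") or "").strip()
        if len(body) > max_body_chars:
            body = body[:max_body_chars].rstrip() + "..."
        lines.append(f"[{i}] {r.get('title', '(untitled)')} | {r.get('href', '')}\n    {body}")
    return "\n".join(lines) if lines else "(no results)"

# Placeholder data; example.com URLs are not real sources.
sample = json.dumps([
    {"title": "Example article A", "href": "https://example.com/a", "body": "Snippet text for A."},
    {"title": "Example article B", "href": "https://example.com/b", "body": "Snippet text for B."},
])
print(format_results(sample))
```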

### B2. Tool: Safe calculator (AST-limited eval)

In [19]:
# B2: Safe calculator with AST parsing
import ast, math

class SafeEval(ast.NodeVisitor):
    allowed_nodes = (
        ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
        ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.Pow,
        ast.USub, ast.UAdd, ast.FloorDiv, ast.Call, ast.Name, ast.Load
    )
    allowed_names = {
        "pi": math.pi,
        "e": math.e,
        "tau": math.tau,
        "sqrt": math.sqrt,
        "log": math.log,
        "sin": math.sin,
        "cos": math.cos,
        "tan": math.tan,
    }

    def visit(self, node):
        if not isinstance(node, self.allowed_nodes):
            raise ValueError(f"Disallowed expression: {type(node).__name__}")
        return super().visit(node)

    def eval(self, expr: str):
        tree = ast.parse(expr, mode="eval")
        self.visit(tree)
        code = compile(tree, "<calc>", "eval")
        return eval(code, {"__builtins__": {}}, self.allowed_names)

_calc = SafeEval()

def calculator_tool(args: dict) -> str:
    expr = str(args.get("expression") or args.get("expr") or "").strip()
    if not expr:
        return "Error: provide {'expression': '<math>'}"
    try:
        val = _calc.eval(expr)
        return str(val)
    except Exception as e:
        return f"Error: {e}"

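**Explanation.** We parse with `ast` to whitelist numeric operations and a few math functions. Attribute access, subscripts, and keyword arguments are rejected because their node types are not on the list, and names resolve only against `allowed_names` because `eval` runs with empty builtins. Note that `**` is allowed, so a very large power such as `9**9**9` can still consume a lot of CPU time and memory.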
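An alternative to whitelisting nodes and then calling `eval` is to walk the tree and compute the value directly. The sketch below does that and adds a cap on exponents; it is a different technique from the `SafeEval` class, and `safe_calc` and `MAX_EXPONENT` are hypothetical names chosen for illustration.

```python
import ast
import math
import operator

BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
           ast.Mod: operator.mod, ast.Pow: operator.pow}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
CONSTANTS = {"pi": math.pi, "e": math.e, "tau": math.tau}
FUNCTIONS = {"sqrt": math.sqrt, "log": math.log,
             "sin": math.sin, "cos": math.cos, "tan": math.tan}
MAX_EXPONENT = 1000  # arbitrary cap for this sketch

def _eval(node):
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.Name) and node.id in CONSTANTS:
        return CONSTANTS[node.id]
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_eval(node.operand))
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left, right = _eval(node.left), _eval(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return BIN_OPS[type(node.op)](left, right)
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCTIONS and not node.keywords):
        return FUNCTIONS[node.func.id](*[_eval(a) for a in node.args])
    # Anything else (attributes, strings, unknown names, ...) is rejected.
    raise ValueError(f"Disallowed expression: {type(node).__name__}")

def safe_calc(expr: str):
    return _eval(ast.parse(expr, mode="eval"))

print(safe_calc("2 + 3 * 4"))  # 14
print(round(safe_calc("(8.1/7.9)**(1/2) - 1"), 4))
```

Since nothing is ever handed to `eval`, an unexpected node type can only raise `ValueError`; it cannot execute.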

### B3. Tool: Restricted Python code execution

In [20]:
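# B3: Restricted Python execution (no I/O, limited builtins)
import contextlib, io, math, random

# Builtins exposed to agent code. Passing "__builtins__" explicitly matters:
# if the key is missing, exec() inserts the full builtins module.
SAFE_BUILTINS = {
    "print": print, "range": range, "len": len, "sum": sum, "min": min, "max": max,
    "abs": abs, "enumerate": enumerate, "round": round, "int": int, "float": float,
    "str": str, "list": list, "dict": dict, "tuple": tuple, "sorted": sorted, "zip": zip,
}
ALLOWED_GLOBALS = {"__builtins__": SAFE_BUILTINS, "math": math, "random": random}

def python_exec_tool(args: dict) -> str:
    """Args: {"code": "python code"}
    Executes in a restricted namespace and returns printed output.
    """
    code = args.get("code")
    if not code:
        return "Error: provide {'code': '<python>'}"
    # Disallow import, open, __, exec/eval, etc.
    forbidden = ["import ", "open(", "__", "exec(", "eval(", "compile(", "globals(", "locals("]
    if any(tok in code for tok in forbidden):
        return "Error: forbidden tokens detected in code. math and random are preloaded; do not import."
    # Capture stdout
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            # One fresh dict serves as both globals and locals, so functions and
            # comprehensions in the snippet can see its top-level variables.
            exec(compile(code, "<agent-code>", "exec"), dict(ALLOWED_GLOBALS))
    except Exception as e:
        return f"Error: {type(e).__name__}: {e}"
    return buf.getvalue().strip() or "<no output>"

**Explanation.** A substring denylist rejects imports and a few dangerous calls, and an explicit `__builtins__` dict limits which builtins the snippet can reach. `math` and `random` are preloaded because `import` is rejected. This is good enough for mathy demos and small simulations, but it is not a security boundary: there is no CPU or memory limit, and string filters can have gaps.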
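A detail worth knowing about `exec`: when it receives separate globals and locals dicts, top-level assignments land in locals, while functions defined by the snippet resolve free names through globals. The small experiment below illustrates that behavior; the `run` helper and the snippet are made up for this demonstration.

```python
snippet = """
n = 3
def triple(x):
    return x * n   # n is looked up in globals when triple() runs
print(triple(2))
"""

def run(code: str, separate_locals: bool) -> str:
    env = {"__builtins__": {"print": print}}
    try:
        if separate_locals:
            exec(code, env, {})   # n is stored in the locals dict, invisible to triple()
        else:
            exec(code, env)       # one dict: n is a global, triple() can see it
        return "ok"
    except NameError as e:
        return f"NameError: {e}"

print(run(snippet, separate_locals=True))   # reports that 'n' is not defined
print(run(snippet, separate_locals=False))  # prints 6, then ok
```

Agent-written snippets that define a helper function or use a comprehension over a top-level variable hit the same lookup rule, so a single namespace dict is usually the less surprising choice.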

### B4. Register tools & a router

In [21]:
# B4: Register tools
TOOLS: Dict[str, Callable[[dict], str]] = {
    "web_search": web_search_tool,
    "calculator": calculator_tool,
    "python": python_exec_tool,
}

# A small description block the LLM can use (C1 appends it to the system prompt)
TOOL_DESCRIPTIONS = {
    "web_search": "Search the web with DuckDuckGo. Input: {\"query\": str, \"max_results\": int}. Output: JSON list of {title, href, body}",
    "calculator": "Evaluate math expressions safely. Input: {\"expression\": str}. Output: stringified result",
    "python": "Run small Python snippets without I/O or imports; math and random are already available. Input: {\"code\": str}. Output: stdout or <no output>",
}

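**Explanation.** The agent now has three callable tools. The router is simply the `TOOLS` dictionary: the loop looks up the `Action` name and calls the matching function with the parsed `Action Input`.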
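The lookup-and-call logic can also live in one function that always returns an observation string, which keeps the loop body short. This is a sketch under that assumption; `dispatch` and the two stand-in tools are hypothetical and only mimic the `Callable[[dict], str]` shape used by `TOOLS`.

```python
from typing import Callable, Dict

ToolFn = Callable[[dict], str]

def dispatch(action: str, action_input, tools: Dict[str, ToolFn]) -> str:
    """Look up a tool by name, call it, and always return an observation string."""
    tool = tools.get(action)
    if tool is None:
        return f"Error: Unknown tool '{action}'. Available: {sorted(tools)}"
    if not isinstance(action_input, dict):
        return "Error: Action Input must be a JSON object."
    try:
        return str(tool(action_input))
    except Exception as e:  # a failing tool becomes an observation, not a crash
        return f"ToolError: {type(e).__name__}: {e}"

# Stand-in tools for the demo (not the notebook's real tools)
demo_tools: Dict[str, ToolFn] = {
    "echo": lambda args: args.get("text", ""),
    "boom": lambda args: str(1 / 0),
}

print(dispatch("echo", {"text": "hi"}, demo_tools))  # hi
print(dispatch("boom", {}, demo_tools))              # ToolError: ZeroDivisionError: division by zero
print(dispatch("nope", {}, demo_tools))
```

Returning errors as observations lets the model see what went wrong and try a different action on the next step.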

## Stage C — Safety, Limits, and Answer Formatting
*Goal:* Encourage the model to plan; inject the tool list; limit steps; and format final answers with optional citations.

### C1. Add tool hints to the prompt & a planning nudge

In [22]:
# C1: Extend SYSTEM_PROMPT with tool inventory + planning guidance
TOOL_HINTS = "\n".join([f"- {k}: {v}" for k, v in TOOL_DESCRIPTIONS.items()])
SYSTEM_PROMPT_WITH_TOOLS = SYSTEM_PROMPT + "\n\nAvailable tools:\n" + TOOL_HINTS + "\n\nPlanning guidance: Break the task into steps. Prefer using tools when needed; keep Thoughts brief."

def run_agent_with_tools(goal: str, max_steps: int = 6, temperature: float = 0.2) -> str:
    # Same as run_agent but uses the extended system prompt and returns clean Final Answer.
    state = AgentState(goal=goal)
    for step_idx in range(1, max_steps + 1):
        messages = [{"role": "system", "content": SYSTEM_PROMPT_WITH_TOOLS}]
        user_block = f"Goal: {goal}\n\nScratchpad so far:\n"
        for i, s in enumerate(state.scratchpad, 1):
            user_block += f"Step {i}\nThought: {s.thought}\n"
            if s.action:
                user_block += f"Action: {s.action}\nAction Input: {json.dumps(s.action_input, ensure_ascii=False)}\n"
            if s.observation:
                user_block += f"Observation: {s.observation}\n"
            user_block += "\n"
        messages.append({"role": "user", "content": user_block})

        reply = llm_chat(messages, temperature=temperature)
        parsed = parse_agent_reply(reply)
        if "final" in parsed:
            final = parsed["final"].strip()
            # Add a small note if web used but no citation-like text present
            if state.used_web and ("http" not in final and "href" not in final):
                final += "\n\n(Sources: derived from DuckDuckGo results in the scratchpad.)"
            return final

        thought = parsed.get("thought", "")
        action = parsed.get("action")
        action_input = parsed.get("action_input") or {}

        observation = "(no action taken)"
        if action:
            tool = TOOLS.get(action)
            if tool is None:
                observation = f"Error: Unknown tool '{action}'. Available: {list(TOOLS)}"
            else:
                try:
                    observation = tool(action_input)
                except Exception as e:
                    observation = f"ToolError: {type(e).__name__}: {e}"
            if action == "web_search" and "Error" not in observation:
                state.used_web = True
        state.scratchpad.append(Step(thought=thought, action=action, action_input=action_input, observation=observation))

    return "Unable to reach Final Answer within step limit. Try increasing max_steps or simplifying the goal."

**Explanation.** We now provide tool descriptions to the model, nudge planning behavior, and return a clean string.
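The stage plan lists timeouts, but the loop above does not put one on tool calls. One way to add a per-call limit is to run each tool in a worker thread and stop waiting after a deadline. The sketch below assumes that approach; `with_timeout` and `slow_tool` are hypothetical names. A Python thread cannot be killed, so a stuck tool keeps running in the background; hard limits need a separate process.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

def with_timeout(tool: Callable[[dict], str], seconds: float) -> Callable[[dict], str]:
    """Wrap a tool so the caller gets an error string after `seconds`."""
    def wrapped(args: dict) -> str:
        executor = ThreadPoolExecutor(max_workers=1)
        future = executor.submit(tool, args)
        try:
            return future.result(timeout=seconds)
        except FutureTimeout:
            return f"Error: tool timed out after {seconds}s"
        finally:
            # Do not block on a stuck worker; it finishes (or not) on its own.
            executor.shutdown(wait=False)
    return wrapped

def slow_tool(args: dict) -> str:
    time.sleep(args.get("sleep", 0))
    return "done"

guarded = with_timeout(slow_tool, seconds=0.2)
print(guarded({"sleep": 0}))    # done
print(guarded({"sleep": 0.5}))  # Error: tool timed out after 0.2s
```

Wrapping could be applied when registering tools, for example by building the registry from `with_timeout(fn, seconds)` values, with the limit chosen per tool.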

## Stage D — Tests & Demonstrations
We’ll execute three diverse tasks. Feel free to change topics to your interests.

### D1. Research Task — “Find and summarize recent articles on a topic”
*Prompt:*  
> "Find and summarize recent articles (past 3 months) on the environmental impact of data centers; provide 5 bullets and 3 source links."

In [25]:
research_goal = (
    "Find and summarize recent articles (past 3 months) on the environmental impact of data centers; "
    "provide 5 bullets and 3 source links."
)
print(run_agent_with_tools(research_goal, max_steps=6, temperature=0.2))

- **2025 ESG Report: Data Centre Environmental Impact**: This report provides an in-depth look at the environmental footprint of data center providers, highlighting the need for sustainable practices. [Read more](https://dcnnmagazine.com/data-centres/2025-esg-report-data-centre-environmental-impact/)
- **Investigating the Environmental Sustainability of Data Centers**: This study reviews sustainable practices in data center construction and maintenance, emphasizing the importance of addressing environmental impacts. [Read more](https://ajosr.org/wp-content/uploads/journal/published_paper/volume-3/issue-1/ajsr2024_Zrj3RTFV.pdf)
- **Data Center Environmental Impact: Key Challenges**: Discusses energy use, water consumption, e-waste, and emissions as major concerns in the digital infrastructure of data centers. [Read more](https://cc-techgroup.com/data-center-environmental-impact/)
- **The Environmental Impact of Data Centers - Simple Science**: Highlights the significant increase in data

**Expected behavior.** The agent will call `web_search` at least once, parse snippets, and return a compact summary + links.

### D2. Fact → Calculation Task — “World population growth rate rough calc”
*Prompt:*  
> "If world population was ~7.9B in 2021 and ~8.1B in 2023, estimate the annualized growth rate using the calculator, then explain in one sentence."

In [26]:
calc_goal = (
    "If world population was ~7.9B in 2021 and ~8.1B in 2023, estimate the annualized growth rate using the calculator, "
    "then explain in one sentence."
)
print(run_agent_with_tools(calc_goal, max_steps=6))

The annualized growth rate of the world population from 2021 to 2023 is approximately 1.26%. This is calculated using the formula for annual growth rate, which shows the percentage increase over the two-year period.


**Expected behavior.** The agent should (ideally) call `calculator` with an expression like `(8.1/7.9)**(1/2)-1` and then report a percentage.
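The expression is the compound annual growth rate, (end / start)^(1 / years) - 1. As a quick check of the arithmetic outside the agent (the function name `annualized_growth` is just for illustration):

```python
def annualized_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1

rate = annualized_growth(7.9, 8.1, 2023 - 2021)
print(f"{rate:.2%}")  # 1.26%
```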

### D3. Code/Simulation Task — “Monte Carlo π”
*Prompt:*  
> "Use the python tool to run a short Monte Carlo simulation to estimate π with 100_000 samples; print the estimate only."

In [29]:
mc_goal = (
    "Use the python tool to run a short Monte Carlo simulation to estimate π with 100_000 samples; print the estimate only."
)
print(run_agent_with_tools(mc_goal, max_steps=10))

Unable to reach Final Answer within step limit. Try increasing max_steps or simplifying the goal.


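**Expected behavior.** The agent should choose the `python` tool and print a numeric estimate near 3.14. In the recorded run above it hit the step limit instead. One plausible cause is that the model's snippet began with `import random`, which the B3 filter rejects on every attempt; stating in the goal that `math` and `random` are preloaded and that imports are not allowed may help.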
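For reference, here is a standalone version of the kind of simulation the agent is expected to write. It is a sketch, not the agent's actual output: `estimate_pi` and the fixed seed are illustrative choices. Inside the B3 tool the `import` lines would have to be dropped, because the filter rejects `import ` and `random` is already available there.

```python
import math
import random

def estimate_pi(samples: int, seed: int = 0) -> float:
    """Estimate pi from the fraction of random points that fall inside the unit quarter circle."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / samples

estimate = estimate_pi(100_000)
print(estimate)
print(abs(estimate - math.pi) < 0.05)
```

With 100,000 samples the standard error is roughly 0.005, so estimates typically agree with π to about two decimal places.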

## Notes
- **Why Plan–Act–Observe?** It structures problem solving, leaves a traceable audit trail (scratchpad), and makes tool use explicit.
- **Design choices:**
  - We used a **constrained output format** to keep parsing reliable.
  - **Tool registry** enables easy extension—add your own tools and update `TOOL_DESCRIPTIONS`.
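  - **Safety:** The AST-limited calculator and the restricted `exec` reduce common hazards, but neither is a security boundary. The Python sandbox is intentionally conservative; for anything beyond a demo, run untrusted code in a separate process or container.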
- **Limitations:**
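  - Output parsing may fail if the model deviates from the schema. The parser's guards are limited: non-JSON input is wrapped as `{"query": ...}`, and an unmatched reply is recorded as a Thought with no Action, which uses up a step.
  - Web search quality depends on DDG snippets; for production, fetch the linked pages and extract their text.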
  - The sandbox blocks imports and file/network I/O; widen carefully if you need more features.

## Wrap-Up
**What you learned**
- How a Plan–Act–Observe loop coordinates an LLM with external tools.
- How to integrate web search, a calculator, and restricted Python execution.
- How to test an agent on research, calc, and simulation tasks.

**Next steps**
- Add a **retriever** over your documents (e.g., local PDFs) as another tool.
- Implement **result grading** (self-check) before Final Answer.
- Log telemetry (steps, tokens) and add a UI.