# Level 2 - Week 6 - 01 Agent Loop and State

**Estimated time:** 60-90 minutes

## Learning Objectives

- Track explicit state per step
- Store tool inputs and outputs
- Define stop conditions


## Overview

An agent is not a single prompt.

It is a controller that loops:

- plan
- call tool
- observe output
- decide next step

## Underlying theory: an agent is a policy over state

Model the agent as a loop that maintains a state $s_t$ and, at each step $t$, chooses an action $a_t = \pi(s_t)$ using a policy $\pi$.

- state $s_t$: what the agent currently knows (task, plan, tool results)
- action $a_t$: either “call a tool with arguments” or “stop and answer”

Conceptually:

$$
s_{t+1} = \mathrm{Update}(s_t, a_t, o_t)
$$

where $o_t$ is the observation returned by executing $a_t$ (a tool output or an error).
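A minimal sketch of this formalism, assuming a toy state that only stores the task, the observations so far, and the final answer. `State`, `Action`, `policy`, `update`, and `run` are hypothetical names for illustration, not part of the notebook's code. `update` is a pure function that returns a new state instead of mutating the old one, so every intermediate $s_t$ stays available for logging.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Any, Callable


@dataclass(frozen=True)
class State:
    task: str
    observations: tuple[Any, ...] = ()  # o_0, o_1, ... in order
    answer: str | None = None


@dataclass(frozen=True)
class Action:
    kind: str  # "call" (call a tool) or "stop" (stop and answer)
    tool: str | None = None
    args: dict[str, Any] | None = None
    answer: str | None = None


def policy(s: State) -> Action:
    # Hypothetical toy policy pi: call "search" once, then stop.
    if not s.observations:
        return Action(kind="call", tool="search", args={"query": s.task})
    return Action(kind="stop", answer=f"found: {s.observations[-1]}")


def update(s: State, a: Action, o: Any) -> State:
    # s_{t+1} = Update(s_t, a_t, o_t): builds a new state, never mutates s.
    if a.kind == "stop":
        return replace(s, answer=a.answer)
    return replace(s, observations=s.observations + (o,))


def run(task: str, tools: dict[str, Callable[[dict[str, Any]], Any]], max_steps: int = 5) -> State:
    s = State(task=task)
    for _ in range(max_steps):
        a = policy(s)  # a_t = pi(s_t)
        o = tools[a.tool](a.args or {}) if a.kind == "call" else None  # o_t
        s = update(s, a, o)
        if a.kind == "stop":
            break
    return s


final = run("refund policy", {"search": lambda args: "doc-1"})
print(final.answer)  # found: doc-1
```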

Many agent bugs are state bugs:

- missing or corrupted state → the agent repeats calls it has already made
- forgotten constraints → invalid tool inputs
- untracked errors → infinite loops
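The first failure mode can be caught mechanically: if the recorded history already contains the same tool with the same arguments, calling it again usually means the agent lost track of an earlier result. A minimal sketch, assuming the history is kept as `(tool, args)` pairs with JSON-serializable arguments; `call_key` and `is_repeat_call` are hypothetical helper names.

```python
from __future__ import annotations

import json
from typing import Any


def call_key(tool: str, args: dict[str, Any]) -> str:
    # Canonical key: sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} identical.
    return tool + ":" + json.dumps(args, sort_keys=True)


def is_repeat_call(history: list[tuple[str, dict[str, Any]]], tool: str, args: dict[str, Any]) -> bool:
    # True if this exact (tool, args) pair was already executed.
    seen = {call_key(t, a) for t, a in history}
    return call_key(tool, args) in seen


history = [("search", {"query": "refund policy", "top_k": 3})]
print(is_repeat_call(history, "search", {"top_k": 3, "query": "refund policy"}))  # True
print(is_repeat_call(history, "search", {"query": "shipping", "top_k": 3}))  # False
```

On a repeat, the loop can reuse the earlier recorded output or stop with a bounded failure instead of calling the tool again.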

## State machine intuition (why stop conditions are mandatory)

Execution looks like a finite state machine:

- PLAN → TOOL_CALL → OBSERVE → DECIDE → (repeat)

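Stop conditions are the terminal (“accepting”) states; once the machine reaches one, the loop ends:

- DONE (final answer)
- NEEDS_USER (clarify)
- FAILED (bounded failure, e.g. the step cap was reached or a tool raised)

Without explicit stop states, nothing guarantees that the loop terminates: a DECIDE step that keeps choosing another tool call runs forever. A step cap turns that case into a bounded FAILED.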
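A minimal sketch of this state machine, assuming the phase names above map onto an `Enum` and that a hypothetical `decide` callback picks the next phase after each observation. `run_fsm` and `max_tool_calls` are illustrative names. Because DECIDE may only return PLAN or a terminal phase, every non-terminal cycle passes through TOOL_CALL, so the cap bounds the whole loop.

```python
from __future__ import annotations

from enum import Enum, auto
from typing import Callable


class Phase(Enum):
    PLAN = auto()
    TOOL_CALL = auto()
    OBSERVE = auto()
    DECIDE = auto()
    DONE = auto()
    NEEDS_USER = auto()
    FAILED = auto()


TERMINAL = {Phase.DONE, Phase.NEEDS_USER, Phase.FAILED}

# Unconditional transitions; DECIDE is the only phase that branches.
NEXT = {Phase.PLAN: Phase.TOOL_CALL, Phase.TOOL_CALL: Phase.OBSERVE, Phase.OBSERVE: Phase.DECIDE}


def run_fsm(decide: Callable[[int], Phase], max_tool_calls: int = 3) -> tuple[Phase, list[Phase]]:
    phase, trace, tool_calls = Phase.PLAN, [Phase.PLAN], 0
    while phase not in TERMINAL:
        if phase is Phase.DECIDE:
            # decide() sees how many tool calls ran and returns PLAN (loop) or a terminal phase.
            nxt = decide(tool_calls)
            if nxt is not Phase.PLAN and nxt not in TERMINAL:
                raise ValueError(f"DECIDE cannot move to {nxt}")
            phase = nxt
        elif phase is Phase.TOOL_CALL and tool_calls >= max_tool_calls:
            phase = Phase.FAILED  # step cap: bounded failure instead of an endless loop
        else:
            if phase is Phase.TOOL_CALL:
                tool_calls += 1  # a real agent would execute the tool here
            phase = NEXT[phase]
        trace.append(phase)
    return phase, trace


final, trace = run_fsm(lambda n: Phase.PLAN, max_tool_calls=2)  # never decides to stop
print(final.name)  # FAILED
final, trace = run_fsm(lambda n: Phase.DONE if n >= 1 else Phase.PLAN)
print([p.name for p in trace])  # ['PLAN', 'TOOL_CALL', 'OBSERVE', 'DECIDE', 'DONE']
```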

## Practice Steps

- Define `Step` and `AgentState`.
- Implement a deterministic `run_agent` with a step cap.
- Store every tool call (inputs, outputs, errors, latency) in `state.steps`.

### Sample code

State model for agent steps.


```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    # One recorded tool call: what went in, what came out, and how long it took.
    tool: str
    tool_input: dict[str, Any]
    tool_output: dict[str, Any] | None = None
    error: str | None = None
    latency_ms: int | None = None


@dataclass
class AgentState:
    # The agent's state s_t: the task, the plan, every step so far, and the final outcome.
    task: str
    plan: list[str] = field(default_factory=list)
    steps: list[Step] = field(default_factory=list)
    final: dict[str, Any] | None = None
```
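Timing and error capture can be factored into one helper so that every call produces a record, whether it succeeds or raises. A minimal sketch; `StepRecord` mirrors the fields of `Step` but is a hypothetical class redefined here so the block runs on its own, and `call_and_record` is likewise a hypothetical name. Unlike a bare `str(e)`, it keeps the exception type in the error string.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StepRecord:
    tool: str
    tool_input: dict[str, Any]
    tool_output: dict[str, Any] | None = None
    error: str | None = None
    latency_ms: int | None = None


def call_and_record(tool: str, fn: Callable[[dict[str, Any]], dict[str, Any]],
                    tool_input: dict[str, Any]) -> StepRecord:
    t0 = time.perf_counter()  # monotonic clock, suitable for measuring durations
    try:
        output = fn(tool_input)
        return StepRecord(tool, tool_input, tool_output=output,
                          latency_ms=int((time.perf_counter() - t0) * 1000))
    except Exception as e:
        # Record the failure instead of letting it escape the loop unlogged.
        return StepRecord(tool, tool_input, error=f"{type(e).__name__}: {e}",
                          latency_ms=int((time.perf_counter() - t0) * 1000))


ok = call_and_record("echo", lambda x: {"echo": x["msg"]}, {"msg": "hi"})
bad = call_and_record("boom", lambda x: {"v": 1 / 0}, {})
print(ok.tool_output, ok.error)    # {'echo': 'hi'} None
print(bad.tool_output, bad.error)  # None ZeroDivisionError: division by zero
```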

### Student fill-in

Implement `run_agent` and record every tool call in `state.steps`.


```python
from __future__ import annotations

import time
from typing import Callable


def run_agent(task: str, tools: dict[str, Callable[[dict], dict]], max_steps: int = 3) -> AgentState:
    # Fixed, deterministic plan: retrieve evidence, then write a cited answer.
    state = AgentState(task=task, plan=["search", "write_answer"], steps=[], final=None)

    for step_index, tool_name in enumerate(state.plan, start=1):
        # Stop condition: step cap reached (bounded failure).
        if step_index > max_steps:
            state.final = {"mode": "refuse", "answer": None, "citations": [], "reason": "max_steps_reached"}
            break

        # Stop condition: the plan names a tool that was not provided.
        tool_fn = tools.get(tool_name)
        if tool_fn is None:
            state.steps.append(Step(tool=tool_name, tool_input={}, tool_output=None, error="unknown_tool"))
            state.final = {"mode": "refuse", "answer": None, "citations": [], "reason": "unknown_tool"}
            break

        # Build each tool's input from the task and from earlier recorded steps.
        if tool_name == "search":
            tool_input = {"query": task, "top_k": 3}
        elif tool_name == "write_answer":
            search_step = next((s for s in state.steps if s.tool == "search" and s.tool_output), None)
            hits = (search_step.tool_output or {}).get("hits", []) if search_step else []
            if not hits:
                # Stop condition: nothing to ground an answer in, so ask the user.
                state.final = {
                    "mode": "clarify",
                    "answer": None,
                    "citations": [],
                    "question": "Can you clarify what domain or document set to search?",
                }
                break
            tool_input = {"question": task, "hits": hits}
        else:
            tool_input = {"task": task}

        # Execute the tool and record its input, output or error, and latency.
        t0 = time.perf_counter()
        try:
            tool_output = tool_fn(tool_input)
            latency_ms = int((time.perf_counter() - t0) * 1000)
            state.steps.append(Step(tool=tool_name, tool_input=tool_input, tool_output=tool_output,
                                    error=None, latency_ms=latency_ms))
        except Exception as e:
            latency_ms = int((time.perf_counter() - t0) * 1000)
            state.steps.append(Step(tool=tool_name, tool_input=tool_input, tool_output=None,
                                    error=str(e), latency_ms=latency_ms))
            # Stop condition: tool failure (bounded failure).
            state.final = {"mode": "refuse", "answer": None, "citations": [], "reason": "tool_error"}
            break

    # The plan finished without an early stop: the last tool output is the final answer.
    # Copy it so later edits to state.final cannot silently rewrite the recorded step.
    if state.final is None:
        last = state.steps[-1].tool_output if state.steps else None
        state.final = dict(last) if last is not None else None

    return state
```

## Self-check

- Is every tool call stored in `state.steps`?
- Do you cap the loop with `max_steps`?
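Both self-check questions can be turned into assertions that run after every agent call. A minimal sketch, assuming the `AgentState`/`Step` shape defined earlier; the function reads only attributes (`steps`, `final`, `tool`, `tool_output`, `error`), so it is demonstrated here with `types.SimpleNamespace` stand-ins. `check_trace` is a hypothetical helper name.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def check_trace(state: Any, max_steps: int) -> list[str]:
    problems = []
    if len(state.steps) > max_steps:
        problems.append(f"{len(state.steps)} steps exceeds max_steps={max_steps}")
    for i, step in enumerate(state.steps, start=1):
        # Each recorded call should end in exactly one of: an output or an error.
        if (step.tool_output is None) == (step.error is None):
            problems.append(f"step {i} ({step.tool}) must have exactly one of output/error")
    if state.final is None:
        problems.append("final is unset: the loop ended without a stop condition")
    return problems


good = SimpleNamespace(
    steps=[SimpleNamespace(tool="search", tool_output={"hits": []}, error=None)],
    final={"mode": "clarify"},
)
bad = SimpleNamespace(steps=[SimpleNamespace(tool="search", tool_output=None, error=None)], final=None)
print(check_trace(good, max_steps=3))      # []
print(len(check_trace(bad, max_steps=3)))  # 2
```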


### Exercise: Run a toy agent with two tools

You will implement two deterministic “toy” tools:

- `search`: returns a small list of hits (each with `chunk_id` so it can be cited)
- `write_answer`: produces an answer + citations from those hits

Then run `run_agent(...)` and inspect the captured `state.steps` and `state.final`.

Goal: verify you can debug behavior *without rerunning*, purely from the recorded state.
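Debugging without rerunning is easiest when the recorded state can be written to disk and read back. A minimal sketch using `dataclasses.asdict` and `json`; `TraceStep` and `Trace` are hypothetical stand-ins with the same field names as `Step` and `AgentState`, so the same `dump_trace` call should apply to the real classes.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class TraceStep:  # hypothetical stand-in with the same fields as Step
    tool: str
    tool_input: dict[str, Any]
    tool_output: dict[str, Any] | None = None
    error: str | None = None
    latency_ms: int | None = None


@dataclass
class Trace:  # hypothetical stand-in with the same fields as AgentState
    task: str
    plan: list[str] = field(default_factory=list)
    steps: list[TraceStep] = field(default_factory=list)
    final: dict[str, Any] | None = None


def dump_trace(state: Any) -> str:
    # asdict() recurses into nested dataclasses and lists; default=str keeps
    # json.dumps from failing on values it cannot serialize (e.g. datetimes).
    return json.dumps(asdict(state), indent=2, default=str)


trace = Trace(
    task="refund policy",
    plan=["search", "write_answer"],
    steps=[TraceStep("search", {"query": "refund policy", "top_k": 3}, {"hits": []}, latency_ms=0)],
    final={"mode": "clarify"},
)
reloaded = json.loads(dump_trace(trace))
print(reloaded["steps"][0]["tool_input"])  # {'query': 'refund policy', 'top_k': 3}
print(reloaded["final"]["mode"])           # clarify
```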

### Exercise: Implement toy tools

Define two tools with explicit inputs/outputs:

- `search({query, top_k}) -> {hits: [{chunk_id, text}]}`
- `write_answer({question, hits}) -> {mode, answer, citations}` (plus `question` when `mode` is `clarify`)

Then pass them into `run_agent(...)` and inspect the resulting `AgentState`.
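Explicit contracts also make tool outputs checkable. One useful check for `write_answer` is that every citation points at a `chunk_id` that `search` actually returned, so the answer cannot cite evidence it never saw. A minimal sketch; `ungrounded_citations` is a hypothetical helper, and the set of allowed `mode` values is an assumption based on the outcomes used in this notebook.

```python
from __future__ import annotations

from typing import Any

ALLOWED_MODES = {"answer", "clarify", "refuse"}  # assumed from the modes used in this notebook


def ungrounded_citations(search_out: dict[str, Any], answer_out: dict[str, Any]) -> list[str]:
    if answer_out.get("mode") not in ALLOWED_MODES:
        raise ValueError(f"unexpected mode: {answer_out.get('mode')!r}")
    # chunk_ids the agent actually retrieved
    retrieved = {hit["chunk_id"] for hit in search_out.get("hits", [])}
    # cited chunk_ids that were never retrieved
    return [c["chunk_id"] for c in answer_out.get("citations", []) if c["chunk_id"] not in retrieved]


search_out = {"hits": [{"chunk_id": "kb#001", "text": "Policy excerpt"},
                       {"chunk_id": "kb#002", "text": "Supporting detail"}]}
ok = {"mode": "answer", "answer": "Policy excerpt", "citations": [{"chunk_id": "kb#001", "snippet": "Policy"}]}
bad = {"mode": "answer", "answer": "Made up", "citations": [{"chunk_id": "kb#999", "snippet": "Made up"}]}
print(ungrounded_citations(search_out, ok))   # []
print(ungrounded_citations(search_out, bad))  # ['kb#999']
```

Applied to a real run, the two arguments would be the recorded `tool_output` of the `search` step and of the `write_answer` step.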

```python
from typing import Any


def search_tool(payload: dict[str, Any]) -> dict[str, Any]:
    # Deterministic toy retriever: always returns the same canned hits.
    query = str(payload.get("query", "")).strip()
    top_k = max(0, int(payload.get("top_k", 3)))  # guard against negative slicing
    if not query:
        return {"hits": []}

    hits = [
        {"chunk_id": "kb#001", "text": f"Policy excerpt relevant to: {query}"},
        {"chunk_id": "kb#002", "text": "Additional supporting detail."},
    ]
    return {"hits": hits[:top_k]}


def write_answer_tool(payload: dict[str, Any]) -> dict[str, Any]:
    # Toy writer: ignores `question` and answers from the top hit only.
    hits = payload.get("hits", []) or []

    if not hits:
        return {"mode": "clarify", "answer": None, "citations": [], "question": "What domain should I search?"}

    top = hits[0]
    chunk_id = top.get("chunk_id", "")
    answer = f"Based on the KB, here is the key detail: {top.get('text', '')}"
    # Cite only when the hit carries a chunk_id; the snippet is truncated to 80 characters.
    citations = [{"chunk_id": chunk_id, "snippet": top.get("text", "")[:80]}] if chunk_id else []

    return {"mode": "answer", "answer": answer, "citations": citations}


tools = {"search": search_tool, "write_answer": write_answer_tool}
state = run_agent("refund policy", tools=tools, max_steps=3)

print("final:", state.final)
print("n_steps:", len(state.steps))
```

```python
for i, s in enumerate(state.steps, start=1):
    print("---")
    print("step", i)
    print("tool:", s.tool)
    print("latency_ms:", s.latency_ms)
    print("input:", s.tool_input)
    print("output:", s.tool_output)
    print("error:", s.error)
```

## Self-check

- Is every tool call recorded in `state.steps` (including inputs + outputs + errors)?
- Do you cap steps with `max_steps`?
- If `search` returns no hits, does the agent switch to a safe `clarify` outcome?
- Can you explain the final answer from the recorded state alone?