<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/038_JW_Modular_AgentDesign_III.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# 🧩 Agent Design — Compact Recipe (GAME: Goals · Actions · Memory · Environment)

> What you’ll get below:
>
> 1. tiny types,
> 2. `Action` + `ActionRegistry`,
> 3. a safe-ish `Environment`,
> 4. an `AgentLanguage` (fenced-JSON variant),
> 5. a few file tools plus a terminal `final_answer` action,
> 6. a stub LLM caller (with optional OpenAI wiring),
> 7. wiring it all into your `Agent` (the class you pasted) and a one-liner to run.

---

## 0) Minimal types

```python
from typing import List, Dict, Any, Callable, Protocol
import json, os, re, time
from dataclasses import dataclass

# Aliases
Prompt = List[Dict[str, str]]

@dataclass
class Goal:
    text: str

class Memory:
    def __init__(self, max_events: int = 50):
        self.events: List[Dict[str, Any]] = []
        self.max_events = max_events

    def add_memory(self, item: Dict[str, Any]):
        self.events.append(item)
        # keep it small
        if len(self.events) > self.max_events:
            self.events = self.events[-self.max_events:]

    def to_text(self) -> str:
        # simple linearization for prompts; keep it compact
        lines = []
        for e in self.events[-20:]:
            role = e.get("type", "note")
            content = e.get("content", "")
            lines.append(f"{role.upper()}: {content}")
        return "\n".join(lines)
```

---

## 1) Actions (definition + registry)

```python
class Action:
    def __init__(self, name: str, function: Callable, description: str, parameters: Dict[str, Any], terminal: bool=False):
        self.name = name
        self.function = function          # Python callable to run
        self.description = description    # short blurb for the model
        self.parameters = parameters      # JSON Schema for args
        self.terminal = terminal          # if True, ends the loop

    def to_openai_tool(self) -> Dict[str, Any]:
        # If you later switch to tool-calling, you'll use this.
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }

    def execute(self, **args):
        return self.function(**args)

class ActionRegistry:
    def __init__(self):
        self.actions: Dict[str, Action] = {}

    def register(self, action: Action):
        if action.name in self.actions:
            raise ValueError(f"Action '{action.name}' already registered")
        self.actions[action.name] = action

    def get_action(self, name: str):
        return self.actions.get(name)

    def get_actions(self) -> List[Action]:
        return list(self.actions.values())
```

---

## 2) Environment (safe execution gateway)

```python
class Environment:
    """
    Central place for policy, safety, logging, and normalized results.
    """
    def __init__(self, base_dir: str | None = None, allowed_exts=(".txt", ".md"), timeout_s: float = 10.0):
        # realpath (not abspath) resolves symlinks, so a link inside base_dir can't point outside it
        self.base_dir = os.path.realpath(base_dir) if base_dir else None
        self.allowed_exts = {e.lower() for e in allowed_exts} if allowed_exts else set()
        self.timeout_s = timeout_s  # stored for future use; execute_action does not enforce it yet

    def _safe_path(self, filename: str) -> str:
        full = os.path.realpath(os.path.join(self.base_dir, filename) if self.base_dir else filename)
        # commonpath compares whole path components and, unlike a prefix check on
        # base_dir + os.sep, also works when base_dir is the filesystem root
        if self.base_dir and os.path.commonpath([self.base_dir, full]) != self.base_dir:
            raise PermissionError("Path traversal blocked.")
        if self.allowed_exts:
            _, ext = os.path.splitext(full)
            if ext.lower() not in self.allowed_exts:
                raise PermissionError(f"Extension {ext} not allowed.")
        return full

    def execute_action(self, action: Action, args: Dict[str, Any]) -> Dict[str, Any]:
        start = time.perf_counter()
        guarded_args = dict(args)
        try:
            # Guardrails run inside the try, so a blocked path comes back as {"ok": False, ...}
            # instead of raising and crashing the agent loop.
            if action.name in {"read_file", "search_in_file"} and "file_name" in guarded_args:
                guarded_args["file_name"] = self._safe_path(guarded_args["file_name"])
            data = action.execute(**guarded_args)
            return {"ok": True, "action": action.name, "args": guarded_args, "data": data,
                    "ms": int((time.perf_counter() - start) * 1000)}
        except Exception as e:
            return {"ok": False, "action": action.name, "args": guarded_args, "error": str(e),
                    "ms": int((time.perf_counter() - start) * 1000)}
```

---

## 3) AgentLanguage (prompt builder + response parser)

This version uses the **fenced-JSON** protocol you learned (easy to swap to OpenAI tool-calling later).

````python
class AgentLanguage:
    """
    Defines the protocol for speaking with the LLM:
    - construct_prompt: how to ask
    - parse_response:   how to interpret
    """
    block_example = (
        '{"schema_version":1,'
        '"tool":"<tool-name|final_answer>",'
        '"args":{"<param>":"<value>"},'
        '"terminal": <true|false>}'
    )

    def construct_prompt(self, actions: List[Action], environment: Environment, goals: List[Goal], memory: Memory) -> Prompt:
        tools_list = "\n".join(
            f"- {a.name}: {a.description} | params: {list(a.parameters.get('properties', {}).keys())}"
            for a in actions
        )
        sys = (
            "You are an agent planner. "
            "Reply ONLY with a single fenced code block labeled action containing STRICT JSON exactly like:\n"
            f"```action\n{self.block_example}\n```\n"
            "No extra text."
        )
        user = (
            "GOALS:\n" + "\n".join(f"- {g.text}" for g in goals) +
            "\n\nTOOLS:\n" + tools_list +
            "\n\nMEMORY (recent):\n" + memory.to_text() +
            "\n\nDecide the next step."
        )
        return [{"role":"system","content":sys}, {"role":"user","content":user}]

    def parse_response(self, response_text: str) -> Dict[str, Any]:
        m = re.search(r"```action(?:\s+json)?\s*(\{.*?\})\s*```", response_text, re.DOTALL | re.IGNORECASE)
        if not m:
            raise ValueError("Missing ```action``` JSON block.")
        try:
            data = json.loads(m.group(1))
        except json.JSONDecodeError as e:
            raise ValueError(f"Invalid JSON in action block: {e}") from e
        # minimal schema checks
        if "tool" not in data or "args" not in data:
            raise ValueError("Missing 'tool' or 'args' in action JSON.")
        if not isinstance(data["args"], dict):
            raise ValueError("'args' must be a JSON object.")
        data.setdefault("terminal", False)
        return data
````

---

## 4) Tools (Python functions) + registration

```python
# --- sample tools ---
def list_files() -> list[str]:
    return sorted([f for f in os.listdir(".") if os.path.isfile(f)])

def read_file(file_name: str) -> dict:
    with open(file_name, "r", encoding="utf-8") as f:
        return {"file_name": file_name, "content": f.read()}

def search_in_file(file_name: str, search_term: str) -> dict:
    matches = []
    with open(file_name, "r", encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            if search_term in line:
                matches.append({"line": i, "text": line.rstrip("\n")})
    return {"file_name": file_name, "matches": matches}

# --- a terminal action so the agent can stop ---
def final_answer(text: str) -> dict:
    return {"answer": text}

# --- register them ---
registry = ActionRegistry()

registry.register(Action(
    name="list_files",
    function=list_files,
    description="List all files in the current working directory.",
    parameters={"type":"object", "properties":{}, "required":[]}
))

registry.register(Action(
    name="read_file",
    function=read_file,
    description="Read the contents of a text file.",
    parameters={"type":"object",
                "properties":{"file_name":{"type":"string"}},
                "required":["file_name"]}
))

registry.register(Action(
    name="search_in_file",
    function=search_in_file,
    description="Find lines containing a term in a file.",
    parameters={"type":"object",
                "properties":{"file_name":{"type":"string"},
                              "search_term":{"type":"string"}},
                "required":["file_name","search_term"]}
))

registry.register(Action(
    name="final_answer",
    function=final_answer,
    description="Return the final answer to the user and stop.",
    parameters={"type":"object",
                "properties":{"text":{"type":"string"}},
                "required":["text"]},
    terminal=True
))
```

---

## 5) LLM caller (dependency-injected)

You can start with a stub (for local testing) and swap in your real OpenAI call later.

````python
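# --- stub for local testing: lists files on the first call, then stops with final_answer ---
_fake_calls = {"n": 0}  # call counter; reset to 0 before re-running the agent

def fake_generate_response(prompt: Prompt) -> str:
    _fake_calls["n"] += 1
    if _fake_calls["n"] == 1:
        # first step: ask for a tool call
        return """```action
{"schema_version":1,"tool":"list_files","args":{},"terminal":false}
```"""
    # every later step: finish with the terminal action
    return """```action
{"schema_version":1,"tool":"final_answer","args":{"text":"Listed the files; done."},"terminal":true}
```"""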

# Example OpenAI wiring (uncomment & adapt when ready):
# from openai import OpenAI
# client = OpenAI()
# def openai_generate_response(prompt: Prompt) -> str:
#     resp = client.chat.completions.create(
#         model="gpt-4o-mini",
#         messages=prompt,
#         max_tokens=400
#     )
#     return resp.choices[0].message.content
````

---

## 6) Plug into **your Agent** and run

```python
# Instantiate pieces (Dependency Injection)
goals = [Goal("Answer the user's request using available tools"),
         Goal("Stop with final_answer when done")]

agent_language = AgentLanguage()
env = Environment(base_dir=".", allowed_exts={".txt", ".md"})
generate = fake_generate_response  # swap to openai_generate_response when ready

# Use your Agent class exactly as pasted
# (make sure it's already in the notebook above this cell)
agent = Agent(goals, agent_language, registry, generate, env)

# Kick off a run
mem = agent.run("What files are here, and then stop with a final answer?")
```

> When you swap in the real LLM, it should return an `action` JSON block.
> After a tool or two, have it return:
>
> ```action
> {"schema_version":1,"tool":"final_answer","args":{"text":"..."},"terminal":true}
> ```
>
> The loop will stop because `final_answer` has `terminal=True`.

---

## 7) When you’re ready for production-ish upgrades

* **Swap to OpenAI Tool-Calling**: change `AgentLanguage` to pass `tools=[a.to_openai_tool() for a in actions]` to the API call and parse `message.tool_calls` instead of fenced JSON.
* **Validation**: add Pydantic models per action and validate args before `execute`.
* **Retry on parse**: if `parse_response` fails, append a corrective message and retry once at `temperature=0`.
* **Observability**: log `prompt → action → result (ok/ms)` in the environment.
* **Memory control**: summarize older steps occasionally, or keep only the last N steps (as `Memory(max_events=...)` already does).
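
For the first bullet, here is a minimal sketch of the parsing half only. It assumes the response message is shaped like OpenAI Chat Completions output, where `message.tool_calls` is a list whose items carry `function.name` and `function.arguments` (a JSON-encoded string). To stay runnable offline it uses `SimpleNamespace` stand-ins rather than a real client response; `parse_tool_calls` and `TERMINAL_TOOLS` are hypothetical names. Each call is converted into the same `{"tool", "args", "terminal"}` dict that `parse_response` returns, so the rest of the loop would not need to change.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, List

# Hypothetical: names of actions registered with terminal=True
TERMINAL_TOOLS = {"final_answer"}

def parse_tool_calls(message: Any) -> List[Dict[str, Any]]:
    """Convert tool calls into the dict shape the fenced-JSON parser returns."""
    calls = getattr(message, "tool_calls", None) or []
    if not calls:
        raise ValueError("Model returned no tool call.")
    parsed = []
    for call in calls:
        name = call.function.name
        try:
            # `arguments` arrives as a JSON string, not a dict
            args = json.loads(call.function.arguments or "{}")
        except json.JSONDecodeError as e:
            raise ValueError(f"Invalid JSON arguments for {name}: {e}") from e
        if not isinstance(args, dict):
            raise ValueError(f"Arguments for {name} must be a JSON object.")
        # terminal comes from our registry, not from the model
        parsed.append({"tool": name, "args": args, "terminal": name in TERMINAL_TOOLS})
    return parsed

# Offline stand-in shaped like response.choices[0].message (assumption)
fake_message = SimpleNamespace(tool_calls=[
    SimpleNamespace(function=SimpleNamespace(name="read_file",
                                             arguments='{"file_name": "notes.txt"}')),
])
print(parse_tool_calls(fake_message))
# → [{'tool': 'read_file', 'args': {'file_name': 'notes.txt'}, 'terminal': False}]
```

Deriving `terminal` from the registry's own list rather than trusting the model is a deliberate choice: a mislabeled call cannot end the run early.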

---

### TL;DR flow (keep this mental model)

```
Goals + Memory + Actions --(AgentLanguage.construct_prompt)--> LLM
LLM --> {"tool": "...", "args": {...}, "terminal": ...}  (fenced JSON)
Agent --> Registry.get_action(name)
Environment --> execute safely --> {ok,data|error}
Agent --> update Memory --> check terminal --> loop or stop
```
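
The `Agent` class itself isn't reproduced in this notebook, so the following is only a self-contained sketch of the same flow, not its actual implementation. `run_loop`, the plain-dict tool table (standing in for `ActionRegistry`), and the scripted LLM are hypothetical, and the model output is assumed to be already parsed into a dict. It shows the two stopping conditions the notes below insist on: a terminal action and a hard `max_iterations` cap.

```python
import time
from typing import Any, Callable, Dict, List, Set

Step = Dict[str, Any]   # parsed model output: {"tool": ..., "args": {...}}

def run_loop(
    decide: Callable[[List[Dict[str, Any]]], Step],
    tools: Dict[str, Callable[..., Any]],
    terminal_tools: Set[str],
    max_iterations: int = 5,
) -> List[Dict[str, Any]]:
    """Plan -> look up -> execute -> remember, until a terminal tool succeeds or the cap is hit."""
    memory: List[Dict[str, Any]] = []
    for _ in range(max_iterations):
        step = decide(memory)                         # the LLM picks the next tool + args
        fn = tools.get(step.get("tool"))              # whitelist lookup, like Registry.get_action
        start = time.perf_counter()
        if fn is None:
            result = {"ok": False, "error": f"unknown tool {step.get('tool')!r}"}
        else:
            try:
                result = {"ok": True, "data": fn(**step.get("args", {}))}
            except Exception as e:                    # normalize failures into the same envelope
                result = {"ok": False, "error": str(e)}
        result["ms"] = int((time.perf_counter() - start) * 1000)
        memory.append({"type": "assistant", "content": step})
        memory.append({"type": "environment", "content": result})
        if step.get("tool") in terminal_tools and result["ok"]:
            break                                     # clean stop via the terminal action
    return memory                                     # stopped cleanly or hit max_iterations

# Hypothetical scripted stand-in for the LLM: one tool call, then the terminal action.
script = iter([
    {"tool": "add", "args": {"a": 2, "b": 3}},
    {"tool": "final_answer", "args": {"text": "2 + 3 = 5"}},
])
tools = {"add": lambda a, b: a + b, "final_answer": lambda text: text}
mem = run_loop(lambda memory: next(script), tools, {"final_answer"})
print(len(mem), mem[-1]["content"]["data"])   # → 4 2 + 3 = 5
```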

This notebook skeleton lets you **start simple** and **swap pieces later** (LLM caller, prompt protocol, tool set, environment policy) without touching the Agent’s core.




# 🧭 Agent Dev Notes — What to Focus On & Why

## 1) Priorities (in this order)

1. **Reliability** → Structured outputs (tool calls/JSON) + schema validation + clear termination.
2. **Safety** → Environment guardrails (path/url allowlists, timeouts, rate limits, idempotency).
3. **Observability** → Log prompts, tool calls, args (redacted), results, timings, token/cost, errors.
4. **Testability** → Dependency injection (fake LLM, fake Environment); unit tests for parse/dispatch.
5. **Cost/Latency** → Short prompts, strict schemas, minimal memory context, limited retries.

---

## 2) Design pillars

* **Separation of concerns**

  * LLM plans; **Environment** executes; **Registry** discovers tools; **AgentLanguage** builds/parses; **Agent** orchestrates.
* **Contracts over prose**

  * Prefer tool/function calling or JSON mode; otherwise fenced JSON + Pydantic validation.
* **Deterministic control**

  * Whitelist tool names; validate types/ranges; consistent result envelope `{ok, data|error, ms}`.
* **Termination**

  * Provide a `final_answer` (or similar) **terminal action**; cap `max_iterations`.

---

## 3) Memory policy

* Keep it **small and relevant**: sliding window of recent steps + optional periodic summaries.
* Store **structured tool results**; avoid stuffing huge blobs into the prompt.
* Consider a “pins” section for facts that must persist across steps.
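
A sketch of this policy under stated assumptions: `PinnedMemory` is a hypothetical class, separate from the `Memory` defined earlier. Pinned facts are always rendered first, recent steps live in a `collections.deque` with `maxlen` so the oldest entries fall off automatically, and each rendered entry is truncated so one large tool result can't crowd out the rest. Periodic summarization is left out because it needs an LLM call.

```python
from collections import deque
from typing import Any, Deque, Dict, List

class PinnedMemory:
    """Hypothetical memory: persistent pins + bounded window of recent steps."""
    def __init__(self, window: int = 10, max_chars: int = 200):
        self.pins: List[str] = []                                   # facts that must persist
        self.recent: Deque[Dict[str, Any]] = deque(maxlen=window)   # oldest steps drop off
        self.max_chars = max_chars

    def pin(self, fact: str) -> None:
        if fact not in self.pins:                                   # no duplicate pins
            self.pins.append(fact)

    def add(self, kind: str, content: Any) -> None:
        self.recent.append({"type": kind, "content": content})

    def to_text(self) -> str:
        lines = [f"PIN: {p}" for p in self.pins]
        for e in self.recent:
            text = str(e["content"])
            if len(text) > self.max_chars:                          # cap each entry's size
                text = text[: self.max_chars] + "...[truncated]"
            lines.append(f"{e['type'].upper()}: {text}")
        return "\n".join(lines)

mem = PinnedMemory(window=3, max_chars=20)
mem.pin("User wants answers in French")
for i in range(5):
    mem.add("environment", f"step {i} " + "x" * 30)
print(mem.to_text())   # the pin, then only steps 2-4, each truncated
```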

---

## 4) Prompts (AgentLanguage)

* Be **brief and specific**; include:

  * goals (bullets),
  * tools (name + one-line description + param names),
  * minimal recent memory,
  * exact output format (schema or tool call).
* Add **one tiny example** only if needed; examples cost tokens and can drift.
* For retries: lower temperature (→ 0), restate schema, and ask for **JSON only**.
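
A sketch of that retry rule. It assumes the LLM caller accepts a `temperature` keyword (the notebook's `fake_generate_response` does not, so a scripted `flaky_llm` stands in) and that the parser raises `ValueError` on bad output, as `AgentLanguage.parse_response` does. `generate_with_retry`, `parse_json`, and the first-attempt temperature of 0.7 are hypothetical choices. On failure it appends the bad reply plus a corrective message instead of resending the identical prompt.

```python
import json
from typing import Any, Callable, Dict, List

Prompt = List[Dict[str, str]]

def generate_with_retry(
    generate: Callable[..., str],
    prompt: Prompt,
    parse: Callable[[str], Dict[str, Any]],
    max_retries: int = 1,
) -> Dict[str, Any]:
    messages = list(prompt)                  # don't mutate the caller's prompt
    temperature = 0.7                        # assumed default for the first attempt
    for attempt in range(max_retries + 1):
        reply = generate(messages, temperature=temperature)
        try:
            return parse(reply)
        except ValueError as e:
            if attempt == max_retries:
                raise                        # out of retries: caller returns a safe error
            # show the model its own output plus a precise correction; retry at temperature 0
            messages = messages + [
                {"role": "assistant", "content": reply},
                {"role": "user", "content": f"Your reply was invalid ({e}). "
                                            "Reply with JSON only, matching the schema exactly."},
            ]
            temperature = 0.0
    raise RuntimeError("max_retries must be >= 0")   # only reached if the loop never ran

def parse_json(text: str) -> Dict[str, Any]:
    """Hypothetical strict parser: JSON object with a 'tool' key, else ValueError."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError(f"not JSON: {e}") from e
    if not isinstance(data, dict) or "tool" not in data:
        raise ValueError("missing 'tool'")
    return data

# Hypothetical stand-in: prose first, valid JSON on the retry.
replies = iter(["Sure! I'll list the files.", '{"tool": "list_files", "args": {}}'])
seen_temps = []
def flaky_llm(messages, temperature):
    seen_temps.append(temperature)
    return next(replies)

print(generate_with_retry(flaky_llm, [{"role": "user", "content": "next step?"}], parse_json))
print(seen_temps)
```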

---

## 5) Environment guardrails (must-haves)

* **Filesystem**: base directory, path traversal checks, allowed extensions.
* **Network**: domain allowlist, per-host rate limits, response size caps.
* **Runtime**: per-tool timeouts, retry for transient errors, idempotency keys for side effects.
* **Security/Privacy**: redact logs, never echo secrets into prompts, validate user-provided paths/URLs.
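
Two of these guardrails sketched with the standard library only: a domain allowlist for URLs and a per-tool timeout. `check_url`, `run_with_timeout`, `slow_tool`, and the allowlist contents are hypothetical placeholders. One limitation to be aware of: Python can't kill a running thread, so `future.result(timeout=...)` stops *waiting* while the tool keeps running in the background; a hard limit needs a subprocess. A wrapper like this is one possible way to put the `Environment`'s `timeout_s` setting, which `execute_action` never reads, to use.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Any, Callable
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"example.com", "docs.python.org"}   # placeholder allowlist

def check_url(url: str) -> str:
    """Allow only https URLs whose host is an allowlisted domain or a subdomain of one."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https":
        raise PermissionError(f"Scheme {parsed.scheme!r} not allowed.")
    # exact match or a real subdomain; "example.com.evil.net" does not qualify
    if not any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS):
        raise PermissionError(f"Host {host!r} not in allowlist.")
    return url

_pool = ThreadPoolExecutor(max_workers=4)

def run_with_timeout(fn: Callable[..., Any], timeout_s: float, **kwargs: Any) -> Any:
    """Wait at most timeout_s for fn; the worker thread itself is NOT cancelled."""
    future = _pool.submit(fn, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"{getattr(fn, '__name__', 'tool')} exceeded {timeout_s}s") from None

def slow_tool(seconds: float) -> str:   # hypothetical tool used only for the demo
    time.sleep(seconds)
    return "done"

print(check_url("https://docs.python.org/3/"))
try:
    check_url("https://example.com.evil.net/")
except PermissionError as e:
    print("blocked:", e)
try:
    run_with_timeout(slow_tool, 0.1, seconds=0.5)
except TimeoutError as e:
    print("timeout:", e)
```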

---

## 6) Validation strategy

* **Structural**: JSON parse + Pydantic/JSON Schema (required keys, types, enums).
* **Semantic**: business rules (e.g., `b != 0`, filename in allowed dir, nonempty query).
* **On failure**: retry once (at most twice) with stricter instructions; otherwise return a safe error.
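
A stdlib-only sketch of the structural/semantic split. The structural check covers just the JSON Schema subset this notebook's tools use (`properties` with primitive `type`s, `required`, optional `enum`), so treat it as a stand-in for Pydantic or the `jsonschema` package rather than a full validator. `validate_args`, `SEMANTIC_RULES`, and the `divide` tool are hypothetical; the `b != 0` rule mirrors the example in the bullet above.

```python
from typing import Any, Callable, Dict, List, Optional

# JSON Schema primitive type names -> Python types
_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "object": dict, "array": list}

def structural_errors(args: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Check required keys, unknown keys, primitive types, and enums."""
    errors: List[str] = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required '{key}'")
    for key, value in args.items():
        if key not in props:
            errors.append(f"unexpected argument '{key}'")   # stricter than JSON Schema's default
            continue
        expected = props[key].get("type")
        py_type = _TYPES.get(expected)
        # bool is a subclass of int in Python, so exclude it from numeric types
        is_bool_as_number = expected in {"integer", "number"} and isinstance(value, bool)
        if py_type and (not isinstance(value, py_type) or is_bool_as_number):
            errors.append(f"'{key}' should be {expected}, got {type(value).__name__}")
        if "enum" in props[key] and value not in props[key]["enum"]:
            errors.append(f"'{key}' must be one of {props[key]['enum']}")
    return errors

# Semantic rules: business logic a schema can't express. Each returns an error string or None.
Rule = Callable[[Dict[str, Any]], Optional[str]]
SEMANTIC_RULES: Dict[str, List[Rule]] = {
    "divide": [lambda a: "b must not be 0" if a.get("b") == 0 else None],
    "search_in_file": [lambda a: None if str(a.get("search_term", "")).strip()
                       else "search_term must be non-empty"],
}

def validate_args(tool: str, args: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    errors = structural_errors(args, schema)
    if not errors:  # run business rules only on well-typed input
        errors = [msg for rule in SEMANTIC_RULES.get(tool, []) if (msg := rule(args)) is not None]
    return errors

divide_schema = {"type": "object",
                 "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
                 "required": ["a", "b"]}
print(validate_args("divide", {"a": 1, "b": 0}, divide_schema))   # → ['b must not be 0']
print(validate_args("divide", {"a": "1"}, divide_schema))
print(validate_args("divide", {"a": 6, "b": 3}, divide_schema))   # → []
```

Rejecting unknown keys is stricter than JSON Schema's default (`additionalProperties` allows extras unless set otherwise), in line with keeping schemas tight.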

---

## 7) Logging & metrics (observability)

Track per step:

* `tool_name`, `args_hash`, `ok`, `ms`, error message (if any),
* tokens in/out, cost (if available),
* retry count, invalid-output rate, loop length.

Set alerts if invalid-output or error rates spike.
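
A sketch of one structured log line per step using only the standard library. It consumes the result envelope that `Environment.execute_action` returns (`action`, `args`, `ok`, `ms`, `error`). `log_step`, `SECRET_KEYS`, and hashing sorted-key JSON with SHA-256 for `args_hash` are assumptions. The hash is taken over the *redacted* args, so identical calls share a hash without the log leaking secret values.

```python
import hashlib
import json
import logging
from typing import Any, Dict, Optional

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agent")

SECRET_KEYS = {"api_key", "password", "token"}   # placeholder list of keys to redact

def redact(args: Dict[str, Any]) -> Dict[str, Any]:
    return {k: ("***" if k.lower() in SECRET_KEYS else v) for k, v in args.items()}

def args_hash(args: Dict[str, Any]) -> str:
    # sort_keys makes the hash independent of argument order
    canonical = json.dumps(args, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

def log_step(result: Dict[str, Any], retries: int = 0,
             tokens_in: Optional[int] = None, tokens_out: Optional[int] = None) -> Dict[str, Any]:
    """Turn one Environment result envelope into one JSON log line (hypothetical helper)."""
    safe_args = redact(result.get("args", {}))
    record = {
        "tool_name": result.get("action"),
        "args_hash": args_hash(safe_args),   # hash of redacted args: no secrets leak via the hash
        "args": safe_args,
        "ok": result.get("ok"),
        "ms": result.get("ms"),
        "error": result.get("error"),
        "retries": retries,
        "tokens_in": tokens_in,              # provider-specific; pass in if available
        "tokens_out": tokens_out,
    }
    log.info(json.dumps(record, default=str))
    return record

rec = log_step({"ok": False, "action": "fetch_url",
                "args": {"url": "https://example.com", "token": "abc123"},
                "error": "timeout", "ms": 10012}, retries=1)
```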

---

## 8) Do’s & Don’ts

### ✅ Do

* Inject dependencies (LLM caller, Environment, Registry, AgentLanguage).
* Keep schemas tight; version them (`"schema_version": 1`).
* Normalize tool results (same envelope) to make next-step reasoning easier.
* Unit-test `parse_response`, arg validation, and dispatcher with fakes.
* Use `max_iterations` and per-turn budgets (time/tokens).

### ❌ Don’t

* Don’t parse free-form prose if you can use tool calling/JSON mode.
* Don’t execute tools without validating args.
* Don’t let the LLM choose arbitrary file paths or URLs.
* Don’t let memory grow unbounded or include huge raw blobs.
* Don’t hard-code providers/paths inside your Agent (avoid the “glued toy” anti-pattern).
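
To make the "unit-test with fakes" item concrete, here is a sketch using `unittest`. `dispatch`, `FakeEnvironment`, and the `SimpleNamespace` action are hypothetical; the fake only mirrors the duck-typed `execute_action(action, args)` interface of the `Environment` above and records calls instead of touching the filesystem. Because the environment is injected, the test can assert both what ran and that an unknown tool never reached execution.

```python
import unittest
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple

def dispatch(step: Dict[str, Any], actions: Dict[str, Any], env: Any) -> Dict[str, Any]:
    """Hypothetical dispatcher: whitelist lookup, then hand off to the injected environment."""
    action = actions.get(step.get("tool"))
    if action is None:
        return {"ok": False, "error": f"unknown tool {step.get('tool')!r}"}
    return env.execute_action(action, step.get("args", {}))

class FakeEnvironment:
    """Records calls instead of touching the filesystem; same method name as Environment."""
    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def execute_action(self, action: Any, args: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append((action.name, args))
        return {"ok": True, "action": action.name, "data": action.execute(**args)}

# Fake action: same .name/.execute shape as Action, but no real I/O
fake_read = SimpleNamespace(name="read_file", execute=lambda file_name: f"contents of {file_name}")

class DispatchTests(unittest.TestCase):
    def setUp(self) -> None:
        self.env = FakeEnvironment()
        self.actions = {"read_file": fake_read}

    def test_known_tool_goes_through_environment(self):
        result = dispatch({"tool": "read_file", "args": {"file_name": "a.txt"}}, self.actions, self.env)
        self.assertTrue(result["ok"])
        self.assertEqual(result["data"], "contents of a.txt")
        self.assertEqual(self.env.calls, [("read_file", {"file_name": "a.txt"})])

    def test_unknown_tool_is_rejected_without_execution(self):
        result = dispatch({"tool": "delete_everything", "args": {}}, self.actions, self.env)
        self.assertFalse(result["ok"])
        self.assertEqual(self.env.calls, [])

if __name__ == "__main__":
    # argv/exit keep unittest from parsing notebook args or raising SystemExit
    unittest.main(argv=["dispatch-tests"], exit=False)
```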

---

## 9) Minimal “production-ish” checklist

* [ ] Tool schemas defined; tool names whitelisted.
* [ ] Pydantic validation before every execution.
* [ ] Environment with allowlists, timeouts, retries, idempotency.
* [ ] Terminal action defined and enforced.
* [ ] Retry policy (≤2) + lower temp on retry.
* [ ] Logs/metrics wired; sensitive data redacted.
* [ ] Unit tests with fake LLM + fake Environment.

---

## 10) Quick upgrade roadmap

1. Start with fenced JSON → **switch to tool/function calling**.
2. Add Pydantic models per tool args + per-tool semantic validators.
3. Introduce a **summarizing memory** to keep context small.
4. Add budgeting (tokens/time per run) and circuit breakers.
5. Move to a vector store/RAG **only if** the tasks require retrieval (don’t over-engineer early).
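
A sketch of item 4. `RunBudget`, `CircuitBreaker`, `BudgetExceeded`, and all limits are hypothetical and not from any library. The budget caps tokens and wall-clock time for one run; the breaker stops calling a tool after `threshold` consecutive failures and lets calls through again once `cooldown_s` has passed. Both take an injectable `clock`, the same dependency-injection idea as the fake LLM, so they can be tested without sleeping.

```python
import time
from typing import Callable, Dict

class BudgetExceeded(RuntimeError):
    """Raised when a run goes over its token or time budget."""

class RunBudget:
    """Hypothetical per-run budget; call charge() after every LLM call."""
    def __init__(self, max_tokens: int, max_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_tokens, self.max_seconds, self.clock = max_tokens, max_seconds, clock
        self.tokens = 0
        self.start = clock()

    def charge(self, tokens: int) -> None:
        self.tokens += tokens
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token budget {self.max_tokens} exceeded ({self.tokens})")
        if self.clock() - self.start > self.max_seconds:
            raise BudgetExceeded(f"time budget {self.max_seconds}s exceeded")

class CircuitBreaker:
    """Hypothetical per-tool breaker: opens after `threshold` consecutive failures."""
    def __init__(self, threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold, self.cooldown_s, self.clock = threshold, cooldown_s, clock
        self.failures: Dict[str, int] = {}
        self.opened_at: Dict[str, float] = {}

    def allow(self, tool: str) -> bool:
        opened = self.opened_at.get(tool)
        if opened is None:
            return True                        # closed: calls flow normally
        # after the cooldown, let calls through again ("half-open");
        # another failure re-opens the circuit via record()
        return self.clock() - opened >= self.cooldown_s

    def record(self, tool: str, ok: bool) -> None:
        if ok:                                 # any success fully resets the tool
            self.failures.pop(tool, None)
            self.opened_at.pop(tool, None)
            return
        self.failures[tool] = self.failures.get(tool, 0) + 1
        if self.failures[tool] >= self.threshold:
            self.opened_at[tool] = self.clock()

# Fake clock so the example runs instantly.
now = [0.0]
fake_clock = lambda: now[0]

breaker = CircuitBreaker(threshold=2, cooldown_s=10, clock=fake_clock)
breaker.record("fetch_url", ok=False)
breaker.record("fetch_url", ok=False)
print(breaker.allow("fetch_url"))   # → False (circuit open)
now[0] = 11.0
print(breaker.allow("fetch_url"))   # → True (cooldown elapsed)

budget = RunBudget(max_tokens=1000, max_seconds=60, clock=fake_clock)
budget.charge(600)
try:
    budget.charge(600)
except BudgetExceeded as e:
    print(e)                         # token budget exceeded
```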

---

**Bottom line:** make the LLM’s output **machine-readable**, execute **only through a guarded Environment**, and keep everything **swappable** via dependency injection. Reliability and safety first; flashiness later.
