<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/095_Modular_AgentDesign_GAME.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Modular AI Agent Design**

---

## **Core Focus Areas**

### **1. GAME Components as Modular Building Blocks**

* The lecture reinforces that **Goals, Actions, Memory, and Environment** are the four key interchangeable pieces of an agent.
* The **core agent loop** remains constant — only the GAME components change.
* This makes your framework **flexible** and **reusable** because you can build very different agents by swapping components without rewriting everything.

---

### **2. Goals**

* Encapsulated as **objects** (`Goal` class) with:

  * `priority` → for sorting and decision-making.
  * `name` → for reference.
  * `description` → for defining what the agent is trying to do and how.
* This removes the need for big “walls of text” and lets you combine multiple goals intelligently.

---

### **3. Actions**

* Represented as an **Action** class containing:

  * `name`, `function`, `description`, `parameters`, and `terminal` (to signal loop termination).
* Managed through an **ActionRegistry**:

  * Centralizes registering and looking up actions.
  * Decouples action execution from the agent loop.
* This setup allows you to easily add, remove, or swap actions without touching the rest of the system.

---

### **4. Memory**

* Encapsulated in a **Memory** class:

  * Provides `add_memory()` and `get_memories()` methods.
  * Allows future changes to how memory is stored or retrieved without breaking the agent loop.
* **Why it matters:** You can later implement advanced strategies (e.g., summaries, embeddings) without rewriting the whole system.

---

### **5. Environment**

* Acts as the **bridge** between the agent and the outside world.
* Executes **Actions** via `execute_action()` and standardizes the output via `format_result()`.
* Removes the need for hardcoded `if/else` logic in the agent loop.

---

## **Why This Is Important**

This lecture is about **future-proofing** your agent architecture:

* Your **loop** becomes lean and stable.
* GAME pieces are **self-contained** and swappable.
* Encourages **separation of concerns** — each part has a single clear purpose.
* Makes testing, debugging, and iterating much easier.




# G - Goals Implementation

### **1. The `Goal` dataclass**

```python
@dataclass(frozen=True)
class Goal:
    priority: int
    name: str
    description: str
```

* **Encapsulation:** A goal is now a structured object, not just loose text.
* **Frozen:** Immutable — prevents accidental changes while the agent runs.
* **Three fields:**

  * `priority` → lets you order goals.
  * `name` → a shorthand reference.
  * `description` → what & how to achieve it.

---

### **2. Sorting Goals by Priority**

```python
def sort_goals(goals: List[Goal]) -> List[Goal]:
    return sorted(goals, key=lambda g: g.priority)
```

* Ensures the agent focuses on the **most important** goals first.
* **Lower number = higher priority** (e.g., `priority=0` is the most important).

---

### **3. Rendering Goals into the System Prompt**

```python
def goals_to_prompt(goals: List[Goal]) -> str:
    ordered = sort_goals(goals)
    lines = ["You are an AI agent. Follow these goals in order of priority:"]
    for g in ordered:
        lines.append(f"- ({g.priority}) {g.name}: {g.description}")
    return "\n".join(lines)
```

* Converts your `Goal` objects into a **compact instruction block** for the LLM.
* Gives the model **clear, prioritized guidance** without a messy “wall of text.”
* You can **add/remove goals dynamically** without rewriting your agent’s main loop.

---

### **4. Why This Matters**

* Keeps **logic (what to do)** separate from **implementation (how the agent works)**.
* Makes your agent’s mission statement **clearer** and **easier to maintain**.
* Allows **goal swapping** for different tasks without touching the agent’s core code.




```python
# --- Dynamic Goals Example ---
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Goal:
    priority: int
    name: str
    description: str

# Base goals
base_goals: List[Goal] = [
    Goal(1, "summarize_txt", "Summarize .txt files into concise bullet points."),
    Goal(2, "quality", "Prefer small, reversible changes; ask before editing."),
]

# Alternate goals for a new mission
research_goals: List[Goal] = [
    Goal(1, "research_topic", "Find facts and summarize key points about a given topic."),
    Goal(2, "cite_sources", "Always return citations for every claim."),
]

def sort_goals(goals: List[Goal]) -> List[Goal]:
    return sorted(goals, key=lambda g: g.priority)

def goals_to_prompt(goals: List[Goal]) -> str:
    ordered = sort_goals(goals)
    lines = ["You are an AI agent. Follow these goals in order of priority:"]
    for g in ordered:
        lines.append(f"- ({g.priority}) {g.name}: {g.description}")
    return "\n".join(lines)

# Switch between goal sets at runtime
current_goals = base_goals
print("\n--- Current Mission ---\n")
print(goals_to_prompt(current_goals))

# Agent switches to research mode mid-run
current_goals = research_goals
print("\n--- New Mission ---\n")
print(goals_to_prompt(current_goals))
```

Output:

```text
--- Current Mission ---

You are an AI agent. Follow these goals in order of priority:
- (1) summarize_txt: Summarize .txt files into concise bullet points.
- (2) quality: Prefer small, reversible changes; ask before editing.

--- New Mission ---

You are an AI agent. Follow these goals in order of priority:
- (1) research_topic: Find facts and summarize key points about a given topic.
- (2) cite_sources: Always return citations for every claim.
```


This is the **Lego block shift** in thinking.

The `Goal` class is now:

* **Self-contained** → it holds all info the agent needs about *what* to do and *how* to approach it.
* **Reusable** → you can drop it into any agent without changing the rest of your code.
* **Swappable** → change one set of goals for another, and the *exact same* agent loop + environment can handle a completely different mission.

This modularity means:

* You can **test new goals in isolation** without breaking the pipeline.
* You can **combine multiple goal sets** and sort them by priority.
* You can store, load, or share goal templates between projects.

Instead of a “big messy prompt” stuffed into the system message, you now have a **goal object** that you can manage like any other piece of software.
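
As a rough sketch of the "combine multiple goal sets" idea above, two goal lists can be merged and re-sorted before rendering. The `merge_goals` helper and its rule for duplicate names (the first occurrence wins) are assumptions made for illustration; they are not part of the lecture code.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Goal:
    priority: int
    name: str
    description: str

def merge_goals(*goal_sets: List[Goal]) -> List[Goal]:
    """Hypothetical helper: merge goal sets, keep the first goal seen per name, sort by priority."""
    seen = {}
    for goals in goal_sets:
        for g in goals:
            seen.setdefault(g.name, g)  # first occurrence of a name wins
    # sorted() is stable, so goals with equal priority keep their insertion order
    return sorted(seen.values(), key=lambda g: g.priority)

base_goals = [
    Goal(1, "summarize_txt", "Summarize .txt files into concise bullet points."),
    Goal(2, "quality", "Prefer small, reversible changes; ask before editing."),
]
research_goals = [
    Goal(1, "research_topic", "Find facts and summarize key points about a given topic."),
    Goal(2, "cite_sources", "Always return citations for every claim."),
]

combined = merge_goals(base_goals, research_goals)
print([g.name for g in combined])
```

Because the merge happens in plain Python, the ordering is deterministic and can be unit-tested without calling a model.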


---



## **Deterministic vs. Cognitive Separation**  

Splitting work between code and model is a specialized form of **Separation of Concerns** for AI agent design:

- **Deterministic tasks** (sorting by priority, filtering, path checks, schema validation, formatting) → handled in Python: cheap, predictable, zero ambiguity.
- **Cognitive/reasoning tasks** (planning, summarizing, proposing edits, making tradeoffs across goals) → handled by the LLM.

In AI agents, this reduces cognitive load on the model, increases reliability, and keeps the system modular, so you can upgrade, debug, or swap either side independently.

A few extra pro tips to lock this in:

1. Separate **hard constraints** from **soft goals**

   * Hard constraints = enforce in code (e.g., “never write outside `/content/summaries`”).
   * Soft goals = give to the LLM as prioritized guidance (your `Goal` list).

2. Pre- and post-validate deterministically

   * Pre: filter inputs, cap sizes, sort files.
   * Post: check the model’s outputs (length caps, filename sanity), and bounce back with a clear error/hint if needed.

3. Keep goals as **swappable data**

   * You can load/save goal sets (JSON/YAML), merge them, or toggle “modes” at runtime without touching your loop.

4. Minimize token clutter

   * Render only the **top-k** goals (by priority) for the current step, not the whole catalog.
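
Tips 3 and 4 can be sketched together: goals are loaded from JSON text and only the top-k are rendered. The `load_goals` helper, the `top_k` parameter, and the sample catalog are hypothetical names introduced here; the JSON layout simply mirrors the three `Goal` fields.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Goal:
    priority: int
    name: str
    description: str

def load_goals(json_text: str) -> List[Goal]:
    """Parse a JSON array of {priority, name, description} objects into Goal objects."""
    return [Goal(**item) for item in json.loads(json_text)]

def goals_to_prompt(goals: List[Goal], top_k: int = 2) -> str:
    """Render only the top_k highest-priority goals (lower number = higher priority)."""
    ordered = sorted(goals, key=lambda g: g.priority)[:top_k]
    lines = ["You are an AI agent. Follow these goals in order of priority:"]
    lines += [f"- ({g.priority}) {g.name}: {g.description}" for g in ordered]
    return "\n".join(lines)

# Hypothetical goal catalog; in practice this text could be read from a file.
catalog = """[
  {"priority": 3, "name": "be_brief", "description": "Keep answers short."},
  {"priority": 1, "name": "summarize_txt", "description": "Summarize .txt files."},
  {"priority": 2, "name": "quality", "description": "Prefer small, reversible changes."}
]"""

prompt = goals_to_prompt(load_goals(catalog), top_k=2)
print(prompt)
```

The lowest-priority goal (`be_brief`) stays in the catalog but never reaches the prompt, which keeps token usage down without deleting anything.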





We broadly use the term **"goal"** to encompass both *what* the agent is trying to achieve and *how* it should approach the task. This duality is crucial for guiding the agent’s behavior effectively.

Types of goals can include:
- **Reasoning examples** that show the agent how to act in certain situations.
- **Core rules** that apply across all agents in the system.
- **Special instructions** for solving specific types of tasks.

Below is an example of defining a file management goal:

```python
from game.core import Goal

# Define a simple file management goal
file_management_goal = Goal(
    priority=1,
    name="file_management",
    description="""Manage files in the current directory by:
    1. Listing files when needed
    2. Reading file contents when needed
    3. Searching within files when information is required
    4. Providing helpful explanations about file contents"""
)
```

---

💡 **Design Principle:** This follows **Deterministic vs. Cognitive Separation**, a specialized form of *Separation of Concerns* for AI agent design:
- **Deterministic tasks** (listing, reading, validating file paths) → handled in Python.
- **Cognitive/reasoning tasks** (deciding *when* and *why* to read a file) → handled by the LLM.

This reduces cognitive load on the model, increases reliability, and makes the system modular — letting you upgrade, debug, or swap goals independently.



# A – Actions Implementation with JSON Schemas

Actions define what the agent can do — think of them as the agent’s **toolkit**. Each action is a discrete capability that can be executed in the environment. The action system has two main parts: the `Action` class and the `ActionRegistry`.

The actions are the **interface** between our agent and its environment. These encapsulate what the agent can do to affect the environment. Previously, we built actions as standalone Python functions, but encapsulating them as objects makes it easier to swap actions without touching the core loop.

```python
from typing import Any, Callable, Dict

class Action:
    def __init__(self,
                 name: str,
                 function: Callable,
                 description: str,
                 parameters: Dict,
                 terminal: bool = False):
        self.name = name
        self.function = function
        self.description = description
        self.terminal = terminal
        self.parameters = parameters

    def execute(self, **args) -> Any:
        """Execute the action's function"""
        return self.function(**args)
```

When the agent returns JSON indicating an action, we need a way to map the `tool_name` from that JSON back to the actual `Action` object. This is where the `ActionRegistry` comes in — a simple registry to **register actions** and **look them up by name**.
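
A minimal sketch of such a registry might look like the following. The method names `register` and `get_action` match the calls used later in these notes; `get_actions` and the duplicate-name error are assumptions added for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

class Action:
    def __init__(self, name: str, function: Callable, description: str,
                 parameters: Dict, terminal: bool = False):
        self.name = name
        self.function = function
        self.description = description
        self.parameters = parameters
        self.terminal = terminal

    def execute(self, **args) -> Any:
        return self.function(**args)

class ActionRegistry:
    """Sketch of a name -> Action lookup table."""
    def __init__(self):
        self._actions: Dict[str, Action] = {}

    def register(self, action: Action) -> None:
        # Assumption: registering the same name twice is treated as a mistake
        if action.name in self._actions:
            raise ValueError(f"Action already registered: {action.name}")
        self._actions[action.name] = action

    def get_action(self, name: str) -> Optional[Action]:
        # Returns None for unknown names so the caller can report the error
        return self._actions.get(name)

    def get_actions(self) -> List[Action]:
        return list(self._actions.values())

registry = ActionRegistry()
registry.register(Action(
    name="list_files",
    function=lambda: ["file1.txt", "file2.txt"],  # stand-in implementation
    description="List all files in the working directory",
    parameters={"type": "object", "properties": {}, "required": []},
))

action = registry.get_action("list_files")
print(action.execute())
```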

---

💡 **Design Principle:**
- Encapsulation of actions makes the system more **modular**.
- The agent's capabilities can be **extended or replaced** without touching its reasoning loop.
- JSON Schemas give the LLM **structured guidance** for parameters, improving accuracy and reliability.

This aligns with the modular "lego block" philosophy, where actions are independent, reusable units the agent can select and execute.

---

### **Before: Standalone Functions**

When you define actions like this:

```python
def list_files():
    ...
def read_file(file_name):
    ...
```

* They work fine for small projects.
* But the **core loop** (the agent’s main decision-making process) has to *know* exactly which functions exist and how to call them.
* If you add a new action or remove one, you have to go into the loop and **hard-code changes** — which risks breaking unrelated parts of the agent.

---

### **After: Encapsulated as Objects**

With an `Action` class:

```python
list_files_action = Action(
    name="list_files",
    function=list_files,
    description="List all files in the /content/files directory",
    parameters={...}
)
```

* All **metadata** (name, description, parameters, function reference) is **in one place**.
* The core loop doesn’t need to know any details about the action — it just:

  1. Looks up the action by name in the `ActionRegistry`.
  2. Calls `.execute()` with the arguments.

---

### **Why This Makes Swapping Easier**

* If you want to replace `list_files` with a completely different implementation (e.g., one that lists files in cloud storage), you **don’t touch the agent loop at all**.
* You just register a new `Action` object with the same name but a different function.
* The agent loop still works exactly the same, because it calls actions through a **generic interface** (`Action.execute()`).

---

💡 **Analogy**:
Think of it like swapping tools in a toolbox.
With standalone functions, you’ve glued each tool to a specific slot — changing one means rewiring the whole box.
With encapsulated `Action` objects, the tools slide in and out freely, but the box (the agent loop) still works the same.





Let’s look at a **before vs. after** example so you can see why encapsulating actions as objects makes swapping painless.

---

## **Before: Standalone Functions (Tightly Coupled)**

```python
# --- Actions ---
def list_files():
    return ["file1.txt", "file2.txt"]

def read_file(file_name):
    return f"Contents of {file_name}"

# --- Core loop ---
def agent_step(action_name, **kwargs):
    if action_name == "list_files":
        return list_files()
    elif action_name == "read_file":
        return read_file(kwargs["file_name"])
    else:
        return "Unknown action"

# --- Usage ---
print(agent_step("list_files"))
print(agent_step("read_file", file_name="file1.txt"))
```

**Problem**:

* If you want to replace `list_files()` with a new function, you must **edit `agent_step`**.
* Every time you add or remove an action, you have to update that `if/elif` chain — that’s fragile and repetitive.

---

## **After: Encapsulated `Action` Objects (Decoupled)**

```python
class Action:
    def __init__(self, name, function, description, parameters, terminal=False):
        self.name = name
        self.function = function
        self.description = description
        self.parameters = parameters
        self.terminal = terminal

    def execute(self, **kwargs):
        return self.function(**kwargs)

# --- Example functions ---
def list_files():
    return ["file1.txt", "file2.txt"]

def read_file(file_name):
    return f"Contents of {file_name}"

# --- Registry ---
action_registry = {
    "list_files": Action(
        name="list_files",
        function=list_files,
        description="List all files in the working directory",
        parameters={}
    ),
    "read_file": Action(
        name="read_file",
        function=read_file,
        description="Read a file's content",
        parameters={"file_name": "string"}
    )
}

# --- Core loop ---
def agent_step(action_name, **kwargs):
    action = action_registry.get(action_name)
    if not action:
        return f"Unknown action: {action_name}"
    return action.execute(**kwargs)

# --- Usage ---
print(agent_step("list_files"))
print(agent_step("read_file", file_name="file1.txt"))
```

---

## **Why the Second Approach Wins**

1. **No touching the core loop** — all logic for what an action does lives in the `Action` object.
2. **Metadata lives with the action** — description, parameters, and behavior are packaged together.
3. **Hot-swapping is easy** — swap the function in `Action(...)` or register a new action, no core logic changes.
4. **LLM-friendly** — since you have structured metadata, you can feed tool names, descriptions, and parameter schemas directly to the LLM.





The **functions themselves don’t change** — they still do the same work (read a file, list files, call an API, etc.).
What changes is **how you organize and reference them**:

---

### **Old way (if/else chain in core loop)**

* **Coupled**: Every time you add/remove a function, you must edit the core loop.
* **Brittle**: If you miss an `elif` case or make a typo, the agent breaks.
* **No metadata**: The function’s purpose, parameters, and descriptions aren’t stored anywhere in a structured way for the LLM.
* **Hard to scale**: With 20–50 tools, the if/else chain becomes huge and messy.

---

### **New way (Action objects in a registry)**

* **Decoupled**: Adding a new tool means *only* creating a new `Action` object and adding it to the `action_registry`.
* **Metadata-rich**: Name, description, parameter schema, and the callable all live in one place.
* **Core loop stays the same forever**:

  ```python
  def agent_step(action_name, **kwargs):
      action = action_registry.get(action_name)
      if not action:
          return f"Unknown action: {action_name}"
      return action.execute(**kwargs)
  ```

  This loop never changes no matter how many tools you have.
* **LLM-ready**: You can serialize the `action_registry` into JSON to give the LLM all tool descriptions in a consistent schema.
* **Easier testing**: You can test each `Action` independently without touching the agent logic.

---

So in short:
➡ **Old core loop** = command dispatcher hardcoded in the brain.
➡ **New core loop** = brain looks up a tool in its toolbox and uses it, without knowing exactly *how* it works.

---

**Old way = messy shed**

* Every new tool → edit the core loop.
* Long `if/elif` chain to sift through.
* Easy to break things, hard to scale.
* No structured metadata for the LLM.

**New way = labeled, organized workshop**

* New tool = define function → wrap in `Action` → `registry.register(...)`.
* Core loop never changes.
* Each tool carries its **name, description, and JSON Schema** for args.
* The LLM can “look up” the right tool by name/desc and fill args correctly.

Here’s how it feels in code:

```python
# 1) Write the function (unchanged style)
def write_doc_file(file_name: str, content: str) -> str:
    with open(f"/content/docs/{file_name}", "w") as f:
        f.write(content)
    return "saved"

# 2) Register it (no touching the loop)
registry.register(Action(
    name="write_doc_file",
    function=write_doc_file,
    description="Write documentation to /content/docs",
    parameters={
        "type": "object",
        "properties": {
            "file_name": {"type": "string"},
            "content": {"type": "string"}
        },
        "required": ["file_name", "content"]
    }
))
```

Core loop stays tiny forever:

```python
def agent_step(action_name, **kwargs):
    action = registry.get_action(action_name)
    return action.execute(**kwargs) if action else {"error": f"Unknown action: {action_name}"}
```





## ActionRegistry

* **Purpose** (centralized store of actions, keeps core loop small)
* **Features** (metadata, JSON Schema export, lightweight validation)
* **Example usage** (registering `list_txt_files`)
* **Integration** (`to_openai_tools()` for function calling, `agent_step` core loop)


---

### Key Takeaways for ActionRegistry

1. **Single Source of Truth**
   The registry is where your agent’s abilities live. If it’s not registered, the agent can’t use it.

2. **Plug-and-Play Tools**
   Adding or removing a tool is as simple as `register()` or deleting from `_actions` — no edits to the core loop.

3. **LLM-Friendly Exports**
   `to_openai_tools()` generates a `tools` array ready for OpenAI function-calling, keeping tool descriptions and parameter schemas synchronized with the code.

4. **Validation at the Door**
   Prevents bad inputs before they cause runtime errors, and makes error messages clearer for the LLM.

5. **Extensibility**
   You can later:

   * Swap `validate_args` for a `jsonschema` validator
   * Add logging for all executed actions
   * Support dynamic (runtime) tool registration
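
To make the export idea concrete, here is a sketch of `to_openai_tools()` written as a standalone function over a list of actions. It assumes the `{"type": "function", "function": {...}}` shape used by OpenAI's Chat Completions `tools` parameter; check the current API reference before relying on it. The `list_txt_files` stand-in is hypothetical.

```python
import json
from typing import Callable, Dict, List

class Action:
    def __init__(self, name: str, function: Callable, description: str,
                 parameters: Dict, terminal: bool = False):
        self.name = name
        self.function = function
        self.description = description
        self.parameters = parameters
        self.terminal = terminal

def to_openai_tools(actions: List[Action]) -> List[Dict]:
    """Build a tools list from action metadata; the callable itself is never exported."""
    return [
        {
            "type": "function",
            "function": {
                "name": a.name,
                "description": a.description,
                "parameters": a.parameters,  # expected to be a JSON Schema object
            },
        }
        for a in actions
    ]

list_txt_files = Action(
    name="list_txt_files",
    function=lambda: ["notes.txt"],  # stand-in implementation
    description="List .txt files in the working directory",
    parameters={"type": "object", "properties": {}, "required": []},
)

tools = to_openai_tools([list_txt_files])
print(json.dumps(tools, indent=2))
```

Since the export is derived from the registered actions, the descriptions the model sees cannot drift from the code that actually runs.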




The `Action` class is a **solid minimal core**, but in real agents you’ll almost certainly iterate on it (add a few fields, tighten validation, standardize outputs) as you discover edge cases.

Here’s how I’d think about it:

## Action Class Params

### What you have (good minimal)

* `name`, `function`, `description`, `parameters`, `terminal`
* `execute(**kwargs)` that delegates to the underlying Python function

This gets you moving fast and keeps the core loop tiny. 👍

### What you’ll likely refine (through trial & error)

* **Argument validation**: today it’s implied by `parameters`. Add a lightweight validator (or plug in `jsonschema`) before `execute`.
* **Structured return envelope**: standardize tool outputs so the LLM never guesses:

  * Success: `{"ok": true, "data": ...}`
  * Error: `{"ok": false, "error": "...", "hint": "...", "retryable": true/false}`
* **Terminal semantics**: keep `terminal` but be explicit—“agent should stop after this action returns successfully.”

### Fields you may add later (only when needed)

* **`return_schema`**: JSON Schema for outputs (helps downstream checks & self-correction).
* **`idempotent: bool`**: safe to retry on failure/timeouts?
* **`side_effects: list[str]`** or **`permissions: set[str]`**: documents what it can change (files, network, DB).
* **`timeout_s` / `retries`**: execution policy per action.
* **`cost_tag` / `rate_limit_tag`**: for budgeting/throttling groups.
* **`requires_approval: bool`**: forces an “ask_user_approval” step before running.
* **`visibility: Literal["public","internal"]`**: hide helper actions from the model.
* **`examples: list[dict]`**: few-shot examples of good calls (can be surfaced to the LLM if needed).
* **`version` / `deprecated`**: safe migrations as tool contracts evolve.

Tip: you don’t have to bake all of these into the class immediately. A pragmatic pattern is to add a single **`metadata: dict`** bag to the `Action` so you can pilot new attributes without refactoring everything.

### Minimal “next step” upgrade (conceptual)

* Keep your class the same, but:

  1. Validate args against `parameters` before `execute`.
  2. Wrap results/errors in the envelope above.
  3. (Optional) add `metadata: dict | None` for future flags.
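
The three steps above could be sketched as follows. The helpers `validate_args` and `safe_execute` are hypothetical names, the validator only checks required and unknown keys (it is not a full JSON Schema validator), and `parameters` is assumed to be a JSON Schema object with `properties` and `required`.

```python
from typing import Any, Callable, Dict, Optional

class Action:
    def __init__(self, name: str, function: Callable, description: str,
                 parameters: Dict, terminal: bool = False,
                 metadata: Optional[Dict] = None):
        self.name = name
        self.function = function
        self.description = description
        self.parameters = parameters
        self.terminal = terminal
        self.metadata = metadata or {}  # bag for experimental flags

    def execute(self, **kwargs) -> Any:
        return self.function(**kwargs)

def validate_args(action: Action, args: Dict) -> Optional[str]:
    """Return an error message, or None when the basic checks pass."""
    schema = action.parameters or {}
    missing = [k for k in schema.get("required", []) if k not in args]
    if missing:
        return f"Missing required argument(s): {', '.join(missing)}"
    unknown = [k for k in args if k not in schema.get("properties", {})]
    if unknown:
        return f"Unknown argument(s): {', '.join(unknown)}"
    return None

def safe_execute(action: Action, args: Dict) -> Dict[str, Any]:
    """Validate, run, and wrap the outcome in the ok/data or ok/error envelope."""
    error = validate_args(action, args)
    if error:
        return {"ok": False, "error": error,
                "hint": f"Check the parameters of {action.name}", "retryable": True}
    try:
        return {"ok": True, "data": action.execute(**args)}
    except Exception as exc:  # convert any tool failure into a structured error
        return {"ok": False, "error": str(exc), "retryable": False}

read_file_action = Action(
    name="read_file",
    function=lambda file_name: f"Contents of {file_name}",  # stand-in implementation
    description="Read a file's content",
    parameters={"type": "object",
                "properties": {"file_name": {"type": "string"}},
                "required": ["file_name"]},
    metadata={"idempotent": True},
)

print(safe_execute(read_file_action, {"file_name": "a.txt"}))
print(safe_execute(read_file_action, {}))
```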

### What to test (rigor without bloat)

* **Registry**: duplicate registration error; lookup works.
* **Validation**: missing required args → clear error; basic type checks (if you add them).
* **Execution**: success path; exception turns into `{"ok": false, "error": ...}`.
* **Terminal**: loop stops when an action with `terminal=True` returns ok.
* **Serialization**: `to_openai_tools()` (or equivalent) matches your registry state.
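
The terminal check is easiest to test with a scripted stand-in for the model. The sketch below assumes a `decide` callable that returns `{"tool_name": ..., "args": ...}`; the `args` key, the `run_loop` helper, and the `terminate` action are assumptions for illustration, and a plain dict plays the role of the registry.

```python
from typing import Any, Callable, Dict, List

class Action:
    def __init__(self, name: str, function: Callable, description: str,
                 parameters: Dict, terminal: bool = False):
        self.name = name
        self.function = function
        self.description = description
        self.parameters = parameters
        self.terminal = terminal

    def execute(self, **kwargs) -> Any:
        return self.function(**kwargs)

def run_loop(decide: Callable[[List[Dict]], Dict],
             registry: Dict[str, Action], max_steps: int = 10) -> List[Dict]:
    """Run until a terminal action succeeds or max_steps is reached."""
    history: List[Dict] = []
    for _ in range(max_steps):
        decision = decide(history)  # stand-in for the LLM call
        action = registry.get(decision["tool_name"])
        if action is None:
            history.append({"ok": False, "error": f"Unknown action: {decision['tool_name']}"})
            continue
        try:
            result = {"ok": True, "data": action.execute(**decision.get("args", {}))}
        except Exception as exc:
            result = {"ok": False, "error": str(exc)}
        history.append(result)
        if action.terminal and result["ok"]:
            break  # stop only after a terminal action returns successfully
    return history

registry = {
    "list_files": Action("list_files", lambda: ["file1.txt"], "List files", {}),
    "terminate": Action("terminate", lambda message: message, "End the run", {}, terminal=True),
}

# Scripted decisions replace the model so the test is deterministic
script = iter([
    {"tool_name": "list_files", "args": {}},
    {"tool_name": "terminate", "args": {"message": "done"}},
    {"tool_name": "list_files", "args": {}},  # never reached
])
history = run_loop(lambda _history: next(script), registry)
print(history)
```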




# M — Memory Implementation

Almost every agent needs to **remember** what happens from one loop iteration to the next.
This is where the **Memory** component comes in.
It allows the agent to store and retrieve information about its interactions, which is critical for **context** and **decision-making**.

We can create a simple class to represent the memory:

```python
from typing import Dict, List, Optional

class Memory:
    def __init__(self):
        self.items: List[Dict] = []  # Basic conversation history

    def add_memory(self, memory: dict):
        """Add memory to working memory."""
        self.items.append(memory)

    def get_memories(self, limit: Optional[int] = None) -> List[Dict]:
        """Get the last N memories for prompt construction."""
        # A negative-start slice returns the most recent `limit` items
        return self.items[-limit:] if limit else self.items
```

---

### Why wrap a simple list in a class?

Originally, we just used a Python list of messages.
**Is it worth wrapping the list in a class?**
Yes — because:

1. **Future-proofing:**
   We can add advanced features later (e.g., database storage, retrieval-augmented filtering, or summarization) without touching the agent’s core loop.

2. **Memory strategies:**
   With this interface, we can subclass `Memory` to implement different strategies:

   * Sliding window memory
   * Summarized memory
   * Long-term vector DB recall

   All of these work without changing the rest of the system.

3. **Decoupling storage from usage:**
   We can change **how** we store memory (list, DB, graph) without changing **how** the agent accesses it.

---

### Implementation Detail

* **Prompt formatting:**
  LLM APIs expect a **list of messages** in the prompt.
  Even if we store memory in a complex format internally, `get_memories()` ensures we always output the right shape.

* **Example:**
  If our memory is stored in a vector DB for semantic search, `get_memories()` could run a query and return the most relevant N memories in proper `{role, content}` format.
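
A smaller version of the same idea, as a sketch: the subclass below stores an extra timestamp internally but still hands back plain `{role, content}` messages. `TimestampedMemory` and the `ts` field are hypothetical, and the base class here returns the most recent `limit` items, as the docstring describes.

```python
import time
from typing import Dict, List, Optional

class Memory:
    def __init__(self):
        self.items: List[Dict] = []

    def add_memory(self, memory: dict):
        self.items.append(memory)

    def get_memories(self, limit: Optional[int] = None) -> List[Dict]:
        # Most recent `limit` items, or everything when no limit is given
        return self.items[-limit:] if limit else self.items

class TimestampedMemory(Memory):
    """Keeps extra fields internally; emits only role and content."""
    def add_memory(self, memory: dict):
        super().add_memory({**memory, "ts": time.time()})

    def get_memories(self, limit: Optional[int] = None) -> List[Dict]:
        recent = super().get_memories(limit)
        return [{"role": m["role"], "content": m["content"]} for m in recent]

memory = TimestampedMemory()
memory.add_memory({"role": "user", "content": "List the files."})
memory.add_memory({"role": "assistant", "content": "Calling list_files."})
memory.add_memory({"role": "user", "content": "Now read notes.txt."})

print(memory.get_memories(limit=2))
```

The caller never sees `ts`, so the storage format can keep evolving without changing what is sent to the model.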

---

💡 **Design Principle:**
Think of **Memory** as the “tape recorder” of the agent’s experiences.
The format returned to the LLM should be consistent, even if the underlying storage evolves.




## Adding Memory

Wrapping memory in a class gives you a clean **contract** now, and room to grow later without touching the loop. It’s the same Lego-block idea: swap strategies, keep the rest.

Here are three tiny, *drop-in* strategies to show how this pays off:

### 1) Sliding window (deterministic, fast)

```python
class SlidingWindowMemory(Memory):
    def __init__(self, k=8):
        super().__init__()
        self.k = k
    def get_memories(self, limit=None):
        use = limit or self.k
        return self.items[-use:]
```

### 2) Summarized memory (compress older turns)

```python
class SummarizedMemory(Memory):
    def __init__(self, summarizer, keep_recent=6):
        super().__init__()
        self.summarizer = summarizer  # callable(text) -> summary
        self.keep_recent = keep_recent
        self.summary = None
    def get_memories(self, limit=None):
        head = [{"role":"system","content": self.summary}] if self.summary else []
        tail = self.items[-self.keep_recent:]
        return head + tail
    def maybe_compact(self):
        if len(self.items) > 50:  # threshold you pick
            text = "\n".join(m["content"] for m in self.items[:-self.keep_recent])
            self.summary = self.summarizer(text)
```

### 3) Vector recall (retrieve relevant past info)

```python
class VectorRecallMemory(Memory):
    def __init__(self, embed, store):
        super().__init__()
        self.embed = embed   # callable(text) -> vector
        self.store = store   # has upsert(id, vec, meta) and search(query_vec, k)
    def add_memory(self, memory):
        super().add_memory(memory)
        self.store.upsert(id=len(self.items), vec=self.embed(memory["content"]), meta=memory)
    def get_memories(self, limit=None, query=None, k=6):
        if not query:
            return super().get_memories(limit)
        qv = self.embed(query)
        return [m["meta"] for m in self.store.search(qv, k=k)]
```

### When to use what

* **Sliding window**: small tasks, low latency; simplest and most robust.
* **Summarized**: long chats where older detail matters in aggregate; keep recent turns verbatim.
* **Vector recall**: large corpora or long-running agents; pull only what’s relevant.

### Why this is smart

* **Decouples storage from usage**: change *how* you store/retrieve without changing *how* the loop asks for messages.
* **Deterministic vs. cognitive separation**: keep filtering/windowing deterministic; if you do summarization, make it explicit (and test it).
* **Easy to test**: each strategy is a tiny class with a clear behavior.
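
To illustrate the decoupling, the sketch below passes two different memory strategies to the same prompt-building function. `build_messages` and `fill` are hypothetical helpers; the two classes are repeated only so the example runs on its own.

```python
from typing import Dict, List, Optional

class Memory:
    def __init__(self):
        self.items: List[Dict] = []

    def add_memory(self, memory: dict):
        self.items.append(memory)

    def get_memories(self, limit: Optional[int] = None) -> List[Dict]:
        return self.items[-limit:] if limit else self.items

class SlidingWindowMemory(Memory):
    def __init__(self, k: int = 8):
        super().__init__()
        self.k = k

    def get_memories(self, limit: Optional[int] = None) -> List[Dict]:
        use = limit or self.k
        return self.items[-use:]

def build_messages(system_prompt: str, memory: Memory) -> List[Dict]:
    """The loop side: identical no matter which strategy is plugged in."""
    return [{"role": "system", "content": system_prompt}] + memory.get_memories()

def fill(memory: Memory) -> Memory:
    for i in range(10):
        memory.add_memory({"role": "user", "content": f"turn {i}"})
    return memory

full = build_messages("You are an AI agent.", fill(Memory()))
windowed = build_messages("You are an AI agent.", fill(SlidingWindowMemory(k=3)))
print(len(full), len(windowed))  # prints: 11 4
```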






# E — Environment Implementation

In the first versions, the agent “environment” was buried in `if/elif` dispatch and direct function calls.
A dedicated **Environment** component makes execution **modular** and **swappable**: the agent (brain) decides *what* to do, and the environment (body) knows *how* to do it in the real world.

### Minimal interface

```python
import time, traceback
from typing import Any, Dict

class Environment:
    def execute_action(self, action, args: Dict) -> Dict:
        """Bridge between agent & world: run an Action safely and return a structured result."""
        try:
            result = action.execute(**args)  # call the underlying Python function
            return self.format_result(result)
        except Exception as e:
            return {
                "tool_executed": False,
                "error": str(e),
                "traceback": traceback.format_exc(),
                "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
            }

    def format_result(self, result: Any) -> Dict:
        """Uniform success envelope with metadata."""
        return {
            "tool_executed": True,
            "result": result,
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        }
```

### Why this is useful

* **Separation of concerns:** The agent loop never touches filesystem/API details.
* **Swap environments:** Local files today; S3, GitHub, or REST APIs tomorrow—no agent changes.
* **Uniform envelopes:** Every tool returns the same shape → easier for the LLM to reason about.

---

## Design tips (high-quality Environment)

* **Structured output:** Always return a consistent shape. Consider a stricter envelope:

  ```json
  {"ok": true,  "data": ...}
  {"ok": false, "error": "...", "hint": "...", "retryable": true/false, "next_tool": "list_*"}
  ```
* **Pre-validation at the door:** Validate args (schema) *before* `execute()`; return clear errors.
* **Just-in-time guidance:** Put recovery hints (`hint`, `next_tool`) in error responses.
* **Determinism:** Sort lists, cap sizes, add `"... [truncated]"` markers for large outputs.
* **Safety guards:** Path traversal checks, output size limits, timeouts; never crash the loop.
* **Observability:** Log action name, args summary, duration, outcome (success/error).
* **Approvals/policies:** For risky actions, require an explicit approval flag or user confirmation.
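
A few of these tips (structured output, output caps, observability, never crashing the loop) can be combined in one small sketch. `ObservedEnvironment`, `max_chars`, and the logger name are assumptions for illustration; validation and approvals are left out to keep it short.

```python
import logging
import time
from typing import Any, Callable, Dict

logger = logging.getLogger("agent.env")  # hypothetical logger name

class Action:
    def __init__(self, name: str, function: Callable, description: str,
                 parameters: Dict, terminal: bool = False):
        self.name = name
        self.function = function
        self.description = description
        self.parameters = parameters
        self.terminal = terminal

    def execute(self, **kwargs) -> Any:
        return self.function(**kwargs)

class ObservedEnvironment:
    """Logs every call and caps large string outputs."""
    def __init__(self, max_chars: int = 2000):
        self.max_chars = max_chars

    def execute_action(self, action: Action, args: Dict) -> Dict:
        start = time.perf_counter()
        try:
            result = {"ok": True, "data": self._cap(action.execute(**args))}
        except Exception as exc:  # a failing tool must not crash the loop
            result = {"ok": False, "error": str(exc), "retryable": False}
        duration_ms = (time.perf_counter() - start) * 1000
        # Log argument names only, so large or sensitive values stay out of the logs
        logger.info("action=%s args=%s ok=%s duration_ms=%.1f",
                    action.name, sorted(args), result["ok"], duration_ms)
        return result

    def _cap(self, data: Any) -> Any:
        if isinstance(data, str) and len(data) > self.max_chars:
            return data[: self.max_chars] + "\n... [truncated]"
        return data

def _fail():
    raise RuntimeError("boom")

env = ObservedEnvironment(max_chars=5)
big = Action("big", lambda: "abcdefghij", "Return a long string", {})
broken = Action("broken", _fail, "Always fails", {})

print(env.execute_action(big, {}))
print(env.execute_action(broken, {}))
```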

---

## Example: specialized environment (local text files)

```python
import os

class LocalTxtEnvironment(Environment):
    base_dir = "/content/files"
    out_dir  = "/content/summaries"

    def list_txt_files(self):
        files = sorted(f for f in os.listdir(self.base_dir) if f.endswith(".txt"))
        return self.format_result(files)

    def read_txt(self, file_name: str):
        # Resolve ".." and symlinks so a name like "../secret.txt" cannot escape base_dir
        base = os.path.realpath(self.base_dir)
        path = os.path.realpath(os.path.join(base, file_name))
        inside = os.path.commonpath([base, path]) == base
        if not (inside and os.path.isfile(path)):
            return {"tool_executed": False, "error": "File not found", "hint": "Call list_txt_files"}
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            content = f.read(8001)  # read one extra character to detect truncation
        if len(content) > 8000:
            content = content[:8000] + "\n... [truncated]"
        return self.format_result({"file_name": file_name, "content": content})
```

**How it plugs in (sketch):**

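* The **Action** objects for `list_txt_files` / `read_txt` point to `env.list_txt_files` / `env.read_txt`.
* Your **core loop** looks up the action from the registry → calls `env.execute_action(action, args)` → gets a uniform result back for the LLM and memory.
* One caveat: the methods above already return an envelope, so routing them through `execute_action()` wraps the result twice. Either have the methods return raw values (and raise on failure), or call them directly.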
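
The wiring can be sketched end to end with an in-memory stand-in for the file system. `InMemoryTxtEnvironment` is hypothetical. In this sketch the environment methods return raw values and raise on failure, so `execute_action()` is the only place that builds the envelope.

```python
import time
from typing import Any, Callable, Dict

class Action:
    def __init__(self, name: str, function: Callable, description: str,
                 parameters: Dict, terminal: bool = False):
        self.name = name
        self.function = function
        self.description = description
        self.parameters = parameters
        self.terminal = terminal

    def execute(self, **kwargs) -> Any:
        return self.function(**kwargs)

class Environment:
    def execute_action(self, action: Action, args: Dict) -> Dict:
        try:
            return self.format_result(action.execute(**args))
        except Exception as e:
            return {"tool_executed": False, "error": str(e)}

    def format_result(self, result: Any) -> Dict:
        return {"tool_executed": True, "result": result,
                "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z")}

class InMemoryTxtEnvironment(Environment):
    """Stand-in for a local-files environment; keeps 'files' in a dict."""
    def __init__(self, files: Dict[str, str]):
        self.files = files

    def list_txt_files(self):
        return sorted(self.files)

    def read_txt(self, file_name: str):
        if file_name not in self.files:
            raise FileNotFoundError(f"{file_name} not found; call list_txt_files")
        return self.files[file_name]

env = InMemoryTxtEnvironment({"notes.txt": "hello"})
registry = {
    "list_txt_files": Action("list_txt_files", env.list_txt_files, "List .txt files", {}),
    "read_txt": Action("read_txt", env.read_txt, "Read a .txt file",
                       {"file_name": "string"}),
}

def agent_step(tool_name: str, args: Dict) -> Dict:
    action = registry.get(tool_name)
    if action is None:
        return {"tool_executed": False, "error": f"Unknown action: {tool_name}"}
    return env.execute_action(action, args)

print(agent_step("list_txt_files", {}))
print(agent_step("read_txt", {"file_name": "missing.txt"}))
```

Swapping in a different environment only changes the `env = ...` line and the functions the actions point to; `agent_step` stays the same.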

---

### Mental model

> **Brain–Body split:** The **Agent** plans and selects actions; the **Environment** performs them safely and predictably.
> Treat the environment as a *port/adapter layer* you can swap without touching the agent’s reasoning.



The **Agent (brain)** shouldn’t think about file paths, encodings, or “file not found” edge cases—those belong to the **Environment (body)**.

Think **busy CEO**:

* CEO (Agent) sets priorities, chooses which action to take next.
* Operations team (Environment) handles logistics, checks, and execution details.
* Results come back in a clean, predictable report.

What this buys you:

* **Lower cognitive load** for the LLM → better planning and fewer mistakes.
* **Cleaner code** → agent loop never changes when you swap files → S3 → GitHub.
* **Safer execution** → all guards, size limits, and “just-in-time” hints live in one place.

Tiny sketch of the flow:

1. Agent decides: `read_txt("notes.txt")`
2. Registry resolves the action → points to `env.read_txt`
3. Environment executes safely:

   * validates args
   * checks path & existence
   * reads with size/encoding guards
   * returns a **uniform envelope** (success + data, or failure + error and hint)
4. Agent uses the result to plan the next step—no I/O details in its head.

That’s the whole point of the brain–body split: **the brain focuses on the goal; the body handles the particulars.**
