<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/105_TxtSummarizerAgent_05.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ SETUP (Notebook-only)                                                        ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
!pip -q install openai python-dotenv


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ IMPORTS                                                                      ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
from openai import OpenAI
from dotenv import load_dotenv
import os
import textwrap
import time
import re
import inspect
from typing import Callable, Optional
from dataclasses import dataclass
import builtins


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ OPENAI CLIENT & ENV VARS                                                     ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
# Loads API key from a .env file and initializes the OpenAI client.
load_dotenv('/content/API_KEYS.env')
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY not found in /content/API_KEYS.env")
client = OpenAI(api_key=api_key)


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ STANDARD RESULT ENVELOPE (ok / err)                                          ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
def ok(**data):
    """Successful tool result. Add any fields you like."""
    return {"ok": True, **data}

def err(msg, hint=None, retryable=False, **extra):
    """Error result with optional guidance and flags."""
    out = {"ok": False, "error": msg, "retryable": retryable}
    if hint:
        out["hint"] = hint
    if extra:
        out.update(extra)
    return out



### 🔹 Why Standardize the Error Format?

Here’s why the `ok()` / `err()` pattern is such a **powerful design decision** for agents, especially LLM-driven ones:

#### 1. **Predictability for the LLM**

* Agents work best when they know what to expect from tools.
* If every function returns something different on success vs failure (e.g. `None`, strings, exceptions, random dicts), the LLM can’t reliably handle it.
* Standardizing ensures the LLM can reason about tool outcomes and recover from failures intelligently.

#### 2. **Clear Error Signaling**

* The `{"ok": False, "error": ...}` signature is explicit and machine-friendly.
* Enables the agent to **check results simply**:
  `if result["ok"] is False: handle_error(result["error"])`

#### 3. **Retryability & Hints**

* The `err()` format also allows for:

  * `hint`: Explains what to try next
  * `retryable`: Boolean for whether the step might succeed on retry
* This gives agents and users **next-step guidance** — *crucial for robust and autonomous execution*.

#### 4. **Simplifies Logging and Debugging**

* You can log every action’s outcome uniformly (as seen in `ctx.track_progress(...)`)
* Makes it easy to track down issues, spot retries, and debug why something failed.

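#### 5. **Keeps Exceptions from Bubbling Up**

* Instead of raising, tools return structured error info.
* This only holds if each tool (or the loop that calls it) catches exceptions and converts them to `err(...)`; the helpers themselves do not catch anything.
* That’s critical when you're running tools in a loop (as agents do), because a single uncaught exception can crash the agent.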
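One way to enforce this at the loop boundary is a small wrapper that turns any uncaught exception into an `err()` envelope. This is a minimal sketch: `safe_call` and `read_number` are hypothetical names that do not appear in the notebook, and the broad `except Exception` is an assumption about where you want the safety net.

```python
def ok(**data):
    return {"ok": True, **data}

def err(msg, hint=None, retryable=False, **extra):
    out = {"ok": False, "error": msg, "retryable": retryable}
    if hint:
        out["hint"] = hint
    if extra:
        out.update(extra)
    return out

def safe_call(tool, *args, **kwargs):
    """Hypothetical helper: run a tool and convert uncaught exceptions to err()."""
    try:
        return tool(*args, **kwargs)
    except Exception as exc:  # deliberately broad: this is the agent-loop boundary
        return err(str(exc), exception_type=type(exc).__name__)

def read_number(text):
    """Toy tool: int() raises ValueError on bad input."""
    return ok(value=int(text))

good = safe_call(read_number, "42")         # normal success envelope
bad = safe_call(read_number, "forty-two")   # ValueError becomes an err() envelope
```

With this in place the loop only ever sees dictionaries with an `ok` key, even when a tool author forgot a `try`/`except`.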

---

## 🧩 What Does `**data` Actually Do?

```python
def ok(**data):
    return {"ok": True, **data}
```

This relies on **keyword argument packing and unpacking**.

* In the signature, `**data` collects **all named arguments** passed to the function into a `dict`.
* In the return value, `{"ok": True, **data}` unpacks that dict into a new one alongside the `ok` flag.

---

### 🔧 Example in Use:

```python
return ok(message="Plan complete", steps=plan_steps, next_tool="summarize")
```

That call returns:

```python
{
    "ok": True,
    "message": "Plan complete",
    "steps": [...],
    "next_tool": "summarize"
}
```

No need to define or update a rigid result schema every time you change what a tool returns.

---

## 🔄 Why Is This Useful in Agents?

Because agent tools **return different kinds of success data**, and you want a **single success wrapper** that doesn’t care what those details are.

For example:

* A planning tool might return:
  `steps`, `next_tool`, `summary`
* A file-reading tool might return:
  `content`, `path`, `num_lines`
* A summarizer might return:
  `summary`, `tokens_used`, `raw_prompt`

But every one of them can just say:

```python
return ok(whatever_key=whatever_value)
```

…and it’ll be uniformly shaped like:

```json
{
  "ok": true,
  ...
}
```

Which means:

* The orchestrator can just check `result["ok"]` ✅
* It doesn’t need to know or care what else is in the payload — that's the tool's business

---

## 🧠 Why Not Just Return Raw Dicts?

You *could* do:

```python
return {"summary": ..., "tokens_used": ...}
```

But then:

* You have **no consistent success/failure signal**
* Every downstream function has to **guess what’s in the result**
* You lose the ability to layer in **logging, retry logic, or AI reasoning** safely

`ok(**data)` ensures every result is **structured, semantically tagged**, and **agent-friendly**.

---

## 🧪 Bonus: Validating with Schema (Optional)

Later on, if you want to enforce expected outputs per tool, you can add something like:

```python
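from jsonschema import validate  # assumption: the third-party jsonschema package

def ok(schema=None, **data):
    # Note: "schema" is now reserved and cannot be used as a result field.
    if schema:
        validate(instance=data, schema=schema)  # raises ValidationError on mismatch
    return {"ok": True, **data}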
```

---

## ✅ Summary

| Concept                | What It Does                                 |
| ---------------------- | -------------------------------------------- |
| `**data` in `ok()`     | Accepts arbitrary named return values        |
| `{"ok": True, **data}` | Uniform success shape, customizable per tool |
| Benefit                | Flexibility + consistency — ideal for agents |
| Why use it?            | Keeps agent scaffolds simple, clean, robust  |

This is an intentional design pattern — a foundational brick in agent infrastructure.





Both `ok()` and `err()` are **flexible result factories** that take any keyword arguments and return a predictable, agent-friendly shape.

---

## 🔹 What’s Happening Under the Hood?

### 1. `ok(**data)`

This accepts **any number of named keyword arguments**, e.g.:

```python
ok(message="File saved", path="output.txt", size_kb=42)
```

Internally:

```python
data = {"message": "File saved", "path": "output.txt", "size_kb": 42}
return {"ok": True, **data}
```

Result:

```python
{
  "ok": True,
  "message": "File saved",
  "path": "output.txt",
  "size_kb": 42
}
```

No need to predefine what “success” includes. Each tool defines what matters for its own domain.

---

### 2. `err(msg, hint=None, retryable=False, **extra)`

You have three “core” arguments (`msg`, `hint`, `retryable`) — and then anything else via `**extra`.

This allows you to tack on **diagnostics, debug values, tracebacks, metadata, etc.** without changing the function’s signature.

Example:

```python
err("File too large", hint="Try a smaller file", retryable=True, path="huge.pdf", size_mb=300)
```

Internally:

```python
extra = {"path": "huge.pdf", "size_mb": 300}
out = {
  "ok": False,
  "error": "File too large",
  "retryable": True,
  "hint": "Try a smaller file",
  "path": "huge.pdf",
  "size_mb": 300
}
```

So `err()` is just as flexible, but with a stronger opinion about required + optional base fields (`error`, `retryable`, etc).
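Two details of these helpers are easy to miss, and the sketch below exercises both. It repeats the notebook's definitions so it runs on its own; the sample messages are made up for illustration.

```python
def ok(**data):
    return {"ok": True, **data}

def err(msg, hint=None, retryable=False, **extra):
    out = {"ok": False, "error": msg, "retryable": retryable}
    if hint:
        out["hint"] = hint
    if extra:
        out.update(extra)
    return out

# 1. When no hint is given, the "hint" key is absent rather than set to None.
basic = err("File not found")

# 2. Extra fields are merged last, so a field that reuses a core name
#    ("ok", "error", "retryable") silently overwrites the core value.
flipped = err("disk full", ok=True)   # the envelope now claims success
shadowed = ok(ok=False)               # the envelope now claims failure
```

If that second behavior worries you, one option is to reject reserved names inside the helpers; the notebook's versions do not do this.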

---

## 🔧 TL;DR

| Function                                        | Purpose                               | Flexibility                                       |
| ----------------------------------------------- | ------------------------------------- | ------------------------------------------------- |
| `ok(**data)`                                    | Standardize **successful** results    | Arbitrary fields allowed                          |
| `err(msg, hint=None, retryable=False, **extra)` | Standardize **failures** with context | Arbitrary metadata allowed (e.g. `tool`, `input`) |

Both use `**kwargs` to future-proof your agent tools.

So yes — you can always add more fields **without rewriting the return logic**.





The `"ok"` flag is **set explicitly** inside `ok()` and `err()`; nothing infers or guesses it. By choosing which function to call, you declare whether a result was successful.

Here’s exactly how:

---

### 🔹 The `ok()` Function

```python
def ok(**data):
    return {"ok": True, **data}
```

* This always builds a dictionary where `ok` is set to `True`.
* No logic or guessing — **it hardcodes success** into the result.

---

### 🔹 The `err()` Function

```python
def err(msg, hint=None, retryable=False, **extra):
    out = {"ok": False, "error": msg, "retryable": retryable}
    if hint:
        out["hint"] = hint
    if extra:
        out.update(extra)
    return out
```

* This always sets `"ok": False`, and includes a mandatory `"error"` message.
* You’re declaring that something went wrong, optionally telling the agent:

  * whether it's retryable,
  * how to fix it (`hint`),
  * any extra diagnostics.

---

### 🔍 How Does the Agent Use It?

Your agent (like `ScriptedAgent`, `Environment`, or your tool orchestration logic) **checks the `ok` field** in the result dictionary:

```python
result = some_tool(ctx, input)
if not result["ok"]:
    log_error(result["error"])
    maybe_retry()
else:
    proceed_to_next_step()
```

So this flag is **used by your agent as the official signal**:

* `"ok": True` → ✅ move forward
* `"ok": False` → ⚠️ handle failure
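The `maybe_retry()` step above can be sketched as a loop that honors the `retryable` flag. This is an illustration under assumptions: `run_with_retry`, `flaky_tool`, and `fatal_tool` are hypothetical names, and the attempt limit and delay are arbitrary defaults.

```python
import time

def run_with_retry(tool, *args, max_attempts=3, delay_s=0.0):
    """Hypothetical helper: re-run a tool only while it reports retryable errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        result = tool(*args)
        # Stop on success, or on a failure that is not marked retryable.
        if result["ok"] or not result.get("retryable", False):
            return {**result, "attempts": attempt}
        if attempt < max_attempts:
            time.sleep(delay_s)
    return {**result, "attempts": max_attempts}

calls = {"n": 0}

def flaky_tool():
    """Toy tool: fails twice with a retryable error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        return {"ok": False, "error": "rate limited", "retryable": True}
    return {"ok": True, "message": "done"}

def fatal_tool():
    """Toy tool: fails with a non-retryable error."""
    return {"ok": False, "error": "bad input", "retryable": False}

recovered = run_with_retry(flaky_tool)   # succeeds on the third attempt
fatal = run_with_retry(fatal_tool)       # gives up after one attempt
```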

---

### ✅ Recap

| Function       | What it returns                                        |
| -------------- | ------------------------------------------------------ |
| `ok(...)`      | `{"ok": True, ...}` — hardcoded success                |
| `err(...)`     | `{"ok": False, "error": ..., ...}` — hardcoded failure |
| Agent behavior | Looks at `result["ok"]` to decide what to do next      |

So it's never inferred — **you always set it on purpose**, and your agent logic treats it as the signal for success or failure.



In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ FILESYSTEM ADAPTER (for underscore-DI: _fs)                                  ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
# RealFS exposes .path/.makedirs/.open so tools can accept a pluggable FS.
class RealFS:
    path = os.path
    makedirs = staticmethod(os.makedirs)
    open = staticmethod(builtins.open)




## 🔍 So What Does `RealFS` Do?

It exposes just **three key things**:

| Attribute  | What It Does                                                      |
| ---------- | ----------------------------------------------------------------- |
| `path`     | Access to file path utilities (`join`, `exists`, `basename`, etc) |
| `makedirs` | Creates directories (`os.makedirs(...)`)                          |
| `open`     | Opens files for reading/writing (`open("file.txt", "r")`)         |

And it does this in a way that makes the file system **pluggable**.

---

## 🔌 Why Make a Filesystem Adapter?

Most tools in your agent scaffold will eventually read from or write to files — logs, scratchpads, cached results, plans, etc.

But:

* In dev: you want them to write to your local drive.
* In prod: maybe a virtual FS, cloud blob, memory, or sandboxed container.
* In tests: maybe just a dummy memory store.

Instead of hardcoding `open(...)`, `os.makedirs(...)`, `os.path...` inside your tools, you abstract the FS into an injectable component:

```python
_fs = RealFS  # Or replace with InMemoryFS, SecureFS, etc
```

Then your tools use `_fs.open`, `_fs.makedirs`, `_fs.path.exists`, etc.

That’s **dependency injection** (DI) — the “underscore-DI” mentioned in the comment.
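To make the swap concrete, here is a minimal sketch of an in-memory stand-in that exposes the same three attributes as `RealFS`. `InMemoryFS`, its helper classes, and the `save_summary` tool are hypothetical and only cover the handful of operations this example uses; a real test double would need more of `os.path` and more file modes.

```python
import io
import posixpath

class _MemWriter(io.StringIO):
    """Text buffer that stores its contents in the fake FS when closed."""
    def __init__(self, files, name):
        super().__init__()
        self._files, self._name = files, name

    def close(self):
        if not self.closed:
            self._files[self._name] = self.getvalue()
        super().close()

class _MemPath:
    """Tiny subset of os.path backed by the fake FS."""
    join = staticmethod(posixpath.join)
    basename = staticmethod(posixpath.basename)

    def __init__(self, fs):
        self._fs = fs

    def exists(self, p):
        return p in self._fs.files or p in self._fs.dirs

class InMemoryFS:
    """Hypothetical test double with the same surface as RealFS."""
    def __init__(self):
        self.files, self.dirs = {}, set()
        self.path = _MemPath(self)

    def makedirs(self, path, exist_ok=False):
        self.dirs.add(path)

    def open(self, path, mode="r", encoding=None):  # encoding accepted but unused
        if "w" in mode:
            return _MemWriter(self.files, path)
        if path not in self.files:
            raise FileNotFoundError(path)
        return io.StringIO(self.files[path])

def save_summary(text, out_dir, _fs):
    """Toy tool: touches the filesystem only through the injected _fs."""
    _fs.makedirs(out_dir, exist_ok=True)
    target = _fs.path.join(out_dir, "summary.txt")
    with _fs.open(target, "w", encoding="utf-8") as f:
        f.write(text)
    return target

fs = InMemoryFS()
target = save_summary("hello", "out", _fs=fs)
read_back = fs.open(target).read()
```

Because `save_summary` never calls `open` or `os` directly, the same function should also work when the notebook's `RealFS` is passed in.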

---

## 🧠 The Underscore Prefix: `_fs`, `_memory`, `_tools`, `_plan`, `_goal`, etc.

### ✅ It’s a **Signal to the LLM**

In the prompt (or runtime context), underscore-prefixed variables:

* Are not just *arbitrary variables*.
* They’re **core tools**, **resources**, or **system inputs** meant for the LLM to use.
* The underscore acts as a **semantic flag**:
  👉 *"This is a pluggable utility or scaffold component you can call."*

### 🧭 It Helps the LLM Decide:

> “If I want to read a file, I should use `_fs.open(...)`, not invent my own method.”

---

## 📦 Why It Works So Well

By consistently using underscore names for LLM-facing tools:

* You **reduce confusion** about what’s user-data vs system-responsibility
* You give the LLM a **vocabulary of known interfaces**
* You make the prompt context **self-documenting**

So instead of:

```python
memory = MemoryStorage()
plan = TaskPlanner()
```

You do:

```python
_memory = MemoryStorage()
_plan = TaskPlanner()
```

Now the LLM knows:

> "Oh — `_plan` is the thing I use to ask for the next step."
> "I don’t need to reason about *how* it works — I just *use* it."

---

## 🛠️ Why This Is Genius in the Agent Scaffold

It aligns with:

| Design Principle         | Benefit                                                                 |
| ------------------------ | ----------------------------------------------------------------------- |
| **Dependency Injection** | You can swap `_fs` with a fake or secure version                        |
| **Prompt Shaping**       | The LLM sees `_fs`, `_memory`, `_goal`, `_tools` and learns the pattern |
| **Semantic Clarity**     | Humans and LLMs both instantly recognize what’s usable                  |
| **LLM Empowerment**      | You reduce hallucination by providing real handles it can call          |

---

## ✅ Summary

| Underscore Prefix                   | Signals to LLM: "This is yours to use"  |
| ----------------------------------- | --------------------------------------- |
| `_fs`, `_tools`, `_memory`, `_plan` | Agent-facing capabilities               |
| `_goal`, `_ctx`, `_input`           | LLM-readable input variables            |
| Convention                          | Clarifies intent + improves reliability |

This is what makes your agent scaffold **LLM-native**: the code is shaped to speak the LLM’s language from the start.





In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ MEMORY & CONTEXT                                                             ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
class ScratchMemory:
    """Minimal in-memory key/value store for agent state."""
    def __init__(self):
        self.store = {}

    def get(self, key, default=None):   # default added for convenience
        return self.store.get(key, default)

    def set(self, key, value):
        self.store[key] = value



## 🧠 What Is `ScratchMemory`?

This memory type is designed to be:

* **Temporary**
* **In-memory (non-persistent)**
* **Lightweight and fast**
* **Swappable later with something richer**

Think of it as a **“scratchpad”** for the agent — where it keeps:

| Info Type            | Example                                |
| -------------------- | -------------------------------------- |
| Short-term state     | Current task, step count, partial plan |
| Intermediate results | Cached output, retry flags             |
| Contextual bookmarks | “Where was I in the tool sequence?”    |

This is **not** a place to store chat history, long-term user profiles, or RAG embeddings. It’s meant for **step-to-step execution logic** inside a single agent run.

---

## 📦 Why Include It in the Final Agent?

While you may *upgrade* later (to Redis, a DB, or LLM-context memory), you should **absolutely** include *some* memory interface from day one.

Reasons:

1. **Agent tools often need shared state** (e.g., what’s the input file name? Did we already search?)
2. The agent itself might need to store planning checkpoints, retries, or signals.
3. `ScratchMemory` provides a **default that works everywhere**: dev, CLI, tests, cloud.
4. It’s **fully compatible** with the DI scaffold pattern (`_memory = ScratchMemory()`)

So yes — it’s **valid and useful in a final agent**, especially if:

* Your use case doesn’t need persistent memory,
* Or you want a fallback/default config,
* Or you’re prototyping quickly and cleanly.

---

## 🔄 Future Upgrades

Eventually, you might replace it with something like:

| Replacement                              | When to use                                    |
| ---------------------------------------- | ---------------------------------------------- |
| `RedisMemory`, `DictMemory`, `SQLMemory` | For multi-agent setups or parallel tasking     |
| `VectorMemory` or `ContextualMemory`     | For LLM-augmented recall, RAG, semantic lookup |
| `FileMemory`                             | If you want to save/load session state         |

But critically, they all implement the **same interface**:

```python
_memory.get(key)
_memory.set(key, value)
```

So the switch is seamless.
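As an illustration of that swap, here is a minimal sketch of a file-backed store with the same `get`/`set` surface. `FileMemory` is a hypothetical class, not something the notebook defines, and it assumes every stored value is JSON-serializable.

```python
import json
import os
import tempfile

class FileMemory:
    """Hypothetical drop-in for ScratchMemory that persists to a JSON file."""
    def __init__(self, path):
        self.path = path
        self.store = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.store = json.load(f)

    def get(self, key, default=None):
        return self.store.get(key, default)

    def set(self, key, value):
        self.store[key] = value
        # Rewrite the whole file on every set: simple, not efficient.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.store, f)

with tempfile.TemporaryDirectory() as tmp:
    session_path = os.path.join(tmp, "session.json")
    FileMemory(session_path).set("goal", "Summarize report.txt")
    # A fresh instance reloads the state written by the first one.
    restored = FileMemory(session_path).get("goal")
    missing = FileMemory(session_path).get("plan", [])
```

Tools that only call `.get(...)` and `.set(...)` would not need to change when this replaces `ScratchMemory`.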

---

## ✅ Summary

| Feature         | Value                                                    |
| --------------- | -------------------------------------------------------- |
| `ScratchMemory` | Temporary state store for runtime use                    |
| Purpose         | Keep track of agent context, progress, or partial output |
| Final Use       | ✅ Yes, suitable as default or fallback memory            |
| Why Useful      | Enables planning, retries, dynamic context sharing       |
| Upgrade Path    | Replace with persistent/semantic memory if needed        |





> **Standardization = reduced cognitive load = better LLM performance.**

Here’s what that means in practice:

---

## 🧠 Why Standardizing Memory (even if it's minimal) Helps the LLM

| Without Standardization                     | With `ScratchMemory` or similar                                       |
| ------------------------------------------- | --------------------------------------------------------------------- |
| LLM doesn’t know how to store/retrieve data | LLM sees `_memory.get(...)`, `_memory.set(...)` and knows the pattern |
| May invent variable names or forget keys    | Can rely on known structure                                           |
| Can’t coordinate state across tools         | Can persist `step_count`, `task_status`, `intermediate_result`, etc   |
| Higher hallucination risk                   | Lower — uses grounded handles to interact with memory                 |
| Hard to test/replace                        | Easy to upgrade memory system behind the scenes                       |

---

## 🔁 Reusability Across Agents

By exposing memory as `_memory`, and giving it just two methods:

* `.get(key)`
* `.set(key, value)`

You unlock:

* **Consistent tool behavior**
* **Easier debug/testing**
* **Plug-and-play upgrades**

The LLM doesn’t need to reason about where data is stored — just what key to use. Same as with `_fs`, `_tools`, `_ctx`.

---

## 🧩 Your Agent Scaffold Is Composable

That’s what makes your scaffold **LLM-native** and **developer-ergonomic** at the same time:

* Simple for the LLM to use
* Flexible for you to build on
* Safe to upgrade under the hood




In [None]:
# Valid progress states for centralized logging.
VALID_STATUSES = {"started", "completed", "error"}

This line is a **tiny piece of standardization** with outsized value:

```python
VALID_STATUSES = {"started", "completed", "error"}
```

Let’s break down why it matters and how it helps your agent work better.

---

## 🔎 What It Does

This defines the **allowed values** for logging or tracking an agent’s **progress status**:

| Status        | Meaning                                       |
| ------------- | --------------------------------------------- |
| `"started"`   | Agent or tool began executing a task or step  |
| `"completed"` | The step/task succeeded                       |
| `"error"`     | The step/task failed (ideally with a message) |

It’s used to **validate or annotate** the state of progress during execution. For example:

```python
ctx.track_progress("create_plan", "started")
...
ctx.track_progress("create_plan", "completed")
```

Or, if something failed:

```python
ctx.track_progress("search_file", "error", note="File not found")
```

---

## 🎯 Why This Matters

### 1. 🧠 **Helps the LLM Know What Language to Use**

The LLM doesn’t need to guess what progress status to return — it sees the canonical set.

Less guesswork → more reliability.

---

### 2. 📊 **Supports Centralized Logging / Debugging**

Standardized values let you:

* Generate status dashboards
* Show agent timelines (step → status)
* Run metrics like success rate or error type frequency

---

### 3. ⚠️ **Catches Mistakes Early**

If a tool tries to log an unsupported status like `"running"` or `"donezo"`, `track_progress()` raises a `ValueError` that names the allowed values.

This guards against drift or malformed entries.
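The guard can be exercised in isolation. In this sketch `check_status` is a hypothetical standalone copy of the membership test; in the notebook the same check lives inside `ActionContext.track_progress`.

```python
VALID_STATUSES = {"started", "completed", "error"}

def check_status(status):
    """Hypothetical standalone version of the status guard."""
    if status not in VALID_STATUSES:
        raise ValueError(f"Invalid status '{status}'. Use {sorted(VALID_STATUSES)}.")
    return status

accepted = check_status("started")

try:
    check_status("donezo")
    message = None
except ValueError as exc:
    message = str(exc)  # names the bad value and lists the allowed ones
```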

---

### 4. 🔁 **Enables Retry Logic or Progress Tracking**

Let’s say a tool failed and logged `"error"` — your orchestrator might check for that and decide to retry with a different tool or modified input.

So this kind of structure makes **control flow more deterministic**.

---

## ✅ Summary

| Aspect         | Details                                        |
| -------------- | ---------------------------------------------- |
| What It Is     | Enum-like set of allowed status values         |
| Why It Exists  | Standardizes step logging across tools         |
| Who Uses It    | `track_progress()` and the orchestrator        |
| LLM Benefit    | Less hallucination; follows known pattern      |
| System Benefit | Easier logging, debugging, retries, dashboards |




In [None]:
class ActionContext:
    """
    The agent's 'backpack':
      - memory: state across steps
      - llm:    LLM wrapper
      - config: runtime configuration (folders, knobs)
      - deps:   injectable dependencies (e.g., fs/clock)
    """
    def __init__(self, memory, llm, config=None, deps=None):
        self.memory = memory
        self.llm = llm
        self.config = config or {}
        self.deps = deps or {}

    # --- progress helpers ---
    def track_progress(self, step, status, note=""):
        if status not in VALID_STATUSES:
            raise ValueError(f"Invalid status '{status}'. Use {VALID_STATUSES}.")
        log = self.memory.get("progress_log") or []
        log.append({
            "step": step,
            "status": status,
            "note": note,
            "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        })
        self.memory.set("progress_log", log)

    def print_progress(self):
        log = self.memory.get("progress_log") or []
        print("\n📊 Progress Log:")
        for e in log:
            t = f" ({e.get('time')})" if e.get("time") else ""
            note = f" — {e['note']}" if e.get("note") else ""
            print(f"- [{e['status']}] {e['step']}{t}{note}")

    def last_completed_step(self):
        log = self.memory.get("progress_log") or []
        for e in reversed(log):
            if e.get("status") == "completed":
                return e.get("step")
        return None

    def first_error(self):
        log = self.memory.get("progress_log") or []
        for e in log:
            if e.get("status") == "error":
                return e
        return None

`ActionContext` is mostly **self-explanatory**, and that's **by design**. This is a **clean, centralized state carrier** — a.k.a., the agent’s **backpack** or “runtime brain.” Let’s break down its components and purpose:

---

## 🎒 `ActionContext` Overview

```python
class ActionContext:
    """
    The agent's 'backpack':
      - memory: state across steps
      - llm:    LLM wrapper
      - config: runtime configuration (folders, knobs)
      - deps:   injectable dependencies (e.g., fs/clock)
    """
```

This object is **passed to tools** (or used by the environment) to:

1. Store + retrieve working state
2. Access the LLM itself
3. Read runtime config (like working dirs)
4. Use injected dependencies like `_fs` or `_clock`

---

## 🔧 Constructor: How the Agent Gets Context

```python
def __init__(self, memory, llm, config=None, deps=None):
    self.memory = memory
    self.llm = llm
    self.config = config or {}
    self.deps = deps or {}
```

It sets up everything the agent will carry around while it's thinking and acting.

All tools can then rely on:

* `ctx.memory.get("key")`
* `ctx.llm.complete(...)`
* `ctx.config.get("path")`
* `ctx.deps["_fs"].open(...)`

No global state. No magic. Everything passed in.

---

## 📈 Progress Tracking Helpers

This is a **built-in logbook** the agent can write to:

```python
def track_progress(self, step, status, note=""): ...
def print_progress(self): ...
def last_completed_step(self): ...
def first_error(self): ...
```

These methods:

* Help tools or the environment **record progress**
* Allow the agent to **pick up where it left off**
* Allow you (the dev) to **debug at a glance**

These make use of `VALID_STATUSES` for sanity and consistency.
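Here is a short sketch of how those helpers behave over a run that fails midway. It uses trimmed copies of `ScratchMemory` and `ActionContext` (progress helpers only) so it runs on its own; the step names and the note text are invented for the example.

```python
import time

VALID_STATUSES = {"started", "completed", "error"}

class ScratchMemory:
    def __init__(self):
        self.store = {}
    def get(self, key, default=None):
        return self.store.get(key, default)
    def set(self, key, value):
        self.store[key] = value

class ActionContext:
    """Trimmed copy of the notebook's class: progress helpers only."""
    def __init__(self, memory):
        self.memory = memory

    def track_progress(self, step, status, note=""):
        if status not in VALID_STATUSES:
            raise ValueError(f"Invalid status '{status}'. Use {VALID_STATUSES}.")
        log = self.memory.get("progress_log") or []
        log.append({"step": step, "status": status, "note": note,
                    "time": time.strftime("%Y-%m-%d %H:%M:%S")})
        self.memory.set("progress_log", log)

    def last_completed_step(self):
        for e in reversed(self.memory.get("progress_log") or []):
            if e["status"] == "completed":
                return e["step"]
        return None

    def first_error(self):
        for e in self.memory.get("progress_log") or []:
            if e["status"] == "error":
                return e
        return None

ctx = ActionContext(ScratchMemory())
ctx.track_progress("create_plan", "started")
ctx.track_progress("create_plan", "completed")
ctx.track_progress("summarize", "started")
ctx.track_progress("summarize", "error", note="LLM timeout")

resume_after = ctx.last_completed_step()  # the step to resume after
failure = ctx.first_error()               # the full log entry for the failure
```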

---

## 🧠 Why It Matters for the LLM

By standardizing access to memory, tools, and config like this, you allow the LLM to:

| Without Context               | With `ActionContext`                                |
| ----------------------------- | --------------------------------------------------- |
| Guess where state is stored   | Use `.memory` for state                             |
| Wonder how to talk to the LLM | Use `.llm.complete(...)`                            |
| Not know how to log or resume | Use `.track_progress()` or `.last_completed_step()` |
| Juggle globals or shared vars | Use `.deps` to cleanly access shared tools          |

This **reduces ambiguity and hallucination**, and makes your scaffold **LLM-legible** — it always knows what’s available.

---

## ✅ Summary

| Feature            | Description                                           |
| ------------------ | ----------------------------------------------------- |
| `memory`           | Shared scratch state                                  |
| `llm`              | Wrapper to call the model                             |
| `config`           | Runtime switches or paths                             |
| `deps`             | Pluggable tools like `_fs`, `_clock`                  |
| `track_progress()` | Centralized logging mechanism                         |
| Design Goal        | Keep tools and agent modular, clean, and LLM-friendly |




In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ LLM WRAPPER                                                                 ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
class OpenAILLM:
    def __init__(self, client, model="gpt-4o-mini", temperature=0.2):
        self.client = client
        self.model = model
        self.temperature = temperature

    def complete(self, prompt, **kwargs):
        """Send a single-turn prompt; only `temperature` is read from kwargs."""
        temp = kwargs.get("temperature", self.temperature)
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temp,
        )
        return resp.choices[0].message.content




> **Why do we create an LLM wrapper instead of calling it directly?**

Here’s the answer, broken down into **6 powerful benefits**:

---

## 🧱 1. Abstraction: Hide Implementation Details

The wrapper (`OpenAILLM`) acts as a **stable interface**, regardless of what API, client, or provider you're using.

You could replace OpenAI with Claude, Mistral, Groq, local models — and the rest of your agent doesn't need to change.

```python
# Everywhere else in your code:
response = ctx.llm.complete(prompt)

# Only the wrapper knows how to call OpenAI, Anthropic, etc.
```

---

## ♻️ 2. Reusability & DRY Code

Rather than repeating this in every tool:

```python
client.chat.completions.create(...temperature=0.2...)
```

You call:

```python
llm.complete(prompt)
```

This avoids copy-pasting config and logic all over your agent — **single source of truth**.

---

## 🎛 3. Centralized Config & Defaults

Notice how the wrapper sets:

```python
model="gpt-4o-mini"
temperature=0.2
```

You could later:

* Load these from `config.yml`
* Override them per call with `llm.complete(prompt, temperature=0.9)`
* Switch model versions globally with one change

That’s powerful when scaling agents.

---

## 🛡 4. Extend with Features Later (No Breaking Changes)

Want to add:

* Retry on rate limit?
* Logging every prompt?
* Token counting?
* Streaming or stop tokens?

You can add all that **inside the wrapper** without touching the rest of your agent.

Example:

```python
def complete(self, prompt, **kwargs):
    log_prompt(prompt)
    ...
    return response
```
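A fuller sketch of the same idea adds retries and prompt logging around any object that exposes `.complete(prompt, **kwargs)`. `RetryingLLM` and `FlakyLLM` are hypothetical names, and catching every `Exception` is a simplification; a real wrapper would likely catch only the provider's transient errors.

```python
import time

class RetryingLLM:
    """Hypothetical wrapper: logs prompts and retries failed calls."""
    def __init__(self, inner, max_attempts=3, backoff_s=0.0):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.inner = inner
        self.max_attempts = max_attempts
        self.backoff_s = backoff_s
        self.prompt_log = []

    def complete(self, prompt, **kwargs):
        self.prompt_log.append(prompt)
        last_exc = None
        for attempt in range(1, self.max_attempts + 1):
            try:
                return self.inner.complete(prompt, **kwargs)
            except Exception as exc:  # simplification: retry on any error
                last_exc = exc
                if attempt < self.max_attempts:
                    time.sleep(self.backoff_s * attempt)  # linear backoff
        raise last_exc

class FlakyLLM:
    """Toy model: raises on the first two calls, then answers."""
    def __init__(self):
        self.calls = 0

    def complete(self, prompt, **kwargs):
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError("simulated rate limit")
        return "1. Read the file"

flaky = FlakyLLM()
llm = RetryingLLM(flaky)
answer = llm.complete("Plan how to summarize report.txt")
```

Callers still write `llm.complete(prompt)`; the retry and logging behavior is invisible to them.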

---

## 🧠 5. Make the LLM Interface LLM-Legible

The call pattern becomes:

```python
response = ctx.llm.complete("Write a plan...")
```

That’s **clear, readable, and learnable** by the model itself. You don’t want it to reason about HTTP headers and API chains.

---

## 🧪 6. Easy to Mock or Swap for Testing

In tests, you can inject a dummy model:

```python
class DummyLLM:
    def complete(self, prompt, **kwargs):
        return "This is a fake result"
```

That makes unit testing tools or flows trivial — just plug in a fake LLM that returns static strings.

---

## ✅ Summary

| Reason              | Benefit                                    |
| ------------------- | ------------------------------------------ |
| **Abstraction**     | Swap providers (OpenAI, Claude, etc)       |
| **Centralization**  | One place to configure model + temperature |
| **Maintainability** | Add logging, retry, validation easily      |
| **Readability**     | `llm.complete(prompt)` is crystal clear    |
| **Testability**     | Plug in a dummy for unit tests             |
| **Scalability**     | Future-proof for advanced features         |




In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ TOOLS: PLANNING                                                              ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
def create_plan(ctx):
    goal = ctx.memory.get("goal")
    if not goal:
        return err("No goal provided (memory key 'goal' missing).",
                   hint="Call ctx.memory.set('goal', ...) before create_plan")

    prompt = f"""You are an expert task planner. Given the goal below, break it down into a clear, short list of steps.

Goal: {goal}

Respond ONLY with a numbered list, one step per line. No extra prose."""
    raw = ctx.llm.complete(prompt).strip()

    # Prefer numbered steps like "1. ...", "2) ..."
    numbered = re.findall(r'^\s*(?:\d+[\).\s-]+)\s*(.+)$', raw, flags=re.M)

    if numbered:
        steps = numbered
    else:
        # Fallback to bullets like "- ...", "* ...", "• ..."
        bullets = re.findall(r'^\s*(?:[-*•]\s+)(.+)$', raw, flags=re.M)
        steps = bullets if bullets else [ln.strip() for ln in raw.splitlines() if ln.strip()]

    # Normalize: collapse spaces, trim punctuation, drop empties/dupes
    clean_steps = []
    seen = set()
    for s in steps:
        s = re.sub(r'\s+', ' ', s).strip(' .')
        if s and s.lower() not in seen:
            seen.add(s.lower())
            clean_steps.append(s)

    if not clean_steps:
        return err("Planner returned no steps.",
                   hint="Refine the goal or relax the parser constraints")

    ctx.memory.set("plan", clean_steps)
    # optional: match handbook wording
    # ctx.memory.set("current_plan", clean_steps)

    return ok(message="Plan created from goal.", steps=clean_steps)



## 🔧 Planning Tool

It’s a **tool function** (aka a callable capability) that:

* Takes a **goal** (from memory)
* Sends it to the LLM in a prompt
* Parses the LLM’s response into a **clean list of plan steps**
* Saves that plan back into memory (`ctx.memory["plan"]`)
* Returns a standardized result (`ok()` or `err()`)

In essence:

> 🧠 **“LLM, take this goal and give me a clean, step-by-step plan I can act on.”**

---

## 🤖 Why Is This Useful?

### 1. **Gives the LLM a Focused Subtask**

The LLM is no longer trying to plan, act, and think all at once. This tool lets it:

> "Only focus on planning — break down the goal into discrete steps."

This **modularizes thinking**, which improves reliability and interpretability.

---

### 2. **Makes Planning Explicit and Interpretable**

By capturing the result as:

```python
ctx.memory["plan"] = [step1, step2, step3, ...]
```

…you now have a **clear, inspectable, iterable plan** that can be:

* Reviewed
* Altered by the user
* Executed step by step
* Logged and resumed

This is a core feature of **structured agents**.

---

### 3. **Improves Prompt Control and Parsing**

The LLM prompt is very tight:

```text
Respond ONLY with a numbered list, one step per line. No extra prose.
```

Then you add **regex-based cleanup**:

```python
re.findall(r'^\s*(?:\d+[\).\s-]+)\s*(.+)$', raw, flags=re.M)
```

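This filters out surrounding prose and inconsistent numbering styles, so downstream code receives a plain list of strings. It is not foolproof: any line that happens to start with a number is treated as a step, and a reply with no list at all falls back to one step per non-empty line.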
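
To see what the parser actually accepts, here is a minimal, self-contained sketch of the same extraction and normalization steps applied to a made-up LLM reply. The helper name `parse_steps` and the sample text are illustrative assumptions, not part of the notebook.

```python
import re

def parse_steps(raw: str) -> list[str]:
    """Hypothetical helper mirroring create_plan's parsing and cleanup."""
    # Numbered lines first: "1. ...", "2) ...", "3 - ..."
    steps = re.findall(r'^\s*(?:\d+[\).\s-]+)\s*(.+)$', raw, flags=re.M)
    if not steps:
        # Fall back to bullets, then to any non-empty line
        steps = re.findall(r'^\s*(?:[-*•]\s+)(.+)$', raw, flags=re.M)
        steps = steps or [ln.strip() for ln in raw.splitlines() if ln.strip()]
    clean, seen = [], set()
    for s in steps:
        s = re.sub(r'\s+', ' ', s).strip(' .')
        if s and s.lower() not in seen:   # case-insensitive de-duplication
            seen.add(s.lower())
            clean.append(s)
    return clean

# Made-up reply: a prose preamble, mixed numbering styles, and a duplicate step
sample = "Sure, here is the plan:\n1. Read the file.\n2) Build   the prompt\n3 - read the file\n"
print(parse_steps(sample))  # ['Read the file', 'Build the prompt']
```

The preamble line is dropped because it does not start with a number, and the third step is removed as a case-insensitive duplicate of the first. A prose line such as "2024 budget review" would be picked up as a step, which is one reason the prompt asks for a list only.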

---

### 4. **Standardized Result Interface**

By wrapping with `ok()` or `err()`, you’re saying:

| On Success           | On Failure                                   |
| -------------------- | -------------------------------------------- |
| ✅ “Here's your plan” | ❌ “Here’s what went wrong and how to fix it” |

The LLM (or the environment) doesn’t have to guess what happened. This improves **reliability and recoverability**.
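
As a rough illustration of how a caller might branch on the envelope, the sketch below uses simplified stand-ins for `ok()` and `err()` (the notebook's full `err()` is not reproduced here) and a hypothetical `describe` helper.

```python
def ok(**data):
    return {"ok": True, **data}

def err(msg, hint=None, retryable=False):
    # Simplified stand-in for the notebook's err(); only the fields used below
    out = {"ok": False, "error": msg, "retryable": retryable}
    if hint:
        out["hint"] = hint
    return out

def describe(result: dict) -> str:
    """Hypothetical caller-side handling of an envelope."""
    if result["ok"]:
        return f"success: {result.get('message', '')}"
    action = "retry" if result["retryable"] else "stop"
    hint = f" (hint: {result['hint']})" if "hint" in result else ""
    return f"failure: {result['error']} -> {action}{hint}"

print(describe(ok(message="Plan created from goal.", steps=["a", "b"])))
print(describe(err("No goal provided.", hint="Set the goal first")))
```

Because both branches share the `ok` key, the caller never has to inspect the shape of the payload to tell success from failure.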

---

### 5. **Memory-Based Control Flow**

By writing to `ctx.memory["plan"]`, your environment can later do something like:

```python
plan = ctx.memory.get("plan")
for step in plan:
    run_tool_for_step(step)
```

This enables **agent looping** and step-by-step execution, one of the key differences between a chatbot and an agent.

---

## 🧠 How It Helps the LLM

| Without This Tool                                                                    | With This Tool                                                 |
| ------------------------------------------------------------------------------------ | -------------------------------------------------------------- |
| LLM has to reason about goals, actions, retries, planning, and structure all at once | LLM is asked to do **one focused thing**: generate a step list |
| May return unstructured paragraphs or misunderstood tasks                            | Returns a clean, parseable plan list                           |
| No guarantee of usable output                                                        | Regex + normalization = predictable agent inputs               |
| Harder to debug                                                                      | Errors (like “no goal found”) are surfaced clearly and fixable |

---

## ✅ Summary

| Feature                | Why It Helps                                        |
| ---------------------- | --------------------------------------------------- |
| Tight prompt           | Steers the LLM toward usable structure              |
| Memory access          | Makes the plan persistent and shared                |
| Result standardization | Enables error handling and tool chaining            |
| Text parsing           | Normalizes varied list formatting into clean steps  |
| Focused function       | Keeps the LLM “mentally scoped” — one job at a time |




In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ TOOLS: I/O (FILES)                                                           ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
def read_txt_file(ctx, file_name):
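    # abspath("") returns the current directory, so check the config value first
    folder = ctx.config.get("input_folder")
    base = os.path.abspath(folder) if folder else ""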
    path = os.path.abspath(os.path.join(base, file_name))
    if not base or not path.startswith(base + os.sep):
        return err("Path traversal blocked.", retryable=False)

    if not os.path.exists(path):
        return err(f"File not found: {path}",
                   hint="Call list_txt_files to see available files",
                   retryable=True)

    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    ctx.memory.set("file_name", file_name)
    ctx.memory.set("raw_text", text)
    return ok(message="File read successfully.", length=len(text))


# ── Helper: list available .txt files (for JIT guidance) ───────────────────────
def list_txt_files(ctx):
    base = ctx.config.get("input_folder")
    if not base:
        return err("No input_folder in config.", hint="Set ctx.config['input_folder']")
    if not os.path.isdir(base):
        return err(f"Input folder not found: {base}", retryable=False)

    files = sorted(f for f in os.listdir(base) if f.endswith(".txt"))
    ctx.memory.set("available_txt_files", files)  # optional: stash for UI/agent
    return ok(message=f"Found {len(files)} .txt files.", files=files, count=len(files))


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ TOOLS: SUMMARIZATION                                                         ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
def generate_summary_prompt(ctx, max_len=None):
    text = ctx.memory.get("raw_text")
    if not text:
        return err("No raw text found in memory.",
                   hint="Run read_txt_file before generate_summary_prompt")
    if max_len is None:
        max_len = ctx.config.get("summary_max_chars", 2000)

    truncated = len(text) > max_len
    short_text = text[:max_len]

    ctx.memory.set("was_truncated", truncated)
    ctx.memory.set("source_length", len(text))
    ctx.memory.set("used_length", len(short_text))
    ctx.memory.set("summary_prompt", f"""You are an expert technical writer.

Summarize the following content into a set of clear, concise bullet points...
\"\"\"{short_text}\"\"\"

Summary:""")

    return ok(message="Summary prompt created.",
              truncated=truncated, used=len(short_text), total=len(text),
              prompt_preview=ctx.memory.get("summary_prompt")[:600])


def summarize(ctx):
    prompt = ctx.memory.get("summary_prompt")
    if not prompt:
        return err("No summary prompt found in memory.",
                   hint="Run generate_summary_prompt before summarize")
    response = ctx.llm.complete(prompt)
    ctx.memory.set("summary", response)
    return ok(message="Summary completed.", summary_preview=response[:1000])


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ TOOLS: OUTPUT                                                                ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
def save_summary(ctx, out_name=None, _fs=os):
    summary = ctx.memory.get("summary")
    if not summary:
        return err("No summary in memory.",
                   hint="Run summarize before save_summary")
    out_dir = ctx.config.get("output_folder")
    if not out_dir:
        return err("No output_folder in config.",
                   hint="Set ctx.config['output_folder']")

    _fs.makedirs(out_dir, exist_ok=True)
    src = ctx.memory.get("file_name", "summary")
    root, _ = os.path.splitext(os.path.basename(src))
    base = out_name or f"{root}_summary.txt"
    path = _fs.path.join(out_dir, base)

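    # os.open() takes integer flags, not a mode string, so use the builtin open()
    # when _fs is the default os module
    opener = builtins.open if _fs is os else _fs.open
    with opener(path, "w", encoding="utf-8") as f: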
        f.write(summary)

    ctx.memory.set("summary_path", path)
    return ok(message="Summary saved.", path=path)



## 🧰 FILE TOOLS: `read_txt_file` & `list_txt_files`

### 🟩 Good Design Features

* **Path Traversal Protection**:

  ```python
  if not base or not path.startswith(base + os.sep):
      return err("Path traversal blocked.")
  ```

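  Prevents the LLM from reading files outside the input folder, which is an important safety measure for autonomous or semi-autonomous agents. The check is purely lexical: `os.path.abspath` normalizes `..` segments but does not resolve symlinks, so `os.path.realpath` is the safer choice if the folder may contain links.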

* **File Existence Check**:
  Clear error messages guide the LLM (or user) if a file doesn’t exist.

* **Memory-first Design**:
  `file_name` and `raw_text` go into memory, making them accessible for downstream tools like summarization, without passing them around manually.

* **Tool discoverability**:
  `list_txt_files()` helps the LLM choose file names in a just-in-time fashion — good for chaining.
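
The path-traversal guard described above can be tried in isolation. This is a minimal sketch with a hypothetical helper name (`is_inside`) and made-up paths; it only normalizes strings and never touches the disk.

```python
import os

def is_inside(base_dir: str, file_name: str) -> bool:
    """Hypothetical helper: True if file_name resolves to a path under base_dir."""
    base = os.path.abspath(base_dir)
    path = os.path.abspath(os.path.join(base, file_name))
    # The trailing separator stops "/data/in2" from passing as a child of "/data/in"
    return path.startswith(base + os.sep)

base = os.path.join(os.sep, "data", "in")
print(is_inside(base, "notes.txt"))                         # True
print(is_inside(base, os.path.join("..", "secret.txt")))    # False
print(is_inside(base, os.path.join("..", "in2", "x.txt")))  # False
```

The third call shows why `base + os.sep` is used instead of `base` alone: a sibling folder whose name merely starts with the same characters would otherwise pass the prefix test.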

### 🤖 Agent Benefit

Helps the LLM **stay safe, stay focused**, and avoid hallucinating filenames or file paths.

---

## 🧠 SUMMARIZATION TOOLS: `generate_summary_prompt` + `summarize`

### 🟩 Good Design Features

* **Prompt Truncation**:

  ```python
  text[:max_len]
  ```

  Prevents long texts from overflowing the context window. Note that `max_len` counts characters, not tokens, so it is only a rough guard.

* **Memory Priming**:
  Stores useful metadata like:

  * `"was_truncated"`
  * `"used_length"`
  * `"summary_prompt"`

* **Prompt Template is Clear & Friendly**:

  ```text
  You are an expert technical writer...
  Summarize the following content...
  ```

  This role-setting preamble gives the LLM a clear task and tone.

* **Two-Step Workflow**:
  Separation of `generate_summary_prompt` and `summarize()` enables:

  * Inspection
  * Editing
  * Reuse
  * Retry with different prompts
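
The truncation bookkeeping listed above is small enough to sketch on its own. The helper name `truncate_for_prompt` and the returned dictionary are assumptions made for illustration; the notebook stores the same values in memory instead.

```python
def truncate_for_prompt(text: str, max_len: int = 2000) -> dict:
    """Hypothetical helper: cut text to max_len characters and report what happened."""
    short_text = text[:max_len]
    return {
        "short_text": short_text,
        "was_truncated": len(text) > max_len,
        "source_length": len(text),
        "used_length": len(short_text),
    }

info = truncate_for_prompt("abcdefghij" * 3, max_len=12)
print(info["was_truncated"], info["source_length"], info["used_length"])  # True 30 12
```

Keeping `was_truncated` next to the lengths lets a later step tell the user that the summary covers only part of the file.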

### 🤖 Agent Benefit

Improves **accuracy, customizability, and fail-safety** of summarization. You can reuse the summary prompt for multiple models or re-prompt if the summary wasn’t satisfactory.

---

## 🧾 OUTPUT TOOL: `save_summary`

### 🟩 Good Design Features

* **Filesystem Injection**:
  `_fs=os` allows for **mocking**, **virtual filesystems**, or switching to cloud storage without rewriting code.

* **Smart Output Naming**:

  ```python
  base = out_name or f"{root}_summary.txt"
  ```

  Automatically creates descriptive filenames if none are provided.

* **Error Handling for Config**:
  Ensures `output_folder` is set before saving.

* **Memory Stashing**:
  Stores `summary_path` in memory, so follow-up steps or UI can show download links.
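
To make the filesystem-injection point concrete, here is a sketch of an in-memory stand-in that supports the three calls `save_summary` makes on `_fs` (`makedirs`, `path.join`, and `open`). The class names and the `write_text` helper are hypothetical and exist only for this example.

```python
import posixpath

class _MemFile:
    """Write-only file object that stores its contents in a dict on exit."""
    def __init__(self, store: dict, path: str):
        self._store, self._path, self._parts = store, path, []

    def write(self, text: str) -> int:
        self._parts.append(text)
        return len(text)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self._store[self._path] = "".join(self._parts)
        return False

class InMemoryFS:
    """Hypothetical stand-in exposing makedirs, path.join and open."""
    path = posixpath

    def __init__(self):
        self.files = {}
        self.dirs = set()

    def makedirs(self, name, exist_ok=False):
        self.dirs.add(name)

    def open(self, path, mode="w", encoding=None):
        if "w" not in mode:
            raise ValueError("this sketch only supports writing")
        return _MemFile(self.files, path)

def write_text(fs, out_dir, name, text):
    """Same three filesystem calls that save_summary makes."""
    fs.makedirs(out_dir, exist_ok=True)
    path = fs.path.join(out_dir, name)
    with fs.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path

fs = InMemoryFS()
saved = write_text(fs, "/out", "report_summary.txt", "- point one")
print(saved, "->", fs.files[saved])
```

A unit test can then assert on `fs.files` without creating anything on disk.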

### 🤖 Agent Benefit

Ensures **reliable outputs**, can be extended easily (PDF, Markdown, etc), and keeps the result accessible in `ctx.memory`.

---

## ✨ BONUS IDEAS (for the future)

Here are **features you could add** if you want to grow the toolkit:

| Feature                                                      | Add To                            |
| ------------------------------------------------------------ | --------------------------------- |
| Token usage count                                            | `summarize()` or `llm.complete()` |
| Retry on failed summarization                                | `summarize()`                     |
| Metadata report (e.g. file name, word count, summary length) | `save_summary()`                  |
| Switchable formats (Markdown/PDF)                            | `save_summary()`                  |
| Embedding or vector store logging                            | `read_txt_file()`                 |

---

## ✅ Summary of Strengths

| Design Strength             | Why It Matters                                                  |
| --------------------------- | --------------------------------------------------------------- |
| Memory-based I/O            | Everything flows through memory — consistent access for tools   |
| Truncation & parsing        | Prevents LLM failure from long inputs                           |
| Friendly prompt scaffolding | Makes agent behavior more consistent                            |
| Error handling              | Helps LLMs recover or retry from mistakes                       |
| Modular steps               | Makes the pipeline inspectable, testable, and override-friendly |



In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ TOOL REGISTRY — TYPES & REGISTRATION                                         ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolDef:
    name: str
    func: Callable
    description: str = ""
    schema: dict | None = None
    returns: dict | None = None   # optional metadata about outputs

class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, tool: ToolDef):
        self._tools[tool.name] = tool

    def get(self, name: str) -> ToolDef:
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name]

    def list(self):
        return list(self._tools.keys())

# -- Build registry -------------------------------------------------------------
registry = ToolRegistry()

registry.register(ToolDef(
    "create_plan",
    create_plan,
    "Create a plan from goal",
    schema={ "type": "object", "properties": {}, "required": [] },   # no kwargs
    returns={
        "type": "object",
        "properties": {
            "message": { "type": "string" },
            "steps":   { "type": "array", "items": { "type": "string" } }
        },
        "required": ["message", "steps"]
    }
))

registry.register(ToolDef(
  "read_txt_file", read_txt_file, "Read a .txt file from input_folder",
  schema={
    "type": "object",
    "properties": {"file_name": {"type": "string"}},
    "required": ["file_name"]
  },
  returns={
    "type": "object",
    "properties": {"message": {"type": "string"}, "length": {"type": "integer"}},
    "required": ["message"]
  },
))

registry.register(ToolDef(
    "list_txt_files",
    list_txt_files,
    "List .txt files in input_folder",
    schema={ "type": "object", "properties": {}, "required": [] },
    returns={
        "type": "object",
        "properties": {
            "ok":    { "type": "boolean" },
            "message": { "type": "string" },
            "files": { "type": "array", "items": { "type": "string" } },
            "count": { "type": "integer" }
        },
        "required": ["ok", "files"]
    }
))

registry.register(ToolDef(
  "generate_summary_prompt", generate_summary_prompt, "Build a summarization prompt",
  schema={
    "type": "object",
    "properties": {"max_len": {"type": "integer", "minimum": 1}},
    "required": []
  }
))

registry.register(ToolDef(
  "summarize", summarize, "Run LLM summarization",
  schema={"type": "object", "properties": {}, "required": []}
))

registry.register(ToolDef(
  "save_summary", save_summary, "Persist summary to output_folder",
  schema={
    "type": "object",
    "properties": {"out_name": {"type": "string"}},
    "required": []
  },
))


The **Tool Registry** is one of the most important architectural components in your agent. It not only organizes your tools but also sets the stage for intelligent decision-making and tool use by the LLM. Here's a breakdown of what to focus on and why this matters:

---

## 🧠 Purpose of the Tool Registry

The **Tool Registry** acts like the **agent’s toolbox index** — it:

1. **Catalogs all available tools** with their:

   * Function reference
   * Description (natural language)
   * Input schema (JSON Schema)
   * Output metadata (optional)

2. **Provides structured, discoverable access** for agents or UIs to:

   * List tools
   * Validate tool parameters
   * Present usage options to the LLM
   * Integrate with planners or routers

---

## 🧩 Key Components

### 1. `ToolDef` — the definition schema

This is a **structured metadata container** for tools.

```python
@dataclass
class ToolDef:
    name: str
    func: Callable
    description: str = ""
    schema: dict | None = None
    returns: dict | None = None
```

**Why this matters**:

* Enables **decoupling** of logic from registry code.
* Powers **UI introspection**, **LLM guidance**, and **validation**.
* Helps build **OpenAI function calls** or **LangChain tools** dynamically.

---

### 2. `ToolRegistry` — the runtime registry

Manages tools internally as a dictionary.

```python
class ToolRegistry:
    def __init__(self):
        self._tools = {}
```

With core methods:

* `.register(tool)` — add new tool
* `.get(name)` — fetch by name (with safety check)
* `.list()` — show available tools

**Why this matters**:

* Adds **consistency** and **centralization** to tool handling.
* Enables custom logic (e.g., search, filtering, categories).
* Supports **dynamic tool loading** or plug-and-play systems.
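
A short usage sketch of the three methods, with the classes repeated so the snippet runs on its own. The `shout` tool is a made-up example, and `Optional[dict]` is used in place of `dict | None` only to keep the snippet runnable on older Python versions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolDef:
    name: str
    func: Callable
    description: str = ""
    schema: Optional[dict] = None
    returns: Optional[dict] = None

class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, tool: ToolDef):
        self._tools[tool.name] = tool

    def get(self, name: str) -> ToolDef:
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name]

    def list(self):
        return list(self._tools.keys())

def shout(text):   # hypothetical toy tool
    return text.upper()

reg = ToolRegistry()
reg.register(ToolDef("shout", shout, "Upper-case a string"))
print(reg.list())                    # ['shout']
print(reg.get("shout").func("hi"))   # HI
try:
    reg.get("whisper")
except KeyError as e:
    print("lookup failed:", e)
```

Note that `register` overwrites silently when a name is reused, so a duplicate-name check may be worth adding if tools come from several modules.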

---

### 3. The Tool Registrations

You’re registering tools with their schema, description, and return shape:

```python
registry.register(ToolDef(
  "create_plan",
  create_plan,
  "Create a plan from goal",
  schema={ ... },
  returns={ ... }
))
```

**Why this matters**:

* Empowers LLMs to **choose valid tools** (via name + description).
* Enables agents to **auto-check** arguments before calling tools.
* Encourages **safe, testable, explainable behavior**.

---

## ✅ Why the Registry Is Critical

| Feature                | Benefit                                                                                      |
| ---------------------- | -------------------------------------------------------------------------------------------- |
| 🧩 **Modularity**      | Tools can be added/swapped without touching agent logic                                      |
| 📚 **Discoverability** | Agents (or UIs) can list & explain tools clearly                                             |
| 🔐 **Safety**          | Input schemas define what arguments are allowed                                              |
| 🧪 **Testing**         | Tools can be validated independently of the agent                                            |
| 📈 **Extensibility**   | Enables future use of dynamic tool loading, plugin support, UI forms, or OpenAI tool calling |

---

## 🧠 How It Helps the LLM

When the LLM wants to decide what to do next:

* It can query or reason over the tool list
* It can infer what arguments are valid via `schema`
* It can expect structured results via `returns`
* If used in a **router-style agent**, the agent can show available tools as context

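This mirrors how structured tool calling works in frameworks such as OpenAI's function calling (the `tools` request parameter) or LangChain's `Tool` class: the model is given tool names, descriptions, and JSON Schemas, and it replies with a tool name plus arguments for the runtime to execute.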
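
As one possible way to connect the registry to that style of tool calling, the sketch below converts a `ToolDef` into the dictionary shape used by the `tools` parameter of OpenAI's Chat Completions API, as far as I understand that format. The adapter name `to_openai_tool` is hypothetical, and the notebook itself does not include this step.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolDef:
    name: str
    func: Callable
    description: str = ""
    schema: Optional[dict] = None
    returns: Optional[dict] = None

def to_openai_tool(tool: ToolDef) -> dict:
    """Hypothetical adapter: ToolDef -> one entry of a `tools` list."""
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            # Tools registered without a schema are described as taking no arguments
            "parameters": tool.schema or {"type": "object", "properties": {}},
        },
    }

read_tool = ToolDef(
    "read_txt_file",
    lambda ctx, file_name: None,   # placeholder body for the sketch
    "Read a .txt file from input_folder",
    schema={
        "type": "object",
        "properties": {"file_name": {"type": "string"}},
        "required": ["file_name"],
    },
)
print(json.dumps(to_openai_tool(read_tool), indent=2))
```

Only the schema is exposed to the model. `ctx` and underscore-prefixed dependencies stay out of it because the environment supplies them at call time.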






## 🧠 What `@dataclass` Does (and Why It’s Used Here)

The `@dataclass` decorator in Python auto-generates a bunch of boilerplate code for classes that are **just containers for data**.

### Normally, you'd write:

```python
class ToolDef:
    def __init__(self, name, func, description="", schema=None, returns=None):
        self.name = name
        self.func = func
        self.description = description
        self.schema = schema
        self.returns = returns

    def __repr__(self):
        return f"ToolDef(name={self.name!r}, ...)"
```

### With `@dataclass`, all of that becomes:

```python
@dataclass
class ToolDef:
    name: str
    func: Callable
    description: str = ""
    schema: dict | None = None
    returns: dict | None = None
```

### Benefits in this context:

| Feature             | Why it matters here                                        |
| ------------------- | ---------------------------------------------------------- |
| 🧹 Less boilerplate | No need to write `__init__`, `__repr__`, `__eq__`, etc.    |
| 📦 Clarity          | Expresses intent: this class is **just** data              |
| 📋 Structure        | Easy to pass around in tool registration or introspection  |
| 🔄 Future-proofing  | If you want to serialize/compare/inspect tools, you're set |

---

## 🧠 Why It’s Only Used Here

The other classes (like `ActionContext`, `ScratchMemory`, `ToolRegistry`, etc.) are **not just data containers** — they contain **behavior** (methods), and manage **mutable state**, so decorators like `@dataclass` would not add much value.

Examples:

* `ActionContext` has memory, logging, and utility methods → it's an **operational object**
* `ToolRegistry` stores a dynamic set of tools → it's an **internal manager**
* `OpenAILLM` wraps a client and runtime config → it's a **service object**

For those, using a decorator like `@dataclass` would be unnecessary or misleading.

---

With `@dataclass` applied, `ToolDef` automatically gets:

* `__init__`
* `__repr__`
* `__eq__`

Two related points are easy to get wrong:

* `__hash__` is not generated by default. With the default `eq=True` and `frozen=False`, the decorator sets `__hash__` to `None`, so instances are unhashable unless you pass `frozen=True` or `unsafe_hash=True`.
* There is no `__asdict__` method. Conversion to a dictionary goes through the module-level function `dataclasses.asdict(instance)`.
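
The generated behavior is easy to check directly. The sketch below uses a throwaway `noop` function as the tool body; everything else is standard-library `dataclasses`.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Optional

@dataclass
class ToolDef:
    name: str
    func: Callable
    description: str = ""
    schema: Optional[dict] = None
    returns: Optional[dict] = None

def noop():   # throwaway tool body for the demo
    return None

a = ToolDef("noop", noop, "Does nothing")
b = ToolDef("noop", noop, "Does nothing")

print(a)                          # generated __repr__ lists every field
print(a == b)                     # True: generated __eq__ compares field by field
print(ToolDef.__hash__ is None)   # True: default settings leave instances unhashable
print(sorted(asdict(a)))          # asdict() is a function in the dataclasses module
```

If tools ever need to be used as dictionary keys or set members, declaring the class with `@dataclass(frozen=True)` is the usual way to get a hash.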

---

## 🧼 Why This Matters

* **Cleaner code** → Less clutter, easier to read.
* **Fewer bugs** → Less manual copying of attribute names/values.
* **Consistency** → You don’t forget to update `__init__` or `__repr__` when you change fields.
* **Better tooling** → IDEs and type checkers can infer structure more reliably.

---

## ✅ When to Use `@dataclass`

Use it when your class:

* Just holds **structured data**
* Doesn't have significant **custom logic or mutable state**
* Is passed around frequently (like config objects, data schemas, etc.)

In your agent design, `ToolDef` is a perfect fit — it’s basically a registration form for tools.

---

## 🎯 Summary

* `@dataclass` is used **only** for **plain data holders**.
* `ToolDef` is the only class in your codebase that qualifies.
* This choice keeps your code **clean**, **intentional**, and **idiomatic**.
* It's **not** about LLMs or agent architecture directly — just good Python design.


In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ ENVIRONMENT — VALIDATION & EXECUTION                                         ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
import inspect

def _validate(schema, kwargs):
    """Minimal JSON-schema-ish validator for tool kwargs."""
    if not schema:
        return None
    missing = [k for k in schema.get("required", []) if k not in kwargs]
    if missing:
        return f"Missing required: {missing}"
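    types = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    for k, spec in (schema.get("properties") or {}).items():
        if k in kwargs and "type" in spec:
            py_t = types.get(spec["type"])
            val = kwargs[k]
            # bool is a subclass of int in Python, so reject it explicitly for numeric types
            is_bool_as_num = isinstance(val, bool) and spec["type"] in ("integer", "number")
            if py_t and (not isinstance(val, py_t) or is_bool_as_num):
                return f"Bad type for '{k}': expected {spec['type']}"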
    return None

class Environment:
    """Runs tools by name with auto-DI, validation, and centralized logging."""
    def __init__(self, ctx: ActionContext, registry: ToolRegistry):
        self.ctx = ctx
        self.registry = registry

    def run(self, tool_name: str, **kwargs):
        tool = self.registry.get(tool_name)
        fn = tool.func
        sig = inspect.signature(fn)

        # 1) Schema validation BEFORE logging/exec
        v_err = _validate(tool.schema, kwargs)
        if v_err:
            self.ctx.track_progress(tool.name, "error", note=v_err[:180])
            return err(v_err)  # standardized envelope

        # 2) Build call args with auto-DI (ctx + underscore deps)
        call_args = {}
        for pname, param in sig.parameters.items():
            if pname == "ctx":
                call_args["ctx"] = self.ctx
            elif pname.startswith("_"):   # underscore dep, e.g. _fs, _clock
                dname = pname[1:]
                if dname not in self.ctx.deps:
                    msg = f"Missing dep '{dname}' for tool '{tool_name}'"
                    self.ctx.track_progress(tool.name, "error", note=msg[:180])
                    return err(msg)
                call_args[pname] = self.ctx.deps[dname]
            else:
                if pname in kwargs:
                    call_args[pname] = kwargs[pname]
                elif param.default is not inspect.Parameter.empty:
                    pass  # optional arg omitted: let the tool use its own default
                else:
                    msg = f"Missing required arg '{pname}' for tool '{tool_name}'"
                    self.ctx.track_progress(tool.name, "error", note=msg[:180])
                    return err(msg)

        # 3) Log start, call tool
        self.ctx.track_progress(tool.name, "started", note=str(kwargs))
        try:
            result = fn(**call_args)
        except Exception as e:
            # Normalize exceptions into err(...) so the agent can handle them
            msg = f"{type(e).__name__}: {e}"
            self.ctx.track_progress(tool.name, "error", note=msg[:180])
            return err(msg)

        # 4) Normalize + log outcome
        if isinstance(result, dict):
            # If tool used envelope:
            if result.get("ok") is False:
                self.ctx.track_progress(tool.name, "error", note=str(result.get("error", ""))[:180])
                return result
            # Back-compat: dict returned with "error" but no "ok"
            if "ok" not in result and "error" in result:
                self.ctx.track_progress(tool.name, "error", note=str(result["error"])[:180])
                return {"ok": False, **result}
            # Success path: ensure ok=True for consistency
            result = result if "ok" in result else {"ok": True, **result}
            note = str(result.get("message", ""))[:120]
            self.ctx.track_progress(tool.name, "completed", note=note)
            return result

        # Non-dict success (rare): mark completed with empty note
        self.ctx.track_progress(tool.name, "completed", note="")
        return result

This block is **the brainstem of your agent execution system** — and also the most complex. Let's break it down into digestible parts, with **what it does**, **why it matters**, and **what you should focus on**.

---

## 🧠 What Is the `Environment` Class?

This is your **tool execution engine** — the `Environment` class. It's the core runtime that:

* Validates input against schemas
* Wires dependencies
* Handles errors
* Logs progress
* Returns normalized output

If the registry is your toolbox, this is the **tool operator** that picks up the right wrench and uses it correctly.

---

## 🔍 Overview of Major Sections

### 1. `_validate(schema, kwargs)`

Validates inputs **before calling a tool**.

* Mimics a mini JSON Schema validator.
* Ensures required fields are present.
* Checks types of parameters.

✅ **Why this helps**:

* Catches simple caller mistakes (e.g., a missing `file_name`).
* Prevents LLMs from calling tools with bad arguments.

---

### 2. `Environment.run(...)` — THE MAIN EVENT

#### 🔹 Step 1: Schema validation

```python
v_err = _validate(tool.schema, kwargs)
if v_err:
    self.ctx.track_progress(tool.name, "error", note=v_err[:180])
    return err(v_err)
```

* Ensures arguments to the tool match expectations.
* Fails fast if they're wrong.

✅ **Why it matters**:

* Protects tools from crashing due to bad input.
* Gives LLMs a consistent "you messed up" signal.

---

#### 🔹 Step 2: Build call arguments with **Auto-DI**

```python
for pname, param in sig.parameters.items():
```

This is where the magic happens. It uses Python's `inspect` module to:

* Detect all the arguments a tool needs (`ctx`, `file_name`, `_fs`, etc).
* Automatically inject:

  * `ctx` (context)
  * `deps` like `_fs` from `ctx.deps`
  * `kwargs` from the caller

✅ **Why it matters**:

* Makes tools easy to write — no need to manually wire dependencies.
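* Enforces standardization: tools should **accept** `ctx` and **declare** injected dependencies as underscore-prefixed parameters (`_fs`, `_clock`, etc.). `run` resolves these from `ctx.deps` and returns an error if one is missing, even when the parameter has a default.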
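
The wiring logic can be reduced to a few lines for experimentation. This is a simplified sketch, not the notebook's implementation: the helper name `build_call_args` is hypothetical, and it raises exceptions where `Environment.run` returns an `err(...)` envelope.

```python
import inspect

def build_call_args(fn, ctx, deps, kwargs):
    """Hypothetical, simplified version of the argument wiring in Environment.run."""
    call_args = {}
    for pname, param in inspect.signature(fn).parameters.items():
        if pname == "ctx":
            call_args[pname] = ctx                 # shared context
        elif pname.startswith("_"):
            call_args[pname] = deps[pname[1:]]     # "_fs" is looked up as deps["fs"]
        elif pname in kwargs:
            call_args[pname] = kwargs[pname]       # caller-supplied argument
        elif param.default is inspect.Parameter.empty:
            raise TypeError(f"Missing required arg '{pname}'")
    return call_args

def save(ctx, out_name=None, _fs=None):   # toy tool with the same shape as save_summary
    return (ctx, out_name, _fs)

args = build_call_args(save, ctx="CTX", deps={"fs": "FAKE_FS"}, kwargs={})
print(args)           # {'ctx': 'CTX', '_fs': 'FAKE_FS'}
print(save(**args))   # ('CTX', None, 'FAKE_FS')
```

`out_name` is left out of `args` because it has a default and the caller did not supply it. Keyword arguments that do not match any parameter are dropped rather than passed through.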

---

#### 🔹 Step 3: Execution and Error Catching

```python
try:
    result = fn(**call_args)
except Exception as e:
    ...
    return err(msg)
```

✅ **Why it matters**:

* Prevents crashes and always returns a well-formed error envelope (`ok: False`).
* Allows the LLM to recover or retry.

---

#### 🔹 Step 4: Output normalization

```python
if isinstance(result, dict):
    if result.get("ok") is False:
        ...
    elif "ok" not in result and "error" in result:
        ...
    else:
        result = result if "ok" in result else {"ok": True, **result}
```

✅ **Why it matters**:

* Gives every dict result an explicit `ok` flag, alongside fields such as `error` or `message`. Non-dict return values are passed through unchanged.
* Ensures agents can reason over responses without ambiguity.
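
The normalization rules from step 4 can be isolated into a small pure function, which makes the three cases easy to test. The name `normalize` is an assumption for this sketch, and the logging calls from the notebook are omitted.

```python
def normalize(result):
    """Hypothetical standalone version of step 4's result handling."""
    if not isinstance(result, dict):
        return result                        # non-dict values pass through unchanged
    if result.get("ok") is False:
        return result                        # already an error envelope
    if "ok" not in result and "error" in result:
        return {"ok": False, **result}       # legacy error dict gains ok=False
    return result if "ok" in result else {"ok": True, **result}

print(normalize({"message": "done"}))   # {'ok': True, 'message': 'done'}
print(normalize({"error": "boom"}))     # {'ok': False, 'error': 'boom'}
print(normalize("plain string"))        # plain string
```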

---

## 🎯 What You Should Focus On

| Concept                           | Why It Matters                                       |
| --------------------------------- | ---------------------------------------------------- |
| **Schema Validation**             | Stops garbage input from reaching tools.             |
| **Dependency Injection (DI)**     | Keeps tool code clean and flexible.                  |
| **Standardized Result Envelopes** | Lets agents handle success/failure without guessing. |
| **Progress Logging**              | Useful for both debugging and LLM self-awareness.    |
| **Error Containment**             | Avoids unhandled crashes that break the loop.        |



In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ SETUP & CONFIG                                                               ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
memory = ScratchMemory()
memory.set("goal", "Summarize the content of a text file.")

config = {
    "input_folder": "/content/files",
    "output_folder": "/content/output",
    # "summary_max_chars": 2400,  # optional
}

llm = OpenAILLM(
    client,
    model=config.get("model", "gpt-4o-mini"),
    temperature=config.get("temperature", 0.2),
)




## ✅ What This Block Does

This section initializes the **core components** your agent needs to run:

| Component                  | Purpose                                                                                        |
| -------------------------- | ---------------------------------------------------------------------------------------------- |
| `memory = ScratchMemory()` | Creates a scratchpad for the agent's short-term memory (e.g. storing `goal`, `raw_text`, etc). |
| `config = {...}`           | Provides runtime knobs and folders the agent tools will need (e.g. where to find files).       |
| `llm = OpenAILLM(...)`     | Instantiates the LLM wrapper with client, model, and temperature.                              |

---

## 🧠 Why It Matters

Think of this block as the **agent’s environment setup** before you plug it into the `ActionContext` or `Environment`.

It's like handing the agent its:

* 🧠 **Memory**
* 🔧 **Configuration knobs**
* 🗣️ **LLM interface**

Without these, even the best tools and registries wouldn’t know how or where to function.

---

## 🔍 What to Focus On

Here’s what deserves your attention as a developer/designer:

---

### 🔹 `ScratchMemory()`

* **Purpose**: Holds all in-flight values the agent needs during a run (e.g., `goal`, `summary`, `file_name`).
* **Focus**: Make sure keys like `"goal"`, `"plan"`, `"summary"` are named clearly and used consistently across tools.
* ✅ **Why it’s flexible**: It’s just a dictionary under the hood — dead simple for experimentation or debugging.

---

### 🔹 `config = {...}`

* **Purpose**: Stores global runtime settings.
* **Focus**:

  * Are all necessary folders and limits set?
  * Does `input_folder` match your actual file structure?
  * Consider exposing knobs like `summary_max_chars` for agent tuning.

🧩 This is where you configure **dynamic behavior without touching code**.

---

### 🔹 `llm = OpenAILLM(...)`

* **Purpose**: Wraps the OpenAI client so the agent can generate completions easily.
* **Focus**:

  * Is the **model name** set properly (this notebook defaults to `gpt-4o-mini`)?
  * Is the **temperature** appropriate for the task? (Low = factual, high = creative)
  * Can this wrapper be easily swapped for Anthropic, local models, etc? (Hint: yes!)
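
To illustrate the swap, here is a sketch of an offline stand-in. `EchoLLM` is a hypothetical class, not something the notebook ships; the only assumption it relies on is that tools call `llm.complete(prompt, **kwargs)` and expect a string back.

```python
class EchoLLM:
    """Hypothetical offline stand-in exposing the same complete() method."""

    def __init__(self, canned_reply="1. Read the file\n2. Summarize it"):
        self.canned_reply = canned_reply
        self.prompts = []  # record every prompt for later inspection

    def complete(self, prompt, **kwargs):
        # Extra keyword arguments (e.g. temperature) are accepted and ignored.
        self.prompts.append(prompt)
        return self.canned_reply


llm = EchoLLM()
reply = llm.complete("Break this goal into steps.", temperature=0.0)
```

Anything with a compatible `complete` method can be handed to the context in place of `OpenAILLM`, which is handy for tests that should not spend API credits.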

---

## 🧩 Summary

| Area     | What You Should Double-Check                         |
| -------- | ---------------------------------------------------- |
| `memory` | Are all necessary values seeded correctly?           |
| `config` | Are folders valid and tuned for your workflow?       |
| `llm`    | Are the model + params aligned with your task goals? |




In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ CONTEXT & ENVIRONMENT                                                        ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
# Create context with DI bag pre-populated (fs adapter)
ctx = ActionContext(memory=memory, llm=llm, config=config, deps={"fs": RealFS})

# Ensure folders exist (lightweight guardrails)
os.makedirs(ctx.config["input_folder"], exist_ok=True)
os.makedirs(ctx.config["output_folder"], exist_ok=True)
ctx.track_progress("setup", "completed", "goal + config injected")

# Build environment (validation + auto-DI + centralized logging)
env = Environment(ctx, registry)


This is a **critical section**: it's the bridge between *setup* and *execution*. Think of it as the moment your agent goes from being a set of disconnected tools to becoming a **single, wired-up runtime** in which every tool shares the same context.

Let’s break it down so you know **exactly what to focus on** as a builder:

---

## 🧱 What This Block Does

| Step                       | Purpose                                                                          |
| -------------------------- | -------------------------------------------------------------------------------- |
| `ctx = ActionContext(...)` | Binds all runtime components into one shared context for every tool.             |
| `os.makedirs(...)`         | Ensures the I/O folders exist — a simple safety net.                             |
| `ctx.track_progress(...)`  | Logs that setup succeeded — useful for audits and debugging.                     |
| `env = Environment(...)`   | Instantiates the engine that runs tools, with validation + dependency injection. |

---

## 🎯 What You Should Focus On

---

### 🔹 `ActionContext(...)`

This is your agent’s **backpack**. It carries everything the tools need to do their jobs:

* `memory`: Temporary working state (e.g., goals, raw text)
* `llm`: The language model interface
* `config`: Settings like folders or runtime knobs
* `deps`: Injected low-level dependencies like filesystem access (`fs`)

🔍 **Key detail**: You're injecting `RealFS` via the `deps={"fs": RealFS}` bag — this makes `_fs` in tools work automatically.

✅ **What to verify**:

* Are all tools using `ctx.memory`, `ctx.config`, and `ctx.deps` properly?
* Do you have a clear convention for what goes into `deps`? (e.g., `fs`, a clock, a UUID generator)

---

### 🔹 `os.makedirs(...)`

These are **lightweight guardrails**. They make sure the folders your tools read from and write to exist before any tool runs, so file operations don't fail on a missing directory.

✅ **Check**:

* Are `"input_folder"` and `"output_folder"` set correctly in `config`?
* Are any default paths safe to use on your machine/cloud setup?

---

### 🔹 `ctx.track_progress(...)`

This logs that **setup is complete**, using the same standardized progress tracker the rest of the agent uses.

✅ **Why this matters**:

* You’ll see this log when debugging or monitoring workflows.
* It gives you a first “checkpoint”, so you can tell whether a later failure happened before or after setup finished.

---

### 🔹 `env = Environment(...)`

This is where you **give the agent its execution engine**.

The `Environment` handles:

* Validation of each call's arguments against the tool's schema
* Auto-wiring of dependencies (ctx, \_fs, etc)
* Error standardization
* Progress tracking

✅ **Check**:

* Is this wired to the correct `ctx` and `registry`?
* Are all tools registered properly before the environment is built?

---

## 🧩 Summary: What to Double-Check

| Area                     | Focus                                                                                 |
| ------------------------ | ------------------------------------------------------------------------------------- |
| `ActionContext(...)`     | Are all core components (memory, llm, config, deps) present and correct?              |
| `deps={"fs": RealFS}`    | Are you using the right adapters for tools? Could you inject other helpful utilities? |
| `os.makedirs(...)`       | Are folders correct and safe to create on your target system?                         |
| `track_progress(...)`    | Does this give you a clear setup signal in logs?                                      |
| `env = Environment(...)` | Is this built after registry setup is complete?                                       |



In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ AGENT STEPS (SCRIPTED PIPELINE)                                              ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
file_name = "004_AGENT_Tools.txt"
steps = [
    ("create_plan", {}),
    ("read_txt_file", {"file_name": file_name}),
    ("generate_summary_prompt", {}),  # or {"max_len": 2400}
    ("summarize", {}),
    ("save_summary", {}),
]


This block defines the **scripted pipeline** — a deterministic list of tool steps that your agent will run *in order*. It’s like handing your agent a mission briefing: “Here’s what to do, and in what order.”

---

## 🔍 What This Block Does

### 🧾 `file_name = "004_AGENT_Tools.txt"`

Sets the name of the input file the agent will read and summarize.

> 📌 **Used in**: The `read_txt_file` step, passed as a parameter.

---

### 🪜 `steps = [...]`

This is the **ordered sequence** of tool invocations. Each entry is a tuple:

```python
("tool_name", {arguments_dict})
```

Here’s what each one does:

| Step                                          | Purpose                                               |
| --------------------------------------------- | ----------------------------------------------------- |
| `("create_plan", {})`                         | Calls the planner to break the goal into steps.       |
| `("read_txt_file", {"file_name": file_name})` | Reads the contents of the text file into memory.      |
| `("generate_summary_prompt", {})`             | Trims the text and formats the prompt for the LLM.    |
| `("summarize", {})`                           | Feeds the prompt to the LLM and stores the result.    |
| `("save_summary", {})`                        | Saves the LLM-generated summary to the output folder. |

---

## 🎯 What You Should Focus On

---

### ✅ 1. **Tool Names Must Match Registry**

Every tool name must match what you registered in the `ToolRegistry`. If you misspell a tool or forget to register it, `env.run(...)` will throw a `KeyError`.
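
A cheap way to catch this before anything runs is to compare the step names against the registered names. The sketch below uses a hypothetical helper, `find_unknown_tools`, and a hard-coded name list so it runs on its own.

```python
def find_unknown_tools(steps, registered_names):
    """Return tool names used in steps that are not registered, in first-seen order."""
    known = set(registered_names)
    unknown = []
    for tool_name, _kwargs in steps:
        if tool_name not in known and tool_name not in unknown:
            unknown.append(tool_name)
    return unknown


registered = ["create_plan", "read_txt_file", "list_txt_files",
              "generate_summary_prompt", "summarize", "save_summary"]
steps = [
    ("create_plan", {}),
    ("read_txt_file", {"file_name": "004_AGENT_Tools.txt"}),
    ("summarise", {}),  # deliberate misspelling
]

unknown = find_unknown_tools(steps, registered)
```

In the notebook itself, `registry.list()` would supply the registered names instead of the hard-coded list.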

---

### ✅ 2. **Arguments Must Match Tool Schema**

Each tool may expect parameters (like `file_name`). If a required param is missing, the `Environment` will catch it during validation and return a standardized `err(...)`.
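
Here is a trimmed-down sketch of that validation idea, modeled on the notebook's `_validate` helper but renamed `validate_kwargs` and limited to two types to keep it short. Treat it as an illustration rather than the exact function the `Environment` uses.

```python
def validate_kwargs(schema, kwargs):
    """Return an error string, or None when kwargs satisfy the schema."""
    missing = [k for k in schema.get("required", []) if k not in kwargs]
    if missing:
        return f"Missing required: {missing}"
    types = {"string": str, "integer": int}
    for key, spec in schema.get("properties", {}).items():
        expected = types.get(spec.get("type"))
        if key in kwargs and expected and not isinstance(kwargs[key], expected):
            return f"Bad type for '{key}': expected {spec['type']}"
    return None


read_schema = {
    "type": "object",
    "properties": {"file_name": {"type": "string"}},
    "required": ["file_name"],
}

missing_msg = validate_kwargs(read_schema, {})
bad_type_msg = validate_kwargs(read_schema, {"file_name": 42})
ok_msg = validate_kwargs(read_schema, {"file_name": "notes.txt"})
```

A `None` result means the call may proceed; any string is the message that ends up inside the `err(...)` envelope.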

---

### ✅ 3. **Order Matters**

These tools build on each other. For example:

* `generate_summary_prompt` needs `raw_text`, which is only available *after* `read_txt_file`
* `summarize` needs the prompt generated by the previous step

> 💡 You can think of each tool like a **pipe** in a plumbing system — the `ctx.memory` is the water flowing through it.
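
The dependency between steps can be sketched with two stand-in tools and a plain dict as memory. `read_text` and `build_prompt` are simplified, hypothetical versions of `read_txt_file` and `generate_summary_prompt`, written only to show what happens when the order is wrong.

```python
def ok(**data):
    return {"ok": True, **data}


def err(msg, hint=None):
    out = {"ok": False, "error": msg}
    if hint:
        out["hint"] = hint
    return out


def read_text(memory, text):
    """Stand-in for read_txt_file: stores text under 'raw_text'."""
    memory["raw_text"] = text
    return ok(length=len(text))


def build_prompt(memory, max_len=2000):
    """Stand-in for generate_summary_prompt: requires 'raw_text'."""
    text = memory.get("raw_text")
    if not text:
        return err("No raw text found in memory.",
                   hint="Run read_text before build_prompt")
    memory["summary_prompt"] = f"Summarize:\n{text[:max_len]}"
    return ok(used=min(len(text), max_len))


memory = {}
too_early = build_prompt(memory)   # prerequisite missing, returns an error envelope
read_text(memory, "Agents call tools in sequence.")
in_order = build_prompt(memory)    # prerequisite present, succeeds
```

Running a step too early does not crash; it returns an error envelope whose hint names the step that should have come first.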

---

### ✅ 4. **Steps Can Be Dynamic**

Although this is a **hard-coded plan**, you could dynamically generate or modify the `steps` list based on:

* User preferences
* LLM-created plans
* Prior run results in memory

---

Dynamic planning is **one of the most powerful capabilities** of your agent scaffold: the `create_plan` tool asks the LLM to turn a goal into a **goal-specific list of steps**, which can then be mapped onto a tool pipeline.

---

## 🧠 How It Works in Practice

### 1. **Goal Is Set by User**

You inject a high-level task into memory like:

```python
memory.set("goal", "Summarize all customer feedback files and highlight top pain points.")
```

---

### 2. **LLM Uses `create_plan`**

The agent calls the `create_plan` tool, which prompts the LLM to break that goal into concrete, ordered steps.

The LLM might respond with:

```
1. List all feedback files.
2. Read each file.
3. Generate a summary for each.
4. Extract key pain points.
5. Combine and save the final report.
```

This output becomes a memory-stored `plan`:

```python
ctx.memory.get("plan")  # list of clean, normalized steps
```

---

### 3. **LLM or Agent Maps Steps to Tools**

The agent can now **map those plan steps to tool names** (from the registry) and arguments. For example:

```python
[
  ("list_txt_files", {}),
  ("read_txt_file", {"file_name": "feedback1.txt"}),
  ("generate_summary_prompt", {}),
  ("summarize", {}),
  ("save_summary", {"out_name": "feedback1_summary.txt"})
]
```

This dynamic plan becomes the `steps` array — either built by the LLM or another planning module.
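
As a rough illustration of the mapping step, the sketch below matches keywords in each plan line against a small lookup table. Everything here is an assumption made for the example: `KEYWORD_TO_TOOL` and `map_plan_to_steps` are hypothetical, and a real mapper might ask the LLM to choose tools instead of matching substrings.

```python
# Hypothetical keyword -> tool table; the first matching keyword wins.
KEYWORD_TO_TOOL = [
    ("list", "list_txt_files"),
    ("read", "read_txt_file"),
    ("summar", "summarize"),
    ("save", "save_summary"),
]


def map_plan_to_steps(plan, file_name):
    """Turn free-text plan lines into (tool_name, kwargs) pairs."""
    steps = []
    for line in plan:
        lowered = line.lower()
        for keyword, tool in KEYWORD_TO_TOOL:
            if keyword in lowered:
                if tool == "summarize":
                    # summarize needs a prompt, so schedule the prompt builder first
                    steps.append(("generate_summary_prompt", {}))
                kwargs = {"file_name": file_name} if tool == "read_txt_file" else {}
                steps.append((tool, kwargs))
                break  # plan lines with no matching keyword are skipped
    return steps


plan = ["List all feedback files", "Read each file",
        "Generate a summary for each", "Combine and save the final report"]
steps = map_plan_to_steps(plan, "feedback1.txt")
```

Substring matching is brittle, so treat the output as a draft pipeline to review rather than something to execute blindly.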

---

### 4. **Execution Loop Runs It**

You run each step via:

```python
for tool_name, kwargs in steps:
    result = env.run(tool_name, **kwargs)
```
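
Since later steps depend on earlier ones, a practical loop usually stops at the first failure. The sketch below shows one way to do that; `run_steps` and `fake_run` are hypothetical names, with `fake_run` standing in for `env.run` so the example runs without the notebook's classes.

```python
def run_steps(run_tool, steps, max_calls=10):
    """Run (tool_name, kwargs) pairs in order; stop at the first failure."""
    results = []
    for tool_name, kwargs in steps[:max_calls]:
        result = run_tool(tool_name, **kwargs)
        results.append((tool_name, result))
        if not result.get("ok"):
            break  # later steps depend on earlier ones, so stop here
    return results


def fake_run(tool_name, **kwargs):
    """Stand-in for env.run that fails when read_txt_file lacks file_name."""
    if tool_name == "read_txt_file" and "file_name" not in kwargs:
        return {"ok": False, "error": "Missing required: ['file_name']"}
    return {"ok": True, "tool": tool_name}


results = run_steps(fake_run, [
    ("create_plan", {}),
    ("read_txt_file", {}),  # missing argument, fails
    ("summarize", {}),      # never reached
])
```

The `max_calls` slice caps how many steps run at all, and the early `break` keeps a failed read from cascading into confusing downstream errors.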

---

## 🔑 Benefits of LLM-Generated Plans

| Feature                  | Benefit                                                             |
| ------------------------ | ------------------------------------------------------------------- |
| Goal-based flexibility   | No need to hard-code pipelines                                      |
| Adaptable to new domains | Works for summarization, coding, research, etc.                     |
| Customizable             | User could inject constraints (“don’t read files larger than 10MB”) |
| Human-readable           | Plan steps can be shown to the user for approval                    |







In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ RUN AGENT                                                                    ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
agent = ScriptedAgent(env, steps)
final = agent.run(max_calls=10)  # optional guard
print("Agent result:", final["final"])
if "hint" in final:
    print("💡 Hint:", final["hint"])


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ PRETTY PRINTS (FROM MEMORY)                                                  ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
plan = ctx.memory.get("plan") or []
print("\nPlan:")
for s in plan:
    print("-", s)

raw_text = ctx.memory.get("raw_text") or ""
print("\n📄 File Preview:\n")
print(textwrap.fill(raw_text[:600], width=80, subsequent_indent="  "))

prompt = ctx.memory.get("summary_prompt") or ""
print("\n🧾 Prompt Preview:\n")
print(textwrap.fill(prompt[:600], width=80, subsequent_indent="  "))

summary = ctx.memory.get("summary") or ""
print("\n📝 Summary Preview:\n")
print(textwrap.fill(summary[:1000], width=80, subsequent_indent="  "))

if ctx.memory.get("summary_path"):
    print("\n📄 Saved to:", ctx.memory.get("summary_path"))


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ CONTEXT SNAPSHOT / PROGRESS LOG                                              ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
print("\n" + "="*80)
print("📦 ActionContext Snapshot")
ctx.print_progress()



## ✅ AGENT RUN

```python
agent = ScriptedAgent(env, steps)
final = agent.run(max_calls=10)
```

### Key Points:

* **`ScriptedAgent`** takes the `env` (which knows how to run tools) and the ordered `steps` list.
* **`run(max_calls=10)`** is a safety guard — prevents runaway loops.
* **`final`** is the dict returned by `run()`: the notebook reads the outcome from `final["final"]` and prints `final["hint"]` when one is present.

### Why this matters:

* This is **the main execution driver**. All planning, memory setup, registry config, and tool wiring come together *here*.
* `ScriptedAgent` is dumb by design (just runs steps in order), but it's enough to prove your architecture is working and test end-to-end flow.

---

## 🖨️ PRETTY PRINTS

This section gives **debug visibility into the agent’s state**.

```python
plan = ctx.memory.get("plan") or []
print("\nPlan:")
for s in plan:
    print("-", s)
```

Prints the **human-readable step list** created by the LLM.

---

```python
raw_text = ctx.memory.get("raw_text") or ""
print("\n📄 File Preview:\n")
```

Gives a **peek at the input content**, helping validate the `read_txt_file` step worked.

---

```python
prompt = ctx.memory.get("summary_prompt") or ""
print("\n🧾 Prompt Preview:\n")
```

Shows what the LLM was fed — useful for debugging prompt quality or truncation issues.

---

```python
summary = ctx.memory.get("summary") or ""
print("\n📝 Summary Preview:\n")
```

Outputs the **actual LLM-generated summary**.

---

```python
if ctx.memory.get("summary_path"):
    print("\n📄 Saved to:", ctx.memory.get("summary_path"))
```

Confirms the result was successfully saved by `save_summary`.

---

## 🧾 CONTEXT SNAPSHOT

```python
ctx.print_progress()
```

Prints your **progress log** (step-by-step status trail):

* Which tools were called
* Whether they succeeded or failed
* Any hints or notes

This is incredibly useful for tracing and debugging tool pipelines.
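
The same log can also be queried in code instead of read by eye. The sketch below works on a plain list of entries shaped like the ones `track_progress` stores; the two helpers mirror the `first_error` and `last_completed_step` methods on `ActionContext`, rewritten as free functions so the snippet is self-contained. The log contents are made up for illustration.

```python
# Illustrative entries in the shape track_progress stores (timestamps omitted).
progress_log = [
    {"step": "setup", "status": "completed", "note": "goal + config injected"},
    {"step": "read_txt_file", "status": "started", "note": "{'file_name': 'missing.txt'}"},
    {"step": "read_txt_file", "status": "error", "note": "File not found"},
]


def first_error(log):
    """Return the first entry with status 'error', or None."""
    for entry in log:
        if entry.get("status") == "error":
            return entry
    return None


def last_completed_step(log):
    """Return the name of the most recent completed step, or None."""
    for entry in reversed(log):
        if entry.get("status") == "completed":
            return entry.get("step")
    return None


failed = first_error(progress_log)
resume_from = last_completed_step(progress_log)
```

Together these two answers tell you where the run broke and which step was the last known-good point.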

---

## ✅ Summary: What to Focus On

| Focus Area             | Why It Matters                                     |
| ---------------------- | -------------------------------------------------- |
| `agent.run(...)`       | Runs the actual toolchain; triggers all the logic  |
| `ctx.memory.get(...)`  | Captures what the agent “knows” at each step       |
| Pretty prints          | Give you visibility into key inputs/outputs        |
| `ctx.print_progress()` | Tracks status of each step, critical for debugging |




## 🧠 Leading Underscores in Tool Parameters (not Tool Names)

In your final agent code, **tool names never start with an underscore**, but some of their **dependencies are passed using parameter names with a leading underscore**. That’s the important distinction.

---

### ✅ Example: `save_summary` Tool

```python
def save_summary(ctx, out_name=None, _fs=os):
```

Here, `_fs` is a **dependency**. The leading underscore tells the `Environment` class:

> “Don’t expect the agent to pass this in explicitly — inject it from `ctx.deps`.”

And sure enough, in setup:

```python
ctx = ActionContext(..., deps={"fs": RealFS})
```

So:

* The tool **doesn’t need to know** where `_fs` came from.
* The `Environment` handles the injection automagically based on name (`_fs → deps["fs"]`).

---

### ❌ We do **not** do this:

```python
def _save_summary(...):
```

Tool **names** and registry identifiers don’t start with an underscore. They’re simple and human-readable:

```python
"save_summary", "read_txt_file", "create_plan"
```

---

## ✅ Why This Matters

This convention keeps tools clean:

* Use a leading underscore **only in parameter names** to signal dependency injection.
* Avoid leading underscores in tool names, so agents and registries stay human-readable and composable.

---

### In short:

| Use                | Example                 | Purpose                                        |
| ------------------ | ----------------------- | ---------------------------------------------- |
| Tool name          | `"save_summary"`        | Registered for use by agents                   |
| Tool param (DI)    | `_fs`                   | Tells the `Environment` to inject `deps["fs"]` |
| Tool function name | `def save_summary(...)` | Simple, readable — no leading `_`              |






The leading underscores (`_`) are **only for the `Environment` class to recognize dependencies** during execution. Here's a breakdown of how it works:

---

## ✅ **Underscores Signal Dependency Injection (DI)**

When the `Environment.run()` method prepares to call a tool, it uses the tool’s function signature to decide **what to pass in**.

```python
def save_summary(ctx, out_name=None, _fs=os):
```

In this signature:

* `ctx` → will be passed directly (always).
* `out_name` → comes from the agent/tool call arguments.
* `_fs` → **is not expected from the agent.** Instead, the `Environment` sees the `_` and goes:

  > “This is a dependency — look in `ctx.deps` for `'fs'` and inject it.”

---

### 🔧 The key design:

```python
elif pname.startswith("_"):  # underscore dep, e.g. _fs
    dname = pname[1:]        # strip underscore → 'fs'
    call_args[pname] = self.ctx.deps[dname]
```

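This branch, simplified from the `Environment.run()` method (the full version also returns a standardized `err(...)` when the dependency is missing from `ctx.deps`), is what does the magic: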

* `_fs` → pulls `fs` from `ctx.deps` and injects it as `_fs`.
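
To see the mechanism in isolation, here is a standalone sketch of signature-driven injection. `build_call_args` and `demo_tool` are hypothetical names, and unlike the notebook's `Environment` this version raises exceptions instead of returning `err(...)`, purely to keep the example short.

```python
import inspect


def build_call_args(fn, ctx, deps, kwargs):
    """Assemble keyword arguments for fn: ctx, underscore deps, then caller kwargs."""
    call_args = {}
    for pname, param in inspect.signature(fn).parameters.items():
        if pname == "ctx":
            call_args["ctx"] = ctx
        elif pname.startswith("_"):
            dname = pname[1:]  # "_fs" -> "fs"
            if dname not in deps:
                raise KeyError(f"Missing dep '{dname}'")
            call_args[pname] = deps[dname]
        elif pname in kwargs:
            call_args[pname] = kwargs[pname]
        elif param.default is inspect.Parameter.empty:
            raise TypeError(f"Missing required arg '{pname}'")
    return call_args


def demo_tool(ctx, out_name=None, _fs=None):
    """Toy tool that just echoes what it received."""
    return (ctx, out_name, _fs)


args = build_call_args(demo_tool, ctx="CTX", deps={"fs": "FAKE_FS"},
                       kwargs={"out_name": "report.txt"})
received = demo_tool(**args)
```

The caller only supplied `out_name`; `ctx` and `_fs` were filled in from the function signature alone.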

---

## ❗ Important Clarification:

* **The tools themselves don’t know or care** where `_fs` came from.
* **Only the `Environment` cares about the underscore.**

---

## 🎁 Why This Is Smart

* Keeps tool functions clean and minimal — no need to clutter them with setup code.
* Keeps dependencies configurable — you could swap out `fs` or `clock` for mocks in testing.
* Makes tool reuse trivial — just register the tool and provide its dependencies.
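
As a sketch of the testing benefit, the snippet below swaps the real filesystem for an in-memory one. `MemoryFS` and `save_text` are hypothetical: `MemoryFS` only assumes the adapter surface that `RealFS` exposes (`.path`, `.makedirs`, `.open`), and `save_text` is a simplified stand-in for a tool that writes through `_fs`.

```python
import io
import posixpath
from contextlib import contextmanager


class MemoryFS:
    """Hypothetical in-memory adapter exposing .path, .makedirs and .open."""
    path = posixpath

    def __init__(self):
        self.files = {}
        self.dirs = set()

    def makedirs(self, name, exist_ok=False):
        if name in self.dirs and not exist_ok:
            raise FileExistsError(name)
        self.dirs.add(name)

    @contextmanager
    def open(self, name, mode="r", encoding=None):
        # Only supports use inside a `with` block, which is how the tools use it.
        if "w" in mode:
            buf = io.StringIO()
            yield buf
            self.files[name] = buf.getvalue()  # persist once the block exits cleanly
        else:
            yield io.StringIO(self.files[name])


def save_text(text, out_dir, out_name, _fs):
    """Simplified stand-in for a tool that writes through the injected fs."""
    _fs.makedirs(out_dir, exist_ok=True)
    target = _fs.path.join(out_dir, out_name)
    with _fs.open(target, "w", encoding="utf-8") as f:
        f.write(text)
    return target


fake = MemoryFS()
saved_path = save_text("- point one", "out", "demo_summary.txt", _fs=fake)
```

Passing `deps={"fs": MemoryFS()}` to the context would, under these assumptions, let the save step run in a test without touching disk.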



In [None]:
# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ SETUP (Notebook-only)                                                        ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
!pip -q install openai python-dotenv


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ IMPORTS                                                                      ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
from openai import OpenAI
from dotenv import load_dotenv
import os
import textwrap
import time
import re
import inspect
from typing import Callable, Optional
from dataclasses import dataclass
import builtins


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ OPENAI CLIENT & ENV VARS                                                     ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
# Loads API key from a .env file and initializes the OpenAI client.
load_dotenv('/content/API_KEYS.env')
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY not found in /content/API_KEYS.env")
client = OpenAI(api_key=api_key)


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ STANDARD RESULT ENVELOPE (ok / err)                                          ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
def ok(**data):
    """Successful tool result. Add any fields you like."""
    return {"ok": True, **data}

def err(msg, hint=None, retryable=False, **extra):
    """Error result with optional guidance and flags."""
    out = {"ok": False, "error": msg, "retryable": retryable}
    if hint:
        out["hint"] = hint
    if extra:
        out.update(extra)
    return out


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ FILESYSTEM ADAPTER (for underscore-DI: _fs)                                  ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
# RealFS exposes .path/.makedirs/.open so tools can accept a pluggable FS.
class RealFS:
    path = os.path
    makedirs = staticmethod(os.makedirs)
    open = staticmethod(builtins.open)


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ MEMORY & CONTEXT                                                             ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
class ScratchMemory:
    """Minimal in-memory key/value store for agent state."""
    def __init__(self):
        self.store = {}

    def get(self, key, default=None):   # default added for convenience
        return self.store.get(key, default)

    def set(self, key, value):
        self.store[key] = value

# Valid progress states for centralized logging.
VALID_STATUSES = {"started", "completed", "error"}

class ActionContext:
    """
    The agent's 'backpack':
      - memory: state across steps
      - llm:    LLM wrapper
      - config: runtime configuration (folders, knobs)
      - deps:   injectable dependencies (e.g., fs/clock)
    """
    def __init__(self, memory, llm, config=None, deps=None):
        self.memory = memory
        self.llm = llm
        self.config = config or {}
        self.deps = deps or {}

    # --- progress helpers ---
    def track_progress(self, step, status, note=""):
        if status not in VALID_STATUSES:
            raise ValueError(f"Invalid status '{status}'. Use {VALID_STATUSES}.")
        log = self.memory.get("progress_log") or []
        log.append({
            "step": step,
            "status": status,
            "note": note,
            "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        })
        self.memory.set("progress_log", log)

    def print_progress(self):
        log = self.memory.get("progress_log") or []
        print("\n📊 Progress Log:")
        for e in log:
            t = f" ({e.get('time')})" if e.get("time") else ""
            note = f" — {e['note']}" if e.get("note") else ""
            print(f"- [{e['status']}] {e['step']}{t}{note}")

    def last_completed_step(self):
        log = self.memory.get("progress_log") or []
        for e in reversed(log):
            if e.get("status") == "completed":
                return e.get("step")
        return None

    def first_error(self):
        log = self.memory.get("progress_log") or []
        for e in log:
            if e.get("status") == "error":
                return e
        return None

# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ LLM WRAPPER                                                                 ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
class OpenAILLM:
    def __init__(self, client, model="gpt-4o-mini", temperature=0.2):
        self.client = client
        self.model = model
        self.temperature = temperature

    def complete(self, prompt, **kwargs):
        temp = kwargs.get("temperature", self.temperature)
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temp,
        )
        return resp.choices[0].message.content


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ TOOLS: PLANNING                                                              ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
def create_plan(ctx):
    goal = ctx.memory.get("goal")
    if not goal:
        return err("No goal provided (memory key 'goal' missing).",
                   hint="Call ctx.memory.set('goal', ...) before calling create_plan")

    prompt = f"""You are an expert task planner. Given the goal below, break it down into a clear, short list of steps.

Goal: {goal}

Respond ONLY with a numbered list, one step per line. No extra prose."""
    raw = ctx.llm.complete(prompt).strip()

    # Prefer numbered steps like "1. ...", "2) ..."
    numbered = re.findall(r'^\s*(?:\d+[\).\s-]+)\s*(.+)$', raw, flags=re.M)

    if numbered:
        steps = numbered
    else:
        # Fallback to bullets like "- ...", "* ...", "• ..."
        bullets = re.findall(r'^\s*(?:[-*•]\s+)(.+)$', raw, flags=re.M)
        steps = bullets if bullets else [ln.strip() for ln in raw.splitlines() if ln.strip()]

    # Normalize: collapse spaces, trim punctuation, drop empties/dupes
    clean_steps = []
    seen = set()
    for s in steps:
        s = re.sub(r'\s+', ' ', s).strip(' .')
        if s and s.lower() not in seen:
            seen.add(s.lower())
            clean_steps.append(s)

    if not clean_steps:
        return err("Planner returned no steps.",
                   hint="Refine the goal or relax the parser constraints")

    ctx.memory.set("plan", clean_steps)
    # optional: match handbook wording
    # ctx.memory.set("current_plan", clean_steps)

    return ok(message="Plan created from goal.", steps=clean_steps)


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ TOOLS: I/O (FILES)                                                           ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
def read_txt_file(ctx, file_name):
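    folder = ctx.config.get("input_folder")
    if not folder:
        # abspath("") resolves to the current directory, so check before resolving
        return err("No input_folder in config.", hint="Set ctx.config['input_folder']")
    base = os.path.abspath(folder)
    path = os.path.abspath(os.path.join(base, file_name))
    if not path.startswith(base + os.sep):
        return err("Path traversal blocked.", retryable=False)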

    if not os.path.exists(path):
        return err(f"File not found: {path}",
                   hint="Call list_txt_files to see available files",
                   retryable=True)

    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    ctx.memory.set("file_name", file_name)
    ctx.memory.set("raw_text", text)
    return ok(message="File read successfully.", length=len(text))


# ── Helper: list available .txt files (for JIT guidance) ───────────────────────
def list_txt_files(ctx):
    base = ctx.config.get("input_folder")
    if not base:
        return err("No input_folder in config.", hint="Set ctx.config['input_folder']")
    if not os.path.isdir(base):
        return err(f"Input folder not found: {base}", retryable=False)

    files = sorted(f for f in os.listdir(base) if f.endswith(".txt"))
    ctx.memory.set("available_txt_files", files)  # optional: stash for UI/agent
    return ok(message=f"Found {len(files)} .txt files.", files=files, count=len(files))


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ TOOLS: SUMMARIZATION                                                         ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
def generate_summary_prompt(ctx, max_len=None):
    text = ctx.memory.get("raw_text")
    if not text:
        return err("No raw text found in memory.",
                   hint="Run read_txt_file before generate_summary_prompt")
    if max_len is None:
        max_len = ctx.config.get("summary_max_chars", 2000)

    truncated = len(text) > max_len
    short_text = text[:max_len]

    ctx.memory.set("was_truncated", truncated)
    ctx.memory.set("source_length", len(text))
    ctx.memory.set("used_length", len(short_text))
    ctx.memory.set("summary_prompt", f"""You are an expert technical writer.

Summarize the following content into a set of clear, concise bullet points...
\"\"\"{short_text}\"\"\"

Summary:""")

    return ok(message="Summary prompt created.",
              truncated=truncated, used=len(short_text), total=len(text),
              prompt_preview=ctx.memory.get("summary_prompt")[:600])


def summarize(ctx):
    prompt = ctx.memory.get("summary_prompt")
    if not prompt:
        return err("No summary prompt found in memory.",
                   hint="Run generate_summary_prompt before summarize")
    response = ctx.llm.complete(prompt)
    ctx.memory.set("summary", response)
    return ok(message="Summary completed.", summary_preview=response[:1000])


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ TOOLS: OUTPUT                                                                ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
def save_summary(ctx, out_name=None, _fs=os):
    summary = ctx.memory.get("summary")
    if not summary:
        return err("No summary in memory.",
                   hint="Run summarize before save_summary")
    out_dir = ctx.config.get("output_folder")
    if not out_dir:
        return err("No output_folder in config.",
                   hint="Set ctx.config['output_folder']")

    _fs.makedirs(out_dir, exist_ok=True)
    src = ctx.memory.get("file_name", "summary")
    root, _ = os.path.splitext(os.path.basename(src))
    base = out_name or f"{root}_summary.txt"
    path = _fs.path.join(out_dir, base)

    with _fs.open(path, "w", encoding="utf-8") as f:
        f.write(summary)

    ctx.memory.set("summary_path", path)
    return ok(message="Summary saved.", path=path)

# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ TOOL REGISTRY — TYPES & REGISTRATION                                         ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolDef:
    name: str
    func: Callable
    description: str = ""
    schema: dict | None = None
    returns: dict | None = None   # optional metadata about outputs

class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, tool: ToolDef):
        self._tools[tool.name] = tool

    def get(self, name: str) -> ToolDef:
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name]

    def list(self):
        return list(self._tools.keys())

# -- Build registry -------------------------------------------------------------
registry = ToolRegistry()

registry.register(ToolDef(
    "create_plan",
    create_plan,
    "Create a plan from goal",
    schema={ "type": "object", "properties": {}, "required": [] },   # no kwargs
    returns={
        "type": "object",
        "properties": {
            "message": { "type": "string" },
            "steps":   { "type": "array", "items": { "type": "string" } }
        },
        "required": ["message", "steps"]
    }
))

registry.register(ToolDef(
  "read_txt_file", read_txt_file, "Read a .txt file from input_folder",
  schema={
    "type": "object",
    "properties": {"file_name": {"type": "string"}},
    "required": ["file_name"]
  },
  returns={
    "type": "object",
    "properties": {"message": {"type": "string"}, "length": {"type": "integer"}},
    "required": ["message"]
  },
))

registry.register(ToolDef(
    "list_txt_files",
    list_txt_files,
    "List .txt files in input_folder",
    schema={ "type": "object", "properties": {}, "required": [] },
    returns={
        "type": "object",
        "properties": {
            "ok":    { "type": "boolean" },
            "message": { "type": "string" },
            "files": { "type": "array", "items": { "type": "string" } },
            "count": { "type": "integer" }
        },
        "required": ["ok", "files"]
    }
))

registry.register(ToolDef(
  "generate_summary_prompt", generate_summary_prompt, "Build a summarization prompt",
  schema={
    "type": "object",
    "properties": {"max_len": {"type": "integer", "minimum": 1}},
    "required": []
  }
))

registry.register(ToolDef(
  "summarize", summarize, "Run LLM summarization",
  schema={"type": "object", "properties": {}, "required": []}
))

registry.register(ToolDef(
  "save_summary", save_summary, "Persist summary to output_folder",
  schema={
    "type": "object",
    "properties": {"out_name": {"type": "string"}},
    "required": []
  },
))


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ ENVIRONMENT — VALIDATION & EXECUTION                                         ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
import inspect

def _validate(schema, kwargs):
    """Minimal JSON-schema-ish validator for tool kwargs."""
    if not schema:
        return None
    missing = [k for k in schema.get("required", []) if k not in kwargs]
    if missing:
        return f"Missing required: {missing}"
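    types = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    for k, spec in (schema.get("properties") or {}).items():
        if k in kwargs and "type" in spec:
            py_t = types.get(spec["type"])
            # bool subclasses int, so reject True/False for non-boolean types explicitly
            bool_mismatch = isinstance(kwargs[k], bool) and spec["type"] != "boolean"
            if py_t and (bool_mismatch or not isinstance(kwargs[k], py_t)):
                return f"Bad type for '{k}': expected {spec['type']}"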
    return None
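
# Illustrative aside (not part of the pipeline): in Python, bool is a subclass
# of int, so a bare isinstance(value, int) check accepts True/False for an
# "integer" schema field. A stricter check might look like the sketch below.
# The name _demo_is_strict_int is hypothetical and exists only for this example.
def _demo_is_strict_int(value):
    """True only for real integers, never for True/False."""
    return isinstance(value, int) and not isinstance(value, bool)

# Plain isinstance accepts True; the strict helper rejects it but accepts 3.
_demo_checks = [isinstance(True, int), _demo_is_strict_int(True), _demo_is_strict_int(3)]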

class Environment:
    """Runs tools by name with auto-DI, validation, and centralized logging."""
    def __init__(self, ctx: ActionContext, registry: ToolRegistry):
        self.ctx = ctx
        self.registry = registry

    def run(self, tool_name: str, **kwargs):
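        # 0) Resolve the tool; an unknown name becomes an err(...) envelope
        try:
            tool = self.registry.get(tool_name)
        except KeyError:
            tool = None
        if tool is None:
            msg = f"Unknown tool '{tool_name}'"
            self.ctx.track_progress(tool_name, "error", note=msg[:180])
            return err(msg)
        fn = tool.func
        sig = inspect.signature(fn)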

        # 1) Schema validation BEFORE logging/exec
        v_err = _validate(tool.schema, kwargs)
        if v_err:
            self.ctx.track_progress(tool.name, "error", note=v_err[:180])
            return err(v_err)  # standardized envelope

        # 2) Build call args with auto-DI (ctx + underscore deps)
        call_args = {}
        for pname, param in sig.parameters.items():
            if pname == "ctx":
                call_args["ctx"] = self.ctx
            elif pname.startswith("_"):   # underscore dep, e.g. _fs, _clock
                dname = pname[1:]
                if dname not in self.ctx.deps:
                    msg = f"Missing dep '{dname}' for tool '{tool_name}'"
                    self.ctx.track_progress(tool.name, "error", note=msg[:180])
                    return err(msg)
                call_args[pname] = self.ctx.deps[dname]
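            elif param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
                continue  # *args / **kwargs are never required
            elif pname in kwargs:
                call_args[pname] = kwargs[pname]
            elif param.default is inspect.Parameter.empty:
                # No value supplied and no default to fall back on
                msg = f"Missing required arg '{pname}' for tool '{tool_name}'"
                self.ctx.track_progress(tool.name, "error", note=msg[:180])
                return err(msg)
            # Otherwise the arg is optional and omitted; the function default applies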

        # 3) Log start, call tool
        self.ctx.track_progress(tool.name, "started", note=str(kwargs)[:180])
        try:
            result = fn(**call_args)
        except Exception as e:
            # Normalize exceptions into err(...) so the agent can handle them
            msg = f"{type(e).__name__}: {e}"
            self.ctx.track_progress(tool.name, "error", note=msg[:180])
            return err(msg)

        # 4) Normalize + log outcome
        if isinstance(result, dict):
            # If tool used envelope:
            if result.get("ok") is False:
                self.ctx.track_progress(tool.name, "error", note=str(result.get("error", ""))[:180])
                return result
            # Back-compat: dict returned with "error" but no "ok"
            if "ok" not in result and "error" in result:
                self.ctx.track_progress(tool.name, "error", note=str(result["error"])[:180])
                return {"ok": False, **result}
            # Success path: ensure ok=True for consistency
            result = result if "ok" in result else {"ok": True, **result}
            # "message" may be missing, None, or a non-string; coerce before slicing
            note = str(result.get("message") or "")[:120]
            self.ctx.track_progress(tool.name, "completed", note=note)
            return result

        # Non-dict success (rare): mark completed with empty note
        self.ctx.track_progress(tool.name, "completed", note="")
        return result
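

# Illustrative sketch (not part of the pipeline): how the auto-DI step in
# Environment.run maps a function signature to call arguments. "ctx" receives
# the context, "_name" parameters are looked up in a deps dict, and everything
# else comes from caller kwargs or falls back to the function default.
# All _demo_* names are hypothetical and exist only for this example.
import inspect

def _demo_resolve_args(fn, ctx, deps, kwargs):
    """Return the kwargs that would be passed to fn; raise KeyError if one is missing."""
    resolved = {}
    for pname, param in inspect.signature(fn).parameters.items():
        if pname == "ctx":
            resolved[pname] = ctx
        elif pname.startswith("_"):
            resolved[pname] = deps[pname[1:]]  # "_fs" -> deps["fs"]
        elif pname in kwargs:
            resolved[pname] = kwargs[pname]
        elif param.default is inspect.Parameter.empty:
            raise KeyError(f"missing required arg '{pname}'")
    return resolved

def _demo_tool(ctx, _fs, file_name, encoding="utf-8"):
    """Stand-in tool: echoes what it was given."""
    return f"{_fs}:{file_name}:{encoding}"

_demo_args = _demo_resolve_args(
    _demo_tool, ctx="demo-ctx", deps={"fs": "demo-fs"}, kwargs={"file_name": "a.txt"}
)
# "encoding" is left out of _demo_args, so the function default is used.
_demo_result = _demo_tool(**_demo_args)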


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ SCRIPTED AGENT — FIXED PIPELINE RUNNER                                       ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
class ScriptedAgent:
    """Executes a predetermined sequence of (tool_name, kwargs) steps."""
    def __init__(self, env, steps):
        self.env = env
        self.steps = steps

    def run(self, max_calls=None, stop_on_error=True):
        calls = 0
        for name, kwargs in self.steps:
            if max_calls is not None and calls >= max_calls:
                return {"final": f"stopped: max_calls={max_calls}"}
            res = self.env.run(name, **(kwargs or {}))
            calls += 1
            if stop_on_error and isinstance(res, dict) and res.get("ok") is False:
                # Surface the tool's hint (if any) so the caller knows the next step
                out = {"final": f"stopped at {name}: {res.get('error', 'unknown error')}"}
                if "hint" in res:
                    out["hint"] = res["hint"]
                return out
        return {"final": "done"}


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ SETUP & CONFIG                                                               ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
memory = ScratchMemory()
memory.set("goal", "Summarize the content of a text file.")

config = {
    "input_folder": "/content/files",
    "output_folder": "/content/output",
    # "summary_max_chars": 2400,  # optional
}

llm = OpenAILLM(
    client,
    model=config.get("model", "gpt-4o-mini"),
    temperature=config.get("temperature", 0.2),
)


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ CONTEXT & ENVIRONMENT                                                        ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
# Create context with DI bag pre-populated (fs adapter)
ctx = ActionContext(memory=memory, llm=llm, config=config, deps={"fs": RealFS})

# Ensure folders exist (lightweight guardrails)
os.makedirs(ctx.config["input_folder"], exist_ok=True)
os.makedirs(ctx.config["output_folder"], exist_ok=True)
ctx.track_progress("setup", "completed", "goal + config injected")

# Build environment (validation + auto-DI + centralized logging)
env = Environment(ctx, registry)


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ AGENT STEPS (SCRIPTED PIPELINE)                                              ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
file_name = "004_AGENT_Tools.txt"
steps = [
    ("create_plan", {}),
    ("read_txt_file", {"file_name": file_name}),
    ("generate_summary_prompt", {}),  # or {"max_len": 2400}
    ("summarize", {}),
    ("save_summary", {}),
]


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ RUN AGENT                                                                    ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
agent = ScriptedAgent(env, steps)
final = agent.run(max_calls=10)  # optional guard
print("Agent result:", final["final"])
if "hint" in final:
    print("💡 Hint:", final["hint"])


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ PRETTY PRINTS (FROM MEMORY)                                                  ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
plan = ctx.memory.get("plan") or []
print("\nPlan:")
for s in plan:
    print("-", s)

raw_text = ctx.memory.get("raw_text") or ""
print("\n📄 File Preview:\n")
print(textwrap.fill(raw_text[:600], width=80, subsequent_indent="  "))

prompt = ctx.memory.get("summary_prompt") or ""
print("\n🧾 Prompt Preview:\n")
print(textwrap.fill(prompt[:600], width=80, subsequent_indent="  "))

summary = ctx.memory.get("summary") or ""
print("\n📝 Summary Preview:\n")
print(textwrap.fill(summary[:1000], width=80, subsequent_indent="  "))

if ctx.memory.get("summary_path"):
    print("\n📄 Saved to:", ctx.memory.get("summary_path"))


# ╔══════════════════════════════════════════════════════════════════════════════╗
# ║ CONTEXT SNAPSHOT / PROGRESS LOG                                              ║
# ╚══════════════════════════════════════════════════════════════════════════════╝
print("\n" + "="*80)
print("📦 ActionContext Snapshot")
ctx.print_progress()
