<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/090_Tools.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



## 1. **Tool Definition for Agents**

* Just giving an LLM a function name isn’t enough — you must **explicitly describe**:

  * **What the tool does** (description)
  * **What parameters it takes**
  * **What type each parameter is**
* Without this, the LLM may not know:

  * Which tool to choose
  * What arguments to pass
  * How to format them correctly
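
One way to avoid writing the description, parameter names, and types by hand is to derive them from a function's signature and docstring. The sketch below is only an assumption about how that could look; this notebook writes its schemas manually, and `describe_tool` and `_JSON_TYPES` are hypothetical names.

```python
import inspect
from typing import get_type_hints

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def describe_tool(func):
    """Build a tool description dict from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Fall back to "string" when a parameter has no recognized annotation.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "tool_name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def read_file(file_path: str, max_chars: int = 3000) -> str:
    """Reads the content of a specified file."""
    with open(file_path, encoding="utf-8") as f:
        return f.read(max_chars)

spec = describe_tool(read_file)
print(spec["parameters"]["required"])  # ['file_path']
```

The trade-off is that the docstring and annotations become part of the prompt, so they need the same care as a hand-written description.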

---

## 2. **Structured Metadata**

* We move from “here’s a Python function” → “here’s a **machine-readable description** of the function.”
* The example here uses **JSON Schema** — a standard for describing data shapes.

**Why?**

* The LLM can read the description and see how to call the tool: which name to use and which arguments to pass.
* The execution environment can **validate inputs** before running them.
* Clear separation between **what the LLM outputs** and **how the tool runs**.

---

## 3. **JSON Schema Example**

**Read File Tool Schema**

```json
{
  "tool_name": "read_file",
  "description": "Reads the content of a specified file.",
  "parameters": {
    "type": "object",
    "properties": {
      "file_path": { "type": "string" }
    },
    "required": ["file_path"]
  }
}
```

**Key points:**

* `"type": "object"` → arguments are passed in a dictionary.
* `"required"` → must provide `file_path`.
* Allows *automated validation*.

---

## 4. **Why Wrap Parameters in an Object**

* For each action, the LLM should output something like:

```json
{
  "tool_name": "read_file",
  "args": {
    "file_path": "src/file.py"
  }
}
```

* This consistent `{"tool_name": ..., "args": {...}}` format matches what your `parse_action` expects.
* JSON Schema just describes the **shape** of `args`.

---

## 5. **Practical Impact**

* This is laying the groundwork for:

  1. Giving your agent **more tools** safely.
  2. Letting the agent **pick the right one** based on a description.
  3. Validating inputs before execution so you don’t run bad commands.




In [2]:
# --- Lecture 4: Agent Tools with JSON Schema (minimal style) ---
!pip -q install openai python-dotenv

import os, json, re
from pathlib import Path
from dotenv import load_dotenv
from openai import OpenAI

# ---- Setup ----
load_dotenv('/content/API_KEYS.env')
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY not found in /content/API_KEYS.env")
client = OpenAI(api_key=api_key)

FILES_DIR = Path("/content/files")
DOCS_DIR = Path("/content/docs")
DOCS_DIR.mkdir(parents=True, exist_ok=True)

# ---- Tool registry: metadata + simple JSON-Schema-like validation ----
# We keep the schema very small (type: string, required list) for clarity.
TOOLS = {
    "list_files": {
        "description": "Returns a list of file names in /content/files.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    "read_file": {
        "description": "Reads the content of a specified file.",
        "parameters": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string"}
            },
            "required": ["file_path"]
        }
    },
    "write_doc_file": {
        "description": "Writes a documentation file to /content/docs.",
        "parameters": {
            "type": "object",
            "properties": {
                "file_name": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["file_name", "content"]
        }
    },
    "terminate": {
        "description": "Stop the agent with a final message.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string"}
            },
            "required": ["message"]
        }
    },
    "error": {
        "description": "Signal a formatting or usage error.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string"}
            },
            "required": ["message"]
        }
    }
}

# Minimal validator for our tiny schema subset

def validate_args(schema, args):
    if schema.get("type") != "object":
        return False, "Schema type must be 'object'."
    # The model may return null, a list, or a string for args; only a dict is valid
    if not isinstance(args, dict):
        return False, "args must be a JSON object."
    props = schema.get("properties", {})
    required = schema.get("required", [])
    # Check required keys
    for k in required:
        if k not in args:
            return False, f"Missing required parameter: {k}"
    # Reject parameters the schema does not declare, then check string types (keep minimal)
    for k, v in args.items():
        if k not in props:
            return False, f"Unknown parameter: {k}"
        if props[k].get("type") == "string" and not isinstance(v, str):
            return False, f"Parameter '{k}' must be a string."
    return True, "ok"

# ---- Tools implementation ----

def tool_list_files():
    try:
        return sorted([p.name for p in FILES_DIR.iterdir() if p.is_file()])
    except FileNotFoundError:
        return [f"Error: Folder not found: {FILES_DIR}"]


def _resolve_file_path(file_path: str) -> Path:
    # Relative paths (including bare file names) are resolved under /content/files.
    # After resolving, anything that ends up outside /content is rejected.
    p = Path(file_path)
    if not p.is_absolute():
        p = FILES_DIR / p
    p = p.resolve()
    # basic traversal guard
    if Path("/content") not in p.parents and p != Path("/content"):
        raise ValueError("Path is outside /content")
    return p


def tool_read_file(file_path: str, max_chars: int = 3000):
    try:
        target = _resolve_file_path(file_path)
    except Exception as e:
        return f"Error: {e}"
    if not target.exists() or not target.is_file():
        return f"Error: File not found: {file_path}"
    try:
        with open(target, "r", encoding="utf-8", errors="replace") as f:
            content = f.read(max_chars + 1)
        if len(content) > max_chars:
            content = content[:max_chars] + "\n... [truncated]"
        return content
    except Exception as e:
        return f"Error reading {file_path}: {e}"


def tool_write_doc_file(file_name: str, content: str):
    try:
        # prevent sneaky paths; only names, no separators
        if any(sep in file_name for sep in ("/", "\\")):
            return "Error: file_name must be a simple name (no paths)."
        target = DOCS_DIR / file_name
        with open(target, "w", encoding="utf-8") as f:
            f.write(content)
        return f"Wrote {target}"
    except Exception as e:
        return f"Error writing doc file: {e}"

# ---- Parsing model output ----

def extract_markdown_block(text, block_name):
    m = re.search(rf"```{block_name}\s*(.*?)\s*```", text, flags=re.DOTALL | re.IGNORECASE)
    return m.group(1) if m else None


def parse_action(response_text):
    action_str = extract_markdown_block(response_text, "action")
    if not action_str:
        return {"tool_name": "error", "args": {"message": "No action block found."}}
    try:
        data = json.loads(action_str)
    except json.JSONDecodeError:
        return {"tool_name": "error", "args": {"message": "Invalid JSON."}}
    if not (isinstance(data, dict) and "tool_name" in data and "args" in data):
        return {"tool_name": "error", "args": {"message": "Missing tool_name or args."}}
    return data

# ---- Execute with validation against schema ----

def execute_action(action):
    tool = (action or {}).get("tool_name")
    args = (action or {}).get("args", {})

    if tool not in TOOLS:
        return {"error": f"Unknown tool: {tool}"}

    schema = TOOLS[tool]["parameters"]
    ok, msg = validate_args(schema, args)
    if not ok:
        return {"error": f"Invalid args: {msg}"}

    # Dispatch
    if tool == "list_files":
        return {"result": tool_list_files()}
    elif tool == "read_file":
        return {"result": tool_read_file(args["file_path"]) }
    elif tool == "write_doc_file":
        return {"result": tool_write_doc_file(args["file_name"], args["content"]) }
    elif tool == "terminate":
        return {"result": args.get("message", "Terminated")}
    elif tool == "error":
        return {"error": args.get("message", "Unknown error")}

# ---- System prompt including tool schemas ----
SCHEMAS_TEXT = json.dumps([
    {
        "tool_name": name,
        "description": meta["description"],
        "parameters": meta["parameters"],
    } for name, meta in TOOLS.items()
], indent=2)

SYSTEM_TOOLS = f"""
You are an AI file/documentation agent. You can choose ONE tool per turn.
Return ONLY a single JSON object inside a markdown block like:

```action
{{"tool_name": "list_files", "args": {{}}}}
```

Here are the available tools and their JSON Schemas:

{SCHEMAS_TEXT}

Rules:
1) Choose the tool that best progresses the user's request.
2) Format exactly as shown; no prose outside the ```action block.
3) If you finish, return a terminate action with a clear message.
""".strip()

# ---- Agent loop (single or multi-step) ----

def run_agent(task: str, max_iters: int = 5, memory_size=None):
    messages = [
        {"role": "system", "content": SYSTEM_TOOLS},
        {"role": "user", "content": task},
    ]

    for i in range(1, max_iters + 1):
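        # Always keep the system prompt (tool schemas); memory_size only limits what follows it
        visible = messages if memory_size is None else messages[:1] + messages[1:][-memory_size:]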
        resp = client.chat.completions.create(model="gpt-4o-mini", messages=visible)
        reply = resp.choices[0].message.content
        print(f"\n=== Iteration {i}: Raw model output ===\n{reply}")

        action = parse_action(reply)
        print("Parsed action:", action)

        result = execute_action(action)
        print("Execution result:", (str(result)[:400] + ("..." if len(str(result))>400 else "")))

        # feedback memory
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": json.dumps(result)})

        if action.get("tool_name") == "terminate":
            print("\nAgent terminated:", result.get("result"))
            return {"messages": messages, "final": result}

    print("\nMax iterations reached. Terminating.")
    return {"messages": messages, "final": {"error": "max_iters_reached"}}


In [4]:
out = run_agent("Read '000_Prompting for Agents -GAIL.txt' and then terminate with a one-sentence takeaway.")


=== Iteration 1: Raw model output ===
```action
{"tool_name": "read_file", "args":{"file_path":"/content/files/000_Prompting for Agents -GAIL.txt"}}
```
Parsed action: {'tool_name': 'read_file', 'args': {'file_path': '/content/files/000_Prompting for Agents -GAIL.txt'}}
Execution result: {'result': 'Error: File not found: /content/files/000_Prompting for Agents -GAIL.txt'}

=== Iteration 2: Raw model output ===
```action
{"tool_name": "list_files", "args": {}}
```
Parsed action: {'tool_name': 'list_files', 'args': {}}
Execution result: {'result': ['002_Execute_the_Action.txt', '003_gent Feedback and Memory.txt', '004_AGENT_Tools.txt']}

=== Iteration 3: Raw model output ===
```action
{"tool_name": "read_file", "args":{"file_path":"/content/files/002_Execute_the_Action.txt"}}
```
Parsed action: {'tool_name': 'read_file', 'args': {'file_path': '/content/files/002_Execute_the_Action.txt'}}

=== Iteration 4: Raw model output ===
```action
{"tool_name": "terminate", "args":{"message":"The 

In [3]:
out = run_agent("List files, then read '001_PArse_the Response.txt', then write a short doc file named 'parse_notes.md' summarizing it, and terminate.")


=== Iteration 1: Raw model output ===
```action
{"tool_name": "list_files", "args": {}}
```
Parsed action: {'tool_name': 'list_files', 'args': {}}
Execution result: {'result': ['002_Execute_the_Action.txt', '003_gent Feedback and Memory.txt', '004_AGENT_Tools.txt']}

=== Iteration 2: Raw model output ===
```action
{"tool_name": "error", "args": {"message":"The file '001_PArse_the Response.txt' does not exist."}}
```
Parsed action: {'tool_name': 'error', 'args': {'message': "The file '001_PArse_the Response.txt' does not exist."}}
Execution result: {'error': "The file '001_PArse_the Response.txt' does not exist."}

=== Iteration 3: Raw model output ===
```action
{"tool_name": "terminate", "args": {"message": "The requested file '001_PArse_the Response.txt' was not found, ending the operation."}}
```
Parsed action: {'tool_name': 'terminate', 'args': {'message': "The requested file '001_PArse_the Response.txt' was not found, ending the operation."}}
Execution result: {'result': "The requ

## Tool Schema

Putting every tool into the **same JSON Schema shape** makes it much easier for the model to pick a tool and format `args` correctly because:

* **Consistency = learnability.** The model sees the same fields (`tool_name`, `description`, `parameters.type`, `parameters.properties`, `parameters.required`) for every tool, so it learns *where* to look for what.
* **Reduced ambiguity.** It doesn’t have to infer parameter names or types from prose; they’re always in the same slots.
* **Better few-shot patterning.** The schema acts like repeated examples that bias the model to return `{"tool_name": "...", "args": {...}}` with the right keys.

A couple of practical tips to make this work even better:

* **Keep keys identical across tools.** Always use `"parameters": {"type":"object","properties":{...},"required":[...]}`—don’t rename these fields.
* **Be explicit with types + required.** List *all* required parameters; mark string vs. number, and include constraints (e.g., enums) when helpful.
* **Add short examples.** After each schema, include a 1-line example call in the system prompt inside an \`\`\`action block. This often improves formatting accuracy.
* **Name tools clearly.** Verb-first, specific names (`read_file`, `write_doc_file`) beat vague ones (`process`, `run`).
* **Validate on your side.** Even with schemas, keep your small validator: if the model omits a required arg or passes the wrong type, `execute_action` returns an error result that the loop feeds back to the model.
* **Mind verbosity.** Schemas add tokens. Keep descriptions concise and only include properties that matter.
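
The tip about explicit types and constraints can be sketched as a slightly richer validator. This is a minimal illustration, assuming you want to support a few JSON Schema types plus `enum`; `validate_args_typed`, `_TYPE_MAP`, and `search_schema` are hypothetical names and are not part of this notebook's code.

```python
# Assumed mapping from JSON Schema type names to Python types.
_TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}

def validate_args_typed(schema, args):
    """Check required keys, unknown keys, basic types, and enum constraints."""
    if not isinstance(args, dict):
        return False, "args must be an object."
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            return False, f"Missing required parameter: {key}"
    for key, value in args.items():
        if key not in props:
            return False, f"Unknown parameter: {key}"
        spec = props[key]
        expected = _TYPE_MAP.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        bool_as_number = isinstance(value, bool) and spec.get("type") in ("integer", "number")
        if expected and (not isinstance(value, expected) or bool_as_number):
            return False, f"Parameter '{key}' must be of type {spec['type']}."
        if "enum" in spec and value not in spec["enum"]:
            return False, f"Parameter '{key}' must be one of {spec['enum']}."
    return True, "ok"

# Hypothetical tool schema, not one of the notebook's tools.
search_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer"},
        "order": {"type": "string", "enum": ["asc", "desc"]},
    },
    "required": ["query"],
}

print(validate_args_typed(search_schema, {"query": "agents", "limit": 5}))  # (True, 'ok')
print(validate_args_typed(search_schema, {"query": "agents", "order": "up"}))  # rejected: not in enum
```

For anything beyond this small subset, a full JSON Schema validator library is likely a better choice than extending the function by hand.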

Common pitfalls to avoid:

* **Inconsistent shapes** (e.g., some tools use `input` instead of `args`, or omit `required`).
* **Overly long descriptions** that bury the important bits (parameter names/types) below the fold.
* **Case/format drift** (model returns `Args` or `toolName`). Your parse/feedback loop helps correct this.
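
Besides feeding the error back, you could also repair key drift before validation. The sketch below is one possible approach under assumed aliases; `normalize_action` and `_KEY_ALIASES` are hypothetical names, and the alias list should reflect variants you actually observe.

```python
# Assumed aliases, keyed by the lower-cased name with "_" and "-" removed.
_KEY_ALIASES = {
    "toolname": "tool_name",
    "tool": "tool_name",
    "args": "args",
    "arguments": "args",
    "input": "args",
}

def normalize_action(data):
    """Map common key variants (Args, toolName, ...) onto tool_name/args."""
    if not isinstance(data, dict):
        return None
    normalized = {}
    for key, value in data.items():
        squashed = str(key).replace("_", "").replace("-", "").lower()
        canonical = _KEY_ALIASES.get(squashed)
        # Keep the first match for each canonical key; ignore unrelated keys.
        if canonical and canonical not in normalized:
            normalized[canonical] = value
    if "tool_name" not in normalized:
        return None
    normalized.setdefault("args", {})
    return normalized

print(normalize_action({"toolName": "read_file", "Args": {"file_path": "notes.txt"}}))
# {'tool_name': 'read_file', 'args': {'file_path': 'notes.txt'}}
```

Silent repair saves an iteration, but it also hides the drift from the model, so it may keep producing the wrong keys. Feeding back an error, as the notebook does, is the stricter option.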

In short, a uniform schema like the `list_files` example gives the model a stable “map” for finding **parameters, properties, types, and required fields** every time, and it lets your executor validate inputs before running anything.




The tool definitions are embedded in the **system prompt**, as this notebook does:

```python
SYSTEM_TOOLS = f"""
You are an AI file/documentation agent...
Here are the available tools and their JSON Schemas:

{SCHEMAS_TEXT}
"""
```

`{SCHEMAS_TEXT}` expands into something like:

```json
[
  {
    "tool_name": "list_files",
    "description": "Returns a list of file names in /content/files.",
    "parameters": {
      "type": "object",
      "properties": {},
      "required": []
    }
  },
  {
    "tool_name": "read_file",
    "description": "Reads the content of a specified file.",
    "parameters": {
      "type": "object",
      "properties": {
        "file_path": {"type": "string"}
      },
      "required": ["file_path"]
    }
  },
  ...
]
```

That means at **every single step** in the loop, the model has:

* **Tool name** (`"list_files"`)
* **Purpose** (`"Returns a list of file names in /content/files."`)
* **Parameter structure** (object with `properties` and `required` lists)
* **Parameter types** (all `string` in this notebook's tools)

So when it decides *“I want to list files”*, it’s not guessing at the syntax — it just copies the `tool_name` and formats `args` according to `parameters`.

---

📌 **Why this matters for you**

* The model’s “search” for a tool is basically scanning that structured block in the system prompt.
* Keeping all tools in **one consistent JSON Schema format** means the model doesn’t have to relearn the layout for each tool.
* If you later add a new tool, as long as it follows the same schema shape, the model can use it without retraining — just from reading the new entry in the prompt.

---

## Cost & Token Usage

Every time we pass this tool schema block to the model:

* It **consumes tokens** (so costs money and shortens available context).
* It **competes with the user’s actual task** for the model’s attention.

**Why this matters:**

* The longer the schema block, the more of each request is spent on tool definitions rather than on the current task and its history.
* If the schema is very verbose, the model may “drift” toward the tool descriptions and lose track of the main objective.
* In multi-step agents, this cost is repeated *every single step*, because the system prompt (schemas included) is resent with each call.
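
To get a feel for the repeated cost, you can estimate the size of the schema block before sending it. The sketch below uses a rough 4-characters-per-token rule of thumb, which is an assumption and not a real tokenizer; `estimate_tokens` and `CHARS_PER_TOKEN` are hypothetical names.

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text; an assumption, not a tokenizer

def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

# Two schemas in the same shape as the notebook's TOOLS entries.
tools = [
    {
        "tool_name": "list_files",
        "description": "Returns a list of file names in /content/files.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
    {
        "tool_name": "read_file",
        "description": "Reads the content of a specified file.",
        "parameters": {
            "type": "object",
            "properties": {"file_path": {"type": "string"}},
            "required": ["file_path"],
        },
    },
]

schemas_text = json.dumps(tools, indent=2)               # pretty-printed, as in the notebook
compact_text = json.dumps(tools, separators=(",", ":"))  # same content without whitespace

per_call = estimate_tokens(schemas_text)
max_iters = 5
print(f"pretty-printed: ~{per_call} tokens per call, ~{per_call * max_iters} over {max_iters} steps")
print(f"compact:        ~{estimate_tokens(compact_text)} tokens per call")
```

For exact numbers, the `usage` field on the chat completion response reports the prompt tokens actually billed for each call.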




**1. `TOOLS` dictionary**

* Acts as a **menu / reference manual** for the LLM.
* Lists:

  * **Tool name** (what the LLM should output in `tool_name`)
  * **Description** (so it knows what the tool does)
  * **Parameters** & **required fields** (so it formats `args` correctly)
* Is *purely descriptive* — it does **not** execute anything.

---

**2. Actual tool functions**

* Real Python functions that do the work (`tool_list_files`, `tool_read_file`, etc.).
* You, the developer, write these functions and make sure they match the expected parameters from the `TOOLS` dictionary.
* The LLM never runs the functions itself — your agent code does that after parsing and validation.

---

**3. Relationship between them**

* `TOOLS` is like the **API spec**.
* Functions are the **API implementation**.
* The agent loop works like this:

  1. Send the `TOOLS` spec to the LLM.
  2. LLM chooses a tool + fills `args`.
  3. Your parser extracts that choice.
  4. Your executor **matches `tool_name` in `TOOLS`** to the correct Python function and calls it with the provided args.
  5. You feed the result back to the LLM as a new user message (so it can plan the next step).

---

**4. Updating `TOOLS`**

* `TOOLS` is not generated from your functions, so you must update it by hand **every time** you add a new tool.
* Each new tool needs:

  * A new entry in the `TOOLS` dict with `description` and `parameters`.
  * A matching new `tool_xxx` function.
  * A case for that new `tool_name` in your `execute_action` dispatcher.
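
If keeping three places in sync becomes tedious, one alternative is to register each function together with its schema and dispatch through that registry. This is a sketch of that idea, not the notebook's design; `TOOL_REGISTRY`, `register_tool`, `dispatch`, and the `shout` tool are all hypothetical.

```python
TOOL_REGISTRY = {}  # hypothetical: maps tool_name -> description, parameters, func

def register_tool(description, parameters):
    """Decorator that stores a function together with its schema."""
    def decorator(func):
        TOOL_REGISTRY[func.__name__] = {
            "description": description,
            "parameters": parameters,
            "func": func,
        }
        return func
    return decorator

@register_tool(
    description="Returns the input text in upper case.",
    parameters={
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
)
def shout(text):  # hypothetical tool, used only for illustration
    return text.upper()

def dispatch(action):
    """Look up the tool by name, check its args, and call it."""
    entry = TOOL_REGISTRY.get(action.get("tool_name"))
    if entry is None:
        return {"error": f"Unknown tool: {action.get('tool_name')}"}
    args = action.get("args")
    if not isinstance(args, dict):
        return {"error": "Invalid args: expected an object."}
    props = entry["parameters"].get("properties", {})
    missing = [k for k in entry["parameters"].get("required", []) if k not in args]
    unknown = [k for k in args if k not in props]
    if missing or unknown:
        return {"error": f"Invalid args: missing={missing}, unknown={unknown}"}
    return {"result": entry["func"](**args)}

print(dispatch({"tool_name": "shout", "args": {"text": "hello"}}))  # {'result': 'HELLO'}
```

With this layout, adding a tool means writing one decorated function; the prompt text can be generated from the registry and no `if/elif` chain needs editing.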



In [None]:
def extract_markdown_block(text, block_name):
    m = re.search(rf"```{block_name}\s*(.*?)\s*```", text, flags=re.DOTALL | re.IGNORECASE)
    return m.group(1) if m else None


def parse_action(response_text):
    action_str = extract_markdown_block(response_text, "action")
    if not action_str:
        return {"tool_name": "error", "args": {"message": "No action block found."}}
    try:
        data = json.loads(action_str)
    except json.JSONDecodeError:
        return {"tool_name": "error", "args": {"message": "Invalid JSON."}}
    if not (isinstance(data, dict) and "tool_name" in data and "args" in data):
        return {"tool_name": "error", "args": {"message": "Missing tool_name or args."}}
    return data

These functions aren’t in `TOOLS` because they are **internal agent logic**, not tools that the LLM can (or should) call directly.

---

* **Functions in `TOOLS`** = Externally visible **capabilities** you want the LLM to *choose* and execute, like `list_files` or `read_file`.
* **Functions like `parse_action` & `extract_markdown_block`** = Internal **plumbing** that your agent needs to *run itself*, but which are not part of the LLM’s “menu” of tools.

---

Think of it like this:

* `TOOLS` = The restaurant menu the customer (LLM) gets.
* `parse_action` = The chef in the kitchen reading the order ticket and figuring out what dish to cook.
* You wouldn’t put “read order ticket” on the menu for the customer, because that’s your own operational step.

---

**Why keep them separate:**

1. **Clarity** – The LLM’s prompt only lists things it can invoke.
2. **Safety** – Exposing internal helpers would hand the LLM control over the agent's own machinery (e.g., a callable `parse_action` would let the model influence how its own output is interpreted).
3. **Focus** – The LLM will do better when it only sees relevant tools for the task at hand.

---

So:

* `TOOLS` describes **functions meant for the model to call**.
* Internal functions like `parse_action` and `extract_markdown_block` are **backend infrastructure** for interpreting and executing those calls, so they don’t go in `TOOLS`.




In [None]:
# ---- Execute with validation against schema ----

def execute_action(action):
    tool = (action or {}).get("tool_name")
    args = (action or {}).get("args", {})

    if tool not in TOOLS:
        return {"error": f"Unknown tool: {tool}"}

    schema = TOOLS[tool]["parameters"]
    ok, msg = validate_args(schema, args)
    if not ok:
        return {"error": f"Invalid args: {msg}"}

    # Dispatch
    if tool == "list_files":
        return {"result": tool_list_files()}
    elif tool == "read_file":
        return {"result": tool_read_file(args["file_path"]) }
    elif tool == "write_doc_file":
        return {"result": tool_write_doc_file(args["file_name"], args["content"]) }
    elif tool == "terminate":
        return {"result": args.get("message", "Terminated")}
    elif tool == "error":
        return {"error": args.get("message", "Unknown error")}



### The LLM never executes your Python functions

It only **chooses** what to do next by outputting a structured *action*.
**Your code** decides *when* to run functions.

### Who does what (clear split of responsibilities)

* **LLM’s job:** Produce a JSON decision like:

  ```json
  {"tool_name": "read_file", "args": {"file_path": "001_PArse_the Response.txt"}}
  ```

  It does this because your **system prompt** lists the available tools and shows the required format.

* **Your agent’s job (the loop):**

  1. Send messages → get the LLM’s reply (text).
  2. `parse_action(...)` → turn that text into a Python dict.
  3. `execute_action(...)` → **your code** validates the args and runs the proper function.
  4. Append the **result** back into `messages` as a `user` turn (feedback).
  5. Repeat, until the action is `terminate` or you hit a max-iterations cap.

### Where your `execute_action` fits

`execute_action` is called **by you** right after parsing — not by the LLM.
Your loop is effectively:

```python
for _ in range(max_iters):
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    reply = resp.choices[0].message.content   # LLM chooses tool + args (in JSON)
    action = parse_action(reply)              # internal: parse/validate shape
    result = execute_action(action)           # internal: your code runs the Python function
    messages.append({"role": "assistant", "content": reply})          # what the LLM intended
    messages.append({"role": "user", "content": json.dumps(result)})  # what happened
    if action.get("tool_name") == "terminate":                        # stop once the model says it is done
        break
```

### Why the LLM “knows” which tool to choose

Because you **show it the tool catalog** (names, descriptions, parameter schemas) in the system prompt. The model *reads* that and outputs the corresponding `tool_name` + `args` — but the **actual execution** only happens when your Python calls `execute_action`.

### What `execute_action` does (in your snippet)

* Looks up the tool in `TOOLS`.
* Validates `args` against the schema (`validate_args`).
* **Dispatches** to the correct Python function:

  * `list_files` → `tool_list_files()`
  * `read_file` → `tool_read_file(file_path)`
  * `write_doc_file` → `tool_write_doc_file(file_name, content)`
  * `terminate` / `error` → return messages (no file I/O)

### Termination

The LLM signals it’s done by returning:

```json
{"tool_name": "terminate", "args": {"message": "…"}}
```

Your loop still passes this action through `execute_action`, which just returns the message, then sees `tool_name == "terminate"` and **stops**.





### What the LLM is looking at when it makes a decision

When you call the model in your loop, the `messages` you pass in typically include:

1. **System prompt**

   * Tool catalog (TOOLS JSON Schema)
   * Formatting instructions (`tool_name`, `args`, inside `action` block)

2. **User messages**

   * The original user query (`"Find a file…"`).
   * Any execution results you’ve fed back (e.g., `"result": ["file1.txt", "file2.txt"]`).

3. **Assistant messages**

   * The model’s previous tool calls.

Because your memory policy keeps appending these, the LLM has a running log of:

* What you (the human) wanted.
* What tools it chose before.
* What happened when those tools ran.

---

### How this affects tool choice

* **First step**: It reacts mostly to your initial request.
* **Subsequent steps**: It reacts to both your original goal *and* the execution results from earlier steps.
  Example:

  1. User: “Read the file with ‘Parse’ in the name and summarize it.”
  2. Model: Calls `list_files`.
  3. Execution: Returns list of files → fed back into messages.
  4. Model: Sees list in chat history, picks `read_file` with the right name.
  5. Execution: Reads file → fed back.
  6. Model: Uses file content to produce `terminate` with a summary.
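
To make the example concrete, here is roughly what `messages` could contain just before step 6. The file names and contents are hypothetical placeholders, and `as_action_block` is a helper invented for this illustration.

```python
import json

FENCE = "`" * 3  # built programmatically so the example is easy to embed in markdown

def as_action_block(tool_name, args):
    """Format an action the way the system prompt asks the model to."""
    payload = json.dumps({"tool_name": tool_name, "args": args})
    return f"{FENCE}action\n{payload}\n{FENCE}"

# Hypothetical file names and contents, for illustration only.
messages = [
    {"role": "system", "content": "<tool catalog + formatting rules>"},
    {"role": "user", "content": "Read the file with 'Parse' in the name and summarize it."},
    {"role": "assistant", "content": as_action_block("list_files", {})},
    {"role": "user", "content": json.dumps({"result": ["parse_notes.txt", "other.txt"]})},
    {"role": "assistant", "content": as_action_block("read_file", {"file_path": "parse_notes.txt"})},
    {"role": "user", "content": json.dumps({"result": "<file text>"})},
]

# Print a compact view of the running log the model sees on its next call.
for m in messages:
    print(f"{m['role']:>9}: {m['content'][:60]!r}")
```

Note that tool results arrive as `user` messages, so from the model's point of view the "conversation" alternates between its own actions and what happened when they ran.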

---

### The key point

The model **isn’t just reacting to new user input each time** — it’s planning its next move using:

* Your original instructions.
* The tools it knows are available.
* The conversation + tool execution results it’s seen so far (the “memory”).

That’s why *memory management* is so important — if you cut out too much history, the model may lose track of what it’s doing; if you keep *too much*, you waste tokens and might distract it with irrelevant past steps.
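
One simple memory policy is to always keep the system prompt and the original task, and trim only the steps in between. The sketch below assumes the message layout used in this notebook (system prompt first, original task second); `trim_history` is a hypothetical helper.

```python
def trim_history(messages, memory_size):
    """Keep the first two messages plus the last `memory_size` later messages."""
    if memory_size is None:
        return list(messages)
    head = messages[:2]  # assumed layout: [system prompt, original user task, ...]
    tail = messages[2:]
    # Guard against memory_size == 0, because tail[-0:] would return the whole tail.
    recent = tail[-memory_size:] if memory_size > 0 else []
    return head + recent

history = [
    {"role": "system", "content": "tool catalog"},
    {"role": "user", "content": "original task"},
] + [
    {"role": "assistant" if i % 2 == 0 else "user", "content": f"step {i}"}
    for i in range(6)
]

visible = trim_history(history, memory_size=2)
print([m["content"] for m in visible])
# ['tool catalog', 'original task', 'step 4', 'step 5']
```

Using an even `memory_size` keeps each assistant action paired with its execution result, so the model never sees a result without the call that produced it.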


