<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/041_ActionClass_ActionRegistry.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook shows how to build an `Action` and register it properly in the agent framework. Let’s walk through it step by step.

---

## 🧱 Step 1: Define the Tool Function

First, here's the plain Python function you'd normally write:

```python
import os

def list_python_files():
    """Lists all .py files in the src/ directory."""
    src_path = "src"
    try:
        return [f for f in os.listdir(src_path) if f.endswith(".py")]
    except FileNotFoundError:
        return {"error": f"The directory '{src_path}' was not found."}
```

This function is **simple** and **self-contained**: it works with or without the agent system. ✅

---

## 🧱 Step 2: Wrap it with an Action

Now we wrap this function using the `Action` class:

```python
from agent_framework.actions import Action  # or your local Action class

list_python_files_action = Action(
    name="list_python_files",
    function=list_python_files,
    description="Lists all Python files in the src/ directory.",
    terminal=False
)
```

✅ This makes the function compatible with agent-based decision-making. The name becomes part of the agent's vocabulary.

---

## 🧱 Step 3: Register It in the ActionRegistry

Next, we need to add the action to the agent's registry of available tools:

```python
from agent_framework.actions import ActionRegistry

action_registry = ActionRegistry()
action_registry.register(list_python_files_action)
```

Now the agent knows this tool exists and can choose to use it during its reasoning loop. 🧠
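If you don't have the framework at hand, the two classes are small enough to sketch yourself. The version below is a minimal stand-in, not the framework's actual implementation: the attribute names mirror the keyword arguments used above, and `get_action()` / `get_actions()` are the lookup methods this notebook calls later on. The `say_hello` tool is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Action:
    """Minimal stand-in: a function plus the metadata an agent needs."""
    name: str
    function: Callable[..., Any]
    description: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    terminal: bool = False


class ActionRegistry:
    """Minimal stand-in: stores actions by name."""

    def __init__(self) -> None:
        self._actions: Dict[str, Action] = {}

    def register(self, action: Action) -> None:
        self._actions[action.name] = action

    def get_action(self, name: str) -> Optional[Action]:
        # Returns None for unknown names so the caller can decide how to react.
        return self._actions.get(name)

    def get_actions(self) -> List[Action]:
        return list(self._actions.values())


registry = ActionRegistry()
registry.register(Action(
    name="say_hello",  # hypothetical tool, for illustration only
    function=lambda: "hello",
    description="Returns a greeting.",
))
print([a.name for a in registry.get_actions()])
```

Keying the registry by `name` means registering a second action with the same name silently replaces the first; whether that is desirable depends on your agent.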

---

## 🔍 Why This Is Better than Manual Routing

Here’s how this compares to the old method:

| Feature           | Old Script                           | With `Action` + Registry             |
| ----------------- | ------------------------------------ | ------------------------------------ |
| Tool access       | Manually routed with `tool_router()` | Auto-accessed via `ActionRegistry`   |
| Metadata handling | Separate dicts or manual comments    | Encapsulated in `Action`             |
| Discoverability   | Hard to loop or list tools           | Tools stored and introspected easily |
| Extendability     | Manual updates in multiple files     | Just call `register()` again         |

---

## ✅ Recap: What You Just Did

* Defined a reusable function
* Wrapped it with an `Action` object (gives it structure and metadata)
* Registered it with the agent's registry

This agent can now intelligently decide when to call `list_python_files()` based on its current `Goal` and `Memory`.

---

### ❓ But what if there are multiple paths?

If your project structure is more complex (e.g., `src/`, `tests/`, `utils/`) and you want the agent to explore **more than one directory**, a hardcoded path is no longer enough.

✅ In that case, modify the function to accept a `directory` parameter.

---

### ✅ New Version: Parameterized Listing

```python
def list_python_files(directory: str):
    try:
        return [f for f in os.listdir(directory) if f.endswith(".py")]
    except FileNotFoundError:
        return {"error": f"Directory '{directory}' not found."}
```

Then your `Action` would include the parameter:

```python
Action(
    name="list_python_files",
    function=list_python_files,
    description="Lists all Python files in a given directory.",
    parameters={
        "type": "object",
        "properties": {
            "directory": {"type": "string"}
        },
        "required": ["directory"]
    },
    terminal=False
)
```

---

### 🧠 Trade-Off: Simplicity vs Flexibility

| Simpler (No Params)         | Flexible (With Params)                       |
| --------------------------- | -------------------------------------------- |
| Easy to use                 | Supports more project structures             |
| Less chance of agent error  | Higher risk of wrong paths or invalid inputs |
| Good for constrained agents | Better for agents exploring unknown projects |
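
One way to limit the "wrong paths" risk of the flexible version is to check the requested directory against an allowlist before touching the filesystem. The sketch below is an illustration rather than part of the original function: `ALLOWED_DIRECTORIES` and `list_python_files_safe` are hypothetical names.

```python
import os

# Hypothetical allowlist; adjust to your project layout.
ALLOWED_DIRECTORIES = {"src", "tests", "utils"}


def list_python_files_safe(directory: str):
    """List .py files, but only inside explicitly allowed directories."""
    # normpath turns "src/" into "src" and collapses "src/../.." into "..".
    normalized = os.path.normpath(directory)
    if normalized not in ALLOWED_DIRECTORIES:
        return {"error": f"Directory '{directory}' is not allowed."}
    try:
        return sorted(f for f in os.listdir(normalized) if f.endswith(".py"))
    except FileNotFoundError:
        return {"error": f"Directory '{directory}' not found."}


print(list_python_files_safe("../secrets"))
```

The agent still gets the flexibility of a parameter, while an unexpected path produces a structured error it can read and recover from.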

---

### 🔄 Recommendation

Start **specific** (e.g., `list_python_files()` in `src/`) when prototyping.
Refactor to be **more general** only if the use case demands it.





We'll add two new Actions to complement `list_python_files`:

1. `read_python_file(file_name: str)` – reads contents of a Python file from `src/`
2. `summarize_python_file(content: str)` – summarizes Python code (mocked for now)

Then we’ll register all three.

---

## 🛠️ Action 2: `read_python_file` Function + Action

### 🔹 Function Definition:

```python
import os

def read_python_file(file_name: str):
    """Reads a Python file from the src/ directory with basic error handling."""
    # Validate the extension before touching the filesystem.
    if not file_name.endswith(".py"):
        return {"error": "Only .py files can be read."}

    file_path = os.path.join("src", file_name)
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return {"content": f.read()}
    except FileNotFoundError:
        return {"error": f"File '{file_name}' not found in src/ directory."}

### 🔹 Wrap with `Action`:

```python
read_action = Action(
    name="read_python_file",
    function=read_python_file,
    description="Reads a Python file from the src/ directory and returns its contents.",
    parameters={
        "type": "object",
        "properties": {
            "file_name": {"type": "string"}
        },
        "required": ["file_name"]
    },
    terminal=False
)

```

---



## 🧠 Action 3: `summarize_python_file` Function + Action

### 🔹 Function (mocked for now):

```python
def summarize_python_file(content: str):
    """Mocks summarizing a Python file."""
    num_lines = len(content.splitlines())
    return f"This file contains {num_lines} lines of code. Summary: Implements core logic."
```

> Later, this could be replaced with an actual call to GPT for smarter summarization.

### 🔹 Wrap with `Action`:

```python
summarize_action = Action(
    name="summarize_python_file",
    function=summarize_python_file,
    description="Summarizes a Python file's contents.",
    parameters={
        "type": "object",
        "properties": {
            "content": {"type": "string"}
        },
        "required": ["content"]
    },
    terminal=False
)
```

---

## 🧰 Register All Three Actions

```python
action_registry = ActionRegistry()

action_registry.register(Action(
    name="list_python_files",
    function=list_python_files,
    description="Lists all Python files in the src/ directory.",
    terminal=False
))

action_registry.register(read_action)
action_registry.register(summarize_action)
```

---

✅ You now have a working **Action pipeline**:

1. `list_python_files()` → discover files
2. `read_python_file(file_name)` → extract code
3. `summarize_python_file(content)` → describe the code
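
Before handing these tools to an LLM, it can help to chain them by hand and confirm that each output feeds the next step. The sketch below uses compact stand-ins for the three functions, with the directory passed in as an argument (an assumption, so the example can run against a temporary folder instead of a real `src/`).

```python
import os
import tempfile


def list_python_files(directory):
    return sorted(f for f in os.listdir(directory) if f.endswith(".py"))


def read_python_file(directory, file_name):
    with open(os.path.join(directory, file_name), "r", encoding="utf-8") as f:
        return {"content": f.read()}


def summarize_python_file(content):
    return f"This file contains {len(content.splitlines())} lines of code."


with tempfile.TemporaryDirectory() as src:
    # Create a throwaway file so the pipeline has something to find.
    with open(os.path.join(src, "main.py"), "w", encoding="utf-8") as f:
        f.write("import os\n\nprint(os.getcwd())\n")

    summaries = {}
    for file_name in list_python_files(src):                    # 1. discover
        content = read_python_file(src, file_name)["content"]   # 2. extract
        summaries[file_name] = summarize_python_file(content)   # 3. describe

print(summaries)
```

In the agent, the LLM decides the order of these calls; here the order is fixed so you can verify the plumbing first.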






## ✅ Why Not Just Use a Dict of Functions?

In early prototypes or toy agents, you might define tools like this:

```python
tools = {
    "list_python_files": list_python_files,
    "read_python_file": read_python_file,
    "summarize_python_file": summarize_python_file,
}
```

This works! But as soon as your agent gets more complex, you hit limitations. Let’s walk through **why switching to the `Action` class is smarter and more scalable.**

---

### 🔐 1. **Metadata Encapsulation**

With a plain dict:

* All you store is the function name and reference.
* You can't easily describe the tool or document its purpose.

With an `Action` class:

```python
Action(
    name="read_python_file",
    function=read_python_file,
    description="Reads a Python file from the src/ directory.",
    parameters={
        "type": "object",
        "properties": {
            "file_name": {"type": "string"}
        },
        "required": ["file_name"]
    },
    terminal=False
)
```

**✅ Benefit:** Now every action knows:

* Its purpose (`description`)
* How to validate inputs (`parameters`)
* Whether it ends the loop (`terminal`)

---

### ✅ 2. **Validation for Arguments**

With a plain dict:

* You have to manually write code to check if `file_name` was passed, or if it’s the right type.

With an `Action`, you can validate inputs against a schema before running the function.

That means:

* You catch errors early
* You know the structure expected
* You can auto-generate UI components or documentation
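
The schema check itself can be quite small. The sketch below is a hand-rolled illustration that only covers `required` and a few primitive `type` values; `validate_args` is a hypothetical helper, and a real project would more likely rely on a full JSON Schema validator.

```python
# Maps JSON Schema type names to Python types (a small subset only).
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
               "boolean": bool, "object": dict, "array": list}


def validate_args(schema: dict, args: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument '{name}'")
    for name, value in args.items():
        if name not in properties:
            # Stricter than JSON Schema's default, which allows extra keys.
            errors.append(f"unexpected argument '{name}'")
            continue
        expected = _JSON_TYPES.get(properties[name].get("type"))
        if expected is not None and not isinstance(value, expected):
            errors.append(f"argument '{name}' should be of type "
                          f"'{properties[name]['type']}'")
    return errors


schema = {
    "type": "object",
    "properties": {"file_name": {"type": "string"}},
    "required": ["file_name"],
}
print(validate_args(schema, {"file_name": "main.py"}))
print(validate_args(schema, {"file_name": 42}))
print(validate_args(schema, {}))
```

Returning a list of problems instead of raising lets the agent feed the errors back to the LLM so it can correct its next call.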

---

### 🔁 3. **Extensibility**

With a plain dict:

* It’s just function pointers. You can’t add logic to them.

With `Action` objects, you can **extend behavior** in lots of ways:

* Add logging before/after execution
* Track usage count
* Add metadata for UI or dashboards
* Add versioning, permissions, etc.

Here’s a fun example:

```python
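class Action:
    # Illustrative constructor: fields mirror the keyword arguments used above.
    def __init__(self, name, function, description, parameters=None,
                 terminal=False, log_usage=False):
        self.name = name
        self.function = function
        self.description = description
        self.parameters = parameters or {}
        self.terminal = terminal
        self.log_usage = log_usage  # opt-in usage logging

    def execute(self, **kwargs):
        """Run the wrapped function, optionally logging the call first."""
        if self.log_usage:
            print(f"🔍 Executing: {self.name} with {kwargs}")
        return self.function(**kwargs)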

---

### 🧱 Summary Table

| Feature               | Dict of Functions | `Action` Class                     |
| --------------------- | ----------------- | ---------------------------------- |
| Function reference    | ✅                 | ✅                                  |
| Human-readable desc   | ❌                 | ✅ `description` field              |
| Parameter schema      | ❌                 | ✅ for validation + docs            |
| Terminal control      | ❌                 | ✅ via `terminal=True`              |
| Extensible logic      | ❌                 | ✅ (logging, decorators, analytics) |
| Safer agent reasoning | ❌                 | ✅ easier for the LLM to understand |

---

### ✅ Think of it like this:

| Shoebox of Tools 🧰                         | Structured Toolbox 🧱 (Action Class + Registry) |
| ------------------------------------------- | ----------------------------------------------- |
| Random assortment of functions              | Well-labeled, schema-validated toolset          |
| No standard format or naming                | Consistent naming, description, and structure   |
| Hard to scale or reason about               | Easy to scale, debug, and extend                |
| LLM must guess how and when to use tools    | LLM has clear, structured knowledge of tools    |
| No guardrails for input/output expectations | Explicit input types and clear behavior         |

### 💡 Why this matters for agents and LLMs:

* LLMs are **reasoning machines**, not rigid parsers.
* But to reason well, they need **clarity and structure**.
* Giving your tools well-defined schemas is like giving the LLM a precise user manual, so it can plan, decide, and act with fewer wrong guesses.

### 🔁 Bonus: It also helps *you* as a developer

* You know how tools will behave.
* You can plug them into new agents with minimal changes.
* You reduce bugs and edge cases.





### 🧪 Old Style: Shoebox (Dict of Functions)

```python
# Tools in a dict (no metadata, schema, or structure)
tools = {
    "list_python_files": lambda: os.listdir("src"),
    "read_python_file": lambda file_name: open(f"src/{file_name}").read()
}

# Manually call tools
tool_name = "read_python_file"
args = {"file_name": "utils.py"}

# No validation, no schema — just trusting it works
result = tools[tool_name](**args)
print(result)
```

💥 Problems:

* No validation on `args`
* No metadata or terminal flag
* LLM gets no schema or descriptions
* You need to hardcode or guess tool names

---

### ✅ New Style: Action Class + ActionRegistry

```python
# Define the action
action = Action(
    name="read_python_file",
    function=read_python_file,
    description="Reads a Python file from the src/ directory.",
    parameters={
        "type": "object",
        "properties": {
            "file_name": {"type": "string"}
        },
        "required": ["file_name"]
    },
    terminal=False
)

# Register the action
registry = ActionRegistry()
registry.register(action)

# Agent calls the action
invocation = {"tool": "read_python_file", "args": {"file_name": "utils.py"}}
action = registry.get_action(invocation["tool"])

# Structured execution (args can be checked against action.parameters first)
result = action.function(**invocation["args"])
print(result)
```

✨ Benefits:

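* Tool names are resolved through a single registry lookup, so an unknown name is caught immediately
* Arguments can be validated against the `parameters` schema before the function runs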
* You can programmatically list all tools and their capabilities
* The LLM can reason about tools with confidence using structured prompts



With the **Action class**, the LLM always knows:

* 🧭 **Where to find the tool name** (it’s in the same attribute every time)
* 📦 **What arguments are expected** (via a consistent `parameters` schema)
* 📝 **What each tool does** (clear, structured `description`)
* 🔚 **Whether using a tool should end the loop** (`terminal` flag)

This consistency makes it much easier for the LLM to reason about tools.

---

### 🧠 Let's compare what the prompt looks like

---

#### 🧪 Old Style: Unstructured Prompt (Shoebox)

```plaintext
You can use the following tools:

- list_python_files: runs a function that lists files
- read_python_file: you can try calling this to read a file

Ask the user what to do.
```

😬 Issues:

* No formal argument schema
* Descriptions are ambiguous
* The LLM must *guess* argument names and types
* No signal about whether a tool ends the task

---

#### ✅ New Style: Structured Prompt with Action Objects

This is what you pass as `tools=...` to the OpenAI Chat Completions API (`client.chat.completions.create(...)` in the current Python SDK):

```json
[
  {
    "type": "function",
    "function": {
      "name": "read_python_file",
      "description": "Reads a Python file from the src/ directory.",
      "parameters": {
        "type": "object",
        "properties": {
          "file_name": {
            "type": "string",
            "description": "The name of the file to read (must be a .py file in src/)"
          }
        },
        "required": ["file_name"]
      }
    }
  }
]
```

💡 This tells the LLM:

* ✅ The *exact name* of the tool (`read_python_file`)
* ✅ The *required arguments* and their *types*
* ✅ A clear, useful *description*
* ✅ Where to put the arguments (`tool_call.function.arguments`)

---

### 🧠 So Why Does It Matter?

With this setup, the LLM can:

* Autonomously *choose* tools
* Call them *safely* with correct arguments
* Understand *what the tool does* without trial and error
* Make calls that are far more likely to succeed

This reduces malformed calls and guessed argument names, which improves reliability across tool-heavy tasks.





### ✅ Where This JSON Comes From

That structured JSON **doesn’t get manually written**. It’s built automatically from your `Action` objects inside the `ActionRegistry`. When the agent is constructing its prompt (via `AgentLanguage.construct_prompt()`), it pulls from the `ActionRegistry` and creates this kind of OpenAI-compatible tool schema for each registered `Action`.

So yes — this schema is:

✅ Automatically generated
✅ Stored in a central place (the registry)
✅ Structured consistently for **every action**
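
As a rough idea of what that conversion involves, the sketch below maps `Action`-like objects onto the tool format shown earlier. It is an assumption about the shape of the logic, not the framework's actual `construct_prompt()` code: `to_tool_schemas` is a hypothetical helper, and `SimpleNamespace` stands in for real `Action` objects.

```python
import json
from types import SimpleNamespace


def to_tool_schemas(actions):
    """Build an OpenAI-style `tools` list from objects with Action attributes."""
    return [
        {
            "type": "function",
            "function": {
                "name": action.name,
                "description": action.description,
                # Tools without arguments still get an empty object schema.
                "parameters": action.parameters
                or {"type": "object", "properties": {}},
            },
        }
        for action in actions
    ]


actions = [
    SimpleNamespace(name="list_python_files",
                    description="Lists all Python files in the src/ directory.",
                    parameters=None),
    SimpleNamespace(name="read_python_file",
                    description="Reads a Python file from the src/ directory.",
                    parameters={"type": "object",
                                "properties": {"file_name": {"type": "string"}},
                                "required": ["file_name"]}),
]
tool_schemas = to_tool_schemas(actions)
print(json.dumps(tool_schemas, indent=2))
```

Note that the `terminal` flag is not part of the tool schema sent to the model in this sketch; it stays on the `Action` for the agent loop to inspect.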

---

### 🧠 When Does the LLM See It?

When the agent loop begins, it builds the full message to send to the LLM:

```python
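from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    tools=tool_schemas  # <- this is where your Action metadata goes
)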
```

* The LLM sees **all available tools** upfront in this list (`tools=...`)
* It **decides** whether to use a tool based on the current prompt and goals
* If it wants to call a tool, it returns a structured `tool_calls` object

---

### 🛠️ If the LLM chooses a different Action...

Say it wants to use `summarize_python_file` instead. That tool’s schema is already present in the `tools` list, because it was registered in `ActionRegistry`. The LLM can choose it *just as easily* because:

* It knows the name
* It knows the required arguments
* It has a description to reason about what the tool does

Every Action follows this contract — and that’s what makes the agent *modular*, extensible, and smart.




## 🛠️ Example Tool Call from the LLM

When the LLM decides to use a tool, it returns a response like this:

```json
{
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "read_python_file",
        "arguments": "{ \"file_name\": \"main.py\" }"
      }
    }
  ]
}
```

This is part of the actual `ChatCompletion` response, typically found in:

```python
response.choices[0].message.tool_calls
```

---

## 🔍 What’s Happening Here?

* **`tool_calls`** is a list of actions the LLM wants to perform (in many cases, just one).
* **`name`** exactly matches the name you registered in the `Action` object (`"read_python_file"`).
* **`arguments`** is a stringified JSON object with arguments the tool needs — pulled from the tool’s schema.
* The LLM knows it needs a `file_name`, because your `Action` declared that as a required parameter.

You no longer have to parse natural language by hand: the model returns the tool name and a JSON string of arguments, which is what you need to call the function.

---

## ✅ Why This Is Powerful

Thanks to the structured tool schema you provide:

* ✅ The LLM knows which tools are available
* ✅ It understands *what inputs* are required for each tool
* ✅ It responds with machine-parsable output, not freeform text
* ✅ You can directly route and execute the right Python function via `ActionRegistry`

---

## 🔁 Then You Can Run It Like This:

```python
import json

tool_call = response.choices[0].message.tool_calls[0]
tool_name = tool_call.function.name
args = json.loads(tool_call.function.arguments)

action = action_registry.get_action(tool_name)
result = action.function(**args)
```
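
The snippet above assumes the happy path. A slightly more defensive dispatcher might look like the sketch below, which handles an unknown tool name, malformed JSON arguments, and arguments that do not fit the function. `dispatch` and `fake_call` are hypothetical helpers, the registry is simplified to a plain dict of functions, and the `SimpleNamespace` objects only imitate the shape of a real tool call.

```python
import json
from types import SimpleNamespace


def dispatch(tool_call, functions):
    """Run the function named in a tool call; return an error dict on failure."""
    name = tool_call.function.name
    function = functions.get(name)
    if function is None:
        return {"error": f"Unknown tool '{name}'."}
    try:
        args = json.loads(tool_call.function.arguments or "{}")
    except json.JSONDecodeError as exc:
        return {"error": f"Arguments for '{name}' are not valid JSON: {exc.msg}"}
    try:
        return function(**args)
    except TypeError as exc:
        # Typically raised when argument names or counts do not match.
        return {"error": f"Bad arguments for '{name}': {exc}"}


def fake_call(name, arguments):
    """Imitates the attribute layout of a tool call object."""
    return SimpleNamespace(function=SimpleNamespace(name=name, arguments=arguments))


functions = {"read_python_file": lambda file_name: {"content": f"# {file_name}"}}

print(dispatch(fake_call("read_python_file", '{"file_name": "main.py"}'), functions))
print(dispatch(fake_call("delete_everything", "{}"), functions))
print(dispatch(fake_call("read_python_file", "{not json"), functions))
```

Returning error dicts keeps the agent loop alive: the error can be written to memory so the LLM sees what went wrong and tries again.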



Let’s walk through a **full end-to-end cycle** of a tool call using the new modular agent setup. This shows exactly how the pieces connect: from user input, to LLM decision, to function execution.

---

## 🧭 1. User Input

The user gives a task:

```python
user_input = "Can you show me what's in the src/ directory and read main.py?"
```

---

## 🧱 2. Construct the Prompt

The agent builds a prompt that includes:

* 🎯 Agent’s **goals**
* 🧰 **Actions** (via the `ActionRegistry`)
* 🧠 **Memory** (if any)
* 🌍 **Environment** context

```python
full_prompt = agent_language.construct_prompt(
    goals=goals,
    actions=action_registry.get_actions(),
    memory=memory,
    environment=environment
)
```

---

## 🧠 3. LLM Decides What to Do

The model receives the structured prompt and returns this response:

```json
{
  "tool_calls": [
    {
      "function": {
        "name": "list_python_files",
        "arguments": "{}"
      }
    }
  ]
}
```

→ You extract it from the LLM response:

```python
tool_call = response.choices[0].message.tool_calls[0]
```

---

## 🛠️ 4. Look Up the Action

```python
tool_name = tool_call.function.name
args = json.loads(tool_call.function.arguments)

action = action_registry.get_action(tool_name)
```

This returns the registered `Action` object:

```python
Action(
  name="list_python_files",
  function=<function list_python_files>,
  description="Lists all Python files in the src/ directory.",
  terminal=False
)
```

---

## ⚙️ 5. Execute the Tool

```python
result = action.function(**args)
```

If it's:

```python
def list_python_files():
    return ["main.py", "utils.py", "config.py"]
```

Then the result will be:

```json
["main.py", "utils.py", "config.py"]
```

---

## 🧠 6. Update Memory

```python
memory.add_memory({"type": "assistant", "content": str(tool_call)})
memory.add_memory({"type": "user", "content": json.dumps(result)})
```

Now the agent has context for the next iteration.
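
The `memory` object only needs to store messages in order. Here is a minimal sketch, assuming nothing beyond the `add_memory()` call used above; `get_memories()` is a hypothetical accessor added for illustration.

```python
import json


class Memory:
    """Minimal stand-in: an append-only list of message dicts."""

    def __init__(self):
        self._items = []

    def add_memory(self, item: dict) -> None:
        self._items.append(item)

    def get_memories(self, limit=None) -> list:
        # Return the most recent `limit` items, or everything if no limit.
        return self._items[-limit:] if limit else list(self._items)


memory = Memory()
memory.add_memory({"type": "assistant", "content": "list_python_files({})"})
memory.add_memory({"type": "user", "content": json.dumps(["main.py", "utils.py"])})
print(memory.get_memories(limit=1))
```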

---

## 🔁 7. Loop Back or Exit

If the action had `terminal=True`, the agent exits.

Otherwise, it uses the updated memory to continue the loop — maybe this time deciding to call:

```json
{
  "function": {
    "name": "read_python_file",
    "arguments": "{\"file_name\": \"main.py\"}"
  }
}
```
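
Putting the seven steps together, the loop can be sketched without any API calls by replacing the LLM with a scripted stand-in. Everything below is illustrative: `fake_llm`, `run_agent`, the `finish` tool, and the dict-based registry are assumptions made for this example, not the framework's implementation.

```python
import json

# name -> (function, terminal flag); a simplified stand-in for ActionRegistry
ACTIONS = {
    "list_python_files": (lambda: ["main.py", "utils.py"], False),
    "read_python_file": (lambda file_name: {"content": f"# {file_name}"}, False),
    "finish": (lambda message: message, True),  # hypothetical terminal tool
}

# Scripted decisions that a real LLM would normally produce.
SCRIPT = [
    {"name": "list_python_files", "arguments": "{}"},
    {"name": "read_python_file", "arguments": '{"file_name": "main.py"}'},
    {"name": "finish", "arguments": '{"message": "Done."}'},
]


def fake_llm(memory):
    """Pick the next scripted call based on how many tool calls were made."""
    calls_so_far = sum(1 for m in memory if m["type"] == "assistant")
    return SCRIPT[calls_so_far]


def run_agent(max_iterations=10):
    memory = []
    for _ in range(max_iterations):
        call = fake_llm(memory)                              # LLM decides
        function, terminal = ACTIONS[call["name"]]           # look up the action
        result = function(**json.loads(call["arguments"]))   # execute the tool
        # Update memory with both the call and its result.
        memory.append({"type": "assistant", "content": json.dumps(call)})
        memory.append({"type": "user", "content": json.dumps(result)})
        if terminal:                                         # exit or loop back
            break
    return memory


history = run_agent()
for item in history:
    print(item["type"], "->", item["content"])
```

The `max_iterations` cap is a safety net: if no terminal action is ever chosen, the loop still stops.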

