
This notebook introduces and explains the **Capability class** — a key architectural pattern that adds **modularity, extensibility, and lifecycle control** to agents.

Here’s what you should focus on in this lecture:

---

## 🧠 What Capabilities Are

**Capabilities** are modular building blocks that can hook into different stages of the agent lifecycle. They allow you to:

* Inject behaviors without modifying the core agent logic
* Add time awareness, logging, monitoring, validation, etc.
* Chain behaviors together, like middleware in web frameworks

Think of them as reusable plugins that let you “teach” the agent new behaviors cleanly.

---

## ⚙️ Lifecycle Hooks in the Capability Class

Each method in the `Capability` class corresponds to a **specific point** in the agent’s execution loop:

| Method                   | When it's Called                       | Use Case Example                         |
| ------------------------ | -------------------------------------- | ---------------------------------------- |
| `init()`                 | Once when the agent starts             | Initialize time info, load config, etc.  |
| `start_agent_loop()`     | At the start of each loop iteration    | Check iteration limits, throttle, etc.   |
| `process_prompt()`       | Right before prompt is sent to the LLM | Add time info, enrich with extra context |
| `process_response()`     | After getting the LLM response         | Validate or modify the raw text          |
| `process_action()`       | After parsing response into action     | Add metadata, transform action           |
| `process_result()`       | After executing the action             | Add duration, wrap result                |
| `process_new_memories()` | Before adding memories to memory store | Filter, enhance, or audit memory         |
| `end_agent_loop()`       | At the end of each loop iteration      | Logging, summary, checkpointing          |
| `should_terminate()`     | Each iteration, to decide whether to stop | Exit on condition                     |
| `terminate()`            | Once when the agent is shutting down   | Final cleanup or reporting               |

Every hook receives the `agent` and the `action_context`, so a capability can read or change any agent state while its own logic stays modular and scoped to one class.

---

## 🔁 Chaining Capabilities with `reduce()`

The agent loop runs all capabilities **in order**, like this:

```python
prompt = reduce(lambda p, c: c.process_prompt(self, action_context, p),
               self.capabilities, base_prompt)
```

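Each capability receives the prompt returned by the one before it, so you can stack multiple behaviors, much like middleware in Flask or Express.js. Because the output of one capability feeds the next, the order of `self.capabilities` matters.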
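To see the chaining on its own, here is a minimal sketch that uses plain strings in place of the framework's `Prompt` objects. `AddTime`, `AddTone`, and `apply_capabilities` are hypothetical stand-ins, not part of the notebook's code.

```python
from functools import reduce


class AddTime:
    """Hypothetical capability: prefixes a fixed timestamp to the prompt."""

    def process_prompt(self, agent, action_context, prompt: str) -> str:
        return "[time: 09:00] " + prompt


class AddTone:
    """Hypothetical capability: appends a style instruction to the prompt."""

    def process_prompt(self, agent, action_context, prompt: str) -> str:
        return prompt + " Answer briefly."


def apply_capabilities(capabilities, base_prompt: str) -> str:
    # Same shape as the agent-loop line; agent and context are unused here.
    return reduce(
        lambda p, c: c.process_prompt(None, None, p),
        capabilities,
        base_prompt,
    )


final_prompt = apply_capabilities([AddTime(), AddTone()], "Schedule a meeting.")
print(final_prompt)  # [time: 09:00] Schedule a meeting. Answer briefly.
```

With an empty capability list, `reduce()` simply returns the base prompt, so an agent without capabilities behaves as before.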

---

## ✅ Practical Example: `TimeAwareCapability`

This example capability does two things:

1. Adds the current time (in human and machine-readable formats) to memory when the agent starts
2. Updates each prompt to include the current time — which helps with decisions like "schedule a meeting"

This demonstrates:

* Lifecycle hook usage (`init`, `process_prompt`)
* How time awareness influences decision-making
* Clean extension of agent behavior without touching core logic

---

## 🔧 Enhanced Time Awareness

The `EnhancedTimeAwareCapability` adds even more functionality:

* Adds `execution_time` to actions via `process_action`
* Adds `action_duration` to results via `process_result`

This shows how **capabilities can be layered and extended**.

---

## 🚀 Why This Matters

Capabilities give you:

* 🧱 **Modularity** – Keep logic separate and reusable
* 🔄 **Lifecycle control** – Tap into the agent at specific moments
* 🧩 **Composability** – Mix and match behaviors as needed
* 🧼 **Clean design** – No bloat inside the core agent loop

For you as a data scientist or agent builder, this means **you can inject logic like logging, tracking, validation, and domain-specific intelligence without polluting your main codebase.**



# Understanding the Capability Class

A Capability can interact with the agent loop at multiple points. Think of these interaction points like hooks or lifecycle events in a web framework - they give us specific moments where we can modify or enhance the agent’s behavior. Let’s examine the Capability class in detail:


```python
from typing import Any, List

# ActionContext, Prompt, Action, and Memory are assumed to come from the
# agent framework; they are not defined in this cell.


class Capability:
    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description

    def init(self, agent, action_context: ActionContext) -> dict:
        """Called once when the agent starts running."""
        pass

    def start_agent_loop(self, agent, action_context: ActionContext) -> bool:
        """Called at the start of each iteration through the agent loop."""
        return True

    def process_prompt(self, agent, action_context: ActionContext,
                      prompt: Prompt) -> Prompt:
        """Called right before the prompt is sent to the LLM."""
        return prompt

    def process_response(self, agent, action_context: ActionContext,
                        response: str) -> str:
        """Called after getting a response from the LLM."""
        return response

    def process_action(self, agent, action_context: ActionContext,
                      action: dict) -> dict:
        """Called after parsing the response into an action."""
        return action

    def process_result(self, agent, action_context: ActionContext,
                      response: str, action_def: Action,
                      action: dict, result: Any) -> Any:
        """Called after executing the action."""
        return result

    def process_new_memories(self, agent, action_context: ActionContext,
                           memory: Memory, response, result,
                           memories: List[dict]) -> List[dict]:
        """Called when new memories are being added."""
        return memories

    def end_agent_loop(self, agent, action_context: ActionContext):
        """Called at the end of each iteration through the agent loop."""
        pass

    def should_terminate(self, agent, action_context: ActionContext,
                        response: str) -> bool:
        """Called to check if the agent should stop running."""
        return False

    def terminate(self, agent, action_context: ActionContext) -> dict:
        """Called when the agent is shutting down."""
        pass
```


## 🌀 Mapping Capability Methods to the Agent’s Execution Cycle

Let’s walk through how these methods map to the agent’s execution cycle:

### 🔧 Initialization Phase

The `init()` method runs **once when the agent starts**.
This is where you set up any initial state or add starting information to the agent’s memory.
📌 *Example:* In `TimeAwareCapability`, this is where we first tell the agent what time it is.

---

### 🔁 Loop Start Phase

Before each iteration of the agent loop, `start_agent_loop()` runs.
You can use this to check conditions or prepare for the next iteration.
📌 *Example:* Check if enough time has passed since the last iteration.
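As a rough illustration of the "enough time has passed" idea, the sketch below waits out a minimum interval between iterations. `MinIntervalCapability` is a hypothetical name; in the notebook it would subclass `Capability`, but it stands alone here so the example runs by itself. The `clock` and `sleep` parameters are also assumptions, added only to make the class easy to test.

```python
import time


class MinIntervalCapability:
    """Hypothetical capability: enforce a minimum gap between loop iterations."""

    def __init__(self, min_seconds: float, clock=time.monotonic, sleep=time.sleep):
        self.min_seconds = min_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_start = None  # no iteration has started yet

    def start_agent_loop(self, agent, action_context) -> bool:
        now = self._clock()
        if self._last_start is not None:
            remaining = self.min_seconds - (now - self._last_start)
            if remaining > 0:
                self._sleep(remaining)  # wait out the rest of the interval
                now = self._clock()
        self._last_start = now
        return True  # same return value as the base class default


throttle = MinIntervalCapability(min_seconds=0.05)
throttle.start_agent_loop(agent=None, action_context=None)  # first call: no wait
throttle.start_agent_loop(agent=None, action_context=None)  # waits about 0.05 s
```

A monotonic clock is used so that system clock adjustments cannot shorten or lengthen the wait.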

---

### 💬 Prompt Construction Phase

Just before sending a prompt to the LLM, `process_prompt()` lets you **modify the prompt**.
📌 *Example:* Add current time info to every prompt.

---

### 🧠 Response Processing Phase

After getting the LLM’s response but before parsing it, `process_response()` lets you **modify or validate the raw response**.
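One possible use, sketched under the assumption that the agent expects a JSON action (as with `JSONAgentLanguage`): strip any chatter around the first JSON object before the response is parsed. `ExtractJsonCapability` is a hypothetical name and stands alone here instead of subclassing `Capability`.

```python
import json


class ExtractJsonCapability:
    """Hypothetical capability: keep only the first JSON object in the response."""

    def process_response(self, agent, action_context, response: str) -> str:
        start = response.find("{")
        if start == -1:
            return response  # nothing that looks like JSON; leave untouched
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _end = json.JSONDecoder().raw_decode(response, start)
        except json.JSONDecodeError:
            return response  # malformed JSON; let the normal parser report it
        return json.dumps(obj)


raw = 'Sure! {"tool": "list_files", "args": {}} Let me know.'
cleaned = ExtractJsonCapability().process_response(None, None, raw)
print(cleaned)  # {"tool": "list_files", "args": {}}
```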

---

### 🛠️ Action Processing Phase

Once the response is parsed into an action, `process_action()` lets you **modify the action before it’s executed**.
📌 *Example:* Add metadata or validate the action.

---

### ✅ Result Processing Phase

After the action executes, `process_result()` lets you **modify the result**.
📌 *Example:* Add additional context or format the result.

---

### 🧾 Memory Update Phase

When new memories are created, `process_new_memories()` lets you **filter or enrich** what gets stored.
📌 *Example:* Add extra metadata, redact sensitive info.
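A minimal redaction sketch follows. It assumes each memory is a dict with a string `"content"` field, as in the memory entries used elsewhere in this notebook. `RedactEmailsCapability` and the deliberately simple `EMAIL` pattern are hypothetical.

```python
import re
from typing import List

# Simplified email pattern, good enough for a demonstration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class RedactEmailsCapability:
    """Hypothetical capability: mask email addresses before memories are stored."""

    def process_new_memories(self, agent, action_context, memory, response,
                             result, memories: List[dict]) -> List[dict]:
        redacted = []
        for item in memories:
            content = item.get("content")
            if isinstance(content, str):
                # Build a new dict so the caller's memory entry is not mutated.
                item = {**item, "content": EMAIL.sub("[redacted email]", content)}
            redacted.append(item)
        return redacted


new_memories = [{"type": "user", "content": "Mail ana@example.com today"}]
stored = RedactEmailsCapability().process_new_memories(
    None, None, None, None, None, new_memories
)
print(stored[0]["content"])  # Mail [redacted email] today
```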

---

### 🔚 Loop End Phase

At the end of each iteration, `end_agent_loop()` runs.
📌 *Example:* Cleanup, audit logging, metrics tracking.

---

### 🛑 Termination Phase

* `should_terminate()` signals that the agent should stop.
* `terminate()` handles **final cleanup** when the agent exits.
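The two hooks can work together, as in this sketch: one decides when to stop and the other reports a small summary at shutdown. `StopOnPhraseCapability` and its default phrase are hypothetical, and what the agent does with the dict returned by `terminate()` is not specified in this notebook.

```python
class StopOnPhraseCapability:
    """Hypothetical capability: stop once the LLM says the task is complete."""

    def __init__(self, phrase: str = "TASK COMPLETE"):
        self.phrase = phrase
        self.checks = 0  # number of responses inspected so far

    def should_terminate(self, agent, action_context, response: str) -> bool:
        self.checks += 1
        return self.phrase.lower() in response.lower()  # case-insensitive match

    def terminate(self, agent, action_context) -> dict:
        # Final reporting: a small summary the caller could log or store.
        return {"responses_checked": self.checks}


stopper = StopOnPhraseCapability()
print(stopper.should_terminate(None, None, "Still working on it"))   # False
print(stopper.should_terminate(None, None, "All done. Task complete."))  # True
print(stopper.terminate(None, None))  # {'responses_checked': 2}
```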

---

### 🔄 Access and Execution

Each of these methods receives both the `agent` instance and the `ActionContext`,
giving you access to everything you need to modify behavior.

The agent processes these methods in sequence using **Python’s `reduce()` function**, allowing multiple capabilities to be chained together cleanly.
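The sketch below traces the order in which a few hooks fire during one simplified iteration. `TraceCapability` and `run_once` are hypothetical, and `run_once` is an assumption about the loop's shape based on the descriptions above, not the notebook's actual agent loop. Action parsing, execution, memory, and termination checks are omitted for brevity.

```python
from functools import reduce


class TraceCapability:
    """Hypothetical capability that records which hooks fire, in order."""

    def __init__(self):
        self.calls = []

    def init(self, agent, ctx):
        self.calls.append("init")

    def start_agent_loop(self, agent, ctx):
        self.calls.append("start_agent_loop")
        return True

    def process_prompt(self, agent, ctx, prompt):
        self.calls.append("process_prompt")
        return prompt

    def process_response(self, agent, ctx, response):
        self.calls.append("process_response")
        return response

    def end_agent_loop(self, agent, ctx):
        self.calls.append("end_agent_loop")

    def terminate(self, agent, ctx):
        self.calls.append("terminate")


def run_once(capabilities, llm, base_prompt, agent=None, ctx=None):
    """Simplified single iteration; an illustration, not the real agent loop."""
    for c in capabilities:
        c.init(agent, ctx)
    for c in capabilities:
        c.start_agent_loop(agent, ctx)
    # Value-transforming hooks are chained with reduce().
    prompt = reduce(lambda p, c: c.process_prompt(agent, ctx, p),
                    capabilities, base_prompt)
    response = reduce(lambda r, c: c.process_response(agent, ctx, r),
                      capabilities, llm(prompt))
    for c in capabilities:
        c.end_agent_loop(agent, ctx)
    for c in capabilities:
        c.terminate(agent, ctx)
    return response


trace = TraceCapability()
reply = run_once([trace], llm=lambda p: p.upper(), base_prompt="hi")
print(reply)        # HI
print(trace.calls)
```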






## 🔁 Example from the Agent Loop

```python
prompt = reduce(lambda p, c: c.process_prompt(self, action_context, p),
                self.capabilities, base_prompt)
```


---

## 🧱 Agent Constructor with Capabilities

```python
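from typing import Callable, List, Optional


class Agent:
    def __init__(self,
                 goals: List[Goal],
                 agent_language: AgentLanguage,
                 action_registry: ActionRegistry,
                 generate_response: Callable[[Prompt], str],
                 environment: Environment,
                 # None instead of [] avoids sharing one mutable default list
                 capabilities: Optional[List[Capability]] = None,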
                 max_iterations: int = 10,
                 max_duration_seconds: int = 180):
        """
        Initialize an agent with its core GAME components and capabilities.

        Goals, Actions, Memory, and Environment (GAME) form the core of the agent,
        while capabilities provide ways to extend and modify the agent's behavior.

        Args:
            goals: What the agent aims to achieve
            agent_language: How the agent formats and parses LLM interactions
            action_registry: Available tools the agent can use
            generate_response: Function to call the LLM
            environment: Manages tool execution and results
            capabilities: List of capabilities that extend agent behavior
            max_iterations: Maximum number of action loops
            max_duration_seconds: Maximum runtime in seconds
        """
        self.goals = goals
        self.generate_response = generate_response
        self.agent_language = agent_language
        self.actions = action_registry
        self.environment = environment
        self.capabilities = capabilities or []
        self.max_iterations = max_iterations
        self.max_duration_seconds = max_duration_seconds
```




## 🧩 Composing Agents with Capabilities

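This design lets us compose an agent with exactly the capabilities it needs. For example, we might create an agent that is **time-aware**, **logs its actions**, and **reports metrics**. `LoggingCapability` and `MetricsCapability` are illustrative names here; only the time-awareness capabilities are implemented later in this notebook: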

```python
agent = Agent(
    goals=[
        Goal(name="scheduling",
             description="Schedule meetings considering current time and availability")
    ],
    agent_language=JSONAgentLanguage(),
    action_registry=registry,
    generate_response=llm.generate,
    environment=PythonEnvironment(),
    capabilities=[
        TimeAwareCapability(),
        LoggingCapability(log_level="INFO"),
        MetricsCapability(metrics_server="prometheus:9090")
    ]
)
```

Each capability in the list gets a chance to participate in **every phase** of the agent’s execution.
Within a phase, the agent calls the capabilities in list order using Python’s `reduce()` function, so each one receives the output of the one before it:

* 🕒 **`TimeAwareCapability`** runs first and might add time information to a prompt
* 📋 **`LoggingCapability`** runs next, so it could log that time-enhanced prompt before it goes to the LLM
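A small sketch of that ordering effect, using plain strings as prompts. `StampCapability` is a hypothetical stand-in for `TimeAwareCapability` with a fixed timestamp, and this `LoggingCapability` is only one guess at how such a class could look; the `seen` list exists purely so the example can be inspected.

```python
import logging
from functools import reduce

logger = logging.getLogger("agent.prompts")


class StampCapability:
    """Hypothetical stand-in for TimeAwareCapability with a fixed timestamp."""

    def process_prompt(self, agent, action_context, prompt: str) -> str:
        return "Current time: 17:30 Friday\n\n" + prompt


class LoggingCapability:
    """Hypothetical sketch: log each prompt as it passes through, unchanged."""

    def __init__(self, log_level: str = "INFO"):
        self.level = getattr(logging, log_level)  # e.g. logging.INFO
        self.seen = []

    def process_prompt(self, agent, action_context, prompt: str) -> str:
        logger.log(self.level, "prompt: %r", prompt)
        self.seen.append(prompt)
        return prompt


def build(capabilities, base_prompt: str) -> str:
    return reduce(lambda p, c: c.process_prompt(None, None, p),
                  capabilities, base_prompt)


after = LoggingCapability()
build([StampCapability(), after], "Schedule a meeting.")   # logs the stamped prompt

before = LoggingCapability()
build([before, StampCapability()], "Schedule a meeting.")  # logs the bare prompt
```

Placing the logger last in the list is what lets it observe the prompt exactly as the LLM will receive it.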

---

This architecture allows us to build **complex behaviors** by composing **simple, focused capabilities**, each responsible for one aspect of the agent’s behavior.

It’s similar to how **middleware** works in web frameworks, where each piece can modify the request/response cycle **without the core application needing to know** about these modifications.






## 🕒 Implementing Time Awareness

The `TimeAwareCapability` needs to inform the agent about the current time and ensure this information persists throughout its decision-making process:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

class TimeAwareCapability(Capability):
    def __init__(self):
        super().__init__(
            name="Time Awareness",
            description="Allows the agent to be aware of time"
        )
        
    def init(self, agent, action_context: ActionContext) -> dict:
        """Set up time awareness at the start of agent execution."""
        # Get timezone from context or use default
        time_zone_name = action_context.get("time_zone", "America/Chicago")
        timezone = ZoneInfo(time_zone_name)
        
        # Get current time in specified timezone
        current_time = datetime.now(timezone)
        
        # Format time in both machine and human-readable formats
        iso_time = current_time.strftime("%Y-%m-%dT%H:%M:%S%z")
        human_time = current_time.strftime("%H:%M %A, %B %d, %Y")
        
        # Store time information in memory
        memory = action_context.get_memory()
        memory.add_memory({
            "type": "system",
            # Adjacent string literals avoid embedding source indentation
            "content": (
                f"Right now, it is {human_time} (ISO: {iso_time}). "
                f"You are in the {time_zone_name} timezone. "
                "Please consider the day/time, if relevant, when responding."
            )
        })
        
    def process_prompt(self, agent, action_context: ActionContext,
                      prompt: Prompt) -> Prompt:
        """Update time information in each prompt."""
        time_zone_name = action_context.get("time_zone", "America/Chicago")
        current_time = datetime.now(ZoneInfo(time_zone_name))
        
        # Add current time to system message
        system_msg = (f"Current time: "
                     f"{current_time.strftime('%H:%M %A, %B %d, %Y')} "
                     f"({time_zone_name})\n\n")
        
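        # Copy the messages so the incoming prompt is not mutated (otherwise
        # the time prefix could pile up if the same messages are reused),
        # then add to the existing system message or create a new one
        messages = [dict(m) for m in prompt.messages]
        if messages and messages[0]["role"] == "system":
            messages[0]["content"] = system_msg + messages[0]["content"]
        else:
            messages.insert(0, {
                "role": "system",
                "content": system_msg
            })
            
        return Prompt(messages=messages)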
```

---

## 🛠️ Using the Capability

Now we can use this capability when creating our agent:

```python
agent = Agent(
    goals=[Goal(name="task", description="Complete the assigned task")],
    agent_language=JSONAgentLanguage(),
    action_registry=registry,
    generate_response=llm.generate,
    environment=PythonEnvironment(),
    capabilities=[
        TimeAwareCapability()
    ]
)
```

This modular design makes it easy to inject time-awareness without modifying the agent’s core logic.



## 🧠 Time Awareness in Action

Our agent now consistently knows the current time, enabling it to make time-aware decisions. For example, if we ask it to schedule a meeting, it might respond:

```python
# Example conversation
agent.run("Schedule a team meeting for today")

# The agent's response might include:
# "Since it's already 5:30 PM on Friday, I recommend scheduling the meeting
# for Monday morning instead. Would you like me to look for available times
# on Monday?"
```

---

## 🕹️ How Time Awareness Changes Agent Behavior

The `TimeAwareCapability` modifies agent behavior in two ways:

* **Through `init()`**: When the agent starts, it establishes baseline time awareness by adding time information to memory.
* **Through `process_prompt()`**: Before each prompt, it updates the current time, ensuring the agent always has fresh time data for decision-making.

✅ These modifications ripple through the agent’s decision-making process while keeping the **core agent loop clean**.
✅ We didn’t need to modify the `Agent` class at all — the **capability pattern** handled everything.

---

## 🔁 Extending the Time Awareness Capability

We could extend this capability further to handle more complex time-related features:

```python
from typing import Any


class EnhancedTimeAwareCapability(TimeAwareCapability):
    def _now(self, action_context: ActionContext) -> datetime:
        """Current time in the context's timezone (same default as the parent)."""
        return datetime.now(
            ZoneInfo(action_context.get("time_zone", "America/Chicago"))
        )

    def process_action(self, agent, action_context: ActionContext,
                      action: dict) -> dict:
        """Record when the action starts executing."""
        action["execution_time"] = self._now(action_context).isoformat()
        return action
        
    def process_result(self, agent, action_context: ActionContext,
                      response: str, action_def: Action,
                      action: dict, result: Any) -> Any:
        """Add duration information to dict results."""
        # Skip non-dict results and actions that were never timestamped
        if isinstance(result, dict) and "execution_time" in action:
            result["action_duration"] = (
                self._now(action_context) -
                datetime.fromisoformat(action["execution_time"])
            ).total_seconds()
        return result
```

---

This **enhanced version** tracks:

* 🕓 **When** actions are executed
* ⏱️ **How long** they take
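If only the duration matters, a variant can swap the wall-clock timestamps for `time.perf_counter()`, which is meant for measuring elapsed time and is unaffected by clock or timezone changes. This is a different technique from the class above, shown as a sketch: `DurationCapability` and the `"_started"` key are hypothetical, and the class stands alone instead of subclassing `Capability`.

```python
import time


class DurationCapability:
    """Hypothetical capability: time each action with a monotonic timer."""

    def process_action(self, agent, action_context, action: dict) -> dict:
        action["_started"] = time.perf_counter()  # start of execution
        return action

    def process_result(self, agent, action_context, response, action_def,
                       action: dict, result):
        started = action.get("_started")
        if isinstance(result, dict) and started is not None:
            result["action_duration"] = time.perf_counter() - started
        return result  # non-dict results pass through unchanged


timer = DurationCapability()
action = timer.process_action(None, None, {"tool": "sleep"})
time.sleep(0.01)  # stands in for the real tool call
result = timer.process_result(None, None, "", None, action, {"ok": True})
print(result["action_duration"] > 0)  # True
```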




### ✅ Why This Capability System Matters

The **Capability pattern** gives your agents **plug-and-play superpowers** without rewriting core logic. Think of each capability as a reusable “middleware” that can:

* **Inject contextual knowledge** (like time, location, or user state)
* **Log or trace decisions**
* **Enforce constraints** (like rate limits or permission checks)
* **Modify actions, prompts, results, or memory**

You're building a **pipeline of behaviors** that shapes how the agent thinks and acts — and you can **stack, combine, or swap** them at will.

---

### 🧩 Bonus Design Patterns in Play

* **Decorator-style wrapping**: Each capability “wraps” part of the agent’s loop; the agent only calls the generic hooks and needs no knowledge of what a specific capability does.
* **Middleware-style processing**: Like in web frameworks, each capability gets a chance to inspect and modify data in the pipeline.
* **Separation of concerns**: Capabilities isolate behavior (e.g., time-awareness, metrics, logging) so each part is focused and testable.

---

### 🚨 Pro Tip

As agents grow more powerful and handle more real-world complexity, **capabilities give you a modular way to scale behavior** without complexity ballooning.

They let you say:
*"This agent should log every action."*
*"This one should obey working hours."*
*"This one should adapt based on a user’s past preferences."*

All without duplicating logic or cluttering the agent core.




Now that you’ve learned about **capabilities**, it’s a good idea to update your ✅ **Agent System Components** list to reflect the full architecture you’re working with.

Here’s the updated version:

---

### ✅ **Agent System Components (Updated)**

1. **Tool** – A reusable function that performs a specific action.
2. **Tool Registry** – Stores all tools, allowing the agent to find and call them.
3. **ActionContext** – Holds shared dependencies and request/session state (e.g. memory, auth, LLM).
4. **Environment** – Executes tools by injecting dependencies from the ActionContext.
5. **Agent** – The orchestrator: builds prompts, calls the LLM, parses actions, calls tools.
6. **Capabilities** – Modular behaviors that plug into each step of the agent loop (e.g. time-awareness, logging).
7. **Dependencies** – The actual injected values (e.g. database connections, tokens, config).
8. **Wiring** – Code that connects all components together (instantiating tools, dependencies, environment, agent, etc.).

---

### 🔧 Optional But Useful (Advanced)

* **AgentLanguage** – The format used to structure prompts and parse model output.
* **Goal** – A task or objective that guides the agent’s behavior.
* **Memory** – Stores the agent’s conversation history and long-term state.
* **Action** – A structured instruction for the agent to perform using tools.



### ✅ **Agent-Building Checklist (Updated)**

| Step | Task                                                                                          |
| ---- | --------------------------------------------------------------------------------------------- |
| 1    | **Define the Agent's Goal** – What problem is the agent solving?                              |
| 2    | **Identify Required Tools** – What actions must the agent be able to perform?                 |
| 3    | **List Tool Dependencies** – What does each tool need (auth token, config, memory, etc.)?     |
| 4    | **Implement Tools** – Use `@register_tool` with optional `_dependencies` and `ActionContext`. |
| 5    | **Define Capabilities (Optional)** – Add reusable behaviors like logging or time-awareness.   |
| 6    | **Build Tool Registry** – Map tool names to their function references.                        |
| 7    | **Assemble ActionContext** – Inject dependencies (memory, LLM, services, tokens, etc.).       |
| 8    | **Wire It All Together** – Create the `Environment`, `Agent`, and any `Capabilities`.         |
| 9    | **Test and Iterate** – Run scenarios, adjust tools, context, and capabilities as needed.      |

