# Agent design principles & safety

* M -> model efficiency: match the model to the task, balancing capability against cost and speed
* A -> action specificity: design actions that are hard to use incorrectly
* T -> token efficiency: presenting the right information densely improves reasoning
* E -> environmental safety: the environment limits how much can break, and actions are safe and reversible

## The MATE Design Principles

In chess, a checkmate represents the perfect execution of strategy—every piece positioned correctly, every move calculated for maximum effect, leading to an inevitable victory. When building AI agents, we can apply similar strategic thinking through the **MATE** principles: **Model efficiency**, **Action specificity**, **Token efficiency**, and **Environmental safety**.

### Model Efficiency: Choose Your Pieces Wisely

In chess, each piece has unique capabilities. You wouldn’t use a queen when a pawn would suffice. Similarly, model efficiency means selecting the right LLM for each task.

```python
import json

@register_tool(description="Extract basic contact information from text")
def extract_contact_info(action_context: ActionContext, text: str) -> dict:
    """Extract name, email, and phone from text using a smaller, faster model."""
    # Use a smaller model for simple extraction
    response = action_context.get("fast_llm")(Prompt(messages=[
        {"role": "system", "content": "Extract contact information in JSON format."},
        {"role": "user", "content": text}
    ]))
    return json.loads(response)
```

```python
@register_tool(description="Analyze complex technical documentation")
def analyze_technical_doc(action_context: ActionContext, document: str) -> dict:
    """Perform deep analysis of technical documentation."""
    # Use a more capable model for complex analysis
    response = action_context.get("powerful_llm")(Prompt(messages=[
        {"role": "system", "content": "Analyze this technical documentation thoroughly to identify contradictions in the process that could lead to unexpected problems."},
        {"role": "user", "content": document}
    ]))
    return json.loads(response)
```
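One way to make this choice explicit is a small routing table that picks the cheapest model able to handle a task. The sketch below is illustrative only: the tier names mirror the `fast_llm` and `powerful_llm` keys used above, while `ModelTier`, `choose_model`, the capability scores, and the relative cost figures are hypothetical placeholders rather than measured values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelTier:
    name: str             # key looked up in the action context, e.g. "fast_llm"
    capability: int       # hypothetical score: higher handles harder tasks
    relative_cost: float  # hypothetical cost per call, in relative units

# Hypothetical tiers; real values should come from your own measurements.
TIERS = [
    ModelTier("fast_llm", capability=1, relative_cost=1.0),
    ModelTier("powerful_llm", capability=3, relative_cost=20.0),
]

def choose_model(required_capability: int) -> ModelTier:
    """Return the cheapest tier whose capability meets the requirement."""
    candidates = [t for t in TIERS if t.capability >= required_capability]
    if not candidates:
        raise ValueError(f"No model tier supports capability {required_capability}")
    return min(candidates, key=lambda t: t.relative_cost)

print(choose_model(1).name)  # simple extraction
print(choose_model(3).name)  # deep analysis
```

Keeping the decision in one function makes it easy to revisit as models and prices change, instead of hard-coding a model name inside every tool.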

### Action Specificity: Control the Board

In chess, precise positioning limits your opponent’s options. Similarly, specific actions limit the potential for misuse or confusion.

**Too generic—opens up possibilities for misuse:**

```python
@register_tool(description="Modify calendar events")
def update_calendar(action_context: ActionContext, 
                    event_id: str,
                    updates: dict) -> dict:
    """Update any aspect of a calendar event."""
    return calendar.update_event(event_id, updates)
```

**More specific—clear purpose and limited scope:**

```python
@register_tool(description="Reschedule a meeting you own to a new time")
def reschedule_my_meeting(action_context: ActionContext,
                          event_id: str,
                          new_start_time: str,
                          new_duration_minutes: int) -> dict:
    """
    Reschedule a meeting you own to a new time.
    Only works for meetings where you are the organizer.
    """
    # Verify ownership
    event = calendar.get_event(event_id)
    if event.organizer != action_context.get("user_email"):
        raise ValueError("Can only reschedule meetings you organize")
        
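    # Validate new time is in the future. Comparing a timezone-aware value with
    # a naive datetime.now() raises TypeError, so normalize first.
    # Assumes `from datetime import datetime, timezone` and that naive inputs are UTC.
    new_start = datetime.fromisoformat(new_start_time)
    if new_start.tzinfo is None:
        new_start = new_start.replace(tzinfo=timezone.utc)
    if new_start < datetime.now(timezone.utc):
        raise ValueError("Cannot schedule meetings in the past")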
        
    return calendar.update_event_time(
        event_id,
        new_start_time=new_start_time,
        duration_minutes=new_duration_minutes
    )
```

### Token Efficiency: Maximize Every Move

In chess, every move should advance your position. If we prompt for more than we need or provide unnecessary context, we’re wasting tokens. Sometimes, we might need to add tokens to get the appropriate level of reasoning, but we should test and optimize for the most efficient use of tokens.

**Token-inefficient—includes unnecessary context:**

```python
@register_tool(description="Analyze sales data to identify trends and patterns...")
def analyze_sales(action_context: ActionContext, data: str) -> str:
    """
    Analyze sales data to identify trends and patterns.
    It looks at various aspects including:
    - Monthly trends
    - Seasonal patterns
    - Year-over-year growth
    - Product category performance
    - Regional variations
    - Customer segments
    """
    # This prompt asks for unneeded analysis, wasting input and output tokens
    return prompt_llm(action_context, f"""
        Analyze this sales data thoroughly. Consider monthly trends,
        seasonal patterns, year-over-year growth, product categories,
        regional variations, and customer segments. Provide detailed
        insights about all these aspects.
        
        Data: {data}
        
        Please give a comprehensive analysis...
    """)
```

**Token-efficient—focused and precise:**

```python
@register_tool(description="Analyze sales data for key trends")
def analyze_sales(action_context: ActionContext, data: str) -> str:
    """Calculate key sales metrics and identify significant trends."""
    # Focused prompt that uses tokens efficiently
    return prompt_llm(action_context, f"""
        Sales Data: {data}
        1. Calculate YoY growth
        2. Identify top 3 trends
        3. Flag significant anomalies
    """)
```
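To test rather than guess, it can help to compare prompt sizes before and after trimming. The sketch below uses a crude heuristic of roughly four characters per token, which is an assumption and not a real tokenizer; `estimate_tokens` is a hypothetical helper, and a provider's own tokenizer should be used for exact counts. It also dedents the prompt text, since indentation inside triple-quoted strings is sent to the model as well.

```python
import textwrap

def estimate_tokens(text: str) -> int:
    """Rough estimate assuming about 4 characters per token (heuristic only)."""
    return max(1, len(text) // 4)

# Instruction text from the two versions above, without the sales data itself.
verbose_prompt = textwrap.dedent("""
    Analyze this sales data thoroughly. Consider monthly trends,
    seasonal patterns, year-over-year growth, product categories,
    regional variations, and customer segments. Provide detailed
    insights about all these aspects.

    Please give a comprehensive analysis...
""").strip()

focused_prompt = textwrap.dedent("""
    1. Calculate YoY growth
    2. Identify top 3 trends
    3. Flag significant anomalies
""").strip()

for name, prompt in [("verbose", verbose_prompt), ("focused", focused_prompt)]:
    print(f"{name}: ~{estimate_tokens(prompt)} tokens")
```

The input-side saving is only part of the picture: a focused prompt also tends to produce a shorter response, so output tokens are worth measuring in the same way.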


## Environmental Safety for AI Agents

When agents take actions in the real world, we need robust safety mechanisms. Below are patterns for building safe action systems, illustrated with a calendar-coordination example.

---

### Pattern 1: Reversible Actions

When agents interact with the real world, mistakes can happen. Even if an agent makes the right decision based on available information, external circumstances can change and make the action undesirable after it’s taken. **Reversible actions** provide a safety net, allowing us to “undo” operations when things go wrong.

The key to making actions reversible is capturing enough information during execution to fully reverse the operation later. This isn’t always straightforward—e.g., deleting a calendar event is easy to reverse if you saved all event details, but reversing an email that’s already been read requires careful thought about what “reversal” means in your system. Sometimes, the best we can do is send a follow-up correction.

**Structure for reversible actions:**

```python
from datetime import datetime
import json

class ReversibleAction:
    def __init__(self, execute_func, reverse_func):
        self.execute = execute_func
        self.reverse = reverse_func
        self.execution_record = None

    def run(self, **args):
        """Execute action and record how to reverse it."""
        result = self.execute(**args)
        self.execution_record = {
            "args": args,
            "result": result,
            "timestamp": datetime.now().isoformat()
        }
        return result

    def undo(self):
        """Reverse the action using recorded information."""
        if not self.execution_record:
            raise ValueError("No action to reverse")
        return self.reverse(**self.execution_record)

# Example using reversible actions
create_event = ReversibleAction(
    execute_func=calendar.create_event,
    reverse_func=lambda **record: calendar.delete_event(record["result"]["event_id"])
)

send_invite = ReversibleAction(
    execute_func=calendar.send_invite,
    reverse_func=lambda **record: calendar.cancel_invite(record["result"]["invite_id"])
)
```
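Deletion is the case where capturing enough information matters most: the undo step can only restore what was saved beforehand. The following is a minimal sketch using a hypothetical in-memory `events` dictionary in place of a real calendar API; `delete_event` and `restore_event` are illustrative names, not part of any library.

```python
import copy

# Hypothetical in-memory store standing in for a real calendar backend.
events = {
    "evt-1": {
        "title": "Team Meeting",
        "time": "2030-01-07T10:00",
        "attendees": ["a@example.com"],
    }
}

def delete_event(event_id: str) -> dict:
    """Delete an event and return a full snapshot so the deletion can be undone."""
    snapshot = copy.deepcopy(events[event_id])
    del events[event_id]
    return {"event_id": event_id, "snapshot": snapshot}

def restore_event(record: dict) -> None:
    """Reverse a deletion by re-creating the event from its snapshot."""
    events[record["event_id"]] = copy.deepcopy(record["snapshot"])

record = delete_event("evt-1")
print("evt-1" in events)   # the event is gone after deletion
restore_event(record)
print(events["evt-1"]["title"])  # and back after the undo
```

With the `ReversibleAction` class above, this pair could be wired up as `ReversibleAction(execute_func=delete_event, reverse_func=lambda **rec: restore_event(rec["result"]))`, since the snapshot travels in the recorded result.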

If we also record a log of the agent’s actions, we can undo them if needed. The agent itself can be given a **“revert” tool** to undo its most recent action or revert all actions under certain circumstances.
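A minimal sketch of such a log follows. `ActionLog`, `revert_last`, and `revert_all` are hypothetical names chosen for illustration; each entry stores a description plus a zero-argument undo callable, and entries are undone in last-in, first-out order.

```python
from typing import Callable, List, Tuple

class ActionLog:
    """Keeps (description, undo_callable) pairs in execution order."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Callable[[], None]]] = []

    def record(self, description: str, undo: Callable[[], None]) -> None:
        """Remember how to undo an action that has just been executed."""
        self._entries.append((description, undo))

    def revert_last(self) -> str:
        """Undo the most recent action and return its description."""
        if not self._entries:
            raise ValueError("No action to revert")
        description, undo = self._entries.pop()
        undo()
        return description

    def revert_all(self) -> List[str]:
        """Undo every logged action, newest first."""
        reverted = []
        while self._entries:
            reverted.append(self.revert_last())
        return reverted

# Usage with a plain list standing in for external state
items: List[str] = []
log = ActionLog()

items.append("event-1")
log.record("created event-1", lambda: items.remove("event-1"))
items.append("event-2")
log.record("created event-2", lambda: items.remove("event-2"))

print(log.revert_last())
print(log.revert_all())
print(items)
```

Exposing `revert_last` as a tool gives the agent a narrow, predictable way to back out of a mistake, while `revert_all` is better reserved for a supervisor or a failure handler.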

---

### Pattern 2: Transaction Management

Real tasks often require multiple coordinated actions (e.g., check availability → create event → send notifications). **Transaction management** lets us treat these related actions as a single unit that either completely succeeds or completely fails.

Combined with reversible actions, if any step in a multi-step process fails, we can automatically roll back all previous steps. This prevents inconsistent states (e.g., an event exists but notifications weren’t sent, or notifications were sent but the event wasn’t created).

**Simple transaction system for agent loops:**

```python
import uuid

class ActionTransaction:
    def __init__(self):
        self.actions = []
        self.executed = []
        self.committed = False
        self.transaction_id = str(uuid.uuid4())

    def add(self, action: ReversibleAction, **args):
        """Queue an action for execution."""
        if self.committed:
            raise ValueError("Transaction already committed")
        self.actions.append((action, args))

    async def execute(self):
        """Execute all actions in the transaction."""
        try:
            for action, args in self.actions:
                action.run(**args)
                self.executed.append(action)
        except Exception:
            # If any action fails, reverse everything done so far,
            # then re-raise with the original traceback intact
            await self.rollback()
            raise

    async def rollback(self):
        """Reverse all executed actions in reverse order."""
        for action in reversed(self.executed):
            action.undo()
        self.executed = []

    def commit(self):
        """Mark transaction as committed."""
        self.committed = True
```
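The rollback behavior is easier to see in isolation. The sketch below is a stripped-down, synchronous variant written only for illustration: `run_all_or_nothing` and the step functions are hypothetical, and a dictionary stands in for the calendar and mail systems. The second step fails on purpose, so the first step is undone.

```python
from typing import Callable, List, Tuple

Step = Tuple[Callable[[], None], Callable[[], None]]  # (do, undo)

def run_all_or_nothing(steps: List[Step]) -> None:
    """Run each step in order; on failure, undo completed steps in reverse order."""
    done: List[Callable[[], None]] = []
    try:
        for do, undo in steps:
            do()
            done.append(undo)
    except Exception:
        for undo in reversed(done):
            undo()
        raise

# Dictionary standing in for external systems
state = {"event": False, "invites_sent": False}

def make_event() -> None:
    state["event"] = True

def remove_event() -> None:
    state["event"] = False

def send_invites() -> None:
    raise RuntimeError("mail server unavailable")  # simulated failure

def cancel_invites() -> None:
    state["invites_sent"] = False

try:
    run_all_or_nothing([(make_event, remove_event), (send_invites, cancel_invites)])
except RuntimeError as exc:
    print(f"rolled back: {exc}")

print(state)
```

Only steps that completed are undone; the failed step itself is assumed to have left nothing behind, which is worth verifying for each real action.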

---

### Pattern 3: Staged Execution with Review

Sometimes we want more control than just automatic rollback on failure. **Staged execution** allows us to plan a series of actions, review them for safety/correctness, and only then execute them. This is valuable when some actions are less reversible than others or when we want a human (or a more capable AI) to review plans before execution.

The review phase can catch problems that might not be apparent when looking at individual actions (e.g., a valid set of steps that together would overwhelm a team’s schedule or violate policy). It also enables optimization (e.g., finding a more efficient sequence).

**Staging and review environment:**

```python
from typing import List, Dict

class StagedActionEnvironment(Environment):
    def __init__(self):
        self.staged_transactions: Dict[str, ActionTransaction] = {}
        self.llm = None  # High-capability LLM for review

    def stage_actions(self, task_id: str) -> ActionTransaction:
        """Create a new transaction for staging actions."""
        transaction = ActionTransaction()
        self.staged_transactions[task_id] = transaction
        return transaction

    def review_transaction(self, task_id: str) -> bool:
        """Have LLM review staged actions for safety."""
        transaction = self.staged_transactions.get(task_id)
        if not transaction:
            raise ValueError(f"No transaction found for task {task_id}")

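        # Describe staged actions. The wrapped function's name is used because
        # the class name is always "ReversibleAction"; joining the entries
        # avoids pasting a Python list repr into the prompt.
        staged_actions = "\n\n".join(
            f"Action: {getattr(action.execute, '__name__', repr(action.execute))}\nArgs: {args}"
            for action, args in transaction.actions
        )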

        # Pre-human review by a more capable AI to filter issues early
        review_prompt = f"""Review these staged actions for safety:

        Task ID: {task_id}

        Staged Actions:
        {staged_actions}

        Consider:
        1. Are all actions necessary for the task?
        2. Could any action have unintended consequences?
        3. Are the actions in a safe order?
        4. Is there a safer way to achieve the same goal?

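        Should these actions be approved?
        Reply with exactly one word on the first line: APPROVED or REJECTED.
        """

        response = self.llm.generate(review_prompt)
        # Check the verdict at the start of the reply; a substring test would
        # also accept answers such as "not approved"
        return response.strip().upper().startswith("APPROVED")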

# Example usage
async def schedule_team_meeting(env: StagedActionEnvironment, 
                                attendees: List[str],
                                duration: int):
    """Schedule a team meeting with safety checks."""
    task_id = str(uuid.uuid4())
    transaction = env.stage_actions(task_id)

    # Check availability (execute immediately)
    available_slots = calendar.check_availability(attendees, duration)
    if not available_slots:
        return {"error": "No available time slots"}

    best_slot = available_slots[0]

    # Stage the event creation
    transaction.add(create_event,
                    title="Team Meeting",
                    time=best_slot,
                    duration=duration)

    # Draft email (execute immediately)
    email_draft = email.draft_message(
        to=attendees,
        subject="Team Meeting",
        body=f"Team meeting scheduled for {best_slot}"
    )

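    # Stage the email send (send_email is assumed here to be a ReversibleAction
    # wrapping the email-sending call, defined like send_invite earlier)
    transaction.add(send_email, draft_id=email_draft.id)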

    # Review staged actions, then execute if approved
    if env.review_transaction(task_id):
        await transaction.execute()
        transaction.commit()
        return {"status": "scheduled"}
    else:
        return {"status": "rejected"}
```
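Some batch-level problems, such as a plan that would overwhelm a team's schedule, can be caught with plain rules before any reviewer is involved. The sketch below is one hypothetical way to do that: `check_batch_policy`, the two limits, and the `(action_name, args)` representation of staged actions are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical policy limits; tune these to your own rules.
MAX_EVENTS_PER_BATCH = 3
MAX_TOTAL_MINUTES = 240

def check_batch_policy(staged: List[Tuple[str, Dict[str, Any]]]) -> List[str]:
    """Return policy violations for a batch of (action_name, args) pairs."""
    violations: List[str] = []
    events = [args for name, args in staged if name == "create_event"]

    # Each event may be fine on its own; the limits apply to the whole batch
    if len(events) > MAX_EVENTS_PER_BATCH:
        violations.append(
            f"{len(events)} events staged; limit is {MAX_EVENTS_PER_BATCH}"
        )

    total_minutes = sum(args.get("duration", 0) for args in events)
    if total_minutes > MAX_TOTAL_MINUTES:
        violations.append(
            f"{total_minutes} meeting minutes staged; limit is {MAX_TOTAL_MINUTES}"
        )
    return violations

small_batch = [("create_event", {"title": "Team Meeting", "duration": 60})]
large_batch = [
    ("create_event", {"title": f"Sync {i}", "duration": 90}) for i in range(4)
]

print(check_batch_policy(small_batch))
print(check_batch_policy(large_batch))
```

A check like this is cheap and deterministic, so it can run first and leave the LLM or human reviewer to judge the questions that rules cannot express.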

---

### Pattern 4: Single Safe Tool vs. Multiple Risky Tools

Having many small, focused tools follows the Unix philosophy, but for agent safety this can backfire: small tools often lack the context to make safe decisions, and they put the orchestration burden on the agent.

A **single comprehensive tool** can encapsulate mechanics **and** business rules/safety constraints. Instead of hoping the agent checks availability, validates attendees, and handles timezones correctly, we build these checks into one tool. This makes it much harder for the agent to create problematic situations, even if the tool is used in unexpected ways.

**Approach 1: Multiple loosely constrained tools**

```python
from typing import List

@register_tool(description="Create a calendar event")
def create_calendar_event(action_context: ActionContext,
                          title: str,
                          time: str,
                          attendees: List[str]) -> dict:
    """Create a calendar event."""
    return calendar.create_event(title=title, time=time, attendees=attendees)

@register_tool(description="Send email to attendees")
def send_email(action_context: ActionContext,
               to: List[str],
               subject: str,
               body: str) -> dict:
    """Send an email."""
    return email.send(to=to, subject=subject, body=body)

@register_tool(description="Update calendar event")
def update_event(action_context: ActionContext,
                 event_id: str,
                 updates: dict) -> dict:
    """Update any aspect of a calendar event."""
    return calendar.update_event(event_id, updates)
```

**Approach 2: Single comprehensive safe tool**

```python
@register_tool(description="Schedule a team meeting safely")
def schedule_team_meeting(action_context: ActionContext,
                          title: str,
                          description: str,
                          attendees: List[str],
                          duration_minutes: int,
                          timeframe: str = "next_week") -> dict:
    """
    Safely schedule a team meeting with all necessary coordination.

    This tool:
    1. Verifies all attendees are valid
    2. Checks calendar availability
    3. Creates the event at the best available time
    4. Sends appropriate notifications
    5. Handles all error cases
    """
    # Input validation
    if not 15 <= duration_minutes <= 120:
        raise ValueError("Meeting duration must be between 15 and 120 minutes")

    if len(attendees) > 10:
        raise ValueError("Cannot schedule meetings with more than 10 attendees")

    # Verify attendees
    valid_attendees = validate_attendees(attendees)
    if len(valid_attendees) != len(attendees):
        raise ValueError("Some attendees are invalid")

    # Find available times
    available_slots = find_available_times(
        attendees=valid_attendees,
        duration=duration_minutes,
        timeframe=timeframe
    )

    if not available_slots:
        return {"status": "no_availability", "message": "No suitable time slots found"}

    # Create event at best time
    event = calendar.create_event(
        title=title,
        description=description,
        time=available_slots[0],
        duration=duration_minutes,
        attendees=valid_attendees
    )

    # Send notifications
    notifications.send_meeting_scheduled(
        event_id=event.id,
        attendees=valid_attendees
    )

    return {
        "status": "scheduled",
        "event_id": event.id,
        "scheduled_time": available_slots[0]
    }
```

**Why the single comprehensive tool is safer:**

* It enforces proper sequencing of operations
* It includes all necessary validation
* It handles error cases consistently
* It prevents misuse of individual operations

With the multiple-tool approach, an agent might:

* Create an event without checking availability
* Send emails without creating the event
* Update events without notifying attendees
* Exceed reasonable size limits

The single tool encapsulates the business logic of what makes a **safe** meeting-scheduling operation, making it much harder for the agent to create problematic situations.
