<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/081_PlanFirst.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**This notebook introduces a new capability pattern — “Plan First” — that changes *how* an agent approaches a task before doing anything else.**

---

### **Key Ideas**

1. **Strategic Thinking Before Action**

   * Instead of immediately running tools, the agent is *forced* to produce a detailed plan first.
   * This mirrors how humans often work best — think, plan, then act — rather than jumping in blind.

2. **Memory-Driven Planning**

   * The plan is stored in the agent’s memory as a *system message*.
   * This means it becomes part of the agent’s ongoing decision-making context for every subsequent step.

3. **Plan Generation Logic**

   * The `create_plan` tool builds a structured prompt that:

     * Pulls in *available tools* and their descriptions.
     * Retrieves *task-relevant memory*.
     * Guides the LLM through a structured, step-by-step reasoning format.
   * This ensures the plan is specific, actionable, and tied to available capabilities.

4. **Capability Hook — `init()`**

   * The `PlanFirstCapability` runs its planning step in the **initialization phase** of the agent loop (`init` method).
   * This happens before *any* action execution starts.

5. **Keeps the Agent On Track**

   * Once the plan is in memory, the agent can refer back to it to avoid scope creep or tool misuse.
   * In longer tasks, this helps the agent “stay strategic” instead of drifting.

---

### **Why It’s Important**

* Adds **strategic discipline** to agent workflows.
* Improves **task consistency** by making the agent commit to a structure.
* Enhances **transparency**: the plan is written to memory up front, so humans can inspect it, and a review step can require approval before the agent starts executing.
* Makes agents **more explainable**, because you can trace decisions back to the original plan.





## 1) The planning tool: returns a plan string

```python
@register_tool(tags=["planning"])
def create_plan(action_context: ActionContext,
                memory: Memory,
                action_registry: ActionRegistry) -> str:
    """Create a detailed execution plan based on the task and available tools."""

    # Get tool descriptions for the prompt
    tool_descriptions = "\n".join(
        f"- {action.name}: {action.description}"
        for action in action_registry.get_actions()
    )

    # Get relevant memory content (user + system only)
    memory_content = "\n".join(
        f"{m['type']}: {m['content']}"
        for m in memory.items          # <-- fixed: was _memory
        if m.get('type') in ['user', 'system']
    )

    # Construct the prompt
    prompt = f"""Given the task in memory and the available tools, create a detailed plan.
Think through this step by step:

1. First, identify the key components of the task
2. Consider what tools you have available
3. Break down the task into logical steps
4. For each step, specify:
   - What needs to be done
   - What tool(s) will be used
   - What information is needed
   - What the expected outcome is

Write your plan in clear, numbered steps. Each step should be specific and actionable.

Available tools:
{tool_descriptions}

Task context from memory:
{memory_content}

Create a plan that accomplishes this task effectively."""

    return prompt_llm(action_context=action_context, prompt=prompt)
```



## 2) Example of the **plan text** the tool might return

*(This is just sample output, not code that runs.)*

```text
Plan for Sales Data Analysis:

1. Data Validation
   - Tool: validate_data()
   - Check data completeness and format
   - Ensure all required fields are present
   - Expected: Confirmation of valid dataset

2. Initial Analysis
   - Tool: analyze_data()
   - Calculate key metrics (revenue, growth)
   - Generate summary statistics
   - Expected: Basic statistical overview

3. Trend Identification
   - Tool: find_patterns()
   - Look for seasonal patterns
   - Identify sales trends
   - Expected: List of significant trends

4. Visualization
   - Tool: create_visualization()
   - Create relevant charts
   - Highlight key findings
   - Expected: Clear visual representations

5. Report Generation
   - Tool: generate_report()
   - Compile findings
   - Include visualizations
   - Expected: Comprehensive report

I'll begin with step 1: Data Validation...
```

## 3) What the capability does with that plan

`PlanFirstCapability` calls `create_plan(...)` in `init()` and writes the returned plan into memory as a system directive. Nothing else is needed here unless you want to add a review/revision loop.

## 4) Agent usage: enable the capability and run

```python
agent = Agent(
    goals=[
        Goal(name="analysis", description="Analyze sales data and create a report")
    ],
    capabilities=[
        PlanFirstCapability(track_progress=True)
    ],
    # ... other agent configuration: language, registry, env, llm, etc.
)

result = agent.run("Analyze our Q4 sales data and create a report")
```

# Notes / gotchas to focus on

* **Bug fix:** use `memory.items` (not `_memory.items`).
* **Scope control:** the planning tool only pulls **user/system** memories so the plan isn’t polluted by random assistant chatter.
* **Tool awareness:** `tool_descriptions` is injected so the plan explicitly maps steps → tools (great for traceability and later automation).
* **Separation of concerns:** the *tool* returns a string plan; the *capability* decides how/where to store and enforce it.
* **Extend later:** you can add a review/revise loop to harden plans before execution.
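
The scope-control and tool-awareness points can be tried out without the agent framework. The snippet below is a minimal sketch, assuming memory items are plain dicts with `type` and `content` keys (as `create_plan` reads them). `relevant_memory` and `format_tools` are hypothetical helper names used only for illustration, and the sample data is made up.

```python
def relevant_memory(items, allowed_types=("user", "system")):
    """Render only the memory entries whose type is in allowed_types."""
    return "\n".join(
        f"{m['type']}: {m['content']}"
        for m in items
        if m.get("type") in allowed_types
    )


def format_tools(tools):
    """Render (name, description) pairs as a bulleted list for the prompt."""
    return "\n".join(f"- {name}: {description}" for name, description in tools)


# Hypothetical sample data, for illustration only.
items = [
    {"type": "user", "content": "Analyze our Q4 sales data and create a report"},
    {"type": "assistant", "content": "Sure, let me think about that..."},
    {"type": "system", "content": "Reports must be written in plain English."},
]
tools = [
    ("validate_data", "Check completeness and format"),
    ("generate_report", "Compile findings into a report"),
]

memory_text = relevant_memory(items)  # the assistant entry is filtered out
tool_text = format_tools(tools)
print(memory_text)
print(tool_text)
```

Keeping the filter in one small function makes it easy to widen or narrow the allowed types later without touching the prompt template.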


# Plan First: A New Capability

One key to making agents more effective is getting them to think strategically before taking action.  
Instead of jumping straight into executing tools, we want our agent to first develop a comprehensive plan.  
Let’s build a capability that enforces this “plan first” approach.

---

## The Plan First Pattern

Here’s how we’ll make our agent plan before acting:

1. When the agent first starts, we’ll prompt it to create a detailed plan
2. We’ll store this plan in the agent’s memory
3. The agent will refer back to this plan throughout its execution

---

## Implementation

```python
class PlanFirstCapability(Capability):
    def __init__(self, plan_memory_type="system", track_progress=False):
        super().__init__(
            name="Plan First Capability",
            description="The Agent will always create a plan and add it to memory"
        )
        self.plan_memory_type = plan_memory_type
        self.first_call = True
        self.track_progress = track_progress

    def init(self, agent, action_context):
        if self.first_call:
            self.first_call = False
            plan = create_plan(
                action_context=action_context,
                memory=action_context.get_memory(),
                action_registry=action_context.get_action_registry()
            )

            action_context.get_memory().add_memory({
                "type": self.plan_memory_type,
                "content": "You must follow these instructions carefully to complete the task:\n" + plan
            })
```

At this level of abstraction, building agents is really about:

1. **Defining intent in plain language**

   * The plan prompts, goal descriptions, and capability instructions are just highly structured English.
   * You’re telling the model *what to do* and *how to think about it*, not how to code it line-by-line.

2. **Structuring the workflow**

   * Instead of dumping everything in one mega-prompt, you break the process into **capabilities** (plan, revise, act) and **tools** (data fetch, transform, visualize).
   * Each piece is reusable, swappable, and easy to test.

3. **Controlling context & dependencies**

   * ActionContext and the Environment give the agent just the right “stuff” for the job — not too much, not too little.
   * This is how you keep it safe, modular, and predictable.

It’s a bit like working with a very talented assistant:

* You don’t tell them every keystroke; you give them the plan, the resources, and the rules of engagement.
* The system handles *how* the steps get executed, and you just decide *what* the steps should be.




## Big picture

`create_plan` and `PlanFirstCapability` play different roles:

* **`create_plan` (tool)** = *What to produce.*
  A stateless, single-purpose function that **generates** a plan (plain text) using the LLM, given the current memory + available tools.

* **`PlanFirstCapability` (capability)** = *When and how to use it.*
  A lifecycle hook that **enforces planning behavior** in the agent loop: it **calls** `create_plan` at the right time (once at startup) and **stores** the plan in memory. The `track_progress` flag is stored but not used in the code shown here; it is a placeholder for later extensions that **track progress** or **gate** actions based on the plan.

---

## What to focus on

### 1) Separation of concerns (When/How vs What)

* **Tool (`create_plan`)** focuses on *content generation* only. It’s easy to test, reuse, and swap out.
* **Capability (`PlanFirstCapability`)** focuses on *policy and orchestration*: ensuring a plan exists, where it’s stored, and how it’s used.

### 2) Lifecycle integration

* `PlanFirstCapability.init(...)` runs **once** at the start of the agent run. That’s where it:

  * Calls `create_plan(...)`
  * Writes the plan into memory (type = `"system"` by default via `plan_memory_type`)
  * Can set flags for later enforcement (e.g., `track_progress`)
* Later, the capability could use other hooks (`process_prompt`, `process_action`, `should_terminate`, etc.) to:

  * Remind the agent to follow the plan
  * Prevent off-plan actions or trigger **re-planning** if context changes

### 3) Dependency injection & context access

* **Tool** signature explicitly asks for `action_context`, `memory`, and `action_registry`, so its dependencies are explicit and easy to replace with fakes in tests (the LLM call itself still goes through `action_context`).
  It builds the plan from:

  * **`action_registry.get_actions()`** → list available tools
  * **`memory.items`** → pull relevant user/system context
* **Capability** uses `action_context.get_memory()` and `action_context.get_action_registry()` to fetch what it needs, without hard-coding any globals.
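
Because every dependency arrives as an argument, the planning logic can be exercised with stand-ins. The sketch below assumes a simplified planner, `plan_with`, that takes the LLM as a plain callable instead of going through `action_context`. `FakeAction`, `FakeRegistry`, `FakeMemory`, and `stub_llm` are hypothetical test doubles that mimic only the attributes `create_plan` reads (`items`, `get_actions()`, `name`, `description`).

```python
from dataclasses import dataclass, field


@dataclass
class FakeAction:
    name: str
    description: str


@dataclass
class FakeRegistry:
    actions: list = field(default_factory=list)

    def get_actions(self):
        return self.actions


@dataclass
class FakeMemory:
    items: list = field(default_factory=list)


def plan_with(llm, memory, action_registry):
    """Simplified stand-in for create_plan: build a prompt, then call the injected llm."""
    tool_descriptions = "\n".join(
        f"- {a.name}: {a.description}" for a in action_registry.get_actions()
    )
    memory_content = "\n".join(
        f"{m['type']}: {m['content']}"
        for m in memory.items
        if m.get("type") in ("user", "system")
    )
    prompt = (
        "Create a detailed, numbered plan.\n\n"
        f"Available tools:\n{tool_descriptions}\n\n"
        f"Task context from memory:\n{memory_content}"
    )
    return llm(prompt)


seen_prompts = []


def stub_llm(prompt):
    """Record the prompt and return a canned plan instead of calling a model."""
    seen_prompts.append(prompt)
    return "1. Validate the data\n2. Generate the report"


plan = plan_with(
    stub_llm,
    FakeMemory(items=[{"type": "user", "content": "Analyze Q4 sales"}]),
    FakeRegistry(actions=[FakeAction("validate_data", "Check the dataset")]),
)
print(plan)
```

Recording the prompt in `seen_prompts` lets a test check what the planner would have sent to the model, without any network call.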

### 4) Reusability & portability

* You can reuse `create_plan` in other agents (or even call it manually in a script).
* You can attach `PlanFirstCapability` to any agent that should **always** plan first—no changes to the agent core.

### 5) Configuration points

* `plan_memory_type="system"` controls the memory type the plan is saved under (e.g., `"system"` vs `"assistant"`).
* `track_progress=True/False` gives you a hook for future extensions (progress ticks, plan adherence checks, re-plan triggers).

---

## Mental model (flow)

1. **Agent starts** → `PlanFirstCapability.init(...)` fires.
2. Capability **calls** `create_plan(...)`.
3. Capability **stores** the returned plan in memory (as a system instruction like “You must follow these instructions carefully…”).
4. Agent continues; capability can **enforce** adherence across the loop.
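
The first three steps of this flow can be simulated in a few lines. This is a minimal sketch, not the framework's agent: `MiniMemory`, `MiniPlanFirst`, `run_agent`, and `canned_planner` are hypothetical names, the planner is injected so no LLM is required, and `init()` takes only the memory object rather than the real `(agent, action_context)` pair.

```python
class MiniMemory:
    """Hypothetical stand-in for the agent's memory: a list of dict entries."""

    def __init__(self):
        self.items = []

    def add_memory(self, entry):
        self.items.append(entry)


class MiniPlanFirst:
    """Mirrors the plan-first flow; the planner is injected so no LLM is needed."""

    def __init__(self, planner, plan_memory_type="system"):
        self.planner = planner
        self.plan_memory_type = plan_memory_type
        self.first_call = True

    def init(self, memory):
        if not self.first_call:
            return  # the plan is created only once per capability instance
        self.first_call = False
        plan = self.planner(memory)
        memory.add_memory({
            "type": self.plan_memory_type,
            "content": "You must follow these instructions carefully to complete the task:\n" + plan,
        })


def run_agent(task, capabilities):
    """Seed memory with the task, then fire init() on each capability."""
    memory = MiniMemory()
    memory.add_memory({"type": "user", "content": task})
    for capability in capabilities:
        capability.init(memory)
    # A real agent loop would start here, with the plan already in memory.
    return memory


def canned_planner(memory):
    """Return a fixed plan; a real planner would prompt the LLM."""
    return "1. Validate data\n2. Write report"


capability = MiniPlanFirst(planner=canned_planner)
memory = run_agent("Analyze our Q4 sales data", [capability])
capability.init(memory)  # a second call is a no-op thanks to first_call
for item in memory.items:
    print(item["type"], "->", item["content"].splitlines()[0])
```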

---

## Practical tips

* **Test `create_plan`** in isolation with fake memory/tool lists; it should return a well-structured plan string.
* **Evolve the capability** to:

  * Add a `process_action` check: block actions that aren’t in plan scope.
  * Add a `process_result` hook: mark plan steps as complete.
  * Add a `should_terminate` rule: stop when all planned steps are done.
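
One way these three extensions might fit together is sketched below. It assumes the plan text uses lines like `- Tool: validate_data()` (as in the sample plan output) and that each planned tool is used once. `PlanTracker` and `planned_tools` are hypothetical helpers; the method names echo the hooks, but the signatures are simplified and are not the framework's.

```python
import re


def planned_tools(plan_text):
    """Extract tool names from lines such as '- Tool: validate_data()'."""
    return re.findall(r"Tool:\s*([A-Za-z_]\w*)", plan_text)


class PlanTracker:
    """Hypothetical helper a capability could delegate to from its hooks."""

    def __init__(self, plan_text):
        self.pending = planned_tools(plan_text)
        self.completed = []

    def process_action(self, tool_name):
        """Allow only tools named in the plan; raise for anything off-plan."""
        if tool_name not in self.pending and tool_name not in self.completed:
            raise ValueError(f"'{tool_name}' is not part of the plan")
        return tool_name

    def process_result(self, tool_name):
        """Mark the step that used this tool as complete."""
        if tool_name in self.pending:
            self.pending.remove(tool_name)
            self.completed.append(tool_name)

    def should_terminate(self):
        """Stop once every planned step has been completed."""
        return not self.pending


plan_text = """1. Data Validation
   - Tool: validate_data()
2. Report Generation
   - Tool: generate_report()
"""

tracker = PlanTracker(plan_text)

# An off-plan tool is rejected before it can run.
blocked = False
try:
    tracker.process_action("delete_database")
except ValueError:
    blocked = True

finished_early = tracker.should_terminate()  # nothing has run yet

for tool in ["validate_data", "generate_report"]:
    tracker.process_action(tool)
    tracker.process_result(tool)

print(blocked, finished_early, tracker.should_terminate())
```

Parsing tool names out of free text is fragile; asking the LLM for a structured plan (for example JSON) would make this check more reliable.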

In short: **`create_plan`** is the **planner**; **`PlanFirstCapability`** is the **project manager** making sure the planner’s output is created, saved, and followed.


You can turn “Plan First” into “Plan → Review → Revise → Execute.” Two easy ways to add this:

# Option A: Extend the existing capability

Add a lightweight review loop inside `PlanFirstCapability.init()` before committing the plan to memory.

```python
class PlanFirstCapability(Capability):
    def __init__(self, track_progress=False, max_plan_revisions=2, reviewer="ai"):
        super().__init__("Plan First Capability", "Draft, review, and finalize a plan before acting")
        self.track_progress = track_progress
        self.max_plan_revisions = max_plan_revisions
        self.reviewer = reviewer  # "ai" or "human"

    def init(self, agent, action_context):
        memory = action_context.get_memory()
        registry = action_context.get_action_registry()

        plan = create_plan(action_context=action_context, memory=memory, action_registry=registry)

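        # --- review loop: at most max_plan_revisions critique/revise rounds ---
        # Note: self.reviewer is stored but not used yet; only AI review is implemented here.
        for _ in range(self.max_plan_revisions):
            critique = critique_plan(action_context, plan)  # tool below
            if critique["approved"]:
                break
            plan = revise_plan(action_context, plan, critique["issues"])  # tool below

        # Store the final plan, even if the last revision was never approved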
        memory.add_memory({
            "type": "system",
            "content": "You must follow these instructions carefully:\n" + plan
        })
```

Two tiny tools to support it:

```python
@register_tool(tags=["planning"])
def critique_plan(action_context: ActionContext, plan: str) -> dict:
    """Return {'approved': bool, 'issues': [..], 'risk': 'low/med/high'}."""
    # JSON schema the LLM response must follow; 'risk' is optional.
    schema = {
        "type": "object",
        "properties": {
            "approved": {"type": "boolean"},
            "issues": {"type": "array", "items": {"type": "string"}},
            "risk": {"type": "string"},
        },
        "required": ["approved", "issues"],
    }
    prompt = f"Critique this execution plan for clarity, feasibility, risks:\n\n{plan}\n\nDecide if it's executable as-is."
    return prompt_llm_for_json(action_context, schema, prompt)

@register_tool(tags=["planning"])
def revise_plan(action_context: ActionContext, plan: str, issues: list[str]) -> str:
    """Return an improved plan addressing issues."""
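    # Render the issues as a bulleted list rather than a raw Python list repr
    issue_list = "\n".join(f"- {issue}" for issue in issues)
    prompt = f"Revise the plan below to address these issues:\n{issue_list}\n\nKeep numbered, actionable steps.\n\n{plan}"
    # Same helper as create_plan, so every planning tool calls the LLM the same way
    return prompt_llm(action_context=action_context, prompt=prompt)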
```

# Option B: Separate “PlanReviewCapability”

Keep concerns clean: one capability drafts; another reviews/revises (and can be toggled off in low-risk contexts).

```python
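# Shared prefix of the header PlanFirstCapability writes; list this capability after it
PLAN_MARKER = "You must follow these instructions carefully"

class PlanReviewCapability(Capability):
    def __init__(self, max_revisions=2):
        super().__init__("Plan Review Capability", "Critique and revise the stored plan before acting")
        self.max_revisions = max_revisions

    def init(self, agent, action_context):
        mem = action_context.get_memory()
        # Find the most recent memory entry that holds a plan
        plan_msg = next(
            (m for m in reversed(mem.items) if PLAN_MARKER in m.get("content", "")),
            None,
        )
        if not plan_msg:
            return
        # Split the header line from the plan body (works for both header variants)
        header, plan = plan_msg["content"].split(":\n", 1)

        for _ in range(self.max_revisions):
            crit = critique_plan(action_context, plan)
            if crit["approved"]:
                break
            plan = revise_plan(action_context, plan, crit["issues"])

        # Overwrite the stored plan in place, keeping the original header
        plan_msg["content"] = header + ":\n" + plan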

# Nice refinements

* **Checklist-based review:** encode explicit criteria (scope, inputs, outputs, safety, tool fit, acceptance tests).
* **Risk gates:** auto-approve when risk = low; require human approval if risk = high.
* **Versioning:** store `plan_v1`, `plan_v2`, … in memory for traceability.
* **Staged execution:** pair with your staged-execution pattern—plan is approved, then actions are staged and reviewed before execution.
* **Iteration guardrails:** `max_plan_revisions`, time budget, and a fallback (“execute with caution” or escalate to human).
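
Several of these refinements (risk gates, versioning, iteration guardrails) can be combined in one bounded loop. The sketch below is an illustration under stated assumptions: `review_plan` is a hypothetical function, the critique and revise steps are passed in as plain callables standing in for `critique_plan` and `revise_plan`, and the decision labels (`auto_approved`, `needs_human_approval`, `escalate_to_human`) are made up for this example.

```python
def review_plan(plan, critique, revise, max_revisions=2):
    """Run a bounded critique/revise loop.

    critique(plan) -> {"approved": bool, "issues": [...], "risk": "low" | "med" | "high"}
    revise(plan, issues) -> str
    Returns (final_plan, versions, decision).
    """
    versions = [plan]  # plan_v1, plan_v2, ... kept for traceability
    for _ in range(max_revisions + 1):
        result = critique(plan)
        if result["approved"]:
            # Risk gate: only low-risk plans are auto-approved.
            # A missing risk rating is treated as high, to stay conservative.
            if result.get("risk", "high") == "low":
                return plan, versions, "auto_approved"
            return plan, versions, "needs_human_approval"
        if len(versions) > max_revisions:
            break  # revision budget exhausted
        plan = revise(plan, result["issues"])
        versions.append(plan)
    return plan, versions, "escalate_to_human"


def stub_critique(plan):
    """Hypothetical reviewer: approves once the plan names a validation step."""
    if "validate" in plan.lower():
        return {"approved": True, "issues": [], "risk": "low"}
    return {"approved": False, "issues": ["No validation step"], "risk": "med"}


def stub_revise(plan, issues):
    """Hypothetical reviser: prepends the missing validation step."""
    return "1. Validate the data\n" + plan


final_plan, versions, decision = review_plan(
    "2. Generate the report", stub_critique, stub_revise
)
print(decision, len(versions))

# A reviewer that never approves exhausts the budget and escalates.
_, stuck_versions, stuck_decision = review_plan(
    "x",
    lambda plan: {"approved": False, "issues": ["too vague"]},
    lambda plan, issues: plan + "!",
)
print(stuck_decision, stuck_versions)
```

Returning the full `versions` list alongside the decision keeps the critique trail available for whoever has to approve or debug the plan.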

# Why this helps

* Better **quality** (forces structure and tool-awareness).
* Higher **safety** (explicit risks and acceptance checks).
* Stronger **explainability** (plan + critique trail).
* Still **modular** (can turn review on/off per task, or swap the reviewer persona).




# Making agents **capability-driven**

---

### 1. **Avoid modifying core agent logic**

If you want an agent to gain time-awareness, planning, logging, or any other behavior:

* In a *non-capability* setup → you’d open `Agent` and hardcode that new logic.
* In this setup → you write a **new capability** and pass it into the `Agent` constructor.

The Agent loop stays **exactly the same** — it just calls each capability hook at the right time.

---

### 2. **Mix and match behaviors like LEGO pieces**

Capabilities are **modular units**:

* `TimeAwareCapability()`
* `PlanFirstCapability()`
* `LoggingCapability()`

Want an agent that logs and plans? Just include both in the list:

```python
capabilities=[LoggingCapability(), PlanFirstCapability()]
```

Want one that plans first but doesn’t log? Remove the logging piece — nothing else breaks.

---

### 3. **Promote reusability across agents**

Because a capability doesn’t depend on *a specific agent*, you can plug it into **any** agent that follows the same lifecycle:

* Your "data analysis" agent
* Your "customer support" agent
* Your "marketing copywriter" agent

All can use `PlanFirstCapability()` without rewriting it.

---

### 4. **Reduce bugs and side effects**

When you change a capability, you’re only touching *that one behavior*.
Since it’s not woven deep into the Agent code, you don’t risk breaking the Agent’s orchestration loop.

---

### 5. **Scale with less pain**

As your system grows:

* You’ll have dozens of agents.
* Each needs slightly different “addons.”
* Instead of cloning agents or branching code, you just build a library of capabilities and snap them together.

---

It’s the same reason big web frameworks (like Flask/Django or Express.js) use **middleware** — each layer is independent, but together they form the complete request/response cycle.



## What are Hooks?

In this context, **hooks** are *predefined points in the agent’s lifecycle* where you can “hook” in extra behavior without modifying the core code.

Think of them as **event listeners** or **extension points** — they’re places where the agent says:

> “Hey, I just finished this step — does any capability want to do something now?”

---

### In our agent lifecycle, hooks look like this:

Each method in `Capability` **is a hook**:

| Hook (Method)            | When it runs                  | Example use                       |
| ------------------------ | ----------------------------- | --------------------------------- |
| `init()`                 | Once when the agent starts    | Add initial memory, set time zone |
| `start_agent_loop()`     | Start of each loop            | Check progress, reset variables   |
| `process_prompt()`       | Before sending prompt to LLM  | Add current time, insert plan     |
| `process_response()`     | After LLM responds            | Sanitize or validate response     |
| `process_action()`       | Before executing a tool       | Add metadata, enforce rules       |
| `process_result()`       | After executing a tool        | Transform output, log results     |
| `process_new_memories()` | When new memories are created | Filter or enrich memory           |
| `end_agent_loop()`       | After each loop finishes      | Log summary, cleanup              |
| `should_terminate()`     | Before next loop starts       | Decide to stop                    |
| `terminate()`            | When agent stops              | Final cleanup                     |

---

### Why hooks are powerful

* **Separation of concerns** → The Agent doesn’t care *what* happens in each hook, it just calls them in order.
* **Plug-and-play** → You can add or remove behaviors just by adding/removing capabilities.
* **No core changes** → Hooks are always there, even if no capability uses them.
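
To make the idea concrete, here is a toy loop that calls a subset of the hooks in the order given in the table. It is a sketch under simplifying assumptions: hooks receive a plain `state` dict instead of the real agent and action context, the LLM call is replaced by a string echo, and `process_action`, `process_result`, and `process_new_memories` are left out because the toy has no tools or memory. `RecordingCapability`, `StopAfter`, and `run` are hypothetical names.

```python
class Capability:
    """Base class whose hooks do nothing by default (simplified signatures)."""

    def init(self, state):
        pass

    def start_agent_loop(self, state):
        pass

    def process_prompt(self, state, prompt):
        return prompt

    def process_response(self, state, response):
        return response

    def end_agent_loop(self, state):
        pass

    def should_terminate(self, state):
        return False

    def terminate(self, state):
        pass


class RecordingCapability(Capability):
    """Records which hooks fired, in order."""

    def __init__(self):
        self.events = []

    def init(self, state):
        self.events.append("init")

    def start_agent_loop(self, state):
        self.events.append("start_agent_loop")

    def process_prompt(self, state, prompt):
        self.events.append("process_prompt")
        return prompt

    def end_agent_loop(self, state):
        self.events.append("end_agent_loop")

    def terminate(self, state):
        self.events.append("terminate")


class StopAfter(Capability):
    """Asks the loop to stop once a given number of iterations has run."""

    def __init__(self, limit):
        self.limit = limit

    def should_terminate(self, state):
        return state["iteration"] >= self.limit


def run(capabilities, max_iterations=5):
    """Toy agent loop: it only knows when to call hooks, not what they do."""
    state = {"iteration": 0}
    for c in capabilities:
        c.init(state)
    while state["iteration"] < max_iterations:
        state["iteration"] += 1
        for c in capabilities:
            c.start_agent_loop(state)
        prompt = "What next?"
        for c in capabilities:
            prompt = c.process_prompt(state, prompt)
        response = f"echo: {prompt}"  # stand-in for an LLM call
        for c in capabilities:
            response = c.process_response(state, response)
        for c in capabilities:
            c.end_agent_loop(state)
        if any(c.should_terminate(state) for c in capabilities):
            break
    for c in capabilities:
        c.terminate(state)
    return state


recorder = RecordingCapability()
final_state = run([recorder, StopAfter(2)])
print(recorder.events)
```

Removing `StopAfter(2)` from the list changes when the loop ends but requires no change to `run`, which is the plug-and-play property described above.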

---

💡 **Analogy:**
Imagine the Agent is a cooking robot.
Hooks are like “pauses” where you can step in and do something:

* After it chops vegetables → you could taste-test.
* Before it plates the dish → you could sprinkle garnish.
* After it serves → you could take a photo.

You’re not rewriting the robot — you’re just hooking in at the right spots.

