Mid-task plan revision via revise_plan tool #439

@lmorchard

Current state

Pilo plans once at task start via planTask (webAgent.ts:1377-1477). The planner LLM returns:

{
  successCriteria: string;
  plan: string;          // Markdown step-by-step plan
  url?: string;
  actionItems?: string[];
}

These are stored on the WebAgent instance:

private plan: string;
private successCriteria: string;
private actionItems?: string[];
private url: string;

The plan is embedded permanently into messages[1] (the first user message after the system prompt) via buildTaskAndPlanPrompt. The agent reads this once, then proceeds through iterations. There is no mechanism to revise the plan mid-task.
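The current behaviour can be pictured with a minimal sketch. The `Message` shape, the `initialMessages` helper, and the body of `buildTaskAndPlanPrompt` below are simplified stand-ins assumed for illustration, not the actual code in webAgent.ts.

```typescript
// Hypothetical, simplified shapes; the real types live in webAgent.ts.
type Message = { role: "system" | "user" | "assistant"; content: string };

interface PlannerOutput {
  successCriteria: string;
  plan: string; // Markdown step-by-step plan
  url?: string;
  actionItems?: string[];
}

// Stand-in for buildTaskAndPlanPrompt; the real signature and wording may differ.
function buildTaskAndPlanPrompt(task: string, planned: PlannerOutput): string {
  return [
    `Task: ${task}`,
    `Plan:\n${planned.plan}`,
    `Success criteria: ${planned.successCriteria}`,
  ].join("\n\n");
}

// Hypothetical helper showing where the plan ends up in the conversation.
function initialMessages(
  systemPrompt: string,
  task: string,
  planned: PlannerOutput,
): Message[] {
  return [
    { role: "system", content: systemPrompt },
    // messages[1]: written once at task start and never rewritten afterwards.
    { role: "user", content: buildTaskAndPlanPrompt(task, planned) },
  ];
}
```

Every later iteration only appends to this array, so whatever plan text lands in messages[1] stays there for the whole task.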

The gap

Common patterns this breaks:

  1. The plan turns out to be wrong: for example, the planner assumed the user could be reached at example.com, but the site has moved. The original plan stays in messages[1] for the rest of the task even though it is misleading.
  2. A new constraint emerges mid-task: the agent discovers a CAPTCHA, a paywall, or a region block that the plan did not account for. The agent works around it reactively without updating the canonical plan it is working from.
  3. Long tasks accumulate hidden context: by iteration 20, the model has learned many facts about the page and task that the stale plan does not reflect. messages[1] still holds the original plan; everything learned since is only implicit in the conversation history.

Compounding this: the actionItems array (3-6 word UI labels for plan steps) is set once and never updated. UI consumers showing progress see the original plan stage labels even when the agent has substantially deviated.

Proposed scope

A. Add a revise_plan tool (gated)

revise_plan: tool({
  description:
    "Update your task plan when your understanding of the task has materially changed " +
    "(e.g., a constraint emerged, the original approach won't work, or you discovered " +
    "a better path). Provide the revised plan as Markdown. Use sparingly — only when " +
    "the original plan is misleading or incomplete.",
  inputSchema: z.object({
    revisedPlan: z.string().describe("The updated plan as Markdown"),
    reason: z.string().describe("Brief explanation of why the plan needed revision"),
    revisedActionItems: z.array(z.string()).optional()
      .describe("Updated 3-6-word action item labels"),
  }),
  execute: async ({ revisedPlan, reason, revisedActionItems }) => {
    // Update agent-instance state
    // Emit PLAN_REVISED event
    return {
      success: true,
      action: "revise_plan",
      revisedPlan,
      reason,
      revisedActionItems,
    };
  },
}),

Gated on a config flag, WebAgentOptions.enableReplanning?: boolean (default false). The feature stays off by default because it adds complexity and may not be worth it for every task.
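One way the gating could look is sketched below. Only `enableReplanning` and the `revise_plan` name come from this proposal; `ToolSet`, `buildToolSet`, and the shortened description string are hypothetical placeholders for however the agent actually assembles its tools.

```typescript
// Only enableReplanning is taken from the proposal; other options are omitted.
interface WebAgentOptions {
  enableReplanning?: boolean;
}

// Hypothetical, heavily simplified tool registry shape.
type ToolSet = Record<string, { description: string }>;

// Hypothetical helper: returns the base tools, plus revise_plan when enabled.
function buildToolSet(base: ToolSet, options: WebAgentOptions): ToolSet {
  // Strict comparison keeps the feature off when the flag is undefined.
  if (options.enableReplanning !== true) {
    return { ...base };
  }
  return {
    ...base,
    revise_plan: {
      description: "Update your task plan when your understanding has materially changed.",
    },
  };
}
```

Leaving the tool out of the set entirely, rather than registering it and rejecting calls, means a model running with the flag off never sees `revise_plan` in its tool list.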

B. Plan-update propagation

When revise_plan is called, update the agent's instance state (this.plan, this.actionItems) and append a system-message-style note to messages:

[Plan revised at iteration N]
Reason: {reason}
Updated plan:
{revisedPlan}

This makes the revised plan visible to subsequent turns. Do not modify messages[1] directly — leave the original plan as the historical anchor so the conversation history stays coherent.
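A minimal sketch of this propagation step follows. The `Message` and `PlanState` shapes and the `applyPlanRevision` helper are assumptions for illustration, and appending the note with the `user` role is also an assumption, since the proposal only says "system-message-style".

```typescript
// Hypothetical, simplified shapes for illustration only.
type Message = { role: "system" | "user" | "assistant"; content: string };

interface PlanState {
  plan: string;
  actionItems?: string[];
}

interface PlanRevision {
  revisedPlan: string;
  reason: string;
  revisedActionItems?: string[];
}

// Formats the note using the template from the proposal.
function formatRevisionNote(iteration: number, revision: PlanRevision): string {
  return [
    `[Plan revised at iteration ${iteration}]`,
    `Reason: ${revision.reason}`,
    "Updated plan:",
    revision.revisedPlan,
  ].join("\n");
}

// Updates instance state and appends the note; messages[1] is never touched.
function applyPlanRevision(
  state: PlanState,
  messages: Message[],
  iteration: number,
  revision: PlanRevision,
): void {
  state.plan = revision.revisedPlan;
  // Assumption: keep the old labels when the model supplies no new ones.
  if (revision.revisedActionItems !== undefined) {
    state.actionItems = revision.revisedActionItems;
  }
  // Assumption: the note is appended as a user-role message.
  messages.push({ role: "user", content: formatRevisionNote(iteration, revision) });
}
```

Because the helper only appends, earlier messages keep their indices and content, which is what keeps the original plan available as the historical anchor.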

C. Emit PLAN_REVISED event

PLAN_REVISED: {
  iterationId: string;
  iteration: number;
  reason: string;
  newPlan: string;
  newActionItems?: string[];
}

UI consumers (CLI progress display, extension popup) can re-render the action items list.
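To illustrate the consumer side, here is a small sketch with a stand-in emitter. The payload fields are the ones listed above; `PlanEvents` and `ActionItemView` are hypothetical names, and the real event system in packages/core may be wired quite differently.

```typescript
// Payload fields as listed in the proposal.
interface PlanRevisedPayload {
  iterationId: string;
  iteration: number;
  reason: string;
  newPlan: string;
  newActionItems?: string[];
}

type PlanRevisedListener = (payload: PlanRevisedPayload) => void;

// Hypothetical minimal emitter standing in for the real event system.
class PlanEvents {
  private listeners: PlanRevisedListener[] = [];

  onPlanRevised(listener: PlanRevisedListener): void {
    this.listeners.push(listener);
  }

  emitPlanRevised(payload: PlanRevisedPayload): void {
    for (const listener of this.listeners) {
      listener(payload);
    }
  }
}

// Hypothetical UI-side model: holds the labels a progress display would render.
class ActionItemView {
  labels: string[];

  constructor(initialLabels: string[]) {
    this.labels = [...initialLabels];
  }

  attach(events: PlanEvents): void {
    events.onPlanRevised((payload) => {
      // newActionItems is optional; keep the current labels when it is absent.
      if (payload.newActionItems !== undefined) {
        this.labels = [...payload.newActionItems];
      }
    });
  }
}
```

Since `newActionItems` is optional, a consumer has to decide what to show when a revision arrives without new labels; this sketch simply keeps the previous list.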

D. Surface in validator

If revise_plan was called, the validator should see both the original task and the revised plan. Update buildTaskValidationPrompt to include revisedPlan if it differs from the original.
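A possible shape for that prompt change is sketched below. The function name carries a `Sketch` suffix because the real `buildTaskValidationPrompt` signature is not shown in this issue; the `ValidationInput` fields and the prompt wording are assumptions.

```typescript
// Hypothetical input shape; the real validator prompt inputs may differ.
interface ValidationInput {
  task: string;
  successCriteria: string;
  originalPlan: string;
  currentPlan: string;
  revisionReasons: string[];
}

// Sketch only: includes the revised plan when it differs from the original.
function buildTaskValidationPromptSketch(input: ValidationInput): string {
  const sections: string[] = [
    `Task: ${input.task}`,
    `Success criteria: ${input.successCriteria}`,
    `Original plan:\n${input.originalPlan}`,
  ];

  const wasRevised = input.currentPlan.trim() !== input.originalPlan.trim();
  if (wasRevised) {
    sections.push(`Revised plan (replaced the original mid-task):\n${input.currentPlan}`);
    if (input.revisionReasons.length > 0) {
      const reasons = input.revisionReasons.map((reason) => `- ${reason}`).join("\n");
      sections.push(`Revision reasons:\n${reasons}`);
    }
    // Assumed wording: tells the validator that a revision happened.
    sections.push("Judge completion against the original task, not only the revised plan.");
  }

  return sections.join("\n\n");
}
```

Showing both plans, together with the reasons, gives the validator enough context to notice when a revision quietly narrowed the task.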

E. System prompt update

If enableReplanning is true, append a best-practices bullet:

- If you discover the original plan won't work or needs significant adjustment (a
  constraint emerged, a key assumption was wrong), call revise_plan() with an
  updated plan and a brief reason. Do not call revise_plan() for minor tactical
  changes — only when the original plan is materially misleading.

Implementation notes

  • Replanning is a power tool that's easy to misuse. Without prompt guardrails, models may call revise_plan every few iterations as a form of nervous restructuring. Add to the prompt: "Use sparingly. Tactical changes don't need a revised plan."
  • The validator's success criteria are derived from the planner output. If revise_plan updates them, validator behavior could change mid-task. Decide whether revise_plan can update successCriteria (probably yes, but it's a sharper edge) or only plan and actionItems.
  • Plan revision interacts with the validation force-accept path. If the agent revises the plan to be much simpler and then claims done() on the simpler plan, the validator might rubber-stamp it. The validator prompt should be aware that a plan revision happened.
  • Test scenarios:
    • Agent revises plan once mid-task, completes successfully.
    • Agent abuses revise_plan (calls it 5 times in 10 iterations). Does the over-use guardrail kick in?
    • Validator with a revised plan correctly assesses the result against the revised plan (and against revised success criteria, if revise_plan is allowed to update them).
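The over-use concern in the notes above could be backed by a small runtime check in addition to prompt wording. The sketch below is one hypothetical approach: `RevisionBudget`, its thresholds, and the warning text are all assumptions, not part of the proposal.

```typescript
// Hypothetical guard: warns when revise_plan is called too often
// within a sliding window of recent iterations.
class RevisionBudget {
  private readonly maxRevisions: number;
  private readonly windowSize: number;
  private revisionIterations: number[] = [];

  constructor(maxRevisions: number, windowSize: number) {
    this.maxRevisions = maxRevisions;
    this.windowSize = windowSize;
  }

  // Records a revision at the given iteration. Returns a warning string
  // when the window now holds more than maxRevisions, otherwise undefined.
  record(iteration: number): string | undefined {
    this.revisionIterations.push(iteration);
    const windowStart = iteration - this.windowSize;
    const recent = this.revisionIterations.filter((i) => i > windowStart);
    if (recent.length > this.maxRevisions) {
      return (
        `revise_plan was called ${recent.length} times in the last ` +
        `${this.windowSize} iterations. Use sparingly. ` +
        "Tactical changes don't need a revised plan."
      );
    }
    return undefined;
  }
}
```

The returned string could be included in the tool result so the model sees it on the next turn. Whether to warn only or to reject the call outright is left open here.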

Acceptance criteria

  • revise_plan tool exists, gated by enableReplanning config.
  • Calling it updates instance state and appends an annotated message to the conversation.
  • PLAN_REVISED event fires with the right payload.
  • Validator sees the revised plan when applicable.
  • System prompt includes guidance when feature is enabled.
  • Tests cover: single revision, multiple revisions, validator behavior after revision, gated-off behavior.

Effort estimate

2-3 days including tests and prompt iteration. The hard part is preventing over-use, not implementing the basic mechanism.

Related issues

Pairs with the validator-fix issue (validator with conversation context naturally absorbs plan revisions). Distinct from the multi-action-per-turn issue.

Files likely affected

  • packages/core/src/tools/planningTools.ts (new revise_plan tool)
  • packages/core/src/webAgent.ts (plumbing through generateAndProcessAction, instance state)
  • packages/core/src/types/ (WebAgentOptions, event types)
  • packages/core/src/events.ts (PLAN_REVISED)
  • packages/core/src/prompts.ts (system prompt update, validator prompt)
  • packages/core/test/webAgent.test.ts


Labels: enhancement (New feature or request)
