Current state
Pilo enforces "exactly one tool per turn" in three places:
- The AI SDK call uses toolChoice: "required" (webAgent.ts:905) and the action loop processes only aiResponse.toolResults[0] (webAgent.ts:1019).
- The system prompt repeats "EXACTLY ONE tool" (prompts.ts:213-217, also in the per-step snapshot prompt and error feedback prompt).
- When zero tool calls come back, ToolExecutionError("You must use exactly one tool") is thrown (webAgent.ts:1011-1017).
For multi-step physical actions (filling a 5-field form, then submitting), this means 5+ separate LLM round-trips. Each round-trip is typically 5-15 seconds. A typical form submission is 30-60+ seconds of LLM time alone.
The gap
A 5-field form is one task step ("fill in the user details") but five LLM round-trips. The LLM round-trip is the dominant latency in most Pilo runs, so cutting the number of round-trips to a half or a third translates roughly linearly into wall-clock improvement and a smaller LLM bill.
The constraint is also semantically off: many actions are obviously safe to batch (a fill doesn't change the URL or the ref space; multiple fills + an enter at the end is a coherent unit), while others are obviously not (a click on a link changes the page and invalidates remaining refs).
Proposed scope
A. Allow up to N tool calls per turn
Introduce WebAgentOptions.maxActionsPerStep?: number (default 1 for backwards compat; recommend 3 for trustworthy providers).
In generateAndProcessAction (webAgent.ts:874-1113):
// Stop requiring exactly one tool when batching is enabled
const streamResult = streamText({
  ...this.providerConfig,
  messages: this.messages,
  tools: webActionTools,
  toolChoice: this.maxActionsPerStep > 1 ? "auto" : "required",
  // ...
});

// Process tool results in order, up to the configured batch size
const results: ActionResult[] = [];
for (const toolResult of aiResponse.toolResults.slice(0, this.maxActionsPerStep)) {
  const output = toolResult.output as ActionResult;
  results.push(output);
  // Stop the batch if this action is terminal, changes the page, or failed
  if (output.isTerminal) break;
  if (isPageChangingAction(output.action)) break;
  // Any failure ends the batch, recoverable or not (see section D)
  if (!output.success) break;
}
isPageChangingAction covers goto, back, forward, click (if it triggers navigation), enter (if it submits), webSearch, and terminal actions. Conservative default: click and enter are always treated as page-changing. Some clicks don't navigate, but we err on the side of safety.
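A minimal sketch of the conservative classifier described above, assuming actions are identified by the plain string names used in this proposal (the real ActionResult.action type may differ). The constant name PAGE_CHANGING_ACTIONS is hypothetical; its contents come from the list above, with click and enter always included.

```typescript
// Hypothetical constant: action names treated as page-changing.
// click and enter are always included, even though some clicks do not navigate.
const PAGE_CHANGING_ACTIONS = new Set<string>([
  "goto",
  "back",
  "forward",
  "click",
  "enter",
  "webSearch",
  // Terminal actions also end the batch.
  "done",
  "abort",
]);

// Returns true when the remaining actions in a batch should be skipped
// because refs may be stale after this action.
function isPageChangingAction(action: string): boolean {
  return PAGE_CHANGING_ACTIONS.has(action);
}
```

A static set keeps the check cheap and easy to audit. Detecting whether a particular click actually navigated would need runtime signals that this sketch does not attempt.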
B. Single AGENT_REASONED per turn, multiple AGENT_ACTION
Today the event stream emits one AGENT_REASONED at reasoning-end and one AGENT_ACTION per tool. Keep this shape; per-turn batch just emits multiple AGENT_ACTION events in sequence.
C. System prompt updates
Change the "EXACTLY ONE tool" instructions to "Up to {{ maxActionsPerStep }} tool(s) per turn":
**Action batching:**
You may call up to {{ maxActionsPerStep }} tools in one turn. Use this for related
actions that don't change the page — for example, filling several fields of the
same form. After a page-changing action (click, enter, goto, back, forward, search),
remaining actions in the batch are skipped because the refs become stale.
Safe to batch:
- Multiple fill() calls into one form
- focus() + send_keys() to navigate a combobox
- check() / uncheck() across related checkboxes
Page-changing — must be last or alone:
- click() (may trigger navigation)
- enter() (may submit a form)
- goto(), back(), forward()
- webSearch()
- done(), abort()
The instruction in toolCallInstruction (prompts.ts:213-217) and its echoes in the per-step user message and error feedback templates need parallel updates.
D. Error handling for mid-batch failures
If action 2 of a 3-action batch fails with a recoverable error (e.g., element ref stale), stop the batch and proceed as if a single-action turn errored. The next iteration's snapshot will refresh refs.
If action 1 succeeds, action 2 errors recoverably, action 3 was queued but didn't run: results contains just [action1, action2]. The next AI generation message will include both tool results so the model sees what happened.
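The mid-batch failure case above can be sketched as follows. This is an illustration under stated assumptions, not the actual Pilo loop: ToolCall, BatchOutcome, runBatch, and the synchronous execute callback are all hypothetical (a real executor would be async), and ActionResult carries only the fields this proposal mentions.

```typescript
// Hypothetical shapes for illustration only.
interface ToolCall {
  action: string;
  ref: string;
  value?: string;
}

interface ActionResult {
  action: string;
  success: boolean;
  isRecoverable?: boolean;
  error?: string;
}

interface BatchOutcome {
  results: ActionResult[]; // actions that ran, including the failed one
  skipped: ToolCall[]; // queued actions that never ran
}

// Run queued calls in order and stop at the first failure, recoverable or not.
function runBatch(
  calls: ToolCall[],
  maxActionsPerStep: number,
  execute: (call: ToolCall) => ActionResult,
): BatchOutcome {
  const queued = calls.slice(0, maxActionsPerStep);
  const results: ActionResult[] = [];
  for (const call of queued) {
    const result = execute(call);
    results.push(result);
    if (!result.success) break;
  }
  return { results, skipped: queued.slice(results.length) };
}

// Three fills where the second hits a stale ref.
const calls: ToolCall[] = [
  { action: "fill", ref: "e1", value: "Ada" },
  { action: "fill", ref: "e2", value: "Lovelace" },
  { action: "fill", ref: "e3", value: "ada@example.com" },
];

const outcome = runBatch(calls, 3, (call) =>
  call.ref === "e2"
    ? { action: call.action, success: false, isRecoverable: true, error: "stale ref" }
    : { action: call.action, success: true },
);
```

Here outcome.results holds the first two results and outcome.skipped holds the third call, which matches the [action1, action2] case described above.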
E. Telemetry
Add to the AI_GENERATION event:
{ actionsRequested: N, actionsExecuted: M, batchTruncatedBy: "page-change" | "terminal" | "error" | "none" }
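One way these fields might be derived from a finished batch is sketched below. The names BatchTelemetry, summarizeBatch, and PAGE_CHANGERS are hypothetical. The sketch also makes two assumptions the proposal does not settle: a reason is reported only when at least one requested action was dropped, and an error takes precedence over the other reasons. The union has no value for a batch cut only by the maxActionsPerStep cap, so that case is reported as "none" here.

```typescript
type TruncationReason = "page-change" | "terminal" | "error" | "none";

// Hypothetical minimal result shape.
interface ActionResult {
  action: string;
  success: boolean;
  isTerminal?: boolean;
}

interface BatchTelemetry {
  actionsRequested: number;
  actionsExecuted: number;
  batchTruncatedBy: TruncationReason;
}

// Hypothetical set of non-terminal page-changing action names.
const PAGE_CHANGERS = new Set<string>([
  "click", "enter", "goto", "back", "forward", "webSearch",
]);

function summarizeBatch(
  actionsRequested: number,
  results: ActionResult[],
): BatchTelemetry {
  const actionsExecuted = results.length;
  const last = results[results.length - 1];
  let batchTruncatedBy: TruncationReason = "none";
  // Assumption: only report a reason when something was actually dropped.
  if (last !== undefined && actionsExecuted < actionsRequested) {
    if (!last.success) batchTruncatedBy = "error";
    else if (last.isTerminal) batchTruncatedBy = "terminal";
    else if (PAGE_CHANGERS.has(last.action)) batchTruncatedBy = "page-change";
  }
  return { actionsRequested, actionsExecuted, batchTruncatedBy };
}

// The model asked for fill, enter, fill; the batch stopped after enter.
const telemetry = summarizeBatch(3, [
  { action: "fill", success: true },
  { action: "enter", success: true },
]);
```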
Implementation notes
- This is a substantial change to the inner loop. Recommend a feature flag (or just gating on maxActionsPerStep > 1) so the new path can be enabled cautiously and rolled back if issues surface.
- Test extensively across providers. Some providers may always emit one tool call regardless of toolChoice; the change should still work (just no win for them). Some may emit many; the truncation needs to be tested.
- The repetition detector (checkAndHandleRepeatedAction) currently checks one action per iteration. With batching, it should check each action in the batch, not just the last one.
- The validator-on-done flow: if done is in a batch, only execute the actions before done, then run validation on done's result. Don't execute anything after done in the batch.
- Update the system-prompt guidance about ref invalidation: refs ARE stable within a batch (no snapshot happens between batch actions). The instruction is "after a page-changing action, refs are stale"; within a batch this means the page-changing action must come last.
- Some tools have implicit page changes that aren't obvious (e.g., a click on a <button type="submit"> inside a form). Conservative classification is fine: we lose a small amount of batch efficiency, not correctness.
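The repetition-detector note above could look roughly like the following. This is a hedged sketch, not the real checkAndHandleRepeatedAction: RepetitionTracker, ExecutedAction, the signature format, and the threshold are all hypothetical, and it counts total occurrences over the run rather than whatever windowing the existing detector uses.

```typescript
// Hypothetical minimal description of an executed action.
interface ExecutedAction {
  action: string;
  ref?: string;
  value?: string;
}

class RepetitionTracker {
  private readonly threshold: number;
  private readonly counts = new Map<string, number>();

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Hypothetical signature: action name plus its target and value.
  private static signature(a: ExecutedAction): string {
    return JSON.stringify([a.action, a.ref ?? null, a.value ?? null]);
  }

  // Record every action in the batch, not just the last one, and return
  // the actions whose signature has now reached the threshold.
  recordBatch(batch: ExecutedAction[]): ExecutedAction[] {
    const repeated: ExecutedAction[] = [];
    for (const a of batch) {
      const sig = RepetitionTracker.signature(a);
      const seen = (this.counts.get(sig) ?? 0) + 1;
      this.counts.set(sig, seen);
      if (seen >= this.threshold) repeated.push(a);
    }
    return repeated;
  }
}

const tracker = new RepetitionTracker(3);
const firstTurn = tracker.recordBatch([
  { action: "fill", ref: "e1", value: "a" },
  { action: "fill", ref: "e2", value: "b" },
]);
// The same fill twice in one batch brings its count to three.
const secondTurn = tracker.recordBatch([
  { action: "fill", ref: "e1", value: "a" },
  { action: "fill", ref: "e1", value: "a" },
]);
```

Tracking per action means a repeated fill in the middle of a batch is caught even when the batch ends with a different action.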
Acceptance criteria
- maxActionsPerStep option (default 1) controls batch size.
- With maxActionsPerStep: 3, the agent can fill 3 form fields in one LLM turn.
- Page-changing actions correctly terminate the batch.
- Recoverable errors terminate the batch.
- Telemetry surfaces batch size and termination reason.
- Existing single-action behavior is unchanged when default is kept at 1.
- Tests cover: 3-fill batch, fill + enter terminating batch, mid-batch ref-stale error terminating batch, batch with done last, batch with done mid-position (rejected by validator or treated as terminal).
- Manual eval: form-heavy task (e.g., signup or checkout) shows 2-3× latency reduction at maxActionsPerStep: 3 vs 1.
Effort estimate
3-5 days. Most of the time is testing across providers and validating edge cases (page-change detection, mid-batch errors, repetition detection over a batch).
Related issues
Affects the repetition signature fix (the detector needs per-action tracking, not per-iteration). Affects prompt caching (longer histories = more to cache; this is complementary, not in conflict).
Files likely affected
packages/core/src/webAgent.ts (generateAndProcessAction, ExecutionState)
packages/core/src/prompts.ts (tool call instructions across multiple templates)
packages/core/src/types/ (WebAgentOptions)
packages/core/src/events.ts
packages/core/test/webAgent.test.ts