Support multi-action per turn

## Current state

Pilo enforces "exactly one tool per turn" in three places:

1. The AI SDK call uses `toolChoice: "required"` (`webAgent.ts:905`) and the action loop processes only `aiResponse.toolResults[0]` (`webAgent.ts:1019`).
2. The system prompt repeats "EXACTLY ONE tool" (`prompts.ts:213-217`, also in the per-step snapshot prompt and error feedback prompt).
3. When zero tool calls come back, `ToolExecutionError("You must use exactly one tool")` is thrown (`webAgent.ts:1011-1017`).

For multi-step physical actions (filling a 5-field form, then submitting), this means 5+ separate LLM round-trips. Each round-trip is typically 5-15 seconds. A typical form submission is 30-60+ seconds of LLM time alone.

## The gap

A 5-field form is one *task* step ("fill in the user details") but five LLM round-trips. The LLM round-trip is the dominant latency in most Pilo runs. Halving or thirding the count translates linearly into wall-clock improvement and a smaller LLM bill.

The constraint is also semantically off: many actions are obviously safe to batch (a fill doesn't change the URL or the ref space; multiple fills + an enter at the end is a coherent unit), while others are obviously not (a click on a link changes the page and invalidates remaining refs).

## Proposed scope

### A. Allow up to N tool calls per turn

Introduce `WebAgentOptions.maxActionsPerStep?: number` (default 1 for backwards compat; recommend 3 for trustworthy providers).

In `generateAndProcessAction` (`webAgent.ts:874-1113`):

```ts
// Stop requiring exactly one tool
const streamResult = streamText({
  ...this.providerConfig,
  messages: this.messages,
  tools: webActionTools,
  toolChoice: this.maxActionsPerStep > 1 ? "auto" : "required",
  // ...
});

// Process tool results in order
const results: ActionResult[] = [];
for (const toolResult of aiResponse.toolResults.slice(0, this.maxActionsPerStep)) {
  const output = toolResult.output as ActionResult;
  results.push(output);

  // Stop the batch if this action changes the page or is terminal
  if (output.isTerminal) break;
  if (isPageChangingAction(output.action)) break;
  if (!output.success && !output.isRecoverable) break;
}
```

`isPageChangingAction`: `goto`, `back`, `forward`, `click` (if it triggers navigation), `enter` (if it submits), `webSearch`, terminal actions. Conservative default: `click` and `enter` are always treated as page-changing (some clicks don't navigate, but we err safe).

### B. Single AGENT_REASONED per turn, multiple AGENT_ACTION

Today the event stream emits one AGENT_REASONED at reasoning-end and one AGENT_ACTION per tool. Keep this shape; per-turn batch just emits multiple AGENT_ACTION events in sequence.

### C. System prompt updates

Change the "EXACTLY ONE tool" instructions to "Up to {{ maxActionsPerStep }} tool(s) per turn":

```
**Action batching:**
You may call up to {{ maxActionsPerStep }} tools in one turn. Use this for related
actions that don't change the page — for example, filling several fields of the
same form. After a page-changing action (click, enter, goto, back, forward, search),
remaining actions in the batch are skipped because the refs become stale.

Safe to batch:
- Multiple fill() calls into one form
- focus() + send_keys() to navigate a combobox
- check() / uncheck() across related checkboxes

Page-changing — must be last or alone:
- click() (may trigger navigation)
- enter() (may submit a form)
- goto(), back(), forward()
- webSearch()
- done(), abort()
```

The instruction in `toolCallInstruction` (`prompts.ts:213-217`) and its echoes in the per-step user message and error feedback templates need parallel updates.

### D. Error handling for mid-batch failures

If action 2 of a 3-action batch fails with a recoverable error (e.g., element ref stale), stop the batch and proceed as if a single-action turn errored. The next iteration's snapshot will refresh refs.

If action 1 succeeds, action 2 errors recoverably, action 3 was queued but didn't run: `results` contains just [action1, action2]. The next AI generation message will include both tool results so the model sees what happened.

### E. Telemetry

Add to the `AI_GENERATION` event:

```ts
{ actionsRequested: N, actionsExecuted: M, batchTruncatedBy: "page-change" | "terminal" | "error" | "none" }
```

## Implementation notes

- This is a substantial change to the inner loop. Recommend a feature flag (or just gating on `maxActionsPerStep > 1`) so the new path can be enabled cautiously and rolled back if issues surface.
- Test extensively across providers. Some providers may always emit one tool call regardless of `toolChoice`; the change should still work (just no win for them). Some may emit many; the truncation needs to be tested.
- The repetition detector (`checkAndHandleRepeatedAction`) currently checks one action per iteration. With batching, it should check each action in the batch, not just the last one.
- The validator-on-done flow: if `done` is in a batch, only execute the actions before `done`, then run validation on `done`'s result. Don't execute anything after `done` in the batch.
- Update the system-prompt guidance about ref invalidation: refs ARE stable within a batch (no snapshot happens between batch actions). The instruction is "after a page-changing action, refs are stale" — within the batch this means the page-changer is last.
- Some tools have implicit page changes that aren't obvious (e.g., a click on a `<button type="submit">` inside a form). Conservative classification is fine — we lose a small amount of batch efficiency, not correctness.

## Acceptance criteria

- `maxActionsPerStep` option (default 1) controls batch size.
- With `maxActionsPerStep: 3`, the agent can fill 3 form fields in one LLM turn.
- Page-changing actions correctly terminate the batch.
- Recoverable errors terminate the batch.
- Telemetry surfaces batch size and termination reason.
- Existing single-action behavior is unchanged when default is kept at 1.
- Tests cover: 3-fill batch, fill + enter terminating batch, mid-batch ref-stale error terminating batch, batch with `done` last, batch with `done` mid-position (rejected by validator or treated as terminal).
- Manual eval: form-heavy task (e.g., signup or checkout) shows 2-3× latency reduction at `maxActionsPerStep: 3` vs 1.

## Effort estimate

3-5 days. Most of the time is testing across providers and validating edge cases (page-change detection, mid-batch errors, repetition detection over a batch).

## Related issues

Affects the repetition signature fix (the detector needs per-action tracking, not per-iteration). Affects prompt caching (longer histories = more to cache; this is complementary, not in conflict).

## Files likely affected

- `packages/core/src/webAgent.ts` (`generateAndProcessAction`, ExecutionState)
- `packages/core/src/prompts.ts` (tool call instructions across multiple templates)
- `packages/core/src/types/` (WebAgentOptions)
- `packages/core/src/events.ts`
- `packages/core/test/webAgent.test.ts`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support multi-action per turn #438

Current state

The gap

Proposed scope

A. Allow up to N tool calls per turn

B. Single AGENT_REASONED per turn, multiple AGENT_ACTION

C. System prompt updates

D. Error handling for mid-batch failures

E. Telemetry

Implementation notes

Acceptance criteria

Effort estimate

Related issues

Files likely affected

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Support multi-action per turn #438

Description

Current state

The gap

Proposed scope

A. Allow up to N tool calls per turn

B. Single AGENT_REASONED per turn, multiple AGENT_ACTION

C. System prompt updates

D. Error handling for mid-batch failures

E. Telemetry

Implementation notes

Acceptance criteria

Effort estimate

Related issues

Files likely affected

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions