Current state
Pilo's main action-loop LLM call (webAgent.ts:874-989) invokes streamText from the Vercel AI SDK with no provider-specific cache markers:
```ts
const streamResult = streamText({
  ...this.providerConfig,
  messages: this.messages,
  tools: webActionTools,
  toolChoice: "required",
  maxOutputTokens: DEFAULT_GENERATION_MAX_TOKENS,
  abortSignal: this.abortSignal,
});
```
The messages array contains, in this order:
- The system prompt (built by buildActionLoopSystemPrompt) — ~3000-4000 tokens including tool examples and best practices
- The task+plan user message (built by buildTaskAndPlanPrompt) — ~500-1500 tokens
- Per-step snapshot user messages, assistant turns, tool results, error feedback, validation feedback (the conversation)
For a 50-iteration task on Claude with no caching, the system prompt and task+plan messages (positions 1 and 2) are billed at the full input rate 50 times.
Anthropic supports prompt caching via cache_control: { type: "ephemeral" } markers on individual content parts. The Vercel AI SDK surfaces this through providerOptions.anthropic on individual messages. Cached tokens are billed at ~10% of normal input cost on hit (default 5-minute TTL).
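For orientation, the SDK-level marker corresponds roughly to the following shape in the underlying Anthropic Messages API request. This is a simplified sketch — the SDK builds the actual request, and the model ID and prompt text here are placeholders:

```ts
// Roughly the request body produced when the system message carries the marker.
// Everything up to and including the marked block becomes the cacheable prefix.
const requestBody = {
  model: "claude-sonnet-4-5", // placeholder model ID
  system: [
    {
      type: "text",
      text: "...action-loop system prompt...",
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [
    /* task+plan user message (also markable), then the per-step conversation */
  ],
};
```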
The gap
For Claude-based runs, Pilo currently pays full input cost on tokens that are stable across the entire run. On a long task with a 4000-token system prompt and 50 iterations, that's 200,000 tokens billed that could mostly be cache hits.
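Back-of-envelope, assuming every iteration lands within the 5-minute TTL and using Anthropic's published multipliers (~1.25× the input rate for a cache write, ~0.1× for a cache read — verify against current pricing):

```ts
// Stable prefix of ~4,000 tokens, 50 iterations.
const uncached = 4_000 * 50;                    // 200,000 token-equivalents at the full rate
const cached = 4_000 * 1.25 + 4_000 * 49 * 0.1; // one write + 49 reads ≈ 24,600
const reduction = 1 - cached / uncached;        // ≈ 0.88 → roughly 88% off the stable-prefix cost
```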
OpenAI's prompt caching is automatic (no markers needed) and already applies. Gemini's caching is structurally different and not addressed here. The win is specifically for Anthropic — and via OpenRouter routing to Anthropic models.
Proposed scope
A. Detect Anthropic-routed models
In provider.ts, add a helper to determine whether the active provider is using an Anthropic model (direct or via OpenRouter):
```ts
function isAnthropicModel(providerConfig: ProviderConfig): boolean {
  const modelId = providerConfig.model?.modelId ?? "";
  // Direct Anthropic provider
  if (providerConfig.providerOptions?.anthropic) return true;
  // OpenRouter routing to Anthropic
  if (/^anthropic\//.test(modelId)) return true;
  // Heuristic: model name contains "claude"
  if (/claude/i.test(modelId)) return true;
  return false;
}
```
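Quick spot checks — the model IDs are illustrative and the partial configs are cast for brevity; adapt to whatever the configured providers actually report:

```ts
// Hypothetical spot checks, not real Pilo fixtures.
const asConfig = (modelId: string) => ({ model: { modelId } } as unknown as ProviderConfig);

isAnthropicModel(asConfig("anthropic/claude-sonnet-4")); // true — OpenRouter-style prefix
isAnthropicModel(asConfig("claude-3-5-haiku-latest"));   // true — "claude" heuristic
isAnthropicModel(asConfig("gpt-4o-mini"));               // false — no markers added
```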
B. Mark cacheable messages
In initializeSystemPromptAndTask (webAgent.ts:1641-1672), when Anthropic-routed, mark the system message and the task+plan user message as cacheable:
```ts
const cacheableMeta = isAnthropicModel(this.providerConfig)
  ? { providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } } }
  : {};

this.messages = [
  { role: "system", content: systemPrompt, ...cacheableMeta },
  { role: "user", content: taskAndPlan, ...cacheableMeta },
];
```
Verify the exact key name (providerOptions vs experimental_providerMetadata) against the installed @ai-sdk/anthropic version in package.json.
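If the pinned SDK predates providerOptions, the marker goes under the older key instead. A sketch, assuming the legacy shape still applies to the installed version:

```ts
// Legacy key name used by older AI SDK releases (assumption — confirm against package.json).
const legacyCacheableMeta = {
  experimental_providerMetadata: { anthropic: { cacheControl: { type: "ephemeral" } } },
};
```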
C. Optionally mark the latest snapshot as cacheable too
A more aggressive optimization: also mark the most recent snapshot message as cacheable, moving the marker forward as each new snapshot arrives. This makes the entire conversation prefix up to the latest snapshot eligible for cache hits. Tradeoff: cache writes are billed at a premium over normal input (roughly 1.25× at the default TTL), and snapshots churn every iteration, so each step pays a write for the newly extended prefix. Benchmark before enabling.
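A minimal sketch of what the marker movement could look like — the helper name, its placement, and the message indexing are assumptions, not existing Pilo code. Anthropic allows at most four cache_control breakpoints per request, so the marker is moved rather than accumulated:

```ts
import type { CoreMessage } from "ai";

// Hypothetical helper; call it right after pushing a new snapshot user message.
function markLatestSnapshotCacheable(messages: CoreMessage[], providerConfig: ProviderConfig): void {
  if (!isAnthropicModel(providerConfig)) return;

  // Drop the marker from whichever earlier conversation message held it, keeping
  // the markers on the system and task+plan messages (indices 0 and 1).
  for (const msg of messages.slice(2)) {
    delete (msg as { providerOptions?: unknown }).providerOptions;
  }

  // Mark the newest message so the whole prefix up to it can be served from cache.
  Object.assign(messages[messages.length - 1], {
    providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });
}
```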
D. Surface cache metrics
streamText returns usage and providerMetadata. On Anthropic, cache activity is reported as cacheReadInputTokens and cacheCreationInputTokens (whether they appear on usage or under providerMetadata.anthropic depends on the SDK version — see the implementation notes). Surface these in the AI_GENERATION event:
```ts
this.eventEmitter.emit(WebAgentEventType.AI_GENERATION, {
  // ... existing fields ...
  cacheReadTokens: usage?.cacheReadInputTokens ?? 0,
  cacheWriteTokens: usage?.cacheCreationInputTokens ?? 0,
});
```
The eval-judge consumer (and any cost-tracking layer) can then compute per-task savings.
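For example, a cost layer could convert the read count into an estimated saving. A sketch with an illustrative per-million-token input price — the helper and the constant are not existing Pilo code or real pricing config:

```ts
// Hypothetical helper; the default price is illustrative only.
function estimateCacheSavingsUsd(cacheReadTokens: number, inputUsdPerMTok = 3): number {
  // A cache read costs ~10% of the normal input rate, so ~90% of it is avoided spend.
  return (cacheReadTokens / 1_000_000) * inputUsdPerMTok * 0.9;
}
```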
Implementation notes
- The exact AI SDK syntax for cache markers varies between SDK versions. Verify against the version pinned in packages/core/package.json before writing the code.
- Cache markers on the system message and the task+plan message should form a single contiguous cacheable prefix. But the SDK may treat them as two separate cache entries (one per message). Test by checking cacheReadInputTokens on the second iteration of a fresh task.
- The 5-minute TTL means tasks that pause for >5 minutes between steps lose the cache. For typical browser-automation tasks (each step takes 5-30 seconds), this isn't an issue.
- The cache is per-account and per-content-prefix. The system prompt's currentDate field changes daily — so the cache invalidates each midnight. Acceptable; if it ever becomes a real cost concern, the date can be moved out of the cached prefix into an uncached message placed right after it (see the sketch after this list).
- Don't add caching for non-Anthropic providers. OpenAI does its own automatic caching; Gemini doesn't support this style; Ollama/LM Studio don't either.
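A sketch of that date-extraction fallback, reusing cacheableMeta from section B; systemPromptWithoutDate and currentDate are hypothetical names:

```ts
// Keep the cached prefix date-free; the date rides in an uncached message after it,
// so it can change daily without invalidating the prefix.
this.messages = [
  { role: "system", content: systemPromptWithoutDate, ...cacheableMeta },
  { role: "user", content: taskAndPlan, ...cacheableMeta },
  { role: "user", content: `Current date: ${currentDate}` },
];
```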
Acceptance criteria
- For Anthropic-routed providers, system + task+plan messages carry cacheControl: { type: "ephemeral" }.
- For non-Anthropic providers, no cache markers are added (verify by message inspection in tests).
- The AI_GENERATION event includes cacheReadTokens / cacheWriteTokens (zero when no caching applies).
- A manual smoke run on a 5-step Claude task shows cacheReadInputTokens > 0 on step 2+.
- Tests in packages/core/test/ cover: cache-marker presence for Anthropic, absence for others, AI_GENERATION event field shape (see the sketch below).
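A vitest-style sketch of the marker-presence tests — makeAgent, anthropicProviderConfig, openAiProviderConfig, task, and plan are hypothetical fixtures, and the initializeSystemPromptAndTask call signature is an assumption; adapt to the existing webAgent test harness:

```ts
import { describe, expect, it } from "vitest";

describe("prompt cache markers", () => {
  it("marks system and task+plan messages for Anthropic-routed providers", () => {
    const agent = makeAgent({ providerConfig: anthropicProviderConfig });
    agent.initializeSystemPromptAndTask(task, plan);
    for (const msg of agent.messages.slice(0, 2)) {
      expect(msg.providerOptions?.anthropic?.cacheControl).toEqual({ type: "ephemeral" });
    }
  });

  it("adds no markers for non-Anthropic providers", () => {
    const agent = makeAgent({ providerConfig: openAiProviderConfig });
    agent.initializeSystemPromptAndTask(task, plan);
    for (const msg of agent.messages) {
      expect(msg.providerOptions).toBeUndefined();
    }
  });
});
```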
Effort estimate
1-2 days including verification against the SDK and the smoke test.
Related issues
Pairs with the per-model prompt variants issue — flash variants will have a shorter cacheable prefix, but the cache markers go on whichever variant is selected.
Files likely affected
- packages/core/src/provider.ts (provider detection helper)
- packages/core/src/webAgent.ts (initializeSystemPromptAndTask, AI_GENERATION event)
- packages/core/src/events.ts (event field additions)
- packages/core/test/webAgent.test.ts