OpenRouter BYOK: Claude models do not benefit from prompt caching, causing excessive token costs in agent mode #312939

@fishcharlie

Description


Summary

When using OpenRouter BYOK with Anthropic Claude models in GitHub Copilot, prompt caching is silently disabled. This leads to every agentic request re-sending the full conversation context at full price, which becomes extremely expensive for long agent sessions.

Root Cause

The OpenRouterLMProvider extends AbstractOpenAICompatibleLMProvider, which builds requests in the OpenAI-compatible chat completions format via createCapiRequestBody and rawMessageToCAPI. When a CacheBreakpoint content part is present in a message, rawMessageToCAPI converts it to a CAPI-specific copilot_cache_control: { type: 'ephemeral' } field placed at the message object level (see openai.ts):

if (message.content.find(part => part.type === ChatCompletionContentPartKind.CacheBreakpoint)) {
    out.copilot_cache_control = { type: 'ephemeral' };
}

This copilot_cache_control field is a GitHub Copilot API extension understood only by CAPI. OpenRouter does not recognise it and therefore applies no caching at all.
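For illustration, a system message on this code path reaches OpenRouter in roughly the following shape (the exact request wrapper varies; this sketch only shows where the field ends up):

// Illustrative only: the Copilot-specific field sits at the message level,
// where OpenRouter does not look for caching hints and silently drops it.
const messageAsSent = {
    role: 'system',
    content: 'You are a coding agent. <large, stable system prompt>',
    copilot_cache_control: { type: 'ephemeral' }  // ignored by OpenRouter
};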

OpenRouter's Anthropic Claude prompt caching requires one of:

  1. Automatic caching (top-level request field): "cache_control": { "type": "ephemeral" } at the root of the request body.
  2. Explicit caching (per-block): cache_control on individual content block objects within the messages array.

Neither format is emitted by the current implementation when routing through OpenRouter.
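For comparison, the explicit per-block form (option 2 above) looks roughly like this; the shape follows OpenRouter's documented Anthropic caching format, with illustrative values:

// Explicit per-block caching: cache_control lives inside a content block,
// not at the message level and not under a Copilot-specific key.
const explicitlyCachedMessage = {
    role: 'system',
    content: [
        {
            type: 'text',
            text: '<large, stable system prompt>',
            cache_control: { type: 'ephemeral' }  // marks the end of the cached prefix
        }
    ]
};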

By contrast, the native AnthropicLMProvider (direct BYOK Anthropic) correctly attaches cache_control to individual content blocks using the Anthropic SDK, so caching works there. The gap is specific to the OpenAI-compatible code path used by OpenRouter.
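For reference, a minimal sketch of the shape the Anthropic SDK accepts on that working path (not the actual AnthropicLMProvider code; the model id and prompt are placeholders):

import Anthropic from '@anthropic-ai/sdk';

// Minimal sketch of the direct-Anthropic shape: cache_control is attached to a
// content block, so Anthropic caches the prefix up to and including that block.
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 1024,
    system: [
        {
            type: 'text',
            text: '<large, stable system prompt>',
            cache_control: { type: 'ephemeral' }
        }
    ],
    messages: [{ role: 'user', content: 'Continue the task.' }]
});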

Relevant files

  • openai.ts (rawMessageToCAPI / createCapiRequestBody, which emit the copilot_cache_control field)
  • OpenRouterLMProvider / AbstractOpenAICompatibleLMProvider (the OpenAI-compatible BYOK request path)
  • AnthropicLMProvider (the direct Anthropic BYOK path, where caching works correctly)

Steps to Reproduce

  1. Configure OpenRouter BYOK in GitHub Copilot with an API key.
  2. Select a Claude model (e.g., anthropic/claude-sonnet-4-5 or anthropic/claude-opus-4-5).
  3. Start a multi-turn agent session with a large system prompt or long conversation context (>1024–4096 tokens depending on model).
  4. Observe via the OpenRouter Activity page or the /api/v1/generation API that cached_tokens in prompt_tokens_details is always 0 — no cache hits occur.
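As a sanity check that OpenRouter does cache when the standard field is present, a request like the following can be sent directly (a sketch; it assumes OpenRouter's usage accounting option and uses the cached_tokens field referenced in step 4):

// Send this twice within a few minutes: cached_tokens should be 0 on the first
// call and > 0 on the second. Copilot's requests never produce the second state.
const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
        'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        model: 'anthropic/claude-sonnet-4-5',
        usage: { include: true },  // ask OpenRouter to return detailed usage (assumed option)
        messages: [
            {
                role: 'system',
                content: [
                    {
                        type: 'text',
                        text: '<large, stable system prompt>',
                        cache_control: { type: 'ephemeral' }
                    }
                ]
            },
            { role: 'user', content: 'Hello' }
        ]
    })
});

const data = await res.json();
console.log(data.usage?.prompt_tokens_details?.cached_tokens);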

Expected Behavior

The system prompt and stable conversation history are cached on OpenRouter's Anthropic endpoint. Subsequent requests within the same conversation show cached_tokens > 0 and a reduced effective cost (~0.1× input price on cached portions).

Actual Behavior

cached_tokens is always 0. Every request is billed at full input token price. For a typical 10-turn agent session with a 10k-token system prompt, this is ~10× more expensive than it should be.

Cost Impact

Per OpenRouter's Anthropic pricing:

  • Cache write (5-min TTL): 1.25× base input price
  • Cache read: 0.1× base input price (90% savings)

For a long agent run with, e.g., 50k tokens of stable context repeated across 20 turns, the cost inflation approaches ~900% compared with what the same workload costs through the native Anthropic BYOK provider.
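A rough worked version of that estimate (prices expressed as multiples of the base input price, output tokens ignored):

// 50k tokens of stable context, repeated across 20 turns.
const stableTokens = 50_000;
const turns = 20;

// Without caching: the full context is billed at 1x on every turn.
const uncached = stableTokens * turns;  // 1,000,000 token-equivalents

// With caching: one 1.25x cache write, then 0.1x cache reads on later turns.
const cached = stableTokens * 1.25 + stableTokens * 0.1 * (turns - 1);  // 157,500

console.log(uncached / cached);  // ~6.3x at 20 turns, approaching 10x (~900%) as turns grow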

Suggested Fix

OpenRouterLMProvider (or the OpenAIEndpoint code path when used with OpenRouter) should translate CacheBreakpoint parts to standard cache_control objects that OpenRouter understands. The simplest approach is to add a top-level automatic caching field to the request body:

{
  "model": "anthropic/claude-sonnet-4-5",
  "cache_control": { "type": "ephemeral" },
  "messages": [ ... ]
}

This is the format documented by OpenRouter for automatic Anthropic prompt caching and is the recommended approach for multi-turn conversations.

Alternatively, cache_control could be injected onto individual content blocks in the messages array when the provider is OpenRouter and the selected model is an Anthropic Claude model.
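A rough sketch of what that per-block translation could look like (the type and function names here are hypothetical, not the actual provider code):

// Hypothetical: drop each CacheBreakpoint part and attach a standard
// cache_control to the preceding text block, which is the form OpenRouter
// forwards to Anthropic. Applied only when the target model is a Claude model.
type Part =
    | { type: 'text'; text: string; cache_control?: { type: 'ephemeral' } }
    | { type: 'cacheBreakpoint' };

function toOpenRouterContent(parts: Part[]): Part[] {
    const out: Part[] = [];
    for (const part of parts) {
        if (part.type === 'cacheBreakpoint') {
            const prev = out[out.length - 1];
            if (prev && prev.type === 'text') {
                prev.cache_control = { type: 'ephemeral' };
            }
        } else {
            out.push({ ...part });
        }
    }
    return out;
}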
