Report 1h vs 5m Anthropic cache-creation token split in telemetry #319172
Merged
bhavyaus merged 2 commits on May 30, 2026
aiday-mar previously approved these changes on May 30, 2026
Parse the per-TTL breakdown from Anthropic's `usage.cache_creation` object (present in `message_start` across CAPI/Anthropic 1P, Bedrock InvokeModel, and Vertex AI) and emit two new measurements on `response.success`:

- `promptCacheCreation1hTokenCount`: 1h-TTL writes (2x base input rate)
- `promptCacheCreation5mTokenCount`: 5m-TTL writes (1.25x base input rate)

Enables exact per-row COGS attribution for the `chat.anthropic.promptCaching.extendedTtl` A/B experiment without inferring rates from arm assignment.

The new fields live on a nested `anthropic_cache_creation?` object on `APIUsage.prompt_tokens_details`, namespaced to make the provider-specificity explicit at the type level. Other providers leave it undefined; telemetry uses optional chaining so missing values drop cleanly from the row.
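To illustrate why the split matters for cost attribution, here is a minimal sketch of the arithmetic, using only the two multipliers stated above (2x for 1h writes, 1.25x for 5m writes). The interface and function names are hypothetical and are not taken from the PR; the result is expressed in base-input-token equivalents rather than currency.

```typescript
// Multipliers as stated in the commit message above.
const RATE_1H_WRITE = 2.0;
const RATE_5M_WRITE = 1.25;

// Hypothetical shape for illustration only; not the PR's actual type.
interface CacheCreationSplit {
  oneHourTokens: number;
  fiveMinuteTokens: number;
}

// Cost of cache writes, expressed as the number of base-rate input tokens
// that would cost the same amount.
function cacheWriteCostInBaseTokens(split: CacheCreationSplit): number {
  return split.oneHourTokens * RATE_1H_WRITE + split.fiveMinuteTokens * RATE_5M_WRITE;
}

// 1000 * 2 + 400 * 1.25 = 2500 base-token equivalents.
const exampleCost = cacheWriteCostInBaseTokens({ oneHourTokens: 1000, fiveMinuteTokens: 400 });
```

With only a combined cache-creation count, the same 1400 tokens could cost anywhere between 1750 and 2800 base-token equivalents, which is the ambiguity the per-TTL measurements remove.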
aad6ebe to
d80fd65
Pull request overview
This PR extends Copilot Chat’s Anthropic Messages API usage accounting to capture (and emit in existing response.success telemetry) a per-request split of prompt cache-creation (write) tokens by TTL (1h vs 5m), enabling more accurate cost attribution for prompt caching experiments.
Changes:
- Add `anthropic_cache_creation` to `APIUsage.prompt_tokens_details` to represent Anthropic’s `usage.cache_creation` TTL breakdown.
- Parse and preserve the 1h/5m cache-creation token breakdown in both non-streaming and streaming Anthropic Messages API response handling.
- Emit two new `response.success` telemetry measurements and add unit tests covering the new parsing/streaming semantics.
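The streaming side of the changes listed above can be sketched roughly as follows. This is not the code in `messagesApi.ts`. The event and usage types are simplified stand-ins, the field names `ephemeral_1h_input_tokens` and `ephemeral_5m_input_tokens` are assumed from Anthropic's public usage schema, and the rule that a later event carrying a breakdown overrides an earlier one (while an event without one preserves it) is an assumption about the intended semantics.

```typescript
// Simplified stand-ins for the Anthropic usage payload (assumed field names).
interface CacheCreationBreakdown {
  ephemeral_1h_input_tokens?: number;
  ephemeral_5m_input_tokens?: number;
}

interface UsagePayload {
  cache_creation?: CacheCreationBreakdown | null;
}

// Reduced event union: only the events that may carry usage are modelled.
type StreamEvent =
  | { type: 'message_start'; message: { usage?: UsagePayload } }
  | { type: 'message_delta'; usage?: UsagePayload }
  | { type: 'other' };

interface TtlSplit {
  tokens1h: number;
  tokens5m: number;
}

// Returns undefined when the provider sent no breakdown at all.
function readSplit(usage: UsagePayload | undefined): TtlSplit | undefined {
  const breakdown = usage?.cache_creation;
  if (!breakdown) {
    return undefined;
  }
  return {
    tokens1h: breakdown.ephemeral_1h_input_tokens ?? 0,
    tokens5m: breakdown.ephemeral_5m_input_tokens ?? 0,
  };
}

// Walks the stream; a later breakdown replaces the earlier one,
// and events without a breakdown leave the current value untouched.
function accumulateSplit(events: StreamEvent[]): TtlSplit | undefined {
  let split: TtlSplit | undefined;
  for (const event of events) {
    if (event.type === 'message_start') {
      split = readSplit(event.message.usage) ?? split;
    } else if (event.type === 'message_delta') {
      split = readSplit(event.usage) ?? split;
    }
  }
  return split;
}
```

Returning `undefined` rather than zeros when no breakdown is present keeps "provider did not report a split" distinguishable from "zero tokens were written".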
Show a summary per file
| File | Description |
|---|---|
| extensions/copilot/src/platform/networking/common/openai.ts | Extends APIUsage typing to include Anthropic-specific cache-creation TTL breakdown. |
| extensions/copilot/src/platform/endpoint/node/messagesApi.ts | Parses/accumulates 1h vs 5m cache-creation token counts for streaming + non-streaming responses and surfaces them in prompt_tokens_details. |
| extensions/copilot/src/extension/prompt/node/chatMLFetcherTelemetry.ts | Adds two new response.success telemetry measurements sourced from anthropic_cache_creation. |
| extensions/copilot/src/platform/endpoint/test/node/messagesApi.spec.ts | Adds unit tests for non-streaming + streaming preservation/override behavior of the TTL breakdown. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 1
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
bd0c532 to
63513f3
dmitrivMS approved these changes on May 30, 2026
Adds per-request 1h vs 5m cache-creation token split to telemetry, enabling exact COGS attribution for the `chat.anthropic.promptCaching.extendedTtl` A/B experiment without inferring rates from arm assignment.

The new fields live on a nested `anthropic_cache_creation?` object on `APIUsage.prompt_tokens_details`, namespaced to make the provider-specificity explicit at the type level. Other providers leave it undefined; telemetry uses optional chaining so missing values drop cleanly from the row.
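A minimal sketch of the shape described above, assuming a trimmed-down `APIUsage` and hypothetical inner field names (the PR text does not spell out the fields inside `anthropic_cache_creation`). The two measurement names are the ones given in the commit message. The explicit filtering of `undefined` here stands in for whatever the real telemetry pipeline does with missing values.

```typescript
// Trimmed-down stand-in for APIUsage; only the fields relevant here.
interface APIUsageSketch {
  prompt_tokens: number;
  prompt_tokens_details?: {
    cached_tokens?: number;
    // Provider-specific and optional: non-Anthropic providers never set it.
    anthropic_cache_creation?: {
      // Hypothetical inner field names, chosen for illustration.
      ephemeral_1h_input_tokens?: number;
      ephemeral_5m_input_tokens?: number;
    };
  };
}

// Builds the measurement map; keys with undefined values are omitted,
// so rows from other providers simply lack the two columns.
function buildCacheCreationMeasurements(usage: APIUsageSketch | undefined): Record<string, number> {
  const split = usage?.prompt_tokens_details?.anthropic_cache_creation;
  const raw: Record<string, number | undefined> = {
    promptCacheCreation1hTokenCount: split?.ephemeral_1h_input_tokens,
    promptCacheCreation5mTokenCount: split?.ephemeral_5m_input_tokens,
  };
  const measurements: Record<string, number> = {};
  for (const key of Object.keys(raw)) {
    const value = raw[key];
    if (value !== undefined) {
      measurements[key] = value;
    }
  }
  return measurements;
}
```

Because every step of the lookup uses optional chaining, a usage object from another provider (or no usage object at all) yields an empty map instead of throwing.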