Does the Agents / Runs API support prompt caching or expose cached token counts in usage? #4555
annanasi-mon asked this question in Q&A · Unanswered · Replies: 1 comment
From my point of view, this is exactly the kind of observability gap that makes prompt caching hard to operationalize. If cached token counts are hidden behind the Agents and Runs abstraction, teams can only assume caching is helping cost and latency rather than verify it. Even if the underlying API does not expose those fields yet, the docs should say so directly, so people stop hunting for metrics that are not currently available.
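To make the gap concrete, here is a minimal sketch of the kind of defensive check teams end up writing. The helper name is hypothetical (not part of any SDK); it reports a cache hit ratio when a usage object exposes `prompt_tokens_details.cached_tokens` (as Chat Completions can), and `None` when, as with the run usage described in the question, it does not. `SimpleNamespace` stand-ins replace real API responses here.

```python
from types import SimpleNamespace


def cache_hit_ratio(usage):
    """Return the cached fraction of prompt tokens, or None when the
    usage object does not expose cached token counts at all."""
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) if details else None
    if cached is None or not getattr(usage, "prompt_tokens", 0):
        return None
    return cached / usage.prompt_tokens


# Chat Completions-style usage: cached counts are broken out.
chat_usage = SimpleNamespace(
    prompt_tokens=2048,
    prompt_tokens_details=SimpleNamespace(cached_tokens=1024),
)
# Agents/Runs-style usage: only the three aggregate fields.
run_usage = SimpleNamespace(
    prompt_tokens=2048, completion_tokens=200, total_tokens=2248
)

print(cache_hit_ratio(chat_usage))  # 0.5
print(cache_hit_ratio(run_usage))   # None
```

The point of the `None` branch is the observability gap itself: with run usage there is no field to compute from, so dashboards cannot distinguish "no caching" from "caching that is simply not reported".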
Original question:
I’m using the Azure AI Agent client (e.g. agent_framework_azure_ai with AzureAIAgentClient and agent.run()), which uses the Agents Runs API (e.g. runs.stream()). I’m trying to verify prompt cache usage.
Cached token counts in usage
The run completion usage I see only has prompt_tokens, completion_tokens, and total_tokens (e.g. from RunStepCompletionUsage / RunCompletionUsage in the SDK). There is no prompt_tokens_details or cached_tokens (unlike the Chat Completions / Responses API, which can return prompt_tokens_details.cached_tokens).
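Illustratively (field names taken from the question; token values invented), the two usage shapes differ like this:

```python
# Shape of usage on a completed run (RunCompletionUsage / RunStepCompletionUsage):
# only the three aggregate fields. Values here are illustrative.
run_usage = {
    "prompt_tokens": 1500,
    "completion_tokens": 300,
    "total_tokens": 1800,
}

# Shape the Chat Completions / Responses APIs can return:
# cached tokens are broken out under prompt_tokens_details.
chat_usage = {
    "prompt_tokens": 1500,
    "completion_tokens": 300,
    "total_tokens": 1800,
    "prompt_tokens_details": {"cached_tokens": 1024},
}

# The field this question is about is simply absent on the run side:
print("prompt_tokens_details" in run_usage)                   # False
print(chat_usage["prompt_tokens_details"]["cached_tokens"])   # 1024
```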
Can you confirm that the Agents/Runs API does not currently expose cached token counts in usage, and whether there are plans to add this?
Prompt caching support
Is prompt caching (and optional prompt_cache_key-style routing) supported for agent runs at all? If yes, is it documented anywhere and are there plans to expose cache-related fields in run/step usage?
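For comparison, on the Chat Completions side the cache-routing hint is passed as a top-level `prompt_cache_key` request parameter. The sketch below only builds a request payload and makes no API call; the model name and key value are illustrative, and whether agent runs accept anything equivalent is exactly the open question.

```python
# Hypothetical payload sketch: prompt_cache_key-style routing as it looks on
# Chat Completions. No API call is made; values are illustrative.
request = {
    "model": "gpt-4o",  # illustrative model name
    "messages": [{"role": "user", "content": "Summarize the shared context."}],
    # Requests sharing this key (and a common prompt prefix) can be routed to
    # the same cache, improving hit rates for e.g. per-tenant system prompts.
    "prompt_cache_key": "tenant-1234",
}
print(request["prompt_cache_key"])
```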