fix(litellm): populate cached_content_token_count in usage_metadata #non-breaking #3094
Conversation
Summary of Changes: Hello @omerfarukeskin01, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request resolves an issue where cached token information from LiteLLM responses, particularly when using Azure, was not being correctly propagated into the ADK's usage metadata. By introducing a dedicated field and a robust extraction mechanism, this change ensures that cached token counts are accurately reflected in the usage metadata.
Response from ADK Triaging Agent: Hello @omerfarukeskin01, thank you for your contribution! To help us with the review process, could you please create a GitHub issue that this PR addresses and link it in the description? According to our contribution guidelines, all bug fixes and feature enhancements should have an associated issue. Thanks!
Code Review
This pull request effectively addresses the missing propagation of cached token counts from LiteLLM responses into the ADK's usage metadata. The new extractor function is robust, handling multiple data formats for cached tokens from various providers. The changes are well-implemented for both streaming and non-streaming modes and are accompanied by thorough unit tests. I have one minor suggestion to improve the clarity of the new extractor function.
Linked to existing issue #3049.
Fixes #3049
When using Azure via LiteLLM, cached token information visible in debug logs was not propagated into ADK's usage metadata: usage_metadata.cached_content_token_count remained null because lite_llm.py only mapped prompt/completion/total tokens and did not read provider-specific cache metrics (e.g., prompt_tokens_details.cached_tokens, list variants, cached_prompt_tokens, cached_tokens).

This change adds a robust extractor for cached tokens from the LiteLLM usage payload and populates types.GenerateContentResponseUsageMetadata.cached_content_token_count in both non-streaming and streaming paths. As a result, when you run the same large prompt a second time (cache hit), cached_content_token_count is now populated, enabling accurate runtime cost estimates.

The updates are in src/google/adk/models/lite_llm.py with accompanying unit tests in tests/unittests/models/test_litellm.py, and the change is backward compatible (#non-breaking).