Fix: Map finish_reason for LiteLLM streaming responses #3677
base: main
Conversation
Fixes google#3665

Streaming responses from LiteLLM models (Claude, GPT, etc.) were not setting finish_reason on aggregated LlmResponse objects, causing agent runners to not properly recognize completion states.

This fix mirrors the finish_reason mapping logic from the non-streaming path (lines 776-784) and applies it to both streaming code paths:

- Tool call responses (lines 1340-1368)
- Text-only responses (lines 1369-1390)

Without this fix, agents using Claude or GPT via LiteLLM would encounter stop conditions that couldn't be properly handled, leading to incomplete responses or unexpected agent behavior.

Tested with Claude Sonnet 4.5 and GPT-5 via Azure OpenAI in a production multi-agent system with MCP tools.
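For reviewers, a minimal sketch of what the fix does in each streaming path. The mapping entries and the helper name here are representative stand-ins, not the PR's actual code; the real _FINISH_REASON_MAPPING dictionary already exists in lite_llm.py.

```python
# Sketch only: representative mapping entries; the real
# _FINISH_REASON_MAPPING already lives in lite_llm.py.
from google.adk.models.llm_response import LlmResponse
from google.genai import types

_FINISH_REASON_MAPPING = {
    "stop": types.FinishReason.STOP,              # normal completion
    "tool_calls": types.FinishReason.STOP,        # model stopped to invoke tools
    "length": types.FinishReason.MAX_TOKENS,      # hit the token limit
    "content_filter": types.FinishReason.SAFETY,  # provider-side filtering
}


def _finalize_streaming_response(
    aggregated: LlmResponse, litellm_finish_reason: str
) -> LlmResponse:
    # Mirrors the non-streaming path: translate LiteLLM's string reason
    # into ADK's enum so agent runners can recognize completion.
    aggregated.finish_reason = _FINISH_REASON_MAPPING.get(
        litellm_finish_reason, types.FinishReason.OTHER
    )
    return aggregated
```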
Summary of Changes

Hello @thesynapses, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical bug where LiteLLM streaming responses were failing to populate the finish_reason field on aggregated LlmResponse objects.
Code Review
This pull request correctly addresses a bug where finish_reason was not being mapped for streaming responses from LiteLLM, which could lead to incorrect agent behavior. The fix applies the existing mapping logic from the non-streaming path to both tool-calling and text-only streaming responses.
My review includes one suggestion to refactor the duplicated code into a helper function. This will improve the code's maintainability by adhering to the DRY principle. Overall, this is a good fix that improves the robustness of the LiteLLM integration.
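One hypothetical shape for that helper, shared by the non-streaming path and both streaming paths (the name and the fallback behavior are assumptions, not code from this PR):

```python
from typing import Optional

from google.genai import types

# Representative stand-in; lite_llm.py already defines the real dict.
_FINISH_REASON_MAPPING = {"stop": types.FinishReason.STOP}


def _map_finish_reason(
    litellm_finish_reason: Optional[str],
) -> Optional[types.FinishReason]:
    """Translates LiteLLM's string finish reason into ADK's enum.

    A single helper like this could replace the logic currently
    duplicated across the three code paths.
    """
    if litellm_finish_reason is None:
        return None
    return _FINISH_REASON_MAPPING.get(
        litellm_finish_reason, types.FinishReason.OTHER
    )
```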
Response from ADK Triaging Agent

Hello @thesynapses, thank you for creating this PR! Could you please fill out the testing plan section of the PR description?
Link to Issue or Description of Change
1. Link to an existing issue: #3665
Testing Plan
Problem:

When using LiteLLM models in streaming mode, the finish_reason field was never set on aggregated LlmResponse objects. This caused ADK agent runners to not properly detect when responses completed, leading to incomplete responses, agents not recognizing stop conditions, and unpredictable behavior with Claude/GPT models.

Solution:
Added finish_reason mapping in both streaming code paths (tool calls and text-only), mirroring the existing non-streaming implementation at lines 776-784. Maps LiteLLM's string finish reasons ("stop", "tool_calls", etc.) to ADK's types.FinishReason enum values using the existing _FINISH_REASON_MAPPING dictionary.

Unit Tests:
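The unit tests are not listed here; the following is a minimal pytest sketch of the kind of regression test this change suggests. The mapping entries are the representative stand-ins from the sketch above, not the library's actual values.

```python
import pytest
from google.genai import types

# Representative entries; the real _FINISH_REASON_MAPPING is in lite_llm.py.
_FINISH_REASON_MAPPING = {
    "stop": types.FinishReason.STOP,
    "tool_calls": types.FinishReason.STOP,
    "length": types.FinishReason.MAX_TOKENS,
}


@pytest.mark.parametrize(
    "litellm_reason, expected",
    [
        ("stop", types.FinishReason.STOP),
        ("tool_calls", types.FinishReason.STOP),
        ("length", types.FinishReason.MAX_TOKENS),
        ("unexpected", types.FinishReason.OTHER),  # unknown reasons fall back
    ],
)
def test_streaming_finish_reason_mapping(litellm_reason, expected):
    mapped = _FINISH_REASON_MAPPING.get(litellm_reason, types.FinishReason.OTHER)
    assert mapped == expected
```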
Manual End-to-End (E2E) Tests:
Setup:
- Claude Sonnet 4.5 via Vertex AI (vertex_ai/claude-sonnet-4-5@20250929)
- GPT-5 via Azure OpenAI (azure/gpt-5-openai-latest)

Test Cases:
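A hedged sketch of the manual check (the model ID is taken from the setup above; the exact LlmRequest construction may differ between ADK versions):

```python
import asyncio

from google.adk.models.lite_llm import LiteLlm
from google.adk.models.llm_request import LlmRequest
from google.genai import types


async def main() -> None:
    llm = LiteLlm(model="vertex_ai/claude-sonnet-4-5@20250929")
    request = LlmRequest(
        contents=[
            types.Content(
                role="user",
                parts=[types.Part.from_text(text="Say hello in one sentence.")],
            )
        ]
    )
    final = None
    async for response in llm.generate_content_async(request, stream=True):
        final = response  # the aggregated LlmResponse arrives last
    # Before the fix this printed None; after it, FinishReason.STOP.
    print(final.finish_reason)


asyncio.run(main())
```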
Before Fix:
- finish_reason field was None on streaming responses

After Fix:

- finish_reason correctly set to types.FinishReason.STOP

Log Evidence:
Checklist
Additional context
This fix is critical for production systems using any LiteLLM-supported models (Claude, GPT, Mistral, etc.) in streaming mode. The bug affects all streaming scenarios where the ADK agent runner needs to detect proper completion. The fix ensures consistent behavior between streaming and non-streaming modes, making LiteLLM a viable production option for multi-agent systems.
Related to issue #3676 (double serialization) - both bugs prevented proper Claude/GPT operation with ADK.