LiteLLM Streaming Responses Missing finish_reason in ADK
Issue Summary
Streaming responses from non-Gemini models (Claude, GPT, etc.) via LiteLLM do not set the `finish_reason` field on aggregated `LlmResponse` objects, preventing ADK agent runners from properly recognizing completion states.
Environment
- ADK Version: 1.19.0
- Affected Models: All models via LiteLLM (Claude via Vertex AI, Azure OpenAI GPT-5, etc.)
- Streaming Mode: SSE (Server-Sent Events)
- Working Models: Native Gemini models (not affected)
Problem Description
When using LiteLLM models in streaming mode, the ADK `lite_llm.py` implementation aggregates streaming chunks but fails to set the `finish_reason` field on the final `LlmResponse` objects. This occurs in two scenarios:
- Tool call responses (lines 1254-1268): when `finish_reason` is `"tool_calls"` or `"stop"` and function calls exist
- Text-only responses (lines 1290-1305): when `finish_reason` is `"stop"` and text content exists
Root Cause
The streaming aggregation logic (lines 1242-1310) creates `aggregated_llm_response` and `aggregated_llm_response_with_tool_call` objects using `_message_to_generate_content_response()`, but never sets their `finish_reason` field. In contrast, the non-streaming path (lines 776-784) correctly maps finish reasons.
Impact
Without `finish_reason` set:
- ADK agent runners cannot detect normal completion (`STOP`)
- They cannot detect token limit exhaustion (`MAX_TOKENS`)
- They cannot detect safety filter triggers (`SAFETY`)
- Agents may not correctly handle completion states
- Downstream workflows relying on completion metadata fail (see the sketch below)
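To make the failure concrete, here is a hypothetical downstream check (the `check_completion` function is illustrative, not ADK API) that silently loses information when `finish_reason` is `None`:

```python
from google.genai import types


def check_completion(llm_response) -> None:
  """Hypothetical downstream handler reading completion metadata."""
  if llm_response.finish_reason == types.FinishReason.MAX_TOKENS:
    print("Response truncated: raise max_output_tokens or summarize.")
  elif llm_response.finish_reason == types.FinishReason.SAFETY:
    print("Response blocked by safety filters.")
  elif llm_response.finish_reason is None:
    # This is the branch LiteLLM streaming responses currently fall into:
    # the handler cannot tell why the model stopped.
    print("No completion metadata available.")
```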
Steps to Reproduce
- Configure an ADK agent with a LiteLLM model (e.g., `vertex_ai/claude-sonnet-4-5@20250929`)
- Enable streaming: `LiteLlm(model="vertex_ai/claude-...", stream=True)`
- Run the agent with any request that triggers a streaming response
- Inspect the `LlmResponse` objects yielded during streaming
- Observe: the `finish_reason` field is `None` or unset (a runnable sketch of these steps follows)
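A minimal reproduction sketch is below; it assumes ADK 1.x import paths (`google.adk.runners.InMemoryRunner`, `google.adk.agents.run_config`), and the app name, user id, and prompt are placeholders:

```python
import asyncio

from google.adk.agents import Agent
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import InMemoryRunner
from google.genai import types

APP_NAME = "finish_reason_repro"  # placeholder

agent = Agent(
    name="repro_agent",
    model=LiteLlm(model="vertex_ai/claude-sonnet-4-5@20250929", stream=True),
    instruction="Answer briefly.",
)


async def main() -> None:
  runner = InMemoryRunner(agent=agent, app_name=APP_NAME)
  session = await runner.session_service.create_session(
      app_name=APP_NAME, user_id="u1"
  )
  new_message = types.Content(role="user", parts=[types.Part(text="Hello")])
  async for event in runner.run_async(
      user_id="u1",
      session_id=session.id,
      new_message=new_message,
      run_config=RunConfig(streaming_mode=StreamingMode.SSE),
  ):
    # Bug: prints None even for the final aggregated response.
    print("finish_reason:", event.finish_reason)


asyncio.run(main())
```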
Example Log Evidence
```
# From uvicorn_debug.log - streaming chunk WITH finish_reason:
ModelResponseStream(
    id='chatcmpl-0012affd-9559-4886-9819-2e0573a38680',
    choices=[StreamingChoices(finish_reason='stop', ...)],
    usage=Usage(completion_tokens=291, prompt_tokens=0, ...)
)

# But the aggregated LlmResponse lacks finish_reason:
# aggregated_llm_response.finish_reason = None
```
Expected Behavior
Streaming responses should set `finish_reason` on aggregated `LlmResponse` objects, matching the behavior of:
- Non-streaming LiteLLM responses (lines 776-784)
- Native Gemini streaming responses (which already work correctly)
The `finish_reason` should be mapped from LiteLLM's string values to ADK's `types.FinishReason` enum using the existing `_FINISH_REASON_MAPPING`.
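For reference, that constant is shaped roughly like the sketch below; the entries shown are illustrative, and the authoritative contents are at lines 28-33 of lite_llm.py:

```python
from google.genai import types

# Sketch of the existing constant in lite_llm.py: LiteLLM's string
# finish reasons keyed to ADK/GenAI enum values. Entries here are
# illustrative, not copied from the source.
_FINISH_REASON_MAPPING = {
    "stop": types.FinishReason.STOP,
    "tool_calls": types.FinishReason.STOP,
    "length": types.FinishReason.MAX_TOKENS,
    "content_filter": types.FinishReason.SAFETY,
}
```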
Proposed Fix
Add `finish_reason` mapping to both streaming aggregation paths, mirroring the non-streaming implementation:
```python
# For tool call responses (after line 1268)
if isinstance(finish_reason, types.FinishReason):
  aggregated_llm_response_with_tool_call.finish_reason = finish_reason
else:
  finish_reason_str = str(finish_reason).lower()
  aggregated_llm_response_with_tool_call.finish_reason = _FINISH_REASON_MAPPING.get(
      finish_reason_str, types.FinishReason.OTHER
  )

# For text-only responses (after line 1295)
if isinstance(finish_reason, types.FinishReason):
  aggregated_llm_response.finish_reason = finish_reason
else:
  finish_reason_str = str(finish_reason).lower()
  aggregated_llm_response.finish_reason = _FINISH_REASON_MAPPING.get(
      finish_reason_str, types.FinishReason.OTHER
  )
```
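Since the two paths are identical except for the target object, the duplication could optionally be factored into a small helper; `_map_finish_reason` is a hypothetical name, not existing ADK code:

```python
def _map_finish_reason(finish_reason) -> types.FinishReason:
  """Normalizes a LiteLLM finish reason (enum or string) to the ADK enum."""
  if isinstance(finish_reason, types.FinishReason):
    return finish_reason
  return _FINISH_REASON_MAPPING.get(
      str(finish_reason).lower(), types.FinishReason.OTHER
  )


# Each aggregation path then reduces to one assignment:
# aggregated_llm_response_with_tool_call.finish_reason = _map_finish_reason(finish_reason)
# aggregated_llm_response.finish_reason = _map_finish_reason(finish_reason)
```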
Testing Evidence
Before Fix
```
Event 74: Event
ADK streaming loop completed. Total events: 74
# LlmResponse.finish_reason = None
```
After Fix
```
Event 74: Event with finish_reason='stop'
# LlmResponse.finish_reason = FinishReason.STOP
ADK streaming loop completed. Total events: 74
```
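If a helper like `_map_finish_reason` above were adopted, a small parametrized test could lock the behavior in; this is a sketch, and the expected values assume the illustrative mapping entries shown earlier:

```python
import pytest
from google.genai import types

# from google.adk.models.lite_llm import _map_finish_reason  # hypothetical helper


@pytest.mark.parametrize(
    "raw, expected",
    [
        ("stop", types.FinishReason.STOP),
        ("length", types.FinishReason.MAX_TOKENS),
        ("content_filter", types.FinishReason.SAFETY),
        ("totally_unknown_reason", types.FinishReason.OTHER),
        (types.FinishReason.STOP, types.FinishReason.STOP),
    ],
)
def test_map_finish_reason_never_returns_none(raw, expected):
  # Aggregated streaming responses should never leave finish_reason unset.
  assert _map_finish_reason(raw) == expected
```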
Related Code
- File: `lite_llm.py`
- Affected lines: 1242-1310 (streaming aggregation)
- Reference implementation: lines 776-784 (non-streaming `finish_reason` mapping)
- Mapping constant: `_FINISH_REASON_MAPPING` (lines 28-33)
Affected Workflows
- Multi-agent orchestration systems relying on completion signals
- Usage tracking and billing systems reading `finish_reason`
- Error-handling logic that checks for `MAX_TOKENS` or `SAFETY` stops
- Any downstream code expecting `LlmResponse.finish_reason` to be populated
Priority
High - affects all LiteLLM streaming users, breaks completion detection, and creates an inconsistency between streaming and non-streaming behavior.