LiteLLM Streaming Responses Missing finish_reason in ADK #3665

@thesynapses

Issue Summary

Streaming responses from non-Gemini models (Claude, GPT, etc.) via LiteLLM do not set the finish_reason field on aggregated LlmResponse objects, preventing ADK agent runners from properly recognizing completion states.

Environment

  • ADK Version: 1.19.0
  • Affected Models: All models via LiteLLM (Claude via Vertex AI, Azure OpenAI GPT-5, etc.)
  • Streaming Mode: SSE (Server-Sent Events)
  • Working Models: Native Gemini models (not affected)

Problem Description

When using LiteLLM models in streaming mode, the ADK lite_llm.py implementation aggregates streaming chunks but fails to set the finish_reason field on the final LlmResponse objects. This occurs in two scenarios:

  1. Tool call responses (lines 1254-1268): When finish_reason is "tool_calls" or "stop" and function calls exist
  2. Text-only responses (lines 1290-1305): When finish_reason is "stop" and text content exists

Root Cause

The streaming aggregation logic (lines 1242-1310) creates aggregated_llm_response and aggregated_llm_response_with_tool_call objects using _message_to_generate_content_response(), but never sets their finish_reason field. In contrast, the non-streaming path (lines 776-784) correctly maps finish reasons.
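For context, the non-streaming mapping that the streaming path should mirror works roughly as follows. This is an illustrative sketch only: the stand-in `FinishReason` enum and the exact keys of `_FINISH_REASON_MAPPING` are assumptions, not the real contents of `lite_llm.py` lines 28-33.

```python
# Illustrative sketch; the real _FINISH_REASON_MAPPING lives in
# lite_llm.py (lines 28-33) and its exact keys may differ.
from enum import Enum


class FinishReason(Enum):
    # Stand-in for google.genai types.FinishReason
    STOP = "STOP"
    MAX_TOKENS = "MAX_TOKENS"
    SAFETY = "SAFETY"
    OTHER = "OTHER"


# LiteLLM finish_reason strings -> ADK enum values (assumed keys)
_FINISH_REASON_MAPPING = {
    "stop": FinishReason.STOP,
    "tool_calls": FinishReason.STOP,
    "length": FinishReason.MAX_TOKENS,
    "content_filter": FinishReason.SAFETY,
}


def map_finish_reason(raw):
    """Mirror of the non-streaming logic: lower-case the raw string and
    fall back to OTHER for unrecognized values."""
    return _FINISH_REASON_MAPPING.get(str(raw).lower(), FinishReason.OTHER)
```

The streaming aggregation paths currently skip this mapping step entirely, which is why the aggregated responses end up with `finish_reason=None`.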

Impact

Without finish_reason set:

  • ADK agent runners cannot detect normal completion (STOP)
  • Cannot detect token limit exhaustion (MAX_TOKENS)
  • Cannot detect safety filter triggers (SAFETY)
  • Agents may not correctly handle completion states
  • Downstream workflows relying on completion metadata fail

Steps to Reproduce

  1. Configure an ADK agent with LiteLLM model (e.g., vertex_ai/claude-sonnet-4-5@20250929)
  2. Enable streaming: LiteLlm(model="vertex_ai/claude-...", stream=True)
  3. Run the agent with any request that triggers a streaming response
  4. Inspect the LlmResponse objects yielded during streaming
  5. Observe: finish_reason field is None or unset

Example Log Evidence

# From uvicorn_debug.log - Streaming chunk with finish_reason
ModelResponseStream(
    id='chatcmpl-0012affd-9559-4886-9819-2e0573a38680',
    choices=[StreamingChoices(finish_reason='stop', ...)],
    usage=Usage(completion_tokens=291, prompt_tokens=0, ...)
)

# But aggregated LlmResponse lacks finish_reason:
# aggregated_llm_response.finish_reason = None 
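The failure mode can be reproduced in miniature without ADK. The sketch below uses hypothetical stand-in classes (not the real ADK/LiteLLM types) to show the shape of the bug: the aggregation loop accumulates text from the chunks but never copies the final chunk's `finish_reason` onto the aggregated response.

```python
# Minimal simulation of the streaming aggregation bug; Chunk and
# LlmResponse here are stand-ins, not the real ADK/LiteLLM classes.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    finish_reason: Optional[str] = None  # e.g. 'stop' on the final chunk


@dataclass
class LlmResponse:
    text: str = ""
    finish_reason: Optional[str] = None


def aggregate(chunks):
    """Mirrors the buggy path: builds the aggregated response from the
    accumulated text but never propagates finish_reason."""
    text = "".join(c.text for c in chunks)
    return LlmResponse(text=text)  # finish_reason is left as None


chunks = [Chunk("Hel"), Chunk("lo"), Chunk("", finish_reason="stop")]
resp = aggregate(chunks)
print(resp.text)           # Hello
print(resp.finish_reason)  # None, even though the last chunk said 'stop'
```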

Expected Behavior

Streaming responses should set finish_reason on aggregated LlmResponse objects, matching the behavior of:

  1. Non-streaming LiteLLM responses (lines 776-784)
  2. Native Gemini streaming responses (which already work correctly)

The finish_reason should be mapped from LiteLLM's string values to ADK's types.FinishReason enum using the existing _FINISH_REASON_MAPPING.

Proposed Fix

Add finish_reason mapping to both streaming aggregation paths, mirroring the non-streaming implementation:

# For tool call responses (after line 1268)
if isinstance(finish_reason, types.FinishReason):
    aggregated_llm_response_with_tool_call.finish_reason = finish_reason
else:
    finish_reason_str = str(finish_reason).lower()
    aggregated_llm_response_with_tool_call.finish_reason = _FINISH_REASON_MAPPING.get(
        finish_reason_str, types.FinishReason.OTHER
    )

# For text-only responses (after line 1295)
if isinstance(finish_reason, types.FinishReason):
    aggregated_llm_response.finish_reason = finish_reason
else:
    finish_reason_str = str(finish_reason).lower()
    aggregated_llm_response.finish_reason = _FINISH_REASON_MAPPING.get(
        finish_reason_str, types.FinishReason.OTHER
    )

Testing Evidence

Before Fix

Event 74: Event
ADK streaming loop completed. Total events: 74
# LlmResponse.finish_reason = None

After Fix

Event 74: Event with finish_reason='stop'
# LlmResponse.finish_reason = FinishReason.STOP 
ADK streaming loop completed. Total events: 74

Related Code

  • File: lite_llm.py
  • Affected lines: 1242-1310 (streaming aggregation)
  • Reference implementation: Lines 776-784 (non-streaming finish_reason mapping)
  • Mapping constant: _FINISH_REASON_MAPPING (lines 28-33)

Affected Workflows

  • Multi-agent orchestration systems relying on completion signals
  • Usage tracking and billing systems reading finish_reason
  • Error handling logic that checks for MAX_TOKENS or SAFETY stops
  • Any downstream code expecting LlmResponse.finish_reason to be populated

Priority

High - Affects all LiteLLM streaming users, breaks completion detection, and creates inconsistency between streaming/non-streaming behavior.

Metadata

Labels: models [Component] Issues related to model support