
LiteLLM Streaming Content Duplication in Tool Call Responses #3697

@thesynapses

Description


Summary

When using LiteLLM models with ADK's planning features (e.g., PlanReActPlanner) in streaming mode, planning and reasoning content appears twice in responses when tool calls are made:

  1. First during streaming as individual text chunks (lines 1288-1296)
  2. Again in the aggregated tool-call message with content=text (line 1352)

This violates OpenAI/LiteLLM conventions and creates unnecessary duplication in conversation history.

Environment

  • ADK Version: 1.19.0
  • Affected File: lite_llm.py
  • Python Version: 3.11+
  • Models Affected: All non-Gemini models accessed via LiteLLM (Claude, GPT, etc.) when using planning workflows
  • Feature: Streaming responses with tool calls

Expected Behavior

According to OpenAI/LiteLLM API specifications:

  • When a message contains only tool calls (no user-facing answer text), the content field should be None
  • Planning/reasoning text like <PLANNING>I need to search...</PLANNING> is internal reasoning, not the final answer
  • Tool-call messages should follow this structure:
{
  "role": "assistant",
  "content": None,  # No content for tool-only messages
  "tool_calls": [...]
}
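To make the convention concrete, here is a minimal helper (hypothetical, not ADK code) that enforces content=None on assistant messages that carry tool calls, matching the structure above:

```python
def normalize_tool_call_message(message: dict) -> dict:
    """Return a copy of an assistant message with content set to None
    whenever tool calls are present; internal reasoning text belongs in
    thought_parts, not in the content field."""
    if message.get("role") == "assistant" and message.get("tool_calls"):
        return {**message, "content": None}
    return message
```

A plain assistant message without tool calls passes through unchanged; only tool-call messages have their content cleared.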

Actual Behavior

The aggregated response (lines 1348-1359) sets content=text, including all accumulated planning/reasoning text:

aggregated_llm_response_with_tool_call = (
    _message_to_generate_content_response(
        ChatCompletionAssistantMessage(
            role="assistant",
            content=text,  # Includes planning text, causing duplication
            tool_calls=tool_calls,
        ),
        model_version=part.model,
        thought_parts=list(reasoning_parts)
        if reasoning_parts
        else None,
    )
)

Result: Planning text appears twice:

  1. During streaming (lines 1288-1296): <PLANNING>I need to search...</PLANNING> streamed chunk-by-chunk
  2. In aggregated message (line 1352): Same text included in content field

Impact

1. Content Duplication

  • Frontend receives the same planning text twice
  • Requires additional filtering logic in application code
  • Poor user experience if not handled

2. API Convention Violation

  • OpenAI/Claude/GPT APIs expect content=None for tool-only messages
  • Current implementation sends content=<planning_text>, which is semantically incorrect
  • Tool-call messages should not contain answer text in content

3. Conversation History Bloat

  • Planning text unnecessarily stored in message content field
  • Already preserved separately in thought_parts (line 1357)
  • Increases storage and memory overhead

4. Semantic Confusion

  • content=text implies "model generated answer text AND called tools"
  • Reality: model only generated internal reasoning before calling tools
  • Misrepresents the actual interaction flow

Steps to Reproduce

  1. Create an agent with LiteLLM model:
from google.adk.agents import Agent
from google.adk.models import LiteLlm
from google.adk.planners import PlanReActPlanner

agent = Agent(
    model=LiteLlm(model="vertex_ai/claude-3-5-sonnet-v2@20241022"),
    planner=PlanReActPlanner(),
    tools=[search_tool, ...]
)
  2. Enable streaming and send a query requiring tools:
async for response in agent.run_streaming("What's the weather in Boston?"):
    print(response.content)
  3. Observe in logs:

    • Planning text like <PLANNING>I need to search for weather</PLANNING> streamed as chunks
    • Same planning text appears again in aggregated response content field when tool calls are made
  4. Check conversation history:

    • Tool-call message has content="<PLANNING>..." instead of content=None

Root Cause

Lines 1268-1303 accumulate all text chunks (including planning) into the text variable:

text = ""
...
elif isinstance(chunk, TextChunk):
    text += chunk.text  # Accumulates planning/reasoning text
    yield _message_to_generate_content_response(...)  # Already streamed to user

Line 1352 then includes the accumulated text again in the aggregated message:

content=text,  # Duplicates already-streamed planning text

Line 1357 already preserves the planning text separately:

thought_parts=list(reasoning_parts) if reasoning_parts else None,
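The flow above can be reproduced with a toy aggregator (illustrative names only, not the actual lite_llm.py implementation): text chunks are forwarded as they arrive, then the same accumulated text lands in the final tool-call message.

```python
def aggregate_stream(chunks):
    """Toy model of the buggy streaming path: each text chunk is yielded
    to the client immediately, and the accumulated text is then attached
    to the final tool-call message, so the client receives it twice."""
    streamed, text, tool_calls = [], "", []
    for chunk in chunks:
        if chunk["type"] == "text":
            text += chunk["text"]
            streamed.append(chunk["text"])  # already delivered to the client
        else:  # tool-call chunk
            tool_calls.append(chunk["call"])
    # Mirrors content=text at line 1352: the same text appears again.
    final = {"role": "assistant", "content": text, "tool_calls": tool_calls}
    return streamed, final
```

Feeding a planning chunk followed by a tool call yields the planning text both in the streamed output and in the final message's content.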

Proposed Fix

Change line 1352 to set content=None for tool-only messages:

aggregated_llm_response_with_tool_call = (
    _message_to_generate_content_response(
        ChatCompletionAssistantMessage(
            role="assistant",
            content=None,  # ✅ FIX: No duplication, follows OpenAI/LiteLLM spec
            tool_calls=tool_calls,
        ),
        model_version=part.model,
        thought_parts=list(reasoning_parts)
        if reasoning_parts
        else None,
    )
)
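A minimal check of the fixed behavior (illustrative names, not ADK internals): after aggregation, the tool-call message should carry tool calls and reasoning kept separately, but no content.

```python
def aggregate_with_fix(chunks):
    """Sketch of the corrected streaming aggregation: text is still
    streamed chunk-by-chunk, but the final tool-call message carries
    content=None, with reasoning preserved in a separate field."""
    streamed, text, tool_calls = [], "", []
    for chunk in chunks:
        if chunk["type"] == "text":
            text += chunk["text"]
            streamed.append(chunk["text"])  # delivered once, during streaming
        else:  # tool-call chunk
            tool_calls.append(chunk["call"])
    final = {
        "role": "assistant",
        "content": None,  # the fix: no duplicated text
        "tool_calls": tool_calls,
        "thought_parts": [text] if text else None,  # reasoning kept separately
    }
    return streamed, final
```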

Comparison with Non-Streaming

The non-streaming path (around line 770) avoids the problem:

  • It creates a single response with complete tool-call information
  • There is no incremental streaming, so no opportunity for duplication

The streaming path (lines 1268-1400) duplicates because:

  • Text chunks are yielded immediately during streaming
  • The same text is then included again in the final aggregated message

The fix brings streaming behavior in line with non-streaming and API conventions.

Additional Context

  • This issue specifically affects planning workflows where models generate reasoning text before calling tools
  • Does not affect simple tool-call scenarios without planning text
  • thought_parts parameter already exists to preserve reasoning separately from message content
  • Frontend applications using ADK planning need to implement workarounds to deduplicate content
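Until the fix lands, a client-side workaround looks roughly like this (hypothetical helper; field names follow the OpenAI message shape):

```python
def drop_duplicated_content(responses):
    """Yield responses unchanged, except that aggregated tool-call
    messages have their content cleared, since the same text was
    already received as streaming chunks."""
    for response in responses:
        if response.get("tool_calls") and response.get("content"):
            yield {**response, "content": None}
        else:
            yield response
```

Ordinary text responses pass through untouched, so the filter can sit directly on the response stream.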

Recommended Fix (Summary)

This single-line change eliminates content duplication, aligns with API standards, and maintains semantic correctness for tool-call messages in streaming responses.

Metadata

Assignees: none

Labels: core [Component: core interface and implementation], tools [Component: tools]
