## Summary
When using LiteLLM models with ADK's planning features (e.g., `PlanReActPlanner`) in streaming mode, planning/reasoning content appears twice in responses when tool calls are made:
- First during streaming, as individual text chunks (lines 1288-1296)
- Again in the aggregated tool-call message with `content=text` (line 1352)

This violates OpenAI/LiteLLM conventions and creates unnecessary duplication in conversation history.
## Environment
- ADK Version: 1.19.0
- Affected File: lite_llm.py
- Python Version: 3.11+
- Models Affected: All non-Gemini models accessed via LiteLLM (Claude, GPT, etc.) when using planning workflows
- Feature: Streaming responses with tool calls
## Expected Behavior
According to OpenAI/LiteLLM API specifications:
- When a message contains only tool calls (no user-facing answer text), the `content` field should be `None`
- Planning/reasoning text like `<PLANNING>I need to search...</PLANNING>` is internal reasoning, not the final answer
- Tool-call messages should follow this structure:

```python
{
    "role": "assistant",
    "content": None,  # No content for tool-only messages
    "tool_calls": [...]
}
```
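For contrast, here is a minimal sketch of the two assistant-message shapes under the OpenAI chat-completions convention (the tool-call payload below is hypothetical, for illustration only):

```python
# A final answer carries text in `content`; a tool-only turn carries
# `content=None` and puts all actionable output in `tool_calls`.
answer_message = {
    "role": "assistant",
    "content": "It's 72°F and sunny in Boston.",
}
tool_only_message = {
    "role": "assistant",
    "content": None,  # internal planning text does not belong here
    "tool_calls": [{
        "id": "call_1",  # hypothetical call id
        "type": "function",
        "function": {"name": "search", "arguments": '{"q": "weather in Boston"}'},
    }],
}
```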
## Actual Behavior
The aggregated response at lines 1348-1359 sets `content=text`, including all accumulated planning/reasoning text:

```python
aggregated_llm_response_with_tool_call = (
    _message_to_generate_content_response(
        ChatCompletionAssistantMessage(
            role="assistant",
            content=text,  # Includes planning text, causing duplication
            tool_calls=tool_calls,
        ),
        model_version=part.model,
        thought_parts=list(reasoning_parts)
        if reasoning_parts
        else None,
    )
)
```

Result: the planning text appears twice:
- During streaming (lines 1288-1296): `<PLANNING>I need to search...</PLANNING>` is streamed chunk-by-chunk
- In the aggregated message (line 1352): the same text is included in the `content` field
## Impact
1. **Content Duplication**
   - Frontend receives the same planning text twice
   - Requires additional filtering logic in application code
   - Poor user experience if not handled
2. **API Convention Violation**
   - OpenAI-compatible APIs (GPT, Claude, etc.) expect `content=None` for tool-only messages
   - The current implementation sends `content=<planning_text>`, which is semantically incorrect
   - Tool-call messages should not contain answer text in `content`
3. **Conversation History Bloat**
   - Planning text is unnecessarily stored in the message `content` field
   - It is already preserved separately in `thought_parts` (line 1357)
   - Increases storage and memory overhead
4. **Semantic Confusion**
   - `content=text` implies "the model generated answer text AND called tools"
   - In reality, the model only generated internal reasoning before calling tools
   - Misrepresents the actual interaction flow
## Steps to Reproduce
1. Create an agent with a LiteLLM model:

```python
from google.adk.agents import Agent
from google.adk.models import LiteLlm
from google.adk.planners import PlanReActPlanner

agent = Agent(
    model=LiteLlm(model="vertex_ai/claude-3-5-sonnet-v2@20241022"),
    planner=PlanReActPlanner(),
    tools=[search_tool, ...],
)
```

2. Enable streaming and send a query requiring tools:

```python
async for response in agent.run_streaming("What's the weather in Boston?"):
    print(response.content)
```

3. Observe in the logs:
   - Planning text like `<PLANNING>I need to search for weather</PLANNING>` is streamed as chunks
   - The same planning text appears again in the aggregated response `content` field when tool calls are made
4. Check the conversation history:
   - The tool-call message has `content="<PLANNING>..."` instead of `content=None`
## Root Cause
Lines 1268-1303 accumulate all text chunks (including planning text) into the `text` variable:

```python
text = ""
...
elif isinstance(chunk, TextChunk):
    text += chunk.text  # Accumulates planning/reasoning text
    yield _message_to_generate_content_response(...)  # Already streamed to user
```

Line 1352 then includes the accumulated text again in the aggregated message:

```python
content=text,  # Duplicates already-streamed planning text
```

Line 1357 already preserves the planning text separately:

```python
thought_parts=list(reasoning_parts) if reasoning_parts else None,
```
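To make the double emission concrete, here is a minimal, self-contained sketch of the pattern (the dict-based chunk and event shapes are hypothetical stand-ins, not ADK's actual types):

```python
# Every text chunk is yielded immediately, then the same accumulated text
# is attached to the final tool-call event.
def stream_with_tool_calls(chunks):
    text = ""
    tool_calls = []
    for chunk in chunks:
        if chunk["type"] == "text":
            text += chunk["text"]
            yield {"partial": True, "content": chunk["text"]}  # first emission
        elif chunk["type"] == "tool_call":
            tool_calls.append(chunk["call"])
    if tool_calls:
        # Second emission: `text` repeats everything already streamed above.
        yield {"partial": False, "content": text, "tool_calls": tool_calls}

events = list(stream_with_tool_calls([
    {"type": "text", "text": "<PLANNING>I need to search...</PLANNING>"},
    {"type": "tool_call", "call": {"name": "search", "args": {"q": "weather"}}},
]))
assert events[0]["content"] == events[1]["content"]  # same planning text twice
```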
## Proposed Fix
Change line 1352 to set `content=None` for tool-only messages:

```python
aggregated_llm_response_with_tool_call = (
    _message_to_generate_content_response(
        ChatCompletionAssistantMessage(
            role="assistant",
            content=None,  # ✅ FIX: No duplication, follows OpenAI/LiteLLM spec
            tool_calls=tool_calls,
        ),
        model_version=part.model,
        thought_parts=list(reasoning_parts)
        if reasoning_parts
        else None,
    )
)
```

## Comparison with Non-Streaming
The non-streaming path (around line 770) handles this correctly:
- It creates a single response with the complete tool-call information
- There is no opportunity for duplication (no incremental streaming)

The streaming path (lines 1268-1400) has the duplication issue because:
- Text chunks are yielded immediately during streaming
- The accumulated text is then included again in the final aggregated message

The fix brings the streaming behavior in line with the non-streaming path and with API conventions.
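Under the same hypothetical shapes as the root-cause sketch above, the corrected streaming aggregation would look like this:

```python
# Fixed variant of the earlier sketch: text chunks are still streamed
# immediately, but the aggregated tool-call event carries content=None,
# matching the non-streaming path and the OpenAI convention.
def stream_with_tool_calls_fixed(chunks):
    tool_calls = []
    for chunk in chunks:
        if chunk["type"] == "text":
            yield {"partial": True, "content": chunk["text"]}
        elif chunk["type"] == "tool_call":
            tool_calls.append(chunk["call"])
    if tool_calls:
        yield {"partial": False, "content": None, "tool_calls": tool_calls}
```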
## Additional Context
- This issue specifically affects planning workflows where models generate reasoning text before calling tools
- It does not affect simple tool-call scenarios without planning text
- The `thought_parts` parameter already exists to preserve reasoning separately from message content
- Frontend applications using ADK planning currently need to implement workarounds to deduplicate content, as in the sketch below
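A minimal sketch of that client-side workaround, again assuming a hypothetical dict-based event shape rather than ADK's actual response type:

```python
# Drop the content of an aggregated tool-call event when it merely repeats
# text that was already received as streaming chunks.
def dedupe(events):
    streamed = []
    for event in events:
        if event.get("tool_calls") and event.get("content") == "".join(streamed):
            event = {**event, "content": None}  # suppress the duplicate
        elif event.get("content"):
            streamed.append(event["content"])
        yield event
```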
## Recommended Fix (Summary)

```python
# Lines 1348-1359
aggregated_llm_response_with_tool_call = (
    _message_to_generate_content_response(
        ChatCompletionAssistantMessage(
            role="assistant",
            content=None,  # ✅ FIX: Avoid duplication, follow OpenAI spec
            tool_calls=tool_calls,
        ),
        model_version=part.model,
        thought_parts=list(reasoning_parts)
        if reasoning_parts
        else None,
    )
)
```

This single-line change eliminates content duplication, aligns with API standards, and maintains semantic correctness for tool-call messages in streaming responses.