Description
Summary
When using Gemini 2.5 Flash with ThinkingConfig (include_thoughts=True or omitted), thought/reasoning content sometimes appears in the regular text field of response parts with a "THOUGHT:" prefix, but no part has part.thought=True. All parts have thought: null and has_thought_signature: false.
Per the Gemini thinking docs, thought content should be returned as separate parts with the thought boolean set to true. Instead, it arrives as literal text in normal text parts.
Environment
- Package: google-genai 1.64.0 (also observed with 1.57.0)
- Model: gemini-2.5-flash
- Backend: Vertex AI (me-west1, global)
- Mode: Streaming (generate_content_stream)
- Python: 3.11
Steps to Reproduce
- Configure ThinkingConfig with thinking_budget=128 (or similar) for gemini-2.5-flash
- Call client.aio.models.generate_content_stream() with a prompt that triggers reasoning
- Iterate over streamed chunks and inspect chunk.candidates[0].content.parts
- Observe: thought content appears in part.text with a "THOUGHT:" prefix, but part.thought is None for all parts
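The steps above can be sketched as a small script. This is a minimal sketch, not the exact production code: the function names reproduce and part_metadata, the starts_with_marker field, and the project, location, and prompt arguments are placeholders introduced here for illustration. It assumes Vertex AI credentials are already configured in the environment.

```python
import asyncio
from typing import Any


def part_metadata(part: Any) -> dict:
    """Collect the same per-part fields that the diagnostic log records."""
    text = getattr(part, "text", None) or ""
    return {
        "thought": getattr(part, "thought", None),
        "has_thought_signature": getattr(part, "thought_signature", None) is not None,
        "text_length": len(text),
        # Hypothetical extra field: flags text that begins with the leaked marker.
        "starts_with_marker": text.lstrip().startswith("THOUGHT:"),
    }


async def reproduce(project: str, location: str, prompt: str) -> list[dict]:
    """Stream one response and return metadata for every part received."""
    # Imported lazily so part_metadata() stays usable without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(vertexai=True, project=project, location=location)
    config = types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=128, include_thoughts=True)
    )
    seen: list[dict] = []
    stream = await client.aio.models.generate_content_stream(
        model="gemini-2.5-flash", contents=prompt, config=config
    )
    async for chunk in stream:
        for candidate in chunk.candidates or []:
            content = candidate.content
            parts = content.parts if content and content.parts else []
            seen.extend(part_metadata(part) for part in parts)
    return seen


# Example call (placeholder project ID and prompt):
# print(asyncio.run(reproduce("my-project", "global", "Pick option 2 and continue.")))
```

When the issue occurs, every returned entry has thought set to None while at least one entry has starts_with_marker set to True.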
Expected Behavior
- Thought content should be in parts with part.thought=True
- Client code can filter by getattr(part, "thought", False) to separate thoughts from answer text
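For illustration, the part-level filter we expected to be sufficient looks roughly like the sketch below. It uses SimpleNamespace objects as stand-ins for SDK parts so that it runs without network access; split_thoughts and the sample data are hypothetical names, not part of google-genai.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def split_thoughts(parts: Iterable[Any]) -> tuple[str, str]:
    """Return (thought_text, answer_text) using only the part.thought flag."""
    thoughts, answer = [], []
    for part in parts:
        text = getattr(part, "text", None)
        if not text:
            continue
        # A missing attribute or a value of None both count as "not a thought".
        if getattr(part, "thought", False):
            thoughts.append(text)
        else:
            answer.append(text)
    return "".join(thoughts), "".join(answer)


# Stand-in for the documented behavior: the thought arrives as a flagged part.
documented = [
    SimpleNamespace(text="planning...", thought=True),
    SimpleNamespace(text="Hello", thought=None),
]
# Stand-in for what we observe: the thought is inlined and nothing is flagged.
observed = [SimpleNamespace(text="THOUGHT: planning...\n\nHello", thought=None)]

print(split_thoughts(documented))
print(split_thoughts(observed))
```

With the observed shape, the filter has nothing to act on, so the THOUGHT: text ends up in the answer string unchanged.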
Actual Behavior
- Thought content appears in part.text with a "THOUGHT:" prefix
- All parts have part.thought is None and part.thought_signature is None
- Part-level filtering (if part.thought: continue) never triggers
- We must use text-level filtering (strip THOUGHT: blocks) as a workaround
Diagnostic Data (from production log)
{
"event": "gemini_thought_leak_diagnostic",
"model": "gemini-2.5-flash",
"vertex_region": "global",
"raw_response": "THOUGHT: The user chose option 2 for character dynamics...\n\n[actual response text]",
"removed_blocks_count": 1,
"total_chars_removed": 243,
"thought_parts_filtered": 0,
"chunks_metadata": [
{"parts": [{"thought": null, "has_thought_signature": false, "text_length": 2}], "thought_parts_filtered": 0},
{"parts": [{"thought": null, "has_thought_signature": false, "text_length": 77}], "thought_parts_filtered": 0}
]
}
Key observation: thought_parts_filtered is 0, meaning no parts were marked as thought by the API, yet we had to remove a 243-char THOUGHT: block from the text.
Workaround (not effective)
We tried stripping THOUGHT: blocks from the concatenated text stream when they appear at block boundaries, but this did not work. Retrying the request did not help, and neither did changing parameters such as temperature.
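For reference, here is a minimal sketch of the kind of text-level filter described above, written to tolerate the marker being split across chunks. The first chunk in the diagnostic log has text_length 2, which suggests this can happen. It rests on assumptions taken only from that one log entry: the leaked block starts the response with the literal "THOUGHT:" and ends at the first blank line. The name strip_leading_thought is hypothetical, and this is not a confirmed or recommended fix.

```python
from typing import Iterable, Iterator

PREFIX = "THOUGHT:"   # literal marker seen in the diagnostic log
TERMINATOR = "\n\n"   # assumption: a blank line ends the leaked block


def strip_leading_thought(chunks: Iterable[str]) -> Iterator[str]:
    """Yield answer text, dropping one leading THOUGHT: block if present."""
    buffer = ""
    state = "detect"  # detect -> skip -> pass
    for text in chunks:
        if state == "pass":
            yield text
            continue
        buffer += text
        if state == "detect":
            if buffer.startswith(PREFIX):
                state = "skip"
            elif PREFIX.startswith(buffer):
                continue  # could still be a partial marker such as "TH"
            else:
                state = "pass"
                yield buffer
                buffer = ""
                continue
        if state == "skip":
            end = buffer.find(TERMINATOR)
            if end != -1:
                state = "pass"
                rest = buffer[end + len(TERMINATOR):]
                buffer = ""
                if rest:
                    yield rest
    # Stream ended before the marker could be confirmed: emit what was held back.
    if state == "detect" and buffer:
        yield buffer


chunks = ["TH", "OUGHT: The user chose option 2", "...\n", "\nHere is", " the answer."]
print("".join(strip_leading_thought(chunks)))
```

The sketch deliberately handles only one block at the very start of the response. Leading whitespace before the marker, THOUGHT: blocks that appear mid-response, and blocks that never reach a blank line (which are dropped entirely) would all need extra handling.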
Question
Is this a known API/streaming behavior? Should thought content ever appear in part.text without part.thought=True? If so, is there a recommended way to detect and filter it at the SDK level?