
Gemini returns THOUGHT: content in text parts without part.thought=True #2121

@itaihochman-cesura

Summary

When using Gemini 2.5 Flash with ThinkingConfig (include_thoughts=True or omitted), thought/reasoning content sometimes appears in the regular text field of response parts with a "THOUGHT:" prefix, but no part has part.thought=True. All parts have thought: null and has_thought_signature: false.

Per the Gemini thinking docs, thought content should be returned as separate parts with the thought boolean set to true. Instead, it arrives as literal text in normal text parts.

Environment

  • Package: google-genai 1.64.0 (also observed with 1.57.0)
  • Model: gemini-2.5-flash
  • Backend: Vertex AI (me-west1, global)
  • Mode: Streaming (generate_content_stream)
  • Python: 3.11

Steps to Reproduce

  1. Configure ThinkingConfig with thinking_budget=128 (or similar) for gemini-2.5-flash
  2. Call client.aio.models.generate_content_stream() with a prompt that triggers reasoning
  3. Iterate over streamed chunks and inspect chunk.candidates[0].content.parts
  4. Observe: thought content appears in part.text with "THOUGHT:" prefix, but part.thought is None for all parts
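The steps above can be sketched roughly as follows. This is a minimal reproduction outline, not the code from our service: `part_metadata` and `reproduce` are hypothetical helper names, the project ID and prompt are placeholders, and the fake chunk at the end is stand-in data so the inspection helper can be tried without credentials.

```python
from types import SimpleNamespace


def part_metadata(chunk):
    """Summarize each part of one streamed chunk without logging its text."""
    candidates = getattr(chunk, "candidates", None) or []
    if not candidates:
        return []
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    return [
        {
            "thought": getattr(part, "thought", None),
            "has_thought_signature": getattr(part, "thought_signature", None) is not None,
            "text_length": len(getattr(part, "text", None) or ""),
        }
        for part in parts
    ]


async def reproduce(project, location, prompt):
    # Imported lazily so part_metadata can be used without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(vertexai=True, project=project, location=location)
    config = types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=128, include_thoughts=True)
    )
    stream = await client.aio.models.generate_content_stream(
        model="gemini-2.5-flash", contents=prompt, config=config
    )
    async for chunk in stream:
        print(part_metadata(chunk))


# Offline check with stand-in data (not taken from the production log).
fake_part = SimpleNamespace(text="Hi", thought=None, thought_signature=None)
fake_chunk = SimpleNamespace(
    candidates=[SimpleNamespace(content=SimpleNamespace(parts=[fake_part]))]
)
print(part_metadata(fake_chunk))

# Live run (requires Vertex AI credentials; values are placeholders):
# import asyncio
# asyncio.run(reproduce("my-project", "global", "..."))
```

The helper records only flags and lengths, which matches the shape of the `chunks_metadata` entries in the diagnostic log.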

Expected Behavior

  • Thought content should be in parts with part.thought=True
  • Client code can filter by getattr(part, "thought", False) to separate thoughts from answer text
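For illustration, the part-level filtering we expected to rely on looks roughly like this. `split_parts` is a hypothetical helper name, and the two part lists are stand-in objects rather than real SDK responses; the second one mimics the leaked shape reported in this issue.

```python
from types import SimpleNamespace


def split_parts(parts):
    """Return (thought_text, answer_text) using only the part.thought flag."""
    thoughts, answer = [], []
    for part in parts or []:
        text = getattr(part, "text", None)
        if not text:
            continue
        # None and False are both falsy, so only a truthy flag marks a thought.
        if getattr(part, "thought", False):
            thoughts.append(text)
        else:
            answer.append(text)
    return "".join(thoughts), "".join(answer)


# Expected shape: the thought arrives in its own flagged part.
flagged = [
    SimpleNamespace(text="planning...", thought=True),
    SimpleNamespace(text="Hello", thought=None),
]
# Shape reported here: the thought is inline text and nothing is flagged.
leaked = [SimpleNamespace(text="THOUGHT: planning...\n\nHello", thought=None)]

print(split_parts(flagged))
print(split_parts(leaked))
```

With the leaked shape, the flag-based split puts everything into the answer text, which is why the filter never triggers.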

Actual Behavior

  • Thought content appears in part.text with "THOUGHT:" prefix
  • All parts have part.thought is None and part.thought_signature is None
  • Part-level filtering (if part.thought: continue) never triggers
  • We must use text-level filtering (strip THOUGHT: blocks) as a workaround

Diagnostic Data (from production log)

{
  "event": "gemini_thought_leak_diagnostic",
  "model": "gemini-2.5-flash",
  "vertex_region": "global",
  "raw_response": "THOUGHT: The user chose option 2 for character dynamics...\n\n[actual response text]",
  "removed_blocks_count": 1,
  "total_chars_removed": 243,
  "thought_parts_filtered": 0,
  "chunks_metadata": [
    {"parts": [{"thought": null, "has_thought_signature": false, "text_length": 2}], "thought_parts_filtered": 0},
    {"parts": [{"thought": null, "has_thought_signature": false, "text_length": 77}], "thought_parts_filtered": 0}
  ]
}

Key observation: thought_parts_filtered: 0 — no parts were marked as thought by the API, yet we had to remove a 243-char THOUGHT: block from the text.

Workaround (not effective)

We tried stripping THOUGHT: blocks from the concatenated text stream when they appear at block boundaries, but this did not work. Resending the request did not help, nor did changing parameters such as temperature.

Question

Is this a known API/streaming behavior? Should thought content ever appear in part.text without part.thought=True? If so, is there a recommended way to detect and filter it at the SDK level?

Metadata

Labels

priority: p2, type: bug
