Summary
google.adk.models.lite_llm._is_thinking_blocks_format, introduced in 1.28.0 via
fc45fa6 (PR closing #4801), gates Anthropic thinking_blocks parsing on the presence of a
per-block signature key:
# src/google/adk/models/lite_llm.py (main)
def _is_thinking_blocks_format(reasoning_value: Any) -> bool:
  """Returns True if reasoning_value is Anthropic thinking_blocks format."""
  if not isinstance(reasoning_value, list) or not reasoning_value:
    return False
  first = reasoning_value[0]
  return isinstance(first, dict) and "signature" in first
LiteLLM's Gemini integration also emits thinking_blocks when thinking is enabled on Gemini 2.5 / 3 models, but the per-block dicts do not carry a signature — the thought
signatures are returned at the response level under provider_specific_fields.thought_signatures as a parallel array. The detector therefore returns False and the code falls
through to _iter_reasoning_texts, which only matches the dict keys ("text", "content", "reasoning", "reasoning_content") — Gemini blocks carry "type" and "thinking", so
nothing is yielded and the response surfaces zero thought Parts to the agent layer.
Net effect: a regression from <1.28.0 for any agent built on LiteLlm + a Gemini thinking model.
Affected versions
- google-adk >= 1.28.0 (still present on main, 2026-05-15)
Environment
- Python 3.12
- google-adk 1.28.0+
- litellm latest
- Models reproduced on: gemini-3-flash-preview, gemini-2.5-pro (via LiteLLM proxy)
Actual LiteLLM response payload
Captured directly from LiteLLM with thought output enabled. Note choices[0].message.thinking_blocks shape and the separate response-level
provider_specific_fields.thought_signatures field:
{
  "model": "gemini-3-flash-preview",
  "choices": [{
    "finish_reason": "stop",
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "I am a large language model, trained by Google.",
      "reasoning_content": "**Understanding the User's Query and My Identity** ...",
      "thinking_blocks": [
        {
          "type": "thinking",
          "thinking": "**Understanding the User's Query and My Identity** ..."
        }
      ],
      "provider_specific_fields": {
        "thought_signatures": [
          "AY89a1/RGkcaRoJvGVOsj0pMpznJpT6OZESRZQF8ZYxB1+YHABJ+NjzLIb0fk8FOFQ..."
        ]
      }
    }
  }],
  "usage": {
    "completion_tokens": 73,
    "prompt_tokens": 5,
    "total_tokens": 78,
    "completion_tokens_details": {"reasoning_tokens": 62, "text_tokens": 11}
  }
}
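The parallel-array relationship can be checked programmatically; the snippet below trims the payload above down to the relevant fields:

```python
import json

# Trimmed copy of the captured LiteLLM payload, keeping only the fields
# involved in the bug (signature string truncated as in the capture).
raw = """
{
  "choices": [{
    "message": {
      "thinking_blocks": [
        {"type": "thinking", "thinking": "**Understanding the User's Query and My Identity** ..."}
      ],
      "provider_specific_fields": {
        "thought_signatures": ["AY89a1/RGkc..."]
      }
    }
  }]
}
"""
message = json.loads(raw)["choices"][0]["message"]
blocks = message["thinking_blocks"]
signatures = message["provider_specific_fields"]["thought_signatures"]

# No block carries its own signature key; the two arrays align by index.
print(all("signature" not in block for block in blocks))  # True
print(len(signatures) == len(blocks))                     # True
```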
Call trace through main
- _extract_reasoning_value(message) prefers thinking_blocks over reasoning_content — returns the Gemini list.
- _convert_reasoning_value_to_parts(reasoning_value) calls _is_thinking_blocks_format(...) → False (no per-block signature).
- Falls back to _iter_reasoning_texts, which for each dict only yields under keys ("text", "content", "reasoning", "reasoning_content") — none present → yields nothing.
- Returned thought parts: []. The thought is lost.
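The fallback step can be illustrated with a minimal sketch — this mimics only the key matching described in the trace above, not the real _iter_reasoning_texts implementation:

```python
# Assumption (per the trace above): the fallback only inspects these four keys.
_TEXT_KEYS = ("text", "content", "reasoning", "reasoning_content")

def iter_reasoning_texts_sketch(reasoning_value):
  """Sketch of the fallback, not the actual ADK helper."""
  for block in reasoning_value:
    if isinstance(block, dict):
      for key in _TEXT_KEYS:
        value = block.get(key)
        if isinstance(value, str):
          yield value

gemini_blocks = [{"type": "thinking", "thinking": "Step one ..."}]
print(list(iter_reasoning_texts_sketch(gemini_blocks)))  # [] — thought dropped
```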
Expected behavior
Gemini-shaped thinking_blocks should be recognized as a thinking-blocks payload and surfaced as Part(thought=True, text=...). The parallel signatures from
provider_specific_fields.thought_signatures should be attached to the corresponding thought parts so they can be relayed back to the model on subsequent turns.
Suggested fix
Normalize Gemini-shaped thinking_blocks into the Anthropic shape inside _extract_reasoning_value, by zipping the response-level thought_signatures onto each block. The
existing Anthropic codepath in _convert_reasoning_value_to_parts then handles both providers unchanged.
PR / unit tests below. Happy to open the PR if it looks right.
PR diff
src/google/adk/models/lite_llm.py:
@@ def _extract_reasoning_value(message: Message | Delta | None) -> Any:
   if message is None:
     return None
   # Anthropic models return thinking_blocks with type/thinking/signature fields.
   # This must be preserved to maintain thinking across tool call boundaries.
   thinking_blocks = message.get("thinking_blocks")
   if thinking_blocks is not None:
+    # Gemini also emits thinking_blocks, but each block lacks a per-block
+    # `signature`; signatures arrive in parallel under
+    # `provider_specific_fields.thought_signatures`. Zip them in so the
+    # downstream Anthropic codepath handles both providers uniformly.
+    if (
+        isinstance(thinking_blocks, list)
+        and thinking_blocks
+        and isinstance(thinking_blocks[0], dict)
+        and "signature" not in thinking_blocks[0]
+    ):
+      provider_fields = message.get("provider_specific_fields") or {}
+      signatures = provider_fields.get("thought_signatures") or []
+      if signatures:
+        merged: list[dict] = []
+        for index, block in enumerate(thinking_blocks):
+          if (
+              isinstance(block, dict)
+              and index < len(signatures)
+              and signatures[index]
+          ):
+            merged.append({**block, "signature": signatures[index]})
+          else:
+            merged.append(block)
+        thinking_blocks = merged
     return thinking_blocks
   reasoning_content = message.get("reasoning_content")
   if reasoning_content is not None:
     return reasoning_content
   return message.get("reasoning")
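Extracted from the diff, the zipping logic can be exercised standalone (normalize_gemini_thinking_blocks is a hypothetical wrapper name for this demo, not an ADK helper):

```python
def normalize_gemini_thinking_blocks(message: dict):
  """Standalone copy of the normalization proposed in the diff above."""
  thinking_blocks = message.get("thinking_blocks")
  if not (
      isinstance(thinking_blocks, list)
      and thinking_blocks
      and isinstance(thinking_blocks[0], dict)
      and "signature" not in thinking_blocks[0]
  ):
    return thinking_blocks
  provider_fields = message.get("provider_specific_fields") or {}
  signatures = provider_fields.get("thought_signatures") or []
  if not signatures:
    return thinking_blocks
  merged = []
  for index, block in enumerate(thinking_blocks):
    if isinstance(block, dict) and index < len(signatures) and signatures[index]:
      merged.append({**block, "signature": signatures[index]})
    else:
      merged.append(block)
  return merged

gemini_message = {
    "thinking_blocks": [{"type": "thinking", "thinking": "Step one ..."}],
    "provider_specific_fields": {"thought_signatures": ["sig-1"]},
}
print(normalize_gemini_thinking_blocks(gemini_message))
# [{'type': 'thinking', 'thinking': 'Step one ...', 'signature': 'sig-1'}]
```

Anthropic-shaped blocks fail the `"signature" not in thinking_blocks[0]` guard and pass through untouched, which is what keeps the existing codepath unchanged.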
A note for maintainers (worth adding to the PR description, not the code): Anthropic per-block signature is treated as an opaque token and stored on Part.thought_signature via
signature.encode("utf-8"). Gemini signatures are base64-encoded bytes. If Part.thought_signature is expected to hold the decoded bytes (matching the outbound b64encode(...) path
in _extract_thought_signature_from_tool_call's counterpart), _convert_reasoning_value_to_parts should base64.b64decode(signature) when the source is Gemini. Left out of this PR to
keep the diff surgical — happy to address as a follow-up once you confirm the desired semantics.
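To make the encoding question concrete, here is a small demo of the two candidate byte representations. The signature is a hypothetical short base64 string (padded so it decodes for the demo; real Gemini signatures are far longer):

```python
import base64

sig = "AY89a1/RGkc="  # hypothetical short signature, padded for this demo

as_opaque_utf8 = sig.encode("utf-8")      # what the Anthropic path stores today
as_decoded_bytes = base64.b64decode(sig)  # what decoded-bytes semantics would store

print(len(as_opaque_utf8))    # 12: the base64 text itself, byte-for-byte
print(len(as_decoded_bytes))  # 8: the underlying signature bytes
```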
Unit tests
Append to tests/unittests/models/test_litellm.py:
def test_extract_reasoning_value_gemini_thinking_blocks_zips_signatures():
  """Gemini emits thinking_blocks without per-block signatures; signatures
  arrive in parallel under provider_specific_fields.thought_signatures.
  _extract_reasoning_value should normalize them into the Anthropic shape."""
  message = {
      "role": "assistant",
      "content": "I am a large language model.",
      "thinking_blocks": [
          {"type": "thinking", "thinking": "Step one ..."},
          {"type": "thinking", "thinking": "Step two ..."},
      ],
      "provider_specific_fields": {
          "thought_signatures": ["sig-1", "sig-2"],
      },
  }
  result = _extract_reasoning_value(message)
  assert result == [
      {"type": "thinking", "thinking": "Step one ...", "signature": "sig-1"},
      {"type": "thinking", "thinking": "Step two ...", "signature": "sig-2"},
  ]


def test_extract_reasoning_value_gemini_thinking_blocks_without_signatures():
  """If provider_specific_fields is absent, Gemini thinking_blocks pass
  through unchanged. Downstream detector should still accept them once
  broadened — covered separately."""
  message = {
      "role": "assistant",
      "content": "Answer",
      "thinking_blocks": [
          {"type": "thinking", "thinking": "Inner monologue"},
      ],
  }
  result = _extract_reasoning_value(message)
  assert result == [{"type": "thinking", "thinking": "Inner monologue"}]


def test_extract_reasoning_value_anthropic_thinking_blocks_unchanged():
  """Regression guard: Anthropic-shaped blocks (already carrying signature)
  must not be re-zipped or otherwise modified."""
  blocks = [
      {"type": "thinking", "thinking": "Anthropic thought", "signature": "abc"},
  ]
  message = {
      "role": "assistant",
      "content": "Answer",
      "thinking_blocks": blocks,
      "provider_specific_fields": {"thought_signatures": ["should-be-ignored"]},
  }
  result = _extract_reasoning_value(message)
  assert result == blocks


def test_message_to_generate_content_response_gemini_thinking_blocks():
  """End-to-end: a Gemini-shaped message should surface a thought Part and
  the visible text Part, with the thought signature attached as bytes."""
  message = {
      "role": "assistant",
      "content": "I am a large language model.",
      "thinking_blocks": [
          {"type": "thinking", "thinking": "Identity check ..."},
      ],
      "provider_specific_fields": {
          "thought_signatures": ["AY89a1/RGkc"],
      },
  }
  response = _message_to_generate_content_response(message)
  assert len(response.content.parts) == 2
  thought_part = response.content.parts[0]
  text_part = response.content.parts[1]
  assert thought_part.thought is True
  assert thought_part.text == "Identity check ..."
  assert thought_part.thought_signature == b"AY89a1/RGkc"
  assert text_part.text == "I am a large language model."