
feat(models): add extended thinking support for AnthropicLlm#5392

Open
sebastienc wants to merge 3 commits into google:main from sebastienc:feat/anthropic-reasoning-tokens

Conversation


@sebastienc sebastienc commented Apr 19, 2026

Summary

Fixes #3079

Disclaimer: I used Claude Code to generate the following text; I find the summary is better than what I would've written myself.

Adds extended thinking (reasoning token) support for AnthropicLlm and Claude (Vertex AI), addressing the long-standing gap vs. Gemini's native thinking support.

  • _build_thinking_param — maps llm_request.config.thinking_config.thinking_budget to Anthropic's ThinkingConfigEnabledParam; clamps to max_tokens - 1 to satisfy the API constraint; returns NOT_GIVEN when thinking is not configured
  • part_to_message_block — new part.thought=True branch (checked before part.text to avoid misclassification) produces ThinkingBlockParam or RedactedThinkingBlockParam for multi-turn continuity
  • content_block_to_part — handles ThinkingBlock and RedactedThinkingBlock response blocks, converting them to Part(thought=True, thought_signature=…) matching the pattern used by LiteLlm
  • generate_content_async — passes thinking param to both streaming and non-streaming messages.create calls
  • _generate_content_streaming — accumulates ThinkingDelta and SignatureDelta events into a new _ThinkingAccumulator; thinking blocks appear in the final aggregated response only (not as partials, keeping consumers simple); RedactedThinkingBlock data captured at content_block_start
  • _ThinkingAccumulator — Pydantic BaseModel (consistent with streaming accumulators in lite_llm.py)
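
The clamping behavior described for `_build_thinking_param` can be sketched roughly as follows (names and the dict-shaped return value are illustrative; the real helper returns the Anthropic SDK's `ThinkingConfigEnabledParam` or the `NOT_GIVEN` sentinel):

```python
def build_thinking_param(thinking_budget, max_tokens):
    """Illustrative stand-in for _build_thinking_param.

    Returns None where the real code returns the SDK's NOT_GIVEN sentinel,
    and a plain dict where it returns ThinkingConfigEnabledParam.
    """
    if thinking_budget is None or thinking_budget <= 0:
        return None  # thinking not configured: omit the param entirely
    # The API requires budget_tokens < max_tokens, so clamp to max_tokens - 1.
    return {"type": "enabled", "budget_tokens": min(thinking_budget, max_tokens - 1)}

print(build_thinking_param(5000, 8192))   # budget fits, passed through
print(build_thinking_param(10000, 8192))  # budget clamped to 8191
```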

Note on previous PR #3070: that PR was closed because it used the synchronous Anthropic client after the codebase had migrated to AsyncAnthropic. This implementation is fully async and uses the native thinking parameter available in the current SDK — no beta headers required.
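
The branch-ordering point for `part_to_message_block` can be sketched like this (dicts stand in for genai `Part` and the SDK block params; the exact regular-vs-redacted criterion here is an assumption):

```python
def part_to_message_block(part):
    """Illustrative sketch of the thought-before-text branch ordering."""
    # Check `thought` BEFORE `text`: a thought part also carries text, so the
    # text branch would otherwise misclassify it as a plain text block.
    if part.get("thought"):
        if part.get("thought_signature") is not None:
            return {"type": "thinking",
                    "thinking": part.get("text", ""),
                    "signature": part["thought_signature"]}
        # Assumed: a thought part without a signature maps to the redacted form.
        return {"type": "redacted_thinking", "data": part.get("text", "")}
    if part.get("text"):
        return {"type": "text", "text": part["text"]}
    raise NotImplementedError(f"unsupported part: {part!r}")
```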

Testing plan

Unit tests

18 new tests added to tests/unittests/models/test_anthropic_llm.py, all passing. Full suite: 59 passed.

tests/unittests/models/test_anthropic_llm.py::test_build_thinking_param_none_config PASSED
tests/unittests/models/test_anthropic_llm.py::test_build_thinking_param_zero_budget PASSED
tests/unittests/models/test_anthropic_llm.py::test_build_thinking_param_valid_budget PASSED
tests/unittests/models/test_anthropic_llm.py::test_build_thinking_param_clamps_to_max_tokens PASSED
tests/unittests/models/test_anthropic_llm.py::test_part_to_message_block_thought_regular PASSED
tests/unittests/models/test_anthropic_llm.py::test_part_to_message_block_thought_redacted PASSED
tests/unittests/models/test_anthropic_llm.py::test_part_to_message_block_thought_checked_before_text PASSED
tests/unittests/models/test_anthropic_llm.py::test_content_block_to_part_thinking_block PASSED
tests/unittests/models/test_anthropic_llm.py::test_content_block_to_part_redacted_thinking_block PASSED
tests/unittests/models/test_anthropic_llm.py::test_thought_part_round_trips_in_content_to_message_param PASSED
tests/unittests/models/test_anthropic_llm.py::test_generate_content_async_passes_thinking_param PASSED
tests/unittests/models/test_anthropic_llm.py::test_generate_content_async_no_thinking_config_passes_not_given PASSED
tests/unittests/models/test_anthropic_llm.py::test_generate_content_async_thinking_budget_zero_passes_not_given PASSED
tests/unittests/models/test_anthropic_llm.py::test_generate_content_async_thinking_budget_clamped_to_max_tokens PASSED
tests/unittests/models/test_anthropic_llm.py::test_streaming_thinking_block_in_final_response PASSED
tests/unittests/models/test_anthropic_llm.py::test_streaming_redacted_thinking_block_in_final_response PASSED
tests/unittests/models/test_anthropic_llm.py::test_streaming_thinking_does_not_yield_partial PASSED
tests/unittests/models/test_anthropic_llm.py::test_streaming_passes_thinking_param PASSED
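
The streaming tests above exercise accumulation of ThinkingDelta and SignatureDelta events into the final aggregated response. A minimal stand-in for the accumulator (the real `_ThinkingAccumulator` is a Pydantic BaseModel; the event tuples and method names here are assumptions):

```python
from dataclasses import dataclass

@dataclass
class ThinkingAccumulator:
    """Collects streamed thinking text and its signature for the final response."""
    thinking: str = ""
    signature: str = ""

    def on_event(self, event):
        # Dispatch on a (delta_kind, payload) tuple standing in for SDK events.
        kind, payload = event
        if kind == "thinking_delta":
            self.thinking += payload
        elif kind == "signature_delta":
            self.signature += payload

acc = ThinkingAccumulator()
for ev in [("thinking_delta", "Check "), ("thinking_delta", "primality."),
           ("signature_delta", "abc123")]:
    acc.on_event(ev)
print(acc.thinking)
print(acc.signature)
```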

Manual E2E test

Sample agent: contributing/samples/hello_world_anthropic_thinking/

Uses AnthropicLlm(model="claude-sonnet-4-6") with BuiltInPlanner(thinking_config=types.ThinkingConfig(thinking_budget=5000)).

$ echo "Which of these are prime: 7, 10, 13, 97, 100?" | adk run contributing/samples/hello_world_anthropic_thinking

Running agent anthropic_thinking_agent, type exit to exit.
[user]: [anthropic_thinking_agent]: The user wants to check which of the numbers 7, 10, 13, 97, 100 are prime. I'll use the check_prime tool with all these numbers at once.Sure! Let me check all of those numbers at once for you!
[anthropic_thinking_agent]: Out of the numbers you provided, **7, 13, and 97** are prime numbers! Here's a quick breakdown:

- **7** ✅ Prime – only divisible by 1 and 7.
- **10** ❌ Not prime – divisible by 1, 2, 5, and 10.
- **13** ✅ Prime – only divisible by 1 and 13.
- **97** ✅ Prime – only divisible by 1 and 97.
- **100** ❌ Not prime – divisible by 1, 2, 4, 5, 10, 20, 25, 50, and 100.

The thinking block text (The user wants to check which of the numbers…) surfaces in the event stream before the final response, confirming the ThinkingBlock → Part(thought=True) pipeline works end-to-end. Two successful POST /v1/messages calls observed (first: thinking + tool call; second: tool result → final answer).
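
The ThinkingBlock → Part(thought=True) step confirmed above can be sketched as follows (dicts stand in for the SDK response blocks and for genai `Part`; the redacted-block handling is an assumption):

```python
def content_block_to_part(block):
    """Illustrative sketch of the response-block-to-Part conversion."""
    if block["type"] == "thinking":
        # Keep the signature so the block can round-trip into later turns.
        return {"text": block["thinking"], "thought": True,
                "thought_signature": block["signature"]}
    if block["type"] == "redacted_thinking":
        # Opaque data, still surfaced as a thought part.
        return {"text": block["data"], "thought": True}
    if block["type"] == "text":
        return {"text": block["text"]}
    raise NotImplementedError(block["type"])
```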


Labels

models [Component] Issues related to model support


Development

Successfully merging this pull request may close these issues.

Include reasoning tokens and token streaming for anthropic LLMs
