Current behavior
When streaming with content="all", each ContentThinking chunk is yielded as an independent object. If you print or concatenate these, each fragment gets its own <thinking>...</thinking> wrapper (via __str__()):
chat = ChatAnthropic(model="claude-sonnet-4-20250514")
async for chunk in chat.stream_async("Explain recursion", content="all"):
print(chunk, end="")
Output:
<thinking>
The user wants an explanation
</thinking>
<thinking>
of recursion. I should
</thinking>
<thinking>
start with a simple definition...
</thinking>
Recursion is a technique where...
With content="text", the situation is different but also problematic — thinking text is yielded as bare strings indistinguishable from response text:
async for chunk in chat.stream_async("Explain recursion", content="text"):
print(chunk, end="")
Output:
The user wants an explanation of recursion. I should start with a simple definition...Recursion is a technique where...
No way to tell where thinking ended and the response began.
Expected behavior
The streaming layer should emit <thinking> tags at the boundaries — once at the start of thinking, once at the end — so that concatenated output is well-formed:
For content="text":
<thinking>
The user wants an explanation of recursion. I should start with a simple definition...
</thinking>
Recursion is a technique where...
For content="all", the yielded objects stay the same (ContentThinking per chunk), but the display/echo output should show proper boundaries.
Why this matters
Downstream consumers (like shinychat) that receive a stream and need to separate thinking from response content currently have to implement their own stateful tracking of ContentThinking objects, accumulate the thinking text, and reconstruct the tag boundaries themselves. If the stream already had correct boundaries in text mode, consumers could treat it as a text stream with well-formed <thinking> tags and parse accordingly — no type inspection needed.
Current behavior
When streaming with
content="all", eachContentThinkingchunk is yielded as an independent object. If you print or concatenate these, each fragment gets its own<thinking>...</thinking>wrapper (via__str__()):Output:
With
content="text", the situation is different but also problematic — thinking text is yielded as bare strings indistinguishable from response text:Output:
No way to tell where thinking ended and the response began.
Expected behavior
The streaming layer should emit
<thinking>tags at the boundaries — once at the start of thinking, once at the end — so that concatenated output is well-formed:For
content="text":For
content="all", the yielded objects stay the same (ContentThinkingper chunk), but the display/echo output should show proper boundaries.Why this matters
Downstream consumers (like shinychat) that receive a stream and need to separate thinking from response content currently have to implement their own stateful tracking of
ContentThinkingobjects, accumulate the thinking text, and reconstruct the tag boundaries themselves. If the stream already had correct boundaries in text mode, consumers could treat it as a text stream with well-formed<thinking>tags and parse accordingly — no type inspection needed.