Skip to content

Emit <thinking> tag boundaries during streaming#294

Merged
cpsievert merged 3 commits intomainfrom
fix/streaming-thinking-tags
May 6, 2026
Merged

Emit <thinking> tag boundaries during streaming#294
cpsievert merged 3 commits intomainfrom
fix/streaming-thinking-tags

Conversation

@cpsievert
Copy link
Copy Markdown
Collaborator

@cpsievert cpsievert commented May 6, 2026

Summary

  • When ContentThinking chunks are emitted during streaming, the streaming loop now emits <thinking>\n before the first thinking chunk and \n</thinking>\n\n when transitioning to non-thinking content (or at end of stream)
  • For content="text" consumers, tags are yielded as string chunks — concatenated output is well-formed
  • For content="all" consumers, behavior is unchanged — typed ContentThinking objects are yielded, no tag strings
  • ContentThinking now has a _complete PrivateAttr (default True). Streaming chunks are constructed via _as_chunk() with _complete=False, so __str__() returns bare text for fragments instead of wrapping each one in <thinking>...</thinking> tags
  • Removes the synthetic "\n\n" separator from the OpenAI provider's reasoning_summary_text.done event (now redundant)

Companion to tidyverse/ellmer#975.

Motivation

Currently, streaming thinking content has two issues:

  1. content="all" mode yields ContentThinking objects whose __str__() independently wraps each chunk in <thinking>...</thinking> — printing produces repeated tags around each fragment
  2. content="text" mode yields thinking as bare strings indistinguishable from response text

After this change, concatenating a content="text" stream produces:

<thinking>
reasoning content here...
</thinking>

Response text here...

And calling str() on individual ContentThinking chunks in content="all" mode returns the raw thinking text without tag wrapping.

Test plan

  • 10 unit tests covering sync/async, text/all modes, thinking-only streams, text-only streams, tag chunk boundaries, and str() on chunks
  • Full existing test suite passes (190 passed, 3 skipped — only bedrock fails due to live API requirement)
  • Pyright passes with 0 errors on changed files

The streaming loop now emits `<thinking>\n` before the first thinking
chunk and `\n</thinking>\n\n` on transition to non-thinking content
(or at end of stream), giving consumers well-formed output.

For `content="text"` mode, tags are yielded as string chunks so
concatenated output is properly delimited. For `content="all"` mode,
behavior is unchanged — typed ContentThinking objects are yielded.

Also removes the synthetic "\n\n" separator from the OpenAI provider's
reasoning_summary_text.done event since the thinking→text transition
now provides the visual break.

Companion to tidyverse/ellmer#975.
@cpsievert cpsievert force-pushed the fix/streaming-thinking-tags branch from c748d39 to d0badcb Compare May 6, 2026 22:15
@cpsievert cpsievert requested a review from Copilot May 6, 2026 22:20

This comment was marked as resolved.

Streaming chunks are fragments, not complete thoughts. Adding a
_complete PrivateAttr (default True) lets __str__() skip tag wrapping
for chunks emitted during streaming, preventing repeated
<thinking>...</thinking> around each fragment in content="all" mode.

Providers now use ContentThinking._as_chunk() for streaming fragments.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

@cpsievert cpsievert merged commit dbb45f0 into main May 6, 2026
12 checks passed
@cpsievert cpsievert deleted the fix/streaming-thinking-tags branch May 6, 2026 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants