Simplify thinking: remove server-side complexity and transport type#210
Simplify thinking: remove server-side complexity and transport type#210cpsievert wants to merge 5 commits intoui/thinkingfrom
Conversation
Separates format (content_type: markdown/html/text) from semantic role (block_type: thinking/content) in the transport protocol. Removes server-side thinking accumulator and tag reconstruction — the server now does a simple isinstance check per chunk and sends block_type: "thinking". Python: removes _current_stream_thinking, the finally-block tag reconstruction, and renames content_type_override to block_type. R: sends block_type instead of content_type = "thinking". JS: fast path now checks action.block_type instead of content_type.
The server no longer detects or signals thinking content. Everything flows as markdown strings with embedded <thinking> tags; the client's tag parser is the single code path for all providers. Python: removes _is_content_thinking, BlockType, and the thinking branch in _append_message_stream. Simplifies PendingMessage tuple and _send_append_message. R: removes shinychat_thinking class, contents_shinychat for ContentThinking now returns plain text, removes block_type from actions. JS: removes BlockType, block_type from transport types, removes the block_type fast path in the chunk handler. Tests rewritten to use <thinking> tags.
f0b742e to
ea40a65
Compare
Remove the `if (!yield_as_content)` guards so that `<thinking>` and `</thinking>` boundary strings are yielded to consumers in all stream modes, not just stream='text'. Needed for posit-dev/shinychat#210, which removes server-side thinking detection and relies on the client's tag parser seeing tag boundaries in the stream. Without this, tool-use apps (stream='content') with thinking-capable models would render thinking as regular text. Companion to posit-dev/chatlas#297.
* fix: yield thinking tag boundaries in content='all' mode Previously `<thinking>\n` and `\n</thinking>\n\n` boundary strings were only yielded to consumers when `content="text"`. Remove that guard so they are emitted in all modes, including `content="all"`. This is required by shinychat PR posit-dev/shinychat#210, which removes server-side thinking detection and relies on the client tag parser seeing `<thinking>` markers in the stream. Without this fix, tool-use apps (which require `content="all"`) with thinking-capable models render thinking content as regular assistant text. Both the sync and async `_submit_turns` paths are updated. Tests in `TestStreamThinkingAll` and the async `test_content_all_async` are updated to assert that tag boundary strings ARE present in the output. * docs: update CHANGELOG to reflect tag boundaries in all modes
The server no longer inspects ContentThinking objects directly — thinking detection is purely tag-based via the client's tag parser. Update docstrings in both Python and R packages to describe the unified mechanism, and fix the stale comment in _chat_normalize.py.
gadenbuie
left a comment
There was a problem hiding this comment.
The block_type change is a solid improvement.
I'm pretty strongly opposed to any kind of optimization that reduces data fidelity through string parsing. I included client side parsing of <thinking> tags because I've seen it from now years-old models, but at this point even local model services return reasoning as a differentiated data type.
For me "prefer objects over strings" is the new "prefer enums over booleans". I don't mind supporting strings as a fallback, but I'm not comfortable with it being the primary modality.
gadenbuie
left a comment
There was a problem hiding this comment.
Thanks for this, Carson. The two-commit separation makes it easy to evaluate each idea separately.
On Commit 1 (block_type field)
My initial reaction was positive — separating "format" from "semantic role" sounds right. But I can't find a case where the two fields vary independently. There's no scenario for block_type: "thinking" + content_type: "html", so my sense is that we can keep this as one field and expand its usage.
I tested this against a change I've been considering: sending tool requests/results as structured data rather than pre-rendered HTML. In that design, the natural endpoint is to use content_type: "tool_result", with any rendering variant (data card vs. custom HTML) as an internal field in the payload. The transport routes on content_type alone. So block_type doesn't add information that content_type doesn't already carry.
My take is that content_type: "thinking" isn't overloading — it's content_type doing what it's always done: telling the client which component handles this block.
On Commit 2 (pure pass-through server)
This one I feel more strongly about declining.
-
Coupling direction. Today shinychat owns the contract: "give me a
ContentThinkingand I route it." With a pass-through server, correctness depends on how chatlas/ellmer format their text. That this commit requires specific upstream PRs is a symptom — we'd be coupling to an implementation detail rather than a typed interface. -
Framework independence. On the Python side especially, we want shinychat to work with libraries beyond chatlas.
content_type: "thinking"is an explicit protocol any backend can implement. Requiring every library to emit well-formed<thinking>tags asks them to conform to a string convention that only shinychat cares about. -
The two paths aren't redundant. Type dispatch handles structured providers (Claude, OpenAI reasoning). The tag parser handles raw-tag models (DeepSeek, QwQ). They serve different input classes, and a bug in one doesn't affect users of the other.
What I do like
The motivation behind both commits, reducing server-side state and plumbing, is sound. The Python _current_stream_thinking accumulator and finally-block tag reconstruction is clunky. I'm very open to suggestions about how to simplify that area. I do think it points at us thinking about reasoning tokens differently — I'm personally seeing them as separate content from the assistant message, i.e. something closer to tool calls.
I appreciate the thorough writeup and test coverage. Let me know if you see it differently.
* Replace tag injection with ContentThinkingDelta class
Providers now emit ContentThinkingDelta (with a phase property) instead
of ContentThinking._as_chunk(). The central streaming loop annotates
phase ("start", "body", "end") and suppresses thinking in content="text"
mode. In content="all" mode, typed delta objects are yielded instead of
synthetic tag strings.
This aligns chatlas with the approach taken in tidyverse/ellmer#975 and
addresses feedback from posit-dev/shinychat#210 about preferring typed
objects over string conventions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add ContentThinking and ContentThinkingDelta to API reference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Address review feedback on ContentThinkingDelta
- Give ContentThinkingDelta a distinct content_type ("thinking_delta")
so it won't round-trip as ContentThinking during (de)serialization.
- Construct a new ContentThinkingDelta instead of mutating phase
in-place, avoiding Pydantic validation bypass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Use TypeGuard instead of assert for ContentThinkingDelta narrowing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Move is_thinking_delta() near other helpers at bottom of file
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update changelog
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Superseded by #211 |
Summary
Two-stage simplification of how thinking content flows through the transport protocol, enabled by chatlas/ellmer now emitting well-formed
<thinking>tag boundaries during streaming (posit-dev/chatlas#294, tidyverse/ellmer#975).If the second commit feels too aggressive, the first commit stands alone as a clean improvement.
Commit 1: Replace
content_type: "thinking"withblock_typefieldSeparates format (
content_type: markdown/html/text) from semantic role (block_type: thinking/content). This is a straightforward improvement —"thinking"was never a content format, it was a semantic signal that happened to be shoehorned into the format field._current_stream_thinking) and end-of-stream tag reconstructionisinstancecheck per chunk and sendsblock_type: "thinking"— no state, no accumulationblock_typeinstead ofcontent_typeCommit 2: Remove
block_typefrom transport entirelyMakes the server pure pass-through. The client's existing tag parser (
processThinkingTags) becomes the single code path for all providers — structured (Claude, OpenAI reasoning) and raw-tag (DeepSeek, QwQ) alike._is_content_thinking,BlockType, and the thinking branch in_append_message_streamcontents_shinychatforContentThinkingreturns plain text; removesshinychat_thinkingclassBlockTypeand theblock_typefast path in the chunk handlerThis second commit is potentially more controversial. It trades a guaranteed-correct fast path (type dispatch) for a single unified code path (tag parsing). The tag parser is already battle-tested (it handles DeepSeek/QwQ today and has comprehensive tests), and having one path eliminates "works with provider X but not Y" bugs. But there are tradeoffs:
<even when thinking isn't involved. In practice this is cheap (single indexOf check short-circuits), but it's not zero-cost like a type dispatch.Test plan