Skip to content

Simplify thinking: remove server-side complexity and transport type#210

Closed
cpsievert wants to merge 5 commits intoui/thinkingfrom
ui/thinking-simplify
Closed

Simplify thinking: remove server-side complexity and transport type#210
cpsievert wants to merge 5 commits intoui/thinkingfrom
ui/thinking-simplify

Conversation

@cpsievert
Copy link
Copy Markdown
Collaborator

@cpsievert cpsievert commented May 6, 2026

Summary

Two-stage simplification of how thinking content flows through the transport protocol, enabled by chatlas/ellmer now emitting well-formed <thinking> tag boundaries during streaming (posit-dev/chatlas#294, tidyverse/ellmer#975).

If the second commit feels too aggressive, the first commit stands alone as a clean improvement.

Commit 1: Replace content_type: "thinking" with block_type field

Separates format (content_type: markdown/html/text) from semantic role (block_type: thinking/content). This is a straightforward improvement — "thinking" was never a content format, it was a semantic signal that happened to be shoehorned into the format field.

  • Removes the Python server-side thinking accumulator (_current_stream_thinking) and end-of-stream tag reconstruction
  • Server now does a simple isinstance check per chunk and sends block_type: "thinking" — no state, no accumulation
  • Client fast path unchanged in behavior, just keyed off block_type instead of content_type

Commit 2: Remove block_type from transport entirely

Makes the server pure pass-through. The client's existing tag parser (processThinkingTags) becomes the single code path for all providers — structured (Claude, OpenAI reasoning) and raw-tag (DeepSeek, QwQ) alike.

  • Python: removes _is_content_thinking, BlockType, and the thinking branch in _append_message_stream
  • R: contents_shinychat for ContentThinking returns plain text; removes shinychat_thinking class
  • JS: removes BlockType and the block_type fast path in the chunk handler

This second commit is potentially more controversial. It trades a guaranteed-correct fast path (type dispatch) for a single unified code path (tag parsing). The tag parser is already battle-tested (it handles DeepSeek/QwQ today and has comprehensive tests), and having one path eliminates "works with provider X but not Y" bugs. But there are tradeoffs:

  • Performance: the tag parser runs on every chunk, doing string scans for < even when thinking isn't involved. In practice this is cheap (single indexOf check short-circuits), but it's not zero-cost like a type dispatch.
  • Correctness: we now depend on chatlas/ellmer emitting well-formed tags. If a bug upstream produces malformed tags, the client will misparse. With the fast path, the server's type dispatch was immune to tag formatting issues.

Test plan

  • All 283 JS tests pass (vitest)
  • All 63 Python tests pass (pytest + playwright)
  • All 126 R tests pass (testthat)
  • TypeScript compiles clean (tsc --noEmit)
  • Pyright passes with 0 errors
  • Manual test with a thinking-capable model (Claude extended thinking or DeepSeek) to verify thinking UI still renders correctly

@cpsievert cpsievert marked this pull request as draft May 6, 2026 23:06
cpsievert added 2 commits May 6, 2026 18:27
Separates format (content_type: markdown/html/text) from semantic role
(block_type: thinking/content) in the transport protocol. Removes
server-side thinking accumulator and tag reconstruction — the server now
does a simple isinstance check per chunk and sends block_type: "thinking".

Python: removes _current_stream_thinking, the finally-block tag
reconstruction, and renames content_type_override to block_type.
R: sends block_type instead of content_type = "thinking".
JS: fast path now checks action.block_type instead of content_type.
The server no longer detects or signals thinking content. Everything
flows as markdown strings with embedded <thinking> tags; the client's
tag parser is the single code path for all providers.

Python: removes _is_content_thinking, BlockType, and the thinking
branch in _append_message_stream. Simplifies PendingMessage tuple and
_send_append_message.
R: removes shinychat_thinking class, contents_shinychat for
ContentThinking now returns plain text, removes block_type from actions.
JS: removes BlockType, block_type from transport types, removes the
block_type fast path in the chunk handler. Tests rewritten to use
<thinking> tags.
@cpsievert cpsievert force-pushed the ui/thinking-simplify branch from f0b742e to ea40a65 Compare May 6, 2026 23:33
cpsievert added a commit to cpsievert/ellmer that referenced this pull request May 7, 2026
Remove the `if (!yield_as_content)` guards so that `<thinking>` and
`</thinking>` boundary strings are yielded to consumers in all stream
modes, not just stream='text'.

Needed for posit-dev/shinychat#210, which removes server-side thinking
detection and relies on the client's tag parser seeing tag boundaries
in the stream. Without this, tool-use apps (stream='content') with
thinking-capable models would render thinking as regular text.

Companion to posit-dev/chatlas#297.
cpsievert added a commit to posit-dev/chatlas that referenced this pull request May 7, 2026
* fix: yield thinking tag boundaries in content='all' mode

Previously `<thinking>\n` and `\n</thinking>\n\n` boundary strings were
only yielded to consumers when `content="text"`. Remove that guard so
they are emitted in all modes, including `content="all"`.

This is required by shinychat PR posit-dev/shinychat#210, which removes
server-side thinking detection and relies on the client tag parser seeing
`<thinking>` markers in the stream. Without this fix, tool-use apps
(which require `content="all"`) with thinking-capable models render
thinking content as regular assistant text.

Both the sync and async `_submit_turns` paths are updated. Tests in
`TestStreamThinkingAll` and the async `test_content_all_async` are
updated to assert that tag boundary strings ARE present in the output.

* docs: update CHANGELOG to reflect tag boundaries in all modes
@cpsievert cpsievert requested a review from gadenbuie May 7, 2026 00:28
@cpsievert cpsievert marked this pull request as ready for review May 7, 2026 00:28
cpsievert and others added 2 commits May 6, 2026 19:28
The server no longer inspects ContentThinking objects directly —
thinking detection is purely tag-based via the client's tag parser.
Update docstrings in both Python and R packages to describe the
unified mechanism, and fix the stale comment in _chat_normalize.py.
Copy link
Copy Markdown
Collaborator

@gadenbuie gadenbuie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The block_type change is a solid improvement.

I'm pretty strongly opposed to any kind of optimization that reduces data fidelity through string parsing. I included client side parsing of <thinking> tags because I've seen it from now years-old models, but at this point even local model services return reasoning as a differentiated data type.

For me "prefer objects over strings" is the new "prefer enums over booleans". I don't mind supporting strings as a fallback, but I'm not comfortable with it being the primary modality.

Copy link
Copy Markdown
Collaborator

@gadenbuie gadenbuie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, Carson. The two-commit separation makes it easy to evaluate each idea separately.

On Commit 1 (block_type field)

My initial reaction was positive — separating "format" from "semantic role" sounds right. But I can't find a case where the two fields vary independently. There's no scenario for block_type: "thinking" + content_type: "html", so my sense is that we can keep this as one field and expand its usage.

I tested this against a change I've been considering: sending tool requests/results as structured data rather than pre-rendered HTML. In that design, the natural endpoint is to use content_type: "tool_result", with any rendering variant (data card vs. custom HTML) as an internal field in the payload. The transport routes on content_type alone. So block_type doesn't add information that content_type doesn't already carry.

My take is that content_type: "thinking" isn't overloading — it's content_type doing what it's always done: telling the client which component handles this block.

On Commit 2 (pure pass-through server)

This one I feel more strongly about declining.

  1. Coupling direction. Today shinychat owns the contract: "give me a ContentThinking and I route it." With a pass-through server, correctness depends on how chatlas/ellmer format their text. That this commit requires specific upstream PRs is a symptom — we'd be coupling to an implementation detail rather than a typed interface.

  2. Framework independence. On the Python side especially, we want shinychat to work with libraries beyond chatlas. content_type: "thinking" is an explicit protocol any backend can implement. Requiring every library to emit well-formed <thinking> tags asks them to conform to a string convention that only shinychat cares about.

  3. The two paths aren't redundant. Type dispatch handles structured providers (Claude, OpenAI reasoning). The tag parser handles raw-tag models (DeepSeek, QwQ). They serve different input classes, and a bug in one doesn't affect users of the other.

What I do like

The motivation behind both commits, reducing server-side state and plumbing, is sound. The Python _current_stream_thinking accumulator and finally-block tag reconstruction is clunky. I'm very open to suggestions about how to simplify that area. I do think it points at us thinking about reasoning tokens differently — I'm personally seeing them as separate content from the assistant message, i.e. something closer to tool calls.

I appreciate the thorough writeup and test coverage. Let me know if you see it differently.

cpsievert added a commit to posit-dev/chatlas that referenced this pull request May 7, 2026
* Replace tag injection with ContentThinkingDelta class

Providers now emit ContentThinkingDelta (with a phase property) instead
of ContentThinking._as_chunk(). The central streaming loop annotates
phase ("start", "body", "end") and suppresses thinking in content="text"
mode. In content="all" mode, typed delta objects are yielded instead of
synthetic tag strings.

This aligns chatlas with the approach taken in tidyverse/ellmer#975 and
addresses feedback from posit-dev/shinychat#210 about preferring typed
objects over string conventions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add ContentThinking and ContentThinkingDelta to API reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address review feedback on ContentThinkingDelta

- Give ContentThinkingDelta a distinct content_type ("thinking_delta")
  so it won't round-trip as ContentThinking during (de)serialization.
- Construct a new ContentThinkingDelta instead of mutating phase
  in-place, avoiding Pydantic validation bypass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Use TypeGuard instead of assert for ContentThinkingDelta narrowing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move is_thinking_delta() near other helpers at bottom of file

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update changelog

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cpsievert
Copy link
Copy Markdown
Collaborator Author

cpsievert commented May 7, 2026

Superseded by #211

@cpsievert cpsievert closed this May 7, 2026
@cpsievert cpsievert deleted the ui/thinking-simplify branch May 7, 2026 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants