feat(client): add streaming robustness and extended thinking support #5

Merged: hakula139 merged 22 commits into main from feat/streaming-robustness on Apr 5, 2026

Conversation

@hakula139 (Owner) commented Apr 4, 2026

Summary

Add proper support for extended thinking, redacted thinking, server tool use, and unknown content block types in the SSE streaming pipeline. Enable adaptive thinking by default.

  • Add typed Thinking, RedactedThinking, ServerToolUse variants to ContentBlockInfo, Delta, BlockAccumulator, and ContentBlock for proper streaming, accumulation, and API round-tripping
  • Add ThinkingDelta and SignatureDelta to Delta (signature overwrites, not appends)
  • Add ThinkingConfig (adaptive-only) in config module, wired into CreateMessageRequest
  • Enable adaptive thinking by default (ThinkingConfig::Adaptive)
  • Add strip_trailing_thinking — strips trailing thinking blocks from the last assistant message (the API rejects messages ending with thinking blocks); inserts placeholder for thinking-only responses to preserve user / assistant alternation
  • Keep #[serde(other)] Unknown catch-all on StreamEvent, ContentBlockInfo, and Delta for unrecognized future types
  • Extract init_accumulator, apply_delta, and parse_tool_json helpers from stream_response
  • Optional dimmed thinking display via OX_SHOW_THINKING env var — streams thinking deltas to stdout with ANSI dim styling, off by default
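The delta handling listed above can be sketched roughly as follows. This is a minimal, std-only illustration rather than the PR's code: only the variant names come from the summary, while the payload shapes of `Delta` and the fields of `BlockAccumulator` are assumptions.

```rust
/// Streaming deltas. Variant names follow the PR summary; payloads are assumed.
#[derive(Debug)]
enum Delta {
    TextDelta(String),
    ThinkingDelta(String),
    SignatureDelta(String),
}

/// Per-block accumulator. Field names are hypothetical.
#[derive(Debug, PartialEq)]
enum BlockAccumulator {
    Text(String),
    Thinking { thinking: String, signature: String },
}

/// Apply one delta to the accumulator.
/// Returns false when the delta does not belong to this block type,
/// so the caller can log the mismatch instead of dropping it silently.
fn apply_delta(acc: &mut BlockAccumulator, delta: Delta) -> bool {
    match (acc, delta) {
        // Text and thinking arrive as incremental fragments: append.
        (BlockAccumulator::Text(buf), Delta::TextDelta(t)) => {
            buf.push_str(&t);
            true
        }
        (BlockAccumulator::Thinking { thinking, .. }, Delta::ThinkingDelta(t)) => {
            thinking.push_str(&t);
            true
        }
        // The signature is a complete value: overwrite, never append.
        (BlockAccumulator::Thinking { signature, .. }, Delta::SignatureDelta(s)) => {
            *signature = s;
            true
        }
        _ => false,
    }
}
```

Under these assumptions, feeding two `ThinkingDelta` fragments and then two `SignatureDelta` values leaves the concatenated thinking text and only the last signature.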

Design Decisions

  • Thinking enabled by default: Adaptive mode lets the model decide the budget. No user configuration needed — thinking just works.
  • Round-trip preservation: Thinking and redacted_thinking blocks are stored in ContentBlock and sent back in subsequent API requests, preserving conversation continuity.
  • Signature handling: SignatureDelta overwrites (not appends) — it's a full cryptographic value, not incremental text. Credential rotation stripping is deferred to the Keychain OAuth PR.
  • ThinkingConfig in config module: Avoids an inverted dependency where config.rs would import from client::anthropic. The type conceptually belongs with configuration.
  • Adaptive-only: ThinkingConfig::Enabled (fixed budget) was removed because no production or planned code path needs it. The variant can be trivially re-added when a fixed-budget mode is actually required (e.g., for older models on third-party providers).
  • Thinking-only placeholder: strip_trailing_thinking inserts a [No message content] placeholder when stripping removes all content, instead of deleting the message. Deletion would break user / assistant alternation, causing consecutive user messages that the API rejects. Matches Claude Code's filterTrailingThinkingFromLastAssistant behavior.
  • Thinking display opt-in: Off by default since the bare REPL lacks collapsible sections. When enabled (OX_SHOW_THINKING=1), the ANSI dim codes are wrapped around each thinking delta only as it is written to stdout, so the accumulator still stores clean text for API round-tripping.
  • Variant ordering: ContentBlock, ContentBlockInfo, and BlockAccumulator follow the same variant order — tool-use variants grouped together (ToolUse, ServerToolUse), then ToolResult, then thinking variants.
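The thinking-only placeholder decision can be illustrated with a small sketch. It assumes simplified `Role`, `ContentBlock`, and `Message` types (their shapes here are hypothetical); the function name, the `rfind` targeting, and the `[No message content]` text come from this PR's description.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

// Simplified stand-in for the real ContentBlock; fields are assumed.
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Text(String),
    Thinking { thinking: String, signature: String },
    RedactedThinking { data: String },
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    content: Vec<ContentBlock>,
}

const PLACEHOLDER: &str = "[No message content]";

/// Strip trailing thinking blocks from the last assistant message only.
/// If stripping empties the message, insert a placeholder instead of
/// deleting it, so user / assistant alternation is preserved.
fn strip_trailing_thinking(messages: &mut [Message]) {
    let Some(last) = messages.iter_mut().rfind(|m| m.role == Role::Assistant) else {
        return;
    };
    let mut stripped = false;
    while matches!(
        last.content.last(),
        Some(ContentBlock::Thinking { .. } | ContentBlock::RedactedThinking { .. })
    ) {
        last.content.pop();
        stripped = true;
    }
    if stripped && last.content.is_empty() {
        last.content.push(ContentBlock::Text(PLACEHOLDER.to_string()));
    }
}
```

Thinking blocks that are followed by other content are left in place, since only trailing ones are popped.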

Changes

| File | Description |
| --- | --- |
| client/anthropic.rs | Thinking, RedactedThinking, ServerToolUse on ContentBlockInfo; ThinkingDelta, SignatureDelta on Delta; Unknown catch-alls; thinking field on CreateMessageRequest; 9 new tests |
| config.rs | ThinkingConfig enum (adaptive-only); env_bool helper; show_thinking field from OX_SHOW_THINKING env var; 1 new test |
| main.rs | BlockAccumulator variants for all block types; init_accumulator, apply_delta, parse_tool_json helpers; strip_trailing_thinking call; dimmed thinking display gated on show_thinking; ContentBlockStop handling for thinking newline |
| message.rs | ServerToolUse, Thinking, RedactedThinking on ContentBlock; variant ordering aligned with ContentBlockInfo; strip_trailing_thinking targeting last assistant message with placeholder insertion; 9 new tests |
| CLAUDE.md | Code Review criteria (DRY, cross-file consistency, idiomatic Rust); test conciseness convention |
| docs/roadmap.md | Restructure "Working Today" into subsections; update thinking description; add TOML config file to Next Phase |
| docs/research/extended-thinking.md | Research notes on thinking blocks, signatures, round-tripping, content block type taxonomy, and normalization pipeline |

Test plan

  • cargo fmt --all --check — clean
  • cargo build compiles cleanly
  • cargo clippy --all-targets -- -D warnings — zero warnings
  • cargo test — 162 tests pass (19 new)
  • cargo llvm-cov --ignore-filename-regex 'main\.rs' — 86% line coverage
  • Manual test: OX_SHOW_THINKING=1 ox — thinking text streams dimmed before response; adaptive mode skips thinking for trivial queries

Add #[serde(other)] catch-all Unknown variants to StreamEvent,
ContentBlockInfo, and Delta so that unrecognized types (e.g., thinking,
redacted_thinking, signature_delta) deserialize without crashing. Add a
Skipped variant to BlockAccumulator that absorbs deltas silently and
produces no ContentBlock, keeping the agent loop stable when the API
introduces new block types.
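The fallback described in this commit message can be sketched without external crates. The commit relies on serde's `#[serde(other)]` attribute; the sketch below swaps that for a hand-written match on the type tag so it runs with the standard library alone. `parse_block_type`, the `new`, `push`, and `finish` methods, and the `Option<String>` return type are hypothetical stand-ins for the real deserialization and `ContentBlock` output.

```rust
/// Block kinds announced at the start of a content block.
/// Anything unrecognized maps to Unknown instead of failing.
#[derive(Debug, PartialEq)]
enum ContentBlockInfo {
    Text,
    ToolUse,
    Unknown,
}

/// Hand-written stand-in for serde's #[serde(other)] catch-all.
fn parse_block_type(tag: &str) -> ContentBlockInfo {
    match tag {
        "text" => ContentBlockInfo::Text,
        "tool_use" => ContentBlockInfo::ToolUse,
        _ => ContentBlockInfo::Unknown,
    }
}

#[derive(Debug, PartialEq)]
enum BlockAccumulator {
    Text(String),
    ToolUse(String),
    /// Absorbs deltas silently and produces no block.
    Skipped,
}

impl BlockAccumulator {
    fn new(info: &ContentBlockInfo) -> Self {
        match info {
            ContentBlockInfo::Text => Self::Text(String::new()),
            ContentBlockInfo::ToolUse => Self::ToolUse(String::new()),
            ContentBlockInfo::Unknown => Self::Skipped,
        }
    }

    /// Append a fragment; a Skipped block ignores it.
    fn push(&mut self, fragment: &str) {
        match self {
            Self::Text(buf) | Self::ToolUse(buf) => buf.push_str(fragment),
            Self::Skipped => {}
        }
    }

    /// Finish the block. None means nothing is emitted for it.
    fn finish(self) -> Option<String> {
        match self {
            Self::Text(buf) | Self::ToolUse(buf) => Some(buf),
            Self::Skipped => None,
        }
    }
}
```

The point of the design is that an unrecognized block type costs nothing downstream: the stream keeps flowing and the agent loop never sees a partial or malformed block.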
@hakula139 added the bug label Apr 4, 2026
@hakula139 self-assigned this Apr 4, 2026
@hakula139 force-pushed the feat/streaming-robustness branch from a962497 to 3417e2f April 4, 2026 16:32
Document how Claude Code handles thinking, redacted_thinking,
server_tool_use, and signature blocks. Covers streaming lifecycle,
round-tripping requirements, credential rotation constraints, and
implementation implications for oxide-code.
@hakula139 force-pushed the feat/streaming-robustness branch from cf08e20 to da8c365 April 4, 2026 16:37
…_use support

Replace the Unknown catch-all with proper typed variants for thinking,
redacted_thinking, and server_tool_use content blocks. Add ThinkingDelta
and SignatureDelta to the Delta enum. Add block accumulators that
preserve thinking text and signatures for API round-tripping.

Enable adaptive thinking by default in Config. Add
strip_trailing_thinking to remove thinking blocks from the end of
assistant messages before sending (API constraint). Extract
init_accumulator and apply_delta helpers from stream_response to keep it
under the line limit.

Add ThinkingConfig (adaptive / enabled) to CreateMessageRequest, driven
by Config.thinking. The Unknown catch-all remains for truly unrecognized
future types.
@hakula139 changed the title from "fix(client): handle unknown SSE content block and delta types gracefully" to "feat(client): add streaming robustness and extended thinking support" Apr 4, 2026
@hakula139 added the enhancement label and removed the bug label Apr 4, 2026
…e status

Replace the stale "Implementation Implications" planning section with a
brief inline status note, matching the factual reference style of
anthropic-api.md.
@hakula139 force-pushed the feat/streaming-robustness branch from 04eb661 to 62d1dba April 4, 2026 16:57
hakula139 added 10 commits April 5, 2026 01:19
ThinkingConfig conceptually belongs with configuration, not the HTTP
client. Moving it to config.rs fixes the inverted dependency where
config imported from client::anthropic.

Also fixes the #[expect(dead_code)] reason string to describe current
state per convention, and adds a comment explaining the hardcoded
adaptive thinking default.

- Add Debug derive to BlockAccumulator for consistency with parallel
  enums (ContentBlockInfo, ContentBlock) and diagnostic traceability.
- Log unhandled block/delta combinations at debug level instead of
  silently dropping them, aiding protocol issue diagnosis.
- Guard trailing newline emission against empty text blocks to prevent
  spurious output.
- Assert the surviving block type in removes_redacted_at_end (was only
  checking length, which would pass even if the wrong block survived).
- Add test for multiple consecutive trailing thinking blocks to exercise
  the while loop.
- Add test for all-thinking assistant message to document the empty
  content vec edge case.
- Fix roadmap streaming robustness bullet to reflect that thinking,
  redacted_thinking, and server_tool_use are now fully handled, not
  just silently skipped.
- Add DRY, cross-file consistency, and idiomatic Rust to the Code
  Review checklist in CLAUDE.md.

Drop removes_redacted_at_end: it was subsumed by
removes_multiple_consecutive, which already exercises both Thinking
and RedactedThinking removal through the while loop. Strengthen
preserves_non_trailing to assert block identity and order, not just
count.

Add test conciseness convention to CLAUDE.md: prefer fewer thorough
tests over many minimal ones; drop tests subsumed by more
comprehensive ones.

strip_trailing_thinking can leave an assistant message with empty
content if the response contained only thinking blocks. The API
rejects empty content arrays, so filter these out before sending.

Also include the block type in the delta mismatch debug trace for
better diagnostic context.

Move ContentBlock::ServerToolUse tests before ContentBlock::Thinking
to mirror the enum definition order (ToolResult, ServerToolUse,
Thinking, RedactedThinking).

The docstring still listed only Text and ToolUse for assistant messages,
missing ServerToolUse, Thinking, and RedactedThinking added in this PR.

When OX_SHOW_THINKING=1, stream thinking deltas to stdout with ANSI dim
styling (\x1b[2m). Off by default; thinking blocks are accumulated
silently for API round-tripping as before.

- Add `show_thinking` field to Config, loaded from OX_SHOW_THINKING env var
- Thread the flag through repl → agent_turn → stream_response → helpers
- Write dim text in init_accumulator (initial thinking) and apply_delta
  (thinking deltas)
- Handle ContentBlockStop for thinking blocks to emit a trailing newline
  separating thinking from text output
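A rough sketch of the dimmed display described in this commit message follows. It is written against a generic writer so it can be exercised without a terminal. The `\x1b[2m` dim code is from the commit message; the helper name `write_thinking_delta`, its signature, and the use of `\x1b[0m` to reset styling after each delta are assumptions.

```rust
use std::io::{self, Write};

const DIM: &str = "\x1b[2m";
// Assumed reset sequence; the commit message only names the dim code.
const RESET: &str = "\x1b[0m";

/// Accumulate a thinking delta and optionally echo it in dim styling.
/// The accumulator always receives the clean text, so the styling
/// never leaks into what is sent back to the API.
fn write_thinking_delta<W: Write>(
    out: &mut W,
    acc: &mut String,
    delta: &str,
    show_thinking: bool,
) -> io::Result<()> {
    acc.push_str(delta);
    if show_thinking {
        // No format interpolation is needed, so write the bytes directly.
        out.write_all(DIM.as_bytes())?;
        out.write_all(delta.as_bytes())?;
        out.write_all(RESET.as_bytes())?;
        out.flush()?;
    }
    Ok(())
}
```

In a REPL the writer would be a locked stdout handle, whereas a test can pass a `Vec<u8>` and inspect the bytes.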

- Update extended thinking bullet to mention OX_SHOW_THINKING.
- Add Configuration File section under Next Phase: TOML config with
  layered loading (global → user → project → env var overrides).
@hakula139 force-pushed the feat/streaming-robustness branch from e8d1577 to 2ab3fd0 April 4, 2026 18:43
…iling_thinking

- Move ServerToolUse before ToolResult to align variant order with
  ContentBlockInfo and BlockAccumulator (tool-use variants grouped).
- Narrow strip_trailing_thinking to target only the last assistant
  message via rfind — earlier messages were already processed.
- Clarify comment on empty-message removal after thinking stripping.
- Reorder test sections to mirror new variant order.
- Add strip_trailing_thinking_targets_only_last_assistant test.

Use consistent phrasing ("Silently skipped during stream processing")
across all three #[serde(other)] Unknown variants: StreamEvent,
ContentBlockInfo, and Delta.

Extract the truthiness check (`"1"` / `"true"`) into a reusable
`env_bool` function, pairing with `non_empty_env` for string-valued
env vars. Simplifies the `show_thinking` assignment and provides a
consistent pattern for future `OX_*` boolean flags.
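A helper of this kind might look like the sketch below. The name `env_bool` and the accepted values `"1"` and `"true"` come from the commit message; the signature, the exact-match comparison (no trimming or case folding), and the separate `parse_bool` function are assumptions made so the parsing can be tested without touching the process environment.

```rust
use std::env;

/// Hypothetical split-out of the truthiness check: only "1" and "true" count.
fn parse_bool(value: &str) -> bool {
    matches!(value, "1" | "true")
}

/// Read a boolean flag from the environment.
/// An unset or non-UTF-8 variable is treated as false.
fn env_bool(name: &str) -> bool {
    env::var(name).is_ok_and(|v| parse_bool(&v))
}
```

With this shape, a flag such as `show_thinking` would reduce to a single call like `env_bool("OX_SHOW_THINKING")`.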

Only Adaptive is used; no production or planned code path constructs
Enabled. Adding it back is trivial when a fixed-budget thinking mode
is actually needed.

Replace write!(stdout, "{text}") with stdout.write_all(text.as_bytes())
where no format interpolation is needed.
…of deleting

Deleting an empty-after-stripping assistant message breaks user/assistant
alternation, causing consecutive user messages that the API rejects.
Insert a "[No message content]" placeholder instead, matching Claude Code's
filterTrailingThinkingFromLastAssistant behavior.

Also update research notes with the full normalization pipeline and
ordering constraints discovered in Claude Code's source.

Align with the editorial bracket convention and the existing
[N chars] marker in truncate_line.
@hakula139 merged commit 432e594 into main Apr 5, 2026
1 check passed
@hakula139 deleted the feat/streaming-robustness branch April 5, 2026 11:54
hakula139 added a commit that referenced this pull request May 5, 2026
PR #64 (modal infrastructure) shipped Option C: bare /model opens the combined
picker, bare /effort errors with a usage hint pointing at /model. The user
guide, design notes, and roadmap still described the older "both bare forms
open the picker with different initial focus" shape. Updated:

- docs/guide/slash-commands.md — table description, mid-turn classification
  paragraph, and the "Switching the Effort" / "Switching the Model" sections.
- docs/design/slash/commands.md — design decision #5, /effort and /model
  per-command notes, source list (`agent_loop_task` → `agent_turn`).
- docs/design/slash/modals.md — design decisions #4 (`SessionInfo` →
  `LiveSessionInfo`) and #7 (typed-arg-only contract).
- docs/roadmap.md — moved the combined picker out of "Current Focus" (shipped
  in PR #64) into Working Today; replaced with the deferred /effort slider.
- CLAUDE.md — `slash/effort.rs` description updated to match the typed-arg
  contract.