feat(client): add streaming robustness and extended thinking support#5
Merged
Add #[serde(other)] catch-all Unknown variants to StreamEvent, ContentBlockInfo, and Delta so that unrecognized types (e.g., thinking, redacted_thinking, signature_delta) deserialize without crashing. Add a Skipped variant to BlockAccumulator that absorbs deltas silently and produces no ContentBlock, keeping the agent loop stable when the API introduces new block types.
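A minimal sketch of the `Skipped` idea described above, using only the standard library. The method names (`start`, `apply_delta`, `finish`) and the one-variant `ContentBlock` are illustrative assumptions, not the actual oxide-code signatures; the point is only that an unrecognized block type absorbs deltas and yields nothing.

```rust
/// Simplified stand-in for the finished blocks the agent loop stores.
#[derive(Debug, PartialEq)]
enum ContentBlock {
    Text(String),
}

/// Per-block streaming state. `Skipped` is used for any block type the
/// client does not recognize.
#[derive(Debug)]
enum BlockAccumulator {
    Text(String),
    Skipped,
}

impl BlockAccumulator {
    /// Pick an accumulator from the block's type tag (hypothetical signature).
    fn start(block_type: &str) -> Self {
        match block_type {
            "text" => Self::Text(String::new()),
            _ => Self::Skipped,
        }
    }

    /// Append a delta. `Skipped` swallows it without raising an error.
    fn apply_delta(&mut self, delta: &str) {
        match self {
            Self::Text(buf) => buf.push_str(delta),
            Self::Skipped => {}
        }
    }

    /// Finish the block. `Skipped` produces no `ContentBlock`.
    fn finish(self) -> Option<ContentBlock> {
        match self {
            Self::Text(buf) => Some(ContentBlock::Text(buf)),
            Self::Skipped => None,
        }
    }
}
```

Returning `Option<ContentBlock>` from `finish` lets the caller drop skipped blocks with a plain `if let Some(..)` or `flatten`, so the agent loop needs no special case for unknown types.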
Force-pushed from a962497 to 3417e2f.
Document how Claude Code handles thinking, redacted_thinking, server_tool_use, and signature blocks. Covers streaming lifecycle, round-tripping requirements, credential rotation constraints, and implementation implications for oxide-code.
Force-pushed from cf08e20 to da8c365.
…_use support
Replace the Unknown catch-all with proper typed variants for thinking, redacted_thinking, and server_tool_use content blocks. Add ThinkingDelta and SignatureDelta to the Delta enum. Add block accumulators that preserve thinking text and signatures for API round-tripping. Enable adaptive thinking by default in Config. Add strip_trailing_thinking to remove thinking blocks from the end of assistant messages before sending (API constraint). Extract init_accumulator and apply_delta helpers from stream_response to keep it under the line limit. Add ThinkingConfig (adaptive / enabled) to CreateMessageRequest, driven by Config.thinking. The Unknown catch-all remains for truly unrecognized future types.
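The two delta kinds behave differently during accumulation: thinking text is appended, while the PR description notes that a signature overwrites rather than appends. A rough sketch of that rule follows; the `ThinkingBlock` struct and its `apply` method are hypothetical names chosen for illustration, and the real `BlockAccumulator` may be shaped differently.

```rust
/// Deltas that can arrive for a thinking block (simplified).
enum Delta {
    ThinkingDelta(String),
    SignatureDelta(String),
}

/// Accumulated thinking block, kept so it can be sent back to the API.
#[derive(Debug, Default, PartialEq)]
struct ThinkingBlock {
    thinking: String,
    signature: Option<String>,
}

impl ThinkingBlock {
    fn apply(&mut self, delta: Delta) {
        match delta {
            // Thinking text arrives incrementally, so it is appended.
            Delta::ThinkingDelta(text) => self.thinking.push_str(&text),
            // A signature is treated as a complete value: the latest one
            // replaces whatever was stored before.
            Delta::SignatureDelta(sig) => self.signature = Some(sig),
        }
    }
}
```

Storing the signature as `Option<String>` is an assumption made here so that "no signature received yet" stays distinguishable from an empty one.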
…e status
Replace the stale "Implementation Implications" planning section with a brief inline status note, matching the factual reference style of anthropic-api.md.
Force-pushed from 04eb661 to 62d1dba.
ThinkingConfig conceptually belongs with configuration, not the HTTP client. Moving it to config.rs fixes the inverted dependency where config imported from client::anthropic. Also fixes the #[expect(dead_code)] reason string to describe current state per convention, and adds a comment explaining the hardcoded adaptive thinking default.
- Add Debug derive to BlockAccumulator for consistency with parallel enums (ContentBlockInfo, ContentBlock) and diagnostic traceability.
- Log unhandled block/delta combinations at debug level instead of silently dropping them, aiding protocol issue diagnosis.
- Guard trailing newline emission against empty text blocks to prevent spurious output.
- Assert the surviving block type in removes_redacted_at_end (was only checking length, which would pass even if the wrong block survived).
- Add test for multiple consecutive trailing thinking blocks to exercise the while loop.
- Add test for all-thinking assistant message to document the empty content vec edge case.
- Fix roadmap streaming robustness bullet to reflect that thinking, redacted_thinking, and server_tool_use are now fully handled, not just silently skipped.
- Add DRY, cross-file consistency, and idiomatic Rust to the Code Review checklist in CLAUDE.md.
Drop removes_redacted_at_end — it was subsumed by removes_multiple_consecutive, which already exercises both Thinking and RedactedThinking removal through the while loop. Strengthen preserves_non_trailing to assert block identity and order, not just count. Add test conciseness convention to CLAUDE.md: prefer fewer thorough tests over many minimal ones; drop tests subsumed by more comprehensive ones.
strip_trailing_thinking can leave an assistant message with empty content if the response contained only thinking blocks. The API rejects empty content arrays, so filter these out before sending. Also include the block type in the delta mismatch debug trace for better diagnostic context.
Move ContentBlock::ServerToolUse tests before ContentBlock::Thinking to mirror the enum definition order (ToolResult, ServerToolUse, Thinking, RedactedThinking).
The docstring still listed only Text and ToolUse for assistant messages, missing ServerToolUse, Thinking, and RedactedThinking added in this PR.
When OX_SHOW_THINKING=1, stream thinking deltas to stdout with ANSI dim styling (\x1b[2m). Off by default: thinking blocks are accumulated silently for API round-tripping as before.
- Add `show_thinking` field to Config, loaded from OX_SHOW_THINKING env var
- Thread the flag through repl → agent_turn → stream_response → helpers
- Write dim text in init_accumulator (initial thinking) and apply_delta (thinking deltas)
- Handle ContentBlockStop for thinking blocks to emit a trailing newline separating thinking from text output
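The display path can be sketched as a helper that always accumulates the raw delta and only decorates what goes to the terminal. The function name `emit_thinking_delta`, its parameters, and the `\x1b[0m` reset sequence are assumptions for illustration; only the `\x1b[2m` dim code comes from the commit message.

```rust
use std::io::{self, Write};

const DIM: &str = "\x1b[2m";
// Assumption: a full SGR reset is used to end the dim span.
const RESET: &str = "\x1b[0m";

/// Store the raw delta for API round-tripping and, when `show_thinking`
/// is set, also write it to `out` wrapped in dim styling.
fn emit_thinking_delta<W: Write>(
    out: &mut W,
    accumulated: &mut String,
    delta: &str,
    show_thinking: bool,
) -> io::Result<()> {
    // The accumulator never sees escape codes.
    accumulated.push_str(delta);
    if show_thinking {
        out.write_all(DIM.as_bytes())?;
        out.write_all(delta.as_bytes())?;
        out.write_all(RESET.as_bytes())?;
        out.flush()?;
    }
    Ok(())
}
```

Taking a generic `Write` rather than locking stdout inside the helper keeps the styling testable against an in-memory buffer.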
- Update extended thinking bullet to mention OX_SHOW_THINKING.
- Add Configuration File section under Next Phase: TOML config with layered loading (global → user → project → env var overrides).
Force-pushed from e8d1577 to 2ab3fd0.
…iling_thinking
- Move ServerToolUse before ToolResult to align variant order with ContentBlockInfo and BlockAccumulator (tool-use variants grouped).
- Narrow strip_trailing_thinking to target only the last assistant message via rfind; earlier messages were already processed.
- Clarify comment on empty-message removal after thinking stripping.
- Reorder test sections to mirror new variant order.
- Add strip_trailing_thinking_targets_only_last_assistant test.
Use consistent phrasing ("Silently skipped during stream processing")
across all three #[serde(other)] Unknown variants: StreamEvent,
ContentBlockInfo, and Delta.
Extract the truthiness check (`"1"` / `"true"`) into a reusable `env_bool` function, pairing with `non_empty_env` for string-valued env vars. Simplifies the `show_thinking` assignment and provides a consistent pattern for future `OX_*` boolean flags.
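A sketch of how such a pair of helpers might look with only the standard library. The exact signatures, the split into a pure `is_truthy` function, and the case-sensitive exact match are assumptions; the commit message states only that `"1"` and `"true"` count as truthy.

```rust
use std::env;

/// True only for the exact strings "1" and "true" (assumed to be
/// case-sensitive and untrimmed).
fn is_truthy(value: &str) -> bool {
    matches!(value, "1" | "true")
}

/// Read a boolean flag from the environment. An unset or non-Unicode
/// variable counts as false.
fn env_bool(name: &str) -> bool {
    env::var(name).map(|v| is_truthy(&v)).unwrap_or(false)
}

/// Read a string-valued variable, treating an empty value as unset.
fn non_empty_env(name: &str) -> Option<String> {
    env::var(name).ok().filter(|v| !v.is_empty())
}
```

With helpers shaped like these, the `show_thinking` assignment reduces to something like `env_bool("OX_SHOW_THINKING")`, and keeping the string comparison in a pure function makes it testable without touching the process environment.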
Only Adaptive is used — no production or planned code path constructs Enabled. Adding it back is trivial when a fixed-budget thinking mode is actually needed.
Replace write!(stdout, "{text}") with stdout.write_all(text.as_bytes())
where no format interpolation is needed.
…of deleting
Deleting an empty-after-stripping assistant message breaks user/assistant alternation, causing consecutive user messages that the API rejects. Insert a "[No message content]" placeholder instead, matching Claude Code's filterTrailingThinkingFromLastAssistant behavior. Also update research notes with the full normalization pipeline and ordering constraints discovered in Claude Code's source.
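The stripping rule, as described across these commits, can be sketched as follows. The `Message`, `Role`, and `ContentBlock` shapes are simplified assumptions rather than the real types in message.rs; the sketch shows the three behaviors named in the commit messages: target only the last assistant message, pop trailing thinking blocks in a loop, and insert a placeholder instead of deleting an emptied message.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Text(String),
    Thinking { thinking: String, signature: String },
    RedactedThinking { data: String },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    content: Vec<ContentBlock>,
}

const PLACEHOLDER: &str = "[No message content]";

fn is_thinking(block: &ContentBlock) -> bool {
    matches!(
        block,
        ContentBlock::Thinking { .. } | ContentBlock::RedactedThinking { .. }
    )
}

/// Strip trailing thinking blocks from the last assistant message only.
fn strip_trailing_thinking(messages: &mut [Message]) {
    // Search from the back; earlier assistant messages are left untouched.
    let Some(last) = messages.iter_mut().rfind(|m| m.role == Role::Assistant) else {
        return;
    };
    // Pop every trailing thinking or redacted-thinking block.
    while last.content.last().is_some_and(is_thinking) {
        last.content.pop();
    }
    // Keep the message so user / assistant alternation is preserved.
    if last.content.is_empty() {
        last.content.push(ContentBlock::Text(PLACEHOLDER.to_string()));
    }
}
```

Non-trailing thinking blocks (those followed by text or tool use) stay in place, since only the tail of the message is examined.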
Align with the editorial bracket convention and the existing [N chars] marker in truncate_line.
hakula139 added a commit that referenced this pull request on May 5, 2026
PR #64 (modal infrastructure) shipped Option C: bare /model opens the combined picker, bare /effort errors with a usage hint pointing at /model. The user guide, design notes, and roadmap still described the older "both bare forms open the picker with different initial focus" shape. Updated:
- docs/guide/slash-commands.md: table description, mid-turn classification paragraph, and the "Switching the Effort" / "Switching the Model" sections.
- docs/design/slash/commands.md: design decision #5, /effort and /model per-command notes, source list (`agent_loop_task` → `agent_turn`).
- docs/design/slash/modals.md: design decisions #4 (`SessionInfo` → `LiveSessionInfo`) and #7 (typed-arg-only contract).
- docs/roadmap.md: moved the combined picker out of "Current Focus" (shipped in PR #64) into Working Today; replaced with the deferred /effort slider.
- CLAUDE.md: `slash/effort.rs` description updated to match the typed-arg contract.
Summary
Add proper support for extended thinking, redacted thinking, server tool use, and unknown content block types in the SSE streaming pipeline. Enable adaptive thinking by default.
- Add `Thinking`, `RedactedThinking`, `ServerToolUse` variants to `ContentBlockInfo`, `Delta`, `BlockAccumulator`, and `ContentBlock` for proper streaming, accumulation, and API round-tripping
- Add `ThinkingDelta` and `SignatureDelta` to `Delta` (signature overwrites, not appends)
- Add `ThinkingConfig` (adaptive-only) in config module, wired into `CreateMessageRequest`
- Enable adaptive thinking by default (`ThinkingConfig::Adaptive`)
- Add `strip_trailing_thinking`: strips trailing thinking blocks from the last assistant message (the API rejects messages ending with thinking blocks); inserts placeholder for thinking-only responses to preserve user / assistant alternation
- Add `#[serde(other)] Unknown` catch-all on `StreamEvent`, `ContentBlockInfo`, and `Delta` for unrecognized future types
- Extract `init_accumulator`, `apply_delta`, and `parse_tool_json` helpers from `stream_response`
- Add `OX_SHOW_THINKING` env var: streams thinking deltas to stdout with ANSI dim styling, off by default
Design Decisions
- Thinking blocks are stored as `ContentBlock` and sent back in subsequent API requests, preserving conversation continuity. `SignatureDelta` overwrites (not appends): it's a full cryptographic value, not incremental text. Credential rotation stripping is deferred to the Keychain OAuth PR.
- `ThinkingConfig` in config module: avoids an inverted dependency where `config.rs` would import from `client::anthropic`. The type conceptually belongs with configuration.
- `ThinkingConfig::Enabled` (fixed budget) was removed: no production or planned code path needs it. The `Enabled` variant can be trivially re-added when a fixed-budget mode is actually required (e.g., for older models on 3P providers).
- `strip_trailing_thinking` inserts a `[No message content]` placeholder when stripping removes all content, instead of deleting the message. Deletion would break user / assistant alternation, causing consecutive user messages that the API rejects. Matches Claude Code's `filterTrailingThinkingFromLastAssistant` behavior.
- When thinking display is enabled (`OX_SHOW_THINKING=1`), each thinking delta is wrapped in ANSI dim codes at write time, so the accumulator stores clean text for API round-tripping.
- `ContentBlock`, `ContentBlockInfo`, and `BlockAccumulator` follow the same variant order: tool-use variants grouped together (`ToolUse`, `ServerToolUse`), then `ToolResult`, then thinking variants.
Changes
| File | Changes |
| --- | --- |
| `client/anthropic.rs` | `Thinking`, `RedactedThinking`, `ServerToolUse` on `ContentBlockInfo`; `ThinkingDelta`, `SignatureDelta` on `Delta`; `Unknown` catch-alls; `thinking` field on `CreateMessageRequest`; 9 new tests |
| `config.rs` | `ThinkingConfig` enum (adaptive-only); `env_bool` helper; `show_thinking` field from `OX_SHOW_THINKING` env var; 1 new test |
| `main.rs` | `BlockAccumulator` variants for all block types; `init_accumulator`, `apply_delta`, `parse_tool_json` helpers; `strip_trailing_thinking` call; dimmed thinking display gated on `show_thinking`; `ContentBlockStop` handling for thinking newline |
| `message.rs` | `ServerToolUse`, `Thinking`, `RedactedThinking` on `ContentBlock`; variant ordering aligned with `ContentBlockInfo`; `strip_trailing_thinking` targeting last assistant message with placeholder insertion; 9 new tests |
| `CLAUDE.md` | |
| `docs/roadmap.md` | |
| `docs/research/extended-thinking.md` | |
Test plan
- `cargo fmt --all --check`: clean
- `cargo build` compiles cleanly
- `cargo clippy --all-targets -- -D warnings`: zero warnings
- `cargo test`: 162 tests pass (19 new)
- `cargo llvm-cov --ignore-filename-regex 'main\.rs'`: 86% line coverage
- `OX_SHOW_THINKING=1 ox`: thinking text streams dimmed before response; adaptive mode skips thinking for trivial queries