refactor(tool): centralize MAX_OUTPUT_BYTES in ToolRegistry::run#50
Merged
refactor(tool): centralize MAX_OUTPUT_BYTES in ToolRegistry::run#50
Conversation
…urveys Two parallel-track research notes captured during the Plan A / Plan B exploration. Each surveys how Claude Code, OpenAI Codex, and opencode solve a problem the oxide-code roadmap calls out, ending with the design decisions that shape the planned implementation. - tool-truncation.md — per-tool vs centralized output caps, head/tail vs spillover, and why the dispatcher should own the byte safety net while per-tool view-shape budgets stay where they are. - file-tracking.md — Read-before-Edit gates, mtime / content staleness checks, and the persist-on-finish + verify-on-resume approach (vs claude-code's message-history rehydration). Registers both in the research index. The new docs introduce two GrowthBook flag / token-shorthand terms (`tengu`, `toks`) that needed to land in the cspell wordlist for `pnpm spellcheck` to stay clean.
Existing docs mixed `claude-code` (the repo path) with `Claude Code` (the product), and used `codex-rs` (the workspace directory) where the prose meant `OpenAI Codex` (the product). The new surveys landed with the product-name convention, so this sweep brings the older docs in line so cross-doc reading is consistent. Repo paths in code references stay as-is — `claude-code/src/...`, `codex-rs/core/...`, and `opencode/packages/...` are filesystem paths, not subjects. Only prose subjects, hyperlink labels, and section headings change. - anthropic-api.md — 12 prose refs + opening link label. - extended-thinking.md — opening link label. - session-persistence.md — opening hyperlinks all three reference projects; subsection heading + 1 prose ref. - system-prompt.md — opening link label. - tui.md — opening hyperlinks; `### claude-code` and `### codex-rs` headings; 4 prose refs.
…ry::run Per-tool truncation was hand-rolled and incomplete: bash and read enforced the 128 KB byte cap, glob / grep / edit / write did not. A pathological grep response (250 rows × 500 chars before per-line overhead is 125 KB; long paths can push glob past 128 KB too) could flood the model context past the intended ceiling, and a future MCP tool plug-in would have to re-implement the cap or trust the model not to drown in output. Lift the byte safety net into the dispatcher. New `cap_output` runs the head-tail strategy (lifted from bash) so the model still sees both setup context and final outcome; new `ToolRegistry::run` wraps `Tool::run` with the cap and stamps `metadata.truncated_total` with the pre-cap byte count when the cap fires — renderers consume the structural field instead of parsing footer prose. Per-tool view-shape budgets (row caps, line-length caps) stay where they are; the two layers separate "tool-semantic budget" from "absolute byte ceiling". The agent-loop dispatcher swap is part of this commit so unknown-tool fallback also flows through the registry; bash and read still self-cap until the next commit removes their now-redundant logic.
…y cap Now that the dispatcher owns the byte safety net, bash and read no longer need their own implementations. - bash: drop `truncate_output` + `TRUNCATION_OVERHEAD` and the three boundary tests; the registry runs the same head-tail strategy with a structural footer (`... [N bytes truncated; head + tail kept] ...`) and stamps `metadata.truncated_total`. - read: drop the mid-loop byte-budget check and `truncated_by_bytes` flag. The row cap (`DEFAULT_LINE_LIMIT`) and per-line cap (`truncate_line`) keep the output bounded for view-shape; the registry handles the absolute byte ceiling. The `(Showing lines N–M of TOTAL total)` footer stays — view-shape signal, separate from the byte cap. - tool.rs: add an end-to-end ReadTool dispatcher test (replaces read.rs's deleted byte-budget test) so a regression that bypasses the wrapper trips on real tool output, not just StubTool. - docs/roadmap.md: drop the now-shipped Current Focus bullet. Behavior change: bash's footer wording and unit (`N lines truncated` → `N bytes truncated; head + tail kept`) — bytes is the lingua franca for the registry-level cap since it applies uniformly across line-oriented and non-line-oriented tools.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
PR review caught that `truncated_total` had picked up two units: glob writes a match count there, and the registry's byte safety net was also writing the pre-cap byte count into the same field. With ~1.3 KB average path budget the byte cap is reachable in deep monorepos, and a collision would silently render bytes as glob's "X of N matches" count. Splitting the field encodes the units at the type level. Glob renderers keep reading `truncated_total`; the registry now stamps a new `truncated_bytes`. Other review responses folded in: - `// ── Output Cap ──` section co-locates the constants and `cap_output` next to the dispatcher with a preamble naming the two cap layers (per-tool view-shape vs central byte budget). Trait `Tool::run` picks up a one-liner pointing at the wrapper. - `cap_output` drops to private; single caller. - Dispatcher tests now drive `ReadTool` over a temp file (drops `StubTool`), which closes the codecov gap on the fixture's unused trait stubs. New fallthrough test pins `result_view` dropping to `Text` when the byte cap replaces read's footer. - `split_read_footer` picks up a doc comment explaining the intentional asymmetry with glob's metadata-driven approach. - Subtraction-safety invariant comment in `cap_output`; redundant assertion in `cap_output_keeps_head_and_tail` dropped; new field tipped `ToolOutput` past clippy's 128-byte threshold so `parse_input` picks up `#[expect(clippy::result_large_err)]`.
Crate-wide convention is single-line `reason = "..."` strings — 10 of 11 sites already do this even at 150+ chars. `model.rs` was the lone holdout using `\` continuations. Aligning for consistency; behavior unchanged.
CLAUDE.md mandates `...` everywhere; the new fallthrough test slipped in a U+2026.
Decision #4 in the design section claimed `truncated_total` would become the single structural signal; the PR ended up splitting into `truncated_total` (view-shape) + `truncated_bytes` (byte cap) after review caught a unit-conflation hazard. Notes now describe the split and the rationale. Source-line list also updated: the bash and read self-cap references are gone with the code; remaining entries point at the constants and helpers without brittle line ranges. Test-name references in decision #2 follow the rename from `truncate_output_*` to `cap_output_*`.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Per-tool truncation was hand-rolled and incomplete:
bashandreadenforced the 128 KB byte cap;glob/grep/edit/writedid not. A pathological grep response (250 rows × 500 chars before per-line overhead is 125 KB; long paths can push glob past 128 KB too) could flood the model context past the intended ceiling, and a future MCP-tool plug-in would have to re-implement the cap or trust the model not to drown.This PR lifts the byte safety net into the dispatcher. New
cap_outputruns head-tail truncation (lifted from bash) so the model still sees both setup context and final outcome; newToolRegistry::runwrapsTool::runwith the cap. Per-tool view-shape budgets (row caps, line-length caps) stay where they are; the two layers separate "tool-semantic budget" from "absolute byte ceiling".The two leading commits are research surveys (cherry-picked from
docs/tool-research) framing the design —tool-truncation.mdcovers this PR;file-tracking.mdscopes a separate companion workstream (Read-before-Edit gates, staleness detection) so the implementation here stays a focused refactor.Design decisions
truncated_totalcarries view-shape match counts (set by per-tool caps like glob'sMAX_RESULTS);truncated_bytescarries pre-cap byte counts (set only by the registry). Splitting keeps glob's(X of N matches)from accidentally rendering bytes when both layers fire — plausible in deep monorepos with long paths.MAX_OUTPUT_BYTES = 128 KBunchanged. Don't tune in the same PR that restructures responsibility.Changes
tool.rs// ── Output Cap ──section co-locatesMAX_OUTPUT_BYTES,TRUNCATION_OVERHEAD, andcap_outputnext to the dispatcher with a two-layer preamble.ToolRegistry::runwrapsTool::run, applies the cap, and stampsmetadata.truncated_bytes(separate from view-shape'struncated_total). TraitTool::rundoc tells implementations not to pre-truncate by bytes. Tests covercap_outputboundaries (short / head-tail / multibyte at split / barely-over) and the dispatcher path (overflow, within-cap, unknown-tool, byte-cap-on-read fallthrough toTextview).parse_inputpicks up#[expect(clippy::result_large_err)]since the new field tippedToolOutputpast clippy's threshold.agent.rsmatch tools.get(...) { Some(t) => t.run(...).await, None => Unknown-tool error }fortools.run(&name, input).await. Unknown-tool fallback now lives in the registry.tool/bash.rstruncate_outputand the boundary tests; the dispatcher'scap_outputis the new home.tool/read.rsDEFAULT_LINE_LIMIT) and per-line cap (truncate_line) keep view-shape bounded;(Showing lines N–M of TOTAL total)footer stays as a view-shape signal.split_read_footerdoc explains the intentional asymmetry with glob's metadata-driven approach.model.rs#[expect]reason onCapabilitiesto a single line — matches the convention used by the other 10#[expect]sites in the crate.docs/research/tool-truncation.md(new)docs/research/file-tracking.md(new)docs/research/README.mddocs/research/anthropic-api.md,docs/research/extended-thinking.md,docs/research/session-persistence.md,docs/research/system-prompt.md,docs/research/tui.mdclaude-coderepo path →Claude Codeproduct;codex-rsdirectory →OpenAI Codexproduct). Repo paths in code references stay as-is.docs/roadmap.md.cspell/words.txttenguandtoks(introduced bytool-truncation.md).Test plan
cargo fmt --all --checkcargo buildcompiles cleanlycargo clippy --all-targets -- -D warnings— zero warningscargo test— 1196 passedcargo llvm-cov --ignore-filename-regex 'main\.rs'— 98.55% workspace line / 98.49% ontool.rspnpm lint/pnpm spellcheck— 0 issuesOut of scope
docs/research/file-tracking.mdfor the design notes.MAX_OUTPUT_BYTES. Carries over from the bash-era cap; revisit based on observed model behavior.