refactor(tool): centralize MAX_OUTPUT_BYTES in ToolRegistry::run by hakula139 · Pull Request #50 · hakula139/oxide-code

hakula139 · 2026-04-29T09:04:12Z

Summary

Per-tool truncation was hand-rolled and incomplete: bash and read enforced the 128 KB byte cap; glob / grep / edit / write did not. A pathological grep response (250 rows × 500 chars before per-line overhead is 125 KB; long paths can push glob past 128 KB too) could flood the model context past the intended ceiling, and a future MCP-tool plug-in would have to re-implement the cap or trust the model not to drown.

This PR lifts the byte safety net into the dispatcher. New cap_output runs head-tail truncation (lifted from bash) so the model still sees both setup context and final outcome; new ToolRegistry::run wraps Tool::run with the cap. Per-tool view-shape budgets (row caps, line-length caps) stay where they are; the two layers separate "tool-semantic budget" from "absolute byte ceiling".

The two leading commits are research surveys (cherry-picked from docs/tool-research) framing the design — tool-truncation.md covers this PR; file-tracking.md scopes a separate companion workstream (Read-before-Edit gates, staleness detection) so the implementation here stays a focused refactor.

Design decisions

Two layers, not one. View-shape (per-tool, semantic — "show 250 grep matches", "first 2000 read lines") and byte-budget (centralized, safety net) are separate concerns. Combining them forces every tool to know about bytes, which is what we're trying to escape.
Head-tail, not tail-only. Preserves both setup context (file headers, command echo) and final outcome (error message, last lines) — the two pieces models most often need to reason about.
Two distinct metadata fields, not one overloaded signal. truncated_total carries view-shape match counts (set by per-tool caps like glob's MAX_RESULTS); truncated_bytes carries pre-cap byte counts (set only by the registry). Splitting keeps glob's (X of N matches) from accidentally rendering bytes when both layers fire — plausible in deep monorepos with long paths.
No spillover to disk for v1. opencode-style "use Grep on /tmp/spill.txt" needs a Task agent to consume the spilled file; oxide-code has none. Revisit when Task lands.
MAX_OUTPUT_BYTES = 128 KB unchanged. Don't tune in the same PR that restructures responsibility.

Changes

File	Description
`tool.rs`	New `// ── Output Cap ──` section co-locates `MAX_OUTPUT_BYTES`, `TRUNCATION_OVERHEAD`, and `cap_output` next to the dispatcher with a two-layer preamble. `ToolRegistry::run` wraps `Tool::run`, applies the cap, and stamps `metadata.truncated_bytes` (separate from view-shape's `truncated_total`). Trait `Tool::run` doc tells implementations not to pre-truncate by bytes. Tests cover `cap_output` boundaries (short / head-tail / multibyte at split / barely-over) and the dispatcher path (overflow, within-cap, unknown-tool, byte-cap-on-read fallthrough to `Text` view). `parse_input` picks up `#[expect(clippy::result_large_err)]` since the new field tipped `ToolOutput` past clippy's threshold.
`agent.rs`	Dispatch swaps `match tools.get(...) { Some(t) => t.run(...).await, None => Unknown-tool error }` for `tools.run(&name, input).await`. Unknown-tool fallback now lives in the registry.
`tool/bash.rs`	Drop `truncate_output` and the boundary tests; the dispatcher's `cap_output` is the new home.
`tool/read.rs`	Drop the mid-loop byte-budget; row cap (`DEFAULT_LINE_LIMIT`) and per-line cap (`truncate_line`) keep view-shape bounded; `(Showing lines N–M of TOTAL total)` footer stays as a view-shape signal. `split_read_footer` doc explains the intentional asymmetry with glob's metadata-driven approach.
`model.rs`	Drive-by: collapse the `#[expect]` reason on `Capabilities` to a single line — matches the convention used by the other 10 `#[expect]` sites in the crate.
`docs/research/tool-truncation.md` (new)	Per-tool vs centralized output caps, head / tail vs spillover, why the dispatcher owns the byte safety net while per-tool view-shape budgets stay.
`docs/research/file-tracking.md` (new)	Companion survey for a deferred follow-up (Read-before-Edit gates, mtime / content staleness, persist-on-finish + verify-on-resume).
`docs/research/README.md`	Registers both new surveys.
`docs/research/anthropic-api.md`, `docs/research/extended-thinking.md`, `docs/research/session-persistence.md`, `docs/research/system-prompt.md`, `docs/research/tui.md`	Reference-project name normalization across older surveys (`claude-code` repo path → `Claude Code` product; `codex-rs` directory → `OpenAI Codex` product). Repo paths in code references stay as-is.
`docs/roadmap.md`	Drop the now-shipped "Centralized output truncation" Current Focus bullet.
`.cspell/words.txt`	Adds `tengu` and `toks` (introduced by `tool-truncation.md`).

Test plan

cargo fmt --all --check
cargo build compiles cleanly
cargo clippy --all-targets -- -D warnings — zero warnings
cargo test — 1196 passed
cargo llvm-cov --ignore-filename-regex 'main\.rs' — 98.55% workspace line / 98.49% on tool.rs
pnpm lint / pnpm spellcheck — 0 issues

Out of scope

File-change tracking (Read-before-Edit gates + mtime / content staleness detection). Survey doc lands here; implementation is a separate workstream — see docs/research/file-tracking.md for the design notes.
Spillover-to-disk for over-cap output. Needs a Task agent to consume the spilled file before the hint is useful.
Tuning MAX_OUTPUT_BYTES. Carries over from the bash-era cap; revisit based on observed model behavior.

…urveys Two parallel-track research notes captured during the Plan A / Plan B exploration. Each surveys how Claude Code, OpenAI Codex, and opencode solve a problem the oxide-code roadmap calls out, ending with the design decisions that shape the planned implementation. - tool-truncation.md — per-tool vs centralized output caps, head/tail vs spillover, and why the dispatcher should own the byte safety net while per-tool view-shape budgets stay where they are. - file-tracking.md — Read-before-Edit gates, mtime / content staleness checks, and the persist-on-finish + verify-on-resume approach (vs claude-code's message-history rehydration). Registers both in the research index. The new docs introduce two GrowthBook flag / token-shorthand terms (`tengu`, `toks`) that needed to land in the cspell wordlist for `pnpm spellcheck` to stay clean.

Existing docs mixed `claude-code` (the repo path) with `Claude Code` (the product), and used `codex-rs` (the workspace directory) where the prose meant `OpenAI Codex` (the product). The new surveys landed with the product-name convention, so this sweep brings the older docs in line so cross-doc reading is consistent. Repo paths in code references stay as-is — `claude-code/src/...`, `codex-rs/core/...`, and `opencode/packages/...` are filesystem paths, not subjects. Only prose subjects, hyperlink labels, and section headings change. - anthropic-api.md — 12 prose refs + opening link label. - extended-thinking.md — opening link label. - session-persistence.md — opening hyperlinks all three reference projects; subsection heading + 1 prose ref. - system-prompt.md — opening link label. - tui.md — opening hyperlinks; `### claude-code` and `### codex-rs` headings; 4 prose refs.

…ry::run Per-tool truncation was hand-rolled and incomplete: bash and read enforced the 128 KB byte cap, glob / grep / edit / write did not. A pathological grep response (250 rows × 500 chars before per-line overhead is 125 KB; long paths can push glob past 128 KB too) could flood the model context past the intended ceiling, and a future MCP tool plug-in would have to re-implement the cap or trust the model not to drown in output. Lift the byte safety net into the dispatcher. New `cap_output` runs the head-tail strategy (lifted from bash) so the model still sees both setup context and final outcome; new `ToolRegistry::run` wraps `Tool::run` with the cap and stamps `metadata.truncated_total` with the pre-cap byte count when the cap fires — renderers consume the structural field instead of parsing footer prose. Per-tool view-shape budgets (row caps, line-length caps) stay where they are; the two layers separate "tool-semantic budget" from "absolute byte ceiling". The agent-loop dispatcher swap is part of this commit so unknown-tool fallback also flows through the registry; bash and read still self-cap until the next commit removes their now-redundant logic.

…y cap Now that the dispatcher owns the byte safety net, bash and read no longer need their own implementations. - bash: drop `truncate_output` + `TRUNCATION_OVERHEAD` and the three boundary tests; the registry runs the same head-tail strategy with a structural footer (`... [N bytes truncated; head + tail kept] ...`) and stamps `metadata.truncated_total`. - read: drop the mid-loop byte-budget check and `truncated_by_bytes` flag. The row cap (`DEFAULT_LINE_LIMIT`) and per-line cap (`truncate_line`) keep the output bounded for view-shape; the registry handles the absolute byte ceiling. The `(Showing lines N–M of TOTAL total)` footer stays — view-shape signal, separate from the byte cap. - tool.rs: add an end-to-end ReadTool dispatcher test (replaces read.rs's deleted byte-budget test) so a regression that bypasses the wrapper trips on real tool output, not just StubTool. - docs/roadmap.md: drop the now-shipped Current Focus bullet. Behavior change: bash's footer wording and unit (`N lines truncated` → `N bytes truncated; head + tail kept`) — bytes is the lingua franca for the registry-level cap since it applies uniformly across line-oriented and non-line-oriented tools.

codecov · 2026-04-29T09:05:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

PR review caught that `truncated_total` had picked up two units: glob writes a match count there, and the registry's byte safety net was also writing the pre-cap byte count into the same field. With ~1.3 KB average path budget the byte cap is reachable in deep monorepos, and a collision would silently render bytes as glob's "X of N matches" count. Splitting the field encodes the units at the type level. Glob renderers keep reading `truncated_total`; the registry now stamps a new `truncated_bytes`. Other review responses folded in: - `// ── Output Cap ──` section co-locates the constants and `cap_output` next to the dispatcher with a preamble naming the two cap layers (per-tool view-shape vs central byte budget). Trait `Tool::run` picks up a one-liner pointing at the wrapper. - `cap_output` drops to private; single caller. - Dispatcher tests now drive `ReadTool` over a temp file (drops `StubTool`), which closes the codecov gap on the fixture's unused trait stubs. New fallthrough test pins `result_view` dropping to `Text` when the byte cap replaces read's footer. - `split_read_footer` picks up a doc comment explaining the intentional asymmetry with glob's metadata-driven approach. - Subtraction-safety invariant comment in `cap_output`; redundant assertion in `cap_output_keeps_head_and_tail` dropped; new field tipped `ToolOutput` past clippy's 128-byte threshold so `parse_input` picks up `#[expect(clippy::result_large_err)]`.

Crate-wide convention is single-line `reason = "..."` strings — 10 of 11 sites already do this even at 150+ chars. `model.rs` was the lone holdout using `\` continuations. Aligning for consistency; behavior unchanged.

CLAUDE.md mandates `...` everywhere; the new fallthrough test slipped in a U+2026.

Decision #4 in the design section claimed `truncated_total` would become the single structural signal; the PR ended up splitting into `truncated_total` (view-shape) + `truncated_bytes` (byte cap) after review caught a unit-conflation hazard. Notes now describe the split and the rationale. Source-line list also updated: the bash and read self-cap references are gone with the code; remaining entries point at the constants and helpers without brittle line ranges. Test-name references in decision #2 follow the rename from `truncate_output_*` to `cap_output_*`.

hakula139 added 4 commits April 29, 2026 16:35

hakula139 added the enhancement New feature or request label Apr 29, 2026

hakula139 self-assigned this Apr 29, 2026

hakula139 added 4 commits April 29, 2026 17:29

style(model): collapse #[expect] reason to single line

356145c

Crate-wide convention is single-line `reason = "..."` strings — 10 of 11 sites already do this even at 150+ chars. `model.rs` was the lone holdout using `\` continuations. Aligning for consistency; behavior unchanged.

style(tool): use ASCII ... ellipsis in test comment

78b723e

CLAUDE.md mandates `...` everywhere; the new fallthrough test slipped in a U+2026.

hakula139 merged commit 081d400 into main Apr 29, 2026
4 checks passed

hakula139 deleted the refactor/tool-truncation branch April 29, 2026 09:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(tool): centralize MAX_OUTPUT_BYTES in ToolRegistry::run#50

refactor(tool): centralize MAX_OUTPUT_BYTES in ToolRegistry::run#50
hakula139 merged 8 commits intomainfrom
refactor/tool-truncation

hakula139 commented Apr 29, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Apr 29, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hakula139 commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Design decisions

Changes

Test plan

Out of scope

Uh oh!

codecov Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hakula139 commented Apr 29, 2026 •

edited

Loading

codecov Bot commented Apr 29, 2026 •

edited

Loading