Skip to content

refactor(tool): centralize MAX_OUTPUT_BYTES in ToolRegistry::run#50

Merged
hakula139 merged 8 commits intomainfrom
refactor/tool-truncation
Apr 29, 2026
Merged

refactor(tool): centralize MAX_OUTPUT_BYTES in ToolRegistry::run#50
hakula139 merged 8 commits intomainfrom
refactor/tool-truncation

Conversation

@hakula139
Copy link
Copy Markdown
Owner

@hakula139 hakula139 commented Apr 29, 2026

Summary

Per-tool truncation was hand-rolled and incomplete: bash and read enforced the 128 KB byte cap; glob / grep / edit / write did not. A pathological grep response (250 rows × 500 chars before per-line overhead is 125 KB; long paths can push glob past 128 KB too) could flood the model context past the intended ceiling, and a future MCP-tool plug-in would have to re-implement the cap or trust the model not to drown.

This PR lifts the byte safety net into the dispatcher. New cap_output runs head-tail truncation (lifted from bash) so the model still sees both setup context and final outcome; new ToolRegistry::run wraps Tool::run with the cap. Per-tool view-shape budgets (row caps, line-length caps) stay where they are; the two layers separate "tool-semantic budget" from "absolute byte ceiling".

The two leading commits are research surveys (cherry-picked from docs/tool-research) framing the design — tool-truncation.md covers this PR; file-tracking.md scopes a separate companion workstream (Read-before-Edit gates, staleness detection) so the implementation here stays a focused refactor.

Design decisions

  • Two layers, not one. View-shape (per-tool, semantic — "show 250 grep matches", "first 2000 read lines") and byte-budget (centralized, safety net) are separate concerns. Combining them forces every tool to know about bytes, which is what we're trying to escape.
  • Head-tail, not tail-only. Preserves both setup context (file headers, command echo) and final outcome (error message, last lines) — the two pieces models most often need to reason about.
  • Two distinct metadata fields, not one overloaded signal. truncated_total carries view-shape match counts (set by per-tool caps like glob's MAX_RESULTS); truncated_bytes carries pre-cap byte counts (set only by the registry). Splitting keeps glob's (X of N matches) from accidentally rendering bytes when both layers fire — plausible in deep monorepos with long paths.
  • No spillover to disk for v1. opencode-style "use Grep on /tmp/spill.txt" needs a Task agent to consume the spilled file; oxide-code has none. Revisit when Task lands.
  • MAX_OUTPUT_BYTES = 128 KB unchanged. Don't tune in the same PR that restructures responsibility.

Changes

File Description
tool.rs New // ── Output Cap ── section co-locates MAX_OUTPUT_BYTES, TRUNCATION_OVERHEAD, and cap_output next to the dispatcher with a two-layer preamble. ToolRegistry::run wraps Tool::run, applies the cap, and stamps metadata.truncated_bytes (separate from view-shape's truncated_total). Trait Tool::run doc tells implementations not to pre-truncate by bytes. Tests cover cap_output boundaries (short / head-tail / multibyte at split / barely-over) and the dispatcher path (overflow, within-cap, unknown-tool, byte-cap-on-read fallthrough to Text view). parse_input picks up #[expect(clippy::result_large_err)] since the new field tipped ToolOutput past clippy's threshold.
agent.rs Dispatch swaps match tools.get(...) { Some(t) => t.run(...).await, None => Unknown-tool error } for tools.run(&name, input).await. Unknown-tool fallback now lives in the registry.
tool/bash.rs Drop truncate_output and the boundary tests; the dispatcher's cap_output is the new home.
tool/read.rs Drop the mid-loop byte-budget; row cap (DEFAULT_LINE_LIMIT) and per-line cap (truncate_line) keep view-shape bounded; (Showing lines N–M of TOTAL total) footer stays as a view-shape signal. split_read_footer doc explains the intentional asymmetry with glob's metadata-driven approach.
model.rs Drive-by: collapse the #[expect] reason on Capabilities to a single line — matches the convention used by the other 10 #[expect] sites in the crate.
docs/research/tool-truncation.md (new) Per-tool vs centralized output caps, head / tail vs spillover, why the dispatcher owns the byte safety net while per-tool view-shape budgets stay.
docs/research/file-tracking.md (new) Companion survey for a deferred follow-up (Read-before-Edit gates, mtime / content staleness, persist-on-finish + verify-on-resume).
docs/research/README.md Registers both new surveys.
docs/research/anthropic-api.md, docs/research/extended-thinking.md, docs/research/session-persistence.md, docs/research/system-prompt.md, docs/research/tui.md Reference-project name normalization across older surveys (claude-code repo path → Claude Code product; codex-rs directory → OpenAI Codex product). Repo paths in code references stay as-is.
docs/roadmap.md Drop the now-shipped "Centralized output truncation" Current Focus bullet.
.cspell/words.txt Adds tengu and toks (introduced by tool-truncation.md).

Test plan

  • cargo fmt --all --check
  • cargo build compiles cleanly
  • cargo clippy --all-targets -- -D warnings — zero warnings
  • cargo test — 1196 passed
  • cargo llvm-cov --ignore-filename-regex 'main\.rs' — 98.55% workspace line / 98.49% on tool.rs
  • pnpm lint / pnpm spellcheck — 0 issues

Out of scope

  • File-change tracking (Read-before-Edit gates + mtime / content staleness detection). Survey doc lands here; implementation is a separate workstream — see docs/research/file-tracking.md for the design notes.
  • Spillover-to-disk for over-cap output. Needs a Task agent to consume the spilled file before the hint is useful.
  • Tuning MAX_OUTPUT_BYTES. Carries over from the bash-era cap; revisit based on observed model behavior.

…urveys

Two parallel-track research notes captured during the Plan A / Plan B
exploration. Each surveys how Claude Code, OpenAI Codex, and opencode
solve a problem the oxide-code roadmap calls out, ending with the
design decisions that shape the planned implementation.

- tool-truncation.md — per-tool vs centralized output caps, head/tail
  vs spillover, and why the dispatcher should own the byte safety net
  while per-tool view-shape budgets stay where they are.
- file-tracking.md — Read-before-Edit gates, mtime / content
  staleness checks, and the persist-on-finish + verify-on-resume
  approach (vs claude-code's message-history rehydration).

Registers both in the research index. The new docs introduce two
GrowthBook flag / token-shorthand terms (`tengu`, `toks`) that needed
to land in the cspell wordlist for `pnpm spellcheck` to stay clean.
Existing docs mixed `claude-code` (the repo path) with `Claude Code`
(the product), and used `codex-rs` (the workspace directory) where
the prose meant `OpenAI Codex` (the product). The new surveys
landed with the product-name convention, so this sweep brings the
older docs in line so cross-doc reading is consistent.

Repo paths in code references stay as-is — `claude-code/src/...`,
`codex-rs/core/...`, and `opencode/packages/...` are filesystem
paths, not subjects. Only prose subjects, hyperlink labels, and
section headings change.

- anthropic-api.md — 12 prose refs + opening link label.
- extended-thinking.md — opening link label.
- session-persistence.md — opening hyperlinks all three reference
  projects; subsection heading + 1 prose ref.
- system-prompt.md — opening link label.
- tui.md — opening hyperlinks; `### claude-code` and `### codex-rs`
  headings; 4 prose refs.
…ry::run

Per-tool truncation was hand-rolled and incomplete: bash and read
enforced the 128 KB byte cap, glob / grep / edit / write did not. A
pathological grep response (250 rows × 500 chars before per-line
overhead is 125 KB; long paths can push glob past 128 KB too) could
flood the model context past the intended ceiling, and a future MCP
tool plug-in would have to re-implement the cap or trust the model
not to drown in output.

Lift the byte safety net into the dispatcher. New `cap_output` runs
the head-tail strategy (lifted from bash) so the model still sees
both setup context and final outcome; new `ToolRegistry::run` wraps
`Tool::run` with the cap and stamps `metadata.truncated_total` with
the pre-cap byte count when the cap fires — renderers consume the
structural field instead of parsing footer prose. Per-tool view-shape
budgets (row caps, line-length caps) stay where they are; the two
layers separate "tool-semantic budget" from "absolute byte ceiling".

The agent-loop dispatcher swap is part of this commit so unknown-tool
fallback also flows through the registry; bash and read still
self-cap until the next commit removes their now-redundant logic.
…y cap

Now that the dispatcher owns the byte safety net, bash and read no
longer need their own implementations.

- bash: drop `truncate_output` + `TRUNCATION_OVERHEAD` and the
  three boundary tests; the registry runs the same head-tail
  strategy with a structural footer (`... [N bytes truncated; head +
  tail kept] ...`) and stamps `metadata.truncated_total`.
- read: drop the mid-loop byte-budget check and `truncated_by_bytes`
  flag. The row cap (`DEFAULT_LINE_LIMIT`) and per-line cap
  (`truncate_line`) keep the output bounded for view-shape; the
  registry handles the absolute byte ceiling. The
  `(Showing lines N–M of TOTAL total)` footer stays — view-shape
  signal, separate from the byte cap.
- tool.rs: add an end-to-end ReadTool dispatcher test (replaces
  read.rs's deleted byte-budget test) so a regression that bypasses
  the wrapper trips on real tool output, not just StubTool.
- docs/roadmap.md: drop the now-shipped Current Focus bullet.

Behavior change: bash's footer wording and unit (`N lines truncated`
→ `N bytes truncated; head + tail kept`) — bytes is the lingua
franca for the registry-level cap since it applies uniformly across
line-oriented and non-line-oriented tools.
@hakula139 hakula139 added the enhancement New feature or request label Apr 29, 2026
@hakula139 hakula139 self-assigned this Apr 29, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

PR review caught that `truncated_total` had picked up two units: glob
writes a match count there, and the registry's byte safety net was
also writing the pre-cap byte count into the same field. With ~1.3 KB
average path budget the byte cap is reachable in deep monorepos, and
a collision would silently render bytes as glob's "X of N matches"
count.

Splitting the field encodes the units at the type level. Glob
renderers keep reading `truncated_total`; the registry now stamps a
new `truncated_bytes`.

Other review responses folded in:

- `// ── Output Cap ──` section co-locates the constants and
  `cap_output` next to the dispatcher with a preamble naming the two
  cap layers (per-tool view-shape vs central byte budget). Trait
  `Tool::run` picks up a one-liner pointing at the wrapper.
- `cap_output` drops to private; single caller.
- Dispatcher tests now drive `ReadTool` over a temp file (drops
  `StubTool`), which closes the codecov gap on the fixture's unused
  trait stubs. New fallthrough test pins `result_view` dropping to
  `Text` when the byte cap replaces read's footer.
- `split_read_footer` picks up a doc comment explaining the
  intentional asymmetry with glob's metadata-driven approach.
- Subtraction-safety invariant comment in `cap_output`; redundant
  assertion in `cap_output_keeps_head_and_tail` dropped; new field
  tipped `ToolOutput` past clippy's 128-byte threshold so
  `parse_input` picks up `#[expect(clippy::result_large_err)]`.
Crate-wide convention is single-line `reason = "..."` strings — 10 of
11 sites already do this even at 150+ chars. `model.rs` was the lone
holdout using `\` continuations. Aligning for consistency; behavior
unchanged.
CLAUDE.md mandates `...` everywhere; the new fallthrough test slipped
in a U+2026.
Decision #4 in the design section claimed `truncated_total` would
become the single structural signal; the PR ended up splitting into
`truncated_total` (view-shape) + `truncated_bytes` (byte cap) after
review caught a unit-conflation hazard. Notes now describe the split
and the rationale.

Source-line list also updated: the bash and read self-cap references
are gone with the code; remaining entries point at the constants and
helpers without brittle line ranges. Test-name references in
decision #2 follow the rename from `truncate_output_*` to
`cap_output_*`.
@hakula139 hakula139 merged commit 081d400 into main Apr 29, 2026
4 checks passed
@hakula139 hakula139 deleted the refactor/tool-truncation branch April 29, 2026 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant