Conversation
LLM-generated filters that end with `| @tsv` (or @csv) hit a confusing "@tsv/0 is not defined" compile error because jaq-std doesn't ship those formatters. The error path then dumped jaq's internal Debug output (File { code: ..., path: () }, [("@TSV", Filter(0))]) which leaked the entire ~800 chars of prepended compat-defs source into stderr. - Add jq-compatible @TSV and @csv definitions to JQ_COMPAT_DEFS so the filters compile and run. - Format compile errors as `name/arity is not defined` and parse/lex errors as short `expected X` messages, matching real jq's tone. - Cap diagnostic output at 240 chars and strip offending-token text so the prepended stdlib can't leak through any error path.
…Debug leaks
Two complementary defenses against the class of bug we just fixed in jq
(leaking jaq's `File { code: "...", path: () }, [("@TSV", Filter(0))]`
into stderr):
1. Static tripwire — `scripts/check-no-debug-fmt.sh` greps
`crates/bashkit/src/builtins/` for `{:?}` / `{:#?}` / `{name:?}`.
Fails CI on match. Per-line opt-out via `// debug-ok: <reason>`.
Wired into `just pre-pr`.
2. Dynamic test — `tests/no_debug_leak_tests.rs` fires curated
malformed inputs at high-risk builtins (jq, awk, grep, sed, json,
yaml, tomlq, csv, bc, expr, semver, date, numfmt) plus a sweep of
every common builtin with a bogus flag. For each, asserts stderr:
- is ≤ 1 KB
- contains no banned Debug-shape substrings (`File {`, `Token(`,
`Vec [`, `Span {`, `Undefined::`, `Errors {`, …)
- contains no jq-specific internals (`__bashkit_env__`, prepended
compat-defs source, raw `Filter(N)` variant tags)
Annotated existing legitimate test-assert `{:?}` uses in awk.rs,
grep.rs, jq.rs, ls.rs with `// debug-ok: assert-failure message` so
they pass the static check.
Documented the policy in AGENTS.md.
…t as TM-INF-022
Restructure of the previous separate-suite + shell-script approach:
- The shell script `scripts/check-no-debug-fmt.sh` is replaced by a Rust
test `builtins::tests::no_debug_fmt_in_builtin_source` that walks
`src/builtins/*.rs` and forbids `{:?}` directives. Runs as part of
`cargo test` — no separate justfile recipe, no shell dependency.
- The standalone `tests/no_debug_leak_tests.rs` suite is split into
per-tool tests collocated with each tool's existing `mod tests`:
jq.rs, awk.rs, grep.rs, sed.rs, json.rs, yaml.rs, tomlq.rs, csv.rs,
bc.rs, expr.rs, semver.rs, date.rs, numfmt.rs.
- Shared helpers (`UNIVERSAL_BANNED`, `assert_no_leak`, `run`) live in
`builtins::debug_leak_check` so the policy is centralized but per-
tool tests stay with their tool.
- Cross-tool bogus-flag sweep moves to `builtins::tests`.
- New threat-model entry **TM-INF-022** documents the leak class, the
originating jq bug, and the two-layer enforcement. AGENTS.md links to
it instead of restating the rules.
- Reverts the `check-no-debug-fmt` justfile recipe — `cargo test`
already runs the static check.
Net: 1 static test + 32 per-tool dynamic tests + 1 cross-tool sweep,
all under `cargo test`. 2640 lib tests pass (was 2604).
Extends the no-Debug-leak guard from per-tool unit tests into the
fuzz/proptest layer so leak regressions get caught on random inputs,
not just on hand-curated cases.
What's new:
1. **`bashkit::testing`** (crates/bashkit/src/testing.rs) — public,
`#[doc(hidden)]` module exposing the helpers previously only
reachable from inline `#[cfg(test)]` modules. Lets external tests
and cargo-fuzz targets share the same banned-substring list and
canary plumbing.
2. **Three invariants** enforced by `assert_fuzz_invariants`:
- **A. No Debug shapes** in stderr (TM-INF-022, already in `no_leak`).
- **B. Per-call stderr ≤ 1 KB** (`MAX_STDERR_BYTES`) — catches
"one bad input floods stderr" regressions.
- **C. No host paths** in stderr — extends `UNIVERSAL_BANNED` with
`/rustc/`, `/.cargo/registry/`, `target/{debug,release}/deps/`,
`/.rustup/toolchains/` (TM-INF-016 fuzz coverage).
- **E. No host-env canary** in stdout/stderr — `fuzz_init()` seeds
a magic `BASHKIT_FUZZ_HOST_CANARY_*` value into the host OS env
once per process; if it ever appears in builtin output, a
builtin is reading `std::env` instead of sandboxed `ctx.env`
(TM-INF-013 regression guard).
3. **14 cargo-fuzz targets updated** to drop `2>/dev/null` (which
discarded exactly what we want to inspect) and route through
`bashkit::testing::fuzz_exec` instead of bare `bash.exec`.
Skipped: `lexer_fuzz`, `parser_fuzz` (don't go through builtins).
4. **Scaffold tests** (`tests/{jq,awk}_fuzz_scaffold_tests.rs`)
collapse to use `fuzz_exec` directly — same checks as the fuzz
targets, runnable under `cargo test`.
5. **5 new proptest cases** in `tests/proptest_security.rs` push 64
random strings each through jq/awk/grep/sed/json (320 random
invocations per `cargo test` run), all asserting the same
invariants.
6. **`builtins::debug_leak_check`** is now a thin re-export of
`crate::testing` so existing per-tool inline tests keep working.
Threat model and AGENTS.md updated to describe the three-layer
enforcement (static / dynamic per-tool / fuzz). 2640 lib tests pass;
18 directly-affected integration tests pass.
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs |
bashkit | 40a41c0 | Commit Preview URL Branch Preview URL |
May 02 2026, 02:27 AM |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes a real-world jq filter that LLM agents were hitting (
... | @tsvreturning a multi-page jaq Debug dump), then generalizes the underlying leak class —format!("{:?}", e)on a wrapped library error — into a cross-tool guard with three enforcement layers (static + dynamic + fuzz).What
The originating bug
jq -r 'if (.data | length) == 0 then "..." else (.data[] | [...] | @tsv) end'failed with stderr containing the entire jaq
File { code: "<800 chars of prepended compat-defs source>", path: () }, [("@tsv", Filter(0))]. Two issues:@tsv/@csvaren't defined in jaq-std — bashkit needs to provide them.format!("{:?}", e)which dumps internal jaq struct shapes.The fix
builtins/jq.rs: added@tsvand@csvtoJQ_COMPAT_DEFS(using jq'sdef @name:syntax matching jaq's existing@json,@text).format_jq_compile_errors/format_jq_load_errors: replace the{:?}formatting with domain-specific formatters that emit short jq-style messages —@tsv/0 is not defined,expected then, etc. — capped at 240 chars.The generalization (TM-INF-022)
Realised this leak class affects every builtin wrapping a Debug-ful library. Three-layer guard, all run by
cargo test:builtins::tests::no_debug_fmt_in_builtin_sourcewalkssrc/builtins/*.rsand forbids{:?}/{:#?}/{name:?}directives. Per-line opt-out via// debug-ok: <reason>.mod tests(jq, awk, grep, sed, json, yaml, tomlq, csv, bc, expr, semver, date, numfmt) callsbashkit::testing::assert_no_leakagainst curated malformed inputs. Plus a cross-tool sweep that throws--xyzzy-not-a-real-flagat every common builtin.bashkit::testing::assert_fuzz_invariantson random inputs. This also enforces:fuzz_init()seeds a magicBASHKIT_FUZZ_HOST_CANARY_*value into the host OS env. If it ever appears in builtin stdout/stderr, a builtin is readingstd::envinstead of sandboxedctx.env.UNIVERSAL_BANNEDincludes/rustc/,/.cargo/registry/,target/{debug,release}/deps/,/.rustup/toolchains/.The shared helpers live in
bashkit::testing(#[doc(hidden)] pub mod) so internal tests, integration scaffold tests, and the cargo-fuzz crate all use the same banned-substring list and canary plumbing.Why
Stderr is the agent-facing error API. Real shell tools print short, opaque error messages — Debug output dumps internal struct shapes that confuse LLM agents and waste tokens. Documented as TM-INF-022 in
specs/threat-model.md.How
2>/dev/nullstripped (it was hiding exactly what we want to inspect) and route throughbashkit::testing::fuzz_execfor the assertion.builtins::debug_leak_checkis now a thin re-export ofcrate::testingso the per-tool inline tests keep their existing imports.{:?}test asserts inawk.rs,grep.rs,jq.rs,ls.rsannotated with// debug-ok: assert-failure message.Test plan
cargo test --all-features --lib→ 2640 passed (was 2604 — +36 new tests)cargo test --all-features --tests→ all integration suites pass exceptssh_supabase_tests::ssh_supabase_connectswhich is pre-existing/network-dependent and fails onorigin/maintoocargo test --test jq_fuzz_scaffold_tests --test awk_fuzz_scaffold_tests --test proptest_security→ 18 passed (5 proptest cases × 64 random inputs each = 320 random invocations exercised)cargo checkincrates/bashkit/fuzz/→ all 16 fuzz targets compilecargo fmt --checkcleancargo clippy --all-targets -- -D warningsclean across the workspacebashkit-cli -c '...'produces correct tab-separated outputecho {} | jq totally_undefinedproducesjq: error: totally_undefined/0 is not defined(real-jq-style, no leak)Generated by Claude Code