fuzz: batching, fragment-reuse, k-path coverage, structural shrinking, input-adequacy (+2 bug fixes)#12

Merged: emeryberger merged 3 commits into main from fuzz-scale on Jul 5, 2026

@emeryberger (Owner)

Implements PLAN.md fuzzing-at-scale tasks 1–5, plus fixes for two transpiler soundness bugs (F12/F13) that task 2 surfaced.

Tasks

  1. Batching (--batch B): pack B seeds per Lean spawn, each in its own #eval block so that one ill-typed seed can't abort the batch (a single shared #eval aborts entirely via the sorry axiom). Failing or suspect seeds are re-run alone to isolate them. About 5x fewer spawns, with identical verdicts and coverage (determinism preserved).
  2. Fragment-reuse (fuzz/corpus_frags.py, --corpus): harvest 78 real Corpus/*.lean defs and fuzz them on random inputs. This exercises constructs the generative grammar can't invent, and it found two real bugs (below).
  3. k-path coverage (gen.Gen ctx/all_kpaths/kpaths, KMAX=3): track and report chains of nested productions. Single-production coverage saturates at ~100% while k-path coverage sits at ~89% at scale, surfacing untested combinations.
  4. Structural shrinker (fuzz.struct_shrink + gen.single_def_body): HDD/C-Reduce-style term reduction, run after the count-based minimizer isolates one def; it keeps any rewrite that still elaborates and still reproduces. Example: 373 chars → 13.
  5. Input-adequacy (fuzz/pycov.py, --pycov): sys.settrace line coverage of the transpiled Python during oracle execution (portable to 3.11; slipcover needs 3.12 + imported files). Flags functions whose branches no input reaches.
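As an illustration of task 3, here is a minimal sketch of how chains of nested productions can be collected from a derivation tree. The tuple-based tree shape and the name `collect_kpaths` are assumptions made for this example only; they are not the repository's `gen.Gen` implementation.

```python
# Hypothetical derivation tree: (production_name, [children]).
def collect_kpaths(tree, k=3, ctx=()):
    """Yield every chain of 1..k nested productions ending at each node."""
    name, children = tree
    # Keep only the innermost k productions as context.
    ctx = (ctx + (name,))[-k:]
    # Each suffix of the context is one k-path ending at this node.
    for n in range(1, len(ctx) + 1):
        yield ctx[-n:]
    for child in children:
        yield from collect_kpaths(child, k, ctx)


tree = ("add", [("lit", []), ("mul", [("lit", []), ("var", [])])])
covered = set(collect_kpaths(tree, k=3))
print(len(covered))  # distinct k-paths exercised by this one tree
```

With `k=1` this degenerates to plain single-production coverage, which is why the two metrics can diverge: a production can be covered on its own while some nesting of it under another production never occurs.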

Bugs found (by task 2) and fixed

  • F12: Nat/Int div/mod by zero. Lean division is total (n/0=0, n%0=n, including Euclidean Int division); the transpiler emitted bare Python // and %, which raise ZeroDivisionError on a zero divisor. Fixed centrally via guardZeroDiv at all emission sites. The generative grammar structurally couldn't hit this because it always guards divisors.
  • F13: Array.qsort mis-indexing. Recent Lean materializes qsort's low/high optParams, so the LCNF args are […,as,lt,low,high]; the handler grabbed the last two, producing sorted(0, key=high). Fixed to index as/lt from size-4/size-3, with a 2-arg fallback.
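For readers unfamiliar with F12, the following sketch shows the zero-divisor semantics the guard has to reproduce on the Python side. The helper names `lean_div` and `lean_mod` are hypothetical; the real fix emits an inline guard through `guardZeroDiv` rather than calling helpers like these.

```python
def lean_div(a: int, b: int) -> int:
    """Total division in the Lean style: x / 0 = 0."""
    return a // b if b != 0 else 0


def lean_mod(a: int, b: int) -> int:
    """Total modulo in the Lean style: x % 0 = x."""
    return a % b if b != 0 else a


# Unguarded Python raises ZeroDivisionError for both 7 // 0 and 7 % 0.
print(lean_div(7, 0), lean_mod(7, 0))
print(lean_div(7, 2), lean_mod(7, 2))
# Note: this sketch only covers the zero-divisor case. For negative
# divisors, Python's floored // and % differ from Euclidean division.
```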

Validation (cloudnew, 192 cores)

  • Grammar: 5000 seeds clean, 61/61 productions covered, 89% k-path coverage
  • Corpus: 3000 seeds clean, 78/78 functions; all 216 pre-fix failures were exactly F12/F13
  • pycov: 1000 seeds, 98% transpiled-line adequacy

Docs updated: README, RELATED_WORK, VERIFICATION (F12/F13, 18 total bugs), PLAN.

Note: the PLAN.md claim that one ill-typed def in a batch drops only its own rows was wrong. Measurement showed that a single shared #eval aborts entirely via the sorry axiom, hence the per-seed #eval block design.
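The per-seed layout can be sketched as a small renderer that gives each seed its own `#eval` command. The function name `render_batch`, the `SeedN` namespaces (used here only to keep same-named defs from colliding), and the sample defs are all assumptions for illustration, not the harness's actual output format.

```python
def render_batch(seeds: dict[int, tuple[str, str]]) -> str:
    """Render one Lean file with an independent #eval block per seed."""
    blocks = []
    for seed, (defs, expr) in seeds.items():
        blocks.append(
            f"namespace Seed{seed}\n"
            f"{defs}\n"
            f"#eval {expr}\n"       # one #eval per seed, never shared
            f"end Seed{seed}\n"
        )
    return "\n".join(blocks)


batch = render_batch({
    1: ("def f (n : Nat) : Nat := n + 1", "f 41"),
    2: ("def f (n : Nat) : Nat := n / 0", "f 7"),
})
print(batch)
```

Because every seed has its own command, a seed that fails to elaborate should affect only its own block, and the harness can then re-run that seed alone.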

README (7) and RELATED_WORK (13) reference lists now carry a resolvable
DOI where one exists, or a canonical URL otherwise (USENIX page for
LangFuzz, lean-egg repo for Rossel, thesis PDF for Letouzey 2004,
certicoq.org for CertiCoq). Purdom 1972 uses the Springer BF DOI.
…, input-adequacy (+2 bug fixes)

Implements PLAN.md fuzzing-at-scale tasks 1-5:

1. Batching (--batch B): pack B seeds per Lean spawn, each in its own #eval
   block so one ill-typed seed can't abort the batch (a shared #eval aborts
   entirely via the sorry axiom). Failing/suspect seeds re-run alone to isolate.
   ~5x fewer spawns; identical verdict + coverage (determinism preserved).

2. Fragment-reuse (fuzz/corpus_frags.py, --corpus): harvest 78 real Corpus/*.lean
   defs and fuzz them on random inputs. Exercises constructs the generative
   grammar can't invent — immediately found two real soundness bugs:
     F12: Nat/Int div/mod by zero. Lean division is TOTAL (n/0=0, n%0=n, incl.
          Euclidean Int); transpiler emitted bare Python //,% which crash. Fixed
          centrally via guardZeroDiv at all emission sites.
     F13: Array.qsort mis-indexing. Recent Lean materializes qsort's low/high
          optParams, so LCNF args are [...,as,lt,low,high]; handler grabbed the
          last two -> sorted(0, key=high). Fixed to index as/lt from size-4/-3.

3. k-path coverage (gen.Gen ctx/all_kpaths/kpaths, KMAX=3): track and report
   chains of nested productions. Single-production coverage saturates ~100%
   while k-path sits ~89% at scale — surfaces untested combinations.

4. Structural shrinker (fuzz.struct_shrink + gen.single_def_body): HDD/C-Reduce
   style term reduction after the count-based minimizer isolates one def; keeps
   any rewrite that still elaborates and still reproduces. 373 chars -> 13.

5. Input-adequacy (fuzz/pycov.py, --pycov): sys.settrace line coverage of the
   TRANSPILED Python during oracle execution (portable to 3.11; slipcover needs
   3.12 + imported files). Flags functions whose branches no input reaches.

Validated on cloudnew (192 cores): grammar 5000 seeds clean (61/61 prod, 89%
k-path), corpus 3000 seeds clean (78/78 fns, both fixes hold), pycov 1000 seeds
98% adequacy. Docs updated: README, RELATED_WORK, VERIFICATION (F12/F13,
18 total bugs), PLAN.
The div-by-zero guard (F12) wrapped every // and % in an 'if b != 0' ternary,
including cases with a literal nonzero divisor like 'x % 2'. That's pointless
(a nonzero literal can't be zero) and it broke the comprehension regression
test expecting a clean '(x % 2) == 0'. isNonzeroLiteral now detects a
syntactically nonzero int literal divisor and emits the bare operator; variable
divisors still get the guard.
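A minimal sketch of that emission rule, working on operand strings: `is_nonzero_literal` and `emit_mod` are hypothetical Python stand-ins written for this example, not the transpiler's actual `isNonzeroLiteral` or emitter code.

```python
import re

_INT_LITERAL = re.compile(r"[0-9]+")


def is_nonzero_literal(divisor: str) -> bool:
    """True only for a syntactic integer literal other than zero."""
    s = divisor.strip()
    return bool(_INT_LITERAL.fullmatch(s)) and int(s) != 0


def emit_mod(lhs: str, rhs: str) -> str:
    """Emit Python source for a total modulo (x % 0 = x)."""
    if is_nonzero_literal(rhs):
        return f"({lhs} % {rhs})"                     # guard is redundant
    return f"({lhs} % {rhs} if {rhs} != 0 else {lhs})"  # guarded form


print(emit_mod("x", "2"))
print(emit_mod("x", "n"))
```

The check is purely syntactic, so anything that is not an integer literal, including a variable that happens to be nonzero at run time, still gets the guard.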
@emeryberger merged commit 8ea772c into main on Jul 5, 2026
1 check passed