fuzz: batching, fragment-reuse, k-path coverage, structural shrinking, input-adequacy (+2 bug fixes) by emeryberger · Pull Request #12 · emeryberger/LeanToPython

emeryberger · 2026-07-05T00:47:05Z

Implements PLAN.md fuzzing-at-scale tasks 1–5, plus fixes for two transpiler soundness bugs (F12/F13) that task 2 surfaced.

Tasks

Batching (--batch B) — pack B seeds per Lean spawn, each in its own #eval block so one ill-typed seed can't abort the batch (a shared #eval aborts entirely via the sorry axiom). Failing/suspect seeds re-run alone to isolate. ~5x fewer spawns; identical verdict + coverage (determinism preserved).
Fragment-reuse (fuzz/corpus_frags.py, --corpus) — harvest 78 real Corpus/*.lean defs and fuzz them on random inputs. Exercises constructs the generative grammar can't invent — found two real bugs (below).
k-path coverage (gen.Gen ctx/all_kpaths/kpaths, KMAX=3) — track and report chains of nested productions. Single-production coverage saturates ~100% while k-path sits ~89% at scale, surfacing untested combinations.
Structural shrinker (fuzz.struct_shrink + gen.single_def_body) — HDD/C-Reduce-style term reduction after the count-based minimizer isolates one def; keeps any rewrite that still elaborates and still reproduces. 373 chars → 13.
Input-adequacy (fuzz/pycov.py, --pycov) — sys.settrace line coverage of the transpiled Python during oracle execution (portable to 3.11; slipcover needs 3.12 + imported files). Flags functions whose branches no input reaches.
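The k-path idea from the list above can be illustrated with a short sketch. It is not the gen.Gen implementation: it assumes a finished derivation tree encoded as (production, children) tuples, whereas the real generator presumably records chains while generating, and the production names are made up.

```python
KMAX = 3  # same bound as the KMAX=3 quoted above

def kpaths(tree, kmax=KMAX):
    """Collect every chain of 1..kmax nested productions found in `tree`.

    `tree` is a (production_name, [children]) tuple (an assumed encoding).
    """
    found = set()

    def walk(node, ancestors):
        name, children = node
        # Keep only the last kmax productions on the path to this node.
        chain = (ancestors + (name,))[-kmax:]
        # Every suffix of the chain is a k-path ending at this node.
        for k in range(1, len(chain) + 1):
            found.add(chain[-k:])
        for child in children:
            walk(child, chain)

    walk(tree, ())
    return found

def coverage(seen, universe):
    """Fraction of the k-path universe that the seen set covers."""
    return len(seen & universe) / len(universe)

# Hypothetical derivation: if (lit + var) then lit else var
tree = ("if", [("add", [("lit", []), ("var", [])]), ("lit", []), ("var", [])])
seen = kpaths(tree)
single = {p for p in seen if len(p) == 1}   # plain production coverage
nested = {p for p in seen if len(p) == 3}   # chains such as if > add > lit
```

Here all four productions are covered after a single tree, yet only two 3-chains have been seen, which is the kind of gap between production coverage and k-path coverage the task description refers to.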

Bugs found (by task 2) and fixed

F12: Nat/Int div/mod by zero. Lean division is total (n/0=0, n%0=n, incl. Euclidean Int); the transpiler emitted bare Python // and %, which raise ZeroDivisionError on a zero divisor. Fixed centrally via guardZeroDiv at all emission sites. The generative grammar structurally couldn't hit this (it always guards divisors).
F13: Array.qsort mis-indexing. Recent Lean materializes qsort's low/high optParams, so LCNF args are […,as,lt,low,high]; the handler grabbed the last two and emitted sorted(0, key=high). Fixed to index as/lt from size-4/size-3 with a 2-arg fallback.
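For reference, the total-division semantics behind F12 can be written as plain Python helpers. This is a sketch of the target behavior only: the helper names are hypothetical, the actual fix guards the emitted operators through guardZeroDiv, and how the transpiler handles Euclidean rounding for negative Int operands is not stated here.

```python
def nat_div(a: int, b: int) -> int:
    """Lean Nat division: n / 0 = 0, otherwise ordinary floor division."""
    return a // b if b != 0 else 0

def nat_mod(a: int, b: int) -> int:
    """Lean Nat modulus: n % 0 = n."""
    return a % b if b != 0 else a

def int_emod(a: int, b: int) -> int:
    """Euclidean remainder: always in [0, |b|) for b != 0, and a for b == 0."""
    return a % abs(b) if b != 0 else a

def int_ediv(a: int, b: int) -> int:
    """Euclidean quotient q with a == b*q + int_emod(a, b); 0 when b == 0."""
    if b == 0:
        return 0
    return (a - int_emod(a, b)) // b  # exact division, no rounding involved

# Bare Python would raise ZeroDivisionError on both of these.
zero_cases = (nat_div(7, 0), nat_mod(7, 0))
```

Note that Python's own // rounds toward negative infinity, so for a negative divisor it differs from the Euclidean quotient; that is why int_ediv is derived from the remainder rather than calling // on the raw operands.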

Validation (cloudnew, 192 cores)

Grammar 5000 seeds clean — 61/61 production, 89% k-path coverage
Corpus 3000 seeds clean — 78/78 functions; all 216 pre-fix failures were exactly F12/F13
pycov 1000 seeds — 98% transpiled-line adequacy
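The adequacy figure comes from sys.settrace line coverage of the transpiled Python. A minimal sketch of that technique follows; trace_lines and sign are hypothetical names rather than code from fuzz/pycov.py, and the sketch traces one function instead of a whole transpiled module.

```python
import sys

def trace_lines(func, *args):
    """Call func(*args); return (result, line offsets executed inside func)."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None          # ignore frames of other functions
        if event == "line":
            # Store offsets from the def line so results are position-independent.
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer            # keep receiving line events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)   # always restore the earlier trace hook
    return result, hit

def sign(x):          # stand-in for a transpiled function (hypothetical)
    if x < 0:
        return -1     # offset 2: reached only by negative inputs
    return 1          # offset 3: reached only by non-negative inputs

_, hit_pos = trace_lines(sign, 5)
_, hit_neg = trace_lines(sign, -5)
# Lines that an input set containing only positive numbers never reaches.
missed_by_positive_inputs = hit_neg - hit_pos
```

An input set made only of positive numbers would leave the `return -1` line unexecuted, which is the situation an input-adequacy report is meant to flag.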

Docs updated: README, RELATED_WORK, VERIFICATION (F12/F13, 18 total bugs), PLAN.

Note: the PLAN.md claim that one ill-typed def in a batch drops only its own rows was wrong. Measurement showed that a single shared #eval aborts entirely via the sorry axiom, hence the per-seed #eval block design.
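The per-seed #eval layout can be sketched as a small renderer that builds the Lean source for one batch. Everything named here is an illustrative assumption: render_batch, the Seed namespaces used to keep definitions from clashing, and the BEGIN marker lines are not the fuzzer's actual output format.

```python
def render_batch(seeds):
    """Render Lean source for a batch of seeds.

    `seeds` is a list of (seed_id, lean_defs, eval_expr) tuples (assumed shape).
    Each seed gets its own namespace and its own #eval, so an elaboration
    failure in one seed is confined to that seed's block.
    """
    parts = []
    for seed_id, defs, expr in seeds:
        parts.append(f"namespace Seed{seed_id}")
        parts.append(defs)
        # Marker line so the driver can attribute output to a seed (assumption).
        parts.append(f'#eval IO.println "BEGIN {seed_id}"')
        parts.append(f"#eval {expr}")
        parts.append(f"end Seed{seed_id}")
        parts.append("")
    return "\n".join(parts)

batch_source = render_batch([
    (1, "def f (n : Nat) : Nat := n + 1", "f 2"),
    (2, "def f (n : Nat) : Nat := n / 0", "f 3"),
])
print(batch_source)
```

A driver built on this would spawn Lean once per batch, split the output at the markers, and re-run any failing or suspect seed in a batch of one.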

README (7) and RELATED_WORK (13) reference lists now carry a resolvable DOI where one exists, or a canonical URL otherwise (USENIX page for LangFuzz, lean-egg repo for Rossel, thesis PDF for Letouzey 2004, certicoq.org for CertiCoq). Purdom 1972 uses the Springer BF DOI.

…, input-adequacy (+2 bug fixes)

Implements PLAN.md fuzzing-at-scale tasks 1-5:

1. Batching (--batch B): pack B seeds per Lean spawn, each in its own #eval block so one ill-typed seed can't abort the batch (a shared #eval aborts entirely via the sorry axiom). Failing/suspect seeds re-run alone to isolate. ~5x fewer spawns; identical verdict + coverage (determinism preserved).

2. Fragment-reuse (fuzz/corpus_frags.py, --corpus): harvest 78 real Corpus/*.lean defs and fuzz them on random inputs. Exercises constructs the generative grammar can't invent; immediately found two real soundness bugs:
   F12: Nat/Int div/mod by zero. Lean division is TOTAL (n/0=0, n%0=n, incl. Euclidean Int); transpiler emitted bare Python //,% which crash. Fixed centrally via guardZeroDiv at all emission sites.
   F13: Array.qsort mis-indexing. Recent Lean materializes qsort's low/high optParams, so LCNF args are [...,as,lt,low,high]; handler grabbed the last two -> sorted(0, key=high). Fixed to index as/lt from size-4/-3.

3. k-path coverage (gen.Gen ctx/all_kpaths/kpaths, KMAX=3): track and report chains of nested productions. Single-production coverage saturates ~100% while k-path sits ~89% at scale, which surfaces untested combinations.

4. Structural shrinker (fuzz.struct_shrink + gen.single_def_body): HDD/C-Reduce style term reduction after the count-based minimizer isolates one def; keeps any rewrite that still elaborates and still reproduces. 373 chars -> 13.

5. Input-adequacy (fuzz/pycov.py, --pycov): sys.settrace line coverage of the TRANSPILED Python during oracle execution (portable to 3.11; slipcover needs 3.12 + imported files). Flags functions whose branches no input reaches.

Validated on cloudnew (192 cores): grammar 5000 seeds clean (61/61 prod, 89% k-path), corpus 3000 seeds clean (78/78 fns, both fixes hold), pycov 1000 seeds 98% adequacy.

Docs updated: README, RELATED_WORK, VERIFICATION (F12/F13, 18 total bugs), PLAN.
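The structural shrinking step in this commit can be pictured with a toy term reducer. This is a sketch of the general HDD-style idea, not fuzz.struct_shrink: terms are assumed to be nested tuples with string leaves, the only rewrite tried is hoisting a child over its parent, and the `reproduces` predicate stands in for "still elaborates and still reproduces".

```python
def size(term):
    """Number of nodes in a term (tuples are operators, strings are leaves)."""
    if isinstance(term, tuple):
        return 1 + sum(size(arg) for arg in term[1:])
    return 1

def rewrites(term):
    """Yield strictly smaller variants of `term`."""
    if not isinstance(term, tuple):
        return
    op, *args = term
    for arg in args:
        yield arg                      # hoist a child in place of its parent
    for i, arg in enumerate(args):
        for smaller in rewrites(arg):  # or shrink inside one child
            yield (op, *args[:i], smaller, *args[i + 1:])

def shrink(term, reproduces):
    """Greedily keep any rewrite for which `reproduces` still holds."""
    improved = True
    while improved:
        improved = False
        for candidate in rewrites(term):
            if reproduces(candidate):
                term, improved = candidate, True
                break                  # restart from the smaller term
    return term

def contains_mod(term):
    """Toy failure condition: the term still contains a `mod` node."""
    return isinstance(term, tuple) and (
        term[0] == "mod" or any(contains_mod(arg) for arg in term[1:])
    )

big = ("add", ("mul", "x", "2"), ("mod", "y", "z"))
small = shrink(big, contains_mod)      # ("mod", "y", "z")
```

Every rewrite removes at least one node, so the loop terminates; in a real shrinker the predicate would be the expensive part, since each candidate has to be elaborated and re-run.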

The div-by-zero guard (F12) wrapped every // and % in an 'if b != 0' ternary, including cases with a literal nonzero divisor like 'x % 2'. That's pointless (a nonzero literal can't be zero) and it broke the comprehension regression test expecting a clean '(x % 2) == 0'. isNonzeroLiteral now detects a syntactically nonzero int literal divisor and emits the bare operator; variable divisors still get the guard.
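The emission rule described in this commit can be sketched in a few lines. The function names below are hypothetical Python stand-ins for the transpiler's isNonzeroLiteral check, and expressions are modeled as plain strings, which is an assumption about representation.

```python
import re

def is_nonzero_literal(expr: str) -> bool:
    """True when `expr` is syntactically an int literal other than zero."""
    return re.fullmatch(r"[0-9]+", expr) is not None and int(expr) != 0

def emit_mod(a: str, b: str) -> str:
    """Emit Python source for Lean's total `a % b` (n % 0 = n)."""
    if is_nonzero_literal(b):
        return f"({a} % {b})"                   # divisor can never be zero
    return f"(({a} % {b}) if {b} != 0 else {a})"  # guard variable divisors

clean = emit_mod("x", "2")     # "(x % 2)"
guarded = emit_mod("x", "y")   # "((x % y) if y != 0 else x)"
```

With this rule, a comprehension filter such as `(x % 2) == 0` keeps its clean form, while a variable divisor still evaluates to the dividend when it is zero.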

emeryberger added 3 commits July 4, 2026 16:13

emeryberger merged commit 8ea772c into main Jul 5, 2026
1 check passed

emeryberger merged 3 commits into main from fuzz-scale
