fuzz: batching, fragment-reuse, k-path coverage, structural shrinking, input-adequacy (+2 bug fixes)#12
Merged
Merged
Conversation
README (7) and RELATED_WORK (13) reference lists now carry a resolvable DOI where one exists, or a canonical URL otherwise (USENIX page for LangFuzz, lean-egg repo for Rossel, thesis PDF for Letouzey 2004, certicoq.org for CertiCoq). Purdom 1972 uses the Springer BF DOI.
…, input-adequacy (+2 bug fixes)
Implements PLAN.md fuzzing-at-scale tasks 1-5:
1. Batching (--batch B): pack B seeds per Lean spawn, each in its own #eval
block so one ill-typed seed can't abort the batch (a shared #eval aborts
entirely via the sorry axiom). Failing/suspect seeds re-run alone to isolate.
~5x fewer spawns; identical verdict + coverage (determinism preserved).
2. Fragment-reuse (fuzz/corpus_frags.py, --corpus): harvest 78 real Corpus/*.lean
defs and fuzz them on random inputs. Exercises constructs the generative
grammar can't invent — immediately found two real soundness bugs:
F12: Nat/Int div/mod by zero. Lean division is TOTAL (n/0=0, n%0=n, incl.
Euclidean Int); transpiler emitted bare Python //,% which crash. Fixed
centrally via guardZeroDiv at all emission sites.
F13: Array.qsort mis-indexing. Recent Lean materializes qsort's low/high
optParams, so LCNF args are [...,as,lt,low,high]; handler grabbed the
last two -> sorted(0, key=high). Fixed to index as/lt from size-4/-3.
3. k-path coverage (gen.Gen ctx/all_kpaths/kpaths, KMAX=3): track and report
chains of nested productions. Single-production coverage saturates ~100%
while k-path sits ~89% at scale — surfaces untested combinations.
4. Structural shrinker (fuzz.struct_shrink + gen.single_def_body): HDD/C-Reduce
style term reduction after the count-based minimizer isolates one def; keeps
any rewrite that still elaborates and still reproduces. 373 chars -> 13.
5. Input-adequacy (fuzz/pycov.py, --pycov): sys.settrace line coverage of the
TRANSPILED Python during oracle execution (portable to 3.11; slipcover needs
3.12 + imported files). Flags functions whose branches no input reaches.
Validated on cloudnew (192 cores): grammar 5000 seeds clean (61/61 prod, 89%
k-path), corpus 3000 seeds clean (78/78 fns, both fixes hold), pycov 1000 seeds
98% adequacy. Docs updated: README, RELATED_WORK, VERIFICATION (F12/F13,
18 total bugs), PLAN.
The div-by-zero guard (F12) wrapped every // and % in an 'if b != 0' ternary, including cases with a literal nonzero divisor like 'x % 2'. That's pointless (a nonzero literal can't be zero) and it broke the comprehension regression test expecting a clean '(x % 2) == 0'. isNonzeroLiteral now detects a syntactically nonzero int literal divisor and emits the bare operator; variable divisors still get the guard.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Implements PLAN.md fuzzing-at-scale tasks 1–5, plus two transpiler soundness bugs (F12/F13) that task 2 surfaced.
Tasks
--batch B) — pack B seeds per Lean spawn, each in its own#evalblock so one ill-typed seed can't abort the batch (a shared#evalaborts entirely via thesorryaxiom). Failing/suspect seeds re-run alone to isolate. ~5x fewer spawns; identical verdict + coverage (determinism preserved).fuzz/corpus_frags.py,--corpus) — harvest 78 realCorpus/*.leandefs and fuzz them on random inputs. Exercises constructs the generative grammar can't invent — found two real bugs (below).gen.Genctx/all_kpaths/kpaths, KMAX=3) — track and report chains of nested productions. Single-production coverage saturates ~100% while k-path sits ~89% at scale, surfacing untested combinations.fuzz.struct_shrink+gen.single_def_body) — HDD/C-Reduce-style term reduction after the count-based minimizer isolates one def; keeps any rewrite that still elaborates and still reproduces. 373 chars → 13.fuzz/pycov.py,--pycov) —sys.settraceline coverage of the transpiled Python during oracle execution (portable to 3.11; slipcover needs 3.12 + imported files). Flags functions whose branches no input reaches.Bugs found (by task 2) and fixed
n/0=0,n%0=n, incl. Euclidean Int); transpiler emitted bare Python///%which crash. Fixed centrally viaguardZeroDivat all emission sites. The generative grammar structurally couldn't hit this (it always guards divisors).low/highoptParams, so LCNF args are[…,as,lt,low,high]; handler grabbed the last two →sorted(0, key=high). Fixed to index as/lt from size-4/size-3 with a 2-arg fallback.Validation (cloudnew, 192 cores)
Docs updated: README, RELATED_WORK, VERIFICATION (F12/F13, 18 total bugs), PLAN.
Note: the PLAN.md claim that one ill-typed def in a batch drops only its own rows was wrong — measured that a single shared
#evalaborts entirely via thesorryaxiom, hence the per-seed#evalblock design.