Fix LCNF extraction bugs: Bool-param branch inversion & binary min/max operand drop#1
Merged
Merged
Conversation
…n/max Found while extracting real-world code (Scalene's profiler) to Python. 1. Bool-typed-parameter branch inversion. `if boolParam then a else b` lowers to a Decidable `cases` on `instDecidableEqBool boolParam true`. emitDecidableCases decided branch order from a NAME heuristic (`_x`/`x`/`b`/`cond`/...); a Bool parameter whose name wasn't recognized (e.g. `isMalloc`, `flag`) fell through to the Nat-matched-against-0 path, which SWAPS the branches — silently emitting `if flag: <else-body> else: <then-body>`. Fix: track each fvar's LCNF `Bool` type (boolTypedVars, populated for params in emitFunParams and for let-bindings in emitLetValue) and resolve the discriminant through the alias map (the Decidable node aliases to the underlying Bool fvar). If the discriminant is Bool-typed, use normal branch order; otherwise fall back to the existing name heuristic. Also repairs the previously-broken mod_pow / fast_pow corpus cases. 2. Binary stdlib operand drop for min/max. `Nat.min`/`Nat.max` map to Python `min`/`max` via stdlibFnToPython?, but the 'simple wrapper' emitter passed only the LAST arg — `Nat.min a b` became `min(b)`, dropping the first operand. Fix: for min/max, emit all value (fvar) operands. Adds RegressionFixes.lean + RegressionFixes_test.py covering both. Full correctness suite stays at LCNF 132/132 (and gains the 2 previously-failing Bool cases).
emeryberger
added a commit
to plasma-umass/scalene
that referenced
this pull request
Jun 29, 2026
…2 workaround) emeryberger/LeanToPython#1 (the Nat.min binary-operand-drop fix) is merged, so the extraction no longer needs the min2 = (if a <= b then a else b) workaround: - ScaleneExtract.minCount now uses Nat.min directly; LeanToPython main extracts it cleanly to min(a, b). - Regenerated formal/extract/scalene_verified_core.py from current main. - ExtractMirror.lean: minCountX now mirrors with Nat.min and minCountX_eq holds definitionally (rfl-ish); dropped the now-vestigial min2/min2_eq_min. - Removed formal/extract/leantopython-fixes.patch (fixes are upstream now) and updated formal/README.md to point at the merged PR. Full Lean build clean (no sorry); tests/test_verified_space_saving.py 6/6.
emeryberger
added a commit
to plasma-umass/scalene
that referenced
this pull request
Jun 29, 2026
…n into production (#1070) * Formal: prove combined_stacks capacity bound + extract verified Python to production Extends the formal models with a bounded heavy-hitter proof and a working proof->production pipeline via LeanToPython. New proofs (formal/lean/Scalene/): - SpaceSaving.lean: models _space_saving_increment (scalene_utility.py:515) as a pure step over an assoc-list table. Proves step_withinCap / fold_withinCap (the table NEVER exceeds capacity -- the bound the bounded-combined_stacks design exists to guarantee, _COMBINED_STACKS_MAX_KEYS), minCount_le (eviction targets a minimum-count entry, so heavy hitters survive), and the per-branch size behavior. No sorry; standard axioms only. - ExtractMirror.lean: machine-checked integrity bridge. Re-states the extraction defs and proves them equal to/satisfying the proven ones (min2_eq_min, minCountX_eq, totalTimeNs_eq_elapsed, pythonFractionPpm_le), so the extracted Python cannot silently diverge from what was proven. Proof -> production (formal/extract/ + tests/): - ScaleneExtract.lean: extraction-friendly (Nat/List, no Mathlib) copies of the proven cores -- space_saving_step, cTimeNs/totalTimeNs, pythonFractionPpm, pythonBytes, footprint_delta. - scalene_verified_core.py: GENERATED by LeanToPython from those defs (committed with a provenance header). - tests/test_verified_space_saving.py: differential test running the REAL _space_saving_increment against the extracted oracle over random streams -- asserts the proven capacity bound holds on production code and the count multiset matches the oracle. 6 tests pass. LeanToPython transpiler fixes (formal/extract/leantopython-fixes.patch): extracting Scalene's defs surfaced two bugs, fixed in a LeanToPython checkout: 1. Bool-typed-parameter branch inversion (if isMalloc then a else b emitted swapped) -- fixed by tracking each fvar's LCNF Bool type and resolving through the alias map instead of guessing from the name; also repaired the previously-broken mod_pow corpus case. 2. Binary-builtin operand drop (min/max emitting only the last arg). formal/README.md documents the pipeline, the integrity bridge, the tie-break caveat, and how to regenerate the extracted Python. * test: skip verified-oracle test on Python < 3.10 The LeanToPython-generated oracle (formal/extract/scalene_verified_core.py) uses PEP 604 'X | Y' type unions at module scope, evaluated at import, which raises TypeError on Python 3.9. LeanToPython targets 3.10+ (per its README), so skip the differential test below 3.10 rather than post-process generated code. Fixes the ubuntu-latest/3.9 failure on this branch. * Regenerate verified oracle with upstream-fixed LeanToPython (drop min2 workaround) emeryberger/LeanToPython#1 (the Nat.min binary-operand-drop fix) is merged, so the extraction no longer needs the min2 = (if a <= b then a else b) workaround: - ScaleneExtract.minCount now uses Nat.min directly; LeanToPython main extracts it cleanly to min(a, b). - Regenerated formal/extract/scalene_verified_core.py from current main. - ExtractMirror.lean: minCountX now mirrors with Nat.min and minCountX_eq holds definitionally (rfl-ish); dropped the now-vestigial min2/min2_eq_min. - Removed formal/extract/leantopython-fixes.patch (fixes are upstream now) and updated formal/README.md to point at the merged PR. Full Lean build clean (no sorry); tests/test_verified_space_saving.py 6/6.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix two LCNF extraction bugs found extracting real-world code
While using LeanToPython to extract verified algorithms from
Scalene's profiler into Python, two
extraction bugs surfaced. Both are fixed here with regression coverage; the
existing correctness suite stays at LCNF 132/132 (and the two
previously-failing
mod_pow/fast_powBool cases now pass).1. Bool-typed-parameter branch inversion (silent wrong answer)
if boolParam then a else blowers to aDecidablecasesoninstDecidableEqBool boolParam true.emitDecidableCaseschose branch orderfrom a name heuristic (
_x/x/b/cond/…). ABoolparameter whose namewasn't on that list (e.g.
isMalloc,flag) fell through to the"Nat matched against 0" path, which swaps the branches — emitting
for
if isMalloc then count else -count.Fix: track each fvar's LCNF
Booltype (boolTypedVars, populated forparams in
emitFunParamsand lets inemitLetValue) and resolve thediscriminant through the alias map (the
Decidablenode aliases to theunderlying
Boolfvar). Bool-typed discriminant → normal branch order;otherwise fall back to the existing name heuristic.
2. Binary
min/maxoperand dropNat.min/Nat.maxmap to Pythonmin/maxviastdlibFnToPython?, but the"simple wrapper" emitter passed only the last argument, so
Nat.min a bbecame
min(b)— dropping the first operand (and crashing on assoc-listinputs). Fix: for
min/max, emit all value operands.Tests
RegressionFixes.lean+RegressionFixes_test.pyextract and execute bothpatterns and assert correct results: