Skip to content

Fix LCNF extraction bugs: Bool-param branch inversion & binary min/max operand drop#1

Merged
emeryberger merged 1 commit into
mainfrom
fix-bool-branch-and-binary-min
Jun 29, 2026
Merged

Fix LCNF extraction bugs: Bool-param branch inversion & binary min/max operand drop#1
emeryberger merged 1 commit into
mainfrom
fix-bool-branch-and-binary-min

Conversation

@emeryberger

Copy link
Copy Markdown
Owner

Fix two LCNF extraction bugs found extracting real-world code

While using LeanToPython to extract verified algorithms from
Scalene's profiler into Python, two
extraction bugs surfaced. Both are fixed here with regression coverage; the
existing correctness suite stays at LCNF 132/132 (and the two
previously-failing mod_pow/fast_pow Bool cases now pass).

1. Bool-typed-parameter branch inversion (silent wrong answer)

if boolParam then a else b lowers to a Decidable cases on
instDecidableEqBool boolParam true. emitDecidableCases chose branch order
from a name heuristic (_x/x/b/cond/…). A Bool parameter whose name
wasn't on that list (e.g. isMalloc, flag) fell through to the
"Nat matched against 0" path, which swaps the branches — emitting

def fpd(is_malloc, count):
    if is_malloc:
        return -count      # WRONG — this is the `else` body
    else:
        return count

for if isMalloc then count else -count.

Fix: track each fvar's LCNF Bool type (boolTypedVars, populated for
params in emitFunParams and lets in emitLetValue) and resolve the
discriminant through the alias map (the Decidable node aliases to the
underlying Bool fvar). Bool-typed discriminant → normal branch order;
otherwise fall back to the existing name heuristic.

2. Binary min/max operand drop

Nat.min/Nat.max map to Python min/max via stdlibFnToPython?, but the
"simple wrapper" emitter passed only the last argument, so Nat.min a b
became min(b) — dropping the first operand (and crashing on assoc-list
inputs). Fix: for min/max, emit all value operands.

Tests

RegressionFixes.lean + RegressionFixes_test.py extract and execute both
patterns and assert correct results:

lake env lean RegressionFixes.lean > RegressionFixes_out.py
python3 RegressionFixes_test.py   # OK: Bool-branch order, binary min/max

…n/max

Found while extracting real-world code (Scalene's profiler) to Python.

1. Bool-typed-parameter branch inversion.
   `if boolParam then a else b` lowers to a Decidable `cases` on
   `instDecidableEqBool boolParam true`. emitDecidableCases decided branch
   order from a NAME heuristic (`_x`/`x`/`b`/`cond`/...); a Bool parameter
   whose name wasn't recognized (e.g. `isMalloc`, `flag`) fell through to the
   Nat-matched-against-0 path, which SWAPS the branches — silently emitting
   `if flag: <else-body> else: <then-body>`.

   Fix: track each fvar's LCNF `Bool` type (boolTypedVars, populated for
   params in emitFunParams and for let-bindings in emitLetValue) and resolve
   the discriminant through the alias map (the Decidable node aliases to the
   underlying Bool fvar). If the discriminant is Bool-typed, use normal branch
   order; otherwise fall back to the existing name heuristic. Also repairs the
   previously-broken mod_pow / fast_pow corpus cases.

2. Binary stdlib operand drop for min/max.
   `Nat.min`/`Nat.max` map to Python `min`/`max` via stdlibFnToPython?, but the
   'simple wrapper' emitter passed only the LAST arg — `Nat.min a b` became
   `min(b)`, dropping the first operand. Fix: for min/max, emit all value
   (fvar) operands.

Adds RegressionFixes.lean + RegressionFixes_test.py covering both. Full
correctness suite stays at LCNF 132/132 (and gains the 2 previously-failing
Bool cases).
@emeryberger emeryberger merged commit 7852b4a into main Jun 29, 2026
emeryberger added a commit to plasma-umass/scalene that referenced this pull request Jun 29, 2026
…2 workaround)

emeryberger/LeanToPython#1 (the Nat.min binary-operand-drop fix) is merged, so
the extraction no longer needs the min2 = (if a <= b then a else b) workaround:
- ScaleneExtract.minCount now uses Nat.min directly; LeanToPython main extracts
  it cleanly to min(a, b).
- Regenerated formal/extract/scalene_verified_core.py from current main.
- ExtractMirror.lean: minCountX now mirrors with Nat.min and minCountX_eq holds
  definitionally (rfl-ish); dropped the now-vestigial min2/min2_eq_min.
- Removed formal/extract/leantopython-fixes.patch (fixes are upstream now) and
  updated formal/README.md to point at the merged PR.

Full Lean build clean (no sorry); tests/test_verified_space_saving.py 6/6.
emeryberger added a commit to plasma-umass/scalene that referenced this pull request Jun 29, 2026
…n into production (#1070)

* Formal: prove combined_stacks capacity bound + extract verified Python to production

Extends the formal models with a bounded heavy-hitter proof and a working
proof->production pipeline via LeanToPython.

New proofs (formal/lean/Scalene/):
- SpaceSaving.lean: models _space_saving_increment (scalene_utility.py:515) as
  a pure step over an assoc-list table. Proves step_withinCap / fold_withinCap
  (the table NEVER exceeds capacity -- the bound the bounded-combined_stacks
  design exists to guarantee, _COMBINED_STACKS_MAX_KEYS), minCount_le (eviction
  targets a minimum-count entry, so heavy hitters survive), and the per-branch
  size behavior. No sorry; standard axioms only.
- ExtractMirror.lean: machine-checked integrity bridge. Re-states the
  extraction defs and proves them equal to/satisfying the proven ones
  (min2_eq_min, minCountX_eq, totalTimeNs_eq_elapsed, pythonFractionPpm_le), so
  the extracted Python cannot silently diverge from what was proven.

Proof -> production (formal/extract/ + tests/):
- ScaleneExtract.lean: extraction-friendly (Nat/List, no Mathlib) copies of the
  proven cores -- space_saving_step, cTimeNs/totalTimeNs, pythonFractionPpm,
  pythonBytes, footprint_delta.
- scalene_verified_core.py: GENERATED by LeanToPython from those defs (committed
  with a provenance header).
- tests/test_verified_space_saving.py: differential test running the REAL
  _space_saving_increment against the extracted oracle over random streams --
  asserts the proven capacity bound holds on production code and the count
  multiset matches the oracle. 6 tests pass.

LeanToPython transpiler fixes (formal/extract/leantopython-fixes.patch):
extracting Scalene's defs surfaced two bugs, fixed in a LeanToPython checkout:
1. Bool-typed-parameter branch inversion (if isMalloc then a else b emitted
   swapped) -- fixed by tracking each fvar's LCNF Bool type and resolving
   through the alias map instead of guessing from the name; also repaired the
   previously-broken mod_pow corpus case.
2. Binary-builtin operand drop (min/max emitting only the last arg).

formal/README.md documents the pipeline, the integrity bridge, the tie-break
caveat, and how to regenerate the extracted Python.

* test: skip verified-oracle test on Python < 3.10

The LeanToPython-generated oracle (formal/extract/scalene_verified_core.py)
uses PEP 604 'X | Y' type unions at module scope, evaluated at import, which
raises TypeError on Python 3.9. LeanToPython targets 3.10+ (per its README),
so skip the differential test below 3.10 rather than post-process generated
code. Fixes the ubuntu-latest/3.9 failure on this branch.

* Regenerate verified oracle with upstream-fixed LeanToPython (drop min2 workaround)

emeryberger/LeanToPython#1 (the Nat.min binary-operand-drop fix) is merged, so
the extraction no longer needs the min2 = (if a <= b then a else b) workaround:
- ScaleneExtract.minCount now uses Nat.min directly; LeanToPython main extracts
  it cleanly to min(a, b).
- Regenerated formal/extract/scalene_verified_core.py from current main.
- ExtractMirror.lean: minCountX now mirrors with Nat.min and minCountX_eq holds
  definitionally (rfl-ish); dropped the now-vestigial min2/min2_eq_min.
- Removed formal/extract/leantopython-fixes.patch (fixes are upstream now) and
  updated formal/README.md to point at the merged PR.

Full Lean build clean (no sorry); tests/test_verified_space_saving.py 6/6.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant