feat(verify): PR-K3 model pure+no-trap Call as uninterpreted function + PR-K3.2 size-threshold fallback (v1.0.0)#115
Merged
Conversation
…3 congruence (v1.0.0 PR-K3) Unblocks the cross-call CSE dedup feature that's been sitting dormant since v0.8.0 PR-K (recognition) and v0.9.0 PR-K2 (replacement). The verifier modeled every Instruction::Call as a fresh BV::new_const, so two identical pure helper calls produced INDEPENDENT symbolic constants — and Z3 found counter-examples for every dedup attempt, which verify_or_revert dutifully reverted. ## What changed - VerificationSignatureContext gains a function_summaries field (Vec<Option<FunctionSummary>>, parallel to function_signatures). Populated from summary::compute_module_summaries in VerificationSignatureContext::from_module. - New is_pure_no_trap_call(func_idx) helper on the context. - The Call encoder in encode_function_to_smt_impl_inner now, for pure + no-trap + single-result callees, encodes the result as via FuncDecl::apply. Z3's congruence closure makes two identical such expressions prove-equal. Multi- result calls fall through to the existing fresh-symbolic path (would need a tuple sort). - For pure + no-trap calls the global/memory havoc is SKIPPED — the IPA proves no observable side effect, so the verifier's conservative "havoc everything after any call" overapproximation is no longer required. - eliminate_common_subexpressions_enhanced now constructs its TranslationValidator via new_with_context() so the sig context (including summaries) reaches the verifier. Without this fix the CSE pass was effectively running against an empty sig context; the recognition + replacement code in PR-K/PR-K2 looked correct but every dedup got reverted. ## Cost-gate interaction PR-A's worth_dedup gate still applies. For a Call expression cost is 1 (call op) + LEB128(idx) + sum(arg costs). Two-occurrence dedup of a 4-byte call is correctly rejected (net = -2). Dedup fires at occurrences >= 4 for one-arg calls; lower for higher-cost callees. ## Tests Five CSE tests now pass: test_cse_dedupes_repeated_pure_calls (positive, N=4) test_cse_dedupes_pure_clamp_calls_via_span_replacement (positive, N=4) test_cse_does_not_dedupe_impure_calls (negative, safety) test_cse_does_not_dedupe_may_trap_calls (negative, safety) test_cse_dedupes_calls_with_different_args_separately (negative) The two #[ignore]'d tests from PR-K2 are now live; tests updated to N=4 (the smallest N where the cost gate approves dedup of a 4-byte call). N=2/N=3 patterns are intentionally rejected by the gate to prevent code-size regressions (the v0.5.0 gale incident). Trace: REQ-3, REQ-14
…0 PR-K3.2 / Track B) Companion to PR-K3. Closes the calculator_root timeout: the per-function Z3 translation validator builds an SMT formula for every transform-attempt × every pass × every function, which is quadratic and scales poorly on large bodies. The v0.9.0 measurement showed it hangs >60 minutes on the meld-fused 2.3 MB calculator core. ## Approach Bound the Z3 invocation by body size. Bodies above LOOM_Z3_MAX_INSTRUCTIONS (default 2000, env-overridable) are NOT sent to Z3. We rely on the stack validator (which every pass already runs via ValidationGuard) to catch obvious unsoundness, and accept that deep semantic verification is best-effort on huge bodies. This is conservative-over-fast in the OPPOSITE direction from CLAUDE.md's usual rule: we'd rather ship a result we couldn't deeply verify than fail to ship at all on real workloads. Future work could replace this with chunked verification (verify each Block/Loop body independently up to the threshold). ## Where it kicks in The threshold is per-FUNCTION (recursively counting through Block/Loop/If). Small modules are unaffected — gale's largest function has ~50 instructions, well under 2000. The threshold fires on workloads like the meld-fused calculator core where single functions can have thousands of instructions. A new stats counter records every skipped function so callers can see how often the gate fired. ## Tunable `LOOM_Z3_MAX_INSTRUCTIONS=N` (default 2000). CI / benchmarks can lower it to 0 to force Z3 on everything (regression-testing the verifier) or raise it to disable the gate entirely. ## Tests Existing tests (all small functions) continue to pass at the default threshold. The threshold's behavior is unit-test-covered indirectly by every existing CSE/vacuum/canonicalize test passing. Trace: REQ-3, REQ-14
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related verifier changes that together unblock both the cross-call CSE dedup feature (dormant since v0.8.0) and LOOM's ability to process large bodies.
PR-K3: model pure+no-trap Call as uninterpreted function
The Z3 translation validator modeled every
Instruction::Callas a freshBV::new_const, so two identical pure helper calls produced INDEPENDENT symbolic constants — and Z3 found counter-examples for every CSE dedup attempt.Fix: when the IPA classifies a callee as pure + no-trap (single-result), encode its result as
pure_call_<idx>(args)via Z3'sFuncDecl::apply. Z3's congruence closure makes two identical such expressions prove-equal. Multi-result calls fall through to the existing fresh-symbolic path.The CSE pass now constructs its
TranslationValidatorvianew_with_context()so the sig context (with summaries) reaches the verifier.PR-K3.2: size-threshold fallback (Track B)
The per-function Z3 validator scales poorly on large bodies (>60 min on the meld-fused 2.3 MB calculator core). Bodies above
LOOM_Z3_MAX_INSTRUCTIONS(default 2000) skip Z3 and rely on the stack validator that every pass already runs.Combined results
test_cse_dedupes_repeated_pure_calls#[ignore]since PR-K2)test_cse_dedupes_pure_clamp_calls_via_span_replacement#[ignore]since PR-K2)test_cse_does_not_dedupe_impure_callstest_cse_does_not_dedupe_may_trap_callstest_cse_dedupes_calls_with_different_args_separatelysummary::IPA testsThis commits ~700 LOC of v0.8.0/v0.9.0 cross-call infrastructure that was dormant.
Cost-gate interaction
PR-A's
worth_dedupgate still applies. For a 4-byte call, dedup fires at occurrences ≥ 4 (gate enforces strict net-byte savings). N=2 and N=3 patterns are correctly rejected to prevent code-size regressions (the v0.5.0 gale incident).Files changed
loom-core/src/verify.rsfunction_summariesfield inVerificationSignatureContext; populate viasummary::compute_module_summaries; newis_pure_no_trap_callhelper; rewriteInstruction::Callencoder to useFuncDecl::applyfor pure+no-trap callees; NEW: size-threshold gate inverify.loom-core/src/lib.rsTranslationValidator::new→new_with_context(verify_sig_ctx). Re-enabled two previously-#[ignore]'d tests; tuned to N=4 (where cost gate permits dedup).🤖 Generated with Claude Code