perf(chapel): wave-1 speedup baseline — bench_mrr + 3 fixtures#148
Merged
Conversation
…file recipe Closes the deferred "MRR numbers" acceptance item from PR #146's brief. - `src/chapel/bench_mrr.chpl` invokes sequentialProofSearch / parallelProofSearch (best-of) / parallelProofSearchSpeculative against three trivially-provable fixtures (Coq, Lean, Idris2) and emits CSV. - Five-run medians (`docs/bench/2026-05-30-chapel-mrr-baseline.md`) show parallel strategies ~1.6× faster than sequential in the failure regime; sequential wins the trivial-success regime because cofork overhead dominates a 0.3 s prover invocation. - `parallel_proof_search.chpl::isProverAvailable` now pipes the `which` subprocess's stdout/stderr so it stops mangling structured output from callers like bench_mrr. - `just bench-chapel-mrr` reproduces. Two Wave-2 follow-ups surface and are documented in the bench writeup: the Idris2 fixture is unprovable in the current setup because `idris2 --check /tmp/...` can't find Prelude (needs IDRIS2_DATA_DIR or per-prover cwd hook in tryProver), and Agda rejects the mangled temp filename `goal_Agda_N.agda` as an invalid module identifier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
src/chapel/bench_mrr.chpl— invokes all three search strategies (sequential / parallel best-of / parallel speculative) against a small fixture corpus and emits CSV.tests/chapel_fixtures/{coq,lean,idris2}_trivial.*— trivially-provable goals.just bench-chapel-mrrrecipe.parallel_proof_search.chpl::isProverAvailableto pipe thewhichsubprocess so its stdout no longer leaks into structured callers (CSV/JSON).docs/bench/2026-05-30-chapel-mrr-baseline.md.Median results (5 runs, seconds)
Two regimes show clearly: trivial-success → sequential wins (cofork overhead > 0.3s prover invocation); failure → parallel ~1.6× faster (avoids serial walk through 30 not-on-PATH entries).
Wave-2 follow-ups (surfaced, not blockers)
idris2 --check /tmp/...reports "Module Prelude not found" because the temp file lives outside the configured source directory andIDRIS2_DATA_DIRisn't propagated. Needs per-prover cwd/env hook intryProver.goal_Agda_N.agdais not a valid module identifier (numeric literal). Either prefix the basename or generate content withmodule _ where.A real corpus benchmark (10–30s proofs, multiple succeeders) is the follow-up that will show speculative dramatically beating sequential. That's gated on L3 corpus hand-off + the Wave-2 fixes above.
Test plan
🤖 Generated with Claude Code