perf(chapel): wave-1 speedup baseline — bench_mrr + 3 fixtures by hyperpolymath · Pull Request #148 · hyperpolymath/echidna

hyperpolymath · 2026-05-30T19:23:03Z

Summary

Closes the deferred "MRR numbers" acceptance item from the brief of PR #146, feat(chapel): L2 parallel proof search — rehabilitation (closes #133).
Adds src/chapel/bench_mrr.chpl — invokes all three search strategies (sequential / parallel best-of / parallel speculative) against a small fixture corpus and emits CSV.
Adds tests/chapel_fixtures/{coq,lean,idris2}_trivial.* — trivially-provable goals.
Adds just bench-chapel-mrr recipe.
Patches parallel_proof_search.chpl::isProverAvailable to pipe the which subprocess so its stdout no longer leaks into structured callers (CSV/JSON).
Documents results + caveats at docs/bench/2026-05-30-chapel-mrr-baseline.md.
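
The `isProverAvailable` patch amounts to capturing the `which` child's stdout and stderr instead of letting them inherit the parent's streams. A minimal sketch of that idea follows, written in Python as an analogue rather than the Chapel code from this PR; the function name `is_prover_available` and the probed binary names are illustrative assumptions.

```python
import subprocess


def is_prover_available(name: str) -> bool:
    """Return True if `which NAME` succeeds, without leaking its output.

    Python analogue of the idea only; not the Chapel implementation.
    """
    try:
        result = subprocess.run(
            ["which", name],
            stdout=subprocess.PIPE,  # captured, so nothing reaches our stdout
            stderr=subprocess.PIPE,  # same for any error text
            check=False,
        )
    except OSError:
        # `which` itself could not be started; treat as "not available".
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # The only text on stdout is the structured line printed here.
    for prover in ("coqc", "lean", "idris2"):  # assumed binary names
        print(f"{prover},{is_prover_available(prover)}")
```

Because the child's output is captured rather than forwarded, a caller that prints CSV or JSON sees only its own lines on stdout.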

Median results (5 runs, seconds)

fixture	sequential	parallel_bestof	parallel_speculative
coq_trivial	0.313	0.741	0.640
lean_trivial	0.520	0.733	0.636
idris2_trivial (fail)	1.201	0.749	0.731

Two regimes show clearly. Trivial success: sequential wins, because cofork overhead exceeds the ~0.3 s prover invocation. Failure: the parallel strategies are ~1.6× faster, because they avoid a serial walk through 30 not-on-PATH entries.
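
The ~1.6× figure can be checked directly against the table. The sketch below hard-codes the medians reported above and divides the sequential time by each parallel time; the dictionary layout and the helper name `speedup_vs_sequential` are assumptions made for illustration, not part of `bench_mrr`.

```python
# Median wall-clock seconds, copied from the results table above.
MEDIANS = {
    "coq_trivial": {
        "sequential": 0.313,
        "parallel_bestof": 0.741,
        "parallel_speculative": 0.640,
    },
    "lean_trivial": {
        "sequential": 0.520,
        "parallel_bestof": 0.733,
        "parallel_speculative": 0.636,
    },
    "idris2_trivial": {
        "sequential": 1.201,
        "parallel_bestof": 0.749,
        "parallel_speculative": 0.731,
    },
}


def speedup_vs_sequential(row: dict) -> dict:
    """Sequential time divided by each parallel time (>1 means parallel wins)."""
    base = row["sequential"]
    return {name: base / secs for name, secs in row.items() if name != "sequential"}


if __name__ == "__main__":
    for fixture, row in MEDIANS.items():
        ratios = speedup_vs_sequential(row)
        formatted = ", ".join(f"{k}={v:.2f}x" for k, v in ratios.items())
        print(f"{fixture}: {formatted}")
```

For the failing Idris2 fixture the ratios come out at roughly 1.60 and 1.64; for the two succeeding fixtures they are below 1, which is the sequential-wins regime.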

Wave-2 follow-ups (surfaced, not blockers)

Idris2 fixture fails for all strategies. idris2 --check /tmp/... reports "Module Prelude not found" because the temp file lives outside the configured source directory and IDRIS2_DATA_DIR isn't propagated. Needs per-prover cwd/env hook in tryProver.
Agda rejects mangled temp filenames. goal_Agda_N.agda is not a valid module identifier (numeric literal). Either prefix the basename or generate content with module _ where.
Sub-second wall-clock has ±200ms cold-cache jitter. Five-run median is the right tool; CI should warm the cache first.
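
The warm-then-median approach from the last item can be sketched as a small timing harness. This is an illustrative Python sketch, not the `just bench-chapel-mrr` recipe; the helper name `median_wall_clock` and its defaults (one warm-up, five timed runs) are assumptions.

```python
import statistics
import subprocess
import sys
import time


def median_wall_clock(cmd, runs=5, warmups=1):
    """Run `cmd` a few times untimed, then return the median of `runs` timings."""
    for _ in range(warmups):
        # Warm-up runs populate filesystem and loader caches; results discarded.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to a single slow outlier.
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in command; a real run would pass the benchmark binary instead.
    seconds = median_wall_clock([sys.executable, "-c", "pass"])
    print(f"median_seconds,{seconds:.3f}")
```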

A real corpus benchmark (10–30s proofs, multiple succeeders) is the follow-up expected to show speculative clearly beating sequential. It is gated on the L3 corpus hand-off plus the Wave-2 fixes above.

Test plan

`just bench-chapel-mrr` runs locally and emits the expected CSV
5-run medians stable to ±15% (raw data preserved in baseline.csv)
`chpl -o bench_mrr bench_mrr.chpl` clean compile
`which`-output suppression: `bench_mrr 2>/dev/null` now yields pure CSV
CI: chapel-build + chapel-smoke remain green
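
The "pure CSV" item can be checked mechanically: every line of stdout should parse as a CSV row with the same number of fields. The sketch below assumes a three-column layout (`fixture,strategy,seconds`), which is a guess at the schema and not confirmed by this PR; the sample strings are hypothetical.

```python
import csv
import io


def is_pure_csv(text: str, expected_columns: int) -> bool:
    """True if every line parses as a CSV row with `expected_columns` fields."""
    rows = list(csv.reader(io.StringIO(text)))
    return bool(rows) and all(len(row) == expected_columns for row in rows)


# Hypothetical samples; the column layout is an assumption.
CLEAN = "fixture,strategy,seconds\ncoq_trivial,sequential,0.313\n"
# What a leaked `which` path would look like mixed into the output.
LEAKY = "/usr/bin/coqc\n" + CLEAN

if __name__ == "__main__":
    print(is_pure_csv(CLEAN, 3))
    print(is_pure_csv(LEAKY, 3))
```

A stray path line has a single field, so the column-count check rejects it without needing to know what leaked.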

🤖 Generated with Claude Code

…file recipe

Closes the deferred "MRR numbers" acceptance item from PR #146's brief.

- `src/chapel/bench_mrr.chpl` invokes sequentialProofSearch / parallelProofSearch (best-of) / parallelProofSearchSpeculative against three trivially-provable fixtures (Coq, Lean, Idris2) and emits CSV.
- Five-run medians (`docs/bench/2026-05-30-chapel-mrr-baseline.md`) show parallel strategies ~1.6× faster than sequential in the failure regime; sequential wins the trivial-success regime because cofork overhead dominates a 0.3 s prover invocation.
- `parallel_proof_search.chpl::isProverAvailable` now pipes the `which` subprocess's stdout/stderr so it stops mangling structured output from callers like bench_mrr.
- `just bench-chapel-mrr` reproduces.

Two Wave-2 follow-ups surface and are documented in the bench writeup: the Idris2 fixture is unprovable in the current setup because `idris2 --check /tmp/...` can't find Prelude (needs IDRIS2_DATA_DIR or per-prover cwd hook in tryProver), and Agda rejects the mangled temp filename `goal_Agda_N.agda` as an invalid module identifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hyperpolymath enabled auto-merge (squash) May 30, 2026 19:23

hyperpolymath merged commit c880bd6 into main May 30, 2026
32 of 40 checks passed

hyperpolymath deleted the feat/chapel-mrr-baseline branch May 30, 2026 19:26

This was referenced May 30, 2026

chapel: per-prover env/cwd hook in tryProver — unblock Idris2 invocation #158 (Closed)

chapel: per-prover temp-filename hook in tryProver — unblock Agda invocation #159 (Closed)

chapel: real-corpus speedup bench (10-30s prover invocations) #161 (Open)

hyperpolymath merged 1 commit into main from feat/chapel-mrr-baseline
