
perf(chapel): wave-1 speedup baseline — bench_mrr + 3 fixtures #148

Merged. hyperpolymath merged 1 commit into main from feat/chapel-mrr-baseline on May 30, 2026.

Summary

  • Closes the deferred "MRR numbers" acceptance item from the brief of PR #146 (feat(chapel): L2 parallel proof search rehabilitation, closes #133).
  • Adds `src/chapel/bench_mrr.chpl`, which runs all three search strategies (sequential, parallel best-of, parallel speculative) against a small fixture corpus and emits CSV.
  • Adds `tests/chapel_fixtures/{coq,lean,idris2}_trivial.*`: one trivially provable goal per prover.
  • Adds a `just bench-chapel-mrr` recipe.
  • Patches `parallel_proof_search.chpl::isProverAvailable` to pipe the `which` subprocess's stdout/stderr, so its output no longer leaks into structured callers (CSV/JSON).
  • Documents results and caveats in `docs/bench/2026-05-30-chapel-mrr-baseline.md`.
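For readers new to the three strategies, here is a minimal Python sketch under an assumed reading of their semantics. It is not the Chapel code in `bench_mrr.chpl` or `parallel_proof_search.chpl`. Sequential tries provers in order and stops at the first success. Best-of runs every prover concurrently and waits for all of them. Speculative runs them concurrently and returns the first success. `try_prover` is a hypothetical stand-in for launching one prover, and the best-of selection rule (first success in list order) is only a placeholder for whatever ranking the real code uses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence

# Hypothetical callback: returns True if `prover` discharges the goal.
TryProver = Callable[[str], bool]


def sequential_search(provers: Sequence[str], try_prover: TryProver) -> Optional[str]:
    """Try provers one at a time; every failing entry adds its full cost."""
    for p in provers:
        if try_prover(p):
            return p
    return None


def parallel_best_of(provers: Sequence[str], try_prover: TryProver) -> Optional[str]:
    """Run all provers concurrently and wait for every result (assumed semantics)."""
    with ThreadPoolExecutor(max_workers=len(provers) or 1) as pool:
        results = list(pool.map(try_prover, provers))
    # Placeholder ranking: first success in list order.
    for p, ok in zip(provers, results):
        if ok:
            return p
    return None


def parallel_speculative(provers: Sequence[str], try_prover: TryProver) -> Optional[str]:
    """Run all provers concurrently and return whichever succeeds first."""
    pool = ThreadPoolExecutor(max_workers=len(provers) or 1)
    try:
        futures = {pool.submit(try_prover, p): p for p in provers}
        for fut in as_completed(futures):
            if fut.result():
                return futures[fut]
        return None
    finally:
        # Don't wait for the losers; queued work is cancelled, while running
        # Python threads cannot be killed and simply finish in the background.
        pool.shutdown(wait=False, cancel_futures=True)
```

The sketch also shows why a run where nothing succeeds favours parallelism: `sequential_search` pays for every entry in turn, while both parallel variants pay roughly for the slowest entry plus fan-out overhead.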
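The `isProverAvailable` patch is about where the child's output goes. A minimal Python sketch of the same idea follows; the real patch is in Chapel, and `is_prover_available` is a hypothetical name. The probe's stdout/stderr are captured instead of inherited, so a caller that is writing CSV or JSON to stdout is left untouched.

```python
import subprocess


def is_prover_available(name: str) -> bool:
    """Return True if `which name` succeeds, without echoing its output.

    stdout/stderr are captured (PIPE) rather than inherited from the parent,
    so nothing reaches the caller's stdout stream.
    """
    try:
        result = subprocess.run(
            ["which", name],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=False,
        )
    except FileNotFoundError:
        # `which` itself is not installed; report the prover as unavailable.
        return False
    return result.returncode == 0
```

In Python specifically, `shutil.which` would avoid spawning a subprocess at all. The sketch keeps the subprocess only to mirror the shape of the patch.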

Median results (5 runs, seconds)

| fixture | sequential | parallel_bestof | parallel_speculative |
|---|---:|---:|---:|
| coq_trivial | 0.313 | 0.741 | 0.640 |
| lean_trivial | 0.520 | 0.733 | 0.636 |
| idris2_trivial (fail) | 1.201 | 0.749 | 0.731 |

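Two regimes stand out. Trivial success (Coq, Lean): sequential wins, because cofork overhead outweighs a sub-second prover invocation (0.313 s and 0.520 s sequentially). Failure (Idris2): both parallel strategies are ~1.6× faster (1.201 s vs 0.749 s and 0.731 s), because they avoid a serial walk through 30 entries that are not on PATH.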
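The speedup figures follow directly from the medians in the table above. This short Python check (the `speedups` helper is illustrative, not part of `bench_mrr`) divides the sequential median by each parallel median, so a value above 1 means the parallel strategy was faster.

```python
# Five-run median wall-clock times in seconds, copied from the table above.
MEDIANS = {
    "coq_trivial": {"sequential": 0.313, "parallel_bestof": 0.741, "parallel_speculative": 0.640},
    "lean_trivial": {"sequential": 0.520, "parallel_bestof": 0.733, "parallel_speculative": 0.636},
    "idris2_trivial": {"sequential": 1.201, "parallel_bestof": 0.749, "parallel_speculative": 0.731},
}


def speedups(row: dict[str, float]) -> dict[str, float]:
    """Speedup of each parallel strategy over sequential (>1 means parallel is faster)."""
    seq = row["sequential"]
    return {name: seq / t for name, t in row.items() if name != "sequential"}


for fixture, row in MEDIANS.items():
    ratios = ", ".join(f"{name} {r:.2f}x" for name, r in speedups(row).items())
    print(f"{fixture}: {ratios}")
```

This prints `coq_trivial: parallel_bestof 0.42x, parallel_speculative 0.49x`, `lean_trivial: parallel_bestof 0.71x, parallel_speculative 0.82x`, and `idris2_trivial: parallel_bestof 1.60x, parallel_speculative 1.64x`. Only the failure row crosses 1.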

Wave-2 follow-ups (surfaced, not blockers)

  1. Idris2 fixture fails for all strategies. `idris2 --check /tmp/...` reports "Module Prelude not found" because the temp file lives outside the configured source directory and `IDRIS2_DATA_DIR` isn't propagated. Needs a per-prover cwd/env hook in `tryProver`.
  2. Agda rejects mangled temp filenames. `goal_Agda_N.agda` is not a valid module identifier (numeric literal). Either prefix the basename or generate content with `module _ where`.
  3. Sub-second wall-clock times have ±200 ms cold-cache jitter. A five-run median is the right tool; CI should warm the cache first.
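One possible shape for the per-prover cwd/env hook from item 1, sketched in Python rather than Chapel. `ProverLaunch` and `run_prover` are hypothetical names, not the current `tryProver` signature. Following item 1, an Idris2 entry would set `cwd` to the configured source directory and put `IDRIS2_DATA_DIR` in `extra_env`.

```python
import os
import subprocess
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProverLaunch:
    """Hypothetical per-prover launch settings."""
    argv: list[str]                       # e.g. ["idris2", "--check"]
    cwd: Optional[str] = None             # directory to run the prover in
    extra_env: dict[str, str] = field(default_factory=dict)  # e.g. IDRIS2_DATA_DIR


def run_prover(launch: ProverLaunch, goal_path: str, timeout: float = 30.0) -> bool:
    """Run one prover on `goal_path`; True if it exits with status 0."""
    env = dict(os.environ)
    env.update(launch.extra_env)          # overlay on the parent env, don't replace it
    try:
        result = subprocess.run(
            [*launch.argv, goal_path],
            cwd=launch.cwd,
            env=env,
            stdout=subprocess.PIPE,       # keep prover chatter out of our stdout
            stderr=subprocess.PIPE,
            timeout=timeout,
            check=False,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Prover binary missing or run timed out: treat as a failed attempt.
        return False
    return result.returncode == 0
```

Overlaying `extra_env` on a copy of the parent environment keeps `PATH` and similar variables intact, which a plain replacement `env` would drop.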
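For item 2, here is a Python sketch of one variant of the renaming option. It builds the module name from a fixed alphabetic prefix plus the counter, with no underscores, and writes a header that names the same module (Agda requires the top-level module name to match the file name). `write_agda_goal` and the `GoalAgda<N>` scheme are hypothetical.

```python
import re
from pathlib import Path

# Conservative identifier check: a letter, then letters/digits only. This avoids
# underscores (which Agda treats as mixfix holes) and any bare numeric part.
_SAFE_MODULE = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def write_agda_goal(tmpdir: Path, n: int, body: str) -> Path:
    """Write goal `n` to a file whose name matches its `module ... where` header."""
    module = f"GoalAgda{n}"               # hypothetical naming scheme
    if not _SAFE_MODULE.fullmatch(module):
        raise ValueError(f"unsafe Agda module name: {module!r}")
    path = tmpdir / f"{module}.agda"
    path.write_text(f"module {module} where\n\n{body}\n")
    return path
```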
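Item 3's recipe (warm once, then take a five-run median) looks like this as a Python sketch. `median_wall_clock` is a hypothetical helper, not part of `bench_mrr`, and it times whole-process wall clock, including interpreter or binary startup.

```python
import statistics
import subprocess
import time
from typing import Sequence


def median_wall_clock(argv: Sequence[str], runs: int = 5, warmups: int = 1) -> float:
    """Run `argv` `warmups` times untimed, then `runs` times timed; return the median."""
    for _ in range(warmups):
        # Untimed runs populate filesystem and binary caches.
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Why a median: with illustrative samples of 0.31, 0.32, 0.30, 0.31 and 0.51 s, a single 200 ms cold-cache spike pulls the mean up to 0.35 s, while the median stays at 0.31 s.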

A real corpus benchmark (10–30 s proofs, multiple succeeders) is the follow-up expected to show speculative search clearly beating sequential. It is gated on the L3 corpus hand-off and the Wave-2 fixes above.

Test plan

  • `just bench-chapel-mrr` runs locally and emits the expected CSV
  • 5-run medians stable to ±15% (raw data preserved in `baseline.csv`)
  • `chpl -o bench_mrr bench_mrr.chpl` clean compile
  • `which`-output suppression: `bench_mrr 2>/dev/null` now yields pure CSV
  • CI: chapel-build + chapel-smoke remain green
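Two of the test-plan checks can be automated. The Python sketch below rests on two assumptions. First, "stable to ±15%" means every run lies within 15% of its median. Second, "pure CSV" means every non-empty row has the same column count. The helper names and the three-column layout in the usage lines are hypothetical and do not reflect `bench_mrr`'s actual schema.

```python
import csv
import io
import statistics
from typing import Sequence


def within_band(samples: Sequence[float], tolerance: float = 0.15) -> bool:
    """True if every sample lies within ±tolerance (relative) of the median."""
    med = statistics.median(samples)
    return all(abs(s - med) <= tolerance * med for s in samples)


def is_pure_csv(text: str, expected_columns: int) -> bool:
    """True if `text` is non-empty CSV whose rows all have `expected_columns` fields.

    A leaked `which` line such as "/usr/bin/coqc" parses as a 1-field row,
    so it fails this check.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    return bool(rows) and all(len(row) == expected_columns for row in rows)


clean = "fixture,strategy,seconds\ncoq_trivial,sequential,0.313\n"
print(is_pure_csv(clean, 3))                      # True
print(is_pure_csv("/usr/bin/coqc\n" + clean, 3))  # False
```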

🤖 Generated with Claude Code

…file recipe

Closes the deferred "MRR numbers" acceptance item from PR #146's brief.

- `src/chapel/bench_mrr.chpl` invokes sequentialProofSearch /
  parallelProofSearch (best-of) / parallelProofSearchSpeculative against
  three trivially-provable fixtures (Coq, Lean, Idris2) and emits CSV.
- Five-run medians (`docs/bench/2026-05-30-chapel-mrr-baseline.md`)
  show parallel strategies ~1.6× faster than sequential in the failure
  regime; sequential wins the trivial-success regime because cofork
  overhead dominates a 0.3 s prover invocation.
- `parallel_proof_search.chpl::isProverAvailable` now pipes the `which`
  subprocess's stdout/stderr so it stops mangling structured output
  from callers like bench_mrr.
- `just bench-chapel-mrr` reproduces.

Two Wave-2 follow-ups surface and are documented in the bench writeup:
the Idris2 fixture is unprovable in the current setup because
`idris2 --check /tmp/...` can't find Prelude (needs IDRIS2_DATA_DIR
or per-prover cwd hook in tryProver), and Agda rejects the mangled
temp filename `goal_Agda_N.agda` as an invalid module identifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hyperpolymath enabled auto-merge (squash) on May 30, 2026, 19:23.

hyperpolymath merged commit c880bd6 into main on May 30, 2026; 32 of 40 checks passed.

hyperpolymath deleted the feat/chapel-mrr-baseline branch on May 30, 2026, 19:26.