Skip to content

perf: gate MyVarCleanupStack emission on per-sub AST analysis#533

Merged
fglock merged 1 commit intofeature/refcount-perf-combinedfrom
perf/cleanup-needed-analysis
Apr 21, 2026
Merged

perf: gate MyVarCleanupStack emission on per-sub AST analysis#533
fglock merged 1 commit intofeature/refcount-perf-combinedfrom
perf/cleanup-needed-analysis

Conversation

@fglock
Copy link
Copy Markdown
Owner

@fglock fglock commented Apr 21, 2026

Summary

First concrete step on §0 of PR #526's parity plan. New CleanupNeededVisitor (AST pass, ~200 lines, same pattern as TempLocalCountVisitor) decides per-sub whether the MyVarCleanupStack.unregister emission can be skipped at scope exit.

Measurement protocol

Rigorous A/B within a single process to eliminate JVM-startup and system-load confounds:

for i in 1..5; do ./jperl examples/life_bitpacked.pl -r none -g 500; done
for i in 1..5; do JPERL_FORCE_CLEANUP=1 ./jperl examples/life_bitpacked.pl -r none -g 500; done

Result: 8.46 Mcells/s (optimized) vs 8.25 Mcells/s (forced-cleanup), ~2.5% consistent across 5 runs each. Modest but reproducible.

Other benchmarks stay within ±2% noise.

Why it's a modest win, not a big one

The analysis was initially aggressive (also skipped Phase 1 scopeExitCleanup) and broke DBIC txn_scope_guard.t + tie_scalar.t (blessed params need cleanup regardless of whether the sub itself calls bless). Narrowed scope to Phase E only.

Phase E (MyVarCleanupStack.unregister) is the emission master does NOT have — it's specifically our walker-hardening overhead. Skipping it for simple subs is safe and gives the measured gain. The deeper scopeExitCleanup costs are shared with master and aren't addressed here.

See dev/design/life_bitpacked_regression_analysis.md for the full architectural context.

Correctness

  • make unit tests: pass except the pre-existing destroy_eval_die.t#4
  • tie_scalar.t: 12/12 ok
  • DBIC t/storage/txn_scope_guard.t: 18/18 ok
  • JPERL_FORCE_CLEANUP=1 env-var escape hatch for correctness debugging

Merge plan

Targets feature/refcount-perf-combined (not master) so it stacks cleanly into PR #526 before that lands.

What's next

  • Extend the visitor to also handle Phase 1 / 1b skipping, but only with static type analysis that can prove a lexical never holds a blessed-with-DESTROY.
  • Find the remaining structural gap (master is still ~60% faster than this branch on life_bitpacked). Likely requires reducing per-call overhead beyond emission-level changes.

Generated with Devin

A new CleanupNeededVisitor (following the established visitor pattern,
like TempLocalCountVisitor) analyses each sub's body and decides
whether the per-scope-exit MyVarCleanupStack.unregister loop can be
skipped at emit time.

The visitor flags the sub as "needs cleanup" when its body contains
any of:
  - bless / weaken / isweak
  - local
  - defer blocks
  - user sub calls (BinaryOperatorNode "(") — callee may return
    blessed refs
  - method calls (BinaryOperatorNode "->" with IdentifierNode or "("
    on the RHS) — array/hash derefs are recognised and NOT flagged
  - nested SubroutineNode (closure captures)

Builtins (print, push, chr, length, etc.) are parsed as OperatorNode
and don't trigger the sub-call branch, so they're correctly classified
as safe. Overridden builtins imported via `use subs` resolve to user
sub calls at parse time and are flagged automatically.

When the visitor says no-cleanup-needed, EmitStatement skips ONLY the
Phase-E MyVarCleanupStack.unregister loop. Phases 1, 1b (scopeExitCleanup
on blessed scalars/containers) and Phase 3 (MortalList.flush) are
always emitted — skipping those would break blessed-parameter DESTROY,
tie_scalar untie, DBIC txn_scope_guard, and the destroy_eval_die
tests. The correctness-to-perf tradeoff for these is NOT worth taking.

Perf impact:
  life_bitpacked  A/B controlled: 8.25 (force-cleanup) -> 8.46 (-2.5%)
                  5 consecutive runs each, same process, env-toggled
  Other benches:  within ±2% of baseline noise

Correctness:
  - `make` passes except pre-existing destroy_eval_die.t#4
  - tie_scalar.t 12/12 ok
  - DBIC t/storage/txn_scope_guard.t 18/18 ok

Env-var escape hatch: JPERL_FORCE_CLEANUP=1 disables the analysis
globally, forcing the old (slow) emission path. Use when investigating
suspected correctness regressions attributable to this optimization.

This is a modest but real win and follows a clean architectural
pattern (visitor + flag + emit branch). Future work could extend the
visitor to also detect when Phase 1 / 1b can be skipped, but that
requires static type analysis of scalar contents — deferred to a
separate effort.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock fglock merged commit 3e3908f into feature/refcount-perf-combined Apr 21, 2026
@fglock fglock deleted the perf/cleanup-needed-analysis branch April 21, 2026 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant