perf: gate MyVarCleanupStack emission on per-sub AST analysis by fglock · Pull Request #533 · fglock/PerlOnJava

fglock · 2026-04-21T21:19:38Z

Summary

First concrete step on §0 of PR #526's parity plan. New CleanupNeededVisitor (AST pass, ~200 lines, same pattern as TempLocalCountVisitor) decides per-sub whether the MyVarCleanupStack.unregister emission can be skipped at scope exit.

Measurement protocol

Rigorous A/B within a single process to eliminate JVM-startup and system-load confounds:

for i in 1..5; do ./jperl examples/life_bitpacked.pl -r none -g 500; done
for i in 1..5; do JPERL_FORCE_CLEANUP=1 ./jperl examples/life_bitpacked.pl -r none -g 500; done

Result: 8.46 Mcells/s (optimized) vs 8.25 Mcells/s (forced-cleanup), ~2.5% consistent across 5 runs each. Modest but reproducible.

Other benchmarks stay within ±2% noise.

Why it's a modest win, not a big one

The analysis was initially aggressive (also skipped Phase 1 scopeExitCleanup) and broke DBIC txn_scope_guard.t + tie_scalar.t (blessed params need cleanup regardless of whether the sub itself calls bless). Narrowed scope to Phase E only.

Phase E (MyVarCleanupStack.unregister) is the emission master does NOT have — it's specifically our walker-hardening overhead. Skipping it for simple subs is safe and gives the measured gain. The deeper scopeExitCleanup costs are shared with master and aren't addressed here.

See dev/design/life_bitpacked_regression_analysis.md for the full architectural context.

Correctness

make unit tests: pass except the pre-existing destroy_eval_die.t#4
tie_scalar.t: 12/12 ok
DBIC t/storage/txn_scope_guard.t: 18/18 ok
JPERL_FORCE_CLEANUP=1 env-var escape hatch for correctness debugging

Merge plan

Targets feature/refcount-perf-combined (not master) so it stacks cleanly into PR #526 before that lands.

What's next

Extend the visitor to also handle Phase 1 / 1b skipping, but only with static type analysis that can prove a lexical never holds a blessed-with-DESTROY.
Find the remaining structural gap (master is still ~60% faster than this branch on life_bitpacked). Likely requires reducing per-call overhead beyond emission-level changes.

Generated with Devin

A new CleanupNeededVisitor (following the established visitor pattern, like TempLocalCountVisitor) analyses each sub's body and decides whether the per-scope-exit MyVarCleanupStack.unregister loop can be skipped at emit time. The visitor flags the sub as "needs cleanup" when its body contains any of: - bless / weaken / isweak - local - defer blocks - user sub calls (BinaryOperatorNode "(") — callee may return blessed refs - method calls (BinaryOperatorNode "->" with IdentifierNode or "(" on the RHS) — array/hash derefs are recognised and NOT flagged - nested SubroutineNode (closure captures) Builtins (print, push, chr, length, etc.) are parsed as OperatorNode and don't trigger the sub-call branch, so they're correctly classified as safe. Overridden builtins imported via `use subs` resolve to user sub calls at parse time and are flagged automatically. When the visitor says no-cleanup-needed, EmitStatement skips ONLY the Phase-E MyVarCleanupStack.unregister loop. Phases 1, 1b (scopeExitCleanup on blessed scalars/containers) and Phase 3 (MortalList.flush) are always emitted — skipping those would break blessed-parameter DESTROY, tie_scalar untie, DBIC txn_scope_guard, and the destroy_eval_die tests. The correctness-to-perf tradeoff for these is NOT worth taking. Perf impact: life_bitpacked A/B controlled: 8.25 (force-cleanup) -> 8.46 (-2.5%) 5 consecutive runs each, same process, env-toggled Other benches: within ±2% of baseline noise Correctness: - `make` passes except pre-existing destroy_eval_die.t#4 - tie_scalar.t 12/12 ok - DBIC t/storage/txn_scope_guard.t 18/18 ok Env-var escape hatch: JPERL_FORCE_CLEANUP=1 disables the analysis globally, forcing the old (slow) emission path. Use when investigating suspected correctness regressions attributable to this optimization. This is a modest but real win and follows a clean architectural pattern (visitor + flag + emit branch). Future work could extend the visitor to also detect when Phase 1 / 1b can be skipped, but that requires static type analysis of scalar contents — deferred to a separate effort. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

fglock merged commit 3e3908f into feature/refcount-perf-combined Apr 21, 2026

fglock deleted the perf/cleanup-needed-analysis branch April 21, 2026 21:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: gate MyVarCleanupStack emission on per-sub AST analysis#533

perf: gate MyVarCleanupStack emission on per-sub AST analysis#533
fglock merged 1 commit intofeature/refcount-perf-combinedfrom
perf/cleanup-needed-analysis

fglock commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fglock commented Apr 21, 2026

Summary

Measurement protocol

Why it's a modest win, not a big one

Correctness

Merge plan

What's next

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant