common/dbg, execution: PERF_PROFILES env knob + pprof labels for parallel exec phases #21516
Merged
Adds an opt-in profiling surface for the parallel execution stack, default-off.
ERIGON_PERF_PROFILES=true enables runtime.SetBlockProfileRate(1) and
SetMutexProfileFraction(1) at package init in common/dbg, populating
/debug/pprof/{block,mutex} for blocking and contention analysis.
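A minimal sketch of what such a knob could look like, not the code in this PR. The helper names `perfProfilesEnabled` and `applyPerfProfiles` are hypothetical, and the sketch reads the variable in `main` rather than at package init in common/dbg. Only the two `runtime` calls and the variable name come from the description above.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// perfProfilesEnabled parses the knob value (hypothetical helper).
// An unset or unparsable value means "off", matching the default-false behaviour.
func perfProfilesEnabled(raw string) bool {
	on, err := strconv.ParseBool(raw)
	return err == nil && on
}

// applyPerfProfiles turns on block and mutex sampling when enabled is true
// and leaves the runtime defaults untouched otherwise (hypothetical helper).
func applyPerfProfiles(enabled bool) {
	if !enabled {
		return
	}
	runtime.SetBlockProfileRate(1)     // rate 1: record every blocking event
	runtime.SetMutexProfileFraction(1) // rate 1: report every contention event
}

func main() {
	enabled := perfProfilesEnabled(os.Getenv("ERIGON_PERF_PROFILES"))
	applyPerfProfiles(enabled)
	fmt.Println("perf profiles enabled:", enabled)
}
```

With both rates at 1 every event is recorded, which is why keeping the knob off by default is the conservative choice.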
Wraps parallelExecutor.exec in pprof.Do(phase=pe-exec) so child goroutines
inherit the phase tag via context. Adds sub-labels on the goroutines:
- exec-worker (exec.Worker.Run per-task workers)
- exec-loop (parallelExecutor.execLoop block scheduler)
- calculator (commitmentCalculator.loop commitment computation)
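The label inheritance described above can be sketched with the standard `runtime/pprof` API. This is an illustration under assumptions, not the PR's code: `labelsSeenByWorker` is a hypothetical stand-in for a worker goroutine, and whether the PR attaches sub-labels with `pprof.SetGoroutineLabels` or a nested `pprof.Do` is not stated in the description.

```go
package main

import (
	"context"
	"fmt"
	"runtime/pprof"
	"sync"
)

// labelsSeenByWorker runs a stand-in worker inside a phase=pe-exec region and
// returns the phase and sub labels carried by the worker's context.
func labelsSeenByWorker() (phase, sub string) {
	var wg sync.WaitGroup
	pprof.Do(context.Background(), pprof.Labels("phase", "pe-exec"), func(ctx context.Context) {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Layer the sub-label on top of the phase label carried by ctx,
			// then attach the merged set to this goroutine for the profiler.
			workerCtx := pprof.WithLabels(ctx, pprof.Labels("sub", "exec-worker"))
			pprof.SetGoroutineLabels(workerCtx)
			phase, _ = pprof.Label(workerCtx, "phase")
			sub, _ = pprof.Label(workerCtx, "sub")
		}()
		wg.Wait() // the worker's writes are visible once Wait returns
	})
	return phase, sub
}

func main() {
	phase, sub := labelsSeenByWorker()
	fmt.Printf("phase=%s sub=%s\n", phase, sub)
}
```

Because `pprof.WithLabels` merges new labels into the set already on the context, the worker carries both keys, and a CPU profile can be sliced by either one.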
Pure-additive: no behavior change when env unset (default false), and only
cheap label sets at goroutine entry when on.
AskAlexSharov approved these changes on May 30, 2026
Summary
Adds an opt-in profiling surface for the parallel execution stack. Two pieces, both default-off so this is a no-functional-change PR when the env knob is unset.
1. ERIGON_PERF_PROFILES=true env knob: at package init in common/dbg, enables runtime.SetBlockProfileRate(1) and runtime.SetMutexProfileFraction(1), populating /debug/pprof/{block,mutex} for blocking and contention analysis. Default false matches today's behaviour.
2. pprof goroutine labels on the parallel exec hot path: these fire unconditionally (cheap pointer writes to G-local label storage), but are only useful when the CPU profiler is on. Labels:
   - phase=pe-exec: parallelExecutor.exec is wrapped in pprof.Do(...), so all child goroutines inherit via context
   - sub=exec-worker: (*Worker).Run per-task workers
   - sub=exec-loop: parallelExecutor.execLoop block scheduler
   - sub=calculator: commitmentCalculator.loop commitment computation

These make it possible to filter /debug/pprof/profile to the parallel-exec phase via the pprof tags axis and to separate dispatch from EVM from commitment without code-side wall-clock instrumentation.
Why
Parallel-exec perf work needs to attribute CPU to four buckets — dispatch overhead, EVM execution, IO reads, and in-memory writes/version-map — to know where each optimisation lands. Without phase/sub labels, every pprof read mixes pe-exec CPU with txpool, p2p, GC, snapshot-build, etc. With this PR, one tags query separates them cleanly.
Validation
Pulled two 30s CPU profiles against a mainnet node executing live with ERIGON_PERF_PROFILES=true.
Catchup window (5000-block big-jump, 257% CPU)
```
phase: Total 67.04s of 77.70s (86.28%)
67.04s (86.28%): pe-exec
sub: exec-worker 49.13s (63.23%)
exec-loop 13.53s (17.41%)
calculator 1.47s ( 1.89%)
```
Tip window (NewPayload at slot tip, 39.6% CPU, mostly idle)
```
phase: Total 1.27s of 11.92s (10.65%)
1.27s (10.65%): pe-exec
sub: calculator 0.62s (5.20%)
exec-worker 0.47s (3.94%)
exec-loop 0.14s (1.17%)
```
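The share of phase=pe-exec CPU that carries no sub label follows from the two tables above by subtraction. The sketch below only redoes that arithmetic on the reported figures; the helper name `unlabelled` is hypothetical.

```go
package main

import "fmt"

// unlabelled returns the phase time, in seconds, not covered by any sub label.
func unlabelled(phase float64, subs ...float64) float64 {
	rest := phase
	for _, s := range subs {
		rest -= s
	}
	return rest
}

func main() {
	// Figures copied from the catchup and tip tables.
	catchup := unlabelled(67.04, 49.13, 13.53, 1.47)
	tip := unlabelled(1.27, 0.62, 0.47, 0.14)

	fmt.Printf("catchup: %.2fs, %.2f%% of total, %.1f%% of pe-exec\n",
		catchup, 100*catchup/77.70, 100*catchup/67.04)
	fmt.Printf("tip:     %.2fs, %.2f%% of total, %.1f%% of pe-exec\n",
		tip, 100*tip/11.92, 100*tip/1.27)
}
```

Printing the remainder against both the profile total and the pe-exec total makes it explicit which base a quoted percentage refers to.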
Different regimes flip which sub dominates: at catchup the workers saturate, while at tip commitment is the largest slice. The labels split both cleanly. CPU under phase=pe-exec with no sub is the apply-loop main goroutine (per-block result handling; about 3.7% of total CPU in catchup, and about 3% of pe-exec CPU at tip).
Test plan