Skip to content

execution: make block read aheader non-global#21056

Merged
taratorio merged 20 commits intomainfrom
worktree-cmd-enginextest
May 8, 2026
Merged

execution: make block read aheader non-global#21056
taratorio merged 20 commits intomainfrom
worktree-cmd-enginextest

Conversation

@taratorio
Copy link
Copy Markdown
Member

execution: make block read aheader non-global

Summary

Replaces the package-level globalReadAheader in execution/exec/blocks_read_ahead.go with a per-Ethereum instance. Each backend now owns its own *BlockReadAheader (caches + warmup WaitGroup) and threads it through to the consumers that need it.

Why

The globalReadAheader is a process-wide singleton. The exec module's ValidateChain calls AddHeaderAndBody, which spawns a fire-and-forget warmup goroutine tracked on a shared sync.WaitGroup. The eth backend's Stop() waits on that same WaitGroup before closing the chain DB. With multiple Ethereum instances in one process (e.g. the new evm enginextest runner that creates ~36 k testers in a single run, or any other harness that spins up many backends), Add(1) from one tester races with Wait() returning in another. Once the counter momentarily hits zero, a concurrent Add(1) panics:

panic: sync: WaitGroup is reused before previous Wait has returned

This was reproducible at --workers ≥ 16 running evm enginextest against the full EEST blockchain_tests_engine_x set on tmpfs. Single-Erigon-per-process production deployments never trigger it, which is why it has been latent.

Approach

Make the read-aheader a regular struct owned by each Ethereum. The lifetime is the same as the eth backend, the WaitGroup never escapes the backend, and the race vanishes.

Files (13 changed, +69 / −58)

  • execution/exec/blocks_read_ahead.go — exports BlockReadAheader and NewBlockReadAheader(). Methods replace the package-level *Global* wrappers (AddHeaderAndBodyToGlobalReadAheader, AddSendersToGlobalReadAheader, ReadBodyWithTransactionsFromGlobalReadAheader, ReadBlockWithSendersFromGlobalReadAheader, WaitForWarmup). The unused WarmBodyFromGlobalReadAheader is dropped.
  • node/eth/backend.goEthereum.readAheader field, initialized in New to exec.NewBlockReadAheader(), used by Stop() and threaded into the staged-sync configs and NewExecModule.
  • execution/execmodule/exec_module.goExecModule.readAheader field; new arg on NewExecModule. ValidateChain now calls e.readAheader.AddHeaderAndBody(...).
  • execution/stagedsync/stage_senders.go (+ stage_senders_test.go) — SendersCfg.readAheader; new arg on StageSendersCfg; calls cfg.readAheader.{AddSenders, ReadBodyWithTransactions}.
  • execution/stagedsync/stage_execute.goExecuteBlockCfg.readAheader; new arg on StageExecuteBlocksCfg.
  • execution/stagedsync/exec3.go, exec3_serial.go — call te.cfg.readAheader.ReadBlockWithSenders(...) / se.cfg.readAheader.ReadBlockWithSenders(...).
  • execution/stagedsync/stage_witness.go — passes a fresh exec.NewBlockReadAheader() (rewind-only path, doesn't share with the live backend).
  • execution/stagedsync/stageloop/stageloop.goNewDefaultStages / NewPipelineStages / NewInMemoryExecution accept readAheader and plumb it into the senders + execute stage configs.
  • execution/execmodule/execmoduletester/exec_module_tester.go — constructs one readAheader and reuses it across the tester's stage configs and NewExecModule call (mirroring how Ethereum shares a single instance).
  • cmd/integration/commands/stages.go, state_stages.go — pass fresh exec.NewBlockReadAheader() instances at the integration-CLI call sites (each invocation is its own short-lived process).

No runtime semantics change inside the read-aheader itself: same caches, same one-warmup-at-a-time atomic.Bool, same WaitForWarmup behaviour. The only change is that the WaitGroup is per-instance rather than process-wide.

Verification

  • make lint clean.

  • evm enginextest against the full EEST blockchain_tests_engine_x set (fixtures_develop v5.4.0), --workers 8 on TMPDIR=/dev/shm:

    metric value
    Tests passed 63920 / 63920
    Wall time 5:10
    Peak RSS 5.0 GB

    No WaitGroup panic at any worker count tested (8, 16, 24).

  • make test-all with ERIGON_EXECUTION_TESTS_TMPDIR=/mnt/erigon-ramdisk GOGC=80: 222 packages green, 0 failures. All packages this PR touches pass directly:

    • execution/exec (0.10s)
    • execution/execmodule (0.83s), execmoduletester (0.29s)
    • execution/stagedsync (7.18s), stagedsync/bodydownload, stagedsync/headerdownload
    • execution/tests (212.89s, contains the previously-flaky TestInvalidReceiptHashHighMgas)
  • Leak-loop tooling test (build tag leak, in the engine-x test runner package) shows goroutine count flat at 3 across 15 iterations — the read-aheader's warmup goroutine terminates cleanly per Close.

Notes

  • The integration commands (cmd/integration/commands/stages.go, state_stages.go) and the RewindStagesForWitness helper construct fresh exec.NewBlockReadAheader() instances rather than threading one through. Each is a one-shot, short-lived process where the cache wouldn't carry value across the call. If we ever want shared caching there, those sites have a clear seam to do it.
  • Production single-instance Erigon behaviour is unchanged: same caches, same warmup semantics, same Stop() wait, same dbg.ReadAhead gate inside warmBody.

@taratorio taratorio added this pull request to the merge queue May 8, 2026
Merged via the queue into main with commit 97ff85c May 8, 2026
37 of 38 checks passed
@taratorio taratorio deleted the worktree-cmd-enginextest branch May 8, 2026 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants