Revert "logger: avoid mutex contention" #20653
Closed
AskAlexSharov wants to merge 178 commits into
…logic extraction (#19642)

- relax check when several empty blocks (same state root)
- sampling logic extraction

```
for blockNum := range sampler.BlockNums(from, to) {
```
or
```
for start := fromBlock; start <= toBlock; start += chunkSize {
	if sampler.CanSkip() {
		continue
	}
```

Also:
- enable CommitmentHistVal by default
- with `--sample` support
## Summary

- Replace per-key heap allocations in HashSort (ModeDirect and ModeUpdate) with grow-only `batchSlab` and `byteArena` fields on the `Updates` struct
- Arena is pre-allocated per batch and reset between batches; slab stores `KeyUpdate` values contiguously
- Add dedicated HashSort benchmarks for both modes at N=50, 5000, 50000

## Benchmark Results

**HexPatriciaHashed_Process** (main vs this branch, 5 runs, p=0.008):

| Metric | main | branch | Change |
|--------|------|--------|--------|
| Time | 21.3 µs/op | 17.8 µs/op | **16% faster** |
| Memory | 91.5 KB/op | 10.4 KB/op | **89% less** |
| Allocs | 128/op | 106/op | **17% fewer** |

**HashSort-specific** (new benchmarks):

- ModeDirect: constant 18-19 allocs regardless of key count (50 to 50k)
- ModeUpdate: **zero allocations** with full arena reuse

## Test plan

- [x] `go test ./execution/commitment/...` passes
- [x] Benchmarks run with `-count=5 -benchmem` on both main and branch
- [x] `benchstat` comparison confirms statistically significant improvements (p=0.008)

---------

Co-authored-by: bloxster <40316187+bloxster@users.noreply.github.com>
Co-authored-by: Bloxster <bloxster@proton.me>
Co-authored-by: Mark Holt <135143369+mh0lt@users.noreply.github.com>
Co-authored-by: Mark Holt <erigon@dev-bm-e3-ethmainnet-n4.erigon.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
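The grow-only arena idea from the summary can be sketched roughly as follows. This is a minimal illustration only: the `byteArena` type below, its `Copy` and `Reset` methods, and the 64-byte initial capacity are assumptions made for the example, not the actual fields or methods on `Updates`.

```go
package main

import "fmt"

// byteArena is a hypothetical grow-only buffer: key copies are appended to a
// single backing slice instead of being allocated one by one.
type byteArena struct {
	buf []byte
}

// Copy stores a copy of b in the arena and returns the stored bytes.
// The returned slice has its capacity clipped so appends to it cannot
// overwrite neighbouring entries.
func (a *byteArena) Copy(b []byte) []byte {
	start := len(a.buf)
	a.buf = append(a.buf, b...)
	return a.buf[start:len(a.buf):len(a.buf)]
}

// Reset drops the contents but keeps the capacity for the next batch.
// Slices returned before Reset must not be used afterwards.
func (a *byteArena) Reset() { a.buf = a.buf[:0] }

func main() {
	arena := &byteArena{buf: make([]byte, 0, 64)} // pre-allocated once
	for batch := 0; batch < 2; batch++ {
		arena.Reset()
		k1 := arena.Copy([]byte("key-1"))
		k2 := arena.Copy([]byte("key-2"))
		fmt.Println(string(k1), string(k2), cap(arena.buf))
	}
}
```

Once the arena has grown to the size of the largest batch, later batches reuse the same backing array, which is the effect the "zero allocations with full arena reuse" result refers to.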
## Summary

Adds a file-integrity-cache for `CommitmentKvi` and `CommitmentKvDeref` integrity checks. Once a file passes integrity verification, its result is cached using the torrent InfoHash as the fingerprint. Subsequent runs skip re-checking files that have not changed.

## Changes

- **New CLI flag**: `--file-integrity-cache=<path>` for `erigon snapshots integrity`
- **Cache implementation**: `db/integrity/deref_cache.go`
  - Uses SHA1 InfoHash from `.torrent` files (not content hashing)
  - Requires `.torrent` files to exist (no fallback)
  - Cache format: `CheckName\tfile1:hash1\tfile2:hash2...` (e.g. `CommitmentKvi\tv2.0-commitment.0-32.kv:a2de2d...\tv2.0-commitment.0-32.kvi:9158a0...`)
- **Integration**: Cache parameter added to `CommitmentKvi` and `CommitmentKvDeref` functions

## Performance

Tested on Sepolia (4 commitment file sets, 9.4G total):

| Phase | Description | Duration |
|-------|-------------|----------|
| Baseline (no cache) | Full integrity check | 2m 7s |
| Cache creation | Full check + write cache | 2m 7s |
| Cache hit | Skip verified files | **2s** |

**63x speedup** on subsequent runs.

## Test Commands

```bash
# Generate .torrent files if missing
./build/bin/downloader torrent_create --datadir=/path/to/datadir --chain=sepolia --all

# Run with cache
./build/bin/erigon snapshots integrity --datadir=/path/to/datadir \
    --check=CommitmentKvi,CommitmentKvDeref \
    --file-integrity-cache=/tmp/integrity-cache.txt
```

## Notes

- Cache is invalidated automatically when file content changes (different torrent hash)
- Missing `.torrent` files will cause an error (use `torrent_create` to generate them)

---------

Co-authored-by: Alexey Sharov <AskAlexSharov@gmail.com>
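To make the cache line format concrete, here is a small sketch of how one `CheckName\tfile1:hash1\tfile2:hash2...` line could be parsed. The function name `parseCacheLine` and its error handling are hypothetical and are not taken from `db/integrity/deref_cache.go`; the sketch also assumes that hashes are hex strings without colons, so the last `:` in each field separates file name from hash.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCacheLine splits one cache line into the check name and a
// file -> hash map. Hypothetical helper, for illustration only.
func parseCacheLine(line string) (string, map[string]string, error) {
	fields := strings.Split(line, "\t")
	if len(fields) < 2 || fields[0] == "" {
		return "", nil, fmt.Errorf("malformed cache line: %q", line)
	}
	files := make(map[string]string, len(fields)-1)
	for _, f := range fields[1:] {
		// Split at the last colon: everything before it is the file name.
		i := strings.LastIndex(f, ":")
		if i <= 0 || i == len(f)-1 {
			return "", nil, fmt.Errorf("malformed cache entry: %q", f)
		}
		files[f[:i]] = f[i+1:]
	}
	return fields[0], files, nil
}

func main() {
	// Placeholder hashes; real entries carry a full SHA1 InfoHash.
	line := "CommitmentKvi\tv2.0-commitment.0-32.kv:aaaa\tv2.0-commitment.0-32.kvi:bbbb"
	check, files, err := parseCacheLine(line)
	fmt.Println(check, len(files), err)
}
```

A cached entry would then count as a hit only when the hash stored for a file equals the InfoHash read from its current `.torrent` file.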
- Sampling support in `CommitmentKvi`
- Enable CheckCommitmentHistAtBlkRange as a default check
- A bit of a hack: reduced the sample ratio for CheckCommitmentHistAtBlkRange in code (to make the default `integrity` run fast enough).

```
INFO[03-06|05:43:31.354] [integrity] CommitmentKvi kvi=v2.0-commitment.0-4096.kvi kv=v2.0-commitment.0-4096.kv
INFO[03-06|05:44:01.354] [integrity] CommitmentKvi at=19718552/333930881 p=5.9% k/s=657269.458 eta=7m58s kvi=v2.0-commitment.0-4096.kvi
INFO[03-06|05:44:31.354] [integrity] CommitmentKvi at=38533118/333930881 p=11.5% k/s=642211.170 eta=7m39s kvi=v2.0-commitment.0-4096.kvi
```
Example I caught:
```
[integrity] err="[integrity] .ef file has foreign txNum: 100000000 < 114165593, v3.0-logaddrs.192-256.ef, 0000000000000039"
```
In the past I introduced a couple of trivial nil-pointer bugs there, and the tests didn't catch them.
Slot 21651456 Epoch 1353216 ts 1773653580 UTC Mon 16/03/2026, 09:33:00 Cherry-pick of #17485 to `release/3.4`
## Summary - Cherry-pick of #19691 to `release/3.4` - Replace `ChiadoBootstrapNodes` with the ones from Lighthouse's built-in Chiado network config 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…retry in buildVI (#19697) Cherry-pick of #19695 to release/3.4 --- The counter 'i' used to track page offsets in paged history files was declared outside the retry loop. When a recsplit collision occurred and the loop retried, 'i' retained its value from the previous iteration, causing incorrect page offset tracking in the .vi index. Fix: Move `i := 0` inside the retry loop so it's reset on each attempt.
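The bug pattern is easy to reproduce in isolation. The sketch below is not the `buildVI` code: `buildOffsets` and its simulated collision are hypothetical, and only show why the counter has to be declared inside the retry loop.

```go
package main

import "fmt"

// buildOffsets records one offset per page. When failFirst is set, the first
// attempt is thrown away to simulate a recsplit collision and the loop retries.
func buildOffsets(pages int, failFirst bool) []int {
	var offsets []int
	for attempt := 0; ; attempt++ {
		offsets = offsets[:0]
		i := 0 // declared inside the retry loop, so every attempt starts at 0
		for p := 0; p < pages; p++ {
			offsets = append(offsets, i)
			i++
		}
		if failFirst && attempt == 0 {
			continue // simulated collision: retry from scratch
		}
		return offsets
	}
}

func main() {
	fmt.Println(buildOffsets(3, true))
}
```

If `i` were declared before the outer loop, the retried attempt would continue counting from where the failed attempt stopped, and every recorded offset would be shifted.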
…19680)

Story: I noticed in the profiler that during InvertedIndex file merges `deriveFields()` did many re-allocations. Initially I intended to over-allocate by 2x to amortize them, but then realized that we merge the same key multiple times (multiple chunks from multiple small files), and each such merge produces a bigger sequence. So the higher-level file-merge logic is wrong: it needs to merge all chunks at once.

---

Before: incremental pairwise merge. For a key in N files, it reads C + 2C + 3C + ... + NC = O(N²·C) elements and calls EliasFano.ResetForWrite (→ deriveFields → make([]uint64, ...)) N times, allocating a new backing array each time since the merged size always exceeds the previous capacity.

After: single-pass accumulation. It collects all N items for the current key, computes totalCount and maxOff in one scan, initialises the builder once with the correct size (so deriveFields allocates only once per key, at the right capacity), then iterates the N files in ascending txNum order, adding all values in a single pass. O(N·C) reads, O(1) allocations per key.

`BenchmarkInvertedIndexMergeFiles`:
```
┌───────┬─────────┬─────────┬─────────┐
│ files │ before  │ after   │ speedup │
├───────┼─────────┼─────────┼─────────┤
│ 4     │ 840 µs  │ 763 µs  │ 1.1x    │
├───────┼─────────┼─────────┼─────────┤
│ 8     │ 1306 µs │ 819 µs  │ 1.6x    │
├───────┼─────────┼─────────┼─────────┤
│ 16    │ 3116 µs │ 970 µs  │ 3.2x    │
├───────┼─────────┼─────────┼─────────┤
│ 32    │ 9718 µs │ 1305 µs │ 7.4x    │
└───────┴─────────┴─────────┴─────────┘
```
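The single-pass approach can be illustrated with plain slices. This is only a sketch of the allocation pattern: `mergeChunks` is a hypothetical stand-in that uses `[]uint64` instead of the Elias-Fano builder, and it assumes the chunks are already ordered by ascending txNum with non-overlapping ranges.

```go
package main

import "fmt"

// mergeChunks concatenates all chunks of one key in a single pass.
// The total size is computed first, so the output is allocated exactly once.
func mergeChunks(chunks [][]uint64) []uint64 {
	total := 0
	for _, c := range chunks {
		total += len(c)
	}
	out := make([]uint64, 0, total) // one allocation at the final capacity
	for _, c := range chunks {
		out = append(out, c...) // never reallocates: capacity is sufficient
	}
	return out
}

func main() {
	merged := mergeChunks([][]uint64{{1, 2}, {5}, {7, 9}})
	fmt.Println(merged, cap(merged))
}
```

A pairwise version would instead build an intermediate result after every file, re-reading everything merged so far each time, which is where the quadratic read count comes from.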
Unit test for #19697. It could be simplified in the future by passing an external recsplit config.
…curren… (#19725)

cherry-pick of #19710
close #19720

Fix goroutine leak in GetReceipts loop: each tx spawned a goroutine waiting on ctx.Done() to cancel the EVM, but ctx was shared across the entire block execution, so all N goroutines stayed alive until GetReceipts returned. Under concurrent requests for different blocks this caused goroutines and EVMs to accumulate in memory, triggering OOM. Fixed by adding a txDone channel closed immediately after ApplyTransactionWithEVM - the goroutine now exits as soon as its tx completes, keeping at most 1 goroutine alive at a time.

Add execSem semaphore (capacity max(1, GOMAXPROCS/2), env R_EXEC_CONCURRENCY) to limit the number of blocks executing concurrently in GetReceipts. Each parallel block execution holds its own IntraBlockState which can be hundreds of MB for busy mainnet blocks; without a bound, many concurrent requests for different blocks exhaust memory. Requests already served from the LRU cache bypass the semaphore entirely.

./run_perf_tests.py .... _eth_get_block_receipts_21M_20K.tar -t 500:10,5000:1

with current SW:
[3. 1] daemon: executes test qps: 500 time: 10 -> [R=100.00% max=10.597s]
[3. 2] daemon: executes test qps: 500 time: 10 -> [R=100.00% max=13.36s]
[3. 3] daemon: executes test qps: 500 time: 10 -> [R=100.00% max=26.353s]
[3. 4] daemon: executes test qps: 500 time: 10 -> [R=100.00% max=1m56s]
[3. 5] daemon: executes test qps: 500 time: 10 -> [R=100.00% max=2m44s]
[4. 1] daemon: executes test qps: 5000 time: 1 -> [R=100.00% max=4m11s]
[4. 2] daemon: executes test qps: 5000 time: 1 -> test failed: server is Dead for OOM

with new SW:
[3. 1] daemon: executes test qps: 500 time: 10 -> [R=100.00% max=11.591s]
[3. 2] daemon: executes test qps: 500 time: 10 -> [R=100.00% max=5.202s]
[3. 3] daemon: executes test qps: 500 time: 10 -> [R=100.00% max=4.947s]
[3. 4] daemon: executes test qps: 500 time: 10 -> [R=100.00% max=5.061s]
[3. 5] daemon: executes test qps: 500 time: 10 -> [R=100.00% max=5.009s]
[4. 1] daemon: executes test qps: 5000 time: 1 -> [R=100.00% max=13.789s]
[4. 2] daemon: executes test qps: 5000 time: 1 -> [R=100.00% max=14.032s]
[4. 3] daemon: executes test qps: 5000 time: 1 -> [R=100.00% max=14.02s]
[4. 4] daemon: executes test qps: 5000 time: 1 -> [R=100.00% max=13.958s]
[4. 5] daemon: executes test qps: 5000 time: 1 -> [R=100.00% max=13.924s]

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…g step-rebase (#19730) Cherry-pick of #19723 to `release/3.4`. ## Summary - `erigon seg step-rebase` renames snapshot data files, invalidating existing torrent metadata - `.torrent` files in subdirectories (domain/, history/, accessor/, idx/) were already deleted, but `erigondb.toml.torrent` in the snapshots root was missed - Add it to the deletion list
## Summary - Cherry-pick of #19728 to `release/3.4` - Documents the `[rX.Y]` prefix convention for PRs that cherry-pick commits to `release/X.Y` branches ## Test plan - [x] Read updated file and confirm the new line appears in the Conventions section
…the chain in use (#19727) Cherry-pick of #19722 (merged to main as a18eb9b) to `release/3.4`. ## Summary - **Lazy-parse webseed TOML**: instead of parsing all 8 chains' webseed TOML at init time, store raw bytes in `EmbeddedWebseedsRaw` and parse on demand via `GetEmbeddedWebseeds(chain)` — only the chain actually in use gets parsed. - **Remove no-op re-assignment**: `LoadRemotePreverified` was redundantly re-building the same `KnownWebseeds` map; removed. - **Inline `webseedsParse`**: folded into its sole caller `GetEmbeddedWebseeds`. - **Rename `KnownWebseeds` → `EmbeddedWebseeds`**: clearer naming — `EmbeddedWebseedsRaw` for the raw bytes map, `GetEmbeddedWebseeds()` for parsed access.
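A parse-on-demand lookup of this kind might look roughly like the sketch below. It is not the PR's code: `lazyWebseeds`, `newLazyWebseeds`, and `parseLines` are hypothetical, and the "parser" just splits whitespace-separated URLs instead of decoding TOML, to keep the example dependency-free.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// lazyWebseeds keeps raw bytes per chain and parses a chain only when it is
// first requested.
type lazyWebseeds struct {
	mu     sync.Mutex
	raw    map[string][]byte
	parsed map[string][]string
	parses int // number of parse calls, for demonstration
}

func newLazyWebseeds(raw map[string][]byte) *lazyWebseeds {
	return &lazyWebseeds{raw: raw, parsed: make(map[string][]string)}
}

// parseLines is a stand-in for the real TOML parsing.
func parseLines(raw []byte) []string {
	return strings.Fields(string(raw))
}

// Get returns the parsed webseeds for chain, parsing at most once per chain.
func (l *lazyWebseeds) Get(chain string) []string {
	l.mu.Lock()
	defer l.mu.Unlock()
	if urls, ok := l.parsed[chain]; ok {
		return urls
	}
	raw, ok := l.raw[chain]
	if !ok {
		return nil
	}
	l.parses++
	urls := parseLines(raw)
	l.parsed[chain] = urls
	return urls
}

func main() {
	w := newLazyWebseeds(map[string][]byte{
		"mainnet": []byte("https://a.example https://b.example"),
		"sepolia": []byte("https://c.example"),
	})
	fmt.Println(w.Get("sepolia"), w.parses)
}
```

Chains that are never requested are never parsed, which is the init-time saving the summary describes.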
…s hash collisions (#19741) (#19753) ## Summary Cherry-pick of #19741 from `performance` to `release/3.4`. - Migrate bloatnet configuration from perf-devnet-2 (which is down) to perf-devnet-3 - Fix genesis hash collisions in `registeredChainsByGenesisHash` caused by multiple chains sharing the same genesis hash - Replace the genesis-hash-based chain lookup with explicit `chainName` parameter threading ## Changes - **`execution/chain/spec/config.go`** — Remove `registeredChainsByGenesisHash` map - **`execution/state/genesiswrite/genesis_write.go`** — Add `chainName` parameter to `WriteGenesisState` - **`p2p/sentry/sentry_grpc_server.go`** — Accept `bootnodes`/`dnsNetwork` as explicit params instead of looking them up from genesis hash - **`node/eth/backend.go`** — Resolve and pass chain-specific bootnodes/DNS params - **`cl/clparams/config.go`** — Update bloatnet ENR and fork configuration for perf-devnet-3 ## Test plan - [x] Cherry-pick applies cleanly (no conflicts) - [x] `make lint` passes - [x] `make erigon integration` builds successfully - [ ] CI passes
- db/state/domain.go: Move keyPos and valPos declarations inside the retry loop
remove alert: `[experiment] enabling commitment history. this is an experimental flag so run at your own risk!`
…nts (#19793) ## Summary Cherry-pick of #19777 to `release/3.4`. - Use `GetCodeHash` instead of `ResolveCodeHash` in the contract address collision check so that an EIP-7702 delegation designator (`0xef0100...`) is treated as non-empty code—matching geth's behavior and preventing a consensus split. - Includes tests for both CREATE and CREATE2 collision with delegated accounts. Fixes ethereum-bounty/erigon#2 ## Test plan - [x] `TestCreate2CollisionWithEIP7702Delegation` passes - [x] `TestCreateCollisionWithEIP7702Delegation` passes - [x] `go build ./execution/vm/...` compiles cleanly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-pick of #19819 to `release/3.4`. ## Summary - Adds `mcp` to the `BINARIES` list in `.github/workflows/release.yml` so it is built and included in release artifacts and Docker images. Generated with Claude.
…build (#19853) ## Summary Cherry-pick of #19825 and #19847 to `release/3.4`. - Improve release workflow robustness (early release existence check, artifact verification, non-fatal skopeo delete, GitHub App token for publish step) - Inline debian package build (removing separate reusable workflow file) - Fix debian control file heredoc inside for loop - Update docker actions to v4.0.0 (Node.js 24) - Pin `actions/create-github-app-token` to v2.2.1 SHA (Node.js 24)
This is often useful for experiments such as "how do collate+build impact chaindata size". Such experiments often need to build files in a blocking way, but without waiting for the merge to finish.
…ch-up (#20555) Cherry-pick of #20546. InsertBlocks on 3.4 uses `BeginRw` directly (not `SharedDomains`), so the `inserters.go` portion of #20546 is a no-op here. This cherry-pick keeps the general `SharedDomains` contract improvement: `SeekCommitment` always fully restores the SD, and `NewSharedDomains` attaches `ErrBehindCommitment` as an environmental signal probed via `TxNums.Last` at the construction boundary.
Cherry-pick of #20573 to release/3.4.
``` [INFO] [04-15|05:15:50.145] [integrity] StateRootVerifyByHistory blks/s=0.5 checked=3.72k/2.48k windows=1860/2488 blkRange=1-24.87M ``` `checked` overflow
Needed for Gnosis (Fulu fork) and for security updates
…BuildFilesInBackground (#20594) Port of #20445 to release/3.4. ## Summary Fixes two related bugs in the domain state layer that cause gas mismatches during execution (#20169). ### Bug 1: Collation/pruning race `BuildFilesInBackground` collates domain files by reading values from the DB via per-worker read-transactions opened at collation time. Between the moment a step is deemed ready and the collation reads, execution can commit new batches that overwrite step S values with step S+1 data. The collated file for step S then contains wrong values. After pruning removes the DB entries, `GetLatest` returns the stale file values, causing SSTORE gas mispricing. **Fix:** Restructure `buildFiles` into two phases: 1. **Phase 1 (sequential, single read-tx):** Collate all domains and indices using one MDBX read-transaction. All collations see the exact same DB snapshot — zero race window. 2. **Phase 2 (parallel, no DB access):** Build files from collations. This is the expensive part and remains fully parallel. Additionally, add a committed txNum guard: don't collate step S until `ComputeCommitment` has confirmed all data through the end of step S is flushed. ### Bug 2: Unwind entry visibility after filing After a reorg, the unwind restores domain entries tagged with a step derived from the unwind-target txNum. If `BuildFilesInBackground` has filed that step, `getLatestFromDb` discards the restored entry (step covered by files) and falls through to `getLatestFromFiles`, returning the stale end-of-step value instead of the changeset-restored value. **Fix:** Pass `Aggregator.EndTxNumMinimax()` into the unwind and tag restored entries with `max(naturalStep, currentFilesEndStep)`. 
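The Bug 2 fix comes down to one comparison. The sketch below assumes that "covered by files" means the entry's step is below the current files' end step; both helper names are hypothetical and the real check in `getLatestFromDb` may be expressed differently.

```go
package main

import "fmt"

// unwindStepTag returns the step used to tag an entry restored by an unwind:
// max(naturalStep, currentFilesEndStep).
func unwindStepTag(naturalStep, currentFilesEndStep uint64) uint64 {
	if currentFilesEndStep > naturalStep {
		return currentFilesEndStep
	}
	return naturalStep
}

// visibleInDB models the assumed rule: a DB entry whose step is already
// covered by files is discarded in favour of the file value.
func visibleInDB(step, currentFilesEndStep uint64) bool {
	return step >= currentFilesEndStep
}

func main() {
	const naturalStep, filesEnd = 3, 5
	fmt.Println(visibleInDB(naturalStep, filesEnd))                          // entry tagged with the natural step
	fmt.Println(visibleInDB(unwindStepTag(naturalStep, filesEnd), filesEnd)) // entry tagged with the bumped step
}
```

Under that assumption, an entry restored with its natural step is hidden behind the already-built file, while the bumped tag keeps the changeset-restored value visible.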
### Changes - `db/state/aggregator.go`: single-tx collation + parallel file building; committed txNum guard; `stepFullyCommitted` helper; pass current file boundary to unwind - `db/state/domain.go`: bump unwind step tag past current filed range - `db/state/aggregator_test.go`: `TestAggregator_CommittedTxNumGuard` - `db/state/domain_test.go`: `TestDomain_CollationIsolatedFromLaterSteps`, `TestDomain_UnwindRestoredEntryVisibility` - `execution/commitment/commitmentdb/commitment_context.go`: export `DecodeTxBlockNums`; fix `minUnwindale` typo; short-value length guard
…ionData (#20600)

## Summary

Cherry-pick of #19783 from `main` to `release/3.4`. Fixes a panic observed on `alex/collation_race_fix_34` (and `release/3.4`) when a validator client polls `/eth/v1/validator/attestation_data` before Caplin has synced to head.

- **`SyncedDataManager.CommitteeCount`** (`synced_data.go`): added `accessLock.RLock()` + nil check on `headState`, consistent with every other accessor in the same file.
- **Debug-log defer** (`block_production.go`): guard against nil `committeeIndex` in the deferred log closure.

## Reproduction

Start Erigon on `release/3.4` while a validator client is actively polling. The VC calls `GET /eth/v1/validator/attestation_data` before Caplin reaches head → panic in HTTP handler goroutine:

```
panic: runtime error: invalid memory address or nil pointer dereference
github.com/erigontech/erigon/cl/phase1/core/state.(*CachingBeaconState).CommitteeCount(0x0, ...)
github.com/erigontech/erigon/cl/beacon/synced_data.(*SyncedDataManager).CommitteeCount(...)
github.com/erigontech/erigon/cl/beacon/handler.(*ApiHandler).GetEthV1ValidatorAttestationData.func1()
```

## Test plan

- [x] Clean cherry-pick from `main` (commit `0f3624a17b`)
- [x] `go test ./cl/beacon/synced_data/... ./cl/beacon/handler/... -short` passes on main

Generated with Claude
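The accessor guard follows a common pattern, sketched here with simplified types. `syncedData`, `beaconState`, and the choice to return 0 when no head state is available are assumptions for illustration; the real `SyncedDataManager` may report the not-synced case differently.

```go
package main

import (
	"fmt"
	"sync"
)

// beaconState is a simplified stand-in for the cached head state.
type beaconState struct {
	committees uint64
}

func (s *beaconState) CommitteeCount(epoch uint64) uint64 { return s.committees }

// syncedData guards headState with a read-write lock.
type syncedData struct {
	accessLock sync.RWMutex
	headState  *beaconState
}

// CommitteeCount takes the read lock and checks for a missing head state
// before dereferencing it.
func (d *syncedData) CommitteeCount(epoch uint64) uint64 {
	d.accessLock.RLock()
	defer d.accessLock.RUnlock()
	if d.headState == nil {
		return 0 // assumption: callers treat 0 as "not synced yet"
	}
	return d.headState.CommitteeCount(epoch)
}

// setHead installs a new head state under the write lock.
func (d *syncedData) setHead(s *beaconState) {
	d.accessLock.Lock()
	defer d.accessLock.Unlock()
	d.headState = s
}

func main() {
	d := &syncedData{}
	fmt.Println(d.CommitteeCount(1)) // before sync: no panic
	d.setHead(&beaconState{committees: 64})
	fmt.Println(d.CommitteeCount(1))
}
```

Without the nil check, the first call would dereference a nil `headState`, which matches the trace in the reproduction section.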
Cherry-pick of #20609 to release/3.4 Co-authored-by: bendertherobert <bendertherobert@gmail.com>
…t+Put` (#20643) revert changes introduced in https://github.com/erigontech/erigon/pull/19914/changes
Adds FAQ entries for the MCP server to the Help Center. ## Changes - `docs/gitbook-help/frequently-asked-questions-faqs.md` — FAQ #23: what is the MCP server; FAQ #24: how to connect Claude Desktop ## Notes - `mcp.md` already exists and is complete — no changes - Port 8553 and MCP flags already in `default-ports.md` and `configuring-erigon/README.md` - Second PR targeting `main`: #20605
This reverts commit 02e843b.
Reverts #20454
reason: gnosis has regression