execution/commitment: stop cloning bytes at TrieContext.Branch#21524
execution/commitment: stop cloning bytes at TrieContext.Branch#21524yperbasis wants to merge 2 commits into
Conversation
profiling at tip (MDBX, 60s window) pinned bytes.Clone as the #1 single allocation source at 64.3 GB / 60s (~16.85% of total alloc traffic). 37.86 GB / 60s (~631 MB/sec) of that came from the defensive common.Copy at TrieContext.Branch. The clone was redundant: every downstream consumer either reads the slice inline (cell.fillFromFields copies into pre-allocated arrays; merger.Merge consumes and produces a fresh buffer; trie_reader parses bytes into cells; unfoldBranchNode similar) or clones it itself at the queue boundary (getDeferredUpdate clones both prefix and prev when storing in the deferred-update pool). Branch's clone was a third copy of bytes that nothing needed to retain. Document the new contract (borrowed slice valid for the current ComputeCommitment scope) and update the test that exercised the old "returns owned bytes" guarantee to verify the new aliasing guarantee instead. After-measurement on the rig is blocked by an unrelated stage-loop persistence inconsistency (chaindata head pointer ahead of state writes on restart) that's reproducing on every restart cycle today; the change is mathematically minimal (single Clone removal + test contract update) and unit tests + make lint pass, so shipping on the math. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Live A/B measurement — both MDBX, both at mainnet tipSetup: two erigon instances on disjoint 4-P-core sets (CPUs 0-7 and 8-15) of an i9-13900KS, aux on E-cores. Both started from a byte-for-byte clone of the same chaindata @ block 25,204,537. Same prysm CL driving both via engine-mux fan-out (primary=opt, shadow=main, latency recorded mux-side end-to-end). Both clean (0 errors). Erigon-side direct execution time (strips mux RPC overhead, n=100 head-validated blocks per side):
Mux-side end-to-end (n=331
The honest engine-API gain is the 2-7% / 2-7 ms number from the erigon-side timer. The mux-side gap is wider by ~10 ppt — likely GC-pause and RPC serialization under main's higher heap churn. Alloc profile confirms the mechanism (lifetime alloc, same uptime each):
A one-line change that erases 5.5 GB of redundant clone work and shaves 2-7 ms off newPayload p50. |
There was a problem hiding this comment.
Pull request overview
This PR reduces allocation pressure in the commitment trie path by making TrieContext.Branch return the domain reader’s slice directly instead of cloning it, relying on callers to consume inline or clone at retention boundaries.
Changes:
- Removed the
common.CopyinTrieContext.Branch. - Added an implementation-level borrowed-slice contract comment.
- Updated the unit test to assert aliasing/borrowing behavior instead of copy ownership.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
execution/commitment/commitmentdb/commitment_context.go |
Changes Branch to return borrowed data directly and documents the new lifetime expectation. |
execution/commitment/commitmentdb/commitment_context_test.go |
Updates the branch ownership test to validate borrowed-slice aliasing behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…e PatriciaContext interface Address Copilot's review on PR #21524: the borrowed-slice lifetime note was only on the concrete *TrieContext implementation; callers that depend on the interface couldn't see it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM \u2014 small obvious change (~39 changed lines across 3 files)
Summary
`TrieContext.Branch` clones every returned byte slice via `common.Copy` (= `bytes.Clone`). Profiling shows this single site is responsible for 37.86 GB / 60s (~631 MB/s) of allocated bytes at tip — about 10% of total alloc traffic and 59% of all `bytes.Clone` calls — for no benefit: every downstream consumer either reads inline (`cell.fillFromFields` copies into pre-allocated arrays; `merger.Merge`, `trie_reader`, `unfoldBranchNode`, `EncodeBranch` consume the bytes synchronously) or clones at its own queue boundary (`getDeferredUpdate` clones both `prefix` and `prev` when storing in the deferred-update pool).
The upstream Clone was a third copy of bytes nothing needed to retain.
Change
Test plan
Profile context
60-second steady-state CPU+alloc capture on the live-rig (MDBX, in-flight at tip) was used to identify the hotspot. Top alloc consumers pre-fix:
```
64.3 GB / 60s bytes.Clone (cum)
└─ 37.86 GB via TrieContext.Branch
└─ 11.27 GB via encodeDeferredUpdate (legitimate, queue boundary)
└─ 11.09 GB via getDeferredUpdate (legitimate, queue boundary)
└─ ... rest
79.0 GB / 60s sortableBuffer.Put (etl pooling - separate problem)
75.0 GB / 60s DomainPut (chains down to MemBatch writes)
63.1 GB / 60s GetLatest (cascades into Branch + downstream)
```