TraceDB: Snapshot-backed state for the trace baker #3360
Codecov Report
@@ Coverage Diff @@
## main #3360 +/- ##
==========================================
- Coverage 59.26% 59.26% -0.01%
==========================================
Files 2110 2111 +1
Lines 174242 174387 +145
==========================================
+ Hits 103259 103343 +84
- Misses 62053 62081 +28
- Partials 8930 8963 +33
Stacks on the trace baker PR. Captures an O(1) memiavl snapshot of the SC tree at EndBlock and serves trace re-execution from in-RAM state instead of SS-pebble.

memiavl: refcount *Snapshot. Tree.Copy() Acquires; Snapshot.Close unmaps only on the final release. Without this a held copy was a use-after-munmap waiting to happen: the background snapshot rewrite calls Tree.ReplaceWith → snapshot.Close mid-flight, segfaulting any held copy. The internal rewrite goroutine also drops its clone's ref so the refcount can reach zero.

Committer interface gains Copy(). memiavl delegates to *DB.Copy. composite returns nil when flatkv is engaged so the snapshot path silently falls back.

storev2 rootmulti adds SnapshotSCStore + CacheMultiStoreFromCommitter.

EVM keeper: TraceSnapshotStore (bounded by-height map) and EndBlock capture keyed by snapshot.Version() (= H-1 at EndBlock(H)).

App: SnapshotAwareRPCContextProvider builds the sdk.Context directly from the snapshot CMS to skip the throwaway CacheMultiStoreWithVersion that CreateQueryContext would otherwise make.

Configurable via [evm]:
trace_bake_use_snapshot (default false)
trace_bake_snapshot_window (default 64)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
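The refcounting rule described in the commit message can be illustrated with a minimal Go sketch. This is not the memiavl implementation: `refSnapshot`, `newRefSnapshot`, and the `unmapped` flag are hypothetical names, and the flag merely stands in for the real munmap call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// refSnapshot is a hypothetical stand-in for a memory-mapped snapshot.
// The unmapped flag models munmap; it is not the real memiavl type.
type refSnapshot struct {
	mu       sync.Mutex
	refs     int
	unmapped bool
}

// newRefSnapshot returns a snapshot holding one reference for its creator.
func newRefSnapshot() *refSnapshot { return &refSnapshot{refs: 1} }

// Acquire adds a reference; a copy of the tree would call this.
func (s *refSnapshot) Acquire() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.refs <= 0 {
		return errors.New("acquire on a closed snapshot")
	}
	s.refs++
	return nil
}

// Close drops one reference and unmaps only when the last one is released.
// Closing more often than acquiring is reported as an error.
func (s *refSnapshot) Close() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.refs <= 0 {
		return errors.New("snapshot closed more times than acquired")
	}
	s.refs--
	if s.refs == 0 {
		s.unmapped = true // the real code would munmap here
	}
	return nil
}

// Unmapped reports whether the backing memory has been released.
func (s *refSnapshot) Unmapped() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.unmapped
}

func main() {
	snap := newRefSnapshot()
	_ = snap.Acquire()           // a trace takes a copy
	_ = snap.Close()             // background rewrite replaces the tree
	fmt.Println(snap.Unmapped()) // false: the copy still holds a reference
	_ = snap.Close()             // the trace releases its copy
	fmt.Println(snap.Unmapped()) // true
}
```

In this model an over-close returns an error rather than panicking, which matches the review discussion's view that refcount mismatches are bugs worth surfacing.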
Co-authored-by: Cursor <cursoragent@cursor.com>
Point to the existing memiavl MemNode gauges and trace-baker counters that operators should watch when enabling the snapshot path on high-throughput nodes. No new metrics; just signposts to ones that already exist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve conflict in sei-db/state_db/sc/types/types.go by keeping both the Copy() addition from this branch and the Importer doc comment from main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    homeDir string,
    stateStore types.StateStore,
    isPanicOrSyntheticTxFunc func(ctx context.Context, hash common.Hash) (bool, error), // used in *ExcludeTraceFail endpoints
    traceCtxProviders ...TraceContextProvider,
Only the first element is ever read. A non-variadic *TraceContextProvider parameter (or a small options struct) avoids the "what if someone passes two" ambiguity and keeps the signature self-documenting.
Keeping the variadic override to keep the PR scope minimal, since only one provider is intentionally supported.
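For readers weighing the two signatures in this thread, here is a small Go sketch of the trade-off. The `provider` type, `pickVariadic`, `options`, and `pickFromOptions` are hypothetical names invented for illustration; they are not the evmrpc API.

```go
package main

import "fmt"

// provider is a hypothetical stand-in for TraceContextProvider.
type provider func(height int64) string

// pickVariadic mirrors the "only the first element is read" behaviour:
// extra providers are accepted by the compiler but silently ignored.
func pickVariadic(fallback provider, overrides ...provider) provider {
	if len(overrides) > 0 && overrides[0] != nil {
		return overrides[0]
	}
	return fallback
}

// options is the alternative shape: one optional field, so passing two
// providers is impossible by construction.
type options struct {
	TraceCtxProvider provider // nil means "use the fallback"
}

// pickFromOptions returns the configured provider, or the fallback if unset.
func pickFromOptions(fallback provider, opts options) provider {
	if opts.TraceCtxProvider != nil {
		return opts.TraceCtxProvider
	}
	return fallback
}

func main() {
	def := func(h int64) string { return fmt.Sprintf("default@%d", h) }
	a := func(h int64) string { return fmt.Sprintf("snapshot@%d", h) }
	b := func(h int64) string { return fmt.Sprintf("ignored@%d", h) }

	fmt.Println(pickVariadic(def, a, b)(10))                            // snapshot@10
	fmt.Println(pickVariadic(def)(10))                                  // default@10
	fmt.Println(pickFromOptions(def, options{TraceCtxProvider: a})(10)) // snapshot@10
}
```

The variadic form keeps existing call sites compiling unchanged, which is the scope argument made in the reply; the struct form trades that for a signature that cannot be misused.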
}

// Close releases all retained snapshots.
func (s *TraceSnapshotStore) Close() {
TraceSnapshotStore.Close() returns nothing, but inside it ignores per-snapshot release errors:
x/evm/keeper/trace_snapshot.go:97-112
_ = releaser.ReleaseSnapshotRefs()
Same as the WARN above: refcount mismatches are real bugs. Either return error or log at WARN level on close to keep ops visibility.
Updated: release/close errors are now logged at WARN instead of being swallowed.
    defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyEndBlocker)
    // Bake height-1: at EndBlock(N) the indexer's safe latest is N-1, so
    // N-1 is the most recent block guaranteed to be queryable.
    // Bake height-1: at EndBlock(N) the indexer's safe latest is N-1. When
EndBlock snapshot semantics are subtle: the comment is dense and easy to misread
The off-by-one here is correct but non-obvious: storev2/rootmulti.flush() doesn't run until Commit(), so at EndBlock(N) the SC tree state is state_after_commit_of_(N-1) and snap.Version() == N-1. The baker then traces H=N-1, whose initializeBlock calls ctxProvider(H-1) = ctxProvider(N-2), which finds snap[N-2] Put at the previous EndBlock(N-1). Worth a one-line "lined up because rs.flush is called from Commit, not from EndBlock" in the comment to save the next reader a half-hour.
Also: initializeBlock calls the provider twice — once for prevBlockHeight (H-1) and once for blockNumber (H) (for WithNextMs). That means a single trace leases both snap[H-1] and snap[H]. As long as TraceBakeSnapshotWindow >= 2 this is fine, but if an operator misconfigures window=1 the second lease will miss and silently fall through to SS-pebble for oracle_mem/WithNextMs. Consider clamping window to >= 2 (or whatever the documented minimum is) at config-load time.
Fixed the root cause by enforcing a snapshot window >= 2. Leaving the comment wording as is for now.
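The window argument in this thread can be checked with a toy model. The sketch below assumes a store that keeps only the newest `window` heights and a trace of height H that needs both snap[H-1] and snap[H]; `clampWindow`, `boundedHeights`, and `simulate` are hypothetical names, not the keeper's actual code.

```go
package main

import "fmt"

// minSnapshotWindow is an assumed floor: one trace of height H leases
// both snap[H-1] and snap[H], so at least two heights must be retained.
const minSnapshotWindow = 2

// clampWindow raises too-small configured values to the floor.
func clampWindow(configured int) int {
	if configured < minSnapshotWindow {
		return minSnapshotWindow
	}
	return configured
}

// boundedHeights is a hypothetical by-height store that keeps only the
// newest `window` heights, evicting the oldest on overflow.
type boundedHeights struct {
	window  int
	heights []int64 // insertion order, oldest first
}

// Put records a height and evicts anything beyond the window.
func (b *boundedHeights) Put(h int64) {
	b.heights = append(b.heights, h)
	if len(b.heights) > b.window {
		b.heights = b.heights[len(b.heights)-b.window:]
	}
}

// Has reports whether a snapshot for height h is still retained.
func (b *boundedHeights) Has(h int64) bool {
	for _, x := range b.heights {
		if x == h {
			return true
		}
	}
	return false
}

// canTrace reports whether both leases needed to trace height h would hit.
func canTrace(b *boundedHeights, h int64) bool {
	return b.Has(h-1) && b.Has(h)
}

// simulate runs EndBlock(1..n), capturing the snapshot at version N-1 each
// time, and then checks whether tracing H = n-1 is fully snapshot-backed.
func simulate(window int, n int64) bool {
	store := &boundedHeights{window: window}
	for block := int64(1); block <= n; block++ {
		store.Put(block - 1) // snap.Version() == N-1 at EndBlock(N)
	}
	return canTrace(store, n-1)
}

func main() {
	fmt.Println(simulate(1, 100))              // false: one lease misses
	fmt.Println(simulate(clampWindow(1), 100)) // true
}
```

Under these assumptions a window of 1 always leaves one of the two leases unsatisfied, which is why clamping at config-load time removes the silent fallback.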
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 4e744cf.
    for _, snap := range toRelease {
        releaseSnapshotRefs(snap)
    }
}
TraceSnapshotStore.Close swallows errors silently on shutdown
Low Severity
TraceSnapshotStore.Close() returns nothing, so HandleClose in app.go cannot collect or propagate snapshot-release errors, unlike every other resource in that function which appends errors to errs. A refcount mismatch (an over-close indicating a real bug) would only appear as a WARN log line instead of surfacing through the standard error-return path.
Describe your changes and provide context
StateReleaseFunc, avoiding GC finalizers.
trace_bake_use_snapshot; falls back when the backend cannot provide a snapshot.

Testing performed to validate your change
go test ./sei-db/state_db/sc/memiavl -run 'TreeCopy|Snapshot' -count=1

Note
Medium Risk
Medium risk because it changes how debug_trace* replays acquire/release historical state and adds new snapshot lifecycle management; while gated behind config and with fallbacks, bugs could cause leaks, stale reads, or trace failures under load.

Overview
Adds optional snapshot-backed trace baking: when trace_bake_use_snapshot is enabled and the node is using the storev2 root multi-store, EndBlock captures an in-memory SC snapshot and the trace baker/debug endpoints replay against that snapshot instead of SS-pebble.

Plumbs a new TraceContextProvider through evmrpc so debug tracing can obtain a context plus a release function, and implements SnapshotAwareRPCContextProvider to build contexts from leased snapshots (with consensus params populated) and fall back to the existing RPC context on misses/unsupported backends.

Extends the state-commit/memiavl layer to support Committer.Copy() snapshots and explicit ref-release, including refcounted memiavl snapshot mmaps and new tests to prevent use-after-unmap and verify eviction/lease semantics.
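The "context plus a release function, with fallback on a miss" shape described in the overview can be sketched in Go as follows. All names here (`ctxProvider`, `releaseFunc`, `leaseStore`, `snapshotAware`) are hypothetical, and a plain string stands in for an sdk.Context; the real provider's signature may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// releaseFunc must be called once the trace is done with the state.
type releaseFunc func()

// ctxProvider is a hypothetical analogue of TraceContextProvider; the
// string result stands in for an sdk.Context.
type ctxProvider func(height int64) (string, releaseFunc, error)

// leaseStore is a toy snapshot store that counts outstanding leases.
type leaseStore struct {
	mu     sync.Mutex
	leases map[int64]int // height -> active leases; key present = snapshot retained
}

// lease returns a release function if a snapshot for height is retained.
func (s *leaseStore) lease(height int64) (releaseFunc, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.leases[height]; !ok {
		return nil, false
	}
	s.leases[height]++
	var once sync.Once // makes a double release harmless
	return func() {
		once.Do(func() {
			s.mu.Lock()
			defer s.mu.Unlock()
			s.leases[height]--
		})
	}, true
}

// snapshotAware tries the snapshot store first and falls back on a miss.
func snapshotAware(store *leaseStore, fallback ctxProvider) ctxProvider {
	return func(height int64) (string, releaseFunc, error) {
		if release, ok := store.lease(height); ok {
			return fmt.Sprintf("snapshot@%d", height), release, nil
		}
		return fallback(height)
	}
}

func main() {
	store := &leaseStore{leases: map[int64]int{41: 0}}
	fallback := func(h int64) (string, releaseFunc, error) {
		return fmt.Sprintf("rpc@%d", h), func() {}, nil
	}
	provide := snapshotAware(store, fallback)

	ctx, release, _ := provide(41)
	fmt.Println(ctx) // snapshot@41
	release()

	ctx, release, _ = provide(7)
	fmt.Println(ctx) // rpc@7
	release()
}
```

Returning an explicit release function, rather than relying on finalizers, makes the lease lifetime visible at the call site; the fallback always returns a no-op release so callers can release unconditionally.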