execution/cache: fix StateCache dropping deleted storage keys, causing stale reads#20265
Merged
execution/cache: fix StateCache dropping deleted storage keys, causing stale reads#20265
Conversation
StateCache.Get() treated empty/nil cached values as cache misses,
returning (nil, false). This caused reads after a DomainDel to fall
through to the DB and return the stale pre-delete value:
1. DomainDel → stateCache.Delete(key)
2. GetLatest reads mem batch → nil → stateCache.Put(key, nil)
3. Put sees len(nil)==0 → cache.Delete(key) — removes from cache!
4. Next tx: stateCache.Get(key) → not found → falls through to DB
5. DB returns old non-zero value → state corruption
This caused Erigon nodes on Hoodi to get stuck: block 2523978
cleared 5 storage slots, but the clearing was lost in the cache,
so block 2525503 saw stale non-zero values causing a 91,100 gas
mismatch.
Fix:
- Get: remove the len(v)==0 → false short-circuit
- Put: store []byte{} sentinel for empty values instead of deleting
The bug only affects main; release/3.3 does not have StateCache in
SharedDomains.
Closes #20169
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Member
|
There's a subtle issue with the sentinel approach. In cache.Put(key, []byte{})But if !ok || c.sizeFunc(value) == 0 {
return zero, false // sentinel is treated as "not found"!
}Since To fix this properly, either:
The |
…ession tests
GenericCache.Get had the same bug: `sizeFunc(value) == 0` caused it to
return false for empty []byte{} sentinels stored by the StateCache fix.
Add regression tests that reproduce the exact SharedDomains pattern:
Put(key, nil) followed by Get(key) must return (_, true) — a cache hit
with empty value, not a cache miss.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Exercises the exact sequence that caused Hoodi nodes to get stuck: 1. Put(key, value) — cache a storage value 2. Delete(key) — simulate DomainDel 3. Put(key, nil) — simulate GetLatest caching a deletion from mem buffer 4. Get(key) — must return (nil, true), not (nil, false) Step 4 is the critical assertion: with the bug, Get returns false (cache miss), causing the caller to fall through to the DB and read the stale pre-delete value. This caused SSTORE gas to be mispriced (warm noop instead of cold new-slot creation), producing a 91,100 gas mismatch at Hoodi block 2525503. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…yte{}
common.Copy (bytes.Clone) returns nil for nil input, so the
len(value)==0 branch is unnecessary. Storing nil directly in the
map works: GenericCache.Get returns (nil, true) for it, which is
a more natural return value for deleted keys.
Also fix import ordering in blockchain_test.go.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AskAlexSharov
approved these changes
Apr 1, 2026
yperbasis
added a commit
that referenced
this pull request
Apr 1, 2026
…g stale reads (#20265) ## Summary Fix a bug in `StateCache.Get()`/`Put()` that caused deleted storage keys to be treated as cache misses, allowing stale pre-delete values to be read from the DB. **Root cause of #20169** (Erigon nodes stuck on Hoodi). ## The bug `StateCache` is an in-memory cache in `SharedDomains` that caches storage values between transactions in the same batch. When a storage slot is deleted (written to zero via `DomainDel`): 1. `DomainDel` → `stateCache.Delete(key)` — removes key from cache 2. `GetLatest` reads mem batch → gets `nil` → `stateCache.Put(key, nil)` — but `Put` sees `len(nil)==0` → calls `cache.Delete(key)` again (removes from cache!) 3. Next tx reads → `stateCache.Get(key)` → key not found → returns `(nil, false)` — **cache miss** 4. Falls through to DB → returns the old non-zero value that's still in MDBX ## Impact On Hoodi, block 2523978 tx[21] cleared 5 storage slots at contract `0x1d150609ee9edcc6143506ba55a4faaedd562cd9`. The clearing was lost in the cache, so those slots retained stale non-zero values. 1,525 blocks later, block 2525503 tx[25] hit those slots: 5 SSTOREs cost 91,100 less gas than expected (warm/existing writes instead of cold/new slot creation), causing a gas mismatch that stuck the node. ## Fix - **`Get`**: Remove the `len(v)==0 → false` short-circuit so that cached empty values are returned as `([]byte{}, true)` (cache hit with empty value), not `(nil, false)` (cache miss). - **`Put`**: When `len(value)==0`, store `[]byte{}` sentinel instead of deleting the key, so the "deleted" state is preserved in cache. ## Why release/3.3 is not affected release/3.3's `SharedDomains` does not have `stateCache`. All reads go directly to the mem batch → DB, bypassing the buggy cache layer. ## Test plan - [x] `go test ./execution/cache/...` passes - [x] Re-execution from clean snapshots on Hoodi produces correct state (block 2525503 passes with `diff=0`) - [ ] Full Hoodi sync from scratch with this fix 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
yperbasis
added a commit
that referenced
this pull request
Apr 1, 2026
… causing stale reads (#20271) Cherry-pick of #20265 to `release/3.4`. ## Summary Fix a bug in `StateCache.Get()`/`Put()` that caused deleted storage keys to be treated as cache misses, allowing stale pre-delete values to be read from the DB. **Root cause of #20169** (Erigon nodes stuck on Hoodi). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 2, 2026
yperbasis
added a commit
that referenced
this pull request
Apr 12, 2026
The StateCache fix (PR #20265) is a valid cache correctness improvement, but it wasn't the root cause of Issue #20169 — the real fix was in the domain/snapshot layer (PR #20483). A cache miss falling through to the membatch/MDBX path should return the correct value. - Remove TestStateCacheDeletedStorageSSTOREGas (duplicates TestStateCache_PutEmpty_ThenGet_IsCacheHit, with wrong attribution) - Simplify TestStateCache_PutEmpty_ThenGet_IsCacheHit comment to frame it as a cache correctness property, not a regression test for #20169 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-merge-queue Bot
pushed a commit
that referenced
this pull request
Apr 12, 2026
…he tests (#20498) ## Summary - Remove `TestStateCacheDeletedStorageSSTOREGas` — it duplicated `TestStateCache_PutEmpty_ThenGet_IsCacheHit` and incorrectly attributed the SSTORE gas mismatch (#20169) to the StateCache layer - Simplify the comment on `TestStateCache_PutEmpty_ThenGet_IsCacheHit` to frame it as a cache correctness property, not a regression test for #20169 The StateCache fix (PR #20265) is a valid cache correctness improvement, but the root cause of #20169 was in the domain/snapshot layer — a cache miss falling through to the membatch/MDBX path should return the correct value. The real fix was PR #20483. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: info@weblogix.biz <admin@10gbps.weblogix.it>
6 tasks
github-merge-queue Bot
pushed a commit
that referenced
this pull request
Apr 13, 2026
## Summary - Wire the existing `RevertWithDiffset` into `unwindExec3` so the state cache is surgically invalidated during unwinds — keys touched by the unwound blocks are evicted, untouched keys remain cached - Previously `RevertWithDiffset` was implemented but never called; the cache survived unwinds only because `ValidateAndPrepare` cleared it on the next block (hash mismatch fallback) - The `lastExecHash` (already fetched with a comment anticipating this wiring) detects if the cache was modified by a different execution path (e.g. rolled-back `ValidatePayload`), falling back to a full clear ## Context Follow-up to #20265 and #20483. Those PRs fixed correctness bugs in the cache layer and domain unwind; this PR completes the picture by actually invalidating the cache during unwinds rather than relying on the next-block safety net. ## Test plan - [x] `TestRevertWithDiffset_SurgicalEviction` — touched keys evicted, untouched preserved, blockHash updated - [x] `TestRevertWithDiffset_HashMismatch_ClearsAll` — mismatched revertFromHash triggers full clear - [x] `TestRevertWithDiffset_AccountChange_EvictsCode` — account diffs also evict corresponding code entries - [x] `TestRevertWithDiffset_ThenValidateAndPrepare_Continuity` — end-to-end: surgical revert then re-execution preserves surviving cache data - [x] `make lint` clean - [ ] Full sync test on a testnet with unwinds 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix a bug in
StateCache.Get()/Put()that caused deleted storage keys to be treated as cache misses, allowing stale pre-delete values to be read from the DB.Root cause of #20169 (Erigon nodes stuck on Hoodi).
The bug
StateCacheis an in-memory cache inSharedDomainsthat caches storage values between transactions in the same batch. When a storage slot is deleted (written to zero viaDomainDel):DomainDel→stateCache.Delete(key)— removes key from cacheGetLatestreads mem batch → getsnil→stateCache.Put(key, nil)— butPutseeslen(nil)==0→ callscache.Delete(key)again (removes from cache!)stateCache.Get(key)→ key not found → returns(nil, false)— cache missImpact
On Hoodi, block 2523978 tx[21] cleared 5 storage slots at contract
0x1d150609ee9edcc6143506ba55a4faaedd562cd9. The clearing was lost in the cache, so those slots retained stale non-zero values. 1,525 blocks later, block 2525503 tx[25] hit those slots: 5 SSTOREs cost 91,100 less gas than expected (warm/existing writes instead of cold/new slot creation), causing a gas mismatch that stuck the node.Fix
Get: Remove thelen(v)==0 → falseshort-circuit so that cached empty values are returned as([]byte{}, true)(cache hit with empty value), not(nil, false)(cache miss).Put: Whenlen(value)==0, store[]byte{}sentinel instead of deleting the key, so the "deleted" state is preserved in cache.Why release/3.3 is not affected
release/3.3's
SharedDomainsdoes not havestateCache. All reads go directly to the mem batch → DB, bypassing the buggy cache layer.Test plan
go test ./execution/cache/...passesdiff=0)🤖 Generated with Claude Code