Skip to content

db/state: prune TemporalMemBatch overlay entries past unwindToTxNum#20625

Merged
yperbasis merged 1 commit intomainfrom
fix/temporal-mem-batch-unwind-prune-overlay
Apr 17, 2026
Merged

db/state: prune TemporalMemBatch overlay entries past unwindToTxNum#20625
yperbasis merged 1 commit intomainfrom
fix/temporal-mem-batch-unwind-prune-overlay

Conversation

@mh0lt
Copy link
Copy Markdown
Contributor

@mh0lt mh0lt commented Apr 17, 2026

Summary

Fix a post-unwind stale-read in the in-memory domain overlay that causes gas-used mismatches on post-Fusaka mainnet catch-up.

TemporalMemBatch stores per-key overlay writes as []dataWithTxNum, each entry stamped with its write txNum. getLatest returns dataWithTxNums[len-1] — the most recently appended entry — without comparing it against sd.unwindToTxNum. Unwind() only recorded unwindToTxNum + an unwindChangeset; it never touched sd.domains / sd.storage. Since unwindChangeset is consulted only when the overlay misses, any key still present in the overlay kept returning a pre-unwind write made inside the unwound txNum range.

Observed symptom

Post-Fusaka mainnet catch-up, after a forkchoice-driven unwind. The first re-executed block reads a storage slot that was first-written inside the unwound range. The overlay returns the post-target write, flipping the SSTORE cost from SET (20000) to RESET (2900)exactly a 17100-gas shortfall per affected slot.

  • Block 24,898,955: diff=-34200 (2 slots × 17100)
  • Block 24,899,403: diff=-73829 (compound — several slots affected)

Live trace instrumentation (not shipped) captured 3,082 SD_STALE_READ events between an unwind at txNum=3454259398 and the resulting mismatch at block 24,899,403.

Fix

On Unwind, walk sd.domains and sd.storage and drop any dataWithTxNum whose txNum > unwindToTxNum. If a key's slice empties out, delete the key so the unwindChangeset fallback (or the underlying tx) supplies the pre-unwind answer. Runs under sd.latestStateLock so the transition is atomic to concurrent reads. Storage-btree mutations are staged after Scan to respect btree iterator rules.

Regression test

TestSharedDomain_UnwindDoesNotRestoreOverlayForNewKey in db/state/execctx/domain_shared_test.go:

  • writes a first-time storage value at txNum=100
  • calls Unwind(50)
  • asserts the overlay no longer returns the post-target write

Test fails on pre-fix code with the exact error that mirrors the mainnet symptom; passes with this change.

Test plan

  • go test -short ./db/state/... — all pass
  • make lint — 0 issues
  • make erigon — builds clean
  • Manual sync verification: post-Fusaka mainnet with chaindata/ wiped and snapshots/ retained (same repro that produced block-24,899,403 mismatch) — sync progresses past the catch-up / first-forkchoice-unwind window without a gas mismatch.

Known adjacent issues, out of scope

DB-layer siblings of this bug exist separately and are not addressed here:

  1. db/state/domain.go:1317 — on-disk unwind currently conflates nil ("different step, skip") and []byte{} ("key was absent, write tombstone") via if len(value) > 0, so first-time writes in the unwound range leave no restoring tombstone.
  2. db/state/domain.go:1665getLatestFromDb discards deletion markers at a step within file range, so the caller falls through to getLatestFromFiles, which has no concept of deletions and returns stale pre-deletion data.

Both were previously addressed by #20483 and reverted by #20509 while regressions were investigated. They need their own narrower fixes with dedicated regression guards and should be staged as separate PRs so they're independently revertible.

🤖 Generated with Claude Code

TemporalMemBatch tracks per-key overlay writes as []dataWithTxNum with
each write stamped by txNum. getLatest returns dataWithTxNums[len-1]
— the most recently appended entry — without comparing it against
sd.unwindToTxNum. Unwind() only recorded unwindToTxNum and an
unwindChangeset; it never touched sd.domains / sd.storage. Since
unwindChangeset is consulted only when the overlay misses, any key
still present in the overlay would keep returning a pre-unwind
write made inside the unwound txNum range.

Observed on post-Fusaka mainnet catch-up: after a forkchoice-driven
unwind, the first re-executed block reads a storage slot that was
first-written inside the unwound range. The overlay returns the
post-target write, flipping the SSTORE cost from SET (20000) to
RESET (2900) — exactly a 17100-gas shortfall per affected slot.
On mainnet block 24899403 this compounded to diff=-73829 across
several slots; block 24898955 showed diff=-34200.

Fix: on Unwind, walk sd.domains and sd.storage and drop any
dataWithTxNum whose txNum > unwindToTxNum. If a key's slice
empties out, delete the key so the unwindChangeset fallback (or
the underlying tx) can supply the pre-unwind answer.
Runs under sd.latestStateLock so the transition is visible
atomically to concurrent reads.

A regression test in db/state/execctx/domain_shared_test.go
(TestSharedDomain_UnwindDoesNotRestoreOverlayForNewKey) writes
a first-time storage value at txNum=100, calls Unwind(50), and
asserts the overlay no longer returns the post-target write.
Test fails on the pre-fix code and passes with this change.

Note: DB-layer siblings of this bug exist separately (missing
tombstone on domain unwind for []byte{} diffs; getLatestFromDb
fallthrough past in-range deletion markers, previously addressed
by #20483 and reverted by #20509). Those need their own surgical
fixes with dedicated regression guards and are intentionally
out-of-scope here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sudeepdino008
Copy link
Copy Markdown
Member

getLatestFromFiles, which has no concept of deletions and returns stale pre-deletion data.

files do have notion of deleted keys vs missing keys

@yperbasis yperbasis added this pull request to the merge queue Apr 17, 2026
Merged via the queue into main with commit 2698b38 Apr 17, 2026
36 checks passed
@yperbasis yperbasis deleted the fix/temporal-mem-batch-unwind-prune-overlay branch April 17, 2026 16:32
github-merge-queue Bot pushed a commit that referenced this pull request Apr 18, 2026
…estFromDb (#20627)

## Summary

Re-land the narrow core of #20483 (reverted by #20509), addressing the
DB-layer siblings of the post-unwind stale-read bug. Complements #20625
which addressed the in-memory overlay side.

Two linked bugs in the DB-level domain unwind and read paths caused
stale data resurrection after an unwind that reverted a first-time write
or a deletion inside the unwound range.

### Symptom observed on mainnet

Post-Fusaka mainnet catch-up sync with a chaindata/ wipe (snapshots/
retained). On the first re-executed block after a forkchoice-driven
unwind, execution returned less gas than the header — diffs observed of
`-34200` (block 24,898,955), `-73829` (block 24,899,403), `-118872`
(block 24,899,594). The diffs break down into multiples of `SSTORE_SET -
SSTORE_RESET = 17100` plus cold-access flips of `2600`.

Previous PR #20625 cleared the first two by pruning the in-memory
overlay on Unwind. Block 24,899,594 still failed because the overlay was
already flushed to DB at Unwind time — the stale-read path now was
purely DB-layer, addressed here.

## Fixes

### 1. `unwind()` must restore empty tombstones —
`db/state/domain.go:1317` (both DupSort and LargeValues paths)

`DomainEntryDiff.Value` has three shapes, documented in
`db/kv/helpers.go:247`:
- `nil` — "different step": prev value lives at another step, skip
restore (legacy V0 changeset shape)
- `[]byte{}` — "no previous value": key was absent before this step;
write an empty tombstone so the key appears absent again after the
unwind completes
- non-empty — restore the actual previous value

The old guard `if len(value) > 0` skipped *both* `nil` and `[]byte{}`,
leaving no tombstone after unwinding a first-time write. Corrected to
`if value != nil`.

### 2. `getLatestFromDb` must treat empty values as authoritative —
`db/state/domain.go:1665`

Empty-value entries are deletion tombstones. The step-age guard
previously discarded them when their step fell within the frozen file
range, causing the caller to fall through to `getLatestFromFiles`.
Frozen files encode deletions as absence, so the file returns the
pre-deletion value — the exact resurrection the deletion was meant to
prevent. Empty entries are now returned as `found=true` regardless of
step age; the step-age guard still applies to non-empty entries.

## Relationship to #20483 / #20509

This is a deliberate re-land of the narrow core of #20483. Key
differences:

- **Excluded**: the LargeValues cross-check in `getLatest` (PR #20483
lines 1697–1737). That handled an interrupted-`PruneSmallBatches` edge
case specific to `CodeDomain` / `RCacheDomain` and was the most likely
source of the regressions that motivated #20509's revert. If it proves
necessary, it can be added later as a separate PR with its own dedicated
test.
- **Included**: both matched fixes (write-side tombstone + read-side
authoritativeness). They are a pair — neither is useful alone; staging
them separately risks merging half and shipping a version that's still
broken.

## Tests

- `TestDomain_UnwindRestoresDeletionMarker` (DupSort + LargeValues
subtests) — writes a key, deletes it, re-writes within the same step,
builds files, then unwinds the re-write. **Fails on pre-fix code**
(getLatest returns the stale post-unwind `value2` from the frozen file);
passes with the fix. Exercises the write-side bug directly.

- `TestDomain_DeletedKeyNotResurrectedByFiles` (DupSort + LargeValues
subtests) — documents the read-side contract by writing a key and
deleting it at a step that falls within file range. Passes on current
`main` even without the fix (the file-build + prune semantics evolved
since #20483 and no longer hit the specific stale-read in this exact
test scenario), but retained as a forward regression guard and as
documentation of the invariant.

## Test plan

- [x] `go test -short ./db/state/...` — all pass
- [x] `make lint` — 0 issues
- [x] `make erigon` — builds clean
- [x] Manual repro of the production symptom (mainnet sync from
snapshots-only) in combination with #20625 — sync progresses past the
catch-up / first-forkchoice-unwind window without a gas mismatch.
(Re-verification run in progress alongside this PR.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AskAlexSharov pushed a commit that referenced this pull request Apr 21, 2026
…ry orphan (#20716)

Cherry-pick of #20710 to release/3.4.

## r3.4-specific adaptations

- `sd1.Merge` call in the new test dropped `ctx` — release/3.4's
signature is `Merge(uint64, *SharedDomains, uint64)`.
- `TestSharedDomain_RepeatedUnwindAcrossStepBoundary` is skipped: it
depends on the overlay-pruning semantics from PR #20625 (not present in
release/3.4). The single-unwind variants
(`TestSharedDomain_UnwindAcrossStepBoundary`,
`TestSharedDomain_MergeUnwindAcrossStepBoundary`,
`TestSharedDomain_UnwindWithDeleteAcrossStepBoundary`) still guard the
per-step orphan-entry invariant this PR fixes.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: info@weblogix.biz <admin@10gbps.weblogix.it>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants