Skip to content

[r3.4] db/state: fix CursorHeap tie-break to prefer RAM over DB over FILE#20318

Merged
sudeepdino008 merged 4 commits intorelease/3.4from
fix/cursor-heap-priority-34
Apr 4, 2026
Merged

[r3.4] db/state: fix CursorHeap tie-break to prefer RAM over DB over FILE#20318
sudeepdino008 merged 4 commits intorelease/3.4from
fix/cursor-heap-priority-34

Conversation

@sudeepdino008
Copy link
Copy Markdown
Member

@sudeepdino008 sudeepdino008 commented Apr 3, 2026

  • When DomainDelPrefix iterates storage keys during account deletion, it merges RAM, DB, and FILE cursors via a min-heap
  • RAM and DB cursors both used endTxNum: math.MaxUint64, so when the same key existed in both, heap order was undefined
  • If DB won the tie, the old value was used as prevVal in the history entry instead of the current uncommitted RAM value
  • This caused corrupted history values that propagated to commitment root mismatches

Fix: add tie-breaker in CursorHeap.Less — when endTxNum is equal, prefer RAM_CURSOR > DB_CURSOR > FILE_CURSOR

Safety: CursorHeap is used in 3 contexts:

  • merge.go / deduplicate.go: only FILE_CURSOR — tie-break is a no-op (same type, returns false)
  • DomainLatestIterFile (DB + FILE): DB endTxNum = step * stepSize always differs from FILE endTxNum = item.endTxNum - 1, so tie-break rarely fires. If it does, DB correctly wins over FILE
  • debugIteratePrefixLatest (RAM + DB + FILE): the bug fix — RAM now correctly wins over DB when both use MaxUint64

Fixes #20246

Copy link
Copy Markdown
Collaborator

@Giulio2002 Giulio2002 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AskAlexSharov
Copy link
Copy Markdown
Collaborator

@sudeepdino008 but files must be over DB? Because pruning can leave garbage?

@sudeepdino008
Copy link
Copy Markdown
Member Author

@sudeepdino008 but files must be over DB? Because pruning can leave garbage?

i'll think about it.

@sudeepdino008 sudeepdino008 marked this pull request as draft April 4, 2026 04:22
@sudeepdino008
Copy link
Copy Markdown
Member Author

@sudeepdino008 but files must be over DB? Because pruning can leave garbage?

i'll think about it.

@AskAlexSharov we should be fine as long as we don't do out of order pruning (e.g removing more latest step first, then older step). Half pruning is okay.

@sudeepdino008 sudeepdino008 marked this pull request as ready for review April 4, 2026 15:44
@sudeepdino008 sudeepdino008 enabled auto-merge (squash) April 4, 2026 15:44
@sudeepdino008 sudeepdino008 merged commit 6f6ca81 into release/3.4 Apr 4, 2026
17 of 22 checks passed
@sudeepdino008 sudeepdino008 deleted the fix/cursor-heap-priority-34 branch April 4, 2026 15:55
AskAlexSharov pushed a commit that referenced this pull request Apr 5, 2026
…FILE (#20318)

- When `DomainDelPrefix` iterates storage keys during account deletion,
it merges RAM, DB, and FILE cursors via a min-heap
- RAM and DB cursors both used `endTxNum: math.MaxUint64`, so when the
same key existed in both, heap order was undefined
- If DB won the tie, the old value was used as `prevVal` in the history
entry instead of the current uncommitted RAM value
- This caused corrupted history values that propagated to commitment
root mismatches

Fix: add tie-breaker in `CursorHeap.Less` — when `endTxNum` is equal,
prefer `RAM_CURSOR > DB_CURSOR > FILE_CURSOR`

**Safety**: `CursorHeap` is used in 3 contexts:
- `merge.go` / `deduplicate.go`: only `FILE_CURSOR` — tie-break is a
no-op (same type, returns false)
- `DomainLatestIterFile` (DB + FILE): DB `endTxNum = step * stepSize`
always differs from FILE `endTxNum = item.endTxNum - 1`, so tie-break
rarely fires. If it does, DB correctly wins over FILE
- `debugIteratePrefixLatest` (RAM + DB + FILE): the bug fix — RAM now
correctly wins over DB when both use `MaxUint64`

Fixes #20246
@AskAlexSharov AskAlexSharov requested a review from Copilot April 6, 2026 00:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a deterministic ordering bug in db/state’s cursor merge heap that could select stale DB values over uncommitted RAM values when keys and endTxNum tie, leading to incorrect history entries and downstream commitment root mismatches (per #20246).

Changes:

  • Add a tie-break rule in CursorHeap.Less to prefer RAM_CURSOR > DB_CURSOR > FILE_CURSOR when endTxNum is equal.
  • Add unit tests covering the new tie-break behavior and the previously failing merge-loop scenario.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
db/state/domain_stream.go Makes heap ordering deterministic on (key, endTxNum) ties by prioritizing RAM over DB over FILE.
db/state/domain_stream_test.go Adds tests to ensure tie-break priority is enforced and that merge-loop selection prefers RAM values.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +39 to +47
func TestCursorHeapPriority_MaxUint64TieBreak(t *testing.T) {
// Reproduces the exact scenario from #20246: RAM and DB both use
// math.MaxUint64 as endTxNum in debugIteratePrefixLatest.
// RAM must win so DomainDelPrefix picks up the current uncommitted value.
var h CursorHeap
heap.Init(&h)
heap.Push(&h, &CursorItem{t: DB_CURSOR, key: []byte("key1"), val: []byte("0f"), endTxNum: ^uint64(0), reverse: true})
heap.Push(&h, &CursorItem{t: RAM_CURSOR, key: []byte("key1"), val: []byte("15"), endTxNum: ^uint64(0), reverse: true})

Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In TestCursorHeapPriority_MaxUint64TieBreak the comment says the scenario uses math.MaxUint64, but the test value is ^uint64(0). Consider either importing math and using math.MaxUint64 or updating the comment so the test intent matches the literal being used (this avoids confusion when scanning failures).

Copilot uses AI. Check for mistakes.
github-merge-queue Bot pushed a commit that referenced this pull request Apr 6, 2026
…x, xsync fix, warmuper fix, logs debug, unused code (#20337)

Cherry-picks from release/3.4 to main:

- integrity: blk-range chk to pre-build "changed keys" index before calc
state root (#20302)
- up gql and grpc (#20304)
- stepSize: all tooling to use `tx.Debug().StepSize()` instead of
`DefaultStepSize` constant (#20280)
- db/state: fix CursorHeap tie-break to prefer RAM over DB over FILE
(#20318)
- xsync deprecated use fix (#20331)
- warmuper.WaitAndClose: cancel work before wait (#20332)
- logs: move some Info logs to Debug level (to simplify logs for Users)
(#20329)
- db: unused code remove (#20334)

---------

Co-authored-by: moskud <sudeepdino008@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants