Skip to content

[dev][RangeIndex] Fix FlagRecovered never cleared on RangeIndex stub after restore#1746

Merged
badrishc merged 2 commits into
devfrom
tiagonapoli/fix-rangeindex-stub-flags
Apr 30, 2026
Merged

[dev][RangeIndex] Fix FlagRecovered never cleared on RangeIndex stub after restore#1746
badrishc merged 2 commits into
devfrom
tiagonapoli/fix-rangeindex-stub-flags

Conversation

@tiagonapoli
Copy link
Copy Markdown
Collaborator

@tiagonapoli tiagonapoli commented Apr 29, 2026

Summary

After checkpoint recovery, FlagRecovered is set on RangeIndex stubs but never cleared. On a second eviction cycle, RestoreTreeFromFlush picks the stale checkpoint snapshot instead of the fresh flush.bftree, losing post-recovery writes.

Root Cause

RecreateIndex() only sets TreeHandle — it does not clear FlagRecovered. No other code path clears the flag either.

Scenario:

  1. Create disk-backed RI key, insert data, checkpoint
  2. Recover from checkpoint — FlagRecovered set on stub
  3. First access restores from checkpoint snapshot (correct)
  4. Write new data (tree diverges from checkpoint)
  5. Eviction triggers flush (writes flush.bftree with current state) then frees tree
  6. Next access — RestoreTreeFromFlush sees FlagRecovered still set — picks stale checkpoint snapshot
  7. Post-recovery writes are lost

Fix

Clear FlagRecovered in RecreateIndex() after setting the new TreeHandle. This is safe because Tsavorite always calls OnFlush (which writes a fresh flush.bftree) before eviction, so any post-restore eviction will have the correct flush snapshot available.

Testing

  • Added RIRecoverThenSecondEvictionUsesFlushSnapshotTest
  • Without fix: test fails (Expected: "new-value" But was: null)
  • With fix: test passes
  • All 56 existing RI tests pass (no regressions)

Copilot AI review requested due to automatic review settings April 29, 2026 03:40
@tiagonapoli tiagonapoli changed the title Fix FlagRecovered never cleared on RangeIndex stub after restore [dev][RangeIndex] Fix FlagRecovered never cleared on RangeIndex stub after restore Apr 29, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a RangeIndex recovery/eviction correctness issue where FlagRecovered remained set on restored stubs after checkpoint recovery, causing subsequent eviction cycles to restore from a stale checkpoint snapshot instead of the latest flush.bftree.

Changes:

  • Clear RangeIndexStub.FlagRecovered in RangeIndexManager.RecreateIndex() after the first post-recovery restore.
  • Add a regression test covering the “recover → write → flush → evict → restore” second-eviction scenario to ensure post-recovery writes are preserved.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
libs/server/Resp/RangeIndex/RangeIndexManager.Index.cs Clears FlagRecovered during RIRESTORE stub recreation so later restores choose the flush snapshot.
test/Garnet.test/RespRangeIndexTests.cs Adds an end-to-end regression test reproducing the stale-checkpoint-on-second-eviction bug.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@tiagonapoli tiagonapoli force-pushed the tiagonapoli/fix-rangeindex-stub-flags branch 2 times, most recently from 3aad625 to 69212e8 Compare April 29, 2026 03:59
After server recovery from a checkpoint, MarkRecoveredFromCheckpoint sets
FlagRecovered on the RangeIndex stub. However, RecreateIndex (called when
the tree is lazily restored) never cleared this flag. This caused
RestoreTreeFromFlush to incorrectly pick the stale checkpoint snapshot
instead of the fresh flush.bftree on a second eviction cycle, losing all
writes made between recovery and the second eviction.

Fix: Clear FlagRecovered in RecreateIndex() after setting the new
TreeHandle. This is safe because eviction always goes through OnFlush
first (which writes a fresh flush.bftree), so any post-restore eviction
will have the correct flush snapshot available.

Add regression test RIRecoverThenSecondEvictionUsesFlushSnapshotTest that
exercises the full scenario: checkpoint -> recover -> restore -> write
new data -> flush -> promote -> write more -> evict -> restore again ->
assert post-recovery data is present.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tiagonapoli tiagonapoli force-pushed the tiagonapoli/fix-rangeindex-stub-flags branch from 69212e8 to f0360b9 Compare April 29, 2026 04:06
@badrishc badrishc merged commit f77a391 into dev Apr 30, 2026
2 checks passed
@badrishc badrishc deleted the tiagonapoli/fix-rangeindex-stub-flags branch April 30, 2026 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants