Drain deferred DoReadPage callbacks before disposing scan iterator #1719

Merged
badrishc merged 3 commits into dev from badrishc/scan-iterator-fix-2 on Apr 22, 2026

@badrishc
Collaborator

BufferAndLoad defers page reads via BumpCurrentEpoch(() => DoReadPage(...)). These drain callbacks capture 'this' and access instance state (frame memory, loadCompletionEvents, etc.). For read-ahead pages, the callback may still be in the epoch drain list when the scan completes and the iterator is disposed. The callback then accesses freed native frame memory (AccessViolationException) or null arrays (NullReferenceException). Because the exception is thrown from within a drain callback, it leaves the epoch in a held state, causing cascading 'Trying to acquire protected epoch' assertion failures.

Fix: add a drain phase at the start of Dispose() that detects pending deferred reads (nextLoadedPages[i] > loadedPages[i]) and drains them by acquiring the epoch and calling ProtectAndDrain until the callback executes. This ensures loadCompletionEvents[i] is set, so the existing Phase 2 wait can then handle async I/O completion. Only after both phases is it safe to free the frame.

Also reverts the insufficient null-check guard in DoReadPage from the prior fix, since the drain-before-dispose approach makes it unnecessary.
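
As a rough illustration of the drain-before-dispose idea described above, here is a minimal sketch. It does not use Tsavorite's epoch API: ToyDrainList, ToyIterator, issuedReads and executedReads are hypothetical names invented for this example, and the issued-versus-executed comparison only loosely plays the role of the nextLoadedPages[i] > loadedPages[i] check.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for an epoch drain list (NOT Tsavorite's LightEpoch).
sealed class ToyDrainList
{
    private readonly ConcurrentQueue<Action> deferred = new ConcurrentQueue<Action>();

    // Queue an action to run later, similar in spirit to a deferred drain callback.
    public void Defer(Action action) => deferred.Enqueue(action);

    // Run every action that is still queued.
    public void Drain()
    {
        while (deferred.TryDequeue(out var action))
            action();
    }
}

// Hypothetical iterator whose deferred callbacks touch instance state.
sealed class ToyIterator : IDisposable
{
    private readonly ToyDrainList drainList;
    private readonly byte[] frame = new byte[16];
    private bool frameFreed;     // models freeing native frame memory
    private long issuedReads;    // reads that have been deferred
    private long executedReads;  // deferred reads whose callback has run

    public ToyIterator(ToyDrainList drainList) => this.drainList = drainList;

    public long ExecutedReads => executedReads;

    public void BufferAndLoad()
    {
        issuedReads++;
        drainList.Defer(() =>
        {
            // The callback captures 'this'; running it after Dispose would be a bug.
            if (frameFreed)
                throw new InvalidOperationException("callback ran after frame was freed");
            frame[0] = 1;
            executedReads++;
        });
    }

    public void Dispose()
    {
        // Phase 1: if a deferred read has not executed yet, drain it now,
        // while the frame is still valid.
        if (issuedReads > executedReads)
            drainList.Drain();

        // Only after draining is it safe to release the frame.
        frameFreed = true;
    }
}
```

In this toy model, calling BufferAndLoad() and then Dispose() runs the pending callback inside Dispose, so a later Drain() on the shared list finds nothing left that could touch the freed frame.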

Copilot AI review requested due to automatic review settings April 21, 2026 17:35
Contributor

Copilot AI left a comment

Pull request overview

Fixes a race in Tsavorite scan iterators where deferred BumpCurrentEpoch(() => DoReadPage(...)) drain callbacks can run after the iterator is disposed, leading to use-after-free of frame memory / null state and cascading epoch-hold assertion failures.

Changes:

  • Removes the prior “disposed/null array” guard inside DoReadPage, returning to direct use of loadCompletionEvents.
  • Adds a new “Phase 1” in Dispose() to drain any pending deferred DoReadPage callbacks before waiting for async I/O completion and disposing resources.
  • Keeps the existing “Phase 2” wait/dispose loop to ensure issued async reads complete before freeing resources.

Comment thread libs/storage/Tsavorite/cs/src/core/Allocator/ScanIteratorBase.cs Outdated
Comment thread libs/storage/Tsavorite/cs/src/core/Allocator/ScanIteratorBase.cs
@badrishc badrishc force-pushed the badrishc/scan-iterator-fix-2 branch from f1f386d to 563960d Compare April 21, 2026 17:51
@badrishc badrishc marked this pull request as draft April 21, 2026 17:51
@badrishc badrishc force-pushed the badrishc/scan-iterator-fix-2 branch 2 times, most recently from 00abe92 to 42e2d0d Compare April 21, 2026 19:00
BufferAndLoad defers page reads via BumpCurrentEpoch(() => DoReadPage(...)).
SuspendDrain guarantees the drain callback (DoReadPage) has executed by the
time GetNext returns, but the async I/O issued by DoReadPage may still be
in flight. If Dispose frees the frame before the I/O callback fires, the
callback writes to freed native memory (AccessViolationException).

Fix: track outstanding async I/O with a pendingDrainCallbacks counter.
Increment before issuing AsyncReadPageFromDeviceToFrame, decrement in
AsyncReadPageFromDeviceToFrameCallback. Dispose spin-waits for the counter
to reach zero before freeing resources. No epoch acquisition needed in
Dispose — SuspendDrain already guarantees drain callbacks have executed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
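
The counter scheme in the commit message above can be sketched roughly as follows. This is a standalone toy, not the code from ScanIteratorBase.cs: ToyScanIterator, IssueRead, ReadCallback and pendingReads are hypothetical names, and a delayed Task stands in for the device I/O that AsyncReadPageFromDeviceToFrame would issue.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical iterator that tracks in-flight async reads with a counter.
sealed class ToyScanIterator : IDisposable
{
    private int pendingReads;               // number of reads issued but not yet completed
    private readonly byte[] frame = new byte[4096];
    private volatile bool frameFreed;       // models freeing native frame memory
    private volatile bool wroteAfterFree;   // set if a callback ever ran too late

    public int PendingReads => Volatile.Read(ref pendingReads);
    public bool WroteAfterFree => wroteAfterFree;

    // Increment BEFORE issuing the read so Dispose can never observe zero
    // while a read is in flight.
    public void IssueRead(int simulatedLatencyMs)
    {
        Interlocked.Increment(ref pendingReads);
        _ = Task.Run(async () =>
        {
            await Task.Delay(simulatedLatencyMs); // simulated device latency
            ReadCallback();
        });
    }

    // Completion callback: write into the frame, then decrement the counter.
    private void ReadCallback()
    {
        try
        {
            if (frameFreed)
                wroteAfterFree = true;  // would be an access violation with native memory
            else
                frame[0] = 0xFF;
        }
        finally
        {
            Interlocked.Decrement(ref pendingReads);
        }
    }

    public void Dispose()
    {
        // Spin-wait until every issued read has completed its callback.
        var spin = new SpinWait();
        while (Volatile.Read(ref pendingReads) > 0)
            spin.SpinOnce();

        // Only now is it safe to free the frame.
        frameFreed = true;
    }
}
```

Incrementing before the read is issued, rather than inside the callback, is what closes the window in which Dispose could see a zero count while a read is still outstanding.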
@badrishc badrishc force-pushed the badrishc/scan-iterator-fix-2 branch from 42e2d0d to 9e3d0d8 Compare April 21, 2026 19:20
@badrishc badrishc marked this pull request as ready for review April 21, 2026 19:22
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Comment thread libs/storage/Tsavorite/cs/src/core/Allocator/ScanIteratorBase.cs
@badrishc badrishc merged commit 949e526 into dev Apr 22, 2026
32 of 33 checks passed
@badrishc badrishc deleted the badrishc/scan-iterator-fix-2 branch April 22, 2026 00:17