Drain deferred DoReadPage callbacks before disposing scan iterator#1719
Merged
Drain deferred DoReadPage callbacks before disposing scan iterator#1719
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes a race in Tsavorite scan iterators where deferred BumpCurrentEpoch(() => DoReadPage(...)) drain callbacks can run after the iterator is disposed, leading to use-after-free of frame memory / null state and cascading epoch-hold assertion failures.
Changes:
- Removes the prior “disposed/null array” guard inside
DoReadPage, returning to direct use ofloadCompletionEvents. - Adds a new “Phase 1” in
Dispose()to drain any pending deferredDoReadPagecallbacks before waiting for async I/O completion and disposing resources. - Keeps the existing “Phase 2” wait/dispose loop to ensure issued async reads complete before freeing resources.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f1f386d to
563960d
Compare
00abe92 to
42e2d0d
Compare
BufferAndLoad defers page reads via BumpCurrentEpoch(() => DoReadPage(...)). SuspendDrain guarantees the drain callback (DoReadPage) has executed by the time GetNext returns, but the async I/O issued by DoReadPage may still be in flight. If Dispose frees the frame before the I/O callback fires, the callback writes to freed native memory (AccessViolationException). Fix: track outstanding async I/O with a pendingDrainCallbacks counter. Increment before issuing AsyncReadPageFromDeviceToFrame, decrement in AsyncReadPageFromDeviceToFrameCallback. Dispose spin-waits for the counter to reach zero before freeing resources. No epoch acquisition needed in Dispose — SuspendDrain already guarantees drain callbacks have executed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
42e2d0d to
9e3d0d8
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
BufferAndLoad defers page reads via BumpCurrentEpoch(() => DoReadPage(...)). These drain callbacks capture 'this' and access instance state (frame memory, loadCompletionEvents, etc.). For read-ahead pages, the callback may still be in the epoch drain list when the scan completes and the iterator is disposed. The callback then accesses freed native frame memory (AccessViolationException) or null arrays (NullReferenceException), and the resulting exception from within a drain callback leaves the epoch in a held state, causing cascading 'Trying to acquire protected epoch' assertion failures.
Fix: add a drain phase at the start of Dispose() that detects pending deferred reads (nextLoadedPages[i] > loadedPages[i]) and drains them by acquiring the epoch and calling ProtectAndDrain until the callback executes. This ensures loadCompletionEvents[i] is set, so the existing Phase 2 wait can then handle async I/O completion. Only after both phases is it safe to free the frame.
Also reverts the insufficient null-check guard in DoReadPage from the prior fix, since the drain-before-dispose approach makes it unnecessary.