gds: register stream to warm compat init#119
Merged
Merged
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #119 +/- ##
==========================================
+ Coverage 55.64% 55.71% +0.07%
==========================================
Files 49 49
Lines 6903 6903
Branches 1233 1233
==========================================
+ Hits 3841 3846 +5
+ Misses 2585 2582 -3
+ Partials 477 475 -2
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
9d4d373 to
27a726c
Compare
nclack
added a commit
that referenced
this pull request
May 22, 2026
Fixes #118. ## Approach In compat mode, libcufile lazily allocates per-stream state on the first `cuFileReadAsync`, and that lazy init races against itself. The passing test in the same file happens to enqueue a `cuLaunchHostFunc` barrier before the first read, which serializes the stream enough to mask the race. The failing test goes straight into `cuFileReadAsync` on an empty stream and SEGVs ~4% of the time deep inside libcufile. cuFile already exposes the hook for this: `cuFileStreamRegister` "allocates resources needed to support cuFile operations asynchronously for the cuda stream" — i.e. exactly the lazy state that was racing. Calling it eagerly when damacy adopts a stream removes the race window. The matching `cuFileStreamDeregister` is required before `cuStreamDestroy` per the cuFile contract. ## Change In `src/store/store_fs_gds.c`: - dlsym-bind `cuFileStreamRegister` / `cuFileStreamDeregister` as optional symbols (graceful no-op on older libcufile that doesn't ship them). - `store_fs_gds_set_stream`: deregister any previously-set stream, then register the new one. First `cuFileReadAsync` now finds per-stream state already allocated. - `gds_destroy`: deregister after the existing `cuStreamSynchronize`, before the caller's `cuStreamDestroy`. ## Verification - `cmake --build build` clean. - 100× loop of `CUFILE_FORCE_COMPAT_MODE=true ./build/tests/test_store_fs_gds`: 0 failures (was ~4/100). - Full `ctest`: 26/26 pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #118.
Approach
In compat mode, libcufile lazily allocates per-stream state on the first
cuFileReadAsync, and that lazy init races against itself. The passing test in the same file happens to enqueue acuLaunchHostFuncbarrier before the first read, which serializes the stream enough to mask the race. The failing test goes straight intocuFileReadAsyncon an empty stream and SEGVs ~4% of the time deep inside libcufile.cuFile already exposes the hook for this:
cuFileStreamRegister"allocates resources needed to support cuFile operations asynchronously for the cuda stream" — i.e. exactly the lazy state that was racing. Calling it eagerly when damacy adopts a stream removes the race window. The matchingcuFileStreamDeregisteris required beforecuStreamDestroyper the cuFile contract.Change
In
src/store/store_fs_gds.c:cuFileStreamRegister/cuFileStreamDeregisteras optional symbols (graceful no-op on older libcufile that doesn't ship them).store_fs_gds_set_stream: deregister any previously-set stream, then register the new one. FirstcuFileReadAsyncnow finds per-stream state already allocated.gds_destroy: deregister after the existingcuStreamSynchronize, before the caller'scuStreamDestroy.Verification
cmake --build buildclean.CUFILE_FORCE_COMPAT_MODE=true ./build/tests/test_store_fs_gds: 0 failures (was ~4/100).ctest: 26/26 pass.