gds: register stream to warm compat init #119

Merged
nclack merged 7 commits into main from fix-118-gds-stream-register on May 21, 2026

@nclack nclack commented May 21, 2026

Fixes #118.

Approach

In compat mode, libcufile lazily allocates per-stream state on the first cuFileReadAsync, and that lazy init races against itself. The passing test in the same file happens to enqueue a cuLaunchHostFunc barrier before the first read, which serializes the stream enough to mask the race. The failing test goes straight into cuFileReadAsync on an empty stream and SEGVs ~4% of the time deep inside libcufile.

cuFile already exposes the hook for this: cuFileStreamRegister "allocates resources needed to support cuFile operations asynchronously for the cuda stream" — i.e. exactly the lazy state that was racing. Calling it eagerly when damacy adopts a stream removes the race window. The matching cuFileStreamDeregister is required before cuStreamDestroy per the cuFile contract.

Change

In src/store/store_fs_gds.c:

  • dlsym-bind cuFileStreamRegister / cuFileStreamDeregister as optional symbols (graceful no-op on older libcufile that doesn't ship them).
  • store_fs_gds_set_stream: deregister any previously set stream, then register the new one. The first cuFileReadAsync then finds its per-stream state already allocated.
  • gds_destroy: deregister after the existing cuStreamSynchronize, before the caller's cuStreamDestroy.
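
The three bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in src/store/store_fs_gds.c: the names gds_api, gds_state, gds_api_bind, gds_set_stream and gds_release_stream are hypothetical, and stream_t and cufile_status are stand-ins for CUstream and CUfileError_t so the sketch builds without the CUDA headers. Passing 0 for the register flags is also an assumption. The two fake_* functions are counting stubs for exercising the logic without libcufile.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Stand-ins so the sketch builds without cuda.h / cufile.h (assumptions).
 * Real code would use CUstream and CUfileError_t from those headers. */
typedef void *stream_t;
typedef struct { int err; int cu_err; } cufile_status;

typedef cufile_status (*stream_register_fn)(stream_t stream, unsigned flags);
typedef cufile_status (*stream_deregister_fn)(stream_t stream);

struct gds_api {
  stream_register_fn stream_register;     /* NULL if libcufile lacks it */
  stream_deregister_fn stream_deregister; /* NULL if libcufile lacks it */
};

struct gds_state {
  struct gds_api api;
  stream_t stream; /* currently adopted stream, or NULL */
  int registered;  /* nonzero once `stream` is registered with cuFile */
};

/* Optional binding: a missing library or symbol leaves the pointer NULL. */
void gds_api_bind(struct gds_api *api, void *lib) {
  api->stream_register = NULL;
  api->stream_deregister = NULL;
  if (!lib) return;
  /* POSIX idiom for storing a dlsym result in a function pointer. */
  *(void **)(&api->stream_register) = dlsym(lib, "cuFileStreamRegister");
  *(void **)(&api->stream_deregister) = dlsym(lib, "cuFileStreamDeregister");
}

/* Deregister the adopted stream, if any. Intended to run after the stream
 * has been synchronized and before its owner destroys it. */
void gds_release_stream(struct gds_state *g) {
  if (g->registered && g->api.stream_deregister)
    g->api.stream_deregister(g->stream);
  g->registered = 0;
  g->stream = NULL;
}

/* Adopt `stream`: drop the previous registration, then register eagerly so
 * per-stream state exists before the first async read. */
void gds_set_stream(struct gds_state *g, stream_t stream) {
  gds_release_stream(g);
  g->stream = stream;
  if (stream && g->api.stream_register) {
    cufile_status st = g->api.stream_register(stream, 0 /* no flags */);
    g->registered = (st.err == 0); /* 0 is CU_FILE_SUCCESS */
  }
}

/* Counting stubs for exercising the logic without libcufile (hypothetical). */
int n_register, n_deregister;
cufile_status fake_register(stream_t s, unsigned flags) {
  (void)s; (void)flags; ++n_register;
  return (cufile_status){0, 0};
}
cufile_status fake_deregister(stream_t s) {
  (void)s; ++n_deregister;
  return (cufile_status){0, 0};
}
```

When both pointers are NULL, gds_set_stream and gds_release_stream only track the stream handle, which is the graceful no-op path for an older libcufile.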

Verification

  • cmake --build build clean.
  • 100× loop of CUFILE_FORCE_COMPAT_MODE=true ./build/tests/test_store_fs_gds: 0 failures (was ~4/100).
  • Full ctest: 26/26 pass.
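
For reference, the repeated-run check could be driven by a small harness like the one below. It is only a sketch of one way to do the loop, not the method used for the numbers above; the function name count_failures is hypothetical. It forces compat mode through the environment, runs the command through the shell, and counts every run that does not exit with status 0, which includes runs killed by a signal such as SIGSEGV.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/wait.h>

/* Run `cmd` through the shell `runs` times with CUFILE_FORCE_COMPAT_MODE=true
 * and return how many runs failed. A run counts as failed when it cannot be
 * started, is killed by a signal, or exits with a nonzero status.
 * Returns -1 if the environment variable cannot be set. */
int count_failures(const char *cmd, int runs) {
  if (setenv("CUFILE_FORCE_COMPAT_MODE", "true", 1) != 0) return -1;
  int failures = 0;
  for (int i = 0; i < runs; ++i) {
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
      ++failures;
  }
  return failures;
}
```

Calling count_failures("./build/tests/test_store_fs_gds", 100) from the repository root would correspond to the 100× loop described above.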

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.71%. Comparing base (5f9226d) to head (27a726c).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #119      +/-   ##
==========================================
+ Coverage   55.64%   55.71%   +0.07%     
==========================================
  Files          49       49              
  Lines        6903     6903              
  Branches     1233     1233              
==========================================
+ Hits         3841     3846       +5     
+ Misses       2585     2582       -3     
+ Partials      477      475       -2     
Flag Coverage Δ
unittests 55.71% <ø> (+0.07%) ⬆️

see 7 files with indirect coverage changes

@nclack nclack force-pushed the fix-118-gds-stream-register branch from 9d4d373 to 27a726c Compare May 21, 2026 18:29
@nclack nclack merged commit 4179da9 into main May 21, 2026
4 checks passed
@nclack nclack deleted the fix-118-gds-stream-register branch May 21, 2026 18:38
nclack added a commit that referenced this pull request May 22, 2026
Development

Successfully merging this pull request may close these issues.

GDS: intermittent SEGV in cuFileReadAsync (compat mode), ~4% rate
