CUDA: Implement support for PTDS globally #6936
Conversation
This looks great! Thanks @gmarkall for the work here!
This was indeed something I could not find an optimal solution for in CuPy. Because CuPy uses the CUDA runtime API, what we did was replace the default stream handle with `cudaStreamPerThread`. It seems that with the driver API things are even more complicated, but the best I can think of is to test whether the API calls in Numba are those passing `CU_STREAM_PER_THREAD`.
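For reference, Numba exposes explicit handles for both flavours via its documented stream helpers, which makes the distinction concrete (a sketch; whether `default_stream()` follows the new config variable is an assumption based on this PR's description):

```python
from numba import cuda

legacy = cuda.legacy_default_stream()    # the traditional default stream (0)
ptds = cuda.per_thread_default_stream()  # the per-thread default stream
# Assumption: with this PR, the plain default follows the config variable.
default = cuda.default_stream()
```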
Thanks for the feedback / thoughts @pentschev !
That's a good point. The env var has to be set before Numba initializes CUDA, so toggling it from within an already-running test process isn't really possible.
The IPC tests spawn instead of forking, so it looks like I might be able to use a similar approach to test PTDS.
We have been increasingly using this approach for Dask/UCX-Py tests where we need different configurations (such as different environment variables), which has the added benefit of avoiding leaks of unexpected state into other tests. It's definitely a bit slower, but it's also much easier to track when tests fail.
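A minimal sketch of that spawn-based pattern (hypothetical, not the test added in this PR), assuming the config variable maps to an env var via Numba's usual `NUMBA_` prefix:

```python
import multiprocessing as mp
import os


def check_ptds(q):
    # Import inside the child so the env var is read at first import.
    from numba import config
    q.put(config.CUDA_PER_THREAD_DEFAULT_STREAM)


if __name__ == '__main__':
    os.environ['NUMBA_CUDA_PER_THREAD_DEFAULT_STREAM'] = '1'
    ctx = mp.get_context('spawn')  # spawn: fresh interpreter, no forked state
    q = ctx.Queue()
    p = ctx.Process(target=check_ptds, args=(q,))
    p.start()
    result = q.get()
    p.join()
    assert result  # the child saw PTDS enabled
```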
Confirmed working downstream! Thank you for taking care of this!
This was added in Python 3.4, so the check is no longer necessary.
Thanks for the insights here @pentschev - this does seem to be the best way to go. A test for PTDS is now added in 8577d9b. Whilst implementing this based on the pattern in the IPC tests, I noticed that the check for Python versions prior to 3.4 (where `multiprocessing.get_context` was added) is no longer needed.
Thanks for working on that, it's great to know that this is indeed a feasible way of testing PTDS!
Thanks for the patch and for writing the somewhat involved unit test to check this! Just one minor comment to resolve in the docs, otherwise this looks good.
Thanks for the patch and fixes.
Buildfarm ID:
Passed.
This PR adds support for using the per-thread default stream: a config variable, `CUDA_PER_THREAD_DEFAULT_STREAM`, is added, which controls whether the default stream is the legacy or the per-thread one. Presently the default is the legacy default stream; once this implementation is proven in use, we could consider making per-thread the default in future.

All tests pass for me locally with the legacy and the per-thread default stream, run with:
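(The exact commands were not preserved in this thread; the following is a plausible reconstruction using Numba's standard test runner.)

```
# reconstruction: exact command not preserved
python -m numba.runtests numba.cuda.tests
```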
and
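```
# reconstruction: env var name follows Numba's NUMBA_<CONFIG> convention
NUMBA_CUDA_PER_THREAD_DEFAULT_STREAM=1 python -m numba.runtests numba.cuda.tests
```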
respectively. Additionally, a test program:
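(The program itself was not preserved; below is a hypothetical sketch of the kind of program described, launching kernels from several host threads so that Nsight Systems can show one default stream per thread when PTDS is enabled.)

```python
import threading

import numpy as np
from numba import cuda


@cuda.jit
def add_one(x):
    i = cuda.grid(1)
    if i < x.size:
        x[i] += 1.0


def work():
    # With PTDS enabled, each thread's launches go to its own default stream.
    x = cuda.to_device(np.zeros(1024, dtype=np.float32))
    for _ in range(10):
        add_one[4, 256](x)
    cuda.synchronize()


threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```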
can be shown to correctly use the legacy default stream or the per-thread default stream depending on the setting of the config variable, using Nsight Systems to record the streams of the kernel launches. With the legacy default stream set, the Nsight Systems timeline shows all kernel launches on the single default stream.
With the per-thread default stream set, each thread's kernel launches appear on a separate default stream.
EDIT (test added in commit 8577d9b). Original question: However, I'm struggling to think of a fully automated / programmatic way of testing this. Any thoughts on this, anyone?
Fixes #5137.