Cuda multi-GPU support: Allow execution space instance constructor to run #6706
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(0));
cudaStream_t stream0;
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaStreamCreate(&stream0));

KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(n_devices - 1));
cudaStream_t stream;
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaStreamCreate(&stream));
Did you mean to mess with the CUDA runtime after Kokkos::initialize()?
We are trying to make sure that we are always using the correct device even if the user changed it between Kokkos API calls.
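One way to picture "always using the correct device" is a scoped guard that switches to the instance's device on entry and restores the caller's device on exit. The following is only a minimal sketch under stated assumptions, not the Kokkos implementation: `mock_get_device` and `mock_set_device` are hypothetical stand-ins for `cudaGetDevice` and `cudaSetDevice` so the example runs without a GPU, and `ScopedDevice` and `run_on_instance_device` are illustrative names.

```cpp
#include <cassert>

// Hypothetical stand-in for the CUDA runtime's current device.
// Real code would call cudaGetDevice / cudaSetDevice instead.
static int g_mock_current_device = 0;
inline int mock_get_device() { return g_mock_current_device; }
inline void mock_set_device(int id) { g_mock_current_device = id; }

// Illustrative RAII guard: selects `target` for the lifetime of the
// object and restores whatever device was current before.
class ScopedDevice {
 public:
  explicit ScopedDevice(int target) : m_previous(mock_get_device()) {
    if (m_previous != target) mock_set_device(target);
  }
  ~ScopedDevice() { mock_set_device(m_previous); }

  ScopedDevice(const ScopedDevice&)            = delete;
  ScopedDevice& operator=(const ScopedDevice&) = delete;

 private:
  int m_previous;
};

// Hypothetical operation that must run on `instance_device`.
// Returns the device that was current while the guard was active.
inline int run_on_instance_device(int instance_device) {
  ScopedDevice guard(instance_device);
  return mock_get_device();
}
```

With a guard like this, a user calling the runtime's set-device function between two API calls would not affect which device the second call operates on, and the user's own choice would still be in place afterwards.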
Part of #6091. This pull request allows running the Cuda execution space constructor with a device ID different from the default one. In practice, this means that the internal allocations are also handled correctly for that device.
A caveat here (that wasn't exposed in the initial version of #6091) is that we need to keep track of the correct device ID for allocations, since we might copy the allocation header back to the host. Calling cudaFree doesn't seem to require using the correct device.
This is handled in a hacky way for the internal allocations in this pull request: the singleton's device ID is set before deallocating the internal allocations and then reset to its previous value. This approach allows us to run the full unit test in #6091.
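The save, override, and restore pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this pull request: `MockSingleton`, `MockInstance`, `deallocate_internal`, and `finalize` are hypothetical names, and the "deallocation" only records which device ID the singleton reported at that moment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the singleton whose device ID is consulted
// during deallocation (e.g. when copying an allocation header to the host).
struct MockSingleton {
  int device_id = 0;
};
inline MockSingleton& singleton() {
  static MockSingleton s;
  return s;
}

// Hypothetical execution space instance bound to its own device.
struct MockInstance {
  int device_id;
  std::vector<int> freed_on;  // singleton device ID seen at each deallocation

  // Stand-in for freeing an internal allocation.
  void deallocate_internal() { freed_on.push_back(singleton().device_id); }

  void finalize() {
    const int saved        = singleton().device_id;  // remember previous value
    singleton().device_id  = device_id;              // point at this instance's device
    deallocate_internal();
    singleton().device_id  = saved;                  // reset to previous value
  }
};
```

The sketch is not exception-safe: if the deallocation step could throw, the reset would be skipped, which is one reason a scoped guard might be preferable to the manual save and restore.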
I decided to already remove the guard we have for using a device ID different from the default one in this pull request, but I'm also open to introducing a macro that guards it and is only defined for the unit test.