Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cuda multi-GPU support: Allow execution space instance constructor to run #6706

Merged
merged 6 commits into from
Jan 24, 2024

Conversation

masterleinad
Copy link
Contributor

@masterleinad masterleinad commented Jan 8, 2024

Part of #6091. This pull request allows running the Cuda execution space constructor with a device ID different from the default one. This basically means that we also handle the internal allocations correctly.
A caveat here (that wasn't exposed in the initial version of #6091) is that we need to keep track of the correct device ID for allocations since we might copy the allocation header back to the host. Calling cudaFree doesn't seem to require using the correct device.
This is handled in a hacky way for the internal allocations in this pull request by setting the singleton's device ID before deallocating the internal allocations and then resetting it to its previous value. This approach allows us to run the full unit test in #6091.

I decided to remove the guard we have for using a device ID different from the default one in this pull request already but I'm also open to introducing a macro that guards it and is only defined for the unit test.

@masterleinad masterleinad force-pushed the cuda_multiple_devices_init_exec branch from 0e9fa7c to 2a53772 Compare January 8, 2024 18:17
@masterleinad masterleinad marked this pull request as ready for review January 8, 2024 20:51
core/src/Cuda/Kokkos_CudaSpace.hpp Outdated Show resolved Hide resolved
core/src/Cuda/Kokkos_CudaSpace.hpp Outdated Show resolved Hide resolved
core/src/Cuda/Kokkos_CudaSpace.cpp Outdated Show resolved Hide resolved
Comment on lines +27 to +33
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(0));
cudaStream_t stream0;
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaStreamCreate(&stream0));

KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(n_devices - 1));
cudaStream_t stream;
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaStreamCreate(&stream));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to mess with the CUDA runtime after Kokkos::initialize()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are trying to make sure that we are always using the correct device even if the user changed it between Kokkos API calls.

@masterleinad
Copy link
Contributor Author

After #6732, this pull request doesn't do much more than fixing the place where we set the m_stream argument and adding a test. In particular, it is independent of the refactoring in #6738.

@dalg24 dalg24 merged commit 2dc7cbc into kokkos:develop Jan 24, 2024
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants