Cuda multi-GPU support: Allow execution space instance constructor to run #6706

masterleinad · 2024-01-08T15:59:43Z

Part of #6091. This pull request allows running the Cuda execution space constructor with a device ID different from the default one. In practice, this means that the internal allocations are now also handled correctly.
A caveat (which the initial version of #6091 did not expose) is that we need to keep track of the correct device ID for allocations, since we might copy the allocation header back to the host. Calling cudaFree doesn't seem to require using the correct device.
For the internal allocations, this pull request handles that in a hacky way: it sets the singleton's device ID before deallocating them and then resets it to its previous value. With this approach, the full unit test in #6091 runs.
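The set-then-reset workaround described above can be illustrated with a small scoped guard. This is only a sketch under stated assumptions: `FakeSingleton`, `ScopedDeviceId`, `FakeInstance`, and `deallocate_internal` are hypothetical names invented for illustration, the CUDA runtime is not involved, and the pull request itself may well set and reset the value by hand rather than through RAII.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the Cuda singleton; only the device ID matters here.
struct FakeSingleton {
  int device_id = 0;
};

inline FakeSingleton& singleton() {
  static FakeSingleton s;
  return s;
}

// Records which device ID the singleton reported at each deallocation.
inline std::vector<int>& dealloc_log() {
  static std::vector<int> log;
  return log;
}

// Hypothetical deallocation routine that consults the singleton's device ID.
inline void deallocate_internal() {
  dealloc_log().push_back(singleton().device_id);
}

// Sets the singleton's device ID on construction and restores the
// previous value when the guard goes out of scope.
class ScopedDeviceId {
 public:
  explicit ScopedDeviceId(int id) : m_previous(singleton().device_id) {
    singleton().device_id = id;
  }
  ~ScopedDeviceId() { singleton().device_id = m_previous; }
  ScopedDeviceId(const ScopedDeviceId&)            = delete;
  ScopedDeviceId& operator=(const ScopedDeviceId&) = delete;

 private:
  int m_previous;
};

// Hypothetical instance that owns internal allocations on one device.
struct FakeInstance {
  int device_id;
  void finalize() {
    ScopedDeviceId guard(device_id);  // singleton now reports our device
    deallocate_internal();            // deallocation sees the right ID
  }                                   // previous ID restored here
};
```

A guard of this kind keeps the override local to the deallocation and restores the old value even on early exit, which is the property the manual set-and-reset sequence has to maintain by hand.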

I decided to already remove, in this pull request, the guard we have against using a device ID different from the default one, but I'm also open to introducing a macro that guards it and is only defined for the unit test.

dalg24 · 2024-01-09T01:55:51Z

core/unit_test/cuda/TestCuda_InterOp_StreamsMultiGPU.cpp

+  KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(0));
+  cudaStream_t stream0;
+  KOKKOS_IMPL_CUDA_SAFE_CALL(cudaStreamCreate(&stream0));
+
+  KOKKOS_IMPL_CUDA_SAFE_CALL(cudaSetDevice(n_devices - 1));
+  cudaStream_t stream;
+  KOKKOS_IMPL_CUDA_SAFE_CALL(cudaStreamCreate(&stream));


Did you mean to mess with the CUDA runtime after Kokkos::initialize()?

masterleinad · 2024-01-09

We are trying to make sure that we always use the correct device, even if the user changed it between Kokkos API calls.
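The intent of the test can be sketched without the CUDA runtime. In the block below, `fake_runtime` is a hypothetical stand-in in which `set_device` plays the role of `cudaSetDevice` and `create_stream` that of `cudaStreamCreate`; `FakeExecSpace` is likewise an illustrative name, not the Kokkos implementation. The assumption shown is that an instance remembers its device and re-selects it at the start of each call, so a device change made by the user in between has no effect on where the work runs.

```cpp
#include <cassert>

// Hypothetical stand-in for the CUDA runtime's notion of a current device.
namespace fake_runtime {
inline int& current_device() {
  static int device = 0;
  return device;
}
// Plays the role of cudaSetDevice.
inline void set_device(int device) { current_device() = device; }

// A stream is bound to the device that was current when it was created.
struct Stream {
  int device;
};
// Plays the role of cudaStreamCreate.
inline Stream create_stream() { return Stream{current_device()}; }

// Pretends to submit work; returns the device that was current at submission.
inline int submit(const Stream&) { return current_device(); }
}  // namespace fake_runtime

// Hypothetical execution space instance that stores its device and stream.
class FakeExecSpace {
 public:
  FakeExecSpace(int device, fake_runtime::Stream stream)
      : m_device(device), m_stream(stream) {}

  // Re-selects the instance's own device before touching the runtime,
  // regardless of what the caller selected earlier.
  int run() const {
    fake_runtime::set_device(m_device);
    return fake_runtime::submit(m_stream);
  }

 private:
  int m_device;
  fake_runtime::Stream m_stream;
};
```

Whether the previously selected device should be restored after each call is a separate design question that this sketch leaves open.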

masterleinad · 2024-01-24T15:12:57Z

After #6732, this pull request does little more than fix the place where we set the m_stream argument and add a test. In particular, it is independent of the refactoring in #6738.

Cuda multi-GPU support: Allow execution space instance constructor to run

2a53772

masterleinad force-pushed the cuda_multiple_devices_init_exec branch from 0e9fa7c to 2a53772 Compare January 8, 2024 18:17

Skip a test

67b7d68

masterleinad marked this pull request as ready for review January 8, 2024 20:51

ldh4 reviewed Jan 9, 2024

core/src/Cuda/Kokkos_CudaSpace.cpp (outdated, resolved)

dalg24 reviewed Jan 9, 2024

Use cuda_stream/device also for UVM and HostPinned

26f1bb2

dalg24 reviewed Jan 10, 2024

core/src/Cuda/Kokkos_Cuda_Instance.cpp (outdated, resolved)

masterleinad mentioned this pull request Jan 22, 2024

Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with SharedAllocationRecord) #6732

Merged

masterleinad added 3 commits January 22, 2024 18:44

Merge remote-tracking branch 'upstream/develop' into cuda_multiple_devices_init_exec

2bd14f3

Clean up

f4ee6f3

Revert test changes

f501bd5

masterleinad requested a review from dalg24 January 24, 2024 15:13

dalg24 approved these changes Jan 24, 2024

ldh4 approved these changes Jan 24, 2024

core/src/Cuda/Kokkos_Cuda_Instance.cpp (resolved)

dalg24 merged commit 2dc7cbc into kokkos:develop Jan 24, 2024
31 checks passed
