Create cudaAPI function wrappers #6299

tcclevenger · 2023-07-21T17:08:29Z

An alternative to #5989. Here I use individual wrappers for each cudaAPI call.

Benefits

Much easier to read where the wrapper is called (vs. function ptrs)
Runtime errors give the location of the call, with a name matching the cudaAPI function name (differing only in case)
Less code where the function is called (vs. function ptrs, where type casts are required)

Negative

Much more code in Kokkos_Cuda_Instance.hpp, where the wrappers are defined

Notes

I did not use the wrappers in the unit tests
A separate impl file could be added for the wrappers to make the CudaInternal class much more readable (at the cost of more lines of code)
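
A minimal sketch of the per-call wrapper idea described above, written against stand-in functions so it builds without the CUDA toolkit. `fake_set_device`, `fake_stream_create`, `InstanceSketch`, `check`, and `SKETCH_SAFE_CALL` are hypothetical names for illustration only; they are not the Kokkos or CUDA API. The sketch assumes each wrapper selects the instance's device and then forwards to the API call, and that the checking macro stringifies the call so an error message carries the wrapper's name.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-ins for the CUDA runtime (assumptions, not real CUDA calls).
using fake_error_t  = int;    // 0 means success
using fake_stream_t = void*;  // opaque stream handle

static int g_current_device = -1;  // device most recently selected

inline fake_error_t fake_set_device(int dev) {
  g_current_device = dev;
  return 0;
}

inline fake_error_t fake_stream_create(fake_stream_t* s) {
  if (s == nullptr) return 1;  // fail on a null output pointer
  static int dummy = 0;
  *s = &dummy;
  return 0;
}

// Hypothetical instance class: one named wrapper per API call.
class InstanceSketch {
 public:
  explicit InstanceSketch(int dev) : m_dev(dev) {}

  // Select this instance's device, then forward to the API call.
  fake_error_t stream_create_wrapper(fake_stream_t* s) const {
    fake_set_device(m_dev);
    return fake_stream_create(s);
  }

 private:
  int m_dev;
};

// Throws with the stringified call and its location when the call fails.
inline void check(fake_error_t e, const char* expr, const char* file,
                  int line) {
  if (e != 0) {
    std::ostringstream msg;
    msg << expr << " failed with error " << e << " at " << file << ":"
        << line;
    throw std::runtime_error(msg.str());
  }
}

#define SKETCH_SAFE_CALL(call) check((call), #call, __FILE__, __LINE__)
```

A call site then reads `SKETCH_SAFE_CALL(inst.stream_create_wrapper(&s));`, with no function-pointer casts, and a failure message names `stream_create_wrapper` directly.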

Ping @crtrott.

crtrott · 2023-07-21T21:58:25Z

I personally like this approach better.

tcclevenger · 2023-07-24T13:14:20Z

I personally like this approach better.

I'm starting to agree. The extra ~100 lines of code are worth the simplicity. Any opinion on putting the function definitions in a separate header or in the cpp file?

masterleinad

Looks reasonable to me.

core/src/Cuda/Kokkos_Cuda_Instance.hpp

tcclevenger · 2023-07-25T16:07:57Z

core/src/Cuda/Kokkos_CudaSpace.cpp

-        ptr, bytes, space.cuda_device(), space.cuda_stream()));
+    KOKKOS_IMPL_CUDA_SAFE_CALL(
+        (space.impl_internal_space_instance()->cuda_mem_prefetch_async_wrapper(
+            ptr, bytes, space.cuda_device(), space.cuda_stream())));


@masterleinad Does it make sense to use m_cudaDev here? Or do we ever need a CUDA device different from the instance's device?

masterleinad

In my opinion, we should always use the instance's member variables in these wrappers and not even expose a device id or a stream in the interface.
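
A rough sketch of what that suggestion could look like: the wrapper takes only the pointer and byte count, and fills in the device and stream from the instance's own members. Everything here is a stand-in for illustration (`fake_mem_prefetch_async`, `PrefetchRecord`, `InstanceSketch`, `m_dev`, `m_stream`); it is not the actual Kokkos signature.

```cpp
#include <cassert>
#include <cstddef>

using fake_stream_t = void*;  // stand-in for a stream handle

// Records the arguments of the last stand-in prefetch call.
struct PrefetchRecord {
  const void* ptr;
  std::size_t bytes;
  int device;
  fake_stream_t stream;
};

static PrefetchRecord g_last_prefetch = {nullptr, 0, -1, nullptr};

// Stand-in for an API call that needs both a device id and a stream.
inline int fake_mem_prefetch_async(const void* ptr, std::size_t bytes,
                                   int device, fake_stream_t stream) {
  g_last_prefetch.ptr    = ptr;
  g_last_prefetch.bytes  = bytes;
  g_last_prefetch.device = device;
  g_last_prefetch.stream = stream;
  return 0;
}

// Hypothetical instance: the device and stream never appear in the
// wrapper's interface, so a caller cannot pass mismatched values.
class InstanceSketch {
 public:
  InstanceSketch(int dev, fake_stream_t stream)
      : m_dev(dev), m_stream(stream) {}

  int mem_prefetch_async_wrapper(const void* ptr, std::size_t bytes) const {
    return fake_mem_prefetch_async(ptr, bytes, m_dev, m_stream);
  }

 private:
  int m_dev;
  fake_stream_t m_stream;
};
```

The trade-off in this sketch is flexibility: a caller that really does need a different device or stream has to go through a different instance.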

masterleinad

Looks good to me. When adding support for multiple devices we will need to again review all the places where the singleton is used anyway.

fnrizzi · 2023-07-27T13:26:05Z

core/src/Cuda/Kokkos_CudaSpace.cpp

@@ -43,7 +43,8 @@
 cudaStream_t Kokkos::Impl::cuda_get_deep_copy_stream() {
  static cudaStream_t s = nullptr;
  if (s == nullptr) {
-    cudaStreamCreate(&s);
+    KOKKOS_IMPL_CUDA_SAFE_CALL(
+        (CudaInternal::singleton().cuda_stream_create_wrapper(&s)));


Based on the discussion on Wednesday, July 26, can we please open an issue to follow up on changing the name of this singleton to something else, since it is not a singleton?

masterleinad · 2023-07-27T14:24:36Z

You will have to resolve conflicts.

crtrott

As a follow on optimization can we identify calls which always will be preceded by another call so one wouldn't need to call setCudaDevice?

masterleinad · 2023-07-27T17:58:23Z

As a follow on optimization can we identify calls which always will be preceded by another call so one wouldn't need to call setCudaDevice?

I thought we already convinced ourselves that the performance impact of this pull request is negligible?

crtrott · 2023-07-27T18:11:23Z

Yeah I guess so.

tcclevenger · 2023-07-27T18:50:28Z

As a follow on optimization can we identify calls which always will be preceded by another call so one wouldn't need to call setCudaDevice?

I had that at one point in a previous commit on the other PR, but like Daniel said, there wasn't a real performance impact, so I removed it since it made the already complicated-looking template params even more complicated. Now that the calls are much simpler (with no templates needed for inputs), I'll add it back in a follow-up branch and get some performance numbers.
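
One way the optimization discussed here could be expressed is a compile-time switch on each wrapper, so a caller that knows the device was just selected can skip the extra call. This is a sketch under that assumption, using stand-in functions (`fake_set_device`, `fake_device_synchronize`) and a hypothetical `InstanceSketch` class rather than the real Kokkos or CUDA code; it needs C++17 for `if constexpr`.

```cpp
#include <cassert>

static int g_set_device_calls = 0;  // counts stand-in device selections

// Stand-ins for the runtime calls (assumptions, not real CUDA).
inline int fake_set_device(int /*dev*/) {
  ++g_set_device_calls;
  return 0;
}
inline int fake_device_synchronize() { return 0; }

class InstanceSketch {
 public:
  explicit InstanceSketch(int dev) : m_dev(dev) {}

  // By default the wrapper selects the device first. Passing <false>
  // compiles the selection out for calls known to follow another wrapper.
  template <bool SetDevice = true>
  int device_synchronize_wrapper() const {
    if constexpr (SetDevice) fake_set_device(m_dev);
    return fake_device_synchronize();
  }

 private:
  int m_dev;
};
```

Usage would be `inst.device_synchronize_wrapper();` for the safe default and `inst.device_synchronize_wrapper<false>();` where the preceding call already set the device. Whether the saved call matters in practice is exactly what the performance numbers mentioned above would have to show.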

crtrott · 2023-07-28T02:36:53Z

The CUDA 11.6 build failed because its container was unavailable, but all other CUDA builds passed, so I am merging.

tcclevenger · 2023-07-31T14:06:31Z

Thanks @crtrott. Setting up an issue for the HIP version.

tcclevenger marked this pull request as draft July 21, 2023 17:08

tcclevenger mentioned this pull request Jul 21, 2023

Encapsulate CudaAPI calls in CudaInternal, call cudaSetDevice for thread safety #5989

Closed

4 tasks

masterleinad approved these changes Jul 24, 2023

masterleinad reviewed Jul 25, 2023

core/src/Cuda/Kokkos_Cuda_Instance.hpp

tcclevenger commented Jul 25, 2023

core/src/Cuda/Kokkos_CudaSpace.cpp

tcclevenger requested a review from masterleinad July 26, 2023 16:41

tcclevenger marked this pull request as ready for review July 26, 2023 16:42

masterleinad reviewed Jul 26, 2023

core/src/Cuda/Kokkos_Cuda_GraphNodeKernel.hpp

core/src/Cuda/Kokkos_Cuda_Instance.cpp

core/src/Cuda/Kokkos_Cuda_Instance.hpp

core/src/Cuda/Kokkos_Cuda_Instance.hpp (outdated)

masterleinad approved these changes Jul 26, 2023

fnrizzi reviewed Jul 27, 2023

Thomas Conrad Clevenger and others added 7 commits July 27, 2023 09:11

create cudaAPI function wrappers

4d8629f

Reorganize #include <cuda_runtime_api.h>

03039fe

Since all calls to cudaAPI are where wrappers are defined, remove includes from other parts of Cuda

Some API functions require CUDA 11.2+

3f19a97

Cuda10 requires "stream=nullptr" as default arg

c4278e1

Rework stream inputs

ba6b4d9

- allow for different stream as input - default to stream from instance - add helper function "get_input_stream" for selecting correct stream

Use "if constexpr" for setCudaDevice

0e97679

Remove static in comment

c893105

Variable will soon become non-static. Static information is unnecessary.
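
The "Rework stream inputs" commit in the list above describes a helper named `get_input_stream` that selects between a caller-supplied stream and the instance's stream, and the Cuda10 commit mentions a `stream=nullptr` default argument. A minimal sketch of how those two pieces might fit together follows; the class, member names, and `fake_stream_synchronize` are stand-ins assumed for illustration, and the real helper may differ.

```cpp
#include <cassert>

using fake_stream_t = void*;  // stand-in for a stream handle

static fake_stream_t g_last_sync_stream = nullptr;  // last stream synchronized

// Stand-in for a stream-taking runtime call (not real CUDA).
inline int fake_stream_synchronize(fake_stream_t s) {
  g_last_sync_stream = s;
  return 0;
}

class InstanceSketch {
 public:
  explicit InstanceSketch(fake_stream_t stream) : m_stream(stream) {}

  // nullptr is treated as "no stream supplied": use the instance stream.
  fake_stream_t get_input_stream(fake_stream_t s) const {
    return s == nullptr ? m_stream : s;
  }

  // The default argument lets most call sites omit the stream entirely.
  int stream_synchronize_wrapper(fake_stream_t stream = nullptr) const {
    return fake_stream_synchronize(get_input_stream(stream));
  }

 private:
  fake_stream_t m_stream;
};
```

In this sketch `nullptr` doubles as the "use the instance stream" sentinel, so a caller cannot pass a null stream through the wrapper and have it forwarded as-is.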

tcclevenger force-pushed the thread_saftey_for_cuda_api_calls_individual_wrappers branch from 89a1c4c to c893105 Compare July 27, 2023 15:12

crtrott approved these changes Jul 27, 2023

crtrott merged commit 39de959 into kokkos:develop Jul 28, 2023
27 of 28 checks passed

tcclevenger deleted the thread_saftey_for_cuda_api_calls_individual_wrappers branch July 31, 2023 14:06

This was referenced Jul 31, 2023

Nightly test failures with Cuda builds running on device id != 0 #5713

Closed

HIP device ID incorrect when specifying KOKKOS_DEVICE_ID with threading #6324

Open

Create HIP API function wrappers in HIPInternal #6338

Draft

romintomasetti mentioned this pull request Aug 9, 2023

Cuda instance will not compile after PR 6299 (== in signature) #6344

Closed

masterleinad mentioned this pull request Aug 28, 2023

Cuda: Allocate using the correct device #6392

Merged

tcclevenger mentioned this pull request Sep 11, 2023

CHANGELOG: 4.2.0 #6197

Closed
