Use CUDA 11.2+ features via dlopen #990

Conversation

robertmaynard
Contributor

By binding to the cudart 11.2 functions at runtime, we remove the requirement that these symbols exist at load time, thereby allowing RMM to be compiled with CUDA 11.2+ and used with 11.0 or 11.1.
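
A minimal sketch of the runtime-binding idea, assuming libcudart is already loaded into the process (the helper below is illustrative, not RMM's actual implementation): resolve cudaMallocAsync through dlsym so the symbol is optional at load time, and fall back when it is absent.

#include <cuda_runtime_api.h>
#include <cstddef>
#include <dlfcn.h>  // link with -ldl

// Function-pointer type matching cudaMallocAsync (added in cudart 11.2).
using cudaMallocAsync_t = cudaError_t (*)(void**, std::size_t, cudaStream_t);

cudaMallocAsync_t load_cudaMallocAsync()
{
  // RTLD_DEFAULT searches the libcudart already loaded into the process;
  // dlsym returns nullptr on 11.0/11.1, where the symbol does not exist.
  return reinterpret_cast<cudaMallocAsync_t>(dlsym(RTLD_DEFAULT, "cudaMallocAsync"));
}

int main()
{
  void* ptr = nullptr;
  if (auto malloc_async = load_cudaMallocAsync()) {
    malloc_async(&ptr, 1024, nullptr);  // stream-ordered allocation on the default stream
  } else {
    cudaMalloc(&ptr, 1024);  // pre-11.2 fallback
  }
}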

@robertmaynard robertmaynard added bug Something isn't working 3 - Ready for review Ready for review by team non-breaking Non-breaking change labels Mar 9, 2022
@robertmaynard robertmaynard requested review from a team as code owners March 9, 2022 21:26
@github-actions github-actions bot added CMake cpp Pertains to C++ code labels Mar 9, 2022
@robertmaynard robertmaynard force-pushed the cuda_async_memory_resource-dlopen-cudart branch from 74372fa to 8944f92 on March 9, 2022 21:27
@leofang
Member

leofang commented Mar 14, 2022

Just saying, cudaGetDriverEntryPoint is a runtime API that serves a similar purpose and has been available since CUDA 11.0. The only downside is that it does not look up and return runtime APIs; instead, it returns driver APIs. But the stream-ordered memory allocator is available at the driver level, so I suppose it's also a solution.
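
A minimal sketch of the lookup described above, using the three-argument cudaGetDriverEntryPoint signature from CUDA 11.x (illustrative only): note that it hands back the driver entry point, so the allocator here is cuMemAllocAsync rather than cudaMallocAsync.

#include <cuda.h>
#include <cuda_runtime_api.h>

// Driver-API signature of the stream-ordered allocator.
using cuMemAllocAsync_t = CUresult (*)(CUdeviceptr*, size_t, CUstream);

cuMemAllocAsync_t load_cuMemAllocAsync()
{
  void* fn = nullptr;
  // Returns the *driver* API (cuMemAllocAsync), never the runtime API;
  // fails on drivers that lack the symbol.
  if (cudaGetDriverEntryPoint("cuMemAllocAsync", &fn, cudaEnableDefault) != cudaSuccess) {
    return nullptr;
  }
  return reinterpret_cast<cuMemAllocAsync_t>(fn);
}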

@jrhemstad
Contributor

Just saying, cudaGetDriverEntryPoint is a runtime API that serves a similar purpose and has been available since CUDA 11.0. The only downside is that it does not look up and return runtime APIs; instead, it returns driver APIs. But the stream-ordered memory allocator is available at the driver level, so I suppose it's also a solution.

That was added in cudart 11.3, so it would have the same problem we already have with cudaMallocAsync :)

Plus, using the driver API adds the annoying requirement of ensuring a context has been created first.
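
A common idiom for that context concern (an assumption about how one might handle it, not something this PR does): make any runtime API call first, since the runtime lazily creates and binds the primary context.

// Any runtime call, even a no-op free, initializes the primary context,
// after which driver entry points are safe to invoke on this thread.
cudaFree(nullptr);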

@ajschmidt8
Member

This is the patch that's necessary to stop both the has_cma/no_cma rmm package variants from being built. I'm not sure whether this should be included in this PR or a separate one.

Patch
From e0d812392e682c455652a6347656eca516338edd Mon Sep 17 00:00:00 2001
From: AJ Schmidt <aschmidt@nvidia.com>
Date: Wed, 16 Mar 2022 09:53:50 -0400
Subject: [PATCH] Remove `no_cma`/`has_cma` variants

Depends on #990.

Once #990 is merged, we should no longer need the `no_cma`/`has_cma` `rmm` package variants. This change removes the two variants.
---
 conda/recipes/rmm/build.sh                | 7 +------
 conda/recipes/rmm/conda_build_config.yaml | 3 ---
 conda/recipes/rmm/meta.yaml               | 8 ++------
 3 files changed, 3 insertions(+), 15 deletions(-)
 delete mode 100644 conda/recipes/rmm/conda_build_config.yaml

diff --git a/conda/recipes/rmm/build.sh b/conda/recipes/rmm/build.sh
index 08990c36..d2c672e6 100644
--- a/conda/recipes/rmm/build.sh
+++ b/conda/recipes/rmm/build.sh
@@ -1,9 +1,4 @@
 # Copyright (c) 2018-2019, NVIDIA CORPORATION.
 
 # Script assumes the script is executed from the root of the repo directory
-BUILD_FLAGS=""
-if [ "${cudaMallocAsync}" == "no_cma" ]; then
-    BUILD_FLAGS="${BUILD_FLAGS} --no-cudamallocasync"
-fi
-
-./build.sh -v clean rmm ${BUILD_FLAGS}
+./build.sh -v clean rmm
diff --git a/conda/recipes/rmm/conda_build_config.yaml b/conda/recipes/rmm/conda_build_config.yaml
deleted file mode 100644
index fcbbfef8..00000000
--- a/conda/recipes/rmm/conda_build_config.yaml
+++ /dev/null
@@ -1,3 +0,0 @@
-cudaMallocAsync:
-  - has_cma
-  - no_cma
diff --git a/conda/recipes/rmm/meta.yaml b/conda/recipes/rmm/meta.yaml
index 4baa0908..72582846 100644
--- a/conda/recipes/rmm/meta.yaml
+++ b/conda/recipes/rmm/meta.yaml
@@ -14,7 +14,7 @@ source:
 
 build:
   number: {{ GIT_DESCRIBE_NUMBER }}
-  string: cuda{{ cuda_major }}_py{{ py_version }}_{{ GIT_DESCRIBE_HASH }}_{{ GIT_DESCRIBE_NUMBER }}_{{ cudaMallocAsync }}
+  string: cuda{{ cuda_major }}_py{{ py_version }}_{{ GIT_DESCRIBE_HASH }}_{{ GIT_DESCRIBE_NUMBER }}
   script_env:
     - RMM_BUILD_NO_GPU_TEST
     - VERSION_SUFFIX
@@ -37,11 +37,7 @@ requirements:
     - python
     - numba >=0.49
     - numpy
-{% if cudaMallocAsync == "has_cma" %}
-    - {{ pin_compatible('cudatoolkit', max_pin='x', lower_bound='11.2') }} # cudatoolkit >=11.2,<12.0.0
-{% else %}
-    - {{ pin_compatible('cudatoolkit', upper_bound='11.2', lower_bound='11.0') }} # cudatoolkit >=11.0,<11.2
-{% endif %}
+    - {{ pin_compatible('cudatoolkit', max_pin='x', min_pin='x') }}
     - cuda-python >=11.5,<12.0
 
 test:
-- 
2.35.1

ajschmidt8 added a commit to ajschmidt8/rmm that referenced this pull request Mar 16, 2022
Depends on rapidsai#990.

Once rapidsai#990 is merged, we should no longer need the `no_cma`/`has_cma` `rmm` package variants. This PR removes the two variants.
@ajschmidt8
Member

This is the patch that's necessary to stop both the has_cma/no_cma rmm package variants from being built. I'm not sure whether this should be included in this PR or a separate one.

Draft PR that includes these changes is here: #996

@robertmaynard
Contributor Author

With #993 merged, I need to update this PR to handle the new logic.

@robertmaynard robertmaynard added 5 - DO NOT MERGE Hold off on merging; see PR for details and removed 3 - Ready for review Ready for review by team labels Mar 16, 2022
@robertmaynard
Contributor Author

This is the patch that's necessary to stop both the has_cma/no_cma rmm package variants from being built. I'm not sure whether this should be included in this PR or a separate one.

Draft PR that includes these changes is here: #996

Personally, I am okay with the removal of the cma variants happening in a second PR. This won't break the cma variants; it just makes their output identical, and therefore doesn't block this from being merged.

@robertmaynard robertmaynard added 3 - Ready for review Ready for review by team and removed 5 - DO NOT MERGE Hold off on merging; see PR for details labels Mar 17, 2022
ajschmidt8 added a commit to ajschmidt8/rmm that referenced this pull request Mar 17, 2022
Depends on rapidsai#990.

Once rapidsai#990 is merged, we should no longer need the `no_cma`/`has_cma` `rmm` package variants. This PR removes the two variants.
@jrhemstad
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 7434bd6 into rapidsai:branch-22.04 Mar 18, 2022
@robertmaynard robertmaynard deleted the cuda_async_memory_resource-dlopen-cudart branch March 18, 2022 13:14
rapids-bot bot pushed a commit that referenced this pull request Mar 18, 2022
Depends on #990.

Once #990 is merged, we should no longer need the `no_cma`/`has_cma` `rmm` package variants. This PR removes the two variants.

Authors:
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #996