Support NVRTC using ctypes binding #9086
Conversation
This binding is modelled on the implementation of the NVVM binding for consistency in Numba. The main API of this binding is a single function, `compile()`, which compiles a CUDA C / C++ source to PTX. Internal APIs provide a Pythonic interface to the underlying NVRTC C APIs accessed through ctypes via Numba's `open_cudalib()` function.
This allows NVRTC to be used with or without the NVIDIA CUDA bindings. Since we don't have a CUDA Python binding for NVVM either, we always use the internal Numba binding for NVRTC, rather than maintaining two bindings for it, which is consistent with how we handle NVVM.
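The error-checked ctypes-wrapper pattern described above might be sketched roughly as follows. This is illustrative only, not Numba's actual code: `NvrtcError`, `check_error`, and the injected `lib` parameter are assumptions made so the sketch stays self-contained (in Numba the library would come from `open_cudalib()`); the same pattern would extend to `nvrtcCreateProgram`, `nvrtcCompileProgram`, and the rest of the API behind `compile()`.

```python
import ctypes

NVRTC_SUCCESS = 0  # the success value of the nvrtcResult enum


class NvrtcError(Exception):
    """Raised when an NVRTC API call returns a non-success result."""


def check_error(result, fn_name):
    # Translate a C return code into a Python exception, so callers of the
    # Pythonic API never have to inspect raw nvrtcResult values.
    if result != NVRTC_SUCCESS:
        raise NvrtcError(f"{fn_name} failed with nvrtcResult {result}")


class NVRTC:
    """Sketch of a ctypes-backed NVRTC wrapper (illustrative only)."""

    def __init__(self, lib):
        # The library handle is injected here to keep the sketch
        # self-contained; Numba would obtain it via open_cudalib('nvrtc').
        self._lib = lib

    def version(self):
        # nvrtcVersion(int *major, int *minor) is the simplest NVRTC call;
        # every wrapped function follows this call-then-check shape.
        major = ctypes.c_int()
        minor = ctypes.c_int()
        result = self._lib.nvrtcVersion(ctypes.byref(major),
                                        ctypes.byref(minor))
        check_error(result, "nvrtcVersion")
        return major.value, minor.value
```

The key design point is that error checking lives in one place, so each wrapped NVRTC function stays a short call-and-check rather than repeating result handling everywhere.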
This commit is authored in Michael Collison's name to preserve attribution for his work (though it has been aggregated from changes in PR numba#8893). Tests of float16 division need to be skipped with NVVM 3.4 - this was never working due to an NVVM 3.4 code generation bug, but was not noticed before. It became apparent once tests started running in CI on the old toolkit versions that include NVVM 3.4.
gpuci run tests
Thanks for the patch, it's great to see this functionality now working out-of-the-box. Given that the testing and work to enable NVRTC were done previously, this change essentially augments the code base with an alternative implementation, and the existing testing etc. will cover it. There are a few minor things to look at in the review, but otherwise this looks good. Thanks again!
The word "driver" doesn't really relate to what's being handled. There was also no need to make the library a member of the `NVRTC` instance - instead a local variable named `lib` should suffice.
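The suggestion amounts to something like the following sketch. The names here are assumptions for illustration: `_FUNCTION_NAMES` and the injected `open_cudalib` parameter are hypothetical, standing in for however the real binding loads the library and enumerates its entry points.

```python
class NVRTC:
    # Hypothetical subset of the NVRTC entry points the binding needs.
    _FUNCTION_NAMES = ('nvrtcVersion', 'nvrtcCreateProgram')

    def __init__(self, open_cudalib):
        # The library handle is only needed while the function pointers are
        # being wired up, so a local variable named `lib` suffices; nothing
        # is gained by storing it on the instance (e.g. as self.driver).
        lib = open_cudalib('nvrtc')
        for name in self._FUNCTION_NAMES:
            setattr(self, name, getattr(lib, name))
```

After construction the instance exposes the bound functions directly, and the library handle itself is not retained under a misleading name.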
gpuci run tests
Buildfarm ID:
This is failing for tests such as
The failure seems invariant of the CUDA version. Perhaps there needs to be a check for headers? I assume that's where the problem lies?
Thanks - I think this has always been a problem and we've never noticed because we've never had a configuration on the buildfarm that runs this check. If you don't have a full toolkit installed, this will happen, but I never noticed before. I will check whether there is a conda package that includes these headers that we should require, but we should also guard somehow against the headers not being present.
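A guard of the kind mentioned might look like the following sketch. The function name, the error message, and the `include_dir` parameter are hypothetical; only the header file names (`cuda_fp16.h` and `cuda_fp16.hpp`, the CUDA half-precision headers) come from the surrounding discussion.

```python
import os


def check_cuda_headers(include_dir):
    """Return include_dir if the half-precision headers are present there,
    otherwise raise a helpful error instead of an obscure NVRTC failure."""
    required = ('cuda_fp16.h', 'cuda_fp16.hpp')
    missing = [h for h in required
               if not os.path.isfile(os.path.join(include_dir, h))]
    if missing:
        raise RuntimeError(
            f"Missing CUDA headers in {include_dir}: {', '.join(missing)}")
    return include_dir
```

Failing early with a named list of missing files turns a confusing NVRTC compile error into an actionable installation message.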
In order to ensure the half-precision floating point headers are available in all installations, we vendor them from the CUDA toolkit 11.2 (chosen as it is the oldest supported toolkit version, and therefore expected to be compatible with all supported NVRTC versions). These headers are redistributable as per the CUDA EULA, and explicitly mentioned in Attachment A at https://docs.nvidia.com/cuda/archive/11.2.2/eula/index.html#attachment-a under the "CUDA Half Precision Headers" component.
gpuci run tests
@esc Could this have another buildfarm run please?
Build farm passed as |
gpuci run tests |
Following @stuartarchibald's approval, the changes to pass on the build farm are:
Could @esc or @sklam review these additional changes please?
Vendoring of the headers and their inclusion seem to work. Following @stuartarchibald's approval, I add mine too!
PR numba#9086 accidentally introduced (or exacerbated) a cffi dependency in the CUDA tests. This commit fixes the issue. Additionally, there are several ways to skip tests that need cffi when it is not present; we unify them with a new decorator, `skip_unless_cffi`.
The `NvrtcProgram` class moved to a new module, `numba.cuda.cudadrv.nvrtc` in Numba 0.58 (in PR numba/numba#9086), so we need to import it from there for that version onwards.
This adds support for NVRTC using the ctypes binding, which enables linking CUDA C / C++ sources, and `float16`, when either binding is in use. The binding is modelled on the implementation of the NVVM binding, for consistency in Numba. The main API of this binding is a single function, `compile()`, which compiles a CUDA C / C++ source to PTX. Internal APIs provide a Pythonic interface to the underlying NVRTC C APIs accessed through ctypes via Numba's `open_cudalib()` function.
Since there is no CUDA Python binding for NVVM, I opted to always use the ctypes binding to NVRTC (like NVVM) rather than using one or the other and maintaining two kinds of bindings inconsistently with the NVVM binding.
This supersedes #8893, but is substantially different from it - there is no vendoring of pynvrtc, which I considered problematic because it vendored an existing library with extensive changes, without necessarily covering those changes with tests. It also didn't appear to protect against race conditions in initialization, as the existing NVVM binding does. The test changes in that PR looked OK to me, though, so I have incorporated them here under Michael Collison's authorship.
Note that this PR is failing with CUDA 11.0 because it binds APIs that didn't exist in that version - however, following the merge of #9040, that configuration will no longer be supported or tested, so that should not be an issue.