Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support NVRTC using ctypes binding #9086

Merged
merged 18 commits into from Aug 3, 2023
Merged

Support NVRTC using ctypes binding #9086

merged 18 commits into from Aug 3, 2023

Conversation

gmarkall
Copy link
Member

@gmarkall gmarkall commented Jul 24, 2023

This adds support for NVRTC using the ctypes binding, which enables linking CUDA C / C++ sources, andfloat16, when either binding is in use.

The binding is modelled on the implementation of the NVVM binding, for consistency in Numba. The main API of this binding is a single function, compile(), which compiles a CUDA C / C++ source to PTX. Internal APIs provide a Pythonic interface to the underlying NVRTC C APIs accessed though ctypes via Numba's open_cudalib() function.

Since there is no CUDA Python binding for NVVM, I opted to always use the ctypes binding to NVRTC (like NVVM) rather than using one or the other and maintaining two kinds of bindings inconsistently with the NVVM binding.

This supersedes #8893, but is substantially different from it - there is no vendoring of pynvrtc, which I considered problematic because it vendored an existing library with extensive changes without necessarily covering the changes with tests. It also didn't appear to protect from race conditions in initialization, like the existing NVVM binding does. The test changes in that PR looked OK to me though, so I have incorporated them here under Michael Collison's authorship.

Note that this PR is failing with CUDA 11.0 because it binds APIs that didn't exist in that version - however, following the merge of #9040, that configuration will no longer be supported or tested, so that should not be an issue.

gmarkall and others added 5 commits July 24, 2023 13:55
This binding is modelled on the implementation of the NVVM binding for
consistency in Numba.

The main API of this binding is a single function, `compile()`, which
compiles a CUDA C / C++ source to PTX. Internal APIs provide a Pythonic
interface to the underlying NVRTC C APIs accessed though ctypes via
Numba's `open_cudalib()` function.
This allows NVRTC to be used with or without the NVIDIA CUDA bindings.
Since we don't have a CUDA Python binding for NVVM either, we always use
the internal Numba binding for NVRTC, rather than maintaining two
bindings for it, which is consistent with how we handle NVVM.
This commit is authored in Michael Collison's name to preserve
attribution for his work (though it has been aggregated from changes in
PR numba#8893).

Tests of float16 division need to be skipped with NVVM 3.4 - this was
never working due to an NVVM 3.4 code generation bug, but was not
noticed before. It became apparent once tests started running in CI on
the old toolkit versions that include NVVM 3.4.
@gmarkall
Copy link
Member Author

gpuci run tests

@gmarkall gmarkall added CUDA CUDA related issue/PR 2 - In Progress labels Jul 24, 2023
@gmarkall gmarkall added this to the Numba 0.58 RC milestone Jul 24, 2023
@gmarkall
Copy link
Member Author

gpuci run tests

@gmarkall
Copy link
Member Author

gpuci run tests

@gmarkall
Copy link
Member Author

gpuci run tests

Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch, it's great to see this functionality now working out-of-the-box. Given the testing and work to enable NVRTC was done previously, this change is essentially augmenting the code base with an alternative implementation and existing testing etc will cover this. There's a few minor things to look at in the review but otherwise looks good. Thanks again!

numba/cuda/api.py Show resolved Hide resolved
numba/cuda/cudadrv/nvrtc.py Outdated Show resolved Hide resolved
numba/cuda/tests/cudapy/test_operator.py Outdated Show resolved Hide resolved
numba/cuda/tests/cudapy/test_intrinsics.py Outdated Show resolved Hide resolved
numba/cuda/cudadrv/nvrtc.py Show resolved Hide resolved
numba/cuda/cudadrv/nvrtc.py Outdated Show resolved Hide resolved
numba/cuda/cudadrv/nvrtc.py Outdated Show resolved Hide resolved
numba/cuda/cudadrv/nvrtc.py Outdated Show resolved Hide resolved
numba/cuda/cudadrv/nvrtc.py Outdated Show resolved Hide resolved
@stuartarchibald stuartarchibald added 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review labels Jul 26, 2023
The word "driver" doesn't really relate to what's being handled. There
was also no need to make the library a member of the `NVRTC` instance -
instead a local variable named `lib` should suffice.
@gmarkall
Copy link
Member Author

gpuci run tests

@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Jul 27, 2023
@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cuda_yaml_211.

@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cuda_yaml_211.

This is failing for tests such as numba.cuda.tests.cudapy.test_operator.TestOperatorModule.test_mixed_fp16_binary_arithmetic with an approximate trace:

File "numba/cuda/cudadrv/nvrtc.py", line 247, in compile
   raise NvrtcError(msg)
... <skip>
error: cannot open source file "cuda_fp16.h" 1 catastrophic error detected in the compilation of "cpp_function_wrappers.cu".
Compilation terminated.

failure seems invariant of CUDA version. Perhaps there needs to be a check for headers? I assume that's where the problem lies?

@gmarkall
Copy link
Member Author

Thanks - I think this has always been a problem and we've never noticed because we've never had a configuration on the buildfarm that runs this check. If you don't have a full toolkit installed, this will happen, but I never noticed before. I will check if there is a conda package that includes these that we should require, but we should also guard against the header not being present somehow.

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm 4 - Waiting on CI Review etc done, waiting for CI to finish labels Jul 27, 2023
In order to ensure the half-precision floating point headers are
available in all installations, we vendor them from the CUDA toolkit
11.2 (chosen as it is the oldest supported toolkit version, and
therefore expected to be compatible with all supported NVRTC versions).

These headers are redistributable as per the CUDA EULA, and explicitly
mentioned in Attachment A at
https://docs.nvidia.com/cuda/archive/11.2.2/eula/index.html#attachment-a
under the "CUDA Half Precision Headers" component.
@gmarkall
Copy link
Member Author

gpuci run tests

@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Aug 1, 2023
@gmarkall
Copy link
Member Author

gmarkall commented Aug 1, 2023

@esc Could this have another buildfarm run please?

@esc
Copy link
Member

esc commented Aug 2, 2023

Build numba_smoketest_cuda_yaml_212 has started

@esc
Copy link
Member

esc commented Aug 2, 2023

Build farm passed as numba_smoketest_cuda_yaml_214

@gmarkall
Copy link
Member Author

gmarkall commented Aug 3, 2023

gpuci run tests

@gmarkall gmarkall added the BuildFarm Passed For PRs that have been through the buildfarm and passed label Aug 3, 2023
@gmarkall
Copy link
Member Author

gmarkall commented Aug 3, 2023

Following @stuartarchibald 's approval, the changes to pass on the build farm are:

  • The inclusion of the headers cuda_fp16.{h,hpp} and their installation at install time. Commit ad931f8
  • Addition of the license notes for these headers. Commit 937ca40
  • Addition of headers to the NVRTC include path. Commit dc112f4

Could @esc or @sklam review these additional changes please?

Copy link
Member

@esc esc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vendoring of headers and their inclusion seems to work. Following @stuartarchibald "s approval, I add mine too!

@esc esc added 5 - Ready to merge Review and testing done, is ready to merge and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Aug 3, 2023
@esc esc merged commit 5ef7c86 into numba:main Aug 3, 2023
24 checks passed
gmarkall added a commit to gmarkall/numba that referenced this pull request Aug 10, 2023
PR numba#9086 accidentally introduced (or exacerbated) and accidental cffi
dependency in the CUDA tests. This commit fixes the issue.

Additionally, there are several ways to skip tests that need cffi when
it is not present - we unify them with a new decorator,
`skip_unless_cffi`.
gmarkall added a commit to gmarkall/pynvjitlink that referenced this pull request Oct 16, 2023
The `NvrtcProgram` class moved to a new module,
`numba.cuda.cudadrv.nvrtc` in Numba 0.58 (in PR numba/numba#9086), so we
need to import it from there for that version onwards.
pentschev pushed a commit to rapidsai/pynvjitlink that referenced this pull request Oct 16, 2023
The `NvrtcProgram` class moved to a new module,
`numba.cuda.cudadrv.nvrtc` in Numba 0.58 (in PR numba/numba#9086), so we
need to import it from there for that version onwards.
@gmarkall gmarkall deleted the nvrtc-ctypes branch May 2, 2024 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to merge Review and testing done, is ready to merge BuildFarm Passed For PRs that have been through the buildfarm and passed CUDA CUDA related issue/PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants