Support NVRTC using ctypes binding #9086
Conversation
This binding is modelled on the implementation of the NVVM binding for consistency in Numba. The main API of this binding is a single function, `compile()`, which compiles a CUDA C / C++ source to PTX. Internal APIs provide a Pythonic interface to the underlying NVRTC C APIs accessed through ctypes via Numba's `open_cudalib()` function.
This allows NVRTC to be used with or without the NVIDIA CUDA bindings. Since we don't have a CUDA Python binding for NVVM either, we always use the internal Numba binding for NVRTC, rather than maintaining two bindings for it, which is consistent with how we handle NVVM.
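The error-checked ctypes-wrapper pattern described above might be sketched roughly as follows. This is illustrative only, not Numba's actual code: `NvrtcError`, `check_error`, and the injected `lib` parameter are assumptions made so the sketch stays self-contained (in Numba the library would come from `open_cudalib()`); the same pattern would extend to `nvrtcCreateProgram`, `nvrtcCompileProgram`, and the rest of the API behind `compile()`.

```python
import ctypes

NVRTC_SUCCESS = 0  # the success value of the nvrtcResult enum


class NvrtcError(Exception):
    """Raised when an NVRTC API call returns a non-success result."""


def check_error(result, fn_name):
    # Translate a C return code into a Python exception, so callers of the
    # Pythonic API never have to inspect raw nvrtcResult values.
    if result != NVRTC_SUCCESS:
        raise NvrtcError(f"{fn_name} failed with nvrtcResult {result}")


class NVRTC:
    """Sketch of a ctypes-backed NVRTC wrapper (illustrative only)."""

    def __init__(self, lib):
        # The library handle is injected here to keep the sketch
        # self-contained; Numba would obtain it via open_cudalib('nvrtc').
        self._lib = lib

    def version(self):
        # nvrtcVersion(int *major, int *minor) is the simplest NVRTC call;
        # every wrapped function follows this call-then-check shape.
        major = ctypes.c_int()
        minor = ctypes.c_int()
        result = self._lib.nvrtcVersion(ctypes.byref(major),
                                        ctypes.byref(minor))
        check_error(result, "nvrtcVersion")
        return major.value, minor.value
```

The key design point is that error checking lives in one place, so each wrapped NVRTC function stays a short call-and-check rather than repeating result handling everywhere.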
This commit is authored in Michael Collison's name to preserve attribution for his work (though it has been aggregated from changes in PR numba#8893). Tests of float16 division need to be skipped with NVVM 3.4 - this was never working due to an NVVM 3.4 code generation bug, but was not noticed before. It became apparent once tests started running in CI on the old toolkit versions that include NVVM 3.4.
gpuci run tests
Thanks for the patch, it's great to see this functionality now working out-of-the-box. Given that the testing and work to enable NVRTC were done previously, this change essentially augments the code base with an alternative implementation, and the existing testing etc. will cover it. There are a few minor things to look at in the review, but otherwise this looks good. Thanks again!
The word "driver" doesn't really relate to what's being handled. There was also no need to make the library a member of the `NVRTC` instance - instead a local variable named `lib` should suffice.
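The suggestion amounts to something like the following sketch. The names here are assumptions for illustration: `_FUNCTION_NAMES` and the injected `open_cudalib` parameter are hypothetical, standing in for however the real binding loads the library and enumerates its entry points.

```python
class NVRTC:
    # Hypothetical subset of the NVRTC entry points the binding needs.
    _FUNCTION_NAMES = ('nvrtcVersion', 'nvrtcCreateProgram')

    def __init__(self, open_cudalib):
        # The library handle is only needed while the function pointers are
        # being wired up, so a local variable named `lib` suffices; nothing
        # is gained by storing it on the instance (e.g. as self.driver).
        lib = open_cudalib('nvrtc')
        for name in self._FUNCTION_NAMES:
            setattr(self, name, getattr(lib, name))
```

After construction the instance exposes the bound functions directly, and the library handle itself is not retained under a misleading name.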
gpuci run tests
Buildfarm ID:
This is failing for tests such as
The failure seems invariant of the CUDA version. Perhaps there needs to be a check for headers? I assume that's where the problem lies?
Thanks - I think this has always been a problem and we've never noticed because we've never had a configuration on the buildfarm that runs this check. If you don't have a full toolkit installed, this will happen, but I never noticed before. I will check whether there is a conda package that includes these headers that we should require, but we should also guard somehow against the headers not being present.
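A guard of the kind mentioned might look like the following sketch. The function name, the error message, and the `include_dir` parameter are hypothetical; only the header file names (`cuda_fp16.h` and `cuda_fp16.hpp`, the CUDA half-precision headers) come from the surrounding discussion.

```python
import os


def check_cuda_headers(include_dir):
    """Return include_dir if the half-precision headers are present there,
    otherwise raise a helpful error instead of an obscure NVRTC failure."""
    required = ('cuda_fp16.h', 'cuda_fp16.hpp')
    missing = [h for h in required
               if not os.path.isfile(os.path.join(include_dir, h))]
    if missing:
        raise RuntimeError(
            f"Missing CUDA headers in {include_dir}: {', '.join(missing)}")
    return include_dir
```

Failing early with a named list of missing files turns a confusing NVRTC compile error into an actionable installation message.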
In order to ensure the half-precision floating point headers are available in all installations, we vendor them from the CUDA toolkit 11.2 (chosen as it is the oldest supported toolkit version, and therefore expected to be compatible with all supported NVRTC versions). These headers are redistributable as per the CUDA EULA, and explicitly mentioned in Attachment A at https://docs.nvidia.com/cuda/archive/11.2.2/eula/index.html#attachment-a under the "CUDA Half Precision Headers" component.
gpuci run tests
@esc Could this have another buildfarm run please?
Build farm passed as |
gpuci run tests |
Following @stuartarchibald's approval, the changes to pass on the build farm are:
Could @esc or @sklam review these additional changes please?
Vendoring of the headers and their inclusion seem to work. Following @stuartarchibald's approval, I add mine too!
PR numba#9086 accidentally introduced (or exacerbated) a cffi dependency in the CUDA tests. This commit fixes the issue. Additionally, there are several ways to skip tests that need cffi when it is not present; we unify them with a new decorator, `skip_unless_cffi`.
The `NvrtcProgram` class moved to a new module, `numba.cuda.cudadrv.nvrtc` in Numba 0.58 (in PR numba/numba#9086), so we need to import it from there for that version onwards.
This adds support for NVRTC using the ctypes binding, which enables linking CUDA C / C++ sources, and `float16`, when either binding is in use. The binding is modelled on the implementation of the NVVM binding, for consistency in Numba. The main API of this binding is a single function, `compile()`, which compiles a CUDA C / C++ source to PTX. Internal APIs provide a Pythonic interface to the underlying NVRTC C APIs accessed through ctypes via Numba's `open_cudalib()` function.
Since there is no CUDA Python binding for NVVM, I opted to always use the ctypes binding to NVRTC (like NVVM) rather than using one or the other and maintaining two kinds of bindings inconsistently with the NVVM binding.
This supersedes #8893, but is substantially different from it - there is no vendoring of pynvrtc, which I considered problematic because it vendored an existing library with extensive changes, without necessarily covering those changes with tests. It also didn't appear to protect against race conditions in initialization, as the existing NVVM binding does. The test changes in that PR looked OK to me, though, so I have incorporated them here under Michael Collison's authorship.
Note that this PR is failing with CUDA 11.0 because it binds APIs that didn't exist in that version - however, following the merge of #9040, that configuration will no longer be supported or tested, so that should not be an issue.