Support NVIDIA's CUDA Python bindings #7461

Merged: 77 commits merged into numba:master on Nov 24, 2021

Conversation

@gmarkall (Member) commented on Oct 6, 2021

This PR adds support for using NVIDIA's CUDA Python bindings to make CUDA Driver API calls. The ecosystem around these bindings is presently at an early stage, with CuPy recently gaining support for them. Eventually, common use of the NVIDIA bindings by Python libraries in the CUDA ecosystem will facilitate better interoperability for primitives other than just arrays (which are presently dealt with nicely by the CUDA Array Interface), and eliminate the need for converting types between libraries (see e.g. #4797).

An overview of differences between the NVIDIA bindings and Numba's ctypes bindings:

  • The NVIDIA bindings are more strongly typed than the ctypes ones, so values held internally in Numba need to be wrapped in the appropriate type before they can be passed to the bindings.
  • The return value of functions in the NVIDIA bindings is always a tuple of the error code followed by the output values (of which there can be several, e.g. for cuMemGetInfo, which returns the free and total memory). The way in which values are returned is made consistent with the ctypes bindings by some adaptation in _check_cuda_python_error (see the sketch after this list).
  • The differences are handled mostly with branches on the config variable CUDA_USE_CUDA_PYTHON. Module, Function and Linker are now ABCs with separate implementations for each binding, as they are mostly specific to the binding in use.
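
To illustrate the second point above, here is a minimal sketch (not Numba's actual `_check_cuda_python_error` implementation) of how the `(error, *outputs)` tuples returned by the NVIDIA bindings can be checked and unpacked so that call sites look the same as with the ctypes bindings:

```python
from cuda import cuda  # NVIDIA's CUDA Python bindings


def check(result):
    """Simplified stand-in for the adaptation done in _check_cuda_python_error:
    split the (error, *outputs) tuple, raise on failure, and return only the
    outputs so the call site resembles the ctypes bindings."""
    err, *values = result
    if err != cuda.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"CUDA driver call failed: {err}")
    return values[0] if len(values) == 1 else values


check(cuda.cuInit(0))                   # cuInit returns only an error code
count = check(cuda.cuDeviceGetCount())  # cuDeviceGetCount returns (err, count)
# cuMemGetInfo similarly returns (err, free, total) once a context is current.
```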

Some remarks:

  • Using the bindings is optional, and turned off by default, because the official bindings are yet to support Per-Thread Default Streams and profiling. Once these are supported, I expect to submit another PR that makes using them the default if they are installed (see the usage sketch after this list).
  • Runtime API calls still use a ctypes interface - changing the Runtime API to use the NVIDIA bindings will come in a follow-up PR.
  • I expect the NVIDIA and Numba ctypes bindings to coexist for a period of time, and for the ctypes binding to eventually be retired once the NVIDIA bindings are ubiquitous.
  • I've left the commit history in this PR intact because I've kept things working at each commit, and I think that if these changes introduce any issues, it'll be much easier to track down the source of the problem with a bisect through the individual changes instead of one mega-commit.
  • I've tested this on my workstation with Windows + Linux and Linux on a Jetson AGX Xavier, both in environments with the NVIDIA bindings installed and without them. However, a buildfarm run early on would be great in case there are any issues on the buildfarm machines that I've not spotted - @esc / @stuartarchibald could this have a CUDA buildfarm run to check, please?
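
As a usage sketch for the first remark: Numba config variables are normally controllable via NUMBA_-prefixed environment variables, so opting in might look roughly like the following. The exact variable name is an assumption based on the CUDA_USE_CUDA_PYTHON config variable mentioned above; check the documentation added in this PR for the definitive spelling.

```python
# Hedged sketch: opt in to the NVIDIA binding before importing numba.cuda.
# NUMBA_CUDA_USE_CUDA_PYTHON is assumed from the config variable name above.
import os

os.environ["NUMBA_CUDA_USE_CUDA_PYTHON"] = "1"

from numba import cuda

print(cuda.is_available())  # True when a GPU (and a working binding) is present
cuda.detect()               # prints a summary of the detected devices
```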

CCs for visibility: @quasiben @leofang @stuartarchibald @mmccarty @shwina

This provides enough to successfully run `cuda.detect()` and for
`cuda.is_available()` to return `True`, whilst maintaining correctness
of the inbuilt ctypes bindings.
All tests pass with ctypes bindings. With CUDA Python bindings:

```
Ran 1201 tests in 30.800s

FAILED (failures=5, errors=900, skipped=10, expected failures=1)
```
CUDA Python test results are now:

```
Ran 1201 tests in 31.862s

FAILED (failures=49, errors=863, skipped=10, expected failures=1)
```
CUDA Python test results are now:

```
Ran 1201 tests in 32.145s

FAILED (failures=6, errors=892, skipped=10, expected failures=1)
```
CUDA Python test results are now:

```
Ran 1201 tests in 35.580s

FAILED (failures=6, errors=892, skipped=10, expected failures=1)
```
CUDA Python test results are now:

```
Ran 1201 tests in 36.668s

FAILED (failures=6, errors=1580, skipped=10, expected failures=1)
```
CUDA Python test results are presently:

```
Ran 1201 tests in 39.895s

FAILED (failures=30, errors=1500, skipped=10, expected failures=1)
```
Unfortunately the testsuite segfaults before completion at present, so
there are no metrics available.
Test results with CUDA Python:

```
Ran 1174 tests in 60.225s

FAILED (failures=30, errors=591, skipped=10, expected failures=1)
```
Test results with CUDA Python are now:

```
Ran 1201 tests in 47.054s

FAILED (failures=31, errors=248, skipped=41, expected failures=1)
```
It should not be the responsibility of launch_kernel to get the ctypes
pointer from a higher-level abstraction object such as a device array or
record.
CUDA Python test results are now:

```
Ran 1201 tests in 72.650s

FAILED (failures=32, errors=224, skipped=50, expected failures=1)
```
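
Regarding the launch_kernel remark above, the intended division of responsibility could look roughly like this hypothetical sketch: the caller unwraps high-level objects (device arrays, records) to raw pointers, so that launch_kernel only ever sees plain addresses. The helper and the stand-in array class here are illustrative, not Numba's actual code; Numba device arrays do expose their address via a `device_ctypes_pointer` attribute.

```python
import ctypes


def as_kernel_arg(obj):
    """Hypothetical helper: unwrap a device array or record to its raw ctypes
    pointer so that the low-level launch path only deals with addresses."""
    ptr = getattr(obj, "device_ctypes_pointer", None)
    return ptr if ptr is not None else obj


class FakeDeviceArray:
    # Stand-in for a device array, which exposes its address as a ctypes
    # value via the device_ctypes_pointer attribute.
    device_ctypes_pointer = ctypes.c_void_p(0xDEADBEEF)


args = [FakeDeviceArray(), ctypes.c_int(42)]
prepared = [as_kernel_arg(a) for a in args]
print(prepared)  # raw ctypes values, ready to hand to a launch_kernel-style API
```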
These caused some problems for the ctypes bindings.
CUDA Python test results are now:

```
Ran 1201 tests in 68.140s

FAILED (failures=15, errors=217, skipped=41, expected failures=1)
```
CUDA Python test results are now:

```
Ran 1201 tests in 54.820s

FAILED (failures=13, errors=146, skipped=41, expected failures=1)
```
CUDA Python test results are now:

```
Ran 1201 tests in 65.175s

FAILED (failures=15, errors=139, skipped=41, expected failures=1)
```
CUDA Python test results are now:

```
Ran 1201 tests in 55.840s

FAILED (failures=17, errors=105, skipped=41, expected failures=1)
```
CUDA Python results are now:

```
Ran 1201 tests in 74.768s

FAILED (failures=18, errors=99, skipped=41, expected failures=1)
```
CUDA Python test results are now:

```
Ran 1201 tests in 81.919s

FAILED (failures=15, errors=49, skipped=41, expected failures=1)
```
CUDA Python results are now:

```
Ran 1201 tests in 73.974s

FAILED (failures=15, errors=39, skipped=41, expected failures=1)
```
CUDA Python test results are now:

```
Ran 1201 tests in 73.229s

FAILED (failures=10, errors=38, skipped=41, expected failures=1)
```
Tests now unskipped. CUDA Python test results are now:

```
Ran 1201 tests in 78.288s

FAILED (failures=10, errors=38, skipped=13, expected failures=1)
```
CUDA Python test results now look like:

```
Ran 1201 tests in 74.361s

FAILED (failures=10, errors=38, skipped=10, expected failures=1)
```
Test results with CUDA Python are now:

```
Ran 1201 tests in 74.033s

FAILED (failures=7, errors=34, skipped=10, expected failures=1)
```
Test results with CUDA Python are now:

```
Ran 1201 tests in 73.647s

FAILED (failures=7, errors=32, skipped=10, expected failures=1)
```
CUDA Python test results are now:

```
Ran 1201 tests in 75.513s

FAILED (failures=7, errors=25, skipped=10, expected failures=1)
```
CUDA Python test results are now:

```
Ran 1201 tests in 74.231s

FAILED (failures=3, errors=25, skipped=10, expected failures=2)
```
CUDA Python test results are now:

```
Ran 1201 tests in 77.240s

FAILED (failures=1, errors=22, skipped=10, expected failures=2)
```
CUDA Python test results are now:

```
Ran 1201 tests in 73.284s

FAILED (failures=1, errors=14, skipped=10, expected failures=2)
```
@stuartarchibald (Contributor) left a comment

Thanks for the update, a couple more minor things to look at, otherwise it looks good.

@gmarkall added the "4 - Waiting on reviewer" label and removed the "4 - Waiting on author" label on Nov 23, 2021
@gmarkall (Member, Author)

gpuci run tests

@gmarkall (Member, Author)

@stuartarchibald Thanks for the feedback - I've turned the hard error into a warning now. I think this should be ready for another look.
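
The exact check being discussed isn't shown in this thread, but the general shape of downgrading a hard configuration error to a warning with a fallback might look like this hypothetical sketch (the function, condition, and message are assumptions for illustration, not the code in this PR):

```python
import warnings


def resolve_cuda_binding(use_cuda_python_requested: bool) -> bool:
    """Hypothetical sketch: if the NVIDIA binding is requested but cannot be
    imported, warn and fall back to the ctypes binding instead of raising."""
    if not use_cuda_python_requested:
        return False
    try:
        import cuda  # noqa: F401  (the NVIDIA cuda-python package)
    except ImportError:
        warnings.warn(
            "CUDA Python bindings requested but the 'cuda' package is not "
            "importable; falling back to the built-in ctypes bindings."
        )
        return False
    return True
```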

@stuartarchibald added the "4 - Waiting on author" label and removed the "4 - Waiting on reviewer" and "4 - Waiting on CI" labels on Nov 23, 2021
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
@gmarkall added the "4 - Waiting on reviewer" label and removed the "4 - Waiting on author" label on Nov 23, 2021
@stuartarchibald (Contributor) left a comment

Many thanks for all your efforts on this @gmarkall, really glad to see this working and to get the performance boost this provides to CUDA users. This is waiting on a final gpuci run and also the buildfarm for confirmation. Thanks again.

@stuartarchibald added the "4 - Waiting on CI" label and removed the "4 - Waiting on reviewer" label on Nov 23, 2021
@esc (Member) commented on Nov 23, 2021

Build numba_smoketest_cuda_yaml_104 has started

@gmarkall (Member, Author)

gpuci run tests

@gmarkall (Member, Author)

@esc Did the buildfarm run succeed?

@stuartarchibald (Contributor)

> @esc Did the buildfarm run succeed?

From asking @esc OOB, yes! Which means I think this is now done!!! Thanks again @gmarkall.

@stuartarchibald added the "5 - Ready to merge" and "BuildFarm Passed" labels and removed the "4 - Waiting on CI" and "Pending BuildFarm" labels on Nov 23, 2021
@sklam merged commit 0849ae7 into numba:master on Nov 24, 2021
Labels: 5 - Ready to merge, BuildFarm Passed, CUDA, Effort - long
4 participants