Update RMMNumbaManager to handle NUMBA_CUDA_USE_NVIDIA_BINDING=1 #1004

Conversation

brandon-b-miller (Contributor)

Fixes #1003
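
For context, here is a minimal sketch of using RMM as Numba's memory manager with the NVIDIA cuda-python binding enabled. It is illustrative only: the rmm.RMMNumbaManager import path is assumed to match the RMM Python API of this era, and a working GPU plus the rmm and cuda-python packages are assumed.

import os

# Opt in to the NVIDIA cuda-python binding before Numba initializes CUDA
# (the programmatic counterpart of exporting NUMBA_CUDA_USE_NVIDIA_BINDING=1).
os.environ["NUMBA_CUDA_USE_NVIDIA_BINDING"] = "1"

import numpy as np
from numba import cuda
import rmm

# Route Numba device allocations through RMM via its EMM plugin.
cuda.set_memory_manager(rmm.RMMNumbaManager)

d_arr = cuda.to_device(np.arange(3, dtype=np.float64))  # backed by an RMM allocation

Setting NUMBA_CUDA_MEMORY_MANAGER=rmm on the command line, as in the test invocations below, should be the environment-variable equivalent of the set_memory_manager call.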

brandon-b-miller requested a review from a team as a code owner on March 22, 2022.
brandon-b-miller changed the title from "update rmm numba mem manager to handle new bindings" to "Update RMMNumbaManager to handle NUMBA_CUDA_USE_NVIDIA_BINDING=1" on Mar 22, 2022.
The github-actions bot added the "Python (Related to RMM Python API)" label on Mar 22, 2022.
harrism (Member) commented on Mar 22, 2022

@brandon-b-miller we are already in burndown for 22.04, so unless this is urgent for 22.04 we should push to the next release. From the bug description this doesn't sound like something we would hotfix for, so I think we can push it.

harrism added the "bug (Something isn't working)" label on Mar 22, 2022.
gmarkall (Contributor)

Have you run the Numba test suite with this branch? e.g.:

NUMBA_CUDA_USE_NVIDIA_BINDING=1 NUMBA_CUDA_MEMORY_MANAGER=rmm python -m numba.runtests numba.cuda.tests

brandon-b-miller (Contributor, Author)

> Have you run the Numba test suite with this branch? e.g.:
>
> NUMBA_CUDA_USE_NVIDIA_BINDING=1 NUMBA_CUDA_MEMORY_MANAGER=rmm python -m numba.runtests numba.cuda.tests

This revealed further changes that were needed, which have now been pushed.
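
To make the shape of the required change concrete, here is a schematic sketch, not the actual diff from this PR: with NUMBA_CUDA_USE_NVIDIA_BINDING=1, Numba represents device pointers using cuda-python's CUdeviceptr type rather than ctypes integers, so an EMM plugin handing RMM allocations to Numba has to wrap the raw pointer accordingly. Import paths follow Numba's EMM plugin documentation; the env-var check stands in for however the plugin actually detects the active binding.

import ctypes
import os

import rmm
from numba import cuda
from numba.cuda import MemoryPointer

# Simplified binding detection for illustration; Numba keeps its own config flag.
USE_NV_BINDING = os.environ.get("NUMBA_CUDA_USE_NVIDIA_BINDING", "0") == "1"

buf = rmm.DeviceBuffer(size=1024)             # RMM owns the device memory
if USE_NV_BINDING:
    from cuda.cuda import CUdeviceptr         # NVIDIA cuda-python driver binding
    ptr = CUdeviceptr(int(buf.ptr))           # pointer type Numba expects with the new binding
else:
    ptr = ctypes.c_uint64(int(buf.ptr))       # pointer type Numba expects with the ctypes binding

# Wrap the pointer for Numba; the finalizer closure keeps the DeviceBuffer alive
# until Numba releases the memory.
mem = MemoryPointer(cuda.current_context(), ptr, 1024, finalizer=lambda: buf)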

brandon-b-miller changed the base branch from branch-22.04 to branch-22.06 on March 28, 2022.
galipremsagar added the "non-breaking (Non-breaking change)" label on Mar 28, 2022.
brandon-b-miller (Contributor, Author)

rerun tests

gmarkall (Contributor) left a review comment:

I'm seeing the following failures when running

NUMBA_CUDA_USE_NVIDIA_BINDING=1 NUMBA_CUDA_MEMORY_MANAGER=rmm python -m numba.runtests numba.cuda.tests -v -m

with this PR and Numba main:

======================================================================
FAIL: test_ipc_array (numba.cuda.tests.cudapy.test_ipc.TestIpcStaged)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/cudapy/test_ipc.py", line 293, in test_ipc_array
    self.fail(out)
AssertionError: Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/cudapy/test_ipc.py", line 215, in staged_ipc_array_test
    with cuda.gpus[device_num]:
  File "/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/devices.py", line 84, in __exit__
    self._device.get_primary_context().pop()
  File "/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/driver.py", line 1355, in pop
    assert int(popped) == int(self.handle)
AssertionError


======================================================================
FAIL: test_staged (numba.cuda.tests.cudapy.test_ipc.TestIpcStaged)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/cudapy/test_ipc.py", line 273, in test_staged
    self.fail(out)
AssertionError: Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/cudapy/test_ipc.py", line 18, in core_ipc_handle_test
    arr = the_work()
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/cudapy/test_ipc.py", line 199, in the_work
    with cuda.gpus[device_num]:
  File "/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/devices.py", line 84, in __exit__
    self._device.get_primary_context().pop()
  File "/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/driver.py", line 1355, in pop
    assert int(popped) == int(self.handle)
AssertionError


----------------------------------------------------------------------
Ran 1278 tests in 111.982s

FAILED (failures=2, skipped=20, expected failures=8)

This is with multiple devices:

$ python -c "from numba import cuda; cuda.detect()"
Found 3 CUDA devices
id 0     b'NVIDIA RTX A6000'                              [SUPPORTED]
                      Compute Capability: 8.6
                           PCI Device ID: 0
                              PCI Bus ID: 21
                                    UUID: GPU-842b25ad-db82-ba9d-0380-e65fe57189eb
                                Watchdog: Enabled
             FP32/FP64 Performance Ratio: 32
id 1     b'NVIDIA RTX A6000'                              [SUPPORTED]
                      Compute Capability: 8.6
                           PCI Device ID: 0
                              PCI Bus ID: 45
                                    UUID: GPU-af183771-f998-7235-c638-b407c81bf3f7
                                Watchdog: Enabled
             FP32/FP64 Performance Ratio: 32
id 2         b'Quadro P2200'                              [SUPPORTED]
                      Compute Capability: 6.1
                           PCI Device ID: 0
                              PCI Bus ID: 11
                                    UUID: GPU-321c7ee1-375f-7c11-a413-b0aab3ec4756
                                Watchdog: Enabled
             FP32/FP64 Performance Ratio: 32
Summary:
	3/3 devices are supported

(I suspect it does not occur with a single GPU)
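
For reference, a minimal sketch of the device-switching pattern these IPC tests exercise (illustrative only, not a verified reproducer: it assumes at least two GPUs and the same NUMBA_CUDA_USE_NVIDIA_BINDING=1 / NUMBA_CUDA_MEMORY_MANAGER=rmm environment):

import numpy as np
from numba import cuda

with cuda.gpus[1]:                        # entering pushes device 1's primary context
    d_arr = cuda.to_device(np.arange(3))  # allocation happens on device 1
# Exiting the block pops the primary context; the reported failure is the
# assertion int(popped) == int(self.handle) firing during this pop.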

gmarkall (Contributor)

Turns out this is something that started happening between 21.12 and 22.02, unrelated to this PR - I'll look and see if there's a fix we can roll into this PR so it passes tests again.

gmarkall (Contributor) commented on Apr 7, 2022

> Turns out this is something that started happening between 21.12 and 22.02, unrelated to this PR - I'll look and see if there's a fix we can roll into this PR so it passes tests again.

This will be hard to track down and is unrelated to this PR, so let's not attempt to address it here.

gmarkall (Contributor) left a review comment:

Based on the fact that the issue I previously identified is unrelated to this PR and was introduced earlier, I now think this looks good.

brandon-b-miller (Contributor, Author)

@gpucibot merge

1 similar comment
shwina (Contributor) commented on Apr 8, 2022

@gpucibot merge

The rapids-bot merged commit a067498 into rapidsai:branch-22.06 on Apr 8, 2022.
Labels
bug (Something isn't working), non-breaking (Non-breaking change), Python (Related to RMM Python API)
Development
Successfully merging this pull request may close these issues:

[BUG] RMMNumbaManager broken when Numba is using NV CUDA bindings