Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA: Don't make a runtime call on import #6147

Merged
merged 4 commits into from Aug 21, 2020

Conversation

gmarkall
Copy link
Member

@gmarkall gmarkall commented Aug 18, 2020

Calling runtime.get_version() on import initializes the CUDA driver, even though it creates no context. This is an issue for libraries that import Numba then set CUDA_VISIBLE_DEVICES, because the call at import time already initialized the driver and fixed the set of visible devices. This is an issue for, but not limited to, the Dask Local CUDA Cluster.

This PR fixes this by deferring the call to runtime.get_version() until the list of supported CCs is actually needed.

cc @kkraus14 @pentschev @quasiben

Fixes #6149.

Calling `runtime.get_version()` on import initializes the CUDA driver,
even though it creates no context. This is an issue for libraries that
import Numba then set `CUDA_VISIBLE_DEVICES`, because the call at import
time already initialized the driver and fixed the set of visible
devices. This is an issue for, but not limited to, the Dask Local CUDA
Cluster.

This commit fixes this by deferring the call to `runtime.get_version()`
until the list of supported CCs is actually needed.
@gmarkall gmarkall changed the title [WIPDon't make a runtime call on import [WIP] Don't make a runtime call on import Aug 18, 2020
@gmarkall gmarkall added this to the Numba 0.51.1 milestone Aug 18, 2020
@gmarkall gmarkall added the CUDA CUDA related issue/PR label Aug 18, 2020
@kkraus14
Copy link

FYI: This breaks more than just Dask, but Python multiprocessing in general as even spawning a process via multiprocessing will have it run imports from the parent process unless they're put under the if __name__ == "__main__" scope.

@gmarkall gmarkall changed the title [WIP] Don't make a runtime call on import CUDA: Don't make a runtime call on import Aug 18, 2020
@gmarkall
Copy link
Member Author

For some extra info - prior to the commit in this PR:

$ NUMBA_CUDA_LOG_LEVEL=DEBUG python -c "from numba import cuda; print('Done')"
== CUDA [199] DEBUG -- call runtime api: cudaRuntimeGetVersion
Done

With this PR / commit:

$ NUMBA_CUDA_LOG_LEVEL=DEBUG python -c "from numba import cuda; print('Done')"
Done

@gmarkall
Copy link
Member Author

Test added - the test requires multiple GPUs. For me on a single-GPU system (one RTX 8000), I get:

$ python -m numba.runtests numba.cuda.tests.cudadrv.test_runtime -v
test_get_version (numba.cuda.tests.cudadrv.test_runtime.TestRuntime) ... ok
test_visible_devices_set_after_import (numba.cuda.tests.cudadrv.test_runtime.TestRuntime)
    ... skipped 'This test requires multiple GPUs'

----------------------------------------------------------------------
Ran 2 tests in 0.019s

OK (skipped=1)

With an 8-GPU system (V100s):

$ python -m numba.runtests numba.cuda.tests.cudadrv.test_runtime -v
test_get_version (numba.cuda.tests.cudadrv.test_runtime.TestRuntime) ... ok
test_visible_devices_set_after_import (numba.cuda.tests.cudadrv.test_runtime.TestRuntime) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.618s

OK

and with CUDA_VISIBLE_DEVICES already set:

$ export CUDA_VISIBLE_DEVICES=6,7
$ python -m numba.runtests numba.cuda.tests.cudadrv.test_runtime -v
test_get_version (numba.cuda.tests.cudadrv.test_runtime.TestRuntime) ... ok
test_visible_devices_set_after_import (numba.cuda.tests.cudadrv.test_runtime.TestRuntime)
    ... skipped 'Cannot test when CUDA_VISIBLE_DEVICES already set'

----------------------------------------------------------------------
Ran 2 tests in 0.051s

OK (skipped=1)

@stuartarchibald stuartarchibald added the Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm label Aug 19, 2020
Copy link
Contributor

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @gmarkall !

@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cuda_94 for a8e0877

@stuartarchibald
Copy link
Contributor

Testsuite is failing on the farm universally with:

[2020-08-20 06:00:35,606]  INFO - ======================================================================
[2020-08-20 06:00:35,606]  INFO - FAIL: test_visible_devices_set_after_import (numba.cuda.tests.cudadrv.test_runtime.TestRuntime)
[2020-08-20 06:00:35,606]  INFO - ----------------------------------------------------------------------
[2020-08-20 06:00:35,606]  INFO - Traceback (most recent call last):
[2020-08-20 06:00:35,606]  INFO -   File "<path>\testenv_b522e3de-5976-43cf-ae68-63c7d7c1f2a3\lib\site-packages\numba\cuda\tests\cudadrv\test_runtime.py", line 51, in test_visible_devices_set_after_import
[2020-08-20 06:00:35,606]  INFO -     p.start()
[2020-08-20 06:00:35,606]  INFO -   File "<path>\envs\testenv_b522e3de-5976-43cf-ae68-63c7d7c1f2a3\lib\multiprocessing\process.py", line 118, in start
[2020-08-20 06:00:35,606]  INFO -     assert not _current_process._config.get('daemon'), \
[2020-08-20 06:00:35,606]  INFO - AssertionError: daemonic processes are not allowed to have children
[2020-08-20 06:00:35,606]  INFO - 
[2020-08-20 06:00:35,606]  INFO - ----------------------------------------------------------------------
[2020-08-20 06:00:35,606]  INFO - Ran 665 tests in 144.755s
[2020-08-20 06:00:35,606]  INFO - 
[2020-08-20 06:00:35,606]  INFO - FAILED (failures=1, skipped=18)

This commit fixes the error:

```
[2020-08-20 06:00:35,606]  INFO - FAIL: test_visible_devices_set_after_import (numba.cuda.tests.cudadrv.test_runtime.TestRuntime)
[2020-08-20 06:00:35,606]  INFO - ----------------------------------------------------------------------
[2020-08-20 06:00:35,606]  INFO - Traceback (most recent call last):
[2020-08-20 06:00:35,606]  INFO -   File "<path>\testenv_b522e3de-5976-43cf-ae68-63c7d7c1f2a3\lib\site-packages\numba\cuda\tests\cudadrv\test_runtime.py", line 51, in test_visible_devices_set_after_import
[2020-08-20 06:00:35,606]  INFO -     p.start()
[2020-08-20 06:00:35,606]  INFO -   File "<path>\envs\testenv_b522e3de-5976-43cf-ae68-63c7d7c1f2a3\lib\multiprocessing\process.py", line 118, in start
[2020-08-20 06:00:35,606]  INFO -     assert not _current_process._config.get('daemon'), \
[2020-08-20 06:00:35,606]  INFO - AssertionError: daemonic processes are not allowed to have children
```

by putting `test_visible_devices_set_after_import` in its own test class
with a `SerialMixin`.
@stuartarchibald stuartarchibald added 4 - Waiting on CI Review etc done, waiting for CI to finish and removed 3 - Ready for Review labels Aug 20, 2020
Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cuda_95 for 9988d56

@stuartarchibald stuartarchibald added 5 - Ready to merge Review and testing done, is ready to merge BuildFarm Passed For PRs that have been through the buildfarm and passed and removed 4 - Waiting on CI Review etc done, waiting for CI to finish Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm labels Aug 21, 2020
@sklam sklam merged commit 79b7b3a into numba:master Aug 21, 2020
gmarkall added a commit to gmarkall/numba that referenced this pull request Aug 25, 2020
PR numba#6147 broke the test_cuda_submodules test because the
test_nvvm_driver test could no longer be imported - the change from
SUPPORTED_CC to get_supported_ccs in nvvm.py was not reflected in the
CUDA simulator.

This commit fixes the issue by changing SUPPORTED_CC to
get_supported_ccs in the simulator.
@stuartarchibald stuartarchibald mentioned this pull request Aug 26, 2020
21 tasks
stuartarchibald pushed a commit to stuartarchibald/numba that referenced this pull request Aug 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to merge Review and testing done, is ready to merge BuildFarm Passed For PRs that have been through the buildfarm and passed CUDA CUDA related issue/PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CUDA: GPU list frozen at import time with 0.51
6 participants