Prevent kernel launch with no configuration, remove autotuner #5061

gmarkall · 2020-01-10T15:05:28Z

This is based on @stuartarchibald's PR #4468, with additions:

- Removal of the autotuner. The autotuner test was one that started failing when kernels required a launch configuration, so rather than fix the test, I've removed the functionality as it was scheduled to be removed anyway (and deprecated since 0.40).
- Make CUDAKernel objects (those with an explicit list of types to compile) also raise when not configured, as well as AutoJitCUDAKernel objects.
- The CUDA Simulator also raises for unconfigured kernels.
- Fix up the existing tests.
- Add tests of the raise for unconfigured kernels.

The autotuner has been deprecated since Numba 0.40.

Any unconfigured calls were using the default of one thread and one block, so this is made explicit.

This patch prevents the launch of a CUDA kernel with no configuration, as this causes confusion, especially for first-time users. The current behaviour is that if no launch config is specified, a default everything-set-to-1 config is used. The new behaviour is that if no launch config is specified, an exception is raised pointing users to the syntax and documentation.
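To illustrate the behaviour change without needing a GPU, here is a minimal pure-Python sketch. `ToyKernel`, `noop_body` and the error text are hypothetical stand-ins invented for this example and are not Numba's implementation; the sketch only mimics the "configure with square brackets, then call" pattern and the raise on an unconfigured call.

```python
class ToyKernel:
    """Toy stand-in for a kernel object (hypothetical, not Numba's class)."""

    def __init__(self, func, griddim=None, blockdim=None):
        self.func = func
        self.griddim = griddim
        self.blockdim = blockdim

    def __getitem__(self, config):
        # kernel[griddim, blockdim] returns a configured copy of the kernel.
        griddim, blockdim = config
        return ToyKernel(self.func, griddim, blockdim)

    def __call__(self, *args):
        # New behaviour: refuse to launch unless a configuration was given,
        # instead of silently falling back to one block of one thread.
        if self.griddim is None or self.blockdim is None:
            raise ValueError(
                "launch configuration was not specified; "
                "use kernel[griddim, blockdim](*args)"  # illustrative message
            )
        # A real launch would run on the device; this toy calls the body once.
        return self.func(*args)


def noop_body(x):
    return x


kernel = ToyKernel(noop_body)

# Explicit equivalent of the old implicit default: one block, one thread.
result = kernel[1, 1](0)

# An unconfigured call now fails loudly.
try:
    kernel(0)
except ValueError as exc:
    message = str(exc)
```

Code that relied on the old default can keep the same behaviour by writing the `[1, 1]` configuration explicitly, which is what the test fix-ups in this PR do.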

The `normalize_kernel_dimensions` function validates that the kernel has been configured, so it must be called in a CUDAKernel call (in addition to an AutoJitCUDAKernel call) to ensure that the kernel has been configured.
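As a rough sketch of the kind of check described above (not the actual `normalize_kernel_dimensions` code), a helper could reject a missing configuration and pad the dimensions to three entries. The name `normalize_dims_sketch`, its error messages and its exact rules are assumptions made for illustration.

```python
def normalize_dims_sketch(griddim, blockdim):
    """Hypothetical helper: validate launch dimensions and pad them to 3-tuples."""

    def pad(dim, name):
        if dim is None:
            # This is the "kernel has not been configured" case.
            raise ValueError(f"{name} was not specified; the kernel is unconfigured")
        if isinstance(dim, int):
            dim = (dim,)
        dim = tuple(dim)
        if not 1 <= len(dim) <= 3:
            raise ValueError(f"{name} must have 1 to 3 dimensions, got {len(dim)}")
        if not all(isinstance(d, int) and d > 0 for d in dim):
            raise ValueError(f"{name} must contain positive integers, got {dim}")
        # Pad missing trailing dimensions with 1.
        return dim + (1,) * (3 - len(dim))

    return pad(griddim, "griddim"), pad(blockdim, "blockdim")
```

Because the validation lives in one helper, any call path that goes through it gets the unconfigured-kernel check, which is why it matters that both kernel classes call it.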

gmarkall · 2020-01-10T17:01:00Z

Removing [WIP] as it passed the CI tests.

seibert · 2020-01-10T17:19:17Z

running numba_smoketest_cuda_19 as well

seibert · 2020-01-10T17:53:55Z

Something about this PR is causing CUDA to be initialized before the test runner forks to run tests in parallel:

[2020-01-10 17:28:10,889] {docker_operator.py:265} INFO - ======================================================================
[2020-01-10 17:28:10,889] {docker_operator.py:265} INFO - ERROR: test_unconfigured_autojitcudakernel (numba.cuda.tests.cudapy.test_errors.TestJitErrors)
[2020-01-10 17:28:10,889] {docker_operator.py:265} INFO - ----------------------------------------------------------------------
[2020-01-10 17:28:10,889] {docker_operator.py:265} INFO - Traceback (most recent call last):
[2020-01-10 17:28:10,889] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/tests/cudapy/test_errors.py", line 56, in test_unconfigured_autojitcudakernel
[2020-01-10 17:28:10,890] {docker_operator.py:265} INFO - self._test_unconfigured(kernfunc)
[2020-01-10 17:28:10,890] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/tests/cudapy/test_errors.py", line 46, in _test_unconfigured
[2020-01-10 17:28:10,890] {docker_operator.py:265} INFO - kernfunc(0)
[2020-01-10 17:28:10,890] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/compiler.py", line 761, in __call__
[2020-01-10 17:28:10,890] {docker_operator.py:265} INFO - kernel = self.specialize(*args)
[2020-01-10 17:28:10,891] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/compiler.py", line 772, in specialize
[2020-01-10 17:28:10,891] {docker_operator.py:265} INFO - kernel = self.compile(argtypes)
[2020-01-10 17:28:10,891] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/compiler.py", line 782, in compile
[2020-01-10 17:28:10,891] {docker_operator.py:265} INFO - cc = get_current_device().compute_capability
[2020-01-10 17:28:10,891] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/api.py", line 343, in get_current_device
[2020-01-10 17:28:10,891] {docker_operator.py:265} INFO - return current_context().device
[2020-01-10 17:28:10,891] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 213, in get_context
[2020-01-10 17:28:10,891] {docker_operator.py:265} INFO - return _runtime.get_or_create_context(devnum)
[2020-01-10 17:28:10,891] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 139, in get_or_create_context
[2020-01-10 17:28:10,891] {docker_operator.py:265} INFO - return self._get_or_create_context_uncached(devnum)
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 152, in _get_or_create_context_uncached
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - with driver.get_active_context() as ac:
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 387, in __enter__
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - driver.cuCtxGetCurrent(byref(hctx))
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 294, in safe_cuda_api_call
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - self._check_error(fname, retcode)
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 328, in _check_error
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - raise CudaDriverError("CUDA initialized before forking")
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - 
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - ======================================================================
[2020-01-10 17:28:10,892] {docker_operator.py:265} INFO - ERROR: test_unconfigured_cudakernel (numba.cuda.tests.cudapy.test_errors.TestJitErrors)
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - ----------------------------------------------------------------------
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - Traceback (most recent call last):
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/tests/cudapy/test_errors.py", line 51, in test_unconfigured_cudakernel
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - kernfunc = cuda.jit("void(int32)")(noop)
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/decorators.py", line 101, in kernel_jit
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - kernel.bind()
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/compiler.py", line 533, in bind
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - self._func.get()
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/compiler.py", line 406, in get
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - cuctx = get_context()
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 213, in get_context
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - return _runtime.get_or_create_context(devnum)
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 139, in get_or_create_context
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - return self._get_or_create_context_uncached(devnum)
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 152, in _get_or_create_context_uncached
[2020-01-10 17:28:10,893] {docker_operator.py:265} INFO - with driver.get_active_context() as ac:
[2020-01-10 17:28:10,894] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 387, in __enter__
[2020-01-10 17:28:10,894] {docker_operator.py:265} INFO - driver.cuCtxGetCurrent(byref(hctx))
[2020-01-10 17:28:10,894] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 294, in safe_cuda_api_call
[2020-01-10 17:28:10,894] {docker_operator.py:265} INFO - self._check_error(fname, retcode)
[2020-01-10 17:28:10,895] {docker_operator.py:265} INFO - File "/opt/conda/envs/testenv_f553a2aa-c1a1-4235-9f2d-178b245ca294/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 328, in _check_error
[2020-01-10 17:28:10,895] {docker_operator.py:265} INFO - raise CudaDriverError("CUDA initialized before forking")
[2020-01-10 17:28:10,895] {docker_operator.py:265} INFO - numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking

gmarkall · 2020-01-10T22:16:46Z

Ah - I'm not in the habit of running CUDA tests with -m so I missed this - will investigate first thing on Monday. Thanks for the pointer!

gmarkall · 2020-01-10T22:18:40Z

Looking at the diff, I can see the problem - TestJitErrors has no SerialMixin - this wasn't a problem before because its tests (expectedly) failed to even configure a kernel, but it's a problem now that it gets as far as trying to call kernels in the new tests.

Now that the TestJitErrors class contains test cases that call CUDA functions, it requires the SerialMixin, otherwise it will be executed in a child process after the parent already used CUDA (which is not supported) when testing in parallel.
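For anyone unfamiliar with the fork problem, the idea can be sketched in plain Python: remember which process initialized the driver, and refuse calls from any other process. `ToyDriver` and its injectable `getpid` argument are hypothetical names for this illustration and do not reproduce Numba's driver code.

```python
import os


class ToyDriver:
    """Toy driver that remembers which process initialized it (hypothetical)."""

    def __init__(self, getpid=os.getpid):
        # getpid is injectable so the check can be exercised without forking.
        self._getpid = getpid
        self._init_pid = None

    def initialize(self):
        self._init_pid = self._getpid()

    def api_call(self):
        if self._init_pid is None:
            raise RuntimeError("driver not initialized")
        if self._getpid() != self._init_pid:
            # We are in a child that was forked after initialization.
            raise RuntimeError("initialized before forking")
        return "ok"
```

A test class that only exercises error paths before any driver call never reaches `api_call`, which matches why `TestJitErrors` did not need the mixin until the new tests were added.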

gmarkall · 2020-01-13T09:30:32Z

This should be resolved now - I have tested with `python -m numba.runtests numba.cuda.tests -m` and all tests passed successfully.

I believe the test runner always initializes CUDA before the test runner forks to run tests in parallel, as it uses CUDA during test discovery, and this seems to me to be the reason that all the CUDA tests have the SerialMixin. I think it would be an improvement if the test runner didn't initialize CUDA in the main process, as this would remove a barrier to running CUDA tests in parallel. I'm not sure how difficult it would be to resolve this (e.g. make all test discovery run in a child process), but it's something I'd mentioned to @stuartarchibald informally, and hoped I could take a look at in conjunction with contributing to py.test support sometime in the future.

seibert · 2020-01-13T13:48:34Z

This has passed internal CI. Thanks for the fix!

This test uses CUDA functionality, so it needs the SerialMixin to prevent it from running in a child process after the test runner has already initialized CUDA in the parent process. It is moved into its own class to add the SerialMixin, so that the other tests in the original class can still run in parallel. It also uses a CUDA kernel without a launch configuration, which will soon (pending PR numba#5061) be an error, so we add a launch configuration to it.
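The class split described above can be sketched with the standard `unittest` module. The marker mixin `SerialMarker`, the attribute `run_in_parallel`, the test class names and the `partition` helper are all hypothetical; they only illustrate how a runner might keep device-touching tests out of forked workers while leaving the rest parallel.

```python
import unittest


class SerialMarker:
    """Hypothetical marker mixin: tests in this class must run in the main process."""

    run_in_parallel = False


class TestPureErrors(unittest.TestCase):
    # No device access, so this class is safe to run in a forked worker.
    def test_bad_literal(self):
        with self.assertRaises(ValueError):
            int("not a number")


class TestNeedsDevice(SerialMarker, unittest.TestCase):
    # Placeholder for a test that would launch a kernel with an explicit config.
    def test_launch_placeholder(self):
        self.assertEqual(sum([1, 1]), 2)


def partition(test_classes):
    """Split test classes into (parallel, serial) lists based on the marker."""
    parallel, serial = [], []
    for cls in test_classes:
        target = parallel if getattr(cls, "run_in_parallel", True) else serial
        target.append(cls)
    return parallel, serial
```

Keeping the marker on a small dedicated class, rather than on the whole original class, limits how many tests lose parallel execution.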

gmarkall · 2020-01-21T15:55:59Z

I'd somehow accidentally committed my Valgrind suppressions file to this branch - have now removed it.

stuartarchibald · 2020-02-06T12:29:39Z

Thanks for implementing this and fixing up my patch @gmarkall. Any chance you could resolve the merge conflicts when you have a few minutes please? Thanks.

@seibert this is removing the autotune and occupancy properties in the CUDAKernel instances. Using these leads to a deprecation warning being issued (since 0.40). Do you think we need to put this into the deprecation notices in the docs, or does Sept 2018 until now constitute enough time to simply permit removal? (I guess this is a bit like numba.autojit in terms of the deprecation time frame.)

stuartarchibald · 2020-02-27T13:36:20Z

> @seibert this is removing the autotune and occupancy properties in the CUDAKernel instances. Using these leads to a deprecation warning being issued (since 0.40). Do you think we need to put this into the deprecation notices in the docs, or does Sept 2018 until now constitute enough time to simply permit removal? (I guess this is a bit like numba.autojit in terms of the deprecation time frame.)

Out of band, @sklam suggested that outright removal as-is would be fine. I'm inclined to agree; deprecation notices have been served for a very long time.

gmarkall · 2020-02-27T14:59:27Z

I've just merged master into this PR, and am now waiting to see what CI does - on my machine with a GPU, all tests pass as expected for the numba.cuda.tests package.

stuartarchibald · 2020-02-27T15:55:02Z

Farm build ID numba_smoketest_cuda_29.

stuartarchibald · 2020-02-27T17:09:01Z

Closing and reopening, as CI got stuck due to the GitHub API having issues.

stuartarchibald · 2020-02-27T18:06:44Z

Build farm passed.

stuartarchibald · 2020-02-27T18:07:32Z

Thanks for doing the merge @gmarkall, merge-in looks good.

gmarkall and others added 6 commits January 10, 2020 06:58

Remove the autotuner

7e169e1

The autotuner has been deprecated since Numba 0.40.

Tests: ensure all kernel calls are configured

c67e2b3

Any unconfigured calls were using the default of one thread and one block, so this is made explicit.

Normalize kernel dimensions in CUDAKernel.__call__

ff4f86c

The `normalize_kernel_dimensions` function validates that the kernel has been configured, so it must be called in a CUDAKernel call (in addition to an AutoJitCUDAKernel call) to ensure that the kernel has been configured.

Raise ValueError in simulator if kernel not configured

dc120bc

Add tests for unconfigured kernel error

ad75d11

sklam added the CUDA and 2 - In Progress labels Jan 10, 2020

Fix flake8 errors in errors.py

23334fd

stuartarchibald mentioned this pull request Jan 10, 2020

Prevent CUDA kernel launch without a specified launch config. #4468

Closed

stuartarchibald added the Pending BuildFarm label Jan 10, 2020

gmarkall changed the title ~~[WIP] Prevent kernel launch with no configuration, remove autotuner~~ Prevent kernel launch with no configuration, remove autotuner Jan 10, 2020

stuartarchibald added 3 - Ready for Review and removed 2 - In Progress labels Jan 10, 2020

Add SerialMixin to TestJitErrors

773b669

Now that the TestJitErrors class contains test cases that call CUDA functions, it requires the SerialMixin, otherwise it will be executed in a child process after the parent already used CUDA (which is not supported) when testing in parallel.

seibert added the BuildFarm Passed label and removed the Pending BuildFarm label Jan 13, 2020

stuartarchibald added this to the Numba 0.49 RC milestone Jan 13, 2020

Remove accidentally-committed suppressions file

252f226

stuartarchibald removed the 3 - Ready for Review label Feb 6, 2020

stuartarchibald added the 4 - Waiting on author label Feb 6, 2020

Merge remote-tracking branch 'numba/master' into grm-fix-no-launch-co…

7d28818

…nfig

stuartarchibald approved these changes Feb 27, 2020

stuartarchibald closed this Feb 27, 2020

stuartarchibald reopened this Feb 27, 2020

sklam merged commit 5c4c82d into numba:master Feb 27, 2020
