
Prevent kernel launch with no configuration, remove autotuner #5061

Merged
merged 10 commits into numba:master Feb 27, 2020

Conversation

gmarkall (Member)

This is based on @stuartarchibald's PR #4468, with additions:

  • Removal of the autotuner. The autotuner test was one that started failing when kernels required a launch configuration, so rather than fix the test, I've removed the functionality, as it was scheduled for removal anyway (and has been deprecated since 0.40).
  • Make CUDAKernel objects (those with an explicit list of types to compile) also raise when not configured, in addition to AutoJitCUDAKernel objects.
  • The CUDA Simulator also raises for unconfigured kernels.
  • Fix up the existing tests.
  • Add tests that launching an unconfigured kernel raises (a short sketch of the new behaviour follows this list).
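
For illustration, a minimal sketch of the behaviour change with a trivial kernel; the exact exception type and message are not quoted in this thread, so they are treated as assumptions here:

```python
from numba import cuda

@cuda.jit
def noop(x):
    pass

# Old behaviour: an unconfigured call silently used one block of one thread.
# New behaviour (this PR): calling with no launch configuration raises,
# pointing the user at the configuration syntax and the documentation.
try:
    noop(0)                  # no [griddim, blockdim] given: now an error
except Exception as exc:     # exact exception type is an assumption
    print(type(exc).__name__, exc)

noop[1, 1](0)                # explicit configuration: one block, one thread
```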

gmarkall and others added 6 commits January 10, 2020 06:58
The autotuner has been deprecated since Numba 0.40.

Any unconfigured calls were using the default of one thread and one block, so this is made explicit.

This patch prevents the launch of a CUDA kernel with no configuration, as this causes confusion for (especially) first-time users. The current behaviour is that if no launch config is specified, a default everything-set-to-1 config is used; the new behaviour is that if no launch config is specified, an exception is raised pointing users to the syntax and documentation.

The `normalize_kernel_dimensions` function validates that the kernel has been configured, so it must also be called in a CUDAKernel call (in addition to an AutoJitCUDAKernel call) so that both kernel types reject unconfigured launches.
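
As a rough sketch of the validation those commit messages describe: the function name `normalize_kernel_dimensions` comes from the message itself, but its signature, body, and the exact exception below are assumptions, not Numba's verbatim source.

```python
def normalize_kernel_dimensions(griddim, blockdim):
    # Raise for unconfigured kernels, as the commit messages describe.
    # The exception type and wording here are assumptions.
    if griddim is None or blockdim is None:
        raise ValueError(
            "No launch configuration given. Configure the kernel with "
            "kernel[griddim, blockdim](...) before calling it; see the "
            "CUDA kernel documentation for details.")

    def as_3d(dim):
        # Accept an int or a 1-3 element sequence and pad to 3 dimensions.
        dim = tuple(dim) if hasattr(dim, '__len__') else (dim,)
        return (dim + (1, 1))[:3]

    return as_3d(griddim), as_3d(blockdim)


# Example: normalize_kernel_dimensions((16,), 128) -> ((16, 1, 1), (128, 1, 1))
```

Calling such a check at the top of both the CUDAKernel and AutoJitCUDAKernel call paths gives both kernel types the same unconfigured-launch error, which is the point of the last commit message above.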
@sklam added the CUDA and 2 - In Progress labels Jan 10, 2020
@stuartarchibald added the Pending BuildFarm label Jan 10, 2020
@gmarkall (Member, Author)

Removing [WIP] as it passed the CI tests.

@gmarkall changed the title from "[WIP] Prevent kernel launch with no configuration, remove autotuner" to "Prevent kernel launch with no configuration, remove autotuner" Jan 10, 2020
@seibert (Contributor) commented Jan 10, 2020

Running numba_smoketest_cuda_19 as well.

@seibert (Contributor) commented Jan 10, 2020

Something about this PR is causing CUDA to be initialized before the test runner forks to run tests in parallel:

======================================================================
ERROR: test_unconfigured_autojitcudakernel (numba.cuda.tests.cudapy.test_errors.TestJitErrors)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../site-packages/numba/cuda/tests/cudapy/test_errors.py", line 56, in test_unconfigured_autojitcudakernel
    self._test_unconfigured(kernfunc)
  File ".../site-packages/numba/cuda/tests/cudapy/test_errors.py", line 46, in _test_unconfigured
    kernfunc(0)
  File ".../site-packages/numba/cuda/compiler.py", line 761, in __call__
    kernel = self.specialize(*args)
  File ".../site-packages/numba/cuda/compiler.py", line 772, in specialize
    kernel = self.compile(argtypes)
  File ".../site-packages/numba/cuda/compiler.py", line 782, in compile
    cc = get_current_device().compute_capability
  File ".../site-packages/numba/cuda/api.py", line 343, in get_current_device
    return current_context().device
  File ".../site-packages/numba/cuda/cudadrv/devices.py", line 213, in get_context
    return _runtime.get_or_create_context(devnum)
  File ".../site-packages/numba/cuda/cudadrv/devices.py", line 139, in get_or_create_context
    return self._get_or_create_context_uncached(devnum)
  File ".../site-packages/numba/cuda/cudadrv/devices.py", line 152, in _get_or_create_context_uncached
    with driver.get_active_context() as ac:
  File ".../site-packages/numba/cuda/cudadrv/driver.py", line 387, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File ".../site-packages/numba/cuda/cudadrv/driver.py", line 294, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File ".../site-packages/numba/cuda/cudadrv/driver.py", line 328, in _check_error
    raise CudaDriverError("CUDA initialized before forking")
numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking

======================================================================
ERROR: test_unconfigured_cudakernel (numba.cuda.tests.cudapy.test_errors.TestJitErrors)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../site-packages/numba/cuda/tests/cudapy/test_errors.py", line 51, in test_unconfigured_cudakernel
    kernfunc = cuda.jit("void(int32)")(noop)
  File ".../site-packages/numba/cuda/decorators.py", line 101, in kernel_jit
    kernel.bind()
  File ".../site-packages/numba/cuda/compiler.py", line 533, in bind
    self._func.get()
  File ".../site-packages/numba/cuda/compiler.py", line 406, in get
    cuctx = get_context()
  File ".../site-packages/numba/cuda/cudadrv/devices.py", line 213, in get_context
    return _runtime.get_or_create_context(devnum)
  File ".../site-packages/numba/cuda/cudadrv/devices.py", line 139, in get_or_create_context
    return self._get_or_create_context_uncached(devnum)
  File ".../site-packages/numba/cuda/cudadrv/devices.py", line 152, in _get_or_create_context_uncached
    with driver.get_active_context() as ac:
  File ".../site-packages/numba/cuda/cudadrv/driver.py", line 387, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File ".../site-packages/numba/cuda/cudadrv/driver.py", line 294, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File ".../site-packages/numba/cuda/cudadrv/driver.py", line 328, in _check_error
    raise CudaDriverError("CUDA initialized before forking")
numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking

@gmarkall (Member, Author)

Ah - I'm not in the habit of running CUDA tests with -m so I missed this - will investigate first thing on Monday. Thanks for the pointer!

@gmarkall (Member, Author)

Looking at the diff, I can see the problem: TestJitErrors has no SerialMixin. This wasn't a problem before, because its tests (expectedly) failed before even configuring a kernel, but it is now that the new tests get as far as trying to call kernels.
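
A minimal sketch of the shape of the fix, assuming SerialMixin lives in `numba.cuda.testing` (as in Numba's test suite of this era) and that an unconfigured launch raises ValueError; neither detail is quoted verbatim in this thread:

```python
import unittest

from numba import cuda
from numba.cuda.testing import SerialMixin  # assumed import path


class TestJitErrors(SerialMixin, unittest.TestCase):
    # SerialMixin keeps this class out of the parallel (forked) test runs,
    # which matters now that these tests actually call CUDA functions.

    def test_unconfigured_kernel(self):
        @cuda.jit("void(int32)")
        def noop(x):
            pass

        with self.assertRaises(ValueError):  # exception type is an assumption
            noop(0)  # no launch configuration given
```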

Now that the TestJitErrors class contains test cases that call CUDA functions, it requires the SerialMixin; otherwise, when testing in parallel, it would be executed in a child process after the parent has already used CUDA, which is not supported.
@gmarkall (Member, Author)

This should be resolved now; I have tested with `python -m numba.runtests numba.cuda.tests -m` and all tests passed successfully.

I believe the test runner always initializes CUDA before it forks to run tests in parallel, because it uses CUDA during test discovery; this seems to be the reason that all the CUDA tests have the SerialMixin. It would be an improvement if the test runner didn't initialize CUDA in the main process, as this would remove a barrier to running CUDA tests in parallel. I'm not sure how difficult that would be to resolve (e.g. making all test discovery run in a child process), but it's something I'd mentioned to @stuartarchibald informally, and I hope to take a look at it in conjunction with contributing py.test support sometime in the future.
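
For illustration only, a hedged sketch of that child-process idea: perform unittest discovery in a spawned child so that the parent process never initializes CUDA. This is not Numba's actual runner code; the structure and names are assumptions.

```python
# Hypothetical sketch: run test discovery in a spawned child process so the
# parent never initializes CUDA. 'spawn' starts a fresh interpreter, so no
# CUDA state exists in the parent when worker processes are created later.
import multiprocessing as mp


def discover(queue):
    import unittest
    suite = unittest.defaultTestLoader.loadTestsFromName('numba.cuda.tests')
    # Importing the test modules may initialize CUDA, but only in this child.
    queue.put(suite.countTestCases())


if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    queue = ctx.Queue()
    child = ctx.Process(target=discover, args=(queue,))
    child.start()
    print('discovered', queue.get(), 'tests without touching CUDA here')
    child.join()
```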

@seibert (Contributor) commented Jan 13, 2020

This has passed internal CI. Thanks for the fix!

@seibert added the BuildFarm Passed label and removed the Pending BuildFarm label Jan 13, 2020
gmarkall added a commit to gmarkall/numba that referenced this pull request Jan 13, 2020
This test used CUDA functionality, so it needs the SerialMixin to prevent it running in a child process after the test runner has already initialized CUDA in the parent process. It is moved into its own class to add the SerialMixin, so that the other tests from the original class can still run in parallel.

It also uses a CUDA kernel without a launch configuration, which will
soon (pending PR numba#5061) be an error, so we add a launch configuration
to it.
@stuartarchibald added this to the Numba 0.49 RC milestone Jan 13, 2020
@gmarkall (Member, Author)

I'd somehow accidentally committed my Valgrind suppressions file to this branch - have now removed it.

@stuartarchibald (Contributor)

Thanks for implementing this and fixing up my patch @gmarkall. Any chance you could resolve the merge conflicts when you have a few minutes please? Thanks.

@seibert this PR removes the autotune and occupancy properties of CUDAKernel instances. Using these has issued a deprecation warning since 0.40; do you think we need to put this into the deprecation notices in the docs, or does Sept 2018 until now constitute enough time to simply permit removal? (I guess this is a bit like numba.autojit in its deprecation time frame.)

@stuartarchibald added the 4 - Waiting on author label Feb 6, 2020
@stuartarchibald (Contributor)

> @seibert this PR removes the autotune and occupancy properties of CUDAKernel instances. Using these has issued a deprecation warning since 0.40; do you think we need to put this into the deprecation notices in the docs, or does Sept 2018 until now constitute enough time to simply permit removal? (I guess this is a bit like numba.autojit in its deprecation time frame.)

Out of band, @sklam suggested that outright removal as-is would be fine. I'm inclined to agree; deprecation notices have been served for a very long time.

@gmarkall (Member, Author)

I've just merged master into this PR and am waiting to see what CI does; on my machine with a GPU, all tests in the numba.cuda.tests package pass as expected.

@stuartarchibald added the 4 - Waiting on CI and Pending BuildFarm labels and removed the 4 - Waiting on author and BuildFarm Passed labels Feb 27, 2020
@stuartarchibald (Contributor)

Farm build ID numba_smoketest_cuda_29.

@stuartarchibald (Contributor)

Close/Open as CI got stuck due to the GitHub API having issues.

@stuartarchibald (Contributor)

Build farm passed.

@stuartarchibald added the 5 - Ready to merge and BuildFarm Passed labels and removed the 4 - Waiting on CI and Pending BuildFarm labels Feb 27, 2020
@stuartarchibald (Contributor)

Thanks for doing the merge @gmarkall, merge-in looks good.

@sklam merged commit 5c4c82d into numba:master Feb 27, 2020
Labels
  • 5 - Ready to merge
  • BuildFarm Passed
  • CUDA