New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent kernel launch with no configuration, remove autotuner #5061
Conversation
The autotuner has been deprecated since Numba 0.40.
Any unconfigured calls were using the default of one thread and one block, so this is made explicit.
This patch prevents the launch of a CUDA kernel with no configuration as this causes confusion for (especially) first time users. Current behaviour is that if no launch config is specified then a default everything-set-to-1 config is used, new behaviour is that if no launch config is specified then an exception is raised pointing users to the syntax and documentation.
The `normalize_kernel_dimensions` function validates that the kernel has been configured, so it must be called in a CUDAKernel call (in addition to an AutoJitCUDAKernel call) to ensure that the kernel has been configured.
Removing [WIP] as it passed the CI tests. |
running |
Something about this PR is causing CUDA to be initialized before the test runner forks to run tests in parallel:
|
Ah - I'm not in the habit of running CUDA tests with |
Looking at the diff, I can see the problem - |
Now that the TestJitErrors class contains test cases that call CUDA functions, it requires the SerialMixin, otherwise it will be executed in a child process after the parent already used CUDA (which is not supported) when testing in parallel.
This should be resolved now - I have tested with I believe the test runner always initializes CUDA before the test runner forks to run tests in parallel, as it uses CUDA during test discovery, and this seems to me to be the reason that all the CUDA tests have the |
This has passed internal CI. Thanks for the fix! |
This test used CUDA functionality, so it needs the SerialMixin to prevent it running in a child process after the test runner already initialized CUDA in the parent process. It is moved into its own class to add the SerialMixin, to preserve the ability to run other tests from the same class to run in parallel. It also uses a CUDA kernel without a launch configuration, which will soon (pending PR numba#5061) be an error, so we add a launch configuration to it.
I'd somehow accidentally committed my Valgrind suppressions file to this branch - have now removed it. |
Thanks for implementing this and fixing up my patch @gmarkall. Any chance you could resolve the merge conflicts when you have a few minutes please? Thanks. @seibert this is removing the |
Out of band @sklam suggested outright removal as-is would be fine, I'm inclined to agree, deprecation notices have been served for a very long time. |
I've just merged master into this PR, and just waiting to see what CI does - on my machine with a GPU, all tests pass as expected for the |
Farm build ID |
Close/Open as CI got stuck due to github API having issues. |
Build farm passed. |
Thanks for doing the merge @gmarkall, merge-in looks good. |
This is based on @stuartarchibald's PR #4468, with additions:
CUDAKernel
objects (those with an explicit list of types to compile) also raise when not configured, as well asAutoJitCUDAKernel
objects.