
CUDA: Support for @overload #6944

Merged: 5 commits, merged into numba:master on Apr 22, 2021

Conversation

@gmarkall (Member)

Joint work with @stuartarchibald.

The implementation consists of:

- Addition of `get_call_template`, `nopython_signatures`, and
  `get_overload` to the class that is now `DeviceDispatcher`. These are
  needed by the overload infrastructure, and are inspired by the
  equivalents in `_DispatcherBase`.
- Registration of a CUDA `jit` decorator (a sketch follows this list).
  The registration with `dispatcher_register` is removed and replaced
  with this new registration, because it did nothing (it was for the
  `target` kwarg to the `numba.jit` decorator, which is deprecated). The
  `jit` decorator for overloads is a bit special in that it always adds
  the `device=True` kwarg to the regular `jit` decorator, because
  overloads will always be device functions.
- Addition of a `no_cpython_wrapper` kwarg to `jitdevice`, which is
  needed because the overload infrastructure passes it. No action is
  required regardless of its value, because there is no CPython wrapper
  in the CUDA target.
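
As a hedged sketch of that registration (the registry names follow the
`hardware_registry` spelling used elsewhere in this PR and are otherwise
assumptions; the real Numba module paths may differ):

```python
# Sketch only, not the actual Numba source. The overload machinery
# looks up a per-target jit decorator; the CUDA one always compiles a
# device function.
from numba import cuda

def jit(func=None, **kwargs):
    # Overloads are always device functions on the CUDA target, so
    # device=True is forced regardless of what was requested.
    kwargs.pop("device", None)
    # Accepted but ignored: there is no CPython wrapper in the CUDA
    # target.
    kwargs.pop("no_cpython_wrapper", None)
    return cuda.jit(func, device=True, **kwargs)

# Hypothetical registration, mirroring the registry names used in this
# PR's discussion:
# jit_registry[hardware_registry["cuda"]] = jit
```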

A refactoring is also made: `DeviceFunctionTemplate` is renamed to
`DeviceDispatcher`, as this more closely reflects what it is - it is
analogous to the `Dispatcher` class used for kernels, and the
`DeviceFunction` class is more analogous to the `_Kernel` class.
Eventually `Dispatcher` will subsume these classes as we move towards a
more "modern" dispatcher implementation in the CUDA target.

Tests are added; these are all skipped on the simulator because
overloading doesn't really exist as a concept without compilation.
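
For illustration, a hedged sketch of what using this feature looks like
from user code. The function names are invented for the example, and the
target-selection kwarg was still being renamed at the time of this PR
(`hardware` vs `target`; current Numba spells it `target="cuda"`):

```python
import numpy as np
from numba import cuda
from numba.core.extending import overload

def clamp(x, lo, hi):
    # Pure-Python fallback; compiled implementations come from
    # @overload registrations.
    return min(max(x, lo), hi)

# Register a CUDA-specific implementation. "target" is the current
# spelling of the kwarg; this PR's discussion also mentions "hardware".
@overload(clamp, target="cuda")
def ol_clamp_cuda(x, lo, hi):
    def impl(x, lo, hi):
        return min(max(x, lo), hi)
    return impl

@cuda.jit
def kernel(out):
    out[0] = clamp(5.0, 0.0, 1.0)

out = np.zeros(1)
kernel[1, 1](out)
print(out[0])  # 1.0
```
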
@stuartarchibald (Contributor)

This failure: https://dev.azure.com/numba/ff1fe4d0-ed73-4f1c-b894-1d50a27e048f/_apis/build/builds/8536/logs/130 is, AFAICT, nothing to do with this PR; it is fixed in #6945.

@gmarkall (Member, Author)

A quick note on this, mainly for @stuartarchibald: Earlier versions of this had `dispatcher_registry[hardware_registry["cuda"]] = Dispatcher` in `initialize_all()`, but I removed it for this patch because it wasn't used for `@overload` - it will need adding back sometime. My understanding was that this is only needed for the `@jit` decorator with a hardware / target (whatever we end up calling it) kwarg.

@stuartarchibald (Contributor)

> A quick note on this, mainly for @stuartarchibald: Earlier versions of this had `dispatcher_registry[hardware_registry["cuda"]] = Dispatcher` in `initialize_all()`, but I removed it for this patch because it wasn't used for `@overload` - it will need adding back sometime. My understanding was that this is only needed for the `@jit` decorator with a hardware / target (whatever we end up calling it) kwarg.

Agreed - I think the registrations needed to get `@overload` working at present can avoid having a dispatcher registered.
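
For concreteness, a sketch of the registration being discussed, using
simplified stand-ins for the registries (the real ones live inside
Numba, and the contents of `initialize_all()` here are assumptions):

```python
# Hypothetical, simplified stand-ins for the registries named above.
hardware_registry = {"cuda": object()}   # maps target name -> target token
dispatcher_registry = {}                 # maps target token -> Dispatcher

class Dispatcher:                        # stand-in for the CUDA Dispatcher
    pass

def initialize_all():
    # ... other CUDA target registrations ...
    # The line this PR dropped; only needed for @jit with a
    # hardware/target kwarg, not for @overload:
    dispatcher_registry[hardware_registry["cuda"]] = Dispatcher
```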

Using prints and capturing stdout to test overloads is both overly
complicated and difficult to debug (terminal-based debuggers and
debug prints don't work during the tests). The value-based check that
replaces it suffers neither of these issues, and it is at least as clear
what each test does (more so, in my opinion).
@gmarkall (Member, Author)

@stuartarchibald As you suggested OOB, I've replaced the printing and captured output in the tests with a value-based check - this is much tidier and makes issues easier to deal with.
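
A hedged sketch of what such a value-based check might look like (the
real `check_overload` helper in the tests may differ; the dtype and
launch configuration here are assumptions):

```python
import numpy as np
from numba import cuda

def check_overload(self, kernel, expected):
    # Compile the kernel for CUDA, let it write its result into an
    # output array, and assert on the value directly instead of
    # capturing printed output.
    compiled = cuda.jit(kernel)
    x = np.zeros(1, dtype=np.int64)
    compiled[1, 1](x)
    self.assertEqual(x[0], expected)
```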

@stuartarchibald (Contributor) left a review:

Thanks for the patch @gmarkall, great to see this working :) One minor comment to resolve, else looks good.

Comment on lines 260 to 286
```python
def test_call_hardware_overloaded(self):
    def kernel(x):
        hardware_overloaded(x)

    expected = CUDA_HARDWARE_OL
    self.check_overload(kernel, expected)

def test_generic_calls_hardware_overloaded(self):
    def kernel(x):
        generic_calls_hardware_overloaded(x)

    expected = GENERIC_CALLS_HARDWARE_OL * CUDA_HARDWARE_OL
    self.check_overload(kernel, expected)

def test_cuda_calls_hardware_overloaded(self):
    def kernel(x):
        cuda_calls_hardware_overloaded(x)

    expected = CUDA_CALLS_HARDWARE_OL * CUDA_HARDWARE_OL
    self.check_overload(kernel, expected)

def test_hardware_overloaded_calls_hardware_overloaded(self):
    def kernel(x):
        hardware_overloaded_calls_hardware_overloaded(x)

    expected = CUDA_HARDWARE_OL_CALLS_HARDWARE_OL * CUDA_HARDWARE_OL
    self.check_overload(kernel, expected)
```

@stuartarchibald (Contributor):

In this section, maybe test that the CPU calls would pick up the "generic" variants?

@gmarkall (Member, Author):

Added.
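
For context, a hedged sketch of the kind of CPU-side check being asked
for: compile the same call with the CPU jit and expect the generic
variant's value rather than the CUDA one. The test name and the
`GENERIC_HARDWARE_OL` constant are hypothetical; the test actually added
in f1b54d0 may differ.

```python
import numpy as np
from numba import njit

def test_call_hardware_overloaded_on_cpu(self):
    @njit
    def func(x):
        hardware_overloaded(x)

    x = np.zeros(1, dtype=np.int64)
    func(x)
    # On the CPU target, the generic overload should be picked up, not
    # the CUDA-specific one.
    self.assertEqual(x[0], GENERIC_HARDWARE_OL)
```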

@stuartarchibald added the "4 - Waiting on author" label and removed "3 - Ready for Review" (Apr 21, 2021)
This checks the case where both generic and CUDA overloads exist, to
ensure the CUDA overloads don't interfere with the generic / CPU target.

From PR numba#6944 feedback.
@gmarkall added the "4 - Waiting on reviewer" label and removed "4 - Waiting on author" (Apr 22, 2021)
@stuartarchibald (Contributor)

Thanks for adding the extra tests in f1b54d0; this looks good. There's a conflict to resolve against mainline, and once that's done this can be tested on the farm. Thanks again.

@stuartarchibald added the "4 - Waiting on author" label and removed "4 - Waiting on reviewer" (Apr 22, 2021)
@gmarkall (Member, Author)

@stuartarchibald Many thanks, conflict resolved.

@gmarkall added the "4 - Waiting on reviewer" label and removed "4 - Waiting on author" (Apr 22, 2021)
@stuartarchibald (Contributor) left a review:

Thanks for getting this working, it's great to see the new API in action!

@stuartarchibald (Contributor)

> @stuartarchibald Many thanks, conflict resolved.

Thanks, patch approved.

@stuartarchibald added the "4 - Waiting on CI" and "Pending BuildFarm" labels and removed "4 - Waiting on reviewer" (Apr 22, 2021)
@stuartarchibald (Contributor)

Buildfarm ID: numba_smoketest_cuda_yaml_49.

@stuartarchibald (Contributor)

> Buildfarm ID: numba_smoketest_cuda_yaml_49.

Passed.

@stuartarchibald added the "5 - Ready to merge" and "BuildFarm Passed" labels and removed "4 - Waiting on CI" and "Pending BuildFarm" (Apr 22, 2021)
@sklam sklam merged commit 432588b into numba:master Apr 22, 2021
Labels: 5 - Ready to merge · BuildFarm Passed · CUDA · Effort - medium