CUDA: Fix and deprecate inspect_ptx(), fix NVVM option setup for device functions #6953

Merged: 5 commits merged into numba:master on May 11, 2021

Conversation

gmarkall (Member):

This fixes issues with device functions:

  • inspect_ptx was broken by PR #6735 (CUDA: Don't parse IR for modules with llvmlite), commit 11a7397. The device function should obtain the PTX from its code library instead of compiling it directly itself.
  • Optimization and debug options were not passed to NVVM when compiling device functions. These options are now set up and passed to compile_cuda when a device function is compiled.

This commit also deprecates the use of inspect_ptx - the preferred API for compiling Python to PTX is the compile_ptx function instead (a usage sketch follows at the end of this description). The inspect_ptx() method had several issues:

  • Allowing it to receive a dict of options for NVVM enabled it to produce PTX different from what had actually been compiled, as highlighted by the fact that optimization and debug flags were not passed to NVVM by the compile() method but were honoured by inspect_ptx().
  • The NVVM options dict kwarg was unsafely initialized with a mutable default (nvvm_options={}) anyway - see issue #5811 (Eliminate all dangerous default values found by pylint) and the sketch after this list.
  • It returned encoded bytes, which differs from similar APIs (e.g. inspect_llvm() and the Dispatcher's inspect_asm()), which return str instead.
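
To make the nvvm_options={} hazard concrete, here is a generic Python sketch (not Numba code) of the mutable-default pitfall: the dict is created once, at function definition time, and then shared by every call that omits the argument.

def configure(options={}):          # one dict object shared across all calls
    options.setdefault("opt", 3)
    return options

first = configure()
first["debug"] = True               # mutates the shared default
second = configure()                # returns the very same dict object
assert second["debug"] is True      # state has leaked between unrelated calls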

As there is no easy way to correctly and consistently pass NVVM options through inspect_ptx(), a warning is emitted stating that these are ignored if a user passes any options in.

Some tests are also added to ensure that it works correctly and warns the user appropriately.

Fixes #6950.
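
For reference, a rough sketch of the preferred replacement, assuming Numba with the CUDA toolkit available (axpy is a hypothetical function used only for illustration):

from numba import cuda, float32

def axpy(a, x, y):
    return a * x + y

# Unlike inspect_ptx(), compile_ptx() honours optimization and debug settings
# and returns the PTX as a str along with the inferred return type.
ptx, resty = cuda.compile_ptx(axpy, (float32, float32, float32), device=True)
assert isinstance(ptx, str)
print(resty)   # float32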

@gmarkall added the CUDA and 3 - Ready for Review labels on Apr 20, 2021
gmarkall added a commit to gmarkall/cudf that referenced this pull request Apr 21, 2021
Some small Numba-related changes:

- Testing on CI appears to be picking up Numba 0.53.1, and there have
  been quite a lot of CUDA changes between Numba 0.49 (the old minimum)
  and 0.53.1 - increase the minimum required Numba version to 0.53.1
  accordingly.
- Remove old import guards and alternatives for Numba versions < 0.49 -
  these were needed because of Numba's internals refactor between 0.48
  and 0.49.
- Replace the use of `inspect_ptx()` with `compile_ptx()` in
  `test_ptx_generic()` - `inspect_ptx()` has always had various issues
  so it is being deprecated and will be removed in a later version.
  `compile_ptx()` provides a better alternative. See
  numba/numba#6953
@stuartarchibald (Contributor) left a comment:

Thanks for the patch, looks good. Just a couple of minor queries to resolve. Thanks again.

numba/cuda/compiler.py
Comment on lines +313 to +321
msg = ('inspect_ptx for device functions is deprecated. Use '
       'compile_ptx instead.')
warn(msg, category=NumbaDeprecationWarning)

if nvvm_options:
    msg = ('nvvm_options are ignored. Use compile_ptx if you want to '
           'set NVVM options.')
    warn(msg, category=NumbaDeprecationWarning)
return self.compile(args).library.get_asm_str().encode()
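
A minimal sketch of how this surfaces to callers, assuming a working CUDA toolkit (add is a hypothetical device function used only for illustration):

import warnings
from numba import cuda, float32
from numba.core.errors import NumbaDeprecationWarning

@cuda.jit(device=True)
def add(a, b):
    return a + b

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    ptx = add.inspect_ptx((float32, float32))   # still returns bytes

assert any(issubclass(w.category, NumbaDeprecationWarning) for w in caught)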
Contributor:

Add a note in the deprecation notices?

gmarkall (Member Author):
Done.

@stuartarchibald added the 4 - Waiting on author label and removed the 3 - Ready for Review label on Apr 21, 2021
@stuartarchibald added this to the Numba 0.54 RC milestone on Apr 21, 2021
@gmarkall requested review from esc and sklam as code owners on April 22, 2021
@gmarkall added the 4 - Waiting on reviewer label and removed the 4 - Waiting on author label on Apr 22, 2021
@stuartarchibald (Contributor):

@gmarkall many thanks for the updates. Please could you resolve the conflict and then this can be run through the farm? Thanks again!

@stuartarchibald added the 4 - Waiting on author label and removed the 4 - Waiting on reviewer label on Apr 23, 2021
@gmarkall (Member Author):

> @gmarkall many thanks for the updates. Please could you resolve the conflict and then this can be run through the farm? Thanks again!

Thanks for the review - the conflicts are fixed but the tests fail due to master being broken for CUDA at the moment, e.g.:

======================================================================
ERROR: test_array_assign_all (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/cudadrv/test_cuda_array_slicing.py", line 273, in test_array_assign_all
    darr[:] = _400
  File "/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/devices.py", line 224, in _require_cuda_context
    return fn(*args, **kws)
  File "/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/devicearray.py", line 570, in __setitem__
    return self._do_setitem(key, value)
  File "/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/devicearray.py", line 628, in _do_setitem
    _assign_kernel(lhs.ndim).forall(n_elements, stream=stream)(lhs, rhs)
  File "/home/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 438, in __call__
    kernel = self.kernel.specialize(*args)
  File "/home/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 990, in specialize
    specialization = Dispatcher(self.py_func, [types.void(*argtypes)],
  File "/home/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 868, in __init__
    self.compile(sigs[0])
  File "/home/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 1055, in compile
    kernel = _Kernel(self.py_func, argtypes, link=self.link,
  File "/home/gmarkall/numbadev/numba/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 481, in __init__
    cres = compile_cuda(self.py_func, types.void, self.argtypes,
  File "/home/gmarkall/numbadev/numba/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 157, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
  File "/home/gmarkall/numbadev/numba/numba/core/compiler.py", line 675, in compile_extra
    return pipeline.compile_extra(func)
  File "/home/gmarkall/numbadev/numba/numba/core/compiler.py", line 419, in compile_extra
    return self._compile_bytecode()
  File "/home/gmarkall/numbadev/numba/numba/core/compiler.py", line 483, in _compile_bytecode
    return self._compile_core()
  File "/home/gmarkall/numbadev/numba/numba/core/compiler.py", line 462, in _compile_core
    raise e
  File "/home/gmarkall/numbadev/numba/numba/core/compiler.py", line 453, in _compile_core
    pm.run(self.state)
  File "/home/gmarkall/numbadev/numba/numba/core/compiler_machinery.py", line 339, in run
    raise patched_exception
  File "/home/gmarkall/numbadev/numba/numba/core/compiler_machinery.py", line 330, in run
    self._runPass(idx, pass_inst, state)
  File "/home/gmarkall/numbadev/numba/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/gmarkall/numbadev/numba/numba/core/compiler_machinery.py", line 289, in _runPass
    mutated |= check(pss.run_pass, internal_state)
  File "/home/gmarkall/numbadev/numba/numba/core/compiler_machinery.py", line 262, in check
    mangled = func(compiler_state)
  File "/home/gmarkall/numbadev/numba/numba/core/typed_passes.py", line 105, in run_pass
    typemap, return_type, calltypes, errs = type_inference_stage(
  File "/home/gmarkall/numbadev/numba/numba/core/typed_passes.py", line 83, in type_inference_stage
    errs = infer.propagate(raise_errors=raise_errors)
  File "/home/gmarkall/numbadev/numba/numba/core/typeinfer.py", line 1071, in propagate
    raise errors[0]
numba.core.errors.TypingError: Failed in cuda mode pipeline (step: nopython frontend)
No implementation of function Function(<intrinsic to_fixed_tuple>) found for signature:
 
 >>> to_fixed_tuple(array(int64, 1d, C), Literal[int](2))
 
There are 2 candidate implementations:
     - Of which 2 did not match due to:
     Intrinsic in function 'to_fixed_tuple': File: numba/np/unsafe/ndarray.py: Line 41.
       With argument(s): '(array(int64, 1d, C), int64)':
      Rejected as the implementation raised a specific error:
        KeyError: <class 'numba.core.extending_hardware.CUDA'>
  raised from /home/gmarkall/numbadev/numba/numba/core/registry.py:99

During: resolving callee type: Function(<intrinsic to_fixed_tuple>)
During: typing of call at /home/gmarkall/numbadev/numba/numba/cuda/cudadrv/devicearray.py (452)


File "numba/cuda/cudadrv/devicearray.py", line 452:
    def kernel(lhs, rhs):
        <source elided>

        lhs[to_fixed_tuple(idx[0], ndim)] = rhs[to_fixed_tuple(idx[1], ndim)]
        ^
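
For reference, a minimal sketch of the failing path (assumes a CUDA-capable GPU; it mirrors the failing test's slice assignment). Assigning to a device-array slice launches Numba's internal _assign_kernel, which hit the to_fixed_tuple typing error on the then-broken master:

import numpy as np
from numba import cuda

darr = cuda.to_device(np.zeros(5, dtype=np.int64))
darr[:] = np.full(5, 400, dtype=np.int64)   # raised TypingError on broken master
print(darr.copy_to_host())                  # [400 400 400 400 400] once fixed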

rapids-bot (bot) pushed a commit to rapidsai/cudf that referenced this pull request on Apr 27, 2021

Authors:
  - Graham Markall (https://github.com/gmarkall)
  - Keith Kraus (https://github.com/kkraus14)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #8017
@gmarkall (Member Author) commented on May 4, 2021:

@stuartarchibald Now that master is fixed, this PR is working locally for me again - I've merged in master too.

@gmarkall added the 4 - Waiting on reviewer label and removed the 4 - Waiting on author label on May 4, 2021
@stuartarchibald (Contributor) left a comment:

Thanks for the patch and fixes.

@stuartarchibald (Contributor):

Buildfarm ID: numba_smoketest_cuda_yaml_54.

@stuartarchibald (Contributor):

> Buildfarm ID: numba_smoketest_cuda_yaml_54.

Passed.

@stuartarchibald added the 5 - Ready to merge label and removed the 4 - Waiting on reviewer label on May 10, 2021
@sklam merged commit 6c452e5 into numba:master on May 11, 2021