CUDA: Fix source location on kernel entry and enable breakpoints to be set on kernels by mangled name #6841

Marking a subprogram, variable, or location in the DIBuilder API requires passing in a `numba.core.ir.Loc` object. This makes sense when all debuginfo generation comes from lowering, but creates weird dependencies when generating code that doesn't relate to a specific piece of IR - for example, the kernel wrapper generated in the CUDA target. To mark locations in the kernel wrapper, one must import the `Loc` class from `numba.core.ir`, and construct instances of it to pass in, just to mark locations in generated code (or do something more odd with a "pretend" `Loc` object). Since the `Loc` instances are only used for the line number, this commit changes the interface of the relevant `DIBuilder` methods to accept a line number instead of a `Loc`, breaking the spurious coupling.
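The decoupling described above can be illustrated with a minimal sketch. The names `Loc`, `ToyDebugInfoBuilder`, and `mark_location` are hypothetical stand-ins invented for this illustration; they are not Numba's actual classes or method signatures. The point is only that an interface taking a plain line number serves both lowering code, which has a location object, and generated code, which does not.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Loc:
    """Hypothetical stand-in for an IR location (not numba.core.ir.Loc)."""
    filename: str
    line: int


@dataclass
class ToyDebugInfoBuilder:
    """Hypothetical builder that records marked locations by line number."""
    filename: str
    marked_lines: list = field(default_factory=list)

    def mark_location(self, line: int) -> None:
        # Only the line number is needed, so callers need no IR types.
        self.marked_lines.append(line)


builder = ToyDebugInfoBuilder("kernel.py")

# Lowering code that holds an IR location passes its line attribute.
loc = Loc("kernel.py", 12)
builder.mark_location(loc.line)

# Generated code (for example a wrapper) passes a plain integer directly,
# without constructing a "pretend" location object.
builder.mark_location(10)

print(builder.marked_lines)
```

With this shape, code that generates a wrapper does not need to import anything from the IR layer just to mark a line.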

Doing this provides a couple of improvements:

- Breaking on any kernel launch, for example with `set cuda break_on_launch application`, now results in GDB being able to find the source line at the break location - this enables the backtrace to show the mangled function name, source file, and line number. It also enables the text user interface to display the source when the kernel launches. From this point, stepping through line-by-line is feasible with the `next` command.
- Breakpoints can be set on a specific function and argument types, by setting a breakpoint on the mangled name.

(Neither of these worked prior to this change.)

Issues that remain include:

- GDB doesn't demangle the names of functions, like it seems to in the CPU target examples. Oddly, the act of marking the subprogram seems to break demangling - prior to this commit, the location of the wrapper kernel isn't known, but it does demangle correctly.
- Most locals appear to be optimized out even with `opt=0`. This also needs further investigation.

Includes:

- Call `llvm_to_ptx` once for each IR module when compiling for debug.
- Don't adjust linkage of functions in linked modules when debugging, because we need device functions to be externally visible.
- Fix setting of NVVM options when calling `compile_cuda` from kernel compilation and device function template compilation.
- Remove the `debug_pubnames` patch.

Outcomes:

- The "Error: Debugging support cannot be enabled when number of debug compile units is more than 1" message is no longer produced with NVVM 3.4.
- NVVM 7.0: Everything still seems to "work" as much as it did before. Stepping may be more stable, but this needs a bit more verification.

Fixes numba#5311.

…vements

- Use `self.py_func.__code__` to get the filename and line number of the function for `prepare_cuda_kernel`, which seems a bit cleaner / neater than digging into the type annotation for it.
- Fix typo and explicitly link to an issue comment.
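As a rough illustration of reading a function's source location from its code object, the sketch below uses the standard `co_filename` and `co_firstlineno` attributes. The helper `source_location` and the function `example_kernel` are hypothetical names for this example; this is not the code inside `prepare_cuda_kernel`.

```python
def source_location(py_func):
    """Return (filename, first line number) taken from the code object."""
    code = py_func.__code__
    return code.co_filename, code.co_firstlineno


def example_kernel(x):
    # Placeholder body; any plain Python function has a __code__ attribute.
    return x


filename, line = source_location(example_kernel)
print(filename, line)
```

Both attributes are available on any plain Python function, so no type annotation needs to be inspected to find where the function was defined.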

Commits on Mar 18, 2021

Commits on Mar 19, 2021

Commits on May 25, 2021

Commits on Jun 16, 2021

Commits on Jun 21, 2021