CUDA: Fix source location on kernel entry and enable breakpoints to be set on kernels by mangled name #6841
Conversation
Marking a subprogram, variable, or location in the DIBuilder API requires passing in a `numba.core.ir.Loc` object. This makes sense when all debuginfo generation comes from lowering, but creates weird dependencies when generating code that doesn't relate to a specific piece of IR - for example, the kernel wrapper generated in the CUDA target. To mark locations in the kernel wrapper, one must import the `Loc` class from `numba.core.ir` and construct instances of it just to annotate generated code (or do something odder with a "pretend" `Loc` object). Since the `Loc` instances are only used for the line number, this commit changes the interface of the relevant `DIBuilder` methods to accept a line number instead of a `Loc`, breaking the spurious coupling.
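A minimal sketch of the decoupling, with the real llvmlite metadata calls stubbed out (the class body below is illustrative, not Numba's actual implementation):

```python
class DIBuilder:
    """Illustrative stand-in for numba.core.debuginfo.DIBuilder."""

    # Before this change the signature was mark_location(self, builder, loc)
    # and only loc.line was ever consumed, forcing callers to build an
    # ir.Loc (or a "pretend" one) just to carry an integer.
    def mark_location(self, builder, line):
        # A real implementation would attach !dbg metadata via llvmlite;
        # printing stands in for that here.
        print(f"marking location at line {line}")


DIBuilder().mark_location(builder=None, line=42)  # no Loc object required
```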
Doing this provides a couple of improvements:

- Breaking on any kernel launch, for example with `set cuda break_on_launch application`, now results in GDB being able to find the source line at the break location - this enables the backtrace to show the mangled function name, source file, and line number. It also enables the text user interface to display the source when the kernel launches. From this point, stepping through line-by-line is feasible with the `next` command.
- Breakpoints can be set on a specific function with specific argument types, by setting a breakpoint on the mangled name.

(Neither of these worked prior to this change.)

Issues that remain include:

- GDB doesn't demangle the names of functions, like it seems to in the CPU target examples. Oddly, the act of marking the subprogram seems to break demangling - prior to this commit, the location of the wrapper kernel isn't known, but its name does demangle correctly.
- Most locals appear to be optimized out even with `opt=0`. This also needs further investigation.
numba/cuda/compiler.py (outdated)

```diff
@@ -479,8 +481,11 @@ def __init__(self, py_func, argtypes, link=None, debug=False, inline=False,
    }

    tgt_ctx = cres.target_context
    filename = cres.type_annotation.filename
```
It's weird that I have to get the source file name from the type annotation at this stage, but I think the function IR is now lost (it was never saved after the pipeline finished).
It's probably in `self.py_func.__code__.co_filename` and `cres.fndesc.lookup_module.__file__` too, if they are preferable? IIRC the compile result doesn't hold a reference to the IR, as the IR could hold references to e.g. globals, some of which might be large (e.g. NumPy arrays), and this would create memory pressure etc.
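As a quick sanity check, the first attribute is plain Python introspection and needs no Numba objects at all (the function below is illustrative):

```python
def increment_by_one(an_array):
    an_array[0] += 1

# Path of the file in which the function was defined - the same value the
# type annotation would report for a function compiled from this file.
print(increment_by_one.__code__.co_filename)
```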
numba/cuda/compiler.py (outdated)

```diff
@@ -479,8 +481,11 @@ def __init__(self, py_func, argtypes, link=None, debug=False, inline=False,
    }

    tgt_ctx = cres.target_context
    filename = cres.type_annotation.filename
    linenum = int(cres.type_annotation.linenum)
```
Also, the line number being a string in the type annotation is a minor problem, as LLVM expects line numbers to be integers (e.g. `5`), not strings (`"5"`).
`foo.py_func.__code__.co_firstlineno` might help, as might the `inspect` module?
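Both suggestions are standard-library introspection; a small demonstration (`foo` here is a plain function, not a dispatcher):

```python
import inspect

def foo(x):
    return x + 1

print(foo.__code__.co_firstlineno)     # line number of the `def` statement
print(inspect.getsourcefile(foo))      # source file, via the inspect module
print(inspect.getsourcelines(foo)[1])  # first line number, via inspect
```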
A couple more notes on the lack of un-mangling in cuda-gdb: For the CUDA target, […], whereas on the CPU target the […]. At the moment the name and the linkage name are the same for each subprogram, e.g. […] instead of […]. However, for device functions the […].

Fixing these alone doesn't seem to help (I've tried locally by applying):

```diff
diff --git a/numba/cuda/cudadrv/nvvm.py b/numba/cuda/cudadrv/nvvm.py
index c77ec456e..35ab0a260 100644
--- a/numba/cuda/cudadrv/nvvm.py
+++ b/numba/cuda/cudadrv/nvvm.py
@@ -776,6 +777,9 @@ def llvm100_to_70_ir(ir):
attrs = ' '.join(a for a in attrs if a != 'willreturn')
line = line.replace(m.group(1), attrs)
+ if '= distinct !DICompileUnit' in line:
+ line = line.replace('emissionKind: 1', 'emissionKind: FullDebug')
+
buf.append(line)
return '\n'.join(buf)
diff --git a/numba/cuda/target.py b/numba/cuda/target.py
index 090c58d34..82c21c794 100644
--- a/numba/cuda/target.py
+++ b/numba/cuda/target.py
@@ -145,12 +145,13 @@ class CUDATargetContext(BaseContext):
nvvm_options=nvvm_options,
max_registers=max_registers)
library.add_linking_library(codelib)
- wrapper = self.generate_kernel_wrapper(library, kernel_name, func_name,
+ wrapper = self.generate_kernel_wrapper(library, codelib.name,
+ kernel_name, func_name,
argtypes, debug, filename,
linenum)
return library, wrapper
- def generate_kernel_wrapper(self, library, kernel_name, func_name,
+ def generate_kernel_wrapper(self, library, py_name, kernel_name, func_name,
argtypes, debug, filename, linenum):
"""
Generate the kernel wrapper in the given ``library``.
@@ -172,7 +173,10 @@ class CUDATargetContext(BaseContext):
if debug:
debuginfo = self.DIBuilder(module=wrapper_module, filepath=filename)
- debuginfo.mark_subprogram(wrapfn, kernel_name, linenum)
+ debuginfo.mark_subprogram(wrapfn, py_name, linenum)
debuginfo.mark_location(builder, linenum)
    # Define error handling variables
```

Since it doesn't help, I've not added it to this PR - I'd prefer only to add things that clearly make improvements. Are there any other thoughts on what might be missing for mangling to work here?
(Popping this back to "In progress" because I just noticed it breaks with NVVM 3.4, and needs #5311 fixing for it to work)
Includes:

- Calls `llvm_to_ptx` once for each IR module when compiling for debug.
- Don't adjust linkage of functions in linked modules when debugging, because we need device functions to be externally visible.
- Fixed setting of NVVM options when calling `compile_cuda` from kernel compilation and device function template compilation.
- Removes the `debug_pubnames` patch.

Outcomes:

- The "Error: Debugging support cannot be enabled when number of debug compile units is more than 1" message is no longer produced with NVVM 3.4.
- NVVM 7.0: everything still seems to "work" as much as it did before. Stepping may be more stable, but this needs a bit more verification.

Fixes numba#5311.
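A hedged sketch of the first bullet: when compiling for debug, each IR module is lowered to PTX in its own `llvm_to_ptx` call, so NVVM never sees more than one debug compile unit at a time. The helper function and call pattern below are assumptions for illustration, not the PR's exact code:

```python
from numba.cuda.cudadrv.nvvm import llvm_to_ptx

def compile_modules_for_debug(ir_modules, nvvm_options):
    # One llvm_to_ptx invocation per module keeps each debug compile unit
    # separate, avoiding the NVVM 3.4 "more than 1" error quoted above.
    return [llvm_to_ptx(mod, **nvvm_options) for mod in ir_modules]
```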
I've included a fix for #5311 in this branch now - this resolves the long-standing "Error: Debugging support cannot be enabled when number of debug compile units is more than 1" message.

I think this is ready for review - looks like I somehow forgot to mark it as such.
Buildfarm ID:

Passed.
@gmarkall please could you resolve the conflicts? Then I'll look to review this, thanks.

@stuartarchibald conflicts now resolved.
```
filename: The source filename that the function is contained in.
linenum: The source line that the function is on.
```
What happens if someone materialized a function from a string?
`filename` is `<string>` and `linenum` is the line in the string on which the function began - I think this is normal / expected?
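The `<string>` case is easy to reproduce with plain `exec()`:

```python
namespace = {}
exec(compile("def f():\n    return 1\n", "<string>", "exec"), namespace)
f = namespace["f"]

print(f.__code__.co_filename)     # -> <string>
print(f.__code__.co_firstlineno)  # -> 1, the line within the string
```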
Thanks for the patch, I've taken a first look through it but would like to do some manual testing. There are a few minor things to look at and some questions/suggestions above. Thanks again!
- Use `self.py_func.__code__` to get the filename and line number for the function for `prepare_cuda_kernel`, which seems a bit cleaner / neater than digging into the type annotation for it.
- Fix typo and explicitly link to an issue comment.
@stuartarchibald Thanks for the review and suggestions! All comments should be addressed now.
Build Farm ID:
Thanks for the patch and fixes.
Note: I manually verified this patch:

```
$ cuda-gdb --args python add_one.py
<snip>
Reading symbols from python...
(cuda-gdb) set cuda break_on_launch application
(cuda-gdb) r
Starting program: bin/python add_one.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Detaching after fork from child process 7719]
[Detaching after fork from child process 7720]
[Detaching after fork from child process 7721]
[Detaching after fork from child process 7724]
[New Thread 0x7fffda69b700 (LWP 7731)]
[New Thread 0x7fffd9e9a700 (LWP 7732)]
[New Thread 0x7fffd9699700 (LWP 7733)]
numba/cuda/compiler.py:865: NumbaPerformanceWarning: Grid size (1) < 2 * SM count (26) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
0x0000555556b98c88 in _ZN6cudapy8__main__20increment_by_one$241E5ArrayIdLi1E1C7mutable7alignedE<<<(1,1,1),(1,1,1)>>> ()
at add_one.py:4
4 @cuda.jit(debug=True, opt=0)
(cuda-gdb) l
1 import numpy as np
2 from numba import cuda
3
4 @cuda.jit(debug=True, opt=0)
5 def increment_by_one(an_array):
6 # Thread id in a 1D block
7 tx = cuda.threadIdx.x
8 # Block id in a 1D grid
9 ty = cuda.blockIdx.x
10 # Block width, i.e. number of threads per block
(cuda-gdb) s
0x0000555556b7f168 in _ZN8__main__20increment_by_one$241E5ArrayIdLi1E1C7mutable7alignedE () at add_one.py:4
4 @cuda.jit(debug=True, opt=0)
(cuda-gdb) s
7 tx = cuda.threadIdx.x
(cuda-gdb) s
13 pos = tx + ty * bw
(cuda-gdb) s
14          if pos < an_array.size:  # Check array boundaries
```
This PR improves a couple of aspects of debugging in CUDA:

- Breaking on any kernel launch, for example with `set cuda break_on_launch application`, now results in GDB being able to find the source line at the break location - this enables the backtrace to show the mangled function name, source file, and line number. It also enables the text user interface to display the source when the kernel launches. From this point, stepping through line-by-line is feasible with the `next` command.
- Breakpoints can be set on a specific function with specific argument types, by setting a breakpoint on the mangled name.

(Neither of these worked prior to this change.)
This PR also separates the DIBuilder API from the Numba IR API in the first commit, to avoid building Numba IR for no reason (or strange reasons) when generating the kernel wrapper.
There's still a little API weirdness that bothers me, but I'm not sure I can see the better way - perhaps the reviewer can suggest an improvement (I'll comment on the diff).