Debug GPU assembly of level zero kernels #108

Closed
airMeng opened this issue Feb 16, 2023 · 5 comments

Comments

@airMeng

airMeng commented Feb 16, 2023

Hi, I learned from here how to debug SYCL applications with gdb-oneapi, even down to individual assembly lines. I wonder whether there is any way to debug Level Zero kernels in the same way.

The following screenshot shows that I tried to stop where Level Zero executes kernels, but I can't step in or get any thread information.
[screenshot]

BTW, I found that the gdb-oneapi documentation says only the Intel® oneAPI Level Zero (Level Zero) backend is supported for debugging, so I think debugging assembly in Level Zero should be possible.

@bmyates
Contributor

bmyates commented Feb 17, 2023

Hi, stepping into zeCommandQueueExecuteCommandLists will take you into the ze_loader and the L0 driver implementations of that function. It will not step directly into the kernel.

What format is the module input you are using: SPIR-V or native? To debug the kernel, you will need to set a breakpoint inside it before it executes.

@airMeng
Author

airMeng commented Feb 20, 2023

I am generating the ze_module_handle_t directly via nGen.

@HoppeMateusz

Hello,

It is possible to debug Level Zero kernels. For that, the environment has to be set up the same way as for SYCL application debugging (https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-debugging-dpcpp-linux/top.html), in particular:

export ZET_ENABLE_PROGRAM_DEBUGGING=1
export IGC_EnableGTLocationDebugging=1

Then, start the application under gdb.
Set a breakpoint on the HOST before the zeModuleCreate() call (gdb-oneapi -ex "b example.cpp:LINE").
When the HOST hits the breakpoint, tell gdb to stop when libraries are loaded:

(gdb) set stop-on-solib-event 1
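The environment and gdb setup above can be scripted. Below is a minimal Python sketch, assuming gdb-oneapi is on PATH. The application path ./my_l0_app and the location example.cpp:42 are hypothetical placeholders for your own binary and the source line just before zeModuleCreate().

```python
import os
import shlex

# The two variables from the steps above; the application must see them
# before it initializes Level Zero.
DEBUG_ENV = {
    "ZET_ENABLE_PROGRAM_DEBUGGING": "1",
    "IGC_EnableGTLocationDebugging": "1",
}


def debug_env(base=None):
    """Return a copy of `base` (default: os.environ) with the debug variables set."""
    env = dict(os.environ if base is None else base)
    env.update(DEBUG_ENV)
    return env


def gdb_argv(app_args, host_breakpoint):
    """Build a gdb-oneapi command line that stops on the host before zeModuleCreate().

    gdb runs the -ex commands in order: set the host breakpoint, run until it
    is hit, then enable stopping on the next shared-library (module load) event.
    """
    return [
        "gdb-oneapi",
        "-ex", f"break {host_breakpoint}",
        "-ex", "run",
        "-ex", "set stop-on-solib-event 1",
        "--args", *app_args,
    ]


if __name__ == "__main__":
    # Hypothetical application and breakpoint location.
    argv = gdb_argv(["./my_l0_app"], "example.cpp:42")
    print(shlex.join(argv))
    # To launch it for real: subprocess.run(argv, env=debug_env())
```

Passing the variables through a child environment (rather than exporting them in the shell) keeps them scoped to the debug session.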

Continue execution on the HOST. When the module is created, a message like this should be printed:

Stopped due to shared library event:
Inferior loaded in-memory-0x5555569e35c0-0x5555569e7508

Now, dump that memory range to a file:

(gdb) dump memory module.elf 0x5555569e35c0  0x5555569e7508
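Turning the solib-event message into the dump command is mechanical. A minimal Python sketch, assuming the message has the "in-memory-START-END" shape shown above; the default output name module.elf simply mirrors the command above.

```python
import re

# Matches e.g. "Inferior loaded in-memory-0x5555569e35c0-0x5555569e7508".
_IN_MEMORY = re.compile(r"in-memory-(0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)")


def parse_in_memory_range(message):
    """Return (start, end) of the in-memory ELF reported by gdb, or None."""
    match = _IN_MEMORY.search(message)
    if match is None:
        return None
    start, end = (int(group, 16) for group in match.groups())
    return start, end


def dump_command(start, end, path="module.elf"):
    """Build the gdb 'dump memory FILE START END' command for that range."""
    return f"dump memory {path} {start:#x} {end:#x}"


if __name__ == "__main__":
    rng = parse_in_memory_range(
        "Inferior loaded in-memory-0x5555569e35c0-0x5555569e7508")
    print(dump_command(*rng))
```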

Read the ELF file to find the entry point address:

readelf -a module.elf

The ELF file should list entries in its symbol table, similar to:

Symbol table '.symtab' contains 12 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 00008000fff40000 3760 FUNC LOCAL DEFAULT 1 mykernel
2: 00008000fff400c0 3568 FUNC LOCAL DEFAULT 1 _entry

From the table above, the breakpoint goes at the address of _entry, 0x8000fff400c0:

(gdb) b *0x8000fff400c0
Breakpoint 6 at 0x8000fff400c0: file main.cl, line 17.
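Finding _entry in the readelf output and building the breakpoint command can also be scripted. A minimal Python sketch, assuming the symbol rows have the "Num: Value Size Type Bind Vis Ndx Name" layout shown above and that readelf is on PATH; the helper names are hypothetical.

```python
import subprocess


def find_symbol_address(readelf_output, name="_entry"):
    """Return the Value column of symbol `name` from `readelf -s` or `-a` output, or None."""
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows have 8 columns and start with "<index>:"; this skips the
        # "Num:" header and the unnamed row 0, which has only 7 columns.
        if (len(fields) == 8 and fields[0].endswith(":")
                and fields[0][:-1].isdigit() and fields[7] == name):
            return int(fields[1], 16)
    return None


def breakpoint_command(address):
    """Build the gdb command that breaks at a raw GPU address."""
    return f"b *{address:#x}"


def entry_breakpoint_from_file(path="module.elf"):
    """Run `readelf -s` on the dumped module and return the _entry breakpoint command."""
    out = subprocess.run(["readelf", "-s", path],
                         capture_output=True, text=True, check=True).stdout
    address = find_symbol_address(out)
    if address is None:
        raise ValueError(f"_entry not found in {path}")
    return breakpoint_command(address)
```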

(gdb) continue

Now the debugger should stop in the kernel:

Thread 5.1 hit Breakpoint 6, with SIMD lanes [0-15], 0x00008000fff400c0 in mykernel ( ....

When stopped on a GPU thread, it is possible to disassemble the binary and single-step.

Regards,
Mateusz

@airMeng
Author

airMeng commented Feb 22, 2023

It seems our address might be wrong:

There are no relocations in this file.

The decoding of unwind sections for machine type Intel Graphics Technology is not currently supported.

Symbol table '.symtab' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: ffff8000fffa0000   104 FUNC    LOCAL  DEFAULT    1 _
     2: ffff8000fffa0020    72 FUNC    LOCAL  DEFAULT    1 _entry

Setting the BP at 0xffff8000fffa0020 gives:

Cannot insert breakpoint 2.
Cannot access memory at address 0xffff8000fffa0020

Setting the BP at 0x8000fffa0020 gives:

Cannot insert breakpoint 2.
Cannot access memory at address 0x8000fffa0020

@HoppeMateusz @bmyates, any advice?
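The two addresses tried above hold the same low 48 bits: this readelf output prints the sign-extended form ffff8000fffa0020, while the earlier example output shows the zero-extended form 00008000fff400c0. A minimal Python sketch of how the two notations relate, offered only as arithmetic; the thread does not say which form gdb-oneapi expects, and neither is likely to work before the module has been loaded onto the GPU.

```python
MASK_48 = (1 << 48) - 1
MASK_64 = (1 << 64) - 1


def low48(address):
    """Drop the upper 16 bits of a 64-bit address."""
    return address & MASK_48


def sign_extend_48(address):
    """Copy bit 47 into bits 48..63, giving the sign-extended 64-bit form."""
    address &= MASK_48
    if address & (1 << 47):
        address |= ~MASK_48 & MASK_64  # 0xffff000000000000
    return address


if __name__ == "__main__":
    print(hex(low48(0xffff8000fffa0020)))      # zero-extended form
    print(hex(sign_extend_48(0x8000fffa0020)))  # sign-extended form
```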

@HoppeMateusz

@airMeng, do you see a gdb event like this one:

Stopped due to shared library event:
Inferior loaded in-memory-0x5555569e35c0-0x5555569e7508

It is only possible to set a BP after zeModuleCreate() has created the module and loaded its binary onto the GPU. Have you tried breaking just before zeCommandQueueExecuteCommandLists() and setting the BP in the GPU module at that point?
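The suggestion above, stopping at zeCommandQueueExecuteCommandLists() so the module is already loaded, can be set up from the command line. A minimal Python sketch, assuming gdb-oneapi is on PATH; ./my_l0_app is a hypothetical application path, and ZET_ENABLE_PROGRAM_DEBUGGING and IGC_EnableGTLocationDebugging still have to be set in the environment. "set breakpoint pending on" lets gdb accept the breakpoint before the Level Zero loader library is loaded.

```python
import shlex


def gdb_argv_break_at_execute(app_args):
    """Build a gdb-oneapi command line that stops at zeCommandQueueExecuteCommandLists().

    Once gdb stops there, set the GPU breakpoint (b *ADDRESS from the readelf
    step) and continue.
    """
    return [
        "gdb-oneapi",
        "-ex", "set breakpoint pending on",
        "-ex", "break zeCommandQueueExecuteCommandLists",
        "-ex", "run",
        "--args", *app_args,
    ]


if __name__ == "__main__":
    print(shlex.join(gdb_argv_break_at_execute(["./my_l0_app"])))
```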

@airMeng airMeng closed this as completed Apr 10, 2024