Debug break during seemingly innocuous Level Zero operations #639

Open
maleadt opened this issue Apr 21, 2023 · 2 comments

@maleadt

maleadt commented Apr 21, 2023

While debugging a Julia/oneAPI.jl-related issue, I was using a debug build of the compute-runtime. Doing so, however, triggers a debug break during the following seemingly innocuous Level Zero operations:

julia> using oneAPI, .oneL0

julia> pool = ZeEventPool(context(), 1)
ZeEventPool(Ptr{oneAPI.oneL0._ze_event_pool_handle_t} @0x000000000321a480, ZeContext(Ptr{oneAPI.oneL0._ze_context_handle_t} @0x000000000629c060, ZeDriver(00000000-0000-0000-1757-e436010363f9)))

julia> event = pool[1]
ZeEvent(Ptr{oneAPI.oneL0._ze_event_handle_t} @0x0000000006b2b5f0, ZeEventPool(Ptr{oneAPI.oneL0._ze_event_pool_handle_t} @0x000000000321a480, ZeContext(Ptr{oneAPI.oneL0._ze_context_handle_t} @0x000000000629c060, ZeDriver(00000000-0000-0000-1757-e436010363f9))))

julia> signal(event)

julia> group = first(compute_groups(device()))
oneAPI.oneL0.ZeCommandQueueGroup(oneAPI.oneL0.ZeCommandQueueGroups(ZeDevice(GPU, vendor 0x8086, device 0x56a0)), 1)

julia> ZeCommandList(context(), device(), group.ordinal) do list
           append_signal!(list, event)
       end
Assert was called at 6379 line in file:
/workspace/srcdir/compute-runtime/shared/source/generated/xe_hpg_core/hw_cmds_generated_xe_hpg_core.inl
julia: /workspace/srcdir/compute-runtime/shared/source/helpers/debug_helpers.cpp:21: void NEO::debugBreak(int, const char*): Assertion `false' failed.

signal (6): Aborted
in expression starting at REPL[9]:1
unknown function (ip: 0x7f623e6c38ec)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7f623e65e45b)
__assert_fail at /usr/lib/libc.so.6 (unknown line)
debugBreak at /workspace/srcdir/compute-runtime/shared/source/helpers/debug_helpers.cpp:21
setAddress at /workspace/srcdir/compute-runtime/shared/source/generated/xe_hpg_core/hw_cmds_generated_xe_hpg_core.inl:6379
programStoreDataImm at /workspace/srcdir/compute-runtime/shared/source/command_container/command_encoder_xehp_and_later.inl:774
programStoreDataImm at /workspace/srcdir/compute-runtime/shared/source/command_container/command_encoder.inl:1026
dispatchPostSyncCompute at /workspace/srcdir/compute-runtime/level_zero/core/source/cmdlist/cmdlist_hw.inl:2789
dispatchPostSyncCommands at /workspace/srcdir/compute-runtime/level_zero/core/source/cmdlist/cmdlist_hw.inl:2811
dispatchEventPostSyncOperation at /workspace/srcdir/compute-runtime/level_zero/core/source/cmdlist/cmdlist_hw.inl:2845
appendSignalEvent at /workspace/srcdir/compute-runtime/level_zero/core/source/cmdlist/cmdlist_hw.inl:1960
zeCommandListAppendSignalEvent at /workspace/srcdir/compute-runtime/level_zero/api/core/ze_event_api_entrypoints.h:61
macro expansion at /home/tim/Julia/pkg/oneAPI/lib/level-zero/libze.jl:1816 [inlined]
macro expansion at /home/tim/Julia/pkg/oneAPI/lib/level-zero/utils.jl:5 [inlined]
macro expansion at /home/tim/Julia/pkg/oneAPI/lib/level-zero/libze.jl:13 [inlined]
zeCommandListAppendSignalEvent at /home/tim/Julia/pkg/oneAPI/lib/utils/call.jl:24
append_signal! at /home/tim/Julia/pkg/oneAPI/lib/level-zero/event.jl:57 [inlined]

That's the debug break in setAddress (hw_cmds_generated_xe_hpg_core.inl, line 6379).

What's the purpose of that debug break? Is my code doing anything wrong? I was hoping to have a CI job on oneAPI.jl run with a debug build of compute-runtime and IGC in order to spot any failed assertions (as these checks aren't present in release mode).

cc @pengtu

@HoppeMateusz
Contributor

The assertion (DEBUG_BREAK_IF) is there to catch potential issues with truncating the address.

This debug break is not required here: we are using canonized addresses with the upper bits set, and this HW command truncates the upper bits and ignores them.
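
The point about canonized addresses can be illustrated with a small C++ sketch. It assumes a 48-bit virtual address width and uses hypothetical helper names (`canonize`, `decanonize`, `exceedsAddressField`). None of it is taken from compute-runtime, and the actual check in `setAddress` may look different.

```cpp
#include <cstdint>

// Assumption: 48-bit virtual addresses; the real width depends on the hardware.
constexpr unsigned kAddressBits = 48;
constexpr std::uint64_t kLowMask = (std::uint64_t{1} << kAddressBits) - 1;
constexpr std::uint64_t kSignBit = std::uint64_t{1} << (kAddressBits - 1);

// Canonical form: bits 48..63 are copies of bit 47.
constexpr std::uint64_t canonize(std::uint64_t address) {
    return (address & kSignBit) ? ((address & kLowMask) | ~kLowMask)
                                : (address & kLowMask);
}

// Dropping the upper bits recovers the value a 48-bit field would hold.
constexpr std::uint64_t decanonize(std::uint64_t address) {
    return address & kLowMask;
}

// Hypothetical range check of the kind a truncation guard might perform.
constexpr bool exceedsAddressField(std::uint64_t address) {
    return address > kLowMask;
}

// An address with bit 47 set gets its upper 16 bits filled in when canonized,
// so a naive range check flags it even though truncation loses no information.
static_assert(canonize(0x0000800000001000ull) == 0xFFFF800000001000ull, "sign-extended");
static_assert(exceedsAddressField(0xFFFF800000001000ull), "guard fires");
static_assert(decanonize(0xFFFF800000001000ull) == 0x0000800000001000ull, "nothing lost");
```

Under these assumptions the guard fires for any canonized address in the upper half of the address space, which is consistent with the explanation that the break is harmless in this case.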

We add DEBUG_BREAK_IF() for potential issues that need investigation, so you may see some of them fire when there is nothing wrong in the code.
Fatal error conditions are captured with UNRECOVERABLE_IF().

It is possible to disable debug breaks with the EnableDebugBreak=0 debug variable.
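
The difference between the two kinds of check can be sketched roughly as follows. This is an illustration only: `enableDebugBreak`, `debugBreaksHit`, `debugBreakIf` and `unrecoverableIf` are hypothetical stand-ins, not the actual compute-runtime macros. The sketch counts hits and throws an exception so that it stays testable, whereas the backtrace above shows the debug build failing an assertion in `NEO::debugBreak`.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the EnableDebugBreak debug variable (true = enabled).
static bool enableDebugBreak = true;

// Records soft-check hits instead of stopping the process (assumption for testability).
static int debugBreaksHit = 0;

// Soft check: flags something worth investigating, and can be switched off.
static void debugBreakIf(bool condition) {
    if (condition && enableDebugBreak) {
        ++debugBreaksHit;
    }
}

// Hard check: a fatal condition that is always enforced, regardless of the toggle.
static void unrecoverableIf(bool condition) {
    if (condition) {
        throw std::runtime_error("unrecoverable condition");
    }
}
```

With `enableDebugBreak` set to `false`, `debugBreakIf(true)` does nothing, while `unrecoverableIf(true)` still throws. That mirrors the idea that switching off debug breaks silences investigative checks without hiding fatal errors.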

@maleadt
Author

maleadt commented Apr 21, 2023

It is possible to disable DebugBreaks with EnableDebugBreak=0 debug variable.

Perfect, that works, thanks!
