[OMPT][Trunk] Offloading to x86_64 misses some OMPT target callbacks #64487
Comments
Just as an addition: Leaving out |
@llvm/issue-subscribers-openmp |
@mhalk Any suggestions? |
Just took a quick look, I guess we have to add OMPT support for the But I'll have to check back with the spec & @dhruvachak how to go about this. |
Poked at this, and it works ... From my perspective there are two ways to deal with the situation, either we try to omit all the callbacks generated in Let's say we would go with "added support": Output for
Output for
So, in the sources, I intentionally used devices 3 and 11 for target regions. Other than that, the generic CPU will always report @Thyre Please, let me know what you think based on these outputs. |
Thanks a lot for the update 😄 I would agree that the second option, adding support for Judging by your output for both Getting |
I checked how the device numbers are delivered on a system with an AMD + NVIDIA GPU. We can see the same behavior:
$ clang-18 -fopenmp -fopenmp-targets=nvptx64,amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a veccopy.c
$ ./a.out
Num devices = 3
Device 0
Callback Init: device_num=0 type=sm_80 device=0x556d3f652a90 lookup=0x7efd6a8027b0 doc=(nil)
Callback Load: device_num:0 filename:(null) host_adddr:0x556d3ed12778 device_addr:(nil) bytes:715888
[...]
Device 1
Callback Init: device_num=0 type=gfx90a device=0x556d402203d0 lookup=0x7efd6a8027b0 doc=(nil)
Callback Load: device_num:0 filename:(null) host_adddr:0x556d3edc1478 device_addr:(nil) bytes:25064
[...]
Device 2
Callback Init: device_num=1 type=gfx90a device=0x556d40229210 lookup=0x7efd6a8027b0 doc=(nil)
Callback Load: device_num:1 filename:(null) host_adddr:0x556d3edc1478 device_addr:(nil) bytes:25064
[...]
Success
Callback Fini: device_num=0
Callback Fini: device_num=0
Callback Fini: device_num=1 |
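In the output above, device_num is only unique per plugin: both the sm_80 device and the first gfx90a device report device_num=0. As a minimal sketch of how a tool might work around this, assuming it keys devices on the (type, device_num) pair seen in the Init callback, something like the following could assign tool-local indices. The names tool_device_index, seen_device and MAX_DEVICES are hypothetical and not part of OMPT or libomptarget.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEVICES 16 /* hypothetical upper bound chosen for this sketch */

/* One entry per distinct (type, device_num) pair reported at device init. */
struct seen_device {
    char type[32];
    int device_num;
};

static struct seen_device seen[MAX_DEVICES];
static int seen_count = 0;

/* Hypothetical helper: returns a tool-local index for the pair,
 * registering it on first sight. Returns -1 if the table is full. */
static int tool_device_index(const char *type, int device_num)
{
    for (int i = 0; i < seen_count; i++) {
        if (seen[i].device_num == device_num &&
            strcmp(seen[i].type, type) == 0)
            return i;
    }
    if (seen_count == MAX_DEVICES)
        return -1;
    /* snprintf truncates over-long type strings and always terminates. */
    snprintf(seen[seen_count].type, sizeof seen[seen_count].type, "%s", type);
    seen[seen_count].device_num = device_num;
    return seen_count++;
}
```

This only helps where the type string is available. The Fini callbacks above carry just device_num, so they stay ambiguous, and two plugins reporting the same type would still collide.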
Thanks for checking this! So, I guess this behavior is okay for now? (See my next comments.)
tl;dr: Agreed. Like that idea, albeit I'd have to see if this is possible. Maybe I'm not aware of some information but ATM I'm not very confident this can be (reasonably) solved on my end. When the callbacks are dispatched, we only have the information from the corresponding RTL.
As you stated this can be alleviated by implementing Generally, this is (at this stage) a very small change w.r.t. the LoC, but there definitely needs to be some discussion beforehand. |
I would say that the behavior is okay for now. We should probably track this in a separate issue since it is not directly related to the offloading to host.
I expected something like this. For now, I would say that it is not a huge deal for most users since offloading to two different architectures isn't that common and
Sure! I guess assigning a different name is also not necessarily in the scope of this issue. This can be done separately 🙂 |
@Thyre FYI Just brought this up in the OpenMP meeting and I'll prepare two patches for / related to this issue. |
That's great to hear, thanks 😄 |
FYI The two patches are prepped for tomorrow: |
As there were no objections during the meeting and the patches already got accepted, I will polish them a tiny bit and land them. Additionally, I wanted to check that we do not link OMPT all the time into every affected plugin, even when OMPT support was disabled. |
First tiny patch has landed -- so the plugin reports a reasonable CU kind / device |
Thanks a lot for fixing the issue this quickly. It will certainly help when continuing to implement both support for the teams directive and offloading in Score-P. |
Description
LLVM Trunk has added first support for OMPT callbacks for target directives. During testing, I noticed that callbacks for offloading to host are dispatched as well. While that's fine, the callbacks ompt_callback_device_initialize, ompt_callback_device_load, ompt_callback_device_unload and ompt_callback_device_finalize are not dispatched. The callback ompt_callback_device_unload is not implemented as far as I know, but I would expect the others to show up.
Missing ompt_callback_device_initialize is against the OpenMP specifications, which state [Link]:
NVHPC 23.7 also dispatches ompt_callback_target without ompt_callback_device_initialize. However, in their case one can identify the host execution by checking the device_num against the returned value of ompt_get_num_devices: for host execution, the device_num is either negative, or above ompt_get_num_devices. In the case of LLVM, offloading to host seems to initialize four devices, which are handled just like offloading to GPUs. Therefore, we get normal device numbers. We can verify this by running llvm-omp-device-info.
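The host check described for NVHPC can be sketched as a small predicate. This is a minimal illustration, not code from any runtime: the helper name is_host_execution is hypothetical, and treating a device_num equal to the device count as host execution is an assumption (the text only says "above").

```c
#include <stdbool.h>

/* Hypothetical helper: decides whether a device_num seen in a target
 * callback refers to host execution, given the number of devices the
 * runtime reports. Assumption: valid offload devices are numbered
 * 0 .. num_devices - 1, so anything outside that range is the host. */
static bool is_host_execution(int device_num, int num_devices)
{
    return device_num < 0 || device_num >= num_devices;
}
```

With LLVM's x86_64 offloading as described here, this check would not flag host execution, because the four host devices receive ordinary device numbers.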
Reproducer
I used one of the aomp smoke tests veccopy-ompt-target-emi to verify this issue.
When compiling and running the code with the offload target nvptx64, the following output can be seen:
Replacing nvptx64 with x86_64, one can see the following:
Notice that the callbacks are missing from the output.
Side question:
I noticed that omp_get_num_devices() returns four for offloading to x86_64. What's the reasoning behind that? llvm-omp-device-info also shows four devices with the type Generic-elf-64bit on a system with Ubuntu 22.04 LTS, Intel Core i7-1260P.