[OMPT] Target callbacks use wrong device number due to late initialization #64738

Closed
Thyre opened this issue Aug 16, 2023 · 4 comments

Thyre commented Aug 16, 2023

Note: The issue was initially discussed in the ROCm-Developer-Tools/aomp repository. You can find the issue here.

There's also a Phabricator review by @mhalk already: https://reviews.llvm.org/D157605

Description

In the recently upstreamed implementation of the OpenMP target callbacks of the OMPT interface, the device number is not set in some cases: the affected callbacks report device_num=-1. This seems to affect the ompt_callback_target and ompt_callback_target_data_op callbacks in particular.

This is a problem for tool developers, since we need the correct device number to attribute each target region to the device it ran on.

The order of the callbacks also appears to be wrong: we see the first target event before the device gets initialized.

Reproducer

One can use the following test to reproduce the issue:

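#include <omp.h>
#include <stdio.h>
#include "callbacks.h" /* OMPT tool callbacks that print each event */

int main( void )
{
    int M[10];
    /* Map M to the device; this is the first target construct to run. */
#pragma omp target enter data map(to: M[:10])
#pragma omp target
    {
#pragma omp teams distribute parallel for simd
        for(int i = 0; i < 10; ++i)
        {
            M[i] = i;
        }
    }
    /* Copy M back to the host and release the device copy. */
#pragma omp target exit data map(from: M[:10])
    return 0;
}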

I've used the callback interface from one of the aomp tests to get the callback information. It can be found here.
Running the tool, we see the following results:

$ clang --version
clang version 18.0.0 (https://github.com/llvm/llvm-project.git 5816d2ab287ab9d2e1624852946973ed43a0e3f2)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/software/software/LLVM/git/bin
$ wget https://raw.githubusercontent.com/ROCm-Developer-Tools/aomp/aomp-dev/test/smoke/veccopy-ompt-target-emi/callbacks.h
$ clang -fopenmp -fopenmp-targets=nvptx64 reproducer.c
$ ./a.out
Callback Target EMI: kind=2 endpoint=1 device_num=-1 task_data=0x55ff20a4aa00 (0x0) target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000001) code=0x55ff1f5497f1
Callback Init: device_num=0 type=sm_75 device=0x55ff20a8d120 lookup=0x7fd23d8730d0 doc=(nil)
Callback Load: device_num:0 filename:(null) host_adddr:0x55ff1f54a668 device_addr:(nil) bytes:613024
  Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000001) host_op_id=0x7fd23d6287c0 (0x8000000000000002) src=0x7ffc7782efa0 src_device_num=1 dest=(nil) dest_device_num=0 bytes=40 code=0x7fd23d77e393
  Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000001) host_op_id=0x7fd23d6287c0 (0x8000000000000002) src=0x7ffc7782efa0 src_device_num=1 dest=0x7fd206600000 dest_device_num=0 bytes=40 code=0x7fd23d77e393
  Callback DataOp EMI: endpoint=1 optype=2 target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000001) host_op_id=0x7fd23d6287c0 (0x8000000000000003) src=0x7ffc7782efa0 src_device_num=1 dest=0x7fd206600000 dest_device_num=0 bytes=40 code=0x7fd23d77e30e
  Callback DataOp EMI: endpoint=2 optype=2 target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000001) host_op_id=0x7fd23d6287c0 (0x8000000000000003) src=0x7ffc7782efa0 src_device_num=1 dest=0x7fd206600000 dest_device_num=0 bytes=40 code=0x7fd23d77e30e
Callback Target EMI: kind=2 endpoint=2 device_num=-1 task_data=0x55ff20a4aa00 (0x0) target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000001) code=0x55ff1f5497f1
Callback Target EMI: kind=1 endpoint=1 device_num=0 task_data=0x55ff20a4aa00 (0x0) target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000004) code=0x55ff1f5498d6
  Callback Submit EMI: endpoint=1  req_num_teams=0 target_data=0x7fd23d6287a8 (0x8000000000000004) host_op_id=0x7fd23d6287a0 (0x0)
  Callback Submit EMI: endpoint=2  req_num_teams=0 target_data=0x7fd23d6287a8 (0x8000000000000004) host_op_id=0x7fd23d6287a0 (0x0)
Callback Target EMI: kind=1 endpoint=2 device_num=0 task_data=0x55ff20a4aa00 (0x0) target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000004) code=0x55ff1f5498d6
Callback Target EMI: kind=3 endpoint=1 device_num=-1 task_data=0x55ff20a4aa00 (0x0) target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000005) code=0x55ff1f549956
  Callback DataOp EMI: endpoint=1 optype=3 target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000005) host_op_id=0x7fd23d6287c0 (0x8000000000000006) src=0x7fd206600000 src_device_num=0 dest=0x7ffc7782efa0 dest_device_num=1 bytes=40 code=0x7fd23d787d7f
  Callback DataOp EMI: endpoint=2 optype=3 target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000005) host_op_id=0x7fd23d6287c0 (0x8000000000000006) src=0x7fd206600000 src_device_num=0 dest=0x7ffc7782efa0 dest_device_num=1 bytes=40 code=0x7fd23d787d7f
  Callback DataOp EMI: endpoint=1 optype=4 target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000005) host_op_id=0x7fd23d6287c0 (0x8000000000000007) src=0x7fd206600000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fd23d77f75a
  Callback DataOp EMI: endpoint=2 optype=4 target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000005) host_op_id=0x7fd23d6287c0 (0x8000000000000007) src=0x7fd206600000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fd23d77f75a
Callback Target EMI: kind=3 endpoint=2 device_num=-1 task_data=0x55ff20a4aa00 (0x0) target_task_data=0x55ff20a74158 (0x0) target_data=0x7fd23d6287a8 (0x8000000000000005) code=0x55ff1f549956
Callback Fini: device_num=0
mhalk self-assigned this Aug 16, 2023

llvmbot commented Aug 16, 2023

@llvm/issue-subscribers-openmp

mhalk commented Aug 16, 2023

@Thyre Thanks for putting this up.
BTW if you find the time, could you please confirm that the segfault issue we discussed earlier is also resolved by the patch?

I just checked that I did not introduce this with my previous fix today; that problem was already there.
Hence, I will change the test to use EMI callbacks, so we additionally check for this behavior in the future.

Thyre commented Aug 16, 2023

Your patch should also resolve this issue. I was able to reproduce the fix on my machine 👍

mhalk commented Aug 16, 2023

Great, thanks for your time and effort!

mhalk closed this as completed in 57f0bdc Aug 22, 2023
llvmbot pushed a commit to llvm/llvm-project-release-prs that referenced this issue Aug 22, 2023
…evice num

This patch fixes: llvm/llvm-project#64738
We observed multiple issues, primarily that the `DeviceId` was reported as -1
in certain scenarios. The reason for this is simply that the device is not
initialized at that point. Hence, we need to move the RAII object creation just
after the `checkDeviceAndCtors`, closer to the actual call we want to observe.

This also solves an ordering issue where one `target enter data` callback would
be executed before the `Init` callback.
Additionally, this change fixes the case where `enter / exit data` and `update`
in conjunction with `nowait` would not result in the emission of an OMPT
callback.

Added a testcase to cover initialized device number and `omp target` constructs.

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D157605

(cherry picked from commit 57f0bdc8fb1e66d4ed9cfb57f1ef699eefd99646)
tru pushed a commit to llvm/llvm-project-release-prs that referenced this issue Aug 25, 2023
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 2, 2023
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 3, 2023
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 6, 2023
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 11, 2023