Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libomptarget] [OMPT] Fixed return address computation for OMPT events. #80498

Merged
merged 1 commit into from
Feb 8, 2024

Conversation

dhruvachak
Copy link
Contributor

Currently, __builtin_return_address is used to generate the return address when the callback invoker is created. However, this may result in the return address pointing to an internal runtime function. This is not what a tool would typically want. A tool would want to know the corresponding user code from where the runtime entry point is invoked.

This change adds a thread local variable that is assigned the return address at the OpenMP runtime entry points. An RAII is used to manage the modifications to the thread local variable. Whenever the return address is required for OMPT events, it is read from the thread local variable.

Currently, __builtin_return_address is used to generate the return
address when the callback invoker is created. However, this may result
in the return address pointing to an internal runtime function. This is
not what a tool would typically want. A tool would want to know the
corresponding user code from where the runtime entry point is invoked.

This change adds a thread local variable that is assigned the return
address at the OpenMP runtime entry points. An RAII is used to manage
the modifications to the thread local variable. Whenever the return address
is required for OMPT events, it is read from the thread local variable.
@llvmbot llvmbot added the openmp:libomptarget OpenMP offload runtime label Feb 2, 2024
Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, we seem to have a lot of this "For each library entry point" logic. It may be good to unify this somehow.

In the future I'm going to simplify the timescope stuff as well by just making a thread local pointer to the ident_t so we don't need to care about where it's called from. So it's possible we could just put this in a single macro. Though that might be impossible considering that OMPT_SUPPORT needs to be checked.

@@ -10,15 +10,21 @@
//
//===----------------------------------------------------------------------===//

#include "OpenMP/OMPT/Interface.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we should bother adding this to the legacy API stuff because these are just kept around for backwards compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we should bother adding this to the legacy API stuff because these are just kept around for backwards compatibility.

Well, downstream still uses it. Since the legacy API is still around upstream, I decided to make the changes here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we use it for downstream? All of the legacy API functions should have newer versions that clang has emitted for around two years now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we use it for downstream? All of the legacy API functions should have newer versions that clang has emitted for around two years now.

legacy flang.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, makes sense.

Copy link
Contributor

@jplehr jplehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing that. I think the changes LG.

Copy link
Collaborator

@jprotze jprotze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, just one inline comment.

If I remember libomptarget right, there is no case where we could observe reentrance, right? This is different from libomp where we have reentrance for example with nested tasking.

EXTERN void *llvm_omp_target_dynamic_shared_alloc() { return nullptr; }
EXTERN void *llvm_omp_get_dynamic_shared() { return nullptr; }
EXTERN void *llvm_omp_target_dynamic_shared_alloc() {
OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lifetime ends with the next statement does it make sense here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lifetime ends with the next statement does it make sense here?

I agree but it's mostly to future-proof the entry point. If you definitely want me to remove it, let me know. I can then add a comment and remove it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. Makes sense now.

@dhruvachak
Copy link
Contributor Author

Lgtm, just one inline comment.

If I remember libomptarget right, there is no case where we could observe reentrance, right? This is different from libomp where we have reentrance for example with nested tasking.

LegacyAPI.cpp has quite a few reentrant cases. And I believe the OMP APIs have some of that too.

@jprotze
Copy link
Collaborator

jprotze commented Feb 7, 2024

When I tried to roll out the raai approach to push and pop the return address, I realized that I actually need to push them to a stack rather than a single value. Otherwise I wouldn't have the appropriate return address on the way back.

The other complication for raai comes from splitter runtime calls, where the same region calls into the runtime at the beginning and at the end of the region. This might occur for target if(0). Not sure. Task if(0) has such code pattern.

Copy link
Collaborator

@jprotze jprotze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@dhruvachak
Copy link
Contributor Author

When I tried to roll out the raai approach to push and pop the return address, I realized that I actually need to push them to a stack rather than a single value. Otherwise I wouldn't have the appropriate return address on the way back.

The other complication for raai comes from splitter runtime calls, where the same region calls into the runtime at the beginning and at the end of the region. This might occur for target if(0). Not sure. Task if(0) has such code pattern.

Thanks @jprotze for sharing the other use cases.

@dhruvachak dhruvachak merged commit 0d7f232 into llvm:main Feb 8, 2024
5 checks passed
@dhruvachak dhruvachak deleted the dc_fix_ompt_ra_3 branch February 8, 2024 01:29
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 8, 2024
Lands and reverts locally :
0d7f232 [libomptarget] [OMPT] Fixed return address computation for OMPT events. (llvm#80498)

Change-Id: I7a31d130f76d9e467bf7c6aab087ba7685329717
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 9, 2024
…s. (llvm#80498)

Currently, __builtin_return_address is used to generate the return
address when the callback invoker is created. However, this may result
in the return address pointing to an internal runtime function. This is
not what a tool would typically want. A tool would want to know the
corresponding user code from where the runtime entry point is invoked.

This change adds a thread local variable that is assigned the return
address at the OpenMP runtime entry points. An RAII is used to manage
the modifications to the thread local variable. Whenever the return
address is required for OMPT events, it is read from the thread local
variable.

Adapted to work on amd-staging.

Change-Id: I3c85bdf48f81edbc421b9030f26f344ed5b963b2
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 3, 2024
…s. (llvm#80498)

Currently, __builtin_return_address is used to generate the return
address when the callback invoker is created. However, this may result
in the return address pointing to an internal runtime function. This is
not what a tool would typically want. A tool would want to know the
corresponding user code from where the runtime entry point is invoked.

This change adds a thread local variable that is assigned the return
address at the OpenMP runtime entry points. An RAII is used to manage
the modifications to the thread local variable. Whenever the return
address is required for OMPT events, it is read from the thread local
variable.

Adapted to work on amd-staging.

Change-Id: I3c85bdf48f81edbc421b9030f26f344ed5b963b2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
openmp:libomptarget OpenMP offload runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants