-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libomptarget] [OMPT] Fixed return address computation for OMPT events. #80498
Conversation
Currently, __builtin_return_address is used to generate the return address when the callback invoker is created. However, this may result in the return address pointing to an internal runtime function. This is not what a tool would typically want. A tool would want to know the corresponding user code from where the runtime entry point is invoked. This change adds a thread local variable that is assigned the return address at the OpenMP runtime entry points. An RAII is used to manage the modifications to the thread local variable. Whenever the return address is required for OMPT events, it is read from the thread local variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note, we seem to have a lot of this "For each library entry point" logic. It may be good to unify this somehow.
In the future I'm going to simplify the timescope stuff as well by just making a thread local pointer to the ident_t
so we don't need to care about where it's called from. So it's possible we could just put this in a single macro. Though that might be impossible considering that OMPT_SUPPORT
needs to be checked.
@@ -10,15 +10,21 @@ | |||
// | |||
//===----------------------------------------------------------------------===// | |||
|
|||
#include "OpenMP/OMPT/Interface.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think we should bother adding this to the legacy API stuff because these are just kept around for backwards compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think we should bother adding this to the legacy API stuff because these are just kept around for backwards compatibility.
Well, downstream still uses it. Since the legacy API is still around upstream, I decided to make the changes here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we use it for downstream? All of the legacy API functions should have newer versions that clang
has emitted for around two years now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we use it for downstream? All of the legacy API functions should have newer versions that
clang
has emitted for around two years now.
legacy flang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing that. I think the changes LG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm, just one inline comment.
If I remember libomptarget right, there is no case where we could observe reentrance, right? This is different from libomp where we have reentrance for example with nested tasking.
EXTERN void *llvm_omp_target_dynamic_shared_alloc() { return nullptr; } | ||
EXTERN void *llvm_omp_get_dynamic_shared() { return nullptr; } | ||
EXTERN void *llvm_omp_target_dynamic_shared_alloc() { | ||
OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lifetime ends with the next statement does it make sense here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lifetime ends with the next statement does it make sense here?
I agree but it's mostly to future-proof the entry point. If you definitely want me to remove it, let me know. I can then add a comment and remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. Makes sense now.
LegacyAPI.cpp has quite a few reentrant cases. And I believe the OMP APIs have some of that too. |
When I tried to roll out the raai approach to push and pop the return address, I realized that I actually need to push them to a stack rather than a single value. Otherwise I wouldn't have the appropriate return address on the way back. The other complication for raai comes from splitter runtime calls, where the same region calls into the runtime at the beginning and at the end of the region. This might occur for target if(0). Not sure. Task if(0) has such code pattern. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
Thanks @jprotze for sharing the other use cases. |
Lands and reverts locally : 0d7f232 [libomptarget] [OMPT] Fixed return address computation for OMPT events. (llvm#80498) Change-Id: I7a31d130f76d9e467bf7c6aab087ba7685329717
…s. (llvm#80498) Currently, __builtin_return_address is used to generate the return address when the callback invoker is created. However, this may result in the return address pointing to an internal runtime function. This is not what a tool would typically want. A tool would want to know the corresponding user code from where the runtime entry point is invoked. This change adds a thread local variable that is assigned the return address at the OpenMP runtime entry points. An RAII is used to manage the modifications to the thread local variable. Whenever the return address is required for OMPT events, it is read from the thread local variable. Adapted to work on amd-staging. Change-Id: I3c85bdf48f81edbc421b9030f26f344ed5b963b2
…s. (llvm#80498) Currently, __builtin_return_address is used to generate the return address when the callback invoker is created. However, this may result in the return address pointing to an internal runtime function. This is not what a tool would typically want. A tool would want to know the corresponding user code from where the runtime entry point is invoked. This change adds a thread local variable that is assigned the return address at the OpenMP runtime entry points. An RAII is used to manage the modifications to the thread local variable. Whenever the return address is required for OMPT events, it is read from the thread local variable. Adapted to work on amd-staging. Change-Id: I3c85bdf48f81edbc421b9030f26f344ed5b963b2
Currently, __builtin_return_address is used to generate the return address when the callback invoker is created. However, this may result in the return address pointing to an internal runtime function. This is not what a tool would typically want. A tool would want to know the corresponding user code from where the runtime entry point is invoked.
This change adds a thread local variable that is assigned the return address at the OpenMP runtime entry points. An RAII is used to manage the modifications to the thread local variable. Whenever the return address is required for OMPT events, it is read from the thread local variable.