8255243: Reinforce escape barrier interactions with ZGC conc stack processing #832
👋 Welcome back eosterlund! A progress list of the required criteria for merging this PR into |
@reinrich Since you wrote the escape barrier, I thought I'd ping you in case you are interested. This change makes the mechanism more robust and reliable with respect to concurrent stack processing. I have not observed any failures, but the interaction with the recently integrated vector API deoptimization code looks wrong. The change also makes the mechanism more future-proof and easier to reason about.
Forgot to mention: I tested this with tier1-5 as a sanity check that the new solution doesn't introduce any new issues.
@fisk thanks for pinging. I remember reading about a new vector API a while ago and noticed that it potentially interferes with escape barriers but then I lost track. I'll look at this. |
Thanks for adding the tier testing info. Sounds like there is no need for new |
Exactly; confirmed. |
I'm really glad you caught that one! And I like the abstraction provided by KeepStackGCProcessedMark.
There is one execution path you missed, coming from VM_GetOrSetLocal::deoptimize_objects(javaVFrame* jvf). This code should probably be moved into EscapeBarrier. EscapeBarrier::deoptimize_objects(int depth) could be changed to the more generic EscapeBarrier::deoptimize_objects(int from_depth, int to_depth). VM_GetOrSetLocal::doit_prologue() could then call eb.deoptimize_objects(_depth, _depth). That would be better but maybe not yet really good...
Update: I see now that there is also a stackwalk in VM_GetOrSetLocal::doit_prologue() which needs to be taken care of with regard to concurrent stack processing. I'd like to try to refactor this. Will propose a patch.
Thanks again, Richard.
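To make the suggested generalization concrete, here is a minimal standalone sketch. It is not the HotSpot code: ToyFrame, the vector-based stack, and deoptimize_objects_at are hypothetical stand-ins, and "deoptimizing" is reduced to setting a flag. It only illustrates how a range [from_depth, to_depth] subsumes the single-depth case.

```cpp
#include <cassert>
#include <vector>

// Hypothetical frame record; real HotSpot frames look nothing like this.
struct ToyFrame {
  bool has_scalar_replaced_objects = false;
  bool objects_deoptimized = false;
};

// Deoptimize (here: just flag) the scalar-replaced objects of every frame whose
// depth lies in [from_depth, to_depth]; depth 0 is the youngest frame.
// Returns the number of frames that were newly deoptimized.
int deoptimize_objects(std::vector<ToyFrame>& stack, int from_depth, int to_depth) {
  assert(0 <= from_depth && from_depth <= to_depth);
  const int frame_count = static_cast<int>(stack.size());
  int count = 0;
  for (int depth = from_depth; depth <= to_depth && depth < frame_count; ++depth) {
    ToyFrame& frame = stack[depth];
    if (frame.has_scalar_replaced_objects && !frame.objects_deoptimized) {
      frame.objects_deoptimized = true;
      ++count;
    }
  }
  return count;
}

// The single-frame case is just the degenerate range [depth, depth].
int deoptimize_objects_at(std::vector<ToyFrame>& stack, int depth) {
  return deoptimize_objects(stack, depth, depth);
}
```

In this model a caller interested in one frame passes the same depth twice, which mirrors the eb.deoptimize_objects(_depth, _depth) call proposed above.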
@@ -89,7 +89,7 @@ void SafepointMechanism::process(JavaThread *thread) {
   // 1) After we exit from block after a global poll
   // 2) After a thread races with the disarming of the global poll and transitions from native/blocked
   // 3) Before the handshake code is run
-  StackWatermarkSet::start_processing(thread, StackWatermarkKind::gc);
+  StackWatermarkSet::on_safepoint(thread);
start_processing in the comment above should be renamed too.
    return;
  }
  StackWatermark* their_watermark = StackWatermarkSet::get(jt, StackWatermarkKind::gc);
  our_watermark->link_watermark(their_watermark);
Assert our_watermark->_linked_watermark == NULL to avoid unintentional nesting?
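A minimal sketch of what such a guard could look like, using a hypothetical ToyStackWatermark class rather than the real StackWatermark. The assumption here is that linking is only allowed when no link exists yet, while unlinking (passing nullptr) is always allowed.

```cpp
#include <cassert>

// Hypothetical stand-in for a stack watermark; not the HotSpot class.
class ToyStackWatermark {
 public:
  // Link another watermark, or pass nullptr to unlink.
  void link_watermark(ToyStackWatermark* other) {
    // Linking while a link already exists would silently drop the old link,
    // so reject it in debug builds.
    assert((other == nullptr || _linked_watermark == nullptr) && "nesting not supported");
    _linked_watermark = other;
  }

  ToyStackWatermark* linked_watermark() const { return _linked_watermark; }

 private:
  ToyStackWatermark* _linked_watermark = nullptr;
};
```

With this shape, a link / unlink / link sequence passes, whereas two consecutive links without an unlink in between trip the assertion.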
Hi Erik, the last commit in https://github.com/reinrich/jdk/commits/pr-832-with-better-encapsulation is the refactoring I would like to do. It removes the code that is not compliant with concurrent thread stack processing from VM_GetOrSetLocal::doit_prologue(). Instead, EscapeBarrier::deoptimize_objects(int d1, int d2) is called. You had already added a KeepStackGCProcessedMark to that method, and I changed it to accept a range [d1, d2] of frames to do the object deoptimization for. I'm not sure how to handle this from a process point of view. Can the refactoring be done within this change? Should a new item or subtask be created for it? I'd be glad if you could give me some advice on that. Thanks, Richard.
If you are okay with it, I can add your refactorings into this change, and add you as a co-author of the change. Sounds good? Thanks, |
It does sound good indeed to me if you don't mind doing that. Thanks! |
…ilitate correct interaction with concurrent thread stack processing. The Stackwalk for object deoptimization in VM_GetOrSetLocal::doit_prologue is not prepared for concurrent thread stack processing. EscapeBarrier::deoptimize_objects(int depth) is extended to cover a range of frames from depth d1 to depth d2. It is also prepared for concurrent thread stack processing. With this change it is used to deoptimize objects in the prologue of VM_GetOrSetLocal.
Thanks @reinrich. I uploaded your patch to this PR, and will add you as contributor. Also addressed your review comments. Hope your testing went fine. |
Thanks for importing the patch and for addressing my comments. |
Thanks for making EscapeBarriers more robust with regard to concurrent thread stack processing.
The change looks good to me.
/contributor add @reinrich |
Hi Erik and Richard,
The changes in the serviceability files look fine.
Thanks,
Serguei
Thanks for the review Serguei! |
/integrate |
@fisk Since your change was applied there have been 107 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 5b18558. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
The escape barrier reallocates scalarized objects, potentially deep in the stack of a remote thread. Each allocation can safepoint, which invalidates the frames being referenced. Some hooks were sprinkled in to deal with that, but I believe they were subsequently broken by the integration of the new vector API, which has its own deoptimization code that is unaware of this subtlety: it allocates an object and then reads an object deep in the stack of a remote thread (using an escape barrier). I suppose the issue is that all three of these things were integrated at almost the same time. The problematic code sequence is in VectorSupport::allocate_vector() in vectorSupport.cpp, which is called from Deoptimization::realloc_objects(). It first allocates an oop (possibly safepointing) and then reads a vector oop from the stack. This is usually fine, but not when done through the escape barrier with concurrent stack scanning. While I have not seen any crashes yet, I can see from code inspection that this cannot work correctly.
To make this less fragile for future changes, we should really have an RAII object that keeps the stack of the escape barrier's target thread stable and processed across safepoints. This patch adds one. Correctness then becomes much easier to reason about than when hoping that the various hooks are applied after each safepoint.
With this robustness fix, the thread running the escape barrier keeps the target thread's stack processed straight through safepoints on the requesting thread, which makes it easy and intuitive to understand why this works correctly. The RAII object just has to cover the code block that pokes at the remote stack and goes in and out of safepoints arbitrarily. Arguably, the escape barrier doesn't need to be blazingly fast and can afford to keep stacks sane throughout its operation.
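The idea can be illustrated with a small standalone toy model. Everything in it is an assumption made for illustration: ToyWatermark, ToyThread, KeepProcessedMark, the global epoch counter, and the helper functions are hypothetical and much simpler than the real watermark machinery. The sketch only shows the RAII shape: while the mark is alive, the requesting thread reprocesses the target's stack each time it comes out of a safepoint.

```cpp
#include <cassert>

// Toy model only: every name here is a hypothetical stand-in, not a HotSpot class.
static int g_gc_epoch = 0;  // bumped whenever a new GC phase invalidates all stacks

struct ToyWatermark {
  int processed_epoch = -1;        // epoch in which this stack was last processed
  ToyWatermark* linked = nullptr;  // remote stack to keep processed along with ours

  bool is_processed() const { return processed_epoch == g_gc_epoch; }

  // Called when the owning thread comes out of a safepoint.
  void on_safepoint() {
    processed_epoch = g_gc_epoch;
    if (linked != nullptr) {
      linked->processed_epoch = g_gc_epoch;  // keep the linked stack processed too
    }
  }
};

struct ToyThread {
  ToyWatermark watermark;
};

// RAII mark: for its lifetime, the target's stack is reprocessed whenever
// the requesting thread passes a safepoint.
class KeepProcessedMark {
 public:
  KeepProcessedMark(ToyThread& requester, ToyThread& target) : _requester(requester) {
    requester.watermark.linked = &target.watermark;
    target.watermark.processed_epoch = g_gc_epoch;  // finish processing up front
  }
  ~KeepProcessedMark() { _requester.watermark.linked = nullptr; }
  KeepProcessedMark(const KeepProcessedMark&) = delete;
  KeepProcessedMark& operator=(const KeepProcessedMark&) = delete;

 private:
  ToyThread& _requester;
};

// Simulates an allocation that safepoints while a new GC phase starts.
static void allocate_with_safepoint(ToyThread& requester) {
  ++g_gc_epoch;
  requester.watermark.on_safepoint();
}

// Returns whether the target stack is still processed after the allocation.
static bool target_readable_after_allocation(bool use_mark) {
  ToyThread requester, target;
  target.watermark.processed_epoch = g_gc_epoch;
  if (use_mark) {
    KeepProcessedMark mark(requester, target);
    allocate_with_safepoint(requester);
    return target.watermark.is_processed();
  }
  allocate_with_safepoint(requester);
  return target.watermark.is_processed();
}
```

In this model, target_readable_after_allocation(true) returns true and target_readable_after_allocation(false) returns false, which corresponds to the allocate-then-read sequence being safe only under the mark.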
Contributors
<rrich@openjdk.org>
Download
$ git fetch https://git.openjdk.java.net/jdk pull/832/head:pull/832
$ git checkout pull/832