8322743: C2: prevent lock region elimination in OSR compilation #17331
I'm wondering if there is a simpler solution. What if in
Thank you, Dean, for looking at the changes. You are correct, we can mark created … But in the general case, the only path where such an object is referenced could be a dead path. There could also be other cases where EA thinks that the object escapes on one of the paths. I wanted to check the graph only after some transformations that happen before EA, and to use the EA analysis to find escaped objects.
I was thinking that the OSR situation is similar to this:
but maybe we can do better. If C2 can eliminate allocations/locks for non-escaping objects, and that works in one direction, C2 --> interpreter (deopt), then the reverse direction, interpreter --> C2 (OSR), might also be made to work. In other words, I think we could eliminate the lock, even in the OSR case. We know from EA that the object coming from the interpreter does not escape, so if load_interpreter_state did the reverse of deopt, we would end up with a scalar-replaced object. Deopt does scalar-replaced object --> materialized, so OSR would need to do materialized --> scalar-replaced object. The fields of the scalar-replaced object would be populated from the fields of the interpreter object, but ignoring fields with a default (0) value. Assuming I'm right and this could work, that doesn't mean it's worth doing. I'm just throwing this idea out mostly for completeness.
Never mind, object fields from the interpreter could have any value, so my idea doesn't work.
"We know from EA that the object coming from the interpreter does not escape": we don't know what happens to this object in the Interpreter. There is no information about where this object comes from (no method and no bci info). We only know that we have a monitor at slot 0 which uses this object. Yes, we could do bytecode analysis to determine that, but it is a lot more code. There could be other, more complicated, ways to remove locks for this case. I was thinking about splitting … Note, we can't eliminate only some of the locks/unlocks associated with one synchronization block. Otherwise we can't guarantee that the locks and unlocks are balanced (we had bugs about this). So we either eliminate all of them or keep all of them. I think my fix is a conservative solution for this issue.
It's still not clear to me why conservatively marking all objects coming from the interpreter as globally escaped wouldn't work (what Dean initially proposed). My reading of your response is that it may be way too conservative:
Is that your main concern?
It would work only for this OSR case.
First, I am concerned that marking the synchronization region as … Second, marking during the OSR load may not be enough. We may get an escaped locked object in other cases too, and if EA does not check all objects, it will miss them. That may not be true, and I may just be paranoid. I think my fix covers all cases.
@dean-long, @iwanowww, do you have other questions? Can I get a reviewed status ;^)?
Thank you, @TobiHartmann, for the review. I have addressed your comments.
@vnkozlov sorry, I still have a hard time reasoning about the correctness of the proposed fix. It's not clear to me what "synchronized block does not have any associated escaped objects" means in practice and how it relates to the original problem. When does the situation with a single
I'll need to study EA from the ground up to really review this.
src/hotspot/share/opto/escape.cpp
/*
 * The lock/unlock is unnecessary if we are locking a non-escaped object,
 * unless synchronized block (defined by BoxLock node) has other escaped objects
 * (for example, locked object come from Interpreter in OSR compilation).
 *
 * Return true if lock/unlock can be eliminated.
 */
-/*
- * The lock/unlock is unnecessary if we are locking a non-escaped object,
- * unless synchronized block (defined by BoxLock node) has other escaped objects
- * (for example, locked object come from Interpreter in OSR compilation).
- *
- * Return true if lock/unlock can be eliminated.
- */
+// The lock/unlock is unnecessary if we are locking a non-escaped object,
+// unless synchronized block (defined by BoxLock node) has other escaped objects
+// (for example, locked object come from Interpreter in OSR compilation).
+//
+// Return true if lock/unlock can be eliminated.
This would be the first use of a multi-line comment in this file 🤷♂️
Okay.
Ok, so I did some rudimentary study of EA. And now this PR makes much more sense ;)
Let me summarize my understanding of the issue:
An object gets allocated in the interpreter, and we lock on it in the interpreter.
OSR is triggered, the object is passed in as an OSR parameter, and we still hold the lock.
The OSR control flow now looks like this:
StartOSR:
  LoadP -> load the object created in the interpreter
    we have not_global_escape(LoadP) == false,
    so this is correctly marked as escaping
  now the OSR path jumps into the middle of the loop
Loop:
  Phi -> merge the interpreter obj and the one from this compiled code
    we have not_global_escape(Phi) == false
  ...
  Unlock(Phi)
  ...
  check some condition, maybe return
  ...
  obj = CheckCastPP( Allocate(i.e. new Object()) )
    we have not_global_escape(obj) == true
    this is correct, the object will never escape
  Lock(obj)
  ...
  goto Loop
So if I understand this correctly, the marking in/with the ConnectionGraph is correct:
- The object passed in through OSR is marked as escaping.
- The object created locally is marked as non-escaping.
- The loop-phi that merges the two must therefore also be possibly escaping.
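To make the three points above concrete, here is a minimal C++ sketch of how an escape query on a Phi can differ from the query on one of its inputs. It is a toy model loosely inspired by C2's ConnectionGraph, not HotSpot code: `JavaObject`, `PointerNode` and `merge_into_phi` are hypothetical names. The model encodes only two facts from this thread: a pointer's `not_global_escape` answer is false if any object it may point to escapes globally, and (as noted later in the thread) merging a non-escaped object with an escaped one only makes the local object not scalar replaceable, without changing its own escape state.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model; the state names mirror EA's vocabulary but this
// is not the HotSpot implementation.
enum class EscapeState { NoEscape, ArgEscape, GlobalEscape };

// An allocated object and what escape analysis concluded about it.
struct JavaObject {
  EscapeState state;
  bool scalar_replaceable = true;
};

// A pointer-like node (LoadP, CheckCastPP, Phi) and the objects it may reference.
struct PointerNode {
  std::vector<JavaObject*> points_to;
};

// A pointer does not globally escape only if none of the objects it may
// point to escapes globally.
bool not_global_escape(const PointerNode& p) {
  for (const JavaObject* obj : p.points_to) {
    if (obj->state == EscapeState::GlobalEscape) {
      return false;
    }
  }
  return true;
}

// Merge two objects in a Phi. If one input escapes, the other keeps its own
// escape state but is no longer scalar replaceable.
void merge_into_phi(PointerNode& phi, JavaObject* a, JavaObject* b) {
  phi.points_to = {a, b};
  if (!not_global_escape(phi)) {
    a->scalar_replaceable = false;
    b->scalar_replaceable = false;
  }
}
```

In this model the Lock on `obj` sees a non-escaping object while the Unlock on the Phi sees an escaping one, which is exactly the asymmetry discussed next.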
The question, then, is about the condition for Lock removal:
Can we remove the lock, just because its object is marked as non-escaping?
At first glance: obviously, because nobody else could ever have the object, and so nobody can ever lock/unlock it.
In the example, if we look at the Unlock node, we cannot remove it (at least at first):
its object is possibly escaping, because the Phi is not marked non-escaping.
But we can remove the Lock, since its object is non-escaping.
This is where the trouble starts.
I think it is for exactly this reason that @vnkozlov thinks one cannot just look at the object of an individual Lock/Unlock node, but has to look at all Lock/Unlock nodes of a BoxLock and check whether all of their objects are non-escaping.
@vnkozlov please correct me if I got something wrong ;)
I was trying to see what the meaning of the BoxLockNode is, but I did not find any useful documentation. Can you help me out here? Your patch assumes that all "relevant" Lock/Unlock nodes share the same BoxLockNode. Why is that the case?
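A toy C++ model of the difference between a per-node check and the per-region check described above. `LockSite`, `naive_can_eliminate` and `region_can_eliminate` are hypothetical names; the sketch simply assumes, as the summary does, that all Lock/Unlock nodes of one synchronized region share a BoxLock, identified here by `box_id`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model: each Lock/Unlock node records the
// BoxLock (synchronized region) it belongs to and whether its object
// is non-escaping according to escape analysis.
struct LockSite {
  int box_id;             // identifies the shared BoxLock
  bool is_lock;           // true for Lock, false for Unlock
  bool obj_non_escaping;  // EA result for this node's object
};

// Per-node check: only looks at this node's own object.
bool naive_can_eliminate(const LockSite& site) {
  return site.obj_non_escaping;
}

// Region-wide check: a Lock/Unlock may be eliminated only if every
// Lock/Unlock sharing its BoxLock locks a non-escaping object, so the
// locks and unlocks of a region are removed (or kept) together.
bool region_can_eliminate(const LockSite& site,
                          const std::vector<LockSite>& all_sites) {
  for (const LockSite& other : all_sites) {
    if (other.box_id == site.box_id && !other.obj_non_escaping) {
      return false;
    }
  }
  return true;
}
```

With the OSR example, the per-node check would remove only the Lock on the fresh allocation and keep the Unlock on the Phi, which is the unbalanced outcome the summary warns about; the region-wide check keeps both.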
src/hotspot/share/opto/callnode.cpp
@@ -2001,7 +2001,7 @@ Node *LockNode::Ideal(PhaseGVN *phase, bool can_reshape) {
 // If we are locking an non-escaped object, the lock/unlock is unnecessary
 //
 ConnectionGraph *cgr = phase->C->congraph();
-if (cgr != nullptr && cgr->not_global_escape(obj_node())) {
+if (cgr != nullptr && cgr->can_eliminate_lock(this)) {
I guess if you make this change, then you would probably also want to rename NonEscObj, set_non_esc_obj and is_non_esc_obj, right? Now it is not just about being non-escaped, but about the more complex semantics of can_eliminate_lock.
Right.
Thank you, @eme64, for the review and for "diving" into the issue to understand it. Your conclusion is correct. First, when a non-escaped object is merged by a Phi node with an escaped one, we only mark that object "Not Scalar Replaceable".
We cannot eliminate it, but we can still do some optimizations for it, like CMP node optimization and lock elimination. There is a "balanced monitors" rule: on any code path, the number of executed Locks and Unlocks for a locked object must match, even when the object is "local" and no other threads can see it, as in this case. You either eliminate all of them, keep all of them, or prove that you can eliminate some while keeping them balanced (as we do for
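The "balanced monitors" rule can be illustrated by replaying one control-flow path and counting monitor operations. This is a hypothetical C++ sketch, not C2's verification code: `Event`, `path_is_balanced` and the node ids are invented for illustration, and `initial_depth` models a monitor that is already held when the path starts (such as the one taken in the interpreter before OSR).

```cpp
#include <cassert>
#include <set>
#include <vector>

enum class Op { Lock, Unlock };

// One executed monitor operation on a path, tagged with the id of the
// Lock/Unlock node that performs it (ids are illustrative).
struct Event {
  int node_id;
  Op op;
};

// Replays a path and checks the balanced-monitors rule: the monitor count
// must never go negative and must return to zero at the end. Nodes listed
// in `eliminated` are treated as removed and skipped.
bool path_is_balanced(const std::vector<Event>& path,
                      const std::set<int>& eliminated,
                      int initial_depth) {
  int depth = initial_depth;
  for (const Event& e : path) {
    if (eliminated.count(e.node_id) != 0) {
      continue;  // this Lock/Unlock node was eliminated
    }
    depth += (e.op == Op::Lock) ? 1 : -1;
    if (depth < 0) {
      return false;  // Unlock without a matching Lock
    }
  }
  return depth == 0;
}
```

For two loop iterations of the OSR example (the Unlock on the Phi as node 1, the Lock on the new object as node 2, one monitor held on entry), keeping both nodes is balanced, while eliminating only node 2, or both nodes, leaves the path unbalanced.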
As I commented in the bug report, technically we could split the Unlock node through the Phi to separate them and try to eliminate the ones related to the non-escaped new object. But it will not help in this case. There are two modes in which C2 handles locks/unlocks. Before JDK-7125896 we used … In that mode the same
The only thing that matters is the stack slot it points to (locknode.cpp#L51). EA supports this mode, and C2 looks at each lock/unlock that references only one object and creates a new separate … JDK-7125896 and subsequent fixes introduced a new mode to simplify lock handling and to allow "easy" elimination of some nested locks that lock the same object. This is the default mode ( … Another assumption is that if we have a merge point during parsing (for example, diamond-shaped code inside a synchronized region) we can use … Based on that (all … OSR compilation in this bug case breaks these assumptions. During parsing we merged the synchronized region (one … It may be possible to do something when we parse the merge point, but I think it is hard. What if this merge point is not at the start but somewhere later? For me it was much easier to catch such a case early, during escape analysis, where information about all objects is available.
I think it is true that OSR nmethods only have the OSR entry point. There is no normal entry point. So if we did a special kind of loop unrolling, so that the OSR entry came first, we would end up with something like this, assuming OSR entry happens on the first iteration with i == 0. The merge point/phi goes away completely, I believe. In general we don't know which iteration will trigger OSR. So the unrolled code would look like: I wouldn't be surprised if generating the Unlock without the Lock breaks some assumptions elsewhere.
I agree, this is a very interesting suggestion (for a separate RFE), which may allow us to avoid inverted (and irreducible) loops, not just the current locking issue.
I am still working on it. I have to address Emanuel's suggestions (renaming is not trivial) and also take into account executions when
New, simpler fix implementation based on the JDK-8324969 changes:
@eme64, @iwanowww and @dean-long please look. |
I like this simpler version.
Looks good.
Looks good!
@vnkozlov title improvement suggestion:
Done
I added
Thank you, Dean, Vladimir and Emanuel, for the reviews.
/integrate |
Going to push as commit 742c776.
Your commit was automatically rebased without conflicts.
This is a corner case with a local (non-escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for such an object. In most cases that is true, but not in this case.
The test has a nested loop which triggers OSR compilation. The locked object comes from the Interpreter into the compiled OSR code. During parsing, C2 creates another non-escaped object and correctly merges both together (with a Phi node), so the non-escaped object is not scalar replaceable. Because it does not globally escape, EA still removes the locks for it and, as a result, also for the merged locked object from the Interpreter, which is the bug.
The fix is to mark the BoxLock node associated with the OSR entry as Unbalanced, which prevents EA from removing the locks/unlocks associated with it. It is based on the JDK-8324969 changes.
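A minimal C++ sketch of the idea behind the fix, under stated assumptions: `BoxState`, `BoxLock`, `mark_osr_box` and `try_eliminate` are hypothetical names, only the Unbalanced marking comes from the description above, and the real HotSpot states and APIs introduced by JDK-8324969 may differ. The point is that the elimination decision is attached to the region's BoxLock, so marking the OSR entry's BoxLock as Unbalanced blocks elimination even when the locked object itself is non-escaping.

```cpp
#include <cassert>

// Hypothetical, simplified region states; Regular and Eliminated are
// illustrative names, Unbalanced mirrors the marking used by the fix.
enum class BoxState { Regular, Eliminated, Unbalanced };

struct BoxLock {
  BoxState state = BoxState::Regular;
  bool is_unbalanced() const { return state == BoxState::Unbalanced; }
};

// At the OSR entry the monitor was taken by the interpreter, so the
// compiled code cannot prove the region is balanced.
void mark_osr_box(BoxLock& box) { box.state = BoxState::Unbalanced; }

// A non-escaping object alone is not enough: the region's BoxLock must
// not be marked Unbalanced.
bool can_eliminate(const BoxLock& box, bool obj_non_escaping) {
  return obj_non_escaping && !box.is_unbalanced();
}

// Eliminating marks the whole region, so its Lock/Unlock nodes go together.
bool try_eliminate(BoxLock& box, bool obj_non_escaping) {
  if (!can_eliminate(box, obj_non_escaping)) {
    return false;
  }
  box.state = BoxState::Eliminated;
  return true;
}
```

Keying the decision on the region rather than on each node's object keeps the all-or-nothing property that the balanced-monitors rule requires.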
Added a regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress.
Performance testing shows no difference.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17331/head:pull/17331
$ git checkout pull/17331
Update a local copy of the PR:
$ git checkout pull/17331
$ git pull https://git.openjdk.org/jdk.git pull/17331/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 17331
View PR using the GUI difftool:
$ git pr show -t 17331
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17331.diff