8322743: C2: prevent lock region elimination in OSR compilation #17331

Closed
wants to merge 8 commits into from

vnkozlov
Contributor

@vnkozlov vnkozlov commented Jan 9, 2024

This is a corner case involving a local (non-escaping) object used for synchronization. C2 Escape Analysis concludes that it can eliminate the locks on it. In most cases that is true, but not in this one.

        for (int i = 0; i < 2; ++i) {
            Object o = new Object();
            synchronized (o) { // monitorenter
                // Trigger OSR compilation
                for (int j = 0; j < 100_000; ++j) {
                    // ...
                }
            } // monitorexit
        }

The test has a nested loop which triggers OSR compilation. The locked object comes from the Interpreter into the compiled OSR code. During parsing, C2 creates another non-escaping object and correctly merges both together (with a Phi node), so the non-escaping object is not scalar replaceable. Because it does not globally escape, EA still removes the locks for it and, as a result, also for the merged locked object from the Interpreter, which is the bug.
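To make the scenario concrete, here is a hypothetical stand-alone sketch of the loop shape described above. The class name OsrLockSketch and the checksum are illustrative; this is not the actual regression test. Whether a given run really triggers OSR (and whether an unfixed VM misbehaves) depends on the JVM build and flags; on a fixed JDK it should simply finish with the expected sum.

```java
// Hypothetical sketch of the OSR locking scenario; not the actual TestLocksInOSR test.
public class OsrLockSketch {

    // Each outer iteration locks a fresh object that never escapes this method.
    static long run() {
        long sum = 0;
        for (int i = 0; i < 2; ++i) {
            Object o = new Object();
            synchronized (o) { // monitorenter
                // A long-running inner loop is a typical OSR trigger: compilation may
                // happen here while the interpreter already holds the monitor on 'o'.
                for (int j = 0; j < 100_000; ++j) {
                    sum += j;
                }
            } // monitorexit
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two outer iterations of 0 + 1 + ... + 99_999.
        long expected = 2L * (99_999L * 100_000L / 2);
        for (int k = 0; k < 5; ++k) {
            if (run() != expected) {
                throw new AssertionError("unexpected sum");
            }
        }
        System.out.println("ok, sum = " + expected);
    }
}
```

The inner-loop work is only there to keep the loop hot; the interesting part for C2 is the monitor held across the OSR transition.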

The fix is to mark the BoxLock node associated with the OSR entry as Unbalanced to prevent EA from removing locks/unlocks for it. It is based on the JDK-8324969 changes.

Added a regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress.
Performance testing shows no difference.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8322743: C2: prevent lock region elimination in OSR compilation (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17331/head:pull/17331
$ git checkout pull/17331

Update a local copy of the PR:
$ git checkout pull/17331
$ git pull https://git.openjdk.org/jdk.git pull/17331/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17331

View PR using the GUI difftool:
$ git pr show -t 17331

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17331.diff

@bridgekeeper

bridgekeeper bot commented Jan 9, 2024

👋 Welcome back kvn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jan 9, 2024

@vnkozlov The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jan 9, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 9, 2024
@mlbridge

mlbridge bot commented Jan 9, 2024

Webrevs

@dean-long
Member

I'm wondering if there is a simpler solution. What if in Parse::load_interpreter_state we mark the lock objects from the interpreter as global escape?

@vnkozlov
Contributor Author

> I'm wondering if there is a simpler solution. What if in Parse::load_interpreter_state we mark the lock objects from the interpreter as global escape?

Thank you, Dean, for looking at the changes.

You are correct: we can mark the BoxLock node created in Parse::load_interpreter_state as having an escaped object.

But in the general case, such an object may be referenced only on a dead path. There could also be other cases where EA thinks that the object escapes on one of the paths.

I wanted to check the graph only after the transformations that happen before EA, and to use the EA results to find escaped objects.

@dean-long
Member

I was thinking that the OSR situation is similar to this:

        for (int i = 0; i < 2; ++i) {
            Object o = osr ? static_volatile_field /* black hole, can't eliminate */ : new Object() /* can eliminate */;
            synchronized (o) { // monitorenter
                // Trigger OSR compilation
                for (int j = 0; j < 100_000; ++j) {

but maybe we can do better. If C2 can eliminate allocations/locks for non-escaping objects, and that works in one direction C2 --> interpreter (deopt), then the reverse direction, interpreter --> C2 (OSR) might also be made to work. In other words, I think we could eliminate the lock, even in the OSR case. We know from EA that the object coming from the interpreter does not escape, so if load_interpreter_state did the reverse of deopt, we would end up with a scalar-replaced object. Deopt does scalar-replaced object --> materialized, so OSR would need to do materialized --> scalar-replaced object. The fields of the scalar-replaced object would be populated from the fields of the interpreter object, but ignoring fields with a default (0) value. Assuming I'm right, and this could work, that doesn't mean it's worth doing. I'm just throwing this idea out mostly for completeness.

@dean-long
Member

Never mind, object fields from the interpreter could have any value, so my idea doesn't work.

@vnkozlov
Contributor Author

"We know from EA that the object coming from the interpreter does not escape" - we don't know what happens to this object in the Interpreter. There is no information about where this object comes from (no method and no bci info). We only know that we have a monitor at slot 0 which uses this object. Yes, we could do bytecode analysis to determine that, but it is a lot more code.

There could be other, more complicated, ways to remove locks in this case. I was thinking about splitting unlock(obj) through the Phi node to keep a separate unlock for the object coming from the Interpreter. Unfortunately, that is not enough. We also need to keep separate the synchronization blocks defined by BoxLock nodes. Otherwise we still eliminate all locks/unlocks during lock elimination (macro.cpp#L1946).

Note that we can't eliminate only some of the locks/unlocks associated with one synchronization block; otherwise we can't guarantee that we have balanced locks and unlocks (we had bugs about that). So we either eliminate all of them or keep all of them.

I think my fix is a conservative solution for this issue.

@iwanowww
Contributor

> I think my fix is conservative solution for this issue.

It's still not clear to me why conservatively marking all objects coming from interpreter as globally escaped wouldn't work (what Dean initially proposed).

My reading of your response is that it may be way too conservative:

> But in general case it could be only dead path where such object is referenced.

Is it your main concern?

@vnkozlov
Contributor Author

> I think my fix is conservative solution for this issue.

> It's still not clear to me why conservatively marking all objects coming from interpreter as globally escaped wouldn't work (what Dean initially proposed).

It would work only for this OSR case.

> My reading of your response is that it may be way too conservative:

> But in general case it could be only dead path where such object is referenced.

> Is it your main concern?

First, I am concerned that marking the synchronization region as has_escaped_object during parsing, when we load the OSR state, could be premature: if we don't do that, we may still be able to eliminate the locks later. That was my comment about the dead path.

Second, marking during the OSR load may not be enough. We may get an escaped locked object in other cases too, and not checking all objects in EA would miss it. That may not be true, and I may just be paranoid.

I think my fix covers all cases.

@vnkozlov
Contributor Author

@dean-long, @iwanowww do you have other questions? Can I get reviewed status ;^) ?

@vnkozlov
Contributor Author

Thank you, @TobiHartmann, for the review. I addressed your comments.

@iwanowww
Contributor

iwanowww commented Jan 18, 2024

@vnkozlov sorry, I still have a hard time reasoning about the correctness of the proposed fix.

It's not clear to me what "synchronized block does not have any associated escaped objects" means in practice and how it relates to the original problem. When does the situation with a single BoxLock shared between multiple AbstractLocks but distinct obj_node() inputs occur? Does it only happen for matched Lock/Unlock node pairs?

Contributor

@eme64 eme64 left a comment

I'll need to study EA from the ground up to really review this.

Comment on lines 2882 to 2888
/*
* The lock/unlock is unnecessary if we are locking a non-escaped object,
* unless synchronized block (defined by BoxLock node) has other escaped objects
* (for example, locked object come from Interpreter in OSR compilation).
*
* Return true if lock/unlock can be eliminated.
*/
Contributor

Suggested change
-/*
-* The lock/unlock is unnecessary if we are locking a non-escaped object,
-* unless synchronized block (defined by BoxLock node) has other escaped objects
-* (for example, locked object come from Interpreter in OSR compilation).
-*
-* Return true if lock/unlock can be eliminated.
-*/
+// The lock/unlock is unnecessary if we are locking a non-escaped object,
+// unless synchronized block (defined by BoxLock node) has other escaped objects
+// (for example, locked object come from Interpreter in OSR compilation).
+//
+// Return true if lock/unlock can be eliminated.

This would be the first use in this file of multi-line comment 🤷‍♂️

Contributor Author

Okay.

Contributor

@eme64 eme64 left a comment

Ok, so I did some rudimentary study of EA. And now this PR makes much more sense ;)

Let me summarize my understanding of the issue:
An object gets allocated in interpreter, and we lock on it in the interpreter.
OSR is triggered, the object is passed in as OSR parameter, we hold the lock.
The OSR control flow now looks like this:

StartOSR:
  LoadP -> load the object created in interpreter
  we have not_global_escape(LoadP) == false
  so this is correctly marked as escaping
  now the osr path injects into the middle of the loop

Loop:
  Phi -> merge interpreter obj and that from this compiled code
  we have not_global_escape(Phi) == false
  ...
  Unlock(Phi)
  ...
  check some condition, maybe return
  ...
  obj = CheckCastPP( Allocate(i.e. new Object()) )
  we have not_global_escape(obj) == true
  this is correct, the object will never escape
  Lock(obj)
  ...
  goto Loop

So if I understand this correctly, the marking in/with the ConnectionGraph is correct:

  • The object passed in through OSR is marked as escaping.
  • The object created locally is marked as non-escaping.
  • The loop-phi that merges the two must therefore also be possibly escaping.
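The three facts above can be mirrored in a tiny toy model. This is a hypothetical sketch, not C2's ConnectionGraph code: escape states are reduced to a three-value enum, a pointer value such as the loop Phi is represented by the set of objects it may point to, and notGlobalEscape / scalarReplaceable only imitate the spirit of the corresponding C2 checks.

```java
import java.util.List;
import java.util.Set;

// Toy model (hypothetical names) of the escape facts listed above; not C2's ConnectionGraph.
public class EscapeModel {

    // Simplified escape states, ordered from least to most escaping.
    enum Escape { NO_ESCAPE, ARG_ESCAPE, GLOBAL_ESCAPE }

    // An allocation site (or an object of unknown origin) with its own escape state.
    record JavaObject(String name, Escape state) {}

    // A pointer value (e.g. a Phi) is "not global escape" only if every object it may point to is.
    static boolean notGlobalEscape(Set<JavaObject> pointsTo) {
        for (JavaObject o : pointsTo) {
            if (o.state() == Escape.GLOBAL_ESCAPE) {
                return false;
            }
        }
        return true;
    }

    // An allocation merged with another object through a Phi keeps its own escape state,
    // but is no longer scalar replaceable (NSR): the Phi may yield either object.
    static boolean scalarReplaceable(JavaObject obj, List<Set<JavaObject>> phis) {
        for (Set<JavaObject> phi : phis) {
            if (phi.contains(obj) && phi.size() > 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        JavaObject fromInterpreter = new JavaObject("interpreter object", Escape.GLOBAL_ESCAPE);
        JavaObject local = new JavaObject("new Object()", Escape.NO_ESCAPE);
        Set<JavaObject> loopPhi = Set.of(fromInterpreter, local);

        System.out.println(notGlobalEscape(Set.of(local)));              // true: Lock(obj) looks removable
        System.out.println(notGlobalEscape(loopPhi));                    // false: Unlock(Phi) is not
        System.out.println(scalarReplaceable(local, List.of(loopPhi)));  // false: NSR
    }
}
```

In this model the local allocation stays NoEscape, which is exactly why a per-node check on Lock(obj) alone answers "removable".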

The question is then with the condition of Lock removal:
Can we remove the lock, just because its object is marked as non-escaping?
At first glance: obviously, because nobody else could ever have the object, and so nobody can ever lock/unlock it.

In the example, if we look at the Unlock node, we cannot remove it (at least at first):
its object is possibly escaping, because the Phi is not marked non-escaping.
But we can remove the Lock, since its object is non-escaping.
This is where the trouble starts.

I think it is exactly for this reason, that @vnkozlov thinks one cannot just look at the object of the individual Lock/Unlock node, but one has to look at all Lock/Unlock nodes of a BoxLock, and see if all objects are non-escaping.
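A minimal sketch of the two rules being contrasted, assuming a simplified representation in which a lock region (one BoxLock) is just a list of Lock/Unlock operations, each naming the value it uses. All names here (RegionCheck, LockOp, perNodeEliminable, regionEliminable) are hypothetical; this illustrates the idea, not the patch itself.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model: a lock region (BoxLock) groups Lock/Unlock nodes, each naming the value it uses.
public class RegionCheck {

    enum Kind { LOCK, UNLOCK }

    // One Lock or Unlock node; 'obj' names the (possibly Phi-merged) value it operates on.
    record LockOp(Kind kind, String obj) {}

    // Per-node rule: look only at this node's own object.
    static boolean perNodeEliminable(LockOp op, Set<String> nonEscaping) {
        return nonEscaping.contains(op.obj());
    }

    // Region-wide rule: a node is eliminable only if every node sharing its BoxLock
    // uses a non-escaping value; otherwise the whole region is kept.
    static boolean regionEliminable(List<LockOp> region, Set<String> nonEscaping) {
        for (LockOp op : region) {
            if (!nonEscaping.contains(op.obj())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // OSR loop from the summary: Unlock(Phi) and Lock(obj) share one BoxLock.
        List<LockOp> region = List.of(new LockOp(Kind.UNLOCK, "Phi"), new LockOp(Kind.LOCK, "obj"));
        Set<String> nonEscaping = Set.of("obj"); // the Phi may carry the interpreter's object

        System.out.println(perNodeEliminable(region.get(1), nonEscaping)); // true: Lock alone looks removable
        System.out.println(perNodeEliminable(region.get(0), nonEscaping)); // false
        System.out.println(regionEliminable(region, nonEscaping));         // false: keep both
    }
}
```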

@vnkozlov please correct me if I got something wrong ;)

I was trying to see what the meaning of the BoxLockNode is, but I did not find any useful documentation. Can you help me out here? Your patch assumes that all "relevant" Lock/Unlock nodes share the same BoxLockNode. Why is that the case?

@@ -2001,7 +2001,7 @@ Node *LockNode::Ideal(PhaseGVN *phase, bool can_reshape) {
   // If we are locking an non-escaped object, the lock/unlock is unnecessary
   //
   ConnectionGraph *cgr = phase->C->congraph();
-  if (cgr != nullptr && cgr->not_global_escape(obj_node())) {
+  if (cgr != nullptr && cgr->can_eliminate_lock(this)) {
Contributor

I guess if you make this change, then you probably would also want to rename NonEscObj and set_non_esc_obj and is_non_esc_obj, right? Now it is not just about being non-escaped, but the more complex semantics of can_eliminate_lock.

Contributor Author

Right.

@vnkozlov
Contributor Author

Thank you, @eme64, for the review and for "diving" into the issue to understand it. Your conclusion is correct.

First, when a non-escaping object is merged by a Phi node with an escaping one, we only mark that object "Not Scalar Replaceable" (NSR):

JavaObject(5) NoEscape(NoEscape) NSR [ [ 155 160 215 213 101 99 ]]   143  Allocate 

We cannot eliminate it, but we can still do some optimizations for it, like CMP node optimization and lock elimination.
Unfortunately, in this case it shares an Unlock node with the escaping object, so we can't eliminate that Unlock or the related Lock.
It is a bug that we eliminated the Unlock based only on the knowledge that the Lock can be eliminated.

There is a "balanced monitors" rule: on any code path, the number of executed Locks and Unlocks for a locked object should match, even when the object is "local" and no other thread can see it, as in this case. You either eliminate all of them, keep all of them, or prove that you can eliminate some while keeping them balanced (as we do for Lock Coarsening).
This bug breaks this rule.
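The rule can be illustrated with a hypothetical per-path checker. The encoding ("+x" for a monitorenter on x, "-x" for a monitorexit on x) and the class name are made up for this sketch, and the path deliberately includes the monitorenter the interpreter executed before the OSR transition. It simply counts enters and exits per object, as the rule states.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checker for the "balanced monitors" rule on a single executed path.
public class MonitorBalance {

    // Ops are encoded as "+x" (monitorenter on x) and "-x" (monitorexit on x); illustrative only.
    static boolean balanced(List<String> path) {
        Map<String, Integer> depth = new HashMap<>();
        for (String op : path) {
            String obj = op.substring(1);
            int d = depth.getOrDefault(obj, 0) + (op.charAt(0) == '+' ? 1 : -1);
            if (d < 0) {
                return false; // an exit with no matching enter on this path
            }
            depth.put(obj, d);
        }
        for (int d : depth.values()) {
            if (d != 0) {
                return false; // a monitor is still held at the end of the path
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // o1 = object locked by the interpreter before OSR, o2 = new Object() of the next iteration.
        List<String> correct = List.of("+o1", "-o1", "+o2", "-o2");
        // Whole region (Unlock(Phi) and Lock(obj)) removed: the interpreter's monitor is never exited.
        List<String> allRemoved = List.of("+o1");
        // Only the Lock removed: the shared Unlock(Phi) now exits o2 without an enter.
        List<String> lockOnlyRemoved = List.of("+o1", "-o1", "-o2");

        System.out.println(balanced(correct));         // true
        System.out.println(balanced(allRemoved));      // false
        System.out.println(balanced(lockOnlyRemoved)); // false
    }
}
```

Both ways of eliminating "part" of the OSR region fail the check, which is the situation the comment describes.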

@vnkozlov
Contributor Author

As I commented in the bug report, technically we can split the Unlock node through the Phi to separate them and try to eliminate the ones related to the non-escaping new object. But it will not help in this case.

There are two modes in which C2 handles locks/unlocks.

Before JDK-7125896, we used BoxLockNode only to indicate the stack slot where we store the object's header (MarkWord) for heavy monitors (see HotSpot/Synchronization).

In that mode the same BoxLockNode can be used by non-interfering synchronization regions, even for different objects:

  synchronized (obj1) {}
  synchronized (obj2) {}

The only thing that matters is the stack slot it points to (locknode.cpp#L51).

EA supports this mode: when it eliminates locks, C2 looks at each set of locks/unlocks that reference only one object and creates a new, separate BoxLockNode (synchronization region) for them (macro.cpp#L1974).

JDK-7125896 and subsequent fixes introduced a new mode to simplify lock handling and to allow an "easy" implementation of eliminating some nested locks on the same object. This has been the default mode (EliminateNestedLocks == true) since JDK 8 (and 7u4). In this mode we don't merge BoxLockNode nodes: each synchronization region has a separate BoxLockNode, one per locked object. This assumes that we will see only one object if we trace all Lock/Unlock nodes that reference one BoxLockNode.

Another assumption is that if we have a merge point during parsing (for example, diamond-shaped code inside a synchronized region), we can use the BoxLockNode for the same stack slot from the already processed path: parse1.cpp#L1800. That was an additional fix, JDK-7128355, made after nested lock elimination was implemented.

Based on that (all Lock/Unlock nodes that reference one BoxLockNode lock one and the same object), in this mode it was assumed that we can eliminate all locks and unlocks in one synchronized region (one BoxLockNode) if we find at least one that we can eliminate (macro.cpp#L1946).

OSR compilation in this bug case breaks these assumptions. During parsing we merged a synchronized region (one BoxLockNode) with a different locked object (from the Interpreter). As a result, the assumption that we can eliminate all locks/unlocks of one region based on only one lock is incorrect.
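The broken assumption can be expressed as a simple invariant check. In this hypothetical sketch (names are illustrative, not HotSpot code), each Lock/Unlock node records which BoxLock it references and which value it uses; a BoxLock violates the EliminateNestedLocks-mode assumption when its nodes use more than one distinct value, which is what the OSR merge produces.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check of the assumption described above: every Lock/Unlock node that
// references one BoxLock uses one and the same object.
public class BoxLockInvariant {

    // A Lock or Unlock node: which BoxLock (region) it references and which value it locks.
    record LockOp(String box, String obj) {}

    // Returns the BoxLocks whose nodes reference more than one distinct value.
    static Set<String> violations(List<LockOp> ops) {
        Map<String, Set<String>> valuesPerBox = new HashMap<>();
        for (LockOp op : ops) {
            valuesPerBox.computeIfAbsent(op.box(), b -> new HashSet<>()).add(op.obj());
        }
        Set<String> bad = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : valuesPerBox.entrySet()) {
            if (e.getValue().size() > 1) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Normal compilation: the region's Lock and Unlock both use the same allocation.
        List<LockOp> normal = List.of(new LockOp("box0", "new Object()"),
                                      new LockOp("box0", "new Object()"));
        // OSR compilation: the merge reuses box0, so its Unlock uses the Phi (which may be
        // the interpreter's object) while its Lock uses the fresh allocation.
        List<LockOp> osr = List.of(new LockOp("box0", "Phi(interpreter obj, new Object())"),
                                   new LockOp("box0", "new Object()"));

        System.out.println(violations(normal)); // []
        System.out.println(violations(osr));    // [box0]
    }
}
```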

It may be possible to do something when we parse the merge point, but I think it is hard. What if this merge point is not at the start but somewhere later?

For me it was much easier to catch such a case early, during escape analysis, where information about all objects is available.

@dean-long
Member

> I was thinking about splitting unlock(obj) through Phi node to keep separate unlock for object coming from Interpreter

> It may be possible do something when we parse merge point but I think it is hard. What if this merge point is not at the start but somewhere later?

I think it is true that OSR nmethods only have the OSR entry point. There is no normal entry point. So if we did a special kind of loop unrolling, so that the OSR entry came first, we would end up with something like this, assuming OSR entry happens on the first iteration with i == 0. The merge point/phi goes away completely, I believe.
i = 0;
// Trigger OSR compilation
[ OSR entry ]
[...]
[monitorexit on interpreter object, with no preceding monitorenter!]
i = 1;
Object o = new Object(); // Never escapes
synchronized (o) { // This monitorenter can be eliminated
for (int j = 0; j < 100_000; ++j) {

In general we don't know which iteration will trigger OSR. So unrolled code would look like:
int i = OSR_start;
[...]
for (i = OSR_start + 1; i < 2; ++i) {

I wouldn't be surprised if generating the Unlock without the Lock breaks some assumptions elsewhere.
I'm not suggesting something like this for this PR -- just thinking it seems possible conceptually.

@vnkozlov
Contributor Author

vnkozlov commented Jan 20, 2024

I agree, this is a very interesting suggestion (for a separate RFE) which may allow us to avoid inverted (and irreducible) loops, not just the current locking issue.

@vnkozlov
Contributor Author

I am still working on it. I have to address Emanuel's suggestions (renaming is not trivial). I also have to take into account executions where the EliminateNestedLocks flag is switched off. As I explained, in that case several synchronized regions and different objects can be referenced by one BoxLockNode.

@bridgekeeper

bridgekeeper bot commented Feb 23, 2024

@vnkozlov This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@vnkozlov
Contributor Author

New, simpler fix implementation based on the JDK-8324969 changes:

  • I marked the locking region (BoxLock node) coming from the OSR entry as Unbalanced and propagate that state when merging regions (for the case when EliminateNestedLocks is on; the JDK-8324969 changes already do that for the flag-off case)
  • moved test to compiler/locks
  • tested tier1-7, xcomp, stress
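The idea behind this version of the fix can be sketched as a small state model. This is a hypothetical illustration only: the enum, its values and the helper names are made up and are not claimed to match HotSpot's BoxLockNode implementation. The OSR-entry region starts out Unbalanced, the state is sticky when regions are merged, and an Unbalanced region is never handed to lock elimination.

```java
// Hypothetical model of the idea behind the final fix; the state names are illustrative
// and are not claimed to match HotSpot's BoxLockNode implementation.
public class RegionStateSketch {

    enum State { REGULAR, UNBALANCED }

    // The region for a monitor already held when entering through OSR starts out Unbalanced:
    // the compiled code sees its Unlock without having seen the matching Lock.
    static State osrEntryRegion() {
        return State.UNBALANCED;
    }

    // When parsing merges two regions for the same monitor slot, Unbalanced is sticky.
    static State merge(State a, State b) {
        return (a == State.UNBALANCED || b == State.UNBALANCED) ? State.UNBALANCED : State.REGULAR;
    }

    // Escape analysis may only eliminate a region that is not Unbalanced.
    static boolean mayEliminate(State region, boolean objectNonEscaping) {
        return region != State.UNBALANCED && objectNonEscaping;
    }

    public static void main(String[] args) {
        // Loop head of the OSR example: OSR-entry region merged with the loop's own region.
        State loopRegion = merge(osrEntryRegion(), State.REGULAR);
        System.out.println(loopRegion);                        // UNBALANCED
        System.out.println(mayEliminate(loopRegion, true));    // false: keep Lock and Unlock
        System.out.println(mayEliminate(State.REGULAR, true)); // true: ordinary non-escaping lock
    }
}
```

Making the state sticky is what lets a single flag on the region replace the per-object escape checks of the earlier version.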

@vnkozlov vnkozlov requested a review from eme64 February 28, 2024 21:53
@vnkozlov
Contributor Author

@eme64, @iwanowww and @dean-long please look.

@vnkozlov vnkozlov changed the title 8322743: assert(held_monitor_count() == jni_monitor_count()) failed 8322743: C2: prevent elimination OSR locking region Feb 28, 2024
Member

@dean-long dean-long left a comment

I like this simpler version.

@openjdk

openjdk bot commented Feb 28, 2024

@vnkozlov This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8322743: C2: prevent lock region elimination in OSR compilation

Reviewed-by: epeter, dlong, vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 13 new commits pushed to the master branch:

  • b8fc418: 8326525: com/sun/tools/attach/BasicTests.java does not verify AgentLoadException case
  • d9aa1de: 8318605: Enable parallelism in vmTestbase/nsk/stress/stack tests
  • bbfda65: 8326897: (fs) The utility TestUtil.supportsLinks is wrongly used to check for hard link support
  • db0e2b8: 8326944: (ch) Minor typo in the ScatteringByteChannel.read(ByteBuffer[] dsts,int offset,int length) javadoc
  • 8f6edd8: 8326975: Parallel: Remove redundant PSOldGen::is_allocated
  • 4302900: 8319673: Few security tests ignore VM flags
  • e772e78: 8326948: Force English locale for timeout formatting
  • d9ef16d: 8326140: src/jdk.accessibility/windows/native/libjavaaccessbridge/AccessBridgeJavaEntryPoints.cpp ReleaseStringChars might be missing in early returns
  • 998d0ba: 8324799: Use correct extension for C++ test headers
  • 0735c8a: 8318302: ThreadCountLimit.java failed with "Native memory allocation (mprotect) failed to protect 16384 bytes for memory to guard stack pages"
  • ... and 3 more: https://git.openjdk.org/jdk/compare/b938a5c9edd53821a52b43a8e342b76adb341a3f...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 28, 2024
Contributor

@iwanowww iwanowww left a comment

Looks good.

Contributor

@eme64 eme64 left a comment

Looks good!

@eme64
Contributor

eme64 commented Feb 29, 2024

@vnkozlov title improvement suggestion:
8322743: C2: prevent elimination OSR locking region
-> 8322743: C2: prevent lock region elimination in OSR compilation

@vnkozlov vnkozlov changed the title 8322743: C2: prevent elimination OSR locking region 8322743: C2: prevent lock region elimination in OSR compilation Feb 29, 2024
@vnkozlov
Contributor Author

> @vnkozlov title improvement suggestion: 8322743: C2: prevent elimination OSR locking region -> 8322743: C2: prevent lock region elimination in OSR compilation

Done

@vnkozlov
Contributor Author

I added @run main TestLocksInOSR to the regression test and ran hs-tier1-3, xcomp and stress testing, which passed.

@vnkozlov
Contributor Author

Thank you, Dean, Vladimir and Emanuel, for the reviews.

@vnkozlov
Contributor Author

/integrate

@openjdk

openjdk bot commented Feb 29, 2024

Going to push as commit 742c776.
Since your change was applied there have been 14 commits pushed to the master branch:

  • d29cefb: 8326838: JFR: Native mirror events
  • b8fc418: 8326525: com/sun/tools/attach/BasicTests.java does not verify AgentLoadException case
  • d9aa1de: 8318605: Enable parallelism in vmTestbase/nsk/stress/stack tests
  • bbfda65: 8326897: (fs) The utility TestUtil.supportsLinks is wrongly used to check for hard link support
  • db0e2b8: 8326944: (ch) Minor typo in the ScatteringByteChannel.read(ByteBuffer[] dsts,int offset,int length) javadoc
  • 8f6edd8: 8326975: Parallel: Remove redundant PSOldGen::is_allocated
  • 4302900: 8319673: Few security tests ignore VM flags
  • e772e78: 8326948: Force English locale for timeout formatting
  • d9ef16d: 8326140: src/jdk.accessibility/windows/native/libjavaaccessbridge/AccessBridgeJavaEntryPoints.cpp ReleaseStringChars might be missing in early returns
  • 998d0ba: 8324799: Use correct extension for C++ test headers
  • ... and 4 more: https://git.openjdk.org/jdk/compare/b938a5c9edd53821a52b43a8e342b76adb341a3f...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 29, 2024
@openjdk openjdk bot closed this Feb 29, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 29, 2024
@openjdk

openjdk bot commented Feb 29, 2024

@vnkozlov Pushed as commit 742c776.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@vnkozlov vnkozlov deleted the 8322743 branch February 29, 2024 20:29
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
5 participants