-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8324969: C2: prevent elimination of unbalanced coarsened locking regions #17697
Conversation
👋 Welcome back kvn! A progress list of the required criteria for merging this PR into |
Webrevs
|
Doesn't this problem also go away if lock coarsening turns the two regions into a single region? |
src/hotspot/share/opto/locknode.cpp
Outdated
// Unbalanced locking region. | ||
// Can happen when locks coarsening optimization eliminated | ||
// pair of Unlock/Lock nodes from adjacent locking regions. | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the assumption here that each region has its own BoxLockNode, i.e. we have at most one lock and at most one unlock per box? Or can regions share a box?
If it is shared: could we have an imbalance over multiple regions that share the BoxLock, but overall we have at least one lock and one unlock, and so we pass this test?
Do I understand this right: returning false here just means that we don't do the optimization of removing the lock/unlock from the region? I think it is in PhaseMacroExpand::mark_eliminated_box
that we would eliminate the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the assumption here that each region has its own BoxLockNode, i.e. we have at most one lock and at most one unlock per box? Or can regions share a box? If it is shared: could we have an imbalance over multiple regions that share the BoxLock, but overall we have at least one lock and one unlock, and so we pass this test?
It is controlled by EliminateNestedLocks
flag. By default the flag is ON and each synchronized region has its own BoxLockNode associated with it. The region can have one Lock node (before C2 transforms the graph) and multiple Unlock nodes (for each exit path). That is why I can't check matching numbers. I think it is possible to construct bytecode (jasm) with multiple Lock nodes for one region but in practice I never saw such case. W can have multiple Lock/Unlock nodes per region after C2 graph transformations (loop unroll/peelling) as in this case.
With EliminateNestedLocks
flag switched off we don't do nested locks elimination and regions can share the same BoxLockNode node after Parse phase. We do only locks coarsening optimization and eliminate locks for not escaped objects.
Do I understand this right: returning false here just means that we don't do the optimization of removing the lock/unlock from the region? I think it is in
PhaseMacroExpand::mark_eliminated_box
that we would eliminate the box.
Actually the code which marks the box for nested locks is in mark_eliminated_locking_nodes()
under EliminateNestedLocks
flag. is_nested_lock_region()
calls is_simple_lock_region()
and will skip elimination for this case.
But yes, it also affects code in mark_eliminated_box
which is called for locks with not-escaped objects. So we will not not eliminate those if we see not-balanced region (combined or not combined).
test/hotspot/jtreg/compiler/escapeAnalysis/TestNestedLocksElimination.java
Outdated
Show resolved
Hide resolved
If that works that would probably be a more elegant solution. Could there be an invariant that a box always has a lock and an unlock? |
Yes, this is good suggestion. Let me investigate how to implement it. It will allow to eliminate all lock/unlock nodes in nested combined region. The only concern for me is that I have to backport the fix into all releases and if it is too complex I would prefer to go with current "band-aid" fix and have separate RFE to implement this suggestion. |
After looking on current code I think this will introduce more issues. In current state we have assumption that with But I share @eme64 concern that current fix may not work if coarsening eliminates an Unlock node on one of paths but left Unlock nodes on other paths. The check will pass but we will have unbalanced lock/unlock on the first path. I think I should just disable nested and not-escaped object lock eliminations if a coarsening elimination happens before it. |
@eme64 and @dean-long I pushed new implementation. I used existing list of coarsened Lock/Unlock nodes pairs to flag synchronized regions (identified by BoxLockNode nodes) as unbalanced to skip any further locks elimination in these regions. Please look. |
I did renaming |
You should probably update the PR description now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks better now :)
Though it would be good if someone who has a deeper understanding of the locking code would have a look as well.
src/hotspot/share/opto/compile.cpp
Outdated
// Mark locking regions (identified by BoxLockNode) as coarsened | ||
// if locks coarsening optimization removed lock/unlock from them. | ||
// Such regions become unbalanced and we can't execute other | ||
// locks elimination optimization on them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment suggests you mark boxes as "coarsened" if ...
But really you mark them as "unbalanced", right? Because you have the condition that the locks have to already be "coarsened".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated comment
src/hotspot/share/opto/compile.cpp
Outdated
if (size > 0) { | ||
AbstractLockNode* alock = locks_list->at(0)->as_AbstractLock(); | ||
BoxLockNode* box = alock->box_node()->as_BoxLock(); | ||
if (alock->is_coarsened() && !box->is_unbalanced()) { // Not marked already |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the position 0 box
is already marked as unbalanced, but there are other this_box
for positions j=1..
that are not yet marked but should be?
Is it really ok that we are noch cheching all-with-all but only first-with-all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I removed !box->is_unbalanced()
check.
@eme64 I updated changes based on your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
@vnkozlov This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 5 new commits pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the ➡️ To integrate this PR with the above commit message to the |
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this end up marking even the simplest lock coarsening cases as unbalanced?
Is there a reason the above check can't be done as part of coarsened_locks_consistent()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this end up marking even the simplest lock coarsening cases as unbalanced?
Yes, the simplest lock coarsening makes locking regions as unbalanced.
See second code example in PR's description. Coarsening eliminates Unlock in regions 1 and Lock in region 2. Both regions become unbalanced.
Is there a reason the above check can't be done as part of coarsened_locks_consistent()?
Unbalanced
in coarsened_locks_consistent() is different from Unbalanced
in mark_coarsened_boxes()
Or do you mean to recompile without lock coarsening when regions are unbalanced? But that will disable all coarsening, even valid one (for example, when no outer locking in my examples).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason the above check can't be done as part of coarsened_locks_consistent()?
Unbalanced
in coarsened_locks_consistent() is different fromUnbalanced
inmark_coarsened_boxes()
Or do you mean to recompile without lock coarsening when regions are unbalanced? But that will disable all coarsening, even valid one (for example, when no outer locking in my examples).
If we are simply making sure that all nodes in a group have the same BoxLock, then that check could be done earlier, in coarsened_locks_consistent(), or even as the nodes are added to the list. But I guess it's possible to get a premature answer is nodes are removed from the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to mix these two methods. They do different things:
coarsened_locks_consistent()
checks that graph is correct otherwise bailout compilationmark_coarsened_boxes()
marks BoxLock nodes to prevent further locks optimizations
It could be done but it will complicate code in coarsened_locks_consistent()
and you would still need 2 iterations:
- first, you need to check that all
alock
onlocks_list
are marked ascoarsened
and also set flag if they reference different Boxes - only then you do second round over them to mark all referenced Box as unbalanced
And if we bailout from compilation Boxes marking will be useless work.
If lock coarsening just marked the Lock node as coarsened, but did not remove it, then Nested locks elimination would still find the Lock node and work correctly, right? Can't we just remove coarsened locks at a later time, after locks elimination? |
Running with test from JDK-8322743 found that I missed |
I have old RFE for that JDK-8268571:
The issue here is that we don't know when we done with EA and nested locks optimizations.
My fix simple prevent further locks optimizations if coarsening happened. |
Yes, we can do nested locks elimination before loop unrolling which will help in this case but may not in others. |
Thank you, @eme64 and @iwanowww for reviews. |
My concern is that a group of unbalanced locks is added to coarsened_locks (they have different BoxLockNodes), but later some of those locks are removed by It is easy to check for unbalanced in |
Good suggestion. I replace |
I think it is premature and too conservative. If a locking region (related lock/unlock nodes with other BoxLockNode) is removed because it is on dead path it can affect analysis of the rest of graph. I think it is normal. Yes, group could become balanced if nodes which are left points to the same BoxLockNode. I am not sure why it is the issue - the part of graph which made it unbalanced is gone before we check consistency, BoxLock marking and locks elimination. That is why I am doing BoxLock nodes marking after verification of @iwanowww suggested that we can additionally verify in debug VM before eliminating coarsened Lock/Unlock nodes that information in |
Looking on cases for locks coarsening callnode.cpp#L1667 I don't see how it can become balanced. In all cases we have separate synchronization regions ( I can do one more experiment. Do BoxLock marking in |
@dean-long, I added verification code as I described in previous comment. Please look. |
Verification code looks correct. |
Thank you, Dean |
src/hotspot/share/opto/compile.cpp
Outdated
} | ||
} | ||
assert(box->is_unbalanced() == box->is_marked_unbalanced(),"inconsistency"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you even move this a scope out, i.e. a line down, so this check is even run with alock->is_coarsened() == false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. It will verify that we did not miss is_unbalanced()
check when do EA or nested elimination to avoid them.
I will also move the assert at line 4984 under (box != this_box)
to avoid duplication of check when box
is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving assert down does not work because Nested lock elimination and EA can overwrite Coarsened status for BoxLock before we run mark_unbalanced_boxes()
. Such BoxLock will not be marked as Unbalanced and assert will fail.
I would like to keep current assert in place but add an other assert when we change status of BoxLock. I am working on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sounds good :)
New changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the _kind
states, with the state transitions. Makes it a bit clearer :)
Would it make sense to add the _kind
string to the node dump?
That would possibly allow for IR matching with regex, on a few examples.
And that would make me feel a bit safer about test coverage ;)
// corresponding pair of Lock/Unlock nodes - they are balanced. | ||
void Compile::mark_unbalanced_boxes() { | ||
int count = coarsened_count(); | ||
for (int i = 0; i < count; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity: Do you often move out the limit from the loop?
I guess this could help with performance. But coarsened_count
is const, so I guess it would not make a change?
Another question: could you make mark_unbalanced_boxes
const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity: Do you often move out the limit from the loop?
Not always. Mostly when I have several usages or it is not constant. Here it is indirect load: compile.hpp#L726
could you make mark_unbalanced_boxes const?
Yes, I can.
if (EliminateNestedLocks) { | ||
// We can mark whole locking region as Local only when only | ||
// one object is used for locking. | ||
box->set_local(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. The name of the method suggests that there are no side-effects. Not sure what would be a better alternative though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the box be nested
at this point? If so, the assert inside set_local
could trigger, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the box be nested at this point? If so, the assert inside set_local could trigger, right?
No, it can't be nested.
Check for nesting and its elimination is done in one place from which it does not escape macro.cpp#L2047
(Nested state is set only when is_nested_lock_region()
returns true in the check).
And it will be overwritten immediately by box_node->set_eliminated()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also BoxLock can't be Local when we check for nesting because of (!alock->is_non_esc_obj())
check above.
} | ||
return old_box; | ||
} | ||
return this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For EliminateNestedLocks
we now never have a hash, are only equal with this
, and don't common nodes in Ideal
either. Is that intended?
Are there any IR tests that verify that we common nodes in the cases where we expect it to common?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For EliminateNestedLocks we now never have a hash, are only equal with this, and don't common nodes in Ideal either. Is that intended?
It was since JDK 8. I only corrected style there: added {}
. And replaced _is_eliminated
with accessor call.
It is intended for nested locks elimination - to have only one Object per locking region: JDK-7125896@locknode.cpp
Are there any IR tests that verify that we common nodes in the cases where we expect it to common?
These optimizations were implemented before we had IR tests. We have some regression tests. During my experiments I added _kind
to hash()
and a lot of tests in Tier1 (not just compiler) failed when ran with -XX:-EliminateNestedLocks
:
# Internal Error (/workspace/open/src/hotspot/share/opto/parse1.cpp:2030), pid=71294, tid=28931
# assert(!nocreate) failed: Cannot build a phi for a block already parsed.
So we have this verification that we common these nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eme64 Thank you for your comments. I hope I answered all your questions.
The only thing I may need to do is to add const
to mark_unbalanced_boxes()
.
// corresponding pair of Lock/Unlock nodes - they are balanced. | ||
void Compile::mark_unbalanced_boxes() { | ||
int count = coarsened_count(); | ||
for (int i = 0; i < count; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity: Do you often move out the limit from the loop?
Not always. Mostly when I have several usages or it is not constant. Here it is indirect load: compile.hpp#L726
could you make mark_unbalanced_boxes const?
Yes, I can.
if (EliminateNestedLocks) { | ||
// We can mark whole locking region as Local only when only | ||
// one object is used for locking. | ||
box->set_local(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the box be nested at this point? If so, the assert inside set_local could trigger, right?
No, it can't be nested.
Check for nesting and its elimination is done in one place from which it does not escape macro.cpp#L2047
(Nested state is set only when is_nested_lock_region()
returns true in the check).
And it will be overwritten immediately by box_node->set_eliminated()
.
} | ||
return old_box; | ||
} | ||
return this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For EliminateNestedLocks we now never have a hash, are only equal with this, and don't common nodes in Ideal either. Is that intended?
It was since JDK 8. I only corrected style there: added {}
. And replaced _is_eliminated
with accessor call.
It is intended for nested locks elimination - to have only one Object per locking region: JDK-7125896@locknode.cpp
Are there any IR tests that verify that we common nodes in the cases where we expect it to common?
These optimizations were implemented before we had IR tests. We have some regression tests. During my experiments I added _kind
to hash()
and a lot of tests in Tier1 (not just compiler) failed when ran with -XX:-EliminateNestedLocks
:
# Internal Error (/workspace/open/src/hotspot/share/opto/parse1.cpp:2030), pid=71294, tid=28931
# assert(!nocreate) failed: Cannot build a phi for a block already parsed.
So we have this verification that we common these nodes.
if (EliminateNestedLocks) { | ||
// We can mark whole locking region as Local only when only | ||
// one object is used for locking. | ||
box->set_local(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also BoxLock can't be Local when we check for nesting because of (!alock->is_non_esc_obj())
check above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks for the explanations :)
Thank you, @eme64, for review. I will merge the latest JDK sources and do additional testing with it before integration. |
New tier1-5 testing passed. |
/integrate |
Going to push as commit b938a5c.
Your commit was automatically rebased without conflicts. |
The issue is that when we do nested locks elimination we don't check that synchronized region has both
Lock
andUnlock
nodes.Which can happen if locks coarsening optimization eliminated pair of Unlock/Lock nodes from adjacent locking regions before we check for nested locks.
Consider this code (all locks/unlocks use the same object):
If
lock3
directly followsunlock2
(no branches or safepoints) locks coarsening optimization will remove them:Nested locks elimination code checks only
Lock
node in one region to find if it is nested (inside other lock region which use the same object) and then eliminate it. So we end up with not eliminatedUnlock
node in second nested region.Why we don't hit this issue before? Normally nested locks elimination executed first and only then we do locks coarsening elimination. In the example above we eliminate all nested
Lock
andUnlock
nodes, leaving only outerLock
andUnlock
.The additional factors which leads to the failure is fully unrolled loop around nested sync regions and some allocation to trigger Escape Analysis:
Before executing Escape Analysis we do loops optimization to simplify graph: compile.cpp#L2332
We also allow to fully unroll short loops (LoopOptsMaxUnroll) to remove merges from graph. It helps EA eliminate allocations.
Such unrolling creates several
Lock
andUnlock
nodes per synchronized region. But nested locks elimination look for region with only one uniqueLock
node: callnode.cpp#L2117. So we skip first iteration of nested locks elimination and execute locks coarsening optimization which eliminates all innersLock
andUnlock
nodes leaving only firstLock
and lastUnlock
. On next round of macro nodes expansion we execute nested locks elimination and remove firstLock
node leavingUnlock
.The fix is to use existing list of coarsened Lock/Unlock nodes pairs to flag synchronized regions (identified by BoxLockNode nodes) as
Unbalanced
to skip any further locks elimination in these regions.New regression test was added. It has two
test
methods. One is based on original test from bug report. The other I wrote to show the issue clearly. Both triggers the assert.Tested tier1-5, xcomp, stress
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17697/head:pull/17697
$ git checkout pull/17697
Update a local copy of the PR:
$ git checkout pull/17697
$ git pull https://git.openjdk.org/jdk.git pull/17697/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 17697
View PR using the GUI difftool:
$ git pr show -t 17697
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17697.diff
Webrev
Link to Webrev Comment