-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8316746: Top of lock-stack does not match the unlocked object #15903
Conversation
👋 Welcome back mdoerr! A progress list of the required criteria for merging this PR into |
@TheRealMDoerr The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
No, not really. C2 should only encounter balanced locking. If the TOS doesn't match, then the issue is a discrepancy elsewhere. See my comment in the ticket for a possible explanation. |
Handling ANONYMOUS_OWNER in the C2 monitor unlocking code doesn't solve this issue, but this PR does. Please see my reply in the JBS issue. |
I've added x86_64 and aarch64 support in case somebody else has seen similar problems and would like to try this patch. |
As Roman said, the C2 locks should be balanced, and the lock stack looks correct: nsk.share.jdi.EventHandler.run locks listeners first, then the EventHandler. Maybe something is going wrong in deoptimization? Pushing this change without understanding the cause could make it harder to find the real problem. |
I'm not planning to push it without having understood what the problem really is. This PR may be interesting for other people, too. That's why I've created it. |
Closing after Roman's design explanation: "C2 (and C1) enforces strictly nested balanced locking." We will hopefully find a better fix to make this really happen. (Otherwise, we could still reopen.) |
The lock order cannot be guaranteed for OSR compilation (see JBS discussion). Hence, I'm passing this information and allow reordering only in this case (currently testing on PPC64). Other platforms will follow after more testing and feedback. |
|
||
// Check if the top of the lock-stack matches the unlocked object. | ||
addi(temp, temp, -oopSize); | ||
if (may_be_unordered) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there is no need to call the slow-path here. Simply don't emit the assertion should be good enough, right?
There won't be an nmethod with an OSR entry point if the locks aren't nested correctly, so I still don't understand what is going wrong and why this change is necessary. The C2 assembly code output in the hs_err file looks correct. How could the locks in the interpreter frame get out of order? Allowing C2 to unlock locks out of order seems like the wrong solution. I think it could have unintended consequences because this is a basic invariant for the compilers. If this problem can somehow to caused by JVMTI, then whatever unsafe operation JVMTI is doing should probably invalidate OSR entry points for the method. |
Thanks for looking at it! I'm currently using this PR as workaround until a better fix is found (without integrating it). I have come to the same conclusion that JIT compilers reject methods which may possibly contain unbalanced monitors. What I can observe is that the test passes with this: diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp
index 36a87cd8b0d..385dc54f947 100644
--- a/src/hotspot/share/opto/parse1.cpp
+++ b/src/hotspot/share/opto/parse1.cpp
@@ -222,7 +222,7 @@ void Parse::load_interpreter_state(Node* osr_buf) {
assert(jvms()->monitor_depth() == 0, "should be no active locks at beginning of osr");
int mcnt = osr_block->flow()->monitor_count();
Node *monitors_addr = basic_plus_adr(osr_buf, osr_buf, (max_locals+mcnt*2-1)*wordSize);
- for (index = 0; index < mcnt; index++) {
+ for (index = mcnt; (--index) >= 0;) {
// Make a BoxLockNode for the monitor.
Node *box = _gvn.transform(new BoxLockNode(next_monitor()));
I think the lowest address contains the latest monitor (because the stack grows downwards) which should get pushed latest. Is this currently wrong? Note that x86_64 doesn't have the lock checks, so we don't know if it's affected, too. The test may work even if it's wrong because we unlock both objects. |
Hmmm, I believe you're right. If only C2 is affected and not C1, then we should fix it in load_interpreter_state(). If C1 is also affected we can fix it in OSR_migration_begin(). |
Please go with your load_interpreter_state() fix. You've convinced me it is correct :-) |
I've run more tests with the I still wonder if a scenario like this can happen:
|
Shouldn't that be |
Right, I think we definitely need more tests. At least a targeted regression test for this particular issue. |
I'm using pre-decrement, so the 1st iteration starts with mcnt-1. |
I have tried to test on x86 with this patch: diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
index 2154601f2f2..3666d1490fc 100644
--- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
@@ -863,7 +863,7 @@ void C2_MacroAssembler::fast_unlock(Register objReg, Register boxReg, Register t
jccb (Assembler::notZero, CheckSucc);
// Without cast to int32_t this style of movptr will destroy r10 which is typically obj.
movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD);
- jmpb (DONE_LABEL);
+ jmp (DONE_LABEL);
// Try to avoid passing control into the slow_path ...
bind (CheckSucc);
diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.cpp b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
index 26135c65418..a95149c2be5 100644
--- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
@@ -9836,6 +9836,15 @@ void MacroAssembler::lightweight_unlock(Register obj, Register hdr, Register tmp
assert(hdr == rax, "header must be in rax for cmpxchg");
assert_different_registers(obj, hdr, tmp);
+ if (UseNewCode) {
+ Label tos_ok;
+ movl(tmp, Address(r15_thread, JavaThread::lock_stack_top_offset()));
+ cmpptr(obj, Address(r15_thread, tmp, Address::times_1, -oopSize));
+ jcc(Assembler::equal, tos_ok);
+ STOP("Top of lock-stack does not match the unlocked object");
+ bind(tos_ok);
+ }
+
// Mark-word must be lock_mask now, try to swing it back to unlocked_value.
movptr(tmp, hdr); // The expected old value
orptr(tmp, markWord::unlocked_value); The assertion fires in C1 compiled methods and prevents me from getting far enough to run the same test. |
Ugghh! a side-effect in the condition check. |
Thinking about this some more, aren't we going to hit this problem every time the monitor is inflated, either because of contention, recursive entry, or running out of lock stack slots? |
The lock stack has a fixed size, so it cannot contain the complete lock order. Only the C1/C2 BasicObjectLock stack records contain that. In theory we could do sanity checks against the top of that "stack" for C1/C2, but the compiler current cheat. Instead of passing in the oop for unlock, they grab it from the top of the BasicObjectLock stack, so any sanity check would always pass. |
I guess all callers of lightweight_unlock already check if the object is inflated, so that should be OK as long as it is guaranteed to stay inflated, right? |
I don't see x86 callers of lightweight_unlock doing a check for inflation, so there is no guarantee it's on the lock stack. |
Valid point. Thanks! I've got it to work: diff --git a/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp
index 78361a305ae..4571036477e 100644
--- a/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp
@@ -135,6 +135,8 @@ void C1_MacroAssembler::unlock_object(Register hdr, Register obj, Register disp_
if (LockingMode == LM_LIGHTWEIGHT) {
movptr(disp_hdr, Address(obj, hdr_offset));
+ testptr(disp_hdr, markWord::monitor_value);
+ jcc(Assembler::notEqual, slow_case);
andptr(disp_hdr, ~(int32_t)markWord::lock_mask_in_place);
lightweight_unlock(obj, disp_hdr, hdr, slow_case);
} else if (LockingMode == LM_LEGACY) {
diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
index 2154601f2f2..3666d1490fc 100644
--- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
@@ -863,7 +863,7 @@ void C2_MacroAssembler::fast_unlock(Register objReg, Register boxReg, Register t
jccb (Assembler::notZero, CheckSucc);
// Without cast to int32_t this style of movptr will destroy r10 which is typically obj.
movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD);
- jmpb (DONE_LABEL);
+ jmp (DONE_LABEL);
// Try to avoid passing control into the slow_path ...
bind (CheckSucc);
diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.cpp b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
index 26135c65418..0b3af526b32 100644
--- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
@@ -9836,6 +9836,15 @@ void MacroAssembler::lightweight_unlock(Register obj, Register hdr, Register tmp
assert(hdr == rax, "header must be in rax for cmpxchg");
assert_different_registers(obj, hdr, tmp);
+ {
+ Label tos_ok;
+ movl(tmp, Address(r15_thread, JavaThread::lock_stack_top_offset()));
+ cmpptr(obj, Address(r15_thread, tmp, Address::times_1, -oopSize));
+ jcc(Assembler::equal, tos_ok);
+ STOP("Top of lock-stack does not match the unlocked object");
+ bind(tos_ok);
+ }
+
// Mark-word must be lock_mask now, try to swing it back to unlocked_value.
movptr(tmp, hdr); // The expected old value
orptr(tmp, markWord::unlocked_value); The nsk/jdi/StepEvent tests have passed on my x86_64 machine. I couldn't reproduce the issue on that platform. |
With your fix for load_interpreter_state, you should be able to set may_be_unordered to false for c1 and c2 calls now, even for osr, right? |
Other platforms may still need may_be_unordered for interpreter calls. |
Hi Dean, the following test fails with the reversed order (my load_interpreter_state patch above) on PPC64, but passes with the original order. Surprisingly, x86_64 passes with either version (with asm assertions in unlock_object applied from above). This PR not applied in any of those experiments. Maybe I should contribute this test even though it doesn't trigger the problem in the original VM? diff --git a/test/hotspot/jtreg/compiler/locks/TestUnlockOSR.java b/test/hotspot/jtreg/compiler/locks/TestUnlockOSR.java
new file mode 100644
index 00000000000..be58f21c977
--- /dev/null
+++ b/test/hotspot/jtreg/compiler/locks/TestUnlockOSR.java
@@ -0,0 +1,49 @@
+/*
+ * Copyright (c) 2023 SAP SE. All rights reserved.
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation.
+ *
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ *
+ */
+
+/*
+ * @test
+ * @summary During OSR, locks get transferred from interpreter frame.
+ * Check that unlocking 2 such locks works in the OSR compiled nmethod.
+ * Some platforms verify that the unlocking happens in the corrent order.
+ *
+ * @run main/othervm -Xbatch TestUnlockOSR
+ */
+
+public class TestUnlockOSR {
+ static void test_method(Object a, Object b, int limit) {
+ synchronized(a) {
+ synchronized(b) {
+ for (int i = 0; i < limit; i++) {}
+ }
+ }
+ }
+
+ public static void main(String[] args) {
+ Object a = new TestUnlockOSR(),
+ b = new TestUnlockOSR();
+ for (int i = 0; i < 10000; i++) { test_method(a, b, 0); } // compile
+ test_method(a, b, 100000); // deopt, trigger OSR
+ }
+} |
Yes, please add the new test. But now I'm confused. Is the load_interpreter_state() change needed or not? |
Hi Dean, let me try to summarize my experiments:
|
When I run TestUnlockOSR on x86_64 it doesn't trigger a C2 unlock because the OSR nmethod hits an uncommon trap first. |
If I modify the test slightly to avoid the uncommon trap, I do hit the "Top of lock-stack does not match the unlocked object" stop on x86_64 with the new reversed order for load_interpreter_state. So it does look like that fix is not correct, and something else must be going wrong with nsk/jdi/StepEvent. |
I have an other interesting experiment: |
If the locks are inflated then you won't hit the top of stack check in the fast path. |
I guess that in the error scenario, the locks got async deflated before we run into the issue (async deflation not switched off by -XX:AsyncDeflationInterval=0). Could be that they stay inflated when I use that switch and the unlock order is not checked. |
@dean-long: I have attached a synthetic reproducer to the JBS issue which simply checks the lock order when reaching the OSR entry. I'd appreciate it if you could take a look. |
The patch fails for me even though the locks are correct, because kptr->obj()->print_string() is not null-terminated. The patch needs to use strncmp or maybe something like kptr->obj()->klass()->name()->as_C_string(). |
Never mind, I have found a new solution. Thanks for looking into it! |
Closing in favor of #16345. |
I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue.
Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms).
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15903/head:pull/15903
$ git checkout pull/15903
Update a local copy of the PR:
$ git checkout pull/15903
$ git pull https://git.openjdk.org/jdk.git pull/15903/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 15903
View PR using the GUI difftool:
$ git pr show -t 15903
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15903.diff
Webrev
Link to Webrev Comment