Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8316746: Top of lock-stack does not match the unlocked object #15903

Closed

Conversation

TheRealMDoerr
Copy link
Contributor

@TheRealMDoerr TheRealMDoerr commented Sep 25, 2023

I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue.
Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms).


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8316746: Top of lock-stack does not match the unlocked object (Bug - P2)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15903/head:pull/15903
$ git checkout pull/15903

Update a local copy of the PR:
$ git checkout pull/15903
$ git pull https://git.openjdk.org/jdk.git pull/15903/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15903

View PR using the GUI difftool:
$ git pr show -t 15903

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15903.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 25, 2023

👋 Welcome back mdoerr! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 25, 2023
@openjdk
Copy link

openjdk bot commented Sep 25, 2023

@TheRealMDoerr The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Sep 25, 2023
@mlbridge
Copy link

mlbridge bot commented Sep 25, 2023

Webrevs

@rkennke
Copy link
Contributor

rkennke commented Sep 25, 2023

No, not really. C2 should only encounter balanced locking. If the TOS doesn't match, then the issue is a discrepancy elsewhere. See my comment in the ticket for a possible explanation.

@TheRealMDoerr
Copy link
Contributor Author

No, not really. C2 should only encounter balanced locking. If the TOS doesn't match, then the issue is a discrepancy elsewhere. See my comment in the ticket for a possible explanation.

Handling ANONYMOUS_OWNER in the C2 monitor unlocking code doesn't solve this issue, but this PR does. Please see my reply in the JBS issue.
Also note that other tests and benchmarks are stable on PPC64. We should have seen more problems if the implementation had such a fundamental bug. This issue only shows up in jdi tests which do special things. Will need to look more what exactly they do.

@TheRealMDoerr
Copy link
Contributor Author

I've added x86_64 and aarch64 support in case somebody else has seen similar problems and would like to try this patch.

@dean-long
Copy link
Member

As Roman said, the C2 locks should be balanced, and the lock stack looks correct: nsk.share.jdi.EventHandler.run locks listeners first, then the EventHandler. Maybe something is going wrong in deoptimization? Pushing this change without understanding the cause could make it harder to find the real problem.

@TheRealMDoerr
Copy link
Contributor Author

I'm not planning to push it without having understood what the problem really is. This PR may be interesting for other people, too. That's why I've created it.

@TheRealMDoerr
Copy link
Contributor Author

Closing after Roman's design explanation: "C2 (and C1) enforces strictly nested balanced locking." We will hopefully find a better fix to make this really happen. (Otherwise, we could still reopen.)

@TheRealMDoerr TheRealMDoerr deleted the 8316746_lock_stack branch September 26, 2023 13:25
@TheRealMDoerr TheRealMDoerr reopened this Sep 28, 2023
@TheRealMDoerr
Copy link
Contributor Author

TheRealMDoerr commented Sep 28, 2023

The lock order cannot be guaranteed for OSR compilation (see JBS discussion). Hence, I'm passing this information and allow reordering only in this case (currently testing on PPC64). Other platforms will follow after more testing and feedback.


// Check if the top of the lock-stack matches the unlocked object.
addi(temp, temp, -oopSize);
if (may_be_unordered) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there is no need to call the slow-path here. Simply don't emit the assertion should be good enough, right?

@dean-long
Copy link
Member

The lock order cannot be guaranteed for OSR compilation (see JBS discussion).

There won't be an nmethod with an OSR entry point if the locks aren't nested correctly, so I still don't understand what is going wrong and why this change is necessary. The C2 assembly code output in the hs_err file looks correct. How could the locks in the interpreter frame get out of order?

Allowing C2 to unlock locks out of order seems like the wrong solution. I think it could have unintended consequences because this is a basic invariant for the compilers. If this problem can somehow to caused by JVMTI, then whatever unsafe operation JVMTI is doing should probably invalidate OSR entry points for the method.

@TheRealMDoerr
Copy link
Contributor Author

Thanks for looking at it! I'm currently using this PR as workaround until a better fix is found (without integrating it). I have come to the same conclusion that JIT compilers reject methods which may possibly contain unbalanced monitors.

What I can observe is that the test passes with this:

diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp
index 36a87cd8b0d..385dc54f947 100644
--- a/src/hotspot/share/opto/parse1.cpp
+++ b/src/hotspot/share/opto/parse1.cpp
@@ -222,7 +222,7 @@ void Parse::load_interpreter_state(Node* osr_buf) {
   assert(jvms()->monitor_depth() == 0, "should be no active locks at beginning of osr");
   int mcnt = osr_block->flow()->monitor_count();
   Node *monitors_addr = basic_plus_adr(osr_buf, osr_buf, (max_locals+mcnt*2-1)*wordSize);
-  for (index = 0; index < mcnt; index++) {
+  for (index = mcnt; (--index) >= 0;) {
     // Make a BoxLockNode for the monitor.
     Node *box = _gvn.transform(new BoxLockNode(next_monitor()));
 

I think the lowest address contains the latest monitor (because the stack grows downwards) which should get pushed latest. Is this currently wrong?

Note that x86_64 doesn't have the lock checks, so we don't know if it's affected, too. The test may work even if it's wrong because we unlock both objects.
Other locking modes don't really care about the order, so the test simply passes even though it's wrong (at least on PPC64).

@dean-long
Copy link
Member

Hmmm, I believe you're right. If only C2 is affected and not C1, then we should fix it in load_interpreter_state(). If C1 is also affected we can fix it in OSR_migration_begin().

@dean-long
Copy link
Member

Please go with your load_interpreter_state() fix. You've convinced me it is correct :-)

@TheRealMDoerr
Copy link
Contributor Author

I've run more tests with the Parse::load_interpreter_state experiment and found one which fails: vmTestbase/nsk/jvmti/GetObjectMonitorUsage/objmonusage006/TestDescription.java
Interesting that so few tests are sensitive to unlock ordering issues.

I still wonder if a scenario like this can happen:

  • Objects A and B are locked by the interpreter in a method which has balanced monitors.
  • A Java debugger does anything which causes the interpreter frame monitor stack slots to get exchanged (I don't know what it could be.) That could be considered legal because the slots have no defined order. It is even allowed to reuse empty slots on demand.
  • A later part of the method gets OSR compiled which is not prevented, because the method has balanced monitors. The JIT compiler expects the interpreter frame monitor slots to be in a certain order which is no longer true.

@dholmes-ora
Copy link
Member

for (index = mcnt; ...

Shouldn't that be for (index = mcnt -1; ... as index < mcnt?

@TobiHartmann
Copy link
Member

Interesting that so few tests are sensitive to unlock ordering issues.

Right, I think we definitely need more tests. At least a targeted regression test for this particular issue.

@TheRealMDoerr
Copy link
Contributor Author

for (index = mcnt; ...

Shouldn't that be for (index = mcnt -1; ... as index < mcnt?

I'm using pre-decrement, so the 1st iteration starts with mcnt-1.

@TheRealMDoerr
Copy link
Contributor Author

I have tried to test on x86 with this patch:

diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
index 2154601f2f2..3666d1490fc 100644
--- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
@@ -863,7 +863,7 @@ void C2_MacroAssembler::fast_unlock(Register objReg, Register boxReg, Register t
   jccb  (Assembler::notZero, CheckSucc);
   // Without cast to int32_t this style of movptr will destroy r10 which is typically obj.
   movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD);
-  jmpb  (DONE_LABEL);
+  jmp  (DONE_LABEL);
 
   // Try to avoid passing control into the slow_path ...
   bind  (CheckSucc);
diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.cpp b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
index 26135c65418..a95149c2be5 100644
--- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
@@ -9836,6 +9836,15 @@ void MacroAssembler::lightweight_unlock(Register obj, Register hdr, Register tmp
   assert(hdr == rax, "header must be in rax for cmpxchg");
   assert_different_registers(obj, hdr, tmp);
 
+  if (UseNewCode) {
+    Label tos_ok;
+    movl(tmp, Address(r15_thread, JavaThread::lock_stack_top_offset()));
+    cmpptr(obj, Address(r15_thread, tmp, Address::times_1, -oopSize));
+    jcc(Assembler::equal, tos_ok);
+    STOP("Top of lock-stack does not match the unlocked object");
+    bind(tos_ok);
+  }
+
   // Mark-word must be lock_mask now, try to swing it back to unlocked_value.
   movptr(tmp, hdr); // The expected old value
   orptr(tmp, markWord::unlocked_value);

The assertion fires in C1 compiled methods and prevents me from getting far enough to run the same test.

@dholmes-ora
Copy link
Member

for (index = mcnt; ...

Shouldn't that be for (index = mcnt -1; ... as index < mcnt?

I'm using pre-decrement, so the 1st iteration starts with mcnt-1.

Ugghh! a side-effect in the condition check.

@dean-long
Copy link
Member

Thinking about this some more, aren't we going to hit this problem every time the monitor is inflated, either because of contention, recursive entry, or running out of lock stack slots?

@dean-long
Copy link
Member

The lock stack has a fixed size, so it cannot contain the complete lock order. Only the C1/C2 BasicObjectLock stack records contain that. In theory we could do sanity checks against the top of that "stack" for C1/C2, but the compiler current cheat. Instead of passing in the oop for unlock, they grab it from the top of the BasicObjectLock stack, so any sanity check would always pass.

@dean-long
Copy link
Member

I guess all callers of lightweight_unlock already check if the object is inflated, so that should be OK as long as it is guaranteed to stay inflated, right?

@dean-long
Copy link
Member

I have tried to test on x86 with this patch:

diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
index 2154601f2f2..3666d1490fc 100644
--- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
@@ -863,7 +863,7 @@ void C2_MacroAssembler::fast_unlock(Register objReg, Register boxReg, Register t
   jccb  (Assembler::notZero, CheckSucc);
   // Without cast to int32_t this style of movptr will destroy r10 which is typically obj.
   movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD);
-  jmpb  (DONE_LABEL);
+  jmp  (DONE_LABEL);
 
   // Try to avoid passing control into the slow_path ...
   bind  (CheckSucc);
diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.cpp b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
index 26135c65418..a95149c2be5 100644
--- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
@@ -9836,6 +9836,15 @@ void MacroAssembler::lightweight_unlock(Register obj, Register hdr, Register tmp
   assert(hdr == rax, "header must be in rax for cmpxchg");
   assert_different_registers(obj, hdr, tmp);
 
+  if (UseNewCode) {
+    Label tos_ok;
+    movl(tmp, Address(r15_thread, JavaThread::lock_stack_top_offset()));
+    cmpptr(obj, Address(r15_thread, tmp, Address::times_1, -oopSize));
+    jcc(Assembler::equal, tos_ok);
+    STOP("Top of lock-stack does not match the unlocked object");
+    bind(tos_ok);
+  }
+
   // Mark-word must be lock_mask now, try to swing it back to unlocked_value.
   movptr(tmp, hdr); // The expected old value
   orptr(tmp, markWord::unlocked_value);

The assertion fires in C1 compiled methods and prevents me from getting far enough to run the same test.

I don't see x86 callers of lightweight_unlock doing a check for inflation, so there is no guarantee it's on the lock stack.

@TheRealMDoerr
Copy link
Contributor Author

Valid point. Thanks! I've got it to work:

diff --git a/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp
index 78361a305ae..4571036477e 100644
--- a/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp
@@ -135,6 +135,8 @@ void C1_MacroAssembler::unlock_object(Register hdr, Register obj, Register disp_
 
   if (LockingMode == LM_LIGHTWEIGHT) {
     movptr(disp_hdr, Address(obj, hdr_offset));
+    testptr(disp_hdr, markWord::monitor_value);
+    jcc(Assembler::notEqual, slow_case);
     andptr(disp_hdr, ~(int32_t)markWord::lock_mask_in_place);
     lightweight_unlock(obj, disp_hdr, hdr, slow_case);
   } else if (LockingMode == LM_LEGACY) {
diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
index 2154601f2f2..3666d1490fc 100644
--- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
@@ -863,7 +863,7 @@ void C2_MacroAssembler::fast_unlock(Register objReg, Register boxReg, Register t
   jccb  (Assembler::notZero, CheckSucc);
   // Without cast to int32_t this style of movptr will destroy r10 which is typically obj.
   movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD);
-  jmpb  (DONE_LABEL);
+  jmp  (DONE_LABEL);
 
   // Try to avoid passing control into the slow_path ...
   bind  (CheckSucc);
diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.cpp b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
index 26135c65418..0b3af526b32 100644
--- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
@@ -9836,6 +9836,15 @@ void MacroAssembler::lightweight_unlock(Register obj, Register hdr, Register tmp
   assert(hdr == rax, "header must be in rax for cmpxchg");
   assert_different_registers(obj, hdr, tmp);
 
+  {
+    Label tos_ok;
+    movl(tmp, Address(r15_thread, JavaThread::lock_stack_top_offset()));
+    cmpptr(obj, Address(r15_thread, tmp, Address::times_1, -oopSize));
+    jcc(Assembler::equal, tos_ok);
+    STOP("Top of lock-stack does not match the unlocked object");
+    bind(tos_ok);
+  }
+
   // Mark-word must be lock_mask now, try to swing it back to unlocked_value.
   movptr(tmp, hdr); // The expected old value
   orptr(tmp, markWord::unlocked_value);

The nsk/jdi/StepEvent tests have passed on my x86_64 machine. I couldn't reproduce the issue on that platform.

@dean-long
Copy link
Member

With your fix for load_interpreter_state, you should be able to set may_be_unordered to false for c1 and c2 calls now, even for osr, right?

@dean-long
Copy link
Member

Other platforms may still need may_be_unordered for interpreter calls.

@TheRealMDoerr
Copy link
Contributor Author

TheRealMDoerr commented Oct 10, 2023

Hi Dean, the following test fails with the reversed order (my load_interpreter_state patch above) on PPC64, but passes with the original order. Surprisingly, x86_64 passes with either version (with asm assertions in unlock_object applied from above). This PR not applied in any of those experiments. Maybe I should contribute this test even though it doesn't trigger the problem in the original VM?

diff --git a/test/hotspot/jtreg/compiler/locks/TestUnlockOSR.java b/test/hotspot/jtreg/compiler/locks/TestUnlockOSR.java
new file mode 100644
index 00000000000..be58f21c977
--- /dev/null
+++ b/test/hotspot/jtreg/compiler/locks/TestUnlockOSR.java
@@ -0,0 +1,49 @@
+/*
+ * Copyright (c) 2023 SAP SE. All rights reserved.
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation.
+ *
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ *
+ */
+
+/*
+ * @test
+ * @summary During OSR, locks get transferred from interpreter frame.
+ *          Check that unlocking 2 such locks works in the OSR compiled nmethod.
+ *          Some platforms verify that the unlocking happens in the corrent order.
+ *
+ * @run main/othervm -Xbatch TestUnlockOSR
+ */
+
+public class TestUnlockOSR {
+    static void test_method(Object a, Object b, int limit) {
+        synchronized(a) {
+            synchronized(b) {
+                for (int i = 0; i < limit; i++) {}
+            }
+        }
+    }
+
+    public static void main(String[] args) {
+        Object a = new TestUnlockOSR(),
+               b = new TestUnlockOSR();
+        for (int i = 0; i < 10000; i++) { test_method(a, b, 0); } // compile
+        test_method(a, b, 100000); // deopt, trigger OSR
+    }
+}

@dean-long
Copy link
Member

Yes, please add the new test. But now I'm confused. Is the load_interpreter_state() change needed or not?

@TheRealMDoerr
Copy link
Contributor Author

TheRealMDoerr commented Oct 12, 2023

Hi Dean, let me try to summarize my experiments:

  • This PR is one solution which makes all tests pass. The other experiments don't use this PR.
  • The reversed order in load_interpreter_state makes the nsk/jdi/StepEvent pass, but my new test failing on PPC64. In addition, nsk/jvmti/GetObjectMonitorUsage/objmonusage006/TestDescription.java fails on all platforms with that change. So, we can't use it.
  • I couldn't reproduce any unlock issue on x86_64 with the checks I had implemented (regardless of the load_interpreter_state change) except objmonusage006 with load_interpreter_state change.

@dean-long
Copy link
Member

When I run TestUnlockOSR on x86_64 it doesn't trigger a C2 unlock because the OSR nmethod hits an uncommon trap first.

@dean-long
Copy link
Member

If I modify the test slightly to avoid the uncommon trap, I do hit the "Top of lock-stack does not match the unlocked object" stop on x86_64 with the new reversed order for load_interpreter_state. So it does look like that fix is not correct, and something else must be going wrong with nsk/jdi/StepEvent.

@TheRealMDoerr
Copy link
Contributor Author

I have an other interesting experiment:
make run-test TEST="vmTestbase/nsk/jdi/StepEvent" JTREG="VM_OPTIONS=-Xbatch" reproduces the issue most of the time on PPC64.
However, make run-test TEST="vmTestbase/nsk/jdi/StepEvent" JTREG="VM_OPTIONS=-Xbatch -XX:AsyncDeflationInterval=0" appears to run stable.
Not sure if that's a coincidence or if the async deflation messes something up.

@dean-long
Copy link
Member

If the locks are inflated then you won't hit the top of stack check in the fast path.
Can you reproduce the StepEvent problem with C1 using -XX:TieredStopAtLevel=3?

@TheRealMDoerr
Copy link
Contributor Author

TheRealMDoerr commented Oct 14, 2023

I guess that in the error scenario, the locks got async deflated before we run into the issue (async deflation not switched off by -XX:AsyncDeflationInterval=0). Could be that they stay inflated when I use that switch and the unlock order is not checked.
No, I can't reproduce the issue with -XX:TieredStopAtLevel=3 (or 1).

@TheRealMDoerr
Copy link
Contributor Author

@dean-long: I have attached a synthetic reproducer to the JBS issue which simply checks the lock order when reaching the OSR entry. I'd appreciate it if you could take a look.

@dean-long
Copy link
Member

The patch fails for me even though the locks are correct, because kptr->obj()->print_string() is not null-terminated. The patch needs to use strncmp or maybe something like kptr->obj()->klass()->name()->as_C_string().

@TheRealMDoerr
Copy link
Contributor Author

The patch fails for me even though the locks are correct, because kptr->obj()->print_string() is not null-terminated. The patch needs to use strncmp or maybe something like kptr->obj()->klass()->name()->as_C_string().

Never mind, I have found a new solution. Thanks for looking into it!

@TheRealMDoerr
Copy link
Contributor Author

Closing in favor of #16345.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

5 participants