8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 #15978

Closed
nick-arm wants to merge 2 commits

Contributor

@nick-arm nick-arm commented Sep 29, 2023

Building a fastdebug image on a machine without LSE (e.g. A72) or explicitly disabling LSE results in:

  #
  # A fatal error has been detected by the Java Runtime Environment:
  #
  # Internal Error (0xe0000000), pid=64585, tid=64619
  # stop: Header is not fast-locked
  #
  # JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-git-a2391a92c)
  # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-a2391a92c, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
  # Problematic frame:
  # J 1373 c2 sun.nio.ch.NativeThreadSet.add()I java.base (155 bytes) @ 0x0000ffff7ccdf110 [0x0000ffff7ccdef80+0x0000000000000190]
  #

When UseLSE is false, MacroAssembler::cmpxchg() uses rscratch1 as a temporary to hold the status result of the store-exclusive instruction. However, rscratch1 may also be one of the registers passed as t1 or t2 to MacroAssembler::lightweight_lock(), where it holds a live value that is then clobbered. The fix ensures that rscratch1 is never passed as one of these temporaries.
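The clobbering described above can be illustrated with a toy model. This is only a sketch, not HotSpot code: ToyMachine, Reg, and lock_with_temp are hypothetical names, and the "registers" are plain array slots. The one assumption carried over from the description is that the non-LSE path of cmpxchg writes a status value into rscratch1.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy register file: indices stand in for AArch64 registers.
// RSCRATCH1 mirrors HotSpot's rscratch1; everything else is hypothetical.
enum Reg { R0, R1, R2, RSCRATCH1, NUM_REGS };

struct ToyMachine {
  std::array<uint64_t, NUM_REGS> regs{};
  uint64_t memory = 0;   // the single word that cmpxchg operates on
  bool use_lse = true;   // stands in for -XX:+UseLSE / -XX:-UseLSE

  // Compare-and-swap on `memory`; returns true on success.
  // Without LSE, the store-exclusive status lands in RSCRATCH1,
  // which is the side effect the description is about.
  bool cmpxchg(uint64_t expected, uint64_t new_val) {
    if (memory != expected) return false;  // no store attempted
    memory = new_val;
    if (!use_lse) {
      regs[RSCRATCH1] = 0;  // status 0: store-exclusive succeeded
    }
    return true;
  }

  // Keeps a live value in temporary t1 across the cmpxchg and
  // returns what t1 holds afterwards.
  uint64_t lock_with_temp(Reg t1, uint64_t live_value) {
    regs[t1] = live_value;
    cmpxchg(0, 1);
    return regs[t1];
  }
};
```

In this model, passing RSCRATCH1 as the temporary preserves the live value only when use_lse is true; with use_lse set to false the value is overwritten by the status, while any other register (for example R2) survives in both modes.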


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 (Bug - P2)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15978/head:pull/15978
$ git checkout pull/15978

Update a local copy of the PR:
$ git checkout pull/15978
$ git pull https://git.openjdk.org/jdk.git pull/15978/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15978

View PR using the GUI difftool:
$ git pr show -t 15978

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15978.diff

@bridgekeeper

bridgekeeper bot commented Sep 29, 2023

👋 Welcome back ngasson! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 29, 2023
@openjdk

openjdk bot commented Sep 29, 2023

@nick-arm The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Sep 29, 2023
@mlbridge

mlbridge bot commented Sep 29, 2023

Webrevs

Contributor

@rkennke rkennke left a comment


Looks good to me. Would it make any sense to only allocate the extra register when running with +UseLSE?

@openjdk

openjdk bot commented Sep 29, 2023

@nick-arm This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880

Reviewed-by: rkennke, aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 23 new commits pushed to the master branch:

  • a564d43: 8315692: Parallelize gc/stress/TestStressRSetCoarsening.java test
  • 878d27d: 8317273: compiler/codecache/OverflowCodeCacheTest.java fails transiently on Graal
  • 2637e8d: 8317314: Remove unimplemented ObjArrayKlass::oop_oop_iterate_elements_bounded
  • 8093563: 8317295: ResponseSubscribers.SubscriberAdapter should call the finisher function asynchronously
  • 516cfb1: 8316907: Fix nonnull-compare warnings
  • 5984792: 8316415: Parallelize sun/security/rsa/SignedObjectChain.java subtests
  • eeb63cd: 8316361: C2: assert(!failure) failed: Missed optimization opportunity in PhaseIterGVN with -XX:VerifyIterativeGVN=10
  • 6948942: 8317327: Remove JT_JAVA dead code in jib-profiles.js
  • 795e5dc: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration
  • 207819a: 8315604: IGV: dump and visualize node bottom and phase types
  • ... and 13 more: https://git.openjdk.org/jdk/compare/c45308afac019d40bbe3e9adf27733f6be520931...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 29, 2023
@nick-arm
Contributor Author

> Would it make any sense to only allocate the extra register when running with +UseLSE?

In C1? I thought about that but the benefit of always allocating it is that it reduces the differences between LSE and non-LSE modes.

@rkennke
Contributor

rkennke commented Sep 29, 2023

> > Would it make any sense to only allocate the extra register when running with +UseLSE?
>
> In C1? I thought about that but the benefit of always allocating it is that it reduces the differences between LSE and non-LSE modes.

Right. Good then!

@theRealAph
Contributor

People unfamiliar with the platform conventions are going to keep getting this wrong. The scratch registers are used in macros: that's what they are for.

Please add:

diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
index e3df32ed602..4dbbedc123e 100644
--- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
@@ -2735,6 +2735,10 @@ void MacroAssembler::cmpxchg(Register addr, Register expected,
     mov(result, expected);
     lse_cas(result, new_val, addr, size, acquire, release, /*not_pair*/ true);
     compare_eq(result, expected, size);
+#ifdef ASSERT
+    // Poison rscratch1
+    mov(rscratch1, 0x1f1f1f1f1f1f1f1f);
+#endif
   } else {
     Label retry_load, done;
     prfm(Address(addr), PSTL1STRM);
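The effect of the suggested poisoning can be sketched with a small model. It is not HotSpot code: ScratchModel and its debug_build flag (standing in for the ASSERT macro) are hypothetical, and only the poison constant is taken from the diff above. The presumed intent is that debug builds clobber rscratch1 on the LSE path as well, so a caller that wrongly keeps a live value there fails on LSE hardware too, rather than only with -XX:-UseLSE.

```cpp
#include <cassert>
#include <cstdint>

// Poison constant as written in the diff above.
constexpr uint64_t kPoison = 0x1f1f1f1f1f1f1f1fULL;

// Hypothetical model of one memory word plus the rscratch1 register.
struct ScratchModel {
  uint64_t rscratch1 = 0;
  uint64_t memory = 0;

  // debug_build stands in for HotSpot's ASSERT macro.
  bool cmpxchg(uint64_t expected, uint64_t new_val,
               bool use_lse, bool debug_build) {
    const bool ok = (memory == expected);
    if (use_lse) {
      if (ok) memory = new_val;
      // LSE path: rscratch1 is untouched in product builds,
      // deliberately poisoned in debug builds.
      if (debug_build) rscratch1 = kPoison;
    } else if (ok) {
      memory = new_val;
      rscratch1 = 0;  // store-exclusive status (0 = success)
    }
    return ok;
  }
};
```

With debug_build set to false the LSE path leaves rscratch1 alone, which is how the original bug stayed hidden on LSE machines; with debug_build set to true any live value in rscratch1 is replaced by the poison pattern.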

@theRealAph
Contributor

diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
index 51faab3d73b..c72a478949c 100644
--- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
@@ -6315,7 +6315,7 @@ void MacroAssembler::double_move(VMRegPair src, VMRegPair dst, Register tmp) {
 //  - t1, t2: temporary registers, will be destroyed
 void MacroAssembler::lightweight_lock(Register obj, Register hdr, Register t1, Register t2, Label& slow) {
   assert(LockingMode == LM_LIGHTWEIGHT, "only used with new lightweight locking");
-  assert_different_registers(obj, hdr, t1, t2);
+  assert_different_registers(obj, hdr, t1, t2, rscratch1, rscratch2);
 
   // Check if we would have space on lock-stack for the object.
   ldrw(t1, Address(rthread, JavaThread::lock_stack_top_offset()));
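For readers unfamiliar with this kind of check, the following is a minimal sketch of what a distinct-registers assertion verifies. It is not HotSpot's assert_different_registers: the Register struct and all_different helper are hypothetical, and the register encodings used in any example are illustrative only.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical register handle; HotSpot's Register type is richer.
struct Register {
  int encoding;
};

// Returns true when no two registers in the list share an encoding.
inline bool all_different(std::initializer_list<Register> regs) {
  for (const Register* i = regs.begin(); i != regs.end(); ++i) {
    for (const Register* j = i + 1; j != regs.end(); ++j) {
      if (i->encoding == j->encoding) return false;
    }
  }
  return true;
}
```

Appending the scratch registers to the list, as the diff proposes, makes the check fail whenever a caller passes a scratch register as obj, hdr, t1, or t2, because that register then appears twice.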

@nick-arm
Contributor Author

nick-arm commented Oct 2, 2023

> -  assert_different_registers(obj, hdr, t1, t2);
> +  assert_different_registers(obj, hdr, t1, t2, rscratch1, rscratch2);

This is a bit trickier as we'd need to add an additional temporary to LIR_OpLock which would affect other platforms (from the call in C1_MacroAssembler::lock_object() which passes rscratch2).

@theRealAph
Contributor

> > -  assert_different_registers(obj, hdr, t1, t2);
> > +  assert_different_registers(obj, hdr, t1, t2, rscratch1, rscratch2);
>
> This is a bit trickier as we'd need to add an additional temporary to LIR_OpLock which would affect other platforms (from the call in C1_MacroAssembler::lock_object() which passes rscratch2).

Argh. OK.

@nick-arm
Contributor Author

nick-arm commented Oct 3, 2023

/integrate

@openjdk

openjdk bot commented Oct 3, 2023

Going to push as commit b6a97c0.
Since your change was applied there have been 32 commits pushed to the master branch:

  • 287b243: 8316893: Compile without -fno-delete-null-pointer-checks
  • 26c21f5: 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free
  • 6e1aacd: 8296631: NSS tests failing on OL9 linux-aarch64 hosts
  • d2e2c4c: 8309667: TLS handshake fails because of ConcurrentModificationException in PKCS12KeyStore.engineGetEntry
  • e25121d: 8316929: Shenandoah: Shenandoah degenerated GC and full GC need to cleanup old OopMapCache entries
  • 5c8366e: 8268622: Performance issues in javac Name class
  • ad81abd: 8317034: Remove redundant type cast in the java.util.stream package
  • d7d1d42: 8316771: Krb5.java has not defined messages for all error codes
  • f985006: 8309356: Read files in includedir in alphanumeric order
  • a564d43: 8315692: Parallelize gc/stress/TestStressRSetCoarsening.java test
  • ... and 22 more: https://git.openjdk.org/jdk/compare/c45308afac019d40bbe3e9adf27733f6be520931...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 3, 2023
@openjdk openjdk bot closed this Oct 3, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 3, 2023
@openjdk

openjdk bot commented Oct 3, 2023

@nick-arm Pushed as commit b6a97c0.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@stefank
Member

stefank commented Oct 3, 2023

Hi, while poking around in the locking code we also found this usage of rscratch1, which seems to be problematic:

  cmpxchg(tmp, zr, rthread, Assembler::xword, /*acquire*/ true,
          /*release*/ true, /*weak*/ false, rscratch1); // Sets flags for result

  if (LockingMode != LM_LIGHTWEIGHT) {
    // Store a non-null value into the box to avoid looking like a re-entrant
    // lock. The fast-path monitor unlock code checks for
    // markWord::monitor_value so use markWord::unused_mark which has the
    // relevant bit set, and also matches ObjectSynchronizer::enter.
    mov(tmp, (address)markWord::unused_mark().value());
    str(tmp, Address(box, BasicLock::displaced_header_offset_in_bytes()));
  }
  br(Assembler::EQ, cont); // CAS success means locking succeeded

  cmp(rscratch1, rthread);
  br(Assembler::NE, cont); // Check for recursive locking

I think we can use the new tmp3Reg instead of rscratch1 here.

Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
4 participants