8276901: Implement UseHeavyMonitors consistently #6320
Conversation
/cc hotspot-runtime |
/cc hotspot-compiler |
@rkennke |
@rkennke |
Webrevs
/label remove hotspot
@rkennke A CSR request is needed here for the change to the UseHeavyMonitors flag, which for historical reasons was a product flag. I wonder if anyone using interpreter-only Zero builds uses this flag? |
@dholmes-ora |
How was this tested, and how will it be tested going forward? There are no tests using UseHeavyMonitors, and a real test would be to run a bunch of other tests with the flag applied. |
@dholmes-ora has indicated that a compatibility and specification (CSR) request is needed for this pull request. |
I filed CSR here: |
I have run tier1-4 with the flag explicitly enabled. However, I agree, we should include the flag in a run config for some relevant tests. Do you have any suggestions? Maybe we already have a stress test for synchronized, and could add a run config there? jcstress comes to mind, but I am not sure whether the included jcstress tests even exercise synchronized (rather than j.u.c. stuff)? |
I know nothing about jcstress but I skimmed through it and it seems synchronized is covered in "CHAPTER 2.c". I don't know how we normally run jcstress, but I don't think there is any file that would allow you set the flag. It would really be up to the person running the tests to add a configuration that includes testing UseHeavyMonitors. We have test configuration files that would do that, but I don't think we can add it to our testing due to resource constraints. So I guess we need at least some minimal tier1 test that does a bit of synchronization with UseHeavyMonitors enabled so that we can avoid breaking it. |
@lmesnik - Can you chime in the testing of this cmd line option? |
The testing VM options are set in task definitions. I'll file a separate issue to adjust configs. |
This is not necessary. I only need one or a few tests that exercise the synchronized statement, and to run them with -XX:+UseHeavyMonitors. If no such test exists, then I'll make one. |
@rkennke thanks, I don't see any good specific tests to run openjdk regression suite. There are a bunch of tests in JCK, jcstress. You could also just run tier1 with UseHeavyMonitors as a sanity check. There are also jvmti tests that might be used to verify that JVMTI is not broken with UseHeavyMonitors. |
Wouldn't a (new) minimal multi-threaded test that explicitly sets UseHeavyMonitors be good and sufficient for this develop feature? |
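A minimal test along those lines could be sketched as follows. This is only an illustration of the idea being discussed, not the test that was actually added by this PR; the class name, thread/iteration counts, and jtreg tags are assumptions:

```java
/*
 * @test
 * @summary Sketch: exercise contended synchronized blocks with heavy monitors.
 * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:+UseHeavyMonitors HeavyMonitorsSanity
 */
public class HeavyMonitorsSanity {
    static final Object LOCK = new Object();
    static int counter = 0; // guarded by LOCK

    public static void main(String[] args) throws InterruptedException {
        final int nThreads = 8;
        final int iterations = 100_000;
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    // With +UseHeavyMonitors this should always take the
                    // inflated-monitor path, never a stack-lock.
                    synchronized (LOCK) {
                        counter++;
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        if (counter != nThreads * iterations) {
            throw new RuntimeException("Lost updates: " + counter);
        }
        System.out.println("counter = " + counter);
    }
}
```

Run under jtreg (or plain `java`) it simply hammers one lock from several threads and checks that no updates were lost.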
It would not break as such on other platforms. It would only be partially implemented, that is, C1 would emit runtime calls and only use monitors, while the interpreter and C2 would still emit stack-locks. That is ok, and that is roughly what -UseFastLocking used to do. |
I have addressed the suggested changes, and also implemented (and verified) the flag on x86_32 and aarch64. I don't have access to the remaining CPU arches. Can I please get another round of reviews? Thanks, Roman |
Sorry but I don't see how having the interpreter+C2 use stack-locks while C1 ignores them can possibly be correct. ??? |
Ok, right. It worked before, because -UseFastLocking (C1) and +UseHeavyMonitors (interpreter) would generate runtime calls (instead of fast stack locking paths), and the runtime implementation would still do stack-locking. For arches where UseHeavyMonitors is not (fully) supported, I am fixing this by letting the runtime do stack-locks. TBH, it would be nice if this change could be properly implemented on remaining arches... (ping @TheRealMDoerr for PPC, not sure who could do arm or s390). |
PPC64 could be implemented like this:

diff --git a/src/hotspot/cpu/ppc/ppc.ad b/src/hotspot/cpu/ppc/ppc.ad
index 958059e1ca2..dc96bd15836 100644
--- a/src/hotspot/cpu/ppc/ppc.ad
+++ b/src/hotspot/cpu/ppc/ppc.ad
@@ -12132,7 +12132,7 @@ instruct partialSubtypeCheck(iRegPdst result, iRegP_N2P subklass, iRegP_N2P supe
instruct cmpFastLock(flagsReg crx, iRegPdst oop, iRegPdst box, iRegPdst tmp1, iRegPdst tmp2) %{
match(Set crx (FastLock oop box));
effect(TEMP tmp1, TEMP tmp2);
- predicate(!Compile::current()->use_rtm());
+ predicate(!Compile::current()->use_rtm() && !UseHeavyMonitors);
format %{ "FASTLOCK $oop, $box, $tmp1, $tmp2" %}
ins_encode %{
@@ -12149,7 +12149,7 @@ instruct cmpFastLock(flagsReg crx, iRegPdst oop, iRegPdst box, iRegPdst tmp1, iR
instruct cmpFastLock_tm(flagsReg crx, iRegPdst oop, rarg2RegP box, iRegPdst tmp1, iRegPdst tmp2, iRegPdst tmp3) %{
match(Set crx (FastLock oop box));
effect(TEMP tmp1, TEMP tmp2, TEMP tmp3, USE_KILL box);
- predicate(Compile::current()->use_rtm());
+ predicate(Compile::current()->use_rtm() && !UseHeavyMonitors);
format %{ "FASTLOCK $oop, $box, $tmp1, $tmp2, $tmp3 (TM)" %}
ins_encode %{
@@ -12165,6 +12165,18 @@ instruct cmpFastLock_tm(flagsReg crx, iRegPdst oop, rarg2RegP box, iRegPdst tmp1
ins_pipe(pipe_class_compare);
%}
+instruct cmpFastLock_hm(flagsReg crx, iRegPdst oop, rarg2RegP box) %{
+ match(Set crx (FastLock oop box));
+ predicate(UseHeavyMonitors);
+
+ format %{ "FASTLOCK $oop, $box (HM)" %}
+ ins_encode %{
+ // Set NE to indicate 'failure' -> take slow-path.
+ __ crandc($crx$$CondRegister, Assembler::equal, $crx$$CondRegister, Assembler::equal);
+ %}
+ ins_pipe(pipe_class_compare);
+%}
+
instruct cmpFastUnlock(flagsReg crx, iRegPdst oop, iRegPdst box, iRegPdst tmp1, iRegPdst tmp2, iRegPdst tmp3) %{
match(Set crx (FastUnlock oop box));
effect(TEMP tmp1, TEMP tmp2, TEMP tmp3);
diff --git a/src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp b/src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp
index a834fa1af36..bac8ef164f8 100644
--- a/src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp
+++ b/src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp
@@ -2014,8 +2014,10 @@ nmethod *SharedRuntime::generate_native_wrapper(MacroAssembler *masm,
// Try fastpath for locking.
// fast_lock kills r_temp_1, r_temp_2, r_temp_3.
- __ compiler_fast_lock_object(r_flag, r_oop, r_box, r_temp_1, r_temp_2, r_temp_3);
- __ beq(r_flag, locked);
+ if (!UseHeavyMonitors) {
+ __ compiler_fast_lock_object(r_flag, r_oop, r_box, r_temp_1, r_temp_2, r_temp_3);
+ __ beq(r_flag, locked);
+ }
// None of the above fast optimizations worked so we have to get into the
// slow case of monitor enter. Inline a special case of call_VM that
diff --git a/src/hotspot/share/runtime/synchronizer.cpp b/src/hotspot/share/runtime/synchronizer.cpp
index 4c5ea4a6e40..4f9c7c21a9b 100644
--- a/src/hotspot/share/runtime/synchronizer.cpp
+++ b/src/hotspot/share/runtime/synchronizer.cpp
@@ -418,7 +418,7 @@ void ObjectSynchronizer::handle_sync_on_value_based_class(Handle obj, JavaThread
}
static bool useHeavyMonitors() {
-#if defined(X86) || defined(AARCH64)
+#if defined(X86) || defined(AARCH64) || defined(PPC64)
return UseHeavyMonitors;
#else
return false;
I don't like hacking the regular assembler implementations. It would be better to change C2 such that it doesn't generate FastLock nodes, but that may be a bit cumbersome. |
Thank you, Martin!
That is a good suggestion, and would help ease the work in the backend. I believe you still have to change something in sharedRuntime_ppc.cpp, similar to what I did in, e.g., sharedRuntime_aarch64.cpp. |
You mean in |
Ah I haven't seen it, sorry. Can you verify the new testcase, and perhaps some test programs that do some locking with -XX:+UseHeavyMonitors -XX:+VerifyHeavyMonitors ? You also need to include PPC in arguments.cpp and synchronizer.cpp changes to enable that stuff on PPC: |
Ok, thanks for checking. You have convinced me that your version is fine. We should do it the same way on PPC64:

diff --git a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
index 98565003691..cb58e775422 100644
--- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
@@ -2660,27 +2660,32 @@ void MacroAssembler::compiler_fast_lock_object(ConditionRegister flag, Register
andi_(temp, displaced_header, markWord::monitor_value);
bne(CCR0, object_has_monitor);
- // Set displaced_header to be (markWord of object | UNLOCK_VALUE).
- ori(displaced_header, displaced_header, markWord::unlocked_value);
-
- // Load Compare Value application register.
-
- // Initialize the box. (Must happen before we update the object mark!)
- std(displaced_header, BasicLock::displaced_header_offset_in_bytes(), box);
-
- // Must fence, otherwise, preceding store(s) may float below cmpxchg.
- // Compare object markWord with mark and if equal exchange scratch1 with object markWord.
- cmpxchgd(/*flag=*/flag,
- /*current_value=*/current_header,
- /*compare_value=*/displaced_header,
- /*exchange_value=*/box,
- /*where=*/oop,
- MacroAssembler::MemBarRel | MacroAssembler::MemBarAcq,
- MacroAssembler::cmpxchgx_hint_acquire_lock(),
- noreg,
- &cas_failed,
- /*check without membar and ldarx first*/true);
- assert(oopDesc::mark_offset_in_bytes() == 0, "offset of _mark is not 0");
+ if (!UseHeavyMonitors) {
+ // Set displaced_header to be (markWord of object | UNLOCK_VALUE).
+ ori(displaced_header, displaced_header, markWord::unlocked_value);
+
+ // Load Compare Value application register.
+
+ // Initialize the box. (Must happen before we update the object mark!)
+ std(displaced_header, BasicLock::displaced_header_offset_in_bytes(), box);
+
+ // Must fence, otherwise, preceding store(s) may float below cmpxchg.
+ // Compare object markWord with mark and if equal exchange scratch1 with object markWord.
+ cmpxchgd(/*flag=*/flag,
+ /*current_value=*/current_header,
+ /*compare_value=*/displaced_header,
+ /*exchange_value=*/box,
+ /*where=*/oop,
+ MacroAssembler::MemBarRel | MacroAssembler::MemBarAcq,
+ MacroAssembler::cmpxchgx_hint_acquire_lock(),
+ noreg,
+ &cas_failed,
+ /*check without membar and ldarx first*/true);
+ assert(oopDesc::mark_offset_in_bytes() == 0, "offset of _mark is not 0");
+ } else {
+ // Set NE to indicate 'failure' -> take slow-path.
+ crandc(flag, Assembler::equal, flag, Assembler::equal);
+ }
// If the compare-and-exchange succeeded, then we found an unlocked
// object and we have now locked it.
@@ -2768,12 +2773,14 @@ void MacroAssembler::compiler_fast_unlock_object(ConditionRegister flag, Registe
}
#endif
- // Find the lock address and load the displaced header from the stack.
- ld(displaced_header, BasicLock::displaced_header_offset_in_bytes(), box);
+ if (!UseHeavyMonitors) {
+ // Find the lock address and load the displaced header from the stack.
+ ld(displaced_header, BasicLock::displaced_header_offset_in_bytes(), box);
- // If the displaced header is 0, we have a recursive unlock.
- cmpdi(flag, displaced_header, 0);
- beq(flag, cont);
+ // If the displaced header is 0, we have a recursive unlock.
+ cmpdi(flag, displaced_header, 0);
+ beq(flag, cont);
+ }
// Handle existing monitor.
// The object has an existing monitor iff (mark & monitor_value) != 0.
@@ -2782,20 +2789,24 @@ void MacroAssembler::compiler_fast_unlock_object(ConditionRegister flag, Registe
andi_(R0, current_header, markWord::monitor_value);
bne(CCR0, object_has_monitor);
- // Check if it is still a light weight lock, this is is true if we see
- // the stack address of the basicLock in the markWord of the object.
- // Cmpxchg sets flag to cmpd(current_header, box).
- cmpxchgd(/*flag=*/flag,
- /*current_value=*/current_header,
- /*compare_value=*/box,
- /*exchange_value=*/displaced_header,
- /*where=*/oop,
- MacroAssembler::MemBarRel,
- MacroAssembler::cmpxchgx_hint_release_lock(),
- noreg,
- &cont);
-
- assert(oopDesc::mark_offset_in_bytes() == 0, "offset of _mark is not 0");
+ if (!UseHeavyMonitors) {
+ // Check if it is still a light weight lock, this is is true if we see
+ // the stack address of the basicLock in the markWord of the object.
+ // Cmpxchg sets flag to cmpd(current_header, box).
+ cmpxchgd(/*flag=*/flag,
+ /*current_value=*/current_header,
+ /*compare_value=*/box,
+ /*exchange_value=*/displaced_header,
+ /*where=*/oop,
+ MacroAssembler::MemBarRel,
+ MacroAssembler::cmpxchgx_hint_release_lock(),
+ noreg,
+ &cont);
+ assert(oopDesc::mark_offset_in_bytes() == 0, "offset of _mark is not 0");
+ } else {
+ // Set NE to indicate 'failure' -> take slow-path.
+ crandc(flag, Assembler::equal, flag, Assembler::equal);
+ }
// Handle existing monitor.
b(cont);
diff --git a/src/hotspot/share/runtime/arguments.cpp b/src/hotspot/share/runtime/arguments.cpp
index 3396adc0799..969c8e82b91 100644
--- a/src/hotspot/share/runtime/arguments.cpp
+++ b/src/hotspot/share/runtime/arguments.cpp
@@ -2021,12 +2021,12 @@ bool Arguments::check_vm_args_consistency() {
}
#endif
-#if !defined(X86) && !defined(AARCH64)
+#if !defined(X86) && !defined(AARCH64) && !defined(PPC64)
if (UseHeavyMonitors) {
warning("UseHeavyMonitors is not fully implemented on this architecture");
}
#endif
-#ifdef X86
+#if defined(X86) || defined(PPC64)
if (UseHeavyMonitors && UseRTMForStackLocks) {
fatal("-XX:+UseHeavyMonitors and -XX:+UseRTMForStackLocks are mutually exclusive");
}
diff --git a/src/hotspot/share/runtime/synchronizer.cpp b/src/hotspot/share/runtime/synchronizer.cpp
index 4c5ea4a6e40..4f9c7c21a9b 100644
--- a/src/hotspot/share/runtime/synchronizer.cpp
+++ b/src/hotspot/share/runtime/synchronizer.cpp
@@ -418,7 +418,7 @@ void ObjectSynchronizer::handle_sync_on_value_based_class(Handle obj, JavaThread
}
static bool useHeavyMonitors() {
-#if defined(X86) || defined(AARCH64)
+#if defined(X86) || defined(AARCH64) || defined(PPC64)
return UseHeavyMonitors;
#else
return false;
diff --git a/test/jdk/java/util/concurrent/ConcurrentHashMap/MapLoops.java b/test/jdk/java/util/concurrent/ConcurrentHashMap/MapLoops.java
index cd32e222f68..922b18836dd 100644
--- a/test/jdk/java/util/concurrent/ConcurrentHashMap/MapLoops.java
+++ b/test/jdk/java/util/concurrent/ConcurrentHashMap/MapLoops.java
@@ -48,7 +48,7 @@
/*
* @test
* @summary Exercise multithreaded maps, using only heavy monitors.
- * @requires os.arch=="x86" | os.arch=="i386" | os.arch=="amd64" | os.arch=="x86_64" | os.arch=="aarch64"
+ * @requires os.arch=="x86" | os.arch=="i386" | os.arch=="amd64" | os.arch=="x86_64" | os.arch=="aarch64" | os.arch == "ppc64" | os.arch == "ppc64le"
* @library /test/lib
* @run main/othervm/timeout=1600 -XX:+IgnoreUnrecognizedVMOptions -XX:+UseHeavyMonitors -XX:+VerifyHeavyMonitors MapLoops
*/
Note that this version no longer requires changes in sharedRuntime_ppc.cpp because the native wrapper generator uses the same code as C2. The test case has passed. |
Thanks, @TheRealMDoerr ! I've added your PPC port to this PR. |
Can I get more reviews for this PR before the JDK18 window closes? I suggest to not wait for arm or s390 ports for now. Thanks! |
Thumbs up. I'm going to kick off some Mach5 testing for this change, or at least try to...
Alright, I will wait for it! Thanks for testing! |
Mach5 Tier1:
Mach5 Tier2:
Mach5 Tier3:
Mach5 Tier4:
Mach5 Tier5:
Mach5 Tier6:
Mach5 Tier7:
Mach5 Tier8:
So Tier[1-4] are complete and look fine. Mach5 is a bit overloaded at the moment.
Update: Tier[5-8] finished and the results still look good. |
Thanks, David! I will then go ahead and integrate this PR. Cheers, |
/integrate |
Going to push as commit 5b81d5e.
Your commit was automatically rebased without conflicts. |
Mailing list message from David Holmes on hotspot-runtime-dev:
(re-sending for the mailing lists as previous mail seems lost)
On 8/12/2021 12:40 am, Roman Kennke wrote:
That was Daniel, not David. Sorry I didn't get back to this sooner, but ... I am not happy with the argument processing in
I remain concerned about the impact of this change on platforms that
David |
The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors rather than stack-locks. However, it is only implemented that way in the interpreter: when it calls into the runtime, the runtime will still happily stack-lock. Even worse, C1 uses a different flag, UseFastLocking, to achieve something similar (with the same caveat that the runtime will stack-lock anyway). C2 has no such mechanism at all.
I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful.
The change removes the C1 flag UseFastLocking and replaces its uses with the equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors a develop flag (I wouldn't want anybody to use this in production, neither without this change nor with it). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
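As a rough illustration of the kind of experiment this enables, one could time an uncontended synchronized loop with and without the flag. The class name and loop bounds below are hypothetical; only the -XX flags come from this PR:

```java
// Sketch of a stack-locking vs. heavy-monitor comparison. Uncontended
// synchronized normally uses a stack-lock; with -XX:+UseHeavyMonitors
// every lock operation should go through an inflated ObjectMonitor.
public class SyncLoop {
    static final Object LOCK = new Object();
    static long sum = 0;

    public static void main(String[] args) {
        long start = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++) {
            synchronized (LOCK) { // single-threaded, so never contended
                sum += i;
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sum = " + sum + ", elapsed = " + elapsedMs + " ms");
    }
}
// Run once with default locking and once with the new flags, e.g.:
//   java SyncLoop
//   java -XX:+UseHeavyMonitors -XX:+VerifyHeavyMonitors SyncLoop
```

VerifyHeavyMonitors would make the second run fail fast if any stack-lock slipped through.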
Testing:
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320
$ git checkout pull/6320
Update a local copy of the PR:
$ git checkout pull/6320
$ git pull https://git.openjdk.java.net/jdk pull/6320/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6320
View PR using the GUI difftool:
$ git pr show -t 6320
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6320.diff