Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8316180: Thread-local backoff for secondary_super_cache updates #15718

Closed

Conversation

shipilev
Copy link
Member

@shipilev shipilev commented Sep 13, 2023

See more details in the bug and related issues.

This is the attempt to mitigate JDK-8180450, while the more complex fix that would obviate the need for secondary_super_cache is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, and staying simple enough and reliable for backports to currently supported JDK releases.

This implements mitigation on most current architectures:

  • ✅ x86_64: implemented
  • 🔴 x86_32: considered, abandoned; cannot be easily done without blowing up code size
  • ✅ AArch64: implemented
  • 🔴 ARM32: considered, abandoned; needs cleanups and testing; see JDK-8318414
  • ✅ PPC64: implemented, thanks @TheRealMDoerr
  • ✅ S390: implemented, thanks @offamitkumar
  • ✅ RISC-V: implemented, thanks @RealFYang
  • ✅ Zero: does not need implementation

Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot that easily: the code blowout would make some forward branches in external code non-short. I think we we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity.

Setting backoff at 0 effectively disables the mitigation, and gives us safety hatch if something goes wrong.

I believe we can go in with 1000 as the default, given the experimental results mentioned in this PR.

Additional testing:

  • Linux x86_64 fastdebug, tier1 tier2 tier3
  • Linux AArch64 fastdebug, tier1 tier2 tier3

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Integration blocker

 ⚠️ Whitespace errors (failed with updated jcheck configuration in pull request)

Issue

  • JDK-8316180: Thread-local backoff for secondary_super_cache updates (Enhancement - P3) ⚠️ Issue is not open.

Reviewers

Contributors

  • Martin Doerr <mdoerr@openjdk.org>
  • Amit Kumar <amitkumar@openjdk.org>
  • Fei Yang <fyang@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718
$ git checkout pull/15718

Update a local copy of the PR:
$ git checkout pull/15718
$ git pull https://git.openjdk.org/jdk.git pull/15718/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15718

View PR using the GUI difftool:
$ git pr show -t 15718

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15718.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 13, 2023

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Sep 13, 2023

@shipilev The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Sep 13, 2023
@shipilev
Copy link
Member Author

Motivational improvements for new benchmark shows that contended case improves a lot the larger backoff values we set, and uncontended does not regress.

$ make test TEST=micro:SecondarySuperCache TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:SecondarySuperMissBackoff=..."

== AArch64 (Mac M1, 10 cores)

# SecondarySuperMissBackoff=0 (current JDK)
SecondarySuperCache.contended    avgt   15 956,819 ± 26,373  ns/op
SecondarySuperCache.uncontended  avgt   15   2,797 ±  0,111  ns/op

# SecondarySuperMissBackoff=1
SecondarySuperCache.contended    avgt   15 401,065 ± 144,172 ns/op
SecondarySuperCache.uncontended  avgt   15   2,763 ±   0,077 ns/op

# SecondarySuperMissBackoff=10
SecondarySuperCache.contended    avgt   15  45,953 ± 24,371  ns/op
SecondarySuperCache.uncontended  avgt   15   2,787 ±  0,096  ns/op

# SecondarySuperMissBackoff=100 
SecondarySuperCache.contended    avgt   15  17,581 ± 2,910  ns/op
SecondarySuperCache.uncontended  avgt   15   2,771 ± 0,084  ns/op

# SecondarySuperMissBackoff=1000
SecondarySuperCache.contended    avgt   15   7,841 ± 0,413  ns/op
SecondarySuperCache.uncontended  avgt   15   2,739 ± 0,010  ns/op

# SecondarySuperMissBackoff=10000
SecondarySuperCache.contended    avgt   15   6,780 ± 0,082  ns/op
SecondarySuperCache.uncontended  avgt   15   2,781 ± 0,045  ns/op


=== x86_64 (Xeon, 18 cores)

Benchmark                        Mode  Cnt     Score     Error  Units

# SecondarySuperMissBackoff=0 (current JDK)
SecondarySuperCache.contended    avgt   15  2380.915 ± 159.350  ns/op
SecondarySuperCache.uncontended  avgt   15     9.165 ±   0.017  ns/op

# SecondarySuperMissBackoff=1
SecondarySuperCache.contended    avgt   15  1523.914 ±  19.694  ns/op
SecondarySuperCache.uncontended  avgt   15     9.173 ±   0.026  ns/op

# SecondarySuperMissBackoff=10
SecondarySuperCache.contended    avgt   15   736.271 ±   6.554  ns/op
SecondarySuperCache.uncontended  avgt   15     9.169 ±   0.039  ns/op

# SecondarySuperMissBackoff=100 
SecondarySuperCache.contended    avgt   15   254.527 ±   3.167  ns/op
SecondarySuperCache.uncontended  avgt   15     9.175 ±   0.053  ns/op

# SecondarySuperMissBackoff=1000
SecondarySuperCache.contended    avgt   15   91.392 ±   1.006   ns/op
SecondarySuperCache.uncontended  avgt   15    9.172 ±   0.025   ns/op

# SecondarySuperMissBackoff=10000
SecondarySuperCache.contended    avgt   15   67.798 ±   0.575   ns/op
SecondarySuperCache.uncontended  avgt   15    9.173 ±   0.047   ns/op

@shipilev shipilev marked this pull request as ready for review September 15, 2023 16:55
@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 15, 2023
@mlbridge
Copy link

mlbridge bot commented Sep 15, 2023

@TheRealMDoerr
Copy link
Contributor

PPC64 implementation:

diff --git a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
index 8942199610e..0bef1b3760a 100644
--- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
@@ -2021,7 +2021,28 @@ void MacroAssembler::check_klass_subtype_slow_path(Register sub_klass,
   b(fallthru);
 
   bind(hit);
-  std(super_klass, target_offset, sub_klass); // save result to cache
+  // Success. Try to cache the super we found and proceed in triumph.
+  uint32_t super_cache_backoff = checked_cast<uint32_t>(SecondarySuperMissBackoff);
+  if (super_cache_backoff > 0) {
+    Label L_skip;
+
+    lwz(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
+    addic_(temp, temp, -1);
+    stw(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
+    bgt(CCR0, L_skip);
+
+    load_const_optimized(temp, super_cache_backoff);
+    stw(temp, in_bytes(JavaThread::backoff_secondary_super_miss_offset()), R16_thread);
+
+    std(super_klass, target_offset, sub_klass); // save result to cache
+
+    bind(L_skip);
+    if (L_success == nullptr && result_reg == noreg) {
+      crorc(CCR0, Assembler::equal, CCR0, Assembler::equal); // Restore CCR0 EQ
+    }
+  } else {
+    std(super_klass, target_offset, sub_klass); // save result to cache
+  }
   if (result_reg != noreg) { li(result_reg, 0); } // load zero result (indicates a hit)
   if (L_success != nullptr) { b(*L_success); }
   else if (result_reg == noreg) { blr(); } // return with CR0.eq if neither label nor result reg provided

Power10 results (2 cores, SMT8, 3.55 GHz):
-XX:SecondarySuperMissBackoff=0

Benchmark                        Mode  Cnt     Score    Error  Units
SecondarySuperCache.contended    avgt   15  1107.019 ? 16.206  ns/op
SecondarySuperCache.uncontended  avgt   15    17.984 ?  0.164  ns/op

-XX:SecondarySuperMissBackoff=10

Benchmark                        Mode  Cnt    Score   Error  Units
SecondarySuperCache.contended    avgt   15  431.557 ? 3.690  ns/op
SecondarySuperCache.uncontended  avgt   15   17.870 ? 0.088  ns/op

-XX:SecondarySuperMissBackoff=100

Benchmark                        Mode  Cnt   Score   Error  Units
SecondarySuperCache.contended    avgt   15  90.766 ? 0.196  ns/op
SecondarySuperCache.uncontended  avgt   15  17.925 ? 0.239  ns/op

-XX:SecondarySuperMissBackoff=1000

Benchmark                        Mode  Cnt   Score   Error  Units
SecondarySuperCache.contended    avgt   15  39.803 ? 0.369  ns/op
SecondarySuperCache.uncontended  avgt   15  18.070 ? 0.337  ns/op

-XX:SecondarySuperMissBackoff=10000

Benchmark                        Mode  Cnt   Score   Error  Units
SecondarySuperCache.contended    avgt   15  34.499 ? 0.451  ns/op
SecondarySuperCache.uncontended  avgt   15  17.933 ? 0.165  ns/op

@shipilev
Copy link
Member Author

@TheRealMDoerr: Folded in, thank you!

/add contributor @TheRealMDoerr

@openjdk
Copy link

openjdk bot commented Sep 20, 2023

@shipilev Unknown command add - for a list of valid commands use /help.

@shipilev
Copy link
Member Author

/contributor add @TheRealMDoerr

@TheRealMDoerr
Copy link
Contributor

You're using 64 bit instructions, despite uint32_t _backoff_secondary_super_miss;. That should get fixed.

@openjdk
Copy link

openjdk bot commented Sep 20, 2023

@shipilev
Contributor Martin Doerr <mdoerr@openjdk.org> successfully added.

@shipilev
Copy link
Member Author

You're using 64 bit instructions, despite uint32_t _backoff_secondary_super_miss;. That should get fixed.

Oh, on AArch64, right. x86_64 looks fine, uses 32-bit subl/movl properly.

@shipilev
Copy link
Member Author

shipilev commented Oct 19, 2023

Deeper performance evaluation shows that Dacapo:pmd has regressions, that get linearly worse as we get into larger backoffs. They reach 8% at backoff=10000 with focused single-threaded tests. I am currently investigating the cause.

@franz1981
Copy link

franz1981 commented Oct 19, 2023

@shipilev

At the time of netty/netty#12708 I've prepared a microbenchmark which didn't required Netty and was decent enough (warmup phase should be improved) to implement a "fix": moreover, it contains a code pattern that will become even more relevant in the future, thanks to type switch (that I've already verified is affected by https://bugs.openjdk.org/browse/JDK-8180450 as well, at franz1981/java-puzzles@ac38cd0) and is already pretty common in middleware/low-level frameworks ie

  • having an Object signature (eg typical of Hibernate/ORMs where user defined types are "enhanced" by implementing specific & multiple behaviours and which types are uncknown to the framework entry points/signature statically)
  • using chains of type checks to verify presence of specific "traits" to feed a state-machine of some type
  • different concrete objects which implements sets of "traits" (ie interfaces) flowing through the state machine

The microbenchmark was at franz1981/java-puzzles@0d05795 (all classes but not FalseSharingInstanceOfBenchmark) and the entry point was InstanceOfScalabilityBenchmark.

The pattern is exactly the same of the Netty issue which affected Vertx/Quarkus, meaning that it let 2 concrete types to be checked against different type checks, but because of it, in case #14375 kicks-in, it won't work as expected.
At the time the benchmark was written, the bimorphic case was still affected, hence to fix it now (in case it won't work) we need to stop at C1 (again, like the other tests in this pr) or adding a third/forth concrete type during warmup.

Note:
Some of the many DONT_INLINE could be simplified I think: A the time I've prepared the benchmark, I've placed them to be sure inlining didn't played any factor to save some type checks.

If the benchmark looks decent (once cleaned up) to verify the effects of the patch with a more realistic (TBD) code pattern, feel free to pick it/reuse it.
ATM I don't have free cycles to try it myself, but I'm talking with other team members to see how we can contribute.

@barreiro
Copy link

barreiro commented Nov 15, 2023

I work at Red Hat with @franz1981 and was able to do some experimenting with this patch.

I have tested this patch in isolation, by running the techempower benchmark on a quarkus version that is heavily impacted by the type pollution issue (JDK-8316180). See [1] for the exact commit used for the benchmark.

The procedure has been to back-port the patch to the latest JDK 21 build (jdk-21+35) (that is the branch in [2]). 6 values for the backoff (-XX:SecondarySuperMissBackoff property) were used: 0, 1, 10, 100, 1000, 10000.

The test was executed on a server wit h a 64-core Intel(R) Xeon(R) Gold 5218 CPU.

The results for the plaintext test can be summarized in the graphs below:

Screenshot_patch

None of the test of the benchmark showed any regression.

When comparing with JDK 22 (jdk-22+21), this patch can still offer a significant improvement.

Screenshot_compare

The improved performance of JDK22 can be attributed, but not limited, to the JDK-8308869 patch by @rwestrel. It improves C2 type awareness and that is effective for this use case. Other optimizations that have since been merged can impact this comparison between results as well.

[1] - https://github.com/TechEmpower/FrameworkBenchmarks/tree/0db323061e4e258d1ce66a34ea2132f8beef5cc8
[2] - https://github.com/barreiro/jdk/tree/shipilev_patch

@openjdk
Copy link

openjdk bot commented Dec 7, 2023

@shipilev this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8316180-backoff-secondary-super
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 7, 2023
@bridgekeeper
Copy link

bridgekeeper bot commented Jan 4, 2024

@shipilev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review and removed merge-conflict Pull request has merge conflict with target branch labels Jan 8, 2024
@bridgekeeper
Copy link

bridgekeeper bot commented Feb 5, 2024

@shipilev This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Feb 5, 2024
@shipilev
Copy link
Member Author

shipilev commented Feb 6, 2024

/open

@openjdk openjdk bot reopened this Feb 6, 2024
@openjdk
Copy link

openjdk bot commented Feb 6, 2024

@shipilev This pull request is now open

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 13, 2024
@bridgekeeper
Copy link

bridgekeeper bot commented Mar 12, 2024

@shipilev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@shipilev
Copy link
Member Author

Now that JDK-8180450 has a viable path forward, I see no point in continuing with this one. It has observable regressions on some workloads that are hard to overcome in a simple manner. Closing as WNF. Will reopen if JDK-8180450 is not in.

@shipilev shipilev closed this Mar 20, 2024
@theRealAph
Copy link
Contributor

Now that JDK-8180450 has a viable path forward, I see no point in continuing with this one. It has observable regressions on some workloads that are hard to overcome in a simple manner. Closing as WNF. Will reopen if JDK-8180450 is not in.

May I grab the test cases from this PR and add them into mine? Thanks.

@shipilev
Copy link
Member Author

May I grab the test cases from this PR and add them into mine? Thanks.

Sure, feel free. Not sure how useful are they. I pushed some test updates directly into my branch:
master...shipilev:jdk:JDK-8316180-backoff-secondary-super

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org merge-conflict Pull request has merge conflict with target branch
10 participants