8308479: [s390x] Implement alternative fast-locking scheme #14414

Closed
wants to merge 8 commits into from

Member

@offamitkumar offamitkumar commented Jun 12, 2023

This PR implements the new fast-locking scheme for s390x. Additionally, a few parameters have been renamed to be in sync with PPC.

Testing done (for release, fastdebug and slowdebug builds):
All test/jdk/java/util/concurrent tests with these parameters:

  • LockingMode=2
  • LockingMode=2 with -Xint
  • LockingMode=2 with -XX:TieredStopAtLevel=1
  • LockingMode=2 with -XX:-TieredCompilation

Results are consistently similar to AArch64 (macOS) and PPC. All 124 tests pass except MapLoops.java: in the second part of this test case the JVM starts with HeavyMonitors, which conflicts with LockingMode=2.
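The MapLoops.java failure described above can be pictured with a small model of the flag check. This is only a sketch under assumptions: it is not HotSpot code, the class and method names are hypothetical, the flag spelling `-XX:+UseHeavyMonitors` is inferred from the "HeavyMonitors" mentioned above, and it assumes the conflict arises when heavy monitors are requested together with an explicit LockingMode other than 0.

```java
import java.util.List;
import java.util.OptionalInt;

// Illustrative model only; this is NOT the JVM's actual argument handling.
final class LockingFlagCheck {
    static final String MODE_PREFIX = "-XX:LockingMode=";
    // Assumed flag spelling for the "HeavyMonitors" option named in the PR text.
    static final String HEAVY_MONITORS = "-XX:+UseHeavyMonitors";

    /** Returns the last LockingMode value given explicitly, if any. */
    static OptionalInt explicitLockingMode(List<String> jvmArgs) {
        OptionalInt mode = OptionalInt.empty();
        for (String arg : jvmArgs) {
            if (arg.startsWith(MODE_PREFIX)) {
                mode = OptionalInt.of(Integer.parseInt(arg.substring(MODE_PREFIX.length())));
            }
        }
        return mode;
    }

    /** True when heavy monitors are requested together with an explicit non-zero mode. */
    static boolean conflicts(List<String> jvmArgs) {
        OptionalInt mode = explicitLockingMode(jvmArgs);
        return jvmArgs.contains(HEAVY_MONITORS) && mode.isPresent() && mode.getAsInt() != 0;
    }

    public static void main(String[] args) {
        // Test harness passes LockingMode=2; the test itself adds heavy monitors.
        System.out.println(conflicts(List.of("-XX:LockingMode=2")));
        System.out.println(conflicts(List.of("-XX:LockingMode=2", HEAVY_MONITORS)));
    }
}
```

Under these assumptions only the second argument list is reported as conflicting, which matches the situation where a test adds its own monitor flag on top of the options supplied by the test run.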

BenchMark Result for Renaissance-jmh:

| Benchmark | Without fastLock (ms/op) | With fastLock (ms/op) | Improvement |
|---|---|---|---|
| o.r.actors.JmhAkkaUct.runOperation | 1565.080 | 1365.877 | 12.70% |
| o.r.actors.JmhReactors.runOperation | 9316.972 | 10592.982 | -13.70% |
| o.r.jdk.concurrent.JmhFjKmeans.runOperation | 1257.183 | 1235.530 | 1.73% |
| o.r.jdk.concurrent.JmhFutureGenetic.runOperation | 1925.158 | 2073.066 | -7.69% |
| o.r.jdk.streams.JmhParMnemonics.runOperation | 2746.664 | 2836.085 | -3.24% |
| o.r.jdk.streams.JmhScrabble.runOperation | 76.774 | 74.239 | 3.31% |
| o.r.rx.JmhRxScrabble.runOperation | 162.270 | 167.061 | -2.96% |
| o.r.scala.sat.JmhScalaDoku.runOperation | 3333.711 | 3271.078 | 1.88% |
| o.r.scala.stdlib.JmhScalaKmeans.runOperation | 182.746 | 182.153 | 0.33% |
| o.r.scala.stm.JmhPhilosophers.runOperation | 15003.329 | 13396.921 | 10.57% |
| o.r.scala.stm.JmhScalaStmBench7.runOperation | 1669.090 | 1579.900 | 5.34% |
| o.r.twitter.finagle.JmhFinagleChirper.runOperation | 9601.963 | 10034.404 | -4.52% |
| o.r.twitter.finagle.JmhFinagleHttp.runOperation | 4403.725 | 4746.707 | -7.79% |

DaCapo Benchmark Result:

| Benchmark | Without fast lock (msec) | With fast lock (msec) | Improvement |
|---|---|---|---|
| DaCapo 9.12 h2 Run 1 | 117,010 | 108,699 | 7.10% |
| DaCapo 9.12 h2 Run 2 | 111,986 | 107,373 | 4.11% |

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8308479: [s390x] Implement alternative fast-locking scheme (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14414/head:pull/14414
$ git checkout pull/14414

Update a local copy of the PR:
$ git checkout pull/14414
$ git pull https://git.openjdk.org/jdk.git pull/14414/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14414

View PR using the GUI difftool:
$ git pr show -t 14414

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14414.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jun 12, 2023

👋 Welcome back amitkumar! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title from "8308479" to "8308479: [s390x] Implement alternative fast-locking scheme" Jun 12, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 12, 2023
@openjdk

openjdk bot commented Jun 12, 2023

@offamitkumar The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Jun 12, 2023
@mlbridge

mlbridge bot commented Jun 12, 2023

@offamitkumar
Member Author

Hi @RealLucy, @TheRealMDoerr, @reinrich
Please review this PR as per your availability.
Thank you.

Contributor

@RealLucy RealLucy left a comment

There are some comments you may want to consider.

src/hotspot/cpu/s390/c1_MacroAssembler_s390.hpp (outdated, resolved)
src/hotspot/cpu/s390/c1_MacroAssembler_s390.hpp (outdated, resolved)
src/hotspot/cpu/s390/interp_masm_s390.cpp (outdated, resolved)
src/hotspot/cpu/s390/macroAssembler_s390.cpp (outdated, resolved)
src/hotspot/cpu/s390/macroAssembler_s390.cpp (resolved)
src/hotspot/cpu/s390/macroAssembler_s390.cpp (outdated, resolved)
src/hotspot/cpu/s390/macroAssembler_s390.cpp (resolved)
@offamitkumar
Member Author

Hi @RealLucy,

Please check the benchmark results in the description.

@TheRealMDoerr
Contributor

Are the benchmark results stable or do they have a large variance?
Did you also compare against the version without your changes? That should be done for sanity checking. You have modified the code for the legacy mode, too.
I'm not sure what you mean by "With/Without fastLock"? Patch applied? LockingMode selection?

Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Minor suggestions.

src/hotspot/cpu/s390/macroAssembler_s390.cpp (resolved)
src/hotspot/cpu/s390/macroAssembler_s390.cpp (outdated, resolved)
@offamitkumar
Member Author

> Are the benchmark results stable or do they have a large variance?

I ran DaCapo twice, but with Renaissance-jmh it was just a single run.

> Did you also compare against the version without your changes? That should be done for sanity checking. You have modified the code for the legacy mode, too.
> I'm not sure what you mean by "With/Without fastLock"? Patch applied? LockingMode selection?

The patch was applied in both cases. In the first case no LockingMode was selected; in the second case (with fast locking) the LockingMode=2 argument was given.

But I will run these again with and without the code change. Thanks for the insights.
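One way to put a number on the variance question is the relative spread (sample standard deviation divided by the mean) of repeated runs. The sketch below applies it to the two DaCapo h2 runs from the PR description. The class and method names are hypothetical, and two samples are far too few for a reliable estimate, so this is only meant to show the calculation.

```java
// Hypothetical helper, not part of the PR or of any benchmark harness.
final class RunSpread {
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStdDev(double[] xs) {
        if (xs.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double m = mean(xs);
        double squares = 0.0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    /** Standard deviation as a percentage of the mean. */
    static double relativeSpreadPercent(double[] xs) {
        return 100.0 * sampleStdDev(xs) / mean(xs);
    }

    public static void main(String[] args) {
        double[] withoutFastLock = {117010, 111986}; // DaCapo h2 runs 1 and 2, msec
        double[] withFastLock = {108699, 107373};
        System.out.println("without fast lock: " + relativeSpreadPercent(withoutFastLock) + " %");
        System.out.println("with fast lock:    " + relativeSpreadPercent(withFastLock) + " %");
    }
}
```

For these inputs the spread without fast lock comes out at roughly 3% of the mean, which is the same order of magnitude as several of the reported differences, so more runs would be needed before reading much into a single-digit percentage.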

@RealLucy
Contributor

After struggling a lot with my environment, I finally was able to run some performance tests. There is the "db" subtest of JVM98 which caught my attention. It reproducibly shows

  • LockingMode=0: avg time: 2888ms
  • LockingMode=1: avg time: 867ms
  • LockingMode=2: avg time: 1020ms

We need to gain deeper insight into the why before integration.

@TheRealMDoerr
Contributor

TheRealMDoerr commented Jun 26, 2023

> After struggling a lot with my environment, I finally was able to run some performance tests. There is the "db" subtest of JVM98 which caught my attention. It reproducibly shows
>
>   • LockingMode=0: avg time: 2888ms
>   • LockingMode=1: avg time: 867ms
>   • LockingMode=2: avg time: 1020ms
>
> We need to gain deeper insight into the why before integration.

Note that the new fast locking (LockingMode 2) doesn't support fast recursive locking. Could this be the problem? LockingMode 2 is fastest on Power10 for "db" benchmark, but that may be due to fewer memory barriers. Interesting finding!
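For readers unfamiliar with the term, recursive locking means a thread entering a monitor it already owns. A minimal sketch of that pattern follows, with hypothetical names. Whether the "db" benchmark actually spends its time in such nested acquisitions is not established in this thread.

```java
// Hypothetical demo class; names are illustrative and not taken from the PR.
final class RecursiveLockDemo {
    /**
     * Enters the monitor of lock the given number of times on the calling
     * thread, each time while still holding it, and returns the entry count.
     */
    static int nestedAcquire(Object lock, int levels) {
        if (levels == 0) {
            return 0;
        }
        synchronized (lock) {
            // Every deeper call re-enters a monitor this thread already owns.
            return 1 + nestedAcquire(lock, levels - 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(nestedAcquire(new Object(), 3)); // prints 3
    }
}
```

Only the outermost `synchronized` is a first acquisition; the inner ones are the recursive case that, per the comment above, LockingMode 2 does not handle on its fast path.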

@offamitkumar
Member Author

offamitkumar commented Jun 26, 2023

fastlockbench:

| | Time (ns/op) [Average] | Performance Improvement |
|---|---|---|
| With Patch (+LockingMode=2) | 18.614 | - |
| Without Patch | 19.116 | - |
| Improvement | - | 2.63% |

Dacapo:

| | Time (msec) | Performance Improvement |
|---|---|---|
| Without Patch | 118199 | - |
| With Patch (+LockingMode=2) | 120213 | - |
| Improvement | - | -1.70% |

Renaissance-jmh (ms/op):

| Benchmark | Without Patch | With Patch (+LockingMode=2) | Improvement (%) |
|---|---|---|---|
| o.r.actors.JmhAkkaUct.runOperation | 1349.563 | 1456.927 | -8.00 |
| o.r.actors.JmhReactors.runOperation | 10296.818 | 11605.138 | -12.82 |
| o.r.jdk.concurrent.JmhFjKmeans.runOperation | 1145.668 | 1390.726 | -21.35 |
| o.r.jdk.concurrent.JmhFutureGenetic.runOperation | 1952.648 | 1976.534 | -1.22 |
| o.r.jdk.streams.JmhParMnemonics.runOperation | 2819.193 | 2801.261 | 0.64 |
| o.r.jdk.streams.JmhScrabble.runOperation | 78.594 | 74.529 | 5.16 |
| o.r.rx.JmhRxScrabble.runOperation | 171.220 | 180.978 | -5.70 |
| o.r.scala.sat.JmhScalaDoku.runOperation | 5841.122 | 3556.529 | 39.15 |
| o.r.scala.stdlib.JmhScalaKmeans.runOperation | 199.683 | 183.357 | 8.18 |
| o.r.scala.stm.JmhPhilosophers.runOperation | 14482.716 | 15834.972 | -9.32 |
| o.r.scala.stm.JmhScalaStmBench7.runOperation | 1567.814 | 1716.439 | -9.46 |
| o.r.twitter.finagle.JmhFinagleChirper.runOperation | 9477.834 | 9737.119 | -2.73 |
| o.r.twitter.finagle.JmhFinagleHttp.runOperation | 4381.338 | 4681.571 | -6.85 |

These are the new results I'm getting, and they seem consistent as well. The -21.35% looks large, but I'm not sure whether it counts as an observable regression.
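The "Improvement" figures above appear to follow (without - with) / without, so a positive value means the patched run took less time. That formula is an assumption: it reproduces the fastlockbench and Dacapo rows, while several Renaissance rows differ from it slightly, so those may have been derived from other inputs. A small sketch with hypothetical names:

```java
import java.util.Locale;

// Hypothetical helper illustrating the assumed improvement formula.
final class Improvement {
    /** Percentage of time saved by the patched run; negative means slower. */
    static double improvementPercent(double withoutPatch, double withPatch) {
        return (withoutPatch - withPatch) / withoutPatch * 100.0;
    }

    public static void main(String[] args) {
        // fastlockbench averages in ns/op, taken from the table above.
        System.out.println(String.format(Locale.ROOT, "%.2f", improvementPercent(19.116, 18.614)));
        // Dacapo times in msec, taken from the table above.
        System.out.println(String.format(Locale.ROOT, "%.2f", improvementPercent(118199, 120213)));
    }
}
```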

@bridgekeeper

bridgekeeper bot commented Jul 24, 2023

@offamitkumar This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Contributor

@TheRealMDoerr TheRealMDoerr left a comment

I think my concerns have been addressed. The performance topic can be investigated later if there's not enough time right now. I think it's good enough for an experimental feature. What's your opinion, @RealLucy?

@openjdk

openjdk bot commented Aug 14, 2023

@offamitkumar This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8308479: [s390x] Implement alternative fast-locking scheme

Reviewed-by: lucy, mdoerr

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1043 new commits pushed to the master branch:

  • 8ddf9ea: 8315686: G1: Disallow evacuation of marking regions in a Prepare Mixed gc
  • 7ef059a: 8315605: G1: Add number of nmethods in code roots scanning statistics
  • 825e0ed: 8315774: Enable parallelism in vmTestbase/gc/g1/unloading tests
  • dac1727: 8308869: C2: use profile data in subtype checks when profile has more than one class
  • 3c258ac: 8315702: jcmd Thread.dump_to_file slow with millions of virtual threads
  • 3a00ec8: 8312075: FileChooser.win32.newFolder is not updated when changing Locale
  • 806ef08: 8315594: Open source few headless Swing misc tests
  • 4b43c25: 8310929: Optimization for Integer.toString
  • 111ecdb: 8268829: Provide an optimized way to walk the stack with Class object only
  • 716201c: 8314935: Shenandoah: Unable to throw OOME on back-to-back Full GCs
  • ... and 1033 more: https://git.openjdk.org/jdk/compare/bb966827ac445d805bac5005d0fbda0c61111252...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 14, 2023
@coleenp
Contributor

coleenp commented Sep 7, 2023

We are considering making Fast Locking on by default for Oracle supported platforms. Have these performance concerns been addressed?

@RealLucy
Contributor

RealLucy commented Sep 8, 2023

> We are considering making Fast Locking on by default for Oracle supported platforms. Have these performance concerns been addressed?

The performance regression observed in some tests is still not fully understood. I will approve the PR despite that. According to all our testing, it is functionally correct.

Contributor

@RealLucy RealLucy left a comment

Looks good functionally.
Performance regression should be further analyzed in a separate task.

@mlbridge

mlbridge bot commented Sep 8, 2023

Mailing list message from Kennke, Roman on hotspot-dev:

FWIW, I found many of the renaissance benchmarks quite noisy. Might be worth watching them closely, and consider ramping up number of iterations *and* forks. (Also, I found using the jmh-wrapped version much easier to deal with, with that you can achieve the increased iterations and forks with simple -i -wi and -f options)

@offamitkumar
Member Author

Lutz, Martin, thanks for the reviews and help.

/integrate

@openjdk

openjdk bot commented Sep 26, 2023

Going to push as commit 3fe6e0f.
Since your change was applied there have been 1289 commits pushed to the master branch:

  • e2e8e8e: 8312136: Modify runtime/ErrorHandling/TestDwarf.java to split dwarf and decoder testing
  • 0dce4c1: 8313220: Remove Windows specific workaround in LCMS.c for _snprintf
  • e5f05b5: 8312191: ColorConvertOp.filter for the default destination is too slow
  • be9cc73: 8315871: Opensource five more Swing regression tests
  • b65f4f7: 8313403: Remove unused 'mask' field from JFormattedTextField
  • e3201d1: 8310631: test/jdk/sun/nio/cs/TestCharsetMapping.java is spuriously passing
  • 9291b46: 8313804: JDWP support for -Djava.net.preferIPv6Addresses=system
  • afa4833: 8271268: Fix Javadoc links for Stream.mapMulti
  • 9688ec2: 8311823: JFR: Uninitialized EventEmitter::_thread_id field
  • 0f77d25: 8315684: Parallelize sun/security/util/math/TestIntegerModuloP.java
  • ... and 1279 more: https://git.openjdk.org/jdk/compare/bb966827ac445d805bac5005d0fbda0c61111252...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 26, 2023
@openjdk openjdk bot closed this Sep 26, 2023
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Sep 26, 2023
@openjdk

openjdk bot commented Sep 26, 2023

@offamitkumar Pushed as commit 3fe6e0f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@offamitkumar offamitkumar deleted the JDK-8308479 branch September 26, 2023 03:41