@Hamlin-Li

@Hamlin-Li Hamlin-Li commented Aug 4, 2021

Propose to move the copy before the CAS in do_copy_to_survivor_space, as we found this improves G1 performance: SPECjbb shows a 3.7% improvement in critical-jOPS on aarch64, and no change in max-jOPS.

After this change, copy_to_survivor in G1 is also aligned with Parallel GC's (PS) copy_to_survivor.
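The change reorders two steps of evacuating an object: copying its contents into the survivor-space allocation buffer, and installing the forwarding pointer with a CAS. Below is a minimal, heavily simplified sketch of the copy-before-CAS order, assuming a toy object model. The `Obj`, `Plab` and `copy_to_survivor` names are hypothetical. The forwarding pointer is a plain `std::atomic` field rather than being encoded in the mark word as in HotSpot, and age handling, prefetching and evacuation failure are left out. It illustrates the ordering only and is not the actual G1 code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical toy object: an atomic forwarding pointer plus a small payload.
// HotSpot instead encodes the forwarding pointer in the object's mark word.
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};
  int payload[4];
};

// Hypothetical bump-pointer buffer standing in for a PLAB.
struct Plab {
  std::unique_ptr<Obj[]> slots;
  std::size_t capacity;
  std::size_t top = 0;

  explicit Plab(std::size_t n) : slots(std::make_unique<Obj[]>(n)), capacity(n) {}

  Obj* allocate() { return top < capacity ? &slots[top++] : nullptr; }

  // Only the most recent allocation can be rolled back in this toy buffer.
  void undo_allocation(Obj* o) {
    assert(top > 0 && o == &slots[top - 1]);
    --top;
  }
};

// Copy-before-CAS: copy into not-yet-published memory, then publish with a CAS.
// Returns the object's new location, or nullptr if the buffer is full
// (a real collector would handle evacuation failure instead).
Obj* copy_to_survivor(Obj* old, Plab& plab) {
  // Fast path: some thread already evacuated this object.
  if (Obj* fwd = old->forwardee.load(std::memory_order_acquire)) {
    return fwd;
  }
  Obj* copy = plab.allocate();
  if (copy == nullptr) {
    return nullptr;
  }
  // 1. Copy first; no other thread can see `copy` yet.
  std::memcpy(copy->payload, old->payload, sizeof(old->payload));
  // 2. Then try to install the forwarding pointer.
  Obj* expected = nullptr;
  if (old->forwardee.compare_exchange_strong(expected, copy,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
    return copy;  // We won the race: our copy is the new object.
  }
  // 3. We lost: discard our copy and use the winner's, now in `expected`.
  plab.undo_allocation(copy);
  return expected;
}
```

In the previous CAS-before-copy order, the winning thread installs the forwarding pointer first and copies afterwards, so a losing thread never copies at all. With copy-before-CAS, a losing thread has done a wasted copy and must undo its allocation; this is roughly the same trade-off Parallel GC's copy_to_survivor makes.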


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issues

  • JDK-8271579: G1: Move copy before CAS in do_copy_to_survivor_space
  • JDK-8272070: G1: Simplify age calculation after JDK-8271579

Reviewers

Contributors

  • shoubing ma <mashoubing1@huawei.com>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4983/head:pull/4983
$ git checkout pull/4983

Update a local copy of the PR:
$ git checkout pull/4983
$ git pull https://git.openjdk.java.net/jdk pull/4983/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4983

View PR using the GUI difftool:
$ git pr show -t 4983

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4983.diff

@bridgekeeper

bridgekeeper bot commented Aug 4, 2021

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 4, 2021
@openjdk

openjdk bot commented Aug 4, 2021

@Hamlin-Li The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Aug 4, 2021
@mlbridge

mlbridge bot commented Aug 4, 2021

Webrevs

@DamonFool
Member

Amazing.
So what do you think is the reason for the performance gain?
Thanks.

@TheRealMDoerr
Contributor

Sounds like a good idea. Only doing prefetch immediately before copy looks odd, but this may be a different topic.


@kimbarrett kimbarrett left a comment

Any explanation for why this has a significant impact?

@Hamlin-Li
Author

I'm not quite sure; it could be related to instruction pipeline scheduling at the CPU level.
BTW, I just uploaded the test results for x86: there is no regression on x86.

@tschatzl
Copy link
Contributor

tschatzl commented Aug 5, 2021

We are currently trying to reproduce these improvements on aarch64 and x64.

Also, please revert the change to the age calculation and file it separately. If I remember correctly, the age is (or was) incremented the way it is to avoid some forced reloads of the mark word, which may or may not be important. Either way, it is completely unrelated to this change.

@Hamlin-Li Hamlin-Li force-pushed the G1-move.copy.forward branch from 9160187 to 997fdc1 Compare August 6, 2021 01:54
@Hamlin-Li
Author

Hamlin-Li commented Aug 6, 2021

I see; I just reverted the age-related code, which is now tracked by https://bugs.openjdk.java.net/browse/JDK-8272070

@nick-arm
Contributor

nick-arm commented Aug 9, 2021

What type of machine are you using to get that result? We tested SPECjbb on AWS M6g but couldn't see any significant difference in critical jops.

@Hamlin-Li
Author

Thanks for the testing and feedback.

Settings in run_multi.sh of SPECjbb2015:

GROUP_COUNT=4
TI_JVM_COUNT=1
JAVA_OPTS_BE="-server -XX:+UseG1GC -Xmx50g -Xms50g -XX:ParallelGCThreads=25"

My test environment is as follows:

$ uname -a
Linux jvm-97 4.19.36-vhulk1907.1.0.h702.eulerosv2r8.aarch64 #1 SMP Mon Mar 16 00:02:15 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

$ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 4
Vendor ID: 0x48
Model: 0
Stepping: 0x1
BogoMIPS: 200.00
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 65536K
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
NUMA node2 CPU(s): 64-95
NUMA node3 CPU(s): 96-127

@Hamlin-Li
Author

@tschatzl Hi Thomas, may I ask how the test results on aarch64 and x64 look?

@tschatzl
Contributor

tschatzl commented Aug 11, 2021

Sorry for the wait; my testing actually just finished today. I initially had a few issues (I needed to redo some runs, and re-ran others to get more reliable results), but I could not see any statistically significant difference in either maxjops or criticaljops (or in any other benchmark I tested) on Oracle Cloud instances with dual-socket Ampere Altra machines. I also tried running on just a single socket.

There is no discernible difference on x64 either.

I will discuss with others today how to proceed; personally, I think it may be worth aligning the code with Parallel GC anyway.

@Hamlin-Li
Copy link
Author

Thanks for the feedback. Please let me know the result of your discussion.

@tschatzl
Copy link
Contributor

So we talked about this internally, and standalone it does not seem worth adding, given that the gain you report is not reproducible on multiple (current) systems, and together we were not able to come up with an explanation for why the suggested copy/CAS order would be better.

However it also enables further code improvements by simplifying (mark word) code. So we think that if this change and the mark word change suggested earlier together are at least performance-neutral (which is likely imo), we are going to take in both changes.

I'll start some perf runs with both changes applied (I'll just manually re-apply the mark word change you suggested earlier).

Hth,
Thomas

@tschatzl
Contributor

@Hamlin-Li
Author

Thanks Thomas, then I'll wait for your perf result.

Contributor

@tschatzl tschatzl left a comment

LGTM; perf results with the mark word change (see earlier) look good, i.e. no particular perf changes. Please wait for a second review.

@openjdk

openjdk bot commented Aug 13, 2021

@Hamlin-Li This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8271579: G1: Move copy before CAS in do_copy_to_survivor_space
8272070: G1: Simplify age calculation after JDK-8271579

Co-authored-by: shoubing ma <mashoubing1@huawei.com>
Reviewed-by: tschatzl, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 348 new commits pushed to the master branch:

  • ee8bf10: 8272327: Shenandoah: Avoid enqueuing duplicate string candidates
  • 3fb1927: 8271227: Missing {@code } in com.sun.source.*
  • a5ad772: 8272342: [TEST_BUG] java/awt/print/PrinterJob/PageDialogMarginTest.java catches all exceptions
  • ae45592: 8272374: doclint should report missing "body" comments
  • b2c272d: 8272305: several hotspot runtime/modules don't check exit codes
  • 8268825: 8272297: FileInputStream should override transferTo() for better performance
  • 3677734: 8271471: [IR Framework] Rare occurrence of "" in PrintIdeal/PrintOptoAssembly can let tests fail
  • 0a03481: 8272231: G1: Refactor G1CardSet::get_card_set to return G1CardSetHashTableValue*
  • 83d0e12: 8267833: Improve G1CardSetInlinePtr::add()
  • 69cc588: 8272235: G1: update outdated code root fixup
  • ... and 338 more: https://git.openjdk.java.net/jdk/compare/4927ee426aedbeea0f4119bac0a342c6d3576762...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 13, 2021
@Hamlin-Li
Author

/issue add JDK-8272070
/contributor add shoubing ma mashoubing1@huawei.com

@openjdk

openjdk bot commented Aug 17, 2021

@Hamlin-Li
Adding additional issue to issue list: 8272070: G1: Simplify age calculation after JDK-8271579.

@openjdk

openjdk bot commented Aug 17, 2021

@Hamlin-Li
Contributor shoubing ma <mashoubing1@huawei.com> successfully added.

@Hamlin-Li
Author

Kind reminder: I think I still need another reviewer?

@tschatzl
Contributor

There are two reviews (from me and @albertnetymk), and Kim's question about the reasons does not seem relevant any more, so this is good to go imho.

@Hamlin-Li
Author

Thanks Thomas, I just saw that Albert has become a Reviewer too.
Congratulations @albertnetymk!

@Hamlin-Li
Author

/integrate

@openjdk

openjdk bot commented Aug 20, 2021

Going to push as commit d874e96.
Since your change was applied there have been 387 commits pushed to the master branch:

  • 92bde67: 8271946: Cleanup leftovers in Space and subclasses
  • db9834f: 8258951: java/net/httpclient/HandshakeFailureTest.java failed with "RuntimeException: Not found expected SSLHandshakeException in java.io.IOException"
  • a81e5e9: 8272654: Mark word accesses should not use Access API
  • 4bd37c3: 8272708: [Test]: Cleanup: test/jdk/security/infra/java/security/cert/CertPathValidator/certification/BuypassCA.java no longer needs ocspEnabled
  • ddcd851: 8272602: [macos] not all KEY_PRESSED events sent when control modifier is used
  • d007be0: 8272700: [macos] Build failure with Xcode 13.0 after JDK-8264848
  • f4be211: 8270041: Consolidate oopDesc::cas_forward_to() and oopDesc::forward_to_atomic()
  • b40e8f0: 8271951: Consolidate preserved marks overflow stack in SerialGC
  • 7eccbd4: 8266519: Cleanup resolve() leftovers from BarrierSet et al
  • 9569159: 8272674: Logging missing keytab file in Krb5LoginModule
  • ... and 377 more: https://git.openjdk.java.net/jdk/compare/4927ee426aedbeea0f4119bac0a342c6d3576762...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Aug 20, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 20, 2021
@openjdk

openjdk bot commented Aug 20, 2021

@Hamlin-Li Pushed as commit d874e96.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tschatzl
Contributor

Just for clarification: the default requirements for a change in the gc-team to integrate is to have at least two "r"eviewers (note the lowercase r - those are people that approved it regardless of role), at least one of which must be a "R"eviewer, i.e. has the openjdk reviewer role (and there are no other open questions/comments).

So that formal requirement had already been fulfilled by me and Albert without Albert being a "R"eviewer.

@Hamlin-Li Hamlin-Li deleted the G1-move.copy.forward branch October 21, 2021 01:20

Labels

hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated


7 participants