Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8278020: ~13% variation in Renaissance-Scrabble #6838

Closed

Conversation

iklam
Copy link
Member

@iklam iklam commented Dec 14, 2021

We found that when CDS is enabled, there is a ~13% variation in the Renaissance-Scrabble benchmark between different builds of the JDK. In one example, only two core-lib classes, unrelated to the benchmark, changed between two builds, but one build is consistently faster than the other.

When CDS is disabled, we do not see such variations.

In the slow case, there seems to be frequent dcache misses when loading the Klass::_vtable_len field, which is at offset 24 from the beginning of the Klass (see bug report for details).

We suspect that the problem is with the layout of the CDS archive. Specifically, in CDS, Klass objects are inter-mixed with other metadata objects (such as Methods). In contrast, when CDS is disabled, (on 64-bit platforms with compressed klass pointers), Klass objects are allocated in their own space, separated from other metadata objects.

My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds Klass::_vtable_len. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore.

As an experiment, I swapped Klass::_vtable_len with Klass::_modifier_flags (which was at offset 164 before this patch), and the variation stopped. Both fields are 32 bits in size.

I have no concrete proof that my theory is correct, but this change seems to be harmless. @ericcaspole has run all the benchmarks in Oracle's CI and found consistent improvement with Renaissance-Scrabble, and no degradation in other benchmarks.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6838/head:pull/6838
$ git checkout pull/6838

Update a local copy of the PR:
$ git checkout pull/6838
$ git pull https://git.openjdk.java.net/jdk pull/6838/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6838

View PR using the GUI difftool:
$ git pr show -t 6838

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6838.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Dec 14, 2021

👋 Welcome back iklam! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Dec 14, 2021

@iklam The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot label Dec 14, 2021
@iklam iklam marked this pull request as ready for review Dec 14, 2021
@openjdk openjdk bot added the rfr label Dec 14, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Dec 14, 2021

Webrevs

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Dec 15, 2021

I would have kept the two fields together after the switch so that you can add a comment. Seems totally bizarre that two such separated fields would have such an affect.

Isn't this needed in 18 though?

@vnkozlov
Copy link
Contributor

@vnkozlov vnkozlov commented Dec 15, 2021

Yes, this small change is good for JDK 18.
Is it possible for JDK 19 (or later) to make CDS segmented to separate different types of data (Klass, Method, etc)?

@vnkozlov
Copy link
Contributor

@vnkozlov vnkozlov commented Dec 15, 2021

An other experiment could be done is to add padding (space) before Klass data to make sure it is in different cache line.

@iklam
Copy link
Member Author

@iklam iklam commented Dec 15, 2021

I would have kept the two fields together after the switch so that you can add a comment. Seems totally bizarre that two such separated fields would have such an affect.

Isn't this needed in 18 though?

David, if my theory is correct, the contention does not happen between the two fields. It happens between the _vtable_len field, and the object that immediately precedes the Klass. I have not found out what that other object is. Eric is writing a simplified version of the benchmark and I hope to use that to narrow down the problem.

I have added comments near _vtable_len to explain why it's placed there inside the Klass.

I swapped it with _modifier_flags because they are the same size, and _modifier_flags doesn't seem to be accessed nearly as often as _vtable_len.

@iklam
Copy link
Member Author

@iklam iklam commented Dec 15, 2021

Yes, this small change is good for JDK 18. Is it possible for JDK 19 (or later) to make CDS segmented to separate different types of data (Klass, Method, etc)?

It's possible to make CDS segmented so that the Klasses are allocated together. That can be done in 19.

@iklam
Copy link
Member Author

@iklam iklam commented Dec 15, 2021

An other experiment could be done is to add padding (space) before Klass data to make sure it is in different cache line.

That's a good idea. I will try that to see if it has the same effect as the current patch.

@openjdk
Copy link

@openjdk openjdk bot commented Dec 15, 2021

@iklam This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8278020: ~13% variation in Renaissance-Scrabble

Reviewed-by: dholmes, stuefe, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 14 new commits pushed to the master branch:

  • 758fe9b: 8273904: debug agent ArrayTypeImp::newInstance() fails to send reply packet if there is an error
  • c442587: 8277619: AArch64: Incorrect parameter type in Advanced SIMD Copy assembler functions
  • 46f99ac: 8244765: Undo exclusiveAccess.dirs changes for JDK-8220295 and see if there are still any testing issues
  • 54c9a99: 8278643: CoreUtils.getCoreFileLocation() should print out the size of the core file found
  • 068a450: 8278825: Unused variable for diagnostic in Resolve
  • 2def7e9: 8278584: compiler/vectorapi/VectorMaskLoadStoreTest.java failed with "Error: ShouldNotReachHere()"
  • 98a8d44: 8278638: Remove FLAG_IS_CMDLINE(UseSharedSpaces)
  • 03f647f: 8278028: [test-library] Warnings cleanup of the test library
  • a1dfe57: 8276455: C2: iterative EA
  • de65230: 8278767: Shenandoah: Remove unused ShenandoahRootScanner
  • ... and 4 more: https://git.openjdk.java.net/jdk/compare/8401a059bd01b32e3532f806d3d8b60e851c468a...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Dec 15, 2021
@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Dec 15, 2021

I approved but still think this should be targeted at 18 - assuming this was a performance regression in 18.

Padding may have the same performance affect but will also impact footprint potentially - which in turn may impact the caching behaviour.

Copy link
Member

@tstuefe tstuefe left a comment

Hi Ioi,

The fix looks fine.

this is interesting to me, because in the context of Lilliput (openjdk/lilliput#13) I was kind of counting on CDS to intermix Klass and non-class metadata, since that way CDS uses the larger Klass alignment gaps. In fact, I have this wild idea to shape metaspace in that form, merging Klass and non-class metadata into one larger class space. It would be really good to have a better idea of these interactions.

What tool did you use to measure the dcache misses?

Cheers, Thomas

@vnkozlov
Copy link
Contributor

@vnkozlov vnkozlov commented Dec 15, 2021

I approved but still think this should be targeted at 18 - assuming this was a performance regression in 18.

Padding may have the same performance affect but will also impact footprint potentially - which in turn may impact the caching behaviour.

I suggested padding only as experiment to prove Ioi's theory. Current changes are good as the fix.

@iklam
Copy link
Member Author

@iklam iklam commented Dec 15, 2021

I approved but still think this should be targeted at 18 - assuming this was a performance regression in 18.

Padding may have the same performance affect but will also impact footprint potentially - which in turn may impact the caching behaviour.

@ericcaspole checked our benchmark database and the regression seems to have started around JDK 15. So yes, I will backport the fix to 17 and 18. I want to integrate into the mainline first so it can be baked a little before the backport.

@tstuefe
Copy link
Member

@tstuefe tstuefe commented Dec 15, 2021

My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds Klass::_vtable_len. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore.

BTW you could test that theory, if you wanted, by repeating the test with CDS off and disabling compressed class pointers.

@iklam
Copy link
Member Author

@iklam iklam commented Dec 15, 2021

Hi Ioi,

The fix looks fine.

this is interesting to me, because in the context of Lilliput (openjdk/lilliput#13) I was kind of counting on CDS to intermix Klass and non-class metadata, since that way CDS uses the larger Klass alignment gaps. In fact, I have this wild idea to shape metaspace in that form, merging Klass and non-class metadata into one larger class space. It would be really good to have a better idea of these interactions.

What tool did you use to measure the dcache misses?

Cheers, Thomas

Hi Thomas,

@ericcaspole did the measurements so he will have more information, but I believe he used https://github.com/jvm-profiling-tools/async-profiler to generate traces like this (which I pasted into the bug report):

 Column 1: cycles (125424 events)
 Column 2: l1d_pend_miss.pending_cycles (56716 events)
 Column 3: CYCLE_ACTIVITY.CYCLES_L2_MISS (66170 events)

  0.08%   0.02%   0.03%     0x00007f488cda2dc8:  mov  0x10(%r10),%r11d
 12.26%  16.97%  16.23%     0x00007f488cda2dcc:  lea  0x1b8(%r10,%r11,8),%r11

@vnkozlov I found that most Klasses in CDS are preceded by a Method. Does the jitted code write into a Method often?

@tstuefe
Copy link
Member

@tstuefe tstuefe commented Dec 15, 2021

Hi Ioi,
The fix looks fine.
this is interesting to me, because in the context of Lilliput (openjdk/lilliput#13) I was kind of counting on CDS to intermix Klass and non-class metadata, since that way CDS uses the larger Klass alignment gaps. In fact, I have this wild idea to shape metaspace in that form, merging Klass and non-class metadata into one larger class space. It would be really good to have a better idea of these interactions.
What tool did you use to measure the dcache misses?
Cheers, Thomas

Hi Thomas,

@ericcaspole did the measurements so he will have more information, but I believe he used https://github.com/jvm-profiling-tools/async-profiler to generate traces like this (which I pasted into the bug report):

 Column 1: cycles (125424 events)
 Column 2: l1d_pend_miss.pending_cycles (56716 events)
 Column 3: CYCLE_ACTIVITY.CYCLES_L2_MISS (66170 events)

  0.08%   0.02%   0.03%     0x00007f488cda2dc8:  mov  0x10(%r10),%r11d
 12.26%  16.97%  16.23%     0x00007f488cda2dcc:  lea  0x1b8(%r10,%r11,8),%r11

Okay, thanks.

@vnkozlov I found that most Klasses in CDS are preceded by a Method. Does the jitted code write into a Method often?

Method counters? May be worth spreading them out better, or to pad them to prevent false sharing.

@iklam
Copy link
Member Author

@iklam iklam commented Dec 15, 2021

Thanks @dholmes-ora @vnkozlov @tstuefe for the review.
/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Dec 15, 2021

Going to push as commit 4ba980b.
Since your change was applied there have been 22 commits pushed to the master branch:

  • 04dbdd3: 8274898: Cleanup usages of StringBuffer in jdk tools modules
  • 7517c85: 8269838: BasicTypeDataBase.findDynamicTypeForAddress(addr, basetype) can be simplified
  • 1f1db83: 8278186: org.jcp.xml.dsig.internal.dom.Utils.parseIdFromSameDocumentURI throws StringIndexOutOfBoundsException when calling substring method
  • bcb79fd: 8278241: Implement JVM SpinPause on linux-aarch64
  • fcebe65: 8278842: Parallel: Remove unused VerifyObjectStartArrayClosure::_old_gen
  • 4851ad8: 8278548: G1: Remove unnecessary check in forward_to_block_containing_addr
  • 1e3ae3b: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation
  • 7adf7f3: 8278351: Add function to retrieve worker_id from any context
  • 758fe9b: 8273904: debug agent ArrayTypeImp::newInstance() fails to send reply packet if there is an error
  • c442587: 8277619: AArch64: Incorrect parameter type in Advanced SIMD Copy assembler functions
  • ... and 12 more: https://git.openjdk.java.net/jdk/compare/8401a059bd01b32e3532f806d3d8b60e851c468a...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated label Dec 15, 2021
@openjdk openjdk bot closed this Dec 15, 2021
@openjdk openjdk bot removed the ready label Dec 15, 2021
@openjdk openjdk bot removed the rfr label Dec 15, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Dec 15, 2021

@iklam Pushed as commit 4ba980b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@vnkozlov
Copy link
Contributor

@vnkozlov vnkozlov commented Dec 15, 2021

@vnkozlov I found that most Klasses in CDS are preceded by a Method. Does the jitted code write into a Method often?
Method counters? May be worth spreading them out better, or to pad them to prevent false sharing.

I don't think compiled code updates something in Method. We need to look on fields layout.
Compiled code do update frequently MethodCounters (invocations, loops) and MethodData (profiling counters for bytecode). Both are allocated in metaspace as classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot integrated
5 participants