8237842: Separate definitions for default cache line and padding sizes #16973

Closed

Conversation

shipilev
Member

@shipilev shipilev commented Dec 5, 2023

JDK-8321137 needs a clean separation between cache line sizes and padding sizes. At least on x86, there is a wrinkle with "assuming" the cache line size is 128 bytes to cater for prefetchers. Cleanly separating cache line size and padding size resolves this. I rewrote uses of DEFAULT_CACHE_LINE_SIZE in padding contexts to use the new macro.

The goal for this patch is to avoid actual value changes as much as possible. One of the changes comes from cleaning up some of the old cases in the x86 definition, thus simplifying it. I think the LP64 split is still useful there.
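
For illustration, a minimal sketch of the shape of the split (the macro names are from this patch, but the values and per-platform layout here are simplified, not the exact definitions):

// What we expect the hardware cache line to be on this platform.
#define DEFAULT_CACHE_LINE_SIZE 64

// How far apart contended data should be placed. Defaults to the cache
// line size on most platforms; x86 keeps the historical *2 to cater for
// adjacent-cache-line prefetchers.
#define DEFAULT_PADDING_SIZE (DEFAULT_CACHE_LINE_SIZE * 2)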

Additional testing:

  • Large build matrix of server/zero builds
  • Linux AArch64 server fastdebug, tier{1,2}
  • Linux x86_64 server fastdebug, tier{1,2}

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8237842: Separate definitions for default cache line and padding sizes (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16973/head:pull/16973
$ git checkout pull/16973

Update a local copy of the PR:
$ git checkout pull/16973
$ git pull https://git.openjdk.org/jdk.git pull/16973/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16973

View PR using the GUI difftool:
$ git pr show -t 16973

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16973.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Dec 5, 2023

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Dec 5, 2023

@shipilev The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Dec 5, 2023
@shipilev shipilev mentioned this pull request Dec 5, 2023
@shipilev shipilev marked this pull request as ready for review December 5, 2023 13:53
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 5, 2023
@mlbridge

mlbridge bot commented Dec 5, 2023

Webrevs

@stefank
Member

stefank commented Dec 6, 2023

The *2 has been a contentious point for a long time. This is what I think happened:

In 'JDK-8049737: Contended Locking reorder and cache line bucket'

DEFAULT_CACHE_LINE_SIZE was increased from 64 to 128. This was done for the ObjectMonitors, but it had an effect on other data structures in other parts of the JVM (especially in the GC). Performance measurements were done to see if the 64-to-128 change made any difference; AFAIK, no performance difference could be seen, but the *2 change was left in place anyway.

And then 'JDK-8235931: add OM_CACHE_LINE_SIZE and use smaller size on SPARCv9 and X64' came along and changed the ObjectMonitor code to use its own define without the *2, while leaving DEFAULT_CACHE_LINE_SIZE and the padding code to still use *2!

We have asked Intel whether we really should be padding with two cache lines, as we say here:

// Hardware prefetchers on current implementations may pull 2 cache lines
// on access, therefore we pessimistically assume twice the cache line size
// for padding.
#define DEFAULT_PADDING_SIZE (DEFAULT_CACHE_LINE_SIZE*2)

and their answer was that their hardware stopped doing that over 10 years ago (this question was asked a few years ago).

For ZGC we tried to poke at this a bit but gave up and added our own:

const size_t ZPlatformCacheLineSize    = 64;

So, with all that said. Do we really need to keep this *2 in the x64 code?

@shipilev
Member Author

shipilev commented Dec 6, 2023

So, with all that said. Do we really need to keep this *2 in the x64 code?

Maybe, maybe not. Anyhow, let's not change multiple things at the same time. I am provisionally good with dropping *2, as long as we have a targeted study on its effects or the absence of them, and/or Intel and AMD folks agree this should go.

What this change does is clearly separate the notions of "this is what we expect the cache line to be on a given platform" (DEFAULT_CACHE_LINE_SIZE) and "this is by how much we should pad" (DEFAULT_PADDING_SIZE). While I agree both might be the same value for known platforms, they convey different intents. This is also why, AFAICS, the C++ spec calls it hardware_{constructive,destructive}_interference_size, not just "cache line size".

This PR effectively renames DEFAULT_CACHE_LINE_SIZE -> DEFAULT_PADDING_SIZE. We leave DEFAULT_CACHE_LINE_SIZE for cases where we want to know the cache line size outside of padding contexts.
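
For comparison, a minimal C++17 sketch of the standard's analogous facility (not HotSpot code; whether a given toolchain actually defines these constants is implementation-specific):

#include <atomic>
#include <new>

// The "destructive" interference size plays the role of a padding size:
// independently-updated hot data should sit at least this far apart to
// avoid false sharing, which may be more than the raw cache line size.
struct alignas(std::hardware_destructive_interference_size) PaddedCounter {
  std::atomic<long> value{0};
};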

@stefank
Member

stefank commented Dec 6, 2023

Sure. I think it is a good patch. However, given that you added this comment:

// Hardware prefetchers on current implementations may pull 2 cache lines
// on access, therefore we pessimistically assume twice the cache line size
// for padding.

Do you have anything that backs up the claim that this is the case for "current implementations"? Maybe @sviswa7 can help answer whether this is still the case for Intel hardware?

@shipilev
Member Author

shipilev commented Dec 6, 2023

Sure. I think it is a good patch. However, given that you added this comment:

// Hardware prefetchers on current implementations may pull 2 cache lines
// on access, therefore we pessimistically assume twice the cache line size
// for padding.

Do you have anything that backs up the claim that this is the case for "current implementations"?

I was merely trying to explain why the *2 is even there. The common explanation refers to "common wisdom" about adjacent cache line prefetchers. Granted, that might have been true a decade ago, and it might not hold true anymore. I can change "current" to "some" or "some old" if you think that is more neutral.

Again, I don't think this PR should be discussing the *2 thing. It should be a separate deep dive and a clean few-liner change later. That reminds me: we should probably change the default for ContendedPaddingWidth, likely by tying it to DEFAULT_PADDING_SIZE once this lands. We should not do that here, though, because it would have footprint implications on platforms with large cache lines, like S390X.
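
If we did tie it, the hypothetical follow-up would be roughly a one-liner like this (flag declaration abbreviated and shown for illustration only; not part of this PR):

// Hypothetical: derive the flag default from the new macro instead of a
// hard-coded byte count.
product(intx, ContendedPaddingWidth, DEFAULT_PADDING_SIZE,
        "How many bytes to pad the fields/classes marked @Contended with")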

@stefank
Member

stefank commented Dec 6, 2023

Sure. I think it is a good patch. However, given that you added this comment:

// Hardware prefetchers on current implementations may pull 2 cache lines
// on access, therefore we pessimistically assume twice the cache line size
// for padding.

Do you have anything that backs up the claim that this is the case for "current implementations"?

I was merely trying to explain why the *2 is even there. The common explanation refers to "common wisdom" about adjacent cache line prefetchers. Granted, that might have been true a decade ago, and it might not hold true anymore. I can change "current" to "some" or "some old" if you think that is more neutral.

Again, I don't think this PR should be discussing the *2 thing. It should be a separate deep dive and a clean few-liner change later. That reminds me: we should probably change the default for ContendedPaddingWidth, likely by tying it to DEFAULT_PADDING_SIZE once this lands. We should not do that here, though, because it would have footprint implications on platforms with large cache lines, like S390X.

Well, I bring this up because you are adding a comment that further cements the perceived need for *2. As I said, I think the patch looks good; I'm just taking the opportunity to talk about this now that someone is yet again juggling this *2 value around. Changing to "some" is probably a good idea.

@openjdk

openjdk bot commented Dec 6, 2023

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8237842: Separate definitions for default cache line and padding sizes

Reviewed-by: stefank, kvn, stuefe, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 6, 2023
@shipilev
Member Author

shipilev commented Dec 6, 2023

Well, I bring this up because you are adding a comment that further cements the perceived need for *2. As I said, I think the patch looks good; I'm just taking the opportunity to talk about this now that someone is yet again juggling this *2 value around. Changing to "some" is probably a good idea.

Yes, true. I rewrote the comment with hopefully much weaker wording that now implies we are not actually that sure the *2 adjustment is needed.

@vnkozlov
Contributor

vnkozlov commented Dec 6, 2023

My few cents about x2 ;^)
Looking at the JDK-8049737 changes, the *2 is used to make sure that the padding is greater than the cache line. In padded.hpp we have several macros, DEFINE_PAD_MINUS_SIZE and PADDED_END_SIZE, which calculate the offset as (default_padding - size).
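
For reference, the gist of that macro is just a filler array sized up to the padding boundary; a simplified sketch (close to, but not verbatim from, padded.hpp):

// Emits a char array that fills the gap between a field of 'size' bytes
// and the next 'alignment' boundary, pushing the following field onto a
// separate (padded) unit.
#define DEFINE_PAD_MINUS_SIZE(id, alignment, size) \
  char _pad_buf##id[(alignment) - (size)]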

@shipilev
Member Author

shipilev commented Dec 6, 2023

My few cents about x2 ;^) Looking at the JDK-8049737 changes, the *2 is used to make sure that the padding is greater than the cache line.

Yes, I think the intent was the same: cater for weird prefetcher interactions.

In padded.hpp we have several macros, DEFINE_PAD_MINUS_SIZE and PADDED_END_SIZE, which calculate the offset as (default_padding - size).

Sure, I think this PR covers these paths too, or am I missing something there?

@vnkozlov
Contributor

vnkozlov commented Dec 6, 2023

If the intention of the original changes (adding padding) was to avoid false sharing, why do we do it only for x64? From JDK-8049737:

INFO: offset(_header)=0 
INFO: offset(_owner)=24 
WARNING: the _header and _owner fields are closer than a cache line which permits false sharing. 
WARNING: ObjectMonitor size is not a multiple of a cache line which permits false sharing. 

@vnkozlov
Contributor

vnkozlov commented Dec 6, 2023

The original changes also used the double size for SPARCv9.

@shipilev
Member Author

shipilev commented Dec 6, 2023

If the intention of the original changes (adding padding) was to avoid false sharing, why do we do it only for x64? From JDK-8049737:

I don't understand the question. We add padding for all architectures to avoid false sharing. Avoiding false sharing nominally requires padding by the cache line size. That is why the default for DEFAULT_PADDING_SIZE is DEFAULT_CACHE_LINE_SIZE. x86 has (had?) a peculiarity with adjacent cache line hardware prefetchers that did require padding by twice the cache line size to avoid false sharing in those unusual circumstances. I guess SPARCv9 had a similar problem?
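
To make that concrete, a hedged sketch of the idiom (the struct and field names are illustrative, not from the patch):

// Fields updated by different threads: padding by DEFAULT_PADDING_SIZE
// (rather than DEFAULT_CACHE_LINE_SIZE) also keeps them outside each
// other's adjacent-line prefetch reach on platforms where that matters.
struct ContendedPair {
  volatile intptr_t _a;
  char _pad[DEFAULT_PADDING_SIZE - sizeof(intptr_t)];
  volatile intptr_t _b;
};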

@shipilev
Member Author

shipilev commented Dec 6, 2023

I submitted the RFE for lifting the *2 padding for x86_64 here:
https://bugs.openjdk.org/browse/JDK-8321481

@vnkozlov
Contributor

vnkozlov commented Dec 6, 2023

@dcubed-ojdk, as author of JDK-8049737 changes, do you remember why we use double cacheline for padding?

@sviswa7

sviswa7 commented Dec 6, 2023

Sure. I think it is a good patch. However, given that you added this comment:

// Hardware prefetchers on current implementations may pull 2 cache lines
// on access, therefore we pessimistically assume twice the cache line size
// for padding.

Do you have anything that backs up the claim that this is the case for "current implementations"? Maybe @sviswa7 can help answer whether this is still the case for Intel hardware?

From my understanding:
Padding to 64 bytes is needed to avoid cache line false sharing.
Padding to 256 bytes is recommended for heavily accessed contended data to avoid false sharing induced by prefetchers.
Padding to 128 bytes may be sufficient for general contended data.

@stefank
Member

stefank commented Dec 7, 2023

Sure. I think it is a good patch. However, given that you added this comment:

// Hardware prefetchers on current implementations may pull 2 cache lines
// on access, therefore we pessimistically assume twice the cache line size
// for padding.

Do you have anything that backs up the claim that this is the case for "current implementations"? Maybe @sviswa7 can help answer whether this is still the case for Intel hardware?

From my understanding: Padding to 64 bytes is needed to avoid cache line false sharing. Padding to 256 bytes is recommended for heavily accessed contended data to avoid false sharing induced by prefetchers. Padding to 128 bytes may be sufficient for general contended data.

Thanks, Sandhya.

I think I found the statement that I was remembering:
https://mail.openjdk.org/pipermail/zgc-dev/2018-March/000184.html

FWIW, adjacent cache line prefetching is usually enabled for clients (desktops, laptops) and disabled for servers. It has been this way for a long time. For servers, the bandwidth penalty of adjacent cache line prefetching was likely the determining factor in this difference.

@shipilev
Member Author

shipilev commented Dec 7, 2023

Folks, I submitted https://bugs.openjdk.org/browse/JDK-8321481 yesterday to figure out the padding situation on x86_64. Please migrate your comments about *2 there, so they are not lost. I transplanted some of the comments from this PR there.

Let's not allow scope creep here. This PR is very specifically about splitting the definitions without actual value changes, and it allows us to cleanly set the padding size if we decide it should not really match the cache line size.

Contributor

@vnkozlov vnkozlov left a comment

Agree.

Member

@tstuefe tstuefe left a comment

I need this too.

Let's ship this.

@openjdk

openjdk bot commented Dec 20, 2023

@shipilev this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8237842-cache-line-padding-defs
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated labels Dec 20, 2023
@dcubed-ojdk
Member

@dcubed-ojdk, as author of JDK-8049737 changes, do you remember why we use double cacheline for padding?

Sorry, I've been pretty much off the air w.r.t. e-mail due to problems with my MBP13.

We used double the cache line size way back then because some hardware fetched two cache lines' worth. I believe that was true of some Intel versions and SPARCv9. I got the two-cache-lines-worth request from Dave Dice.

Way back in the time frame of JDK-8049737, I wrote some microbenchmarks that we tested with no padding, with one cache line's worth of padding, and with two cache lines' worth of padding. We saw performance improvements on linux-x64, solaris-sparc64, and solaris-x64, and I think smaller improvements on windows-x64. We tried to convert my microbenchmarks into JMHs without success.

@openjdk openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Jan 3, 2024
@shipilev
Member Author

shipilev commented Jan 3, 2024

I had to resolve some merge conflicts in g1ConcurrentMark.hpp due to recent changes. I am planning to integrate this after GHA turns back green.

@shipilev
Member Author

shipilev commented Jan 4, 2024

/integrate

@openjdk

openjdk bot commented Jan 4, 2024

Going to push as commit dd517c6.
Since your change was applied there have been 18 commits pushed to the master branch:

  • 83564ea: 8322888: Parallel: Remove unused variables in PSPromotionManager
  • bbe0079: 8322298: Obsolete unused AdaptiveSizePolicyCollectionCostMargin
  • 7306636: 8322945: Problemlist runtime/CompressedOops/CompressedClassPointers.java on AIX
  • 1369c54: 8322782: Clean up usages of unnecessary fully qualified class name "java.util.Arrays"
  • 4db7a1c: 8322818: Thread::getStackTrace can fail with InternalError if virtual thread is timed-parked when pinned
  • 755722c: 8322214: Return value of XMLInputFactory.getProperty() changed from boolean to String in JDK 22 early access builds
  • 1cf9335: 8321404: Limit the number of heap dumps triggered by HeapDumpBeforeFullGC/AfterFullGC
  • 13c1148: 8321599: Data loss in AVX3 Base64 decoding
  • 028ec7e: 8319948: jcmd man page needs to be updated
  • 54b3cee: 8322877: java/io/BufferedInputStream/TransferTo.java failed with IndexOutOfBoundsException
  • ... and 8 more: https://git.openjdk.org/jdk/compare/a8e4229852fac703c6271aa8c5f94f67bea44902...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 4, 2024
@openjdk openjdk bot closed this Jan 4, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 4, 2024
@openjdk

openjdk bot commented Jan 4, 2024

@shipilev Pushed as commit dd517c6.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@shipilev shipilev deleted the JDK-8237842-cache-line-padding-defs branch January 9, 2024 11:44