JDK-8310233: Fix THP detection on Linux #14739

@tstuefe tstuefe commented Jun 30, 2023

Today, if we use UseTransparentHugePages, we assume that the static hugepage detection we do is valid for THPs:

  • that THPs use the page size (in hotspot used as "default large page size") found in /proc/meminfo ("Hugepagesize")
  • that THPs are enabled if that page size is >0.

Both assumptions are incorrect:

  • whether THPs are enabled should be checked at /sys/kernel/mm/transparent_hugepage/enabled, which is a tri-state value ("always", "madvise", "never"). THPs are available for the first two states.
  • The page size employed by khugepaged is set in /sys/kernel/mm/transparent_hugepage/hpage_pmd_size. It can differ from the default page size used for static hugepages. For example, we could configure a system such that it uses 1G static hugepages, but the THP page size would still be 2M.
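
To make the two checks concrete, here is a minimal sketch of how the two sysfs files could be interpreted. It is not the code from this patch: `ThpMode`, `parse_thp_mode`, `thp_available`, `parse_thp_pagesize` and `read_first_line` are hypothetical names chosen for illustration, and the only assumption about the kernel interface is the format shown in this description (the active mode is the bracketed word, the page size is a decimal byte count).

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical names for illustration; not the identifiers used in the patch.
enum class ThpMode { always, madvise, never, unknown };

// The kernel lists all modes and brackets the active one,
// e.g. "always [madvise] never".
ThpMode parse_thp_mode(const std::string& text) {
  const std::size_t open = text.find('[');
  if (open == std::string::npos) return ThpMode::unknown;
  const std::size_t close = text.find(']', open);
  if (close == std::string::npos) return ThpMode::unknown;
  assert(close > open);
  const std::string active = text.substr(open + 1, close - open - 1);
  if (active == "always")  return ThpMode::always;
  if (active == "madvise") return ThpMode::madvise;
  if (active == "never")   return ThpMode::never;
  return ThpMode::unknown;
}

// THPs are available in the "always" and "madvise" states.
bool thp_available(ThpMode mode) {
  return mode == ThpMode::always || mode == ThpMode::madvise;
}

// Parses the leading decimal digits of hpage_pmd_size (a byte count).
// Returns 0 if the text does not start with a digit.
std::size_t parse_thp_pagesize(const std::string& text) {
  std::size_t value = 0;
  bool any_digit = false;
  for (char c : text) {
    if (c < '0' || c > '9') break;
    value = value * 10 + static_cast<std::size_t>(c - '0');
    any_digit = true;
  }
  return any_digit ? value : 0;
}

// Reads the first line of a file; returns an empty string if it cannot be read.
std::string read_first_line(const char* path) {
  std::ifstream in(path);
  std::string line;
  if (in) std::getline(in, line);
  return line;
}
```

A caller would feed `read_first_line("/sys/kernel/mm/transparent_hugepage/enabled")` into `parse_thp_mode` and `read_first_line("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size")` into `parse_thp_pagesize`. Keeping the parsing separate from the file access makes the logic testable without a particular kernel configuration.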

About the patch:

This is a limited, minimally invasive patch to fix THP detection. The patch aims to be easy to downport. There is more work to do, which I will do in subsequent RFEs.

Functionally, for static (non-THP) pages nothing changes. THP-mode now correctly detects THP support in the OS, and uses the correct page size (see examples below).


Example 1: System has THPs disabled, but static hugepages (1g, 2m) configured:

thomas@starfish $ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
thomas@starfish $ cat /proc/meminfo | grep Hugepage
Hugepagesize:    1048576 kB

Without patch, we incorrectly assume THPs are enabled, and that THP page size is 1G (!), which we then proceed to use as the heap page size, causing the heap size to be rounded up from 512m -> 1G. But - even though it is printed as "1G page backed" in log output - in reality it will still be 4K-page-backed: the madvise(2) we use to set the THP page size will be ignored by the system, since THPs are disabled.
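
The jump from 512m to 1G is what aligning a size up to the page size produces. The arithmetic can be sketched as follows, assuming power-of-two page sizes; `align_up_to` and the constants `K`, `M`, `G` are hypothetical names for this illustration, not HotSpot's own helpers.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t K = 1024;
constexpr std::uint64_t M = K * K;
constexpr std::uint64_t G = M * K;

// Rounds size up to the next multiple of alignment.
// alignment must be a non-zero power of two.
inline std::uint64_t align_up_to(std::uint64_t size, std::uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}
```

With a 1G page size, `align_up_to(512 * M, 1 * G)` yields 1G, matching the "size=1G" heap line in the unpatched output. With a 2M or 4K page size, 512M is already aligned and stays 512M.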

thomas@starfish $ ./images/jdk/bin/java -Xmx512m -XX:+UseLargePages -XX:+UseTransparentHugePages -Xlog:pagesize -version
[0.001s][info][pagesize] Using the default large page size: 1G
[0.001s][info][pagesize] Usable page sizes: 4k, 2M, 1G
...
[0.016s][info][pagesize] Heap:  min=1G max=1G base=0x00000000c0000000 size=1G page_size=1G

With patch, we correctly refuse to use large pages (and we log more info):

thomas@starfish $ ./images/jdk/bin/java -Xmx512m -XX:+UseLargePages -XX:+UseTransparentHugePages -Xlog:pagesize -version
[0.001s][info][pagesize] Static hugepage support: 2M, 1G (default)
[0.001s][info][pagesize]   default pagesize: 1G
[0.001s][info][pagesize] Transparent hugepage (THP) support:
[0.001s][info][pagesize]   mode: never
[0.001s][warning][pagesize] UseLargePages disabled, no large pages configured and available on the system.

Example 2: System has THPs enabled, but THP page size is just 2M, whereas the system uses a static default hugepage size of 1G:

thomas@starfish $ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
thomas@starfish $ cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size 
2097152
thomas@starfish $ cat /proc/meminfo | grep Hugepage
Hugepagesize:    1048576 kB

Without patch, THP page size is not correctly recognized as 2M. Instead, we again use 1G as page size for the heap:

thomas@starfish $ ./images/jdk/bin/java -Xmx512m -XX:+UseLargePages -XX:+UseTransparentHugePages -Xlog:pagesize -version
[0.001s][info][pagesize] Using the default large page size: 1G
[0.001s][info][pagesize] Usable page sizes: 4k, 2M, 1G
...
[0.010s][info][pagesize] Heap:  min=1G max=1G base=0x00000000c0000000 size=1G page_size=1G

With patch, we correctly identify the THP page size as 2M, and use that for the heap:

thomas@starfish $ ./images/jdk/bin/java -Xmx512m -XX:+UseLargePages -XX:+UseTransparentHugePages -Xlog:pagesize -version
[0.001s][info][pagesize] Static hugepage support: 2M, 1G (default)
[0.001s][info][pagesize]   default pagesize: 1G
[0.001s][info][pagesize] Transparent hugepage (THP) support:
[0.001s][info][pagesize]   mode: madvise
[0.001s][info][pagesize]   pagesize: 2M
[0.001s][info][pagesize] Large page support enabled. Usable page sizes: 4k, 2M
[0.001s][info][pagesize]  Default: 2M
...
[0.010s][info][pagesize] Heap:  min=8M max=512M base=0x00000000e0000000 size=512M page_size=2M
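
The difference between the two behaviours in Examples 1 and 2 can be summarised in a small decision function. This is a sketch under stated assumptions, not the patch itself: `HugePageInfo` and both function names are hypothetical, and treating an unknown THP page size as "no large pages" is an assumption of this sketch rather than something the description above specifies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of what was detected on the system.
struct HugePageInfo {
  std::size_t static_default_pagesize;  // "Hugepagesize" from /proc/meminfo, in bytes; 0 if none
  bool thp_available;                   // THP mode is "always" or "madvise"
  std::size_t thp_pagesize;             // hpage_pmd_size in bytes; 0 if unknown
};

// Page size to use in THP mode with the fix: take the THP page size,
// and only if THPs are actually available. 0 means "do not use large pages".
std::size_t thp_large_page_size(const HugePageInfo& info) {
  // Page sizes are expected to be powers of two.
  assert(info.thp_pagesize == 0 || (info.thp_pagesize & (info.thp_pagesize - 1)) == 0);
  if (!info.thp_available || info.thp_pagesize == 0) {
    return 0;
  }
  return info.thp_pagesize;
}

// Behaviour described as incorrect above, shown for contrast:
// the static default hugepage size is reused for THPs.
std::size_t thp_large_page_size_before_fix(const HugePageInfo& info) {
  return info.static_default_pagesize;
}
```

For Example 1 (1G static default, THP mode "never") the first function returns 0 while the second returns 1G. For Example 2 (1G static default, THP mode "madvise", 2M THP page size) they return 2M and 1G respectively.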

Tests: GHAs all green. Local experiments on x64 Linux on machines with 1G pages succeeded.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14739/head:pull/14739
$ git checkout pull/14739

Update a local copy of the PR:
$ git checkout pull/14739
$ git pull https://git.openjdk.org/jdk.git pull/14739/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14739

View PR using the GUI difftool:
$ git pr show -t 14739

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14739.diff


bridgekeeper bot commented Jun 30, 2023

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Jun 30, 2023

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Jun 30, 2023
@tstuefe tstuefe force-pushed the JDK-8310233-Linux-THP-initialization-incorrect branch from 75829ce to a216486 Compare June 30, 2023 16:42
@tstuefe tstuefe force-pushed the JDK-8310233-Linux-THP-initialization-incorrect branch from a216486 to a6c19c9 Compare June 30, 2023 16:46
@tstuefe tstuefe marked this pull request as ready for review July 1, 2023 08:41
@tstuefe tstuefe marked this pull request as draft July 1, 2023 08:42
@tstuefe tstuefe marked this pull request as ready for review July 1, 2023 08:43
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 1, 2023

mlbridge bot commented Jul 1, 2023

Webrevs

@jdksjolen jdksjolen left a comment

Hi Thomas,

Thanks for these changes. I've some suggestions for the code, but the conceptual change is good.

src/hotspot/os/linux/hugepages.cpp
src/hotspot/os/linux/hugepages.cpp
src/hotspot/os/linux/hugepages.cpp (outdated)
src/hotspot/os/linux/hugepages.cpp
src/hotspot/os/linux/hugepages.cpp

openjdk bot commented Jul 3, 2023

⚠️ @tstuefe This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

tstuefe commented Jul 3, 2023

@jdksjolen Thanks a lot for your review!

See inline remarks. Most of your code suggestions are good, but this patch just moved the static hugepage detection parts out of os_linux.cpp, and left them (mostly) alone otherwise, to avoid adding regressions. I'll keep your input in mind for the next round of cleanups.

Wrt NonInterleavingLogStream, I decided not to do that. It is not needed, since we are single-threaded, and without it I may still be lucky enough to get this integrated cleanly into at least 17.

I added a new repro case. Please take a look!

Thank you.

@jdksjolen jdksjolen left a comment

Some of the methods seem to leak file descriptors; other than that, the tests look good.

test/hotspot/jtreg/runtime/os/HugePageConfiguration.java (outdated)
test/hotspot/jtreg/runtime/os/HugePageConfiguration.java (outdated)
test/hotspot/jtreg/runtime/os/HugePageConfiguration.java (outdated)

tstuefe commented Jul 4, 2023

Thanks @jdksjolen! I changed the test accordingly.

@jdksjolen jdksjolen left a comment

Thank you Thomas!

These changes look good to me now, I'm approving this.

openjdk bot commented Jul 4, 2023

@tstuefe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8310233: Fix THP detection on Linux

Reviewed-by: jsjolen, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 127 new commits pushed to the master branch:

  • 81c4e8f: 8304006: jlink should create the jimage file in the native endian for the target platform
  • e8f66bf: 8310949: RISC-V: Initialize UseUnalignedAccesses
  • 19691fa: 6361826: (reflect) provide method for mapping strings to class object for primitive types
  • c84866a: 8310551: vmTestbase/nsk/jdb/interrupt/interrupt001/interrupt001.java timed out due to missing prompt
  • 0d2196f: 8311992: Test java/lang/Thread/virtual/JfrEvents::testVirtualThreadPinned failed
  • f3b96f6: 8311862: RISC-V: small improvements to shift immediate instructions
  • a63f865: 8311946: add support for libgraal specific jtreg tests
  • 167d1c1: 8311986: Disable runtime/os/TestTracePageSizes.java for ShenandoahGC
  • 7539cc0: 8303134: JFR: Missing stack trace during chunk rotation stress
  • d1fa1a8: 8311825: Duplicate qualified enum constants not detected
  • ... and 117 more: https://git.openjdk.org/jdk/compare/33011ea19bb29e88ce18a138a8fa8b34f8c97407...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 4, 2023

tstuefe commented Jul 6, 2023

Gentle ping, second review needed.

@tstuefe tstuefe changed the title JDK-8310233: Linux: THP initialization is incorrect JDK-8310233: Fix THP detection on Linux Jul 7, 2023
return _default_hugepage_size;
}

// Scan /proc/meminfo and return value of Hugepagesize

Reviewer notes:
scan_default_hugepagesize is just a code move; it used to live in scan_default_large_page_size in os_linux.cpp. It is responsible for reading the default static hugepage size from /proc/meminfo. This code is mostly untouched to reduce the chance of regressions (though the code could certainly be tightened and cleaned up).
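
For readers unfamiliar with the file format, the following is a minimal sketch of extracting the "Hugepagesize" value from /proc/meminfo text. It is not the HotSpot implementation discussed in this note: `parse_default_hugepagesize` is a hypothetical function, and it assumes the line format shown in the examples ("Hugepagesize:    1048576 kB").

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Returns the default static hugepage size in bytes, or 0 if the
// "Hugepagesize:" line is missing or not in the expected "<number> kB" form.
std::size_t parse_default_hugepagesize(const std::string& meminfo) {
  const std::string key = "Hugepagesize:";
  std::istringstream in(meminfo);
  std::string line;
  while (std::getline(in, line)) {
    // Skip lines that do not start with the key.
    if (line.compare(0, key.size(), key) != 0) continue;
    std::istringstream fields(line.substr(key.size()));
    unsigned long long value = 0;
    std::string unit;
    if ((fields >> value >> unit) && unit == "kB") {
      return static_cast<std::size_t>(value) * 1024;
    }
    return 0;
  }
  return 0;
}
```

Given the meminfo line from the examples in the description, this returns 1073741824 (1G). The function takes the file contents as a string so that it can be exercised without access to a machine that has hugepages configured.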

tstuefe commented Jul 11, 2023

May I have a second review?

tstuefe commented Jul 11, 2023

@jdksjolen was kind enough to put this through Oracle's CI, all good.

@dholmes-ora dholmes-ora left a comment

Not an area I am familiar with but scanning through the code this seems reasonable. A few minor comments below.

Thanks.

test/hotspot/jtreg/runtime/os/HugePageConfiguration.java (outdated)
src/hotspot/os/linux/hugepages.cpp

tstuefe commented Jul 14, 2023

> Not an area I am familiar with but scanning through the code this seems reasonable. A few minor comments below.
>
> Thanks.

Many thanks, @dholmes-ora! Remarks follow.

tstuefe commented Jul 14, 2023

@dholmes-ora : I massaged in your feedback.

@dholmes-ora dholmes-ora left a comment

Nothing further from me.

Thanks.

tstuefe commented Jul 17, 2023

Thanks, @dholmes-ora !

/integrate

openjdk bot commented Jul 17, 2023

Going to push as commit 37ca902.
Since your change was applied there have been 127 commits pushed to the master branch: https://git.openjdk.org/jdk/compare/33011ea19bb29e88ce18a138a8fa8b34f8c97407...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 17, 2023
@openjdk openjdk bot closed this Jul 17, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 17, 2023

openjdk bot commented Jul 17, 2023

@tstuefe Pushed as commit 37ca902.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tstuefe tstuefe deleted the JDK-8310233-Linux-THP-initialization-incorrect branch July 25, 2023 13:25
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
3 participants