8312182: THPs cause huge RSS due to thread start timing issue #2086
Unclean composite backport to fix JDK-8312182 - "THPs cause huge RSS due to thread start timing issue" (https://bugs.openjdk.org/browse/JDK-8312182)
Problem:
On a machine with transparent huge pages (THP) unconditionally enabled (/sys/kernel/mm/transparent_hugepage/enabled = "always"), the JVM may show a huge memory footprint (RSS) and degraded thread start performance.
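Whether a machine is in this configuration can be checked by reading the sysfs file named above. The kernel lists all modes in that file and marks the active one with square brackets, for example `[always] madvise never`. The following is a minimal sketch of such a check, not code from this patch; the class name `ThpModeCheck` and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of the patch).
public class ThpModeCheck {
    // The active THP mode is the one enclosed in square brackets.
    private static final Pattern ACTIVE = Pattern.compile("\\[(\\w+)\\]");

    // Returns the bracketed mode, or "unknown" if none is marked.
    public static String parseActiveMode(String content) {
        Matcher m = ACTIVE.matcher(content);
        return m.find() ? m.group(1) : "unknown";
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("/sys/kernel/mm/transparent_hugepage/enabled");
        if (!Files.isReadable(p)) {
            // Non-Linux system or kernel built without THP support.
            System.out.println("THP settings not available");
            return;
        }
        String content = new String(Files.readAllBytes(p), StandardCharsets.UTF_8).trim();
        String mode = parseActiveMode(content);
        System.out.println("THP mode: " + mode);
        if ("always".equals(mode)) {
            System.out.println("THP is unconditionally enabled; this JVM may be affected.");
        }
    }
}
```

Only the `always` mode corresponds to the unconditionally enabled setting described above.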
The following factors make the problem more severe and more likely:
For a detailed discussion of the underlying problem, please see openjdk/jdk#14919.
In JDK head, the issue was fixed with a sequence of patches:
However, JDK-8312182 itself needed one preparatory fix:
and then we had several corner-case test problems which are fixed with:
and finally, we decided to rename the switch that disables the THP mitigation, with a final patch:
Instead of downporting these 7 patches verbatim, I prepared a composite patch containing only the necessary mitigation and mitigation tests.
This patch does:
The patch needs some infrastructure, but I downported only the necessary parts: the helper class "HugePages", which is used in head to scan the operating system for information about THP settings. I included only the THP-related parts and left the rest out.
The patch also includes a regression test.
Testing:
I manually tested the JVM on Linux x64 with THP=always:
Without the patch (-Xmx1g -Xms1g -XX:+AlwaysPreTouch -Xss2m, 10000 threads started), I see slow thread startup and 11 GB - 14 GB of RSS.
The patched version comes up a lot faster and only shows 1.3 GB of RSS.
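A measurement of this kind can be approximated with a small program that starts many threads, keeps them alive, and then reads its own RSS from `/proc/self/status`. The sketch below is an assumption about how such a check could look; it is not the program used for the numbers above and not the regression test included in the patch. The class name `ThreadRssProbe` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical probe, for illustration only (Linux-specific).
public class ThreadRssProbe {
    // Extracts the VmRSS value (in kB) from the lines of /proc/<pid>/status.
    public static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Format: "VmRSS:\t  123456 kB"
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1; // field not found
    }

    public static void main(String[] args) throws Exception {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10000;
        CountDownLatch started = new CountDownLatch(n);
        CountDownLatch release = new CountDownLatch(1);

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                started.countDown();
                try {
                    release.await(); // keep the thread (and its stack) alive
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }
        started.await();
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;

        List<String> status =
            Files.readAllLines(Paths.get("/proc/self/status"), StandardCharsets.UTF_8);
        System.out.println("threads started: " + n + " in " + elapsedMs + " ms");
        System.out.println("VmRSS: " + parseVmRssKb(status) + " kB");
        release.countDown();
    }
}
```

Running it with the options listed above (`java -Xmx1g -Xms1g -XX:+AlwaysPreTouch -Xss2m ThreadRssProbe 10000`) on a THP=always machine should allow comparing thread start time and RSS between an unpatched and a patched JVM.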
GHAs: unfortunately broken due to infrastructure issues.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk11u-dev.git pull/2086/head:pull/2086
$ git checkout pull/2086
Update a local copy of the PR:
$ git checkout pull/2086
$ git pull https://git.openjdk.org/jdk11u-dev.git pull/2086/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 2086
View PR using the GUI difftool:
$ git pr show -t 2086
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk11u-dev/pull/2086.diff