@fitzsim
Contributor

@fitzsim fitzsim commented Feb 26, 2025

This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811.

I tested it with:

java -Xlog:os+container=trace -version

on:

Red Hat Enterprise Linux 8 (cgroups v1 only):
No change in behaviour

Fedora 41 (cgroups v2):
More verbose output due to /sys/fs/cgroup/cgroup.controllers parsing:

--- tt-old-f41.txt	2025-02-26 15:37:56.310738515 -0500
+++ tt-new-f41.txt	2025-02-26 15:37:56.601739407 -0500
@@ -1,7 +1,12 @@
 [trace][os,container] OSContainer::init: Initializing Container Support
-[debug][os,container] Detected optional pids controller entry in /proc/cgroups
-[debug][os,container] controller cpuset is not enabled
-                    ] 
+[debug][os,container] v2 controller cpuset is enabled and relevant
+[debug][os,container] v2 controller cpu is enabled and required
+[debug][os,container] v2 controller io is enabled but not relevant
+[debug][os,container] v2 controller memory is enabled and required
+[debug][os,container] v2 controller hugetlb is enabled but not relevant
+[debug][os,container] v2 controller pids is enabled and relevant
+[debug][os,container] v2 controller rdma is enabled but not relevant
+[debug][os,container] v2 controller misc is enabled but not relevant
 [debug][os,container] Detected cgroups v2 unified hierarchy
 [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user@4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope
 [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user@4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max
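The new debug lines above sort each entry of /sys/fs/cgroup/cgroup.controllers into one of three groups. A minimal Python sketch of that classification follows. It is only an illustration inferred from the log output in this description: the names `REQUIRED`, `RELEVANT`, and `classify_v2_controllers` are hypothetical and do not come from the HotSpot sources.

```python
# Hypothetical sketch: classify the controllers listed in a cgroups v2
# cgroup.controllers file the way the debug lines above describe them.
# The grouping is inferred from the log output, not from the patch itself.
REQUIRED = {"cpu", "memory"}    # logged as "enabled and required"
RELEVANT = {"cpuset", "pids"}   # logged as "enabled and relevant"


def classify_v2_controllers(contents: str) -> dict[str, str]:
    """Map each space-separated controller name to a status string."""
    result = {}
    for name in contents.split():
        if name in REQUIRED:
            result[name] = "enabled and required"
        elif name in RELEVANT:
            result[name] = "enabled and relevant"
        else:
            result[name] = "enabled but not relevant"
    return result


def required_controllers_present(contents: str) -> bool:
    """Return True when every required controller appears in the file."""
    return REQUIRED <= set(contents.split())


if __name__ == "__main__":
    # Controller list taken from the Fedora 41 log above.
    sample = "cpuset cpu io memory hugetlb pids rdma misc"
    for name, status in classify_v2_controllers(sample).items():
        print(f"v2 controller {name} is {status}")
```

With the Fedora 41 controller list as input, the printed lines have the same controller names and statuses as the debug lines in the diff.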

Fedora 41 (custom kernel with cgroups v1 disabled):
Fixes cgroups v2 detection:

--- tt-old-f41-custom-kernel.txt	2025-02-26 15:37:58.197744304 -0500
+++ tt-new-f41-custom-kernel.txt	2025-02-26 15:37:59.380747933 -0500
@@ -1,7 +1,63 @@
 [trace][os,container] OSContainer::init: Initializing Container Support
-[debug][os,container] Detected optional pids controller entry in /proc/cgroups
-[debug][os,container] controller cpuset is not enabled
-                    ] 
-[debug][os,container] controller memory is not enabled
-                    ] 
-[debug][os,container] One or more required controllers disabled at kernel level.
+[debug][os,container] v2 controller cpuset is enabled and relevant
+[debug][os,container] v2 controller cpu is enabled and required
+[debug][os,container] v2 controller io is enabled but not relevant
+[debug][os,container] v2 controller memory is enabled and required
+[debug][os,container] v2 controller hugetlb is enabled but not relevant
+[debug][os,container] v2 controller pids is enabled and relevant
+[debug][os,container] v2 controller rdma is enabled but not relevant
+[debug][os,container] v2 controller misc is enabled but not relevant
+[debug][os,container] Detected cgroups v2 unified hierarchy
+[trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope
+[trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/memory.max
+[trace][os,container] Memory Limit is: -1
+[trace][os,container] Memory Limit is: Unlimited
+[debug][os,container] container memory limit unlimited: -1, using host value 4094947328
+[trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-1000.slice/memory.max
+[trace][os,container] Memory Limit is: -1
+[trace][os,container] Memory Limit is: Unlimited
+[debug][os,container] container memory limit unlimited: -1, using host value 4094947328
+[trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/memory.max
+[trace][os,container] Memory Limit is: -1
+[trace][os,container] Memory Limit is: Unlimited
+[debug][os,container] container memory limit unlimited: -1, using host value 4094947328
+[trace][os,container] Path to /memory.max is /sys/fs/cgroup/memory.max
+[debug][os,container] Open of file /sys/fs/cgroup/memory.max failed, No such file or directory
+[trace][os,container] Memory Limit failed: -2
+[trace][os,container] Memory Limit is: -2
+[debug][os,container] container memory limit failed: -2, using host value 4094947328
+[trace][os,container] No lower limit found for memory in hierarchy /sys/fs/cgroup, adjusting to original path /user.slice/user-1000.slice/session-95.scope
+[trace][os,container] Adjusting controller path for cpu: /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/cpu.max
+[trace][os,container] CPU Quota is: -1
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/cpu.max
+[trace][os,container] CPU Period is: 100000
+[trace][os,container] OSContainer::active_processor_count: 2
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/cpu.max
+[trace][os,container] CPU Quota is: -1
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/cpu.max
+[trace][os,container] CPU Period is: 100000
+[trace][os,container] OSContainer::active_processor_count: 2
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/cpu.max
+[trace][os,container] CPU Quota is: -1
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/cpu.max
+[trace][os,container] CPU Period is: 100000
+[trace][os,container] OSContainer::active_processor_count: 2
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/cpu.max
+[debug][os,container] Open of file /sys/fs/cgroup/cpu.max failed, No such file or directory
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/cpu.max
+[debug][os,container] Open of file /sys/fs/cgroup/cpu.max failed, No such file or directory
+[trace][os,container] CPU Period failed: -2
+[trace][os,container] OSContainer::active_processor_count: 2
+[trace][os,container] No lower limit found for cpu in hierarchy /sys/fs/cgroup, adjusting to original path /user.slice/user-1000.slice/session-95.scope
+[trace][os,container] total physical memory: 4094947328
+[trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/memory.max
+[trace][os,container] Memory Limit is: -1
+[trace][os,container] Memory Limit is: Unlimited
+[debug][os,container] container memory limit unlimited: -1, using host value 4094947328
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/cpu.max
+[trace][os,container] CPU Quota is: -1
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/cpu.max
+[trace][os,container] CPU Period is: 100000
+[trace][os,container] OSContainer::active_processor_count: 2
+[debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present
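The trace above reads memory.max and cpu.max at the session scope and then at each parent directory up to the mount root, before reporting that no lower limit was found. A minimal Python sketch of that kind of upward walk follows. `find_lowest_limit` and `read_limit` are hypothetical names, and the selection rule (the smallest positive limit wins) is an assumption based on the log wording, not a copy of the HotSpot logic.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional

UNLIMITED = -1  # value the trace prints for "no limit"
FAILED = -2     # value the trace prints when the file cannot be read


def find_lowest_limit(
    cgroup_path: str, read_limit: Callable[[str], int]
) -> Optional[tuple[str, int]]:
    """Walk from cgroup_path up to "/" and return the (path, limit) pair
    with the smallest positive limit, or None if no level sets one."""
    best = None
    path = PurePosixPath(cgroup_path)
    while True:
        limit = read_limit(str(path))
        # Negative values (UNLIMITED, FAILED) never count as a limit.
        if limit > 0 and (best is None or limit < best[1]):
            best = (str(path), limit)
        if path == path.parent:  # reached the root of the hierarchy
            break
        path = path.parent
    return best


if __name__ == "__main__":
    # Values modelled on the trace: unlimited everywhere, unreadable at root.
    limits = {
        "/user.slice/user-1000.slice/session-95.scope": UNLIMITED,
        "/user.slice/user-1000.slice": UNLIMITED,
        "/user.slice": UNLIMITED,
        "/": FAILED,
    }
    start = "/user.slice/user-1000.slice/session-95.scope"
    print(find_lowest_limit(start, lambda p: limits[p]))
```

When no level sets a limit the function returns `None`, which corresponds to the "No lower limit found" case in the trace, where the original path is kept.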

Alpine Linux v3.21 (unified, aka cgroups v2 only):
Fixes cgroups v2 detection:

--- tt-old-alpine-unified.txt	2025-02-26 15:38:34.575898350 -0500
+++ tt-new-alpine-unified.txt	2025-02-26 15:38:36.156905658 -0500
@@ -1,7 +1,21 @@
 [trace][os,container] OSContainer::init: Initializing Container Support
-[debug][os,container] Detected optional pids controller entry in /proc/cgroups
-[debug][os,container] controller cpuset is not enabled
-                    ] 
-[debug][os,container] controller memory is not enabled
-                    ] 
-[debug][os,container] One or more required controllers disabled at kernel level.
+[debug][os,container] v2 controller cpuset is enabled and relevant
+[debug][os,container] v2 controller cpu is enabled and required
+[debug][os,container] v2 controller io is enabled but not relevant
+[debug][os,container] v2 controller memory is enabled and required
+[debug][os,container] v2 controller hugetlb is enabled but not relevant
+[debug][os,container] v2 controller pids is enabled and relevant
+[debug][os,container] Detected cgroups v2 unified hierarchy
+[trace][os,container] total physical memory: 2074931200
+[trace][os,container] Path to /memory.max is /sys/fs/cgroup/memory.max
+[debug][os,container] Open of file /sys/fs/cgroup/memory.max failed, No such file or directory
+[trace][os,container] Memory Limit failed: -2
+[trace][os,container] Memory Limit is: -2
+[debug][os,container] container memory limit failed: -2, using host value 2074931200
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/cpu.max
+[debug][os,container] Open of file /sys/fs/cgroup/cpu.max failed, No such file or directory
+[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/cpu.max
+[debug][os,container] Open of file /sys/fs/cgroup/cpu.max failed, No such file or directory
+[trace][os,container] CPU Period failed: -2
+[trace][os,container] OSContainer::active_processor_count: 2
+[debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present
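The limits in these traces come from the cgroups v2 interface files memory.max and cpu.max. The Python sketch below shows one way to parse them, assuming the kernel's documented formats (a byte count or `max` for memory.max, and `$MAX $PERIOD` for cpu.max). The helper names are hypothetical, and the constants -1 and -2 simply mirror the values printed in the trace for "unlimited" and "failed".

```python
UNLIMITED = -1  # printed in the trace when the file contains "max"
FAILED = -2     # printed in the trace when the file cannot be opened


def parse_memory_max(text: str) -> int:
    """Parse memory.max contents: a byte count, or "max" for no limit."""
    value = text.strip()
    return UNLIMITED if value == "max" else int(value)


def parse_cpu_max(text: str) -> tuple[int, int]:
    """Parse cpu.max contents ("$MAX $PERIOD") into (quota, period)."""
    quota, period = text.split()
    return (UNLIMITED if quota == "max" else int(quota), int(period))


def read_memory_max(path: str) -> int:
    """Read and parse a memory.max file, returning FAILED if unreadable."""
    try:
        with open(path) as f:
            return parse_memory_max(f.read())
    except OSError:
        return FAILED


if __name__ == "__main__":
    print(parse_cpu_max("max 100000\n"))
    print(read_memory_max("/sys/fs/cgroup/memory.max"))
```

The root cgroup has no memory.max or cpu.max file, which is why the reads at /sys/fs/cgroup fail in the traces above and the host values are used instead.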

Alpine Linux v3.21 (hybrid):
No change in behaviour.

Alpine Linux v3.21 (legacy):
No change in behaviour.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issues

  • JDK-8349988: Change cgroup version detection logic to not depend on /proc/cgroups (Sub-task - P3)
  • JDK-8347811: Container detection code for cgroups v2 should use cgroup.controllers (Enhancement - P3)
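As a rough illustration of version detection that does not consult /proc/cgroups, the following Python sketch treats a hierarchy as cgroups v2 when a cgroup.controllers file exists at the mount point. The function name and the file-presence heuristic are assumptions made for illustration; the patch itself may detect the version differently.

```python
import os


def looks_like_cgroup_v2(mount_point: str = "/sys/fs/cgroup") -> bool:
    """Heuristic (an assumption, not the patch's logic): the unified
    hierarchy exposes cgroup.controllers at the root of its mount."""
    return os.path.isfile(os.path.join(mount_point, "cgroup.controllers"))


if __name__ == "__main__":
    print("cgroups v2 unified hierarchy:", looks_like_cgroup_v2())
```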

Reviewers

Contributors

  • Severin Gehwolf <sgehwolf@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23811/head:pull/23811
$ git checkout pull/23811

Update a local copy of the PR:
$ git checkout pull/23811
$ git pull https://git.openjdk.org/jdk.git pull/23811/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23811

View PR using the GUI difftool:
$ git pr show -t 23811

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23811.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Feb 26, 2025

👋 Welcome back fitzsim! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Feb 26, 2025

@fitzsim This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8349988: Change cgroup version detection logic to not depend on /proc/cgroups
8347811: Container detection code for cgroups v2 should use cgroup.controllers

Co-authored-by: Severin Gehwolf <sgehwolf@openjdk.org>
Reviewed-by: sgehwolf, asmehra

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 21 new commits pushed to the master branch:

  • 209e72d: 8353234: Refactor XMLSecurityPropertyManager
  • cc870d4: 8352088: Call of com.sun.jdi.ThreadReference.threadGroups() can lock up target VM
  • d979bd8: 8344671: Few JFR streaming tests fail with application not alive error on MacOS 15
  • 49cb7aa: 8339114: DaCapo xalan performance with -XX:+UseObjectMonitorTable
  • d32ff13: 8353117: Crash: assert(id >= ThreadIdentifier::initial() && id < ThreadIdentifier::current()) failed: must be reasonable)
  • a0677d9: 8353263: Parallel: Remove locking in PSOldGen::resize
  • 8608b16: 8348887: Create IR framework test for JDK-8347997
  • 23eb648: 8353545: Improve debug info for StartOptionTest
  • 4f97c4c: 8349211: Add support for intrusive trees to the utilities red-black tree
  • c9baa8a: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead
  • ... and 11 more: https://git.openjdk.org/jdk/compare/6891490892cc0405882658e067d587ffe5401a6d...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jerboaa, @ashu-mehra) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk

openjdk bot commented Feb 26, 2025

@fitzsim The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Feb 26, 2025
@jerboaa
Contributor

jerboaa commented Feb 27, 2025

@fitzsim Please use /issue add JDK-8347811 since this PR is addressing them both (JDK-8349988 and JDK-8347811). That way both will get resolved when this PR integrates.

@fitzsim
Contributor Author

fitzsim commented Feb 27, 2025

/issue add JDK-8347811

@openjdk

openjdk bot commented Feb 27, 2025

@fitzsim
Adding additional issue to issue list: 8347811: Container detection code for cgroups v2 should use cgroup.controllers.

@fitzsim
Contributor Author

fitzsim commented Feb 27, 2025

The actual changes are easier to see when whitespace is ignored: 39a6463?w=1

@fitzsim
Contributor Author

fitzsim commented Feb 27, 2025

I fixed an existing assert message typo that I noticed while working on the patch, hierarchy mismatch for cpuacc[t]. Strictly speaking it is not related to either bug report, but I figured it did not warrant a bug report of its own.

@fitzsim fitzsim marked this pull request as ready for review February 27, 2025 15:16
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 27, 2025
@mlbridge

mlbridge bot commented Feb 27, 2025

Webrevs

Contributor

@jerboaa jerboaa left a comment

Thanks for this. It's getting there...

@fitzsim
Contributor Author

fitzsim commented Feb 28, 2025

/reviewers 2

@openjdk

openjdk bot commented Feb 28, 2025

@fitzsim
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

@fitzsim
Contributor Author

fitzsim commented Feb 28, 2025

/contributor add jerboaa

@openjdk

openjdk bot commented Feb 28, 2025

@fitzsim jerboaa was not found in the census.

Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <duke@openjdk.org>

User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address.

@fitzsim
Contributor Author

fitzsim commented Feb 28, 2025

/contributor add @jerboaa

@openjdk

openjdk bot commented Feb 28, 2025

@fitzsim
Contributor Severin Gehwolf <sgehwolf@openjdk.org> successfully added.

Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test
cases such that their /proc/cgroups and /proc/self/cgroup contents
correspond.  This prevents assertion failures these tests were
producing when is_cgroupsV2 was replaced with cgroups_v2_enabled.

Do not log enabled controllers that are not relevant to the JDK.

Remove from cgroups v1 branch incorrect log messages about cpuset
controller being optional.  Add test case for cgroups v1, cpuset
disabled.
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Apr 1, 2025
@fitzsim
Contributor Author

fitzsim commented Apr 1, 2025

/integrate

@openjdk

openjdk bot commented Apr 1, 2025

@fitzsim This pull request has not yet been marked as ready for integration.

@fitzsim fitzsim requested review from ashu-mehra and jerboaa April 2, 2025 00:01
Contributor

@ashu-mehra ashu-mehra left a comment

lgtm

Contributor

@jerboaa jerboaa left a comment

Still good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 2, 2025
@fitzsim
Contributor Author

fitzsim commented Apr 2, 2025

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Apr 2, 2025
@openjdk

openjdk bot commented Apr 2, 2025

@fitzsim
Your change (at version b29d869) is now ready to be sponsored by a Committer.

@fitzsim
Contributor Author

fitzsim commented Apr 2, 2025

Thank you for re-reviewing, @jerboaa and @ashu-mehra. I have issued the integrate command. Can one of you please sponsor the change?

@jerboaa
Contributor

jerboaa commented Apr 3, 2025

/sponsor

@openjdk

openjdk bot commented Apr 3, 2025

Going to push as commit 9c5ed23.
Since your change was applied there have been 29 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 3, 2025
@openjdk openjdk bot closed this Apr 3, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Apr 3, 2025
@openjdk

openjdk bot commented Apr 3, 2025

@jerboaa @fitzsim Pushed as commit 9c5ed23.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@matzf

matzf commented May 8, 2025

@jerboaa @fitzsim, is there any plan to backport this fix (to 21)?

@jerboaa
Contributor

jerboaa commented May 8, 2025

> @jerboaa @fitzsim, is there any plan to backport this fix (to 21)?

Eventually yes. But this change hasn't seen a lot of real-world exposure yet. It would be good to have that before attempting a backport.

@jglick
Contributor

jglick commented Jul 22, 2025

In case anyone comes across this, I filed it as https://bugs.launchpad.net/ubuntu/+source/linux-meta-hwe-6.14/+bug/2117446

Effectiveness of patch confirmed on Ubuntu 24.04.2 HWE using Docker CE 28.3.2 by

git checkout 9c5ed23eac7470f56d498e9c4d3c51c2f80fd571^
bash configure --with-boot-jdk=$HOME/.sdkman/candidates/java/24-tem
make images
docker run -m 1GB --rm -v $(pwd)/build/linux-x86_64-server-release/images/jdk:/jdk ubuntu /jdk/bin/java '-Xlog:os*=trace' -XshowSettings:vm -XX:MaxRAMPercentage=50 -version
git checkout 9c5ed23eac7470f56d498e9c4d3c51c2f80fd571
make images
docker run -m 1GB --rm -v $(pwd)/build/linux-x86_64-server-release/images/jdk:/jdk ubuntu /jdk/bin/java '-Xlog:os*=trace' -XshowSettings:vm -XX:MaxRAMPercentage=50 -version

Before, the output shows

One or more required controllers disabled at kernel level.

and max heap size is ½ of physical RAM. After, the output shows

Memory Limit is: 1073741824

and max heap size is 512 MB as expected.
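The expected figure follows from the container limit and the percentage flag. As a quick arithmetic check in Python, under the simplifying assumption that the maximum heap is just the detected memory limit multiplied by -XX:MaxRAMPercentage (the JVM may additionally align or adjust the value; `max_heap_bytes` is a hypothetical helper name):

```python
def max_heap_bytes(memory_limit_bytes: int, max_ram_percentage: float) -> int:
    """Simplified estimate: a percentage of the detected memory limit."""
    return int(memory_limit_bytes * max_ram_percentage / 100)


if __name__ == "__main__":
    limit = 1 * 1024**3  # docker run -m 1GB, reported as 1073741824 bytes
    heap = max_heap_bytes(limit, 50)
    print(limit)             # 1073741824
    print(heap // 1024**2)   # 512 (MiB)
```

Before the fix the JVM fell back to the host's physical RAM as the base, so the same 50% yielded half of the machine's memory rather than half of the 1 GB container limit.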

@sercher
Contributor

sercher commented Oct 14, 2025

Hi @jerboaa,
We are receiving multiple customer reports on the issue with 6.14 HWE kernels.
I would like to backport this to 21.0.10 & 17.0.18. Do you think it could be safe for the upcoming January update release?

@fitzsim Please let me know if you plan to work on the backports.

@jerboaa
Contributor

jerboaa commented Oct 14, 2025

> Hi @jerboaa,
> We are receiving multiple customer reports on the issue with 6.14 HWE kernels.
> I would like to backport this to 21.0.10 & 17.0.18. Do you think it could be safe for the upcoming January update release?

I tend to agree. Note that whether the issue appears depends on how the kernel is configured, but Ubuntu 24.04 LTS is affected. This was recently noted in the bug: https://bugs.openjdk.org/browse/JDK-8349988

The first step would be JDK 21. For JDK 17 I'm not sure; it will depend on what the patch looks like.

@sercher
Contributor

sercher commented Oct 15, 2025

> The first step would be JDK 21. For JDK 17 I'm not sure; it will depend on what the patch looks like.

JDK 21 patch is mostly clean (minor context issue). I will do the backport next week, if @fitzsim does not take it over. The JDK 17 backport won't be clean, because JDK 17 lacks several changes: JDK-8301479 and JDK-8238161 (NULLs were replaced with nullptr, fopens with os::fopen; both are irrelevant to this patch), JDK-8347129 (backport is underway), and JDK-8261242, which I consider a context conflict rather than a real dependency (only the variable name has changed).

@fitzsim
Contributor Author

fitzsim commented Oct 15, 2025

> JDK 21 patch is mostly clean (minor context issue). I will do the backport next week, if @fitzsim does not take it over.

@sercher It's fine with me if @jerboaa or you do the backporting; thanks!

@sercher
Contributor

sercher commented Oct 28, 2025

@fitzsim @jerboaa here's the new PR in 21

Labels

hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated

6 participants