-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8343191: Cgroup v1 subsystem fails to set subsystem path #21808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back schernyshev! A progress list of the required criteria for merging this PR into |
@sercher This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 1033 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jerboaa, @MBaesken) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
@sercher The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
What testing have you done? Did you run existing container tests in:
As far as I can tell this breaks privileged container runs. I.e. |
I've done the standard tiers (1-3), and additionally "jtreg:jdk/internal/platform/cgroup" and "gtest::cgroupTest". I see now some of the dockers are failing. I am looking into it. |
Thanks Severin! It was the problematic change in the logic that skips duplicate cgroup contoller mount points. Failing tests are mounting duplicates of the host's cgroups with |
Yes. See https://bugs.openjdk.org/browse/JDK-8261242 for details. This patch shouldn't change it and the logic of |
Here's an updated version of the patch. The long standing behavior was to leave Create a new cgroup for memory
Run the following script
In the above script, a containerized process ( The result would be ($JAVA_HOME points to JDK before fix)
JDK updated version:
The updated version falls back to the mount point (only when Testing
|
Have you checked on cg v2? Is this a problem there as well? |
Hi Severin, thanks for this question. I didn't check cg v2 because the issue (NPE) was observed in v1 hosts only. It's an open question what happens if a process is moved between cgroups in v2 mode. I will look into it and file an issue if there are problems in v2. |
It looks to me that v2 mode is not affected, at least the way it is in v1. In v2 mode, cgroup is mounted either at leaf node (private namespace), or the complete hierarchy at /sys/fs/cgroup (host namespace). In host mode it works right away, as the full hierarchy is accessible. With a cgroup v2 created like this:
The result would be
In the private namespace (it's a default setting in v2 hosts), it may fail migrating the process between cgroups (a docker issue?). It may look like the cgroup files are not mapped at all, while
The following script
will display
means there are no files in When moved into a subgroup, such as
the cgroup will be mounted at /sys/fs/cgroup, and the correct memory limit is displayed (thanks to the conroller path adjustment) - inherited from the parent.
|
The JBS issue doesn't mention |
OK, but why is https://bugs.openjdk.org/browse/JDK-8322420 not in effect in such a case?
It would be good to see the full boot JVM output at the trace level. I'm wondering why the adjustment isn't sufficient for the use-case the bug describes. I.e. if the move happens before the JVM starts then there is a chance it works OK by detecting some limit. If not it would really be useful to understand it better. If, however, the cgroup move happens after the JVM has started, there is nothing in the JVM which "corrects" the detected physical memory (i.e. heap size et. al) and/or detected CPUs. It's not supported to do that dynamically. |
I also wonder, then, if the issue is NPE if JDK-8336881 would fix that issue. The controller adjustment doesn't yet happen on the Java (Metrics) level. Only hotspot so far. |
Answering my own question. Because the
On cg v2, on the other hand, Edit:
|
So on cg v1 you start out and end with a |
Exactly. That's why JDK-8322420 is not in effect and also JDK-8336881 does not fix it on Java side (path stays uninitialized in certain conditions). |
@sercher As far as I can see this is a fairly simple case which would be covered by a simpler patch. My comment was in regards to my comment here. Where you replied with this answer. I don't see where anything you've described in your answer is being tested, covering this new code: That is, some sub-group without a match, but some with one. |
* Set directory to subsystem specific files based | ||
* on the contents of the mountinfo and cgroup files. | ||
* When runs in a container, the method handles the case | ||
* when a process is moved between cgroups. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to explain exactly what is happening when. The current comment isn't even remotely explaining in detail what it does. What does "... handles the case when a process is moved between cgroups" mean exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to explain exactly what is happening when. The current comment isn't even remotely explaining in detail what it does. What does "... handles the case when a process is moved between cgroups" mean exactly?
Either it shall be a high level comment such as in your suggestion here, or a deeper description in detail what happens where. Could you please be more specific on what kind of description is required and where? Please note the method has inline comments that are fairly self describing. In the meanwhile I'll try to add a description of what "a process is moved between cgroups" exactly means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A high level description that mentions what the function does. The inline comments are not very self describing for anyone not having written the patch.
Example:
Sets the subsystem path for a controller. The common container case is
handled by XXX. When a process has been moved from a cgroup to another
the following happens:
- A
- B
- C
I believe this is desperately needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the updated comment.
The code fragment you mentioned is executed under condition at line 62, that is,
the loop at lines 67-77 is determining the "subgroup" part of the above
In both cases it needs to be determined what (trailing) part of
or
and then the cgroup files path is either
or
The docker case is not any different from the CF case. I therefore suggest this case is covered by
|
@@ -1,5 +1,5 @@ | |||
/* | |||
* Copyright (c) 2020, 2022, Red Hat Inc. | |||
* Copyright (c) 2020, 2024, Red Hat Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess this must be 2025 now ? Same for other files ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, updated the copyright headers.
@@ -64,15 +64,28 @@ public void testCgPathNonEmptyRoot() { | |||
assertEquals(expectedPath, ctrl.path()); | |||
} | |||
|
|||
/* | |||
* Less common cases: Containers | |||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure why this comment was added, is it refering to 'container mode' mentioned in the comment above in another file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because there are already "Common case: Containers" and "Common case: Host". The old test testCgPathSubstring() and the new test testCgPathToMovedPath() do not belong to "Common case: Host" that comes just before them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I had another look and the Docker test TestMemoryWithSubgroups.java
does indeed cover this case for cg v1.
Please update copyright years to 2025 and this should be good to go (FYI: I'll be away the next week).
@@ -1,5 +1,5 @@ | |||
/* | |||
* Copyright (c) 2020, 2022, Red Hat Inc. | |||
* Copyright (c) 2020, 2024, Red Hat Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes indeed.
src/java.base/linux/classes/jdk/internal/platform/cgroupv1/CgroupV1SubsystemController.java
Outdated
Show resolved
Hide resolved
test/hotspot/jtreg/containers/docker/TestMemoryWithSubgroups.java
Outdated
Show resolved
Hide resolved
test/jdk/jdk/internal/platform/docker/TestDockerMemoryMetricsSubgroup.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK for me now. test_cgroupSubsystem_linux.cpp
needs a copyright update as well.
Thanks for your review @jerboaa ! I cheched the test_cgroupSubsystem_linux.cpp, it's already updated to 2025 in the master branch. |
OK! |
/contributor add sgehwolf |
/integrate |
@sercher |
/sponsor |
Going to push as commit de29ef3.
Your commit was automatically rebased without conflicts. |
@sercher your new test is failing in our CI:
I will file a new bug - JDK-8351382 |
@dholmes-ora I submitted a fix here. Could you please re-run the tests in your CI? |
Cgroup V1 subsustem fails to initialize mounted controllers properly in certain cases, that may lead to controllers left undetected/inactive. We observed the behavior in CloudFoundry deployments, it affects also host systems.
The relevant /proc/self/mountinfo line is
/proc/self/cgroup:
Here, Java runs inside containerized process that is being moved cgroups due to load balancing.
Let's examine the condition at line 64 here
jdk/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp
Lines 59 to 72 in 55a7cf1
It is always FALSE and the branch is never taken. The issue was spotted earlier by @jerboaa in JDK-8288019.
The original logic was intended to find the common prefix of
_root
andcgroup_path
and concatenate the remaining suffix to the_mount_point
(lines 67-68). That could lead to the following results:Example input
result _path
Here, cgroup_path comes from /proc/self/cgroup 3rd column. The man page (https://man7.org/linux/man-pages/man7/cgroups.7.html#NOTES) for control groups states:
This explicitly states the "pathname is relative to the mount point of the hierarchy". Hence, the correct result could have been
However, if Java runs in a container,
/proc/self/cgroup
and/proc/self/mountinfo
are mapped (read-only) from host, because docker uses--cgroupns=host
by default in cgroup v1 hosts. Then_root
andcgroup_path
belong to the host and do not exist in the container. In containers Java must fall back to_mount_point
of the corresponding cgroup controller.When
--cgroupns=private
is used,_root
andcgroup_path
are always equal to/
.In hosts, the
cgroup_path
should always be added to the mount point, no matter how it compares to the_root
.The PR fixes
CgroupUtil::adjust_controller
so that it handles the case when a process is moved to a supergroup or a sibling (in --cgroupns=private it produces invalid "../" paths). It also changes theCgroupV1Controller::set_subsystem_path
in Cgroup V1 mode, so that it detects the actual subgroup part of the given cgroup_path, because exactly this part should be concatenated to the mount point to get the correct path of cgroup files. The PR updates the Java metrics side accordingly.The new tests are proposed that cover processes moved over cgroups.
Progress
Issue
Reviewers
Contributors
<sgehwolf@openjdk.org>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21808/head:pull/21808
$ git checkout pull/21808
Update a local copy of the PR:
$ git checkout pull/21808
$ git pull https://git.openjdk.org/jdk.git pull/21808/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 21808
View PR using the GUI difftool:
$ git pr show -t 21808
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21808.diff
Using Webrev
Link to Webrev Comment