8343191: Cgroup v1 subsystem fails to set subsystem path #21808

Closed
sercher wants to merge 22 commits

sercher
Contributor

@sercher sercher commented Oct 31, 2024

The Cgroup V1 subsystem fails to initialize mounted controllers properly in certain cases, which may leave controllers undetected or inactive. We observed this behavior in CloudFoundry deployments; it also affects host systems.

The relevant /proc/self/mountinfo line is

2207 2196 0:43 /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - cgroup cgroup rw,cpu,cpuacct

/proc/self/cgroup:

11:cpu,cpuacct:/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c

Here, Java runs inside a containerized process that is being moved between cgroups due to load balancing.
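For illustration, here is a minimal sketch of how a mountinfo line like the one above splits into the pieces that matter for this issue. The struct and function names are hypothetical and this is not HotSpot's parser. It assumes the proc(5) field layout: the fourth field is the root, the fifth is the mount point, and the filesystem type, mount source and super options follow the "-" separator.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the mountinfo fields relevant to cgroup detection.
struct MountInfo {
    std::string root;           // 4th field: root of the mount within the hierarchy
    std::string mount_point;    // 5th field: where the hierarchy is mounted
    std::string fs_type;        // first field after "-" ("cgroup" or "cgroup2")
    std::string super_options;  // third field after "-" (e.g. "rw,cpu,cpuacct")
};

// Splits one /proc/self/mountinfo line on whitespace and picks out the fields.
// Returns false if the line is too short or has no "-" separator.
bool parse_mountinfo_line(const std::string& line, MountInfo& out) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    for (std::string field; in >> field; ) {
        fields.push_back(field);
    }
    // Six fixed fields come first, then zero or more optional fields, then "-".
    std::size_t sep = 6;
    while (sep < fields.size() && fields[sep] != "-") {
        ++sep;
    }
    // Three fields must follow the separator: fs type, source, super options.
    if (sep + 3 >= fields.size()) {
        return false;
    }
    out.root = fields[3];
    out.mount_point = fields[4];
    out.fs_type = fields[sep + 1];
    out.super_options = fields[sep + 3];
    return true;
}
```

For the line quoted above, the root would be the /system.slice/garden.service/garden/good/... path and the mount point would be /sys/fs/cgroup/cpu,cpuacct, while /proc/self/cgroup reports the .../bad/... path.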

Let's examine the condition at line 64 here

if (strcmp(_root, cgroup_path) == 0) {
  ss.print_raw(_mount_point);
  _path = os::strdup(ss.base());
} else {
  char *p = strstr((char*)cgroup_path, _root);
  if (p != nullptr && p == _root) {
    if (strlen(cgroup_path) > strlen(_root)) {
      ss.print_raw(_mount_point);
      const char* cg_path_sub = cgroup_path + strlen(_root);
      ss.print_raw(cg_path_sub);
      _path = os::strdup(ss.base());
    }
  }
}

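The condition p != nullptr && p == _root is always FALSE: strstr returns a pointer into cgroup_path, which is a different buffer from _root, so the branch is never taken. The issue was spotted earlier by @jerboaa in JDK-8288019.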
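The pointer comparison can be reproduced outside the JVM. The following minimal sketch uses hypothetical function names that are not part of HotSpot. It contrasts the pre-fix comparison with a comparison against the start of cgroup_path, which is presumably what a prefix test was meant to do.

```cpp
#include <cstring>

// Mirrors the pre-fix check: strstr() returns a pointer into its first
// argument (the haystack), so comparing the result with the needle's address
// only succeeds if both arguments are the very same buffer.
bool original_prefix_check(const char* root, const char* cgroup_path) {
    const char* p = std::strstr(cgroup_path, root);
    return p != nullptr && p == root;
}

// Assumed intent: root occurs at the very beginning of cgroup_path.
bool intended_prefix_check(const char* root, const char* cgroup_path) {
    const char* p = std::strstr(cgroup_path, root);
    return p != nullptr && p == cgroup_path;
}
```

With root "/a" and cgroup_path "/a/b" held in separate buffers, original_prefix_check returns false while intended_prefix_check returns true.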

The original logic was intended to find the common prefix of _root and cgroup_path and concatenate the remaining suffix to the _mount_point (lines 67-68). That could lead to the following results:

Example input

_root = "/a"
cgroup_path = "/a/b"
_mount_point = "/sys/fs/cgroup/cpu,cpuacct"

result _path

"/sys/fs/cgroup/cpu,cpuacct/b"

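The intended prefix-stripping step from the example above can be sketched as follows. The helper name is hypothetical and std::string is used for brevity. HotSpot itself works on C strings, so this only illustrates the described behavior and is not the actual implementation.

```cpp
#include <string>

// Hypothetical helper: if cgroup_path starts with root, strip root and append
// the remainder to mount_point. Returns an empty string when there is no
// common prefix, which corresponds to the path being left unset.
std::string join_after_common_prefix(const std::string& root,
                                     const std::string& cgroup_path,
                                     const std::string& mount_point) {
    // compare() checks the first root.size() characters of cgroup_path.
    if (cgroup_path.compare(0, root.size(), root) == 0) {
        return mount_point + cgroup_path.substr(root.size());
    }
    return std::string();
}
```

With the example input this yields "/sys/fs/cgroup/cpu,cpuacct/b". For an unrelated pair such as _root = "/docker/abc" (a made-up container ID) and cgroup_path = "/test", it yields an empty result.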
Here, cgroup_path comes from /proc/self/cgroup 3rd column. The man page (https://man7.org/linux/man-pages/man7/cgroups.7.html#NOTES) for control groups states:

...
       /proc/pid/cgroup (since Linux 2.6.24)
              This file describes control groups to which the process
              with the corresponding PID belongs.  The displayed
              information differs for cgroups version 1 and version 2
              hierarchies.
              For each cgroup hierarchy of which the process is a
              member, there is one entry containing three colon-
              separated fields:

                  hierarchy-ID:controller-list:cgroup-path

              For example:

                  5:cpuacct,cpu,cpuset:/daemons
...
              [3]  This field contains the pathname of the control group
                   in the hierarchy to which the process belongs. This
                   pathname is relative to the mount point of the
                   hierarchy.

This explicitly states the "pathname is relative to the mount point of the hierarchy". Hence, the correct result could have been

/sys/fs/cgroup/cpu,cpuacct/a/b
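Read literally, the man page suggests a two-step procedure: split the /proc/self/cgroup entry into its three fields, then append the third field to the mount point. Here is a minimal sketch of that reading. The type and function names are hypothetical, and it covers only the host case where the full hierarchy is visible.

```cpp
#include <string>

// Hypothetical representation of one /proc/<pid>/cgroup entry.
struct CgroupEntry {
    std::string hierarchy_id;
    std::string controllers;
    std::string path;
};

// Splits "hierarchy-ID:controller-list:cgroup-path" at the first two colons.
// Returns false if the line does not contain three fields.
bool parse_cgroup_line(const std::string& line, CgroupEntry& out) {
    const std::size_t first = line.find(':');
    if (first == std::string::npos) {
        return false;
    }
    const std::size_t second = line.find(':', first + 1);
    if (second == std::string::npos) {
        return false;
    }
    out.hierarchy_id = line.substr(0, first);
    out.controllers = line.substr(first + 1, second - first - 1);
    out.path = line.substr(second + 1);
    return true;
}

// Host case only: the cgroup path is taken relative to the mount point.
std::string host_cgroup_dir(const std::string& mount_point,
                            const CgroupEntry& entry) {
    return entry.path == "/" ? mount_point : mount_point + entry.path;
}
```

For the man page example 5:cpuacct,cpu,cpuset:/daemons the path field is /daemons. For a cgroup path of /a/b under /sys/fs/cgroup/cpu,cpuacct the result is the directory shown above.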

However, if Java runs in a container, /proc/self/cgroup and /proc/self/mountinfo are mapped (read-only) from host, because docker uses --cgroupns=host by default in cgroup v1 hosts. Then _root and cgroup_path belong to the host and do not exist in the container. In containers Java must fall back to _mount_point of the corresponding cgroup controller.

When --cgroupns=private is used, _root and cgroup_path are always equal to /.

In hosts, the cgroup_path should always be added to the mount point, no matter how it compares to the _root.

The PR fixes CgroupUtil::adjust_controller so that it handles the case when a process is moved to a supergroup or a sibling (in --cgroupns=private it produces invalid "../" paths). It also changes the CgroupV1Controller::set_subsystem_path in Cgroup V1 mode, so that it detects the actual subgroup part of the given cgroup_path, because exactly this part should be concatenated to the mount point to get the correct path of cgroup files. The PR updates the Java metrics side accordingly.

New tests are proposed that cover processes moved between cgroups.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8343191: Cgroup v1 subsystem fails to set subsystem path (Bug - P3)

Reviewers

Contributors

  • Severin Gehwolf <sgehwolf@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21808/head:pull/21808
$ git checkout pull/21808

Update a local copy of the PR:
$ git checkout pull/21808
$ git pull https://git.openjdk.org/jdk.git pull/21808/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21808

View PR using the GUI difftool:
$ git pr show -t 21808

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21808.diff


@bridgekeeper

bridgekeeper bot commented Oct 31, 2024

👋 Welcome back schernyshev! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 31, 2024

@sercher This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8343191: Cgroup v1 subsystem fails to set subsystem path

Co-authored-by: Severin Gehwolf <sgehwolf@openjdk.org>
Reviewed-by: sgehwolf, mbaesken

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1033 new commits pushed to the master branch:

  • fae37aa: 8345627: [REDO] Use gcc12 -ftrivial-auto-var-init=pattern in debug builds
  • 1f10ffb: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures
  • 4fc72b8: 8351082: Remove dead code for estimating CDS archive size
  • b6e2d66: 8351087: Combine scratch object tables in heapShared.cpp
  • d9b98f7: 8350771: Fix -Wzero-as-null-pointer-constant warning in nsk/monitoring ThreadController utility
  • 7c173fd: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect
  • 96613cc: 8349516: StAXStream2SAX.handleCharacters() fails on empty CDATA
  • 3a8a432: 8349094: GenShen: Race between control and regulator threads may violate assertions
  • 99fb350: 8350654: (fs) Files.createTempDirectory should say something about the default file permissions when no file attributes specified
  • 768b024: 8350682: [JMH] vector.IndexInRangeBenchmark failed with IndexOutOfBoundsException for size=1024
  • ... and 1023 more: https://git.openjdk.org/jdk/compare/bd6d911cbe4b04221e52120cd0f8f04e219eca4d...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jerboaa, @MBaesken) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 31, 2024
@openjdk

openjdk bot commented Oct 31, 2024

@sercher The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot-runtime
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Oct 31, 2024
@mlbridge

mlbridge bot commented Oct 31, 2024

@jerboaa
Contributor

jerboaa commented Oct 31, 2024

What testing have you done? Did you run existing container tests in:

test/jdk/jdk/internal/platform
test/hotspot/jtreg/containers

As far as I can tell this breaks privileged container runs. I.e. docker run --privileged --memory 400m --memory-swap 400m ... /opt/jdk/bin/java -Xlog:os+container=trace wouldn't pick up the 400m container limit on CG v1?

@sercher
Contributor Author

sercher commented Nov 1, 2024

I've run the standard tiers (1-3), and additionally "jtreg:jdk/internal/platform/cgroup" and "gtest::cgroupTest". I now see that some of the Docker tests are failing. I am looking into it.

@sercher
Contributor Author

sercher commented Nov 1, 2024

Thanks Severin! The cause was the problematic change in the logic that skips duplicate cgroup controller mount points. The failing tests mount duplicates of the host's cgroups with --volume=/sys/fs/cgroup:/cgroup-in:ro. As they're in fact mounted read-write, the logic picked up the rw mount option and falsely detected "host mode". Also, --privileged creates rw mounts, so the entire approach needs correction. I am converting the PR to a draft for now.

@sercher sercher marked this pull request as draft November 1, 2024 13:14
@openjdk openjdk bot removed the rfr Pull request is ready for review label Nov 1, 2024
@jerboaa
Contributor

jerboaa commented Nov 4, 2024

As they're in fact mounting read-write, the logic picked up rw mount option and falsely detected "host mode". Also the --privileged creates rw mounts, so the entire approach needs correction.

Yes. See https://bugs.openjdk.org/browse/JDK-8261242 for details. This patch shouldn't change it and the logic of OSContainer::is_containerized() shouldn't change semantically in all scenarios.

@sercher
Contributor Author

sercher commented Nov 7, 2024

Here's an updated version of the patch. The long standing behavior was to leave _path uninitialized when _root is not "/" and not equal to cgroup_path. The issue can be reproduced as follows.

Create a new cgroup for memory

sudo mkdir -p /sys/fs/cgroup/memory/test

Run the following script

docker run --tty=true --rm --volume=$JAVA_HOME:/jdk --memory 400m ubuntu:latest \
    sh -c "sleep 10 ; /jdk/bin/java -Xlog:os+container=trace -version" | grep Memory\ Limit &
sleep 10
HOSTPID=$(sudo ps -ef | awk '/container=trace/ && !/docker/ && !/awk/ { print $2 }')
echo $HOSTPID | sudo tee /sys/fs/cgroup/memory/test/cgroup.procs
sleep 10

In the above script, a containerized process (/bin/sh) is moved to cgroup /test before /jdk/bin/java gets executed. Java inherits cgroup /test from its parent process, its _root will be /docker/<CONTAINER_ID>, cgroup_path will be /test.

The result would be ($JAVA_HOME points to JDK before fix)

9804
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][trace][os,container] Memory Limit failed: -2
[0.002s][trace][os,container] Memory Limit failed: -2
[0.043s][trace][os,container] Memory Limit failed: -2

JDK updated version:

10001
[0.001s][trace  ][os,container] Memory Limit is: 419430400
[0.001s][trace  ][os,container] Memory Limit is: 419430400
[0.002s][trace  ][os,container] Memory Limit is: 419430400
[0.035s][trace  ][os,container] Memory Limit is: 419430400

The updated version falls back to the mount point (only when _root is other than "/").
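The fallback described here can be sketched roughly as follows. This is not the code from the PR: the function name is hypothetical, std::string replaces HotSpot's C strings, and the handling of the "/" root (append cgroup_path as-is) is an assumption based on the description earlier in this thread.

```cpp
#include <string>

// Hypothetical sketch of the path selection with a mount-point fallback.
std::string subsystem_path_sketch(const std::string& root,
                                  const std::string& cgroup_path,
                                  const std::string& mount_point) {
    if (root == "/") {
        // Assumption: the whole hierarchy is mounted, so the cgroup path is
        // appended as-is (a bare "/" adds nothing).
        return cgroup_path == "/" ? mount_point : mount_point + cgroup_path;
    }
    if (cgroup_path.size() > root.size() &&
        cgroup_path.compare(0, root.size(), root) == 0) {
        // cgroup_path extends root: append only the part below root.
        return mount_point + cgroup_path.substr(root.size());
    }
    // root equals cgroup_path, or the two are unrelated (the process was
    // moved): fall back to the mount point instead of leaving the path unset.
    return mount_point;
}
```

For the reproduction above, a root of the form /docker/<CONTAINER_ID> and a cgroup_path of /test select the memory controller's mount point, which is where the 400m limit is visible inside the container.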

Testing

  • Standard tiers (1-3)
  • jtreg:test/jdk/jdk/internal/platform
  • jtreg:test/hotspot/jtreg/containers
  • gtest:cgroupTest

@sercher sercher marked this pull request as ready for review November 7, 2024 18:32
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 7, 2024
@jerboaa
Contributor

jerboaa commented Nov 8, 2024

Have you checked on cg v2? Is this a problem there as well?

@sercher
Contributor Author

sercher commented Nov 8, 2024

Hi Severin, thanks for this question. I didn't check cg v2 because the issue (NPE) was observed in v1 hosts only.
I believe it's because v2 uses --cgroupns=private by default, in which cgroup is mounted at hierarchy leaf, so both _root and cgroup_path are /.

It's an open question what happens if a process is moved between cgroups in v2 mode. I will look into it and file an issue if there are problems in v2.

@sercher
Contributor Author

sercher commented Nov 8, 2024

It looks to me that v2 mode is not affected, at least not in the way v1 is. In v2 mode, the cgroup is mounted either at the leaf node (private namespace), or as the complete hierarchy at /sys/fs/cgroup (host namespace).

In host mode it works right away, as the full hierarchy is accessible. With a cgroup v2 created like this:

sudo mkdir -p /sys/fs/cgroup/test
echo 200000000 | sudo tee /sys/fs/cgroup/test/memory.max

The result would be

[0.000s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][debug][os,container] Detected cgroups v2 unified hierarchy
[0.001s][trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/test
[0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/test/memory.max
[0.001s][trace][os,container] Memory Limit is: 199999488
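The reported limit of 199999488 differs slightly from the 200000000 written to memory.max. A likely explanation, assuming 4096-byte pages and that the kernel stores the limit in whole pages, is rounding down to a page boundary. The helper below is a hypothetical illustration of that arithmetic.

```cpp
#include <cstdint>

// Rounds a byte count down to a multiple of page_size (page_size must be > 0).
std::uint64_t round_down_to_page(std::uint64_t bytes, std::uint64_t page_size) {
    return bytes - (bytes % page_size);
}
```

Under that assumption, round_down_to_page(200000000, 4096) gives 199999488, while the 400m limit used elsewhere in this thread (419430400 bytes) is already page-aligned and stays unchanged.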

In the private namespace (the default setting on v2 hosts), migrating the process between cgroups may fail (a Docker issue?). It may look like the cgroup files are not mapped at all, while cgroup_path appears to be set relative to the old cgroup (the old cgroup isn't mapped though).

[0.000s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][debug][os,container] Detected cgroups v2 unified hierarchy
[0.001s][trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/../../test
[0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../../test/memory.max
[0.001s][debug][os,container] Open of file /sys/fs/cgroup/../../test/memory.max failed, No such file or directory
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][trace][os,container] Memory Limit is: -2
[0.001s][debug][os,container] container memory limit failed: -2, using host value 4105613312
[0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../../memory.max
[0.001s][debug][os,container] Open of file /sys/fs/cgroup/../../memory.max failed, No such file or directory
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][trace][os,container] Memory Limit is: -2
[0.001s][debug][os,container] container memory limit failed: -2, using host value 4105613312
[0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../memory.max
[0.001s][debug][os,container] Open of file /sys/fs/cgroup/../memory.max failed, No such file or directory
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][trace][os,container] Memory Limit is: -2
[0.001s][debug][os,container] container memory limit failed: -2, using host value 4105613312
[0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/memory.max
[0.001s][debug][os,container] Open of file /sys/fs/cgroup/memory.max failed, No such file or directory
[0.001s][trace][os,container] Memory Limit failed: -2
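The sequence of paths in the trace above is consistent with a walk up the hierarchy that drops the last path component at each step until only the mount point remains. The following sketch reproduces that sequence. The function name is hypothetical and it is inferred from the log output, not taken from the HotSpot sources.

```cpp
#include <string>
#include <vector>

// Lists the directories probed when walking from mount_point + cgroup_path up
// to mount_point, removing one trailing path component per step.
std::vector<std::string> candidate_dirs(const std::string& mount_point,
                                        std::string cgroup_path) {
    std::vector<std::string> dirs;
    while (true) {
        dirs.push_back(mount_point + cgroup_path);
        if (cgroup_path.empty()) {
            break;
        }
        const std::size_t slash = cgroup_path.rfind('/');
        if (slash == std::string::npos) {
            cgroup_path.clear();       // no separator left: go to the mount point
        } else {
            cgroup_path.erase(slash);  // drop the last component
        }
    }
    return dirs;
}
```

For the mount point /sys/fs/cgroup and the cgroup path /../../test this produces four directories, matching the four memory.max lookups in the log. None of them exist in the container, which is why every lookup fails.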

The following script

sudo docker run --tty=true --rm --volume=$JAVA_HOME:/jdk --memory 400m ubuntu:latest \
    sh -c "N=\$(ls -la /sys/fs/cgroup | wc -l) ; sleep 10 ; echo \$N ; ls -la /sys/fs/cgroup | wc -l" &
sleep 10
HOSTPID=$(sudo ps -ef | awk '/sys\/fs\/cgroup/ && !/docker/ && !/awk/ && !/grep/ { print $2 }')
echo $HOSTPID | sudo tee /sys/fs/cgroup/test/cgroup.procs > /dev/null
sleep 5

will display

74
1

This means there are no files in /sys/fs/cgroup after migration. It doesn't seem to be something that can be fixed in Java (and it doesn't have much to do with this PR either).

When moved into a subgroup, such as

sudo docker run --tty=true --rm --volume=$JAVA_HOME:/jdk --memory 400m ubuntu:latest \
    sh -c "sleep 10 ; /jdk/bin/java -Xlog:os+container=trace -version" &
sleep 5
HOSTPID=$(sudo ps -ef | awk '/container=trace/ && !/docker/ && !/awk/ { print $2 }')
CGPATH=$(cat /proc/$HOSTPID/cgroup | cut -f3 -d: )
sudo mkdir -p "/sys/fs/cgroup$CGPATH/test" 
echo $HOSTPID | sudo tee "/sys/fs/cgroup$CGPATH/test/cgroup.procs" > /dev/null
sleep 10

the cgroup will be mounted at /sys/fs/cgroup, and the correct memory limit, inherited from the parent, is displayed (thanks to the controller path adjustment).

[0.001s][debug][os,container] Detected cgroups v2 unified hierarchy
[0.001s][trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/test
[0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/test/memory.max
[0.001s][debug][os,container] Open of file /sys/fs/cgroup/test/memory.max failed, No such file or directory
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][trace][os,container] Memory Limit is: -2
[0.001s][debug][os,container] container memory limit failed: -2, using host value 4105613312
[0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/memory.max
[0.001s][trace][os,container] Memory Limit is: 419430400

@jerboaa
Contributor

jerboaa commented Nov 11, 2024

I didn't check cg v2 because the issue (NPE) was observed in v1 hosts only.

The JBS issue doesn't mention NullPointerException. It would be good to list the observed NPE issue.

@jerboaa
Contributor

jerboaa commented Nov 11, 2024

Create a new cgroup for memory

sudo mkdir -p /sys/fs/cgroup/memory/test

Run the following script

docker run --tty=true --rm --volume=$JAVA_HOME:/jdk --memory 400m ubuntu:latest \
    sh -c "sleep 10 ; /jdk/bin/java -Xlog:os+container=trace -version" | grep Memory\ Limit &
sleep 10
HOSTPID=$(sudo ps -ef | awk '/container=trace/ && !/docker/ && !/awk/ { print $2 }')
echo $HOSTPID | sudo tee /sys/fs/cgroup/memory/test/cgroup.procs
sleep 10

In the above script, a containerized process (/bin/sh) is moved to cgroup /test before /jdk/bin/java gets executed. Java inherits cgroup /test from its parent process, its _root will be /docker/<CONTAINER_ID>, cgroup_path will be /test.

OK, but why is https://bugs.openjdk.org/browse/JDK-8322420 not in effect in such a case?

The result would be ($JAVA_HOME points to JDK before fix)

9804
[0.001s][trace][os,container] Memory Limit failed: -2
[0.001s][trace][os,container] Memory Limit failed: -2
[0.002s][trace][os,container] Memory Limit failed: -2
[0.043s][trace][os,container] Memory Limit failed: -2

JDK updated version:

10001
[0.001s][trace  ][os,container] Memory Limit is: 419430400
[0.001s][trace  ][os,container] Memory Limit is: 419430400
[0.002s][trace  ][os,container] Memory Limit is: 419430400
[0.035s][trace  ][os,container] Memory Limit is: 419430400

It would be good to see the full boot JVM output at the trace level. I'm wondering why the adjustment isn't sufficient for the use-case the bug describes. I.e. if the move happens before the JVM starts then there is a chance it works OK by detecting some limit. If not it would really be useful to understand it better.

If, however, the cgroup move happens after the JVM has started, there is nothing in the JVM which "corrects" the detected physical memory (i.e. heap size et al.) and/or detected CPUs. Doing that dynamically is not supported.

@jerboaa
Contributor

jerboaa commented Nov 11, 2024

I didn't check cg v2 because the issue (NPE) was observed in v1 hosts only.

The JBS issue doesn't mention NullPointerException. It would be good to list the observed NPE issue.

I also wonder, then, if the issue is an NPE, whether JDK-8336881 would fix it. The controller adjustment doesn't yet happen on the Java (Metrics) level, only in hotspot so far.

@jerboaa
Contributor

jerboaa commented Nov 11, 2024

In the above script, a containerized process (/bin/sh) is moved to cgroup /test before /jdk/bin/java gets executed. Java inherits cgroup /test from its parent process, its _root will be /docker/<CONTAINER_ID>, cgroup_path will be /test.

OK, but why is https://bugs.openjdk.org/browse/JDK-8322420 not in effect in such a case?

Answering my own question. Because the set_subsystem_path() function for cg v1 in this unusual setup returns null.

[0.001s][trace][os,container] OSContainer::init: Initializing Container Support
[0.001s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.002s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers
[0.002s][trace][os,container] Adjusting controller path for memory: (null)
[0.002s][debug][os,container] read_string: subsystem path is null
[0.002s][trace][os,container] Memory Limit failed: -2
[0.002s][debug][os,container] read_string: subsystem path is null
[0.002s][trace][os,container] Memory Limit failed: -2
[0.002s][trace][os,container] No lower limit found for memory in hierarchy /sys/fs/cgroup/memory, adjusting to original path /test
[0.002s][debug][os,container] OSContainer::init: is_containerized() = true because all controllers are mounted read-only (container case)
[0.003s][trace][os,container] Path to /cpu.cfs_quota_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
[0.003s][trace][os,container] CPU Quota is: -1
[0.003s][trace][os,container] Path to /cpu.cfs_period_us is /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us
[0.003s][trace][os,container] CPU Period is: 100000
[0.003s][trace][os,container] OSContainer::active_processor_count: 12
[0.003s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.003s][trace][os,container] total physical memory: 67163226112
[0.003s][debug][os,container] read_string: subsystem path is null
[0.003s][trace][os,container] Memory Limit failed: -2
[0.005s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
[0.021s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 12
openjdk 24-internal 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (build 24-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)

On cg v2, on the other hand, set_subsystem_path() will never set the path to a null value.

Edit:
Yet, cg v2 will get into trouble as well. For example, with rootless podman on cg v2 you'd end up with this instead:

[0.008s][trace][os,container] OSContainer::init: Initializing Container Support
[0.008s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.008s][debug][os,container] Detected cgroups v2 unified hierarchy
[0.008s][trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/../../../../../../test
[0.008s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../../../../../../test/memory.max
[0.008s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../test/memory.max failed, No such file or directory
[0.008s][trace][os,container] Memory Limit failed: -2
[0.008s][trace][os,container] Memory Limit is: -2
[0.008s][debug][os,container] container memory limit failed: -2, using host value 6204755968
[0.008s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../../../../../../memory.max
[0.008s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../memory.max failed, No such file or directory
[0.008s][trace][os,container] Memory Limit failed: -2
[0.009s][trace][os,container] Memory Limit is: -2
[0.009s][debug][os,container] container memory limit failed: -2, using host value 6204755968
[0.009s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../../../../../memory.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../memory.max failed, No such file or directory
[0.009s][trace][os,container] Memory Limit failed: -2
[0.009s][trace][os,container] Memory Limit is: -2
[0.009s][debug][os,container] container memory limit failed: -2, using host value 6204755968
[0.009s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../../../../memory.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../memory.max failed, No such file or directory
[0.009s][trace][os,container] Memory Limit failed: -2
[0.009s][trace][os,container] Memory Limit is: -2
[0.009s][debug][os,container] container memory limit failed: -2, using host value 6204755968
[0.009s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../../../memory.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../memory.max failed, No such file or directory
[0.009s][trace][os,container] Memory Limit failed: -2
[0.009s][trace][os,container] Memory Limit is: -2
[0.009s][debug][os,container] container memory limit failed: -2, using host value 6204755968
[0.009s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../../memory.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../memory.max failed, No such file or directory
[0.009s][trace][os,container] Memory Limit failed: -2
[0.009s][trace][os,container] Memory Limit is: -2
[0.009s][debug][os,container] container memory limit failed: -2, using host value 6204755968
[0.009s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../memory.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../memory.max failed, No such file or directory
[0.009s][trace][os,container] Memory Limit failed: -2
[0.009s][trace][os,container] Memory Limit is: -2
[0.009s][debug][os,container] container memory limit failed: -2, using host value 6204755968
[0.009s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/memory.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/memory.max failed, No such file or directory
[0.009s][trace][os,container] Memory Limit failed: -2
[0.009s][trace][os,container] Memory Limit is: -2
[0.009s][debug][os,container] container memory limit failed: -2, using host value 6204755968
[0.009s][trace][os,container] No lower limit found for memory in hierarchy /sys/fs/cgroup, adjusting to original path /../../../../../../test
[0.009s][trace][os,container] Adjusting controller path for cpu: /sys/fs/cgroup/../../../../../../test
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../../../test/cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../test/cpu.max failed, No such file or directory
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../../../test/cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../test/cpu.max failed, No such file or directory
[0.009s][trace][os,container] CPU Period failed: -2
[0.009s][trace][os,container] OSContainer::active_processor_count: 6
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../../../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../cpu.max failed, No such file or directory
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../../../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../cpu.max failed, No such file or directory
[0.009s][trace][os,container] CPU Period failed: -2
[0.009s][trace][os,container] OSContainer::active_processor_count: 6
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../cpu.max failed, No such file or directory
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../cpu.max failed, No such file or directory
[0.009s][trace][os,container] CPU Period failed: -2
[0.009s][trace][os,container] OSContainer::active_processor_count: 6
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../cpu.max failed, No such file or directory
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../cpu.max failed, No such file or directory
[0.009s][trace][os,container] CPU Period failed: -2
[0.009s][trace][os,container] OSContainer::active_processor_count: 6
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../cpu.max failed, No such file or directory
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../cpu.max failed, No such file or directory
[0.009s][trace][os,container] CPU Period failed: -2
[0.009s][trace][os,container] OSContainer::active_processor_count: 6
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../cpu.max failed, No such file or directory
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../cpu.max failed, No such file or directory
[0.009s][trace][os,container] CPU Period failed: -2
[0.009s][trace][os,container] OSContainer::active_processor_count: 6
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../cpu.max failed, No such file or directory
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../cpu.max failed, No such file or directory
[0.009s][trace][os,container] CPU Period failed: -2
[0.009s][trace][os,container] OSContainer::active_processor_count: 6
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/cpu.max failed, No such file or directory
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/cpu.max failed, No such file or directory
[0.009s][trace][os,container] CPU Period failed: -2
[0.009s][trace][os,container] OSContainer::active_processor_count: 6
[0.009s][trace][os,container] No lower limit found for cpu in hierarchy /sys/fs/cgroup, adjusting to original path /../../../../../../test
[0.009s][debug][os,container] OSContainer::init: is_containerized() = true because all controllers are mounted read-only (container case)
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../../../test/cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../test/cpu.max failed, No such file or directory
[0.009s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../../../test/cpu.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../test/cpu.max failed, No such file or directory
[0.009s][trace][os,container] CPU Period failed: -2
[0.009s][trace][os,container] OSContainer::active_processor_count: 6
[0.009s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 6
[0.009s][trace][os,container] total physical memory: 6204755968
[0.009s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../../../../../../test/memory.max
[0.009s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../test/memory.max failed, No such file or directory
[0.009s][trace][os,container] Memory Limit failed: -2
[0.009s][trace][os,container] Memory Limit is: -2
[0.009s][debug][os,container] container memory limit failed: -2, using host value 6204755968
[0.011s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 6
[0.104s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../../../test/cpu.max
[0.105s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../test/cpu.max failed, No such file or directory
[0.105s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/../../../../../../test/cpu.max
[0.105s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../test/cpu.max failed, No such file or directory
[0.105s][trace][os,container] CPU Period failed: -2
[0.105s][trace][os,container] OSContainer::active_processor_count: 6
[0.112s][trace][os,container] total physical memory: 6204755968
[0.112s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../../../../../../test/memory.max
[0.112s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../test/memory.max failed, No such file or directory
[0.112s][trace][os,container] Memory Limit failed: -2
[0.112s][trace][os,container] Memory Limit is: -2
[0.112s][debug][os,container] container memory limit failed: -2, using host value 6204755968
openjdk version "24-internal" 2025-03-18
OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.sgehwolf.jdk-jdk)
OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)

@jerboaa
Contributor

jerboaa commented Nov 11, 2024

So on cg v1 you start out and end with a subsystem_path() == null and on cg v2 you start out and end with a subsystem_path() == /../../../../../../test. In both cases the memory limit of 400m won't be detected.
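
To illustrate why the cg v2 lookups in the trace above fail, here is a minimal C++17 sketch. It assumes the file path is built by plain concatenation of the mount point and the cgroup path, as the trace lines suggest. effective_cgroup_dir is a hypothetical helper for illustration, not a HotSpot function. Collapsing the ".." components lexically approximates where the kernel ends up looking, because ".." at the root directory stays at the root.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper (not part of HotSpot): joins a controller mount point
// and a cgroup path by plain concatenation, then collapses "." and ".."
// components lexically, without touching the file system or symlinks.
std::string effective_cgroup_dir(const std::string& mount_point,
                                 const std::string& cgroup_path) {
  // Assumption: cgroup paths are absolute, as in /proc/self/cgroup.
  assert(!cgroup_path.empty() && cgroup_path[0] == '/');
  const std::filesystem::path joined(mount_point + cgroup_path);
  return joined.lexically_normal().generic_string();
}
```

With the mount point /sys/fs/cgroup and the cgroup path /../../../../../../test, the helper returns /test, a directory outside the cgroup mount. That is consistent with the "No such file or directory" messages in the trace.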

@sercher
Contributor Author

sercher commented Nov 11, 2024

On cg v2, on the other hand, set_subsystem_path() will never set the path to a null value.

Exactly. That's why JDK-8322420 is not in effect, and JDK-8336881 does not fix it on the Java side either (the path stays uninitialized under certain conditions).

@jerboaa
Contributor

jerboaa commented Feb 25, 2025

@sercher As far as I can see, this is a fairly simple case which would be covered by a simpler patch. My remark was in regard to my comment here, to which you replied with this answer. I don't see where anything you've described in your answer is being tested in a way that covers this new code:

https://github.com/openjdk/jdk/pull/21808/files#diff-8910f554ed4a7bc465e01679328b3e9bd64ceaa6c85f00f0c575670e748ebba9R63-R77

That is, some sub-group without a match, but some with one.

Comment on lines 38 to 42
* Set directory to subsystem specific files based
* on the contents of the mountinfo and cgroup files.
* When runs in a container, the method handles the case
* when a process is moved between cgroups.
*/
Contributor

This needs to explain exactly what is happening when. The current comment isn't even remotely explaining in detail what it does. What does "... handles the case when a process is moved between cgroups" mean exactly?

Contributor Author

This needs to explain exactly what is happening when. The current comment isn't even remotely explaining in detail what it does. What does "... handles the case when a process is moved between cgroups" mean exactly?

It should be either a high-level comment, such as in your suggestion here, or a detailed description of what happens where. Could you please be more specific about what kind of description is required and where? Please note that the method has inline comments that are fairly self-describing. In the meantime I'll try to add a description of what "a process is moved between cgroups" means exactly.

Contributor

A high-level description that states what the function does. The inline comments are not very self-describing for anyone who has not written the patch.

Example:

Sets the subsystem path for a controller. The common container case is
handled by XXX. When a process has been moved from a cgroup to another
the following happens:
 - A
 - B
 - C

I believe this is desperately needed.

Contributor Author

Please see the updated comment.

@sercher
Contributor Author

sercher commented Feb 25, 2025

@sercher As far as I can see, this is a fairly simple case which would be covered by a simpler patch. My remark was in regard to my comment here, to which you replied with this answer. I don't see where anything you've described in your answer is being tested in a way that covers this new code:

https://github.com/openjdk/jdk/pull/21808/files#diff-8910f554ed4a7bc465e01679328b3e9bd64ceaa6c85f00f0c575670e748ebba9R63-R77

The code fragment you mentioned is executed under the condition at line 62, that is, when _root and cgroup_path are not equal. This happens exactly when a process PID is written to the cgroup.procs file in the directory belonging to a certain control group G, i.e. when the process is moved to control group G. Now that _root and cgroup_path are not equal, as in my response here, i.e.

_root: /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c
cgroup_path: /system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c

the loop at lines 67-77 determines the "subgroup" part of the above cgroup_path, producing the debug message at line 73. The above case is CloudFoundry-specific, while in the default Docker setup it will be

_root: /docker/cc32e455402a8c98d1df6a81c685a540e7e528e714c981b10845c31b64d8a370
cgroup_path: /docker/cc32e455402a8c98d1df6a81c685a540e7e528e714c981b10845c31b64d8a370/test

In both cases we need to determine which trailing part of cgroup_path is the actual subgroup path, because that is how we find a directory that exists in the container. It is not known in advance whether the subgroup path is

/bad/2f57368b-0eda-4e52-64d8-af5c

or

/garden/bad/2f57368b-0eda-4e52-64d8-af5c

and then the cgroup files path is either

/sys/fs/cgroup/cpu,cpuacct/bad/2f57368b-0eda-4e52-64d8-af5c

or

/sys/fs/cgroup/cpu,cpuacct/garden/bad/2f57368b-0eda-4e52-64d8-af5c

The docker case is not any different from the CF case. I therefore suggest this case is covered by TestMemoryWithSubgroups#testMemoryLimitSubgroupV1 as noted previously. Hope this helps.
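
The search for the trailing subgroup part described above can be sketched as follows. This is a hypothetical C++ illustration under stated assumptions, not the code from the patch: find_subgroup_path and the dir_exists callback are made-up names, and the callback stands in for whatever existence check the real code performs, so the sketch can be tried without a cgroup mount.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical sketch: try successively shorter trailing parts of
// cgroup_path and return the first suffix for which mount_point + suffix
// passes the dir_exists check. Returns an empty string if none matches.
std::string find_subgroup_path(
    const std::string& mount_point,
    const std::string& cgroup_path,
    const std::function<bool(const std::string&)>& dir_exists) {
  // Assumption: cgroup_path is absolute, as in /proc/self/cgroup.
  assert(!cgroup_path.empty() && cgroup_path[0] == '/');
  std::string::size_type pos = 0;
  while (pos != std::string::npos) {
    const std::string suffix = cgroup_path.substr(pos);
    if (dir_exists(mount_point + suffix)) {
      return suffix;
    }
    // Drop the leading path component and retry with the remainder.
    pos = cgroup_path.find('/', pos + 1);
  }
  return std::string();
}
```

For the CloudFoundry paths above, a check that only accepts /sys/fs/cgroup/cpu,cpuacct/bad/2f57368b-0eda-4e52-64d8-af5c makes the function return /bad/2f57368b-0eda-4e52-64d8-af5c. For the Docker paths, a check that only accepts the directory ending in /test makes it return /test. Longer suffixes are tried first, so the deepest candidate that exists wins.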

That is, some sub-group without a match, but some with one.

@sercher sercher requested a review from MBaesken February 27, 2025 14:40
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2020, 2022, Red Hat Inc.
+ * Copyright (c) 2020, 2024, Red Hat Inc.
Member

@MBaesken MBaesken Feb 28, 2025

Guess this must be 2025 now? Same for other files ...

Contributor

yes indeed.

Contributor Author

indeed, updated the copyright headers.

@@ -64,15 +64,28 @@ public void testCgPathNonEmptyRoot() {
assertEquals(expectedPath, ctrl.path());
}

/*
* Less common cases: Containers
*/
Member

Not really sure why this comment was added; is it referring to the 'container mode' mentioned in the comment above in another file?

Contributor Author

It's because there are already "Common case: Containers" and "Common case: Host". The old test testCgPathSubstring() and the new test testCgPathToMovedPath() do not belong to "Common case: Host" that comes just before them.

Contributor

@jerboaa jerboaa left a comment

OK. I had another look and the Docker test TestMemoryWithSubgroups.java does indeed cover this case for cg v1.

Please update copyright years to 2025 and this should be good to go (FYI: I'll be away the next week).

Contributor

@jerboaa jerboaa left a comment

OK for me now. test_cgroupSubsystem_linux.cpp needs a copyright update as well.

@sercher
Contributor Author

sercher commented Mar 2, 2025

OK for me now. test_cgroupSubsystem_linux.cpp needs a copyright update as well.

Thanks for your review, @jerboaa! I checked test_cgroupSubsystem_linux.cpp; it's already updated to 2025 in the master branch.

@jerboaa
Contributor

jerboaa commented Mar 3, 2025

OK for me now. test_cgroupSubsystem_linux.cpp needs a copyright update as well.

Thanks for your review, @jerboaa! I checked test_cgroupSubsystem_linux.cpp; it's already updated to 2025 in the master branch.

OK!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 4, 2025
@sercher
Contributor Author

sercher commented Mar 4, 2025

/contributor add sgehwolf

@sercher
Contributor Author

sercher commented Mar 4, 2025

/integrate

@openjdk

openjdk bot commented Mar 4, 2025

@sercher
Contributor Severin Gehwolf <sgehwolf@openjdk.org> successfully added.

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Mar 4, 2025
@openjdk

openjdk bot commented Mar 4, 2025

@sercher
Your change (at version bae78ff) is now ready to be sponsored by a Committer.

@dchuyko
Member

dchuyko commented Mar 5, 2025

/sponsor

@openjdk

openjdk bot commented Mar 5, 2025

Going to push as commit de29ef3.
Since your change was applied there have been 1052 commits pushed to the master branch:

  • 75f028b: 8348657: compiler/loopopts/superword/TestEquivalentInvariants.java timed out
  • b1a21b5: 8351101: RISC-V: C2: Small improvement to MacroAssembler::revb
  • 62fa33a: 8351158: Incorrect APX EGPR register save ordering
  • 20ea218: 8336042: Caller/callee param size mismatch in deoptimization causes crash
  • 38b4d46: 8351081: Off-by-one error in ShenandoahCardCluster
  • 29de20d: 8280991: [XWayland] No displayChanged event after setDisplayMode call
  • 3e86b3a: 8350013: Add a test for JDK-8150442
  • a21302b: 8351036: [JVMCI] value not an s2: -32776
  • 0753376: 8297531: sun/security/krb5/MicroTime.java fails with "Exception: What? only 100 musec precision?"
  • 5598792: 8351064: JFR: Consistent timestamps
  • ... and 1042 more: https://git.openjdk.org/jdk/compare/bd6d911cbe4b04221e52120cd0f8f04e219eca4d...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 5, 2025
@openjdk openjdk bot closed this Mar 5, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Mar 5, 2025
@openjdk

openjdk bot commented Mar 5, 2025

@dchuyko @sercher Pushed as commit de29ef3.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dholmes-ora
Member

dholmes-ora commented Mar 7, 2025

@sercher your new test is failing in our CI:

[STDOUT]
mkdir: cannot create directory '/sys/fs/cgroup/memory/test': Permission denied
sh: /sys/fs/cgroup/memory/test/memory.limit_in_bytes: No such file or directory
sh: /sys/fs/cgroup/memory/test/cgroup.procs: No such file or directory

I will file a new bug - JDK-8351382
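
The three failing commands appear to be the cgroup v1 move discussed earlier in this thread: create a subgroup directory, write a memory limit, then write a PID to cgroup.procs. Below is a hedged C++17 sketch of that sequence. The function names are hypothetical, and the controller directory is a parameter so the steps can be tried against a scratch directory instead of a real /sys/fs/cgroup/memory mount. Unlike the shell output above, it stops as soon as the directory cannot be created.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper: writes a single value followed by a newline.
static bool write_value(const std::filesystem::path& file,
                        const std::string& value) {
  std::ofstream out(file);
  if (!out) {
    return false;
  }
  out << value << '\n';
  out.flush();
  return static_cast<bool>(out);
}

// Hypothetical sketch of moving a process into a cgroup v1 subgroup:
// 1. create <controller_dir>/<subgroup>,
// 2. write the limit to memory.limit_in_bytes,
// 3. write the PID to cgroup.procs.
// Returns false at the first step that fails, for example when the
// directory cannot be created ("Permission denied").
bool move_into_subgroup(const std::filesystem::path& controller_dir,
                        const std::string& subgroup,
                        long long limit_bytes,
                        long pid) {
  assert(!subgroup.empty());
  const std::filesystem::path dir = controller_dir / subgroup;
  std::error_code ec;
  std::filesystem::create_directories(dir, ec);
  if (ec) {
    return false;
  }
  if (!write_value(dir / "memory.limit_in_bytes",
                   std::to_string(limit_bytes))) {
    return false;
  }
  return write_value(dir / "cgroup.procs", std::to_string(pid));
}
```

On a real cgroup v1 mount the kernel interprets these writes. In a scratch directory they only create ordinary files, which is enough to check the order of the steps and the early exit on failure.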

@sercher
Contributor Author

sercher commented Mar 7, 2025

@dholmes-ora I submitted a fix here. Could you please re-run the tests in your CI?

Labels
core-libs core-libs-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
6 participants