8230305: Cgroups v2: Container awareness #840

Closed

Conversation

zhengyu123
Contributor

@zhengyu123 zhengyu123 commented Feb 24, 2022

I would like to backport cgroups v2 support to openjdk11u.

The original patch does not apply cleanly; conflicts were resolved manually.

Test:

  • jtreg containers/docker test on Ubuntu 20.04.4 LTS

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issues

  • JDK-8230305: Cgroups v2: Container awareness
  • JDK-8229202: Docker reporting causes secondary crashes in error handling
  • JDK-8216366: Add rationale to PER_CPU_SHARES define

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk11u-dev pull/840/head:pull/840
$ git checkout pull/840

Update a local copy of the PR:
$ git checkout pull/840
$ git pull https://git.openjdk.java.net/jdk11u-dev pull/840/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 840

View PR using the GUI difftool:
$ git pr show -t 840

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk11u-dev/pull/840.diff

@bridgekeeper

bridgekeeper bot commented Feb 24, 2022

👋 Welcome back zgu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title from "Backport d462a6b5c9bd3dae5257cca42ea38c19cb742e3c" to "8230305: Cgroups v2: Container awareness" Feb 24, 2022
@openjdk

openjdk bot commented Feb 24, 2022

This backport pull request has now been updated with issue and summary from the original commit.

@openjdk openjdk bot added the backport and rfr (Pull request is ready for review) labels Feb 24, 2022
@mlbridge

mlbridge bot commented Feb 24, 2022

Webrevs

jerboaa
jerboaa approved these changes Mar 4, 2022
Contributor

@jerboaa jerboaa left a comment

This seems fine. A few minor comments.

Aside: We need to devise a plan for integrating this set of patches. For example, I'm seeing failures in the hotspot container tests on a cgroups v2 system due to https://bugs.openjdk.java.net/browse/JDK-8253714.

@@ -68,8 +67,7 @@ class OSContainer: AllStatic {
};

inline bool OSContainer::is_containerized() {
assert(_is_initialized, "OSContainer not initialized");
Contributor

@jerboaa jerboaa Mar 4, 2022

This is essentially a backport of https://bugs.openjdk.java.net/browse/JDK-8229202. Please mention that issue in the backport (/issue add JDK-8229202). I'm not sure whether this will work for a backport.

/*
* PER_CPU_SHARES has been set to 1024 because CPU shares' quota
* is commonly used in cloud frameworks like Kubernetes[1],
* AWS[2] and Mesos[3] in a similar way. They spawn containers with
* --cpu-shares option values scaled by PER_CPU_SHARES. Thus, we do
* the inverse for determining the number of possible available
* CPUs to the JVM inside a container. See JDK-8216366.
*
* [1] https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu
* In particular:
* When using Docker:
* The spec.containers[].resources.requests.cpu is converted to its core value, which is potentially
* fractional, and multiplied by 1024. The greater of this number or 2 is used as the value of the
* --cpu-shares flag in the docker run command.
* [2] https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerDefinition.html
* [3] https://github.com/apache/mesos/blob/3478e344fb77d931f6122980c6e94cd3913c441d/src/docker/docker.cpp#L648
* https://github.com/apache/mesos/blob/3478e344fb77d931f6122980c6e94cd3913c441d/src/slave/containerizer/mesos/isolators/cgroups/constants.hpp#L30
*/
Contributor

@jerboaa jerboaa Mar 4, 2022

This is essentially a backport of https://bugs.openjdk.java.net/browse/JDK-8216366. Perhaps add this issue as part of the backport (/issue add JDK-8216366).
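
To make the rationale in the PER_CPU_SHARES comment concrete, here is a minimal sketch of the inverse mapping it describes. Only the value 1024 comes from the comment under review; the helper name `cpus_from_shares` is hypothetical, rounding up to a whole CPU is an assumption of this sketch, and none of this is the HotSpot code itself.

```cpp
#include <cassert>
#include <cmath>

// Value taken from the comment under review: frameworks pass roughly
// (requested CPUs * 1024) as the --cpu-shares value.
constexpr int PER_CPU_SHARES = 1024;

// Hypothetical helper for illustration only. Inverts the scaling to estimate
// how many CPUs a given --cpu-shares value was meant to express. Rounding up
// (an assumption here) means a fractional request still yields one CPU.
int cpus_from_shares(int cpu_shares) {
  assert(cpu_shares > 0 && "caller must handle the 'no shares set' case");
  return static_cast<int>(
      std::ceil(static_cast<double>(cpu_shares) / PER_CPU_SHARES));
}
```

Under these assumptions, a container started with `--cpu-shares=2048` maps to 2 CPUs, and the minimum value of 2 mentioned in the Kubernetes note maps to 1 CPU.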

@openjdk

openjdk bot commented Mar 4, 2022

@zhengyu123 This change now passes all automated pre-integration checks.

After integration, the commit message for the final commit will be:

8230305: Cgroups v2: Container awareness
8229202: Docker reporting causes secondary crashes in error handling
8216366: Add rationale to PER_CPU_SHARES define

Implement Cgroups v2 container awareness in hotspot

Reviewed-by: sgehwolf

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Mar 4, 2022
@zhengyu123
Contributor Author

zhengyu123 commented Mar 4, 2022

/issue add JDK-8229202
/issue add JDK-8216366

@openjdk

openjdk bot commented Mar 4, 2022

@zhengyu123
Adding additional issue to issue list: 8229202: Docker reporting causes secondary crashes in error handling.

@openjdk

openjdk bot commented Mar 4, 2022

@zhengyu123
Adding additional issue to issue list: 8216366: Add rationale to PER_CPU_SHARES define.

@zhengyu123
Contributor Author

zhengyu123 commented Mar 4, 2022

@jerboaa Thanks for the review.

@jmtd

jmtd commented Mar 11, 2022

For me, PlainRead.java is failing on a pure cgroups v2 system (passes on v1):

;$JT_HOME/bin/jtreg -jdk:build/linux-x86_64-normal-server-release/jdk test/hotspot/jtreg/containers/cgroup/PlainRead.java
Test results: failed: 1

(I noticed this in a downstream backport and traced it back up here)

I've put my JTreport here: https://jmtd.net/tmp/jdk11u_8230305_PlainRead_jtreport/JTreport/html/index.html

Edit: @jerboaa points out that this is probably JDK-8278951, although backporting that does not fix the test for me locally. I'll keep exploring.

Edit 2: fails for me on jdk17u-dev master, passes on jdk master.

@mlbridge

mlbridge bot commented Mar 11, 2022

Mailing list message from Zhengyu Gu on jdk-updates-dev:

On 3/11/22 10:52, Jonathan Dowland wrote:

> On Thu, 24 Feb 2022 21:18:12 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:
>
>> I would like backport cgroups v2 support to openjdk11u.
>>
>> The original patch does not apply cleanly, conflicts are resolved manually.
>>
>> Test:
>> - [x] jtreg containers/docker test on Ubuntu 20.04.4 LTS
>
> For me, `PlainRead.java` is failing on a pure cgroups v2 system (passes on v1):
>
> ;$JT_HOME/bin/jtreg -jdk:build/linux-x86_64-normal-server-release/jdk test/hotspot/jtreg/containers/cgroup/PlainRead.java
> Test results: failed: 1

Yes, I noticed. I would suggest you postpone your backport until the
first four patches are pushed.

-Zhengyu

@jerboaa
Contributor

jerboaa commented Mar 14, 2022

> For me, PlainRead.java is failing on a pure cgroups v2 system (passes on v1):
>
> ;$JT_HOME/bin/jtreg -jdk:build/linux-x86_64-normal-server-release/jdk test/hotspot/jtreg/containers/cgroup/PlainRead.java
> Test results: failed: 1
>
> (I noticed this in a downstream backport and traced it back up here)
>
> I've put my JTreport here: https://jmtd.net/tmp/jdk11u_8230305_PlainRead_jtreport/JTreport/html/index.html
>
> Edit: @jerboaa points out that this is probably JDK-8278951, although backporting that does not fix the test for me locally. I'll keep exploring.
>
> Edit 2: fails for me on jdk17u-dev master, passes on jdk master.

Note that the PlainRead.java test depends on proper setup of the cpu controller on your cgroups v2 system (unlike cgroups v1). It passes for me on my configured cgroups v2 system:

# /home/sgehwolf/jdk11u-dev/../jdk11u-jdk/bin/java -cp /home/sgehwolf/jdk11u-dev/JTwork/classes/containers/cgroup/PlainRead.d:/home/sgehwolf/jdk11u-dev/test/hotspot/jtreg/containers/cgroup:/home/sgehwolf/jdk11u-dev/JTwork/classes/testlibrary:/home/sgehwolf/jdk11u-dev/test/hotspot/jtreg/testlibrary:/home/sgehwolf/jdk11u-dev/JTwork/classes/test/lib:/home/sgehwolf/jdk11u-dev/test/lib:/home/sgehwolf/jtreg-5/lib/javatest.jar:/home/sgehwolf/jtreg-5/lib/jtreg.jar -Xlog:os+container=trace -version
[0.001s][trace][os,container] OSContainer::init: Initializing Container Support
[0.001s][debug][os,container] Detected cgroups v2 unified hierarchy
[0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-2.scope/memory.max
[0.001s][trace][os,container] Raw value for memory limit is: max
[0.001s][trace][os,container] Memory Limit is: Unlimited
[0.002s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-2.scope/cpu.max
[0.002s][trace][os,container] Raw value for CPU quota is: max
[0.002s][trace][os,container] CPU Quota is: -1
[0.002s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-2.scope/cpu.max
[0.002s][trace][os,container] CPU Period is: 100000
[0.002s][trace][os,container] Path to /cpu.weight is /sys/fs/cgroup/user.slice/user-1000.slice/session-2.scope/cpu.weight
[0.002s][trace][os,container] Raw value for CPU shares is: 100
[0.002s][debug][os,container] CPU Shares is: -1
[0.002s][trace][os,container] OSContainer::active_processor_count: 4
[0.002s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4
[0.002s][debug][os,container] container memory limit unlimited: -1, using host value
[0.002s][debug][os,container] container memory limit unlimited: -1, using host value
[0.003s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4
[0.072s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-2.scope/cpu.max
[0.072s][trace][os,container] Raw value for CPU quota is: max
[0.072s][trace][os,container] CPU Quota is: -1
[0.072s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-2.scope/cpu.max
[0.072s][trace][os,container] CPU Period is: 100000
[0.072s][trace][os,container] Path to /cpu.weight is /sys/fs/cgroup/user.slice/user-1000.slice/session-2.scope/cpu.weight
[0.072s][trace][os,container] Raw value for CPU shares is: 100
[0.072s][debug][os,container] CPU Shares is: -1
[0.072s][trace][os,container] OSContainer::active_processor_count: 4
[0.095s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-2.scope/memory.max
[0.095s][trace][os,container] Raw value for memory limit is: max
[0.095s][trace][os,container] Memory Limit is: Unlimited
[0.095s][debug][os,container] container memory limit unlimited: -1, using host value
[0.095s][debug][os,container] container memory limit unlimited: -1, using host value
[0.096s][debug][os,container] container memory limit unlimited: -1, using host value
[0.096s][debug][os,container] container memory limit unlimited: -1, using host value
[0.097s][debug][os,container] container memory limit unlimited: -1, using host value
[0.097s][debug][os,container] container memory limit unlimited: -1, using host value
[0.098s][debug][os,container] container memory limit unlimited: -1, using host value
[0.098s][debug][os,container] container memory limit unlimited: -1, using host value
[0.100s][debug][os,container] container memory limit unlimited: -1, using host value
[0.100s][debug][os,container] container memory limit unlimited: -1, using host value
[0.101s][debug][os,container] container memory limit unlimited: -1, using host value
[0.103s][debug][os,container] container memory limit unlimited: -1, using host value
[0.104s][debug][os,container] container memory limit unlimited: -1, using host value
[0.104s][debug][os,container] container memory limit unlimited: -1, using host value
[0.104s][debug][os,container] container memory limit unlimited: -1, using host value
[0.105s][debug][os,container] container memory limit unlimited: -1, using host value
[0.105s][debug][os,container] container memory limit unlimited: -1, using host value
[0.106s][debug][os,container] container memory limit unlimited: -1, using host value
[0.107s][debug][os,container] container memory limit unlimited: -1, using host value
[0.108s][debug][os,container] container memory limit unlimited: -1, using host value
[0.108s][debug][os,container] container memory limit unlimited: -1, using host value
[0.108s][debug][os,container] container memory limit unlimited: -1, using host value
[0.109s][debug][os,container] container memory limit unlimited: -1, using host value
[0.114s][debug][os,container] container memory limit unlimited: -1, using host value
[0.115s][debug][os,container] container memory limit unlimited: -1, using host value
[0.118s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-2.scope/memory.max
[0.118s][trace][os,container] Raw value for memory limit is: max
[0.118s][trace][os,container] Memory Limit is: Unlimited
[0.118s][debug][os,container] container memory limit unlimited: -1, using host value
[0.120s][debug][os,container] container memory limit unlimited: -1, using host value
[0.121s][debug][os,container] container memory limit unlimited: -1, using host value
[0.121s][debug][os,container] container memory limit unlimited: -1, using host value
[0.122s][debug][os,container] container memory limit unlimited: -1, using host value
[0.123s][debug][os,container] container memory limit unlimited: -1, using host value
[0.123s][debug][os,container] container memory limit unlimited: -1, using host value
[0.124s][debug][os,container] container memory limit unlimited: -1, using host value
[0.125s][debug][os,container] container memory limit unlimited: -1, using host value
[0.131s][debug][os,container] container memory limit unlimited: -1, using host value
[0.132s][debug][os,container] container memory limit unlimited: -1, using host value
[0.135s][debug][os,container] container memory limit unlimited: -1, using host value
[0.137s][debug][os,container] container memory limit unlimited: -1, using host value
[0.137s][debug][os,container] container memory limit unlimited: -1, using host value
[0.138s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-2.scope/memory.max
[0.138s][trace][os,container] Raw value for memory limit is: max
[0.138s][trace][os,container] Memory Limit is: Unlimited
[0.138s][debug][os,container] container memory limit unlimited: -1, using host value
[0.138s][debug][os,container] container memory limit unlimited: -1, using host value
openjdk version "11.0.15-internal" 2022-04-19
OpenJDK Runtime Environment (fastdebug build 11.0.15-internal+0-adhoc.sgehwolf.jdk11u-dev)
OpenJDK 64-Bit Server VM (fastdebug build 11.0.15-internal+0-adhoc.sgehwolf.jdk11u-dev, mixed mode)
[0.142s][debug][os,container] container memory limit unlimited: -1, using host value
[0.142s][debug][os,container] container memory limit unlimited: -1, using host value

This shouldn't block this backport.
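
The trace output above shows the raw `cpu.max` value `max` being reported as a CPU quota of -1 with a period of 100000. A minimal sketch of that kind of parsing follows, assuming the file holds a single `<quota> <period>` line as in the cgroups v2 interface. `CpuMax` and `parse_cpu_max` are hypothetical names chosen for illustration and are not taken from the HotSpot sources.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch. The -1 value mirrors the
// "CPU Quota is: -1" convention visible in the trace output.
struct CpuMax {
  long quota;   // microseconds per period, or -1 when the file says "max"
  long period;  // microseconds
};

// Parse the single line of a cgroups v2 cpu.max file, for example
// "max 100000" or "200000 100000". Returns false on malformed input
// and leaves `out` untouched in that case.
bool parse_cpu_max(const std::string& line, CpuMax& out) {
  std::istringstream in(line);
  std::string quota_token;
  long period = 0;
  if (!(in >> quota_token >> period) || period <= 0) {
    return false;
  }
  long quota = -1;  // "max" means no quota is configured
  if (quota_token != "max") {
    try {
      std::size_t pos = 0;
      quota = std::stol(quota_token, &pos);
      if (pos != quota_token.size()) {
        return false;  // trailing garbage after the number
      }
    } catch (const std::exception&) {
      return false;  // not a number, or out of range
    }
  }
  out.quota = quota;
  out.period = period;
  return true;
}
```

Reading the file contents (for instance the `/sys/fs/cgroup/.../cpu.max` path printed in the trace) is left to the caller, so the parsing step can be exercised without a cgroups v2 host.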

@zhengyu123
Contributor Author

zhengyu123 commented Mar 23, 2022

/integrate

@openjdk

openjdk bot commented Mar 23, 2022

Going to push as commit 8715f37.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Mar 23, 2022
@openjdk openjdk bot closed this Mar 23, 2022
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Mar 23, 2022
@openjdk

openjdk bot commented Mar 23, 2022

@zhengyu123 Pushed as commit 8715f37.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
