Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8245543: Cgroups: Incorrect detection logic on some systems (still reproducible) #485

Closed

Conversation

@jerboaa
Copy link
Contributor

@jerboaa jerboaa commented Oct 2, 2020

An issue similar to JDK-8239559 has been discovered. On the affected system, /proc/self/mountinfo
contains a line such as this one:

35 26 0:26 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup systemd rw,name=systemd 

Note that /proc/cgroups looks like this:

#subsys_name	hierarchy	num_cgroups	enabled
cpuset	0	1	1
cpu	0	1	1
cpuacct	0	1	1
blkio	0	1	1
memory	0	1	1
devices	0	1	1
freezer	0	1	1
net_cls	0	1	1
perf_event	0	1	1
net_prio	0	1	1
hugetlb	0	1	1
pids	0	1	1

This is in fact a cgroups v1 system with systemd put into a separate namespace via FS type cgroup. That mountinfo line confuses the cgroups version detection heuristic. It only looked whether or not there is any entry in mountinfo of FS-type cgroup . That's wrong. A better heuristic would be looking at controllers we care about: cpu, cpuacct, memory, cpuset for hotspot and some more for the Java side. The proposed fix on the hotspot side is to set any_cgroup_mounts_found to true only if one of those controllers show up in mountinfo. Otherwise, we know it's cgroup v2 due to the zero hierarchy ids. The fix on the Java side is similar.

For the Java side the proposal is also to do the parsing of the cgroup files in the factory now (single pass of files). No longer in the version specific objects. In order to not regress here, I've added more tests.

Testing

  • Linux x86_64 container tests on cgroup v1 (fastdebug)
  • Linux x86_64 container tests on cgroup v2 (fastdebug)
  • Added regression test which fails prior the patch and passes after
  • Submit testing

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux x64 Windows x64 macOS x64
Build (2/5 failed) ✔️ (2/2 passed) ✔️ (2/2 passed)
Build / test ✔️ (0/0 passed) ✔️ (0/0 passed) ✔️ (0/0 passed)
Test (tier1) ✔️ (9/9 passed) ✔️ (9/9 passed)

Failed test tasks

Issue

  • JDK-8245543: Cgroups: Incorrect detection logic on some systems (still reproducible)

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/485/head:pull/485
$ git checkout pull/485

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Oct 2, 2020

👋 Welcome back sgehwolf! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Loading

@openjdk openjdk bot added the rfr label Oct 2, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Oct 2, 2020

@jerboaa The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot-runtime
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

Loading

@jerboaa jerboaa changed the title 8245543: Cgroups: Incorrect detection logic on some systems 8245543: Cgroups: Incorrect detection logic on some systems (still reproducible) Oct 2, 2020
@jerboaa
Copy link
Contributor Author

@jerboaa jerboaa commented Oct 2, 2020

@bulasevich Do you want to give this a spin?

Loading

@mlbridge
Copy link

@mlbridge mlbridge bot commented Oct 2, 2020

Webrevs

Loading

@bulasevich
Copy link
Contributor

@bulasevich bulasevich commented Oct 4, 2020

The change looks reasonable. I checked the fail - it is gone with the change! And both jtreg tests passed.

Loading

@jerboaa
Copy link
Contributor Author

@jerboaa jerboaa commented Oct 5, 2020

The change looks reasonable. I checked the fail - it is gone with the change! And both jtreg tests passed.

Thanks for the review and for testing it!

Loading

@jerboaa
Copy link
Contributor Author

@jerboaa jerboaa commented Oct 5, 2020

@bobvandette Could you perhaps have a look at this? Many thanks in advance!

Loading

@jerboaa jerboaa force-pushed the jdk-8245543-cgroup-detect-bug branch from 5e2ca2d to e72ac21 Oct 8, 2020
@jerboaa
Copy link
Contributor Author

@jerboaa jerboaa commented Oct 8, 2020

@bobvandette How about this?

Loading

@jerboaa
Copy link
Contributor Author

@jerboaa jerboaa commented Oct 8, 2020

Thanks for the review!

Loading

Copy link
Contributor

@shipilev shipilev left a comment

This looks fine to me. Consider a minor nit below.

Loading

@openjdk
Copy link

@openjdk openjdk bot commented Oct 9, 2020

@jerboaa This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8245543: Cgroups: Incorrect detection logic on some systems (still reproducible)

Reviewed-by: bobv, shade

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 8 new commits pushed to the master branch:

  • aaa0a2a: 8254297: Zero and Minimal VMs are broken with undeclared identifier 'DerivedPointerTable' after JDK-8253180
  • 7e80c98: 8254261: fix javadocs in jdk.test.lib.Utils
  • d4b5dfd: 8253857: Shenandoah: Bugs in ShenandoahEvacOOMHandler related code
  • e9c1905: 8253740: [PPC64] Minor interpreter cleanup
  • b1448da: 8253900: SA: wrong size computation when JVM was built without AOT
  • 2bc8bc5: 8254265: s390 and linux 32 bit builds broken
  • 4f9a1ff: 8254073: Tokenizer improvements (revised)
  • 9cecc16: 8254244: Some code emitted by TemplateTable::branch is unused when running TieredCompilation

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

Loading

@openjdk openjdk bot added the ready label Oct 9, 2020
@jerboaa jerboaa force-pushed the jdk-8245543-cgroup-detect-bug branch from c18be73 to 7a988bf Oct 9, 2020
@jerboaa
Copy link
Contributor Author

@jerboaa jerboaa commented Oct 9, 2020

/integrate

Loading

@openjdk openjdk bot closed this Oct 9, 2020
@openjdk openjdk bot added integrated and removed ready rfr labels Oct 9, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Oct 9, 2020

@jerboaa Since your change was applied there have been 8 commits pushed to the master branch:

  • aaa0a2a: 8254297: Zero and Minimal VMs are broken with undeclared identifier 'DerivedPointerTable' after JDK-8253180
  • 7e80c98: 8254261: fix javadocs in jdk.test.lib.Utils
  • d4b5dfd: 8253857: Shenandoah: Bugs in ShenandoahEvacOOMHandler related code
  • e9c1905: 8253740: [PPC64] Minor interpreter cleanup
  • b1448da: 8253900: SA: wrong size computation when JVM was built without AOT
  • 2bc8bc5: 8254265: s390 and linux 32 bit builds broken
  • 4f9a1ff: 8254073: Tokenizer improvements (revised)
  • 9cecc16: 8254244: Some code emitted by TemplateTable::branch is unused when running TieredCompilation

Your commit was automatically rebased without conflicts.

Pushed as commit 2bbf8a2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment