Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8269851: OperatingSystemMXBean getProcessCpuLoad reports incorrect process cpu usage in containers #4702

Closed
wants to merge 8 commits into from

Conversation

xpbob
Copy link
Contributor

@xpbob xpbob commented Jul 7, 2021

…ocess cpu usage in containers


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8269851: OperatingSystemMXBean getProcessCpuLoad reports incorrect process cpu usage in containers

Reviewers

Contributors

  • Severin Gehwolf <sgehwolf@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4702/head:pull/4702
$ git checkout pull/4702

Update a local copy of the PR:
$ git checkout pull/4702
$ git pull https://git.openjdk.java.net/jdk pull/4702/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4702

View PR using the GUI difftool:
$ git pr show -t 4702

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4702.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jul 7, 2021

👋 Welcome back xpbob! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title 8269851: OperatingSystemMXBean getProcessCpuLoad reports incorrect pr… 8269851: OperatingSystemMXBean getProcessCpuLoad reports incorrect process cpu usage in containers Jul 7, 2021
@xpbob
Copy link
Contributor Author

xpbob commented Jul 7, 2021

/issue add JDK-8269851

@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 7, 2021
@openjdk
Copy link

openjdk bot commented Jul 7, 2021

@xpbob This issue is referenced in the PR title - it will now be updated.

@openjdk
Copy link

openjdk bot commented Jul 7, 2021

@xpbob The following labels will be automatically applied to this pull request:

  • jmx
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org jmx jmx-dev@openjdk.org labels Jul 7, 2021
@xpbob
Copy link
Contributor Author

xpbob commented Jul 7, 2021

getProcessCpuLoad return jvm cpu load。
8 core in machine, set 1 core for container。
Run same program, the getProcessCpuLoad return the same value。

expected
same program run in machine,value 0.12
run in 1 core container,value 0.99
ProcessCpuLoad is associative container setting

@mlbridge
Copy link

mlbridge bot commented Jul 7, 2021

@dholmes-ora
Copy link
Member

Why did you create a new PR ??

@xpbob
Copy link
Contributor Author

xpbob commented Jul 7, 2021

Why did you create a new PR ??

bug description changed。
update the commit info.

@linzang
Copy link
Contributor

linzang commented Jul 7, 2021

Hi Bob,
I am not a reviewer, just a tiny comments that you may need to remove the unnecessary empty line. Thanks.
-Lin

Copy link
Member

@YaSuenag YaSuenag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In HotSpot, CPU usage for JVM would be calculated at get_jvm_tickes() in os_perf_linux.cpp . It refers /proc/self/stat - should we refer it in same way?

I discussed about similar issue in #4299 , then we came to an agreement that we should refactor that HotSpot and JMX should share the code. Of course you can fix JMX only in this PR, but it still uses different implementation from HotSpot. I prefer to fix it in code consolidation between HotSpot and JMX.

Copy link
Contributor

@jerboaa jerboaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch seems fine in principle, but I don't like the duplication. Right now, getProcessUsageDividesTotal() and getUsageDividesTotal() look almost the same, but use different fields to set. Perhaps pass a parameter whether or not it's a system/process load request and use the different setters accordingly. Same duplication between getCpuLoad() and getProcessCpuLoad(). The main difference seems to be how usageNanos are being retrieved.

@jerboaa
Copy link
Contributor

jerboaa commented Jul 7, 2021

In HotSpot, CPU usage for JVM would be calculated at get_jvm_tickes() in os_perf_linux.cpp . It refers /proc/self/stat - should we refer it in same way?

I discussed about similar issue in #4299 , then we came to an agreement that we should refactor that HotSpot and JMX should share the code. Of course you can fix JMX only in this PR, but it still uses different implementation from HotSpot. I prefer to fix it in code consolidation between HotSpot and JMX.

Yes, we need a bug to track this work (or at least decide on a plan of action - if any). I wouldn't conflate this into this bug, though as it's not clear whether or not JFR/thread dumps should report container values. I'll try to create a bug for this consolidation effort today.

@jerboaa
Copy link
Contributor

jerboaa commented Jul 7, 2021

This patch seems fine in principle [...]

Let me revise this. It looks like imposing CPU limits via cpuset-cpus would go down a path which won't have this fixed with the current patch.

@openjdk openjdk bot removed the rfr Pull request is ready for review label Jul 8, 2021
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 8, 2021
@xpbob
Copy link
Contributor Author

xpbob commented Jul 8, 2021

This patch seems fine in principle [...]

Let me revise this. It looks like imposing CPU limits via cpuset-cpus would go down a path which won't have this fixed with the current patch.

use (getProcessCpuLoad0() * totalCPUs / containerCPUs) for CPU limits via cpuset-cpus

@mlbridge
Copy link

mlbridge bot commented Jul 8, 2021

Mailing list message from David Holmes on jmx-dev:

On 7/07/2021 2:37 pm, xpbob wrote:

On Wed, 7 Jul 2021 04:29:01 GMT, David Holmes <dholmes at openjdk.org> wrote:

Why did you create a new PR ??

bug description changed?
update the commit info.

There was no reason to create a new PR.

David

Copy link
Contributor

@jerboaa jerboaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. getProcessCpuLoad() and getCpuLoad() looks way too similar to me. Is there a way to make them more generic and do the process vs. system paths as needed? I think there is.

@xpbob
Copy link
Contributor Author

xpbob commented Jul 12, 2021

Thanks for the update. getProcessCpuLoad() and getCpuLoad() looks way too similar to me. Is there a way to make them more generic and do the process vs. system paths as needed? I think there is.

use getCpuLoadWithTarget for process and system

@jerboaa
Copy link
Contributor

jerboaa commented Jul 12, 2021

Thanks for the update. getProcessCpuLoad() and getCpuLoad() looks way too similar to me. Is there a way to make them more generic and do the process vs. system paths as needed? I think there is.

use getCpuLoadWithTarget for process and system

Thanks. We are getting there. I've proposed a PR to your branch which makes this a bit easier to read:
xpbob#1

If you merge that it'll show up in this PR.

@xpbob
Copy link
Contributor Author

xpbob commented Jul 13, 2021

Thanks for the update. getProcessCpuLoad() and getCpuLoad() looks way too similar to me. Is there a way to make them more generic and do the process vs. system paths as needed? I think there is.

use getCpuLoadWithTarget for process and system

Thanks. We are getting there. I've proposed a PR to your branch which makes this a bit easier to read:
xpbob#1

If you merge that it'll show up in this PR.

Thank you so much @jerboaa .

Code has been merged.
I tested the following cgroup scenario on Linux/x64 and all passed.

  cpu-quota=50000 cpu-period=50000 
  cpuset-cpus=0-2
  cpu.shares=2048
  cpuset-cpus=3

@xpbob
Copy link
Contributor Author

xpbob commented Jul 13, 2021

/Contributor add @jerboaa

@openjdk
Copy link

openjdk bot commented Jul 13, 2021

@xpbob
Contributor Severin Gehwolf <sgehwolf@openjdk.org> successfully added.

@xpbob
Copy link
Contributor Author

xpbob commented Jul 13, 2021

/test

@openjdk
Copy link

openjdk bot commented Jul 13, 2021

@xpbob you need to get approval to run the tests in tier1 for commits up until fdd17d5

@openjdk openjdk bot added the test-request label Jul 13, 2021
@jerboaa
Copy link
Contributor

jerboaa commented Jul 13, 2021

https://github.com/jerboaa/jdk/actions/runs/1026610314 is a pre-integration run of this.

Copy link
Contributor

@jerboaa jerboaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the missing comments in the new code blocks this looks good to me.

@openjdk
Copy link

openjdk bot commented Jul 13, 2021

@xpbob This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8269851: OperatingSystemMXBean getProcessCpuLoad reports incorrect process cpu usage in containers

Co-authored-by: Severin Gehwolf <sgehwolf@openjdk.org>
Reviewed-by: sgehwolf

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 304 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@YaSuenag, @jerboaa) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 13, 2021
Copy link
Contributor

@jerboaa jerboaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks good.

@xpbob
Copy link
Contributor Author

xpbob commented Jul 16, 2021

May I get a second review for this change? Thanks.

@xpbob
Copy link
Contributor Author

xpbob commented Jul 28, 2021

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jul 28, 2021
@openjdk
Copy link

openjdk bot commented Jul 28, 2021

@xpbob
Your change (at version 0f43d5d) is now ready to be sponsored by a Committer.

@DamonFool
Copy link
Member

Will sponsor it tomorrow if there is no objection.
Thanks.

@DamonFool
Copy link
Member

/sponsor

@openjdk
Copy link

openjdk bot commented Jul 28, 2021

Going to push as commit 25f00d7.
Since your change was applied there have been 314 commits pushed to the master branch:

  • 41b4c19: 8271353: PerfDataManager::destroy crashes in VM_Exit
  • 357947a: 8270061: Change parameter order of ResourceHashtable
  • 034788a: 8270925: replay dump using CICrashAt does not include inlining data
  • 9711033: 8209776: Refactor jdk/security/JavaDotSecurity/ifdefs.sh to plain java test
  • 60c11fe: 8261088: Repeatable annotations without @target cannot have containers that target module declarations
  • dcdb1b6: 8137101: [TEST_BUG] javax/swing/plaf/basic/BasicHTML/4251579/bug4251579.java failure due to timing
  • a066c7b: 8270086: ARM32-softfp: Do not load CONSTANT_double using the condy helper methods in the interpreter
  • 072fe48: 8270901: Typo PHASE_CPP in CompilerPhaseType
  • d7b5cb6: 8271368: [BACKOUT] JDK-8266054 VectorAPI rotate operation optimization
  • ecd4455: 8266510: Nimbus JTree default tree cell renderer does not use selected text color
  • ... and 304 more: https://git.openjdk.java.net/jdk/compare/17f53f2f9c5928395eff9186160924e9a8e9a794...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Jul 28, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jul 28, 2021
@openjdk
Copy link

openjdk bot commented Jul 28, 2021

@DamonFool @xpbob Pushed as commit 25f00d7.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated jmx jmx-dev@openjdk.org serviceability serviceability-dev@openjdk.org test-request
Development

Successfully merging this pull request may close these issues.

6 participants