-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8265836: OperatingSystemImpl.getCpuLoad() returns incorrect CPU load inside a container #3656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Shaojun Wang <jeffery.wsj@alibaba-inc.com>
Hi @tanghaoth90, welcome to this OpenJDK project and thanks for contributing! We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user tanghaoth90" as summary for the issue. If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing |
@tanghaoth90 The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
/signed |
/covered |
Thank you! Please allow for up to two weeks to process your OCA, although it is usually done within one to two business days. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated! |
Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated! |
@tanghaoth90 Syntax:
|
/contributor add Shaojun Wang jeffery.wsj@alibaba-inc.com |
@tanghaoth90 |
Webrevs
|
long elapsedNanos = TimeUnit.MICROSECONDS.toNanos(periodLength * numPeriods); | ||
double systemLoad = (double) usageNanos / elapsedNanos; | ||
if (numPeriods > 0 && usageNanos > 0) { | ||
long quotaNanos = TimeUnit.MICROSECONDS.toNanos(quota * numPeriods); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We happened to hit an exactly similar problem when running on a container with openjdk15.
Given we effectively agree that the problem is elapsedNanos
doesn't accurately reflect the cpu time allocated across all shares vs a single share, my proposal was to use getCpuShares
as a multiplier for periodLength
above.
Is there a good reason getCpuQuota
is a better alternative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Argha, thanks a lot for your suggestion. I think both "quota" and "share" are worth considering. Let us look into the implementation of CgroupSubsystem::active_processor_count()
in OpenJDK HotSpot (https://github.com/openjdk/jdk/blob/master/src/hotspot/os/linux/cgroupSubsystem_linux.cpp).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for linking that. It sounds reasonable to me to prefer quota
in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for linking that. It sounds reasonable to me to prefer
quota
in that case.
Yes, flag PreferContainerQuotaForCPUCount
is true by default. Therefore, my current implementation might be a minimal implementation.
We can also cover the case where PreferContainerQuotaForCPUCount
is false. There are two different ways.
- To access the value of
PreferContainerQuotaForCPUCount
, so that we can decide if we should usequota
or (quota
&share
); - Reuse
CgroupSubsystem::active_processor_count
. However, the function returns an integer. It is more reasonable to use a floating number.
Looking forward to your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We happened to hit an exactly similar problem when running on a container with openjdk15.
Given we effectively agree that the problem is
elapsedNanos
doesn't accurately reflect the cpu time allocated across all shares vs a single share, my proposal was to usegetCpuShares
as a multiplier forperiodLength
above.
Is there a good reasongetCpuQuota
is a better alternative?
@argha-c The proposed fix is within the quota > 0
condition. I.e. this is code only run when CPU quotas, not shares are in effect. In docker/podman speach these are --cpu-quota=...
and --cpu-period=....
switches. So no, in this case it wouldn't make sense to use cpu shares info in a branch which looks at cpu quotas ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@argha-c The proposed fix is within the
quota > 0
condition. I.e. this is code only run when CPU quotas, not shares are in effect. In docker/podman speach these are--cpu-quota=...
and--cpu-period=....
switches. So no, in this case it wouldn't make sense to use cpu shares info in a branch which looks at cpu quotas ;-)
@jerboaa : Correct. My comment was less specific to the branch, and more to highlight that a fix here needs to consider the case for both quota
and shares
. I see the bug report has been updated to reflect that. Cheers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Thanks for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tanghaoth90 My local testing suggests that your fix addresses the issue of CPU quotas set via
--cpu-quota/--cpu-period
. When using--cpu-shares
the CPU load calculation is wrong since it will (wrongly) report host values. Lets look at them individually, fix the quota and shares case individually (i.e. when not both are set). Once that's done, quota will be preferred in the OperatingSystemMXBean impl, which is reasonable. I don't think we need to account for the shares-preferred-over-quota at this point since that changed in HotSpot in JDK 11 time-frame (JDK-8197589) and OperatingSystemMXBean has only been made container aware in JDK 14 (yes, it got backported, but still).
@jerboaa I have updated my fix by inserting a new branch for --cpu-shares
. By setting up containers with different restrictions based on your suggestion, I noticed two problems for the CPU load calculation. Take TestLoad.java as the example.
--cpu-quota=700000 --cpu-period=100000
as the restriction: the result is getting close to6/7
slowly as time goes by, which indicates that the result of
long usageNanos = containerMetrics.getCpuUsage();
is an accumulated CPU usage rather than a real-time CPU usage.
According to the javadoc, getCpuLoad()
returns the "recent cpu usage". The current fix obviously does not address this issue.
--cpu-shares=2048
as the restriction: the "cpu-share" branch only returns-1
sincecontainerMetrics.getCpuNumPeriods()
returns0
(nr_periods
ofcpu.stat
stays0
when only--cpu-shares
is set). By now, I have no idea to compute the elapsed time or the total available CPU time with the help of the methods ofjdk.internal.platform.CgroupMetrics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As to your observation of 1). Yes, that's true and I'm not sure it's worth changing. We could change the javadoc.
As for 2) I've created a PR for your branch with a suggested implementation for cpu shares based accounting. It uses a similar heuristics than the getCpuLoad0() uses here:
https://github.com/openjdk/jdk/blob/master/src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c#L278
That's the best I could think of. It's certainly more accurate than the current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jerboaa Just read your comment about my observation (due to the weird format of github pull request). Sorry for making further change for the quota
branch without asking for your suggestion. For now, I still hold a different opinion about the quota
branch. My local test shows that
podman run --cpu-shares=4096 <image>
podman run --cpu-quota=400000 --cpu-period=100000 <image>
yields the same result after the change, otherwise the second one still prints the average cpu load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tanghaoth90 There are a couple of ways how CPU restrictions can be enforced. It appears that the current patch fixes the cpuacct controller case (--cpu-quota
), but doesn't yet address --cpu-shares
case. I've made a note of that in the bug.
long elapsedNanos = TimeUnit.MICROSECONDS.toNanos(periodLength * numPeriods); | ||
double systemLoad = (double) usageNanos / elapsedNanos; | ||
if (numPeriods > 0 && usageNanos > 0) { | ||
long quotaNanos = TimeUnit.MICROSECONDS.toNanos(quota * numPeriods); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We happened to hit an exactly similar problem when running on a container with openjdk15.
Given we effectively agree that the problem is
elapsedNanos
doesn't accurately reflect the cpu time allocated across all shares vs a single share, my proposal was to usegetCpuShares
as a multiplier forperiodLength
above.
Is there a good reasongetCpuQuota
is a better alternative?
@argha-c The proposed fix is within the quota > 0
condition. I.e. this is code only run when CPU quotas, not shares are in effect. In docker/podman speach these are --cpu-quota=...
and --cpu-period=....
switches. So no, in this case it wouldn't make sense to use cpu shares info in a branch which looks at cpu quotas ;-)
long elapsedNanos = TimeUnit.MICROSECONDS.toNanos(periodLength * numPeriods); | ||
double systemLoad = (double) usageNanos / elapsedNanos; | ||
if (numPeriods > 0 && usageNanos > 0) { | ||
long quotaNanos = TimeUnit.MICROSECONDS.toNanos(quota * numPeriods); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Argha, thanks a lot for your suggestion. I think both "quota" and "share" are worth considering.
@tanghaoth90 My local testing suggests that your fix addresses the issue of CPU quotas set via --cpu-quota/--cpu-period
. When using --cpu-shares
the CPU load calculation is wrong since it will (wrongly) report host values. Lets look at them individually, fix the quota and shares case individually (i.e. when not both are set). Once that's done, quota will be preferred in the OperatingSystemMXBean impl, which is reasonable. I don't think we need to account for the shares-preferred-over-quota at this point since that changed in HotSpot in JDK 11 time-frame (JDK-8197589) and OperatingSystemMXBean has only been made container aware in JDK 14 (yes, it got backported, but still).
@tanghaoth90 I've added a comment which should address the cpu shares based cpuLoad problem. Let me know what you think. |
/contributor add @jerboaa |
/reviewers 2 |
Bumping number of required reviewers to 2 as I'm a contributing author of the change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nit: Please remove PER_CPU_SHARES
constant as it's no longer used.
@@ -41,7 +41,10 @@ | |||
implements com.sun.management.UnixOperatingSystemMXBean { | |||
|
|||
private static final int MAX_ATTEMPTS_NUMBER = 10; | |||
private static final int PER_CPU_SHARES = 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is no longer used anywhere so can get removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@tanghaoth90 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 530 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jerboaa, @YaSuenag) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
@jerboaa Only the author (@tanghaoth90) is allowed to issue the |
@jerboaa |
/contributor add @jerboaa |
@tanghaoth90 |
@jerboaa Thank you for the cpu-shares-based load calculation! I also update the cpu-quota-based load calculation to use the same strategy as the cpu-shares-based one, because my previous implementation always reports the average load since the container starts. |
OK.
Seems reasonable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still good.
Any other reviewers, please? @YaSuenag perhaps? |
I haven't followed yet all of discussions in this review, but I concern this PR changes the meaning of
|
@YaSuenag Thanks for your comment. I can give two reasons for that.
The result might be too coarse/inaccurate, if the time between two calls is too long/short. Any comments for that? |
Does it? Here is the javadoc: It says:
If I run outside a container, getCpuLoad0() is being used which uses a similar calculation. So in my book this fixes the bug (earlier reporting is just plain wrong, see bug for details) and just brings it in line with the non-container values. Also note that the cpu quota-based limits implementation as been changed to also report recent load values. But I'm not sure it's worth discussing this further as the previous implementation (supposedly reporting an average) was wrong.
IMHO, yes as the previous implementation was incorrect. I suspect nobody noticed until now.
I disagree. This change makes [1] https://bugs.openjdk.java.net/browse/JDK-8265836?focusedCommentId=14423632&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14423632 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tanghaoth90 @jerboaa Thank you for explanation!
The result might be too coarse/inaccurate, if the time between two calls is too long/short. Any comments for that?
I concerned about that too. Javadoc says "recent cpu usage" about this - it is ambiguous. In other words, our concerns are tolerated IMHO.
Of course it is better if we can get sampling value, but it is difficult now. Maybe we can implement sampling thread, but we need to think some things (e.g. sampling frequency). I'm not sure it is worth to work for it.
I think it is important that getCpuLoad()
behaves similar in all cases, and also I can't see any problems in your change. So I give +1 to your change 😉
Thanks very much for the review! |
/integrate |
@tanghaoth90 |
/sponsor |
@jerboaa @tanghaoth90 Since your change was applied there have been 531 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit ef368b3. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
OperatingSystemImpl.getCpuLoad() may return 1.0 in a container, even though the CPU load is obviously below 100%.
We created a 5-core container and run 4 "while (true)" loops in the container. OperatingSystemImpl.getCpuLoad() returned 1.0, which is incorrect (0.8 is correct).
"systemLoad" in getCpuLoad() is exactly 4.0 before "systemLoad = Math.min(1.0, systemLoad);". The problem is caused by using the elapsed time (specified by "cpu.cfs_period_us") instead of the total CPU time (specified by "cpu.cfs_quota_us"). Therefore, it is more reasonable to divide cpu usage time by "quotaNanos" instead of "elapsedNanos".
Progress
Issue
Reviewers
Contributors
<jeffery.wsj@alibaba-inc.com>
<sgehwolf@openjdk.org>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3656/head:pull/3656
$ git checkout pull/3656
Update a local copy of the PR:
$ git checkout pull/3656
$ git pull https://git.openjdk.java.net/jdk pull/3656/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3656
View PR using the GUI difftool:
$ git pr show -t 3656
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3656.diff