Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement CPU limits for cgroup v2 #5850

Closed
vlvkobal opened this issue Apr 12, 2019 · 8 comments

Comments

Projects
None yet
3 participants
@vlvkobal
Copy link
Member

commented Apr 12, 2019

Feature idea summary

Thanks to the work @skrzyp1 have done in #5407 netdata supports cgroup v2. As far as cgroup v2 provides cpu.max, CPU limits should be implemented.

Expected behavior

cgroups plugin shows CPU limits for cgroup v2.

@skrzyp1

This comment has been minimized.

Copy link
Contributor

commented Apr 12, 2019

@vlvkobal I have to find a way to enable this controller on my machine. I will try to investigate and possibly implement this next week.

@Ferroin

This comment has been minimized.

Copy link
Collaborator

commented Apr 15, 2019

I'm still a bit confused as to why it's believed that it makes sense to track this. It's a CPU bandwidth limiting parameter that is functionally read-only on the kernel side (the kernel will NEVER change it, it only reads the value the user sets), standard usage involves setting it once at cgroup creation and then never changing it, and the few rare cases where it is adjusted dynamically at runtime by whatever is managing the cgroup change it so slowly that it just doesn't make sense to graph at the timescale Netdata operates at.

@vlvkobal

This comment has been minimized.

Copy link
Member Author

commented Apr 16, 2019

The reason is to use cpu.max for calculating the percentage of the cgroup cpu usage relative to an allowed cpu bandwidth, to have a gauge with a useful upper limit, and to set a corresponding health variable and use it in alarms.

@skrzyp1

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

Here is WiP PR #5895
Reading cpu limits works, however it behaves weirdly on charts when limits are changed (there are peaks).
After first limit change chart fluctuates instead of being straight line.
Probably reading values goes out of sync with each other.
Screenshot from 2019-04-19 00-45-04
I don't know if this behavior also occurs in cgroups-v1and if its problem / something to be fixed.

@vlvkobal

This comment has been minimized.

Copy link
Member Author

commented May 7, 2019

For cgroups-v1 you'll also have a spike on the chart when you have lowered a CPU quota. This behaviour was chosen because scheduler can't limit CPU consumption precisely, so it is possible to have CPU load over the limit. Instead of cutting the over-limited values and hide from a user the actual information we consented to allow some weird spikes on the chart.

Also, because of the scheduler behaviour, the higher quota you set, the fewer fluctuations you get.

@skrzyp1

This comment has been minimized.

Copy link
Contributor

commented May 9, 2019

I removed WiP code and changed reviewed lines. Should be ready to merge.
I was thinking if cpu cores number is something that changes during runtime.
In memory limits logic here we cache ram total number into static variable, maybe we should also do it for cpu cores instead of invoking get_system_cpus for each update?

@vlvkobal

This comment has been minimized.

Copy link
Member Author

commented May 10, 2019

Theoretically, the number of CPU cores can change at runtime. So it's safer to invoke get_system_cpus on every update.

@vlvkobal

This comment has been minimized.

Copy link
Member Author

commented May 10, 2019

Thank you @skrzyp1 for your work! 👍

@vlvkobal vlvkobal added this to the v1.15 milestone May 10, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.