CentOS 7 containers on CentOS 7.6 LXD host reports process utilization incorrectly #284

Closed
trenb opened this issue May 29, 2019 · 11 comments
Labels: Bug (confirmed to be a bug), Incomplete (waiting on more information from reporter)

Comments

@trenb

trenb commented May 29, 2019

I tried opening a thread here, but it doesn't seem to be going anywhere: https://discuss.linuxcontainers.org/t/centos-7-x-containers-on-a-centos-7-6-lxc-host-providing-incorrect-cpu-utilization/4863/4

So far I've been able to reproduce this on multiple kernels, with both my own compiled lxc/lxd/lxcfs and the Fedora-provided packages at https://copr-be.cloud.fedoraproject.org/results/ganto/lxc3/epel-7-x86_64/

The issue is that the containers report that they're using more CPU than they actually are. It's only a reporting issue, but it makes monitoring CPU utilization inside the container difficult, because a completely idle container reports 40% or more utilization.
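For anyone comparing numbers inside and outside the container, the utilization figure that monitoring agents typically report can be recomputed by hand from two samples of /proc/stat. The sketch below is only an illustration, not the code path of top or of any specific agent. The function names are hypothetical, and it assumes the standard field order of the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal, ...).

```python
import time


def parse_cpu_line(stat_text):
    """Return the integer tick counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def busy_percent(before_text, after_text):
    """Percentage of non-idle time between two /proc/stat snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    # Sum only the first eight counters: guest time is already part of user.
    total = sum(deltas[:8])
    # Treat both idle and iowait as idle time (assumption of this sketch).
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    if total <= 0:
        return 0.0
    return 100.0 * (total - idle) / total


def sample_busy_percent(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and compare them."""
    with open(path) as handle:
        before = handle.read()
    time.sleep(interval)
    with open(path) as handle:
        after = handle.read()
    return busy_percent(before, after)
```

Calling sample_busy_percent() inside an idle container, once with lxcfs running and once without, may help show whether the inflated value is already present in /proc/stat itself.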

I can only reproduce this with CentOS 7 64-bit containers. I can also reproduce it with a container launched via lxc launch images:centos/7/amd64 testing.

I've run out of things to test myself, so any ideas or suggestions would be appreciated.

@trenb
Author

trenb commented May 29, 2019

Also, if I stop lxcfs and ensure it's not running when I start the container, process utilization reporting is correct.

@trenb
Author

trenb commented Jun 4, 2019

@tomponline Here's the GitHub issue for the above thread on discuss.linuxcontainers.org.

@hallyn
Member

hallyn commented Jun 7, 2019

> Also, if I stop lxcfs and ensure it's not running when I start the container, process utilization reporting is correct.

Do you mean you stop lxcfs, start the container, then start lxcfs?

@trenb
Author

trenb commented Jun 7, 2019

Nope, I mean ensure that lxcfs does not run at all when I start the container. If lxcfs is running, process utilization isn't properly reported.

@hallyn
Member

hallyn commented Jun 7, 2019

How reliable is this for you? I just created a fresh bionic VM, installed lxc from ppa, created an unpriv container and ran top. I'm not seeing this.

ii  lxcfs  3.0.3-0ubuntu1~18.04.1  amd64  FUSE based filesystem for LXC
ii  lxc1   3.1.0+master~2019060    all    Transitional package - lxc1 -> lxc-utils

Do you know which specific lxcfs (/proc) file is wrong in your tests?
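One way to narrow this down is to list which /proc paths are actually served by lxcfs inside the container, then compare each one against the host's copy. A minimal sketch, assuming lxcfs mounts appear in /proc/mounts with the filesystem type fuse.lxcfs (an assumption worth verifying on the affected system; the helper names are hypothetical):

```python
def lxcfs_mount_points(mounts_text, fstype="fuse.lxcfs"):
    """Return mount points whose filesystem type matches `fstype`."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == fstype:
            points.append(fields[1])
    return points


def read_lxcfs_mount_points(path="/proc/mounts"):
    """Read the mount table and return the lxcfs-backed paths."""
    with open(path) as handle:
        return lxcfs_mount_points(handle.read())
```

Each path returned by read_lxcfs_mount_points() is a candidate for the file that carries the wrong numbers.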

@trenb
Author

trenb commented Jun 7, 2019

Press the space bar repeatedly in top to force refreshes. You should see it pretty quickly. I can reproduce this easily, and @tomponline reproduced it on Ubuntu 16.04 and CentOS 7 hosts.

@hallyn
Member

hallyn commented Jun 7, 2019

Which value are you seeing too high? I'm still not really seeing it (once in a while top comes back at '100%', but that happens for me even when not in a container).

Do you see this whether or not your cpuset.cpus is limited? Top does scale used cpu time by number of real hw threads, so it's possible that reducing the number of cpus seen in /proc/cpuinfo is throwing things off, since we don't scale what's in /proc/$$/stat etc.
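To make the scaling concern concrete, here is a rough sketch of the arithmetic. It is not top's actual code; it only illustrates the hypothesis that the same per-process tick delta yields a different percentage depending on how many CPUs the tool believes exist. The function names are hypothetical, and the clock rate (100 ticks per second) and CPU counts in the example are assumed values.

```python
def process_ticks(stat_line):
    """Return utime + stime (fields 14 and 15) from a /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces, so split after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field 14 is rest[11] and field 15 is rest[12].
    return int(rest[11]) + int(rest[12])


def process_cpu_percent(delta_ticks, elapsed_seconds, hz, ncpus):
    """CPU share of one process, scaled by the number of CPUs the tool sees."""
    return 100.0 * delta_ticks / (hz * elapsed_seconds * ncpus)


# Same 50-tick delta over one second at 100 ticks per second:
on_eight_cpus = process_cpu_percent(50, 1.0, 100, 8)  # divisor uses 8 host threads
on_two_cpus = process_cpu_percent(50, 1.0, 100, 2)    # divisor uses 2 visible CPUs
```

If the tick counters come from the unscaled /proc/$$/stat while the CPU count comes from a reduced /proc/cpuinfo, the two inputs disagree and the reported percentage shifts accordingly.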

@hallyn
Member

hallyn commented Jun 7, 2019

(It's also possible that this is a bug in the load tracking calculations, but I don't see /proc/loadavg giving bad numbers)

@hallyn
Member

hallyn commented Jun 7, 2019

Actually, here /proc/loadavg always gives me 0's.
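For comparing the container's view against the host's, the file can be read programmatically. A small sketch, assuming the usual /proc/loadavg layout of three load averages, a running/total task pair, and the last PID (the function name and dictionary keys are hypothetical):

```python
def parse_loadavg(text):
    """Split a /proc/loadavg line into its five documented parts."""
    load1, load5, load15, tasks, last_pid = text.split()[:5]
    running, total = tasks.split("/")
    return {
        "load1": float(load1),
        "load5": float(load5),
        "load15": float(load15),
        "running": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the load average file at `path`."""
    with open(path) as handle:
        return parse_loadavg(handle.read())
```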

@trenb
Author

trenb commented Jun 7, 2019

If you check https://discuss.linuxcontainers.org/t/centos-7-x-containers-on-a-centos-7-6-lxc-host-providing-incorrect-cpu-utilization/4863 there's a screenshot from Graphite showing inflated CPU use on LXD containers. I've also posted several outputs from top there.

When @tomponline is able to jump in here, hopefully he can provide more technical information.

@stgraber
Member

stgraber commented Mar 3, 2020

As per what's mentioned in #283, can you check for any such log messages on your system when the issue occurs?
