
"ps" show 0.0 of %cpu for a busy process in container #444

Open
borgerli opened this issue Feb 3, 2021 · 9 comments
Labels: External (Issue is about a bug/feature in another project), Maybe (Undecided whether in scope for the project)

Comments


borgerli commented Feb 3, 2021

I'm using lxcfs 4.0.7. I created a container with the lxcfs proc files mounted, then kicked off a process: `while true; do echo test > /dev/null; done`. In the container, the top command showed the correct %cpu information, while ps always showed 0.0. However, when not using lxcfs, ps worked fine.

Steps

start lxcfs

/usr/local/bin/lxcfs -l --enable-cfs --enable-pidfd /var/lib/lxc/lxcfs

start docker container

docker run -it -m 128m --cpus=1 --rm \
  -v /var/lib/lxc/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
  -v /var/lib/lxc/lxcfs/proc/diskstats:/proc/diskstats:rw \
  -v /var/lib/lxc/lxcfs/proc/meminfo:/proc/meminfo:rw \
  -v /var/lib/lxc/lxcfs/proc/stat:/proc/stat:rw \
  -v /var/lib/lxc/lxcfs/proc/swaps:/proc/swaps:rw \
  -v /var/lib/lxc/lxcfs/proc/loadavg:/proc/loadavg:rw \
  -v /var/lib/lxc/lxcfs/proc/uptime:/proc/uptime:rw \
  -v /var/lib/lxc/lxcfs/sys/devices/system/cpu/online:/sys/devices/system/cpu/online:rw \
  -v /var/lib/lxc:/var/lib/lxc:rshared \
  centos:7 /bin/bash

test

top shows 100.0, while ps shows 0.0 for process 16

[root@af61796cf0ed /]# while true; do echo test > /dev/null;done &
[1] 16
[root@af61796cf0ed /]# top -b -n 1
top - 03:06:17 up 0 min,  0 users,  load average: 0.00, 0.00, 0.00
Tasks:   3 total,   2 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.0 us,  0.0 sy,  0.0 ni, 88.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :   131072 total,   127272 free,     3800 used,        0 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   127272 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   16 root      20   0   11840    396      0 R 100.0  0.3   0:03.40 bash
    1 root      20   0   11840   2984   2588 S   0.0  2.3   0:00.04 bash
   17 root      20   0   56064   3696   3248 R   0.0  2.8   0:00.00 top
[root@af61796cf0ed /]# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  2.2  11840  2984 pts/0    Ss   03:05   0:00 /bin/bash
root        16  0.0  0.3  11840   396 pts/0    R    03:06   0:09 /bin/bash
root        18  0.0  2.6  51744  3416 pts/0    R+   03:06   0:00 ps aux

test without lxcfs

top shows 100.0, and ps shows 102

root@dev:~# docker run -it -m 128m --cpus=1 centos:7 /bin/bash
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
[root@0d6cb011e598 /]# while true; do echo test > /dev/null;done &
[1] 16
[root@0d6cb011e598 /]# top -b -n 1
top - 03:09:29 up 7 days, 15:12,  0 users,  load average: 2.07, 1.79, 1.42
Tasks:   3 total,   2 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s): 15.4 us, 10.6 sy,  0.0 ni, 73.2 id,  0.0 wa,  0.0 hi,  0.8 si,  0.0 st
KiB Mem : 16260516 total,  1436644 free,  1152404 used, 13671468 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 14788144 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   16 root      20   0   11840    396      0 R 100.0  0.0   0:05.02 bash
    1 root      20   0   11840   2912   2516 S   0.0  0.0   0:00.03 bash
   17 root      20   0   56064   3656   3212 R   0.0  0.0   0:00.00 top
[root@0d6cb011e598 /]# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.2  0.0  11840  2912 pts/0    Ss   03:09   0:00 /bin/bash
root        16  102  0.0  11840   396 pts/0    R    03:09   0:07 /bin/bash
root        18  0.0  0.0  51744  3392 pts/0    R+   03:09   0:00 ps aux
[root@0d6cb011e598 /]#
brauner (Member) commented Feb 3, 2021

LXCFS virtualizes cpu utilization according to the cgroup the target process is in. If it's not using a lot of cpu, you won't see anything. Try creating some load, e.g. by running stress with the cpu option inside the container, and you should see an increase.

brauner closed this as completed Feb 3, 2021
borgerli (Author) commented Feb 4, 2021

@brauner Thanks for the comment.

Actually, we did run a process that uses a lot of cpu (`while true; do echo test > /dev/null; done &`). As you suggested, I also tested with stress and got the same result: top showed ~100 %cpu, but ps still showed 0.0 %cpu.

  1. start lxcfs: /usr/local/bin/lxcfs -l --enable-cfs --enable-pidfd /var/lib/lxc/lxcfs
  2. start stress container
docker run -it --name stress -m 128m --cpus=1 --rm \
  -v /var/lib/lxc/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
  -v /var/lib/lxc/lxcfs/proc/diskstats:/proc/diskstats:rw \
  -v /var/lib/lxc/lxcfs/proc/meminfo:/proc/meminfo:rw \
  -v /var/lib/lxc/lxcfs/proc/stat:/proc/stat:rw \
  -v /var/lib/lxc/lxcfs/proc/swaps:/proc/swaps:rw \
  -v /var/lib/lxc/lxcfs/proc/loadavg:/proc/loadavg:rw \
  -v /var/lib/lxc/lxcfs/proc/uptime:/proc/uptime:rw \
  -v /var/lib/lxc/lxcfs/sys/devices/system/cpu/online:/sys/devices/system/cpu/online:rw \
  progrium/stress --cpu 1
  3. get into the container and verify %cpu with top (99.6) and ps (0.0)
root@borgerli-devcloud:~# docker exec -it $(docker inspect stress -f "{{.Id}}") /bin/bash
root@33bc005fa2d5:/# top -b -n 1
top - 02:34:42 up 4 min,  0 users,  load average: 0.28, 0.07, 0.02
Tasks:   4 total,   2 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s): 98.5 us,  0.0 sy,  0.0 ni,  1.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    131072 total,     3660 used,   127412 free,        0 buffers
KiB Swap:        0 total,        0 used,        0 free.        4 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
    7 root      20   0    7316     96      0 R 99.6  0.1   4:03.82 stress
    1 root      20   0    7316    896    812 S  0.0  0.7   0:00.02 stress
   28 root      20   0   18164   3300   2828 S  0.0  2.5   0:00.02 bash
   36 root      20   0   19748   2372   2124 R  0.0  1.8   0:00.00 top
root@33bc005fa2d5:/# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.6   7316   896 pts/0    Ss+  02:30   0:00 /usr/bin/stress --verbose --
root         7  0.0  0.0   7316    96 pts/0    R+   02:30   4:10 /usr/bin/stress --verbose --
root        28  0.0  2.5  18164  3300 pts/1    Ss   02:34   0:00 /bin/bash
root        37  0.0  1.5  15576  2064 pts/1    R+   02:34   0:00 ps aux


brauner reopened this Feb 5, 2021
brauner (Member) commented Feb 5, 2021

Odd. What happens if you turn off cpu shares, i.e. skip --enable-cfs?

borgerli (Author) commented Feb 6, 2021

@brauner I checked the procps code related to pcpu and found the cause of this issue.

As shown in the procps code below, if the lxcfs uptime is mounted in the container, seconds_since_boot (measured since the container started) will always be less than the process start_time (measured since the host booted). As a result, seconds is always zero, and pcpu is therefore also zero.

https://gitlab.com/procps-ng/procps/-/blob/master/ps/output.c#L525:

  seconds = cook_etime(pp);
  if(seconds) pcpu = (total_time * 1000ULL / Hertz) / seconds;

https://gitlab.com/procps-ng/procps/-/blob/master/ps/output.c#L136:

#define cook_etime(P) (((unsigned long long)seconds_since_boot >= (P->start_time / Hertz)) ? ((unsigned long long)seconds_since_boot - (P->start_time / Hertz)) : 0)
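
To make the failure concrete, here is a minimal standalone sketch of the arithmetic above with hypothetical numbers (a host that booted two hours ago, a container that started one minute ago); the HERTZ constant and tick values are illustrative only. start_time / Hertz is 7200, far above the container's 60 seconds of uptime, so the clamp fires:

  #include <stdio.h>

  #define HERTZ 100ULL  /* illustrative clock tick rate */

  int main(void)
  {
      /* With the lxcfs mount, ps reads the *container* uptime ... */
      unsigned long long seconds_since_boot = 60;    /* container up 1 min */
      /* ... but /proc/<pid>/stat start_time stays relative to *host* boot. */
      unsigned long long start_time = 7200 * HERTZ;  /* host up 2 hours */
      unsigned long long total_time = 500;           /* utime+stime ticks */

      /* cook_etime() clamps the negative elapsed time to zero ... */
      unsigned long long seconds =
          (seconds_since_boot >= start_time / HERTZ)
              ? seconds_since_boot - start_time / HERTZ
              : 0;

      /* ... so the pcpu division is skipped and %CPU stays 0.0. */
      unsigned long long pcpu = 0;
      if (seconds)
          pcpu = (total_time * 1000ULL / HERTZ) / seconds;

      printf("seconds=%llu pcpu=%llu\n", seconds, pcpu);  /* seconds=0 pcpu=0 */
      return 0;
  }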

A workaround is not to mount the lxcfs proc/uptime into containers, but then containers lose uptime virtualization.

Is it possible for lxcfs to just return the host uptime when the calling process's comm is `ps`?
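
For illustration, a minimal sketch of what such a hack could look like in a FUSE read handler. caller_comm() and want_host_uptime() are hypothetical names, not existing lxcfs functions; the only real API used is libfuse's fuse_get_context(), whose pid field identifies the process making the request:

  #define FUSE_USE_VERSION 31
  #include <fuse.h>
  #include <stdio.h>
  #include <string.h>

  /* Hypothetical helper: resolve the comm of the process performing
   * this FUSE request via /proc/<pid>/comm. */
  static int caller_comm(char *buf, size_t len)
  {
      char path[64];
      snprintf(path, sizeof(path), "/proc/%d/comm",
               (int)fuse_get_context()->pid);
      FILE *f = fopen(path, "r");
      if (!f)
          return -1;
      if (!fgets(buf, (int)len, f)) {
          fclose(f);
          return -1;
      }
      fclose(f);
      buf[strcspn(buf, "\n")] = '\0';  /* strip trailing newline */
      return 0;
  }

  /* A read handler for /proc/uptime could then branch on the caller. */
  static int want_host_uptime(void)
  {
      char comm[32];
      return caller_comm(comm, sizeof(comm)) == 0 && strcmp(comm, "ps") == 0;
  }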

borgerli (Author) commented Feb 7, 2021

@brauner I submitted a PR for this issue; please review. Thank you.

PR #445

borgerli (Author) commented

@brauner Could you please help review the PR?

borgerli added a commit to borgerli/lxcfs that referenced this issue Feb 23, 2021
fixes lxc#444
Signed-off-by: LI Bo <borgerli@tencent.com>
mihalicyn (Member) commented

Hi @borgerli

Sorry for the long delay in responding. We are working on sorting out issues here and there right now.

I have read through your PR and understand the idea. But the question is whether, instead of adding hacks to LXCFS, we can fix the procps utilities not to use the uptime value to calculate CPU load, and adjust the algorithm to be similar to what we have in the top utility.
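
For context, top derives %CPU from the change in a process's CPU ticks between two samples instead of dividing by elapsed time since boot, which is why it is unaffected by a virtualized uptime. A rough standalone sketch of that sampling approach (not actual procps code; the stat parsing follows proc(5), and the simplified fscanf assumes comm contains no ')'):

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  /* Read utime+stime (fields 14 and 15 of /proc/<pid>/stat, see proc(5)). */
  static long long proc_ticks(int pid)
  {
      char path[64], comm[64];
      long long utime, stime;
      snprintf(path, sizeof(path), "/proc/%d/stat", pid);
      FILE *f = fopen(path, "r");
      if (!f)
          return -1;
      if (fscanf(f, "%*d (%63[^)]) %*c %*d %*d %*d %*d %*d"
                    " %*u %*u %*u %*u %*u %lld %lld",
                 comm, &utime, &stime) != 3) {
          fclose(f);
          return -1;
      }
      fclose(f);
      return utime + stime;
  }

  int main(int argc, char **argv)
  {
      int pid = argc > 1 ? atoi(argv[1]) : 1;
      long hz = sysconf(_SC_CLK_TCK);
      long long before = proc_ticks(pid);
      sleep(1);  /* sampling interval, as top uses between refreshes */
      long long after = proc_ticks(pid);
      /* %CPU over the interval: ticks consumed vs. ticks in one second. */
      printf("%%CPU = %.1f\n", 100.0 * (double)(after - before) / (double)hz);
      return 0;
  }

Because nothing here reads /proc/uptime, the result is unaffected by the lxcfs mount.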

mihalicyn added the External and Maybe labels Mar 19, 2024
mihalicyn (Member) commented

cc @stgraber

stgraber (Member) commented

Yeah, returning different output based on the command name definitely isn't something I'd want us to do. It's way too hacky and will lead to an undebuggable mess.

Tweaking userspace to be a bit smarter would definitely be easier in this case, especially as there's no way for us to virtualize those per-process files.

Once we get @mihalicyn's work to have per-container lxcfs features, you'd also get the ability to turn off uptime virtualization where it remains problematic.
