When --cpuset-cpus argument is used, processes inspecting CPU configuration in the container see all cores #20770

benjamincburns opened this Issue Feb 29, 2016 · 19 comments

benjamincburns commented Feb 29, 2016

Output of docker version:

Client:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 16:16:33 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 16:16:33 2016
 OS/Arch:      linux/amd64

Output of docker info:

sudo docker info
Containers: 66
 Running: 55
 Paused: 0
 Stopped: 11
Images: 110
Server Version: 1.10.2
Storage Driver: devicemapper
 Pool Name: docker-253:0-73188844-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 5.769 GB
 Data Space Total: 107.4 GB
 Data Space Available: 22.45 GB
 Metadata Space Used: 13.09 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.134 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2015-12-01)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 3.10.0-229.14.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 251.6 GiB
Name: [redacted]
ID: [redacted]
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Provide additional environment details (AWS, VirtualBox, physical, etc.):
Physical machine

List the steps to reproduce the issue:

  1. Run something like docker run -it --cpuset-cpus=0 centos:centos7
  2. In the container's console, run grep processor /proc/cpuinfo | wc -l

Describe the results you received:
Output: 32

Describe the results you expected:
Output: 1

Provide additional info you think is important:

Per the title, it appears that docker 1.10.2 isn't respecting the --cpuset-cpus argument. We have a number of containers for applications whose thread pools are sized based on the number of cores available. Since updating to 1.10.2 (from an assortment of versions starting somewhere in 1.3.x), the thread counts on our docker hosts have gone through the roof. [Edit: this wasn't actually linked to the update; rather, we'd deployed a few new containers running on mono at around the same time. It is still an issue, however.]

OS version info:

user@host ~ $ cat /etc/*release*
CentOS Linux release 7.1.1503 (Core) 
Derived from Red Hat Enterprise Linux 7.1 (Source)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.1.1503 (Core) 
CentOS Linux release 7.1.1503 (Core) 
cpe:/o:centos:centos:7

benjamincburns commented Feb 29, 2016

On the surface this issue looks similar to what's described in Ubuntu bug 1435571, though I can see how this behaviour might arise from some other root cause. In that case, however, it may have been a kernel bug, as they've fixed it with these two kernel patches.

Knowing very little about cgroups myself, I'd also wonder if CentOS7 issue 9078 isn't related.

Either way, I raised the issue here on the chance that either this is specific to docker rather than the host OS, or that docker could be improved by including a workaround for it.

thaJeztah commented Feb 29, 2016

@benjamincburns can you try running the check-config.sh script? It's possible this is not supported or enabled in your kernel: https://github.com/docker/docker/blob/master/contrib/check-config.sh

benjamincburns commented Mar 1, 2016

Thanks @thaJeztah.

Before seeing your comment I fired up a fresh install of CentOS 7 and made sure it was up to date. I then installed docker according to the official installation instructions. This issue does not occur in that configuration.

I will run the check-config script in both locations and compare the output.

If it turns out that this feature isn't supported by the kernel, I'd suggest converting this script into runtime checks within docker itself, so that the docker CLI can fail with an appropriate error message when asked to create a container that would use unsupported kernel features.

benjamincburns commented Mar 1, 2016

I have run the check-config.sh script on the test VM (where things work properly), and on my actual docker host. Full output for the known-good machine is at local-vm-check-config-output.txt.

Their diff:

user@hostname:~$ diff -u docker-host-check-config-output.txt local-vm-check-config-output.txt 
--- docker-host-check-config-output.txt 2016-03-01 15:01:08.238722606 +1300
+++ local-vm-check-config-output.txt    2016-03-01 15:01:26.494242760 +1300
@@ -1,5 +1,5 @@
 warning: /proc/config.gz does not exist, searching other paths for kernel config ...
-info: reading kernel config from /boot/config-3.10.0-229.14.1.el7.x86_64 ...
+info: reading kernel config from /boot/config-3.10.0-327.10.1.el7.x86_64 ...

 Generally Necessary:
 - cgroup hierarchy: properly mounted [/sys/fs/cgroup]

Note of course that the last line is not a deletion; the leading hyphen is part of the script output.

I'll see if I can review the patches applied between 3.10.0-229.14.1 and 3.10.0-327.10.1.

benjamincburns commented Mar 1, 2016

Actually, I think the patch review is unnecessary, as this issue occurs on a different docker host in our prod environment which is already running 3.10.0-327.10.1, and the latest userspace, CentOS 7.2.1511. To avoid (or inadvertently create) confusion, I refer to this host as host-with-latest-userspace-and-kernel below.

Copy & pasted repro output, modified slightly to change hostname:

user@host-with-latest-userspace-and-kernel ~ $ uname -a
Linux host-with-latest-userspace-and-kernel 3.10.0-327.10.1.el7.x86_64 #1 SMP Tue Feb 16 17:03:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
user@host-with-latest-userspace-and-kernel ~ $ docker run -it --cpuset-cpus=0 centos:centos7
[root@82cac19350b2 /]# grep processor /proc/cpuinfo | wc -l
12

The output of check-config.sh run on this host is identical to that of my test VM.

This also suggests that the exact CentOS version may not matter much, as both my test VM and host-with-latest-userspace-and-kernel are CentOS 7.2.1511, while the machine on which I originally reported is CentOS 7.1.1503.

Just for completeness, below you will find the same info requested in the issue template, but for host-with-latest-userspace-and-kernel

Output of docker version:

Client:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 16:16:33 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 16:16:33 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 48
 Running: 44
 Paused: 0
 Stopped: 4
Images: 9
Server Version: 1.10.2
Storage Driver: devicemapper
 Pool Name: docker-253:3-134434010-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 4.418 GB
 Data Space Total: 107.4 GB
 Data Space Available: 10.34 GB
 Metadata Space Used: 9.925 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.138 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2015-12-01)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 3.10.0-327.10.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 31.39 GiB
Name: redacted
ID: redacted
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

And for good measure, OS release specifics:

user@host-with-latest-userspace-and-kernel:~ $ cat /etc/*release*
CentOS Linux release 7.2.1511 (Core) 
Derived from Red Hat Enterprise Linux 7.2 (Source)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.2.1511 (Core) 
CentOS Linux release 7.2.1511 (Core) 
cpe:/o:centos:centos:7

benjamincburns commented Mar 1, 2016

To see if I could spot a pattern of some sort, I've tested for this issue on the 10 docker hosts to which I have access. The only machine on which I have not observed it is the clean VM I set up specifically to test this issue. Below are the configurations of the machines in question (the hosts discussed above are included).

Except for the test VM, which is excluded from the machine counts in the table below, all machines tested are bare metal.

| Number of Machines | Docker Version | OS | OS Version | Kernel Version |
| --- | --- | --- | --- | --- |
| 1 | 1.10.1, build 9e83765 | Ubuntu | 15.10 | 4.2.0-25-generic |
| 1 | 1.9.1, build a34a1d5 | Ubuntu | 15.10 | 4.2.0-30-generic |
| 1 | 1.7.1, build 3043001/1.7.1 | CentOS | 7.1.1503 | 3.10.0-229.11.1.el7.x86_64 |
| 4 | 1.8.2-el7.centos, build a01dc02/1.8.2 | CentOS | 7.2.1511 | 3.10.0-327.3.1.el7.x86_64 |
| 1 | 1.10.2, build c3959b1 | CentOS | 7.2.1511 | 3.10.0-327.10.1.el7.x86_64 |
| 2 | 1.10.2, build c3959b1 | CentOS | 7.1.1503 | 3.10.0-229.14.1.el7.x86_64 |

On the off chance that there's some difference in behaviour between --cpuset and --cpuset-cpus, I also tested --cpuset on one of the 4 machines running the el7 build of Docker 1.8.2. No change in behaviour.

benjamincburns commented Mar 1, 2016

Argh... forget everything I said about the test VM working correctly. It turns out I'd only provisioned one vCPU for the VM. Now that I've switched it to 4 vCPUs, the problem occurs there, too.

benjamincburns commented Mar 1, 2016

I see that the proper value is being written to cpuset.cpus on my test VM, leading me full circle back to thinking this is a kernel issue.

[bburns@localhost ~]$ cat /sys/fs/cgroup/cpuset/docker/e047d1596aac8375c6cf711c3c241c44d2404a5203e79f36469709e131ddee49/cpuset.cpus
0

And after using --cpuset-cpus=0,1 I see:

[bburns@localhost ~]$ cat /sys/fs/cgroup/cpuset/docker/731bf72f01f8c3305f3bbca1a1af4b5bc5fb8b0b752e78720528abc1c773fe2f/cpuset.cpus
0-1

I don't fully understand the patches I linked in my first comment, but I have verified that nothing like them has been applied to the CentOS kernel. In fact, there is no effective_cpus member in the cpuset struct in kernel 3.10.0.

benjamincburns commented Mar 1, 2016

So it's looking like --cpuset-cpus does assign processor affinity correctly; however, code which inspects the machine configuration still thinks it has access to the machine's full core count.

To determine this I created two containers, one with --cpuset-cpus=0 and the other with no --cpuset-cpus argument. In each container's console I then backgrounded four bash while-true loops and checked process affinity with ps -o pid,cpuid,comm. In the container with the --cpuset-cpus=0 argument, all cpuid values were 0, while in the other container multiple cpuid values were listed.

Question: Is solving this issue in scope for docker, or is this a kernel-level problem?

Console session:

user@host ~ $ sudo docker run -it --cpuset-cpus=0 --cpuset-mems=0 centos:centos7
[root@f887dac642a6 /]# while true; do echo blah; done > /dev/null &
[1] 14
[root@f887dac642a6 /]# while true; do echo blah; done > /dev/null &
[2] 15
[root@f887dac642a6 /]# while true; do echo blah; done > /dev/null &
[3] 16
[root@f887dac642a6 /]# while true; do echo blah; done > /dev/null &
[4] 17
[root@f887dac642a6 /]# ps -o pid,cpuid,comm
  PID CPUID COMMAND
    1     0 bash
   14     0 bash
   15     0 bash
   16     0 bash
   17     0 bash
   18     0 ps
[root@f887dac642a6 /]# exit

user@host:~$ docker run -it centos:centos7
[root@9612d2e4c7dd /]# while true; do echo blah; done > /dev/null & 
[1] 14
[root@9612d2e4c7dd /]# while true; do echo blah; done > /dev/null &
[2] 15
[root@9612d2e4c7dd /]# while true; do echo blah; done > /dev/null &
[3] 16
[root@9612d2e4c7dd /]# while true; do echo blah; done > /dev/null &
[4] 17
[root@9612d2e4c7dd /]# ps -o pid,cpuid,comm
  PID CPUID COMMAND
    1     0 bash
   16     0 bash
   17     1 bash
   18     2 bash
   19     3 bash
   20     2 ps
[root@9612d2e4c7dd /]# exit
exit

benjamincburns commented Mar 1, 2016

From the Ubuntu bug report in my first comment, it looks like docker can work around this issue by creating its cgroup with cgroup.clone_children set to 0.

benjamincburns commented Mar 1, 2016

Whoops, didn't mean to close.

benjamincburns reopened this Mar 1, 2016

thaJeztah commented Mar 1, 2016

hm, interesting, let me ping @LK4D4 and @anusha-ragunathan, perhaps they have some thoughts on that

benjamincburns changed the title from "--cpuset-cpus argument appears to be ignored on 1.10.2 under CentOS 7.1.1503" to "When --cpuset-cpus argument is used, processes inspecting CPU configuration in the container see all cores" Mar 1, 2016

benjamincburns commented Mar 1, 2016

Eh, that might be a red herring. I've tried doing this manually to no effect. Also it appears that cgroup.clone_children is only defaulting to 1 on my Ubuntu boxes. On my CentOS hosts /sys/fs/cgroup/cpuset/docker/cgroup.clone_children was already set to 0.

thaJeztah commented Mar 1, 2016

What do you get inside the container? i.e.

docker run --rm --cpuset-cpus=0,1 ubuntu sh -c "cat /sys/fs/cgroup/cpuset/cpuset.cpus"

benjamincburns commented Mar 1, 2016

That command works correctly, which is good news: for the applications we control, we can inspect this file. However, for applications running on VMs like mono, this will present some pain. It'd be much simpler overall if the process didn't need to be aware that it was running within a cgroup.
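
For the applications we control, a minimal sketch of that kind of check in C might look like the following (the path is the one from the command above; the cpuset_cpu_count helper name is just illustrative, and this assumes the cgroup v1 cpuset controller is visible inside the container):

/* Sketch: derive the usable CPU count from the container's cpuset
 * rather than from sysconf() or /proc/cpuinfo. The cpuset.cpus file
 * contains comma-separated entries such as "0" or "0-3". */
#include <stdio.h>

int cpuset_cpu_count(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int count = 0, lo, hi;
    while (fscanf(f, "%d", &lo) == 1) {
        hi = lo;
        if (fgetc(f) == '-') {           /* a range, e.g. "0-3" */
            if (fscanf(f, "%d", &hi) != 1)
                break;
            fgetc(f);                    /* consume ',' or '\n' after the range */
        }
        count += hi - lo + 1;
    }
    fclose(f);
    return count;
}

int main(void) {
    int n = cpuset_cpu_count("/sys/fs/cgroup/cpuset/cpuset.cpus");
    if (n > 0)
        printf("cpus in cpuset: %d\n", n);
    return n > 0 ? 0 : 1;
}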

benjamincburns commented Mar 1, 2016

To add a bit of supporting info to my last statement, I quickly grepped mono's source and found that on systems with a proper glibc, mono detects the core count via sysconf(_SC_NPROCESSORS_ONLN). So I wrote a quick and dirty C program to call this and print the result, copied it into a container started with --cpuset-cpus=0, and it returned the core count of the full machine (a simplified sketch of such a program follows the file list below).

This can be seen in the mono source at

  • libgc/pthread_support.c
  • mono/io-layer/system.c
  • mono/profiler/proflog.c
  • mono/utils/mono-proclib.c
  • support/map.c
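
A minimal sketch of that kind of program (a simplified reconstruction, not necessarily the exact code I used) is below; on an affected host it prints the full core count even inside a --cpuset-cpus=0 container:

/* Print the CPU count glibc reports via sysconf(), which ignores the
 * container's cpuset restriction. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 0) {
        perror("sysconf");
        return 1;
    }
    printf("online processors: %ld\n", n);
    return 0;
}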

thaJeztah commented Mar 1, 2016

This sounds similar to #20688; there's also a nice article describing the situation: http://fabiokung.com/2014/03/13/memory-inside-linux-containers/

benjamincburns commented Mar 2, 2016

Yes, it certainly does. Digging into the mono source a bit further, it also parses /proc/stat in places.

I'll likely open an issue with mono to make the VM cgroup-aware; however, I agree with @thechile's last comment on #20688 that the container community ought to be working with kernel maintainers to sort out a solution to this problem.

Linus has a pretty famous rule that the kernel shouldn't break userspace. I'd think that the container shouldn't break userspace, either. You might argue that it's not the container, it's cgroups, but if the choice to use cgroups forces containerized processes to become cgroup aware, then from the perspective of the user it's the same result.

It's painful enough for native processes where I control thread pooling and resource allocation, but when you've got a full platform stack that you're trying to drop into a container, it gets quite expensive quite quickly.

benjamincburns commented Mar 2, 2016

I've raised a mono issue with the hope that they'll pick it up and at least work around this problem. That said, I'd rather not need to also raise issues for go, python, ruby, java, and so on.
