Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

potential memory leak in docker 1.12.6 #257

Closed
fortinj66 opened this issue Jun 20, 2017 · 46 comments
Closed

potential memory leak in docker 1.12.6 #257

fortinj66 opened this issue Jun 20, 2017 · 46 comments
Assignees
Labels

Comments

@fortinj66
Copy link

Moving this issue from moby#33472 as they are indicating that 1.12.6 is no longer supported and there are significant changes between the vanilla moby and projectatomic

Let me step back and describe the initial issue...

  1. I am using Openshift Origin 3.4 on Centos 7 installed via the normal ansible installer
  2. Openshift requires Docker 1.12.x.
  3. The latest Centos build for Docker 1.12 is 1.12.6.xxx and is sourced from projectatomic branch docker -1.12.6 which is forked/synced from moby/moby
  4. The dockerd daemon seems to have a memory leak. It uses an increasing large amount of memory over time. Currently one of my nodes has it over 1.9 gig real memory. This does not seem to be related to the number of containers running and stopping all the containers does not reduce the memory usage. This leads to multiple restarts of docker during the week...
  5. I responded to an issue (Dockerd eats 20GB of RAM moby/moby#32711) which seemed to be similar to what I was seeing and provided the debug and heap information
  6. The patch described in this pull was created and applied to master (I'm assuming here).
  7. i requested the patch to be applied to projectatomic branch 1.12.6
  8. It was applied (Is it possible to pull https://github.com/moby/moby/pull/33472 to fix a memory leak in the current 1.12.6 (and probably all) version of docker #252) by @runcom
  9. As there was no ETA for when this patch would be available for download from the Centos repositories I worked on getting it built myself which I was able to do (Build issues with Centos 7 #253)
  10. the build in 9 was applied to my environment and has the memory usage described in 4

Attached is the heap infomation from earlier today:
heap.txt


BUG REPORT INFORMATION

Use the commands below to provide key information from your environment:
You do NOT have to include this information if this is a FEATURE REQUEST
-->

Output of docker version:

[johnf@os-node-s04 ~]$ docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: <unknown>
 Go version:      go1.7.4
 Git commit:      6ffd653/1.12.6
 Built:           Sat Jun 17 20:13:20 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: <unknown>
 Go version:      go1.7.4
 Git commit:      6ffd653/1.12.6
 Built:           Sat Jun 17 20:13:20 2017
 OS/Arch:         linux/amd64

Output of docker info:

[johnf@os-node-s04 ~]$ docker info
Containers: 18
 Running: 18
 Paused: 0
 Stopped: 0
Images: 36
Server Version: 1.12.6
Storage Driver: devicemapper
 Pool Name: docker--vg-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 28.53 GB
 Data Space Total: 53.17 GB
 Data Space Available: 24.63 GB
 Metadata Space Used: 9.982 MB
 Metadata Space Total: 54.53 MB
 Metadata Space Available: 44.54 MB
 Thin Pool Minimum Free Space: 5.316 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: null bridge host overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Security Options: seccomp selinux
Kernel Version: 3.10.0-514.10.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 8
Total Memory: 35.18 GiB
Name: os-node-s04
ID: 7ZD3:FQ3S:SUWP:UY5L:DA22:DNUL:RLMJ:ZTUW:DH3X:KM46:NKBU:LPT6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 102
 Goroutines: 117939
 System Time: 2017-06-20T12:36:20.539530389-04:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Insecure Registries:
 172.30.0.0/16
 127.0.0.0/8
Registries: docker.io (secure)

Additional environment details (AWS, VirtualBox, physical, etc.):

OS: CentOS 7
Openshift Origin 3.4

Steps to reproduce the issue:

  1. Run Openshift 3.4 on Centos 7
  2. I have seen this in non-openshift environments also but I dod not have data for them...

Describe the results you received:

dockerd memory usage increased with time leading to 1-2 docker restarts per week. Shutting down containers does not release the memory

Describe the results you expected:
dockerd memory usage should not always increase over time...

Additional information you deem important (e.g. issue happens only occasionally):

@rhatdan
Copy link
Member

rhatdan commented Jun 20, 2017

We should have this fix in docker-1.12.6-32.git88a4867.el7 which is going to be shipped in RHEL7.3.6 in a couple of weeks, and will show up in Centos Soon after.

@fortinj66
Copy link
Author

Would I be able to build now against that commit and test it?

@fortinj66
Copy link
Author

Looks like the new version is out for Centos... I have installed it and will see what it looks like over the next couple of weeks...

@fortinj66
Copy link
Author

fortinj66 commented Jul 7, 2017

Well, my initial reaction was that the new release looked good... Unfortunately, I am now seeing the docker daemon randomly crashing in all environments with nothing logged. Sometimes docker will start right back up with a systemctl restart docker and sometime it takes several iterations for it to restart.

This is happening in both openstack and non-openstack environments...

Here is an example systemctl status docker:

[johnf@os-node-s02 ~]$ systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─custom.conf
   Active: inactive (dead) since Fri 2017-07-07 14:05:46 EDT; 5min ago
     Docs: http://docs.docker.com
  Process: 103209 ExecStart=/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY (code=exited, status=0/SUCCESS)
 Main PID: 103209 (code=exited, status=0/SUCCESS)
[johnf@os-node-s02 ~]$

and an example of trying to do a build on a non-openshift environment where dockerd died in the middle of the build:

[johnf@majeftest-sb01 cf12-docker]$ make s2i
docker build -t artifactory.marketamerica.com:8443/internalsystems/johnf/cf12-s2i .
Sending build context to Docker daemon 372.2 MB
Step 1 : FROM centos:7
 ---> 3bee3060bfc8
Step 2 : MAINTAINER John Fortin
 ---> Using cache
 ---> ab898a9f6c5a
Step 3 : EXPOSE 8080 8500 8501 8502
 ---> Using cache
 ---> 6c59b5746427
Step 4 : RUN yum clean all &&     yum -y update &&     yum install -y http://rpms.famillecollet.com/enterprise/remi-release-7.rpm &&     yum install -y epel-release bind-utils net-tools &&     yum install -y apr-util-ldap                    httpd                    mod_ldap                    openldap                    openldap-devel                    supervisor                    unzip                    wget  &&     yum install -y ImageMagick                    libxslt                    libfontenc                    libXfont                    xorg-x11-font-utils                    fontconfig                    fontpackages-filesystem                    cjkuni-fonts-common                    cjkuni-fonts-ghostscript                    cjkuni-ukai-fonts                    cjkuni-uming-fonts                    ghostscript-fonts                    urw-fonts &&     yum clean all &&     groupadd -g 500 cfusion &&     useradd -g 500 -u 500 cfusion &&     mkdir -p /var/www/html/logs/
 ---> Running in 6e9b288c5b89
Loaded plugins: fastestmirror, ovl
Cleaning repos: base extras updates
Cleaning up everything
Loaded plugins: fastestmirror, ovl
Determining fastest mirrors
 * base: mirror.5ninesolutions.com
 * extras: centos.mirror.constant.com
 * updates: repos.mia.quadranet.com
The command '/bin/sh -c yum clean all &&     yum -y update &&     yum install -y http://rpms.famillecollet.com/enterprise/remi-release-7.rpm &&     yum install -y epel-release bind-utils net-tools &&     yum install -y apr-util-ldap                    httpd                    mod_ldap                    openldap                    openldap-devel                    supervisor                    unzip                    wget  &&     yum install -y ImageMagick                    libxslt                    libfontenc                    libXfont                    xorg-x11-font-utils                    fontconfig                    fontpackages-filesystem                    cjkuni-fonts-common                    cjkuni-fonts-ghostscript                    cjkuni-ukai-fonts                    cjkuni-uming-fonts                    ghostscript-fonts                    urw-fonts &&     yum clean all &&     groupadd -g 500 cfusion &&     useradd -g 500 -u 500 cfusion &&     mkdir -p /var/www/html/logs/' returned a non-zero code: 137
make: *** [s2i] Error 137
[johnf@majeftest-sb01 cf12-docker]$ docker ps
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
[johnf@majeftest-sb01 cf12-docker]$

I saw something similar with the previous version (12.6.28) and had to move back to 12.6.16...

I'm not sure how to further debug this issue...

@fortinj66
Copy link
Author

I do see this regarding oci-register-machine:

Jul 07 14:25:55 majeftest-sb01 dockerd-current[18440]: Total download size: 18 M
Jul 07 14:25:55 majeftest-sb01 dockerd-current[18440]: Downloading packages:
Jul 07 14:25:55 majeftest-sb01 dockerd-current[18440]: Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
Jul 07 14:25:55 majeftest-sb01 dockerd-current[18440]: warning: /var/cache/yum/x86_64/7/updates/packages/bind-license-9.9.4-50.el7_3.1.noarch.rpm: Header V3 RSA/SHA25
Jul 07 14:25:55 majeftest-sb01 dockerd-current[18440]: Public key for bind-license-9.9.4-50.el7_3.1.noarch.rpm is not installed
Jul 07 14:25:56 majeftest-sb01 dockerd-current[18440]: time="2017-07-07T14:25:56.731547480-04:00" level=info msg="Container 77122002ca2a2350ce8fa6cc299208332311b78a4a
Jul 07 14:25:56 majeftest-sb01 oci-register-machine[18717]: 2017/07/07 14:25:56 Register machine: poststop 77122002ca2a2350ce8fa6cc299208332311b78a4ab1eb020e088e5dab9
Jul 07 14:25:56 majeftest-sb01 oci-systemd-hook[18724]: systemdhook <info>: gidMappings not found in config
Jul 07 14:25:57 majeftest-sb01 dockerd-current[18440]: time="2017-07-07T14:25:57.143002795-04:00" level=info msg="stopping containerd after receiving terminated"
Jul 07 14:25:58 majeftest-sb01 systemd[1]: Stopped Docker Application Container Engine.

@fortinj66
Copy link
Author

I reverted to 12.6.16 which completes the build without issue

@runcom
Copy link
Collaborator

runcom commented Jul 7, 2017

Ping @rhatdan about register machine

@fortinj66
Copy link
Author

I'm not seeing that particular message on my other servers though... May be a red herring...

@fortinj66
Copy link
Author

So here are the log entries after a clean install and start of the latest Centos docker image. Within three seconds the docker process terminates. This seems to occur right after NetworkManager does some stuff. Is it possible that NetworkManager is causing dockerd to fail?:

Jul  7 16:42:08 majeftest-sb01 systemd: Starting Docker Application Container Engine...
Jul  7 16:42:08 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:08.704005578-04:00" level=info msg="libcontainerd: new containerd process, pid: 1505"
Jul  7 16:42:09 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:09.733552706-04:00" level=info msg="devmapper: Creating filesystem xfs on device docker-8:1-167                                                                    774071-base"
Jul  7 16:42:09 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:09.820787914-04:00" level=info msg="devmapper: Successfully created filesystem xfs on device do                                                                    cker-8:1-167774071-base"
Jul  7 16:42:09 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:09.857357376-04:00" level=info msg="Graph migration to content-addressability took 0.00 seconds                                                                    "
Jul  7 16:42:09 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:09.858225773-04:00" level=info msg="Loading containers: start."
Jul  7 16:42:09 majeftest-sb01 kernel: Bridge firewalling registered
Jul  7 16:42:09 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:09.873350867-04:00" level=info msg="Firewalld running: true"
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER                                                                    ' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -D PREROUTING' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -D OUTPUT' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -F DOCKER' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -X DOCKER' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -F DOCKER' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -X DOCKER' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -F DOCKER-ISOLATION' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -X DOCKER-ISOLATION' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -n -L DOCKER' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -n -L DOCKER' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -n -L DOCKER-ISOLATION' failed:
Jul  7 16:42:09 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C DOCKER-ISOLATION -j RETURN' failed:
Jul  7 16:42:09 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:09.994597297-04:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172                                                                    .17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jul  7 16:42:09 majeftest-sb01 NetworkManager[452]: <info>  [1499460129.9965] manager: (docker0): new Bridge device (/org/freedesktop/NetworkManager/Devices/2)
Jul  7 16:42:10 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE' failed:
Jul  7 16:42:10 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C DOCKER -i docker0 -j RETURN' failed:
Jul  7 16:42:10 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -D FORWARD -i docker0 -o docker0 -j DROP' failed:
Jul  7 16:42:10 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -i docker0 -o docker0 -j ACCEPT' failed:
Jul  7 16:42:10 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -i docker0 ! -o docker0 -j ACCEPT' failed:
Jul  7 16:42:10 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED                                                                     -j ACCEPT' failed:
Jul  7 16:42:10 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C PREROUTING -m addrtype --dst-type LOCAL -j DOCKER' failed:
Jul  7 16:42:10 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C OUTPUT -m addrtype --dst-type LOCAL -j DOCKER ! --dst 127.0.0.0/8                                                                    ' failed:
Jul  7 16:42:10 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -o docker0 -j DOCKER' failed:
Jul  7 16:42:10 majeftest-sb01 firewalld: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -j DOCKER-ISOLATION' failed:
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.0970] device (docker0): state change: unmanaged -> unavailable (reason 'connection-assumed') [                                                                    10 20 41]
Jul  7 16:42:10 majeftest-sb01 kernel: IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.0991] ifcfg-rh: add connection in-memory (e0c85257-4919-441c-8fe7-e15fc9376b35,"docker0")
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.1003] device (docker0): state change: unavailable -> disconnected (reason 'connection-assumed'                                                                    ) [20 30 41]
Jul  7 16:42:10 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:10.100341038-04:00" level=info msg="Loading containers: done."
Jul  7 16:42:10 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:10.100549787-04:00" level=info msg="Daemon has completed initialization"
Jul  7 16:42:10 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:10.100566564-04:00" level=info msg="Docker daemon" commit="88a4867/1.12.6" graphdriver=devicema                                                                    pper version=1.12.6
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.1011] device (docker0): Activation: starting connection 'docker0' (e0c85257-4919-441c-8fe7-e15                                                                    fc9376b35)
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.1013] device (docker0): state change: disconnected -> prepare (reason 'none') [30 40 0]
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.1016] device (docker0): state change: prepare -> config (reason 'none') [40 50 0]
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.1018] device (docker0): state change: config -> ip-config (reason 'none') [50 70 0]
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.1059] device (docker0): state change: ip-config -> ip-check (reason 'none') [70 80 0]
Jul  7 16:42:10 majeftest-sb01 systemd: Started Docker Application Container Engine.
Jul  7 16:42:10 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:10.105624062-04:00" level=info msg="API listen on /var/run/docker.sock"
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.1064] device (docker0): state change: ip-check -> secondaries (reason 'none') [80 90 0]
Jul  7 16:42:10 majeftest-sb01 systemd: Started Run docker-cleanup every hour.
Jul  7 16:42:10 majeftest-sb01 dbus-daemon: dbus[419]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dis                                                                    patcher.service'
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.1066] device (docker0): state change: secondaries -> activated (reason 'none') [90 100 0]
Jul  7 16:42:10 majeftest-sb01 systemd: Starting Run docker-cleanup every hour.
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.1066] manager: NetworkManager state is now CONNECTED_LOCAL
Jul  7 16:42:10 majeftest-sb01 systemd: Cannot add dependency job for unit iptables.service, ignoring: Unit is masked.
Jul  7 16:42:10 majeftest-sb01 dbus[419]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.servi                                                                    ce'
Jul  7 16:42:10 majeftest-sb01 NetworkManager[452]: <info>  [1499460130.1102] device (docker0): Activation: successful, device activated.
Jul  7 16:42:10 majeftest-sb01 systemd: Starting Network Manager Script Dispatcher Service...
Jul  7 16:42:10 majeftest-sb01 systemd: Stopped Run docker-cleanup every hour.
Jul  7 16:42:10 majeftest-sb01 systemd: Stopping Run docker-cleanup every hour.
Jul  7 16:42:10 majeftest-sb01 systemd: Stopping Docker Application Container Engine...
Jul  7 16:42:10 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:10.111438779-04:00" level=info msg="Processing signal 'terminated'"
Jul  7 16:42:10 majeftest-sb01 dockerd-current: time="2017-07-07T16:42:10.11478574-04:00" level=info msg="stopping containerd after receiving terminated"
Jul  7 16:42:10 majeftest-sb01 dbus[419]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jul  7 16:42:10 majeftest-sb01 dbus-daemon: dbus[419]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jul  7 16:42:10 majeftest-sb01 systemd: Started Network Manager Script Dispatcher Service.
Jul  7 16:42:10 majeftest-sb01 nm-dispatcher: req:1 'connectivity-change': new request (3 scripts)
Jul  7 16:42:10 majeftest-sb01 nm-dispatcher: req:1 'connectivity-change': start running ordered scripts...
Jul  7 16:42:10 majeftest-sb01 nm-dispatcher: req:2 'up' [docker0]: new request (3 scripts)
Jul  7 16:42:10 majeftest-sb01 nm-dispatcher: req:2 'up' [docker0]: start running ordered scripts...
Jul  7 16:42:10 majeftest-sb01 systemd: Unit iscsi.service cannot be reloaded because it is inactive.
Jul  7 16:42:11 majeftest-sb01 systemd: Stopped Docker Application Container Engine.

@runcom
Copy link
Collaborator

runcom commented Jul 7, 2017

Very likely actually :/

@fortinj66
Copy link
Author

Well, disabling NetworkManager didn't seem to help:

Jul  7 17:08:22 majeftest-sb01 journal: Running transaction
Jul  7 17:08:22 majeftest-sb01 journal:  Updating   : 32:bind-license-9.9.4-50.el7_3.1.noarch                      1/2
Jul  7 17:08:22 majeftest-sb01 journal:  Cleanup    : 32:bind-license-9.9.4-38.el7_3.3.noarch                      2/2
Jul  7 17:08:22 majeftest-sb01 journal:  Verifying  : 32:bind-license-9.9.4-50.el7_3.1.noarch                      1/2
Jul  7 17:08:22 majeftest-sb01 journal:  Verifying  : 32:bind-license-9.9.4-38.el7_3.3.noarch                      2/2
Jul  7 17:08:22 majeftest-sb01 journal:
Jul  7 17:08:22 majeftest-sb01 journal: Updated:
Jul  7 17:08:22 majeftest-sb01 journal:  bind-license.noarch 32:9.9.4-50.el7_3.1
Jul  7 17:08:22 majeftest-sb01 journal:
Jul  7 17:08:22 majeftest-sb01 journal: Complete!
Jul  7 17:08:22 majeftest-sb01 journal: Loaded plugins: fastestmirror, ovl
Jul  7 17:08:22 majeftest-sb01 dockerd-current: time="2017-07-07T17:08:22.921840235-04:00" level=info msg="Container 45d2a822a70a0259400f5e4e171e3139c575b970055da64c9378dd1930f24742 failed to exit within 10 seconds of signal 15 - using the force"
Jul  7 17:08:22 majeftest-sb01 systemd-machined: Machine 45d2a822a70a0259400f5e4e171e3139 terminated.
Jul  7 17:08:22 majeftest-sb01 oci-register-machine[1744]: 2017/07/07 17:08:22 Register machine: poststop 45d2a822a70a0259400f5e4e171e3139c575b970055da64c9378dd1930f24742 0 /var/lib/docker/devicemapper/mnt/942be3058fec49988ed354203d018435583e9b9744f58d33efe5a5c98c158e16/rootfs
Jul  7 17:08:22 majeftest-sb01 oci-systemd-hook: systemdhook <info>: gidMappings not found in config
Jul  7 17:08:22 majeftest-sb01 oci-systemd-hook: systemdhook <info>: uidMappings not found in config
Jul  7 17:08:22 majeftest-sb01 kernel: docker0: port 1(veth797691d) entered disabled state
Jul  7 17:08:22 majeftest-sb01 kernel: docker0: port 1(veth797691d) entered disabled state
Jul  7 17:08:22 majeftest-sb01 kernel: device veth797691d left promiscuous mode
Jul  7 17:08:22 majeftest-sb01 kernel: docker0: port 1(veth797691d) entered disabled state
Jul  7 17:08:23 majeftest-sb01 kernel: XFS (dm-3): Unmounting Filesystem
Jul  7 17:08:23 majeftest-sb01 dockerd-current: time="2017-07-07T17:08:23.155343661-04:00" level=info msg="stopping containerd after receiving terminated"
Jul  7 17:08:24 majeftest-sb01 systemd: Stopped Docker Application Container Engine.
Jul  7 17:08:25 majeftest-sb01 ntpd[433]: Deleting interface #7 veth797691d, fe80::7822:f3ff:fedf:46b6#123, interface stats: received=0, sent=0, dropped=0, active_time=9 secs

@runcom
Copy link
Collaborator

runcom commented Jul 7, 2017

Mmm seems likely a container being terminated tearing down the whole Docker daemon?

@fortinj66
Copy link
Author

fortinj66 commented Jul 7, 2017 via email

@fortinj66
Copy link
Author

I'm going to open a separate issue for this current issue...

@fortinj66
Copy link
Author

fortinj66 commented Jul 17, 2017

After testing with the new version of the CentOS RPMs for docker 1.12.6.32 I do not believe the memory leak issue has been resolved. dockerd is still growing without bound requiring a restart of the docker service...

For example,

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  ...
 68838 root      20   0 18.576g 0.011t   8120 S   0.7 49.7 318:43.03 dockerd-current 
  ...

@fortinj66
Copy link
Author

[johnf@os-node-d04 ~]$ docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-32.git88a4867.el7.centos.x86_64
 Go version:      go1.7.4
 Git commit:      88a4867/1.12.6
 Built:           Mon Jul  3 16:02:02 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-32.git88a4867.el7.centos.x86_64
 Go version:      go1.7.4
 Git commit:      88a4867/1.12.6
 Built:           Mon Jul  3 16:02:02 2017
 OS/Arch:         linux/amd64

[johnf@os-node-d04 ~]$ docker info
Containers: 33
 Running: 26
 Paused: 0
 Stopped: 7
Images: 43
Server Version: 1.12.6
Storage Driver: devicemapper
 Pool Name: docker--vg-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 23.15 GB
 Data Space Total: 53.17 GB
 Data Space Available: 30.02 GB
 Metadata Space Used: 7.291 MB
 Metadata Space Total: 54.53 MB
 Metadata Space Available: 47.24 MB
 Thin Pool Minimum Free Space: 5.316 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Security Options: seccomp selinux
Kernel Version: 3.10.0-514.10.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 8
Total Memory: 23.5 GiB
Name: os-node-d04
ID: 7ZD3:FQ3S:SUWP:UY5L:DA22:DNUL:RLMJ:ZTUW:DH3X:KM46:NKBU:LPT6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 172.30.0.0/16
 127.0.0.0/8
Registries: docker.io (secure)

@fortinj66
Copy link
Author

[johnf@os-node-d04 ~]$ curl --unix-socket /var/run/docker.sock http:/debug/pprof/heap > heap.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1247k    0 1247k    0     0  28.2M      0 --:--:-- --:--:-- --:--:-- 29.0M


[johnf@os-node-d04 ~]$  go tool pprof --inuse_space $(which dockerd-current) heap.txt
Entering interactive mode (type "help" for commands)
(pprof) top5
10.10GB of 10.36GB total (97.44%)
Dropped 1469 nodes (cum <= 0.05GB)
Showing top 5 nodes out of 27 (cum >= 0.15GB)
      flat  flat%   sum%        cum   cum%
    9.76GB 94.23% 94.23%     9.76GB 94.23%  github.com/docker/docker/pkg/pools.newBufferPoolWithSize.func1
    0.26GB  2.52% 96.74%     0.26GB  2.52%  runtime.malg
    0.03GB  0.32% 97.06%     0.07GB  0.63%  github.com/docker/docker/pkg/ioutils.NewBytesPipe
    0.03GB  0.27% 97.33%     0.14GB  1.32%  github.com/docker/docker/daemon.(*Daemon).ContainerExecStart
    0.01GB   0.1% 97.44%     0.15GB  1.41%  github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart
(pprof)

@fortinj66
Copy link
Author

what it looks like after a restart:

[johnf@os-node-d04 ~]$ curl --unix-socket /var/run/docker.sock http:/debug/pprof/heap > heap.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  138k    0  138k    0     0  35.1M      0 --:--:-- --:--:-- --:--:-- 45.2M

[johnf@os-node-d04 ~]$  go tool pprof --inuse_space $(which dockerd-current) heap.txt
Entering interactive mode (type "help" for commands)
(pprof) top5
20.21MB of 24.81MB total (81.46%)
Dropped 573 nodes (cum <= 0.12MB)
Showing top 5 nodes out of 111 (cum >= 1MB)
      flat  flat%   sum%        cum   cum%
   11.35MB 45.74% 45.74%    11.35MB 45.74%  github.com/docker/docker/pkg/pools.newBufferPoolWithSize.func1
    4.29MB 17.30% 63.03%     4.29MB 17.30%  github.com/docker/libnetwork/ipamutils.initGranularPredefinedNetworks
    2.04MB  8.21% 71.24%     2.04MB  8.21%  github.com/docker/docker/daemon/graphdriver/devmapper.NewDeviceSet
    1.53MB  6.18% 77.43%     1.53MB  6.18%  regexp.(*bitState).reset
       1MB  4.03% 81.46%        1MB  4.03%  runtime.malg
(pprof)

@agunnerson-ibm
Copy link

agunnerson-ibm commented Aug 21, 2017

Is there any additional info we can provide to help solve this problem? I'm seeing the exact same behavior as @fortinj66.

[root@[REDACTED] andrew.gunnerson]# curl --unix-socket /var/run/docker.sock http:/debug/pprof/heap > heap.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  344k    0  344k    0     0  16.2M      0 --:--:-- --:--:-- --:--:-- 18.6M
[root@[REDACTED] andrew.gunnerson]# go tool pprof --inuse_space $(which dockerd-current) heap.txt
Entering interactive mode (type "help" for commands)
(pprof) top5
180.04MB of 193.23MB total (93.17%)
Dropped 718 nodes (cum <= 0.97MB)
Showing top 5 nodes out of 99 (cum >= 2MB)
      flat  flat%   sum%        cum   cum%
  167.63MB 86.75% 86.75%   167.63MB 86.75%  github.com/docker/docker/pkg/pools.newBufferPoolWithSize.func1
    4.29MB  2.22% 88.97%     4.29MB  2.22%  github.com/docker/libnetwork/ipamutils.initGranularPredefinedNetworks
    4.07MB  2.11% 91.08%     4.07MB  2.11%  regexp.(*bitState).reset
    2.04MB  1.05% 92.14%     2.04MB  1.05%  github.com/docker/docker/daemon/graphdriver/devmapper.NewDeviceSet
       2MB  1.04% 93.17%        2MB  1.04%  runtime.malg

Looking at docker info, the number of goroutines on an idle kubernetes node creep up throughout the day:

[root@[REDACTED] andrew.gunnerson]# docker info |& grep -i goroutines
 Goroutines: 128
[root@[REDACTED] andrew.gunnerson]# docker info |& grep -i goroutines
 Goroutines: 2765
[root@[REDACTED] andrew.gunnerson]# docker info |& grep -i goroutines
 Goroutines: 8057
[root@[REDACTED] andrew.gunnerson]# docker info |& grep -i goroutines
 Goroutines: 9569

@agunnerson-ibm
Copy link

This is the goroutine list:

goroutine profile: total 9569
4716 @ 0x4438ea 0x4439de 0x454272 0x6a8640 0x72dd29 0x4b1187 0x4b0f98 0x93e3d4 0x8b20eb 0x474901
#	0x454271	sync.runtime_notifyListWait+0x121						/usr/lib/golang/src/runtime/sema.go:267
#	0x6a863f	sync.(*Cond).Wait+0x7f								/usr/lib/golang/src/sync/cond.go:57
#	0x72dd28	github.com/docker/docker/pkg/ioutils.(*BytesPipe).Read+0x58			/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/pkg/ioutils/bytespipe.go:134
#	0x4b1186	io.copyBuffer+0x146								/usr/lib/golang/src/io/io.go:390
#	0x4b0f97	io.CopyBuffer+0x87								/usr/lib/golang/src/io/io.go:371
#	0x93e3d3	github.com/docker/docker/pkg/pools.Copy+0xa3					/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/pkg/pools/pools.go:81
#	0x8b20ea	github.com/docker/docker/container/stream.(*Config).CopyStreams.func2+0x16a	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/container/stream/attach.go:126

2358 @ 0x4438ea 0x4439de 0x453bdd 0x453870 0x6a9df7 0x8b23fd 0x474901
#	0x45386f	sync.runtime_Semacquire+0x2f							/usr/lib/golang/src/runtime/sema.go:47
#	0x6a9df6	sync.(*WaitGroup).Wait+0x96							/usr/lib/golang/src/sync/waitgroup.go:131
#	0x8b23fc	github.com/docker/docker/container/stream.(*Config).CopyStreams.func3.1+0x2c	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/container/stream/attach.go:149

2358 @ 0x4438ea 0x453069 0x451e5c 0x8b26c4 0xb8f357 0x474901
#	0x8b26c3	github.com/docker/docker/container/stream.(*Config).CopyStreams.func3+0x2a3	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/container/stream/attach.go:152
#	0xb8f356	github.com/docker/docker/pkg/promise.Go.func1+0x26				/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/pkg/promise/promise.go:8

50 @ 0x4438ea 0x4439de 0x454272 0x6a8640 0x72dd29 0x5b3930 0x474901
#	0x454271	sync.runtime_notifyListWait+0x121				/usr/lib/golang/src/runtime/sema.go:267
#	0x6a863f	sync.(*Cond).Wait+0x7f						/usr/lib/golang/src/sync/cond.go:57
#	0x72dd28	github.com/docker/docker/pkg/ioutils.(*BytesPipe).Read+0x58	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/pkg/ioutils/bytespipe.go:134
#	0x5b392f	github.com/docker/docker/daemon/logger.(*Copier).copySrc+0x1ef	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/daemon/logger/copier.go:64

31 @ 0x660865 0x65c1b5 0x6586d9 0x4b8e0f 0x4b5c89 0xb1116a 0x4b1187 0x4b0f98 0x93e3d4 0x8b29c9 0x474901
#	0x660864	syscall.Syscall+0x4								/usr/lib/golang/src/syscall/asm_linux_amd64.s:18
#	0x65c1b4	syscall.read+0x54								/usr/lib/golang/src/syscall/zsyscall_linux_amd64.go:783
#	0x6586d8	syscall.Read+0x48								/usr/lib/golang/src/syscall/syscall_unix.go:161
#	0x4b8e0e	os.(*File).read+0x4e								/usr/lib/golang/src/os/file_unix.go:228
#	0x4b5c88	os.(*File).Read+0x58								/usr/lib/golang/src/os/file.go:101
#	0xb11169	github.com/tonistiigi/fifo.(*fifo).Read+0x1e9					/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/github.com/tonistiigi/fifo/fifo.go:139
#	0x4b1186	io.copyBuffer+0x146								/usr/lib/golang/src/io/io.go:390
#	0x4b0f97	io.CopyBuffer+0x87								/usr/lib/golang/src/io/io.go:371
#	0x93e3d3	github.com/docker/docker/pkg/pools.Copy+0xa3					/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/pkg/pools/pools.go:81
#	0x8b29c8	github.com/docker/docker/container/stream.(*Config).CopyToPipe.func1.1+0x78	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/container/stream/streams.go:121

24 @ 0x660865 0x65c1b5 0x6586d9 0x4b8e0f 0x4b5c89 0xb11079 0x4b1187 0x4b0f98 0x93e3d4 0x8b29c9 0x474901
#	0x660864	syscall.Syscall+0x4								/usr/lib/golang/src/syscall/asm_linux_amd64.s:18
#	0x65c1b4	syscall.read+0x54								/usr/lib/golang/src/syscall/zsyscall_linux_amd64.go:783
#	0x6586d8	syscall.Read+0x48								/usr/lib/golang/src/syscall/syscall_unix.go:161
#	0x4b8e0e	os.(*File).read+0x4e								/usr/lib/golang/src/os/file_unix.go:228
#	0x4b5c88	os.(*File).Read+0x58								/usr/lib/golang/src/os/file.go:101
#	0xb11078	github.com/tonistiigi/fifo.(*fifo).Read+0xf8					/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/github.com/tonistiigi/fifo/fifo.go:134
#	0x4b1186	io.copyBuffer+0x146								/usr/lib/golang/src/io/io.go:390
#	0x4b0f97	io.CopyBuffer+0x87								/usr/lib/golang/src/io/io.go:371
#	0x93e3d3	github.com/docker/docker/pkg/pools.Copy+0xa3					/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/pkg/pools/pools.go:81
#	0x8b29c8	github.com/docker/docker/container/stream.(*Config).CopyToPipe.func1.1+0x78	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/container/stream/streams.go:121

4 @ 0x4438ea 0x43e1db 0x43d939 0x60edc8 0x60ee34 0x610671 0x6235b0 0x6610c1 0x783484 0x72285c 0x723505 0x723687 0xb53e3e 0xb53c3f 0x77def5 0x7847fd 0x788813 0x474901
#	0x43d938	net.runtime_pollWait+0x58								/usr/lib/golang/src/runtime/netpoll.go:160
#	0x60edc7	net.(*pollDesc).wait+0x37								/usr/lib/golang/src/net/fd_poll_runtime.go:73
#	0x60ee33	net.(*pollDesc).waitRead+0x33								/usr/lib/golang/src/net/fd_poll_runtime.go:78
#	0x610670	net.(*netFD).Read+0x1a0									/usr/lib/golang/src/net/fd_unix.go:243
#	0x6235af	net.(*conn).Read+0x6f									/usr/lib/golang/src/net/net.go:173
#	0x6610c0	github.com/docker/docker/cmd/dockerd/hack.(*MalformedHostHeaderOverrideConn).Read+0x6c0	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/cmd/dockerd/hack/malformed_host_override.go:110
#	0x783483	net/http.(*connReader).Read+0x143							/usr/lib/golang/src/net/http/server.go:586
#	0x72285b	bufio.(*Reader).fill+0x10b								/usr/lib/golang/src/bufio/bufio.go:97
#	0x723504	bufio.(*Reader).ReadSlice+0xb4								/usr/lib/golang/src/bufio/bufio.go:330
#	0x723686	bufio.(*Reader).ReadLine+0x36								/usr/lib/golang/src/bufio/bufio.go:359
#	0xb53e3d	net/textproto.(*Reader).readLineSlice+0x5d						/usr/lib/golang/src/net/textproto/reader.go:55
#	0xb53c3e	net/textproto.(*Reader).ReadLine+0x2e							/usr/lib/golang/src/net/textproto/reader.go:36
#	0x77def4	net/http.readRequest+0xa4								/usr/lib/golang/src/net/http/request.go:793
#	0x7847fc	net/http.(*conn).readRequest+0x10c							/usr/lib/golang/src/net/http/server.go:765
#	0x788812	net/http.(*conn).serve+0x3d2								/usr/lib/golang/src/net/http/server.go:1532

3 @ 0x4438ea 0x453069 0x451e5c 0x562f82 0x4faae0 0x4fc3e9 0x4edd2c 0x4ed2b2 0x4ec34d 0x4ecbfb 0x4e80b5 0x78a074 0x7b6a95 0x4e6670 0x78be6d 0x7888f7 0x474901
#	0x562f81	github.com/docker/docker/daemon.(*Daemon).ContainerExecStart+0x1181												/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/daemon/exec.go:224
#	0x4faadf	github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart+0x32f								/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/router/container/exec.go:110
#	0x4fc3e8	github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainerExecStart)-fm+0x68	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/router/container/container.go:67
#	0x4edd2b	github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x73b									/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/middleware/version.go:56
#	0x4ed2b1	github.com/docker/docker/api/server/middleware.UserAgentMiddleware.WrapHandler.func1+0x271									/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/middleware/user_agent.go:45
#	0x4ec34c	github.com/docker/docker/api/server/middleware.AuditMiddleware.WrapHandler.func1+0x9c										/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/middleware/audit_linux.go:27
#	0x4ecbfa	github.com/docker/docker/api/server/middleware.DebugRequestMiddleware.func1+0x5ba										/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/middleware/debug.go:53
#	0x4e80b4	github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0xc4											/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/server.go:146
#	0x78a073	net/http.HandlerFunc.ServeHTTP+0x43																/usr/lib/golang/src/net/http/server.go:1726
#	0x7b6a94	github.com/gorilla/mux.(*Router).ServeHTTP+0x254														/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/github.com/gorilla/mux/mux.go:98
#	0x4e666f	github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x6f												/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/router_swapper.go:29
#	0x78be6c	net/http.serverHandler.ServeHTTP+0x7c																/usr/lib/golang/src/net/http/server.go:2202
#	0x7888f6	net/http.(*conn).serve+0x4b6																	/usr/lib/golang/src/net/http/server.go:1579

1 @ 0x424f82 0x456de7 0x644f52 0x474901
#	0x456de6	os/signal.signal_recv+0x156	/usr/lib/golang/src/runtime/sigqueue.go:116
#	0x644f51	os/signal.loop+0x21		/usr/lib/golang/src/os/signal/signal_unix.go:22

1 @ 0x4438ea 0x43e1db 0x43d939 0x60edc8 0x60ee34 0x610671 0x6235b0 0x72285c 0x722ebc 0x4b0b84 0x4b0ce8 0xe5d06b 0xe5daf4 0xb24e9f 0xb1bed6 0x474901
#	0x43d938	net.runtime_pollWait+0x58					/usr/lib/golang/src/runtime/netpoll.go:160
#	0x60edc7	net.(*pollDesc).wait+0x37					/usr/lib/golang/src/net/fd_poll_runtime.go:73
#	0x60ee33	net.(*pollDesc).waitRead+0x33					/usr/lib/golang/src/net/fd_poll_runtime.go:78
#	0x610670	net.(*netFD).Read+0x1a0						/usr/lib/golang/src/net/fd_unix.go:243
#	0x6235af	net.(*conn).Read+0x6f						/usr/lib/golang/src/net/net.go:173
#	0x72285b	bufio.(*Reader).fill+0x10b					/usr/lib/golang/src/bufio/bufio.go:97
#	0x722ebb	bufio.(*Reader).Read+0x1bb					/usr/lib/golang/src/bufio/bufio.go:209
#	0x4b0b83	io.ReadAtLeast+0xa3						/usr/lib/golang/src/io/io.go:307
#	0x4b0ce7	io.ReadFull+0x57						/usr/lib/golang/src/io/io.go:325
#	0xe5d06a	golang.org/x/net/http2.readFrameHeader+0x7a			/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/golang.org/x/net/http2/frame.go:236
#	0xe5daf3	golang.org/x/net/http2.(*Framer).ReadFrame+0xa3			/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/golang.org/x/net/http2/frame.go:463
#	0xb24e9e	google.golang.org/grpc/transport.(*framer).readFrame+0x2e	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/transport/http_util.go:406
#	0xb1bed5	google.golang.org/grpc/transport.(*http2Client).reader+0xb5	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/transport/http2_client.go:788

1 @ 0x4438ea 0x43e1db 0x43d939 0x60edc8 0x60ee34 0x611f88 0x6366d2 0x634339 0x661147 0x78c2ee 0x4e6e20 0x4e7f39 0x474901
#	0x43d938	net.runtime_pollWait+0x58								/usr/lib/golang/src/runtime/netpoll.go:160
#	0x60edc7	net.(*pollDesc).wait+0x37								/usr/lib/golang/src/net/fd_poll_runtime.go:73
#	0x60ee33	net.(*pollDesc).waitRead+0x33								/usr/lib/golang/src/net/fd_poll_runtime.go:78
#	0x611f87	net.(*netFD).accept+0x237								/usr/lib/golang/src/net/fd_unix.go:419
#	0x6366d1	net.(*UnixListener).accept+0x31								/usr/lib/golang/src/net/unixsock_posix.go:158
#	0x634338	net.(*UnixListener).Accept+0x48								/usr/lib/golang/src/net/unixsock.go:229
#	0x661146	github.com/docker/docker/cmd/dockerd/hack.(*MalformedHostHeaderOverride).Accept+0x36	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/cmd/dockerd/hack/malformed_host_override.go:116
#	0x78c2ed	net/http.(*Server).Serve+0x1cd								/usr/lib/golang/src/net/http/server.go:2273
#	0x4e6e1f	github.com/docker/docker/api/server.(*HTTPServer).Serve+0x3f				/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/server.go:121
#	0x4e7f38	github.com/docker/docker/api/server.(*Server).serveAPI.func1+0xb8			/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/server.go:94

1 @ 0x4438ea 0x43e1db 0x43d939 0x60edc8 0x60ee34 0x611f88 0x6366d2 0x634339 0x854413 0x474901
#	0x43d938	net.runtime_pollWait+0x58						/usr/lib/golang/src/runtime/netpoll.go:160
#	0x60edc7	net.(*pollDesc).wait+0x37						/usr/lib/golang/src/net/fd_poll_runtime.go:73
#	0x60ee33	net.(*pollDesc).waitRead+0x33						/usr/lib/golang/src/net/fd_poll_runtime.go:78
#	0x611f87	net.(*netFD).accept+0x237						/usr/lib/golang/src/net/fd_unix.go:419
#	0x6366d1	net.(*UnixListener).accept+0x31						/usr/lib/golang/src/net/unixsock_posix.go:158
#	0x634338	net.(*UnixListener).Accept+0x48						/usr/lib/golang/src/net/unixsock.go:229
#	0x854412	github.com/docker/libnetwork.(*controller).acceptClientConnections+0x42	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/github.com/docker/libnetwork/sandbox_externalkey_unix.go:128

1 @ 0x4438ea 0x4439de 0x41b09f 0x41ad65 0x411d88 0x41536d 0x443484 0x474901
#	0x411d87	main.(*DaemonCli).start+0x1a47	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/cmd/dockerd/daemon.go:330
#	0x41536c	main.main+0x21c			/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/cmd/dockerd/docker.go:68
#	0x443483	runtime.main+0x1f3		/usr/lib/golang/src/runtime/proc.go:183

1 @ 0x4438ea 0x4439de 0x41b09f 0x41ad65 0x4e6d31 0x4e7a3f 0x474901
#	0x4e6d30	github.com/docker/docker/api/server.(*Server).serveAPI+0x130	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/server.go:102
#	0x4e7a3e	github.com/docker/docker/api/server.(*Server).Wait+0x2e		/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/server.go:196

1 @ 0x4438ea 0x4439de 0x41b09f 0x41ada5 0x13743aa 0x474901
#	0x13743a9	github.com/golang/glog.(*loggingT).flushDaemon+0x79	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/github.com/golang/glog/glog.go:879

1 @ 0x4438ea 0x4439de 0x41b09f 0x41ada5 0x415ab3 0x474901
#	0x415ab2	main.(*DaemonCli).setupConfigReloadTrap.func1+0x52	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/cmd/dockerd/daemon_unix.go:55

1 @ 0x4438ea 0x4439de 0x41b09f 0x41ada5 0x56362d 0x474901
#	0x56362c	github.com/docker/docker/daemon.(*Daemon).execCommandGC+0x8c	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/daemon/exec.go:250

1 @ 0x4438ea 0x4439de 0x41b09f 0x41ada5 0x58ff25 0x474901
#	0x58ff24	github.com/docker/docker/daemon.(*statsCollector).run+0xc4	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/daemon/stats_collector_unix.go:106

1 @ 0x4438ea 0x4439de 0x41b09f 0x41ada5 0x598b73 0x474901
#	0x598b72	github.com/docker/docker/daemon.setupDumpStackTrap.func1+0x52	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/daemon/debugtrap_unix.go:17

1 @ 0x4438ea 0x4439de 0x41b09f 0x41ada5 0x5dbae7 0x474901
#	0x5dbae6	github.com/docker/docker/pkg/signal.Trap.func1+0x76	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/pkg/signal/trap.go:32

1 @ 0x4438ea 0x4439de 0x454272 0x6a8640 0x4b23c2 0x4b2a5c 0x4b1187 0x4b0f98 0x93e3d4 0x8b2bcf 0x474901
#	0x454271	sync.runtime_notifyListWait+0x121						/usr/lib/golang/src/runtime/sema.go:267
#	0x6a863f	sync.(*Cond).Wait+0x7f								/usr/lib/golang/src/sync/cond.go:57
#	0x4b23c1	io.(*pipe).read+0x101								/usr/lib/golang/src/io/pipe.go:47
#	0x4b2a5b	io.(*PipeReader).Read+0x4b							/usr/lib/golang/src/io/pipe.go:129
#	0x4b1186	io.copyBuffer+0x146								/usr/lib/golang/src/io/io.go:390
#	0x4b0f97	io.CopyBuffer+0x87								/usr/lib/golang/src/io/io.go:371
#	0x93e3d3	github.com/docker/docker/pkg/pools.Copy+0xa3					/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/pkg/pools/pools.go:81
#	0x8b2bce	github.com/docker/docker/container/stream.(*Config).CopyToPipe.func2+0xae	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/container/stream/streams.go:139

1 @ 0x4438ea 0x4439de 0x454272 0x6a8640 0x9ac3c6 0x9ad9fd 0x9ab9b2 0x5c0bb9 0x474901
#	0x454271	sync.runtime_notifyListWait+0x121						/usr/lib/golang/src/runtime/sema.go:267
#	0x6a863f	sync.(*Cond).Wait+0x7f								/usr/lib/golang/src/sync/cond.go:57
#	0x9ac3c5	google.golang.org/grpc.(*Conn).WaitForStateChange+0x185				/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/clientconn.go:405
#	0x9ad9fc	google.golang.org/grpc.(*unicastPicker).WaitForStateChange+0x4c			/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/picker.go:96
#	0x9ab9b1	google.golang.org/grpc.(*ClientConn).WaitForStateChange+0x51			/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/clientconn.go:283
#	0x5c0bb8	github.com/docker/docker/libcontainerd.(*remote).handleConnectionChange+0x68	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/libcontainerd/remote_linux.go:139

1 @ 0x4438ea 0x453069 0x451e5c 0x46c2a3 0x474901
#	0x4438e9	runtime.gopark+0x139		/usr/lib/golang/src/runtime/proc.go:259
#	0x453068	runtime.selectgoImpl+0x11d8	/usr/lib/golang/src/runtime/select.go:423
#	0x451e5b	runtime.selectgo+0x1b		/usr/lib/golang/src/runtime/select.go:238
#	0x46c2a2	runtime.ensureSigM.func1+0x2f2	/usr/lib/golang/src/runtime/signal1_unix.go:304

1 @ 0x4438ea 0x453069 0x451e5c 0x8674fb 0x474901
#	0x8674fa	github.com/docker/libnetwork.(*controller).watchLoop+0x14a	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/github.com/docker/libnetwork/store.go:439

1 @ 0x4438ea 0x453069 0x451e5c 0x9ad2c3 0x9b8308 0x474901
#	0x9ad2c2	google.golang.org/grpc.(*Conn).transportMonitor+0x1d2	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/clientconn.go:544
#	0x9b8307	google.golang.org/grpc.NewConn.func1+0x177		/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/clientconn.go:355

1 @ 0x4438ea 0x453069 0x451e5c 0x9b84c6 0x474901
#	0x9b84c5	google.golang.org/grpc.(*Conn).WaitForStateChange.func1+0x1a5	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/clientconn.go:394

1 @ 0x4438ea 0x453069 0x451e5c 0x9b92a1 0x474901
#	0x9b92a0	google.golang.org/grpc.NewClientStream.func1+0x1f0	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/stream.go:151

1 @ 0x4438ea 0x453069 0x451e5c 0xb1ce07 0x474901
#	0xb1ce06	google.golang.org/grpc/transport.(*http2Client).controller+0x596	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/transport/http2_client.go:866

1 @ 0x4438ea 0x453069 0x451e5c 0xb2582d 0xb26076 0x4b0b84 0x4b0ce8 0x9adcd1 0x9ae800 0x9b644f 0x8e7b82 0x5c1eb4 0x474901
#	0xb2582c	google.golang.org/grpc/transport.(*recvBufferReader).Read+0x5ec			/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/transport/transport.go:141
#	0xb26075	google.golang.org/grpc/transport.(*Stream).Read+0x55				/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/transport/transport.go:294
#	0x4b0b83	io.ReadAtLeast+0xa3								/usr/lib/golang/src/io/io.go:307
#	0x4b0ce7	io.ReadFull+0x57								/usr/lib/golang/src/io/io.go:325
#	0x9adcd0	google.golang.org/grpc.(*parser).recvMsg+0x60					/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/rpc_util.go:216
#	0x9ae7ff	google.golang.org/grpc.recv+0x3f						/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/rpc_util.go:300
#	0x9b644e	google.golang.org/grpc.(*clientStream).RecvMsg+0xae				/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/google.golang.org/grpc/stream.go:234
#	0x8e7b81	github.com/docker/containerd/api/grpc/types.(*aPIEventsClient).Recv+0x61	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/github.com/docker/containerd/api/grpc/types/api.pb.go:1032
#	0x5c1eb3	github.com/docker/docker/libcontainerd.(*remote).handleEventStream+0x63		/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/libcontainerd/remote_linux.go:286

1 @ 0x4438ea 0x453069 0x451e5c 0xbf2685 0x474901
#	0xbf2684	github.com/docker/libnetwork/osl.removeUnusedPaths+0x434	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/github.com/docker/libnetwork/osl/namespace_linux.go:88

1 @ 0x474901

1 @ 0x6608d5 0x4bbc9c 0x4b543b 0x4b39bb 0x9bf6dd 0x5c5b0b 0x474901
#	0x6608d4	syscall.Syscall6+0x4								/usr/lib/golang/src/syscall/asm_linux_amd64.s:44
#	0x4bbc9b	os.(*Process).blockUntilWaitable+0xbb						/usr/lib/golang/src/os/wait_waitid.go:28
#	0x4b543a	os.(*Process).wait+0xaa								/usr/lib/golang/src/os/exec_unix.go:22
#	0x4b39ba	os.(*Process).Wait+0x2a								/usr/lib/golang/src/os/doc.go:49
#	0x9bf6dc	os/exec.(*Cmd).Wait+0x6c							/usr/lib/golang/src/os/exec/exec.go:434
#	0x5c5b0a	github.com/docker/docker/libcontainerd.(*remote).runContainerdDaemon.func1+0x2a	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/libcontainerd/remote_linux.go:418

1 @ 0xb7db3f 0xb7d940 0xb7a661 0x7b6126 0x7b6719 0x4e7c7d 0x78a074 0x7b6a95 0x4e6670 0x78be6d 0x7888f7 0x474901
#	0xb7db3e	runtime/pprof.writeRuntimeProfile+0x9e					/usr/lib/golang/src/runtime/pprof/pprof.go:614
#	0xb7d93f	runtime/pprof.writeGoroutine+0x9f					/usr/lib/golang/src/runtime/pprof/pprof.go:576
#	0xb7a660	runtime/pprof.(*Profile).WriteTo+0x340					/usr/lib/golang/src/runtime/pprof/pprof.go:298
#	0x7b6125	net/http/pprof.handler.ServeHTTP+0x1a5					/usr/lib/golang/src/net/http/pprof/pprof.go:209
#	0x7b6718	net/http/pprof.(*handler).ServeHTTP+0x78				<autogenerated>:1
#	0x4e7c7c	net/http.(Handler).ServeHTTP-fm+0x4c					/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/profiler.go:22
#	0x78a073	net/http.HandlerFunc.ServeHTTP+0x43					/usr/lib/golang/src/net/http/server.go:1726
#	0x7b6a94	github.com/gorilla/mux.(*Router).ServeHTTP+0x254			/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/vendor/src/github.com/gorilla/mux/mux.go:98
#	0x4e666f	github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x6f	/builddir/build/BUILD/docker-1398f249013601ab999d286910664d70fd1329a2/_build/src/github.com/docker/docker/api/server/router_swapper.go:29
#	0x78be6c	net/http.serverHandler.ServeHTTP+0x7c					/usr/lib/golang/src/net/http/server.go:2202
#	0x7888f6	net/http.(*conn).serve+0x4b6						/usr/lib/golang/src/net/http/server.go:1579

@runcom
Copy link
Collaborator

runcom commented Aug 21, 2017

Seems like an IO issue, I will look into this.

@agunnerson-ibm
Copy link

@runcom Thanks!

I'm not sure if this is a red herring, but I noticed that the nessus scans we're required to run causes a lot of docker exec failures due to the lack of a shell in the container. Any chance that those failures cause some resources to not be cleaned up properly? It looks like stdout is opened, but not always closed.

[andrew.gunnerson@[REDACTED] ~]$ sudo journalctl -u docker | grep -c 'attach: stdout: begin'
1635
[andrew.gunnerson@[REDACTED] ~]$ sudo journalctl -u docker | grep -c 'attach: stdout: end'
1025

Debug log during exec failure:

Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.734134390-04:00" level=debug msg="attach: stdout: begin"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.734197097-04:00" level=debug msg="attach: stderr: begin"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.834128491-04:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = invalid header field value \"oci runtime error: exec failed: container_linux.go:247: starting container process caused \\\"exec: \\\\\\\"sh\\\\\\\": executable file not found in $PATH\\\"\\n\""
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.834689978-04:00" level=debug msg="Calling GET /v1.24/exec/1532b16f6677f16e9c15df097265b73225021a302755d72ef76b6cd893babd0f/json"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.834982048-04:00" level=debug msg="{Action=json, Username=nessus, LoginUID=5000, PID=6271}"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.841344851-04:00" level=debug msg="Calling GET /v1.24/containers/json?all=1&filters=%7B%22label%22%3A%7B%22io.kubernetes.docker.type%3Dpodsandbox%22%3Atrue%7D%7D&limit=0"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.841477182-04:00" level=debug msg="{Action=json, LoginUID=4294967295, PID=5590}"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.847242876-04:00" level=debug msg="Calling GET /v1.24/containers/json?all=1&filters=%7B%22label%22%3A%7B%22io.kubernetes.docker.type%3Dcontainer%22%3Atrue%7D%7D&limit=0"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.847336750-04:00" level=debug msg="{Action=json, LoginUID=4294967295, PID=5590}"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.974121601-04:00" level=debug msg="Calling POST /v1.24/containers/150405dc696/exec"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.974361439-04:00" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"sh\",\"-c\",\"[[ -f /etc/redhat-release ]] \\u0026\\u0026 echo 'File Exists' || echo 'File does not exist'\"],\"Detach\":false,\"DetachKeys\":\"\",\"Privileged\":false,\"Tty\":false,\"User\":\"\"}"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.974634605-04:00" level=info msg="{Action=exec, Username=nessus, LoginUID=5000, PID=6354}"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.976422187-04:00" level=debug msg="Calling POST /v1.24/exec/8bc3fe7facb6902c1bc3854f5935c47c6a3909a84d80de0c98ae42710a71de44/start"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.976616157-04:00" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"sh\",\"-c\",\"[[ -f /etc/redhat-release ]] \\u0026\\u0026 echo 'File Exists' || echo 'File does not exist'\"],\"Detach\":false,\"DetachKeys\":\"\",\"Privileged\":false,\"Tty\":false,\"User\":\"\"}"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.976835187-04:00" level=info msg="{Action=start, Username=nessus, LoginUID=5000, PID=6354}"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.977096254-04:00" level=debug msg="starting exec command 8bc3fe7facb6902c1bc3854f5935c47c6a3909a84d80de0c98ae42710a71de44 in container 150405dc696ca86a7e186ad9942f793f41df4f4b0e28f54c9ec499c2e045a83b"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.981081409-04:00" level=debug msg="attach: stdout: begin"
Aug 22 04:08:18 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:18.981162378-04:00" level=debug msg="attach: stderr: begin"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.044655361-04:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = invalid header field value \"oci runtime error: exec failed: container_linux.go:247: starting container process caused \\\"exec: \\\\\\\"sh\\\\\\\": executable file not found in $PATH\\\"\\n\""
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.045234336-04:00" level=debug msg="Calling GET /v1.24/exec/8bc3fe7facb6902c1bc3854f5935c47c6a3909a84d80de0c98ae42710a71de44/json"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.045556400-04:00" level=debug msg="{Action=json, Username=nessus, LoginUID=5000, PID=6354}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.170877657-04:00" level=debug msg="Calling POST /v1.24/containers/150405dc696/exec"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.171114660-04:00" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"sh\",\"-c\",\"[[ -f /etc/centos-release ]] \\u0026\\u0026 echo 'File Exists' || echo 'File does not exist'\"],\"Detach\":false,\"DetachKeys\":\"\",\"Privileged\":false,\"Tty\":false,\"User\":\"\"}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.171364614-04:00" level=info msg="{Action=exec, Username=nessus, LoginUID=5000, PID=6411}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.173098693-04:00" level=debug msg="Calling POST /v1.24/exec/fbe3c907d4fbbd2a481d0e5ddf07cedf7fe4b283d3db016b00fb7197eb163274/start"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.173278670-04:00" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"sh\",\"-c\",\"[[ -f /etc/centos-release ]] \\u0026\\u0026 echo 'File Exists' || echo 'File does not exist'\"],\"Detach\":false,\"DetachKeys\":\"\",\"Privileged\":false,\"Tty\":false,\"User\":\"\"}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.173497123-04:00" level=info msg="{Action=start, Username=nessus, LoginUID=5000, PID=6411}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.173710953-04:00" level=debug msg="starting exec command fbe3c907d4fbbd2a481d0e5ddf07cedf7fe4b283d3db016b00fb7197eb163274 in container 150405dc696ca86a7e186ad9942f793f41df4f4b0e28f54c9ec499c2e045a83b"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.174272277-04:00" level=debug msg="attach: stdout: begin"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.174336934-04:00" level=debug msg="attach: stderr: begin"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.224088856-04:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = invalid header field value \"oci runtime error: exec failed: container_linux.go:247: starting container process caused \\\"exec: \\\\\\\"sh\\\\\\\": executable file not found in $PATH\\\"\\n\""
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.225058920-04:00" level=debug msg="Calling GET /v1.24/exec/fbe3c907d4fbbd2a481d0e5ddf07cedf7fe4b283d3db016b00fb7197eb163274/json"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.225216106-04:00" level=debug msg="{Action=json, Username=nessus, LoginUID=5000, PID=6411}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.278991098-04:00" level=debug msg="Calling GET /v1.24/containers/json?filters=%7B%22label%22%3A%7B%22io.kubernetes.docker.type%3Dpodsandbox%22%3Atrue%7D%7D&limit=0"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.279131079-04:00" level=debug msg="{Action=json, LoginUID=4294967295, PID=5590}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.282520875-04:00" level=debug msg="Calling GET /v1.24/containers/json?all=1&filters=%7B%22label%22%3A%7B%22io.kubernetes.docker.type%3Dcontainer%22%3Atrue%7D%2C%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.282593704-04:00" level=debug msg="{Action=json, LoginUID=4294967295, PID=5590}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.288500188-04:00" level=debug msg="Calling GET /v1.24/containers/json?filters=%7B%22label%22%3A%7B%22io.kubernetes.docker.type%3Dpodsandbox%22%3Atrue%7D%7D&limit=0"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.288600297-04:00" level=debug msg="{Action=json, LoginUID=4294967295, PID=5590}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.291599994-04:00" level=debug msg="Calling GET /v1.24/containers/json?all=1&filters=%7B%22label%22%3A%7B%22io.kubernetes.docker.type%3Dcontainer%22%3Atrue%7D%2C%22status%22%3A%7B%22running%22%3Atrue%7D%7D&limit=0"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.291694340-04:00" level=debug msg="{Action=json, LoginUID=4294967295, PID=5590}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.353187091-04:00" level=debug msg="Calling POST /v1.24/containers/150405dc696/exec"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.353404095-04:00" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"sh\",\"-c\",\"[[ -f /etc/fedora-release ]] \\u0026\\u0026 echo 'File Exists' || echo 'File does not exist'\"],\"Detach\":false,\"DetachKeys\":\"\",\"Privileged\":false,\"Tty\":false,\"User\":\"\"}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.353570777-04:00" level=info msg="{Action=exec, Username=nessus, LoginUID=5000, PID=6486}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.355214330-04:00" level=debug msg="Calling POST /v1.24/exec/106753bd9fff5159226674260d538f61ef9c4148971bc235e8ff6d428ef27f71/start"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.355346076-04:00" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"sh\",\"-c\",\"[[ -f /etc/fedora-release ]] \\u0026\\u0026 echo 'File Exists' || echo 'File does not exist'\"],\"Detach\":false,\"DetachKeys\":\"\",\"Privileged\":false,\"Tty\":false,\"User\":\"\"}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.355482804-04:00" level=info msg="{Action=start, Username=nessus, LoginUID=5000, PID=6486}"
Aug 22 04:08:19 [REDACTED] dockerd-current[4782]: time="2017-08-22T04:08:19.355686510-04:00" level=debug msg="starting exec command 106753bd9fff5159226674260d538f61ef9c4148971bc235e8ff6d428ef27f71 in container 150405dc696ca86a7e186ad9942f793f41df4f4b0e28f54c9ec499c2e045a83b"

@runcom
Copy link
Collaborator

runcom commented Aug 22, 2017

It could be very likely an IO leak from your logs, I'm looking into this right now.

@fortinj66
Copy link
Author

I am seeing the same:

[root@os-node-d04 log]# journalctl -u docker | grep -c 'attach: stdout: begin'
147613
[root@os-node-d04 log]# journalctl -u docker | grep -c 'attach: stdout: end'
60692

@agunnerson-ibm
Copy link

After further testing, it looks like the streams are closed properly when running:

docker exec -i <container ID> non_existent_command

but the streams are not closed properly when running:

docker exec <container ID> non_existent_command

@fortinj66
Copy link
Author

are you seeing an uptick in the docker goroutines when you run without the -i (the bad method)?

@agunnerson-ibm
Copy link

agunnerson-ibm commented Aug 22, 2017

Ah yep, I didn't catch that before. 4 goroutines are being created with or without -i. EDIT: And they never exit

EDIT 2: I can't reproduce the issue with upstream 17.05. I'll test upstream 1.12.6 now. If the issue happens there, I'll do a git bisect to find out which commit fixed the problem.

@agunnerson-ibm
Copy link

@fortinj66 @runcom I found the commit that fixed the leak in upstream docker after doing a git bisect. It looks like this is the fix:

Issue: moby#30311
Commit: moby@3cc0d6b

Could we get that backported to 1.12.6?

@runcom
Copy link
Collaborator

runcom commented Aug 22, 2017

Yeah thanks for spotting that, I'll try to backport here and comment here once done.

@fortinj66
Copy link
Author

That is great news!

@agunnerson-ibm
Copy link

I just built new RPMs, pushed them out to one of our dev kubernetes clusters, and things are looking good! I will let it run overnight and see how it goes.

[andrew.gunnerson@[REDACTED] ~]$ sudo docker info 2>/dev/null | grep Goroutines
 Goroutines: 47
[andrew.gunnerson@[REDACTED] ~]$ sudo docker exec -it b398be892854 foobar
rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"foobar\\\": executable file not found in $PATH\"\n"
[andrew.gunnerson@[REDACTED] ~]$ sudo docker exec -it b398be892854 foobar
rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"foobar\\\": executable file not found in $PATH\"\n"
[andrew.gunnerson@[REDACTED] ~]$ sudo docker exec -it b398be892854 foobar
rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"exec: \\\"foobar\\\": executable file not found in $PATH\"\n"
[andrew.gunnerson@[REDACTED] ~]$ sudo docker info 2>/dev/null | grep Goroutines
 Goroutines: 47

runcom pushed a commit that referenced this issue Aug 22, 2017
Ref: #257

Signed-off-by: Dmitry Shyshkin <dmitry@shyshkin.org.ua>
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
@runcom
Copy link
Collaborator

runcom commented Aug 22, 2017

Cherry-picked upstream fix to docker-1.12.6, docker-1.13.1, docker-1.13.1-rhel - will keep this open until anyone can confirm it really fixes this issue :) Thanks a lot guys

/cc @jwhonce

@fortinj66
Copy link
Author

fortinj66 commented Aug 22, 2017

I'll give this a build tomorrow morning and give it a try...

Actually, I built them tonight and deployed to my dev Openshift 1.5 cluster. So far looks good but I'll know better tomorrow.

@rhatdan
Copy link
Member

rhatdan commented Aug 23, 2017

Greate work everyone on finding this.

@fortinj66
Copy link
Author

I can happily report that docker memory usage so far looks very good. Where normally I would see dockerd at 4-5 Gig, it is now steady at just under 200 Meg. Huge difference! The goroutines count is under 200 where before it was over 150,000.

I believe that this issue has been resolved !!

I have pushed this to a busier set of servers and will monitor during the day.

@agunnerson-ibm Awesone job tracking this down!!

@agunnerson-ibm
Copy link

agunnerson-ibm commented Aug 23, 2017

@agunnerson-ibm Awesone job tracking this down!!

Thanks!

I can also confirm that it's fixed on our cluster! Only approximately ~50 goroutines on each node in an idle kubernetes cluster after the nessus scans run. It used to go above 200,000. Memory usage has been below 150MB too.

@fortinj66
Copy link
Author

I'll keep this open until the official Centos7 RPMs are available and test again with those.

--John

@fortinj66
Copy link
Author

After several days of testing with the new code I have not seen any sign of the memory leak rearing its ugly head....

So I guess all we are going to be waiting for is the new RPMs generated for Centos and RHEL...

@runcom @lsm5

@agunnerson-ibm
Copy link

Agreed. We pushed it out to all of our kubernetes clusters and haven't run into this issue again.

@runcom
Copy link
Collaborator

runcom commented Aug 24, 2017

Awesome, let's keep this open until we build rpms

@xThomo
Copy link

xThomo commented Nov 10, 2017

@runcom if i've checked the correct branch and commits, this fix is now available in docker-1.12.6-61.git85d7426.el7.centos.x86_64.rpm has anyone used this on 7.3 yet?

@rhatdan
Copy link
Member

rhatdan commented Apr 3, 2018

@gogeof this is a random issue and I believe has nothing to do with the current issue. You should open a different issue.

@gogeof
Copy link

gogeof commented Apr 4, 2018

@rhatdan yeah, ok I will open a different issue if the problem should again in new docker version. Thanks!

@honeyankit
Copy link

honeyankit commented Jul 26, 2018

Hi Everyone,

I am facing this issue "dockerd leaks ExecIds on failed exec -i ". Whenever I use wrong command with docker exec -ti < wrong command> the ExecIDs leakage happens. I am using docker version 1.12.6 and I am able to reproduce this issue.

Steps to reproduce:
$ docker exec -ti fdd1b3133760 dsdss

OS - CoreOS v: 1632.3.0
My docker details:
$ docker --version
Docker version 1.12.6, build d9ad3fc

$ docker info
Containers: 6
Running: 6
Paused: 0
Stopped: 0
Images: 11
Server Version: 1.12.6
Storage Driver: overlay
Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host bridge null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp selinux
Kernel Version: 4.14.19-coreos
Operating System: Container Linux by CoreOS 1632.3.0 (Ladybug)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 125.8 GiB
Name: node-216-31.rack216.omaha
ID: TRSC:QML2:RLQX:NH4E:PUL2:FHSX:ZN4W:5NPU:POVT:BJJM:BI3S:DU4P
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
0.0.0.0/0
127.0.0.0/8

@rhatdan
Copy link
Member

rhatdan commented Jul 26, 2018

We are not updating docker-1.12 any longer, does this issue still exist in docker-1.13?

@honeyankit
Copy link

honeyankit commented Jul 26, 2018

I understand that but the previous comments suggests that this issue is fixed in docker 1.12.6.
All my production DC are having docker version 1.12.6 and it will take time for us to move it to another docker version because we are using CoreOS as our OS and to upgrade the docker, we need to upgrade the CoreOS altogether.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants