Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very long Docker run commands cause Docker to hang indefinitely #37525

Open
idiamond-stripe opened this issue Jul 25, 2018 · 4 comments
Open

Comments

@idiamond-stripe
Copy link

idiamond-stripe commented Jul 25, 2018

Description

Docker hangs when a very long run command is provided. It's possible that this is a result of a deadlock due to some STDERR buffer overflowing, and Docker never receiving the OCI runtime create failed output it expects from the underlying OCI driver. This also prevents the daemon from being restarted properly.

cc: @jmillikin-stripe

Steps to reproduce the issue:

  1. Run the following command https://gist.github.com/idiamond-stripe/3687fd43e53cbca28dd78012c679c813

Generated by:

with open('docker-hang.sh', 'w') as fp: 
    fp.write('docker run -d alpine /')
    fp.write('a' * 33000)

Describe the results you received:
Docker hangs forever and the container remains in the CREATED state forever. Trying to docker inspect, docker start, or docker rm the created container hangs forever.

Describe the results you expected:
Docker exits with the error message file name too long": unknown.. or a similar error message identifying the length as being too long. No container should be created.

Additional information you deem important (e.g. issue happens only occasionally):
I've confirmed this happens in 18.06.1-ce of Docker for Mac and 18.03.1-ce of Docker for Linux.

We suspect this is a deadlock related to a buffer overflow where something is overflowing a buffer that is set to 32kb.

Output of docker version:

Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:13:02 2018
 OS/Arch:      darwin/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:22:38 2018
  OS/Arch:      linux/amd64
  Experimental: true

Output of docker info:

Containers: 92
 Running: 0
 Paused: 0
 Stopped: 92
Images: 56
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.87-linuxkit-aufs
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.952GiB
Name: linuxkit-025000000001
ID: HKWI:CTOH:ZA55:ZZYY:XT7E:56TA:35BY:H4XU:HDRM:BQ2Z:ERAH:BGIJ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 21
 Goroutines: 39
 System Time: 2018-07-25T23:11:48.755000528Z
 EventsListeners: 2
HTTP Proxy: docker.for.mac.http.internal:3128
HTTPS Proxy: docker.for.mac.http.internal:3129
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

@thaJeztah
Copy link
Member

wondering if this is specific to docker for mac (which uses a proxy to forward the local docker socket to the daemon running inside the LinuxKit vmware)

Does this also reproduce if you run this from within a container that connects to the docker daemon through the socket (that way, not going through the proxy);

The following command gives you an interactive container with the docker client inside it, connecting to the daemon inside the LinuxKit VM

docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock docker:18.06 sh

@idiamond-stripe
Copy link
Author

idiamond-stripe commented Jul 25, 2018

I was able to reproduce this on a Linux host:

docker version:

Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:20 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:15:30 2018
  OS/Arch:      linux/amd64
  Experimental: false

docker info:

sudo docker info
Containers: 146
 Running: 7
 Paused: 0
 Stopped: 139
Images: 12
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-1063-aws
Operating System: Ubuntu 16.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 59.97GiB
Name: <redacted>
ID: ZJNS:E7YF:QYF3:2XFU:FRBL:UNSH:URKP:MLAW:33TZ:OETM:SGAV:IZVG
Docker Root Dir: /<redacted>/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: true

WARNING: No swap limit support

@idiamond-stripe
Copy link
Author

idiamond-stripe commented Jul 26, 2018

I'm able to reproduce this behavior at HEAD of master from inside a development container. Here's the debug logging after running the script I've linked above:

root@197e211a590a:/go/src/github.com/docker/docker# docker version
DEBU[2018-07-26T05:44:05.143726887Z] Calling GET /_ping
DEBU[2018-07-26T05:44:05.144483078Z] Calling GET /v1.30/version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:15:15 2017
 OS/Arch:      linux/amd64

Server:
 Version:      dev
 API version:  1.38 (minimum version 1.12)
 Go version:   go1.10.3
 Git commit:   8239526f1
 Built:        Thu Jul 26 05:43:44 2018
 OS/Arch:      linux/amd64
 Experimental: false

root@197e211a590a:/go/src/github.com/docker/docker# ./docker-hang.sh
DEBU[2018-07-26T05:45:52.392678664Z] Calling GET /_ping
DEBU[2018-07-26T05:45:52.394589949Z] Calling POST /v1.30/containers/create
Unable to find image 'alpine:latest' locally
DEBU[2018-07-26T05:45:52.398363366Z] Calling GET /v1.30/info
DEBU[2018-07-26T05:45:52.407024765Z] Calling POST /v1.30/images/create?fromImage=alpine&tag=latest
DEBU[2018-07-26T05:45:52.407150197Z] Trying to pull alpine from https://registry-1.docker.io v2
DEBU[2018-07-26T05:45:53.533400548Z] Pulling ref from V2 registry: alpine:latest
DEBU[2018-07-26T05:45:53.533448874Z] docker.io/library/alpine:latest resolved to a manifestList object with 6 entries; looking for a unknown/amd64 match
DEBU[2018-07-26T05:45:53.533462396Z] found match for linux/amd64 with media type application/vnd.docker.distribution.manifest.v2+json, digest sha256:0873c923e00e0fd2ba78041bfb64a105e1ecb7678916d1f7776311e45bf5634b
latest: Pulling from library/alpine
DEBU[2018-07-26T05:45:53.933232573Z] pulling blob "sha256:8e3ba11ec2a2b39ab372c60c16b421536e50e5ce64a0bc81765c2e38381bcff6"
8e3ba11ec2a2: Downloading [>                                                  ]  22.23kB/2.207MB
8e3ba11ec2a2: Extracting [==================================================>]  2.207MB/2.207MB
8e3ba11ec2a2: Pull complete
Digest: sha256:7043076348bf5040220df6ad703798fd8593a0918d06d3ce30c6c93be117e430
Status: Downloaded newer image for alpine:latest
DEBU[2018-07-26T05:45:54.572021843Z] Calling POST /v1.30/containers/create
WARN[2018-07-26T05:45:54.577563826Z] Couldn't run auplink before unmount /var/lib/docker/aufs/mnt/a64bf2a71b81730134f5f0d8b9276ab62c83eaca6702e846440db917629320d0-init  error="exit status 25" storage-driver=aufs
DEBU[2018-07-26T05:45:54.636820787Z] container mounted via layerStore: &{/var/lib/docker/aufs/mnt/a64bf2a71b81730134f5f0d8b9276ab62c83eaca6702e846440db917629320d0 0x32263a0 0x32263a0}
WARN[2018-07-26T05:45:54.637828456Z] Couldn't run auplink before unmount /var/lib/docker/aufs/mnt/a64bf2a71b81730134f5f0d8b9276ab62c83eaca6702e846440db917629320d0  error="exit status 25" storage-driver=aufs
faba3d6e55743f5bf31b5f380bddd121fa4d984d1699758259fb6a03a6efa073
DEBU[2018-07-26T05:45:54.657919727Z] Calling POST /v1.30/containers/faba3d6e55743f5bf31b5f380bddd121fa4d984d1699758259fb6a03a6efa073/wait?condition=next-exit
DEBU[2018-07-26T05:45:54.658380950Z] Calling POST /v1.30/containers/faba3d6e55743f5bf31b5f380bddd121fa4d984d1699758259fb6a03a6efa073/start
DEBU[2018-07-26T05:45:54.658951836Z] container mounted via layerStore: &{/var/lib/docker/aufs/mnt/a64bf2a71b81730134f5f0d8b9276ab62c83eaca6702e846440db917629320d0 0x32263a0 0x32263a0}
DEBU[2018-07-26T05:45:54.659223840Z] Assigning addresses for endpoint keen_franklin's interface on network bridge
DEBU[2018-07-26T05:45:54.659235232Z] RequestAddress(LocalDefault/172.18.0.0/16, <nil>, map[])
DEBU[2018-07-26T05:45:54.659251415Z] Request address PoolID:172.18.0.0/16 App: ipam/default/data, ID: LocalDefault/172.18.0.0/16, DBIndex: 0x0, Bits: 65536, Unselected: 65533, Sequence: (0xc0000000, 1)->(0x0, 2046)->(0x1, 1)->end Curr:0 Serial:false PrefAddress:<nil>
DEBU[2018-07-26T05:45:54.664983472Z] Assigning addresses for endpoint keen_franklin's interface on network bridge
DEBU[2018-07-26T05:45:54.670761331Z] Programming external connectivity on endpoint keen_franklin (8163691f8eb43ee37ef38aef0d7d62113b738d72f8f437399be6f4af68ea0d2a)
DEBU[2018-07-26T05:45:54.671918228Z] EnableService faba3d6e55743f5bf31b5f380bddd121fa4d984d1699758259fb6a03a6efa073 START
DEBU[2018-07-26T05:45:54.671938705Z] EnableService faba3d6e55743f5bf31b5f380bddd121fa4d984d1699758259fb6a03a6efa073 DONE
DEBU[2018-07-26T05:45:54.675232330Z] bundle dir created                            bundle=/var/run/docker/containerd/faba3d6e55743f5bf31b5f380bddd121fa4d984d1699758259fb6a03a6efa073 module=libcontainerd namespace=moby root=/var/lib/docker/aufs/mnt/a64bf2a71b81730134f5f0d8b9276ab62c83eaca6702e846440db917629320d0
DEBU[0010] event published                               ns=moby topic="/containers/create" type=containerd.events.ContainerCreate
INFO[0010] shim docker-containerd-shim started           address="/containerd-shim/moby/faba3d6e55743f5bf31b5f380bddd121fa4d984d1699758259fb6a03a6efa073/shim.sock" debug=true pid=2813
DEBU[0000] registering ttrpc server
DEBU[0000] serving api on unix socket                    socket="[inherited from parent]"
DEBU[2018-07-26T05:45:54.769672375Z] sandbox set key processing took 44.519981ms for container faba3d6e55743f5bf31b5f380bddd121fa4d984d1699758259fb6a03a6efa073

The command hangs here. I ran the following in another shell:

root@197e211a590a:/go/src/github.com/docker/docker# docker ps -a
DEBU[2018-07-26T05:46:48.100565942Z] Calling GET /_ping
DEBU[2018-07-26T05:46:48.100968552Z] Calling GET /v1.30/containers/json?all=1
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
faba3d6e5574        alpine              "/aaaaaaaaaaaaaaaa..."   54 seconds ago      Created                                 keen_franklin

@thaJeztah
Copy link
Member

ping @kolyshkin @mlaventure PTAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants