Cannot start container: exit status 4 #12547

Closed
dnephin opened this issue Apr 20, 2015 · 11 comments
Labels
area/networking kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. version/1.5

Comments

@dnephin
Member

dnephin commented Apr 20, 2015

We have a build system which runs across many hosts, and runs 20+ concurrent builds per host. Each build will start/stop a few docker containers.

We're consistently hitting this error when trying to start containers; it happens on roughly 5% of the builds. We tried adding a retry to the "start container" API call (5 tries with exponential backoff up to 15s), but that didn't resolve the issue.

Earlier related issues: #8912, #6010

docker version

Go version (client): go1.4.1
Git commit (client): 5f78bd5
OS/Arch (client): linux/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.4.1
Git commit (server): 5f78bd5

docker info (from one random host, they should all be pretty similar)

Containers: 0
Images: 200
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 512
Execution Driver: native-0.2
Kernel Version: 3.13.0-43-generic
Operating System: <unknown> (containerized)
CPUs: 32
Total Memory: 240.2 GiB

client request log

"GET /run/docker.sock/v1.14/containers/json?all=1&limit=-1&trunc_cmd=1&size=0 HTTP/1.1" 200 None
"POST /run/docker.sock/v1.14/containers/create?name=yelpmainsandboxsb91_sessionsservice_1 ...
"GET /run/docker.sock/v1.14/containers/f358242a34a9fc89a70ba588203b938418322deb5d086f50479b8653...
"GET /run/docker.sock/v1.14/containers/json?all=0&limit=-1&trunc_cmd=1&size=0 HTTP/1.1" 200 None
"GET /run/docker.sock/v1.14/containers/json?all=1&limit=-1&trunc_cmd=1&size=0 HTTP/1.1" 200 None
"POST /run/docker.sock/v1.14/containers/f358242a34a9fc89a70ba588203b938418322deb5d086f50479b86531a1349d3/start HTTP/1.1" 500 106
...

APIError: 500 Server Error: Internal Server Error ("Cannot start container f358242a34a9fc89a70ba588203b938418322deb5d086f50479b86531a1349d3:  (exit status 4)")

dockerd log

+job log(create, a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16, container_name:6e4f1ea)
-job log(create, a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16, container_name:6e4f1ea) = OK (0)
+job container_inspect(a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16) 
[info] GET /v1.14/containers/a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16/json
-job container_inspect(a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16) = OK (0)
[info] POST /v1.14/containers/a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16/start
+job start(a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16)
+job allocate_interface(a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16)
-job allocate_interface(a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16) = OK (0)
+job release_interface(a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16)
-job release_interface(a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16) = OK (0)
+job log(die, a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16, container_name:6e4f1ea)
Cannot start container a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16:  (exit status 4)
-job log(die, a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16, container_name:6e4f1ea) = OK (0)
-job start(a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16) = ERR (1)
[error] server.go:1207 Handler for POST /containers/{name:.*}/start returned error: Cannot start container a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16:  (exit status 4)
[error] server.go:110 HTTP Error: statusCode=500 Cannot start container a865857adf0b126a534124fd83f0cd6a36f7ab4253b9947d63cfe6b087449a16:  (exit status 4)
@dnephin
Member Author

dnephin commented Apr 20, 2015

None of the containers have ports bound to the host, but they all expose at least one port. Starting from the dockerd log I traced through the code, and as far as I can tell, this exit status comes from an iptables command.

@dnephin
Member Author

dnephin commented Apr 28, 2015

+/system/networking

I believe this is related to the problem described in #10218 (comment)

http://patchwork.ozlabs.org/patch/287955/ suggests that "exit status 4" from iptables is related to a concurrency issue (exactly what we're seeing here) and might be recoverable. I'll try to dig in further.
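If the failure really is a transient lock clash, a retry that looks specifically at exit status 4 might work better than blindly retrying the API call. A rough sketch (the wrapper name and retry parameters are mine, not anything Docker provides):

```shell
# Retry a command while it exits with status 4 (the xtables
# "resource temporarily unavailable" code), with exponential backoff.
retry_on_xtables_busy() {
    attempt=1
    max_attempts=5
    delay=1
    while true; do
        "$@"
        rc=$?
        # Give up on success, on any other error, or after max tries.
        if [ "$rc" -ne 4 ] || [ "$attempt" -ge "$max_attempts" ]; then
            return "$rc"
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# e.g.: retry_on_xtables_busy docker start "$container_id"
```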

@thaJeztah thaJeztah added /system/networking kind/bug labels Apr 28, 2015
@cebe

cebe commented Apr 29, 2015

I am experiencing the same issue on a testing server where I run several tests in parallel, i.e. starting many Docker containers simultaneously.

docker run --rm=true -v /var/lib/jenkins/jobs/dev-master:/opt/test --link cd2d1333d290f2a27228189ec52cda6ce153c54bec3e6c57d5b2a67b8cec4418:postgres yiitest/php:master phpunit --verbose --color

results in

time="2015-04-29T21:12:31+02:00" level=fatal msg="Error response from daemon: Cannot start container 1f96dfd03eb2e7cf102c382ca75acdfb54a5a320b6289594327cf2b0ab5003b0:  (exit status 4)" 

Here is a gist of syslog from that time. There is quite a lot going on, but I don't think any of it is related, because I see the same logging even when all containers run fine: https://gist.github.com/cebe/73190a21cacd37a92581
Here is docker.log: https://gist.github.com/cebe/a31059d8023fa2e1e6a4#file-docker-log-L18-L19
In line 7 there is also something failing with the link.

docker version

Client version: 1.6.0
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): 4749651
OS/Arch (client): linux/amd64
Server version: 1.6.0
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): 4749651
OS/Arch (server): linux/amd64

Let me know if you need further input.

@dnephin
Member Author

dnephin commented May 4, 2015

I was able to work around this issue for now by removing all ports and EXPOSE instructions from the Dockerfiles and "docker run" calls. I'm still able to use docker links (the ports are known ahead of time), and I can always communicate with a container from the host by looking up its IP address with docker inspect.
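For reference, the inspect-based lookup can be wrapped like this (the container name and port in the usage line are hypothetical):

```shell
# Look up a container's bridge-network IP via "docker inspect"
# instead of publishing ports to the host.
container_ip() {
    docker inspect -f '{{ .NetworkSettings.IPAddress }}' "$1"
}

# e.g.: curl "http://$(container_ip sessionsservice):8080/"
```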

@LK4D4
Contributor

LK4D4 commented May 4, 2015

@dnephin So, do you use just iptables without other firewalls?

@dnephin
Member Author

dnephin commented May 4, 2015

@LK4D4 I'm not quite sure what you mean. These containers are being used as part of a test suite, so they aren't really being exposed to anything public. We aren't doing anything except for what docker is doing to setup the networking. The hosts shouldn't be running any other firewall, just iptables.

@LK4D4
Contributor

LK4D4 commented May 4, 2015

@dnephin Yes, that's what I was asking. Thanks!

@cebe

cebe commented May 4, 2015

@dnephin thanks for the hint, I was not aware that this was possible without EXPOSE.

@adyatlov

I've been able to reproduce the issue on Docker 1.11 by running the following script in the background:

#!/bin/bash
# Hammer the xtables lock: insert and immediately delete a rule in a
# tight loop, so that concurrent iptables invocations (e.g. from
# dockerd setting up port forwarding) contend for the lock.
while :
do
    iptables -w -I INPUT -p tcp --dport 12345 --syn -j DROP
    iptables -w -D INPUT -p tcp --dport 12345 --syn -j DROP
done

and the docker run -it -p 1111:1111 ubuntu /bin/bash command.

But I wasn't able to reproduce it with 1.12 and 1.13. Does this mean that the issue was addressed in those versions?

@bboreham
Contributor

bboreham commented Jun 5, 2017

As of iptables version 1.6.2, -w is implemented as a file lock on /run/xtables.lock. If your container does not mount that file, you are locking a file that is private to your container, so when another process on the host manipulates the tables at the same time, yours hits the clash and returns exit code 4.

Put another way: -w inside a container does not work unless you mount /run/xtables.lock.

More points to consider:

  • if /run/xtables.lock does not exist when Docker processes the -v option, it will create a directory at that path, which is bad news
  • if you mount /run you now have mounts within mounts, which can tickle bugs elsewhere
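Putting that together, running iptables -w inside a container only serializes correctly against the host if the host's lock file is bind-mounted in. A sketch, with an illustrative image name; the wrapper itself is not anything Docker provides:

```shell
# Run a containerized iptables -w against the *host's* xtables lock,
# so -w actually serializes with host-side iptables invocations.
run_with_host_xtables_lock() {
    # Make sure the lock exists as a regular file first; if the path
    # were missing, the -v bind mount would create it as a directory.
    touch /run/xtables.lock
    docker run --rm --cap-add NET_ADMIN \
        -v /run/xtables.lock:/run/xtables.lock \
        "$@"
}

# e.g.: run_with_host_xtables_lock ubuntu iptables -w -L
```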

mterrel added a commit to unboundedsystems/adapt that referenced this issue Nov 21, 2019
Test fails intermittently because dind container fails to come up.
The issue appears to be a docker problem where it tries to modify
iptables rules and gets error code 4, similar to this bug:
moby/moby#12547
Fix is to retry starting the container using restartPolicy.
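For reference, that kind of daemon-side retry can be requested directly on the CLI with a restart policy; the wrapper name and retry count below are illustrative:

```shell
# Let dockerd itself retry: with on-failure, a container that dies
# with a non-zero exit (such as the iptables race above) is
# re-started automatically, here up to 5 times.
start_with_restart_policy() {
    docker run -d --restart on-failure:5 "$@"
}

# e.g.: start_with_restart_policy --name dind --privileged docker:dind
```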
@thaJeztah
Member

Let me close this ticket for now, as it looks like it went stale.

@thaJeztah thaJeztah closed this as not planned Sep 18, 2023