Starting container fails with 'System error: read parent: connection reset by peer' #14203

Closed
meeee opened this Issue Jun 26, 2015 · 90 comments

meeee commented Jun 26, 2015

On our CI server, we run tests in Docker containers using docker-compose. We link 2-15 containers during one run. We ensure that test jobs running concurrently have different docker-compose project names.

Since we upgraded Docker to 1.7.0 (from 1.6) and docker-compose to 1.3.1 (from 1.2), and started killing containers instead of stopping them (it is faster, and we remove them anyway), containers have twice failed to start with the following message from compose:

Creating x_db_1...
Creating x_1...
Cannot start container 10bbc5af8ec0d3bb39b207a6474ec70a0954bff01ff94389684a8b9f52df6067: [8] System error: read parent: connection reset by peer

/var/log/docker.log contains the following:

time="2015-06-25T10:29:44.322521665+02:00" level=info msg="POST /v1.18/containers/19aec1ddb8a5cd771771f16a1f8929bb58eea2cf7e877425a7812f6c6e5756a2/start" 
time="2015-06-25T10:29:44.690044235+02:00" level=warning msg="signal: killed" 
time="2015-06-25T10:29:44.915997839+02:00" level=error msg="Handler for POST /containers/{name:.*}/start returned error: Cannot start container 19aec1ddb8a5cd771771f16a1f8929bb58eea2cf7e877425a7812f6c6e5756a2: [8] System error: read parent: connection reset by peer" 
time="2015-06-25T10:29:44.916111471+02:00" level=error msg="HTTP Error" err="Cannot start container 19aec1ddb8a5cd771771f16a1f8929bb58eea2cf7e877425a7812f6c6e5756a2: [8] System error: read parent: connection reset by peer" statusCode=500 

The container is created, but doesn't start. Trying to manually start it using docker start fails with the same error message. Memory is available and the kernel log doesn't show any message from the OOM killer.

Restarting docker temporarily solves the problem, so I assume this is a problem with docker itself, not with docker-compose.
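
To see how often a daemon hits this, the failing start requests can be pulled out of the daemon log. The following is a minimal sketch in Python, assuming the log-line format shown in the excerpt above; the function name `failed_starts` is illustrative and not part of Docker, and the bracketed error code is matched loosely because it may differ between versions.

```python
import re
from collections import Counter

# Matches the error text quoted in the log excerpt. The numeric code in
# brackets is left open because it is not known to be stable across versions.
ERROR_RE = re.compile(
    r"Cannot start container (?P<id>[0-9a-f]+): "
    r"\[\d+\] System error: read parent: connection reset by peer"
)


def failed_starts(lines):
    """Count log lines per container ID that report the 'read parent' error."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group("id")] += 1
    return counts
```

Passing an open file object for /var/log/docker.log as `lines` works, since files iterate line by line. In the excerpt each failed start is logged on two lines, so the counts are matching log lines rather than distinct failures.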

docker version:

Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): linux/amd64
Server version: 1.7.0
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 0baf609
OS/Arch (server): linux/amd64

docker info:

Containers: 30
Images: 451
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 517
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.0-0.bpo.4-amd64
Operating System: Debian GNU/Linux 7 (wheezy)
CPUs: 4
Total Memory: 15.52 GiB
Name: <hostname>
ID: HYYT:WNZW:UPU7:VI2O:HUTP:EZVV:2MQ2:WCRJ:3SHJ:LZXF:MVLS:P3XC
WARNING: No memory limit support
WARNING: No swap limit support

uname -a:

<hostname> 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1~bpo70+1 (2015-04-27) x86_64 GNU/Linux

Environment details (AWS, VirtualBox, physical, etc.): Physical machine

Steps to Reproduce:

  1. docker-compose run ..., kill and rm 50-200 (estimated) containers with links using docker-compose
  2. At some point, docker fails to start a container.

Actual Results: Starting the container fails.

Expected Results: Container starts.

Additional info:

Kubernetes seemed to have a similar problem, apparently some sort of race. That issue also links to a few occurrences where people had a similar problem (mainly on IRC).

meeee changed the title from Starting container fails with `System error: read parent: connection reset by peer` to Starting container fails with 'System error: read parent: connection reset by peer' Jun 26, 2015

eremite added a commit to eremite/docker_rails_app that referenced this issue Jun 26, 2015

clean: Stop instead of kill running docker containers.
In the hopes that it'll prevent "System error: read parent: connection
reset by peer" errors. See moby/moby#14203

cjcullen commented Jul 6, 2015

I've included my repro instructions here: kubernetes/kubernetes#9822 (comment)

Unfortunately, I don't know how to get docker into the magic state where this is reproducible.

Contributor

dchen1107 commented Jul 6, 2015

Please note that the issue we found with Kubernetes is with the Docker 1.6.2 release.

shapiroj commented Jul 14, 2015

We see this exact error in several of our containers. Our only workaround is to rename the container. Interested in hearing other workarounds or solutions.

njuicsgz commented Jul 17, 2015

Any advice for this bug? I have been hitting it for a long time with Docker v1.6.2 in our production environment.

Contributor

cpuguy83 commented Jul 17, 2015

Seems like an error connecting to sqlite, or rather while reading it.

jjelev commented Aug 4, 2015

Docker 1.7.1 and Ubuntu 14.04. Updated my environment through apt-get and restarted. Nginx container no longer failed to start.

meeee commented Aug 4, 2015

@jjelev Yep, as stated in the original description, restarting Docker temporarily fixes the problem. Unfortunately, the problem reappears later.

airhorns commented Aug 4, 2015

We're seeing this too and will try to dig up some more information. We're simultaneously executing many instances of the same container, which might have something to do with it. Most executions work fine, but this has just started happening.

airhorns commented Aug 4, 2015

Wow, as @cjcullen has seen, the length of the docker run command seems to have something to do with this. I changed the length of one of the -e vars and my deterministic container launch failure went away.

adeslade commented Aug 5, 2015

I've been having the same issue. Added an extra space to the failing command and it worked. So odd.

rflynn commented Aug 5, 2015

We've hit this bug twice this morning after never seeing it in months of Docker use. Re: string length, we have made tweaks to our command recently.

meeee commented Aug 11, 2015

I didn't expect the workaround to work, but after adding a few spaces in an environment variable, we now had a full week without the error occurring. Previously, it occurred almost daily.

We added this environment variable to all our compose containers:

DOCKER_FIX: '                                        '
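
For containers launched with plain docker run rather than compose, the same padding can be passed with -e. The following is a minimal sketch in Python, assuming containers are started from a script; the helper name `padded_run_args` and the default padding length are illustrative assumptions. The thread only shows empirically that changing the command length made the failure go away on affected versions, so treat this as a workaround and not a fix.

```python
def padded_run_args(image, command=(), pad_len=40, var_name="DOCKER_FIX"):
    """Build a `docker run` argv that carries a padding environment variable."""
    if pad_len < 0:
        raise ValueError("pad_len must be non-negative")
    # The value is only whitespace; its sole purpose is to change the
    # overall length of the container's start-up configuration.
    padding = " " * pad_len
    return ["docker", "run", "-e", f"{var_name}={padding}", image, *command]
```

The returned list can be handed to `subprocess.run(..., check=True)`. Building an argv list instead of a shell string avoids quoting problems with the whitespace-only value.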

andrecp commented Aug 20, 2015

I am having the same problem, with the same Docker version as the OP.

andrecp commented Aug 23, 2015

I also had to add

        - DOCKER_FIX='                                        '

to all my dockerbuild files for this error to go away...

chrisjhoughton commented Aug 23, 2015

+1 for the weird DOCKER_FIX!

airhorns commented Sep 16, 2015

@burke or @sirupsen do you guys have any ideas on this one?

Contributor

burke commented Sep 16, 2015

Nope, this is weird, I can't imagine what it would be.

AlbertodelaCruz commented Sep 24, 2015

+1 adding a FOO env variable. Docker version 1.8.2 and docker-compose 1.4.0.
Curiously, not all hosts suffer this behaviour and need the variable.

alapidas added a commit to zenoss/zenoss-service that referenced this issue Oct 1, 2015

alapidas added a commit to control-center/serviced that referenced this issue Oct 1, 2015

rachit1arora commented Mar 24, 2016

Hello, I have also encountered this problem in Docker version 1.9.1.
We are encountering it in our production environment when we run
docker exec, not during docker run.

I know that the fix is available in Docker 1.10, but is there a way to get the fix in Docker 1.9.1? We may not be able to migrate to Docker 1.10 soon.

Is there a workaround we can try with the docker exec command? Many people reported that docker run -e DOCKER_FIX='' worked for them.
How can we resolve this for the docker exec command?

extemporalgenome commented Apr 3, 2016

You could try passing in a longer command line, for example:

docker exec the-container sh -c "intended-command args # extra padding junk in comment"
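
The padded docker exec suggestion can be wrapped in a small helper so the padding is applied consistently. This is a minimal sketch in Python; the helper name `padded_exec_args` and the padding length are illustrative assumptions, and it assumes the container image provides `sh`. Whether a longer command line avoids the error for docker exec is not confirmed anywhere in this thread.

```python
def padded_exec_args(container, command, pad_len=40):
    """Build a `docker exec` argv whose shell command ends in a padding comment."""
    if pad_len < 0:
        raise ValueError("pad_len must be non-negative")
    # Everything after '#' is ignored by the shell, so the padding changes
    # the length of the command line without changing what runs.
    padding = "x" * pad_len
    return ["docker", "exec", container, "sh", "-c", f"{command} # {padding}"]
```

For example, `padded_exec_args("the-container", "intended-command args")` produces an argv suitable for `subprocess.run`. The command must not end inside an open quote, otherwise the comment would become part of the quoted string.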

yujuhong added a commit to yujuhong/kubernetes that referenced this issue Apr 26, 2016

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Apr 27, 2016

Merge pull request #24830 from yujuhong/dummy_env
Automatic merge from submit-queue

e2e: add a dummy environment variable in the service tests

This works around the docker bug:
moby/moby#14203

yujuhong added a commit to yujuhong/kubernetes that referenced this issue Apr 27, 2016

chrislovecnm added a commit to chrislovecnm/kubernetes that referenced this issue Apr 28, 2016

alena1108 added a commit to rancher/kubernetes that referenced this issue May 20, 2016

loretoparisi commented Sep 28, 2016

I get the same error with this Dockerfile

FROM ubuntu:16.04
COPY . /app
VOLUME /app

I was running a long-running task using TensorFlow with

./nvidia-docker-run --volumes-from myImage --rm -it tensorflow/tensorflow:0.10.0-gpu bash

vielmetti commented Oct 11, 2016

I'm getting the error

System error: json: cannot unmarshal object into Go value of type libcontainer.syncType.

on CoreOS. docker -v reports Docker version 1.10.3, build 1f8f545

Member

thaJeztah commented Oct 11, 2016

@vielmetti it looks like the JSON of one of your containers may have been corrupted. It might be worth finding which one and either removing that container or trying to fix the JSON. Also keep in mind that CoreOS ships with a modified version of Docker (see coreos@1f8f545 for the commit it's built from), so issues should be reported in their issue tracker first.

vielmetti commented Oct 12, 2016

Thanks @thaJeztah , I opened a CoreOS issue which appears to be unrelated to this particular issue (the error text is the same but the reproduction is different).

shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this issue Dec 1, 2016

shouhong pushed a commit to shouhong/kubernetes that referenced this issue Feb 14, 2017

shawntoxu commented Jun 28, 2017

In which version is this bug resolved?

Member

thaJeztah commented Jun 29, 2017

@shawntoxu see the milestone attached to this issue; it's in docker 1.10.0

shawntoxu commented Jun 29, 2017

jfdoerre commented Jul 6, 2017

We are still seeing this sporadic error "... [9] System error: read parent: connection reset by peer" with Docker 1.10.3. Maybe it is because we are on CentOS 7.2?

It seems the fix for this error requires opencontainers/runc, but I guess we don't use that in our setup.
What we are using are the Docker RPMs from the CentOS repository:

# cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)

# uname -a
Linux ilgnext-jenkins.svl.ibm.com 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

# docker -v
Docker version 1.10.3, build 9419b24-unsupported

Is there any chance that this error is fixed in a more recent version for CentOS, e.g. in CentOS 7.3?

Otherwise, is there a reliable way to test whether the error no longer appears on my upgraded system?

Member

thaJeztah commented Jul 6, 2017

@jfdoerre docker 1.10 is no longer maintained, and the version in the CentOS repository is the Red Hat fork of docker, for which the code doesn't live in this repository.

> Any chances that this error is fixed in a more recent version for CentOS, e.g. in CentOS7.3?

Be aware that CentOS is a rolling release, which means that once 7.3 was released, 7.2 stopped receiving updates, so it is indeed recommended to be on the current version.

If you're still seeing this on the current (17.03 or 17.06) release of the official Docker packages, please open a new issue.

Member

thaJeztah commented Jul 6, 2017

I'm locking the conversation on this issue, because the original issue was resolved; if you encounter this issue on an up to date version of docker, please open a new issue instead.

moby locked and limited conversation to collaborators Jul 6, 2017
