Stop/Start of containers fail with "/var/run/docker/containerd/docker-containerd.sock: connect: connection refused" #36117

Closed
JennyLawrance opened this issue Jan 25, 2018 · 8 comments

@JennyLawrance JennyLawrance commented Jan 25, 2018

The Docker daemon ends up in a state where starting new containers or stopping existing containers fails with:

"connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused": unknown"
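
To tell a dead listener apart from a missing socket file, the socket can be probed directly. The following is a minimal sketch, not part of the original report: the helper name `probe_unix_socket` and its return strings are hypothetical, and the only value taken from this issue is the socket path in the error message.

```python
import errno
import socket

# Path copied from the error message in this issue; other setups may differ.
CONTAINERD_SOCK = "/var/run/docker/containerd/docker-containerd.sock"


def probe_unix_socket(path, timeout=2.0):
    """Return 'ok', 'refused', 'missing', or 'error: <reason>' for a unix socket path.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"
    except ConnectionRefusedError:
        # The socket file exists, but no process is accepting connections on it.
        return "refused"
    except FileNotFoundError:
        # The socket file itself is absent.
        return "missing"
    except OSError as exc:
        # Anything else (permissions, timeout, ...) is reported by errno name if known.
        if exc.errno:
            return "error: " + errno.errorcode.get(exc.errno, str(exc))
        return "error: " + str(exc)
    finally:
        sock.close()
```

Calling `probe_unix_socket(CONTAINERD_SOCK)` as root on an affected host would be expected to return `'refused'` if the state matches the error above, but that has not been verified here.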

In this state, the dockerd process continues to consume nearly all of the VM's CPU.

The issue appears to start right after dockerd kills and restarts the docker-containerd process, as indicated by these syslog entries:

6255 Jan 24 12:25:58 RD0003FF41E7E8 dockerd[8004]: time="2018-01-24T12:25:58.744982796Z" level=info msg="killing and restarting containerd" module=libcontainerd pid=8011
6269 Jan 24 12:25:59 RD0003FF41E7E8 dockerd[8004]: time="2018-01-24T12:25:59.962944219Z" level=info msg="libcontainerd: started new docker-containerd process" pid=123372
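
Restart events like the two above can be pulled out of a larger syslog with a small filter. This is a sketch under the assumption that the message text matches the two lines quoted here exactly; the function name `containerd_restarts` and the event labels are hypothetical.

```python
import re

# Patterns for the two dockerd messages quoted in this issue. Only the pid
# that follows each message is captured; other fields are ignored.
KILL_RE = re.compile(r'msg="killing and restarting containerd".*?\bpid=(\d+)')
START_RE = re.compile(
    r'msg="libcontainerd: started new docker-containerd process".*?\bpid=(\d+)'
)
TIME_RE = re.compile(r'time="([^"]+)"')


def containerd_restarts(lines):
    """Yield (event, timestamp, pid) for each containerd kill/start message."""
    for line in lines:
        for event, pattern in (("killed", KILL_RE), ("started", START_RE)):
            match = pattern.search(line)
            if match:
                stamp = TIME_RE.search(line)
                timestamp = stamp.group(1) if stamp else None
                yield (event, timestamp, int(match.group(1)))
```

Feeding it an open syslog file, for example `containerd_restarts(open(path))` with a path of your choosing, lists each kill and start together with the pid involved, which makes it easier to line up restarts with the moment the errors begin.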

Steps to reproduce the issue:
Happens in production, no clear repro yet.

Describe the results you received:
Existing containers continue to run correctly.
Not able to start new containers (creating a container still works).
Not able to stop existing containers.

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):
Intermittent.

Output of docker version:

Client:
 Version:       17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:11:19 2017
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:        Wed Dec 27 20:09:53 2017
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

Containers: 4
 Running: 4
 Paused: 0
 Stopped: 0
Images: 36
Server Version: 17.12.0-ce
Storage Driver: aufs
 Root Dir: /mnt/data/docker/images/231072.231072/aufs
 Backing Filesystem: extfs
 Dirs: 353
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: N/A (expected: 89623f28b87a6004d4b785663257362d1658a729)
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
 userns
Kernel Version: 4.4.0-108-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.627GiB
Name: RD0003FF41E7E8
ID: HJFR:J6V2:JDWZ:APMQ:OY5F:PRST:757V:C74J:KWK7:L7X5:CUYB:OVHT
Docker Root Dir: /mnt/data/docker/images/231072.231072
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 10.0.5.2:13210
 10.0.5.2:13211
 10.0.5.3:13209
 10.0.5.3:13210
 10.0.5.3:13211
 10.0.5.2:13209
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
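
One detail in the output above is `containerd version: N/A`. A check for that line can be scripted. The sketch below assumes the plain-text `docker info` layout shown in this issue (top-level `Key: value` lines, indented sub-entries), and it assumes that `N/A` means the daemon could not obtain a version from containerd, which this thread does not confirm. Both function names are hypothetical.

```python
def parse_docker_info(text):
    """Parse plain-text `docker info` output into a dict of top-level entries.

    Indented sub-entries (for example ' Running: 4') are skipped.
    """
    info = {}
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith(" "):
            continue  # blank line or indented sub-entry
        key, sep, value = raw.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def containerd_version_unknown(info):
    """Return True when the reported containerd version starts with 'N/A'."""
    return info.get("containerd version", "").startswith("N/A")
```

Splitting on the first colon only keeps values such as the registry URL intact.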

Member

@thaJeztah thaJeztah commented Jan 26, 2018

May be a duplicate of #36002, could you have a look if that matches what you're seeing?

Does Docker recover after you run killall -HUP dockerd on the daemon's host?

Author

@JennyLawrance JennyLawrance commented Jan 26, 2018

Yes, I did try that, but the system did not recover after I sent dockerd the HUP signal, which is why I think this might be a slightly different issue. High CPU usage was observed on the Docker daemon as well. Would the dockerd goroutine stacks be useful? I can fetch that information.

Member

@thaJeztah thaJeztah commented Jan 26, 2018

Never hurts to have more information

@francislavoie francislavoie commented Jan 26, 2018

I'm actually getting a similar error message when in the middle of a simple docker build.

connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused"

Edit: It was using about 250% CPU (4-core, 8-thread system).

Restarting docker with sudo service docker restart solved it though. This is on a desktop machine, not a production server - but still weird.

Author

@JennyLawrance JennyLawrance commented Jan 27, 2018

Logs taken about a day apart on the same VM (the VM where the original issue was found).
dockerd is still spinning the CPU here.

https://github.com/JennyLawrance/Debugging-docker/blob/master/goroutine-stacks-2018-01-25T180001Z.log

https://github.com/JennyLawrance/Debugging-docker/blob/master/goroutine-stacks-2018-01-27T025901Z.log
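
The gap between the two dumps can be read from the file names. This is a sketch that assumes the name format seen in the two links above (`goroutine-stacks-<date>T<HHMMSS>Z.log`) and assumes the trailing `Z` denotes UTC; the helper name `dump_time` is hypothetical.

```python
import re
from datetime import datetime, timezone

# Timestamp layout inferred from the two file names linked in this comment.
STAMP_RE = re.compile(r"goroutine-stacks-(\d{4}-\d{2}-\d{2}T\d{6})Z\.log")


def dump_time(name):
    """Return the UTC datetime embedded in a goroutine-stacks file name or URL."""
    match = STAMP_RE.search(name)
    if match is None:
        raise ValueError("no goroutine-stacks timestamp in: " + name)
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)
```

Subtracting the two results gives a `timedelta` that shows how long the daemon had been in this state between the dumps.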

@ztolley ztolley commented Feb 1, 2018

Same issue, but with Docker running on Windows 10 using the Moby Linux VM with Docker for Windows. Docker is running a single redis container, but the VM seems to go to 30% CPU constantly and I cannot stop the redis container until I restart the VM. It only happens when waking the machine from sleep.

Author

@JennyLawrance JennyLawrance commented Feb 1, 2018

From reading the description on this issue and commit, I feel that the current issue might be solved by the patch to this one:
#36173

Contributor

@cpuguy83 cpuguy83 commented Feb 2, 2018

This is a duplicate of the above referenced issue.
Thanks!
