Swarm restarts all containers #38203

Open
Umaaz opened this issue Nov 14, 2018 · 18 comments

Umaaz commented Nov 14, 2018

Description
We are running a Docker Swarm cluster with 3 managers and 5 workers. Twice now we have hit an error in the cluster that causes every service to be restarted. After some time all the services recover and everything goes back to normal.

Steps to reproduce the issue:
I am unable to reproduce the error on demand; it has only happened twice on a cluster that has been running for 105 days with over 200 containers.

Describe the results you received:
When looking into the issue, I came across this in the logs:

Nov 14 09:38:13 int020522 dockerd[18624]: time="2018-11-14T09:38:13.079639333+01:00" level=error msg="error receiving response" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:15 int020522 dockerd[18624]: time="2018-11-14T09:38:15.079967857+01:00" level=error msg="error receiving response" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:15 int020522 dockerd[18624]: time="2018-11-14T09:38:15.116089882+01:00" level=error msg="error receiving response" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:17 int020522 dockerd[18624]: time="2018-11-14T09:38:17.080341930+01:00" level=error msg="error receiving response" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:17 int020522 dockerd[18624]: time="2018-11-14T09:38:17.116549890+01:00" level=error msg="error receiving response" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:17 int020522 dockerd[18624]: time="2018-11-14T09:38:17.973307312+01:00" level=info msg="memberlist: Marking 29fdf1feb7d0 as failed, suspect timeout reached (2 peer confirmations)"
Nov 14 09:38:17 int020522 dockerd[18624]: time="2018-11-14T09:38:17.973375887+01:00" level=info msg="Node 29fdf1feb7d0/10.2.5.32, left gossip cluster"
Nov 14 09:38:17 int020522 dockerd[18624]: time="2018-11-14T09:38:17.973421640+01:00" level=info msg="Node 29fdf1feb7d0 change state NodeActive --> NodeFailed"
Nov 14 09:38:17 int020522 dockerd[18624]: time="2018-11-14T09:38:17.976071755+01:00" level=info msg="Node 29fdf1feb7d0/10.2.5.32, added to failed nodes list"
Nov 14 09:38:18 int020522 dockerd[18624]: time="2018-11-14T09:38:18.126988968+01:00" level=info msg="memberlist: Suspect 29fdf1feb7d0 has failed, no acks received"
Nov 14 09:38:18 int020522 dockerd[18624]: time="2018-11-14T09:38:18.161264366+01:00" level=error msg="Attempting to transfer leadership" raft_id=71921ff23bc70421
Nov 14 09:38:18 int020522 dockerd[18624]: time="2018-11-14T09:38:18.290127782+01:00" level=info msg="Node 29fdf1feb7d0/10.2.5.32, joined gossip cluster"
Nov 14 09:38:18 int020522 dockerd[18624]: time="2018-11-14T09:38:18.290199543+01:00" level=info msg="Node 29fdf1feb7d0 change state NodeFailed --> NodeActive"
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 470 [running]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/pkg/signal.DumpStacks(0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/signal/trap.go:83 +0xaa
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft.(*Node).Run(0xc420e84000, 0x55ef18e49460, 0xc4256e8700, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft/raft.go:597 +0x17e8
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/github.com/docker/swarmkit/manager.(*Manager).Run.func6(0xc42039e340, 0x55ef18e49460, 0xc420520580)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/manager.go:584 +0x4c
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/github.com/docker/swarmkit/manager.(*Manager).Run
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/manager.go:583 +0x1544
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 1 [chan receive, 33064 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: main.(*DaemonCli).start(0xc42047f710, 0xc420179da0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/cmd/dockerd/daemon.go:228 +0xf26
Nov 14 09:38:18 int020522 dockerd[18624]: main.runDaemon(0xc420179da0, 0xc420196700, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/cmd/dockerd/docker_unix.go:7 +0x47
Nov 14 09:38:18 int020522 dockerd[18624]: main.newDaemonCommand.func1(0xc4200eb400, 0xc420087700, 0x0, 0x4, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/cmd/dockerd/docker.go:28 +0x5d
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).execute(0xc4200eb400, 0xc4200c4100, 0x4, 0x4, 0xc4200eb400, 0xc4200c4100)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:762 +0x46a
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc4200eb400, 0x55ef18e1ff70, 0x55ef189fda20, 0x55ef18e1ff80)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:852 +0x30c
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).Execute(0xc4200eb400, 0xc4200c2010, 0x55ef16d6f19f)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:800 +0x2d
Nov 14 09:38:18 int020522 dockerd[18624]: main.main()
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/cmd/dockerd/docker.go:63 +0xa2
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 20 [syscall, 8583 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: os/signal.signal_recv(0x55ef18e33c20)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/runtime/sigqueue.go:139 +0xa8
Nov 14 09:38:18 int020522 dockerd[18624]: os/signal.loop()
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/os/signal/signal_unix.go:22 +0x24
Nov 14 09:38:18 int020522 dockerd[18624]: created by os/signal.init.0
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/os/signal/signal_unix.go:28 +0x43
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 25 [select]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/go.opencensus.io/stats/view.(*worker).start(0xc420086f80)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/go.opencensus.io/stats/view/worker.go:144 +0x11f
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/go.opencensus.io/stats/view.init.0
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/go.opencensus.io/stats/view/worker.go:29 +0x5a
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 28 [syscall, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: syscall.Syscall6(0xf7, 0x1, 0x48c8, 0xc4204a75c8, 0x1000004, 0x0, 0x0, 0x0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
Nov 14 09:38:18 int020522 dockerd[18624]: os.(*Process).blockUntilWaitable(0xc420340cc0, 0x55ef16ca2b3b, 0xc42008a480, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/os/wait_waitid.go:31 +0x9a
Nov 14 09:38:18 int020522 dockerd[18624]: os.(*Process).wait(0xc420340cc0, 0xc42008a401, 0x55ef17250677, 0xc4204a7750)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/os/exec_unix.go:22 +0x3e
Nov 14 09:38:18 int020522 dockerd[18624]: os.(*Process).Wait(0xc420340cc0, 0xc4204a7728, 0xc420157960, 0x55ef18e15aa0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/os/exec.go:123 +0x2d
Nov 14 09:38:18 int020522 dockerd[18624]: os/exec.(*Cmd).Wait(0xc4205086e0, 0xc4204a77b8, 0x55ef17261d6b)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/os/exec/exec.go:461 +0x5e
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/libcontainerd.(*remote).startContainerd.func1(0xc4205086e0, 0xc4203a28c0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/remote_daemon.go:243 +0x31
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/libcontainerd.(*remote).startContainerd
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/remote_daemon.go:241 +0x3db
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 29 [select, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.(*ccResolverWrapper).watcher(0xc420368330)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/resolver_conn_wrapper.go:109 +0x184
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/google.golang.org/grpc.(*ccResolverWrapper).start
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/resolver_conn_wrapper.go:95 +0x41
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 30 [select, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc420376240)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/balancer_conn_wrappers.go:122 +0x14c
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/google.golang.org/grpc.newCCBalancerWrapper
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/balancer_conn_wrappers.go:113 +0x151
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 31 [select, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.(*addrConn).transportMonitor(0xc42048cb00)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/clientconn.go:1373 +0x23d
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.(*addrConn).connect.func1(0xc42048cb00)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/clientconn.go:949 +0x1b7
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/google.golang.org/grpc.(*addrConn).connect
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/clientconn.go:940 +0xe3
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 33 [IO wait]:
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.runtime_pollWait(0x7fcb32003c90, 0x72, 0xc42006abb8)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/runtime/netpoll.go:173 +0x59
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.(*pollDesc).wait(0xc420010698, 0x72, 0xffffffffffffff00, 0x55ef18e2c800, 0x55ef199ba288)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9d
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.(*pollDesc).waitRead(0xc420010698, 0xc420016000, 0x8000, 0x8000)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.(*FD).Read(0xc420010680, 0xc420016000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/internal/poll/fd_unix.go:157 +0x17f
Nov 14 09:38:18 int020522 dockerd[18624]: net.(*netFD).Read(0xc420010680, 0xc420016000, 0x8000, 0x8000, 0x11, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/net/fd_unix.go:202 +0x51
Nov 14 09:38:18 int020522 dockerd[18624]: net.(*conn).Read(0xc4200c2770, 0xc420016000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/net/net.go:176 +0x6c
Nov 14 09:38:18 int020522 dockerd[18624]: bufio.(*Reader).Read(0xc4202baa20, 0xc4203f2038, 0x9, 0x9, 0x20, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/bufio/bufio.go:216 +0x23a
Nov 14 09:38:18 int020522 dockerd[18624]: io.ReadAtLeast(0x55ef18e25520, 0xc4202baa20, 0xc4203f2038, 0x9, 0x9, 0x9, 0xc42006adf0, 0x55ef16c9f0e0, 0xc42006ae9f)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/io/io.go:309 +0x88
Nov 14 09:38:18 int020522 dockerd[18624]: io.ReadFull(0x55ef18e25520, 0xc4202baa20, 0xc4203f2038, 0x9, 0x9, 0x55ef3787be7e, 0x3787be7e2794492c, 0x5bebdef9)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/io/io.go:327 +0x5a
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/golang.org/x/net/http2.readFrameHeader(0xc4203f2038, 0x9, 0x9, 0x55ef18e25520, 0xc4202baa20, 0x0, 0xbef3159e00000000, 0x70c637ff8c2f7, 0x55ef19a18fe0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/golang.org/x/net/http2/frame.go:237 +0x7d
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/golang.org/x/net/http2.(*Framer).ReadFrame(0xc4203f2000, 0xc427944920, 0xc427944920, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/golang.org/x/net/http2/frame.go:492 +0xa6
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*http2Client).reader(0xc4201c0000)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:1123 +0x117
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/google.golang.org/grpc/transport.newHTTP2Client
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:265 +0xb41
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 66 [select]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*controlBuffer).get(0xc420376440, 0x1, 0x0, 0x0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/controlbuf.go:289 +0x135
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*loopyWriter).run(0xc42008bf20)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/controlbuf.go:374 +0x1be
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.newHTTP2Client.func3(0xc4201c0000)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:298 +0x7e
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/google.golang.org/grpc/transport.newHTTP2Client
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:296 +0xc91
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 55 [select]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/libcontainerd.(*remote).monitorConnection(0xc4203a28c0, 0xc42007cc60)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/remote_daemon.go:267 +0x11f
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/libcontainerd.New
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/remote_daemon.go:116 +0x58e
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 56 [select, 33064 minutes, locked to thread]:
Nov 14 09:38:18 int020522 dockerd[18624]: runtime.gopark(0x55ef18e15948, 0x0, 0x55ef182074ab, 0x6, 0x18, 0x1)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/runtime/proc.go:291 +0x120
Nov 14 09:38:18 int020522 dockerd[18624]: runtime.selectgo(0xc4204a2f50, 0xc4201bef00)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/runtime/select.go:392 +0xe56
Nov 14 09:38:18 int020522 dockerd[18624]: runtime.ensureSigM.func1()
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/runtime/signal_unix.go:549 +0x1f6
Nov 14 09:38:18 int020522 dockerd[18624]: runtime.goexit()
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/runtime/asm_amd64.s:2361 +0x1
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 57 [chan receive, 8583 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/pkg/signal.Trap.func1(0xc4202e2060, 0x55ef18e27c00, 0xc4200c41e0, 0xc42000d640)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/signal/trap.go:38 +0x5d
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/pkg/signal.Trap
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/signal/trap.go:36 +0x120
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 58 [chan receive, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/daemon.(*Daemon).setupDumpStackTrap.func1(0xc4202bac00, 0x55ef18215e8b, 0xf)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/debugtrap_unix.go:18 +0x46
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/daemon.(*Daemon).setupDumpStackTrap
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/debugtrap_unix.go:17 +0xc1
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 63 [IO wait, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.runtime_pollWait(0x7fcb32003bc0, 0x72, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/runtime/netpoll.go:173 +0x59
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.(*pollDesc).wait(0xc4202bd598, 0x72, 0xc420376800, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9d
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.(*pollDesc).waitRead(0xc4202bd598, 0xffffffffffffff00, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.(*FD).Accept(0xc4202bd580, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/internal/poll/fd_unix.go:372 +0x1aa
Nov 14 09:38:18 int020522 dockerd[18624]: net.(*netFD).accept(0xc4202bd580, 0xc42006be58, 0x55ef16cb00ea, 0x30)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/net/fd_unix.go:238 +0x44
Nov 14 09:38:18 int020522 dockerd[18624]: net.(*UnixListener).accept(0xc42003d470, 0x55ef16db30fc, 0x55ef18c90500, 0xc4203688a0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/net/unixsock_posix.go:162 +0x34
Nov 14 09:38:18 int020522 dockerd[18624]: net.(*UnixListener).Accept(0xc42003d470, 0xc4200c0040, 0x55ef18a9c620, 0x55ef19995210, 0x55ef18dd3860)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/net/unixsock.go:253 +0x4b
Nov 14 09:38:18 int020522 dockerd[18624]: net/http.(*Server).Serve(0xc4202f05b0, 0x55ef18e476e0, 0xc42003d470, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/net/http/server.go:2770 +0x1a7
Nov 14 09:38:18 int020522 dockerd[18624]: net/http.Serve(0x55ef18e476e0, 0xc42003d470, 0x55ef18e287c0, 0xc42003d4a0, 0x55ef16cd296a, 0x55ef18e157d8)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/net/http/server.go:2389 +0x75
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/daemon.(*Daemon).listenMetricsSock.func1(0x55ef18e476e0, 0xc42003d470, 0xc42003d4a0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/metrics_unix.go:31 +0x4d
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/daemon.(*Daemon).listenMetricsSock
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/metrics_unix.go:30 +0x195
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 64 [select, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.(*ccResolverWrapper).watcher(0xc42003dec0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/resolver_conn_wrapper.go:109 +0x184
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/google.golang.org/grpc.(*ccResolverWrapper).start
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/resolver_conn_wrapper.go:95 +0x41
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 65 [select, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc420198640)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/balancer_conn_wrappers.go:122 +0x14c
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/google.golang.org/grpc.newCCBalancerWrapper
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/balancer_conn_wrappers.go:113 +0x151
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 82 [select, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.(*addrConn).transportMonitor(0xc420436b00)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/clientconn.go:1373 +0x23d
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.(*addrConn).connect.func1(0xc420436b00)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/clientconn.go:949 +0x1b7
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/google.golang.org/grpc.(*addrConn).connect
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/clientconn.go:940 +0xe3
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 84 [IO wait, 95 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.runtime_pollWait(0x7fcb32003af0, 0x72, 0xc4202dbbb8)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/runtime/netpoll.go:173 +0x59
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.(*pollDesc).wait(0xc4202bd798, 0x72, 0xffffffffffffff00, 0x55ef18e2c800, 0x55ef199ba288)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9d
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.(*pollDesc).waitRead(0xc4202bd798, 0xc420266000, 0x8000, 0x8000)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
Nov 14 09:38:18 int020522 dockerd[18624]: internal/poll.(*FD).Read(0xc4202bd780, 0xc420266000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/internal/poll/fd_unix.go:157 +0x17f
Nov 14 09:38:18 int020522 dockerd[18624]: net.(*netFD).Read(0xc4202bd780, 0xc420266000, 0x8000, 0x8000, 0x11, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/net/fd_unix.go:202 +0x51
Nov 14 09:38:18 int020522 dockerd[18624]: net.(*conn).Read(0xc4201820f0, 0xc420266000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/net/net.go:176 +0x6c
Nov 14 09:38:18 int020522 dockerd[18624]: bufio.(*Reader).Read(0xc4202e29c0, 0xc42027e038, 0x9, 0x9, 0xc4202ecc00, 0x4, 0xc4202dbd98)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/bufio/bufio.go:216 +0x23a
Nov 14 09:38:18 int020522 dockerd[18624]: io.ReadAtLeast(0x55ef18e25520, 0xc4202e29c0, 0xc42027e038, 0x9, 0x9, 0x9, 0xc4202dbe10, 0x3, 0x18)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/io/io.go:309 +0x88
Nov 14 09:38:18 int020522 dockerd[18624]: io.ReadFull(0x55ef18e25520, 0xc4202e29c0, 0xc42027e038, 0x9, 0x9, 0x55ef16cf4410, 0xc4201bf740, 0xc4202dbe58)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/io/io.go:327 +0x5a
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/golang.org/x/net/http2.readFrameHeader(0xc42027e038, 0x9, 0x9, 0x55ef18e25520, 0xc4202e29c0, 0x0, 0x55ef00000000, 0x1007fcb3205fd90, 0xc42006d5b0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/golang.org/x/net/http2/frame.go:237 +0x7d
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/golang.org/x/net/http2.(*Framer).ReadFrame(0xc42027e000, 0xc42d78e7e0, 0xc42d78e7e0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/golang.org/x/net/http2/frame.go:492 +0xa6
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*http2Client).reader(0xc420282000)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:1123 +0x117
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/google.golang.org/grpc/transport.newHTTP2Client
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:265 +0xb41
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 85 [select, 95 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*controlBuffer).get(0xc420198740, 0x1, 0x0, 0x0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/controlbuf.go:289 +0x135
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*loopyWriter).run(0xc4202bacc0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/controlbuf.go:374 +0x1be
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.newHTTP2Client.func3(0xc420282000)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:298 +0x7e
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/google.golang.org/grpc/transport.newHTTP2Client
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/http2_client.go:296 +0xc91
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 86 [select, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/libcontainerd.(*client).processEventStream(0xc420390000, 0x55ef18e49460, 0xc4200c7fc0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:751 +0x379
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/libcontainerd.(*remote).NewClient
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/remote_daemon.go:136 +0x24b
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 45 [select, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.newClientStream.func5(0xc42025e000, 0xc420014200, 0x55ef18e49520, 0xc420338150)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:311 +0x100
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/google.golang.org/grpc.newClientStream
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:310 +0xa78
Nov 14 09:38:18 int020522 dockerd[18624]: goroutine 46 [select, 33065 minutes]:
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*recvBufferReader).read(0xc42032b720, 0xc42033a9f0, 0x5, 0x5, 0x65, 0x1d, 0x50)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:142 +0x1eb
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*recvBufferReader).Read(0xc42032b720, 0xc42033a9f0, 0x5, 0x5, 0x55ef172f6d01, 0xc42003e3a0, 0xc4202dcae0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:131 +0x69
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*transportReader).Read(0xc420338240, 0xc42033a9f0, 0x5, 0x5, 0x65, 0xc4202dcb20, 0x55ef1736618a)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:369 +0x57
Nov 14 09:38:18 int020522 dockerd[18624]: io.ReadAtLeast(0x55ef18e283e0, 0xc420338240, 0xc42033a9f0, 0x5, 0x5, 0x5, 0xc420282000, 0xc420540000, 0xc400000005)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/io/io.go:309 +0x88
Nov 14 09:38:18 int020522 dockerd[18624]: io.ReadFull(0x55ef18e283e0, 0xc420338240, 0xc42033a9f0, 0x5, 0x5, 0xc4202dcbf0, 0x55ef16cadf8f, 0x55ef18b0eba0)
Nov 14 09:38:18 int020522 dockerd[18624]: /usr/local/go/src/io/io.go:327 +0x5a
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).Read(0xc420540000, 0xc42033a9f0, 0x5, 0x5, 0xc4205e20a0, 0x91, 0x91)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:353 +0xc1
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.(*parser).recvMsg(0xc42033a9e0, 0x1000000, 0xc4202ecd80, 0x3, 0xc4202dcf08, 0xc4202ecd80, 0xc4202d64a0, 0xc4202dcec8)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/rpc_util.go:452 +0x67
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.recv(0xc42033a9e0, 0x7fcb3200cea0, 0x55ef19a3a188, 0xc420540000, 0x0, 0x0, 0x55ef18c98f40, 0xc420376dc0, 0x1000000, 0x0, ...)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/rpc_util.go:561 +0x4f
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc420542000, 0x55ef18c98f40, 0xc420376dc0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:529 +0x134
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc420014200, 0x55ef18c98f40, 0xc420376dc0, 0xc4202b6900, 0x1)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395 +0x45
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/events/v1.(*eventsSubscribeClient).Recv(0xc4203f8c70, 0x0, 0x0, 0x0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/events/v1/events.pb.go:209 +0x64
Nov 14 09:38:18 int020522 dockerd[18624]: github.com/docker/docker/vendor/github.com/containerd/containerd.(*eventRemote).Subscribe.func1(0xc4202d7440, 0x55ef18e54fa0, 0xc4203f8c70, 0xc4202b6960, 0x55ef18e49460, 0xc4200c7fc0)
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/events.go:99 +0x7a
Nov 14 09:38:18 int020522 dockerd[18624]: created by github.com/docker/docker/vendor/github.com/containerd/containerd.(*eventRemote).Subscribe
Nov 14 09:38:18 int020522 dockerd[18624]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/events.go:95 +0x1bb
Nov 14 09:38:48 int020522 dockerd[18624]: sync duration of 11.076656207s, expected less than 1s

This dump appears on only one of the 3 managers. On the other 2, the logs look like this:

Nov 14 09:37:38 int020521 dockerd[849]: time="2018-11-14T09:37:38.351718994+01:00" level=info msg="manager selected by agent for new session: { }" module=node/agent node.id=eyfb92om7v0g8osi2i93rruy0
Nov 14 09:37:38 int020521 dockerd[849]: time="2018-11-14T09:37:38.351767611+01:00" level=info msg="waiting 356.630906ms before registering session" module=node/agent node.id=eyfb92om7v0g8osi2i93rruy0
Nov 14 09:37:39 int020521 dockerd[849]: time="2018-11-14T09:37:39.557138300+01:00" level=warning msg="memberlist: Refuting a suspect message (from: 79b85c7e9512)"
Nov 14 09:37:43 int020521 dockerd[849]: time="2018-11-14T09:37:43.542961618+01:00" level=info msg="memberlist: Suspect 0a9e7a91e12a has failed, no acks received"
Nov 14 09:37:51 int020521 dockerd[849]: time="2018-11-14T09:37:51.484614862+01:00" level=error msg="heartbeat to manager { } failed" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" method="(*session).heartbeat"
Nov 14 09:37:57 int020521 dockerd[849]: time="2018-11-14T09:37:57.651680868+01:00" level=error msg="error receiving response" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:00 int020521 dockerd[849]: time="2018-11-14T09:38:00.007012307+01:00" level=error msg="error while reading from stream" error="rpc error: code = Canceled desc = context canceled"
Nov 14 09:38:00 int020521 dockerd[849]: time="2018-11-14T09:38:00.007214746+01:00" level=error msg="error while reading from stream" error="rpc error: code = Canceled desc = context canceled"
Nov 14 09:38:00 int020521 dockerd[849]: time="2018-11-14T09:38:00.007277176+01:00" level=error msg="error while reading from stream" error="rpc error: code = Canceled desc = context canceled"
Nov 14 09:38:00 int020521 dockerd[849]: time="2018-11-14T09:38:00.684388095+01:00" level=error msg="error receiving response" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:00 int020521 dockerd[849]: time="2018-11-14T09:38:00.792709786+01:00" level=info msg="memberlist: Suspect 4b1cf53873fb has failed, no acks received"
Nov 14 09:38:03 int020521 dockerd[849]: time="2018-11-14T09:38:03.792478217+01:00" level=error msg="Error getting tasks: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:03 int020521 dockerd[849]: time="2018-11-14T09:38:03.792618812+01:00" level=error msg="Handler for GET /tasks returned error: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:04 int020521 dockerd[849]: time="2018-11-14T09:38:04.282694609+01:00" level=error msg="error while reading from stream" error="rpc error: code = Canceled desc = context canceled"
Nov 14 09:38:06 int020521 dockerd[849]: time="2018-11-14T09:38:06.554832190+01:00" level=error msg="error receiving response" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:06 int020521 dockerd[849]: time="2018-11-14T09:38:06.555065184+01:00" level=error msg="error while reading from stream" error="rpc error: code = Canceled desc = context canceled"
Nov 14 09:38:06 int020521 dockerd[849]: time="2018-11-14T09:38:06.555159784+01:00" level=error msg="error while reading from stream" error="rpc error: code = Canceled desc = context canceled"
Nov 14 09:38:09 int020521 dockerd[849]: time="2018-11-14T09:38:09.454666203+01:00" level=error msg="error receiving response" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:10 int020521 dockerd[849]: time="2018-11-14T09:38:10.636116137+01:00" level=warning msg="memberlist: Failed to push local state: write tcp 10.2.5.33:7946->10.2.5.37:33812: i/o timeout from=10.2.5.37:33812"
Nov 14 09:38:16 int020521 dockerd[849]: time="2018-11-14T09:38:16.820984395+01:00" level=error msg="error while reading from stream" error="rpc error: code = Canceled desc = context canceled"
Nov 14 09:38:17 int020521 dockerd[849]: time="2018-11-14T09:38:17.378482227+01:00" level=warning msg="memberlist: Failed to push local state: write tcp 10.2.5.33:7946->10.2.5.36:58784: i/o timeout from=10.2.5.36:58784"
Nov 14 09:38:17 int020521 dockerd[849]: time="2018-11-14T09:38:17.378925104+01:00" level=warning msg="memberlist: Failed to push local state: write tcp 10.2.5.33:7946->10.2.5.34:38172: i/o timeout from=10.2.5.34:38172"
Nov 14 09:38:17 int020521 dockerd[849]: time="2018-11-14T09:38:17.379366371+01:00" level=warning msg="memberlist: Refuting a suspect message (from: 79b85c7e9512)"
Nov 14 09:38:18 int020521 dockerd[849]: time="2018-11-14T09:38:18.541191141+01:00" level=error msg="error while reading from stream" error="rpc error: code = Canceled desc = context canceled"
Nov 14 09:38:18 int020521 dockerd[849]: time="2018-11-14T09:38:18.541443509+01:00" level=error msg="Error getting services: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:18 int020521 dockerd[849]: time="2018-11-14T09:38:18.541518401+01:00" level=error msg="Handler for GET /v1.22/services returned error: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:18 int020521 dockerd[849]: time="2018-11-14T09:38:18.590921998+01:00" level=error msg="error while reading from stream" error="rpc error: code = Canceled desc = context canceled"
Nov 14 09:38:18 int020521 dockerd[849]: time="2018-11-14T09:38:18.684823084+01:00" level=error msg="error while reading from stream" error="rpc error: code = Canceled desc = context canceled"
Nov 14 09:38:18 int020521 dockerd[849]: time="2018-11-14T09:38:18.783440428+01:00" level=info msg="Node 29fdf1feb7d0/10.2.5.32, left gossip cluster"
Nov 14 09:38:18 int020521 dockerd[849]: time="2018-11-14T09:38:18.843426761+01:00" level=info msg="Node 29fdf1feb7d0 change state NodeActive --> NodeFailed"
Nov 14 09:38:18 int020521 dockerd[849]: time="2018-11-14T09:38:18.922176988+01:00" level=info msg="Node 29fdf1feb7d0/10.2.5.32, added to failed nodes list"
Nov 14 09:38:18 int020521 dockerd[849]: time="2018-11-14T09:38:18.939693741+01:00" level=info msg="Node 29fdf1feb7d0/10.2.5.32, joined gossip cluster"
Nov 14 09:38:19 int020521 dockerd[849]: time="2018-11-14T09:38:18.975033585+01:00" level=info msg="Node 29fdf1feb7d0 change state NodeFailed --> NodeActive"
Nov 14 09:38:19 int020521 dockerd[849]: time="2018-11-14T09:38:18.975284620+01:00" level=info msg="Node 29fdf1feb7d0/10.2.5.32, left gossip cluster"
Nov 14 09:38:19 int020521 dockerd[849]: time="2018-11-14T09:38:19.171412844+01:00" level=info msg="Node 29fdf1feb7d0 change state NodeActive --> NodeFailed"
Nov 14 09:38:19 int020521 dockerd[849]: time="2018-11-14T09:38:19.246547054+01:00" level=warning msg="memberlist: Failed to push local state: write tcp 10.2.5.33:7946->10.2.5.32:54448: i/o timeout from=10.2.5.32:54448"
Nov 14 09:38:19 int020521 dockerd[849]: time="2018-11-14T09:38:19.259079148+01:00" level=info msg="Node 29fdf1feb7d0/10.2.5.32, added to failed nodes list"
Nov 14 09:38:19 int020521 dockerd[849]: time="2018-11-14T09:38:19.259849145+01:00" level=info msg="memberlist: Suspect 0a9e7a91e12a has failed, no acks received"
Nov 14 09:38:19 int020521 dockerd[849]: time="2018-11-14T09:38:19.261016113+01:00" level=info msg="Node 29fdf1feb7d0/10.2.5.32, joined gossip cluster"
Nov 14 09:38:19 int020521 dockerd[849]: time="2018-11-14T09:38:19.261254928+01:00" level=info msg="Node 29fdf1feb7d0 change state NodeFailed --> NodeActive"
Nov 14 09:38:19 int020521 dockerd[849]: time="2018-11-14T09:38:19.771817478+01:00" level=error msg="Error getting nodes: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:19 int020521 dockerd[849]: time="2018-11-14T09:38:19.771902515+01:00" level=error msg="Handler for GET /v1.35/nodes returned error: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Nov 14 09:38:22 int020521 dockerd[849]: time="2018-11-14T09:38:22.202186309+01:00" level=warning msg="NetworkDB stats int020521(79b85c7e9512) - healthscore:5 (connectivity issues)"
Nov 14 09:38:29 int020521 dockerd[849]: time="2018-11-14T09:38:29.300432239+01:00" level=info msg="memberlist: Suspect 29fdf1feb7d0 has failed, no acks received"
Nov 14 09:38:30 int020521 dockerd[849]: time="2018-11-14T09:38:30.792494913+01:00" level=info msg="Node 29fdf1feb7d0/10.2.5.32, left gossip cluster"
Nov 14 09:38:30 int020521 dockerd[849]: time="2018-11-14T09:38:30.792552538+01:00" level=info msg="Node 29fdf1feb7d0 change state NodeActive --> NodeFailed"

It seems that the managers are having trouble communicating, but I am unsure why.

I would appreciate any assistance you can give in solving this issue.
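
For comparing the managers side by side, a rough Python sketch along these lines could tally the log messages per host and level so the noisiest node and error type stand out. It assumes the journald prefix and `level=... msg="..."` fields shown in the logs above; `LINE_RE` and `tally` are hypothetical names for illustration, not part of Docker.

```python
import re
from collections import Counter

# Assumed line shape: "<Mon> <day> <time> <host> dockerd[<pid>]: ... level=<lvl> msg="<text>""
LINE_RE = re.compile(
    r'^\w{3}\s+\d+ [\d:]+ (?P<host>\S+) dockerd\[\d+\]: '
    r'.*?level=(?P<level>\w+) msg="(?P<msg>(?:[^"\\]|\\.)*)"'
)

def tally(lines):
    """Count (host, level, normalized message) triples."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # goroutine dump lines and anything else without level=/msg=
        # Replace 12-character hex node IDs and numbers with '#'
        # so that messages differing only in those values group together.
        msg = re.sub(r'\b[0-9a-f]{12}\b|\d+(?:\.\d+)*', '#', m.group('msg'))
        counts[(m.group('host'), m.group('level'), msg)] += 1
    return counts
```

Feeding it the saved log of each manager and printing `tally(lines).most_common(10)` would show whether the `DeadlineExceeded` errors and the memberlist suspect messages cluster on one host or are spread across all of them.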

Output of docker version:

Client:
 Version:           18.06.0-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        0ffa825
 Built:             Wed Jul 18 19:08:18 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.0-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       0ffa825
  Built:            Wed Jul 18 19:10:42 2018
  OS/Arch:          linux/amd64
  Experimental:     true

Output of docker info:

Containers: 21
 Running: 6
 Paused: 0
 Stopped: 15
Images: 11
Server Version: 18.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay weaveworks/net-plugin:latest_release
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 04wzeypxq4nz49yz4uhf6ydc0
 Is Manager: true
 ClusterID: j27guugg9bq6zk2k2a9mp20pc
 Managers: 3
 Nodes: 8
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.2.5.32
 Manager Addresses:
  10.2.5.32:2377
  10.2.5.33:2377
  10.2.5.34:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.9.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.795GiB
Name: int020520
ID: UHSX:ENDP:X2RM:3XBV:IM2Y:YXPO:UIXT:RA47:EPT6:DYVG:VUZO:G3ZQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 docker.bbn.intergral.com:5000
 docker.bbn.intergral.com:5050
 127.0.0.0/8
Live Restore Enabled: false

WARNING: bridge-nf-call-ip6tables is disabled

Additional environment details (AWS, VirtualBox, physical, etc.):
Virtualization: kvm
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:
Kernel: Linux 3.10.0-862.9.1.el7.x86_64
Architecture: x86-64

@thaJeztah
Member

ping @dperny PTAL

@amir20

amir20 commented Nov 30, 2018

I am also having the same issue.

Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov  7 00:48:57 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov  7 00:16:44 2018
  OS/Arch:          linux/amd64
  Experimental:     false

When I look at the logs, the only thing I can see happening during that time is:

Nov 30 11:15:22 clashstats dockerd[18108]: time="2018-11-30T11:15:22.538031336Z" level=error msg="heartbeat to manager { } failed" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" method="(*session)
Nov 30 11:15:24 clashstats dockerd[18108]: time="2018-11-30T11:15:24.693388807Z" level=error msg="agent: session failed" backoff=100ms error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" module=node/ag
Nov 30 11:15:24 clashstats dockerd[18108]: time="2018-11-30T11:15:24.694231133Z" level=info msg="manager selected by agent for new session: { }" module=node/agent node.id=zonpupegf9s4iibrculs70l35
Nov 30 11:15:24 clashstats dockerd[18108]: time="2018-11-30T11:15:24.697775134Z" level=info msg="waiting 25.312139ms before registering session" module=node/agent node.id=zonpupegf9s4iibrculs70l35
Nov 30 11:19:46 clashstats dockerd[18108]: time="2018-11-30T11:19:42.110784862Z" level=warning msg="Health check for container 065602cecb6008c4f388dd9f06fb7a142147b78a1898183db5fb87998d2b3574 error: context deadline exceeded"
Nov 30 11:19:47 clashstats dockerd[18108]: time="2018-11-30T11:19:47.128087218Z" level=error msg="heartbeat to manager { } failed" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" method="(*session)
Nov 30 11:19:48 clashstats dockerd[18108]: time="2018-11-30T11:19:48.498814529Z" level=info msg="NetworkDB stats clashstats(534674fe59a2) - netID:k8dd6ll6bvp3yr1of0oq6iood leaving:false netPeers:1 entries:4 Queue qLen:0 netMs
Nov 30 11:19:48 clashstats dockerd[18108]: time="2018-11-30T11:19:48.501430685Z" level=info msg="NetworkDB stats clashstats(534674fe59a2) - netID:td4dfllp0zwgrl6gh6i18n2ii leaving:false netPeers:1 entries:13 Queue qLen:0 netM
Nov 30 11:19:48 clashstats dockerd[18108]: time="2018-11-30T11:19:48.702087018Z" level=warning msg="Ignoring Exit Event, no such exec command found" container=065602cecb6008c4f388dd9f06fb7a142147b78a1898183db5fb87998d2b3574 e
Nov 30 11:19:48 clashstats dockerd[18108]: time="2018-11-30T11:19:48.809213315Z" level=error msg="agent: session failed" backoff=100ms error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" module=node/ag
Nov 30 11:19:48 clashstats dockerd[18108]: time="2018-11-30T11:19:48.809833226Z" level=info msg="manager selected by agent for new session: { }" module=node/agent node.id=zonpupegf9s4iibrculs70l35
Nov 30 11:19:48 clashstats dockerd[18108]: time="2018-11-30T11:19:48.813528291Z" level=info msg="waiting 71.314344ms before registering session" module=node/agent node.id=zonpupegf9s4iibrculs70l35
Nov 30 11:19:49 clashstats dockerd[18108]: time="2018-11-30T11:19:49.045204544Z" level=info msg="worker zonpupegf9s4iibrculs70l35 was successfully registered" method="(*Dispatcher).register"

I think this is the same error as @Umaaz's. It only started happening after I upgraded Docker recently. I am not sure what other debug information I can provide.

@amir20

amir20 commented Nov 30, 2018

Just an update from my side: I am pretty sure I am getting OOM errors, and that's why everything restarts.

[Screenshot: screen shot 2018-11-30 at 11 08 33 am]

So this might not be related to docker.
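
For anyone who wants to rule the same thing out, a minimal Python sketch like the following could scan kernel log lines for OOM-killer activity. The regex assumes the usual `Out of memory: Kill process ...` / `Out of memory: Killed process ...` wording, which varies between kernel versions; `OOM_RE` and `find_oom_kills` are hypothetical names for illustration.

```python
import re

# Older kernels log "Kill process <pid> (<name>)", newer ones "Killed process <pid> (<name>)".
OOM_RE = re.compile(r'Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)')

def find_oom_kills(lines):
    """Return a list of (pid, process name) for every OOM kill found in the given lines."""
    kills = []
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            kills.append((int(m.group(1)), m.group(2)))
    return kills
```

Running it over the output of `dmesg` or `journalctl -k` for the window in which the containers restarted should show whether the kernel killed a process at that time.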

@olljanat
Contributor

@wk8 PTAL

@bugwheels94

I am also seeing the same issue whenever the heartbeat fails. The service 4adb11869318 on the manager node and the service e7b284330420 on the worker node had the issue very frequently.
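
To put a number on "very frequently", one option is to count how often each node is marked as failed in the dockerd log. The Python sketch below assumes the `Node <id> change state NodeActive --> NodeFailed` message format that dockerd prints in this thread; `STATE_RE` and `count_failures` are hypothetical names for illustration.

```python
import re
from collections import Counter

# Matches messages such as:
#   msg="Node e7b284330420 change state NodeActive --> NodeFailed"
STATE_RE = re.compile(r'Node (\w+) change state (\w+) --> (\w+)')

def count_failures(lines):
    """Count how many times each node went from NodeActive to NodeFailed."""
    failures = Counter()
    for line in lines:
        m = STATE_RE.search(line)
        if m and m.group(2) == 'NodeActive' and m.group(3) == 'NodeFailed':
            failures[m.group(1)] += 1
    return failures
```

Comparing the counts from the manager and the worker over the same time window should show which peers are flapping from each node's point of view.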

My manager node logs:

Apr 18 09:49:22 kms-mediator dockerd[25059]: time="2019-04-18T09:49:22.467787577Z" level=info msg="memberlist: Suspect e7b284330420 has failed, no acks received"
Apr 18 09:49:23 kms-mediator dockerd[25059]: time="2019-04-18T09:49:23.267884494Z" level=warning msg="memberlist: Refuting a suspect message (from: e7b284330420)"
Apr 18 09:49:25 kms-mediator dockerd[25059]: time="2019-04-18T09:49:25.467471174Z" level=warning msg="memberlist: Was able to connect to e7b284330420 but other probes failed, network may be misconfigured"
Apr 18 09:49:25 kms-mediator dockerd[25059]: time="2019-04-18T09:49:25.728220236Z" level=info msg="memberlist: Marking e7b284330420 as failed, suspect timeout reached (0 peer confirmations)"
Apr 18 09:49:25 kms-mediator dockerd[25059]: time="2019-04-18T09:49:25.728834806Z" level=info msg="Node e7b284330420/142.93.61.181, left gossip cluster"
Apr 18 09:49:25 kms-mediator dockerd[25059]: time="2019-04-18T09:49:25.729191439Z" level=info msg="Node e7b284330420 change state NodeActive --> NodeFailed"
Apr 18 09:49:25 kms-mediator dockerd[25059]: time="2019-04-18T09:49:25.731386994Z" level=info msg="Node e7b284330420/142.93.61.181, added to failed nodes list"
Apr 18 09:49:25 kms-mediator kernel: [3376954.372918] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:25 kms-mediator kernel: [3376954.481196] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:25 kms-mediator kernel: [3376954.550956] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:25 kms-mediator kernel: [3376954.622775] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:26 kms-mediator kernel: [3376954.694568] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:26 kms-mediator dockerd[25059]: time="2019-04-18T09:49:26.468973653Z" level=error msg="node: e7b284330420 is unknown to memberlist"
Apr 18 09:49:26 kms-mediator dockerd[25059]: time="2019-04-18T09:49:26.665523578Z" level=info msg="Node e7b284330420/142.93.61.181, joined gossip cluster"
Apr 18 09:49:26 kms-mediator dockerd[25059]: time="2019-04-18T09:49:26.666218716Z" level=info msg="Node e7b284330420 change state NodeFailed --> NodeActive"
Apr 18 09:49:28 kms-mediator dockerd[25059]: time="2019-04-18T09:49:28.467625225Z" level=info msg="memberlist: Suspect e7b284330420 has failed, no acks received"
Apr 18 09:49:30 kms-mediator dockerd[25059]: time="2019-04-18T09:49:30.468476025Z" level=info msg="memberlist: Suspect 3af987f41544 has failed, no acks received"
Apr 18 09:49:32 kms-mediator dockerd[25059]: time="2019-04-18T09:49:32.468450928Z" level=info msg="memberlist: Marking e7b284330420 as failed, suspect timeout reached (0 peer confirmations)"
Apr 18 09:49:32 kms-mediator dockerd[25059]: time="2019-04-18T09:49:32.469189666Z" level=info msg="Node e7b284330420/142.93.61.181, left gossip cluster"
Apr 18 09:49:32 kms-mediator dockerd[25059]: time="2019-04-18T09:49:32.469470452Z" level=info msg="Node e7b284330420 change state NodeActive --> NodeFailed"
Apr 18 09:49:32 kms-mediator dockerd[25059]: time="2019-04-18T09:49:32.470109864Z" level=info msg="Node e7b284330420/142.93.61.181, added to failed nodes list"
Apr 18 09:49:33 kms-mediator dockerd[25059]: time="2019-04-18T09:49:33.469522967Z" level=info msg="memberlist: Suspect e7b284330420 has failed, no acks received"
Apr 18 09:49:33 kms-mediator dockerd[25059]: time="2019-04-18T09:49:33.667422538Z" level=warning msg="NetworkDB stats kms-mediator(4adb11869318) - healthscore:3 (connectivity issues)"
Apr 18 09:49:34 kms-mediator dockerd[25059]: time="2019-04-18T09:49:34.469674875Z" level=info msg="memberlist: Marking 3af987f41544 as failed, suspect timeout reached (0 peer confirmations)"
Apr 18 09:49:34 kms-mediator dockerd[25059]: time="2019-04-18T09:49:34.470319061Z" level=info msg="Node 3af987f41544/134.209.118.8, left gossip cluster"
Apr 18 09:49:34 kms-mediator dockerd[25059]: time="2019-04-18T09:49:34.470637570Z" level=info msg="Node 3af987f41544 change state NodeActive --> NodeFailed"
Apr 18 09:49:34 kms-mediator dockerd[25059]: time="2019-04-18T09:49:34.472656043Z" level=info msg="Node 3af987f41544/134.209.118.8, added to failed nodes list"
Apr 18 09:49:34 kms-mediator kernel: [3376963.113830] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:34 kms-mediator kernel: [3376963.226682] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:34 kms-mediator kernel: [3376963.308286] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:36 kms-mediator dockerd[25059]: time="2019-04-18T09:49:36.869239471Z" level=warning msg="bulk sync to node e7b284330420 failed: failed to send a TCP message during bulk sync: dial tcp 142.93.61.181:7946: i/o timeout"
Apr 18 09:49:38 kms-mediator dockerd[25059]: time="2019-04-18T09:49:38.467811908Z" level=info msg="memberlist: Suspect 3af987f41544 has failed, no acks received"
Apr 18 09:49:41 kms-mediator dockerd[25059]: time="2019-04-18T09:49:41.192632861Z" level=warning msg="failed to deactivate service binding for container registry.1.xsbeeiivmethu6y39sla2rhm4" error="No such container: registry.1.xsbeeiivmethu6y39sla2rhm4" module=node/agent node.id=0ipcceidwwiwvbtt17gzm85qh
Apr 18 09:49:43 kms-mediator dockerd[25059]: time="2019-04-18T09:49:43.871978135Z" level=warning msg="memberlist: Refuting a suspect message (from: 4adb11869318)"
Apr 18 09:49:43 kms-mediator dockerd[25059]: time="2019-04-18T09:49:43.872527373Z" level=info msg="Node 3af987f41544/134.209.118.8, joined gossip cluster"
Apr 18 09:49:43 kms-mediator dockerd[25059]: time="2019-04-18T09:49:43.872804716Z" level=info msg="Node 3af987f41544 change state NodeFailed --> NodeActive"
Apr 18 09:49:50 kms-mediator dockerd[25059]: time="2019-04-18T09:49:50.467724250Z" level=warning msg="memberlist: Was able to connect to 3af987f41544 but other probes failed, network may be misconfigured"
Apr 18 09:49:54 kms-mediator dockerd[25059]: time="2019-04-18T09:49:54.268218780Z" level=warning msg="bulk sync to node 3af987f41544 failed: failed to send a TCP message during bulk sync: dial tcp 134.209.118.8:7946: i/o timeout"
Apr 18 09:49:56 kms-mediator dockerd[25059]: time="2019-04-18T09:49:56.467677675Z" level=warning msg="memberlist: Was able to connect to 3af987f41544 but other probes failed, network may be misconfigured"
Apr 18 09:49:58 kms-mediator systemd-udevd[27120]: Could not generate persistent MAC address for vethbd080e0: No such file or directory
Apr 18 09:49:58 kms-mediator kernel: [3376987.439119] veth6: renamed from veth749d935
Apr 18 09:49:58 kms-mediator kernel: [3376987.439403] device veth6 entered promiscuous mode
Apr 18 09:49:58 kms-mediator systemd-udevd[27132]: Could not generate persistent MAC address for veth168e384: No such file or directory
Apr 18 09:49:58 kms-mediator kernel: [3376987.443324] device veth12a1589 entered promiscuous mode
Apr 18 09:49:58 kms-mediator kernel: [3376987.443389] IPv6: ADDRCONF(NETDEV_UP): veth12a1589: link is not ready
Apr 18 09:49:58 kms-mediator kernel: [3376987.443393] docker_gwbridge: port 5(veth12a1589) entered forwarding state
Apr 18 09:49:58 kms-mediator kernel: [3376987.443405] docker_gwbridge: port 5(veth12a1589) entered forwarding state
Apr 18 09:49:58 kms-mediator systemd-udevd[27135]: Could not generate persistent MAC address for veth12a1589: No such file or directory
Apr 18 09:49:58 kms-mediator containerd[1449]: time="2019-04-18T09:49:58.823363801Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/8196bb6167b12c8099c5804c752b5dc37a5d72955979f78e36ed928d8bcb0859/shim.sock" debug=false pid=27142
Apr 18 09:49:58 kms-mediator kernel: [3376987.546195] IPVS: Creating netns size=2192 id=259
Apr 18 09:49:59 kms-mediator kernel: [3376987.859396] eth0: renamed from vethbd080e0
Apr 18 09:49:59 kms-mediator kernel: [3376987.859641] docker_gwbridge: port 5(veth12a1589) entered disabled state
Apr 18 09:49:59 kms-mediator kernel: [3376987.859679] br0: port 4(veth6) entered forwarding state
Apr 18 09:49:59 kms-mediator kernel: [3376987.859692] br0: port 4(veth6) entered forwarding state
Apr 18 09:49:59 kms-mediator kernel: [3376987.955523] eth1: renamed from veth168e384
Apr 18 09:49:59 kms-mediator kernel: [3376987.955779] IPv6: ADDRCONF(NETDEV_CHANGE): veth12a1589: link becomes ready
Apr 18 09:49:59 kms-mediator kernel: [3376987.955811] docker_gwbridge: port 5(veth12a1589) entered forwarding state
Apr 18 09:49:59 kms-mediator kernel: [3376987.955819] docker_gwbridge: port 5(veth12a1589) entered forwarding state
Apr 18 09:49:59 kms-mediator dockerd[25059]: time="2019-04-18T09:49:59.852392328Z" level=info msg="worker gctoo2ifeyh7q36gndsvj8wzw was successfully registered" method="(*Dispatcher).register"
Apr 18 09:50:00 kms-mediator dockerd[25059]: time="2019-04-18T09:50:00.159227736Z" level=info msg="worker mb6891ny99kiw0spipbfdt73x was successfully registered" method="(*Dispatcher).register"
Apr 18 09:50:01 kms-mediator dockerd[25059]: time="2019-04-18T09:50:01.275724633Z" level=error msg="node: e7b284330420 is unknown to memberlist"
Apr 18 09:50:01 kms-mediator dockerd[25059]: time="2019-04-18T09:50:01.466315575Z" level=info msg="Node e7b284330420/142.93.61.181, joined gossip cluster"
Apr 18 09:50:01 kms-mediator dockerd[25059]: time="2019-04-18T09:50:01.467024727Z" level=info msg="Node e7b284330420 change state NodeFailed --> NodeActive"
Apr 18 09:50:01 kms-mediator dockerd[25059]: time="2019-04-18T09:50:01.468190934Z" level=warning msg="memberlist: Was able to connect to 3af987f41544 but other probes failed, network may be misconfigured"
Apr 18 09:50:04 kms-mediator dockerd[25059]: time="2019-04-18T09:50:04.468612334Z" level=warning msg="memberlist: Was able to connect to 3af987f41544 but other probes failed, network may be misconfigured"
Apr 18 09:50:14 kms-mediator kernel: [3377002.874676] br0: port 4(veth6) entered forwarding state
Apr 18 09:50:14 kms-mediator kernel: [3377003.002697] docker_gwbridge: port 5(veth12a1589) entered forwarding state
Apr 18 09:50:31 kms-mediator dockerd[25059]: time="2019-04-18T09:50:31.668648717Z" level=error msg="Bulk sync to node e7b284330420 timed out"
Apr 18 09:50:43 kms-mediator dockerd[25059]: time="2019-04-18T09:50:43.469005548Z" level=error msg="Bulk sync to node 3af987f41544 timed out"

My worker node logs:

Apr 18 09:49:21 janus dockerd[1507]: time="2019-04-18T09:49:21.265113686Z" level=warning msg="memberlist: Was able to connect to 3af987f41544 but other probes failed, network may be misconfigured"
Apr 18 09:49:22 janus dockerd[1507]: time="2019-04-18T09:49:22.266022948Z" level=warning msg="memberlist: Was able to connect to 3af987f41544 but other probes failed, network may be misconfigured"
Apr 18 09:49:23 janus dockerd[1507]: time="2019-04-18T09:49:23.266784400Z" level=info msg="memberlist: Suspect 4adb11869318 has failed, no acks received"
Apr 18 09:49:25 janus dockerd[1507]: time="2019-04-18T09:49:25.139673348Z" level=error msg="heartbeat to manager {0ipcceidwwiwvbtt17gzm85qh 157.230.233.54:2377} failed" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" method="(*session).heartbeat" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw session.id=ujzh9mwhqrsdwvhd59nx98pg7 sessionID=ujzh9mwhqrsdwvhd59nx98pg7
Apr 18 09:49:25 janus dockerd[1507]: time="2019-04-18T09:49:25.140357481Z" level=error msg="agent: session failed" backoff=100ms error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:25 janus dockerd[1507]: time="2019-04-18T09:49:25.140808542Z" level=info msg="parsed scheme: \"\"" module=grpc
Apr 18 09:49:25 janus dockerd[1507]: time="2019-04-18T09:49:25.141096434Z" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Apr 18 09:49:25 janus dockerd[1507]: time="2019-04-18T09:49:25.141603122Z" level=info msg="manager selected by agent for new session: {0ipcceidwwiwvbtt17gzm85qh 157.230.233.54:2377}" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:25 janus dockerd[1507]: time="2019-04-18T09:49:25.141734630Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{157.230.233.54:2377 0  <nil>}]" module=grpc
Apr 18 09:49:25 janus dockerd[1507]: time="2019-04-18T09:49:25.141992006Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 18 09:49:25 janus dockerd[1507]: time="2019-04-18T09:49:25.142073678Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420169ca0, CONNECTING" module=grpc
Apr 18 09:49:25 janus dockerd[1507]: time="2019-04-18T09:49:25.141953965Z" level=info msg="waiting 99.356175ms before registering session" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:25 janus dockerd[1507]: time="2019-04-18T09:49:25.267614963Z" level=warning msg="memberlist: Was able to connect to 4adb11869318 but other probes failed, network may be misconfigured"
Apr 18 09:49:26 janus dockerd[1507]: time="2019-04-18T09:49:26.267901488Z" level=info msg="memberlist: Suspect 3af987f41544 has failed, no acks received"
Apr 18 09:49:26 janus dockerd[1507]: time="2019-04-18T09:49:26.468992358Z" level=warning msg="memberlist: Refuting a suspect message (from: e7b284330420)"
Apr 18 09:49:28 janus dockerd[1507]: time="2019-04-18T09:49:28.268244800Z" level=warning msg="memberlist: Was able to connect to 3af987f41544 but other probes failed, network may be misconfigured"
Apr 18 09:49:30 janus dockerd[1507]: time="2019-04-18T09:49:30.241756224Z" level=error msg="agent: session failed" backoff=300ms error="session initiation timed out" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:30 janus dockerd[1507]: time="2019-04-18T09:49:30.241890984Z" level=info msg="parsed scheme: \"\"" module=grpc
Apr 18 09:49:30 janus dockerd[1507]: time="2019-04-18T09:49:30.241927218Z" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Apr 18 09:49:30 janus dockerd[1507]: time="2019-04-18T09:49:30.242272705Z" level=info msg="manager selected by agent for new session: {0ipcceidwwiwvbtt17gzm85qh 157.230.233.54:2377}" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:30 janus dockerd[1507]: time="2019-04-18T09:49:30.242317533Z" level=info msg="waiting 244.443604ms before registering session" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:30 janus dockerd[1507]: time="2019-04-18T09:49:30.242383708Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{157.230.233.54:2377 0  <nil>}]" module=grpc
Apr 18 09:49:30 janus dockerd[1507]: time="2019-04-18T09:49:30.242438990Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 18 09:49:30 janus dockerd[1507]: time="2019-04-18T09:49:30.242495184Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420169110, CONNECTING" module=grpc
Apr 18 09:49:30 janus dockerd[1507]: time="2019-04-18T09:49:30.268553787Z" level=info msg="memberlist: Suspect 4adb11869318 has failed, no acks received"
Apr 18 09:49:33 janus dockerd[1507]: time="2019-04-18T09:49:33.268914127Z" level=info msg="memberlist: Suspect 4adb11869318 has failed, no acks received"
Apr 18 09:49:34 janus dockerd[1507]: time="2019-04-18T09:49:34.268903472Z" level=info msg="memberlist: Marking 4adb11869318 as failed, suspect timeout reached (0 peer confirmations)"
Apr 18 09:49:34 janus dockerd[1507]: time="2019-04-18T09:49:34.268993067Z" level=info msg="Node 4adb11869318/157.230.233.54, left gossip cluster"
Apr 18 09:49:34 janus dockerd[1507]: time="2019-04-18T09:49:34.269028860Z" level=info msg="Node 4adb11869318 change state NodeActive --> NodeFailed"
Apr 18 09:49:34 janus dockerd[1507]: time="2019-04-18T09:49:34.269333679Z" level=info msg="Node 4adb11869318/157.230.233.54, added to failed nodes list"
Apr 18 09:49:34 janus kernel: [3504251.394876] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:34 janus kernel: [3504251.495917] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:34 janus kernel: [3504251.575377] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:34 janus kernel: [3504251.659992] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:35 janus dockerd[1507]: time="2019-04-18T09:49:35.487208278Z" level=error msg="agent: session failed" backoff=700ms error="session initiation timed out" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:35 janus dockerd[1507]: time="2019-04-18T09:49:35.487868265Z" level=info msg="parsed scheme: \"\"" module=grpc
Apr 18 09:49:35 janus dockerd[1507]: time="2019-04-18T09:49:35.488186744Z" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Apr 18 09:49:35 janus dockerd[1507]: time="2019-04-18T09:49:35.488704776Z" level=info msg="manager selected by agent for new session: {0ipcceidwwiwvbtt17gzm85qh 157.230.233.54:2377}" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:35 janus dockerd[1507]: time="2019-04-18T09:49:35.489027214Z" level=info msg="waiting 365.409385ms before registering session" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:35 janus dockerd[1507]: time="2019-04-18T09:49:35.488844574Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{157.230.233.54:2377 0  <nil>}]" module=grpc
Apr 18 09:49:35 janus dockerd[1507]: time="2019-04-18T09:49:35.489582889Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 18 09:49:35 janus dockerd[1507]: time="2019-04-18T09:49:35.489939617Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc421378040, CONNECTING" module=grpc
Apr 18 09:49:37 janus dockerd[1507]: time="2019-04-18T09:49:37.269649164Z" level=info msg="memberlist: Suspect 3af987f41544 has failed, no acks received"
Apr 18 09:49:38 janus do-agent[1404]: 2019/04/18 09:49:38 Sending metrics to DigitalOcean: Post https://nyc1.sonar.digitalocean.com/v1/metrics/droplet_id/135362746: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Apr 18 09:49:39 janus dockerd[1507]: time="2019-04-18T09:49:39.438069338Z" level=warning msg="memberlist: Push/Pull with 4adb11869318 failed: dial tcp 157.230.233.54:7946: i/o timeout"
Apr 18 09:49:40 janus dockerd[1507]: time="2019-04-18T09:49:40.854892764Z" level=error msg="agent: session failed" backoff=1.5s error="session initiation timed out" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:40 janus dockerd[1507]: time="2019-04-18T09:49:40.855429738Z" level=info msg="parsed scheme: \"\"" module=grpc
Apr 18 09:49:40 janus dockerd[1507]: time="2019-04-18T09:49:40.855681824Z" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Apr 18 09:49:40 janus dockerd[1507]: time="2019-04-18T09:49:40.856119073Z" level=info msg="manager selected by agent for new session: {0ipcceidwwiwvbtt17gzm85qh 157.230.233.54:2377}" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:40 janus dockerd[1507]: time="2019-04-18T09:49:40.856362886Z" level=info msg="waiting 432.166258ms before registering session" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:40 janus dockerd[1507]: time="2019-04-18T09:49:40.856255184Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{157.230.233.54:2377 0  <nil>}]" module=grpc
Apr 18 09:49:40 janus dockerd[1507]: time="2019-04-18T09:49:40.856800015Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 18 09:49:40 janus dockerd[1507]: time="2019-04-18T09:49:40.857067979Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc421378a70, CONNECTING" module=grpc
Apr 18 09:49:41 janus dockerd[1507]: time="2019-04-18T09:49:41.269968071Z" level=info msg="memberlist: Marking 3af987f41544 as failed, suspect timeout reached (0 peer confirmations)"
Apr 18 09:49:41 janus dockerd[1507]: time="2019-04-18T09:49:41.270588789Z" level=info msg="Node 3af987f41544/134.209.118.8, left gossip cluster"
Apr 18 09:49:41 janus dockerd[1507]: time="2019-04-18T09:49:41.270947436Z" level=info msg="Node 3af987f41544 change state NodeActive --> NodeFailed"
Apr 18 09:49:41 janus dockerd[1507]: time="2019-04-18T09:49:41.271527701Z" level=info msg="Node 3af987f41544/134.209.118.8, added to failed nodes list"
Apr 18 09:49:41 janus kernel: [3504258.396729] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:41 janus kernel: [3504258.486638] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:41 janus kernel: [3504258.575787] IPVS: __ip_vs_del_service: enter
Apr 18 09:49:42 janus dockerd[1507]: time="2019-04-18T09:49:42.270019328Z" level=info msg="memberlist: Suspect 3af987f41544 has failed, no acks received"
Apr 18 09:49:45 janus dockerd[1507]: time="2019-04-18T09:49:45.142356141Z" level=warning msg="Failed to dial 157.230.233.54:2377: grpc: the connection is closing; please retry." module=grpc
Apr 18 09:49:46 janus dockerd[1507]: time="2019-04-18T09:49:46.288940894Z" level=error msg="agent: session failed" backoff=3.1s error="session initiation timed out" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:46 janus dockerd[1507]: time="2019-04-18T09:49:46.289051117Z" level=info msg="parsed scheme: \"\"" module=grpc
Apr 18 09:49:46 janus dockerd[1507]: time="2019-04-18T09:49:46.289067940Z" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Apr 18 09:49:46 janus dockerd[1507]: time="2019-04-18T09:49:46.289289871Z" level=info msg="manager selected by agent for new session: {0ipcceidwwiwvbtt17gzm85qh 157.230.233.54:2377}" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:46 janus dockerd[1507]: time="2019-04-18T09:49:46.289332266Z" level=info msg="waiting 2.506216736s before registering session" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:46 janus dockerd[1507]: time="2019-04-18T09:49:46.289365687Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{157.230.233.54:2377 0  <nil>}]" module=grpc
Apr 18 09:49:46 janus dockerd[1507]: time="2019-04-18T09:49:46.289382561Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 18 09:49:46 janus dockerd[1507]: time="2019-04-18T09:49:46.289446337Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420d15910, CONNECTING" module=grpc
Apr 18 09:49:50 janus dockerd[1507]: time="2019-04-18T09:49:50.242871855Z" level=warning msg="Failed to dial 157.230.233.54:2377: grpc: the connection is closing; please retry." module=grpc
Apr 18 09:49:53 janus dockerd[1507]: time="2019-04-18T09:49:53.796030247Z" level=error msg="agent: session failed" backoff=6.3s error="session initiation timed out" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:53 janus dockerd[1507]: time="2019-04-18T09:49:53.796147710Z" level=info msg="parsed scheme: \"\"" module=grpc
Apr 18 09:49:53 janus dockerd[1507]: time="2019-04-18T09:49:53.796164114Z" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Apr 18 09:49:53 janus dockerd[1507]: time="2019-04-18T09:49:53.796396647Z" level=info msg="manager selected by agent for new session: {0ipcceidwwiwvbtt17gzm85qh 157.230.233.54:2377}" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:53 janus dockerd[1507]: time="2019-04-18T09:49:53.796442884Z" level=info msg="waiting 5.029222702s before registering session" module=node/agent node.id=gctoo2ifeyh7q36gndsvj8wzw
Apr 18 09:49:53 janus dockerd[1507]: time="2019-04-18T09:49:53.796478551Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{157.230.233.54:2377 0  <nil>}]" module=grpc
Apr 18 09:49:53 janus dockerd[1507]: time="2019-04-18T09:49:53.796497536Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 18 09:49:53 janus dockerd[1507]: time="2019-04-18T09:49:53.796563651Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4214ed5b0, CONNECTING" module=grpc
Apr 18 09:49:55 janus dockerd[1507]: time="2019-04-18T09:49:55.490382209Z" level=warning msg="Failed to dial 157.230.233.54:2377: grpc: the connection is closing; please retry." module=grpc
Apr 18 09:49:59 janus dockerd[1507]: time="2019-04-18T09:49:59.265377050Z" level=error msg="Failed to join memberlist [157.230.233.54] on retry: 1 error(s) occurred:\n\n* Failed to join 157.230.233.54: dial tcp 157.230.233.54:7946: i/o timeout"
Apr 18 09:49:59 janus dockerd[1507]: time="2019-04-18T09:49:59.801429986Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4214ed5b0, READY" module=grpc
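
A note on reading the log above: the backoff= values on the "agent: session failed" lines (300ms, 700ms, 1.5s, 3.1s, 6.3s) are consistent with a rule of the form next = 100ms + 2 x previous, and each "waiting ... before registering session" delay is smaller than the backoff reported just before it. The sketch below only reproduces that observed sequence. The function name, the 100ms starting value and the optional cap are assumptions made for illustration; they are not taken from the Docker or swarmkit source.

```python
from typing import List, Optional


def backoff_sequence(
    steps: int, initial_ms: int = 100, cap_ms: Optional[int] = None
) -> List[int]:
    """Return `steps` backoff values in milliseconds.

    Assumed rule (inferred from the log, not from source code):
        next = initial_ms + 2 * previous
    `cap_ms` is a hypothetical upper bound; None means no cap is applied.
    """
    values = []
    current = initial_ms
    for _ in range(steps):
        values.append(current)
        current = initial_ms + 2 * current
        if cap_ms is not None:
            current = min(current, cap_ms)
    return values
```

Calling backoff_sequence(6) yields 100, 300, 700, 1500, 3100 and 6300 ms, which lines up with the values logged between 09:49:30 and 09:49:53.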

docker version

Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        774a1f4
 Built:             Thu Feb 28 06:40:58 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 05:59:55 2019
  OS/Arch:          linux/amd64
  Experimental:     false

docker info

Containers: 4
 Running: 4
 Paused: 0
 Stopped: 0
Images: 75
Server Version: 18.09.3
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: 0ipcceidwwiwvbtt17gzm85qh
 Is Manager: true
 ClusterID: plnokvrq4yq7dqi9js0x5wqdd
 Managers: 1
 Nodes: 3
 Default Address Pool: 10.0.0.0/8  
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 157.230.233.54
 Manager Addresses:
  157.230.233.54:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-142-generic
Operating System: Ubuntu 16.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.953GiB
Name: kms-mediator
ID: YWTG:TJEH:IYMA:UKAE:ZBOE:ODEB:3IX6:GWTG:ILHM:Q3J7:53PJ:5EBV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 provider=digitalocean
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

Environment: Both nodes are separate DigitalOcean droplets
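
Since the log shows "dial tcp 157.230.233.54:7946: i/o timeout" and repeated failures to dial port 2377, one thing that could be checked from the affected node is plain TCP reachability of the manager on those two ports. The following is a minimal sketch, not a diagnosis tool: the function names are made up for this example, and it only covers TCP (Swarm also uses 7946/udp and 4789/udp, which this does not test).

```python
import socket
from typing import Dict, Iterable


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_swarm_tcp_ports(
    host: str, ports: Iterable[int] = (2377, 7946), timeout: float = 3.0
) -> Dict[int, bool]:
    """Check the TCP ports seen in the log (2377 cluster management, 7946 gossip)."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}
```

For example, check_swarm_tcp_ports("157.230.233.54") run on the worker during an incident would show whether either port is timing out at the TCP level. A successful result outside an incident does not rule out intermittent packet loss.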

@richardlowenthal

I'm facing exactly the same. Any updates about this post?

@r4fek

r4fek commented Oct 3, 2019

Same here!

@majiajue

I'm facing exactly the same. Any updates about this post?

@ra-coder

Same with 19.03.5

ra@barn-01:~$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 35
  Running: 7
  Paused: 0
  Stopped: 28
 Images: 9
 Server Version: 19.03.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: 952iedkyjkv6up55rq7i64pc3
  Is Manager: false
  Node Address: 91.237.249.65
  Manager Addresses:
   141.105.66.236:2377
   92.53.64.188:2377
   95.213.131.210:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-29-generic
 Operating System: Ubuntu 18.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 24
 Total Memory: 62.9GiB
 Name: barn-01
 ID: 4QQ4:TWLU:LOGX:USR7:BF3X:67HF:JOAG:NYQC:JLBK:NHE3:JBTK:HOT4
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Next, the output of journalctl -u docker:

sudo journalctl -u docker | tail -n 300
5cf8ab9ebe0734e626adb5928019a56d66df07997aeb91f96/mounts/shm, flags: 0x2: no such file or directory"
Jan 17 07:25:01 barn-01 dockerd[1649]: time="2020-01-17T07:25:01.507433502-05:00" level=info msg="NetworkDB stats barn-01(96a9a06d3105) - netID:ih3ydbug5d7g4k7wvvqmt5a09 leaving:false netPeers:7 entries:94 Queue qLen:0 netMsg/s:0"
Jan 17 07:25:01 barn-01 dockerd[1649]: time="2020-01-17T07:25:01.507551647-05:00" level=info msg="NetworkDB stats barn-01(96a9a06d3105) - netID:n58omezmixa5vi6z33v4js5l2 leaving:false netPeers:4 entries:40 Queue qLen:0 netMsg/s:0"
Jan 17 07:25:01 barn-01 dockerd[1649]: time="2020-01-17T07:25:01.507585243-05:00" level=info msg="NetworkDB stats barn-01(96a9a06d3105) - netID:myn3onq0xgdgenfc5i7zhm7ai leaving:false netPeers:11 entries:32 Queue qLen:0 netMsg/s:0"
Jan 17 07:25:01 barn-01 dockerd[1649]: time="2020-01-17T07:25:01.507619868-05:00" level=info msg="NetworkDB stats barn-01(96a9a06d3105) - netID:k37odopbgoyz9cpv3uilp1h1c leaving:false netPeers:11 entries:97 Queue qLen:0 netMsg/s:0"
Jan 17 07:30:01 barn-01 dockerd[1649]: time="2020-01-17T07:30:01.707435330-05:00" level=info msg="NetworkDB stats barn-01(96a9a06d3105) - netID:n58omezmixa5vi6z33v4js5l2 leaving:false netPeers:4 entries:40 Queue qLen:0 netMsg/s:0"
Jan 17 07:30:01 barn-01 dockerd[1649]: time="2020-01-17T07:30:01.707538143-05:00" level=info msg="NetworkDB stats barn-01(96a9a06d3105) - netID:myn3onq0xgdgenfc5i7zhm7ai leaving:false netPeers:11 entries:32 Queue qLen:0 netMsg/s:0"
Jan 17 07:30:01 barn-01 dockerd[1649]: time="2020-01-17T07:30:01.707576166-05:00" level=info msg="NetworkDB stats barn-01(96a9a06d3105) - netID:k37odopbgoyz9cpv3uilp1h1c leaving:false netPeers:11 entries:97 Queue qLen:0 netMsg/s:0"
Jan 17 07:30:01 barn-01 dockerd[1649]: time="2020-01-17T07:30:01.707616499-05:00" level=info msg="NetworkDB stats barn-01(96a9a06d3105) - netID:ih3ydbug5d7g4k7wvvqmt5a09 leaving:false netPeers:7 entries:94 Queue qLen:0 netMsg/s:0"
Jan 17 07:33:01 barn-01 dockerd[1649]: time="2020-01-17T07:33:01.107712331-05:00" level=info msg="memberlist: Suspect db085bc444b4 has failed, no acks received"
Jan 17 07:33:05 barn-01 dockerd[1649]: time="2020-01-17T07:33:05.609730222-05:00" level=warning msg="memberlist: Refuting a suspect message (from: 716cf5e6e16c)"
Jan 17 07:33:07 barn-01 dockerd[1649]: time="2020-01-17T07:33:07.107782488-05:00" level=info msg="memberlist: Suspect db085bc444b4 has failed, no acks received"
Jan 17 07:33:07 barn-01 dockerd[1649]: time="2020-01-17T07:33:07.671044727-05:00" level=error msg="heartbeat to manager {l6dndjoram0ptqsf370oe4njw 92.53.64.188:2377} failed" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" method="(*session).heartbeat" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3 session.id=wf4j975pmf9vmfq1hz2gviz3p sessionID=wf4j975pmf9vmfq1hz2gviz3p
Jan 17 07:33:07 barn-01 dockerd[1649]: time="2020-01-17T07:33:07.671139870-05:00" level=error msg="agent: session failed" backoff=100ms error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3
Jan 17 07:33:07 barn-01 dockerd[1649]: time="2020-01-17T07:33:07.671241477-05:00" level=info msg="parsed scheme: \"\"" module=grpc
Jan 17 07:33:07 barn-01 dockerd[1649]: time="2020-01-17T07:33:07.671264789-05:00" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Jan 17 07:33:07 barn-01 dockerd[1649]: time="2020-01-17T07:33:07.671492230-05:00" level=error msg="closing session after fatal error" error="rpc error: code = Unavailable desc = transport is closing" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3
Jan 17 07:33:07 barn-01 dockerd[1649]: time="2020-01-17T07:33:07.671563407-05:00" level=error msg="status reporter failed to report status to agent" error="rpc error: code = Unavailable desc = transport is closing" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3
Jan 17 07:33:07 barn-01 dockerd[1649]: time="2020-01-17T07:33:07.671611970-05:00" level=info msg="ccResolverWrapper: sending update to cc: {[{141.105.66.236:2377 0  <nil>}] <nil>}" module=grpc
Jan 17 07:33:07 barn-01 dockerd[1649]: time="2020-01-17T07:33:07.671655126-05:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jan 17 07:33:07 barn-01 dockerd[1649]: time="2020-01-17T07:33:07.671686536-05:00" level=info msg="manager selected by agent for new session: {yz8061f18re1xpzlalej82t61 141.105.66.236:2377}" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3
Jan 17 07:33:07 barn-01 dockerd[1649]: time="2020-01-17T07:33:07.671735316-05:00" level=info msg="waiting 13.354896ms before registering session" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3
Jan 17 07:33:08 barn-01 dockerd[1649]: time="2020-01-17T07:33:08.827949420-05:00" level=warning msg="memberlist: Refuting a suspect message (from: 00fa0d1c7b7a)"
Jan 17 07:33:12 barn-01 dockerd[1649]: time="2020-01-17T07:33:12.107705121-05:00" level=info msg="memberlist: Suspect 716cf5e6e16c has failed, no acks received"
Jan 17 07:33:12 barn-01 dockerd[1649]: time="2020-01-17T07:33:12.685455047-05:00" level=error msg="agent: session failed" backoff=300ms error="session initiation timed out" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3
Jan 17 07:33:12 barn-01 dockerd[1649]: time="2020-01-17T07:33:12.685587098-05:00" level=info msg="parsed scheme: \"\"" module=grpc
Jan 17 07:33:12 barn-01 dockerd[1649]: time="2020-01-17T07:33:12.685613829-05:00" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Jan 17 07:33:12 barn-01 dockerd[1649]: time="2020-01-17T07:33:12.685861520-05:00" level=info msg="ccResolverWrapper: sending update to cc: {[{95.213.131.210:2377 0  <nil>}] <nil>}" module=grpc
Jan 17 07:33:12 barn-01 dockerd[1649]: time="2020-01-17T07:33:12.685890806-05:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jan 17 07:33:12 barn-01 dockerd[1649]: time="2020-01-17T07:33:12.685932112-05:00" level=info msg="manager selected by agent for new session: {cc8p2g9w23yftc4py6rozjkie 95.213.131.210:2377}" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3
Jan 17 07:33:12 barn-01 dockerd[1649]: time="2020-01-17T07:33:12.685998483-05:00" level=info msg="waiting 272.632534ms before registering session" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3
Jan 17 07:33:13 barn-01 dockerd[1649]: time="2020-01-17T07:33:13.497594295-05:00" level=warning msg="memberlist: Push/Pull with 0ba868292d66 failed: dial tcp 92.53.64.188:7946: i/o timeout"
Jan 17 07:33:15 barn-01 dockerd[1649]: time="2020-01-17T07:33:15.136423582-05:00" level=warning msg="bulk sync to node db085bc444b4 failed: failed to send a TCP message during bulk sync: dial tcp 141.105.66.235:7946: i/o timeout"
Jan 17 07:33:15 barn-01 dockerd[1649]: time="2020-01-17T07:33:15.228102529-05:00" level=warning msg="memberlist: Refuting a suspect message (from: 5f209156808d)"
Jan 17 07:33:17 barn-01 dockerd[1649]: time="2020-01-17T07:33:17.958994315-05:00" level=error msg="agent: session failed" backoff=700ms error="session initiation timed out" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3
Jan 17 07:33:17 barn-01 dockerd[1649]: time="2020-01-17T07:33:17.959115185-05:00" level=info msg="parsed scheme: \"\"" module=grpc
Jan 17 07:33:17 barn-01 dockerd[1649]: time="2020-01-17T07:33:17.959138286-05:00" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Jan 17 07:33:17 barn-01 dockerd[1649]: time="2020-01-17T07:33:17.959452254-05:00" level=info msg="ccResolverWrapper: sending update to cc: {[{95.213.131.210:2377 0  <nil>}] <nil>}" module=grpc
Jan 17 07:33:17 barn-01 dockerd[1649]: time="2020-01-17T07:33:17.959482862-05:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jan 17 07:33:17 barn-01 dockerd[1649]: time="2020-01-17T07:33:17.959520415-05:00" level=info msg="manager selected by agent for new session: {cc8p2g9w23yftc4py6rozjkie 95.213.131.210:2377}" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3
Jan 17 07:33:17 barn-01 dockerd[1649]: time="2020-01-17T07:33:17.959575233-05:00" level=info msg="waiting 126.934663ms before registering session" module=node/agent node.id=952iedkyjkv6up55rq7i64pc3
Jan 17 07:33:19 barn-01 dockerd[1649]: time="2020-01-17T07:33:19.107629788-05:00" level=info msg="memberlist: Suspect 467fcd225449 has failed, no acks received"
Jan 17 07:33:21 barn-01 dockerd[1649]: time="2020-01-17T07:33:21.460283781-05:00" level=warning msg="Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."
Jan 17 07:33:21 barn-01 dockerd[1649]: time="2020-01-17T07:33:21.461343535-05:00" level=warning msg="Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."
Jan 17 07:33:21 barn-01 dockerd[1649]: time="2020-01-17T07:33:21.462330071-05:00" level=warning msg="Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."
Jan 17 07:33:21 barn-01 dockerd[1649]: time="2020-01-17T07:33:21.462539486-05:00" level=warning msg="Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."
Jan 17 07:33:21 barn-01 dockerd[1649]: time="2020-01-17T07:33:21.464766403-05:00" level=warning msg="Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."
Jan 17 07:33:23 barn-01 dockerd[1649]: time="2020-01-17T07:33:23.044068020-05:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {141.105.66.236:2377 0  <nil>}. Err :connection error: desc = \"transport: authentication handshake failed: context canceled\". Reconnecting..." module=grpc
Jan 17 07:33:23 barn-01 dockerd[1649]: time="2020-01-17T07:33:23.107834250-05:00" level=warning msg="memberlist: Was able to connect to 716cf5e6e16c but other probes failed, network may be misconfigured"
Jan 17 07:33:24 barn-01 dockerd[1649]: time="2020-01-17T07:33:24.693692492-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 17 07:33:24 barn-01 dockerd[1649]: time="2020-01-17T07:33:24.693750727-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 17 07:33:24 barn-01 dockerd[1649]: time="2020-01-17T07:33:24.694315305-05:00" level=warning msg="rmServiceBinding 257952125e0a924c4f8e23a207dd0522fa07bd03f059efa2965263678e509072 possible transient state ok:false entries:0 set:false "
Jan 17 07:33:24 barn-01 dockerd[1649]: time="2020-01-17T07:33:24.694623171-05:00" level=warning msg="rmServiceBinding ee482c1172bd8b9416b7acbe90f31214b8a35621abf57d8609f6727509604465 possible transient state ok:false entries:0 set:false "
Jan 17 07:33:25 barn-01 dockerd[1649]: time="2020-01-17T07:33:25.266772275-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 17 07:33:25 barn-01 dockerd[1649]: time="2020-01-17T07:33:25.276195904-05:00" level=warning msg="1a33343e051528b55a14d8441d014223cd21a2652c67f19fa386fe14d66dd54e cleanup: failed to unmount IPC: umount /var/lib/docker/containers/1a33343e051528b55a14d8441d014223cd21a2652c67f19fa386fe14d66dd54e/mounts/shm, flags: 0x2: no such file or directory"
Jan 17 07:33:25 barn-01 dockerd[1649]: time="2020-01-17T07:33:25.276371349-05:00" level=warning msg="rmServiceBinding 81c476110f81553fec3702b74ba5d6ba29f6eff87950e2ac0a5e786a303f65f6 possible transient state ok:false entries:0 set:false "
Jan 17 07:33:25 barn-01 dockerd[1649]: time="2020-01-17T07:33:25.314303935-05:00" level=warning msg="9b080faa0dadeb91f327168940788437061bedb80076ffe1496c85f64c97b4f0 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/9b080faa0dadeb91f327168940788437061bedb80076ffe1496c85f64c97b4f0/mounts/shm, flags: 0x2: no such file or directory"
Jan 17 07:33:25 barn-01 dockerd[1649]: time="2020-01-17T07:33:25.804910336-05:00" level=warning msg="8ab7fca1d3682259679e8641c76577abb9ce76619af08117be80c54d3a29f8cc cleanup: failed to unmount IPC: umount /var/lib/docker/containers/8ab7fca1d3682259679e8641c76577abb9ce76619af08117be80c54d3a29f8cc/mounts/shm, flags: 0x2: no such file or directory"
Jan 17 07:33:26 barn-01 dockerd[1649]: time="2020-01-17T07:33:26.091922015-05:00" level=info msg="memberlist: Marking db085bc444b4 as failed, suspect timeout reached (0 peer confirmations)"
Jan 17 07:33:26 barn-01 dockerd[1649]: time="2020-01-17T07:33:26.092006471-05:00" level=info msg="Node db085bc444b4/141.105.66.235, left gossip cluster"
Jan 17 07:33:26 barn-01 dockerd[1649]: time="2020-01-17T07:33:26.092041215-05:00" level=info msg="Node db085bc444b4 change state NodeActive --> NodeFailed"
Jan 17 07:33:26 barn-01 dockerd[1649]: time="2020-01-17T07:33:26.111476093-05:00" level=info msg="Node db085bc444b4/141.105.66.235, added to failed nodes list"
Jan 17 07:33:26 barn-01 dockerd[1649]: time="2020-01-17T07:33:26.116941008-05:00" level=info msg="Node db085bc444b4/141.105.66.235, joined gossip cluster"
Jan 17 07:33:26 barn-01 dockerd[1649]: time="2020-01-17T07:33:26.117005091-05:00" level=info msg="Node db085bc444b4 change state NodeFailed --> NodeActive"
Jan 17 07:33:27 barn-01 dockerd[1649]: time="2020-01-17T07:33:27.462135219-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 17 07:33:27 barn-01 dockerd[1649]: time="2020-01-17T07:33:27.462654462-05:00" level=warning msg="rmServiceBinding d4329216ff15081bb7ab151fe061f6ab35f08b88dd653295173ead2c4350a2a4 possible transient state ok:false entries:0 set:false "
Jan 17 07:33:28 barn-01 dockerd[1649]: time="2020-01-17T07:33:28.172762016-05:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {95.213.131.210:2377 0  <nil>}. Err :connection error: desc = \"transport: authentication handshake failed: context canceled\". Reconnecting..." module=grpc
Jan 17 07:33:28 barn-01 dockerd[1649]: time="2020-01-17T07:33:28.449845943-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 17 07:33:28 barn-01 dockerd[1649]: time="2020-01-17T07:33:28.450414058-05:00" level=warning msg="rmServiceBinding 009c556201cbd8c65b67c6ae585979a9b14c629a3b30dfa0829750bf177d5b82 possible transient state ok:false entries:0 set:false "
Jan 17 07:33:28 barn-01 dockerd[1649]: time="2020-01-17T07:33:28.776560198-05:00" level=warning msg="f0bcccb2a972344c7a4ec813eaf833a289004ef27461d6e2656b1840a7de594b cleanup: failed to unmount IPC: umount /var/lib/docker/containers/f0bcccb2a972344c7a4ec813eaf833a289004ef27461d6e2656b1840a7de594b/mounts/shm, flags: 0x2: no such file or directory"
Jan 17 07:33:29 barn-01 dockerd[1649]: time="2020-01-17T07:33:29.777069894-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 17 07:33:29 barn-01 dockerd[1649]: time="2020-01-17T07:33:29.777665497-05:00" level=warning msg="rmServiceBinding c8d9a263dd32cf987cdefc99e654e798a3a88af763c266d0c3031c28aee8849e possible transient state ok:false entries:0 set:false "
Jan 17 07:33:30 barn-01 dockerd[1649]: time="2020-01-17T07:33:30.016506648-05:00" level=warning msg="5294e5d15177bad7052ff39e0961a10baf7950647d020ec43e80e9739c040ac1 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/5294e5d15177bad7052ff39e0961a10baf7950647d020ec43e80e9739c040ac1/mounts/shm, flags: 0x2: no such file or directory"
Jan 17 07:33:30 barn-01 dockerd[1649]: time="2020-01-17T07:33:30.180382787-05:00" level=warning msg="5d2e1154ce813220c184df210586a0232954d485367eeaa95112899660532d6a cleanup: failed to unmount IPC: umount /var/lib/docker/containers/5d2e1154ce813220c184df210586a0232954d485367eeaa95112899660532d6a/mounts/shm, flags: 0x2: no such file or directory"
Jan 17 07:33:34 barn-01 dockerd[1649]: time="2020-01-17T07:33:34.526927577-05:00" level=info msg="Container fdff305d583d37fb6d9200611505c75c9c98206e826c807789de53a4755be77a failed to exit within 10 seconds of signal 15 - using the force"
Jan 17 07:33:34 barn-01 dockerd[1649]: time="2020-01-17T07:33:34.755291952-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 17 07:33:34 barn-01 dockerd[1649]: time="2020-01-17T07:33:34.755934328-05:00" level=warning msg="rmServiceBinding 5b3a989eae2b223b0bfe9bdd55260ad0d88c12e5ad02324c7066da31c9736485 possible transient state ok:false entries:0 set:false "
Jan 17 07:33:35 barn-01 dockerd[1649]: time="2020-01-17T07:33:35.200673173-05:00" level=warning msg="fdff305d583d37fb6d9200611505c75c9c98206e826c807789de53a4755be77a cleanup: failed to unmount IPC: umount /var/lib/docker/containers/fdff305d583d37fb6d9200611505c75c9c98206e826c807789de53a4755be77a/mounts/shm, flags: 0x2: no such file or directory"
Jan 17 07:34:28 barn-01 dockerd[1649]: time="2020-01-17T07:34:28.867276952-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 17 07:34:29 barn-01 dockerd[1649]: time="2020-01-17T07:34:29.225217183-05:00" level=warning msg="1054fa9aeb0caabc52a3b6a786e0df72213957df6d1414aae7425b7504e22666 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/1054fa9aeb0caabc52a3b6a786e0df72213957df6d1414aae7425b7504e22666/mounts/shm, flags: 0x2: no such file or directory"
Jan 17 07:35:01 barn-01 dockerd[1649]: time="2020-01-17T07:35:01.907441032-05:00" level=info msg="NetworkDB stats barn-01(96a9a06d3105) - netID:ih3ydbug5d7g4k7wvvqmt5a09 leaving:false netPeers:7 entries:98 Queue qLen:0 netMsg/s:0"
Jan 17 07:35:01 barn-01 dockerd[1649]: time="2020-01-17T07:35:01.907520030-05:00" level=info msg="NetworkDB stats barn-01(96a9a06d3105) - netID:n58omezmixa5vi6z33v4js5l2 leaving:false netPeers:4 entries:46 Queue qLen:0 netMsg/s:0"
Jan 17 07:35:01 barn-01 dockerd[1649]: time="2020-01-17T07:35:01.907555368-05:00" level=info msg="NetworkDB stats barn-01(96a9a06d3105) - netID:myn3onq0xgdgenfc5i7zhm7ai leaving:false netPeers:

All containers restarted simultaneously:

ra@barn-01:~$ date
Fri Jan 17 08:21:48 EST 2020
ra@barn-01:~$ docker ps
CONTAINER ID        IMAGE                                              COMMAND                  CREATED             STATUS                    PORTS                NAMES
09b1edebfb11        registry.speech.one/bakery-postgres-slave:latest   "/docker-entrypoint.…"   48 minutes ago      Up 48 minutes             5432/tcp             prod_postgres-slave-01.1.1aiz4yg4h2vuairpb3esjai7b
6bbd2bd958a8        google/cadvisor:v0.33.0                            "/usr/bin/cadvisor -…"   48 minutes ago      Up 48 minutes (healthy)   8080/tcp             monitoring_cadvisor.952iedkyjkv6up55rq7i64pc3.h87no18eu39n0lrs48agbpzm4
70ea9b2a8cac        registry.speech.one/bakery-elastic:latest          "/usr/local/bin/dock…"   48 minutes ago      Up 48 minutes (healthy)   9200/tcp, 9300/tcp   prod_elastic-1.1.uuj3tt5a5hv4akpdyn9hp7w2r
2816ce4c9942        stefanprodan/caddy:latest                          "/sbin/tini -- caddy…"   48 minutes ago      Up 48 minutes                                  monitoring_dockerd-exporter.952iedkyjkv6up55rq7i64pc3.maqb9scd4wjnu7bspdtlfpx91
2e03103a3825        registry.speech.one/bakery-elastic:latest          "/usr/local/bin/dock…"   48 minutes ago      Up 48 minutes (healthy)   9200/tcp, 9300/tcp   preprod_elastic-1.1.5wqokzq683iryz2q9kmez0f7x
812d68dd75f2        stefanprodan/swarmprom-node-exporter:v0.16.0       "/etc/node-exporter/…"   48 minutes ago      Up 48 minutes             9100/tcp             monitoring_node-exporter.952iedkyjkv6up55rq7i64pc3.x30au3dtzodm6nopkgszz1ul9
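
To check whether the containers really restarted at the same moment, the start timestamps can be grouped per minute. This is a minimal sketch, not part of the original report: the function names `list_start_times` and `group_by_minute` are made up for illustration, and it assumes a POSIX shell with `awk`, `sort`, and an `xargs` that supports `-r`.

```shell
# Print "<StartedAt> <name>" for every running container.
# .State.StartedAt and .Name are fields reported by `docker inspect`.
list_start_times() {
  docker ps -q | xargs -r docker inspect \
    --format '{{.State.StartedAt}} {{.Name}}'
}

# Count containers per start minute (the first 16 characters of the
# timestamp, e.g. 2020-01-17T12:33). Reads list_start_times output on stdin.
group_by_minute() {
  awk '{ count[substr($1, 1, 16)]++ }
       END { for (m in count) print m, count[m] }' | sort
}

# Usage (hypothetical): list_start_times | group_by_minute
```

If almost every container falls into the same minute, the restart was likely triggered node-wide rather than by individual services.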


@dyohan9

dyohan9 commented Nov 3, 2020

I am also facing the same problem

image

image

@huepf

huepf commented Jan 25, 2021

I am also seeing this issue in Docker v20.10.2.

@esyon

esyon commented Jan 26, 2021

Same here...
image

This problem has persisted for two years now and nothing has happened?

@jpaarhuis

A potential solution can be found here: #36311

@NgyAnthony

NgyAnthony commented Jun 6, 2021

Capture d’écran 2021-06-06 à 06 35 06

Capture d’écran 2021-06-06 à 06 44 01

This is probably not the right issue, but it's in the same chain of issues, so I'll post it here.

Debian 10 - This issue happened without any warning, randomly:

  • One of the nodes dies
  • Restarting proves to be useless:
    • br0 and docker_gwbridge keep switching on and off
    • IPv6 error thrown
  • Workers keep switching between Down and Ready
  • If you leave the cluster, you won't be able to rejoin it (nodes stay in pending or error)

I tried disabling IPv6 on the host and setting the proper config in the daemon, but the issue didn't go away.
I had to rebuild my whole cluster on another server... I had some backups so it was fine, but how can such an issue appear out of nowhere?

Similar issue on Stack Overflow:
https://stackoverflow.com/questions/59787464/docker-swarm-restart-all-containers-on-host-periodically

I didn't try it since I migrated everything, but maybe try upgrading your Docker daemon?

@shivanisarthi

Is the problem still there?

@huepf

huepf commented Oct 14, 2021

On my side, I can add that we found the cause in our own code. It was not a Docker issue for us.

@Cluster2a

Cluster2a commented Oct 14, 2021

Same here... image

This problem has persisted for two years now and nothing has happened?

Problem solved - it was a job within the container that was consuming way too much memory.
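
For anyone wanting to rule out the same cause, one way to check is to ask Docker whether any container was killed by the kernel's out-of-memory handler. This is a minimal sketch under assumptions, not what was actually run in this thread: the function names `list_oom_state` and `filter_oom_killed` are made up for illustration, and it assumes a POSIX shell with `awk` and an `xargs` that supports `-r`.

```shell
# Print "<name> <OOMKilled> <ExitCode>" for every container, running or exited.
# .State.OOMKilled and .State.ExitCode are fields reported by `docker inspect`.
list_oom_state() {
  docker ps -aq | xargs -r docker inspect \
    --format '{{.Name}} {{.State.OOMKilled}} {{.State.ExitCode}}'
}

# Keep only the containers whose OOMKilled flag (second column) is "true".
# Reads list_oom_state output on stdin.
filter_oom_killed() {
  awk '$2 == "true" { print $1 " (exit code " $3 ")" }'
}

# Usage (hypothetical): list_oom_state | filter_oom_killed
```

In a swarm, `docker service ps --no-trunc <service>` also shows the error recorded for each previous task, which can help tell an application crash apart from a node-level problem.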

@Broderick890

+1
