New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker-runc hang on checking systemd.UseSystemd #1959

Open
jzwlqx opened this Issue Jan 9, 2019 · 2 comments

Comments

Projects
None yet
3 participants
@jzwlqx
Copy link

jzwlqx commented Jan 9, 2019

Yesterday one node of my kubernetes cluster became notready. ps -ef showed some docker-runc processes had been running many days

root     26579  1303  0  2018 ?        00:00:00 docker-runc --systemd-cgroup=true events --stats c29996ea9566f16616505e7118315635582714308564ba0d9a70f8fb8cf73f0a
root     27841  2913  0  2018 ?        00:00:00 docker-runc --systemd-cgroup=true kill --all 8561b78c9cb19c0d883e30eafc8ff41ddf3007043985271386ffdbafc24d4376 SIGKILL
root     28293  1303  0  2018 ?        00:00:00 docker-runc --systemd-cgroup=true delete 25660e4c1f66593ec33ae57823def641a4c4a9ae1a7c6840afd081961b66e66e

After some investigation, I found docker-runc hang when calling systemd.UseSystemd. Below is the stack.

In fact, any dbus method call send to org.freedesktop.systemd1 was not responsed, for example, the below command would wait forever:

dbus-send --system --dest=org.freedesktop.systemd1 --type=method_call --print-reply /org/freedesktop/systemd1 org.freedesktop.DBus.Introspectable.Introspect

Also there were many systemd errors in /var/log/messages:
Jan 4 11:56:31 host-k8s-node001 systemd: Failed to propagate agent release message: Operation not supported

busctl tree reported Failed to introspect object / of service org.freedesktop.systemd1: Connection timed out

Resolved by restarting systemd: systemctl daemon-reexec

docker-runc stack:

PC=0x45a471 m=0 sigcode=128

goroutine 0 [idle]:
runtime.futex(0xb37950, 0x0, 0x0, 0x0, 0x0, 0xbdc4df90, 0x0, 0x0, 0x7ffdbdc4dfa0, 0x41178b, ...)
	/usr/local/go/src/runtime/sys_linux_amd64.s:425 +0x21
runtime.futexsleep(0xb37950, 0x0, 0xffffffffffffffff)
	/usr/local/go/src/runtime/os_linux.go:45 +0x62
runtime.notesleep(0xb37950)
	/usr/local/go/src/runtime/lock_futex.go:145 +0x6b
runtime.stopm()
	/usr/local/go/src/runtime/proc.go:1650 +0xad
runtime.findrunnable(0xc42001c000, 0x0)
	/usr/local/go/src/runtime/proc.go:2102 +0x2e4
runtime.schedule()
	/usr/local/go/src/runtime/proc.go:2222 +0x14c
runtime.park_m(0xc42006e340)
	/usr/local/go/src/runtime/proc.go:2285 +0xab
runtime.mcall(0x7ffdbdc4e130)
	/usr/local/go/src/runtime/asm_amd64.s:269 +0x5b

goroutine 1 [chan receive]:
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Object).Call(0xc42001bfb0, 0x7a026a, 0x33, 0xc4200fd000, 0xc420011340, 0x4, 0x4, 0xc4200fd0c0)
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/object.go:27 +0xc7
github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus.(*Conn).startJob(0xc420061570, 0x0, 0x7a026a, 0x33, 0xc420011340, 0x4, 0x4, 0x0, 0x0, 0x0)
	/go/src/github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus/methods.go:47 +0xdf
github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus.(*Conn).StartTransientUnit(0xc420061570, 0x7919cc, 0xa, 0x7904e9, 0x7, 0x0, 0x0, 0x0, 0x0, 0xc420062000, ...)
	/go/src/github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus/methods.go:137 +0x24b
github.com/opencontainers/runc/libcontainer/cgroups/systemd.UseSystemd(0xc420012a00)
	/go/src/github.com/opencontainers/runc/libcontainer/cgroups/systemd/apply_systemd.go:106 +0x130
main.loadFactory(0xc420012a00, 0x0, 0x792d83, 0xe, 0x0)
	/go/src/github.com/opencontainers/runc/utils_linux.go:38 +0x1a0
main.getContainer(0xc420012a00, 0x1, 0x0, 0x0, 0x0)
	/go/src/github.com/opencontainers/runc/utils_linux.go:54 +0x61
main.glob..func3(0xc420012a00, 0x0, 0x0)
	/go/src/github.com/opencontainers/runc/delete.go:53 +0xa2
reflect.Value.call(0x724960, 0x7a5568, 0x13, 0x78f57e, 0x4, 0xc4200cb760, 0x1, 0x1, 0xc4200cb6f0, 0x784ca0, ...)
	/usr/local/go/src/reflect/value.go:434 +0x91f
reflect.Value.Call(0x724960, 0x7a5568, 0x13, 0xc4200cb760, 0x1, 0x1, 0x6a4bd6, 0x78f712, 0x4)
	/usr/local/go/src/reflect/value.go:302 +0xa4
github.com/opencontainers/runc/vendor/github.com/urfave/cli.HandleAction(0x724960, 0x7a5568, 0xc420012a00, 0x0, 0x0)
	/go/src/github.com/opencontainers/runc/vendor/github.com/urfave/cli/app.go:487 +0x18f
github.com/opencontainers/runc/vendor/github.com/urfave/cli.Command.Run(0x78feff, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7a2070, 0x4d, 0x0, ...)
	/go/src/github.com/opencontainers/runc/vendor/github.com/urfave/cli/command.go:191 +0xac8
github.com/opencontainers/runc/vendor/github.com/urfave/cli.(*App).Run(0xc4200da000, 0xc420010200, 0x4, 0x4, 0x0, 0x0)
	/go/src/github.com/opencontainers/runc/vendor/github.com/urfave/cli/app.go:240 +0x5d6
main.main()
	/go/src/github.com/opencontainers/runc/main.go:137 +0xbd2

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2197 +0x1

goroutine 6 [syscall]:
os/signal.signal_recv(0x621f1b)
	/usr/local/go/src/runtime/sigqueue.go:116 +0x104
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
	/usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 7 [IO wait]:
net.runtime_pollWait(0x7f211b3d0000, 0x72, 0xc4200c7778)
	/usr/local/go/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc420060d18, 0x72, 0x1000, 0x0)
	/usr/local/go/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc420060d18, 0xc4200f2a60, 0x10)
	/usr/local/go/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).readMsg(0xc420060cb0, 0xc4200f2a60, 0x10, 0x10, 0xc4200f7320, 0x1000, 0x1000, 0xffffffffffffffff, 0x0, 0x0, ...)
	/usr/local/go/src/net/fd_unix.go:304 +0x21a
net.(*UnixConn).readMsg(0xc42000e0a0, 0xc4200f2a60, 0x10, 0x10, 0xc4200f7320, 0x1000, 0x1000, 0xc4200f2754, 0xc4200fc8a0, 0xc4200f271b, ...)
	/usr/local/go/src/net/unixsock_posix.go:115 +0x91
net.(*UnixConn).ReadMsgUnix(0xc42000e0a0, 0xc4200f2a60, 0x10, 0x10, 0xc4200f7320, 0x1000, 0x1000, 0xc4200613b0, 0x7, 0xc4200f2954, ...)
	/usr/local/go/src/net/unixsock.go:131 +0xaa
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*oobReader).Read(0xc4200f7300, 0xc4200f2a60, 0x10, 0x10, 0x1020, 0xc4200f7300, 0x0)
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/transport_unix.go:21 +0x8f
io.ReadAtLeast(0xb20f00, 0xc4200f7300, 0xc4200f2a60, 0x10, 0x10, 0x10, 0x7563a0, 0xc420062801, 0xc4200f7300)
	/usr/local/go/src/io/io.go:307 +0xa9
io.ReadFull(0xb20f00, 0xc4200f7300, 0xc4200f2a60, 0x10, 0x10, 0x21, 0xc42001bf80, 0x0)
	/usr/local/go/src/io/io.go:325 +0x58
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*unixTransport).ReadMessage(0xc420015e40, 0xc4200fc420, 0x14, 0x78f13a)
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/transport_unix.go:85 +0x138
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).inWorker(0xc4200a6b40)
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/conn.go:248 +0x4b
created by github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).Auth
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/auth.go:118 +0x749

goroutine 8 [chan receive]:
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).outWorker(0xc4200a6b40)
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/conn.go:385 +0x5a
created by github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).Auth
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/auth.go:119 +0x76e

goroutine 18 [IO wait]:
net.runtime_pollWait(0x7f211b3cff40, 0x72, 0xc420123778)
	/usr/local/go/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc420100068, 0x72, 0x1000, 0x0)
	/usr/local/go/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc420100068, 0xc4200f2fc0, 0x10)
	/usr/local/go/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).readMsg(0xc420100000, 0xc4200f2fc0, 0x10, 0x10, 0xc4200f9920, 0x1000, 0x1000, 0xffffffffffffffff, 0x0, 0x0, ...)
	/usr/local/go/src/net/fd_unix.go:304 +0x21a
net.(*UnixConn).readMsg(0xc4200be030, 0xc4200f2fc0, 0x10, 0x10, 0xc4200f9920, 0x1000, 0x1000, 0xc4200f2e5c, 0xb522e0, 0xc4200f2e23, ...)
	/usr/local/go/src/net/unixsock_posix.go:115 +0x91
net.(*UnixConn).ReadMsgUnix(0xc4200be030, 0xc4200f2fc0, 0x10, 0x10, 0xc4200f9920, 0x1000, 0x1000, 0xc420061880, 0x7, 0xc4200f2f32, ...)
	/usr/local/go/src/net/unixsock.go:131 +0xaa
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*oobReader).Read(0xc4200f9900, 0xc4200f2fc0, 0x10, 0x10, 0x1020, 0xc4200f9900, 0x0)
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/transport_unix.go:21 +0x8f
io.ReadAtLeast(0xb20f00, 0xc4200f9900, 0xc4200f2fc0, 0x10, 0x10, 0x10, 0x7563a0, 0x70ea01, 0xc4200f9900)
	/usr/local/go/src/io/io.go:307 +0xa9
io.ReadFull(0xb20f00, 0xc4200f9900, 0xc4200f2fc0, 0x10, 0x10, 0x0, 0x0, 0x0)
	/usr/local/go/src/io/io.go:325 +0x58
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*unixTransport).ReadMessage(0xc4200bc160, 0xc4200ba420, 0xc420123e00, 0xc420100098)
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/transport_unix.go:85 +0x138
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).inWorker(0xc420102000)
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/conn.go:248 +0x4b
created by github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).Auth
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/auth.go:118 +0x749

goroutine 19 [chan receive]:
github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).outWorker(0xc420102000)
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/conn.go:385 +0x5a
created by github.com/opencontainers/runc/vendor/github.com/godbus/dbus.(*Conn).Auth
	/go/src/github.com/opencontainers/runc/vendor/github.com/godbus/dbus/auth.go:119 +0x76e

goroutine 9 [chan receive]:
github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus.(*Conn).dispatch.func1(0xc4200708a0, 0xc420061570)
	/go/src/github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus/subscription.go:64 +0x5f
created by github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus.(*Conn).dispatch
	/go/src/github.com/opencontainers/runc/vendor/github.com/coreos/go-systemd/dbus/subscription.go:96 +0x83

rax    0xca
rbx    0x0
rcx    0xffffffffffffffff
rdx    0x0
rdi    0xb37950
rsi    0x0
rbp    0x7ffdbdc4df70
rsp    0x7ffdbdc4df28
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0x0
r13    0x260
r14    0x0
r15    0xd3
rip    0x45a471
rflags 0x286
cs     0x33
fs     0x0
gs     0x0

Bellows are more details

OS
Linux host-k8s-node001.ymt.io 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

DBUS Daemon:
1.10.24

Systemd:

systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN

Kubelet:

Kubernetes v1.11.2

Docker Info:

 Running: 35
 Paused: 0
 Stopped: 12
Images: 17
Server Version: 17.06.2-ce-3
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 462c82662200a17ee39e74692f536067a3576a50
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.2.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 23.39GiB
Name: host-k8s-node001.ymt.io
ID: V2XS:XAL5:3WSG:KYTJ:QAT7:45I7:GMO5:HZAV:UNYU:UDPQ:EE4A:CSQU
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 harbor.ymt.io
 127.0.0.0/8
Registry Mirrors:
 https://pqbap4ya.mirror.aliyuncs.com/
Live Restore Enabled: false```
@BSWANG

This comment has been minimized.

Copy link

BSWANG commented Jan 11, 2019

@hqhq

This comment has been minimized.

Copy link
Contributor

hqhq commented Jan 12, 2019

I don't see any useful information on runc side, @mrunalp have you seen this kind of issue on RHEL?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment