docker-runc hang on checking systemd.UseSystemd #1959
Comments
I don't see any useful information on the runc side. @mrunalp, have you seen this kind of issue on RHEL?
When a large number of units are created and removed over a long period (around 5 months in our case), sd_bus->cookie overflows and org.freedesktop.systemd1 stops responding on D-Bus entirely, because systemd can no longer seal dbus1-type messages. This issue has affected our Kubernetes cluster nodes for a long time and is not easy to reproduce. When it occurs, the node cannot create new containers, and the only fixes are to reboot the system or re-exec systemd.
We should find a way to rework the UseSystemd function or its callers. As things stand, systemd receives hundreds of messages from runc even for a single docker exec command, so the test in UseSystemd will overflow sd_bus->cookie sooner or later.
int bus_message_seal(sd_bus_message *m, uint64_t cookie, usec_t timeout) {
bus_message_seal (m=0x55cbc5a75790, cookie=4294967731, timeout=25000000) at src/libsystemd/sd-bus/bus-message.c:2924
(gdb) info registers
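To make the overflow concrete, here is a minimal Go sketch of the arithmetic. It assumes, as the comment above describes, that a dbus1-format message can only carry a 32-bit serial, so a 64-bit cookie above that range cannot be sealed. The names fitsDBus1Serial and operationsUntilOverflow are hypothetical helpers for illustration and are not part of systemd or runc.

```go
package main

import (
	"fmt"
	"math"
)

// fitsDBus1Serial reports whether a 64-bit cookie still fits in the 32-bit
// serial field of a dbus1-format message header.
// Hypothetical helper: it mirrors the condition described in the comment,
// not systemd's actual code.
func fitsDBus1Serial(cookie uint64) bool {
	return cookie <= math.MaxUint32
}

// operationsUntilOverflow estimates how many operations exhaust the 32-bit
// range if each operation sends messagesPerOp bus messages.
// Assumption: the cookie grows by one per message sent.
func operationsUntilOverflow(messagesPerOp uint64) uint64 {
	return math.MaxUint32 / messagesPerOp
}

func main() {
	// Cookie value taken from the gdb frame quoted in this thread.
	const observed uint64 = 4294967731

	fmt.Println(fitsDBus1Serial(observed)) // false
	fmt.Println(observed - math.MaxUint32) // 436

	// 80 is the per-container message count reported elsewhere in this thread.
	fmt.Println(operationsUntilOverflow(80)) // 53687091
}
```

The observed cookie is only a few hundred messages past the 32-bit limit, which is consistent with the bus failing shortly after the counter crossed it.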
Currently, when I create or stop a test container such as busybox in a Docker environment, systemd.UseSystemd sends about 80 test-unit creation/removal messages to org.freedesktop.systemd1, so that bus name is under a constant storm of unit new/remove messages.
Yesterday one node of my Kubernetes cluster became NotReady.
ps -ef showed that some docker-runc processes had been running for many days.
After some investigation, I found that docker-runc hangs when calling systemd.UseSystemd. The stack is below.
In fact, no D-Bus method call sent to org.freedesktop.systemd1 got a response. For example, the command below would wait forever:
dbus-send --system --dest=org.freedesktop.systemd1 --type=method_call --print-reply /org/freedesktop/systemd1 org.freedesktop.DBus.Introspectable.Introspect
There were also many systemd errors in /var/log/messages:
Jan 4 11:56:31 host-k8s-node001 systemd: Failed to propagate agent release message: Operation not supported
busctl tree reported:
Failed to introspect object / of service org.freedesktop.systemd1: Connection timed out
Resolved by restarting systemd:
systemctl daemon-reexec
docker-runc stack:
Below are more details.
OS:
Linux host-k8s-node001.ymt.io 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
DBUS Daemon:
1.10.24
Systemd:
Kubelet:
Kubernetes v1.11.2
Docker Info: