Leak of scope units slowing down "systemctl list-unit-files" and delaying logins #1961
Comments
I am also seeing this problem on CentOS 7. The delay is causing chef-client to fail when managing services after a month or two of uptime.
Is anyone looking into this, or does anyone have hints on how to debug this further? This is pretty annoying, as it forces us to reboot every ~2 months.
Yeah, it's easy enough to trigger. Various components fail with exec errors, but pam_systemd also fails in dbus: ... but the reason for the left-over scope units is probably this: or or
Hm, it seems that this is not just a transient failure; systemd-logind becomes permanently wedged. An example (normal) login session: so it seems that dbus thinks that logind is gone. But systemd thinks that logind is alive (it is, the process is still there), so when dbus requests logind, it just returns success. So the issue seems to be between dbus and logind.
So, running strace on logind shows that logind never even gets woken up; it only sends WATCHDOG=1 notifications to systemd. The question is whether this is a dbus problem or a systemd-logind problem. One option would be for logind (and the other systemd daemons that are only accessible through dbus) to send a ping to the dbus daemon every once in a while before notifying the watchdog. After all, they are useless if the connection to the dbus daemon goes down for any reason.
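As a quick way to observe this failure mode from the outside (my own suggestion, not something proposed in the thread): the standard org.freedesktop.DBus.Peer interface lets you ping logind over the bus, independently of its watchdog traffic. The bus name and object path are logind's real ones; wrapping the call in a helper is my framing.

```shell
# Ping systemd-logind via the standard org.freedesktop.DBus.Peer interface.
# Succeeds only if dbus-daemon can still reach logind; a hang or error here
# while the logind process is alive matches the state described above.
ping_logind() {
    busctl call --timeout=5 org.freedesktop.login1 /org/freedesktop/login1 \
        org.freedesktop.DBus.Peer Ping
}
# Usage: ping_logind && echo "logind reachable" || echo "logind unreachable"
```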
Add more information to the error messages when we fail to create or destroy a session. When looking at logs it is important to see which user and which sessions caused the failure. systemd#1961
Meanwhile, is there a way to "clean" these abandoned scope units without rebooting?
You don't actually need to reboot the hosts. @$work, we have found that it is enough to simply remove the scope files, a la "rm -f /run/systemd/system/session-*.scope". The hosts that we have applied this to via cron have yet to lock up. It's early days still, but it seems to do the trick.
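The "rm -f" trick above can be wrapped as a small helper; a minimal sketch, assuming the path from the comment. The separate listing step is my addition, so the deletion can be reviewed before it runs.

```shell
# List leaked session scope fragments under a directory
# (default: /run/systemd/system, the path named in the comment above).
list_leaked_scopes() {
    find "${1:-/run/systemd/system}" -maxdepth 1 -name 'session-*.scope' -print 2>/dev/null
}

# Destructive part, mirroring the quoted workaround; review the list first:
#   list_leaked_scopes | xargs -r rm -f
#   systemctl daemon-reload
```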
I'm still working on this. I started looking through the sd-event code, which resulted in a bunch of cleanups, unfortunately without any bearing on this bug. I should have some time to work on this over the weekend.
@mazingerzeta: sadly it isn't on my systems (CentOS 7.2, using systemd 219).
We worked around this with a cronjob on all of our systemd systems (CentOS 7 in our case):
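The cron job's actual command did not survive extraction; this is a hedged sketch of the kind of job described in this thread. The abandoned-state filter is taken from later comments; the unit-name pattern is an assumption.

```shell
#!/bin/bash
# Hedged sketch of a cron job to stop leaked session scopes (the original
# command was lost in extraction).

# Keep only unit names that look like session scopes, e.g. session-42.scope.
filter_session_scopes() {
    grep -E '^session-[0-9a-zA-Z]+\.scope$'
}

# Stop every scope unit that logind has abandoned but systemd still tracks.
stop_abandoned_scopes() {
    systemctl list-units --state=abandoned -t scope --no-legend --no-pager \
        2>/dev/null | awk '{print $1}' | filter_session_scopes \
        | xargs -r systemctl stop
}
```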
I am facing the same issue with our git server. Maybe this helps someone else until the underlying problem is finally solved. PS: Thanks to everybody working on this issue.
OK, adding "systemctl stop" does the job. Thank you for the hint.
At least the CentOS version of this bug seems to be caused by lost messages from the cgroup agent. On my system, the cgroup agent is started every time, but systemd sometimes doesn't get the message.
LGTM
Any update on when the official fix for this may happen? We have done our best with manual scripted workarounds, but we are still feeling the pain of this every day. I am happy to share specifics of our environment if that would help. We can replicate it quite quickly.
This is a bug from systemd/systemd#1961 Signed-off-by: Mei Lei <meilei007@gmail.com>
OK, so I think I figured out one part of the puzzle: dbus-daemon mishandles incoming messages when there is first a message without an auxiliary fd in the socket buffer, immediately followed by one with an auxiliary fd. The kernel will already return the auxiliary fd with the first message, and dbus-daemon takes that as a broken message and aborts the connection.
OK, it's slightly different even, and I have now filed a bug against dbus-daemon:
Here's another related dbus bug, where I just posted a patch: https://bugs.freedesktop.org/show_bug.cgi?id=95264 (The issue is that we might end up losing cgroups agent messages much earlier than necessary, because of dbus-daemon's low listen backlog of 30.)
I prepped a set of fixes to logind in #3190 now, but they won't fix this issue fully – that really needs to be fixed in dbus, see https://bugs.freedesktop.org/show_bug.cgi?id=95263
dbus-daemon currently uses a backlog of 30 on its D-Bus system bus socket. On overloaded systems this means that only 30 connections may be queued without dbus-daemon processing them before further connection attempts fail. Our cgroups-agent binary so far used D-Bus for its messaging, and hitting this limit hence may result in us losing cgroup empty messages. This patch adds a separate cgroup agent socket of type AF_UNIX/SOCK_DGRAM. Since sockets of these types need no connection set up, no listen() backlog applies. Our cgroup-agent binary will hence simply block as long as it can't enqueue its datagram message, so that we won't lose cgroup empty messages as likely anymore. Related to this: https://bugs.freedesktop.org/show_bug.cgi?id=95264 systemd#1961
dbus-daemon currently uses a backlog of 30 on its D-Bus system bus socket. On overloaded systems this means that only 30 connections may be queued without dbus-daemon processing them before further connection attempts fail. Our cgroups-agent binary so far used D-Bus for its messaging, and hitting this limit hence may result in us losing cgroup empty messages. This patch adds a separate cgroup agent socket of type AF_UNIX/SOCK_DGRAM. Since sockets of these types need no connection set up, no listen() backlog applies. Our cgroup-agent binary will hence simply block as long as it can't enqueue its datagram message, so that we won't lose cgroup empty messages as likely anymore. This also rearranges the ordering of the processing of SIGCHLD signals, service notification messages (sd_notify()...) and the two types of cgroup notifications (inotify for the unified hierarchy support, and agent for the classic hierarchy support). We now always process events for these in the following order: 1. service notification messages (SD_EVENT_PRIORITY_NORMAL-7) 2. SIGCHLD signals (SD_EVENT_PRIORITY_NORMAL-6) 3. cgroup inotify and cgroup agent (SD_EVENT_PRIORITY_NORMAL-5) This is because when receiving SIGCHLD we invalidate PID information, which we need to process the service notification messages which are bound to PIDs. Hence the order between the first two items. And we want to process SIGCHLD metadata to detect whether a service is gone, before using cgroup notifications, to decide when a service is gone, since the former carries more useful metadata. Related to this: https://bugs.freedesktop.org/show_bug.cgi?id=95264 systemd/systemd#1961 Cherry-picked from: d8fdc62 Resolves: #1305608
I am sorry for abusing the upstream issue tracker (I have not found a related bug in the CentOS Mantis), but for those CentOS users (@mattmcdowell, @canoine, @carlivar): would you be willing to try this test build? https://people.redhat.com/lnykryn/systemd/bz1305608/
@poettering Thanks for your work on this. We were hit hard by this problem on GitLab.com, which runs Ubuntu 16.04. We applied your latest patch (https://bugs.freedesktop.org/show_bug.cgi?id=95263#c13), and this seems to have made the problem go away for now. A customer also reported experiencing this issue on Red Hat Enterprise Linux 7, which runs systemd 219. I would encourage getting these patches into stable releases ASAP. Ubuntu bug thread: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1591411
Just got hit by this. What I find odd is that systemd (PID 1) takes forever to go through all of the scope files. Is it not possible to make the loop tighter, or even make it parallel? Right now I'm forced to remove all the files by hand instead. Or are there patches that I missed that correct this? I am running systemd-219-19.el7_2.13.
Hi everyone, if you need to fix this ASAP, here are the commands to clean up all your scope files and the loaded, abandoned sessions:
1. Delete session files
2. Delete session directories
3. Remove the abandoned sessions
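The literal commands in this comment did not survive formatting; this is a hedged reconstruction of the three steps named, using the paths and the abandoned-scope stop command that appear elsewhere in the thread. The directory parameter is my addition so the destructive part can be pointed somewhere safe.

```shell
# Hedged reconstruction of the three cleanup steps above (the exact
# commands were lost to formatting). Directory defaults to /run/systemd/system.
cleanup_leaked_sessions() {
    local dir="${1:-/run/systemd/system}"
    rm -f "$dir"/session-*.scope        # 1. delete session files
    rm -rf "$dir"/session-*.scope.d     # 2. delete session directories
    # 3. stop abandoned sessions still loaded in systemd
    systemctl list-units --state=abandoned -t scope --no-legend --no-pager \
        2>/dev/null | awk '{print $1}' | xargs -r systemctl stop
}
```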
@Burebista1404, your session-directories command has fallen victim to Markdown italics; could you wrap the commands in quotes?
@iDemonix I've modified it as requested. Thanks.
systemctl | grep "abandoned" | grep -e "-[[:digit:]]" | sed "s/\.scope.*/.scope/" | xargs systemctl stop
Note: this command may kill the processes running under those session scopes, so use it with care.
What works for me here, on CentOS 7:
@diffcity Your workaround is much cleaner than the others, and doesn't scare me nearly as much as manually deleting files directly. My servers last about 30 days before filling up /run, and your commands clear it out. I will be putting this in cron to run daily.
…not lost systemd#1961 (cherry picked from commit 8d5b69a)
Red Hat doesn't seem to have backported a fix for this to EL7 yet. Deployed a cron job of the following to all our CentOS 7 servers:
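The deployed job text did not survive extraction; below is a hypothetical /etc/cron.d fragment in crontab(5) syntax showing what such a job could look like. The schedule, file name, and exact pipeline are assumptions, shown as comments only.

```shell
# Hypothetical /etc/cron.d/clean-leaked-scopes entry (the actual deployed
# job was lost in extraction); schedule and command are assumptions:
#
#   15 4 * * * root systemctl list-units --state=abandoned -t scope --no-legend | awk '{print $1}' | xargs -r -n1 systemctl stop
```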
In CentOS 7.x and RHEL 7.x they have supposedly moved session control to loginctl, but I can't find out how to use it this way.
Update to my previous hack for CentOS 7. Only stops an abandoned scope if it has no children.
#!/bin/bash
system_show () {
local t="${1##*.}"
systemctl show "$1" --no-pager --no-legend --no-ask-password | sed -e 's/"/\\"/g' -e 's/\([^=]*\)=\(.*\)/'"$t"'_\1="\2"/'
}
for unit in $(systemctl list-units --state=abandoned -t scope --no-legend --no-pager --no-ask-password | cut -d' ' -f1) ; do
eval "$(system_show "$unit")"
[ "$(systemd-cgls "$scope_ControlGroup" 2>/dev/null | wc -l)" -eq 1 ] && systemctl stop "$unit"
done
There seems to be a race condition in systemd or logind, which results in the leak of scope unit files in /run/systemd/system. This causes PID 1 to use 100% CPU and delays logins.
I am able to reproduce the issue on Debian jessie with systemd 215-17+deb8u2, on Debian sid with systemd 227-1 and 228-1, as well as on Arch Linux with systemd 227-1. To reproduce it, prepare the system as follows (install libpam-systemd and add a test user with an ssh key):
First, look at the output of this command:
You should see exactly one file "session-$i.scope" (seems to exist only with older versions of systemd like in Debian jessie) and one directory "session-$i.scope.d" for your current login session.
Now let the following command run for a while (you may have to vary the sleep interval depending on how fast the host is, 5 minutes should be enough to get at least a few leaked units):
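The reproducer command itself did not survive extraction; this is a hedged reconstruction from the surrounding description: repeated background ssh logins as the test user, so that overlapping logins race session scope creation. The host, user, iteration count, and interval are all assumptions.

```shell
# Hedged reconstruction of the reproducer: many overlapping short ssh
# logins as the test user. Target, count, and interval are assumptions.
repro_scope_leak() {
    local count="${1:-600}" target="${2:-testuser@localhost}"
    for _ in $(seq "$count"); do
        ssh "$target" true &   # backgrounded: overlapping logins race scope creation
        sleep 0.5              # vary this depending on how fast the host is
    done
    wait
}
```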
Then cancel that command, wait for the remaining ssh processes to terminate, and look again at the output of
There you will see some remaining scopes that were leaked. Running the command
will now take significantly longer when enough of these leaked files have accumulated. When you strace PID 1, you will notice that it opens many files in /run/systemd/system. On one of our production systems with 35 days uptime, there are roughly 4000 of those leaked scope units and list-unit-files takes about 23 seconds. During that time, further ssh logins are delayed until the command terminates. When I used ; instead of & in the loop doing the ssh logins, I was unable to reproduce the issue, so it looks like it is some kind of race condition.
(I originally reported this as Debian bug #805477)