
Leak of scope units slowing down "systemctl list-unit-files" and delaying logins #1961

Closed
julianbrost opened this issue Nov 19, 2015 · 35 comments

@julianbrost

There seems to be a race condition in systemd or logind, which results in the leak of scope unit files in /run/systemd/system. This causes PID 1 to use 100% CPU and delays logins.

I am able to reproduce the issue on Debian jessie with systemd 215-17+deb8u2, on Debian sid with systemd 227-1 and 228-1, and on Arch Linux with systemd 227-1. To reproduce it, prepare the system as follows (install libpam-systemd and add a test user with an SSH key):

apt install libpam-systemd    # not necessary on arch linux I think
adduser test    # or useradd on arch linux
ssh-keygen -t ed25519
install -d -o test -g test ~test/.ssh
install -o test -g test .ssh/id_ed25519.pub ~test/.ssh/authorized_keys
ssh test@localhost    # accept the ssh host key
reboot    # just to start with a clean state

First look at the output for this command:

ls -ld /run/systemd/system/session-*.scope*

You should see exactly one file "session-$i.scope" (seems to exist only with older versions of systemd like in Debian jessie) and one directory "session-$i.scope.d" for your current login session.

Now let the following command run for a while (you may have to vary the sleep interval depending on how fast the host is, 5 minutes should be enough to get at least a few leaked units):

while sleep 0.03; do ssh test@localhost sleep 1 & done

Then cancel that command, wait for the remaining ssh processes to terminate, and look again at the output of

ls -ld /run/systemd/system/session-*.scope*

There you will see some remaining scopes that were leaked. Running the command

systemctl list-unit-files

will now take significantly longer when enough of these leaked files have accumulated. When you strace PID 1, you will notice that it opens many files in /run/systemd/system. On one of our production systems with 35 days uptime, there are roughly 4000 of those leaked scope units and list-unit-files takes about 23 seconds. During that time, further ssh logins are delayed until the command terminates. When I used ; instead of & in the loop doing the ssh logins, I was unable to reproduce the issue, so it looks like it is some kind of race condition.
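A quick way to quantify the buildup described above (a hedged sketch; the paths are the ones from this report, and the counts and timings will vary per system):

```shell
# Count leaked per-session scope units under /run/systemd/system and time
# the unit-file enumeration that slows down as they accumulate.
count=$(ls /run/systemd/system 2>/dev/null | grep -c '^session-.*\.scope')
echo "session scope units on disk: $count"
time systemctl list-unit-files > /dev/null
```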

(I originally reported this as Debian bug #805477)

@mattmcdowell

I am also seeing this problem on CentOS 7. The delay is causing chef-client to fail when managing services after a month or two of uptime.

@julianbrost
Author

Is anyone looking into this or has any hints on how to debug this further? This is pretty annoying as it forces us to reboot every ~2 months.

@keszybz
Member

keszybz commented Dec 25, 2015

Yeah, it's easy enough to trigger. Various components fail with exec errors, but pam_systemd also fails in dbus:

Dec 25 11:33:26 rawhide sshd[8020]: Accepted publickey for test from 10.0.0.1 port 54108 ssh2: RSA SHA256:Rah0
Dec 25 11:33:27 rawhide sshd[8020]: pam_systemd(sshd:session): Failed to connect to system bus: Resource temporarily unavailable
Dec 25 11:33:27 rawhide sshd[8020]: pam_unix(sshd:session): session opened for user test by (uid=0)
Dec 25 11:33:27 rawhide sshd[8020]: pam_unix(sshd:session): session closed for user test

... but the reason for left over scope units is probably this:

Dec 25 11:32:42 rawhide sshd[30542]: Accepted publickey for test from 10.0.0.1 port 51414 ssh2: RSA SHA256:Rah0
Dec 25 11:32:43 rawhide sshd[30542]: pam_unix(sshd:session): session opened for user test by (uid=0)
Dec 25 11:32:47 rawhide sshd[30542]: pam_systemd(sshd:session): Failed to release session: Message recipient disconnected from message bus without replying
Dec 25 11:32:47 rawhide sshd[30542]: pam_unix(sshd:session): session closed for user test

or

Dec 25 11:33:13 rawhide sshd[30602]: pam_systemd(sshd:session): Failed to release session: Failed to activate service 'org.freedesktop.login1': timed out

or

Dec 25 11:33:13 rawhide sshd[30558]: pam_systemd(sshd:session): Failed to release session: Connection timed out

@keszybz
Member

keszybz commented Dec 25, 2015

Hm, it seems that this is not just a transient failure; systemd-logind becomes permanently wedged:

$ journalctl -b -u systemd-logind
...
Dec 25 11:32:47 rawhide systemd-logind[1110]: Failed to abandon session scope: Transport endpoint is not connected
Dec 25 11:32:47 rawhide systemd-logind[1110]: Failed to abandon session scope: Transport endpoint is not connected
Dec 25 11:32:49 rawhide systemd[1]: Started Login Service.
Dec 25 11:33:13 rawhide systemd-logind[1110]: Failed to abandon session scope: Transport endpoint is not connected
Dec 25 11:33:13 rawhide systemd-logind[1110]: Failed to abandon session scope: Transport endpoint is not connected
...
Dec 25 11:33:14 rawhide systemd-logind[1110]: Failed to abandon session scope: Transport endpoint is not connected
Dec 25 11:33:29 rawhide systemd[1]: Started Login Service.
Dec 25 11:54:13 rawhide systemd[1]: Started Login Service.
Dec 25 11:54:53 rawhide systemd[1]: Started Login Service.
Dec 25 12:34:45 rawhide systemd[1]: Started Login Service.
Dec 25 12:39:46 rawhide systemd[1]: Started Login Service.
Dec 25 12:40:11 rawhide systemd-logind[1110]: Failed to abandon session scope: Transport endpoint is not connected
Dec 25 13:05:06 rawhide systemd[1]: Started Login Service.
Dec 25 13:12:24 rawhide systemd[1]: Started Login Service.

An example (normal) login session:

Dec 25 13:05:06 rawhide sshd[21228]: Accepted publickey for test from 10.0.0.1 port 55172 ssh2: RSA SHA256:Rah0
Dec 25 13:05:06 rawhide dbus[1097]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Dec 25 13:05:06 rawhide systemd[1]: Started Login Service.
Dec 25 13:05:32 rawhide dbus[1097]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Dec 25 13:05:32 rawhide sshd[21228]: pam_systemd(sshd:session): Failed to create session: Failed to activate service 'org.freedesktop.login1': timed out
Dec 25 13:05:32 rawhide sshd[21228]: pam_unix(sshd:session): session opened for user test by (uid=0)

so it seems that dbus thinks that logind is gone. But systemd thinks that logind is alive (it is, the process is still there), so when dbus requests logind, it just returns success. So the issue seems to be between dbus and logind.
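The two disagreeing views can be compared directly from the shell (a hedged sketch; busctl ships with systemd, and the bus name is the one from the logs above):

```shell
# systemd's view: is the logind service unit considered active?
systemctl is-active systemd-logind.service
# dbus's view: is the name actually owned on the system bus right now?
busctl list --no-pager | grep -q org.freedesktop.login1 \
  && echo "name present on bus" || echo "name missing from bus"
```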

@keszybz
Member

keszybz commented Dec 25, 2015

So, running strace on logind shows that logind never even gets woken up; it only sends WATCHDOG=1 notifications to systemd.

Dec 25 11:32:48 rawhide dbus[1097]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Dec 25 11:33:13 rawhide dbus[1097]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Dec 25 11:33:29 rawhide dbus[1097]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30015ms)
Dec 25 11:33:29 rawhide dbus[1097]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Dec 25 11:33:29 rawhide dbus[1097]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30001ms)
Dec 25 11:33:29 rawhide dbus[1097]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30001ms)
Dec 25 11:33:29 rawhide dbus[1097]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30000ms)
Dec 25 11:33:29 rawhide dbus[1097]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30000ms)
Dec 25 11:33:29 rawhide dbus[1097]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30000ms)
Dec 25 11:33:30 rawhide dbus[1097]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30044ms)
...
Dec 25 11:33:32 rawhide dbus[1097]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30033ms)
Dec 25 11:33:32 rawhide dbus[1097]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30012ms)
Dec 25 11:33:54 rawhide dbus[1097]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Dec 25 11:54:13 rawhide dbus[1097]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Dec 25 11:54:38 rawhide dbus[1097]: [system] Failed to activate service 'org.freedesktop.login1': timed out
...

The question is whether this is a dbus problem, or a systemd-logind problem.

One option would be for logind (and other systemd daemons that are only accessible through dbus) to send a ping to the dbus daemon every once in a while before notifying the watchdog. After all, they are useless if the connection to the dbus daemon goes down for any reason.
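That idea could look roughly like this from the shell (hedged: a daemon would ping over its existing bus connection, whereas dbus-send opens a new one, so this only approximates the check):

```shell
# Ping the bus daemon itself; if the system bus is unreachable, withhold
# the watchdog keep-alive so systemd can restart the wedged service.
if dbus-send --system --type=method_call --print-reply \
     --dest=org.freedesktop.DBus /org/freedesktop/DBus \
     org.freedesktop.DBus.Peer.Ping > /dev/null 2>&1; then
    systemd-notify WATCHDOG=1
else
    echo "system bus unreachable; withholding WATCHDOG=1" >&2
fi
```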

keszybz added a commit to keszybz/systemd that referenced this issue Dec 28, 2015
Add more information to the error messages when we fail to create
or destroy a session. When looking at logs it is important to see
which user and which sessions caused the failure.

systemd#1961
@keszybz keszybz self-assigned this Jan 4, 2016
@canoine

canoine commented Jan 22, 2016

Meanwhile, is there a way to clean up these abandoned scope units without rebooting?
I tried removing all the related files and all occurrences of the session numbers, then restarting dbus and systemd-logind, but to no avail. The list keeps growing, and I can't mess around with rebooting my production servers all the time.

@robojandro

You don't actually need to reboot the hosts. At $work, we have found that it is enough to simply remove the scope files, a la "rm -f /run/systemd/system/session-*.scope". The hosts we have applied this to via cron have yet to lock up. It's early days still, but it seems to do the trick.

@keszybz
Member

keszybz commented Jan 24, 2016

I'm still working on this. I started looking through the sd-event code, which resulted in a bunch of cleanups, unfortunately without any bearing on this bug. I should have some time to work on this this weekend.

@canoine

canoine commented Jan 25, 2016

@mazingerzeta : sadly it isn't on my systems (CentOS 7.2, using systemd 219).
I deleted the /run/systemd/system/session-*.scope files, the session-*.scope.d directories, the sessions/* files, and cleaned the user/ files, but there are still 390 abandoned session-*.scope units when I run systemctl list-units on a server that runs several cron tasks every minute and was restarted ten days ago.

@carlivar

We worked around this with a cronjob on all of our systemd systems (CentOS 7 in our case):

* 2,14 * * * root /bin/rm -f /run/systemd/system/*.scope

@part-timeDev

I am facing the same issue with our git server.
Our build server polls for changes very frequently against all the different repos/branches using ssh, and I see an increasing number of abandoned sessions for that ssh user. I had the issue with the slowed-down login process too, but was able to run the following command to stop the leftover sessions, and for the moment I use it proactively from time to time:

systemctl |grep "of user git" |grep "abandoned" |grep -e "-[[:digit:]]" |sed "s/\.scope.*/.scope/" |xargs systemctl stop

Maybe that helps someone else until the underlying problem is finally solved.
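To preview what that pipeline would stop before actually stopping anything, the same filter can be run without the final xargs (a hedged sketch; the "of user git" match is specific to the setup described above):

```shell
# Dry run: print the abandoned session scopes the filter selects,
# but do not stop them yet.
systemctl | grep "of user git" | grep "abandoned" \
  | grep -e "-[[:digit:]]" | sed "s/\.scope.*/.scope/"
```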

PS: Thanks to everybody working on this issue.

@canoine

canoine commented Feb 1, 2016

Ok, adding "systemctl stop" does the job. Thank you for the hint.

@poettering poettering added this to the v230 milestone Feb 26, 2016
@lnykryn
Member

lnykryn commented Mar 4, 2016

At least the centos version of this bug seems to be caused by losing messages from the cgroup-agent. On my system, cgroup-agent is started every time but systemd sometimes doesn't get the message.

@dragon9783

LGTM

@tozachroberts

Any update on when the official fix for this may happen? We have done our best with manual scripted workarounds, but we are still feeling the pain of this every day. I am happy to share specifics of our environment if that may help. We can replicate it quite quickly.

winger007 added a commit to zstackio/zstack-utility that referenced this issue Apr 23, 2016
This is a bug from systemd/systemd#1961

Signed-off-by: Mei Lei <meilei007@gmail.com>
winger007 added a commit to zstackio/zstack-utility that referenced this issue Apr 23, 2016
@poettering
Member

OK, so I think I figured out one part of the puzzle: dbus-daemon's handling of incoming messages is broken when a message without an auxiliary fd in the socket buffer is immediately followed by one with an auxiliary fd. The kernel will already return the auxiliary fd with the first message, and dbus-daemon treats that as a broken message and aborts the connection.

@poettering
Member

OK, it's slightly different even, and I filed a bug to dbus-daemon now:

https://bugs.freedesktop.org/show_bug.cgi?id=95263

@poettering
Member

Here's another related dbus bug, where I just posted a patch:

https://bugs.freedesktop.org/show_bug.cgi?id=95264

(The issue is that we might end up losing cgroups agent messages much earlier than necessary, because of dbus-daemon's low listen backlog of 30)
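The backlog in question can be observed with ss (hedged: for listening sockets, ss reports the queued, not-yet-accepted connections in Recv-Q and the configured backlog in Send-Q; the socket path may vary by distribution):

```shell
# Show accept-queue pressure on the system bus socket; a Recv-Q at or
# near the Send-Q value (30) means new connections are being turned away.
ss -xl | grep system_bus_socket
```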

@poettering
Member

I prepped a set of fixes to logind in #3190 now, but they won't fix this issue fully – that really needs to be fixed in dbus, see https://bugs.freedesktop.org/show_bug.cgi?id=95263

@poettering poettering removed this from the v230 milestone May 4, 2016
poettering added a commit to poettering/systemd that referenced this issue May 4, 2016
dbus-daemon currently uses a backlog of 30 on its D-bus system bus socket. On
overloaded systems this means that only 30 connections may be queued without
dbus-daemon processing them before further connection attempts fail. Our
cgroups-agent binary so far used D-Bus for its messaging, and hitting this
limit hence may result in us losing cgroup empty messages.

This patch adds a separate cgroup agent socket of type AF_UNIX/SOCK_DGRAM.
Since sockets of these types need no connection set up, no listen() backlog
applies. Our cgroup-agent binary will hence simply block as long as it can't
enqueue its datagram message, so that we won't lose cgroup empty messages as
likely anymore.

Related to this:
https://bugs.freedesktop.org/show_bug.cgi?id=95264
systemd#1961
lnykryn pushed a commit to lnykryn/systemd-rhel that referenced this issue May 10, 2016
dbus-daemon currently uses a backlog of 30 on its D-bus system bus socket. On
overloaded systems this means that only 30 connections may be queued without
dbus-daemon processing them before further connection attempts fail. Our
cgroups-agent binary so far used D-Bus for its messaging, and hitting this
limit hence may result in us losing cgroup empty messages.

This patch adds a separate cgroup agent socket of type AF_UNIX/SOCK_DGRAM.
Since sockets of these types need no connection set up, no listen() backlog
applies. Our cgroup-agent binary will hence simply block as long as it can't
enqueue its datagram message, so that we won't lose cgroup empty messages as
likely anymore.

This also rearranges the ordering of the processing of SIGCHLD signals, service
notification messages (sd_notify()...) and the two types of cgroup
notifications (inotify for the unified hierarchy support, and agent for the
classic hierarchy support). We now always process events for these in the
following order:

  1. service notification messages  (SD_EVENT_PRIORITY_NORMAL-7)
  2. SIGCHLD signals (SD_EVENT_PRIORITY_NORMAL-6)
  3. cgroup inotify and cgroup agent (SD_EVENT_PRIORITY_NORMAL-5)

This is because when receiving SIGCHLD we invalidate PID information, which we
need to process the service notification messages which are bound to PIDs.
Hence the order between the first two items. And we want to process SIGCHLD
metadata to detect whether a service is gone, before using cgroup
notifications, to decide when a service is gone, since the former carries more
useful metadata.

Related to this:
https://bugs.freedesktop.org/show_bug.cgi?id=95264
systemd/systemd#1961

Cherry-picked from: d8fdc62
Resolves: #1305608
@lnykryn
Member

lnykryn commented May 10, 2016

I am sorry for abusing the upstream issue tracker (I have not found a related bug in the CentOS mantis), but for those CentOS users ( @mattmcdowell, @canoine, @carlivar ): would you be willing to try this test build? https://people.redhat.com/lnykryn/systemd/bz1305608/

lnykryn pushed a commit to lnykryn/systemd-rhel that referenced this issue Jun 6, 2016
lnykryn pushed a commit to lnykryn/systemd-rhel that referenced this issue Jul 27, 2016
@stanhu

stanhu commented Aug 5, 2016

@poettering Thanks for your work on this. We were bitten hard by this problem on GitLab.com, which runs Ubuntu 16.04. We applied your latest patch (https://bugs.freedesktop.org/show_bug.cgi?id=95263#c13), and it seems to have made the problem go away for now. A customer also reported experiencing this issue on Red Hat Enterprise Linux 7, which runs systemd 219.

I would encourage getting these patches into stable releases ASAP:

Ubuntu bug thread: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1591411
RedHat bug thread: https://bugzilla.redhat.com/show_bug.cgi?id=1271394

@isodude

isodude commented Nov 2, 2016

Just got hit by this. What I find odd is that systemd (PID 1) takes forever to go through all of the scope files. Is it not possible to make the loop tighter, or even parallel? Right now I'm forced to remove all the files by hand instead.

openat(22, "session-79467.scope.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW|O_CLOEXEC) = 23
fstat(23, {st_mode=S_IFDIR|0755, st_size=140, ...}) = 0
fcntl(23, F_GETFL)                      = 0x38800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW)
fcntl(23, F_SETFD, FD_CLOEXEC)          = 0
getdents(23, /* 7 entries */, 32768)    = 304
getdents(23, /* 0 entries */, 32768)    = 0
close(23)                               = 0

Or are there patches that I missed that corrects this? I am running systemd-219-19.el7_2.13.
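Based on the strace above, the amount of re-scanning PID 1 does can be estimated by counting the drop-in directories it walks (a hedged sketch; the path is the one from this thread):

```shell
# Each leaked session leaves a session-*.scope.d drop-in directory that
# PID 1 re-opens and re-reads on every unit-file enumeration.
find /run/systemd/system -maxdepth 1 -name 'session-*.scope.d' -type d | wc -l
```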

@Burebista1404

Burebista1404 commented Dec 2, 2016

Hi everyone, if you need to fix this ASAP, here are the commands to clean up all your scope files and the loaded, active, abandoned sessions:

Cleanup abandoned sessions from systemd:

Delete session files

find /run/systemd/system -name "session-*.scope" -delete

Delete session directories

rm -rf /run/systemd/system/session*scope*

Remove the abandoned sessions

systemctl | grep "abandoned" | grep -e "-[[:digit:]]" | sed "s/\.scope.*/.scope/" | xargs systemctl stop

@iDemonix

@Burebista1404, your session directories command has fallen victim to Markdown italics, could you wrap the commands in quotes

so they appear like this?

@Burebista1404

@iDemonix I've modified as requested. Thanks.

@go-fish

go-fish commented Jun 19, 2017

systemctl | grep "abandoned" | grep -e "-[[:digit:]]" | sed "s/\.scope.*/.scope/" | xargs systemctl stop

This command may kill processes still running under those session scopes, so use it with care.

apovichuk-stratoscale pushed a commit to Stratoscale/systemd that referenced this issue Jun 19, 2017
apovichuk-stratoscale pushed a commit to Stratoscale/systemd that referenced this issue Jun 20, 2017
Werkov pushed a commit to Werkov/systemd that referenced this issue Jun 22, 2017
fbuihuu pushed a commit to openSUSE/systemd that referenced this issue Jul 27, 2017
…(v228)

dbus-daemon currently uses a backlog of 30 on its D-bus system bus socket. On
overloaded systems this means that only 30 connections may be queued without
dbus-daemon processing them before further connection attempts fail. Our
cgroups-agent binary so far used D-Bus for its messaging, and hitting this
limit hence may result in us losing cgroup empty messages.

This patch adds a seperate cgroup agent socket of type AF_UNIX/SOCK_DGRAM.
Since sockets of these types need no connection set up, no listen() backlog
applies. Our cgroup-agent binary will hence simply block as long as it can't
enqueue its datagram message, so that we won't lose cgroup empty messages as
likely anymore.

This also rearranges the ordering of the processing of SIGCHLD signals, service
notification messages (sd_notify()...) and the two types of cgroup
notifications (inotify for the unified hierarchy support, and agent for the
classic hierarchy support). We now always process events for these in the
following order:

  1. service notification messages  (SD_EVENT_PRIORITY_NORMAL-7)
  2. SIGCHLD signals (SD_EVENT_PRIORITY_NORMAL-6)
  3. cgroup inotify and cgroup agent (SD_EVENT_PRIORITY_NORMAL-5)

This is because receiving SIGCHLD invalidates PID information, which we need
in order to process service notification messages, which are bound to PIDs;
hence the order between the first two items. And we want to process SIGCHLD
metadata to detect that a service is gone before using cgroup notifications to
decide the same, since the former carries more useful metadata.

Related to this:
https://bugs.freedesktop.org/show_bug.cgi?id=95264
systemd/systemd#1961

(cherry picked from commit d8fdc62)

[mkoutny: resolved conflicts:
        src/cgroups-agent/cgroups-agent.c
                adjust context
        src/core/dbus.c
                replace MANAGER_IS_{USER,SYSTEM} macros
                adjust for missing 4afd334
                adjust context
        src/core/manager.c
                adjust context
                replace MANAGER_IS_SYSTEM macro]
[mkoutny: fixes bsc#1045384]
[mkoutny: fixes bsc#1047379]
@diffcity

What works for me here, on CentOS 7:

systemctl restart dbus-org.freedesktop.login1.service
systemctl restart systemd-logind.service
systemctl daemon-reload

Then wait a minute or two to see the df value change.
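Assuming the leak shows up as session-*.scope files and session-*.scope.d drop-in directories under /run/systemd/system (as in the reproduction steps above), the effect of the restarts can be checked by counting those entries before and after. A minimal sketch; the directory argument is there only so the counting can be tried on any path:

```shell
#!/bin/bash
# Count leaked session scope entries: session-*.scope files plus
# session-*.scope.d drop-in directories. Defaults to /run/systemd/system.
count_session_scopes() {
    local dir="${1:-/run/systemd/system}"
    ls -d "$dir"/session-*.scope* 2>/dev/null | wc -l
}

count_session_scopes   # compare this number before and after the restarts
```

On a healthy system the count should stay close to the number of live login sessions.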

jrandall added a commit to wtsi-hgi/hgi-systems that referenced this issue Nov 3, 2017
glennpratt added a commit to glennpratt/systemd that referenced this issue Dec 7, 2017
@joshbmarshall

@diffcity Your workaround is much cleaner than the others, and doesn't scare me nearly as much as deleting files by hand. My servers last about 30 days before /run fills up, and your commands clear it out. I will put them in cron to run daily.
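A crontab fragment for that (the file path and schedule below are hypothetical; diffcity's restart-plus-reload sequence is the part that matters):

```
# /etc/cron.d/restart-logind (hypothetical path): clear leaked scopes daily at 03:15
15 3 * * * root systemctl restart systemd-logind.service && systemctl daemon-reload
```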

n1zyy pushed a commit to n1zyy/systemd that referenced this issue Nov 14, 2018
glennpratt added a commit to acquia/systemd that referenced this issue Jan 11, 2019
caobinxin pushed a commit to caobinxin/systemd-lx that referenced this issue Oct 19, 2019
Cherry-picked from: d8fdc62037b5b0a9fd603ad5efd6b49f956f86b5
Resolves: #1305608
@h0tw1r3
Contributor

h0tw1r3 commented Feb 10, 2020

Red Hat doesn't seem to have backported a fix for this to EL7 yet.

Deployed a cron job of the following to all our CentOS 7 servers:

systemctl -t session | awk '/ abandoned /{print $1}' | xargs -r systemctl stop >/dev/null

@aleksey-ru52

aleksey-ru52 commented Feb 17, 2020

well

CentOS 7.x & RHEL 7.x
# systemctl -t session
Unknown unit type or load state 'session'.
# systemctl --version
systemd 219

I suppose they moved session control to loginctl, but I can't figure out how to use it that way.
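A hedged sketch of the loginctl route for systemd as old as 219, where `systemctl -t session` is rejected: leaked sessions typically linger in `State=closing`, which `loginctl show-session` can report. Whether terminating a closing session actually releases the leaked scope on 219 is an assumption worth verifying before deploying this; the State filter itself is plain text processing, so the loginctl calls are the only systemd dependency.

```shell
#!/bin/bash
# Succeeds when "loginctl show-session <id> -p State" output on stdin
# reports a closing (i.e. abandoned) session.
is_closing() {
    grep -qx 'State=closing'
}

# Walk all sessions and terminate the ones stuck in the closing state.
for sid in $(loginctl list-sessions --no-legend 2>/dev/null | awk '{print $1}'); do
    if loginctl show-session "$sid" -p State | is_closing; then
        loginctl terminate-session "$sid"
    fi
done
```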

@h0tw1r3
Contributor

h0tw1r3 commented Sep 29, 2020

Update to my previous hack for CentOS 7: this version only stops an abandoned scope if it has no children.

#!/bin/bash

# Print the unit's properties as shell assignments prefixed with the unit
# type (e.g. scope_ControlGroup="..."), suitable for eval.
system_show () {
    local t="${1##*.}"
    systemctl show "$1" --no-pager --no-legend --no-ask-password | sed -e 's/"/\\"/g' -e 's/\([^=]*\)=\(.*\)/'"$t"'_\1="\2"/'
}

# Stop every abandoned scope whose control group has no processes left
# (systemd-cgls prints only the cgroup header line in that case).
for unit in $(systemctl list-units --state=abandoned -t scope --no-legend --no-pager --no-ask-password | cut -d' ' -f1) ; do
    eval "$(system_show "$unit")"
    [ "$(systemd-cgls "$scope_ControlGroup" 2>/dev/null | wc -l)" -eq 1 ] && systemctl stop "$unit"
done

Xmister pushed a commit to Xmister/systemd that referenced this issue Jul 23, 2021