A non-privileged user could (easily) DoS systemd by exceeding BUS_WQUEUE_MAX #13674
systemd version the issue has been seen with
This is the version that ships with CentOS Linux release 7.7.1908 (Core)
Expected behaviour you didn't see
Unexpected behaviour you saw
Steps to reproduce the problem
We see these errors after running this find command. busctl times out during this period:
I see that the Wqueue and RQueue statically allocated buffers have been increased due to similar complaints, but I am skeptical about any solution that doesn't use something more dynamic.
This is particularly dangerous since a non-root user can prevent root crontabs from running and can prevent systemd from executing a command.
We have to put limits on everything we allocate. Note that the queue lengths are just hard limits for dynamic allocation, and the actual queues are always allocated dynamically only slightly larger than what we need at the specific moment.
Thus: these aren't limits you are supposed to ever hit, and if you do anyway, then other bad things have happened much earlier. They are safety nets, nothing more.
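The allocation model described above (queues grown dynamically on demand, with the constant acting only as a hard safety-net cap) can be sketched roughly as follows. This is an illustrative sketch, not systemd's actual code; `queue_push`, `QUEUE_MAX`, and `struct queue` are invented names for the purpose of the example:

```c
#include <errno.h>
#include <stdlib.h>

#define QUEUE_MAX 1024  /* hard safety-net limit, never meant to be hit */

struct queue {
        long *items;
        size_t n_items;   /* entries currently queued */
        size_t n_alloc;   /* entries the buffer can currently hold */
};

/* Grow the buffer geometrically as needed, but refuse to queue more
 * than QUEUE_MAX entries, so a runaway client fails with a clear error
 * instead of consuming memory without bound. */
static int queue_push(struct queue *q, long item) {
        if (q->n_items >= QUEUE_MAX)
                return -ENOBUFS;

        if (q->n_items >= q->n_alloc) {
                size_t new_alloc = q->n_alloc > 0 ? q->n_alloc * 2 : 8;
                if (new_alloc > QUEUE_MAX)
                        new_alloc = QUEUE_MAX;

                long *p = realloc(q->items, new_alloc * sizeof(long));
                if (!p)
                        return -ENOMEM;

                q->items = p;
                q->n_alloc = new_alloc;
        }

        q->items[q->n_items++] = item;
        return 0;
}
```

The point of this shape is that memory usage tracks the actual queue depth at any moment, while the cap bounds the worst case and turns unbounded growth into an explicit, reportable error.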
They have been bumped a couple of times, most recently in 83a32ea.
You are running a very old version, maybe just upgrade to something newer? (or ask your distro to backport the bumping).
Or to summarize this in simpler terms: if you manage to hit the limit IRL, then the limit needs to be bumped further for everyone, but not removed entirely, because that means run-away client code can consume resources without bounds, and we should never allow that. Failing at some point with a clear error is much better than failing too late when errors can't be reported anymore.
Anyway, let's close this for now, since the limit was bumped again recently and your version is four years old. Let's presume this is fixed.
If you run into the same issue with a current version of systemd, please report back and we can bump the limit further.
@poettering Is it just me, or is Red Hat / CentOS not really keeping very current with systemd back-porting? I am using CentOS 7.8, which is reasonably current (for enterprise) and ships with systemd-219-67.el7_7.2.x86_64, but below are the values we have set for this issue (many orders of magnitude less than current)
In addition, we also have the systemd-coredump bug where, if we hit E2BIG because a dump exceeds ProcessSizeMax, it fails to drop privileges when writing to the journal. Non-root users then get false-negative results when running coredumpctl on dumps that they own but that were too big to store; they should still have access to those journal entries, but don't due to this bug.
@poettering I understand there has to be a ceiling, but could we get this ceiling user-configurable, e.g. through a sysctl, rather than/in addition to baking something into the code? We'd find it very useful to be able to fix this issue ourselves without having to roll a new version of systemd.
@poettering I've been investigating further to try to understand how mounting a couple of thousand NFS mounts can trigger 1.3 million+ dbus messages and subsequent buffer space issues.
It turns out (news to me) that systemd is watching for changes to /proc/self/mountinfo and whenever there is a change in mounts, the entire mountinfo table is re-communicated from systemd to systemd-logind over dbus messages.
When you add 2,000 mount-points, for each mount-point that is added the entire table is sent again (4 property change messages per mount), and of course the table gets incrementally larger each time. I counted over 1.3 million dbus messages via a stap probe as a result.
Under normal circumstances, systemd-logind manages to keep the queue drained, but sometimes - if it is under pressure from other system events - it cannot keep up and systemd chokes. There is an exception in mount_setup_unit() whereby fstype of "autofs" will not produce these messages, but no such exception for mount-points added by /usr/sbin/automount where fstype is "nfs".
Firstly, I don't really understand why systemd needs to watch /proc/self/mountinfo and set up mount units for any mount that appears. It would be nice to read an explanation of the benefits somewhere, and maybe to have an option to disable this behaviour if the user doesn't need it. If it is enabled, improving the logic so that the entire mount table is not re-communicated for each new mount would be a useful optimisation.
Secondly, I was speaking with Ian Kent to try to determine if there's something that could be done in /usr/sbin/automount to make systemd ignore mount-points managed by it, but at the moment he doesn't know of any way to do that. In fact he admits he was previously unaware that systemd was setting up mount units for itself. Can we get some sort of change into systemd that would allow the automount daemon to communicate that it is responsible for the mount-points and that systemd should ignore them?
@jsmarkb which systemd version are you using?
we track mounts as "mount units", so that people can have deps on them. it's kinda at the core of what systemd does: starting up in the right order, i.e. making sure that everything X needs is done before X is started, and that prominently includes mounts. If you don't want this behaviour you don't want systemd really.
systemd-logind doesn't track mounts though, it's PID 1 only.
we generally generate bus messages for mounts coming and going and changing state. If a mount doesn't change state we should not generate events for it. If we do anyway, it would be a bug. So yes, if you start n mount points you should get O(n) messages.
If a mount is added and removed dynamically by the inlined Linux automounter, then systemd won't be able to build dependencies on it, so the subsequent mount unit that is created serves no purpose as far as I can see?
"If you don't want this behaviour you don't want systemd really". Not really much choice these days :-)
I've watched systemd send dbus messages to systemd-logind for each and every new mount. I was puzzled as to why systemd-logind was involved, but it definitely was.
Right, events ought not be generated for mounts that don't change state, but that's not what I'm witnessing here.
@poettering - I'm not sure this issue ought to be closed.
If the fix was increasing the queue limits, that is a tactical mitigation rather than a strategic one.
If #10268 is a partial fix, as I pointed out 8 days ago that commit was reverted in December 2018. So effectively we're left with a closed issue and no strategic fix.
I would suggest the solution is two-fold:
What do you think? Can we reopen this issue and start prioritising the possible solutions?
I've written a blog post now that describes how this issue was discovered and why auto-mounting thousands of directories in quick succession can result in millions of dbus messages.
The whole reason I blogged about it was because I didn't think that the problem was fully understood or fully resolved when this issue was closed.
@poettering - we still have no long-term fix for this issue. Would you like me to open a new issue or can this one be re-opened?