
DefaultTasksMax=512 causes lots of fallout #3211

Closed
1 task
martinpitt opened this issue May 7, 2016 · 10 comments

@martinpitt
Contributor

Submission type

  • [x] Bug report
  • Request for enhancement (RFE)

NOTE: Do not submit anything other than bug reports or RFEs via the issue tracker!

systemd version the issue has been seen with

≥ 228

Version 228 introduced a DefaultTasksMax=512 limit in 9ded9cd14. Now that this has been released to the world (Ubuntu 16.04 LTS) we're starting to get a lot of fallout from that. We are getting reports about failing MySQL/Percona (https://launchpad.net/bugs/1578080) and RabbitMQ (from our data center admins) even under moderate workloads (as these use a lot of threads), failures of containers (as they all hang off of lxc.service), package builds (https://bugs.debian.org/823530), etc.

In retrospect, having a default limit there was not such a good idea after all: 512 is way too much for most "simple" services, and it's way too little for others such as the ones mentioned above. There is also no particular rationale behind "512", so even if we bumped it to 1024 we'd just make the limit even less useful while still breaking software.

As a contingency plan we'll disable that default at least for the stable distro release, but I think also for devel: It is both much safer and also much more effective in terms of guarding against berserk programs/bugs/unintended fork bombs etc. to set limits in units individually. Once someone looks at one, this is then a great time to also flip on the other resource and privilege limitations that systemd offers, such as CapabilityBoundingSet=, PrivateDevices=, ProtectSystem= etc.
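
To give an idea, a drop-in along these lines is what I have in mind (just a sketch -- the unit name, the limit, and the capability set are made up for the example):

    # /etc/systemd/system/mysql.service.d/limits.conf  (hypothetical drop-in)
    [Service]
    # a limit that matches what this particular service actually needs
    TasksMax=4096
    # and, while we are at it, the other lockdowns
    CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_SETUID CAP_SETGID
    PrivateDevices=yes
    ProtectSystem=full

followed by a "systemctl daemon-reload" and a restart of the service.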

Are you set on your opinion about this default limit, i. e. should we keep this as a downstream change? Or should we revert this upstream too?

@poettering
Member

poettering commented May 7, 2016

Well, we really should set the default limit to something. I am absolutely and very sure about that, because we should place limits on all resources. Also, I think as long as the number of services where this is a problem is small (let's say < 20), then the right approach is to alter those services and be done with it.

This is in fact what we did in Fedora, where to my knowledge the issue is a thing of the past.

And I strongly encourage you to leave this in place, simply to make sure that unit files stay portable between distros.

As long as it really is just mysql, lxc, rabbitmq, then please fix those, they should set their own limits anyway.

Or to say this differently: the default of 512 is really just a default. It's not supposed to cover all services, and services really should override this individually if there's the need to.
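
For example, a service that genuinely needs a lot of threads can simply ship the directive in its own unit file (minimal sketch, daemon path and number made up):

    [Service]
    # example daemon, path purely illustrative
    ExecStart=/usr/sbin/mysqld
    # whatever the service actually needs
    TasksMax=8192

Everything that doesn't override it still gets the protection of the default.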

So yes, I am very sure we should keep the default in, and as long as the number of services is low I think all is good.

@poettering
Member

Or are these legacy sysv scripts and you cannot alter TasksMax= for them? If so, I'd be willing to set TasksMax= for them to a higher value in sysv-generator; after all, that stuff is just compat anyway.

@martinpitt
Contributor Author

Lennart Poettering [2016-05-07 1:56 -0700]:

Well, we really should set the default limit to something. I am absolutely and very sure about that, because we should place limits on all resources. Also, I think as long as the number of services where this is a problem is small (let's say < 20), then the right approach is to alter those services and be done with it.

In theory, yes. But we don't have an exhaustive list yet. Hence my suggestion (on the downstream side) to leave this on for the development series but disable it for the last stable release.

As long as it really is just mysql, lxc, rabbitmq, then please fix those, they should set their own limits anyway.

It's not just these, I'm afraid. We got reports about stuff failing from cron (as that's running from cron.service's cgroups), percona, third-party archives (which we can't fix quickly), and others.

Or to say this differently: the default of 512 is really just a default. It's not supposed to cover all services, and services really should override this individually if there's the need to.

Fair enough, but I wonder how useful that really is -- for actually limiting resources and guarding against berserk processes this limit is way too high, so in the end, for a really sensible resource lockdown, services need individual unit configuration anyway.

Or are these legacy sysv scripts and you cannot alter TasksMax= for them?

No, at least in Debian/Ubuntu most of the above have native .services. But they were written before 228 which introduced this behaviour change.

@poettering
Member

This still doesn't sound too bad, and I'd really encourage you to leave this on. You know, this feature solves a long-standing real security problem on Linux: the vulnerability to fork bombs, which create new processes exponentially. Until we enabled this feature by default, any exploited system service could be used to trivially DoS the system simply by fork-bombing it. It's pretty sad that Linux has been vulnerable to this since pretty much time began and we only gained protection against it this recently...

@martinpitt
Contributor Author

This is in fact what we did in Fedora, where to my knowledge the issue is a thing of the past.

I checked a few: http://pkgs.fedoraproject.org/cgit/rpms/mariadb.git/tree/mysql.service.in and http://pkgs.fedoraproject.org/cgit/rpms/rabbitmq-server.git/tree/rabbitmq-server.service have no TasksMax= (nor any other limit adjustment), and http://pkgs.fedoraproject.org/cgit/rpms/lxc.git/tree/ and http://pkgs.fedoraproject.org/cgit/rpms/cronie.git/tree/ do not have a patch either (systemd units are shipped upstream). So it looks like this would still affect Fedora in the same way.

I do agree that it would be nice to impose default limits to guard against fork bombs, but

  • restricting the number of tasks alone does not actually help much: 512 threads with unrestricted CPU and RAM are still enough to bring pretty much every machine to its knees,
  • any meaningful limit breaks existing services which aren't aware of this (remember, the default limit was only introduced less than 6 months ago)

TBH I don't see a way to introduce limits globally that are both meaningful and safe; it needs touching the actual units. :-(

@poettering
Member

I posted #3753 now, which considerably bumps TasksMax= for services. With that PR the new limit would be at 15% of the system-wide kernel default limit (which is 32K, hence effectively 4915 is the new default, i.e. roughly a 9x increase from 512).
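
For reference, the arithmetic behind those numbers, assuming the usual kernel default of pid_max = 32768:

    0.15 × 32768 = 4915.2, rounded down to 4915 tasks per service
    4915 / 512 ≈ 9.6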

@drewthaler

Having recently been a victim of (and debugged) this issue, which was affecting new Ubuntu 16.04 machines added to our build farm, I'd say the main problem is not so much the default limit as the absolute invisibility of it.

We hit the limit under sshd.service, since practically everything we do is under sshd.service for a headless box. It was clear that some limit was being enforced, because we would get "No more processes" on tty, but the reason why it was happening was a mystery. All the traditional indicators like ulimit, sysctl, and various /proc entries showed that we should have been able to create lots of processes. syslog did not show anything. Even once we suspected systemd and looked at journalctl, that didn't show anything.

A simple syslog pointing us in the right direction would have gone a long way.

@poettering
Member

@drewthaler ssh in the general case actually should not be affected, as every successful ssh login runs in its own process that is part of the session scope unit, not the sshd service unit.

Note that "systemctl status sshd" shows you the number of tasks currently in the service plus the enforced limit.

We unfortunately don't get any event from the kernel when the limit is hit. It would be good if we did, as we could then log this...

@drewthaler

drewthaler commented Jul 27, 2016

@poettering Is the switch to a session scope unit a new behavior since version 229? In 229, which comes with stock Ubuntu 16.04, all processes under sshd are definitely counted toward the 512 limit. We could hit the limit both with a lot of individual ssh connections (which is how we first encountered it), and with a controlled fork bomb under a single connection (our eventual test case while debugging).

Setting TasksMax=infinity in sshd.service fixed both cases.
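
For reference, a drop-in along these lines is all it took (sketch; on some distros the unit is called ssh.service instead):

    # /etc/systemd/system/sshd.service.d/tasksmax.conf
    [Service]
    TasksMax=infinity

plus a "systemctl daemon-reload" and a restart of sshd.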

@drewthaler

@poettering Ah-ha. Your comment made me dig further -- we had disabled PAM in ssh (for speed, we do a lot of small connections), which seems to mean that the authenticated connections didn't get put into a session scope unit. That's why they were all under sshd.service rather than in their own unit. So it is specific to our configuration after all.

Even with PAM enabled we were still hitting this limit, though, as our farm machines can get hit by a lot of incoming ssh connections at times (like the start of a full rebuild) and auth isn't instantaneous. I'm not too bothered by the limit as long as it is configurable -- it would be nice to make it more discoverable, though, since this introduces a new and mysterious reason for fork to fail.

jprvita pushed a commit to endlessm/systemd that referenced this issue Mar 5, 2018
… set it to 512"

This reverts commit 9ded9cd.

Introducing a default limit on number of threads broke a lot of software which
regularly needs more, such as MySQL and RabbitMQ, or services that spawn off an
indefinite number of subtasks that are not in a scope, like LXC or cron.

15% is way too much for most "simple" services, and it's too little for others
such as the ones mentioned above. There is also no particular rationale about
any particular global limit, so even if we'd bump it higher we'd just make the
limit even less useful while still breaking software.

It is both much safer and also much more effective in terms of guarding against
berserk programs/bugs/unintended fork bombs etc. to set limits in units
individually. Once someone looks at one, this is then a great time to also flip
on the other resource and privilege limitations that systemd offers.

Bug: systemd/systemd#3211
Bug-Debian: https://bugs.debian.org/823530
Bug-Ubuntu: https://launchpad.net/bugs/1578080
markan added a commit to chef-cookbooks/enterprise-chef-common that referenced this issue May 9, 2018
Systemd 228 reduced the max number of tasks allowed to 512, which when
exceeded causes solr to fail with the somewhat opaque message
'java.lang.OutOfMemoryError: unable to create new native
thread'. Since the runit process group encompasses all of the chef
services, that isn't too hard to hit.

SLES-12 and Ubuntu 16.04 appear to include this version of Systemd. In
particular, our SLES-12 tester was hitting this during pedant runs,
and all the search tests were imploding in our face.

Thanks to @nsdavidson for the detective work that got to the root cause.

References:
The helpful post that got us on the right track:
https://www.elastic.co/blog/we-are-out-of-memory-systemd-process-limits

The systemd issue created:
systemd/systemd#3211

Documentation on the TasksMax setting:
man systemd.resource-control
wmfgerrit pushed a commit to wikimedia/operations-puppet that referenced this issue Nov 2, 2018
For both varnish frontends and backends, ensure that no limit on the
number of active tasks is enforced. Without this, systemd limits the
number of tasks per unit to 15% of kernel.pid_max because of
DefaultTasksMax. See systemd/systemd#3211

Bug: T208574
Change-Id: I266cbce20c02a091a06b69eeb654ff8076eb5bd1