docs: Amend documentation for LimitNPROC= #23242

Merged (1 commit) on May 5, 2022

jakoblell (Contributor)

PR details

The configuration item LimitNPROC= often does not do what users expect it to do (see details below). Since systemd already provides a better way to limit the number of processes launched by a service (TasksMax=, which is based on cgroups rather than on rlimits), no code changes are needed. This PR amends the documentation to recommend TasksMax= instead of LimitNPROC=.

Problem background/additional information

The LimitNPROC= setting in a service definition is translated directly into a corresponding prlimit call that sets RLIMIT_NPROC to the configured value. However, RLIMIT_NPROC limits the number of processes belonging to a given real user ID (globally, across the whole system), not the number of processes that the started service can create (fork). Note also that when the service runs as root, the kernel does not enforce RLIMIT_NPROC at all. As a result, LimitNPROC= is only useful when the service has a dedicated UID and switches to that UID after startup. In all other cases (the service runs as root, or under an unprivileged default account such as nobody that is shared among multiple services), LimitNPROC= does not work as expected.

Additional issues when running in an LXC container

When running under LXC, the kernel enforces RLIMIT_NPROC even when the service runs as root (instead of ignoring it, as it would on a non-virtualized system). This causes some service configurations to fail under LXC, most notably OpenVPN (see, for example, #6011; many more bug reports have been filed about running OpenVPN in LXC).

Summary of cases in which LimitNPROC= will misbehave:

  • Root service without LXC: The LimitNPROC= limit is silently ignored since root is not affected by RLIMIT_NPROC.
  • Root service in LXC container: The LimitNPROC= limit will count all processes running as root in the container and may already be exhausted at the time the service is starting.
  • Service dropping privileges to a shared unprivileged account (e.g. nobody on many Linux distributions): All processes using the same shared account count toward the LimitNPROC= limit.

How to reproduce the issue with a simple test service:

/root/limitnproc.c:

#include <syslog.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

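int main(void){
  syslog(LOG_ERR, "limitnproc testservice starting");
  /* Fork 10 children that overlap in time; with LimitNPROC=3 acting as a
     per-service limit, most of these forks would be expected to fail. */
  for(int i=0;i<10;i++){
    errno = 0;
    pid_t pid = fork();
    if(pid < 0){
      /* fork() failed, e.g. with EAGAIN when RLIMIT_NPROC is exhausted */
      syslog(LOG_ERR, "limitnproc: fork %d failed with errno=%d: %s", i, errno, strerror(errno));
    } else if(pid == 0){
      /* child: stay alive for a while so that all children run in parallel */
      syslog(LOG_ERR, "limitnproc: child %d", i);
      sleep(10);
      exit(0);
    } else{
      syslog(LOG_ERR, "limitnproc: parent %d", i);
    }
  }
  syslog(LOG_ERR, "limitnproc: all done, sleeping to keep this program alive");
  sleep(3600);
  return 0;
}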

/etc/systemd/system/limitnproc.service:

[Unit]
Description=limitnproc test service

[Service]
Type=simple
LimitNPROC=3
ExecStart=/root/limitnproc

[Install]
WantedBy=multi-user.target

Then compile it (cc -o /root/limitnproc /root/limitnproc.c) and start it with systemctl daemon-reload && systemctl restart limitnproc. On a non-virtualized system, it launches all 10 child processes in parallel despite the LimitNPROC=3 limit, as the following syslog output shows:

May  1 18:51:52 dummy limitnproc: limitnproc: parent 0
May  1 18:51:52 dummy limitnproc: limitnproc: parent 1
May  1 18:51:52 dummy limitnproc: limitnproc: child 0
May  1 18:51:52 dummy limitnproc: limitnproc: parent 2
May  1 18:51:52 dummy limitnproc: limitnproc: parent 3
May  1 18:51:52 dummy limitnproc: limitnproc: child 2
May  1 18:51:52 dummy limitnproc: limitnproc: child 3
May  1 18:51:52 dummy limitnproc: limitnproc: parent 4
May  1 18:51:52 dummy limitnproc: limitnproc: parent 5
May  1 18:51:52 dummy limitnproc: limitnproc: child 4
May  1 18:51:52 dummy limitnproc: limitnproc: parent 6
May  1 18:51:52 dummy limitnproc: limitnproc: parent 7
May  1 18:51:52 dummy limitnproc: limitnproc: parent 8
May  1 18:51:52 dummy limitnproc: limitnproc: child 5
May  1 18:51:52 dummy limitnproc: limitnproc: child 8
May  1 18:51:52 dummy limitnproc: limitnproc: parent 9
May  1 18:51:52 dummy limitnproc: limitnproc: all done, sleeping to keep this program alive
May  1 18:51:52 dummy limitnproc: limitnproc: child 9
May  1 18:51:52 dummy limitnproc: limitnproc: child 6
May  1 18:51:52 dummy limitnproc: limitnproc: child 1
May  1 18:51:52 dummy limitnproc: limitnproc: child 7

When the same service runs in an LXC container, it fails to start any child processes at all, because the limit of 3 processes owned by root is already exhausted by other processes on the system (under LXC the limit is enforced even for root):

May  1 18:56:46 dummy limitnproc: limitnproc: fork 0 failed with errno=11: Resource temporarily unavailable
May  1 18:56:46 dummy limitnproc: limitnproc: fork 1 failed with errno=11: Resource temporarily unavailable
May  1 18:56:46 dummy limitnproc: limitnproc: fork 2 failed with errno=11: Resource temporarily unavailable
May  1 18:56:46 dummy limitnproc: limitnproc: fork 3 failed with errno=11: Resource temporarily unavailable
May  1 18:56:46 dummy limitnproc: limitnproc: fork 4 failed with errno=11: Resource temporarily unavailable
May  1 18:56:46 dummy limitnproc: limitnproc: fork 5 failed with errno=11: Resource temporarily unavailable
May  1 18:56:46 dummy limitnproc: limitnproc: fork 6 failed with errno=11: Resource temporarily unavailable
May  1 18:56:46 dummy limitnproc: limitnproc: fork 7 failed with errno=11: Resource temporarily unavailable
May  1 18:56:46 dummy limitnproc: limitnproc: fork 8 failed with errno=11: Resource temporarily unavailable
May  1 18:56:46 dummy limitnproc: limitnproc: fork 9 failed with errno=11: Resource temporarily unavailable
May  1 18:56:46 dummy limitnproc: limitnproc: all done, sleeping to keep this program alive

enforced if the service is running as root (and not dropping privileges). To overcome these limitations and
actually limit the number of processes launched by a service, <varname>TasksMax=</varname> (see
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>)
should be used instead of <varname>LimitNPROC=</varname>.</para>

Member

I think adding this makes sense, but I'd tone it down a bit, i.e. not say "should be used", but something like "is typically a better choice".

Because we never know what people use RLIMIT_NPROC for. For example, since it remains set across setuid(), people might want to set it so that it still applies after a daemon changes UIDs later on.

Hence, recommending TasksMax= is good, but given how little we know about people's use cases, it should be a recommendation, nothing more.

Contributor Author

Thank you for your feedback, I've changed the wording accordingly (and replaced the existing commit as recommended in docs/CONTRIBUTING.md).

Regarding daemons calling setuid(): the TasksMax= limit still applies afterwards, because setuid() does not move the process out of its cgroup.

poettering added the reviewed/needs-rework 🔨 label May 3, 2022
yuwata added the please-review label and removed the reviewed/needs-rework 🔨 label May 3, 2022
keszybz merged commit 14736ab into systemd:main May 5, 2022

4 participants