Isolation: added support for per-application cgroups. #734

ac000 · 2022-07-18T21:42:02Z

Firstly, this is not to be confused with CLONE_NEWCGROUP which unit
already supports and is related to namespaces. To re-cap, namespaces
allow processes to have different views of various parts of the system
such as filesystem mounts, networking, hostname etc.

Whereas cgroup0 is a Linux kernel facility for collecting a bunch of
processes together to perform some task on the group as a whole, for
example to implement resource limits.

There are two parts to cgroup, the core part of organising processes
into a hierarchy and the controllers which are responsible for enforcing
resource limits etc.

There are currently two versions of the cgroup sub-system, the original
cgroup and a version 21 introduced in 3.16 (August 2014) and marked
stable in 4.5 (March 2016).

This commit supports the cgroup V2 API and implements the ability to
place applications into their own cgroup on a per-application basis.
You can put them each into their cgroup or you can group some together.
The ability to set resource limits can easily be added in future.

The initial use case of this would be to aid in observability of unit
applications which becomes much easier if you can just monitor them on a
per cgroup basis.

One thing to note about cgroup, is that unlike namespaces which are
controlled via system calls such as clone(2) and unshare(2), cgroups are
setup and controlled through the cgroupfs pseudo-filesystem.

cgroup is Linux only and this support will only be enabled if configure
finds the cgroup2 filesystem mount, e.g

cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)

To make use of this in unit a new "cgroup" section has been added to the
isolation configuration.

e.g

  "applications": {
      "python": {
          "type": "python",
          "processes": 5,
          "path": "/opt/unit/unit-cgroup-test/",
          "module": "app",

          "isolation": {
              "cgroup": {
                  "name": "unit/python"
              }
          }
      }
  }

Will place the "python" application into its own cgroup under
CGROUP_ROOT/unit/python

If you know of an already existing cgroup where you's like to place it,
you can, e.g

"name": "system.slice/unit/python"

Where system.slice has already been created by systemd and may already
have some overall system limits applied which would also apply to unit.
Limits apply down the hierarchy and lower groups can't exceed the
previous group limits.

So what does this actually look like? Lets take the unit-calculator
application2 and have each of its applications placed into their own
cgroup. If we give each application a new section like

  "isolation": {
      "cgroup": {
          "name": "unit/unit-calculator/add"
      }
  }

changing the name for each one, we can visualise the result with the
systemd-cgls command, e.g

  │   └─session-5.scope (#4561)
  │     ├─  6667 sshd: andrew [priv]
  │     ├─  6684 sshd: andrew@pts/0
  │     ├─  6685 -bash
  │     ├─ 12632 unit: main v1.28.0 [/opt/unit/sbin/unitd --control 127.0.0.1:808>
  │     ├─ 12634 unit: controller
  │     ├─ 12635 unit: router
  │     ├─ 13550 systemd-cgls
  │     └─ 13551 less
  ├─unit (#4759)
  │ └─unit-calculator (#5037)
  │   ├─subtract (#5069)
  │   │ ├─ 12650 unit: "subtract" prototype
  │   │ └─ 12651 unit: "subtract" application
  │   ├─multiply (#5085)
  │   │ ├─ 12653 unit: "multiply" prototype
  │   │ └─ 12654 unit: "multiply" application
  │   ├─divide (#5101)
  │   │ ├─ 12671 unit: "divide" prototype
  │   │ └─ 12672 node divide.js
  │   ├─sqroot (#5117)
  │   │ ├─ 12679 unit: "sqroot" prototype
  │   │ └─ 12680 /home/andrew/src/unit-calculator/sqroot/sqroot
  │   └─add (#5053)
  │     ├─ 12648 unit: "add" prototype
  │     └─ 12649 unit: "add" application

We can see that the main unit processes are in the same cgroup as the
shell from where they were started, by default child process are placed
into the same cgroup as the parent.

Then we can see that each application has been placed into its own
cgroup.

The cgroups are removed on shutdown or as required on reconfiguration.

ac000 · 2022-07-18T21:44:12Z

One open question is how to handle failures to create or add processes to cgroups, I see three options.

a) Ignore failures and carry on
b) Fail hard
c) Carry on but log failures

I'm leaning towards c.

alejandro-colomar · 2022-07-18T22:39:18Z

If a client can trigger the failure, we shouldn't fail, but I don't expect that to be possible. This is a [re]configuration-time failure, right? I'd just fail hard (and that's what we do in most cases, AFAIK) (by fail hard I mean don't allow the reconfiguration and keep the old one).

src/nxt_cgroup.c

auto/cgroup

src/nxt_cgroup.c

src/nxt_cgroup.h

alejandro-colomar · 2022-07-18T23:11:44Z

If a client can trigger the failure, we shouldn't fail, but I don't expect that to be possible. This is a [re]configuration-time failure, right? I'd just fail hard (and that's what we do in most cases, AFAIK).

but do you have a diff to see what would mean fail hard vs log & continue?

ac000 · 2022-07-18T23:47:31Z

On Mon, 18 Jul 2022 15:39:29 -0700 Alejandro Colomar ***@***.***> wrote: If a client can trigger the failure, we shouldn't fail, but I don't expect that to be possible. This is a [re]configuration-time failure, right? I'd just fail hard (and that's what we do in most cases, AFAIK).

Yes, start up and reconfigure is when cgroups could be created/populated. Actually I haven't noticed if there is any process monitoring code, so say if an application process dies, unit would re-create it... then it would become a runtime failure, but as I say I haven't noticed that there is any such code.

ac000 · 2022-07-19T00:03:30Z

On Mon, 18 Jul 2022 15:49:00 -0700 Alejandro Colomar ***@***.***> wrote: @alejandro-colomar commented on this pull request. > +static nxt_int_t +nxt_cgroup_create(const nxt_task_t *task, nxt_process_t *process) +{ + char cgroot[PATH_MAX]; + char cgpath[PATH_MAX]; + + if (task->thread->runtime->type != NXT_PROCESS_MAIN + || nxt_process_type(process) != NXT_PROCESS_PROTOTYPE) { + return NXT_OK; + } + + if (process->isolation.cgroup.name == NULL) { + return NXT_OK; + } + + nxt_make_cgpaths(process, cgpath, sizeof(cgpath), cgroot, sizeof(cgroot)); I have a longstanding concern about this kind of sizeof usage:

You know, I might just get rid of that function, I wasn't particularly happy with it myself and only the cleanup function actually needs both cgpath and cgroot.

ac000 · 2022-07-21T00:14:56Z

This updated commit should address all the above feedback (except the error handling).

Notable changes are

The removal of nxt_make_cgpaths(), it was introduced for a problem that no longer exists.
The auto/cgroup script has been re-worked to make it POSIX compliant.

src/nxt_cgroup.c

src/nxt_process.c

src/nxt_cgroup.c

src/nxt_cgroup.h

src/nxt_conf_validation.c

src/nxt_process.c

ac000 · 2022-08-13T15:46:04Z

Apart from most the above, the biggest change here is to unconditionally include the cgroup_cleanup member in nxt_process_isolation_t, that way we can get rid of the #if/#endif from nxt_main_process_cleanup().

src/nxt_cgroup.c

ac000 · 2022-12-09T14:15:15Z

Rebased with master

ac000 · 2022-12-09T14:41:46Z

Changes :-

PATH_MAX -> NXT_MAX_PATH_LEN
Better termination of the prototype processes in case of error

alejandro-colomar · 2022-12-09T17:48:11Z

What do you think about this simplification? Since we know truncation is not possible, it makes sense to simply not check for it. Let's reconsider; we almost shipped a bug that was in code supposedly preventing bugs... Simple code has less bugs. And given that snprintf(3) is so misdesigned that it's hard to use it correctly, I prefer going for stpcpy(3).

However, if we apply this patch, I'd write a clear note in the commit message that truncation is not possible, and detail that we already check for it in the config validation code (and BTW, it makes sense to me to validate only once).

Cheers,
Alex

diff --git a/src/nxt_cgroup.c b/src/nxt_cgroup.c
index 20497a45..0d1ec4f0 100644
--- a/src/nxt_cgroup.c
+++ b/src/nxt_cgroup.c
@@ -8,9 +8,9 @@
 #include <nxt_cgroup.h>
 
 
-static int nxt_mk_cgpath_relative(nxt_task_t *task, const char *dir,
+static char *nxt_mk_cgpath_relative(nxt_task_t *task, const char *dir,
     char *cgpath);
-static nxt_int_t nxt_mk_cgpath(nxt_task_t *task, const char *dir,
+static char *nxt_mk_cgpath(nxt_task_t *task, const char *dir,
     char *cgpath);
 
 
@@ -18,6 +18,7 @@ nxt_int_t
 nxt_cgroup_proc_add(nxt_task_t *task, nxt_process_t *process)
 {
     int        len;
+    char       *p;
     char       cgprocs[PATH_MAX];
     FILE       *fp;
     nxt_int_t  ret;
@@ -29,8 +30,8 @@ nxt_cgroup_proc_add(nxt_task_t *task, nxt_process_t *process)
         return NXT_OK;
     }
 
-    ret = nxt_mk_cgpath(task, process->isolation.cgroup.path, cgprocs);
-    if (nxt_slow_path(ret == NXT_ERROR)) {
+    p = nxt_mk_cgpath(task, process->isolation.cgroup.path, cgprocs);
+    if (nxt_slow_path(p == NULL)) {
         return NXT_ERROR;
     }
 
@@ -39,13 +40,7 @@ nxt_cgroup_proc_add(nxt_task_t *task, nxt_process_t *process)
         return NXT_ERROR;
     }
 
-    len = strlen(cgprocs);
-
-    len = snprintf(cgprocs + len, PATH_MAX - len, "/cgroup.procs");
-    if (nxt_slow_path(len >= PATH_MAX - len)) {
-        nxt_errno = ENAMETOOLONG;
-        return NXT_ERROR;
-    }
+    stpcpy(p, "/cgroup.procs");
 
     fp = nxt_file_fopen(task, cgprocs, "we");
     if (nxt_slow_path(fp == NULL)) {
@@ -67,17 +62,16 @@ nxt_cgroup_proc_add(nxt_task_t *task, nxt_process_t *process)
 void
 nxt_cgroup_cleanup(nxt_task_t *task, const nxt_process_t *process)
 {
-    char       *ptr;
-    char       cgroot[PATH_MAX], cgpath[PATH_MAX];
-    nxt_int_t  ret;
+    char  *ptr, *ret;
+    char  cgroot[PATH_MAX], cgpath[PATH_MAX];
 
     ret = nxt_mk_cgpath(task, "", cgroot);
-    if (nxt_slow_path(ret == NXT_ERROR)) {
+    if (nxt_slow_path(ret == NULL)) {
         return;
     }
 
     ret = nxt_mk_cgpath(task, process->isolation.cgroup.path, cgpath);
-    if (nxt_slow_path(ret == NXT_ERROR)) {
+    if (nxt_slow_path(ret == NULL)) {
         return;
     }
 
@@ -89,11 +83,11 @@ nxt_cgroup_cleanup(nxt_task_t *task, const nxt_process_t *process)
 }
 
 
-static int
+static char *
 nxt_mk_cgpath_relative(nxt_task_t *task, const char *dir, char *cgpath)
 {
-    int         i, len;
-    char        *buf, *ptr;
+    int         i;
+    char        *buf, *ptr, *p;
     FILE        *fp;
     size_t      size;
     ssize_t     nread;
@@ -101,10 +95,10 @@ nxt_mk_cgpath_relative(nxt_task_t *task, const char *dir, char *cgpath)
 
     fp = nxt_file_fopen(task, "/proc/self/cgroup", "re");
     if (nxt_slow_path(fp == NULL)) {
-        return -1;
+        return NULL;
     }
 
-    len = -1;
+    p = NULL;
     buf = NULL;
     found = 0;
     while ((nread = getline(&buf, &size, fp)) != -1) {
@@ -133,21 +127,22 @@ nxt_mk_cgpath_relative(nxt_task_t *task, const char *dir, char *cgpath)
         ptr++;
     }
 
-    len = snprintf(cgpath, PATH_MAX, NXT_CGROUP_ROOT "%s/%s", ptr, dir);
+    p = stpcpy(cgpath, NXT_CGROUP_ROOT);
+    p = stpcpy(p, ptr);
+    *p++ = '/';
+    p = stpcpy(p, dir);
 
 out_free_buf:
 
     nxt_free(buf);
 
-    return len;
+    return p;
 }
 
 
-static nxt_int_t
+static char *
 nxt_mk_cgpath(nxt_task_t *task, const char *dir, char *cgpath)
 {
-    int  len;
-
     /*
      * If the path from the config is relative, we need to make
      * the cgroup path include the main unit processes cgroup. I.e
@@ -155,19 +150,8 @@ nxt_mk_cgpath(nxt_task_t *task, const char *dir, char *cgpath)
      *   NXT_CGROUP_ROOT/<main process cgroup>/<cgroup path>
      */
     if (dir[0] != '/') {
-        len = nxt_mk_cgpath_relative(task, dir, cgpath);
+        return nxt_mk_cgpath_relative(task, dir, cgpath);
     } else {
-        len = snprintf(cgpath, PATH_MAX, NXT_CGROUP_ROOT "%s", dir);
+        return stpcpy(stpcpy(cgpath, NXT_CGROUP_ROOT), dir);
     }
-
-    if (len == -1) {
-        return NXT_ERROR;
-    }
-
-    if (len >= PATH_MAX) {
-        nxt_errno = ENAMETOOLONG;
-        return NXT_ERROR;
-    }
-
-    return NXT_OK;
 }

$ git diff --stat
 src/nxt_cgroup.c | 62 +++++++++++++++++++++++---------------------------------------
 1 file changed, 23 insertions(+), 39 deletions(-)

src/nxt_conf_validation.c

ac000 · 2022-12-09T21:10:18Z

While we do check at config time (you could argue it's superfluous, but for the absolute path case it's a quick sanity check before things go too far) for the absolute path case, we don't for the relative path case, so truncation checks are still required.

I think it's best to leave it as is at this late stage.

alejandro-colomar · 2022-12-09T21:12:03Z

While we do check at config time (you could argue it's superfluous, but for the absolute path case it's a quick sanity check before things go too far) for the absolute path case, we don't for the relative path case, so truncation checks are still required.

I think it's best to leave it as is at this late stage.

Yes, I have to agree. We can merge the patch as is, which looks good to me (ugly but good), and then improve it afterwards.

Cheers,

Alex

hongzhidao · 2022-12-09T23:48:37Z

Hi,

If we don't test here, that's fine, but we'll need to know that we need to truncate later. So, my previous simplification proposal would be invalid, and we'd need to add stpecpy() at some point. We can do that after the next release.

Yes, I have to agree. We can merge the patch as is, which looks good to me (ugly but good), and then improve it afterwards.

I suggest we finish the feature one time.

The feature is clear now and the same for its logic.
I believe the 1.30 version is more challenging, if a feature is released, we should move on and focus on something new.
It seems a bit strange to do the same feature again, I understand maybe it's an improvement, but it's better to finish it together.

ac000 · 2022-12-10T03:11:54Z

The initial cgroups support is done. There is no 1.30 version, extra features will likely be added but there is no time frame.

As agreed this will be committed imminently. If you waited until things were perfect you'd never ship anything!

alejandro-colomar · 2022-12-10T15:29:32Z

@hongzhidao

My improvement is related to the bug that I reported last week about buffer overruns. I just found another case of a buffer overrun in another part of Unit. Since this case from Andrew was safe (for some particular meaning of safe) so far, I'm fine with merging it, and fixing the bigger issue separately in 1.30. The only remaining issue before releasing 1.29 is the isolation bug that I fixed, so I prefer focusing on that. I'll soon post my tests, and we can maybe develop some pytests for that.

hongzhidao · 2022-12-10T15:35:37Z

My improvement is related to the bug that I reported last week about buffer overruns.

There are no issues as we discussed, maybe we can remove some unnecessary checks.
Do you mean there are some other issues except those reported last week?

alejandro-colomar · 2022-12-10T15:37:40Z

I mean I found another issue in a different part of Unit. I'll share, since I found it's not exploitable remotely. To reproduce, you need to control the app name.

hongzhidao · 2022-12-10T15:39:25Z

Ok, welcome to create individual issues :)

alejandro-colomar reviewed Jul 18, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Jul 18, 2022

View reviewed changes

auto/cgroup Outdated Show resolved Hide resolved

alejandro-colomar reviewed Jul 18, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Jul 18, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Jul 18, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Jul 18, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Jul 18, 2022

View reviewed changes

src/nxt_cgroup.h Outdated Show resolved Hide resolved

alejandro-colomar reviewed Jul 18, 2022

View reviewed changes

src/nxt_cgroup.h Outdated Show resolved Hide resolved

ac000 force-pushed the cgroup branch from bd7fc18 to db683f2 Compare July 21, 2022 00:11

ac000 force-pushed the cgroup branch from db683f2 to 2b04558 Compare July 21, 2022 10:38

ac000 mentioned this pull request Aug 3, 2022

cgroupv2: run processes in user-slice #532

Closed

i4ki reviewed Aug 4, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

i4ki reviewed Aug 4, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

i4ki reviewed Aug 4, 2022

View reviewed changes

src/nxt_process.c Outdated Show resolved Hide resolved

ac000 force-pushed the cgroup branch from 2b04558 to f16358b Compare August 11, 2022 22:02

alejandro-colomar reviewed Aug 12, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Aug 12, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Aug 12, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Aug 12, 2022

View reviewed changes

src/nxt_cgroup.h Outdated Show resolved Hide resolved

alejandro-colomar reviewed Aug 12, 2022

View reviewed changes

src/nxt_conf_validation.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Aug 12, 2022

View reviewed changes

src/nxt_conf_validation.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Aug 12, 2022

View reviewed changes

src/nxt_process.c Outdated Show resolved Hide resolved

ac000 force-pushed the cgroup branch from f16358b to 92cf5cc Compare August 13, 2022 15:36

ac000 requested a review from hongzhidao November 8, 2022 12:03

ac000 force-pushed the cgroup branch from 317f1ca to 66d4b59 Compare December 9, 2022 00:20

alejandro-colomar reviewed Dec 9, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Dec 9, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Dec 9, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Dec 9, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Dec 9, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

alejandro-colomar reviewed Dec 9, 2022

View reviewed changes

src/nxt_cgroup.c Outdated Show resolved Hide resolved

ac000 force-pushed the cgroup branch 2 times, most recently from ba83e0c to 226cb99 Compare December 9, 2022 14:14

ac000 force-pushed the cgroup branch from 226cb99 to 96e9c7b Compare December 9, 2022 14:40

alejandro-colomar reviewed Dec 9, 2022

View reviewed changes

src/nxt_conf_validation.c Outdated Show resolved Hide resolved

ac000 closed this Dec 10, 2022

ac000 force-pushed the cgroup branch from 96e9c7b to f67a01b Compare December 10, 2022 14:16

alejandro-colomar mentioned this pull request Dec 10, 2022

snprintf(3) is hard to get right for concatenating strings #804

Closed

ac000 deleted the cgroup branch December 13, 2022 18:33

lcrilly moved this from In Development to Released in NGINX Unit roadmap-tracker Jan 13, 2023

jn0 mentioned this pull request Mar 10, 2023

Cannot run sendmail(8) from unit anymore (1.29.1) #852

Closed

tippexs moved this from 🚀 Released to 🚀 Released1 in NGINX Unit roadmap-tracker Jan 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Isolation: added support for per-application cgroups. #734

Isolation: added support for per-application cgroups. #734

ac000 commented Jul 18, 2022 •

edited by alejandro-colomar

Loading

ac000 commented Jul 18, 2022

alejandro-colomar commented Jul 18, 2022 •

edited

Loading

alejandro-colomar commented Jul 18, 2022

ac000 commented Jul 18, 2022 via email

ac000 commented Jul 19, 2022 via email

ac000 commented Jul 21, 2022

ac000 commented Aug 13, 2022

ac000 commented Dec 9, 2022

ac000 commented Dec 9, 2022 •

edited

Loading

alejandro-colomar commented Dec 9, 2022 •

edited

Loading

ac000 commented Dec 9, 2022

alejandro-colomar commented Dec 9, 2022

hongzhidao commented Dec 9, 2022 •

edited

Loading

ac000 commented Dec 10, 2022

alejandro-colomar commented Dec 10, 2022

hongzhidao commented Dec 10, 2022

alejandro-colomar commented Dec 10, 2022

hongzhidao commented Dec 10, 2022

Isolation: added support for per-application cgroups. #734

Isolation: added support for per-application cgroups. #734

Conversation

ac000 commented Jul 18, 2022 • edited by alejandro-colomar Loading

ac000 commented Jul 18, 2022

alejandro-colomar commented Jul 18, 2022 • edited Loading

alejandro-colomar commented Jul 18, 2022

ac000 commented Jul 18, 2022 via email

ac000 commented Jul 19, 2022 via email

ac000 commented Jul 21, 2022

ac000 commented Aug 13, 2022

ac000 commented Dec 9, 2022

ac000 commented Dec 9, 2022 • edited Loading

alejandro-colomar commented Dec 9, 2022 • edited Loading

ac000 commented Dec 9, 2022

alejandro-colomar commented Dec 9, 2022

hongzhidao commented Dec 9, 2022 • edited Loading

ac000 commented Dec 10, 2022

alejandro-colomar commented Dec 10, 2022

hongzhidao commented Dec 10, 2022

alejandro-colomar commented Dec 10, 2022

hongzhidao commented Dec 10, 2022

ac000 commented Jul 18, 2022 •

edited by alejandro-colomar

Loading

alejandro-colomar commented Jul 18, 2022 •

edited

Loading

ac000 commented Dec 9, 2022 •

edited

Loading

alejandro-colomar commented Dec 9, 2022 •

edited

Loading

hongzhidao commented Dec 9, 2022 •

edited

Loading