KillUserProcesses doesn't kill user process at logout (systemd.unified_cgroup_hierarchy=1) #3483

team3-oxalide · 2016-06-09T19:38:06Z

Submission type

Bug report
Request for enhancement (RFE)

systemd version the issue has been seen with

230

Used distribution

Archlinux
Gentoo

In case of bug report: Expected behaviour you didn't see

login: root
root $ echo "KillUserProcesses=yes" >>  /etc/systemd/logind.conf
root $ systemctl restart systemd-logind
root $ busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager KillUserProcesses
b true

login: max
max $ nohub sleep 3600 &
max $ logout

login: root
root $ pgrep -a sleep
(nothing)

In case of bug report: Unexpected behaviour you saw

login: root
root $ echo "KillUserProcesses=yes" >>  /etc/systemd/logind.conf
root $ systemctl restart systemd-logind
root $ busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager KillUserProcesses
b true

login: max
max $ nohub sleep 3600 &
max $ logout

login: root
root $ pgrep -a sleep
<pid> sleep 3600

Remark

Gentoo and Archlinux build systemd with --without-kill-user-processes.

Archlinux:

systemd 230
+PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN

Gentoo:

systemd 230
+PAM -AUDIT -SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL -XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN

I also tested on Archlinux testing with the same result.

I also test with systemd master (40652ca) (without any config option) with the same result.

I use the kernel parameter systemd.unified_cgroup_hierarchy=1.

The problem appeared when I update from 229 to 230 (I also add systemd.unified_cgroup_hierarchy=1 to my kernel parameter at the same time).

EDIT:
I just recheck, the problem disappear when I remove systemd.unified_cgroup_hierarchy=1 from my kernel cmdline.

KillUserProcesses doesn't work in combination with systemd.unified_cgroup_hierarchy=1.

The text was updated successfully, but these errors were encountered:

tchernomax · 2016-06-10T17:46:42Z

maybe related to #3388

(I am the author of this issue… I forget to switch github account when I created it)

evverx · 2016-06-11T01:39:48Z

Seems like a race:
Try:

nohup sleep 3600 &
sleep 10
logout

Do you see sleep?

evverx · 2016-06-11T15:56:26Z

Well, I've reproduced:

sh -c 'trap "" HUP; cat </dev/zero >/dev/null' & sleep 10 && logout

1465659331741086 sh(18014/18014) sh -c trap "" HUP; cat </dev/zero >/dev/null
1465659331743718 sleep(18015/18015) sleep 10
1465659331743831 cat(18016/18016) cat
1465659341752595 systemd-logind(10791/10791) sd_bus_call_method
1465659341754747 systemd-logind(10791/10791) session_stop_scope
1465659341754759 systemd-logind(10791/10791) manager_shall_kill
1465659341754764 systemd-logind(10791/10791) manager_stop_unit
1465659341754768 systemd-logind(10791/10791) sd_bus_call_method
1465659341754891 systemd(1/1) method_stop_unit
1465659341756121 systemd(1/1) cg_kill "/user.slice/user-1001.slice/session-44.scope"
1465659341756161 systemd(1/1) 17395, SIGTERM
1465659341756329 systemd(1/1) 17395, SIGCONT
1465659341756334 systemd(1/1) 17403, SIGTERM
1465659341756520 systemd(1/1) 17403, SIGCONT
1465659341756523 systemd(1/1) 18016, SIGTERM
1465659341756542 systemd(1/1) 18016, SIGCONT
1465659341756545 systemd(1/1) 18014, SIGTERM
1465659341756550 systemd(1/1) 18014, SIGCONT
1465659341756655 systemd(1/1) cg_kill "/user.slice/user-1001.slice/session-44.scope"
1465659341756669 systemd(1/1) 17395, SIGHUP
1465659341756673 systemd(1/1) 17403, SIGHUP
1465659341756675 systemd(1/1) 18016, SIGHUP
1465659341756677 systemd(1/1) 18014, SIGHUP
1465659341758178 systemd(1/1) 17395, SIG_0
1465659341758403 systemd(1/1) 17395, SIG_0
1465659341759051 systemd-logind(10791/10791) sd_bus_call_method
1465659341759571 systemd-logind(10791/10791) sd_bus_call_method
1465659341760598 systemd-logind(10791/10791) sd_bus_call_method
1465659341760905 systemd-logind(10791/10791) sd_bus_call_method

sh -c 'trap "" HUP; cat </dev/zero >/dev/null' & logout

1465659417938648 sh(18074/18074) sh -c trap "" HUP; cat </dev/zero >/dev/null
1465659417943153 cat(18076/18076) cat
1465659417946070 systemd-logind(10791/10791) sd_bus_call_method
1465659417948822 systemd-logind(10791/10791) session_stop_scope
1465659417948835 systemd-logind(10791/10791) manager_shall_kill
1465659417948841 systemd-logind(10791/10791) manager_stop_unit
1465659417948847 systemd-logind(10791/10791) sd_bus_call_method
1465659417949186 systemd(1/1) method_stop_unit
1465659417951403 systemd(1/1) 18024, SIG_0
1465659417952519 systemd-logind(10791/10791) sd_bus_call_method
1465659417953337 systemd-logind(10791/10791) sd_bus_call_method

Notes:
stap-script

$ uname -r
4.7.0-0.rc2.git2.1.fc25.x86_64

$ rpm -q systemd
systemd-230-2.fc25.x86_64

$ grep cgroup /proc/self/mounts
cgroup /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0

evverx · 2016-06-12T14:40:31Z

So,

we receive a SIGCHLD from the scope
the scope invokes:

static void scope_sigchld_event(Unit *u, pid_t pid, int code, int status) {

        /* If we get a SIGCHLD event for one of the processes we were
           interested in, then we look for others to watch, under the
           assumption that we'll sooner or later get a SIGCHLD for
           them, as the original process we watched was probably the
           parent of them, and they are hence now our children. */

        unit_tidy_watch_pids(u, 0, 0);
        unit_watch_all_pids(u);

        /* If the PID set is empty now, then let's finish this off */
        if (set_isempty(u->pids))
                scope_notify_cgroup_empty_event(u);
}

The problem starts here:

int unit_watch_all_pids(Unit *u) {
        assert(u);

        /* Adds all PIDs from our cgroup to the set of PIDs we
         * watch. This is a fallback logic for cases where we do not
         * get reliable cgroup empty notifications: we try to use
         * SIGCHLD as replacement. */

        if (!u->cgroup_path)
                return -ENOENT;

        if (cg_unified() > 0) /* On unified we can use proper notifications */
                return 0;

        return unit_watch_pids_in_path(u, u->cgroup_path);
}

scope_sigchld_event invokes scope_enter_dead(s, SCOPE_SUCCESS). This bypasses scope_enter_signal.

Later we try to remove a cgroup:

1465741918124072 "/sys/fs/cgroup/user.slice/user-1001.slice/session-16.scope" -16 (EBUSY)
 0x7fe362ffed97 : rmdir+0x7/0x30 [/usr/lib64/libc-2.23.90.so]
 0x560e8a95cec9 : cg_trim+0xb9/0x110 [/usr/lib/systemd/systemd]
 0x560e8a9d65f2 : unit_prune_cgroup+0x62/0x150 [/usr/lib/systemd/systemd]
 0x560e8a934ee8 : unit_notify+0x488/0x1190 [/usr/lib/systemd/systemd]
 0x560e8a9c5701 : manager_dispatch_sigchld+0x171/0x2d0 [/usr/lib/systemd/systemd]
 0x560e8a9c8809 : manager_dispatch_signal_fd.lto_priv.1049+0x999/0xa90 [/usr/lib/systemd/systemd]
 0x560e8a9375a3 : source_dispatch.lto_priv.1035+0x1f3/0x350 [/usr/lib/systemd/systemd]
 0x560e8a9c7466 : manager_loop+0x4c6/0xce0 [/usr/lib/systemd/systemd]
 0x560e8a90e67a : main+0x2f7a/0x46e0 [/usr/lib/systemd/systemd]
 0x7fe362f27481 : __libc_start_main+0xf1/0x1d0 [/usr/lib64/libc-2.23.90.so]
 0x560e8a90fe2a : _start+0x2a/0x30 [/usr/lib/systemd/systemd]

This fails. But we ignore it:

Failed to destroy cgroup /user.slice/user-1001.slice/session-16.scope, ignoring: Device or resource busy

This is totally unrelated to #3388

evverx · 2016-06-12T17:24:22Z

I've prepared a fix:

--- a/src/core/scope.c
+++ b/src/core/scope.c
@@ -429,7 +429,7 @@ static void scope_sigchld_event(Unit *u, pid_t pid, int code, int status) {
         unit_watch_all_pids(u);

         /* If the PID set is empty now, then let's finish this off */
-        if (set_isempty(u->pids))
+        if (cg_unified() <= 0 && set_isempty(u->pids))
                 scope_notify_cgroup_empty_event(u);
 }

I'm testing
(maybe, we should fix other code too. For example, service_sigchld_event

        /* We got one SIGCHLD for the service, let's watch all
         * processes that are now running of the service, and watch
         * that. Among the PIDs we then watch will be children
         * reassigned to us, which hopefully allows us to identify
         * when all children are gone */
        unit_tidy_watch_pids(u, s->main_pid, s->control_pid);
        unit_watch_all_pids(u);

        /* If the PID set is empty now, then let's finish this off */
        if (set_isempty(u->pids))
                service_notify_cgroup_empty_event(u);

)
I'm trying to reproduce.

…tifications Fixes: systemd#3483

…tifications (#3531) Fixes: #3483

team3-oxalide changed the title ~~KillUserProcesses doesn't kill user process at logout~~ KillUserProcesses doesn't kill user process at logout (systemd.unified_cgroup_hierarchy=1) Jun 9, 2016

aroig mentioned this issue Jun 10, 2016

cgroup controllers are not activated for the user instance in the unified hierarchy #3500

Closed

2 tasks

evverx added bug 🐛 Programming errors, that need preferential fixing pid1 labels Jun 12, 2016

evverx added a commit to evverx/systemd that referenced this issue Jun 13, 2016

core: on unified we don't need to check u->pids: we can use proper no…

cd060ab

…tifications Fixes: systemd#3483

evverx mentioned this issue Jun 13, 2016

core: on unified we don't need to check u->pids: we can use proper notifications #3531

Merged

evverx added the has-pr label Jun 13, 2016

poettering closed this as completed in #3531 Jun 14, 2016

poettering pushed a commit that referenced this issue Jun 14, 2016

core: on unified we don't need to check u->pids: we can use proper no…

bbc85a1

…tifications (#3531) Fixes: #3483

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

KillUserProcesses doesn't kill user process at logout (systemd.unified_cgroup_hierarchy=1) #3483

KillUserProcesses doesn't kill user process at logout (systemd.unified_cgroup_hierarchy=1) #3483

team3-oxalide commented Jun 9, 2016 •

edited

Loading

tchernomax commented Jun 10, 2016

evverx commented Jun 11, 2016

evverx commented Jun 11, 2016

evverx commented Jun 12, 2016

evverx commented Jun 12, 2016

KillUserProcesses doesn't kill user process at logout (systemd.unified_cgroup_hierarchy=1) #3483

KillUserProcesses doesn't kill user process at logout (systemd.unified_cgroup_hierarchy=1) #3483

Comments

team3-oxalide commented Jun 9, 2016 • edited Loading

Submission type

systemd version the issue has been seen with

Used distribution

In case of bug report: Expected behaviour you didn't see

In case of bug report: Unexpected behaviour you saw

Remark

tchernomax commented Jun 10, 2016

evverx commented Jun 11, 2016

evverx commented Jun 11, 2016

evverx commented Jun 12, 2016

evverx commented Jun 12, 2016

team3-oxalide commented Jun 9, 2016 •

edited

Loading