Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KillUserProcesses doesn't kill user process at logout (systemd.unified_cgroup_hierarchy=1) #3483

Closed
1 of 2 tasks
team3-oxalide opened this issue Jun 9, 2016 · 5 comments
Closed
1 of 2 tasks
Labels
bug 🐛 Programming errors, that need preferential fixing pid1

Comments

@team3-oxalide
Copy link

team3-oxalide commented Jun 9, 2016

Submission type

  • Bug report
  • Request for enhancement (RFE)

systemd version the issue has been seen with

  • 230

Used distribution

  • Archlinux
  • Gentoo

In case of bug report: Expected behaviour you didn't see

login: root
root $ echo "KillUserProcesses=yes" >>  /etc/systemd/logind.conf
root $ systemctl restart systemd-logind
root $ busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager KillUserProcesses
b true

login: max
max $ nohub sleep 3600 &
max $ logout

login: root
root $ pgrep -a sleep
(nothing)

In case of bug report: Unexpected behaviour you saw

login: root
root $ echo "KillUserProcesses=yes" >>  /etc/systemd/logind.conf
root $ systemctl restart systemd-logind
root $ busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager KillUserProcesses
b true

login: max
max $ nohub sleep 3600 &
max $ logout

login: root
root $ pgrep -a sleep
<pid> sleep 3600

Remark

Gentoo and Archlinux build systemd with --without-kill-user-processes.

Archlinux:

systemd 230
+PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN

Gentoo:

systemd 230
+PAM -AUDIT -SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL -XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN

I also tested on Archlinux testing with the same result.

I also test with systemd master (40652ca) (without any config option) with the same result.

I use the kernel parameter systemd.unified_cgroup_hierarchy=1.

The problem appeared when I update from 229 to 230 (I also add systemd.unified_cgroup_hierarchy=1 to my kernel parameter at the same time).

EDIT:
I just recheck, the problem disappear when I remove systemd.unified_cgroup_hierarchy=1 from my kernel cmdline.

KillUserProcesses doesn't work in combination with systemd.unified_cgroup_hierarchy=1.

@team3-oxalide team3-oxalide changed the title KillUserProcesses doesn't kill user process at logout KillUserProcesses doesn't kill user process at logout (systemd.unified_cgroup_hierarchy=1) Jun 9, 2016
@tchernomax
Copy link
Contributor

maybe related to #3388

(I am the author of this issue… I forget to switch github account when I created it)

@evverx
Copy link
Member

evverx commented Jun 11, 2016

Seems like a race:
Try:

nohup sleep 3600 &
sleep 10
logout

Do you see sleep?

@evverx
Copy link
Member

evverx commented Jun 11, 2016

Well, I've reproduced:

sh -c 'trap "" HUP; cat </dev/zero >/dev/null' & sleep 10 && logout

1465659331741086 sh(18014/18014) sh -c trap "" HUP; cat </dev/zero >/dev/null
1465659331743718 sleep(18015/18015) sleep 10
1465659331743831 cat(18016/18016) cat
1465659341752595 systemd-logind(10791/10791) sd_bus_call_method
1465659341754747 systemd-logind(10791/10791) session_stop_scope
1465659341754759 systemd-logind(10791/10791) manager_shall_kill
1465659341754764 systemd-logind(10791/10791) manager_stop_unit
1465659341754768 systemd-logind(10791/10791) sd_bus_call_method
1465659341754891 systemd(1/1) method_stop_unit
1465659341756121 systemd(1/1) cg_kill "/user.slice/user-1001.slice/session-44.scope"
1465659341756161 systemd(1/1) 17395, SIGTERM
1465659341756329 systemd(1/1) 17395, SIGCONT
1465659341756334 systemd(1/1) 17403, SIGTERM
1465659341756520 systemd(1/1) 17403, SIGCONT
1465659341756523 systemd(1/1) 18016, SIGTERM
1465659341756542 systemd(1/1) 18016, SIGCONT
1465659341756545 systemd(1/1) 18014, SIGTERM
1465659341756550 systemd(1/1) 18014, SIGCONT
1465659341756655 systemd(1/1) cg_kill "/user.slice/user-1001.slice/session-44.scope"
1465659341756669 systemd(1/1) 17395, SIGHUP
1465659341756673 systemd(1/1) 17403, SIGHUP
1465659341756675 systemd(1/1) 18016, SIGHUP
1465659341756677 systemd(1/1) 18014, SIGHUP
1465659341758178 systemd(1/1) 17395, SIG_0
1465659341758403 systemd(1/1) 17395, SIG_0
1465659341759051 systemd-logind(10791/10791) sd_bus_call_method
1465659341759571 systemd-logind(10791/10791) sd_bus_call_method
1465659341760598 systemd-logind(10791/10791) sd_bus_call_method
1465659341760905 systemd-logind(10791/10791) sd_bus_call_method

sh -c 'trap "" HUP; cat </dev/zero >/dev/null' & logout

1465659417938648 sh(18074/18074) sh -c trap "" HUP; cat </dev/zero >/dev/null
1465659417943153 cat(18076/18076) cat
1465659417946070 systemd-logind(10791/10791) sd_bus_call_method
1465659417948822 systemd-logind(10791/10791) session_stop_scope
1465659417948835 systemd-logind(10791/10791) manager_shall_kill
1465659417948841 systemd-logind(10791/10791) manager_stop_unit
1465659417948847 systemd-logind(10791/10791) sd_bus_call_method
1465659417949186 systemd(1/1) method_stop_unit
1465659417951403 systemd(1/1) 18024, SIG_0
1465659417952519 systemd-logind(10791/10791) sd_bus_call_method
1465659417953337 systemd-logind(10791/10791) sd_bus_call_method

Notes:
stap-script

$ uname -r
4.7.0-0.rc2.git2.1.fc25.x86_64

$ rpm -q systemd
systemd-230-2.fc25.x86_64

$ grep cgroup /proc/self/mounts
cgroup /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0

@evverx evverx added bug 🐛 Programming errors, that need preferential fixing pid1 labels Jun 12, 2016
@evverx
Copy link
Member

evverx commented Jun 12, 2016

So,

  • we receive a SIGCHLD from the scope
  • the scope invokes:
static void scope_sigchld_event(Unit *u, pid_t pid, int code, int status) {

        /* If we get a SIGCHLD event for one of the processes we were
           interested in, then we look for others to watch, under the
           assumption that we'll sooner or later get a SIGCHLD for
           them, as the original process we watched was probably the
           parent of them, and they are hence now our children. */

        unit_tidy_watch_pids(u, 0, 0);
        unit_watch_all_pids(u);

        /* If the PID set is empty now, then let's finish this off */
        if (set_isempty(u->pids))
                scope_notify_cgroup_empty_event(u);
}

The problem starts here:

int unit_watch_all_pids(Unit *u) {
        assert(u);

        /* Adds all PIDs from our cgroup to the set of PIDs we
         * watch. This is a fallback logic for cases where we do not
         * get reliable cgroup empty notifications: we try to use
         * SIGCHLD as replacement. */

        if (!u->cgroup_path)
                return -ENOENT;

        if (cg_unified() > 0) /* On unified we can use proper notifications */
                return 0;

        return unit_watch_pids_in_path(u, u->cgroup_path);
}

scope_sigchld_event invokes scope_enter_dead(s, SCOPE_SUCCESS). This bypasses scope_enter_signal.

Later we try to remove a cgroup:

1465741918124072 "/sys/fs/cgroup/user.slice/user-1001.slice/session-16.scope" -16 (EBUSY)
 0x7fe362ffed97 : rmdir+0x7/0x30 [/usr/lib64/libc-2.23.90.so]
 0x560e8a95cec9 : cg_trim+0xb9/0x110 [/usr/lib/systemd/systemd]
 0x560e8a9d65f2 : unit_prune_cgroup+0x62/0x150 [/usr/lib/systemd/systemd]
 0x560e8a934ee8 : unit_notify+0x488/0x1190 [/usr/lib/systemd/systemd]
 0x560e8a9c5701 : manager_dispatch_sigchld+0x171/0x2d0 [/usr/lib/systemd/systemd]
 0x560e8a9c8809 : manager_dispatch_signal_fd.lto_priv.1049+0x999/0xa90 [/usr/lib/systemd/systemd]
 0x560e8a9375a3 : source_dispatch.lto_priv.1035+0x1f3/0x350 [/usr/lib/systemd/systemd]
 0x560e8a9c7466 : manager_loop+0x4c6/0xce0 [/usr/lib/systemd/systemd]
 0x560e8a90e67a : main+0x2f7a/0x46e0 [/usr/lib/systemd/systemd]
 0x7fe362f27481 : __libc_start_main+0xf1/0x1d0 [/usr/lib64/libc-2.23.90.so]
 0x560e8a90fe2a : _start+0x2a/0x30 [/usr/lib/systemd/systemd]

This fails. But we ignore it:

Failed to destroy cgroup /user.slice/user-1001.slice/session-16.scope, ignoring: Device or resource busy

This is totally unrelated to #3388

@evverx
Copy link
Member

evverx commented Jun 12, 2016

I've prepared a fix:

--- a/src/core/scope.c
+++ b/src/core/scope.c
@@ -429,7 +429,7 @@ static void scope_sigchld_event(Unit *u, pid_t pid, int code, int status) {
         unit_watch_all_pids(u);

         /* If the PID set is empty now, then let's finish this off */
-        if (set_isempty(u->pids))
+        if (cg_unified() <= 0 && set_isempty(u->pids))
                 scope_notify_cgroup_empty_event(u);
 }

I'm testing
(maybe, we should fix other code too. For example, service_sigchld_event

        /* We got one SIGCHLD for the service, let's watch all
         * processes that are now running of the service, and watch
         * that. Among the PIDs we then watch will be children
         * reassigned to us, which hopefully allows us to identify
         * when all children are gone */
        unit_tidy_watch_pids(u, s->main_pid, s->control_pid);
        unit_watch_all_pids(u);

        /* If the PID set is empty now, then let's finish this off */
        if (set_isempty(u->pids))
                service_notify_cgroup_empty_event(u);

)
I'm trying to reproduce.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Programming errors, that need preferential fixing pid1
Development

No branches or pull requests

3 participants