"A stop job is running for User Manager for UID 1000" #12262

Closed
ysooqe opened this issue Apr 9, 2019 · 99 comments · Fixed by #12621

Comments

@ysooqe

ysooqe commented Apr 9, 2019

I hope this is the right place to report/post/ask this.
When I want to shut down or reboot my system (using reboot or shutdown now), the system does not shut down immediately but rather waits 1m30s and displays the following:
A stop job is running for User Manager for UID 1000 (1min 10s / 2min)
I tried to find out what is causing this, creating a log according to Debugging.
Unfortunately, I can not find out myself what is causing the issue. What can I do to solve this issue?
I attached the log to the end of this report.

systemd version the issue has been seen with

241.67

Used distribution

Arch Linux

Expected behaviour you didn't see

System shuts down properly without a delay

Unexpected behaviour you saw

Delay when shutting down

Steps to reproduce the problem
Shut down the system

shutdown-log.txt

@poettering
Member

this is most likely some user service (i.e. a service that runs inside the user systemd instance) that blocks shutdown. Interestingly there's zero logging from that per-user systemd instance in your logs. Not sure why.

Please enable the debug shell on Alt-F9 (by doing systemctl enable --now debug-shell.service — but don't forget to disable it eventually, to not make your system vulnerable), then reproduce the issue, and when it hangs check the process tree and extract a backtrace from the systemd --user process (by doing pstack 1 with gdb and debug symbols installed).

@poettering poettering added the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Apr 10, 2019
@boucman
Contributor

boucman commented Apr 10, 2019

I tend to log as the faulty user on Alt-F9 and simply do systemctl --user list-jobs to find the faulty process...
if that's useful in any way...

@ysooqe
Author

ysooqe commented Apr 11, 2019

@poettering

Please enable the debug shell on Alt-F9 (by doing systemctl enable --now debug-shell.service — but don't forget to disable it eventually, to not make your system vulnerable), then reproduce the issue, and when it hangs check the process tree and extract a backtrace from the systemd --user process (by doing pstack 1 with gdb and debug symbols installed).

I am sorry, but I do not understand what I am supposed to do here.
I installed gdb and enabled the debug shell, but when I shut down the system and press CTRL+ALT+F9, I get to a shell where I can't really enter commands. After I enter a letter, it disappears quickly. Entering pstack 1 just prints the message again, that a stop job is running.
When I just do CTRL-ALT-F9 while the system is running, I can enter commands as usual.

@boucman

I tend to log as the faulty user on Alt-F9 and simply do systemctl --user list-jobs to find the faulty process...
if that's useful in any way...

How can I log in as a user on CTRL-ALT-F9? I tried several ways, including the login command, but I can not log in as user. Therefore, when I do systemctl --user list-jobs I get Failed to connect to bus: No such file or directory

@keszybz
Member

keszybz commented Apr 11, 2019

@ysooqe you need to enable debug-shell.service. It requires no login, and is just a root shell on tty9, not related to gdb either. See https://www.freedesktop.org/wiki/Software/systemd/Debugging/#earlydebugshell.

@poettering
Member

I am sorry, but I do not understand what I am supposed to do here.
I installed gdb and enabled the debug shell, but when I shut down the system and press CTRL+ALT+F9, I get to a shell where I can't really enter commands. After I enter a letter, it disappears quickly. Entering pstack 1 just prints the message again, that a stop job is running.
When I just do CTRL-ALT-F9 while the system is running, I can enter commands as usual.

most likely the shutdown status output just gets mixed up with the shell's output. Just type blindly, and when you hit enter you should still see the stacktrace.

what do you see in "ps xawuf" when you are in that state?

@ysooqe
Author

ysooqe commented Apr 13, 2019

I am sorry, but I do not understand what I am supposed to do here.
I installed gdb and enabled the debug shell, but when I shut down the system and press CTRL+ALT+F9, I get to a shell where I can't really enter commands. After I enter a letter, it disappears quickly. Entering pstack 1 just prints the message again, that a stop job is running.
When I just do CTRL-ALT-F9 while the system is running, I can enter commands as usual.

most likely the shutdown status output just gets mixed up with the shell's output. Just type blindly, and when you hit enter you should still see the stacktrace.

what do you see in "ps xawuf" when you are in that state?

Unfortunately, I still can not get a useful output. It says somewhere, that there are no debug symbols installed. Since I am not a developer and have no idea what that means, I tried to search for it. Now, if I understood it correctly, I would have to compile systemd for myself with debug symbols enabled in order to get the desired output when I do pstack 1?

And the output of ps xawuf regarding the user is

ysooqe  1012  0.0  0.0  34392  9500 ?        Ss   09:54   0:00 /usr/lib/systemd/systemd --user
ysooqe  1013  0.0  0.0  84320  2756 ?        S    09:54   0:00  \_ (sd-pam)
ysooqe  1051  0.0  0.0 306784  6972 ?        Ssl  09:54   0:00  \_ /usr/lib/at-spi-bus-launcher
ysooqe  1391  0.0  0.0  12852  4048 ?        Ss   09:54   0:00  \_ tmux
ysooqe  1392  0.0  0.0  11256  6544 pts/1    Ss+  09:54   0:00      \_ -zsh
ysooqe  2784  0.2  0.0  12900  7980 pts/2    Ss+  10:05   0:00      \_ -zsh

@topimiettinen
Contributor

I also get this but very rarely. I managed to capture something with "script": https://gist.github.com/topimiettinen/a9984027b756405860c5e288132a247d

@arvidjaar
Contributor

@topimiettinen

This really looks like systemd lost child process events

topi         849  0.1  0.0  21732 10068 ?        Ss   11:36   0:00 /lib/systemd/systemd --user
topi         891  0.0  0.0      0     0 ?        Zs   11:36   0:00  \_ [dbus-daemon] <defunct>
topi         902  0.0  0.0      0     0 ?        Zs   11:36   0:00  \_ [redshift] <defunct>
topi        1154  0.3  0.0      0     0 ?        Zs   11:38   0:00  \_ [pulseaudio] <defunct>

What systemd version is it?

@topimiettinen
Contributor

@arvidjaar Debian Buster 241-3.

@ysooqe
Author

ysooqe commented Apr 16, 2019

@poettering
Since I still wasn't able to figure out what I am supposed to do, I did something else:
I ran systemctl --user list-sockets, tried to manually stop the corresponding service, and then did a shutdown just to see if the behaviour changes.
And after I stopped dbus.service, it was actually possible to shut down without the delay. I have no idea if that even makes sense, but from this, I believe dbus.service might be the culprit. Where do I go from here?

@topimiettinen
Contributor

Also in my call trace it seems that systemd doesn't realize that dbus has stopped and the bus is not available anymore. sd_bus_ensure_running() will not call sd_bus_wait() if bus->state is BUS_CLOSED or BUS_CLOSING, but this is where systemd is waiting in my case.

@topimiettinen
Contributor

I recompiled the Debian package with symbols to get better backtraces. I managed to capture another backtrace. Here bus->state = BUS_AUTHENTICATING, which is weird. The context seems to be that session dbus died and systemd got SIGCHLD as can be seen from service_sigchld_event(). Maybe calling sd_bus_flush() from destroy_bus() is a bad idea also for non-system case?

@topimiettinen
Contributor

@ysooqe could you check if #12349 fixes the issue for you?

@ysooqe
Author

ysooqe commented Apr 19, 2019

@topimiettinen thank you for taking the time to look into this!
I won't be able to test this over Easter, but I will test this as soon as possible and report back then.

@topimiettinen
Contributor

After some successful reboots I just got the problem again (backtrace), so #12349 does not fix this (at least completely). This time the call chain does not originate from destroy_bus().

@ysooqe
Author

ysooqe commented Apr 23, 2019

Alright, this is strange:
The issue is gone after I switched vpn-clients. Specifically, there seemed to be an issue with openresolv. No idea what was the problem. If there are things that I should and can do to investigate this, please let me know. Otherwise I would close this for now as the issue is no longer present.

@ysooqe ysooqe closed this as completed Apr 23, 2019
@topimiettinen topimiettinen reopened this May 18, 2019
@topimiettinen
Contributor

After some changes in my setup, this now happens on every shutdown for me. The symptoms are different from what I had earlier (user systemd was alive but did not reap zombies), now user systemd has exited but it has left one user process as a zombie. If I kill it, shutdown continues.

@topimiettinen
Contributor

Here's another example.

@poettering It seems that pstack only works for 32 bit apps. Would you have any ideas how to debug this?

This story about BPF in LWN hints that there could be a bug in /proc where "occasionally the getdents() system call will return a partial result on /proc, causing the entry for the target process to be left out". Maybe a kernel bug could be the root cause.

@topimiettinen
Contributor

topimiettinen commented May 19, 2019

Looking at systemd --user logs, the kdeconnect process which becomes zombie is launched by D-Bus (not systemd) at the same time as shutdown is initiated:

May 19 00:44:56 loora org_kde_powerdevil[937]: powerdevil: Scheduling inhibition from ":1.19" "ksmserver" with cookie 1 and reason "Shutting down system"
May 19 00:44:56 loora dbus-daemon[847]: [session uid=1000 pid=847] Activating service name='org.kde.kdeconnect' requested by ':1.25' (uid=1000 pid=976 comm="/usr/bin/plasmashell ")
May 19 00:44:56 loora kgpg[931]: org.kde.knotifications: env says KDE is running but SNI unavailable -- check KDE_FULL_SESSION and XDG_CURRENT_DESKTOP
May 19 00:44:56 loora usbguard-applet-qt[895]: org.kde.knotifications: env says KDE is running but SNI unavailable -- check KDE_FULL_SESSION and XDG_CURRENT_DESKTOP
May 19 00:44:56 loora org.kde.kdeconnect[847]: kdeconnect.core: KdeConnect daemon starting
May 19 00:44:56 loora org.kde.kdeconnect[847]: Could not create AF_NETLINK socket (Address family not supported by protocol)
May 19 00:44:56 loora systemd[804]: Received SIGTERM from PID 1 (systemd).
May 19 00:44:56 loora systemd[804]: Activating special unit exit.target

D-Bus is stopped at 00:45:00:

May 19 00:45:00 loora systemd[804]: dbus.service: Changed running -> stop-sigterm
May 19 00:45:00 loora systemd[804]: Bus bus-api-user: changing state RUNNING → CLOSED
May 19 00:45:00 loora systemd[804]: Received SIGCHLD from PID 847 (dbus-daemon).
May 19 00:45:00 loora systemd[804]: Child 847 (dbus-daemon) died (code=exited, status=0/SUCCESS)
May 19 00:45:00 loora systemd[804]: dbus.service: Child 847 belongs to dbus.service.
May 19 00:45:00 loora systemd[804]: dbus.service: Main process exited, code=exited, status=0/SUCCESS
May 19 00:45:00 loora systemd[804]: dbus.service: Succeeded.
May 19 00:45:00 loora systemd[804]: dbus.service: Changed stop-sigterm -> dead
May 19 00:45:00 loora systemd[804]: dbus.service: Failed to destroy cgroup /user.slice/user-1000.slice/user@1000.service/dbus.service, ignoring: Device or resource busy
May 19 00:45:00 loora systemd[804]: dbus.service: Job 154 dbus.service/stop finished, result=done
May 19 00:45:00 loora systemd[804]: dbus.socket: Changed running -> listening

Kdeconnect is still running after systemd --user exits:

May 19 00:45:00 loora systemd[804]: Exit.
May 19 00:45:00 loora systemd[804]: Bus bus-system: changing state RUNNING → CLOSED
May 19 00:45:00 loora systemd[1]: Received SIGCHLD from PID 804 (systemd).
May 19 00:45:00 loora systemd[1]: Child 804 (systemd) died (code=exited, status=0/SUCCESS)
May 19 00:45:06 loora org.kde.kdeconnect[847]: Could not create AF_NETLINK socket (Address family not supported by protocol)
[repeated every 10 seconds until reboot]
May 19 00:47:36 loora org.kde.kdeconnect[847]: Could not create AF_NETLINK socket (Address family not supported by protocol)
-- Reboot --

But "ps" output had different PID of 1946 for kdeconnectd and there was no PID 847! This was from around 00:45:20:
topi 1946 0.3 0.0 0 0 ? Zl 00:44 0:00 [kdeconnectd] <defunct>
Earlier we see that session D-Bus daemon was indeed PID 847:
May 19 00:34:40 loora dbus-daemon[847]: dbus.service: Executing: /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
After I kill kdeconnectd, PID1 sees this and reboot sequence continues:

May 19 00:47:41 loora systemd[1]: Received SIGCHLD from PID 1946 (kdeconnectd).
May 19 00:47:41 loora systemd[1]: Child 1946 (kdeconnectd) died (code=killed, status=31/SYS)
May 19 00:47:41 loora systemd[1]: user@1000.service: Child 1946 belongs to user@1000.service.
May 19 00:47:41 loora systemd[1]: user@1000.service: cgroup is empty
May 19 00:47:41 loora systemd[1]: user@1000.service: Succeeded.
May 19 00:47:41 loora systemd[1]: user@1000.service: Changed final-sigkill -> dead
May 19 00:47:41 loora systemd[1]: user@1000.service: Job 1274 user@1000.service/stop finished, result=done
May 19 00:47:41 loora systemd[1]: Stopped User Manager for UID 1000.

Very very strange. Some thoughts:

  • systemd --user (or PID1) could be confused between 1946 and 847
  • kernel could be confused between 1946 and 847
  • the zombie process is not really a zombie, as it can be killed, but why is it marked as Z and <defunct>?

@topimiettinen
Contributor

I considered bisecting, but once I replace my version of systemd (debian/241-3-1-g7197cd7+, Debian 241-3 modified by PR #12523) with stock Debian 241-3, the problem disappears (or turns to rare again). I think the modification should not affect this issue as I did not use systemd-mount during that session.

@topimiettinen
Contributor

I tried also kernel 4.9.0-9 from Stretch instead of 4.19.0-5. Either the heisenbug disappeared or that's a good kernel version.

But I noticed that the kdeconnectd zombie still has two live threads:

cat /proc/1164/task/*/status (edited)
Name:   kdeconnectd
State:  Z (zombie)
Pid:    1164
PPid:   1
Threads:        3

Name:   QDBusConnection
State:  S (sleeping)
Pid:    1189
PPid:   1

Name:   Qt bearer threa
State:  S (sleeping)
Pid:    1191
PPid:   1

Killing one of them makes reboot continue.
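
The check above (a zombie thread-group leader whose other threads are still alive) can be automated. The following is only a minimal sketch: the function name `zombies_with_live_threads` and its optional proc-root argument are made up for illustration, and it relies only on the `Name:` and `State:` fields shown in the status output above.

```shell
#!/bin/sh
# List live threads that belong to a zombie thread-group leader.
# Output: one line per live thread, "<pid> <tid> <state> <thread name>".
# The optional first argument is a hypothetical override of the /proc
# location, so the function can be tried against a scratch directory.
zombies_with_live_threads() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        # Only processes whose leader is in state Z are of interest.
        state=$(awk '/^State:/ {print $2; exit}' "$status")
        [ "$state" = "Z" ] || continue
        pid_dir=${status%/status}
        pid=${pid_dir##*/}
        for task in "$pid_dir"/task/[0-9]*/status; do
            [ -r "$task" ] || continue
            tstate=$(awk '/^State:/ {print $2; exit}' "$task")
            tid_dir=${task%/status}
            tid=${tid_dir##*/}
            # Skip threads that are themselves zombie (Z) or dead (X).
            if [ "$tstate" != "Z" ] && [ "$tstate" != "X" ]; then
                name=$(awk '/^Name:/ {sub(/^Name:[ \t]*/, ""); print; exit}' "$task")
                printf '%s %s %s %s\n' "$pid" "$tid" "$tstate" "$name"
            fi
        done
    done
}
# Usage, e.g. from the debug shell while shutdown hangs:
#   zombies_with_live_threads
```

Any thread ID it prints is a candidate for the kind of manual kill described above.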

@topimiettinen
Contributor

This program creates a similar zombie with a live thread, but it does not behave the same way: killing it does not let the reboot continue. Killing kdeconnectd still does, despite the presence of this additional zombie.

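#include <ctype.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep forever in 100-second steps. */
static void hang(void) {
        struct timespec ts = {100, 0};

        for(;;)
                nanosleep(&ts, NULL);
}

/* Thread body: stays alive after the main thread has exited. */
static void *thread_start(void *arg) {
        (void)arg;
        hang();
        return NULL;
}

int main(void) {
        pid_t pid;
        int r;
        pthread_t thread_id;

        /* First fork: the original parent exits, the child carries on. */
        pid = fork();
        if (pid > 0) {
                return 0;
        }
        (void)setsid();
        /* Second fork: the parent hangs and never reaps the child. */
        pid = fork();
        if (pid > 0) {
                hang();
                return 0;
        } else if (pid == 0) {
                /* Child: start a thread, then exit only the main thread.
                 * The thread-group leader becomes a zombie while the
                 * other thread keeps running. */
                r = pthread_create(&thread_id, NULL, thread_start, NULL);
                if (r != 0)
                        /* pthread_create() returns the error code; it does not set errno */
                        fprintf(stderr, "pthread_create: %s\n", strerror(r));
                pthread_exit(NULL);
        } else {
                perror("fork");
                return -1;
        }
        return 0;
}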

topimiettinen added a commit to topimiettinen/systemd that referenced this issue May 20, 2019
It's possible for a zombie process to have live threads. These are not listed
in /sys in "cgroup.procs" but they show up in "tasks" nodes. When killing a
cgroup, let's kill threads instead of processes, so the live threads of a
zombie get killed too.

Closes systemd#12262.
@topimiettinen
Contributor

I think I found a fix. I did not see any shutdown problems in a few reboots.

@mbiebl
Contributor

mbiebl commented Dec 16, 2020

This bug report is a mess and not actionable, so it doesn't make sense to re-open.
We have wildly different issues here, the only thing in common is the rather generic error message.

@andreyk0

This bug report is a mess and not actionable, so it doesn't make sense to re-open.
We have wildly different issues here, the only thing in common is the rather generic error message.

Would it make sense to improve the error message? Perhaps mention what exactly it is waiting for, its process ID, etc?

@mbiebl
Contributor

mbiebl commented Dec 16, 2020

I would have to double check, but I think it already does that with a recent version of systemd.

@boucman
Contributor

boucman commented Dec 16, 2020

Thing is, there are two systemd instances involved in this message:

  • the system instance handles your whole machine
  • the session instance handles your login session and your user processes

What you see here is a message from the system instance, and it tells you what it is waiting for: it's waiting for the user session to finish.

Why doesn't the user session finish? Go figure... that's why you have all these reports, each for a different case of "there is a user daemon that doesn't finish".

So yes, apart from the unclear user message, there isn't much to do. systemd waits for the session to finish, and that's the end of what systemd can and should do. Everything else is probably a bug somewhere in the higher layers...

@andreyk0

Thanks for clarification!
Would it be fair to assume then that most of the time users who come across this will need to debug their user session?
If so - the processes we run there are usually "not databases", could systemd's user session process have "soft" and "hard" time limits? E.g. normally things should stop in under a few seconds, so it could wait that long and then start printing "waiting for your X user process, still running".
If I saw something like that, it would definitely help me debug my session.

@boucman
Contributor

boucman commented Dec 16, 2020

That would require systemd (session) to be a "special case" for systemd (system), which is a questionable design decision. I'm not a dev, so I can't really judge how good that would be.

as far as this thread is concerned, I think a dev should provide one last, clear explanation of what this message means and the next step to debug the problem then lock the thread.

It's not that the thread is badly behaved, but at this point Google is directing all sorts of people with all sorts of problems that share the same symptom here, which is confusing for them and useless for the project. Locking this would be better for everybody.

@poettering
Member

Not sure why people think it's useful to keep posting on an old bug. Summary is this: some user service doesn't want to shut down on SIGTERM when systemd --user tries to end the session. Because of that things have to time out, and because of that systemd --user itself has to wait before it can shut down. The system instance of systemd (i.e. systemd --system) then reports that on screen as systemd --user not shutting down.

Figure out which user service is responsible. It's almost certain that systemd is just the messenger here and not at fault itself; rather, some user service doesn't want to die and thus causes everything else to hang too, until systemd takes action after the timeout.

Check the system logs to see which service this is; this is a logged event.

Last time this was discussed among the developers, we figured a nice approach to handle this better (i.e. make it easier to track down the faulty user service) is to make systemd --user use sd_notify() status text notifications aggressively to tell systemd --system what it is doing and waiting for, and then make systemd --system include the most recent status text of the service in its console status output. That way you'd see on screen which user service is causing all this.

It's just a matter of actually implementing this.

And no, just shortening the timeout to something small is not a "fix". It's an invitation to data loss really. We cannot just go and kill user stuff too aggressively, it's very much possible that the system is just slow, thus any default timeout must be chosen beyond the time where "system is just slow" territory is left, and the territory of "things are clearly hung" is entered. And given that that line is blurry we better go for timeouts that are too long than too short.

@speakk

speakk commented Dec 17, 2020

If you read through the comments though, user experience is a big part of this. And that can't be fixed by "just fixing the various root causes", as there are various daemons and things causing this issue. This issue will pop up time and time again.

Give users a way to kill the process forcefully if something is taking too long. It's up to them to judge the situation based on what is hanging, and with the changes you were talking about for showing which user service it is, users will have the information they need to know if they want to kill a process or not.

@poettering
Member

If you hit C-A-D more than 7 times within 2s the system will instantly reboot, and not wait for offending services. It will still try to sync file systems and so on, but hanging services won't cause further delays. It's the escape hatch if shutdown is hung and you don't want to wait anymore. It's not a hard power off, and not a clean shutdown, but something reasonably in the middle.

@speakk

speakk commented Dec 17, 2020

C-A-D 7 times to kill everything is not reasonable compared to just killing that one specific service that is hanging. It's an obscure way of killing (compared to say ctrl+c), and on top of that it does too much in comparison. Just let users force kill the one hanging service.

@poettering
Member

I am sorry, but during system shutdown there's no UI running. We can only use the most basic of input events the kernel provides us with: C-A-D handling.

@andreyk0

Would you at least consider giving users more hints in the message that we all googled for and ended up here?
E.g. when my system all of a sudden hung for 2min I didn't know that I could hit C-A-D 7 times and obviously there's no way to open a browser at that point and search.

Similarly, I've tried to search logs for this event but there's a ton of output and I don't know what exact string I'm searching for, I didn't notice it. So, assume I'm dumb and your system usually works well enough that I didn't even bother learning all of the commands, just give me a hint perhaps? Like "hey, on your next reboot try journalctl -x -y | grep foo to determine the root cause. Oh and if you want to force reboot now press C-A-D 7 times"?
If all of this is too long for a message - perhaps mention C-A-D escape hatch on the screen and print a link to get more info?

@boucman
Contributor

boucman commented Dec 17, 2020

I don't think advertising that is a good idea...
7xCaD is basically equivalent to a long press on your power button. It's pretty much a last resort, not something we want people to do.

systemd should report what is blocking, but killing whatever is blocking is dangerous and really not the answer. Something is wrong on your system and you should investigate that.

As for how systemd reports what is blocking, @poettering suggested a way to do that, now we need someone to code it...

@meltdown03

According to this, GNOME has merged a fix for this which adds the Slice=-.slice line to the gnome-session-restart-dbus.service file.

@ghost

ghost commented Dec 18, 2020

For non-GNOME users like me who still have it happen, this seems to be caused by it trying to generate a crashdump but failing over and over again. For example, if I launch Discord, and while it downloads or installs an update, reboot/shutdown, then this happens.

Simple solution is to close out of everything, log out, and shutdown/reboot from SDDM. Unless SDDM tries to generate a crashdump. In which case there's probably something horribly wrong with your system.

@IanPul

IanPul commented Jan 22, 2021

Hi. I've been seeing the same symptom as described on my NAS after upgrading it to Fedora 33. I used debug-shell.service and ps xawuf as suggested above to diagnose it (thanks for that). In my case the culprit turned out to be Resilio-Sync (I have its Linux NAS-tools running as the phone-sync server on the NAS) :

Ian         1591  0.0  0.3  26008 15692 ?        Ss   17:15   0:00 /usr/lib/systemd/systemd --user
Ian         1593  0.0  0.1 178820  6636 ?        S    17:15   0:00  \_ (sd-pam)
Ian         1722  0.0  0.0 269388  3892 ?        Ss   17:15   0:00  \_ /usr/bin/dbus-broker-launch --scope user
Ian         1723  0.0  0.0   5132  2908 ?        S    17:15   0:00  |   \_ dbus-broker --log 4 --controller 10 --machine-id 6800f8c2a21b4bb39d7aff0cba6a3c16 --max-bytes 100000000000000 --max-fds 25000000000000 --max-matches 5000000000
Ian         2581  0.0  2.8 1961260 110120 ?      Ssl  17:15   0:00  \_ /usr/bin/rslsync --config /home/Ian/.config/resilio-sync/config.json

I have the current latest version of resilio-sync installed (2.7.2.1375) so for anyone who might hit the same, here's the workaround:

"ps -e | grep rslsync" will show two running processes (mine did anyway). "sudo systemctl stop resilio-sync" will kill one of them, and you then need to do "sudo kill nnnn" to manually kill the other one. Once they have both gone away, a reboot then happens with no delay.

Ian

Edit: After disabling Resilio-Sync, then re-enabling it and re-configuring it from scratch, I now see only one rslsync process running, and the shutdown delay no longer happens. So this may have been caused by some configuration issue.

@Lillecarl

I came here from Google too because the UX wasn't satisfactory.

The command I ended up using on next boot was
journalctl --no-pager -b -1 > journal.log
This will create a log file that you can browse through (start on last line and move up) and look for long pauses (seen by the time counter to the left) and then you'll eventually see what was killed.
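
Looking for long pauses by eye can be tedious, so here is a rough sketch of automating it. It assumes the journal is read with `-o short-unix`, which puts an epoch timestamp in the first field; the function name `journal_gaps` and the 30-second default threshold are arbitrary choices for illustration.

```shell
#!/bin/sh
# Read `journalctl -o short-unix` output on stdin and print every line that
# follows a pause of at least min_gap seconds (first argument, default 30).
journal_gaps() {
    min_gap="${1:-30}"
    awk -v gap="$min_gap" '
        # Ignore anything that does not start with an epoch timestamp.
        $1 !~ /^[0-9]+(\.[0-9]+)?$/ { next }
        seen && $1 - prev >= gap {
            printf "%d s gap before: %s\n", $1 - prev, $0
        }
        { prev = $1; seen = 1 }
    '
}
# Usage, on the boot after a slow shutdown:
#   journalctl --no-pager -b -1 -o short-unix | journal_gaps 30
```

The lines just before and after a reported gap are where to look for the unit that was eventually killed.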

I agree that the UX could be improved to assist people who aren't familiar with everything in their system. For me it seems like a KWin bug that I solved by creating a systemd service to kill it on shutdown.

@fnune

fnune commented Apr 23, 2022

For users coming from search engines:

The Slice=-.slice fix didn't work for me. Using journalctl I noticed a process for fluidsynth was the one that was getting killed.

It's an off-hand workaround, but I resorted to just removing that package and the apps that depended on it (which I don't use: they're Totem, Cheese, Gnome Music and Gnome Photos).

pacman -Rs cheese
pacman -Rs totem
pacman -Rs gnome-music gnome-photos
pacman -Rs gst-plugins-bad
pacman -Rs grilo-plugins
pacman -Rs gnome-video-effects
pacman -Rs fluidsynth

There's probably a cleaner solution to be found by figuring out why fluidsynth is hanging at shutdown.

@shmerl

shmerl commented Aug 21, 2022

I started getting it in recent versions of KDE Plasma (Wayland session) on Debian testing. systemd KDE session management is enabled.

What is the way to check which service hangs in the user session?

@Lillecarl

@shmerl #12262 (comment)

You have to wait for it to finish on one shutdown though.

@shmerl

shmerl commented Aug 21, 2022

But how to check what was wrong? I need to enable persistent logging and then to look for something?

@jimmyjules

jimmyjules commented Oct 16, 2022

I first noticed this after updating to Debian Testing.

journalctl --no-pager -b -1 > journal.log
This will create a log file that you can browse through (start on last line and move up) and look for long pauses (seen by the time counter to the left) and then you'll eventually see what was killed.

Using these instructions by @Lillecarl I found the following entry in journal.log to be interesting.

dbus-daemon[3548]: [session uid=1000 pid=3548] Activating via systemd: service name='org.gnome.Terminal' unit='gnome-terminal-server.service' requested by ':1.81' (uid=1000 pid=5360 comm="gnome-terminal --wait")

The message I get during shutdown, mentioned in the thread's title, references UID 1000, as does this entry. I use XFCE, but apparently some GNOME processes are still running, including gnome-terminal-server.

Now the issue disappeared.

@ken-okabe

For users coming from search engines:

The Slice=-.slice fix didn't work for me. Using journalctl I noticed a process for fluidsynth was the one that was getting killed.

It's an off-hand workaround, but I resorted to just removing that package and the apps that depended on it (which I don't use: they're Totem, Cheese, Gnome Music and Gnome Photos).

pacman -Rs cheese
pacman -Rs totem
pacman -Rs gnome-music gnome-photos
pacman -Rs gst-plugins-bad
pacman -Rs grilo-plugins
pacman -Rs gnome-video-effects
pacman -Rs fluidsynth

There's probably a cleaner solution to be found by figuring out why fluidsynth is hanging at shutdown.

As an Arch Linux user, I encountered an issue even after a clean install. However, the suggested solution to remove certain packages has just worked. I appreciate your help.

@xDShot

xDShot commented Feb 25, 2023

qbittorrent is suspected too

@aki-k

aki-k commented Apr 19, 2023

This bug also occurs on a Rocky Linux 8.7 LXC container running on Fedora 36, kernel 6.2.10-100.fc36.x86_64, systemd-239-68.el8_7.4.x86_64.

If I've only logged into the container as root, the message shows "for UID 0".

Edit: here's journalctl -b -1 part that shows the 2 minute timeout:

Apr 19 18:50:22 systemd[1]: Stopped target Network is Online.
Apr 19 18:52:22 systemd[1]: user@0.service: Processes still around after final SIGKILL. Entering failed mode.
Apr 19 18:52:22 systemd[1]: user@0.service: Failed with result 'timeout'.
Apr 19 18:52:22 systemd[1]: Stopped User Manager for UID 0.
Apr 19 18:52:22 systemd[1]: Stopping User runtime directory /run/user/0...
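
The excerpt above shows what the timeout looks like once it is logged. As a small sketch, lines like these can be pulled out of a previous boot's journal with a filter. The first two patterns are copied from the excerpt; the generic "timed out" pattern is an assumption about how other systemd versions may word the event, and the function name `stop_timeout_lines` is made up.

```shell
#!/bin/sh
# Keep only journal lines that look like stop-timeout events.
# Reads stdin, or the files given as arguments.
stop_timeout_lines() {
    # Patterns 1 and 2 come from the log excerpt above; "timed out" is a
    # guess meant to catch differently worded timeout messages.
    grep -E "Processes still around after|Failed with result 'timeout'|timed out" "$@"
}
# Usage:
#   journalctl --no-pager -b -1 | stop_timeout_lines
```

The unit name at the start of each matching message (user@0.service in the excerpt) shows which unit ran into the timeout.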

@aquamarine-axo

This is really annoying. (Almost) everyone here has had this problem, yet everyone has different solutions for it and none of them work for me.

I am still having this issue. Arch Linux, everything latest from Pacman.

@ghost

ghost commented Sep 2, 2023

This is really annoying. (Almost) everyone here has had this problem, yet everyone has different solutions for it and none of them work for me.

I am still having this issue. Arch Linux, everything latest from Pacman.

Use OpenRC :)

@arti004

arti004 commented Jan 11, 2024

Inside the file
/usr/lib/systemd/system/user@.service
set TimeoutStopSec to 4s
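
Files under /usr/lib are typically replaced on package updates, so the same setting is usually placed in a drop-in instead. The sketch below writes one; the function name `write_user_stop_timeout`, the file name `timeout.conf`, and the optional root-directory argument are all made up for illustration. Keep in mind the warning earlier in the thread that a short timeout risks data loss.

```shell
#!/bin/sh
# Write a drop-in that overrides TimeoutStopSec for user@.service.
# $1: timeout value, e.g. 4s
# $2: hypothetical override of the unit directory (default /etc/systemd/system),
#     so the function can be tried against a scratch directory first.
write_user_stop_timeout() {
    timeout="$1"
    root_dir="${2:-/etc/systemd/system}"
    dropin_dir="$root_dir/user@.service.d"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nTimeoutStopSec=%s\n' "$timeout" > "$dropin_dir/timeout.conf"
}
# Usage (as root), followed by `systemctl daemon-reload`:
#   write_user_stop_timeout 4s
```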
