"A stop job is running for User Manager for UID 1000" #12262
this is most likely some user service (i.e. a service that runs inside the user systemd instance) that blocks shutdown. Interestingly there's zero logging from that per-user systemd instance in your logs. Not sure why. Please enable the debug shell on Alt-F9 (by doing systemctl |
I tend to log as the faulty user on Alt-F9 and simply do systemctl --user list-jobs to find the faulty process... |
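A minimal sketch of automating the check above, assuming you can still get a shell as the affected user while the stop job is pending. It runs systemctl --user list-jobs and parses the result. The function names are made up for illustration, and the four-column layout (JOB, UNIT, TYPE, STATE) is an assumption about the output produced with --no-legend.

```python
import subprocess


def parse_list_jobs(text):
    """Parse `systemctl list-jobs --no-legend` style output into dicts."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed columns: JOB UNIT TYPE STATE. Anything else (for example
        # a "No jobs running." notice) is skipped.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        job_id, unit, job_type, state = fields
        jobs.append(
            {"id": int(job_id), "unit": unit, "type": job_type, "state": state}
        )
    return jobs


def pending_user_jobs():
    """Return the jobs currently queued in the calling user's systemd instance."""
    result = subprocess.run(
        ["systemctl", "--user", "list-jobs", "--no-legend", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_list_jobs(result.stdout)
```

Calling pending_user_jobs() during a hung shutdown should list the units that still have a stop job in state "running"; those are the candidates to look at first.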
I am sorry, but I do not understand what I am supposed to do here.
How can I log in as a user on CTRL-ALT-F9? I tried several ways, including the |
@ysooqe you need enable |
most likely the shutdown status output just gets mixed up with the shell's output. Just type blindly, and when you hit enter you should still see the stacktrace. what do you see in "ps xawuf" when you are in that state? |
Unfortunately, I still can not get a useful output. It says somewhere, that there are no debug symbols installed. Since I am not a developer and have no idea what that means, I tried to search for it. Now, if I understood it correctly, I would have to compile systemd for myself with debug symbols enabled in order to get the desired output when I do And the output of
|
I also get this but very rarely. I managed to capture something with "script": https://gist.github.com/topimiettinen/a9984027b756405860c5e288132a247d |
This really looks like systemd lost child process events
What systemd version is it? |
@arvidjaar Debian Buster 241-3. |
@poettering |
Also in my call trace it seems that systemd doesn't realize that dbus has stopped and the bus is not available anymore. |
I recompiled the Debian package with symbols to get better backtraces. I managed to capture another backtrace. Here |
@topimiettinen thank you for taking the time to look into this! |
Alright, this is strange: |
After some changes in my setup, this now happens on every shutdown for me. The symptoms are different from what I had earlier (user systemd was alive but did not reap zombies), now user systemd has exited but it has left one user process as a zombie. If I kill it, shutdown continues. |
Here's another example. @poettering It seems that This story about BPF in LWN hints that there could be a bug in /proc where "occasionally the getdents() system call will return a partial result on /proc, causing the entry for the target process to be left out". Maybe a kernel bug could be the root cause. |
Looking at systemd --user logs, the kdeconnect process which becomes zombie is launched by D-Bus (not systemd) at the same time as shutdown is initiated:
D-Bus is stopped at 00:45:00:
Kdeconnect is still running after systemd --user exits:
But "ps" output had different PID of 1946 for kdeconnectd and there was no PID 847! This was from around 00:45:20:
Very very strange. Some thoughts:
|
I considered bisecting, but once I replace my version of systemd (debian/241-3-1-g7197cd7+, Debian 241-3 modified by PR #12523) with stock Debian 241-3, the problem disappears (or turns to rare again). I think the modification should not affect this issue as I did not use systemd-mount during that session. |
I tried also kernel 4.9.0-9 from Stretch instead of 4.19.0-5. Either the heisenbug disappeared or that's a good kernel version. But I noticed that the kdeconnectd zombie still has two live threads:
Killing one of them makes reboot continue. |
This program creates a similar zombie thread, but it does not behave the same way: killing it does not trigger the reboot. Killing kdeconnectd still triggers it, despite the presence of this additional zombie.
|
It's possible for a zombie process to have live threads. These are not listed in /sys in "cgroup.procs" but they show up in "tasks" nodes. When killing a cgroup, let's kill threads instead of processes, so the live threads of a zombie get killed too. Closes systemd#12262.
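To see the condition described above on a running system, here is a rough sketch that looks for live threads under a zombie thread-group leader. It is not the systemd fix itself: instead of reading the cgroup "tasks" node it walks /proc/<pid>/task, and the function names are hypothetical.

```python
import os


def parse_task_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat style line."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return rest[0]


def live_threads_of_zombie(pid, proc_root="/proc"):
    """Return TIDs that are still alive although leader `pid` is in state Z."""
    pid = int(pid)
    task_dir = os.path.join(proc_root, str(pid), "task")
    states = {}
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                states[int(tid)] = parse_task_state(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were scanning
    if states.get(pid) != "Z":
        return []  # the leader is not a zombie, nothing unusual here
    # Z = zombie, X = dead; everything else still counts as alive.
    return sorted(
        tid for tid, state in states.items()
        if tid != pid and state not in ("Z", "X")
    )
```

For the case in this thread, live_threads_of_zombie(<PID of kdeconnectd>) would be expected to return the thread IDs that keep the stop job waiting.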
I think I found a fix. I did not see any shutdown problems in a few reboots. |
Because: 2020 https://psycnet.apa.org/doiLanding?doi=10.1037%2Fpne0000225 2019 https://edhub.ama-assn.org/jn-learning/video-player/17844517 2019 https://jamanetwork.com/journals/jamapsychiatry/fullarticle/2720691 2019 https://www.youtube.com/watch?v=JSCMVGiNGjQ 1963 https://www.toxicdocs.org/d/npR4MzrGZw39NKML5ov6L1y0G?lightbox=1 |
Incidentally:
That's not a systemd update. The |
Can someone reopen this until the fix is found? |
This does not work on kde |
This bug report is a mess and not actionable, so it doesn't make sense to re-open. |
Would it make sense to improve an error message? Perhaps mention what exactly it is waiting for, its process ID, etc? |
I would have to double check, but I think it already does that with a recent version of systemd. |
Thing is, there are two systemd involved in this message,
What you see here is a message from the system instance, and it tells you what it is waiting for: it's waiting for the user session to finish. Why doesn't the user session finish? Go figure... That's why there are so many reports here, each for a different case of "there is a user daemon that doesn't finish". So yes, apart from the unclear user message, there isn't much to do. systemd waits for the session to finish, and that's the end of what systemd can and should do. Everything else is probably a bug somewhere in the higher layers... |
Thanks for clarification! |
That would require systemd (session) to be a "special case" for systemd (system), which is a questionable design decision. I'm not a dev, so I can't really say how good that would be. As far as this thread is concerned, I think a dev should provide one last, clear explanation of what this message means and the next step to debug the problem, then lock the thread. It's not because the thread is badly behaved, but at this point Google is directing all sorts of people with all sorts of problems with the same symptom here, and it's confusing for them and useless for the project. Locking this would be better for everybody. |
Not sure why people think it's useful to keep posting on an old bug. Summary is this: some user service doesn't want to shut down on SIGTERM when systemd --user tries to end the session. Because of that things have to time out, and because of that systemd --user itself has to wait before it can shut down. The system instance of systemd (i.e. systemd --system) then reports that on screen as systemd --user not shutting down. Figure out which user service is responsible. It's almost certain systemd is just the messenger here and not at fault itself; some user service just doesn't want to die and thus causes everything else to hang too, until systemd decides to take action after the timeout. Check the system logs to see what service this is; this is a logged event. Last time this was discussed among the developers, we figured a nice approach to handle this better (i.e. make it easier to track down the faulty user service) is to make systemd --user use sd_notify() status text notifications aggressively to tell systemd --system what it is doing and waiting for, and then make systemd --system include the most recent status text of the service in its console status output. That way you'd see on screen which user service is causing all this. It's just a matter of actually implementing this. And no, just shortening the timeout to something small is not a "fix". It's an invitation to data loss really. We cannot just go and kill user stuff too aggressively; it's very much possible that the system is just slow, thus any default timeout must be chosen beyond the point where "system is just slow" territory is left and "things are clearly hung" territory is entered. And given that that line is blurry, we'd better go for timeouts that are too long rather than too short. |
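For readers unfamiliar with the status text notifications mentioned above, a minimal sketch of the underlying protocol follows: a service sends a "STATUS=..." datagram to the socket named in the NOTIFY_SOCKET environment variable. This only illustrates the mechanism from the service side. It is not how systemd --user implements (or would implement) the proposal, and notify_status is a hypothetical helper name.

```python
import os
import socket


def notify_status(text, environ=os.environ):
    """Send STATUS=<text> to the service manager; return True if sent."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        # Not started by a manager that asked for notifications.
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(f"STATUS={text}".encode(), path)
    return True
```

A service could call something like notify_status("Waiting for foo to exit") during shutdown; the most recent status text is what systemctl status shows for that unit.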
If you read through the comments though, user experience is a big part of this. And that can't be fixed by "just fixing the various root causes", as there are various daemons and things causing this issue. This issue will pop up time and time again. Give users a way to kill the process forcefully if something is taking too long. It's up to them to judge the situation based on what is hanging, and with the changes you were talking about for showing which user service it is, users will have the information they need to know if they want to kill a process or not. |
If you hit C-A-D more than 7 times within 2s the system will instantly reboot, and not wait for offending services. It will still try to sync file systems and so on, but hanging services won't cause further delays. It's the escape hatch if shutdown is hung and you don't want to wait anymore. It's not a hard power off, and not a clean shutdown, but something reasonably in the middle. |
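To make the "more than 7 times within 2s" rule concrete, here is a small sliding-window sketch. It only illustrates the counting logic under the thresholds quoted above; it is not systemd's code, and BurstDetector is a made-up name.

```python
from collections import deque


class BurstDetector:
    """Detect more than `threshold` events within `window` seconds."""

    def __init__(self, threshold=7, window=2.0):
        self.threshold = threshold
        self.window = window
        self.events = deque()

    def press(self, now):
        """Record an event at time `now` (seconds); return True once more
        than `threshold` events fall within the last `window` seconds."""
        self.events.append(now)
        # Drop events that are older than the window.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) > self.threshold
```

With the defaults, eight presses spaced 0.2s apart trigger on the eighth press, while one press per second never does.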
C-A-D 7 times to kill everything is not reasonable compared to just killing that one specific service that is hanging. It's an obscure way of killing (compared to say ctrl+c), and on top it does too much in comparison. Just let users force kill the one hanging service. |
I am sorry, but during system shutdown there's no UI running. We can only use the most basic of input events the kernel provides us with, i.e. C-A-D handling. |
Would you at least consider giving users more hints in the message that we all googled for and ended up here? Similarly, I've tried to search logs for this event but there's a ton of output and I don't know what exact string I'm searching for, I didn't notice it. So, assume I'm dumb and your system usually works well enough that I didn't even bother learning all of the commands, just give me a hint perhaps? Like "hey, on your next reboot try journalctl -x -y | grep foo to determine the root cause. Oh and if you want to force reboot now press C-A-D 7 times"? |
I don't think advertising that is a good idea... systemd should report what is blocking, but killing whatever is blocking is dangerous and really not the answer. Something is wrong on your system and you should investigate that. As for how systemd reports what is blocking, @poettering suggested a way to do that, now we need someone to code it... |
According to this, GNOME has merged a fix for this which adds the |
For non-GNOME users like me who still have it happen, this seems to be caused by it trying to generate a crashdump but failing over and over again. For example, if I launch Discord and reboot/shut down while it downloads or installs an update, then this happens. Simple solution is to close out of everything, log out, and shut down/reboot from SDDM. Unless SDDM tries to generate a crashdump. In which case there's probably something horribly wrong with your system. |
Hi. I've been seeing the same symptom as described on my NAS after upgrading it to Fedora 33. I used debug-shell.service and ps xawuf as suggested above to diagnose it (thanks for that). In my case the culprit turned out to be Resilio-Sync (I have its Linux NAS-tools running as the phone-sync server on the NAS) :
I have the current latest version of resilio-sync installed (2.7.2.1375) so for anyone who might hit the same, here's the workaround: "ps -e | grep rslsync" will show two running processes (mine did anyway). "sudo systemctl stop resilio-sync" will kill one of them, and you then need to do "sudo kill nnnn" to manually kill the other one. Once they have both gone away, a reboot then happens with no delay. Ian Edit: After disabling Resilio-Sync, then re-enabling it and re-configuring it from scratch, I now see only one rslsync process running, and the shutdown delay no longer happens. So this may have been caused by some configuration issue. |
I came here from Google too because the UX wasn't satisfactory. The command I ended up using on next boot was I agree that the UX could be improved to assist people who aren't familiar with everything in their system. For me it seems like a KWin bug that I solved by creating a systemd service to kill it on shutdown. |
For users coming from search engines: The It's an off-hand workaround, but I resorted to just removing that package and the apps that depended on it (which I don't use: they're Totem, Cheese, Gnome Music and Gnome Photos).
There's probably a cleaner solution to be found by figuring out why |
I started getting it in recent versions of KDE Plasma (Wayland session) on Debian testing. systemd KDE session management is enabled. What is the way to check which service hangs in the user session? |
You have to wait for it to finish on one shutdown though. |
But how to check what was wrong? I need to enable persistent logging and then to look for something? |
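One possible way to approach this, sketched under assumptions: with persistent journald storage enabled, journalctl -b -1 prints the journal of the previous boot, which can then be filtered for lines that look like a stop timeout. The search strings below ("timed out", "SIGKILL") are guesses at what to look for, not a list of the exact messages systemd logs, and the function names are hypothetical.

```python
import subprocess


def suspicious_lines(lines, needles=("timed out", "SIGKILL")):
    """Keep the lines that contain any of the needles (case-insensitive)."""
    lowered = [needle.lower() for needle in needles]
    return [ln for ln in lines if any(n in ln.lower() for n in lowered)]


def previous_boot_journal():
    """Return the previous boot's journal as lines (needs persistent logs)."""
    result = subprocess.run(
        ["journalctl", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()
```

After a slow shutdown, suspicious_lines(previous_boot_journal()) narrows the output down to a handful of lines, which usually name the unit or process that had to be killed.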
I first noticed this after updating to Debian Testing.
Using these instructions by @Lillecarl I found the following entry in journal.log to be interesting.
As the message I get during shutdown and mentioned in the thread's topic is referencing to UID 1000. I use XFCE but apparently still some Gnome processes are running including gnome-terminal-server. Now the issue disappeared. |
As an Arch Linux user, I encountered an issue even after a clean install. However, the suggested solution to remove certain packages has just worked. I appreciate your help. |
qbittorrent is suspected too |
I hope this is the right place to report/post/ask this.
When I want to shut down or reboot my system (using reboot or shutdown now), the system does not shut down immediately but rather waits 1m30s and displays the following:
A stop job is running for User Manager for UID 1000 (1min 10s / 2min)
I tried to find out what is causing this, creating a log according to Debugging.
Unfortunately, I can not find out myself what is causing the issue. What can I do to solve this issue?
I attached the log to the end of this report.
systemd version the issue has been seen with
Used distribution
Expected behaviour you didn't see
Unexpected behaviour you saw
Steps to reproduce the problem
Shutdown the system
shutdown-log.txt