/var stays busy at shutdown due to journald #867

Open
dunca opened this Issue Aug 5, 2015 · 44 comments

Comments

Projects
None yet

dunca commented Aug 5, 2015

Hello.

Failed unmounting /run/user/1000
Failed unmounting /home

It's been happening when shutting down or rebooting. It doesn't seem to happen with systemd 222.

More info: https://bbs.archlinux.org/viewtopic.php?id=200469

Owner

poettering commented Aug 5, 2015

Judging from the linked logs it fails due to EBUSY... There appears to be something still accessing the dir when we are about to go down...

If you say this has been broken with systemd > 222, but not with 222, any chance you can git bisect to check which commit actually broke this?

ledti commented Aug 10, 2015

I believe the bug was introduced in 7189be2. Before that commit, I can't reproduce the error message during reboot.

Contributor

eworm-de commented Aug 10, 2015

I do see a similar problem with /var/log:

Aug 10 19:33:10 thebe systemd[1]: Stopped Flush Journal to Persistent Storage.
Aug 10 19:33:10 thebe systemd[1]: Unmounting /var/log...
Aug 10 19:33:10 thebe umount[11998]: umount: /var/log: target is busy
Aug 10 19:33:10 thebe umount[11998]: (In some cases useful info about processes that
Aug 10 19:33:10 thebe umount[11998]: use the device is found by lsof(8) or fuser(1).)
Aug 10 19:33:10 thebe systemd[1]: var-log.mount: Mount process exited, code=exited status=32
Aug 10 19:33:10 thebe systemd[1]: Failed unmounting /var/log.

I think it startet between v222 and v223, though I am not perfectly sure.

Harv234 commented Aug 18, 2015

My /var and /home are on separate partitions encrypted with dm-crypt and are also affected. Downgrading to systemd 222 fixed this for me.

ledti commented Sep 2, 2015

I can definitely confirm that it's caused by 7189be2. Reverting that sole commit in either v224 or v225 fixes the issue for me.

I used git revert -n -m 1 7189be2 when building the release. I'm not sure how sane that is, but it works.

Why is this bug still tagged needs-reporter-feedback ? I am as well effected by this bug and can reproduce 100% reliably.

Harv234 commented Sep 9, 2015

Still valid in systemd 226. This is quite annoying as it affects a lot of people... See
https://bbs.archlinux.org/viewtopic.php?id=200469 and
https://bbs.archlinux.org/viewtopic.php?pid=1554049

Owner

poettering commented Sep 18, 2015

@ledti are you using automount units for anything? If you think 7189be2 is your issue, then this could only be triggered if you actually use automounts?

gnufied commented Sep 18, 2015

According to archwiki(https://wiki.archlinux.org/index.php/Fstab) each entry in /etc/fstab is converted to a systemd mount unit. @poettering is Arch doing this wrong?

Owner

poettering commented Sep 18, 2015

@gnufied hmm? the conversion from fstab to .mount units is part of systemd, and is what should happen.

gnufied commented Sep 18, 2015

Sorry, I was confused between systemd.mount and systemd.automount. I for one, is not using automount options. Mine is pretty standard configuration (still seeing this bug).

I however see now that, users who are using encryption might be using autmounts.

ledti commented Sep 18, 2015

@poettering

I'm not sure. If automount units are something unrelated to fstab, then no. I only use /etc/fstab.

tim@avedust ~ % locate automount
/usr/lib/systemd/system/proc-sys-fs-binfmt_misc.automount
/usr/lib/systemd/system/sysinit.target.wants/proc-sys-fs-binfmt_misc.automount
/usr/share/man/man5/systemd.automount.5.gz

Here's my current fstab. Besides root, /boot, and /home, I only have one other partition (sdb1) that is auto-mounted at boot. It's on a separate internal hard drive.

No automounts here either, but I'm also seeing these failures. They don't happen on every shutdown, but unmounting /run/user/1000 often fails. Unlike the original reporter, I don't have /home on a separate partition so there aren't errors about that. I also once saw a similar failure with /tmp. This has been happening with both v225 and v226. I reverted 7189be2 and the issue hasn't been happening since.

It seems systemd starts unmounting remaining mounts too early on shutdown. This can particularly affect user services like profile-sync-daemon which make use of $XDG_RUNTIME_DIR (mine being /run/user/1000). This log is from a shutdown before reverting the mentioned commit. Notice that unmounting /run/user/1000 starts already on line 50 where user services (particularly profile-sync-daemon) haven't even exited yet. On a shutdown after reverting the commit, the log shows clearly that unmounting /run/user/1000 takes place in the very end of the shutdown process (line 124).

profile-sync-daemon also uses and manages overlayfs mounts on $XDG_RUNTIME_DIR for browser profiles. I believe that due to this same bug systemd might unmount those overlays before profile-sync-daemon has the opportunity to sync the profiles back to disk, causing data loss. For example, the Firefox overlay is unmounted already on line 38 (by systemd?) but psd starts unsyncing on line 61. In the healthy log, the psd unsync begins on line 55 and the Firefox overlay is unmounted by psd after that, on line 57.

I too have a problem with /var/log using v226 on Arch.

Sep 22 19:42:21 synapse umount[6381]: umount: /var/log: target is busy
Sep 22 19:42:21 synapse umount[6381]: (In some cases useful info about processes that
Sep 22 19:42:21 synapse umount[6381]: use the device is found by lsof(8) or fuser(1).)
Sep 22 19:42:21 synapse systemd[1]: var-log.mount: Mount process exited, code=exited status=32
Sep 22 19:42:21 synapse systemd[1]: Failed unmounting /var/log.

I will revert 7189be2 and report back here.

Building systemd from ABS inserting the patch in the prepare function:
patch -R -Np1 -i $srcdir/revert.patch

Where revert.patch = https://github.com/systemd/systemd/commit/7189be2728ae2926500371cda6f13f9cdc06c21b.patch

ggg377 commented Sep 23, 2015

I have a similar, but so far unique error on shutdown since the latest systemd upgrade. This can be reproduced on a fresh Arch Linux install with full dm-crypt LUKS encryption:

Sep 23 01:58:04 arch login[380]: pam_systemd(login:session): Failed to release session: Connection reset by peer Sep 23 01:58:04 arch systemd-logind[374]: Failed to abandon session scope: Transport endpoint is not connected

I can also confirm the /run/user/1000 unmounting errors on shutdown and if I add my cache folder into tmpfs via fstab, then that also fails to unmount occasionally. The latest systemd release also causes data loss when used with profile-sync-daemon with overlayfs on. None of this happened before 226.

You can logout all users, then use powerbutton to shutdown. bypassing this issue :)

@atmouse- That's not completely true, as some people are seeing this issue with other that user mounts, like /var/log and /tmp (which I've seen at least once).

Reverting this commit has not solved my problem.

ledti commented Oct 8, 2015

Since upgrading to 227, I haven't been able to reproduce this bug.

@ledti I can confirm that. Been running systemd 227 since yesterday. There have been several reboots with normal usage between them, and the error hasn't appeared a single time. I've checked the journals from all the shutdowns, and the unmounting race conditions I described above don't seem to appear any more.

Harv234 commented Oct 8, 2015

The error 'Failed unmounting /var/log' still persists with systemd 227. Seems it does not like /var on a separate partition encrypted via luks decrypted via /etc/crypttab (UUID) and mounted via /etc/fstab (/dev/mapper/var).

@ledti I can confirm that too. No errors since systemd 227.

Contributor

eworm-de commented Oct 15, 2015

I still see Failed unmounting /var/log with systemd 227. Looks like we have a race with stopping systemd-journald here...
Stopping systemd-journald.socket, systemd-journald-dev-log.socket and systemd-journald.service before shutdown makes the error go away.

Harv234 commented Oct 15, 2015

This is interesting. We saw this error before, IIRC... The workaround was to specify Storage=volatile in /etc/systemd/journald.conf. But this time this does not help.

Contributor

eworm-de commented Oct 15, 2015

Storage=volatile makes the logs being stored in memory (/run/log/journal/) only, not on disk. This is a real productive system, so I want my logs to go to disk.

Harv234 commented Oct 15, 2015

Yes, I know that. As I said, it was only a workaround. And it does not work anymore, although /var/log is not used in this case. BTW, my error says: "error unmounting /var", not "/var/log". Hmmm. Relevant?

Owner

poettering commented Oct 15, 2015

There are two issues being intermingled here...

  1. the unmount errors coming from the systemd user instance (recognizable in the logs by having a pid != 1). This is fixed in git, and backported to some distros, #1516.
  2. the unmount errors for /var/log/journal (or any partitions that are prefix of this). This is actually expected: we keep journald running till the very end of the system, it's not killed during the normal shutdown phase, but only in the final killing spree. At that time it will continue to write to /var/log/journal. The primary reason why this is done that way is that we currently don't have a nice IPC interface to migrate logging back from /var to /run (the inverse of how we migrate from /run to /var during early boot, as soon as /var becomes available). Ideally, we'd just provide a bus method for this operation, that we could call during shutdown, but we cannot do that, as dbus-daemon is a client to journald, and that would hence result in a cyclic dep loop, if we require dbus to be up to shut down journald, but dbus-daemon requiring dbus-daemon during its operation to log. The plan was always to fix this properly, as soon as kdbus is available (since IPC is then a kernel subsystem and always available), but that's taking much longer than hoped of course. This specific issue is pretty cosmetic actually, as all you see is the EBUSY error messages during shutdown, but the file system will be unmounted during the final killing spree, so all should be good.

Anyway, i will now rename this one, and make this strictly about issue 2 above, as issue 1 should be fixed.

@poettering poettering changed the title from " Failed unmounting /run/user/1000; Failed unmounting /home " on systemd > 222 to /var stays busy at shutdown due to journald Oct 15, 2015

See also this downstream issue: https://bugzilla.redhat.com/show_bug.cgi?id=1026119
This issue is present since at least November 2013 in Fedora 20.

Contributor

johannbg commented Jan 26, 2016

This bug probably only affects separated partitions which are encrypted.

This bug probably only affects separated partitions which are encrypted.

No, I have a simple bind mount in my fstab and I experience it:

LABEL=cache  /var/cache   ext4 defaults,relatime 0 2
/var/cache/log /var/log   none rw,bind 0 0

Is this being looked into? Just set up separate /var partition and now receive the disconcerting failed unmount message on system shutdown. This is using Archlinux.

Same issue occurs using a separate encrypted partition that is auto-mounted at boot time via the fstab. In my case it's the /home folder.

Going try and workaround the problem by using autofs since I was meaning to get that entry out of the fstab, but not sure if that will really matter.

The host OS is running Gentoo w/ systemd-229.

pklapperich commented Jun 17, 2016

On a live system I can systemctl stop systemd-journald.service systemd-journald.socket and systemd-journald-dev-log.socket and then I'm able to remount /var/log read only.

Presumably systemd shuts down journald and associated sockets before the system is actually rebooted. Couldn't umounting /var or /var/log be delayed until then?

Same issue on Fedora 24.

Contributor

mbiebl commented Sep 28, 2016

If you add

[Unit]
...
RequiresMountsFor=/var/log/journal

to systemd-journald.service, does the error message go away?

genodeftest commented Sep 28, 2016

Yes, adding RequiresMountsFor=/var/log/journal into the [Unit] section fixes this bug for me (on Fedora 25 Alpha). Didn't expect the solution to be that easy ;)

Contributor

mbiebl commented Sep 28, 2016

Replying to myself: Adding the RequiresMountsFor=/var/log/journal to systemd-journald.service has a couple of unpleasant side-effects

  • If no persistent journal is used but /var is on a separate partition, we start journald later then necessary during boot
  • On shutdown, since the systemd-journald.socket is still running, due to socket activation the start of systemd-journald.service will be triggered after it has been stopped, resulting in error messages like:
[   46.992034] systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
[   47.015034] systemd[1]: systemd-journald.socket: Failed to queue service startup job (Maybe the service file is missing or not a non-template unit?): Transaction is destructive.
[   47.156341] systemd[1]: systemd-journald.socket: Unit entered failed state.
  • systemd-journald is stopped a bit earlier during shutdown, thus collecting fewer log messages.

So I'm a bit undecided whether adding RequiresMountsFor would actually be a good idea

@poettering Could we simply exclude any mounts below /var from being unmounted via Conflicts=umount.target?

Contributor

mbiebl commented Sep 28, 2016

Excluding /var from being umounted via umount.target should be as simple as adding it to

static bool should_umount(Mount *m) {
        MountParameters *p;

        if (PATH_IN_SET(m->where, "/", "/usr") ||
Owner

poettering commented Oct 6, 2016

@mbiebl see my earlier comments. THis is a known issue, and we are unlikely to fix this anytime soon, as journald keeps /var/log/journal busy, but needs to stay around until the last moment of shutdown so that it collects logs.

The only way to fix this properly would be to add a way to tell the journal to move its logs from /var back to /run at some time before shutting down, and only then trying to unmount /var. This would be the inverse of systemd-journal-flush.service which triggers the move /run → /var at boot. Implementing this isn't trivial however, as the move operation at shutdown needs to be synchronous to be safe, and thus a plain unix signal is not enough. Ideally we'd just add a bus call to journald for this, but we can't due to the weird relationship between the journal and dbus-daemon: as dbus-daemon logs to journald the latter can't also be client to the former, to avoid deadlocks.

I see no easy fix for the moment.

That said, the issue is purely cosmetic IRL. The final killing spree (where journald is finally gone) will be able to unmoiunt /var anyway, hence the earlier umount failing might be annoying but isn't much of an issue.

Contributor

mbiebl commented Oct 6, 2016

@poettering Why this complex logic on shutdown? If we move journald to ephemeral storage, the log messages will be lost, so we could just as well stop journald

Owner

poettering commented Oct 6, 2016

@mbiebl I am pretty sure we should collect logs all the way through the end, even if we have to store them on tmpfs during late shutdown and thus lose them on reboot. That#s because we won't lose them until the actual physical reboot happens, hence if you are trying to debug stuff, then you can still do so in some final debugging shell, for example in the initrd shutdown hooks, or just from the /etc/halt script thingy.

rathann commented Jan 18, 2017

I'm also affected by this, having separate /var LV on a LUKS-encrypted VG. @mbiebl thanks for the work-around. It works exactly as you described.

genodeftest commented Jan 18, 2017

In reference to #867 (comment) by @mbiebl of adding RequiresMountsFor=/var/log/journal to systemd-journald.service:
I don't get the error messages you wrote in #867 (comment). Are you sure there is no other issue involved?
Running systemd-231-11.fc25.x86_64 on Fedora 25.

hlaube1 commented Nov 1, 2017

Adding RequiresMountsFor=/var/log/journal to systemd-journald.service helped here as well. Shutdown no runs the laptop completly down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment