Replugging should be enough to clear failed state #616

Closed
jidanni opened this Issue Jul 18, 2015 · 5 comments

jidanni commented Jul 18, 2015

At first I thought this was a mount bug:

# mount /var/lib/apt/lists; echo $?
0
# mount|grep /var/lib/apt/lists
#

If the thing didn't get mounted, should something be communicated to the user?
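
A minimal way to catch this from the shell, by the way (a sketch using
mountpoint from util-linux, with the same mount point as above):

# mount /var/lib/apt/lists
# mountpoint -q /var/lib/apt/lists || echo "mount exited 0 but nothing is mounted here"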

Looking further:

# journalctl
kernel: EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
systemd[1]: var-lib-apt-lists.mount: Unit is bound to inactive unit dev-sdd1.device. Stopping, too.
systemd[1]: Unmounting /var/lib/apt/lists...
systemd[1]: Unmounted /var/lib/apt/lists.
systemd[1]: var-lib-apt-lists.mount: Unit entered failed state.
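
For the record, the relevant messages can also be narrowed down with
journalctl's standard unit and boot filters:

# journalctl -b -u var-lib-apt-lists.mount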

The problem is: if there has ever been a failure on sdd1, then that
fstab entry (the UUID=... line) is treated as failed for the rest of
the session, until the user reboots.

Even if he umounts, pulls the device out, and puts it back in again so
that it now shows up as sdb1 as you see above, the fact that it was
once marked bad on sdd1 dogs him for the rest of the session, leaving
him no other choice but to reboot.

Note that he can mount it just fine anywhere other than
/var/lib/apt/lists.

So the problem is the var-lib-apt-lists.mount unit not getting
cleared, no matter what he does. The problem isn't with sdd1 or sdb1
themselves.
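
One way to confirm that it is the mount unit, not the device, that is
stuck is with the standard systemctl queries (a sketch):

$ systemctl is-failed var-lib-apt-lists.mount
$ systemctl list-units --failed

The first prints the unit's state ("failed" for as long as the
condition persists); the second lists every unit currently in that
state.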

$ grep lists /etc/fstab
UUID=26a... /var/lib/apt/lists auto noauto,errors=remount-ro 0 0

Reading
http://linux-audit.com/auditing-systemd-solving-failed-units-with-systemctl/
we see that the user is forced to run systemd-specific commands to fix
it, whereas unplugging the device and plugging it back in should be
enough.
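
A minimal sketch of the kind of manual intervention that article
describes, using the unit name from the journal above:

# systemctl reset-failed var-lib-apt-lists.mount
# mount /var/lib/apt/lists

This is exactly what the user should not have to type just because he
replugged the device.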

By the way, do you also feel I should report a bug against Debian
mount 2.26.2-8 for not printing a message, or is something else
failing to inform mount that there was a failure?

Systemd 222-2 on Debian.

P.S. If you think about it, what could go wrong? He is inserting the device for the second time; why can't you just forget about the past?

Owner

poettering commented Jul 22, 2015

I am not sure I parse correctly what you are saying, but the "failed" state is automatically reset when a unit is started again. Either a unit is starting/started/stopping/stopped, or it is "failed", but it can never be both (because that's implemented as an enum...). Also, if a unit is failed, this will never have any effect on subsequent starts.

If a unit is failed it will not be GC'ed, until it is either restarted, or explicitly flushed out by the admin by invoking "systemctl reset-failed", so that the user can have a look at the unit and why it failed.

Now, the question is why the .mount unit entered failed state for you. What is the precise "systemctl status" and "systemctl show" output of the failed mount units after they failed?
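
For reference, that output can be gathered with the standard systemctl
invocations, using the unit name from the journal above:

$ systemctl status var-lib-apt-lists.mount
$ systemctl show var-lib-apt-lists.mount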

jidanni commented Jul 29, 2015

I don't have that equipment with me as it is back in town, but in general, let's say a USB hub is the problem. A card reader plugged into it gets /dev/sdd1 marked bad, and even if the user unplugs it and switches to a different hub, the bad state is not cleared without the special commands you have listed.

Owner

poettering commented Jul 31, 2015

Also, systemd knows no "failed" state for device units, hence I don't grok what you are saying at all. Again, please provide the systemctl show and status output I asked for!

jidanni commented Aug 1, 2015

jidanni commented Aug 15, 2015

OK, now I am in town with the device. However, as I now boot before plugging in any USB devices (or else the BIOS won't boot) and everything is working, I would rather not try to induce a fault with my precious disk. Therefore I am closing this bug.

jidanni closed this Aug 15, 2015
