Don't freeze system when /etc/mtab is not a symlink #1495

Closed
martinpitt opened this issue Oct 8, 2015 · 45 comments · Fixed by #1754

@martinpitt
Contributor

This was already discussed in PR #986, but didn't get finished. systemd 227 now freezes the system at boot if /etc/mtab is not a symlink, but a file. This is really obnoxious, as that gives the user zero chance to actually unbreak the system after this happened. Remember, this isn't anything a vendor/distro can influence -- have some weird backup/restore program/strategy, some weird script to run sed -i on the thing, or whatnot -- we've seen actual bug reports where this happened.
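The condition described above can be approximated from a shell, which makes it easier to test a system before rebooting. This is a sketch, not systemd's code. The function name `mtab_is_supported`, the optional path argument (so it can be tried on a scratch file), and the exact list of accepted link targets are assumptions for illustration. Treating a missing /etc/mtab as acceptable is likewise an assumption.

```shell
# mtab_is_supported: succeed if the path (default /etc/mtab) is absent or is a
# symlink to the kernel's mount table; fail if it is a regular file or a link
# pointing somewhere else. Hypothetical helper, not systemd's implementation.
mtab_is_supported() {
    mtab="${1:-/etc/mtab}"
    # -e follows symlinks, so also test -L to catch a dangling link.
    if [ ! -e "$mtab" ] && [ ! -L "$mtab" ]; then
        return 0                # no mtab at all: nothing to complain about
    fi
    [ -L "$mtab" ] || return 1  # a regular file: the case that froze the boot
    case "$(readlink "$mtab")" in
        /proc/self/mounts|/proc/mounts|../proc/self/mounts|../proc/mounts)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

For example, `mtab_is_supported || echo "/etc/mtab is not a symlink to /proc/self/mounts"` prints a warning on an affected system.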

At the very least there should be a rescue shell, so that you have the chance to clean up. But then again, there's only one solution, and e. g. in Debian we have a unit that turns /etc/mtab back into a symlink on boot, but with the early freeze this now won't help any more. The visible effect is now that we moved from "your system fixes itself" to "your system became completely useless and hard to fix" (remote servers!), which is not at all an improvement.

Nobody likes /etc/mtab, it should have burned and died many years ago, and we all agree that it is an utter misconception. So I rather ask: Why are we still paying attention to it?

systemd itself only uses it in shell-completion/zsh/_udevadm (which is trivially fixable to use /proc/mounts instead), and tmpfiles.d/etc.conf.m4, which could just stay as it is. All other references are just for asserting that /etc/mtab is a symlink.

This was merged because allegedly util-linux' mount monitor breaks if /etc/mtab is a file -- so it seems to me that this is the place to fix: Pretty please let's just make the mount monitor stop paying attention to a potentially broken /etc/mtab and always use /proc. Isn't that what --enable-libmount-force-mountinfo is supposed to do anyway? Can't that just become the default, and we declare /etc/mtab dead once and for all?
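To illustrate what "always use /proc" means in practice: the kernel already exports each process's mount table, so a consumer never needs a separately maintained file. The sketch below is not libmount and does not show what --enable-libmount-force-mountinfo does internally. It is a hypothetical `list_mounts` helper that reads /proc/self/mountinfo (whose field layout is documented in proc(5)) and prints each mount point with its filesystem type.

```shell
# list_mounts: print "<mount point> <fstype>" for each line of a mountinfo file.
# Defaults to the kernel's own table, so no /etc/mtab is consulted at all.
list_mounts() {
    awk '{
        # Field 5 is the mount point. Fields 7.. are optional tags of variable
        # count, terminated by a lone "-"; the filesystem type comes right after it.
        for (i = 7; i <= NF; i++) if ($i == "-") break
        print $5, $(i + 1)
    }' "${1:-/proc/self/mountinfo}"
}
```

Mount points containing spaces appear octal-escaped (for example `\040`) in mountinfo; this sketch prints them as-is.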

@karelzak, what exactly in util-linux still pays attention to /etc/mtab, and can we make that stop?

I'm happy to send a PR with that cleanup, which will be straightforward but let's agree to what we'll do first.

Thanks for considering!

@johannbg
Contributor

johannbg commented Oct 8, 2015

Hmm I thought @karelzak had already finished this with...
"We want error on regular mtab; if you have a regular mtab, then your system is not compatible with systemd. That's all, sorry."

@martinpitt
Contributor Author

I saw that, but freezing your entire system is the worst possible reaction to that IMHO -- why can't we just ignore mtab or fix it during boot again?

@teg
Contributor

teg commented Oct 9, 2015

I'm with @martinpitt on this. util-linux should just be taught to not care at all about /etc/mtab, and systemd should certainly not care about it as it is internal util-linux api. That said, there is a transition problem to sort out. I'd be open to some generic facility for "notice that something is fucked, but try to boot into emergency.target rather than freezing", which could be used by the /etc/mtab check until util-linux has been fixed.

@martinpitt
Contributor Author

FTR, we've shipped http://anonscm.debian.org/cgit/pkg-systemd/systemd.git/tree/debian/extra/units/debian-fixup.service for many years now (it's even in wheezy), to clean up a broken /etc/mtab symlink during boot. If systemd wants to care about this, perhaps this would be something to adopt upstream, instead of the freezing?

@karelzak
Contributor

karelzak commented Oct 9, 2015

The current (in-git tree) libmount monitor does not check for the mtab file anymore, but it's not released yet. It will be in v2.27.1 (maybe next week?).

util-linux/util-linux@0250174

Anyway, I think it's a good idea to check the system during systemd startup. It's a crazy idea to ignore a regular /etc/mtab file, because the file may be used by many other utilities (including 3rd-party programs, mount.type helpers, local shell scripts, etc.).

I agree that freezing the boot is overkill (and I have never suggested this); maybe starting a rescue shell would be a better solution, or (what I vote for): print a warning and unlink() + symlink() to fix the problem automatically during systemd startup. That's a simple solution to make everyone happy ;-)
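A minimal sketch of the warn-and-relink approach, written as a shell function rather than as code inside systemd. The function name `fix_mtab` and its optional path argument are hypothetical; on a real system the default path is used and, as Lennart notes further down, the replacement can only succeed once / is writable.

```shell
# fix_mtab: if the path (default /etc/mtab) is not a symlink, warn on stderr,
# unlink it and create a symlink to /proc/self/mounts in its place.
# Hypothetical helper; needs a writable filesystem to succeed.
fix_mtab() {
    mtab="${1:-/etc/mtab}"
    [ -L "$mtab" ] && return 0               # already a symlink: leave it alone
    echo "warning: $mtab is not a symlink, replacing it" >&2
    rm -f "$mtab" || return 1                # the unlink() step
    ln -s /proc/self/mounts "$mtab"          # the symlink() step
}
```

The function is idempotent: a second call finds the symlink and does nothing, which is what makes it safe to run on every boot.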

@martinpitt
Contributor Author

(I vote for): print warning and unlink() + symlink() to fix the problem automatically during systemd startup

That's basically what we do with the external unit. Happy to do that in code, if @poettering agrees. Thanks!

@poettering
Member

Well, so, the thing is that the file is part of the glibc API via setmntent() and _PATH_MOUNTED.

I am not sure I understand why Ubuntu still runs into issues with all this... /etc/mtab has been dead for 4+ years, and there are still up-to-date Ubuntu systems around with it as a regular file? Also, systemd has been shipping a tmpfiles snippet that fixes the symlink for more than a year; is Ubuntu not shipping that? I really don't get why this is popping up now on Ubuntu.

Generally, there are tons of ways you can fuck up your system. For example, remove /usr/lib/libc6.so or so... I am not sure why having a broken /etc/mtab file should be something we should accept...

We need no unlink() + symlink() in code, since we already have it in tmpfiles -- and, as mentioned, have had it for quite some time. And that's really the right place, since you can only make such a change after / has been remounted read-write, hence moving it into earlier boot will not work...
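Because the relink only works once the root filesystem is writable, a boot-time helper would have to check that first. A hypothetical sketch, assuming /etc lives on the root filesystem rather than on a separate mount: it looks up the last "/" entry in the kernel's mount table (later entries shadow earlier ones, for example an initial rootfs entry) and succeeds only if its options include rw.

```shell
# root_is_writable: succeed if the last "/" entry in a mounts table
# (default /proc/self/mounts) carries the "rw" option.
# Assumes /etc is on the root filesystem, not on a separate mount.
root_is_writable() {
    awk '
        $2 == "/" { opts = $4 }     # field 2: mount point, field 4: options
        END {
            # Wrap in commas so "rw" matches only as a whole option.
            if (("," opts ",") ~ /,rw,/) exit 0
            exit 1
        }' "${1:-/proc/self/mounts}"
}
```

For example, `root_is_writable || echo "/ is still read-only; /etc/mtab cannot be fixed yet"` reports why an early fix attempt would fail.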

Also, initrds have rescue shells built-in, it's not that this was unfixable...

Anyway, I really don't understand how this is popping up now. This can only happen if somebody mixes a really old /etc with a very new /usr -- or if Ubuntu's packaging is borked and doesn't change the file to become a symlink, and also didn't bother with the tmpfiles snippet... But all of those are packaging issues; I'm not sure why this should be an issue upstream?

Where did this issue pop up exactly btw, can you point me to some real-life bug report?

I am fine with making systemd ignore the file altogether, but that really means that util-linux should ignore it too, and glibc as well... Maybe file a bug against glibc to change the _PATH_MOUNTED define?

@poettering poettering added the pid1 label Oct 9, 2015
@martinpitt
Contributor Author

and there are still up-to-date Ubuntu systems around with it as a regular file

This isn't distro specific. Nothing in the distro creates this (except for stupid bugs like http://launchpadlibrarian.net/197406177/lxcfs_0.5-0ubuntu1_0.5-0ubuntu2.diff.gz which just get fixed). As I wrote, the primary concern is about keeping /etc/ in revision control, under backup, trying to edit it, etc. -- there are any number of ways that symlink can become a file again.

My point isn't that this will happen often, but that it's something which a distro only has limited control over -- and if it does happen it hits you really hard now: instead of cleaning this up automatically it just shreds your system now, for no good reason.

Maybe file a bug against glibc to change the _PATH_MOUNTED define?

Done: https://sourceware.org/bugzilla/show_bug.cgi?id=19108

@poettering
Member

Is the lxcfs issue the only case where this popped up? Is this issue otherwise theoretical?

Honestly, for that lxcfs issue I think I don't feel too bad to have broken that... ;-)

@poettering
Member

Thanks for filing the glibc bug...

@martinpitt
Contributor Author

Is the lxcfs issue the only case where this popped up?

That's the one I remember the link to. We also got some cases when people switch between e. g. upstart and systemd, or others, but I don't have links handy. But e. g. restoring an /etc/ backup isn't a totally far-fetched scenario?

Honestly, for that lxcfs issue I think I don't feel too bad to have broken that... ;-)

Certainly not for that -- in the distro we must fix such silly bugs. :-)

@johannbg
Contributor

johannbg commented Oct 9, 2015

@martinpitt the hammer approach (as unlikeable as it is) is the only means to make people adapt their workflow to current times, as well as the only means to identify why people in this particular case are still using mtab. (And people will always "voice their concerns" more or less only when you break backwards compatibility and/or remove legacy cruft.)

Distributions that support multiple init systems will have to carry the additional load of doing so downstream, will they not?

@poettering
Member

Anyway, happy to drop the whole mtab crap, but we should sync that up with util-linux, and add a versioned dep on that release. glibc would be good too, but I figure there's no need to wait for that...

@poettering
Member

I dropped some parts of the mtab crap (not the boot-time halt though) with #1516... Please review, and press the green button!

@martinpitt
Contributor Author

Anyway, happy to drop the whole mtab crap, but we should sync that up with util-linux, and add a versioned dep on that release.

Sounds good, thanks! I'll post a PR once there's a util-linux 2.27.1 (and update my PPA for semaphore).

@martinpitt martinpitt self-assigned this Oct 13, 2015
@stinky1

stinky1 commented Oct 16, 2015

FYI: This morning I did a fresh install of Debian on a Lenovo T410 laptop and ran into this exact issue. The machine is unusable at this point since I have no way of correcting the problem. Booting to recovery gives me the same error and hard stop.

The debian iso used http://cdimage.debian.org/cdimage/weekly-builds/amd64/iso-cd/debian-testing-amd64-CD-1.iso

@stinky1

stinky1 commented Oct 16, 2015

Attached is a pic from my phone as proof. I'd be happy to answer questions if anyone has any.


@martinpitt
Contributor Author

@stinky1: Do you know what created /etc/mtab as a file? As said, for the past two years or so we've shipped debian-fixup.service that turns it into a symlink on every boot -- i. e. there must be something on your system which turned it back to a file very recently.

@IlyaSukhanov

I encountered the same problem as @stinky1 on a clean install of Debian stretch. I was able to work around it by downgrading the libsystemd0 and systemd packages to 226-4. Originally they were 227-2 and 215-17+deb8u2, respectively.

@stinky1

stinky1 commented Oct 16, 2015

I am sorry but I am not a linux power user by any means so I can't answer your question on what created /etc/mtab. I just popped in my flash drive, booted from it and clicked next a bunch of times to complete the install. The screen shot you see is the very 1st boot.

The only non-standard thing I did was to change the root volume from 10 GB to 20 GB. UEFI is not enabled on this laptop, and MATE was selected as the desktop.

@stinky1

stinky1 commented Oct 16, 2015

If there are any logs from this install you want to look at, I can get them for you; just be very specific. Full path to the file, please.

Thanks

@martinpitt
Contributor Author

Please take this particular issue to downstream debian bugs (https://www.debian.org/Bugs/Reporting) and describe what you installed. Presumably there's some scenario where the installer writes an /etc/mtab file.

@stinky1

stinky1 commented Oct 16, 2015

Please do not take this the wrong way, but I have never filed a bug. The process to file a bug by email (system won't boot) seems very strict in its format and categorization of the problem, and I have read too many times about the potential for hostility on these mailing lists, especially when systemd is involved.

I am not certain I am willing to undertake that kind of responsibility. I would like to help, but I am not into opening myself up to abuse.

@joshtriplett
Contributor

Is there some fundamental reason that systemd needs to stop and refuse to continue, rather than, for instance, moving mtab aside and creating the proper symlink, or bind-mounting over it, or otherwise doing something that allows it to continue?

@IlyaSukhanov

Filed a downstream bug report with debian.

@djask

djask commented Oct 17, 2015

I have the EXACT same problem as stinky1. I'm only a basic linux user so I would appreciate if somebody instructed me on how to "downgrade" packages, etc.

@meden

meden commented Oct 17, 2015

Hello.
I also hit this bug installing Debian testing (Stretch) from a netboot image (via bootable USB stick).

FWIW, my two cents: rendering a system unusable because of an automatically fixable problem is just crazy (it sounds like: "if the world is different from the one I desire, then the whole world is wrong, I'm not responsible at all and I will do nothing to fix things: you should have known it and now you are on your own, moron". Not very user-friendly...).

Rants aside, a hint for recovering broken systems, for those who use GRUB and stumble upon this bug report: at the boot menu, edit the boot entry, add init=/bin/bash to the linux command line, and boot. The system will drop into a recovery shell where you can remount / read-write and manually create the symlink /etc/mtab -> /proc/self/mounts (or should it simply be /proc/mounts? On my machine the latter is itself a symlink...).

Have a nice day and thanks for your work.

@poettering
Member

@joshtriplett /etc is read-only during early boot. And as mentioned earlier, we already have a tmpfiles rule that makes /etc/mtab a symlink, and it runs in later boot, as soon as /etc is writable, if it ever is. We have had that for a long time. If distros still don't get this right then there's a good chunk of blame to put on them really... Especially as this is documented in NEWS, and not particularly hard to fix from a package postinst script.

@Snigelson

Maybe I am missing a step in the logic here, but what good is a script that runs later in the boot process to prevent the system from halting at the beginning of the boot process? Why not just ignore the file, and let it be fixed later on?

@joshtriplett
Contributor

@poettering Even if /etc is read-only, can't systemd fix this with a bind mount, as long as /etc/mtab exists at all?

@MagicFab

@meden thank you for the hint, tried the workaround and system now booted!

  • When booting, reach the GRUB menu, press "e" to edit the first entry
  • Add init=/bin/bash to the linux line, press Ctrl-x to boot
  • At the prompt (should be "#"), type:

mount -o remount,rw /
rm /etc/mtab
ln -s /proc/self/mounts /etc/mtab
reboot

You should now have a bootable system.

@MiheeJo

MiheeJo commented Oct 17, 2015

@MagicFab Thank you so much for the workaround. Finally after spending a day for installation, I found this and it works perfectly! Thank you again!

@djask

djask commented Oct 17, 2015

Hi, I'm trying MagicFab's method but I get
"cannot set terminal process group (-1): Inappropriate ioctl for device.
no job control in this shell"
when I try to use init=/bin/bash command.
My keyboard does not work at all so I can't use any commands after that.

EDIT: I've fixed the problem by creating the symlink with boot-repair-disk.
Thanks to everyone here for the instructions and resources!

@eikesauer

Thanks MagicFab & meden, you saved my installation!
(I'm sure it is totally undesirable and probably even impossible to boot up a system if /etc/mtab exists.)

@martinpitt
Contributor Author

Thanks @IlyaSukhanov, the fix got committed to the Debian installer. So this was good for something :-)

We won't ship the next Debian release with this strict check, but having it for a couple of weeks seems useful indeed to discover such old bugs.

@mtglsk

mtglsk commented Oct 23, 2015

Happens to me on a fresh stretch alpha 3 netinst.

@mbiebl
Contributor

mbiebl commented Oct 23, 2015

This will be fixed in stretch alpha4. A fix has been applied to debian-installer:
http://anonscm.debian.org/cgit/d-i/finish-install.git/commit/?id=b49e02c5c73fd2ef1b96fdc4878f19da530d6619

@victorssilva

Thank you @MagicFab and @meden, your directions worked for me.

@keszybz
Member

keszybz commented Oct 24, 2015

@martinpitt, do you intend to submit a patch to make the check non-fatal?

@martinpitt
Contributor Author

@keszybz: Yes, I will, as soon as util-linux/util-linux@0250174 gets released as util-linux 2.27.1 so that we can bump systemd's dependency to it.

@karelzak
Contributor

karelzak commented Nov 2, 2015

util-linux v2.27.1 released

@zonque
Member

zonque commented Nov 2, 2015

Cool, thanks a lot. As soon as @martinpitt has updated his PPA, we can depend on that without making Semaphore builds fail.

@andhe
Contributor

andhe commented Nov 2, 2015

@martinpitt : 2.27.1-1 now uploaded to Debian and pushed to pkg-util-linux git on alioth.

@dsegan

dsegan commented Nov 2, 2015

FWIW, I've seen this upgrading an install of Ubuntu 14.04 server with an encrypted home to 16.04, with the /etc/mtab being
/home/danilo/.Private /home/danilo ecryptfs ecryptfs_check_dev_ruid,ecryptfs_cipher=aes,...

This was originally a fresh install of 14.04.

@martinpitt
Contributor Author

The semaphore PPA now has util-linux 2.27.1. I'll work on a PR to remove this strict check then.

That said, having this strict check for a while was actually useful. We now fixed our installer to set up a proper /etc/mtab.

martinpitt added a commit to martinpitt/systemd that referenced this issue Nov 2, 2015
util-linux 2.27.1 now entirely stops looking at /etc/mtab, so we don't need to
verify /etc/mtab during early boot any more. Later on, tmpfiles.d/etc.conf will
fix /etc/mtab anyway, so there's not even a point in warning about it.

Drop test_mtab() and bump the util-linux dependency to >= 2.27.1.

Fixes systemd#1495