Setting up crypted swap partition sometimes fails (racey) #10179
Comments
Wow! This is entirely bogus. There is no reason to prohibit use of a device without a recognizable format. Actually #10154 is related. If systemd had the ability to run commands to bring up a device, this workaround would not be needed. It would just wait for
Add |
Indeed, I had already added the following override file
# Run udevadm trigger after the mkswap call in the original generated
# service
[Service]
ExecStartPost=/sbin/udevadm trigger /dev/mapper/%i
You seem to imply that the udev rules should not set |
Removing it now will break swap and tmp processing by starting followup units too early. This requires some non-trivial redesign of the whole mess. |
That seems to do the job. This is an issue experienced heavily on Pop!_OS, because we encrypt swap partitions by default. Most hardware is unaffected, but there are a few models that hit this error regularly. After applying this, the issue stopped entirely on the Galago Pro 2. Now I need to find a way to apply this automatically for all installations. |
On some systems, encrypted swap partitions occasionally fail to mount at boot, because systemd does not receive the triggers that it is waiting for. Running `udevadm trigger /dev/mapper/$cryptswap` immediately after creation resolves the issue. How often it occurs varies from system to system: some never experience it, others experience it occasionally, and in the worst case it happens on almost every boot.
- Pop!_OS issue report: pop-os/pop#316
- Solution based on systemd#10179 (comment)
- Closes systemd#10179
I strongly suspect that the real bug is in |
Is there any plan to fix this? I'm experiencing this on Arch Linux too. Most of the time, systemd hangs waiting for the swap partition even though it has actually been created. |
I'm experiencing this issue too. On fresh Arch Linux installs, boot hangs while waiting for the crypt-swap device. I experience this issue on 100% of boots on a 1 CPU/1 GB RAM Linode. On fresh install with Starting the auto-generated swap unit manually works 100% of the time.
Hope this helps! Applying the workaround with |
…t processing
While external programs that take an exclusive flock on block devices while modifying them (e.g. partitioning or mkfs) will safely work with udevd so that the block device modifications don't race with udevd processing of the device (e.g. creating symlinks for the newly-created partitions), any external program that doesn't take an exclusive flock will race with udevd, and the changes made to the block device may be missed by udevd, leading to failures, e.g. udevd might not create the symlinks for new partitions, or might not create the /dev/disk/by-* for new filesystems.
This updates the flock function to also take a short-duration inotify watch, so that after processing the device, udevd can synthesize a new uevent if it detected any IN_CLOSE_WRITE while the device was being processed, before the real watch was added.
One example is the mkswap that we ourselves actually run, from the service created by cryptsetup-generator; we have it running mkswap:
if (swap)
        fprintf(f, "ExecStartPost=/sbin/mkswap '/dev/mapper/%s'\n", name_escaped);
However, this is racy, because it doesn't take an exclusive flock. This (and probably other places in our own code) should have done "flock ..." instead. If it's hard for us to get this right, it seems too much to expect all other non-systemd programs to also be aware they need to flock the block device.
Fixes: systemd#10179
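The ordering problem this commit message describes can be made concrete with a small toy model. This is only a sketch under simplifying assumptions: `ToyDevice`, its fields, and the step names are invented for illustration and are not udev or systemd code. It models udev probing the device, then installing its watch, with the write from `mkswap` landing either after the watch exists or in the gap before it.

```python
from dataclasses import dataclass


@dataclass
class ToyDevice:
    """Toy stand-in for one block device as udev sees it (not real udev code)."""
    has_swap_signature: bool = False  # written by the "mkswap" step
    watched: bool = False             # udev's inotify watch is installed
    systemd_ready: bool = False       # what udev last reported to systemd

    def probe(self):
        # udev processes an event: the device counts as ready only if formatted.
        self.systemd_ready = self.has_swap_signature

    def add_watch(self):
        self.watched = True

    def mkswap(self):
        self.has_swap_signature = True
        if self.watched:
            self.probe()  # the close-after-write is noticed, so udev re-probes

    def trigger(self):
        self.probe()  # models `udevadm trigger`: a synthetic event forces a re-probe


def run(steps):
    """Apply the named steps in order and return the final readiness."""
    device = ToyDevice()
    for step in steps:
        getattr(device, step)()
    return device.systemd_ready


print(run(["probe", "add_watch", "mkswap"]))             # watch in place first
print(run(["probe", "mkswap", "add_watch"]))             # write lands in the gap
print(run(["probe", "mkswap", "add_watch", "trigger"]))  # workaround applied
```

In this model the three calls print `True`, `False` and `True`: the second ordering is the hang reported in this issue, and the third shows why the `udevadm trigger` workaround helps. The model says nothing about how likely each ordering is on real hardware.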
Hi, I think I am facing this issue, but I'm using Ubuntu Server 18.04.4 with whole-disk encryption, and I also mount additional encrypted partitions. I hope you can help me adjust your quick fix to my situation. My system is a small home server that I use remotely, with an SSD for the OS and two HDDs for data. I use dropbear/ssh to decrypt the SSD remotely; it contains a key file for mounting the two HDDs, which hold a LUKS-encrypted btrfs RAID. However, the boot process hangs indefinitely in about 1 out of 20 boot attempts, which is quite inconvenient from a remote location. Since I have 3 encrypted devices I also have 3 systemd services - so is adding
to all services ok? Here is my crypttab file:
And here my fstab:
And here the 3 systemd cryptsetup services
Thanks! |
@Thorsten42 I would think so! But these services are generated, so you'll need to use overrides. This issue is mainly caused by a race between mkswap and udev setting up the watch. In your case the FS is already there, so it is a bit unrelated, but it could help? |
@paulvt thanks I will test it :) |
… and "tmp" options
This way we can benefit from the correct block device locking we just added. I was thinking about whether to instead pull in a regular systemd-makefs@.service instance, but I couldn't come up with a reason to, and thus opted for just doing the minimal patch and just replacing the simple mkfs calls.
Fixes: systemd#10179
Replaces: systemd#13162
My proposed fix is in #15836. It would be great if anyone could test whether it makes things work for them! |
Thanks for this Lennart. I didn't have any luck reproducing the bug with a few reboots after removing my If any other Arch linux users here are also interested in testing, you can clone the repository linked above and use I've also uploaded the package I built into the dist directory. |
Please test the linked PR, or systemd-246-rc1 when it comes out. |
testing update: The issue just appeared again for me, so I've now installed the version with the fix that I linked earlier and will report back again in a week or so. |
Hi again all, unfortunately I've just experienced another hang on boot using the version of systemd I built with Lennart's fix linked here, logs attached: journalctl-gh10179.log. 15:30:10 is when I give up waiting and hit alt-ctrl-delete. |
@oes placing the following file in
[Unit]
Before=dev-mapper-%i.swap
Requires=systemd-random-seed.service
After=systemd-random-seed.service
[Service]
ExecStartPost=/sbin/udevadm trigger /dev/mapper/%i |
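For anyone who wants to script this workaround, here is a minimal sketch that writes the drop-in shown above to disk. The target directory is not given in this thread, so it is left as a required parameter; the function name `install_drop_in` and the file name `udev-trigger.conf` are assumptions made for illustration only.

```python
from pathlib import Path

# Drop-in content as posted in this thread; %i is expanded by systemd.
DROP_IN = """\
[Unit]
Before=dev-mapper-%i.swap
Requires=systemd-random-seed.service
After=systemd-random-seed.service

[Service]
ExecStartPost=/sbin/udevadm trigger /dev/mapper/%i
"""


def install_drop_in(drop_in_dir, name="udev-trigger.conf"):
    """Write the workaround drop-in into drop_in_dir and return its path.

    drop_in_dir must be the drop-in directory of the generated cryptsetup
    unit; it is a parameter because the thread does not spell the path out.
    The default file name is hypothetical.
    """
    target = Path(drop_in_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(DROP_IN)
    return target
```

After writing the file, systemd has to re-read its unit files (`systemctl daemon-reload`, or simply a reboot) before the drop-in takes effect.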
I believe there is still a race condition even after the fix for this issue:
The following drop-in file, however, does solve the problem:
[Unit]
Before=dev-mapper-%i.swap
Requires=systemd-random-seed.service
After=systemd-random-seed.service
[Service]
ExecStartPost=/sbin/udevadm trigger /dev/mapper/%i
By forcing |
@DemiMarie udevd opens the device, then actually locks it, then installs an inotify watch on it, and then unlocks it. If systemd-makefs then opens the device too, locks it, makes its changes, unlocks it and closes it, udev will see the inotify close event and retry. The key here is that both udev and makefs take the lock. |
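The writer's side of the protocol described above can be sketched in a few lines. This is an illustration, not systemd-makefs itself: the helper names `locked_device` and `lock_is_free` are made up here, and a real tool would run its formatting step inside the `with` block. The only system interface used is the BSD `flock` lock, via Python's `fcntl.flock`.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked_device(path):
    """Hold an exclusive flock on path while the caller modifies it."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Blocks until any other holder (for example udev) drops its lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        # Closing releases the lock. Closing a descriptor that was opened
        # for writing is what an IN_CLOSE_WRITE inotify watch reports.
        os.close(fd)


def lock_is_free(path):
    """Return True if an exclusive flock could be taken right now."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)
```

Because flock is advisory, it only helps when every party that touches the device takes it, which is the point made in the comment above.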
@3point2 have you built updated versions of systemd-makefs? |
Yes @DemiMarie the makepkg command used to build the patched Arch Linux package I'm testing with does a complete build of https://github.com/systemd/systemd-stable/tree/v245.5 with a few backports and three patches, the third being Lennart's fix for this issue (to view the patches see the 00*.patch files here). The PKGBUILD file in the same directory also includes the exact meson and ninja commands used to run the build. If you run a @poettering If you'd like to add some additional logging to your makefs-lock branch I'd be happy to re-build and attach the output - This would give additional assurance that I'm building everything correctly and hopefully shed some light as to why I'm still getting hangs without the udevadm trigger workaround. |
@poettering does udev rescan all block devices if an inotify event is lost? Inotify is not a reliable protocol, so this can happen. My |
@poettering this not fixed. |
I'm on a custom embedded linux build using systemd 241 and have a similar problem with an encrypted device that isn't swap. My crypttab will create a /dev/mapper/crypt device but it then doesn't mount with my /etc/fstab entry. If I change my fstab entry to be /dev/dm-0 rather than /dev/mapper/crypt it all works great, but I can't do that since /dev/dm-0 isn't a stable path. |
@poettering testing shows that this is fixed in systemd 246. @dsigurds systemd 241 is too old for this issue tracker; please discuss it with your distro maintainer. |
Okay, thank you for the information about version 246 |
Excellent, thanks for reporting back. |
I am still experiencing this problem with systemd 246.6 (gentoo linux). My boot process got stuck in four of four attempts at
Commenting out the entry in Adding the |
systemd version the issue has been seen with
Note: While we are aware that this version is over 2 versions old, we have noticed that no relevant changes have been made to the cryptsetup generator, a key part of this bug report, since this release (except for keydev support).
Used distribution
Expected behaviour you didn't see
Unexpected behaviour you saw
If the swap partition is configured so that it must not fail, this blocks the boot process.
Steps to reproduce the problem
/etc/fstab:
/etc/crypttab:
For us, the issue seems to cause problems on systems that are relatively slow CPU-wise or IO-wise.
Analysis
The cryptsetup-generator of systemd generates the following service file
`systemd-cryptsetup@crypt_swap.service`:
What we think is going on is the following:
The `systemd-cryptsetup attach` call leads to `systemd-udev` creating the device `/dev/mapper/crypt_swap` with `SYSTEMD_READY=0` as per `/lib/udev/rules.d/99-systemd.rules`, and udev also starts setting up an inotify watch as per `/lib/udev/rules.d/60-persistent-storage-dm.rules`.
This races with the `mkswap` call that is done next. If `mkswap` is called before the watch is set up by udev, udev misses the fact that the swap partition is now ready, and the device keeps being marked as `SYSTEMD_READY=0`.
`dev-mapper-crypt_swap.device` never becomes active and it blocks the boot, because the job timeout is set to 0 for the service by a systemd override (in case the swap partition is considered "fail").
Below are the log files of the interaction of `mkswap` and `systemd-udev` for the cases where it goes right and wrong.
Correct
Wrong
Considerations
While systemd often relies on proper notifications from other subsystems, such as LVM, in this case the setup is relatively bare: systemd itself calls the bare tool `mkswap` and relies on udev to mark the device as ready. We currently don't know how to solve this or get around the issue.
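To check whether a device is stuck in the state described in the Analysis, one can look at its udev properties. The sketch below assumes `udevadm info --query=property --name=<device>` is available and prints one `KEY=value` pair per line; the function names are illustrative and not part of any systemd API.

```python
import subprocess


def parse_properties(text):
    """Turn KEY=value lines into a dict, ignoring lines without '='."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def systemd_ready(props):
    """False only if udev explicitly set SYSTEMD_READY=0.

    An unset property is treated like 1, i.e. this property alone does not
    hold the device back.
    """
    return props.get("SYSTEMD_READY", "1") != "0"


def query_device(devname):
    """Ask udev for the properties of devname (needs a running udev)."""
    result = subprocess.run(
        ["udevadm", "info", "--query=property", f"--name={devname}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_properties(result.stdout)
```

On an affected boot, `systemd_ready(query_device("/dev/mapper/crypt_swap"))` would be expected to stay `False` after `mkswap` has finished, assuming the device name from this report.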