
Automatically GC queued device unit jobs that nothing is waiting for #1921

Closed
lynix opened this issue Nov 16, 2015 · 37 comments
Labels
pid1 RFE 🎁 Request for Enhancement, i.e. a feature request

Comments

@lynix

lynix commented Nov 16, 2015

I've got two LUKS volumes defined via /etc/crypttab as follows:

# <name>       <device>                                   <password>         <options>
storage1plain  UUID=4963abc0-eac7-4fb9-8133-7e47974f5812  /etc/storage1.key  luks
storage2plain  UUID=f98c6a10-c8dc-49a2-8702-ed06eb14ddcf  /etc/storage2.key  luks

These two volumes are mounted as a BTRFS raid1 via fstab:

UUID=19db72c8-4058-402c-b1b6-49af3feff486  /mnt/storage  btrfs  noatime  0 0

On bootup the LUKS partitions are unlocked successfully, and the BTRFS volume is mounted in place.

However, systemd somehow fails to catch the second volume (storage2plain) appearing, as the corresponding .device unit is queued forever:

[lynix@thor ~]$ systemctl list-jobs
JOB UNIT                            TYPE  STATE  
 21 dev-mapper-storage2plain.device start running

[lynix@thor ~]$ systemctl status
● thor
    State: starting
     Jobs: 1 queued
   Failed: 0 units
    Since: Mo 2015-11-16 17:57:54 CET; 1h 48min ago

Since I'm still able to use the volume I normally wouldn't bother, but because systemd never considers the system finished booting, I'm not able to use systemd-analyze and the like.

I'm using systemd-227 on Kernel 4.2.5 (Arch Linux).

@ohsix

ohsix commented Nov 17, 2015

this seems to come up a lot with people using Arch; they use their own initramfs architecture. Have you checked their bug tracker or considered filing there?

(I don't use Arch, and the last few times I tried to walk people through the problem and what is involved, they just asserted it was a systemd problem and wouldn't cooperate, even when, once it was tracked down, the problem did turn out to be in systemd)

@gdamjan
Contributor

gdamjan commented Nov 17, 2015

@lynix can you supply more information about your setup to try reproduce it?
What's your partition setup, and how did you create the btrfs filesystem?

Are you unlocking the luks volumes on boot (in the initramfs), or after the root is mounted and systemd is running? If the former, how did you do that (since the Arch encrypt initcpio hook only supports a single cryptdevice)?

@lynix
Author

lynix commented Nov 17, 2015

@ohsix I have checked the bug tracker but found no match, and when I thought about creating an issue there I figured they would likely tell me to report upstream, as it is unlikely to be a packaging issue. I'm willing to cooperate in any way I can in order to track this down, though ;)

@gdamjan Sure, glad to provide the details:

I'm using two WD Caviar Green 1 TB disks (WD10EAVS), each with only one type 0x83 partition (DOS table). I have created the LUKS volumes using

$ cryptsetup -c aes-xts-plain64 -s 512 -d /etc/keyfile luksFormat /dev/...

and then issued

$ mkfs.btrfs -d raid1 -m raid1 /dev/mapper/storage1plain /dev/mapper/storage2plain

to create the btrfs filesystem.

[lynix@thor ~]$ lsblk /dev/sde /dev/sda
NAME              MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sde                 8:64   0 931,5G  0 disk  
└─sde1              8:65   0 931,5G  0 part  
  └─storage1plain 254:5    0 931,5G  0 crypt /mnt/storage
sda                 8:0    0 931,5G  0 disk  
└─sda1              8:1    0 931,5G  0 part  
  └─storage2plain 254:4    0 931,5G  0 crypt

Unlocking of these volumes, as you said, is not done in the initramfs but later, after the root filesystem has been mounted.

But I do unlock a LUKS-encrypted LVM volume group in initramfs, on which my root partition resides. So as far as I see there is an early systemd instance running in the initramfs.

This is the disk layout my system is installed on (Crucial M4 SSD, 128GB):

[lynix@thor ~]$ lsblk /dev/sdc
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdc                8:32   0 119,2G  0 disk  
├─sdc1             8:33   0   512M  0 part  /boot
└─sdc2             8:34   0 118,8G  0 part  
  └─m4vg0        254:0    0 118,8G  0 crypt 
    ├─m4vg0-swap 254:1    0   512M  0 lvm   [SWAP]
    ├─m4vg0-root 254:2    0    30G  0 lvm   /
    └─m4vg0-home 254:3    0    80G  0 lvm   /home

(m4vg0 is the LUKS-encrypted LVM VolGroup that is unlocked via initramfs-hook)

If there are details missing that might be of use, please ask!

@gdamjan
Contributor

gdamjan commented Nov 17, 2015

So as far as I see there is an early systemd instance running in the initramfs.

not by default on Arch

@ohsix

ohsix commented Nov 18, 2015

i totally misread the original report, sorry; this is a btrfs thing and i know jack about it

but i did see this: TODO:* btrfs raid assembly: some .device jobs stay stuck in the queue

@ohsix

ohsix commented Nov 18, 2015

played around, and here's some conjecture: the reason there are problems with .device units is that you can mount a btrfs volume by specifying any disk in the set it was created with, so the first mount 'succeeds' at /mnt/storage and the leftover device jobs just hang around

systemd could probably learn the devices= fs option to relate them literally, or do something smarter
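
For context, the existing btrfs mount option along these lines is (as far as I know) spelled device= and can be given once per member device. A sketch using the mapper names from the original report, telling the kernel about both members explicitly instead of relying on a prior udev-triggered "btrfs device scan":

$ mount -t btrfs \
    -o device=/dev/mapper/storage1plain,device=/dev/mapper/storage2plain \
    /dev/mapper/storage1plain /mnt/storage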

the TODO entry is from 2013, maybe @keszybz can offer background?

@lynix
Author

lynix commented Nov 18, 2015

So as far as I see there is an early systemd instance running in the initramfs.

not by default on Arch

When booting the initramfs I see a line saying "starting version 227" above the prompt for the LUKS passphrase for the root volume (group), which seemed to me like a reference to the systemd version. But you are right: I have extracted the initramfs image now and found no systemd binary (only usr/lib/systemd/systemd-udevd in there).

@ohsix Thanks for digging this out, so at least my issue seems to be a known problem.

@lynix
Author

lynix commented Nov 20, 2015

I've tried removing the entry for my multi-volume btrfs filesystem from fstab, so that it is not mounted on boot. The strange thing is that the .device unit for the second LUKS volume still remains hanging, and system startup never finishes.

To my understanding, the issue cannot be related to btrfs then, can it?

@arvidjaar
Contributor

@lynix Could you make available "udevadm info -q all" for both your LUKS devices when one of them is "stuck"? I think I know what happens, but would like to verify.

@lynix
Author

lynix commented Nov 21, 2015

@arvidjaar Sure, here they are:

storage1 (encrypted block device): https://gist.github.com/lynix/64ed188794d4454484b8
storage1plain (LUKS opened): https://gist.github.com/lynix/f555492000a92d05da37

storage2 (encrypted block device): https://gist.github.com/lynix/dd6d4396a8f0530ae141
storage2plain (LUKS opened): https://gist.github.com/lynix/64ed188794d4454484b8

The one that's stuck is storage2plain. I'm curious, what do you suspect?

@arvidjaar
Contributor

The one that's stuck is storage2plain.

Are you sure? I expect it is storage1plain because it has SYSTEMD_READY=0.

@lynix
Author

lynix commented Nov 21, 2015

Are you sure? I expect it is storage1plain because it has SYSTEMD_READY=0.

I'm sorry, you are right, it is indeed storage1plain. This is the first time it's been this one; it used to be the other one every time I checked before :)

@arvidjaar
Contributor

you are right, it is indeed storage1plain

This matches my hypothesis. It is related to the trick systemd plays with multi-device btrfs.

Systemd has no way to express a dependency of a filesystem on multiple devices. So what it does is mark each device that is part of a btrfs filesystem as "not ready" in a udev rule until "enough" devices are found. At that point the last device appears as active, which in turn causes the /dev/disk/by-uuid alias to appear as active, so systemd proceeds with mounting btrfs. But all devices found previously are left with SYSTEMD_READY=0, so they are "dead" from systemd's point of view.
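
The marking described above is done by systemd's 64-btrfs.rules; from memory (so possibly differing in detail from the exact file shipped with 227), the rule looks roughly like this:

SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION=="remove", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"

# ask the kernel whether every member device of this filesystem has been seen yet
IMPORT{builtin}="btrfs ready $devnode"

# if not, hide the device from systemd until the last member shows up
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

LABEL="btrfs_end"

Only the last member to appear escapes the SYSTEMD_READY=0 marking, which is why exactly one of the two mapper devices ends up with a stuck start job.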

@lynix
Author

lynix commented Nov 24, 2015

so systemd proceeds with mounting btrfs. But all devices found previously are left with SYSTEMD_READY=0

Okay, but why does this happen even without an fstab entry being present, i.e. with none of the devices getting mounted?

My current workaround is to have two .service units set up so that they depend on the two block devices and unlock the LUKS volumes manually, plus a third .service unit that manually mounts the BTRFS volume after the two unlock units are done.

This somewhat works, but it's crappy. I'd prefer a clean solution to this :)
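
For illustration, a minimal sketch of what such a third unit could look like; unlock-storage1.service and unlock-storage2.service are placeholder names for the two units that run cryptsetup against the underlying block devices:

# /etc/systemd/system/mnt-storage.service (sketch)
[Unit]
Description=Mount multi-device btrfs volume once both LUKS mappings exist
Requires=unlock-storage1.service unlock-storage2.service
After=unlock-storage1.service unlock-storage2.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/mount -t btrfs /dev/mapper/storage1plain /mnt/storage
ExecStop=/usr/bin/umount /mnt/storage

[Install]
WantedBy=multi-user.target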

So how can I help? Shall I take a dive into the systemd code and try to implement the devices= option in fstab as @ohsix mentioned? I'm okay with C, but I haven't had a look at the systemd sources yet, so a starting point would be nice.

@spitzauer

I have the same problem with Arch Linux and RAID1 BTRFS. But in my case, sometimes I can mount the FS on boot.
Here is the journalctl output when the error occurs:

kernel: Btrfs loaded
kernel: BTRFS: device label pool01 devid 2 transid 43399 /dev/sdb
systemd[1]: Found device QEMU_HARDDISK 1.
kernel: BTRFS: device fsid d3c7c121-136c-4c19-b99b-e03e381938eb devid 1 transid 53 /dev/sdc1
kernel: BTRFS: device label pool01 devid 1 transid 43399 /dev/sda
kernel: BTRFS: device fsid 785fff78-98b6-43b5-aec2-60ec0c60edc6 devid 1 transid 97 /dev/sda1
kernel: BTRFS: device fsid 785fff78-98b6-43b5-aec2-60ec0c60edc6 devid 2 transid 97 /dev/sdb1
systemd[1]: dev-sda1.device: Job dev-sda1.device/start timed out.
systemd[1]: Timed out waiting for device dev-sda1.device.
systemd[1]: Dependency failed for /mnt/nas.
systemd[1]: Dependency failed for Local File Systems.
systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
systemd[1]: mnt-nas.mount: Job mnt-nas.mount/start failed with result 'dependency'.
systemd[1]: dev-sda1.device: Job dev-sda1.device/start failed with result 'timeout'.

Here is the journalctl output when it works:

kernel: Btrfs loaded
kernel: BTRFS: device label pool01 devid 2 transid 43399 /dev/sdb
systemd[1]: Found device QEMU_HARDDISK 1.
kernel: BTRFS: device fsid d3c7c121-136c-4c19-b99b-e03e381938eb devid 1 transid 53 /dev/sdc1
kernel: BTRFS: device label pool01 devid 1 transid 43399 /dev/sda
systemd[1]: Found device QEMU_HARDDISK pool01.
kernel: BTRFS: device fsid 785fff78-98b6-43b5-aec2-60ec0c60edc6 devid 2 transid 97 /dev/sdb1
kernel: BTRFS: device fsid 785fff78-98b6-43b5-aec2-60ec0c60edc6 devid 1 transid 97 /dev/sda1
systemd[1]: Mounting /mnt/nas...
kernel: BTRFS info (device sda1): enabling auto defrag
kernel: BTRFS info (device sda1): disk space caching is enabled
kernel: BTRFS: has skinny extents
systemd[1]: Mounted /mnt/nas.

@poettering
Member

I figure we should add some code that GCs jobs that aren't needed anymore (i.e. no other jobs are around that are ordered against them) and that have no effect on their own (jobs for device units are of this type). In fact there has been a TODO list item to add this for a while; maybe we should actually do it.

@poettering poettering added the RFE 🎁 (Request for Enhancement, i.e. a feature request) and pid1 labels on Jan 12, 2016
@poettering poettering changed the title from "systemd fails to detect unlocked LUKS blockdevice" to "Automatically GC queued device unit jobs that nothing is waiting for" on Jan 12, 2016
@alexforencich

Apparently the problem is more than just GC... systemd is totally incapable of mounting my btrfs raid1 array, though manual mounting works fine: https://bbs.archlinux.org/viewtopic.php?id=216202 . This looks like it might be a regression of some sort, as systemd was able to mount the array properly a few weeks ago. The added udev rule is an effective workaround.

@arvidjaar
Contributor

@spitzauer

sometimes I can mount the FS on boot

I guess you have plain /dev/sda1 in your fstab, in which case it is non-deterministic which device will be left with SYSTEMD_READY=0. Sometimes it is sda1, sometimes it is sdb1.

@poettering

I figure we should add some code that GCs jobs that aren't needed anymore

Well, we need to come up with a clean solution for multi-device filesystems, not add yet more crutches to work around it: for example, a program that listens to udev events and waits until the btrfs filesystem is ready to be mounted.
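
As a stopgap, the same "wait until ready" idea can be sketched with existing btrfs tooling; this is a hypothetical helper script, not anything that ships with systemd or btrfs-progs:

#!/bin/sh
# wait-and-mount.sh <member-device> <mountpoint>  (hypothetical helper)
# "btrfs device ready" exits 0 once the kernel has seen every member
# device of the filesystem that <member-device> belongs to.
dev="$1"; mnt="$2"
until btrfs device ready "$dev"; do
    sleep 1
done
# relies on a matching (noauto) entry for the mountpoint in /etc/fstab
mount "$mnt"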

@alexforencich

systemd is totally incapable of mounting my btrfs raid1 array

There is not enough information in the thread you mention. If you could boot with systemd.log_level=debug and make the journalctl -b output for this problem available ... BTW, the rule you mention there is wrong too - it will cause systemd to attempt to mount the filesystem as soon as the very first device appears. So it is a race; any change in system startup may result in losing it.

@alexforencich

I'll give that a shot, then. I'm going to leave it with 'nofail' in /etc/fstab so that it won't halt the boot process if mounting fails. I actually didn't check the logs after adding the udev rule, but the array was mounted when I logged in.

@alexforencich

OK, adding systemd.log_level=debug prevented the system from booting. It printed out a LOT of stuff, but then stopped at a rootfs# prompt. And nothing was written to the journal.

Also, even after commenting out the udev rule and running mkinitcpio, systemd is now mounting the array properly.

@seadra

seadra commented Aug 25, 2016

@alexforencich, can you please clarify how exactly you got it working without the udev rule? Just by adding nofail in the fstab?

I have a similar setup (LUKS + raid0 multi-device fs), and I'm having similar problems.

@alexforencich

alexforencich commented Aug 25, 2016

Absolutely nothing. It didn't boot, so I added nofail to get it to boot even if it couldn't mount the partition. With nofail, it still didn't mount, but it did boot, so I could then log in manually and mount it. I posted on the Arch forum about the problem as it was 100% repeatable and I had not found a solution. The udev rule was posted, so I added it, ran mkinitcpio and rebooted. The partition was mounted by systemd on boot. Then I posted here and was told the udev rule was wrong. So I commented it out, ran mkinitcpio, and rebooted. And the partition was still mounted by systemd automatically at boot.

I have no idea what changed - it wasn't anything that I did explicitly. The only thing I can think of is that it might have been a side effect of running mkinitcpio. Or perhaps there was some sort of chicken-and-egg issue - a failed systemd mount somehow prevents a subsequent systemd mount, and forcing it to mount even in a hackish way cleared that. Or perhaps this was a 'did you turn it off and back on again?' issue, as after booting with systemd.log_level=debug I got stuck at a rootfs# prompt and had to do a hard reset via IPMI to get out of that, so this may have cleared some transient problem.

I can't recreate the issue to experiment further - I attempted to recreate it by undoing the only change that I had made, and that failed. So I'm at a loss for what to do now. I have had intermittent issues in the past with the array mount timing out, so I would definitely like to get to the bottom of this, but without a way to trigger the issue, there isn't much I can do.

@whompy

whompy commented Aug 25, 2016

I have a similar setup that fails intermittently. For me, there is always a mount service that never finishes as well. Quite annoying as the boot technically never finishes, so that has its own caveats.


@kylemanna

I continue to experience the same hanging btrfs RAID jobs that fail for some people (mine don't fail, they just linger). It'd be nice if systemd understood multi-device file systems and handled them correctly rather than GC'ing the old jobs.

Currently I cancel the jobs on every boot.

@poettering
Member

There's a fix for this waiting in #4678

@kylemanna

@poettering thanks for the work! I'll keep an eye on it as it makes its way into Arch testing!

@martinpitt
Contributor

This very likely introduced random udev failures which cause issue #4725.

@jcaesar

jcaesar commented Mar 31, 2017

Hm, so I still have this issue. I'm using the workaround of setting SYSTEMD_READY=1 via udev and manually adding a dependency on the second device via x-systemd.requires in /etc/fstab. That works just fine for me.
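
For anyone wanting to try the same combination, a rough sketch (the UUID and device names are placeholders, and note the earlier caveat that forcing SYSTEMD_READY=1 can race with btrfs device scanning):

# /etc/udev/rules.d/99-btrfs-ready.rules (sketch)
SUBSYSTEM=="block", ENV{ID_FS_TYPE}=="btrfs", ENV{SYSTEMD_READY}="1"

# /etc/fstab (sketch): x-systemd.requires= makes the mount wait explicitly
# for the second member device
UUID=<fs-uuid>  /mnt/storage  btrfs  noatime,x-systemd.requires=/dev/mapper/storage2plain  0 0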

@OlliC

OlliC commented May 31, 2017

Also still having this issue with a three disk btrfs raid1 array. There is a 60/40 chance it boots fine (more often than not). This is with systemd 232 on Arch Linux.

This is my /etc/fstab for the array:
UUID=89742fc9-9376-45fd-96fa-a80a0550da2a /mnt/bmain btrfs defaults,noatime 0 0

The UUID is the same for all three disks, but they have different UUID_SUB.

This is what's in the journal:

Mai 31 11:54:48 viki systemd[1]: dev-disk-by\x2duuid-89742fc9\x2d9376\x2d45fd\x2d96fa\x2da80a0550da2a.device: Job dev-disk-by\x2duuid-89742fc9\x2d9376\x2d45fd\x2d96fa\x2da80a0550da2a.device/start timed out
Mai 31 11:54:48 viki systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-89742fc9\x2d9376\x2d45fd\x2d96fa\x2da80a0550da2a.device.
-- Subject: Unit dev-disk-by\x2duuid-89742fc9\x2d9376\x2d45fd\x2d96fa\x2da80a0550da2a.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit dev-disk-by\x2duuid-89742fc9\x2d9376\x2d45fd\x2d96fa\x2da80a0550da2a.device has failed.
-- 
-- The result is timeout.

@arvidjaar
Contributor

Also still having this issue with a three disk btrfs raid1 array.

Could be #5781.

@lynix
Author

lynix commented Jun 11, 2017

Well, I've just noticed that I'm also still having the issue that is the subject of this bug report. c5a97ed seems not to have solved this. I'll double-check that my systemd version includes it.

Edit: Never mind: I'm on 232 (the current Arch package), and the commit in question is only included as of 233. Sorry for the noise.

@OlliC

OlliC commented Jun 11, 2017

Could be #5781.

Yes, probably. I have now added a udev rule which I found here:
https://bbs.archlinux.org/viewtopic.php?pid=1649105#p1649105
Now it works fine for me.

Edit: Never mind: I'm on 232 (the current Arch package), and the commit in question is only included as of 233. Sorry for the noise.

Good to know. When 233 gets into Arch I will try again without the udev rule.

@arvidjaar
Contributor

https://bbs.archlinux.org/viewtopic.php?pid=1649105#p1649105

This could result in systemd attempting to access the filesystem too early, before all devices have been scanned by btrfs.

@OlliC

OlliC commented Jun 11, 2017

This could result in systemd attempting to access the filesystem too early, before all devices have been scanned by btrfs.

Is this a problem? What could happen? If it's bad then I would just mount it manually until version 233 hits Arch.

@arvidjaar
Contributor

Is this a problem? What could happen?

If systemd attempts to mount the filesystem before all devices are scanned, the mount fails.

@petr-nehez

Is there anybody who could help me with https://ubuntuforums.org/showthread.php?t=2364736 ?

@lynix
Author

lynix commented Jul 12, 2017

Good to know. When 233 gets into Arch I will try again without the udev rule.

With 233 I get the following:

systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
systemd[1]: systemd-cryptsetup@storage2plain.service: Failed to enqueue stop job, ignoring: Transaction is destructive.

No infinitely queued units so far; GC seems to be working.
