btrfs multiple devices confusion: automatically unmounted /home, clobbered ssh session #14674

Open
cmurf opened this issue Jan 27, 2020 · 12 comments

@cmurf
Contributor

@cmurf cmurf commented Jan 27, 2020

systemd version the issue has been seen with

systemd-244.1-2.fc32.x86_64

Used distribution

Fedora Rawhide

Expected behaviour you didn't see

/home should remain mounted, user login stays logged in

Unexpected behaviour you saw

/home is automatically unmounted, user is logged out

Steps to reproduce the problem

  1. Minimal installation, layout looks like this:
# lsblk -f
NAME   FSTYPE FSVER LABEL  UUID                                 FSAVAIL FSUSE% MOUNTPOINT                                                                     
vda                                                                            
├─vda1 vfat   FAT32        ACE0-7EB6                             590.3M     1% /boot/efi
├─vda2 ext4   1.0          cdb6c92a-a461-43e4-b6f9-57e865b32f0c    812M    10% /boot
├─vda3 swap   1            0d13000a-c258-41f9-a964-8960321f59fa                [SWAP]
└─vda4 btrfs        fedora b2e7ba8f-70cb-4286-89d0-21b8a1f9af0a   26.2G     3% /home
# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda4        28G  804M   27G   3% /
/dev/vda4        28G  804M   27G   3% /home
/dev/vda2       976M   97M  813M  11% /boot
/dev/vda1       599M  8.5M  591M   2% /boot/efi
  2. Live-migrate to a new configuration. The target could be a replacement drive or, as in this case, a ramdisk, so that the original device can be repartitioned without rebooting:
# umount /boot/efi
# umount /boot
# swapoff /dev/vda3
# modprobe zram
# zramctl -f -a lz4 -s 1536M
# btrfs dev add /dev/zram0 /
# btrfs dev rem /dev/vda4 /

At this point everything is working OK, no complaints.

  3. But the instant I run fdisk, systemd kicks me out of the ssh user session and unmounts /home, preventing login. This is what's recorded in the journal at the moment I run fdisk; the attached journal shows what happens after that.
[  830.014371] localhost.localdomain kernel:  vda: vda1 vda2 vda3 vda4

I'm not sure what the logic/trigger is for the user session dying and /home being unmounted. It's almost like reboot/shutdown behavior. I'm not sure it's a bug, but I'm sure it's not expected. Once I'm kicked out, I can't log back in because /home is unmounted. So now I need direct access to the VM or server.

bug14674_journalhomekill.txt

@cmurf
Contributor Author

@cmurf cmurf commented Jan 27, 2020

Looks nearly the same as #14454. I'll answer the relevant questions from that issue with info for this setup.

  • No LVM, no dm-crypt, only plain partitions.
  • Underlying device /dev/vda never disappeared; and none of its partitions were yet changed (but that was where I was headed.) And in this case, fs UUID doesn't change either.
  • /etc/fstab references /home and / by fs UUID and subvolume name, mount options are:
    noatime,compress=zstd:1,subvol=home
    noatime,compress=zstd:1,subvol=root

Attaching /run/systemd/generator/home.mount
bug14674_home.mount.txt
Attaching a journal with systemd.log_level=debug enabled and reproducing problem. fdisk is run at [ 233.547861]
bug14674_journalhomekilldebug.txt

@arvidjaar
Contributor

@arvidjaar arvidjaar commented Jan 27, 2020

I can trivially reproduce it on openSUSE Tumbleweed with kernel 5.4.10 and systemd 244. When you call fdisk, kernel sends remove/add events for partitions:

KERNEL[271.860289] remove   /devices/pci0000:00/0000:00:0d.0/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda1 (block)
KERNEL[271.861345] remove   /devices/pci0000:00/0000:00:0d.0/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda2 (block)
KERNEL[271.862369] change   /devices/pci0000:00/0000:00:0d.0/ata1/host0/target0:0:0/0:0:0:0/block/sda (block)
KERNEL[271.862558] add      /devices/pci0000:00/0000:00:0d.0/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda1 (block)
KERNEL[271.862729] add      /devices/pci0000:00/0000:00:0d.0/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda2 (block)

Because the filesystems in the mount units are bound to their respective devices, they get unmounted.

This demonstrates a fundamental design problem in systemd: a filesystem instance is associated with a single device, and this association is static, generated once at boot, and never changes. This assumption does not work for multi-device filesystems like btrfs or zfs. There, the underlying devices can be changed online at any time or completely replaced, and the original devices repurposed. systemd is not prepared to deal with that.
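
To make the sequence concrete, here is a minimal Python sketch that parses the KERNEL event lines shown above and reports which mount units would be stopped if they were bound to a device that received a remove event. The `BOUND` mapping and the helper names are hypothetical. The sketch only models the behaviour described in this comment and is not systemd's actual logic.

```python
import re

# Matches `udevadm monitor` kernel lines such as:
#   KERNEL[271.860289] remove   /devices/.../block/sda/sda1 (block)
EVENT_RE = re.compile(
    r"^KERNEL\[[\d.]+\]\s+(?P<action>\w+)\s+(?P<devpath>\S+)\s+\((?P<subsystem>\w+)\)$"
)

def parse_events(text):
    """Return a list of (action, kernel device name) tuples, in order."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:
            # The kernel device name is the last component of the devpath.
            events.append((m["action"], m["devpath"].rsplit("/", 1)[-1]))
    return events

def units_hit_by_remove(events, bound_devices):
    """Return the mount units bound to any device that saw a 'remove' event.

    In this simplified model a later 'add' for the same device does not
    bring the mount back, which matches what was observed in this issue.
    """
    removed = {dev for action, dev in events if action == "remove"}
    return sorted(unit for unit, devs in bound_devices.items() if devs & removed)

SAMPLE = """\
KERNEL[271.860289] remove   /devices/pci0000:00/0000:00:0d.0/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda1 (block)
KERNEL[271.861345] remove   /devices/pci0000:00/0000:00:0d.0/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda2 (block)
KERNEL[271.862369] change   /devices/pci0000:00/0000:00:0d.0/ata1/host0/target0:0:0/0:0:0:0/block/sda (block)
KERNEL[271.862558] add      /devices/pci0000:00/0000:00:0d.0/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda1 (block)
KERNEL[271.862729] add      /devices/pci0000:00/0000:00:0d.0/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda2 (block)
"""

# Hypothetical bindings, for illustration only: mount unit -> bound devices.
BOUND = {"boot.mount": {"sda1"}, "home.mount": {"sda2"}, "data.mount": {"sdb1"}}

print(units_hit_by_remove(parse_events(SAMPLE), BOUND))
```

The point of the model is that the remove event alone is enough to trigger the stop, even though the device reappears a few milliseconds later.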

@arvidjaar
Contributor

@arvidjaar arvidjaar commented Jan 27, 2020

> When you call fdisk, kernel sends remove/add events for partitions

which is most likely the result of ioctl(..., BLKRRPART, ...)... yep, it calls bdev_reread_part(), which does blk_drop_partitions() as the first thing.
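
For reference, a minimal Python sketch of issuing that ioctl directly. `BLKRRPART` is defined in `<linux/fs.h>` as `_IO(0x12, 95)`; the encoding below assumes x86 or another architecture where `_IO` requests carry no direction bits. The `reread_partition_table` helper is hypothetical, needs root, and should only be pointed at a disk you are prepared to have re-scanned.

```python
import fcntl
import os

# _IO(type, nr) with no direction and no size encodes as (type << 8) | nr
# on x86 and most other architectures (assumption: not true everywhere).
BLKRRPART = (0x12 << 8) | 95

def reread_partition_table(disk_path):
    """Ask the kernel to re-read the partition table of a whole-disk device.

    Hypothetical helper. Expect the kernel to emit remove/add uevents for
    the partitions, as described in this thread.
    """
    fd = os.open(disk_path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKRRPART)
    finally:
        os.close(fd)

print(hex(BLKRRPART))
```

Calling `reread_partition_table("/dev/sda")` on a test machine while `udevadm monitor` is running should show the same kind of event sequence quoted above.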

@poettering
Member

@poettering poettering commented Jan 28, 2020

We don't generate BindsTo= from .mount to .device anymore these days. Except that for you it was apparently created? Can you check the comments on #14454 regarding that, i.e. we need to figure out where BindsTo= comes from?

@cmurf
Contributor Author

@cmurf cmurf commented Jan 28, 2020

systemctl show home.mount contains:

BindsTo=dev-disk-by\x2duuid-b2e7ba8f\x2d70cb\x2d4286\x2d89d0\x2d21b8a1f9af0a.device dev-vda4.device

From the systemd-analyze dump -> Unit home.mount: section

	BindsTo: dev-disk-by\x2duuid-b2e7ba8f\x2d70cb\x2d4286\x2d89d0\x2d21b8a1f9af0a.device (origin-file)
	BindsTo: dev-vda4.device (origin-mountinfo-implicit)

issue14674_systemdanalyzedump.txt
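
To read those entries more easily, here is a small Python sketch that parses the two `BindsTo:` lines from the dump and converts the escaped `.device` unit names back into device paths. It handles only the simple escaping seen here (`-` as a path separator, `\xNN` for a literal byte) and is meant as an approximation of `systemd-escape --unescape --path`, not a full reimplementation. The function names are hypothetical.

```python
import re

# Lines look like: "<whitespace>BindsTo: <unit> (<origin>)"
BINDS_RE = re.compile(r"^\s*BindsTo: (?P<unit>\S+) \((?P<origin>[\w-]+)\)$")

def unescape_device_unit(unit):
    """Turn an escaped .device unit name back into the path it stands for."""
    name = unit.rsplit(".", 1)[0]      # drop the ".device" suffix
    name = name.replace("-", "/")      # an unescaped '-' separates path components
    # Backslash-x followed by two hex digits encodes one literal character.
    name = re.sub(r"\\x([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), name)
    return "/" + name

def binds_to_origins(dump_section):
    """Map each bound device path to the origin recorded in the dump."""
    result = {}
    for line in dump_section.splitlines():
        m = BINDS_RE.match(line)
        if m:
            result[unescape_device_unit(m["unit"])] = m["origin"]
    return result

DUMP = r"""
    BindsTo: dev-disk-by\x2duuid-b2e7ba8f\x2d70cb\x2d4286\x2d89d0\x2d21b8a1f9af0a.device (origin-file)
    BindsTo: dev-vda4.device (origin-mountinfo-implicit)
"""

for path, origin in binds_to_origins(DUMP).items():
    print(path, origin)
```

Read this way, home.mount is bound both to the by-uuid symlink named in the unit file and to /dev/vda4 as seen in mountinfo, so a remove event for either device affects it.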

@cmurf
Contributor Author

@cmurf cmurf commented Jan 28, 2020

Possible goose chase...

Booting with rd.udev.log_priority=debug, I see:

[    3.242424] localhost.localdomain kernel: BTRFS: device label fedora devid 1 transid 1074 /dev/vda4 scanned by systemd-udevd (339)
[    3.239701] localhost.localdomain systemd-udevd[339]: vda4: /usr/lib/udev/rules.d/64-btrfs.rules:15 RUN '/usr/bin/udevadm trigger -s block -p ID_BTRFS_READY=0'
[    3.239766] localhost.localdomain systemd-udevd[339]: vda4: Handling device node '/dev/vda4', devnum=b252:4
[    3.239806] localhost.localdomain systemd-udevd[339]: vda4: Setting permissions /dev/vda4, uid=0, gid=6, mode=0660
[    3.239851] localhost.localdomain systemd-udevd[339]: vda4: Creating symlink '/dev/block/252:4' to '../vda4'
[    3.239921] localhost.localdomain systemd-udevd[339]: vda4: Creating symlink '/dev/disk/by-path/pci-0000:04:00.0-part4' to '../../vda4'
[    3.240117] localhost.localdomain systemd-udevd[339]: vda4: Creating symlink '/dev/disk/by-label/fedora' to '../../vda4'
[    3.240281] localhost.localdomain systemd-udevd[339]: vda4: Creating symlink '/dev/disk/by-uuid/b2e7ba8f-70cb-4286-89d0-21b8a1f9af0a' to '../../vda4'
[    3.240489] localhost.localdomain systemd-udevd[339]: vda4: Creating symlink '/dev/disk/by-partuuid/42f28b9c-118d-411d-b50c-e3a5c943afc0' to '../../vda4'
[    3.240697] localhost.localdomain systemd-udevd[339]: vda4: Creating symlink '/dev/disk/by-path/virtio-pci-0000:04:00.0-part4' to '../../vda4'
[    3.240967] localhost.localdomain systemd-udevd[339]: vda4: sd-device: Created db file '/run/udev/data/b252:4' for '/devices/pci0000:00/0000:00:02.3/0000:04:00.0/virtio2/block/vda/vda4'
[    3.241012] localhost.localdomain systemd-udevd[339]: vda4: Running command "/usr/bin/udevadm trigger -s block -p ID_BTRFS_READY=0"
[    3.241049] localhost.localdomain systemd-udevd[339]: vda4: Starting '/usr/bin/udevadm trigger -s block -p ID_BTRFS_READY=0'
[    3.242929] localhost.localdomain systemd-udevd[339]: Successfully forked off '(spawn)' as PID 437.
[    3.258714] localhost.localdomain systemd-udevd[339]: vda4: Process '/usr/bin/udevadm trigger -s block -p ID_BTRFS_READY=0' succeeded.
[    3.258963] localhost.localdomain systemd-udevd[339]: vda4: Adding watch on '/dev/vda4'
[    3.259170] localhost.localdomain systemd-udevd[339]: vda4: sd-device: Created db file '/run/udev/data/b252:4' for '/devices/pci0000:00/0000:00:02.3/0000:04:00.0/virtio2/block/vda/vda4'
[    3.259214] localhost.localdomain systemd-udevd[339]: vda4: Device (SEQNUM=1831, ACTION=add) processed
[    3.259268] localhost.localdomain systemd-udevd[339]: vda4: sd-device-monitor: Passed 1178 byte to netlink monitor
[    3.259445] localhost.localdomain systemd-udevd[316]: virtio2: sd-device-monitor: Passed 221 byte to netlink monitor
[    3.260184] localhost.localdomain systemd-udevd[326]: virtio2: Processing device (SEQNUM=1832, ACTION=bind)
[    3.260240] localhost.localdomain systemd-udevd[326]: virtio2: /usr/lib/udev/rules.d/50-udev-default.rules:14 Importing properties from results of builtin command 'hwdb --subsystem=virtio'
[    3.260278] localhost.localdomain systemd-udevd[326]: virtio2: /usr/lib/udev/rules.d/50-udev-default.rules:14 Failed to run builtin 'hwdb --subsystem=virtio': Invalid argument
[    3.260326] localhost.localdomain systemd-udevd[326]: virtio2: Device (SEQNUM=1832, ACTION=bind) processed
[    3.260364] localhost.localdomain systemd-udevd[326]: virtio2: sd-device-monitor: Passed 221 byte to netlink monitor
@arvidjaar
Contributor

@arvidjaar arvidjaar commented Jan 28, 2020

> We don't generate BindsTo= from .mount to .device anymore these days.

Sorry?

dep = mount_is_bound_to_device(m) ? UNIT_BINDS_TO : UNIT_REQUIRES;

@poettering
Member

@poettering poettering commented Jan 28, 2020

hmm, true we actually do, for those configured in /etc/fstab, but not for the others...

So fdisk these days is actually capable of not removing all partitions in the kernel, but operate incrementally, so that the devices never disappear.

@cmurf
Contributor Author

@cmurf cmurf commented Jan 28, 2020

The next steps in the use case, whether with fdisk, gdisk (or variants), or parted, are to resize the partitions and write out the new GPT (or MBR). Presumably the kernel refreshes this device's partition map, since nothing is actively pinning it. But if it doesn't refresh, the user will run either partprobe or hdparm -z to force the refresh. Next, migrate sysroot back to the new layout.

The analog to this on LVM is to use pvmove.

@arvidjaar
Contributor

@arvidjaar arvidjaar commented Jan 29, 2020

> So fdisk these days is actually capable of not removing all partitions in the kernel, but operate incrementally, so that the devices never disappear.

Huh? And if user removes partitions intentionally?

@arvidjaar
Contributor

@arvidjaar arvidjaar commented Jan 31, 2020

> hmm, true we actually do, for those configured in /etc/fstab, but not for the others

Wrong, you do add BindsTo to all mount units generated from a persistent unit file (fragment). It is irrelevant whether this unit file was generated from /etc/fstab or not. You do not add it to ephemeral units created from /proc/mountinfo. You maybe do not add it to mount units generated by systemd-mount.
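
The rule as stated in this comment can be sketched in a few lines of Python. This is a simplified model built on the single line of C quoted earlier in the thread and on the description above; the `MountUnit` class and its `from_fragment` field are hypothetical stand-ins, and the real `mount_is_bound_to_device()` may take more into account.

```python
from dataclasses import dataclass

@dataclass
class MountUnit:
    name: str
    # True if the unit is backed by a unit file (fragment), for example one
    # generated from /etc/fstab; False if it exists only because the mount
    # showed up in /proc/self/mountinfo.
    from_fragment: bool

def device_dependency(unit):
    """Mirror of: dep = bound_to_device ? UNIT_BINDS_TO : UNIT_REQUIRES."""
    return "BindsTo" if unit.from_fragment else "Requires"

for unit in (MountUnit("home.mount", True), MountUnit("mnt-scratch.mount", False)):
    print(unit.name, device_dependency(unit))
```

Under this model, any mount with a unit file is stopped when its device goes away, while a mount that systemd only learned about from mountinfo is left alone.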
