systemd forgets mounts during transition from initrd #28452

Closed
mrc0mmand opened this issue Jul 19, 2023 · 8 comments · Fixed by #28497
Labels
pid1 regression ⚠️ A bug in something that used to work correctly and broke through some recent commit

Comments

@mrc0mmand
Member

mrc0mmand commented Jul 19, 2023

systemd version the issue has been seen with

254-rc2

Used distribution

Fedora Rawhide

Linux kernel version used

No response

CPU architectures issue was seen on

None

Component

No response

Expected behaviour you didn't see

We've got a couple of reports [0][1] regarding missing mounts. After some playing around with Fedora-Server-dvd-x86_64-Rawhide-20230718.n.0.iso, it looks like systemd forgets mounts created in the initrd when transitioning to the "real" system.

Reproducer:

$ qemu-kvm -boot d -cdrom ~/Downloads/Fedora-Server-dvd-x86_64-Rawhide-20230718.n.0.iso -m 2G -nographic

## Tweak the kernel command line in grub and add: rd.break console=ttyS0
sh-5.2# mkdir /run/{foo,bar}
sh-5.2# mount --bind -v /run/foo /run/bar
mount: /run/foo bound on /run/bar.
sh-5.2# mount -l | grep /run
tmpfs on /run type tmpfs (rw,nosuid,nodev,size=400116k,nr_inodes=819200,mode=755,inode64)
/dev/sr0 on /run/install/repo type iso9660 (ro,relatime,nojoliet,check=s,map=n,blocksize=2048,iocharset=utf8) [Fedora-S-dvd-x86_64-rawh]
/dev/loop0 on /run/rootfsbase type squashfs (ro,relatime,errors=continue)
LiveOS_rootfs on /sysroot type overlay (rw,relatime,lowerdir=/run/rootfsbase,upperdir=/run/overlayfs,workdir=/run/ovlwork)
tmpfs on /run/bar type tmpfs (rw,nosuid,nodev,size=400116k,nr_inodes=819200,mode=755,inode64)
sh-5.2# exit

## Switch to the second tmux pane with ^B-2
# mount -l | grep /run
LiveOS_rootfs on / type overlay (rw,relatime,seclabel,lowerdir=/run/rootfsbase,upperdir=/run/overlayfs,workdir=/run/ovlwork)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,size=400112k,nr_inodes=819200,mode=755,inode64)

The same thing with Fedora-Server-dvd-x86_64-38-1.6.iso works as expected (i.e. the mounts are present after the initrd transition).
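
The before/after comparison in the reproducer can be made less error-prone by saving both `mount -l` listings and comparing the mount targets. A minimal sketch, assuming the two listings were saved to files; the function name `lost_mounts` and the file arguments are hypothetical and not part of systemd or util-linux:

```shell
#!/bin/sh
# Print mount targets below a prefix that appear in the "before" listing
# but not in the "after" listing. Both files hold `mount -l` style lines:
#   <source> on <target> type <fstype> (<options>)
# Assumes sources and targets contain no whitespace.
lost_mounts() {
    before=$1 after=$2 prefix=${3:-/run}
    awk -v prefix="$prefix" '
        # First file (the "after" listing): remember every mount target.
        FILENAME == ARGV[1] { seen[$3] = 1; next }
        # Second file (the "before" listing): report submounts that vanished.
        index($3, prefix "/") == 1 && !($3 in seen) { print $3 }
    ' "$after" "$before"
}
```

Called as `lost_mounts before.txt after.txt /run`, with `before.txt` captured in the `rd.break` shell and `after.txt` captured on the booted system, it lists the mounts under `/run` that did not survive the transition.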

Unexpected behaviour you saw

No response

Steps to reproduce the problem

No response

Additional program output to the terminal or log subsystem illustrating the issue

No response

@mrc0mmand mrc0mmand added the regression ⚠️ A bug in something that used to work correctly and broke through some recent commit label Jul 19, 2023
@mrc0mmand mrc0mmand added this to the v254 milestone Jul 19, 2023
@mrc0mmand
Member Author

After looking at differences between journals from both versions, the main difference is in the /run shenanigans done by v254-rc2:

[   82.629738] systemd[1]: Bind-mounting /run on /sysroot/run (MS_BIND "")...
[   82.642270] systemd[1]: Path '/run/credentials/@system' to move to target ro>
[   82.642910] systemd[1]: Path '/run/credentials/@encrypted' to move to target>
[   82.649964] systemd[1]: Path '/run/host' to move to target root directory, n>
[   82.746188] systemd[1]: Not unmounting /sysroot/run, referenced by keep list.

@mrc0mmand
Member Author

Maybe 7c764d4 could be the culprit, since in the commit message it says:

Let's just use MS_BIND always. Let's tweak it though: let's use
MS_BIND|MS_REC for the kernel API VFS, and MS_BIND without MS_REC for
/run/. The latter reflects the fact that the submounts /run/ has usually
are not so much about just accessing kernel APIs but about auxiliary
user resources. Hence let's only move the main mount over for that.
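
To see which mounts a bind of `/run` without `MS_REC` would leave behind, it can help to list everything mounted below that directory. A minimal sketch, assuming the mountinfo format documented in proc(5), where the fifth field is the mount point; `submounts_under` is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# List mount points strictly below a directory. Reads mountinfo-format
# lines on stdin; field 5 of each line is the mount point (see proc(5)).
# Assumes the directory name contains no backslashes, since awk -v
# interprets escape sequences in the assigned value.
submounts_under() {
    dir=${1%/}   # strip a trailing slash, so "/" becomes ""
    awk -v dir="$dir" 'index($5, dir "/") == 1 && $5 != dir "/" { print $5 }'
}
```

Running `submounts_under /run </proc/self/mountinfo` in the initrd just before the switch shows the submounts a non-recursive bind mount does not replicate into the new root.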

@bluca
Member

bluca commented Jul 19, 2023

Do you have a reproducer that doesn't require DVDs and grub and whatnot?

@mrc0mmand
Member Author

Do you have a reproducer that doesn't require DVDs and grub and whatnot?

I was wondering how to incorporate that into our test suite (since we definitely want this covered), and thanks to @poettering's creds infra I came up with:

diff --git a/test/TEST-01-BASIC/test.sh b/test/TEST-01-BASIC/test.sh
index d0e714ac30..474a86be5c 100755
--- a/test/TEST-01-BASIC/test.sh
+++ b/test/TEST-01-BASIC/test.sh
@@ -23,4 +23,26 @@ test_append_files() {
     cp -v "$TEST_UNITS_DIR"/{testsuite-01,end}.service "$TEST_UNITS_DIR/testsuite.target" "$dst"
 }
 
+run_qemu_hook() {
+    local extra="$WORKDIR/initrd.extra"
+
+    mkdir -m 755 "$extra"
+    mkdir -m 755 "$extra/etc" "$extra/etc/systemd" "$extra/etc/systemd/system" "$extra/etc/systemd/system/initrd.target.wants"
+
+    cat >"$extra/etc/systemd/system/initrd-run-mount.service" <<EOF
+[Unit]
+Description=Create a mount in /run that should survive the transition from initrd
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+ExecStart=sh -xec "mkdir /run/initrd-mount-source /run/initrd-mount-target; mount -v --bind /run/initrd-mount-source /run/initrd-mount-target"
+EOF
+    ln -svrf "$extra/etc/systemd/system/initrd-run-mount.service" "$extra/etc/systemd/system/initrd.target.wants/initrd-run-mount.service"
+
+    (cd "$extra" && find . | cpio -o -H newc -R root:root > "$extra.cpio")
+
+    INITRD_EXTRA="$extra.cpio"
+}
+
 do_test "$@"
diff --git a/test/units/testsuite-01.sh b/test/units/testsuite-01.sh
index 780f37ee12..a1193ce6fb 100755
--- a/test/units/testsuite-01.sh
+++ b/test/units/testsuite-01.sh
@@ -19,6 +19,11 @@ if systemd-detect-virt -q --container; then
     test ! -e /run/systemd/container
     cp -afv /tmp/container /run/systemd/container
 else
+    # We should've created a mount under /run in initrd (see the other half of the test)
+    # that should've survived the transition from initrd to the real system
+    test -d /run/initrd-mount-target
+    mountpoint /run/initrd-mount-target
+
     # We bring the loopback netdev up only during a full setup, so it should
     # not get brought back up during reexec if we disable it beforehand
     [[ "$(ip -o link show lo)" =~ LOOPBACK,UP ]]

The only downside is that you need the initrd to already contain the latest systemd version (which is not an issue in CIs). With that in mind, TEST-01 now fails with:

[    2.895367] testsuite-01.sh[608]: + systemd-detect-virt -q --container
[    2.896305] testsuite-01.sh[608]: + test -d /run/initrd-mount-target
[    2.897167] testsuite-01.sh[608]: + mountpoint /run/initrd-mount-target
[    2.898166] testsuite-01.sh[614]: /run/initrd-mount-target is not a mountpoint

whereas with F38's systemd it passes.

@keszybz
Member

keszybz commented Jul 19, 2023

Please submit that as PR.

@mrc0mmand
Member Author

Please submit that as PR.

I wanted the test case to go together with a potential fix, so it should, hopefully, be part of #28454.

@keszybz
Member

keszybz commented Jul 19, 2023

Please do anyway. If the PR fails, then we know the test works, and then we can stuff the fix into the PR.

mrc0mmand added a commit to mrc0mmand/systemd that referenced this issue Jul 19, 2023
Since 7c764d4 we bind mount certain directories during switch root
instead of moving the mount directly, and for /run we do this without
MS_REC. This, unfortunately, leaves all mounts under /run behind
in the old root, which breaks certain use cases.

See: systemd#28452
@mrc0mmand
Member Author

-> #28456

@yuwata yuwata added the pid1 label Jul 19, 2023
bluca added a commit to bluca/systemd that referenced this issue Jul 22, 2023
There are applications that rely on mounts under /run surviving the
switch from initrd to rootfs, so use MS_REC unless we are soft
rebooting.

Follow-up for 7c764d4

Fixes systemd#28452
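
The rule in this commit message boils down to one decision: bind `/run` recursively on a normal initrd transition, and non-recursively on a soft reboot. A minimal sketch in shell terms, using the `--bind` and `--rbind` options of mount(8) as stand-ins for `MS_BIND` and `MS_BIND|MS_REC`; the function name and its argument values are hypothetical, and this is not the actual systemd code:

```shell
#!/bin/sh
# Pick the mount(8) bind option for /run depending on the kind of
# root switch. "soft-reboot" is a made-up label for this illustration.
run_bind_option() {
    case $1 in
        soft-reboot) printf '%s\n' '--bind' ;;   # leave submounts of /run behind
        *)           printf '%s\n' '--rbind' ;;  # carry submounts of /run over
    esac
}
```

A caller could then run something like `mount "$(run_bind_option initrd)" /run /sysroot/run`, which selects `--rbind` so that mounts created under `/run` in the initrd are replicated into the new root.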
5 participants