Current Behavior
After atomic-rollback is uninstalled from a system that has had migrate
applied, any subsequent kernel install via dnf or kernel-install produces a
BLS entry with paths that GRUB cannot resolve. The kernel package installs
successfully, but on next reboot GRUB fails to load it and drops to the GRUB
menu with the new kernel marked unbootable.
The standard Fedora kernel-install pipeline (90-loaderentry.install
invoking grub2-mkrelpath) generates linux and initrd BLS fields of the
form /root/boot/vmlinuz-X and /root/boot/initramfs-X.img for new kernels
on a migrated layout. The 90-atomic-rollback.install kernel-install hook
normally rewrites these paths via fix_bls_paths in src/kernel_hook.rs,
but the hook is removed when the package is removed, so the rewrite never
happens for kernels installed after that point.
GRUB error on reboot:
Booting `Fedora Linux (X.Y.Z-200.fc43.x86_64) 43 (Cloud Edition)'
error: ../../grub-core/fs/btrfs.c:2153:file
`/root/boot/vmlinuz-X.Y.Z-200.fc43.x86_64'
not found.
error: ../../grub-core/loader/i386/efi/linux.c:260:you need to load the
kernel first.
Press any key to continue...
Failed to boot both default and fallback entries.
Expected Behavior
A system that has had migrate applied should produce bootable BLS entries
for future kernel installs. The migration permanently changes the on-disk
layout (consumes /boot, creates symlinks at the root subvolume, updates ESP
grub.cfg, rewrites existing BLS entries). After such a permanent change,
subsequent kernel installs should not produce broken boot configurations,
regardless of whether the package that performed the migration is still
installed.
Context
Kernels installed before the package was removed continue to boot correctly,
because their BLS entries were rewritten at install time while the hook was
still active. The failure first surfaces when a new kernel is installed after
the package is removed, and it is not visible until reboot: the new kernel is
then the default, GRUB fails to load it, and recovery requires manually
selecting an older kernel from the GRUB menu.
The system is therefore not bricked. However, every subsequent kernel install
fails the same way until either atomic-rollback is reinstalled and a kernel
install is retriggered, or the layout is manually un-migrated.
Reproduced on a fresh Fedora 43 cloud image with atomic-rollback v0.3.7-1.fc43
installed from the COPR.
Technical Details
Reproduction
On a fresh Fedora 43 system with the default layout (separate ext4 /boot,
btrfs root with subvolumes):
- sudo dnf copr enable -y rocketman-code/atomic-rollback
- sudo dnf install -y atomic-rollback
- sudo atomic-rollback setup
- sudo atomic-rollback migrate
- Confirm migration: sudo atomic-rollback check (should pass), and
  confirm /etc/fstab shows #MIGRATED: for /boot.
- sudo dnf remove -y atomic-rollback
- Confirm the kernel-install hook is gone:
  ls /usr/lib/kernel/install.d/ | grep atomic (should be empty).
- Install any kernel that is not already installed, for example from koji:
  sudo dnf install -y https://kojipkgs.fedoraproject.org/packages/kernel/<v>/200.fc43/x86_64/kernel-{,core-,modules-,modules-core-}<v>-200.fc43.x86_64.rpm
- Inspect the new BLS entry:
  sudo cat /boot/loader/entries/*<v>*.conf | grep -E '^linux|^initrd'
  The output will show paths beginning with /root/boot/:
  linux /root/boot/vmlinuz-<v>-200.fc43.x86_64
  initrd /root/boot/initramfs-<v>-200.fc43.x86_64.img
- Reboot. GRUB will report file '/root/boot/vmlinuz-<v>...' not found
  and refuse to load the new kernel. Older kernels remain selectable from
  the GRUB menu as a fallback.
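The inspection step above can also be sketched in Rust, which may be handy for checking several entries at once. This is only an illustration: suspect_paths is a hypothetical helper, not part of atomic-rollback, and it assumes a BLS entry is plain "key value" lines, as the grep output above suggests. The sample entry uses placeholder version strings.

```rust
/// Return the `linux` and `initrd` fields of a BLS entry whose value begins
/// with `/root/boot/`, the prefix reported in the inspection step.
/// Hypothetical helper for illustration; not taken from atomic-rollback.
fn suspect_paths(entry: &str) -> Vec<(String, String)> {
    entry
        .lines()
        .filter_map(|line| {
            // Split each line into a key and the remainder of the line.
            let mut parts = line.trim().splitn(2, char::is_whitespace);
            let key = parts.next()?;
            let value = parts.next()?.trim();
            let is_path_field = key == "linux" || key == "initrd";
            if is_path_field && value.starts_with("/root/boot/") {
                Some((key.to_string(), value.to_string()))
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    // Placeholder entry text; "X" stands in for a real kernel version.
    let entry = "title Fedora Linux (example)\n\
                 linux /root/boot/vmlinuz-X\n\
                 initrd /root/boot/initramfs-X.img\n\
                 options root=UUID=placeholder\n";
    for (key, value) in suspect_paths(entry) {
        println!("{key}: {value}");
    }
}
```

Any field this reports corresponds to a path that GRUB failed to resolve in the error shown earlier.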
Relevant Code
src/kernel_hook.rs, the has_bad_path closure inside fix_bls_paths
explicitly identifies the exact pattern that the standard Fedora
kernel-install hook generates on a migrated layout:
let has_bad_path = |v: &str| -> bool {
    v.contains("/root/boot/") || v.contains("/boot/vmlinuz-") || v.contains("/boot/initramfs-")
};
The rewrite that this closure gates is the only mechanism preventing the
standard kernel-install pipeline from producing unbootable BLS entries on a
migrated system. It runs only when kernel-install add invokes
90-atomic-rollback.install, which is installed by the atomic-rollback RPM
and removed when that RPM is uninstalled.
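To make the predicate's behavior concrete, here is a minimal sketch that lifts the quoted closure into a standalone function and runs it on a few values. Only the three contains checks come from the source. The free-function form and the sample paths are assumptions for illustration, and the last sample is a hypothetical value chosen only because it matches none of the three patterns; it is not claimed to be the path the hook actually writes.

```rust
/// Standalone copy of the predicate quoted from `fix_bls_paths`, lifted out
/// of the closure so it can be exercised on its own.
fn has_bad_path(v: &str) -> bool {
    v.contains("/root/boot/") || v.contains("/boot/vmlinuz-") || v.contains("/boot/initramfs-")
}

fn main() {
    // Paths of the form reported for new kernels on a migrated layout.
    assert!(has_bad_path("/root/boot/vmlinuz-X"));
    assert!(has_bad_path("/root/boot/initramfs-X.img"));
    // Matched by the second clause alone, without the /root prefix.
    assert!(has_bad_path("/boot/vmlinuz-X"));
    // Hypothetical value that matches none of the three patterns.
    assert!(!has_bad_path("/vmlinuz-X"));
    println!("all checks passed");
}
```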
Root Cause
The migration writes a permanent change to the on-disk layout. The
maintenance code that the migration depends on for ongoing correctness is
owned by a removable package. That asymmetry is the root of the problem:
removing the package removes the maintenance code, but the layout it was
maintaining remains in place. New kernel installs then run through the
default Fedora kernel-install pipeline, which generates paths appropriate
for the unmigrated layout, against a filesystem that is no longer in that
layout.
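The asymmetry described above can be detected from the two observable facts used in the reproduction: the #MIGRATED: marker in /etc/fstab and the absence of the hook file. The following is a rough diagnostic sketch, not a proposed fix. It assumes the marker appears literally as "#MIGRATED:" somewhere on a line, and that the hook lives at /usr/lib/kernel/install.d/90-atomic-rollback.install (the directory and file name are both mentioned in this report, but the full path is inferred). The function names are hypothetical.

```rust
use std::fs;
use std::path::Path;

// Assumed hook location, combined from the directory and file name above.
const HOOK_PATH: &str = "/usr/lib/kernel/install.d/90-atomic-rollback.install";

/// True if any fstab line carries the migration marker.
/// Assumption: the marker appears literally as "#MIGRATED:".
fn fstab_shows_migration(fstab: &str) -> bool {
    fstab.lines().any(|line| line.contains("#MIGRATED:"))
}

/// True when the layout is migrated but the maintenance hook is missing,
/// which is the state in which new kernel installs produce broken entries.
fn is_orphaned_migration(fstab: &str, hook_present: bool) -> bool {
    fstab_shows_migration(fstab) && !hook_present
}

fn main() {
    // An unreadable fstab is treated as "no marker found".
    let fstab = fs::read_to_string("/etc/fstab").unwrap_or_default();
    let hook_present = Path::new(HOOK_PATH).exists();
    if is_orphaned_migration(&fstab, hook_present) {
        println!("migrated layout without the kernel-install hook");
    } else {
        println!("no orphaned migration detected");
    }
}
```

Keeping the decision in a pure function over the fstab text and a boolean makes it testable without touching the real filesystem.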