
grub-probe ("suddenly") fails with "algorithm inherit not supported" #15261

Closed
zviratko opened this issue Sep 11, 2023 · 24 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@zviratko

zviratko commented Sep 11, 2023

System information

Type                  Version/Name
Distribution Name     Gentoo
Distribution Version  ~amd64
Kernel Version        6.4.15 (but also on 6.3.x)
Architecture          amd64
OpenZFS Version       2.2.0-rc3 (but also on stable 2.1.12)

Describe the problem you're observing

This occurred after I changed the motherboard in my home server.
I had to recompile the kernel/modules to boot the system, so I used an Ubuntu live CD (23.04) with chroot, recompiled the kernel, and ran grub-mkconfig. It failed (I don't remember exactly how), but I expected that and corrected the entry for the next boot by hand. The changes I made were related to graphics/framebuffer; nothing like the architecture setting was touched.

After booting into the real system, I recompiled the kernel again and ran grub-mkconfig, only to discover that it still didn't boot properly, because the ${rpool} variable used by grub-mkconfig was empty.

This is because:

# grub-probe --device /dev/nvme0n1p4 /dev/nvme1n1p4 --target=fs_label
grub-probe: error: compression algorithm inherit not supported

My rpool is a mirror of two nvme device partitions:

        rpool                                                ONLINE       0     0     0
          mirror-0                                           ONLINE       0     0     0
            nvme-eui.0000000001000000e4d25c6114d64f01-part4  ONLINE       0     0     0
            nvme-eui.0000000001000000e4d25c324dd64f01-part4  ONLINE       0     0     0

It doesn't fail for my boot pool, which looks the same but has only a single (root pool) filesystem, mounted at /boot.

I have not touched the rpool configuration at all, and I have not enabled any new features (unless Ubuntu or its systemd-operating-system somehow decided to do that for me). I don't think it did, and zpool history concurs.

Not enabled features:

rpool
      draid
      zilsaxattr
      head_errlog
      blake3
      block_cloning
      vdev_zaps_v2

I tried setting compression explicitly (compression=on, compression=lz4), deleting snapshots in case it's related to an old bug I found (though zpool history showed nothing had changed), recompiling grub, and upgrading ZFS and recompiling grub again.

The system still boots if I change
LINUX_ROOT_DEVICE="ZFS=${rpool}${bootfs%/}"
to
LINUX_ROOT_DEVICE="ZFS=rpool/${bootfs%/}"
It's just grub-probe that fails.

Describe how to reproduce the problem

No idea what changed for this to happen.


Any ideas what this might be, or how to provide useful debug output? Should I bug the GNU GRUB developers with this? I could try grub-2.12-rcX, but I didn't find anything relevant in the changelog and I'd rather find the problem first.

Thanks!

@zviratko zviratko added the Type: Defect Incorrect behavior (e.g. crash, hang) label Sep 11, 2023
@GregorKopka
Contributor

Suggestion: Switch to ZFSBootMenu and forget about GRUB (and its inability to fully deal with ZFS features).

@zviratko
Author

Good suggestion. I only discovered ZFSBootMenu right after creating this issue, which is a shame as it looks awesome (didn't yet switch to it myself, though).

@dmdx86

dmdx86 commented Sep 30, 2023

Duplicate of #13873, but ultimately a GRUB issue: https://savannah.gnu.org/bugs/index.php?64297

tl;dr: snapshotting the top-level dataset of your boot pool will cause this issue. I ran into this a while back, and I believe that once you take the snapshot, it becomes a permanent, irreversible condition and you have to destroy and re-create the pool. In my experience, removing the snapshot did not fix the issue.

@zviratko
Author

@dmdx86 I don't think it's the same issue, though it's probably of the same kind: I have always snapshotted all filesystems in that pool without a problem, and nothing has changed (according to my memory, and confirmed by zpool history). All I did was import the pool from the Ubuntu live CD and reboot. That triggered something, but I have no idea what.
Feel free to close this until somebody else hits this issue, I'll work on migration to ZFSBootMenu in the meantime ;-)

@meilon

meilon commented Oct 23, 2023

I have the same issue, and I have also always taken snapshots. The last lines of grub-probe's debug output (v2.06) are:

grub-core/fs/zfs/zfs.c:3395:zfs: endian = 1
grub-core/fs/zfs/zfs.c:3170:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 0/512
grub-core/kern/fs.c:79:fs: error: compression algorithm inherit not supported

The full output is here: https://pastebin.com/dJWrj482

I updated my kernel and corresponding zfs packages, did a reboot, and voila, GRUB doesn't want to boot anymore.

I can't get ZFSBootMenu to work with my setup (basically an older version of https://openzfs.github.io/openzfs-docs/Getting%20Started/Arch%20Linux/Root%20on%20ZFS.html): it doesn't detect any boot environments. If anyone has a hint, that would be great!

@dmdx86

dmdx86 commented Oct 23, 2023

If you have a separate pool just for booting (which is what the current ZFS docs recommend nowadays), then IMHO the easiest thing to do is to back up all the data, destroy and re-create the bpool, and take special care never to snapshot the top level of the bpool. There shouldn't be much in your bpool other than kernels, initrds, and similar files, so it shouldn't be as painful as blowing away your entire dpool.

@zviratko
Author

@dmdx86 bpool is not the problem; the problem is rpool, and snapshots are one of the reasons we run ZFS there... :) I'm wondering whether snapshotting my boot pool would break it as well, even though I have set compatibility=legacy on it...

@dmdx86

dmdx86 commented Oct 23, 2023

If you set up the pools and GRUB correctly, GRUB should not be referencing any data on rpool. GRUB does not need to read anything other than bpool data. Once GRUB loads the kernel, the kernel takes over and imports your other pools.

@zviratko
Author

I see that there's a misunderstanding of the issue I'm reporting.

I do not have a problem with GRUB not loading the kernel or initramfs from the bpool and booting the kernel.
I have a problem with the scripts that construct the root=ZFS=... portion of the kernel cmdline, where they call grub-probe.
That's this part of code in /etc/grub.d/10_linux:

case x"$GRUB_FS" in
    xbtrfs)
        rootsubvol="`make_system_path_relative_to_its_root /`"
        rootsubvol="${rootsubvol#/}"
        if [ "x${rootsubvol}" != x ]; then
            GRUB_CMDLINE_LINUX="rootflags=subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}"
        fi;;
    xzfs)
        rpool=`${grub_probe} --device ${GRUB_DEVICE} --target=fs_label 2>/dev/null || true`
        bootfs="`make_system_path_relative_to_its_root / | sed -e "s,@$,,"`"
        LINUX_ROOT_DEVICE="ZFS=${rpool}${bootfs%/}"
        ;;
esac

On my machine, it executes like so:

# grub-probe --device /dev/nvme0n1p4 /dev/nvme1n1p4 --target=fs_label
grub-probe: error: compression algorithm inherit not supported

Of course the issue could be the same, but how many people do NOT snapshot their rpools? If snapshots alone triggered this, far more people would hit it. Therefore, I don't think my issue is snapshots, but rather a similar class of issue in which grub-probe suddenly fails to detect the pool. I assume that similar code is present in both grub-probe and the GRUB bootloader itself, so why would only grub-probe fail here? (Though it occurs to me that I don't actually update the GRUB bootloader in my EFI boot partition, only the userland tools, and those often. In fact, I last updated the bootloader 2 years ago and I am certainly not going to touch it now :))
Also, I find the way this script does it pretty weird and prone to breakage. This could just be a static setting somewhere (/etc/default/grub or similar), or taken from the current cmdline (which could fail in chroot environments and on rescue CDs, but that's where it fails anyway in my experience ¯\_(ツ)_/¯), or left for the initramfs (distribution) to figure out, or constructed from the output of "zfs mount" or even "mount".
At least in the case of ZFS this makes little sense, and with other filesystems "mount" would suffice. I wonder what the GRUB people were trying to solve there: some embedded systems? Making it work with live CDs and installers? I know you don't always have a proper rootfs entry in mtab (you might not even have /proc mounted, so no mtab), so I guess grub-probe tries to be cleverer and self-contained?
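A minimal sketch of one alternative raised above: take the pool name from the dataset mounted at / (as "mount" would show it) instead of asking grub-probe. The helper names root_dataset and pool_from_dataset are hypothetical and not part of GRUB's 10_linux; the sketch assumes util-linux findmnt is installed.

```shell
#!/bin/sh
# Hypothetical helpers; NOT part of GRUB's /etc/grub.d/10_linux.

# Print the ZFS dataset mounted at /, e.g. "rpool/ROOT/gentoo".
# findmnt reads /proc/self/mountinfo, so /proc must be mounted; a bare
# chroot without it makes this fail just like the cmdline approach would.
# Returns non-zero if / is not a ZFS filesystem (or findmnt is missing).
root_dataset() {
    [ "$(findmnt -n -o FSTYPE /)" = zfs ] || return 1
    findmnt -n -o SOURCE /
}

# Pool component of a dataset name: everything before the first "/".
# "rpool/ROOT/gentoo" -> "rpool", "rpool" -> "rpool".
pool_from_dataset() {
    printf '%s\n' "${1%%/*}"
}
```

In 10_linux terms, the grub-probe line could become rpool=$(pool_from_dataset "$(root_dataset)"), leaving the bootfs handling unchanged. This is an untested illustration: if root_dataset fails, rpool ends up empty, exactly as it does today.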

@almereyda

almereyda commented Oct 24, 2023

Seeing the same after upgrading from Ubuntu 23.04 to 23.10. Snapshotting all datasets on this system has never been an issue since I installed this machine with the Ubiquity desktop installer on ZFS with Ubuntu 21.10.

On Ubuntu, the described error message appears when I edit the GRUB entry and manually replace search --no-floppy --fs-uuid --set=root ... with set root=(hd0,gpt3). Otherwise, it fails with "No such device: ...".

@zviratko
Author

For the record: I am not running Ubuntu but Gentoo.

I wouldn't want to mix different issues here; let's concentrate on why grub-probe fails on my pool, if someone wants to investigate. The GRUB bootloader part is a slightly different issue (maybe the same fix will work for both, maybe not), and I would take the ZFSBootMenu discussion elsewhere (the ZFS mailing list?) so as not to pollute this issue further...

@almereyda

Refactored, thanks for reminding me.

@SemanticBeeng

SemanticBeeng commented Dec 8, 2023

tl;dr snapshotting the top-level dataset of your boot pool will cause this issue. I ran into this a while back and I believe once you do the snapshot, it becomes a permanent irreversible condition and you have to destroy and re-create the pool

Happens on Debian 12 as well.
Ran sanoid and this happened.

Is there any explanation of exactly what happens, and of how a ZFS snapshot operation can mutate the state of the pool or partition?!

Update: Proxmox has support for addressing the "fragility of booting from ZFS with GRUB":
https://pve.proxmox.com/wiki/ZFS:_Switch_Legacy-Boot_to_Proxmox_Boot_Tool / "Repairing a System Stuck in the GRUB Rescue Shell"

@tomgray

tomgray commented Dec 9, 2023

I also experienced this issue when upgrading to OpenZFS 2.2.2.

Recreating the pool with compression disabled fixed it for me (I have snapshots on the pool, OpenZFS 2.2.2):

  • When creating the pool with zpool create, use the option -O compression=off.
  • When sending/receiving a snapshot of the pool to repopulate its data, use the option -o compression=off on the receiving (zfs recv) side.

@zviratko
Author

PSA: I updated to grub-2.12 today and grub-probe now works. I'm not sure whether it was the GRUB changes or something else that changed in the meantime.

@mifritscher

I can confirm that updating from 2.12-rc1 to 2.12 helps. (#13873 (comment))

@n0099

n0099 commented Jan 18, 2024

https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/2041739/comments/9

zpool create \
    -o feature@extensible_dataset=disabled \
    -o feature@bookmarks=disabled \
    -o feature@filesystem_limits=disabled \
    -o feature@large_blocks=disabled \
    -o feature@large_dnode=disabled \
    -o feature@sha512=disabled \
    -o feature@skein=disabled \
    -o feature@edonr=disabled \
    -o feature@userobj_accounting=disabled \
    -o feature@encryption=disabled \
    -o feature@project_quota=disabled \
    -o feature@obsolete_counts=disabled \
    -o feature@bookmark_v2=disabled \
    -o feature@redaction_bookmarks=disabled \
    -o feature@redacted_datasets=disabled \
    -o feature@bookmark_written=disabled \
    -o feature@livelist=disabled \
    -o feature@zstd_compress=disabled \
    -o feature@zilsaxattr=disabled \
    -o feature@head_errlog=disabled \
    -o feature@blake3=disabled \
    -o feature@vdev_zaps_v2=disabled \
[...]

Enabling any of the features in the command above will once again cause GRUB to fail to recognize /boot as ZFS when a snapshot is created on bpool.
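To check which of the features listed above are already enabled on an existing boot pool, the tab-separated output of zpool get -H -o property,value all bpool can be filtered. This is a minimal sketch; the function and variable names are hypothetical, and because the command above is cut off at [...], the feature list copied here is equally incomplete.

```shell
#!/bin/sh
# Features the command above creates as disabled (its list is elided with
# [...], so this copy is incomplete too).
GRUB_UNSAFE_FEATURES="extensible_dataset bookmarks filesystem_limits
large_blocks large_dnode sha512 skein edonr userobj_accounting encryption
project_quota obsolete_counts bookmark_v2 redaction_bookmarks
redacted_datasets bookmark_written livelist zstd_compress zilsaxattr
head_errlog blake3 vdev_zaps_v2"

# Read "property<TAB>value" lines on stdin and print every listed feature
# whose state is not "disabled" (i.e. "enabled" or "active").
enabled_unsafe_features() {
    while IFS="$(printf '\t')" read -r prop value; do
        case $prop in
            feature@*) ;;          # only feature@... properties matter
            *) continue ;;
        esac
        name=${prop#feature@}
        # Unquoted on purpose: split the list on spaces and newlines.
        for f in $GRUB_UNSAFE_FEATURES; do
            if [ "$f" = "$name" ] && [ "$value" != disabled ]; then
                printf '%s\n' "$name"
            fi
        done
    done
}
```

Usage: zpool get -H -o property,value all bpool | enabled_unsafe_features. An empty result means none of the listed features are enabled.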

@SimonBard

If you have a separate pool just for booting (which is what current ZFS docs recommend nowadays) then IMHO the easiest thing to do is to back up all the data and then destroy / re-create the bpool, and take special precaution to never snapshot the top-level of the bpool. There should be not much in your bpool other than kernels, initrds, and similar files so it shouldn’t be as painful as blowing away your entire dpool.

How should I recreate the bpool? I mean, I know how to create a pool, but then it's empty. How do I get the needed data back onto it?

  1. Should I just install the system from scratch?
  2. Should I use a snapshot/backup of bpool to get it back?

@n0099

n0099 commented Mar 3, 2024

@GregorKopka
Contributor

GregorKopka commented Mar 4, 2024 via email

@SimonBard

SimonBard commented Mar 4, 2024

Make a recursive snapshot of the old pool, zfs send that as replication stream somewhere (could be a file somewhere outside that pool), recreate the pool and receive the replication stream into it, install bootloader, reboot. Gregor

Many thanks!
How do I install the bootloader?
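Gregor's procedure quoted above can be sketched as a command plan that is printed for review rather than executed. Assumptions not taken from the thread: the snapshot name "rebuild", the stream file path, and passing extra zpool create options (for example the original pool's ashift, or a compatibility setting) as one string. Reinstalling the bootloader is left to the OpenZFS guide linked in the next comment. Note that the received pool again carries a snapshot of its top-level dataset; per the comments above, that is exactly what trips GRUB 2.06 when extensible_dataset is enabled.

```shell
#!/bin/sh
# Print (do not run) the rebuild steps: snapshot, send, destroy, create, receive.
#   $1: pool to rebuild, e.g. bpool
#   $2: vdev for the new pool (placeholder)
#   $3: stream file, which must live OUTSIDE the pool being destroyed
#   $4: extra "zpool create" options as one string (may be empty)
plan_rebuild_pool() {
    pool=$1 vdev=$2 stream=$3 create_opts=$4
    snap="$pool@rebuild"   # hypothetical snapshot name
    echo "zfs snapshot -r $snap"
    echo "zfs send -R $snap > $stream"
    echo "zpool destroy $pool"
    echo "zpool create $create_opts $pool $vdev"
    # -F lets the full replication stream replace the new, empty root dataset.
    echo "zfs recv -F $pool < $stream"
}
```

For example, plan_rebuild_pool bpool /dev/sdb4 /root/bpool.zfs "-o compatibility=grub2-2.06" prints five commands, the last being zfs recv -F bpool < /root/bpool.zfs; the grub2-2.06 compatibility file only exists in OpenZFS releases that include the commit at the end of this thread.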

@SimonBard

SimonBard commented Mar 4, 2024

How do I get the needed data there again?

https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2022.04%20Root%20on%20ZFS.html#step-5-grub-installation

Many thanks!

Unfortunately, I ran into errors at the first step:

grub-probe /boot
grub-probe: Achtung: Platte existiert nicht, ersatzweise wird Partition des Geräts /dev/sdb4 verwendet
grub-probe: Achtung: Platte existiert nicht, ersatzweise wird Partition des Geräts /dev/sdb4 verwendet
grub-probe: Achtung: Platte existiert nicht, ersatzweise wird Partition des Geräts /dev/sdb4 verwendet
grub-probe: Fehler: Laufwerk >hostdisk//dev/sdb4< wurde nicht gefunden

Translation:
grub-probe: warning: disk does not exist, so falling back to partition device /dev/sdb4 (printed three times)
grub-probe: error: disk 'hostdisk//dev/sdb4' not found

zpool list
Name  size ALLOC
bpool 1.88 G
rpool 1.84 T

@n0099

n0099 commented Mar 4, 2024

@SimonBard please show the output of:

zfs get mountpoint,canmount bpool
stat /boot

@zapotah

zapotah commented Sep 19, 2024

This seems to be fixed, at least with the Debian bookworm-backports grub-efi 2.12 packages.

ptr1337 pushed a commit to CachyOS/zfs that referenced this issue Nov 14, 2024
GRUB is not able to detect ZFS pool if snapshot of top level boot
pool is created. This issue is observed with GRUB versions up to
v2.06 if extensible_dataset feature is enabled on ZFS boot pool.

compatibility=grub2-2.06 would enable all read-only compatible
zpool features except extensible_dataset and other features that
depend on it.

The existing grub2 compatibility file is now renamed to grub2-2.12 to
reflect the appropriate grub2 version. grub2-2.12 lists all read-only
features that can be enabled on boot pool for grub2 with version 2.12
onwards.

A new symlink grub2 is created that currently points to the grub2-2.12
compatibility file.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes openzfs#13873
Closes openzfs#15261
Closes openzfs#15909
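The condition described in the commit message (GRUB up to 2.06 cannot detect the boot pool once its top-level dataset is snapshotted, if extensible_dataset is enabled) can be expressed as a small check. This is a hypothetical helper, not part of OpenZFS or GRUB. It assumes the GRUB release is passed as MAJOR.MINOR and treats anything older than 2.12 as affected, since 2.12 is the first release reported to work in this thread; 2.12-rc1 was still reported broken, which this numeric check cannot express.

```shell
#!/bin/sh
# Hypothetical helper, not part of OpenZFS or GRUB.
# Returns 0 if the pool/GRUB combination matches the bug described above.
#   $1: state of feature@extensible_dataset ("disabled", "enabled", "active")
#   $2: GRUB release as MAJOR.MINOR, e.g. "2.06" (release candidates not handled)
grub_snapshot_bug_possible() {
    [ "$1" = disabled ] && return 1
    major=${2%%.*}
    minor=${2#*.}
    # test(1) compares integers in decimal, so "06" is simply 6.
    [ "$major" -lt 2 ] && return 0
    [ "$major" -eq 2 ] && [ "$minor" -lt 12 ] && return 0
    return 1
}

# Read the feature state from a real pool (requires the zpool utility).
extensible_dataset_state() {
    zpool get -H -o value feature@extensible_dataset "$1"
}
```

For example: grub_snapshot_bug_possible "$(extensible_dataset_state bpool)" 2.06 && echo "do not snapshot bpool's top-level dataset". For new pools, creating them with compatibility=grub2-2.06 keeps extensible_dataset disabled in the first place.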