Ubuntu 20.04 MIRROR Root on ZFS: shouldn't bpool get cloned on restore? #55

Closed
ddnexus opened this issue Sep 10, 2020 · 24 comments


ddnexus commented Sep 10, 2020

I am a ZFS newbie, so my apologies if I say something stupid, especially when trying to explain what happens.

I followed the instructions for mirror root and it went well. I installed and removed a few different packages and kernels, and I could see the entries in the GRUB history menu. Then I restored to an old state that was still using a kernel I had recently removed, and everything got restored as expected. Next I made some other changes to the system and rebooted. I expected the system to boot into the state I had just been running, but instead it booted into the old one.

After retrying a few times, I discovered that the only way to get back to that state (the one created by the first restore) is to go to the GRUB history menu and pick the last restored point again. Then it boots into the latest state, the kernel is the right one, the newly installed software is there... all OK, BUT it looks like I can't update the GRUB menu at all.

Besides, I manually saved a state successfully and expected to find it in the GRUB history menu on reboot, but it was nowhere to be found. Even after a manual update-grub (which succeeded) there was no new entry: the GRUB menu was frozen in the past.

It looks like, while rpool persists the changes in the new state, bpool is read-only, so the GRUB menu never gets changed.

The zsysctl history seems to confirm that: rpool got cloned on restore but bpool didn't.

I suspect that in single-disk installations using a read-only bpool is not a problem. Since GRUB is on the EFI partition it gets updated as usual, but since the mirror installation has GRUB on bpool, shouldn't zsys clone it on restore as it does with rpool?

Did I miss something?
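
One way to check which datasets were actually cloned, as a minimal sketch: ZFS records the snapshot a clone was created from in the clone's `origin` property, so filtering `zfs list -H -o name,origin` output for an origin other than `-` lists the clones. The helper name `list_clones` and the dataset names in the usage comment are hypothetical.

```shell
#!/bin/sh
# list_clones: read "name<TAB>origin" lines on stdin (the format produced
# by `zfs list -H -o name,origin`) and print only the clones, i.e. the
# datasets whose origin is not "-".
list_clones() {
    awk -F'\t' '$2 != "-" { print $1 " <- " $2 }'
}

# On a live system (needs the zfs tools and suitable privileges):
#   zfs list -H -o name,origin -r bpool rpool | list_clones
```

If bpool shows no lines here after a restore while rpool does, that would match the observation above.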

Member

rlaager commented Sep 10, 2020

The next step here would be to try this on a single disk system. If this is reproducible on a single disk, then it is a zsys issue. If it is not, then it is likely something to do with the HOWTO setup.

Author

ddnexus commented Sep 10, 2020

But what is supposed to happen on restore? I see that the bpool didn't get cloned. Is that the expected behavior? If that is the case, then isn't that incompatible with the setup of GRUB on bpool?

Author

ddnexus commented Sep 10, 2020

Not sure whether these are questions to post here or elsewhere. Please let me know.

Member

rlaager commented Sep 10, 2020

Was your system installed after the “/boot/grub Not Mounted” errata, or fixed to address it?

But what is supposed to happen on restore?

I’m not sure. I haven’t tested zsys anywhere near as much as I should have, for a lack of time.

I see that the bpool didn't get cloned. Is that the expected behavior?

When you say “bpool” and “cloned”, what exactly do you mean? Are you expecting bpool/BOOT/ubuntu_ID to get cloned to another dataset with a different ID?

If that is the case, then isn't that incompatible with the setup of GRUB on bpool?

I’m not following why you think that. Can you elaborate?

Author

ddnexus commented Sep 10, 2020

The system was installed after, so I didn't need to fix the errata. I just followed the HOWTO for the mirror setup.

When you say “bpool” and “cloned”, what exactly do you mean? Are you expecting bpool/BOOT/ubuntu_ID to get cloned to another dataset with a different ID?

Sorry for my bad explanation. As I said, I am new to the ZFS world, so I may be missing some basic understanding.
What I am saying is that, AFAIK, zsys manages the restore by creating a clone that can be written to and that persists the changes (that is what happens with rpool).

What I think I understand is that, on a single-disk setup, bpool can be reverted to a previous state (e.g. to retrieve an old deleted kernel) by just using a saved snapshot (which is read-only), because the kernel only needs to be read, and the GRUB menu is written to the EFI partition and not to bpool.

With the mirror setup, GRUB is on bpool, so adding changes to the current GRUB menu implies being able to write to bpool; hence maybe we need a clone for the mirror setup, not a snapshot.

I’m not following why you think that. Can you elaborate?

If that is the case, then just putting GRUB on bpool WITHOUT configuring zsys to clone it on restore (instead of just using the snapshot) would break GRUB, so shouldn't the HOWTO also add instructions to set up zsys that way? (If it is possible, and if it is not a problem of some other nature, of course.)

I am a bit confused about who is responsible for what with zfs and zsys. If you think I should post this question somewhere else (maybe in the zsys issues?) please, could you point me to the right place? Thank you!

Author

ddnexus commented Sep 11, 2020

@rlaager it looks like we got to the bottom of the problem thanks to @didrocks here ubuntu/zsys#163

/boot/grub being in bpool causes /boot/grub/grub.cfg to be updated in a state that is not available at boot (as explained in this comment).

Please @rlaager, could you comment about the reason /boot/grub is not mounted from the EFI partition?

Member

rlaager commented Sep 11, 2020

quoting from ubuntu/zsys#163 (comment):

The problem is that the file is not available at boot. At boot the same file is a different version (pointing to the old state).

At boot, where is the old version of /boot/grub/grub.cfg coming from? What is grub looking at for /boot/grub if not bpool/BOOT/ubuntu_ID? You should be able to determine this by looking at /boot/efi/EFI/ubuntu/grub.cfg. That should be short (like 3 lines), not a full GRUB config. My test system has:

search.fs_uuid d11dfcf18e7bbf0a root
set prefix=($root)'/BOOT/ubuntu_oy1czd/grub@'
configfile $prefix/grub.cfg

Obviously the UUID (d11dfcf18e7bbf0a) and "UUID" (oy1czd) will differ between systems. But that should be what the EFI setup looks like. That will cause GRUB to look at the bpool filesystem for grub.cfg.
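
To see at a glance where a given stub points, here is a minimal sketch that pulls the path out of the `set prefix=` line. It assumes the stub has exactly the three-line shape shown above; the helper name `stub_grub_dir` is hypothetical.

```shell
#!/bin/sh
# stub_grub_dir FILE: print the path from the stub's "set prefix=" line,
# e.g. /BOOT/ubuntu_oy1czd/grub@ for the example above.
# Prints nothing if the line does not have the expected shape.
stub_grub_dir() {
    sed -n "s/^set prefix=(\$root)'\(.*\)'\$/\1/p" "$1"
}

# Usage on a live system:
#   stub_grub_dir /boot/efi/EFI/ubuntu/grub.cfg
```

Comparing that path with the dataset currently mounted at /boot/grub should show whether GRUB and update-grub agree on which grub.cfg is in use.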

Author

ddnexus commented Sep 12, 2020

Yes. GRUB is looking for bpool/BOOT/ubuntu_ID, but I think that is exactly the problem.

That ID is not immutable: it is changed (e.g. to ID2) by the restore, while /boot/efi/EFI/ubuntu/grub.cfg always points to the initial ID.

After the restore it should point to /BOOT/ubuntu_ID2/grub/grub.cfg, and indeed, if we could boot with that file, the first menu entry would load the state with ID2 (i.e. the ID after the restore, which is now the current state).

Instead, with the current setup, the grub.cfg used at boot is the old /BOOT/ubuntu_ID/grub/grub.cfg, and that file has not been changed by the update-grub run after the restore, because that update-grub wrote its update into /BOOT/ubuntu_ID2/grub/grub.cfg.

If my explanation is not clear enough, please let me know so I can add more details and examples.

Now, I think that /boot/grub should be a mirrored target, as you said (because if one disk fails the system should be able to boot from another), but it should also be an immutable target, i.e. not affected by snapshot/restore. It should be the same no matter what state the system is in.

I am thinking of something like bpool/BOOT/grub without any mutable ID: something configured so that whichever state runs update-grub writes grub.cfg to the same place (e.g. bpool/BOOT/grub/grub.cfg), overwriting the previous one, so the last state will be preserved in the menu and, of course, it will be on all the disks of the mirror.

Member

rlaager commented Sep 12, 2020

Ahh, that makes sense... So the "/boot/grub Not Mounted" fix changed this for the worse. It used to be bpool/grub (which I'd say was perfectly reasonable), but zsys was messing with canmount on that.

Can you try this (most of this is double-checking for safety):

  1. zfs rename bpool/BOOT/ubuntu_ID bpool/BOOT/grub
  2. Check that it mounted correctly (mount | grep /boot and check the contents of /boot/grub).
  3. Check that /etc/zfs/zfs-list.cache/bpool updated (and change that by hand if needed)
  4. Update /boot/efi/EFI/ubuntu/grub.cfg
  5. Re-run update-grub
  6. Double-check /boot/efi/EFI/ubuntu/grub.cfg again for good measure.
  7. Verify /boot/grub/grub.cfg is sane.
  8. Reboot normally.
  9. Repeat your test.

Author

ddnexus commented Sep 13, 2020

Thanks for the answer, and sorry for the late reply.

Do you mean: zfs rename bpool/BOOT/ubuntu_ID/grub bpool/BOOT/grub, or do you really want me to rename the whole bpool/BOOT/ubuntu_ID?

Member

rlaager commented Sep 13, 2020

Sorry, just the grub filesystem.

Author

ddnexus commented Sep 13, 2020

FYI: the renamed dataset didn't mount; I had to add the mountpoint to the bpool/BOOT/grub dataset.
BTW, I think I should remove the other bpool/BOOT/ubuntu_IDn/grub datasets now.
Also, I think that bpool/BOOT/grub should have the same properties as the renamed dir, which didn't get preserved:

$ cat /etc/zfs/zfs-list.cache/bpool
bpool	/boot	off	on	on	off	on	off	on	off	-	none
bpool/BOOT	none	off	on	on	off	on	off	on	off	-	none
bpool/BOOT/grub	/boot/grub	on	on	on	off	on	off	on	off	-	none
bpool/BOOT/ubuntu_0j92qe	/boot	on	on	on	off	on	off	on	off	-	none
bpool/BOOT/ubuntu_poz5wl	/boot	noauto	on	on	off	on	off	on	off	-	none
bpool/BOOT/ubuntu_poz5wl/grub	/boot/grub	noauto	on	on	off	on	off	on	off	-	none
bpool/BOOT/ubuntu_sm53ql	/boot	noauto	on	on	off	on	off	on	off	-	none
bpool/BOOT/ubuntu_sm53ql/grub	/boot/grub	noauto	on	on	off	on	off	on	off	-	none
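
To pick out just the datasets that claim /boot/grub, here is a minimal sketch. It assumes the first three tab-separated columns of the cache file are the dataset name, `mountpoint`, and `canmount`, which is consistent with the listing above; the helper name `grub_mount_candidates` is hypothetical.

```shell
#!/bin/sh
# grub_mount_candidates [FILE...]: print "dataset canmount" for every
# cache entry whose mountpoint column is /boot/grub.
# Reads stdin when no file is given.
grub_mount_candidates() {
    awk -F'\t' '$2 == "/boot/grub" { print $1, $3 }' "$@"
}

# Usage on a live system:
#   grub_mount_candidates /etc/zfs/zfs-list.cache/bpool
```

With the listing above this would show bpool/BOOT/grub with `on` and the two per-state grub datasets with `noauto`.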

Member

rlaager commented Sep 13, 2020

BTW, I think I should remove the other bpool/BOOT/ubuntu_IDn/grub datasets now.

Yes, that should be okay to do once it's confirmed working.

Also, I think that bpool/BOOT/grub should have the same properties as the renamed dir, which didn't get preserved:

Well, I think you want canmount=on here, as you want it mounted. The others had canmount=noauto since they were mounted by zsys due to being under bpool/BOOT/ubuntu_ID.

Author

ddnexus commented Sep 13, 2020

OK, so should I change any of the other flags?

Member

rlaager commented Sep 13, 2020

The other flags look the same already.

Author

ddnexus commented Sep 13, 2020

It restarted in the old state. It may be booting from the other disk's EFI partition, which I haven't changed yet (and which still points to the old state)...
In /boot/efi/EFI/ubuntu/grub.cfg I updated only the second line, with the bpool reference, not the first line.

search.fs_uuid 4cc7ce6bd48a43c5 root 
set prefix=($root)'/BOOT/grub@'
configfile $prefix/grub.cfg

I will also try to change the other file on the second disk. If that does not work, I don't know whether I can change it from the GRUB console. At worst I will restart with a live CD and switch it back.
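
Since the same one-line change has to be made on every ESP, a minimal sketch of doing it in one pass follows. It assumes GNU sed (for `-i`) and a stub shaped like the one above; the helper name `set_stub_prefix` is hypothetical, and the new path must not contain `|` or `&`.

```shell
#!/bin/sh
# set_stub_prefix NEW_DIR FILE...: rewrite the "set prefix=" line of each
# stub file so that GRUB loads grub.cfg from NEW_DIR (e.g. /BOOT/grub@).
# All other lines are left untouched.
set_stub_prefix() {
    new_dir=$1
    shift
    for stub in "$@"; do
        sed -i "s|^set prefix=.*|set prefix=(\$root)'${new_dir}'|" "$stub"
    done
}

# Usage on a live system (as root; back up the files first):
#   set_stub_prefix '/BOOT/grub@' \
#       /boot/efi/EFI/ubuntu/grub.cfg /boot/efi1/EFI/ubuntu/grub.cfg
```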

Author

ddnexus commented Sep 13, 2020

BTW... both /boot/efi1/EFI/ubuntu/grub.cfg are pointing to the same disk. Shouldn't each point to its own disk?

Member

rlaager commented Sep 13, 2020

BTW... both /boot/efi1/EFI/ubuntu/grub.cfg are pointing to the same disk. Shouldn't each point to its own disk?

I don't understand. Aren't they pointing to a pool/dataset path where the pool is mirrored? Which part is disk-specific?

Author

ddnexus commented Sep 13, 2020

The reboot worked after also updating /boot/efi1/EFI/ubuntu/grub.cfg on the second disk.
Sorry, I am just guessing at the function of the first line.
What is the ID in the line search.fs_uuid 4cc7ce6bd48a43c5 root? What is its function?

It does not match any UUID reported by blkid. What is it supposed to represent?

$ blkid
/dev/nvme0n1p1: LABEL_FATBOOT="EFI" LABEL="EFI" UUID="0805-24D2" TYPE="vfat" PARTUUID="369ae632-8a2a-4043-94c2-bbe5e39ef358"
/dev/nvme0n1p3: LABEL="bpool" UUID="5532617629770597317" UUID_SUB="3455670217621163001" TYPE="zfs_member" PARTUUID="56d34a38-8970-42bf-9842-5ebe0edb7b31"
/dev/nvme0n1p4: LABEL="rpool" UUID="16719428588792489872" UUID_SUB="3175784280535683349" TYPE="zfs_member" PARTUUID="2802d0ec-a570-437d-8c46-4afc39212b65"
/dev/nvme1n1p1: LABEL_FATBOOT="EFI" LABEL="EFI" UUID="0F2C-7912" TYPE="vfat" PARTUUID="545f78e7-42dc-4a11-877a-784367dba937"
/dev/nvme1n1p3: LABEL="bpool" UUID="5532617629770597317" UUID_SUB="12243550743082945346" TYPE="zfs_member" PARTUUID="ad36b8a3-30b0-4e3e-8975-2c178cef4cd6"
/dev/nvme1n1p4: LABEL="rpool" UUID="16719428588792489872" UUID_SUB="17564055927207318553" TYPE="zfs_member" PARTUUID="538e6347-9e79-4cba-a4ba-19d644623ef4"
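
For what it's worth, the value on the `search.fs_uuid` line appears to be the bpool UUID from the blkid output, just written in hexadecimal instead of decimal: converting 5532617629770597317 to hex gives 4cc7ce6bd48a43c5. A minimal sketch of that conversion (the helper name `guid_to_hex` is hypothetical):

```shell
#!/bin/sh
# guid_to_hex DECIMAL: print a decimal pool UUID, as reported by blkid
# for a zfs_member partition, in lowercase hexadecimal.
guid_to_hex() {
    printf '%x\n' "$1"
}

# bpool UUID from the blkid output above
guid_to_hex 5532617629770597317   # prints 4cc7ce6bd48a43c5
```

Both mirror members report the same bpool UUID, which would explain why the two stubs carry the same value rather than a per-disk one.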

Author

ddnexus commented Sep 13, 2020

Using bpool/BOOT/grub is not a good idea, since zsys saves states for everything in there, and the problem is not solved.

AFAIK a persistent dataset should be created outside of BOOT, so I also tried putting grub on a separate dataset in order to avoid the multiple states that get created in bpool/BOOT.
I first tried it on bpool/GRUB/grub, and then also on the original bpool/grub dataset that was giving problems. It looks like it works: the dataset appears to be mounted normally, and update-grub writes to the right file.

zfs create -o canmount=on -o mountpoint=/boot/grub bpool/grub

I manually updated both EFI grub.cfg files, ran update-grub, and restored to an older state.

As expected, it does not create extra datasets, and the restore does trigger the updated menu, but at the next boot the menu is the old one again. Always the same.

I am wondering where it gets the first boot menu from, since I checked grub.cfg just before the reboot and the menu was the right one.

(edit) I got also a grub-probe error, but it was because I was not using sudo. D'oh :/

Member

rlaager commented Sep 27, 2020

So I think the conclusion is that bpool/grub is what we really want (as the HOWTO originally used) but we need ubuntu/zsys@7442bf3 which fixes ubuntu/zsys#164 first.

@rlaager rlaager self-assigned this Sep 27, 2020
Author

ddnexus commented Sep 27, 2020

@rlaager I think that in the meantime the doc should change the currently suggested way of creating the grub dir and suggest the standard GRUB-in-EFI configuration instead. It should also add a warning that grub.cfg will have to be kept up to date on the second EFI partition somehow, either manually or with some hook.

A possible temporary solution could be appending a hardcoded cp statement at the end of grub-mkconfig to sync grub.cfg to both EFI partitions. Of course, it would need re-patching whenever grub-mkconfig gets updated.
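
A minimal sketch of what such a sync step could look like, written as a standalone function rather than a patch to grub-mkconfig. The helper name `sync_esp_cfg` is hypothetical, and the default relative path `EFI/ubuntu/grub.cfg` is an assumption; pass a third argument if grub.cfg lives elsewhere on the ESP.

```shell
#!/bin/sh
# sync_esp_cfg SRC_ESP DST_ESP [REL_PATH]: copy grub.cfg from the first
# ESP mount point to the second, only when the contents differ.
sync_esp_cfg() {
    rel=${3:-EFI/ubuntu/grub.cfg}
    src="$1/$rel"
    dst="$2/$rel"
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    # cmp fails when the files differ or the destination does not exist.
    if ! cmp -s "$src" "$dst" 2>/dev/null; then
        mkdir -p "$(dirname "$dst")"
        cp "$src" "$dst"
    fi
}

# Usage on a live system (as root), e.g. after update-grub:
#   sync_esp_cfg /boot/efi /boot/efi1
```

Keeping it outside grub-mkconfig would avoid the re-patching problem, at the cost of having to remember to run it or hook it in somewhere.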

Member

rlaager commented Sep 27, 2020

I have an existing TODO note to investigate the new/improved GRUB support for multiple UEFI disks in 20.04. I'm really swamped right now, so I can't promise a time frame, but these things may end up being addressed together, in the manner you suggest.

rlaager added a commit that referenced this issue Dec 6, 2020
Ubuntu 20.04's GRUB supports multiple EFI disks.  There is a small
caveat in that it doesn't prompt in the chroot, but it works fine after
the reboot.  Using the stock support means that the ESPs will be kept in
sync automatically.

Signed-off-by: Richard Laager <rlaager@wiktel.com>
Refs issue #55
@rlaager rlaager closed this as completed in 44170fd Dec 6, 2020
Member

rlaager commented Dec 6, 2020

I have hopefully fixed this. If not, or you have other feedback, please let me know.
