
Doing an EL8 to EL9 upgrade on raid1 /boot drives results in (potentially) unbootable system #1054

Closed
warthog9 opened this issue Mar 10, 2023 · 5 comments


Actual behavior
I've only seen this on BIOS-boot systems specifically, with /boot on a RAID1 pair where both drives have GRUB installed. The upgrade runs fine, but upon reboot GRUB declares a symbol error and gets stuck.

It's easily fixed by booting a rescue system and explicitly running grub2-install on the respective drives; everything is good after that (a sketch of that recovery is below).
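
For reference, the recovery looks roughly like this; the device names (/dev/md0, /dev/sda, /dev/sdb) and the mount point used by the rescue environment are assumptions and will differ per machine:

    # From the rescue system, with the installed root mounted and chrooted into
    # (mount point is an assumption; use wherever your rescue tooling mounted it):
    chroot /mnt/sysroot

    # List the member partitions of the RAID1 backing /boot (md name assumed):
    mdadm --detail /dev/md0

    # Reinstall GRUB's BIOS boot code on each member's underlying disk:
    grub2-install /dev/sda
    grub2-install /dev/sdb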

To Reproduce
Steps to reproduce the behavior

  1. Run an 8-to-9 upgrade (leapp preupgrade && leapp upgrade) on a system with BIOS boot and RAID1 /boot; a VM suffices here (see the sketch after this list).
  2. Note that leapp preupgrade and leapp upgrade don't flag this as a potential issue or ask whether GRUB should be explicitly installed onto multiple drives.
  3. Reboot.
  4. The upgrade runs, the final reboot eventually happens, and the system ends up stuck on a GRUB symbol error.
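
As a quick sketch of the reproduction setup and commands (nothing below is output from the affected machine; it is just the checks and commands I would run):

    # Confirm /boot sits on an md RAID1 device before upgrading:
    lsblk -o NAME,TYPE,MOUNTPOINT
    cat /proc/mdstat

    # Then run the upgrade exactly as in step 1:
    leapp preupgrade
    leapp upgrade
    reboot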

Expected behavior
leapp notices during preupgrade that /boot is on a RAID device, asks about any extra steps it may need to take to ensure a bootable system, and then actually reinstalls GRUB on all the respective member drives afterwards (see the sketch below).
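
As a rough sketch of the detection I have in mind (this is not leapp's actual actor code, just the shape of the check, and the /dev/md prefix test is an assumption about how it could be phrased):

    # Warn (or inhibit) when /boot is backed by an md RAID device:
    if findmnt -n -o SOURCE /boot | grep -q '^/dev/md'; then
        echo "WARNING: /boot is on an md RAID device." \
             "GRUB must be reinstalled on every member disk after the upgrade," \
             "or the system may be left unbootable."
    fi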

System information (please complete the following information):

  • AlmaLinux 8.7 upgrading to AlmaLinux 9.latest
  • # uname -a
  • # rpm -qa "*leapp*" (or shorthashes of commits in case of manual installation):
    python3-leapp-0.15.0-2.el8.noarch
    leapp-upgrade-el8toel9-0.16.0-6.el8_6.elevate.8.noarch
    leapp-data-almalinux-0.2-2.el8.noarch
    leapp-repository-deps-el9-5.0.9-100.202203181036Z.249925a3.master.el9.noarch
    leapp-deps-el9-5.0.9-100.202203181036Z.249925a3.master.el9.noarch

Attach (or provide link to) log files if applicable (optional - may contain confidential information):

  • All files in /var/log/leapp/
  • /var/lib/leapp/leapp.db
  • journalctl
  • If you want, you can optionally provide anything else you would like to share (e.g. storage info)

For your convenience you can pack all logs with this command:

#

Then you may attach only the leapp-logs.tgz file.
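
The template's exact command is not shown above; a tar invocation along these lines, covering the files listed earlier, would produce the archive (a sketch, not the verbatim template command):

    tar -czf leapp-logs.tgz /var/log/leapp /var/lib/leapp/leapp.db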



warthog9 added the bug label Mar 10, 2023
pirat89 transferred this issue from oamg/leapp Mar 10, 2023
pirat89 (Member) commented Mar 10, 2023

Hi \o This is kind of expected/possible right now, as RAID is not handled in any way during the upgrades yet; it is on the long-term list of features.

It's also documented as a known issue, if I understand the reported problem correctly.

pirat89 (Member) commented Mar 10, 2023

Of course, in this case, it would make sense to have an inhibitor for the upgrade so the system is not lost.

warthog9 (Author) commented Apr 5, 2023

I think the Bugzilla entry describes something slightly different from what I'm seeing, and oddly not something I'm currently running into. The initrds seem to include mdadm just fine during the upgrades, so that may be a 7-to-8 issue that's not present on 8-to-9 (I can dummy up a test VM and check, if that would be helpful).

The specific issue I'm running into is that the GRUB from 8 isn't compatible with 9, so the boot can get delightfully wedged when the upgrade misses the second drive's copy of GRUB (and I agree that's not entirely unexpected; storage topologies can be hard, and booting doubly so).

I think an inhibitor with an "I can recover from this when/if it goes bad" option is a reasonable speed bump here, and it alerts folks that doing this 100% remotely is likely to go extra poorly.

pirat89 (Member) commented Jul 17, 2023

@warthog9 just thinking that this probably fixes your issue: #1093

warthog9 (Author) commented

@pirat89 agreed, that does look like it would resolve this. I'll spin up a VM and give it a quick test, but my reading of the patch suggests it resolves my issue!

pirat89 closed this as completed Jul 28, 2023