LEAPPing machines that need 3rd-party kernel modules (e.g., disk drivers) to boot might fail #705

Open
krono opened this issue Aug 9, 2021 · 7 comments
Labels
bug Something isn't working enhancement New feature or request

krono commented Aug 9, 2021

Actual behavior
Some systems are set up with 3rd-party kernel modules for the booting hard disk.
(Most typically, these come as DUPs at install time)

When leapping, it seems that these modules are not picked up for the upgrade-initrd.

To Reproduce
Steps to reproduce the behavior

  1. find a machine with a FakeRAID
  2. Install RHEL7
  3. Try to leapp
  4. After rebooting into the upgrade initramfs, dracut aborts and complains about missing disks.

Expected behavior
LEAPP should pick up kmods for the initramfs when they appear in the upgrade.

Maybe a switch, similar to enablerepo, could be used to force the "installation" of certain packages prior to creation of the initramfs?

System information

[System is dead now]

  • RHEL7.9
  • Leapp from RHEL7.9 with leapp-data14.tar.gz
  • Contents of /var/log/leapp
  • (no further info, the system does not boot ATM; if I can access it, I'll fill this in)
Situation that led to this idea

Context and Things tried

Our machine contains an infamous Intel C620/LSI MegaSR2-RAID chip.
Vendors provide drivers for these as DUPs, eg Dell, HPE, or Fujitsu.
This means, installing with that chip using DUPs is fine.

Our preupgrade went fine, and in anticipation we even included the online version of the DUPs in the --enablerepo step.
This resulted in no error, and the new version of the kmod was listed among the packages to be installed. (NOTE: it was marked as a downgrade, as the RHEL7.9 version of the driver has a higher version number than the RHEL8.2 version.)
We also had to use a targeted LEAPP (to 8.2), as the driver is not yet available for 8.4, only up to 8.3.

After reviewing the report, we proceeded with upgrade and stopped just before reboot.
At that time, we grabbed the log files (see leap-log.zip )
and inspected the initramfs and compared it to the initramfs of the still running RHEL7.9.
(You will find a few mentions of megasr2 in the logs.)
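
The comparison of the two initramfs images described above can be approximated with a small script. This is a minimal sketch, assuming each image's contents have been saved as one file path per line (for example the path list that `cpio -t` prints); the function names and the sample paths are made up for illustration and are not taken from the logs.

```python
import posixpath

# Suffixes under which kernel modules are commonly shipped (plain or compressed).
MODULE_SUFFIXES = (".ko", ".ko.xz", ".ko.gz", ".ko.zst")


def kernel_modules(listing):
    """Return the set of kernel module names found in a list of file paths."""
    names = set()
    for line in listing:
        base = posixpath.basename(line.strip())
        for suffix in MODULE_SUFFIXES:
            if base.endswith(suffix):
                names.add(base[: -len(suffix)])
                break
    return names


def missing_modules(old_listing, new_listing):
    """Modules present in the old initramfs listing but absent from the new one."""
    return sorted(kernel_modules(old_listing) - kernel_modules(new_listing))


if __name__ == "__main__":
    # Hypothetical listings; real ones would be read from files.
    old = [
        "usr/lib/modules/3.10.0-1160.el7.x86_64/extra/megasr2/megasr2.ko",
        "usr/lib/modules/3.10.0-1160.el7.x86_64/kernel/drivers/ata/ahci.ko.xz",
    ]
    new = [
        "usr/lib/modules/4.18.0-193.28.1.el8_2.x86_64/kernel/drivers/ata/ahci.ko.xz",
    ]
    print(missing_modules(old, new))
```

Comparing by module name rather than by full path is deliberate: the kernel version directory differs between the RHEL7 and RHEL8 images, so full paths never match.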

We found the kmod missing. As a workaround, we tried manually including the kmod from the already downloaded rpm into the initramfs:

Initramfs patching steps

These are specific to RHEL8.2 and the Fujitsu variant of the MegaSR2 driver ([which can be found here](http://patches.ts.fujitsu.com/linux/pldp/RHEL8/rhel8-u2/x86_64/)).

The following steps make the initramfs similar to the RHEL7.9 one with regard to megasr.

# upgrade with enabled fujitsu repo
leapp upgrade --enablerepo primergy-kmod-el8.2
# prior to reboot:

# extract initramfs into temporary location
mkdir ~/initramfs-upgrade
cd ~/initramfs-upgrade
/usr/lib/dracut/skipcpio /boot/initramfs-upgrade.x86_64.img | zcat | cpio -idv
# find 
#/var/lib/leapp/el8userspace/var/cache/dnf/primergy-kmod-el8.2-7b6ee48acb7dd887/packages/kmod-megasr2-18.02.2020.0827.4fts-2.el8.2.x86_64.rpm

# pour rpm contents into extracted initramfs
rpm2cpio /var/lib/leapp/el8userspace/var/cache/dnf/primergy-kmod-el8.2-7b6ee48acb7dd887/packages/kmod-megasr2-18.02.2020.0827.4fts-2.el8.2.x86_64.rpm | cpio -idv

# create the "weak-updates" structure normally created when actually installing the rpm and running dracut
mkdir -p usr/lib/modules/4.18.0-193.28.1.el8_2.x86_64/weak-updates/primergy-megasr2
ln -s ../../../4.18.0-193.el8.x86_64/extra/primergy-megasr2/megasr2.ko usr/lib/modules/4.18.0-193.28.1.el8_2.x86_64/weak-updates/primergy-megasr2/megasr2.ko

# update various modules.* files but only in the initramfs-directory
depmod -b $PWD 4.18.0-193.28.1.el8_2.x86_64

# backup actual initramfs
mv /boot/initramfs-upgrade.x86_64.img /boot/initramfs-upgrade.x86_64.img.ORIGINAL

# repack initramfs
# NOTE: THIS DROPS AMD MICROCODE UPDATE
find . | cpio -o -c -R root:root | gzip -9 > /boot/initramfs-upgrade.x86_64.img

cd -

reboot
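
The relative target of the weak-updates symlink in the steps above is easy to get wrong by one `../`. The following is a small sketch that derives it with `os.path.relpath` instead of counting directories by hand. The directory layout (`extra/<subdir>` for the kernel the module was built for, `weak-updates/<subdir>` for the kernel that will run) is taken from the `ln -s` command above; the function name and its parameters are assumptions made for this example.

```python
import os.path


def weak_update_link(module_root, built_for, running, subdir, module):
    """Return (link_path, relative_target) for a weak-updates symlink.

    module_root: e.g. "usr/lib/modules" inside the extracted initramfs
    built_for:   kernel version the kmod was built for
    running:     kernel version that will actually be booted
    """
    target = os.path.join(module_root, built_for, "extra", subdir, module)
    link = os.path.join(module_root, running, "weak-updates", subdir, module)
    # The symlink target must be relative to the directory holding the link.
    return link, os.path.relpath(target, start=os.path.dirname(link))


if __name__ == "__main__":
    link, target = weak_update_link(
        "usr/lib/modules",
        "4.18.0-193.el8.x86_64",
        "4.18.0-193.28.1.el8_2.x86_64",
        "primergy-megasr2",
        "megasr2.ko",
    )
    print(link)
    print(target)
```

For the versions used in this report, the computed target matches the path passed to `ln -s` above.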

After reboot, the system actually sees the disk, which indicates that the driver was found.
However, during the brief period in which we could watch the system:

  • The installation of the primergy-megasr2 package in the new userland seemed to fail
  • The kernel did not seem to be properly installed and dracut fails
  • Eventually, leapp exited and tried to write a log file "outside" of the container, which failed due to "read-only filesystem"

The system then rebooted, but at grub, only the RHEL7 variants/kernels were available.
Trying to boot these hangs the system.


EDIT: It turns out the kernel and initramfs were correctly built with the 3rd-party module; however, the respective entries were not written to the grub config.

In fact, the old, RHEL7.9 grub.cfg is in place, and the config to be, grub.cfg.new is cut off right after ### begin /etc/grub.d/10_linux.
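
A truncated config like this can be detected mechanically, because grub2-mkconfig wraps the output of each /etc/grub.d script in matching `### BEGIN <script> ###` and `### END <script> ###` marker lines. The following is a minimal sketch of such a check; the function name is an assumption for this example, and the matching is deliberately lenient (case-insensitive, trailing `###` optional) so that a marker cut off mid-line is still recognized.

```python
import re

# Matches e.g. "### BEGIN /etc/grub.d/10_linux ###"
_MARKER = re.compile(r"^### (BEGIN|END) (\S+)(?: ###)?\s*$", re.IGNORECASE)


def unclosed_sections(grub_cfg_text):
    """Return the names of sections that have a BEGIN marker but no END marker."""
    open_sections = []
    for line in grub_cfg_text.splitlines():
        match = _MARKER.match(line)
        if not match:
            continue
        kind, name = match.group(1).upper(), match.group(2)
        if kind == "BEGIN":
            open_sections.append(name)
        elif name in open_sections:
            open_sections.remove(name)
    return open_sections


if __name__ == "__main__":
    # Hypothetical truncated file content, shaped like the case described above.
    truncated = (
        "### BEGIN /etc/grub.d/00_header ###\n"
        "set default=0\n"
        "### END /etc/grub.d/00_header ###\n"
        "### BEGIN /etc/grub.d/10_linux ###\n"
    )
    print(unclosed_sections(truncated))
```

An empty result means every section was closed; any name returned points at the script whose output was cut short.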

Manually editing the grub cmdline boots the system, but it seems the root file system was damaged and took the leapp_resume.service with it.

Note: closing this simply because we used LEAPP_UNSUPPORTED machinery would be fair.
Nonetheless, a means to include kmods for a leapped upgrade would be nice.

@pirat89 pirat89 transferred this issue from oamg/leapp Aug 11, 2021
Member

pirat89 commented Aug 11, 2021

Hi @krono. Thanks for the report and for sharing the steps you needed to update the upgrade initramfs. I am quite busy these days, so I will go through it more carefully later. For now I am answering only what I have read, without looking into the provided logs.

We are aware of this limitation, and we want to deliver a mechanism that lets users create relatively simple custom actors (to customize/extend the default IPU functionality) to take care of various kernel drivers etc. during the upgrade.

We have already delivered upstream a mechanism that lets you influence which dracut modules are used in the upgrade and target initramfs, and that lets you specify which files should be added to them. What is still missing is a mechanism for kernel drivers specifically, and possibly a way to influence the dracut options used. Unfortunately, we have not documented it yet, as testing is not finished and we would like to implement support for drivers first. So using the implemented mechanism could be a little tricky right now.

In short, it should be as easy as producing a couple of messages in an actor. For an example, look at the commonleappdracutmodules actor. It's not the best example, as it affects only the upgrade initramfs and still contains deprecated code, but in short, something like

   api.produce(UpgradeInitramfsTasks(include_dracut_modules=[DracutModule(name=...)]))

For drivers, expect something like UpgradeInitramfsTasks(add_drivers=[...]). If you have an RPM providing the dracut modules etc., you could even tell leapp to install it into the environment (container) we use to create the upgrade initramfs. Expect that you will probably need to create a similar message for the target initramfs. More about the difference between the upgrade and target initramfs is described in the models files (below). I expect we will document this much better once we finish the implementation.
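
To make the shape of such a message concrete, here is a self-contained sketch. It does not import leapp: the two dataclasses are stand-ins that mirror only the fields named in the snippet above (`include_dracut_modules`, `name`), and `build_upgrade_initramfs_tasks` is a hypothetical helper. In a real actor the models would come from `leapp.models`, and the resulting message would be handed to `api.produce(...)` as shown above.

```python
from dataclasses import dataclass, field
from typing import List


# Stand-ins for illustration only; NOT the real leapp.models classes.
@dataclass
class DracutModule:
    name: str


@dataclass
class UpgradeInitramfsTasks:
    include_dracut_modules: List[DracutModule] = field(default_factory=list)


def build_upgrade_initramfs_tasks(module_names):
    """Build one message requesting the given dracut modules (hypothetical helper)."""
    return UpgradeInitramfsTasks(
        include_dracut_modules=[DracutModule(name=n) for n in module_names]
    )


if __name__ == "__main__":
    # "mdraid" is used purely as an example dracut module name.
    message = build_upgrade_initramfs_tasks(["mdraid"])
    print(message)
```

Keeping the message construction in a plain function like this makes it possible to unit-test the list of requested modules without running an upgrade.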

We are currently discussing our priorities. Opening a ticket with RH support or a BZ for leapp-repository could help with prioritisation.

Is the proposed solution OK for you? We currently do not plan to add a CLI option for that.

Additional notes:

Author

krono commented Aug 14, 2021

Hi @pirat89, thanks for reading through my (admittedly uncoordinated) notes.

You could even tell leapp to install it to the environment (container) we use to create the upgrade initramfs.

That's what I thought.
Something like "early packages", which probably would also solve your mdadm-thing…

For me it was a one-time thing. We somehow got the affected machine to work and just hope that we got far enough in the leapping that it's fine now.

Feel free to close; I learned quite a bit, tho, thanks!

Member

pirat89 commented Aug 16, 2021

Hi @krono, thanks for letting us know. I will keep this one open, as we can use it for public tracking of the RAID question and drivers in general. I just realized that, for anyone else reading this, it would be helpful to see the script that is executed to create the upgrade initramfs:

In the case of mdadm, right now it seems that

  • putting the related config files (by producing the targetuserspace-related messages mentioned above) into the target userspace container & initramfs
  • enabling building of the initramfs with mdadm (LEAPP_DRACUT_MDADMCONF=1 envar)

could be helpful. But I haven't tested it yet, and I am not an SME on storage. It's just our idea of where we would like to start experimenting with mdadm in the future.

Author

krono commented Aug 17, 2021

Sounds like a good idea to me. Thanks.

@pirat89 pirat89 added bug Something isn't working enhancement New feature or request labels Nov 25, 2021
Member

pirat89 commented Mar 7, 2023

It seems we will be working on that in the upcoming months. Pinning the issue.

@pirat89 pirat89 pinned this issue Mar 7, 2023

bessonc commented Nov 17, 2023

sounds fixed by #1081

Member

pirat89 commented Dec 11, 2023

To my understanding, it's partially fixed. Further work is still expected regarding mdadm. However,

  • it's possible to add additional drivers to the upgrade/target initramfs
  • we have fixed several issues related to RAIDs

3 participants