Regression: Opal PBA shuts down because of incomplete kernel modules related to MODULES=( 'loaded_modules' ) on Ubuntu 20.04.2 #2615
Comments
@tolga9009
to get an overview of the commits. In this case looking for Opal PBA related commit messages #2455 and |
@tolga9009 Thank you for the detailed issue description and the research you have done. It looks like something is going wrong in the PBA's startup script. So I'd ask you to use the latest master branch version of ReaR and turn on the lights for the PBA. I have prepared a small patch file: PBA-shutdown-debug.patch.txt. Please try:
You should now see a system booting in text mode, giving us details about the boot process. Finally, the system should display |
Thank you for the fast replies and the patch! I started with Oliver's suggestion, as I already had everything set up. For some reason, after booting I was greeted with a black screen. Boot process was like this:
I was able to reboot by pressing Ctrl+Alt+Del, so the system was still responsive. It occurred to me that maybe the fonts were invisible, so I blindly typed "poweroff". And bingo: 3 seconds later, my PC powered off. So, in addition to the original bug, I seem to have invisible fonts now. Didn't happen with //Update: I was able to unlock disks and enter a functional shell with commit 055e3a1, so no problem there. I will try to bisect.
Thanks for the quick reply. If the screen stays completely blank, it could be that the kernel could not initialize your Nvidia graphics card properly. Maybe some required kernel module or configuration file is missing in the PBA. Could you add
to the If that doesn't help, maybe you could configure your firmware a.k.a. BIOS to disable the Nvidia graphics card on boot? |
I can try it out, but I don't think that's the case. When booting with "quiet splash", I can see UEFI BGRT, Ubuntu Logo and the "Shutting down..." message. I think if this was a GPU issue, I should have a black screen instead, right? |
@tolga9009 |
@tolga9009 If the Ubuntu logo appears, your graphics should be operational. The black screen might be caused by Ubuntu's
in |
Thank you! I had console output now. There was an error message: I also bisected until e7338e5. That was the last OPAL related commit. No problems so far, I will bisect further. I will now look for commits related to |
@OliverO2 Incredible what Ubuntu does:
why do they think it benefits their users with such crazy deviations from upstream? |
@jsmeix I think the title isn't quite fitting. I have display output in the default state. It shows the Dell and Ubuntu logos, but then continues to shut down without asking me for a password. I'm unable to unlock my Opal drives. The black screen was caused by disabling quiet / splash for debugging. But I prefer the PBA booting silently and with splash for deployment. I bisected further. The regression happened somewhere between 3058973 and c38e61d.
@tolga9009 |
@OliverO2 |
The PBA uses |
Yes, I see it right now |
The default behaviour of
so nothing should happen. |
cf0c39d is on a merged branch, which is based on a much older version of master. |
Sorry for the confusion. I have checked out at c6500f9, applied Oliver's patch from #2615 (comment) and got a more verbose output now. I will try to find a way to get the full log, but for now:
|
By the way: |
I'm from Germany as well, wish you a great holiday and a good weekend ;)! //Edit: I checked out at 6466012, it's working.
Have a great extended weekend, too! I'll be watching this space even then and I'm somewhat curious about what the cause might be. c38e61d (reported to work correctly here) is on a branch based on master 4b43f43. So the cause must be after that. I'd still try reverting the changes from d2588e8 and 6a0013a first as these commits seem to be the ones most likely influencing the situation. The problem is diagnosed by
The normal output should be something like this:
|
@tolga9009 @jsmeix Could you please adjust the title again so that it matches what we know now? Something like this would be more precise and help people find answers:
Take-aways:
|
I cannot reproduce it on my openSUSE Leap 15.2 system But it should not make a real difference compared to "rear mkopalpba" I get exactly the <module_name>.ko files in the recovery system The tricky part how to verify it is that |
@tolga9009 See the With current ReaR GitHub master code
In the log check what happens after |
@tolga9009 @OliverO2 I would like to ask if you could verify (as time permits) @OliverO2 |
@jsmeix Well, you can test an Opal PBA on a machine without self-encrypting devices by setting As the issue apparently did not reproduce on your test installation, I have this on my list and will try to reproduce it here. I should be able to do so, as I have a system configuration available roughly matching Tolga's one. I'll try to test it, hopefully no later than the end of next week. |
I could run
plus
but without setting I got all loaded modules in the ReaR recovery system
So the issue is something specific on Ubuntu. |
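A check like "are all loaded modules present in the recovery system" can be scripted. The following is only a sketch under assumptions: the function name `modules_missing_in_tree` is made up here, it relies on GNU `find`, and the directory passed to it has to be wherever the recovery system's module tree actually is on the machine in question. Module file names may use hyphens where `lsmod` shows underscores, and files may be compressed, so both are normalized before comparing.

```bash
#!/bin/bash
# Read module names from stdin (one per line) and print those
# for which no *.ko file (plain or compressed) exists below directory $1.
function modules_missing_in_tree () {
    local tree="$1" module_name available
    # Reduce e.g. "snd-hda-intel.ko.xz" to "snd_hda_intel"
    available="$( find "$tree" -type f -name '*.ko*' -printf '%f\n' \
                  | sed -e 's/\.ko.*$//' -e 's/-/_/g' | sort -u )"
    while read -r module_name ; do
        grep -qxF "${module_name//-/_}" <<<"$available" || echo "$module_name"
    done
}

# Possible usage (the directory is a placeholder, not a path taken from ReaR):
#   lsmod | tail -n +2 | cut -d ' ' -f 1 | modules_missing_in_tree /path/to/recovery/lib/modules
```

An empty output would mean every loaded module has a file in the tree. Note that this compares by file name only and does not check that the file belongs to the running kernel version.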
By the way:
Apparently they were loaded while |
Regarding the modules listed in I have those loaded and in the ReaR recovery system:
|
I have tried to reproduce the issue on Ubuntu 20.04.2 LTS Desktop with rear c6500f9 (2021-05-11) vs. e7338e5 (2020-12-20). It did not happen in either configuration. So the current state of affairs is:
I have created a small test script which checks for modules in the same way commit d2588e8 does. A reference output for Ubuntu 20.04.2 is at https://pastebin.com/P6kGSqbz. @tolga9009 maybe you could try this on your system to avoid the issue reappearing with a new ReaR release:
#!/bin/bash
KERNEL_VERSION="$( uname -r )"
function Error() {
echo "$*" >&2
}
function modinfo_filename () {
local module_name=$1
local module_filename=""
local alias_module_name=( $( modprobe -n -R $module_name 2>/dev/null ) )
test $alias_module_name && module_name=$alias_module_name
module_filename="$( modinfo -k $KERNEL_VERSION -F filename $module_name )"
if ! test "$module_filename" ; then
test "$KERNEL_VERSION" = "$( uname -r )" || Error "modinfo_filename failed because KERNEL_VERSION does not match 'uname -r'"
module_filename="$( modinfo -F filename $module_name )"
fi
grep -q '(builtin)' <<<"$module_filename" && echo '' || readlink -e $module_filename
}
loaded_modules+=" $( lsmod | tail -n +2 | cut -d ' ' -f 1 )"
loaded_modules_files="$( for loaded_module in $loaded_modules ; do modinfo_filename $loaded_module || Error "$loaded_module loaded or to be loaded but no module file?" ; done | sort -u )"
printf "%s\n" "$loaded_modules_files"
Is the output relevantly different from that of https://pastebin.com/P6kGSqbz?
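One possible way to compare the script's output with a saved copy of the reference output is sketched below. The function name `compare_module_lists` and the file names in the usage comment are assumptions. Only the module file names are compared, because the directory part contains the kernel version and would differ between systems anyway.

```bash
#!/bin/bash
# Compare two files containing one module file path per line.
# Output: names only in the first file start in column one,
# names only in the second file are indented by one tab.
function compare_module_lists () {
    local reference_file="$1" own_file="$2"
    # Strip the directory part, then let comm hide the lines common to both lists
    comm -3 <( sed -e 's#.*/##' "$reference_file" | sort -u ) \
            <( sed -e 's#.*/##' "$own_file" | sort -u )
}

# Possible usage (both file names are placeholders):
#   compare_module_lists reference-output.txt own-output.txt
```

Since the directory part is discarded, this comparison would not reveal a difference in path prefixes, so the full paths are still worth a look by eye.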
@OliverO2 |
@OliverO2 @tolga9009 |
Thank you! Have an excellent weekend, too! |
I ran into this same issue on a new Ubuntu 20.04.3 build. This issue is related to the following line in 400_copy_modules.sh: On my system, readlink -e $module_filename resolves to /usr/lib/XXXX instead of /lib/XXXX. My temporary workaround was to replace 'readlink -e' with 'echo'.
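The behaviour described here follows from how `readlink -e` works: it canonicalizes every symlink in the path, so if `/lib` is itself a symlink to `usr/lib`, a path reported under `/lib` comes back under `/usr/lib`. The sketch below reproduces this in a throwaway directory. All paths in it are made up for the demonstration, and it makes no claim about which form the rest of the ReaR code expects.

```bash
#!/bin/bash
# Throwaway directory tree that mimics a layout where /lib is a symlink to usr/lib
root="$( readlink -e "$( mktemp -d )" )"
mkdir -p "$root/usr/lib/modules/demo"
touch "$root/usr/lib/modules/demo/example.ko"
ln -s usr/lib "$root/lib"

# Path as a tool might report it, going through the "lib" symlink
reported="$root/lib/modules/demo/example.ko"
# readlink -e resolves all symlink components and requires the target to exist
resolved="$( readlink -e "$reported" )"

echo "reported: $reported"
echo "resolved: $resolved"
```

Replacing `readlink -e` with `echo`, as in the workaround above, keeps the path as reported but also stops resolving any symlinked module files, so it is a trade-off and not a general fix.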
@nolocimes |
I assume also this one is fixed by #2731 |
Could you tell me how you solved this problem?
Relax-and-Recover (ReaR) Issue Template
Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):
ReaR version ("/usr/sbin/rear -V"):
master branch c6500f9
OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):
Ubuntu 20.04.2 LTS
ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
/etc/rear/local.conf:
Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PowerVM LPAR):
Dell M3800 (Intel Core i7-4702HQ, Nvidia Quadro K1100M)
System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
x86
Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
UEFI, GRUB
Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
Samsung SSD 860 EVO (SATA)
Description of the issue (ideally so that others can reproduce it):
- git clone rear, make deb, install it
- sudo rear mkopalpba, copy to USB stick

Issue happens in c6500f9, but git checkout rear-2.6 works fine. So, downgrading fixes it. Seems to be a regression, caused somewhere between 10e049b and c6500f9. If you have any ideas which commits might have caused it, please let me know. I can test it.

Cheers,
Tolga