RPi4 reboot hangs with "kernel not found" blinks when running at 4kp60 #1763
Comments
Do you think this is a regression or has this issue always been present for you? |
It looks like an issue with the 3.3V voltage switch on RPi4 rev 1.1 and earlier. Please could you give this a try. |
I think it's a regression in the firmware but I can't tell when it occurred. I've been running LE 9.2.8 (with ancient 2b76cfc firmware) and rather recent bootloaders on that RPi and didn't notice any reboot issues with that. I had been using netboot to test LE10/11 on that RPi and just recently switched to running LE11 from SD card and noticed the issue. I had been testing with SD cards and recent firmwares/bootloaders on other RPi4s in my office room but they were hooked up to non-4k monitors - and they were fine |
@timg236 thanks a lot, I'll give it a try this evening after work! |
I think the regression was introduced around April. I managed to find an RPi4 rev 1.0 and noticed that 3V3 was not being reset, which appeared to cause the same symptoms that you reported. Failing to switch voltage isn't necessarily fatal on all cards. The 4Kp60 element isn't fully understood, but the system is in an overclocked / voltage-boost state, so there could be some subtle interaction there. |
Fix issue where SD voltage was not reset by power cycling PMIC on reboot. See raspberrypi/firmware#1763
The bootloader change seems good so I think we'll likely merge this since it's easy to verify it via 3V3. There's possibly a secondary problem in start4.elf if there is no HDMI display connected and core_freq is boosted above 500MHz and force_turbo=1. |
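To check whether a given board is running with the settings named in this thread, a minimal sketch along these lines may help. It assumes the settings live in /boot/config.txt; the function name `thread_settings` is made up for illustration. It only lists matching lines and does not interpret `[section]` filters or included files, and it makes no claim about which setting (if any) is the actual trigger.

```shell
#!/bin/sh
# Hypothetical helper: print the config.txt lines for the three settings
# discussed in this thread (hdmi_enable_4kp60, core_freq, force_turbo).
# Commented-out lines are skipped; [section] filters are NOT evaluated.
thread_settings() {
    cfg="${1:-/boot/config.txt}"   # default path is an assumption
    grep -E '^[[:space:]]*(hdmi_enable_4kp60|core_freq|force_turbo)[[:space:]]*=' "$cfg"
}
```

Calling `thread_settings` with no argument reads /boot/config.txt; pass another path to inspect a copy of the file. No output (and a non-zero exit status) means none of the three settings is set explicitly.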
@timg236 thanks a lot, 2022-12-07 bootloader fixed the issue for me. I've found another bootloader bug though: With 2022-04-26 and 2022-12-07 bootloader my scope confirmed that 3V3 is going low for about 50 ms on reboots and I saw the two bootloader init messages on console - 2022-05-20 and later didn't drop 3V3 and resulted in read errors. However when doing a rpi-eeprom-update from 2022-04-26 to 2022-12-07 3V3 wasn't dropped and I was left with a similar non-boot situation:
With 2022-12-07 bootloader I haven't noticed any issues with 4kp60 yet but it's still puzzling why rebooting, e.g. with 2022-05-20 bootloader (or 2022-11-25 as in my initial report), seems to work fine without 4kp60 and fails with 4kp60 enabled... |
Just noticed that updating from 2022-04-26 to 2022-12-07 wasn't a very good data point for testing the 2022-12-07 bootloader, but updating 2022-12-07 to 2022-12-07 resulted in the same issue - 3V3 not dropped, read errors.
|
Please could you check the version of recovery.bin installed to /boot before applying the update (strings /boot/recovery.bin | grep VERSION) |
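The check above can be wrapped in a small function so it is easy to run before every update. This is only a sketch: the function name `recovery_version` is made up, and the `tr` fallback is an assumption for systems where `strings` (binutils) is not installed.

```shell
#!/bin/sh
# Hypothetical helper: print the VERSION strings embedded in a bootloader image.
# The default path is the one quoted in this thread.
recovery_version() {
    img="${1:-/boot/recovery.bin}"
    [ -r "$img" ] || { echo "cannot read $img" >&2; return 1; }
    if command -v strings >/dev/null 2>&1; then
        strings "$img"
    else
        # Fallback (assumption): split the file on non-printable bytes.
        tr -c '[:print:]' '\n' < "$img"
    fi | grep VERSION
}
```

Running `recovery_version` right before an EEPROM update shows which recovery.bin is about to perform the flash; `vcgencmd bootloader_version` reports the bootloader that is currently running, for comparison.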
Btw I think the 4kp60 element is a separate issue which is exacerbated by losing the PMIC reset. Best guess is that the reboot causes the ROM to run with an unusual voltage which upsets MMC controller. |
Ah, sorry, my fault. Of course I was using an old recovery.bin (from an apt update a few hours ago, 14.0-1). apt just got me rpi-eeprom 15.0-1 with a new recovery.bin, and with that I see 3V3 being dropped on eeprom update, and updating 2022-12-07 to 2022-12-07 succeeded (I was lazy and just used rpi-eeprom-config -e). I'll do a few more tests over the next few days but so far it's looking good here now. The 4kp60 (probably core_freq?) thing is really puzzling... You folks with access to the firmware source might have a better chance than me of finding out why it seems to have triggered the failure. Anyway: though my initial issue seems to be solved now and I haven't noticed any other issues, I guess it might be good to keep this issue open for a week or so in case other people get hit by it (no one looks at closed issues...). Feel free to close it any time in case I forget though - I'll reopen or create another issue if/when I have more info or an actual failure to report. |
Thanks for confirming the recovery.bin thing. Let’s keep this issue open until we understand the clock issue. |
I'm still investigating the root cause but I've been testing a pending change to switch the firmware to the newer and more reliable SDHCI controller as used by Linux (and now the bootloader). With this firmware I can no longer reproduce the 4kp60 issues (connected or not). Originally, this change was added to see if it helped with boot time; since that wasn't the case, it was parked because of the risk of regressions. However, the experience with the bootloader is convincing me that this is actually more robust (soak testing on various board revisions will be required though!). Would you mind giving it a try? I can build the other start*elf variants if needed. |
@timg236 sure, I'll do some tests with it later. Could you also build the 4x variant? That's the one we use in LE by default |
No problem, here are the slightly more official releases from the CI system |
Thanks a lot, initial testing with 32GB Sandisk Extreme and Extreme Pro cards looks good here so far. I'll try to find my Samsung cards (I should have a couple of them, but where...) and do some more tests with them as well |
I had the same problem: after installing a firmware version newer than 2022-04-26 and enabling hdmi_enable_4kp60=1, every reboot ends at the rainbow screen. |
Testing with a 32GB Samsung Evo+ card and an 8GB rev 1.4 RPi4 looks fine here as well |
firmware: power: Always read the uncached voltage for AIN and USB_PD See: https://forums.raspberrypi.com/viewtopic.php?p=2059832#p2059832 firmware: Use new SDHCI controller instead of legacy arasan See: #1763
I left some boards in a reboot loop over the weekend and they were still running so this has been pushed to rpi-update master by @popcornmix. Not much else has changed but please shout if you encounter any issues in this area. |
Thanks a lot to everyone involved in fixing this issue! Latest rpi-firmware works fine here so I'm closing this |
@timg236 Can you tell if this (SD card problem) also affects a compute module 4? |
@capiman The 3V3 switch issue resolved by the bootloader update would not have affected CM4 because all CM4s have an SD power switch. The firmware change for headless boot with 4kp60 enabled (or overclocked core freq) would be relevant for CM4 Lite + SD. I didn't see it fail with CM4 + EMMC though. |
Describe the bug
When I reboot my RPi4 4GB rev 1.1 connected to a 4K TV, it often hangs at the rainbow splash and UART reports that it could not read cmdline.txt and kernel.img from the SD card.
The issue is somewhat intermittent: sometimes the boards survive about 5 reboots, but quite often the first reboot already fails.
I reproduced that with 2 rev 1.1 RPi4s and 3 different SD cards (two Sandisk Extreme 32GB and one Sandisk Extreme Pro 32GB).
To reproduce
The key point seems to be that 4kp60 is enabled and the RPi is connected to a 4kp60-capable TV - or the EDID of the 4k TV is used - I've used the latter to reproduce it without any peripherals except the serial console connected.
I used this config.txt with the attached edid file lg-55c8.bin.zip
I did a lot of reboots with and without 4kp60 enabled when connected to a 1920x1200 monitor and that worked fine (boot-looping for an hour).
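The boot-looping mentioned above can be automated with a persistent counter that is bumped on every successful boot. The following is a minimal sketch, not the script used in this report: the function names, the counter path and the limit are all hypothetical. If the board hangs in firmware, the counter simply stops advancing, which shows how many reboots it survived.

```shell
#!/bin/sh
# Hypothetical helpers for a reboot soak test.

# Increment a persistent boot counter stored in file $1 and print the new value.
bump_boot_count() {
    file="$1"
    count=0
    if [ -f "$file" ]; then
        count=$(cat "$file")
    fi
    count=$((count + 1))
    echo "$count" > "$file"
    echo "$count"
}

# Succeed while the current count ($1) is below the limit ($2).
should_reboot() {
    [ "$1" -lt "$2" ]
}
```

A boot-time hook (for example a systemd unit or rc.local entry, which is an assumption about the setup) could then run `n=$(bump_boot_count /var/lib/reboot-soak.count); should_reboot "$n" 500 && sleep 30 && reboot`. The delay leaves time to log in and stop the loop.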
Expected behaviour
RPi4 reboots fine without locking up in firmware.
Actual behaviour
On an unsuccessful reboot firmware reports that it couldn't load files from the SD card.
System
RPi4 4GB rev 1.1. cpuinfo:
cat /etc/rpi-issue
RPi OS Bullseye Lite, fully updated yesterday
vcgencmd version
uname -a
bootloader:
Logs
Serial console log from an unsuccessful reboot:
This is the log from the successful cold boot just before that
Additional context
I initially noticed the issue on LibreELEC master, running kernel 6.0 or 6.1, which showed the same behaviour