ipq40xx: meraki-mr33, meraki-mr74: disable image generation #12953
cc @chunkeey |
Sadly more and more devices are being hit. |
I wonder if using lzma-loader, or any other intermediate is feasible on ipq40xx at all. |
Well, somebody would have to port the LZMA loader; it's not plug and play. |
And while at that, MR33 isn't the most graceful target to test it on. Thankfully I have a spare MF286D, so I may take a plunge at doing so in future. |
I saw it mentioned (in the LEDE-MR33 mega issue) that you are not seeing any console output after the bootloader hands off to kernel. Also saw from OpenWrt device wiki page an OEM kernel booting from an uncompressed FIT image. It may be worth trying the self-decompressing kernel, to rule out bootloader decompression issues. This may be available through FitzImage (I have not used FIT images, sorry):
There are also some additional debug strings available in the kernel self-extraction phase, before the kernel starts, but these need to be enabled in the kernel config: CONFIG_DEBUG_LL. I used these extras some time ago, but cannot remember if they worked:
--- a/target/linux/ipq40xx/config-5.4
+++ b/target/linux/ipq40xx/config-5.4
@@ -161,8 +161,12 @@ CONFIG_CRYPTO_SIMD=y
CONFIG_CRYPTO_XTS=y
CONFIG_CRYPTO_ZSTD=y
CONFIG_DCACHE_WORD_ACCESS=y
-CONFIG_DEBUG_LL_INCLUDE="mach/debug-macro.S"
+CONFIG_DEBUG_LL=y
+CONFIG_DEBUG_LL_INCLUDE="debug/msm.S"
CONFIG_DEBUG_MISC=y
+CONFIG_DEBUG_QCOM_UARTDM=y
+CONFIG_DEBUG_UART_PHYS=0x078af000
+CONFIG_DEBUG_UART_VIRT=0xf78af000
CONFIG_DMADEVICES=y
CONFIG_DMA_ENGINE=y
CONFIG_DMA_OF=y
Cheers |
Also affected: Zyxel WRE6606. I soft-bricked while trying to port DSA. I got it recovered by now. Here's the only commit for ipq40xx so far that bumped compat-version to make people adjust their bootcmd variable |
After migrating to kernel 5.15, upgrading causes the units to become soft-bricked, hanging forever at the kernel startup. Kernel size limitation of 4000000 bytes is suspected here, but this is not fully confirmed. Disable the images to protect users from inadvertent bricking of units, because recovery of those is painful with Cisco's U-boot, until the root cause is found and fixed. Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
@Leo-PL have you tested
|
Will try soon-ish, I'm quite busy right now. Seems simple enough, thanks! |
@hauke I just checked, and 23.05 is affected too. This most likely needs a backport. 22.03 works. Checking the FitzImage option now. |
@Djfe Just tried, MR33 is able to boot FitzImage, however this does not help. Might just be useful to make images more compact. Reduction is substantial, more than 1MB for 22.03. I'm having issues with 5.15.79 kernel - have to rebase to the newest. No problem booting a 6MB initramfs image from flash either - this is my current recovery mechanism. |
This also needs |
Okay, I finally got some results:
Why the kernel complains about ATAGs on a DT-based system, I have no idea :/ |
So if I understand you correctly, then something else got broken in Kernel 5.15 and that's why the FitzImage didn't help, yet. It still doesn't boot :/ |
Or in FIT image generation - the kernel either doesn't see the device tree passed to it, or deems it invalid. |
The device was disabled in OpenWrt due to unresolved issues with branches above openwrt-22.03. Even though we only support it as broken it's better to wait and see what happens upstream. ref: openwrt/openwrt#12953
Today I noticed that a fresh build with kernel 5.15.120 works.
Which looks like pure nonsense, but it suggests the culprit is somewhere near.
Maybe the image generation is at fault. The further I dig into this, the more puzzled I am. |
I suspected some random issue here as well, as I tried various kernel options which sometimes produced bootable images and sometimes didn't. Disabling something completely unused caused issues, and disabling something else fixed them again. |
Does that mean you have 2 images with the same versions, same build config and so on and one is bootable and one isn't? Did you run something like |
So the image I built from latest master with the recipe-change from above (and I also modified the qca8k driver and included that FDB learning disable workaround, that's entirely unrelated though) did lead to a bootable image. I am not sure if I was "just lucky" or if images are now reliable and working properly again though. Only time will tell I guess. |
By any chance, do you have a boot log from your attempt? Just until the kernel starts is enough. |
No bootlog, I didn't disassemble the unit. Where is that its file located? |
Look for When you find it, I think you can force the load address to be unaligned too, then regenerate the image and see if such image boots. Or do that from the image makefile level, this should be easier. |
I haven't opened the unit so I prefer to not try to brick it. Maybe you can try to assemble the broken image and turn it into a good one? |
If everything goes well, maybe I can find some time in the evening. For this exact reason, since last experiments I kept one of my units without replacing the rubber feet and their sticky tape. |
Okay, I managed to squeeze out some time for the test - we have it. Setting the While at that, do we have any influence through |
I can make the PR; it's great that we found the issue. |
That's for both, MR33 and MR74, right? @robimarko Thank you for your help! If you want, go ahead (and don't forget to enable image generation and backport to 23 RC). |
They should both be pretty much the same board? @Leo-PL Are you sure the address was There is no way to force alignment during image generation; the same was discussed when UniFi 6 devices had the same issue, and there is probably a PR somewhere with the discussion. |
@robimarko, Correct, it is 0x89000000 indeed. While at that, we might enable FitzImage, this would save on kernel size substantially, and extract some common defines within image makefile. At current |
Yes, but if the bootloader uses device tree in place (as MR33's U-boot does) then we have some influence, to at least make the problem rarer. |
We can just completely avoid the issue by hardcoding the address. |
And this is the proper solution too, because of reliability. At the same time it is device-specific, sadly. |
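To illustrate the alignment concern discussed above, here is a minimal sketch. It assumes the relevant requirement is that a flattened device tree used in place must sit at an 8-byte aligned address (the Devicetree Specification asks for this; the thread itself does not name the exact alignment). The helper names and the example load address are hypothetical, and scanning for the FDT magic can produce false positives inside other payload data.

```python
# Magic number at the start of every flattened device tree blob (big-endian).
FDT_MAGIC = b"\xd0\x0d\xfe\xed"


def embedded_fdt_offsets(image: bytes) -> list[int]:
    """Return offsets of FDT magic numbers found after offset 0.

    Offset 0 is skipped because a FIT image is itself an FDT blob;
    only blobs nested inside it are of interest here.
    """
    offsets = []
    pos = image.find(FDT_MAGIC, 1)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(FDT_MAGIC, pos + 1)
    return offsets


def misaligned_fdts(image: bytes, load_addr: int, align: int = 8) -> list[int]:
    """Return offsets whose absolute address (load_addr + offset) is not aligned.

    load_addr is where the bootloader places the whole image in RAM
    (hypothetical input; it depends on the device and its bootcmd).
    """
    return [off for off in embedded_fdt_offsets(image)
            if (load_addr + off) % align != 0]


if __name__ == "__main__":
    # Synthetic image: outer header, then two nested blobs at offsets 12 and 16.
    demo = FDT_MAGIC + b"\x00" * 8 + FDT_MAGIC + FDT_MAGIC
    print(misaligned_fdts(demo, 0x80000000))
```

Because the absolute address is the load address plus the offset inside the image, the same image can be fine at one load address and broken at another, which would be consistent with seemingly unrelated config changes flipping the result.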
Well, the thing is that it's not supposed to be happening at all, and I have no idea why it's happening on Meraki bootloaders as they are just QCA reference ones. Made a PR, I did not move to |
I am not too sure, the code at https://github.com/riptidewave93/meraki-uboot/blob/e89f7c54758e513b8fda3f63af171ebe23f0adb2/common/image.c#L1526 doesn't seem to enforce the alignment; it seems to just find wherever that FIT image is currently located and then tries to use it. And somehow https://github.com/riptidewave93/meraki-uboot/blob/e89f7c54758e513b8fda3f63af171ebe23f0adb2/common/image.c#L1278 won't help here either as disable_relocation seems to get set because |
|
Very likely, having a look at u-boot:
|
I thought about this but I don't see the advantage: this unit has more than enough flash and kernel size doesn't seem to be an issue, so all this would do is increase boot time due to the extraction process. I think that should be a factor as well when making such a change: something like 1 second would be okay in my opinion, something like 5-10 would make me question the change. Isn't it possible to make it configurable somehow in |
It cannot be selectable, it's just a recipe thing. |
Just to put this into perspective a little: On my unit I have left the original system installed on separate partitions and I still have about 32 MiB of free space on pretty much stock openWRT with Luci. The smaller kernel could maybe bump this to 34 MiB. Does it really make sense for this device? I don't think so.... |
On the other hand: are there any real downsides of using FitzImage? The board boots faster and kernel is smaller, if only but a little bit. |
flole probably thought it would boot slower, but better compression increases the overall read speed on most of these flashes. OK, there is no comparison for LZMA in those slides, but unless your device is single-core, LZMA isn't as slow as you think. |
Exactly, thanks for providing the comparison. In that case we should absolutely switch. |
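The trade-off argued above can be put into a rough back-of-the-envelope model: time to read the image from flash plus time to decompress it. This is only a sketch; the function name is hypothetical and every number in the usage example is made up for illustration, not measured on an MR33.

```python
def estimated_load_seconds(stored_bytes, flash_bytes_per_s,
                           unpacked_bytes=None, decompress_bytes_per_s=None):
    """Estimate kernel load time as flash read time plus decompression time.

    stored_bytes: size of the image as stored on flash.
    flash_bytes_per_s: sustained flash read throughput.
    unpacked_bytes / decompress_bytes_per_s: output size and output rate of
    the decompressor; leave both as None for an uncompressed image.
    """
    read_time = stored_bytes / flash_bytes_per_s
    if unpacked_bytes is None or decompress_bytes_per_s is None:
        return read_time
    return read_time + unpacked_bytes / decompress_bytes_per_s


if __name__ == "__main__":
    # Hypothetical figures: 2 MB/s flash, 8 MB/s decompressor output.
    plain = estimated_load_seconds(4_000_000, 2_000_000)
    packed = estimated_load_seconds(2_000_000, 2_000_000, 4_000_000, 8_000_000)
    print(plain, packed)
```

With slow flash, the bytes saved on the read side can outweigh the decompression cost; with fast flash or a slow CPU the balance tips the other way, so real measurements on the device decide.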