Bad block makes ubi_auto_attach read from wrong offset, fail to mount ubi #12895
Comments
Reading and writing at 0x520000 both work, and nandtest reports no errors, so I don't know why the kernel detects this block as bad.
Since none of nandtest, flash_erase, or dd (write and read) complains about offset 0x520000, I made a small test using this diff
boot log excerpt
The following string was manually written to mtd offset 0x560000. This string is unique across the whole mtd device, so it shows that ubi_auto_attach() actually reads from 0x560000 instead of from the ubi partition. What I don't understand is:
The bug is still there, but OpenWrt boots. Steps
The culprit seems to be a discrepancy between the BBT and the BBM. I modified the kernel code to print some info
I also modified nandtest from mtd-utils, deleting the read and write code so that it only tests whether a block is bad. nandtest calls ioctl(fd, MEMGETBADBLOCK, &test_ofs) for that purpose.
ioctl(MEMGETBADBLOCK) output from stock FW
ioctl(MEMGETBADBLOCK) output from OpenWrt 22.03.5
And the boot log, using the code above
As we can see, the bad blocks reported by the different methods don't match. Furthermore, I don't think block 43 is really bad. My guess is that UBI-related code marked block 43 bad, which is why, once OpenWrt has booted, it keeps booting even after flashing back to official OpenWrt. The bug is still there.
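For anyone who wants to repeat the bad-block check described above, here is a minimal sketch of a scanner that only asks the kernel whether each eraseblock is bad. It is not the modified nandtest from this comment. The function name `count_bad_blocks` and the device path in the usage note are hypothetical; `MEMGETINFO` and `MEMGETBADBLOCK` are the standard MTD character-device ioctls from `<mtd/mtd-user.h>`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for loff_t */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <mtd/mtd-user.h>

/*
 * Hypothetical helper: walk every eraseblock of an MTD character device
 * and print the ones the kernel reports as bad.
 * Returns the number of bad blocks, or -1 on any error.
 */
int count_bad_blocks(const char *mtd_path)
{
    struct mtd_info_user info;
    int bad = 0;

    int fd = open(mtd_path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Query total size and eraseblock size of the device. */
    if (ioctl(fd, MEMGETINFO, &info) != 0 || info.erasesize == 0) {
        close(fd);
        return -1;
    }

    for (loff_t ofs = 0; ofs < (loff_t)info.size; ofs += info.erasesize) {
        /* MEMGETBADBLOCK: > 0 means bad, 0 means good, < 0 means error. */
        int ret = ioctl(fd, MEMGETBADBLOCK, &ofs);
        if (ret < 0) {
            close(fd);
            return -1;
        }
        if (ret > 0) {
            printf("bad block %llu at offset 0x%llx\n",
                   (unsigned long long)(ofs / info.erasesize),
                   (unsigned long long)ofs);
            bad++;
        }
    }

    close(fd);
    return bad;
}
```

Calling `count_bad_blocks("/dev/mtd0")` (the path is an assumption, pick the device that covers the area of interest) under both the stock firmware and OpenWrt gives two lists that can be compared directly. Offsets are relative to the start of the opened device, so a per-partition device reports different block numbers than a whole-flash device.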
Did you try the master branch? The NAND driver issue should have been fixed recently.
The latest master branch still showed this message 11 times, so maybe it is not fixed.
Describe the bug
There is an unmarked bad block at 0x520000, in partition "kernel". When creating MTD partitions, this bad block is detected, but instead of marking it, the kernel tries to erase it. nand_erase_nand() rejects the erase.
Somehow this incident makes mtd_read() in ubi_auto_attach() read from the wrong offset, specifically one eraseblock earlier.
I used the following code to show magic[16], compared it to mtd dumps, and found that magic[16] is actually read from 0x560000 instead of 0x580000. 0x560000 is the last eraseblock of partition "kernel".
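The offsets above can be cross-checked with a little arithmetic. The sketch below assumes an eraseblock size of 0x20000 bytes, inferred from the one-eraseblock gap between 0x560000 and 0x580000, and assumes block numbers count from the start of the flash. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed eraseblock size: 0x580000 - 0x560000, the "one eraseblock" gap. */
#define ERASEBLOCK_SIZE 0x20000u

/* Index of the eraseblock that contains the given byte offset. */
uint32_t eraseblock_index(uint64_t offset)
{
    return (uint32_t)(offset / ERASEBLOCK_SIZE);
}

/*
 * Start offset of the eraseblock just before the one containing `offset`.
 * Returns 0 when there is no earlier eraseblock.
 */
uint64_t previous_eraseblock(uint64_t offset)
{
    uint64_t start = offset - (offset % ERASEBLOCK_SIZE);
    return start >= ERASEBLOCK_SIZE ? start - ERASEBLOCK_SIZE : 0;
}
```

Under these assumptions, 0x520000 is eraseblock 41, 0x560000 is eraseblock 43, and 0x580000 is eraseblock 44, so `previous_eraseblock(0x580000)` gives 0x560000, the offset the read actually lands on.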
Boot log, timestamp removed
OpenWrt version
22.03.5
OpenWrt target/subtarget
ramips/mt7621
Device
Linksys EA7500 v2
Image kind
Official downloaded image
Steps to reproduce
There is an unmarked bad block near the end of the kernel partition. Flash the firmware and boot; the boot fails.
Actual behaviour
ubi_auto_attach() reads from the last block of the kernel partition and tries to find the UBI magic there.
Expected behaviour
ubi_auto_attach() reads from the ubi partition and finds the UBI magic.
Additional info
No response
Diffconfig
No response