Linux 5.18 kernel unable to read from NAND on i.MX6Q phycore SOM #12

Closed
S4mw1s3 opened this issue Jun 27, 2022 · 5 comments

@S4mw1s3
Contributor

S4mw1s3 commented Jun 27, 2022

Hi,

I recently updated the Linux kernel (linux-fslc) on our i.MX6Q phycore SOM to version 5.18 and ran into problems. Specifically, the boot log shows that the bad block tables are no longer found. Compared with the boot log of a 5.15 kernel (5.18 first, then 5.15), the differences are:

[    0.000000] Linux version 5.18.3 (oe-user@oe-host) (arm-puppy-linux-gnueabi-gcc (GCC) 12.1.0, GNU ld (GNU Binutils) 2.38.20220516) #1 SMP Thu Jun 9 19:48:48 UTC 2022
...
[    2.519208] nand: device found, Manufacturer ID: 0x01, Chip ID: 0xd3
[    2.525807] nand: AMD/Spansion S34ML08G2
[    2.529786] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
[    2.544941] Bad block table not found for chip 0
[    2.553694] Bad block table not found for chip 0
[    2.558526] Scanning device for bad blocks
[    6.800208] Bad block table written to 0x00003ffe0000, version 0x01
[    6.808571] Bad block table written to 0x00003ffc0000, version 0x01
...
[    8.366735] ubi0: default fastmap pool size: 256
[    8.371471] ubi0: default fastmap WL pool size: 128
[    8.376441] ubi0: attaching mtd2
[    8.744289] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[    8.755985] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[    8.767567] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[    8.779105] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read 64 bytes
[    8.789112] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.18.3 #1
[    8.795054] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[    8.801598]  unwind_backtrace from show_stack+0x10/0x14
[    8.806854]  show_stack from dump_stack_lvl+0x80/0x90
[    8.811933]  dump_stack_lvl from ubi_io_read+0x134/0x2f8
[    8.817265]  ubi_io_read from ubi_io_read_ec_hdr+0x48/0x234
[    8.822856]  ubi_io_read_ec_hdr from scan_peb+0x68/0x7e4
[    8.828187]  scan_peb from ubi_attach+0x184/0x378
[    8.832909]  ubi_attach from ubi_attach_mtd_dev+0x580/0xb9c
[    8.838510]  ubi_attach_mtd_dev from ubi_init+0x164/0x224
[    8.843934]  ubi_init from do_one_initcall+0x68/0x428
[    8.849009]  do_one_initcall from kernel_init_freeable+0x178/0x224
[    8.855216]  kernel_init_freeable from kernel_init+0x14/0x140
[    8.860987]  kernel_init from ret_from_fork+0x14/0x28
[    8.866061] Exception stack(0xf083dfb0 to 0xf083dff8)
[    8.871129] dfa0:                                     00000000 00000000 00000000 00000000
[    8.879321] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    8.887511] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    8.894709] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read only 64 bytes, retry
[    8.906245] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read only 64 bytes, retry
[    8.917808] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read only 64 bytes, retry
[    8.929367] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read 64 bytes

[    0.000000] Linux version 5.15.32 (oe-user@oe-host) (arm-puppy-linux-gnueabi-gcc (GCC) 11.3.0, GNU ld (GNU Binutils) 2.38.20220313) #1 SMP Mon Apr 4 11:11:23 UTC 2022
...
[    2.555871] nand: device found, Manufacturer ID: 0x01, Chip ID: 0xd3
[    2.562486] nand: AMD/Spansion S34ML08G2
[    2.566466] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
[    2.578328] Bad block table found at page 524224, version 0x01
[    2.590426] Bad block table found at page 524160, version 0x01
...
[    4.150045] ubi0: default fastmap pool size: 256
[    4.154781] ubi0: default fastmap WL pool size: 128
[    4.159729] ubi0: attaching mtd2
[    4.377529] ubi0: attached by fastmap
[    4.381420] ubi0: fastmap pool size: 256
[    4.385367] ubi0: fastmap WL pool size: 128
[    4.428856] ubi0: attached mtd2 (name "root", size 1007 MiB)
[    4.434635] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[    4.441588] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[    4.448398] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[    4.455447] ubi0: good PEBs: 8052, bad PEBs: 4, corrupted PEBs: 0
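As a quick sanity check on the two logs, the page numbers reported by 5.15 can be converted to byte offsets and compared with the offsets 5.18 wrote to. The sketch below uses only numbers taken from the logs above; the variable and function names are made up for illustration. The offsets turn out to be identical, which would be consistent with 5.18 failing to read the existing tables and then writing new ones over them. That is an inference from the logs, not something confirmed in this thread.

```python
# All constants are copied from the boot logs above; names are illustrative.
PAGE_SIZE = 2048          # "page size: 2048"
PEB_SIZE = 128 * 1024     # "erase size: 128 KiB"

# Pages at which the 5.15 kernel found the bad block tables.
bbt_pages_5_15 = [524224, 524160]
# Byte offsets to which the 5.18 kernel wrote new bad block tables.
bbt_offsets_5_18 = [0x3ffe0000, 0x3ffc0000]


def page_to_offset(page, page_size=PAGE_SIZE):
    """Convert a NAND page number to a byte offset."""
    return page * page_size


offsets_5_15 = [page_to_offset(p) for p in bbt_pages_5_15]
for page, off in zip(bbt_pages_5_15, offsets_5_15):
    print(f"page {page} -> offset {off:#010x}")

same_location = offsets_5_15 == bbt_offsets_5_18
print("5.18 wrote to the locations 5.15 read from:", same_location)

# The UBI geometry printed by 5.15 is self-consistent as well:
# each PEB loses 4096 bytes to headers ("data offset: 4096").
leb_size = PEB_SIZE - 4096
# "good PEBs: 8052, bad PEBs: 4" on a partition of "size 1007 MiB"
size_mib = (8052 + 4) * PEB_SIZE // (1024 * 1024)
print("LEB size:", leb_size, "bytes; partition size:", size_mib, "MiB")
```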

I tested the above on 2 devices and both had the same issue so I don't think it's device specific. Also, barebox is still able to read from NAND so the flash is not broken or anything.

With kernel 5.18, after a "failed" boot, barebox also starts to complain about bad block tables not being found:

barebox 2021.04.0 #1 Wed Jun 8 07:54:17 UTC 2022

Bad block table not found for chip 0
Bad block table not found for chip 0

So the 5.18 kernel appears to have problems both reading from and writing to NAND: it cannot find the existing bad block tables, and the tables it writes are not recognized by barebox afterwards.

Could somebody try the 5.18 kernel on an i.MX6Q phycore SOM with NAND and confirm the above issue, or is it maybe an issue on our side (incorrect device tree)?
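For anyone trying to reproduce this, a saved boot log can be checked for the two symptoms described above: missing bad block tables and UBI reads that end in an ECC error. This is only a rough sketch. The regular expressions are derived from the message formats in the logs above, and the function name and the structure of the result are made up for illustration.

```python
import re

# Patterns derived from the log lines quoted in this issue.
BBT_FOUND = re.compile(r"Bad block table found at page (\d+)")
BBT_MISSING = re.compile(r"Bad block table not found for chip \d+")
# Only the final "error:" line is matched, not the "warning: ... retry" lines.
UBI_ECC = re.compile(
    r"ubi\d+ error: ubi_io_read: error -74 \(ECC error\) "
    r"while reading \d+ bytes from PEB (\d+):\d+"
)


def summarize_boot_log(text):
    """Collect the NAND-related symptoms from a kernel boot log."""
    summary = {"bbt_found_pages": [], "bbt_missing": 0, "ecc_failed_pebs": []}
    for line in text.splitlines():
        found = BBT_FOUND.search(line)
        if found:
            summary["bbt_found_pages"].append(int(found.group(1)))
        elif BBT_MISSING.search(line):
            summary["bbt_missing"] += 1
        else:
            ecc = UBI_ECC.search(line)
            if ecc:
                summary["ecc_failed_pebs"].append(int(ecc.group(1)))
    return summary


# A few lines copied from the 5.18 log above.
SAMPLE = """\
[    2.544941] Bad block table not found for chip 0
[    2.553694] Bad block table not found for chip 0
[    8.767567] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[    8.779105] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read 64 bytes
[    8.929367] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read 64 bytes
"""

result = summarize_boot_log(SAMPLE)
print(result)
```

A healthy log, like the 5.15.32 one above, would be expected to give a non-empty `bbt_found_pages` list and an empty `ecc_failed_pebs` list.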

@smk-embedded
Collaborator

Thanks for the report! We will have a look at the issue.

@S4mw1s3
Contributor Author

S4mw1s3 commented Jun 29, 2022

In the meantime I switched back to LTS 5.15 and am now seeing the same issue on the 5.15.48 kernel. After investigating, I found that reverting just the commit "mtd: rawnand: gpmi: fix controller timings setting" solved the NAND issue.

Coincidentally I just stumbled on https://lore.kernel.org/linux-mtd/20220614083138.3455683-1-s.hauer@pengutronix.de/ which I assume will fix the issue (still need to test). I see that commit has just been added to kernels 5.15.51 and 5.18.8 so I think from those kernels on, the phycore i.MX6 NAND SOM will boot again.

@smk-embedded
Collaborator

We can reproduce the issue with the 5.15 upstream kernel.
If we revert "mtd: rawnand: gpmi: fix controller timings setting" the device boots and NAND works.
Applying "mtd: rawnand: gpmi: Fix setting busy timeout setting", as suggested, doesn't fix the problem. The device boots, but NAND tests still show errors. We are investigating on the current LTS v5.15.51.

@S4mw1s3
Contributor Author

S4mw1s3 commented Jul 1, 2022

After doing some additional testing I can confirm your observations. In fact, the issue I encounter is that mounting another ubifs volume fails:

[   11.498822] UBIFS (ubi0:4): Mounting in unauthenticated mode
[   11.506009] UBIFS (ubi0:4): background thread "ubifs_bgt0_4" started, PID 267
[   11.550442] UBIFS error (ubi0:4 pid 266): ubifs_unpack_nnode: invalid type (15) in LPT node type 1
[   11.559750] CPU: 0 PID: 266 Comm: mount Tainted: G           O      5.15.48 #1
[   11.567004] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   11.573556] [<c01107b8>] (unwind_backtrace) from [<c010b99c>] (show_stack+0x10/0x14)
[   11.581338] [<c010b99c>] (show_stack) from [<c07f17bc>] (dump_stack_lvl+0x80/0x90)
[   11.588932] [<c07f17bc>] (dump_stack_lvl) from [<c03b1b68>] (ubifs_unpack_nnode+0x110/0x134)
[   11.597403] [<c03b1b68>] (ubifs_unpack_nnode) from [<c03b1ec4>] (ubifs_read_nnode+0x1a0/0x21c)
[   11.606039] [<c03b1ec4>] (ubifs_read_nnode) from [<c03b251c>] (ubifs_lpt_lookup_dirty+0x1d8/0x288)
[   11.615024] [<c03b251c>] (ubifs_lpt_lookup_dirty) from [<c03a5b4c>] (ubifs_replay_journal+0x44/0x1570)
[   11.624356] [<c03a5b4c>] (ubifs_replay_journal) from [<c039ab94>] (ubifs_mount+0x468/0x1614)
[   11.632816] [<c039ab94>] (ubifs_mount) from [<c03155a0>] (legacy_get_tree+0x24/0x4c)
[   11.640589] [<c03155a0>] (legacy_get_tree) from [<c02d1598>] (vfs_get_tree+0x24/0xe8)
[   11.648446] [<c02d1598>] (vfs_get_tree) from [<c02fdd1c>] (path_mount+0x2cc/0xb68)
[   11.656039] [<c02fdd1c>] (path_mount) from [<c02feb7c>] (sys_mount+0x178/0x288)
[   11.663367] [<c02feb7c>] (sys_mount) from [<c0100080>] (ret_fast_syscall+0x0/0x1c)

Mounting the rootfs (also ubifs) does work though (after applying "mtd: rawnand: gpmi: Fix setting busy timeout setting") but there is indeed still something going on :(

The above issues were not encountered in 5.15.32.

@smk-embedded
Collaborator

The error has been reported upstream and will be fixed by Sascha Hauer:
https://lore.kernel.org/all/20220701091909.GE2387@pengutronix.de/
We will wait for the upstream solution.
