Linux 5.18 kernel unable to read from NAND on i.MX6Q phycore SOM #12

S4mw1s3 · 2022-06-27T07:17:50Z

Hi,

I recently updated the Linux kernel (linux-fslc) on our i.MX6Q phycore SOM to version 5.18 and have run into NAND issues. Specifically, the boot log shows that the bad block tables are no longer found, and UBI then reports ECC errors while attaching. For comparison, here are the relevant parts of the 5.18 boot log, followed by the same parts from a 5.15 kernel:

[    0.000000] Linux version 5.18.3 (oe-user@oe-host) (arm-puppy-linux-gnueabi-gcc (GCC) 12.1.0, GNU ld (GNU Binutils) 2.38.20220516) #1 SMP Thu Jun 9 19:48:48 UTC 2022
...
[    2.519208] nand: device found, Manufacturer ID: 0x01, Chip ID: 0xd3
[    2.525807] nand: AMD/Spansion S34ML08G2
[    2.529786] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
[    2.544941] Bad block table not found for chip 0
[    2.553694] Bad block table not found for chip 0
[    2.558526] Scanning device for bad blocks
[    6.800208] Bad block table written to 0x00003ffe0000, version 0x01
[    6.808571] Bad block table written to 0x00003ffc0000, version 0x01
...
[    8.366735] ubi0: default fastmap pool size: 256
[    8.371471] ubi0: default fastmap WL pool size: 128
[    8.376441] ubi0: attaching mtd2
[    8.744289] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[    8.755985] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[    8.767567] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[    8.779105] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read 64 bytes
[    8.789112] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.18.3 #1
[    8.795054] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[    8.801598]  unwind_backtrace from show_stack+0x10/0x14
[    8.806854]  show_stack from dump_stack_lvl+0x80/0x90
[    8.811933]  dump_stack_lvl from ubi_io_read+0x134/0x2f8
[    8.817265]  ubi_io_read from ubi_io_read_ec_hdr+0x48/0x234
[    8.822856]  ubi_io_read_ec_hdr from scan_peb+0x68/0x7e4
[    8.828187]  scan_peb from ubi_attach+0x184/0x378
[    8.832909]  ubi_attach from ubi_attach_mtd_dev+0x580/0xb9c
[    8.838510]  ubi_attach_mtd_dev from ubi_init+0x164/0x224
[    8.843934]  ubi_init from do_one_initcall+0x68/0x428
[    8.849009]  do_one_initcall from kernel_init_freeable+0x178/0x224
[    8.855216]  kernel_init_freeable from kernel_init+0x14/0x140
[    8.860987]  kernel_init from ret_from_fork+0x14/0x28
[    8.866061] Exception stack(0xf083dfb0 to 0xf083dff8)
[    8.871129] dfa0:                                     00000000 00000000 00000000 00000000
[    8.879321] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    8.887511] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    8.894709] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read only 64 bytes, retry
[    8.906245] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read only 64 bytes, retry
[    8.917808] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read only 64 bytes, retry
[    8.929367] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read 64 bytes

[    0.000000] Linux version 5.15.32 (oe-user@oe-host) (arm-puppy-linux-gnueabi-gcc (GCC) 11.3.0, GNU ld (GNU Binutils) 2.38.20220313) #1 SMP Mon Apr 4 11:11:23 UTC 2022
...
[    2.555871] nand: device found, Manufacturer ID: 0x01, Chip ID: 0xd3
[    2.562486] nand: AMD/Spansion S34ML08G2
[    2.566466] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
[    2.578328] Bad block table found at page 524224, version 0x01
[    2.590426] Bad block table found at page 524160, version 0x01
...
[    4.150045] ubi0: default fastmap pool size: 256
[    4.154781] ubi0: default fastmap WL pool size: 128
[    4.159729] ubi0: attaching mtd2
[    4.377529] ubi0: attached by fastmap
[    4.381420] ubi0: fastmap pool size: 256
[    4.385367] ubi0: fastmap WL pool size: 128
[    4.428856] ubi0: attached mtd2 (name "root", size 1007 MiB)
[    4.434635] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[    4.441588] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[    4.448398] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[    4.455447] ubi0: good PEBs: 8052, bad PEBs: 4, corrupted PEBs: 0

I tested this on two devices and both show the same issue, so I don't think it is device-specific. Barebox is also still able to read from NAND, so the flash itself does not appear to be broken.
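For anyone comparing boot logs from several devices, the failed UBI reads can be summarized per PEB with a small script. This is only a sketch: the function name `summarize_ubi_read_errors` and the regular expression are my own, written against the log lines quoted above, and the errno table is a hand-copied subset of the generic Linux values (74 is EBADMSG, which MTD uses for uncorrectable ECC errors).

```python
import re
from collections import Counter

# Subset of Linux errno values (asm-generic), hard-coded so the
# sketch does not depend on the errno numbering of the host OS.
LINUX_ERRNO = {5: "EIO", 74: "EBADMSG", 117: "EUCLEAN"}

# Matches only the final "ubi0 error: ubi_io_read: ..." lines, not the
# "ubi0 warning: ... retry" lines that precede them.
UBI_READ_ERROR = re.compile(
    r"ubi(?P<ubi>\d+) error: ubi_io_read: error -(?P<errno>\d+) "
    r".* from PEB (?P<peb>\d+):(?P<offset>\d+)"
)


def summarize_ubi_read_errors(log_text):
    """Return a Counter keyed by (PEB number, errno name) for failed UBI reads."""
    summary = Counter()
    for line in log_text.splitlines():
        match = UBI_READ_ERROR.search(line)
        if match:
            code = int(match.group("errno"))
            name = LINUX_ERRNO.get(code, str(code))
            summary[(int(match.group("peb")), name)] += 1
    return summary


# Three lines copied from the 5.18 boot log above.
SAMPLE = """\
[    8.767567] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
[    8.779105] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read 64 bytes
[    8.929367] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read 64 bytes
"""

if __name__ == "__main__":
    for (peb, name), count in sorted(summarize_ubi_read_errors(SAMPLE).items()):
        print(f"PEB {peb}: {count} failed read(s), {name}")
```

Feeding it the full output of `dmesg` instead of `SAMPLE` should show whether every PEB fails or only some of them.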

With kernel 5.18, after a "failed" boot, barebox also starts to complain about bad block tables not being found:

barebox 2021.04.0 #1 Wed Jun 8 07:54:17 UTC 2022

Bad block table not found for chip 0
Bad block table not found for chip 0

So it looks like the 5.18 kernel has problems with both reading from and writing to NAND.
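One detail worth checking in the two kernel logs: 5.15 reports the bad block tables by page number, while 5.18 reports the newly written ones by byte offset. A quick calculation, using the page and erase sizes from the `nand:` line, suggests they are the same two blocks at the end of the chip. That would be consistent with 5.18 having rewritten the tables in place, which may be why barebox no longer finds them, but that last part is my assumption and not something the logs prove.

```python
# Geometry taken from the "nand:" line in the boot logs above.
PAGE_SIZE = 2048                 # "page size: 2048"
ERASE_SIZE = 128 * 1024          # "erase size: 128 KiB"
CHIP_SIZE = 1024 * 1024 * 1024   # "1024 MiB"


def page_to_offset(page):
    """Convert a NAND page number to a byte offset."""
    return page * PAGE_SIZE


def offset_to_block(offset):
    """Convert a byte offset to an erase block index."""
    return offset // ERASE_SIZE


# 5.15: "Bad block table found at page ..."
found_pages_5_15 = [524224, 524160]
# 5.18: "Bad block table written to 0x..."
written_offsets_5_18 = [0x3FFE0000, 0x3FFC0000]

if __name__ == "__main__":
    total_blocks = CHIP_SIZE // ERASE_SIZE
    for page, written in zip(found_pages_5_15, written_offsets_5_18):
        offset = page_to_offset(page)
        block = offset_to_block(offset)
        same = "same location" if offset == written else "different location"
        print(
            f"page {page} -> offset {offset:#010x}, "
            f"block {block} of {total_blocks}: {same} as {written:#010x}"
        )
```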

Could somebody try the 5.18 kernel on an i.MX6Q phycore SOM with NAND and confirm the above issue, or is it maybe an issue on our side (incorrect device tree)?

smk-embedded · 2022-06-27T07:58:19Z

Thanks for the report! We will have a look at the issue.

S4mw1s3 · 2022-06-29T09:55:35Z

In the meantime I switched back to LTS 5.15 and I am now seeing the same issue on the 5.15.48 kernel. After investigating, I found that reverting only the commit "mtd: rawnand: gpmi: fix controller timings setting" solves the NAND issue.

Coincidentally I just stumbled on https://lore.kernel.org/linux-mtd/20220614083138.3455683-1-s.hauer@pengutronix.de/ which I assume will fix the issue (still need to test). I see that commit has just been added to kernels 5.15.51 and 5.18.8 so I think from those kernels on, the phycore i.MX6 NAND SOM will boot again.

smk-embedded · 2022-06-30T13:33:24Z

We can reproduce the issue with the 5.15 upstream kernel.
If we revert "mtd: rawnand: gpmi: fix controller timings setting" the device boots and NAND works.
Applying "mtd: rawnand: gpmi: Fix setting busy timeout setting", as suggested, doesn't fix the problem: the device boots, but NAND tests still show errors. We are investigating on the current LTS v5.15.51.

S4mw1s3 · 2022-07-01T09:33:35Z

After doing some additional testing I can confirm your observations. In fact, the issue I encounter is that mounting another ubifs volume fails:

[   11.498822] UBIFS (ubi0:4): Mounting in unauthenticated mode
[   11.506009] UBIFS (ubi0:4): background thread "ubifs_bgt0_4" started, PID 267
[   11.550442] UBIFS error (ubi0:4 pid 266): ubifs_unpack_nnode: invalid type (15) in LPT node type 1
[   11.559750] CPU: 0 PID: 266 Comm: mount Tainted: G           O      5.15.48 #1
[   11.567004] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   11.573556] [<c01107b8>] (unwind_backtrace) from [<c010b99c>] (show_stack+0x10/0x14)
[   11.581338] [<c010b99c>] (show_stack) from [<c07f17bc>] (dump_stack_lvl+0x80/0x90)
[   11.588932] [<c07f17bc>] (dump_stack_lvl) from [<c03b1b68>] (ubifs_unpack_nnode+0x110/0x134)
[   11.597403] [<c03b1b68>] (ubifs_unpack_nnode) from [<c03b1ec4>] (ubifs_read_nnode+0x1a0/0x21c)
[   11.606039] [<c03b1ec4>] (ubifs_read_nnode) from [<c03b251c>] (ubifs_lpt_lookup_dirty+0x1d8/0x288)
[   11.615024] [<c03b251c>] (ubifs_lpt_lookup_dirty) from [<c03a5b4c>] (ubifs_replay_journal+0x44/0x1570)
[   11.624356] [<c03a5b4c>] (ubifs_replay_journal) from [<c039ab94>] (ubifs_mount+0x468/0x1614)
[   11.632816] [<c039ab94>] (ubifs_mount) from [<c03155a0>] (legacy_get_tree+0x24/0x4c)
[   11.640589] [<c03155a0>] (legacy_get_tree) from [<c02d1598>] (vfs_get_tree+0x24/0xe8)
[   11.648446] [<c02d1598>] (vfs_get_tree) from [<c02fdd1c>] (path_mount+0x2cc/0xb68)
[   11.656039] [<c02fdd1c>] (path_mount) from [<c02feb7c>] (sys_mount+0x178/0x288)
[   11.663367] [<c02feb7c>] (sys_mount) from [<c0100080>] (ret_fast_syscall+0x0/0x1c)

Mounting the rootfs (also ubifs) does work though (after applying "mtd: rawnand: gpmi: Fix setting busy timeout setting") but there is indeed still something going on :(

The above issues were not encountered in 5.15.32.

smk-embedded · 2022-07-01T12:00:01Z

The error has been reported upstream and will be fixed by Sascha Hauer:
https://lore.kernel.org/all/20220701091909.GE2387@pengutronix.de/
We will wait for the upstream solution.

smk-embedded closed this as completed Jul 1, 2022
