ramips/mt7621: SQUASHFS filesystem corruption #9085

Open
openwrt-bot opened this issue Oct 20, 2021 · 28 comments
Labels
bug flyspray kernel release/21.02 release/22.03 target/ramips

Comments

@openwrt-bot

openwrt-bot commented Oct 20, 2021

crowston:

Supply the following if possible:

  • Device problem occurs on

Western Digital My Net N750

  • Software versions of OpenWrt/LEDE release, packages, etc.

openwrt-21.02.0

strongswan, dnscrypt-proxy2, avahi-utils, luci-app-ddns

  • Steps to reproduce

I installed openwrt-21.02.0-ath79-generic-wd_mynet-n750-squashfs-sysupgrade.bin on a Western Digital My Net N750 that had been running openwrt-19.

The router seemed okay initially but after power cycling, it started reporting errors:

Oct 17 12:20:37 router2 kernel: [ 38.613970] SQUASHFS error: xz decompression failed, data probably corrupt
Oct 17 12:20:37 router2 kernel: [ 38.621029] SQUASHFS error: squashfs_read_data failed to read block 0x23686e
Oct 17 12:20:37 router2 kernel: [ 38.628199] SQUASHFS error: Unable to read fragment cache entry [23686e]
Oct 17 12:20:37 router2 kernel: [ 38.635010] SQUASHFS error: Unable to read page, block 23686e, size 16b28

The filesystem problem would leave some random file damaged, so different services would fail. Over time, the router became less and less functional as various files became inaccessible and after a few cycles, wouldn't boot at all.

I wondered if there was a problem with my old configuration on the new release (though I'm not sure how that could damage the squashfs), so I reinstalled a few more times in different ways, e.g., doing a factory install (openwrt-21.02.0-ath79-generic-wd_mynet-n750-squashfs-factory.bin and then the upgrade) instead of just the upgrade, and configuring from scratch rather than from the backup. But each time I had the same problem with the router.

It wasn't the same block on different installs, I noticed, but it seemed to be consistent for a particular installation attempt.

Oct 17 16:11:14 router2 kernel: [ 53.182571] SQUASHFS error: xz decompression failed, data probably corrupt
Oct 17 16:11:14 router2 kernel: [ 53.189582] SQUASHFS error: squashfs_read_data failed to read block 0x21e9e6
Oct 17 16:11:14 router2 kernel: [ 53.196749] SQUASHFS error: Unable to read fragment cache entry [21e9e6]
Oct 17 16:11:14 router2 kernel: [ 53.203559] SQUASHFS error: Unable to read page, block 21e9e6, size fd9c

Once there were two blocks (I think this is a reboot of the install above):

Oct 17 16:29:04 router2 kernel: [ 78.505075] SQUASHFS error: xz decompression failed, data probably corrupt
Oct 17 16:29:04 router2 kernel: [ 78.512103] SQUASHFS error: squashfs_read_data failed to read block 0x1e6e76
Oct 17 16:29:05 router2 kernel: [ 79.111366] SQUASHFS error: xz decompression failed, data probably corrupt
Oct 17 16:29:05 router2 kernel: [ 79.118386] SQUASHFS error: squashfs_read_data failed to read block 0x21e9e6
Oct 17 16:29:05 router2 kernel: [ 79.125565] SQUASHFS error: Unable to read fragment cache entry [21e9e6]
Oct 17 16:29:05 router2 kernel: [ 79.132445] SQUASHFS error: Unable to read page, block 21e9e6, size fd9c
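
To check whether the same block keeps failing within one installation, the log lines can be tallied per block offset. This is only a minimal sketch in Python, assuming syslog lines in the format shown above; the name `failing_blocks` is illustrative and not part of any OpenWrt tool.

```python
import re
from collections import Counter

# Matches both message variants that appear in this thread:
#   "squashfs_read_data failed to read block 0x21e9e6"
#   "Failed to read block 0x283332: -5"
BLOCK_RE = re.compile(
    r"SQUASHFS error: .*[Ff]ailed to read block 0x([0-9a-fA-F]+)"
)

def failing_blocks(lines):
    """Count how often each squashfs block offset fails to read."""
    counts = Counter()
    for line in lines:
        match = BLOCK_RE.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts
```

Feeding it the saved log of each boot and comparing the resulting offsets would show whether a given install always fails at the same place.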

One time there was first a jffs error, followed by lots of squashfs errors. Sorry, I don't have the log for that one.

I now realize that I should have tried power cycling a clean install a few times to see if there were errors right away or if they only happened after files were installed/changed.

To check whether the router was just having a hardware problem, I reinstalled openwrt-19.07.8 and configured it the same way. I have not seen any errors after a few power cycles, which points to a problem with the new release. I did not find any bug reports on this tracker that mention squashfs problems, and googling did not turn up any useful discussions, hence this bug report.

I guess it could be that the new release uses a bad bit of memory that the earlier release managed to miss. I looked for a memory test utility but didn't find one, so I don't know how to examine that possibility. However, the fact that different blocks failed on each install makes a hardware problem seem less likely.

@openwrt-bot
Author

openwrt-bot commented Oct 22, 2021

crowston:

I tried installing on a different router and after a few power cycles saw the same SQUASHFS errors, suggesting it's not just bad memory:

Fri Oct 22 11:30:14 2021 kern.err kernel: [ 97.569402] SQUASHFS error: xz decompression failed, data probably corrupt
Fri Oct 22 11:30:14 2021 kern.err kernel: [ 97.576445] SQUASHFS error: squashfs_read_data failed to read block 0x4b5872
Fri Oct 22 11:30:14 2021 kern.err kernel: [ 97.584696] SQUASHFS error: xz decompression failed, data probably corrupt
Fri Oct 22 11:30:14 2021 kern.err kernel: [ 97.591837] SQUASHFS error: squashfs_read_data failed to read block 0x4b5872

But most of the time it seems to work fine.

@openwrt-bot
Author

openwrt-bot commented Oct 24, 2021

M95D:

I have this exact problem with WRT1900ACv1, OpenWRT built from git master. It won't boot at all with the new firmware.

@openwrt-bot
Author

openwrt-bot commented Oct 24, 2021

M95D:

More debugging:

Apparently, the image is not correctly written to flash. Reading back the squashfs and trying to mount it on an x86 Gentoo Linux machine gives the same decompression errors.

See attachment for details.
The router is booted from the working firmware (mtd5); mtd7 holds the new, defective firmware.

@openwrt-bot
Author

openwrt-bot commented Oct 24, 2021

M95D:

Even more debugging:

I extracted the squashfs from the original firmware image that was uploaded to the router and compared it with the one read back from flash. They are identical, except for some extra 0xFF bytes at the end (the ubifs read back from the router's mtd is larger, probably because it extends until the end of the erase block).

So, it's not a flash write issue, and it's not a hardware defect.
Both squashfs images can be extracted with the unsquashfs tool without any errors. So, there must be something wrong with the kernel xz decompressor. This affects both my router and my x64 Gentoo machine. Both kernels are v5.10.
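
The comparison described above (identical content, with only 0xFF padding after the end of the original) can be expressed as a small check. This is a sketch in Python, not the procedure actually used here; the function name `matches_with_ff_padding` is made up for illustration.

```python
def matches_with_ff_padding(original: bytes, readback: bytes) -> bool:
    """Return True if readback is original followed only by 0xFF bytes.

    Erased flash reads back as 0xFF, so a dump that runs to the end of
    an erase block is expected to carry such a tail.
    """
    if len(readback) < len(original):
        return False
    if readback[:len(original)] != original:
        return False
    # Everything past the original image must be erased-flash filler.
    return all(byte == 0xFF for byte in readback[len(original):])
```

Both arguments would be the raw file contents, for example read with `open(path, "rb").read()` from the squashfs extracted out of the firmware image and from the dump of the mtd partition.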

@openwrt-bot
Author

openwrt-bot commented Oct 29, 2021

M95D:

It seems that the ARM BCJ filter decoder is needed in the kernel, even on the desktop. Having only the x86 BCJ filter decoder won't help.

Maybe there should be a warning somewhere to alert users who alter the default kernel config.
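
A quick way to see which BCJ filter decoders a kernel config leaves out is to scan it for the `CONFIG_XZ_DEC_*` options from the kernel's `lib/xz/Kconfig`. The following Python sketch assumes the config is available as plain text; the helper name `missing_bcj_decoders` is hypothetical. It only inspects the configuration and says nothing about whether a missing decoder is the cause of a given error.

```python
# BCJ filter decoder options defined in lib/xz/Kconfig (as of Linux 5.10).
BCJ_OPTIONS = (
    "CONFIG_XZ_DEC_X86",
    "CONFIG_XZ_DEC_POWERPC",
    "CONFIG_XZ_DEC_IA64",
    "CONFIG_XZ_DEC_ARM",
    "CONFIG_XZ_DEC_ARMTHUMB",
    "CONFIG_XZ_DEC_SPARC",
)

def missing_bcj_decoders(config_text):
    """List the BCJ decoder options that are not set to 'y'."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_XZ_DEC_ARM is not set" are comments.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [opt for opt in BCJ_OPTIONS if opt not in enabled]
```

The text can come from the `.config` in the build tree, or from `/proc/config.gz` on a running system if the kernel was built with `CONFIG_IKCONFIG_PROC`.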

@openwrt-bot
Author

openwrt-bot commented Nov 17, 2021

brianmercer:

My WD Mynet N750 is also unstable and also displays these same errors in the log.

@openwrt-bot
Author

openwrt-bot commented Nov 30, 2021

danak6jq:

I am also seeing this on a WD MyNet N750, starting with 21.02.1. I made an attempt to build a kernel/image with ARM BCJ pinned to the kernel and it did not make a difference.

@ShapeShifter499

ShapeShifter499 commented Mar 1, 2022

I'm seeing this issue with a fresh download of 21.02.2 from https://firmware-selector.openwrt.org/?version=21.02.2&target=ath79%2Fgeneric&id=wd_mynet-n750

I also have a WD MyNet N750

@M95D
Contributor

M95D commented Mar 7, 2022

Someone found the true problem:
https://forum.openwrt.org/t/patch-squashfs-data-probably-corrupt/70480

@EccoB

EccoB commented Mar 10, 2022

I also ran into the issue after updating my WD MyNet N750 to 21.02.2 r16495-bf0c965af0 from a 19.x version. After about five days I get a high CPU load and the same read errors:

kern.err kernel: [ 1177.557521] SQUASHFS error: Unable to read fragment cache entry [270732]
kern.err kernel: [ 1177.564383] SQUASHFS error: Unable to read page, block 270732, size 137e8

I re-flashed the version and for the moment it works fine again.

@ShapeShifter499

ShapeShifter499 commented Mar 12, 2022

@EccoB have you power cycled it yet?

I find it weird that it can run initially but that, at least in my experience, a power cycle causes issues. I never had that issue with OpenWRT 19.X.

@EccoB

EccoB commented Mar 12, 2022

@ShapeShifter499 Until now I had not, and there had been no errors.
After just one reboot, though, the logs look pretty bad, indicating corrupted memory. If I find an old 19.x version, I will try to revert to it. If the flash memory is really at fault, I would expect similar problems with the old version.

[    7.054434] IPv6: ADDRCONF(NETDEV_CHANGE): eth0.1: link becomes ready
[   11.305538] jffs2: error: (599) verify_xattr_ref: node CRC failed at 0x8089c8, read=0xfd7fff7b, calc=0x8445ca05
[   11.315861] jffs2: error: (599) verify_xattr_ref: node CRC failed at 0x8088f0, read=0xfd7aef7f, calc=0xae5ef7f6
[   11.326169] jffs2: error: (599) verify_xattr_ref: node CRC failed at 0x80876c, read=0xfd7be7fb, calc=0xa93333ef
[   11.336466] jffs2: error: (599) verify_xattr_ref: node CRC failed at 0x803dec, read=0xff7bfffe, calc=0xdec45f10
[   11.346762] jffs2: error: (599) verify_xattr_ref: node CRC failed at 0x803d14, read=0xfd7af7fb, calc=0xf3b2a6fa
[   11.357142] jffs2: error: (599) verify_xattr_ref: node CRC failed at 0x803b90, read=0xfdfeef7e, calc=0xf3b2a6fa
[   11.367775] jffs2: notice: (599) jffs2_build_xattr_subsystem: complete building xattr subsystem, 24 of xdatum (14 unchecked, 7 orphan) and 25 of xref (3 dead, 0 orphan) found.
[   11.384407] jffs2: notice: (599) jffs2_get_inode_nodes: Node header CRC failed at 0x8088ac. {fdff,e77a,fd7ae77e,fdffe77e}
[   11.397055] mount_root: switching to jffs2 overlay
[   11.404605] jffs2: error: (600) do_verify_xattr_datum: node CRC failed at 0x80897c, read=0xfdfef7ff, calc=0x21946102
[   11.415369] jffs2: error: (600) do_verify_xattr_datum: node CRC failed at 0x808720, read=0xfdfef7ff, calc=0xd89d70f5
[   11.426105] jffs2: error: (600) do_verify_xattr_datum: node CRC failed at 0x803da0, read=0xfdfef7ff, calc=0x4c9c347
[   11.436764] jffs2: error: (600) do_verify_xattr_datum: node CRC failed at 0x803b44, read=0xfdfef7ff, calc=0xf0b38b4a
[   11.448140] overlayfs: upper fs does not support tmpfile.
[   11.470901] jffs2: notice: (485) jffs2_get_inode_nodes: Node header CRC failed at 0x807520. {fdff,e77b,fd7ae77b,fd7eff7b}
[   11.482420] jffs2: notice: (485) jffs2_get_inode_nodes: Node header CRC failed at 0x807030. {fdff,e77b,fd7ae77e,ff7af7fa}
[...]
[   14.753171] crng init done
[   14.985636] jffs2: notice: (600) jffs2_get_inode_nodes: Node header CRC failed at 0x803ad4. {fdff,e77a,fd7ae77e,fdffe77e}
[   14.996784] jffs2: warning: (600) jffs2_do_read_inode_internal: no data nodes found for ino #48
[   15.005623] jffs2: Returned error for crccheck of ino #48. Expect badness...
[   15.205575] jffs2: Node CRC 4e4d8e0c != calculated CRC a2ebca7d for node at 0080a4b4
[   15.293646] jffs2: Node CRC a25051a8 != calculated CRC af4fc269 for node at 00008e5c
[   15.769624] jffs2: notice: (600) jffs2_get_inode_nodes: Node header CRC failed at 0x805c44. {fdff,e77b,fd7ae77a,fffee7fb}
[...]
[   27.067284] jffs2: notice: (600) jffs2_get_inode_nodes: Node header CRC failed at 0x8053e4. {fdff,e77a,fd7ae77e,fdffe77e}
[   27.389615] jffs2: notice: (600) jffs2_get_inode_nodes: Node header CRC failed at 0x8053a0. {fdff,e77a,fd7ae77e,fdffe77e}
[   27.652358] jffs2: notice: (600) jffs2_get_inode_nodes: Node header CRC failed at 0x80535c. {fdff,e77a,fd7ae77e,fdffe77e}
[   27.877672] jffs2: notice: (600) jffs2_get_inode_nodes: Node header CRC failed at 0x8052ac. {fdff,e77a,fd7ae7ff,fd7bfffa}
[   28.109609] jffs2: notice: (600) jffs2_get_inode_nodes: Node header CRC failed at 0x805230. {fdff,e77a,fd7ae77e,fdffe77e}
[   28.120748] jffs2: warning: (600) jffs2_do_read_inode_internal: no data nodes found for ino #60
[   28.129595] jffs2: Returned error for crccheck of ino #60. Expect badness...
[...]
[   30.220763] jffs2: warning: (600) jffs2_do_read_inode_internal: no data nodes found for ino #61
[...]
[   31.644770] jffs2: warning: (600) jffs2_do_read_inode_internal: no data nodes found for ino #62
[...]
[   33.164750] jffs2: warning: (600) jffs2_do_read_inode_internal: no data nodes found for ino #63
(at least 20 data nodes are bad)

@EccoB

EccoB commented Mar 12, 2022

The router was in a bad state (see my last post): LuCI reported that the password was not set (which shouldn't be the case), and there were lots of CRC errors.
Flashing the sysupgrade via the LuCI interface had no effect; the version stayed the same even after trying to flash an old OpenWrt 19 sysupgrade.

  • Reflashed firmware 21.02 via recovery mode (I did not have an old 19.x firmware file and was afraid of mixing an old sysupgrade with new firmware), then installed the sysupgrade via LuCI.
  • With the default configuration: checked the kernel logs, power cycled at least 3 times, and checked the logs again: everything good.
  • Restoring the config file via LuCI worked flawlessly. I already noticed one line in the kernel log, but cannot judge whether it is significant:
    [ 11.296985] jffs2: notice: (599) jffs2_build_xattr_subsystem: complete building xattr subsystem, 13 of xdatum (7 unchecked, 2 orphan) and 16 of xref (2 dead, 0 orphan) found.
  • Power Cycle
    [ 11.304601] jffs2: notice: (599) jffs2_build_xattr_subsystem: complete building xattr subsystem, 14 of xdatum (7 unchecked, 3 orphan) and 17 of xref (3 dead, 0 orphan) found.
  • Power Cycle
    [ 11.298356] jffs2: notice: (599) jffs2_build_xattr_subsystem: complete building xattr subsystem, 15 of xdatum (7 unchecked, 4 orphan) and 18 of xref (4 dead, 0 orphan) found.
  • Power Cycle, put it back in Network
    [ 11.293568] jffs2: notice: (599) jffs2_build_xattr_subsystem: complete building xattr subsystem, 16 of xdatum (7 unchecked, 5 orphan) and 19 of xref (5 dead, 0 orphan) found.
    Installed the mosquitto MQTT broker and spammed it with thousands of messages (I had the impression that the router used to become more unstable with it installed)
  • Reboot
    [ 11.307845] jffs2: notice: (599) jffs2_build_xattr_subsystem: complete building xattr subsystem, 25 of xdatum (18 unchecked, 6 orphan) and 31 of xref (6 dead, 0 orphan) found.
  • Everything still fine at the moment.

Over the next few days, I will monitor the behaviour and document any issues. If there is something I can do to help further investigation, please let me know.
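
To follow how the jffs2 xattr counters in the list above change from one power cycle to the next, the `jffs2_build_xattr_subsystem` notice can be parsed into numbers. This is a rough Python sketch that assumes the message format shown in this comment; `xattr_summary` and the dictionary keys are names chosen for illustration. Whether the growing orphan and dead counts actually indicate a problem is not established here.

```python
import re

# Pattern for the counters in the jffs2_build_xattr_subsystem notice.
XATTR_RE = re.compile(
    r"(\d+) of xdatum \((\d+) unchecked, (\d+) orphan\) "
    r"and (\d+) of xref \((\d+) dead, (\d+) orphan\)"
)

KEYS = (
    "xdatum", "xdatum_unchecked", "xdatum_orphan",
    "xref", "xref_dead", "xref_orphan",
)

def xattr_summary(line):
    """Return the six counters as a dict, or None if the line does not match."""
    match = XATTR_RE.search(line)
    if match is None:
        return None
    return dict(zip(KEYS, (int(value) for value in match.groups())))
```

Collecting one such dict per boot makes it easy to compare the counters across power cycles.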

@lpyparmentier

lpyparmentier commented Mar 14, 2022

Hello, I recently went through the same issue with my edgerouter-x:

root@edgerouterx:~# cat /etc/openwrt_release 
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='21.02.0'
DISTRIB_REVISION='r16279-5cc0535800'
DISTRIB_TARGET='ramips/mt7621'
DISTRIB_ARCH='mipsel_24kc'
DISTRIB_DESCRIPTION='OpenWrt 21.02.0 r16279-5cc0535800'
DISTRIB_TAINTS=''

Sorry, I reinstalled everything and did not take the time to save the logs; I will come back if it happens again. I did nothing special except disable the uhttpd service and reboot. Then I noticed that clients didn't get their IPs (DNS issue), and when I looked in the logs (dmesg) there were a lot of SQUASHFS errors.

ynezz added the release/21.02 and bug labels on Mar 14, 2022

@jlpapple

jlpapple commented Mar 27, 2022

Examples of various SQUASHFS, jffs2 errors from my N750, running the March 7 snapshot. I do not encounter any errors running 19.07.X

04:31:17 2022 kern.notice kernel: [ 0.000000] Linux version 5.10.103 (builder@buildhost) (mips-openwrt-linux-musl-gcc (OpenWrt GCC 11.2.0 r19069-98113220fa) 11.2.0, GNU ld (GNU Binutils) 2.37) #0 Mon Mar 7 20:44:53 2022
04:31:20 2022 kern.err kernel: [ 25.484464] SQUASHFS error: xz decompression failed, data probably corrupt
04:31:20 2022 kern.err kernel: [ 25.491513] SQUASHFS error: Failed to read block 0x283332: -5
04:31:20 2022 kern.err kernel: [ 25.497372] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:20 2022 kern.err kernel: [ 25.504166] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:20 2022 kern.err kernel: [ 25.511079] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:20 2022 kern.err kernel: [ 25.517888] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:20 2022 kern.err kernel: [ 25.524800] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:20 2022 kern.err kernel: [ 25.531606] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:20 2022 kern.err kernel: [ 25.538529] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:20 2022 kern.err kernel: [ 25.545322] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:20 2022 kern.err kernel: [ 25.552245] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:20 2022 kern.err kernel: [ 25.559058] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:21 2022 kern.err kernel: [ 25.800381] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:21 2022 kern.err kernel: [ 25.807246] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:21 2022 kern.err kernel: [ 25.832165] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:21 2022 kern.err kernel: [ 25.839033] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:21 2022 kern.err kernel: [ 25.856978] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:21 2022 kern.err kernel: [ 25.863789] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:21 2022 kern.err kernel: [ 25.888582] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:21 2022 kern.err kernel: [ 25.895399] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:45 2022 kern.err kernel: [ 50.532357] SQUASHFS error: xz decompression failed, data probably corrupt
04:31:45 2022 kern.err kernel: [ 50.539418] SQUASHFS error: Failed to read block 0x283332: -5
04:31:45 2022 kern.err kernel: [ 50.545253] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:45 2022 kern.err kernel: [ 50.552073] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:45 2022 daemon.notice netifd: wan (2443): udhcpc: broadcasting select for 67.82.48.98, server 167.206.148.47
04:31:46 2022 kern.err kernel: [ 50.756863] SQUASHFS error: xz decompression failed, data probably corrupt
04:31:46 2022 kern.err kernel: [ 50.763889] SQUASHFS error: Failed to read block 0x2bf5a: -5
04:31:46 2022 kern.err kernel: [ 50.769666] SQUASHFS error: Unable to read fragment cache entry [2bf5a]
04:31:46 2022 kern.err kernel: [ 50.776388] SQUASHFS error: Unable to read page, block 2bf5a, size 11c14
04:31:46 2022 kern.err kernel: [ 50.783227] SQUASHFS error: Unable to read fragment cache entry [2bf5a]
04:31:46 2022 kern.err kernel: [ 50.789944] SQUASHFS error: Unable to read page, block 2bf5a, size 11c14
04:31:46 2022 user.notice ucitrack: Setting up /etc/config/network reload dependency on /etc/config/dhcp
04:31:46 2022 kern.err kernel: [ 50.909423] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:46 2022 kern.err kernel: [ 50.916238] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:46 2022 user.notice ucitrack: Setting up /etc/config/wireless reload dependency on /etc/config/network
04:31:46 2022 daemon.notice netifd: wan (2443): udhcpc: lease of 67.82.48.98 obtained from 167.206.148.47, lease time 43200
04:31:46 2022 kern.err kernel: [ 51.006440] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:46 2022 kern.err kernel: [ 51.013306] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:46 2022 kern.err kernel: [ 51.098070] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:46 2022 kern.err kernel: [ 51.104883] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:46 2022 kern.err kernel: [ 51.185265] SQUASHFS error: Unable to read fragment cache entry [283332]
04:31:46 2022 kern.err kernel: [ 51.192140] SQUASHFS error: Unable to read page, block 283332, size 16968
04:31:46 2022 user.notice ucitrack: Setting up /etc/config/firewall reload dependency on /etc/config/luci-splash
04:31:47 2022 user.notice ucitrack: Setting up /etc/config/firewall reload dependency on /etc/config/qos
04:31:47 2022 kern.err kernel: [ 52.021806] SQUASHFS error: Unable to read fragment cache entry [2bf5a]
04:31:47 2022 kern.err kernel: [ 52.028579] SQUASHFS error: Unable to read page, block 2bf5a, size 11c14
04:31:47 2022 kern.err kernel: [ 52.035389] SQUASHFS error: Unable to read fragment cache entry [2bf5a]
04:31:47 2022 kern.err kernel: [ 52.042122] SQUASHFS error: Unable to read page, block 2bf5a, size 11c14
04:31:47 2022 user.notice ucitrack: Setting up /etc/config/firewall reload dependency on /etc/config/miniupnpd
04:31:47 2022 daemon.notice netifd: wan (2443): /lib/netifd/dhcp.script: line 22: ipcalc.sh: I/O error
04:31:47 2022 kern.err kernel: [ 52.275504] SQUASHFS error: Unable to read fragment cache entry [2bf5a]
04:31:47 2022 kern.err kernel: [ 52.282274] SQUASHFS error: Unable to read page, block 2bf5a, size 11c14
04:31:47 2022 user.notice ucitrack: Setting up /etc/config/firewall reload dependency on /etc/config/sqm
04:31:47 2022 daemon.notice netifd: wan (2443): /lib/netifd/dhcp.script: line 27: ipcalc.sh: I/O error
13:54:50 2022 kern.notice kernel: [ 679.887110] jffs2: notice: (1078) jffs2_get_inode_nodes: Node header CRC failed at 0x007378. {0b00,ffff,00000044,a4ef223e}
13:54:50 2022 user.info : luci: accepted login on /admin/network/firewall for root from 192.168.1.174
13:54:50 2022 kern.notice kernel: [ 680.357551] jffs2: notice: (4090) jffs2_get_inode_nodes: Node header CRC failed at 0x95d248. {e595,ffff,00000044,a4ef223e}

@plunet

plunet commented May 4, 2022

I'm seeing a similar thing on a Ubiquiti ER-X which has been stable and running 21.02.1 for many months.

cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='21.02.1'
DISTRIB_REVISION='r16325-88151b8303'
DISTRIB_TARGET='ramips/mt7621'
DISTRIB_ARCH='mipsel_24kc'
DISTRIB_DESCRIPTION='OpenWrt 21.02.1 r16325-88151b8303'
DISTRIB_TAINTS=''

root@OpenWrt:/tmp# uptime
 13:10:50 up 59 days,  8:38,  load average: 0.01, 0.03, 0.00

I noticed today that LuCI and uhttpd are not running:

root@OpenWrt:/tmp# /etc/init.d/uhttpd start
root@OpenWrt:/tmp# Wed May  4 13:15:43 2022 kern.err kernel: [5129016.776653] SQUASHFS error: xz decompression failed, data probably corrupt
Wed May  4 13:15:43 2022 kern.err kernel: [5129016.790750] SQUASHFS error: squashfs_read_data failed to read block 0x161c32
Wed May  4 13:15:43 2022 kern.err kernel: [5129016.851875] SQUASHFS error: xz decompression failed, data probably corrupt
Wed May  4 13:15:43 2022 kern.err kernel: [5129016.865978] SQUASHFS error: squashfs_read_data failed to read block 0x161c32
Wed May  4 13:15:43 2022 daemon.info procd: Instance uhttpd::instance1 s in a crash loop 10 crashes, 0 seconds since last crash

@vertuxt

vertuxt commented May 12, 2022

Same here: I changed some config and rebooted. Now the ER-X is stuck in a boot loop after running stably for ~2 years.

3: System Boot system code via Flash.
## Booting image at bfd40000 ...
   Image Name:   MIPS OpenWrt Linux-5.4.154
   Image Type:   MIPS Linux Kernel Image (uncompressed)
   Data Size:    2367034 Bytes =  2.3 MB
   Load Address: 80001000
   Entry Point:  80001000
.....................................   Verifying Checksum ... OK
OK
No initrd
## Transferring control to Linux (at address 80001000) ...
## Giving linux memsize in MB, 256

Starting kernel ...



OpenWrt kernel loader for MIPS based SoC
Copyright (C) 2011 Gabor Juhos <juhosg@openwrt.org>
Decompressing kernel... done!
Starting kernel at 80001000...

[    0.000000] Linux version 5.4.154 (builder@buildhost) (gcc version 8.4.0 (OpenWrt GCC 8.4.0 r16325-88151b8303)) #0 SMP Sun Oct 24 09:01:35 2021
[    0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[    0.000000] printk: bootconsole [early0] enabled
[    0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[    0.000000] MIPS: machine is Ubiquiti EdgeRouter X
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] VPE topology {2,2} total 4
[    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   HighMem  empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] percpu: Embedded 14 pages/cpu s26768 r8192 d22384 u57344
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 64960
[    0.000000] Kernel command line: console=ttyS0,57600 rootfstype=squashfs,jffs2
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[    0.000000] Writing ErrCtl register=00049340
[    0.000000] Readback ErrCtl register=00049340
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 250792K/262144K available (6089K kernel code, 210K rwdata, 748K rodata, 1260K init, 238K bss, 11352K reserved, 0K cma-reserved, 0K highmem)
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] NR_IRQS: 256
[    0.000000] random: get_random_bytes called from 0x806e5a3c with crng_init=0
[    0.000000] CPU Clock: 880MHz
[    0.000000] clocksource: GIC: mask: 0xffffffffffffffff max_cycles: 0xcaf478abb4, max_idle_ns: 440795247997 ns
[    0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 4343773742 ns
[    0.000009] sched_clock: 32 bits at 440MHz, resolution 2ns, wraps every 4880645118ns
[    0.015502] Calibrating delay loop... 583.68 BogoMIPS (lpj=1167360)
[    0.055845] pid_max: default: 32768 minimum: 301
[    0.065197] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.079603] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.097734] rcu: Hierarchical SRCU implementation.
[    0.107843] smp: Bringing up secondary CPUs ...
[    2.194788] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    2.194800] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    2.194813] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    2.194914] CPU1 revision is: 0001992f (MIPS 1004Kc)
[    0.145017] Synchronize counters for CPU 1: done.
[    2.285843] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    2.285853] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    2.285861] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    2.285918] CPU2 revision is: 0001992f (MIPS 1004Kc)
[    0.239469] Synchronize counters for CPU 2: done.
[    2.376968] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    2.376978] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    2.376987] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    2.377048] CPU3 revision is: 0001992f (MIPS 1004Kc)
[    0.327069] Synchronize counters for CPU 3: done.
[    0.386680] smp: Brought up 1 node, 4 CPUs
[    0.399024] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.418312] futex hash table entries: 1024 (order: 3, 32768 bytes, linear)
[    0.432146] pinctrl core: initialized pinctrl subsystem
[    0.444102] NET: Registered protocol family 16
[    0.475286] workqueue: max_active 576 requested for napi_workq is out of range, clamping between 1 and 512
[    0.496036] clocksource: Switched to clocksource GIC
[    0.507386] thermal_sys: Registered thermal governor 'step_wise'
[    0.507842] NET: Registered protocol family 2
[    0.528464] IP idents hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.543595] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear)
[    0.560338] TCP established hash table entries: 2048 (order: 1, 8192 bytes, linear)
[    0.575469] TCP bind hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.589641] TCP: Hash tables configured (established 2048 bind 2048)
[    0.602390] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.615281] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.629392] NET: Registered protocol family 1
[    0.637968] PCI: CLS 0 bytes, default 32
[    0.735983] 4 CPUs re-calibrate udelay(lpj = 1167360)
[    0.747494] workingset: timestamp_bits=14 max_order=16 bucket_order=2
[    0.765135] random: fast init done
[    0.773416] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.784910] jffs2: version 2.2 (NAND) (SUMMARY) (LZMA) (RTIME) (CMODE_PRIORITY) (c) 2001-2006 Red Hat, Inc.
[    0.806141] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[    0.822666] mt7621_gpio 1e000600.gpio: registering 32 gpios
[    0.833992] mt7621_gpio 1e000600.gpio: registering 32 gpios
[    0.845314] mt7621_gpio 1e000600.gpio: registering 32 gpios
[    0.857176] Serial: 8250/16550 driver, 16 ports, IRQ sharing enabled
[    0.873578] printk: console [ttyS0] disabled
[    0.882039] 1e000c00.uartlite: ttyS0 at MMIO 0x1e000c00 (irq = 19, base_baud = 3125000) is a 16550A
[    0.899958] printk: console [ttyS0] enabled
[    0.899958] printk: console [ttyS0] enabled
[    0.916512] printk: bootconsole [early0] disabled
[    0.916512] printk: bootconsole [early0] disabled
[    0.937833] mt7621-nand 1e003000.nand: Using programmed access timing: 31c07388
[    0.952680] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xda
[    0.965335] nand: Macronix MX30LF2G18AC
[    0.972974] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[    0.988055] mt7621-nand 1e003000.nand: ECC strength adjusted to 4 bits
[    1.001087] mt7621-nand 1e003000.nand: Using programmed access timing: 21005134
[    1.015647] mt7621-nand 1e003000.nand: Using programmed access timing: 21005134
[    1.030208] Scanning device for bad blocks
[    1.604645] Bad eraseblock 446 at 0x0000037c0000
[    3.014983] Bad eraseblock 1551 at 0x00000c1e0000
[    3.286361] Bad eraseblock 1758 at 0x00000dbc0000
[    3.305267] Bad eraseblock 1766 at 0x00000dcc0000
[    3.671255] 6 fixed-partitions partitions found on MTD device mt7621-nand
[    3.684784] Creating 6 MTD partitions on "mt7621-nand":
[    3.695200] 0x000000000000-0x000000080000 : "u-boot"
[    3.706593] 0x000000080000-0x0000000e0000 : "u-boot-env"
[    3.718474] 0x0000000e0000-0x000000140000 : "factory"
[    3.729978] 0x000000140000-0x000000440000 : "kernel1"
[    3.741378] 0x000000440000-0x000000740000 : "kernel2"
[    3.752983] 0x000000740000-0x00000ff00000 : "ubi"
[    3.766349] libphy: Fixed MDIO Bus: probed
[    3.802483] libphy: mdio: probed
[    3.809195] mt7530 mdio-bus:1f: MT7530 adapts as multi-chip module
[    3.825773] mtk_soc_eth 1e100000.ethernet dsa: mediatek frame engine at 0xbe100000, irq 20
[    3.846779] NET: Registered protocol family 10
[    3.857197] Segment Routing with IPv6
[    3.864666] NET: Registered protocol family 17
[    3.873620] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[    3.899664] 8021q: 802.1Q VLAN Support v1.8
[    3.909868] mt7530 mdio-bus:1f: MT7530 adapts as multi-chip module
[    3.931845] libphy: dsa slave smi: probed
[    3.940310] mt7530 mdio-bus:1f eth0 (uninitialized): PHY [dsa-0.0:00] driver [Generic PHY]
[    3.958261] mt7530 mdio-bus:1f eth1 (uninitialized): PHY [dsa-0.0:01] driver [Generic PHY]
[    3.976315] mt7530 mdio-bus:1f eth2 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY]
[    3.994207] mt7530 mdio-bus:1f eth3 (uninitialized): PHY [dsa-0.0:03] driver [Generic PHY]
[    4.012273] mt7530 mdio-bus:1f eth4 (uninitialized): PHY [dsa-0.0:04] driver [Generic PHY]
[    4.030272] mt7530 mdio-bus:1f: configuring for fixed/rgmii link mode
[    4.047924] DSA: tree 0 setup
[    4.055068] UBI: auto-attach mtd5
[    4.061721] ubi0: attaching mtd5
[    4.068498] mt7530 mdio-bus:1f: Link is Up - 1Gbps/Full - flow control off
[    6.587532] ubi0: scanning is finished
[    6.614055] ubi0: attached mtd5 (name "ubi", size 247 MiB)
[    6.625014] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[    6.638702] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[    6.652222] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[    6.666085] ubi0: good PEBs: 1978, bad PEBs: 4, corrupted PEBs: 0
[    6.678217] ubi0: user volume: 2, internal volumes: 1, max. volumes count: 128
[    6.692608] ubi0: max/mean erase counter: 2/0, WL threshold: 4096, image sequence number: 23324735
[    6.710448] ubi0: available PEBs: 0, total reserved PEBs: 1978, PEBs reserved for bad PEB handling: 36
[    6.729009] ubi0: background thread "ubi_bgt0d" started, PID 467
[    6.731441] block ubiblock0_0: created from ubi0:0(rootfs)
[    6.751934] ubiblock: device ubiblock0_0 (rootfs) set to be root filesystem
[    6.765816] hctosys: unable to open rtc device (rtc0)
[    6.782057] VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
[    6.800700] Freeing unused kernel memory: 1260K
[    6.809742] This architecture does not have kernel memory protection.
[    6.822569] Run /sbin/init as init process
[    6.961655] SQUASHFS error: xz decompression failed, data probably corrupt
[    6.975369] SQUASHFS error: squashfs_read_data failed to read block 0x61422
[    7.011259] SQUASHFS error: xz decompression failed, data probably corrupt
[    7.024970] SQUASHFS error: squashfs_read_data failed to read block 0x61422
[    7.039002] Starting init: /sbin/init exists but couldn't execute it (error -5)
[    7.053610] Run /etc/init as init process
[    7.061766] Run /bin/init as init process
[    7.071906] Run /bin/sh as init process
[    7.233750] SQUASHFS error: xz decompression failed, data probably corrupt
[    7.247464] SQUASHFS error: squashfs_read_data failed to read block 0x61422
[    7.261506] Starting init: /bin/sh exists but couldn't execute it (error -5)
[    7.275591] Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
[    7.303812] Rebooting in 1 seconds..

@vjorlikowski

vjorlikowski commented May 16, 2022

I am seeing the same sort of thing here on the GL.iNet GL-B1300 running OpenWrt 21.02.1.

According to dmesg, my storage configuration looks like this:

[    0.616046] spi_qup 78b5000.spi: IN:block:16, fifo:64, OUT:block:16, fifo:64
[    0.618456] spi-nor spi0.0: mx25l25635e (32768 Kbytes)
[    0.624336] 9 fixed-partitions partitions found on MTD device spi0.0
[    0.629188] Creating 9 MTD partitions on "spi0.0":
[    0.635733] 0x000000000000-0x000000040000 : "SBL1"
[    0.641255] 0x000000040000-0x000000060000 : "MIBIB"
[    0.645880] 0x000000060000-0x0000000c0000 : "QSEE"
[    0.650752] 0x0000000c0000-0x0000000d0000 : "CDT"
[    0.655560] 0x0000000d0000-0x0000000e0000 : "DDRPARAMS"
[    0.660417] 0x0000000e0000-0x0000000f0000 : "APPSBLENV"
[    0.665309] 0x0000000f0000-0x000000170000 : "APPSBL"
[    0.670629] 0x000000170000-0x000000180000 : "ART"
[    0.675740] 0x000000180000-0x000002000000 : "firmware"
[    0.680854] 2 fit-fw partitions found on MTD device firmware
[    0.684571] Creating 2 MTD partitions on "firmware":
[    0.690460] 0x000000000000-0x000000390000 : "kernel"
[    0.696267] 0x0000003884d4-0x000001e80000 : "rootfs"
[    0.701138] mtd: device 10 (rootfs) set to be root filesystem
[    0.705488] 1 squashfs-split partitions found on MTD device rootfs
[    0.710928] 0x0000006f0000-0x000001e80000 : "rootfs_data"

My errors occur against the rootfs (mtdblock10).

When this happens, I start seeing errors like:

Sun May 15 14:37:12 2022 kern.err kernel: [533320.106127] blk_update_request: I/O error, dev mtdblock10, sector 2594 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sun May 15 14:37:15 2022 kern.err kernel: [533323.226313] blk_update_request: I/O error, dev mtdblock10, sector 3488 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sun May 15 14:37:16 2022 kern.err kernel: [533324.269099] blk_update_request: I/O error, dev mtdblock10, sector 2596 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sun May 15 14:37:19 2022 kern.err kernel: [533327.385955] blk_update_request: I/O error, dev mtdblock10, sector 3490 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sun May 15 14:37:20 2022 kern.err kernel: [533328.426176] blk_update_request: I/O error, dev mtdblock10, sector 2598 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sun May 15 14:37:23 2022 kern.err kernel: [533331.546073] blk_update_request: I/O error, dev mtdblock10, sector 3492 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sun May 15 14:37:25 2022 kern.err kernel: [533333.627892] blk_update_request: I/O error, dev mtdblock10, sector 2600 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sun May 15 14:37:27 2022 kern.err kernel: [533335.707119] blk_update_request: I/O error, dev mtdblock10, sector 3494 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sun May 15 14:37:28 2022 kern.err kernel: [533336.745936] blk_update_request: I/O error, dev mtdblock10, sector 2602 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sun May 15 14:37:30 2022 kern.err kernel: [533338.825761] blk_update_request: I/O error, dev mtdblock10, sector 3496 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sun May 15 14:37:31 2022 kern.err kernel: [533339.865779] blk_update_request: I/O error, dev mtdblock10, sector 2604 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sun May 15 14:37:33 2022 kern.err kernel: [533340.905699] blk_update_request: I/O error, dev mtdblock10, sector 3498 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

which then progress to:

Sun May 15 14:37:33 2022 kern.err kernel: [533340.910849] SQUASHFS error: squashfs_read_data failed to read block 0x130f86
Sun May 15 14:37:33 2022 kern.err kernel: [533340.935308] SQUASHFS error: squashfs_read_data failed to read block 0x1b407e
Sun May 15 14:37:33 2022 kern.err kernel: [533340.935350] SQUASHFS error: Unable to read fragment cache entry [1b407e]
Sun May 15 14:37:33 2022 kern.err kernel: [533340.941430] SQUASHFS error: Unable to read page, block 1b407e, size c9c4
Sun May 15 14:37:33 2022 kern.err kernel: [533340.948634] SQUASHFS error: Unable to read fragment cache entry [1b407e]
Sun May 15 14:37:33 2022 kern.err kernel: [533340.955139] SQUASHFS error: Unable to read page, block 1b407e, size c9c4
Sun May 15 14:37:33 2022 kern.err kernel: [533340.975622] SQUASHFS error: xz decompression failed, data probably corrupt
Sun May 15 14:37:33 2022 kern.err kernel: [533340.975666] SQUASHFS error: squashfs_read_data failed to read block 0x130f86

Squashfs caches the read failures until the hardware is rebooted, whereupon everything is once again "fine": after the reboot I can run read checks against the entire rootfs without hitting any obvious storage errors. The read errors show up at seemingly random times, but once squashfs has cached them, only a reboot resolves the situation.

This is clearly an issue with the storage controller that the caching in squashfs makes worse.
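
For anyone trying to correlate the failing sectors with the flash layout, here is a minimal sketch of the arithmetic. It assumes that the mtdblock10 sector numbers are 512-byte units counted from the start of "rootfs", and that the nested partition offsets in the dmesg output above are relative to their parent device. The function name and variable names are only illustrative.

```shell
#!/bin/sh
# Offsets copied from the dmesg partition table above.
FIRMWARE_START=$((0x180000))   # start of "firmware" within spi0.0
ROOTFS_START=$((0x3884d4))     # start of "rootfs" within "firmware"
SECTOR_SIZE=512                # assumption: block-layer sectors are 512 bytes

# Print the absolute flash byte offset for an mtdblock10 sector number.
sector_to_flash_offset() {
    printf '0x%x\n' $(( $1 * SECTOR_SIZE + ROOTFS_START + FIRMWARE_START ))
}

sector_to_flash_offset 2594   # prints 0x64c8d4
sector_to_flash_offset 3488   # prints 0x6bc4d4
```

Within rootfs itself, sector 3488 is byte offset 0x1b4000, which sits just below the squashfs block 0x1b407e reported in the errors above, so the block-layer and squashfs messages appear to point at the same region.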

@ynezz ynezz added target/ramips kernel release/22.03 labels May 23, 2022
@ynezz ynezz changed the title FS#4100 - SQUASHFS errors with OpenWrt 21.02 ramips/mt7621: SQUASHFS filesystem corruption May 23, 2022
@vjorlikowski

vjorlikowski commented Jun 8, 2022

@ynezz This issue is occurring for me as well, on hardware that is not ramips-based (GL.iNet GL-B1300).
Squashfs starts reporting unrecoverable corruption (which resolves on reboot) due to some instability in the underlying storage.

Should I open a separate bug for the issue, for my hardware?

@dkadioglu

dkadioglu commented Jun 16, 2022

The same happened to my Edgerouter X SFP on Monday, but I only got home today to investigate. A reboot didn't resolve it; I still got the same SQUASHFS errors. I then reflashed the same build and the router works again as it should, without SQUASHFS errors.
As far as I understand it, this error is difficult to reproduce and can show up in different ways. If there is anything I can do to help investigate further, please ask. I am a little nervous that the next failure (and then maybe a permanent one) could come at any time...

DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='SNAPSHOT'
DISTRIB_REVISION='r18376-15d0c4d5cd'
DISTRIB_TARGET='ramips/mt7621'
DISTRIB_ARCH='mipsel_24kc'
DISTRIB_DESCRIPTION='OpenWrt SNAPSHOT r18376-15d0c4d5cd'
DISTRIB_TAINTS='override'

@M95D
Contributor

M95D commented Jun 16, 2022

@jlpapple

jlpapple commented Jun 17, 2022

> @ynezz This issue is occurring for me as well, and not on hardware that is ramips-based (GL.iNet GL-B1300). Squashfs starts reporting unrecoverable corruption (that resolves on reboot), due to some instability with the underlying storage.
>
> Should I open a separate bug for the issue, for my hardware?

I agree, this issue should not be titled or tagged exclusively as an mt7621 platform issue. Frankly, I think the original title should be restored, as the issue also exists on ath79/Atheros devices.

@jlpapple

jlpapple commented Jun 17, 2022

> The same happened to my Edgerouter X SFP on Monday. However, just came home today to investigate. A reboot didn't resolve it, still the same SQUASHFS errors. I then reflashed the same build and the router works again as it should, without SQUASHFS errors. As far as I understand it, this error is difficult to reproduce and can vary in its expression. If there is anything I can do to help to further investigate, please ask. I am a little nervous that the next failure (and then maybe permanent) could come at any time...
>
> DISTRIB_ID='OpenWrt'
> DISTRIB_RELEASE='SNAPSHOT'
> DISTRIB_REVISION='r18376-15d0c4d5cd'
> DISTRIB_TARGET='ramips/mt7621'
> DISTRIB_ARCH='mipsel_24kc'
> DISTRIB_DESCRIPTION='OpenWrt SNAPSHOT r18376-15d0c4d5cd'
> DISTRIB_TAINTS='override'

I don't have your specific device, but I can share that I experienced multiple failures with my WD N750 and was able to restore working stock and OpenWrt images via bootloader/tftp on many occasions. I agree, though: the errors are worrisome.

@jlpapple

jlpapple commented Jul 8, 2022

I'm testing the latest OpenWrt snapshot (r20029-3c06a344e9) with the 5.15.50 testing kernel on a WD N750, but I'm still seeing the same SquashFS errors.

Update: Possible improvement! On the initial boot after flashing I observed the SquashFS errors in the kernel log. After a reboot there were no errors, and there have been none for the past 7 days. There were no issues on a second reboot after 7 days of uptime either. In the past, on OpenWrt 21 and 22, I would see a few SquashFS errors immediately after a reboot.

I built the current firmware image from source using the July 7th snapshot with the testing kernel selected, so it is running 5.15.50. I'll keep monitoring and will report back.

@iamsubhranil

iamsubhranil commented Jul 8, 2022

This happens on my TP-Link Archer C6U. I don't know if this has already been observed, but in my case the errors don't appear immediately after a cold boot. After a cold boot, everything works fine for quite some time unless I perform certain tasks, in which case the errors come flooding in.
After a warm reboot, the errors flood in right from boot and often stop some basic services from starting.

I'm using snapshot r19971-416d4483e8. This started happening maybe a month ago. All snapshots up to then were perfectly functional. It is an mt7621 device.

@rnhmjoj

rnhmjoj commented Jul 16, 2022

Does anyone know a workaround (besides patching the kernel)?

The corruption can be detected with

find /rom -type f | xargs sha256sum > /dev/null 2>&1
dmesg | grep -q 'SQUASHFS error'; echo $?

I'm thinking of adding some code to /etc/rc.local (assuming the router can boot that far) to automatically reflash and restore from a backup in case it breaks after a power loss. I'm not sure if it's possible to fully automate this, though.
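
As a starting point, the two commands above could be wrapped roughly like this. It is only a sketch of the detection half: the function names, the `ROM_DIR` variable, the `--check` switch and the `squashfs-check` log tag are all made up for illustration, and it deliberately stops short of reflashing anything.

```shell
#!/bin/sh
# Sketch only: wraps the two detection commands so that they could be called
# from /etc/rc.local. All names below are illustrative assumptions.

ROM_DIR="${ROM_DIR:-/rom}"

# Read every regular file under the read-only squashfs mount so that any
# unreadable block makes the kernel log a SQUASHFS error.
read_all_files() {
    find "$1" -type f | xargs sha256sum > /dev/null 2>&1
}

# Succeeds (exit status 0) if the log text on stdin contains a SQUASHFS error.
has_squashfs_error() {
    grep -q 'SQUASHFS error'
}

# Run the real check only when explicitly requested with --check.
if [ "${1:-}" = "--check" ]; then
    read_all_files "$ROM_DIR"
    if dmesg | has_squashfs_error; then
        logger -t squashfs-check "SQUASHFS errors found in kernel log"
        exit 1
    fi
fi
```

Saved under a hypothetical path such as /root/squashfs-check.sh, it would be called from /etc/rc.local as `/root/squashfs-check.sh --check`, and a non-zero exit status would be the point at which a recovery step could be hooked in.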

@jlpapple

jlpapple commented Jul 30, 2022

After a month of testing I’m sad to report the SquashFS corruption errors continue with my WD N750, even when running the 5.15.50 kernel on a July 2022 OpenWrt Snapshot. I’ve had no success with 21.02, 22.03, or Snapshot - all present errors after a matter of a few hours to a few weeks.

@csharper2005
Contributor

csharper2005 commented Aug 11, 2022

See if this helps:
https://forum.openwrt.org/t/patch-squashfs-data-probably-corrupt/70480

@M95D, @NoTengoBattery, this is a very annoying and widespread bug. Can the patch be applied to mt7621 or to all architectures? Are you going to open a pull request?
