TL-WDR3600 v1.5 intermittently hangs at reboot #13043
If you read my description correctly, you will (I hope??) understand that the device doesn't even enter U-Boot, so it is still far away from loading the kernel.
More likely, the kernel leaves something in memory that prevents U-Boot from completing, e.g. U-Boot expects a zeroed heap at some memory location but it is not zeroed. The serial output is likely to contain addresses, which could be used, for example, to exclude 16 kB at that physical location from Linux to ensure a clean reboot.
For the record, I was able to dig out the old issue from 2014, the proposed fix, and the accepted fix for it. Here's the issue from the archived issue tracker, here the proposed fix from the mailing list. The accepted fix by nbd is fa3cb9f. Note that neither of the two fixes changes the behavior described here, for me. Also for the record, the last fix I know of for a cache invalidation or cache flush race condition is 26bc8f6, which didn't change the behavior I'm experiencing either. I'll try to extract a serial log one of these days (may take a while, though), just to make sure that it's not a completely different problem I'm experiencing here. In the meantime, I'd appreciate it if others who own a TL-WDR3600 or TL-WDR4300 could report here whether they're also seeing the reboot-after-sysupgrade issue, along with their exact HW version.
I can reproduce the issue on a TP-Link WDR4300 using stock OpenWRT, with a simple reboot. No sysupgrade needed.
Summary
Device
TP-Link TL-WDR4300 v1
Description
Regression introduced with OpenWRT 22.03. I have a TP-Link WDR4300 with a serial port installed and a serial console cable attached. Simply installing stock OpenWRT (a simple
Additional info
Reverting commit ebf0d8d did not seem to have any impact on the problem. There is also an extensive discussion here: freifunk-gluon/gluon#2904
Steps to reproduce
# Wait 30 seconds, request a reboot over SSH, and log the time if the request succeeded; repeat.
while sleep 30
do
    ssh root@192.168.1.1 reboot && echo "Rebooted at $(date --iso=s)" >> ~/reboot-tests/reboot_$(date --iso).log
done
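The loop above only records that a reboot was requested, not whether the router came back afterwards. A minimal sketch of a variant that also detects a hang might look like the following. The function names, the 20-second grace period, the 120 ping attempts, and the use of `ping` as a liveness check are all assumptions for illustration, not part of the original test setup; the `-W` option is used as in Linux iputils `ping`.

```shell
#!/bin/sh
# Sketch only: HOST, the timings, and all function names are illustrative assumptions.
HOST="${HOST:-192.168.1.1}"

# Ping the host up to $2 times (roughly one attempt per second).
# Returns 0 as soon as the host answers, 1 if it never does.
wait_for_host() {
    host=$1
    attempts=$2
    tried=0
    while [ "$tried" -lt "$attempts" ]; do
        if ping -c 1 -W 1 "$host" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        tried=$((tried + 1))
    done
    return 1
}

# Request a reboot, give the device time to go down, then wait for it to return.
# Prints "ok" or "hang"; prints nothing and returns 2 if the SSH request itself failed.
reboot_once() {
    ssh "root@$HOST" reboot || return 2
    sleep 20
    if wait_for_host "$HOST" 120; then
        echo "ok"
    else
        echo "hang"
        return 1
    fi
}

# Reboot up to $1 times; stop at the first failure, since a hung device
# needs a power cycle before testing can continue.
run_test() {
    runs=$1
    i=0
    while [ "$i" -lt "$runs" ]; do
        i=$((i + 1))
        result=$(reboot_once)
        echo "run $i: ${result:-ssh failed}"
        [ "$result" = "ok" ] || return 1
    done
    return 0
}

# Only start rebooting when a run count is given, e.g.: sh reboot-test.sh 50
if [ "$#" -gt 0 ]; then
    run_test "$1"
fi
```

Stopping at the first hang keeps the sketch simple: the number of the last printed run tells how many reboots succeeded before the device needed a power cycle.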
Thanks for confirming! You're right, I'm also experiencing hangs during normal (non-sysupgrade) reboots of my WDR3600 @ 22.03+, but these are rare, so I wasn't aware of it yet when creating this issue. I have a WDR4300 v1.7 as well by now, which hasn't shown the issue yet, though here as well I can't be sure, since I've only rebooted/sysupgraded it a handful of times so far. Thanks for linking to the issue in the Gluon repo, I wasn't aware of that (I'm not following the Gluon repo). I had already seen the "test/tp-link-wdr4300-hangs" branch in your site-ffm repo, though, which kind of confirmed to me that I'm not the only one experiencing this problem :-)
@Shine- @grische Are you using the u-boot_mod bootloader? A few months ago, I encountered this problem by chance.
Stock TP-Link boot loader from the latest official firmware, for me.
My report. #12764 (comment)
I guess some reset functions were not called correctly. Unfortunately, I haven't encountered it again recently, so I can't debug it. This is a generic problem, unrelated to the specific device. So, I would suggest changing the title to something like
I am also for renaming the ticket, but only the WDR4300 and WDR3600 seem affected. And it's independent of sysupgrade; a simple reboot will do the trick. EDIT: to clarify, we have a large set of different models and we only noticed the problems with these two models.
If you can build the OpenWrt source code, I believe this patch has a chance of fixing this issue.
Also, I have to agree with @grische that only my WDR3600 (and possibly my WDR4300 if I finally get around to using it) is affected for me, none of my many other ath79 based devices. |
Add a cache-barrier after the reset-register write. This fixes spurious reboot issues on TP-Link WDR3600 and WDR4300 devices with Zentel DDR2 DRAM chips. This issue was fixed in the past, but switching to the reset-driver specific implementation removed the cache barrier which was previously implicitly added by reading back the register in question. Link: freifunk-gluon/gluon#2904 Link: openwrt#13043 Link: https://dev.archive.openwrt.org/ticket/17839 Signed-off-by: David Bauer <mail@david-bauer.net>
Read back the reset register in order to flush the cache. This fixes spurious reboot hangs on TP-Link TL-WDR3600 and TL-WDR4300 with Zentel DRAM chips. This issue was fixed in the past, but switching to the reset-driver specific implementation removed the cache barrier which was previously implicitly added by reading back the register in question. Link: freifunk-gluon/gluon#2904 Link: openwrt#13043 Link: https://dev.archive.openwrt.org/ticket/17839 Link: f8a7bfe1cb2c ("MIPS: ath79: fix system restart") Signed-off-by: David Bauer <mail@david-bauer.net>
An addition: With @blocktrron's patch #14378, I was able to boot both OpenWRT master and OpenWRT 23.05 successfully for several reboots. Thanks a lot! 🙏
Read back the reset register in order to flush the cache. This fixes spurious reboot hangs on TP-Link TL-WDR3600 and TL-WDR4300 with Zentel DRAM chips. This issue was fixed in the past, but switching to the reset-driver specific implementation removed the cache barrier which was previously implicitly added by reading back the register in question. Link: freifunk-gluon/gluon#2904 Link: #13043 Link: https://dev.archive.openwrt.org/ticket/17839 Link: f8a7bfe1cb2c ("MIPS: ath79: fix system restart") Signed-off-by: David Bauer <mail@david-bauer.net> (cherry picked from commit 2fe8ecd)
#14378 merged, closing
Read back the reset register in order to flush the cache. This fixes spurious reboot hangs on TP-Link TL-WDR3600 and TL-WDR4300 with Zentel DRAM chips. This issue was fixed in the past, but switching to the reset-driver specific implementation removed the cache barrier which was previously implicitly added by reading back the register in question. Link: freifunk-gluon/gluon#2904 Link: openwrt/openwrt#13043 Link: https://dev.archive.openwrt.org/ticket/17839 Link: f8a7bfe1cb2c ("MIPS: ath79: fix system restart") Signed-off-by: David Bauer <mail@david-bauer.net> (cherry picked from commit 2fe8ecd880396b5ae25fe9583aaa1d71be0b8468)
Describe the bug
This is a very old issue that apparently reappeared starting with 22.03 (including any post-22.03 version). This should also affect TL-WDR4300 v1.7. If it really is this recurring old issue, then earlier HW revisions of these two devices should not be affected.
After sysupgrading, the device resets, all LEDs turn off, then the Ethernet Link/Status LEDs start flashing again like normal, but the star-shaped status LED stays off. The device hangs indefinitely and needs to be power-cycled. After a power cycle, everything is back to normal.
The issue occurs intermittently, in my experience in around 5-50% of all sysupgrades, and even less frequently when rebooting normally.
I remember this issue from around (or more than?) 10 years ago. Iirc, after a lot of guesswork and tries from multiple people, it finally disappeared. I can't, however, find the commit that actually fixed it.
As versions 19.07/21.02 (ath79) aren't affected, I assume the issue doesn't have anything to do with the ar71xx to ath79 transition. Otherwise it would've likely reappeared in 19.07 already.
Testing this does require some patience, since it can happen that 10-20 sysupgrades succeed without the issue occurring. Since I'm currently using a TL-WDR3600 v1.5 for some testing and am therefore resetting/sysupgrading it quite often, I can see this issue happen more or less frequently with my device.
The difference from the old issue from 10 years ago is that nowadays it seems to occur mainly after sysupgrading, and only rarely after a normal reboot.
Wild guess: Might this be another sporadic issue of missing cache invalidation on reboot, so the CPU ends up in an endless loop? Like was fixed for a number of ath79 devices before?
OpenWrt version
22.03 or later
OpenWrt target/subtarget
ath79 / generic
Device
TL-WDR3600 v1.5 (and likely also TL-WDR4300 v1.7)
Image kind
Official downloaded image
Steps to reproduce
sysupgrade or sysupgrade -n to any version 22.03 or later.
In rare cases, a normal reboot without sysupgrade is sufficient to show the issue.
Actual behaviour
After 1 to X tries, the device will hang indefinitely after the reboot has been initiated by sysupgrade. The "star" shaped status LED will be off, Ethernet Link/Status LEDs will flash like normal. Device requires powercycling to work normally again.
Expected behaviour
Device comes up again reliably after any number of sysupgrade trials. The "star" shaped LED turns off after reset, then turns on (during U-Boot stage), then normal OpenWrt startup begins (flashing rapidly, then flashing slowly, then on steadily).
Additional info
To make sure this doesn't affect 21.02, I ran >10 sysupgrades in a row with 21.02.7 right before creating this issue. Reboot was always successful. There's no 100%, though...
Fact is, I never saw this issue while this device was in production use with 19.07 or 21.02 based Gluon firmware. The issue started appearing with 22.03 based Gluon, therefore I took it out of production and started using it as a testing device (using vanilla OpenWrt, not Gluon).
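As a rough check on how much ten clean sysupgrades in a row actually prove: if each sysupgrade hung independently with probability p, the chance of n clean runs in a row would be (1 - p)^n. The sketch below evaluates this for the 5% and 50% ends of the hang-rate range quoted above. The independence assumption and the function name are assumptions made for illustration, not something established in this thread.

```shell
# Probability of n consecutive clean reboots if each one hangs
# independently with probability p (independence is an assumption).
clean_run_probability() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", (1 - p) ^ n }'
}

clean_run_probability 0.05 10   # prints 0.599
clean_run_probability 0.50 10   # prints 0.001
```

Under these assumptions, a 5% hang rate would still produce ten clean runs in a row about 60% of the time, so the 21.02.7 test cannot rule out a low hang rate on its own. It does make a hang rate near 50% very unlikely.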
Diffconfig
No response