ubnt EdgeRouterX switch dies or sthg (affects ramips-mt7621) #494

Open
bobster-galore opened this issue Dec 1, 2017 · 30 comments
Labels
hardware-related adding, removal or changes OpenWRT

Comments

@bobster-galore
Contributor

bobster-galore commented Dec 1, 2017

ubnt erx and erx-sfp routers have been seen in the wild with the switch suddenly dying, which shows up as lost connections and/or interfaces.
We could investigate this to find out what is causing it and help to solve the problem, since a significant number of these routers will soon be online (ff-Meko project).
Some work is already going on in LEDE, which we could support. I attach the links in ascending date order:
http://lists.infradead.org/pipermail/lede-dev/2017-July/008268.html | mt7621 wdt reset- console not accepting commands
http://lists.infradead.org/pipermail/lede-dev/2017-August/008594.html | Transmit timeouts with mtk_eth_soc and MT7621
http://lists.infradead.org/pipermail/lede-dev/2017-August/008738.html | ramips: Improve stability of the mt7621 switch
https://patchwork.ozlabs.org/patch/808121/ | ramips: Improve stability of the mt7621 switch
Can somebody shed light on this?

@SvenRoederer
Contributor

There is also a discussion suggesting that the previously listed ideas might not lead to a solution: http://lists.infradead.org/pipermail/lede-dev/2017-November/009799.html

@bobster-galore
Contributor Author

Has anyone checked whether the original firmware behaves differently? Maybe it's a hardware issue?

@SvenRoederer
Contributor

Another one on the OpenWrt mailing list: http://lists.infradead.org/pipermail/lede-dev/2018-April/011939.html

@SvenRoederer SvenRoederer changed the title ubnt erx switch dies or sthg ubnt EdgeRouterX switch dies or sthg May 15, 2018
@SvenRoederer
Contributor

Some recent OpenWrt commits:

In case of error, the function devm_ioremap_resource() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR().

Fixes: f079b6406348 ("staging: mt7621-eth: add gigabit switch driver (GSW)")

@SvenRoederer SvenRoederer removed the LEDE label May 18, 2018
@booo
Member

booo commented May 23, 2018

Is there a build that incorporates the patches?

@booo
Member

booo commented May 29, 2018

I installed OpenWrt SNAPSHOT, r7050-9c409cb on an erx-sfp that we had to restart a few times in the past. The snapshot should include the fix.

So far I see strange load patterns (constant load of 1):

http://monitor.berlin.freifunk.net/detail.php?p=load&t=load&h=flughafen-core&s=86400

And we had one exception in the kernel code so far:

[ 2776.744924] ------------[ cut here ]------------
[ 2776.754179] WARNING: CPU: 3 PID: 0 at ./include/net/dst.h:256 0x8e8cd4d8
[ 2776.767546] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt i2c_gpio i2c_algo_pca i2c_algo_bit gpio_pca953x i2c_dev ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables leds_gpio gpio_button_hotplug
[ 2776.891351] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.43 #0
[ 2776.903325] Stack : 00000000 00000000 00000000 00000000 805f7ad2 00000034 00000000 00000000
[ 2776.919962]         8fc44e74 80590947 8051cbfc 00000003 00000000 00000001 8fc15c38 532616ca
[ 2776.936602]         00000000 00000000 805f0000 00003a98 00000000 000000ca 00000007 00000000
[ 2776.953247]         00000000 80590000 000d99d7 00000000 00000000 00000000 805b0000 8e8cd4d8
[ 2776.969893]         00000009 00000100 00000001 00000003 00000003 80291630 0000000c 805f000c
[ 2776.986531]         ...
[ 2776.991396] Call Trace:
[ 2776.996283] [<80010498>] show_stack+0x58/0x100
[ 2777.005146] [<8045f4ac>] dump_stack+0x9c/0xe0
[ 2777.013820] [<8002e208>] __warn+0xe0/0x114
[ 2777.021969] [<8002e2cc>] warn_slowpath_null+0x1c/0x30
[ 2777.032036] [<8e8cd4d8>] 0x8e8cd4d8
[ 2777.039100] ---[ end trace 833f5b5e0b6c2d47 ]---
[ 2778.073010] dst_release: dst:8e03fa80 refcnt:-1
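
One way to read the pair "WARNING at include/net/dst.h:256" followed by "dst_release: ... refcnt:-1" is as a reference-count underflow on a route cache entry. The following is only a toy model in Python, not kernel code: the class `ToyDst` and its methods are hypothetical names, and it assumes (unverified against the 4.14.43 sources) that the warning fires when something tries to take a reference on an entry whose count has already dropped to zero.

```python
class ToyDst:
    """Toy stand-in for a reference-counted route entry (hypothetical, not kernel code)."""

    def __init__(self):
        self.refcnt = 1      # the creator owns the first reference
        self.warnings = []   # collected instead of printed, to keep the sketch testable

    def hold(self):
        # Take a reference only if the entry is still alive.
        if self.refcnt == 0:
            self.warnings.append("hold on dead entry")
            return False
        self.refcnt += 1
        return True

    def release(self):
        # Drop a reference; a negative count means one release too many.
        self.refcnt -= 1
        if self.refcnt < 0:
            self.warnings.append(f"release: refcnt:{self.refcnt}")


dst = ToyDst()
dst.release()  # the owner drops the last reference, refcnt is now 0
dst.hold()     # a stale cached pointer tries to hold it: warning, no reference taken
dst.release()  # the caller releases anyway, refcnt becomes -1
print(dst.refcnt, dst.warnings)
```

If this reading is right, the interesting question is which code path still holds a stale pointer, not the release itself.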

I will report back if we have another crash with the new code.

@booo
Member

booo commented May 29, 2018

Still up and running but we get even more interesting output:

[48546.622689] ------------[ cut here ]------------
[48546.631924] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
[48546.648426] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[48546.662325] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt i2c_gpio i2c_algo_pca i2c_algo_bit gpio_pca953x i2c_dev ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables leds_gpio gpio_button_hotplug
[48546.786047] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       4.14.43 #0
[48546.800418] Stack : 00000000 00000000 00000000 00000000 805f7ad2 00000042 00000000 00000000
[48546.817062]         80590db4 80590947 8051cbfc 00000000 00000000 00000001 8fc09d68 532616ca
[48546.833693]         00000000 00000000 805f0000 00004240 00000000 000000dd 00000007 00000000
[48546.850321]         00000000 80590000 000bfe7f 00000000 00000000 00000000 805b0000 8036da68
[48546.866948]         00000009 00000140 00000000 8ff9df40 00000001 80291630 00000000 805f0000
[48546.883577]         ...
[48546.888432] Call Trace:
[48546.893316] [<80010498>] show_stack+0x58/0x100
[48546.902170] [<8045f4ac>] dump_stack+0x9c/0xe0
[48546.910830] [<8002e208>] __warn+0xe0/0x114
[48546.918969] [<8002e26c>] warn_slowpath_fmt+0x30/0x3c
[48546.928854] [<8036da68>] dev_watchdog+0x1ac/0x324
[48546.938219] [<800861a4>] call_timer_fn.isra.3+0x24/0x84
[48546.948603] [<800863bc>] run_timer_softirq+0x1b8/0x244
[48546.958844] [<8047c750>] __do_softirq+0x128/0x2ec
[48546.968202] [<80032910>] irq_exit+0x98/0xcc
[48546.976533] [<8024a6cc>] plat_irq_dispatch+0xfc/0x138
[48546.986582] [<8000b5a8>] except_vec_vi_end+0xb8/0xc4
[48546.996450] [<8000cf70>] r4k_wait_irqoff+0x1c/0x24
[48547.005993] [<8006645c>] do_idle+0xe4/0x168
[48547.014312] [<800666d8>] cpu_startup_entry+0x24/0x2c
[48547.024191] [<805b9bf8>] start_kernel+0x48c/0x4ac
[48547.033742] ---[ end trace 833f5b5e0b6c2d48 ]---
[48547.042978] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[48547.055320] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[48547.067337] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f130000, max=0, ctx=2839, dtx=2839, fdx=2838, next=2839
[48547.089031] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0eae0000, max=0, calc=3617, drx=3728
[48547.111380] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[48547.131715] mtk_soc_eth 1e100000.ethernet: PPE started
[49547.049404] ------------[ cut here ]------------
[49547.058664] WARNING: CPU: 1 PID: 0 at ./include/net/dst.h:256 0x8e8cd4d8
[49547.072041] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt i2c_gpio i2c_algo_pca i2c_algo_bit gpio_pca953x i2c_dev ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables leds_gpio gpio_button_hotplug
[49547.195767] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.14.43 #0
[49547.210145] Stack : 00000000 00000000 00000000 00000000 805f7ad2 00000042 00000000 00000000
[49547.226787]         8fc441f4 80590947 8051cbfc 00000001 00000000 00000001 8fc0dc38 532616ca
[49547.243440]         00000000 00000000 805f0000 00004ee0 00000000 000000fe 00000007 00000000
[49547.260084]         00000000 80590000 0002fcb7 00000000 00000000 00000000 805b0000 8e8cd4d8
[49547.276727]         00000009 00000100 00000001 00000003 00000001 80291630 00000004 805f0004
[49547.293367]         ...
[49547.298232] Call Trace:
[49547.303121] [<80010498>] show_stack+0x58/0x100
[49547.312001] [<8045f4ac>] dump_stack+0x9c/0xe0
[49547.320680] [<8002e208>] __warn+0xe0/0x114
[49547.328838] [<8002e2cc>] warn_slowpath_null+0x1c/0x30
[49547.338917] [<8e8cd4d8>] 0x8e8cd4d8
[49547.345938] ---[ end trace 833f5b5e0b6c2d49 ]---
[49547.355253] dst_release: dst:8edd9780 refcnt:-1
[50292.637021] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[50292.649359] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[50292.661372] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0eae0000, max=0, ctx=896, dtx=896, fdx=895, next=896
[50292.682366] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0dc60000, max=0, calc=856, drx=862
[50292.703700] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5860000c, 0x10c = 0x80818
[50292.724203] mtk_soc_eth 1e100000.ethernet: PPE started
[51252.638068] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[51252.650410] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[51252.662423] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ec80000, max=0, ctx=3580, dtx=3580, fdx=3579, next=3580
[51252.684138] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0e3b0000, max=0, calc=585, drx=601
[51252.705470] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5a60000c, 0x10c = 0x80818
[51252.725940] mtk_soc_eth 1e100000.ethernet: PPE started
[51997.599643] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[51997.612013] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[51997.624016] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0dcd0000, max=0, ctx=3060, dtx=3060, fdx=3059, next=3060
[51997.645715] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0de30000, max=0, calc=3653, drx=3664
[51997.667336] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[51997.687505] mtk_soc_eth 1e100000.ethernet: PPE started
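
To compare the timeouts in the log above, the timestamps and the tx_ring dumps can be pulled out with a small script. This is a minimal sketch: the function name `parse_timeouts` is made up for illustration, and the field names (ctx, dtx, fdx, next) are taken verbatim from the log without assuming what the driver means by them.

```python
import re

# Lines copied from the log above (one "timed out" line and one tx_ring line per event).
LOG = """\
[48547.042978] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[48547.067337] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f130000, max=0, ctx=2839, dtx=2839, fdx=2838, next=2839
[50292.637021] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[50292.661372] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0eae0000, max=0, ctx=896, dtx=896, fdx=895, next=896
[51252.638068] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[51252.662423] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ec80000, max=0, ctx=3580, dtx=3580, fdx=3579, next=3580
[51997.599643] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[51997.624016] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0dcd0000, max=0, ctx=3060, dtx=3060, fdx=3059, next=3060
"""

TIMEOUT_RE = re.compile(r"^\[\s*(\d+\.\d+)\] .*transmit timed out$")
TX_RING_RE = re.compile(r"ctx=(\d+), dtx=(\d+), fdx=(\d+), next=(\d+)")


def parse_timeouts(log):
    """Return (timestamps of 'transmit timed out' lines, list of tx_ring index dicts)."""
    stamps, rings = [], []
    for line in log.splitlines():
        m = TIMEOUT_RE.match(line)
        if m:
            stamps.append(float(m.group(1)))
            continue
        m = TX_RING_RE.search(line)
        if m:
            ctx, dtx, fdx, nxt = (int(v) for v in m.groups())
            rings.append({"ctx": ctx, "dtx": dtx, "fdx": fdx, "next": nxt})
    return stamps, rings


stamps, rings = parse_timeouts(LOG)
# Seconds between consecutive timeouts, rounded to whole seconds.
intervals = [round(b - a) for a, b in zip(stamps, stamps[1:])]
print(intervals)
for ring in rings:
    print(ring)
```

For the four timeouts above the gaps come out at roughly 1746, 960 and 745 seconds, so there is no obvious fixed period. In every dump ctx, dtx and next are equal and fdx is exactly one behind, which may be a useful signature to mention upstream.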

@booo
Member

booo commented Jun 4, 2018

The problem persists even with the new OpenWrt version mentioned above.

@bobster-galore
Contributor Author

What a pity!
Is it only visible under load?
What would help? There is an idle erx in Spandau that we could use for testing.

SvenRoederer added a commit that referenced this issue Jul 15, 2018
01df4a2565, 0c285bd081, 2601e34fad might bring improvements for #494

b123921a92 include/prereq-build.mk: explicitly check for -f flag when using busybox time
36fa1bbf6f include/kernel-build.mk: fix kernel rebuild on backport patch changes
18533ff415 kernel: backport page fragment API changes from 4.10+ to 4.9
888a15ff83 ppp: add missing -fPIC to rp-pppoe.so CFLAGS
2601e34fad ramips: ethernet: disable fraglist support
154c0c4006 ubus: compile with LTO enabled
73fc67b614 procd: compile with LTO enabled
47b42137ce dropbear: compile with LTO enabled
ef96d1e34a firewall: compile with LTO enabled
ef16a394d2 iw: compile with LTO enabled
e7397eef69 ppp: compile with LTO enabled
dfbd49bd22 ppp: fix linker flags for the radius plugin
07940acc34 netifd: compile with LTO enabled
8c11133c9d busybox: compile with LTO enabled
4e56af5ab4 mt76: update to the latest version
16035a7dd3 include/feeds.mk: rework generation of opkg distfeeds.conf
6dac434c00 base-files: fix feed list in PKG_CONFIG_DEPENDS
9af22f1ac9 include/feeds.mk: always add available feeds to PACKAGE_SUBDIRS
6bdd5d8459 scripts/feeds: add src-dummy method
0c285bd081 ramips: ethernet: use own page_frag_cache
01df4a2565 ramips: ethernet: use skb_free_frag to free fragments
2eeb4b78c6 ramips: TP-Link TL-WR902AC v3: add missing wps button
33321ebefa ramips: TP-Link TL-WR902AC v3: don't build factory image
a07e1126bc tools: kernel2minor: update to latest version
11d6547455 config: extend small_flash feature
cf7154db07 kernel: only optimized for size if small_flash
621fa91a82 ar71xx: move boards to tiny subtarget
671999157d verbose.mk: quote SUBMAKE options
12915b105a arc: Update variables substitutions in u-boot env files
d238c7f995 mediatek: Fix memory node for U7623
d3b8e6b2a7 kernel: gpio-nct5104d remove boardname check
af70d86d62 netifd: update to latest git HEAD
33553a11ab ramips: clean up and fix MT7621 NAND driver issues
21ee8ce9b5 kernel: replace bridge port isolate hack with upstream patch backport on 4.14
68f9921ed8 netifd: update to the latest version
41a1c1af4b kernel: adjust bridge port isolate patch to match upstream attribute naming
e07ad61aec procd: update to the latest version, fixes gcc 8 build error
8b42a260ed mac80211: Expose support for ath9k Dynack
ba2b0f0ac6 kernel: bump 4.14 to 4.14.54
954faac7bc qos-scripts: fix indentation
4630159294 wireguard: bump to 0.0.20180708
7e82418372 iproute2: update to 4.17.0
6dac92a42e hostapd: build with LTO enabled (using jobserver for parallel build)
9b965d3b71 binutils: remove version 2.27
7c3e3eb098 binutils: update to version 2.30, resolves issues with LTO
55055aee50 binutils: backport an upstream fix for a linker bug that triggers with LTO
7ddba08d87 kernel: bcm47xxpart: fix getting user-space data partition name
a5188eb258 nasm: disable LTO, remove host specific workarounds
98a6bee09a odhcpd: update to latest git HEAD
e204717ef2 toolchain/nasm: force ar and ranlib only on macOSX
79b38047b9 build: README punctuation pendantry
5781fc6b3f build: Update README & github help
edf338f248 basefiles: Reword sysupgrade message
6476148034 ath79: add support for OCEDO Raccoon
da6c09eff4 kernel: move CONFIG_USB_MTU3 to generic config
29fa9ac559 kernel: disable some DRM_PANEL config options
@SvenRoederer
Contributor

Just found this on the OpenWrt-devel list: http://lists.infradead.org/pipermail/openwrt-devel/2018-October/014272.html

Maybe someone can test it?

@SvenRoederer SvenRoederer added the hardware-related adding, removal or changes label Dec 29, 2018
@SvenRoederer
Contributor

@pmelange
Contributor

pmelange commented Jun 3, 2019

Ever since I upgraded the firmware on the erx-sfp (coloniaallee) from some 1.0.0-alpha version to 1.0.2 the router has been online. Not once has the problem described above happened again.

@SvenRoederer
Contributor

As @booo mentioned, he was able to see this several times with a more recent kernel than the one Hedy-1.0.2 uses. So I'm quite sure this bug is still waiting to be triggered...

SvenRoederer added a commit that referenced this issue Aug 4, 2019
830440d nodogsplash: Backport Version 4.0.1. (#494)
a93e684 alfred: Merge bugfixes from 2019.3
6ea9e9b batctl: Upgrade hardif settings patches to upstream version
d65d6f1 batctl: Merge bugfixes from 2019.3
9d559fd batman-adv: Merge bugfixes from 2019.3
784ae0e Merge pull request #496 from ecsv/batadv-for-19.07
SvenRoederer added a commit that referenced this issue Oct 14, 2019
830440d nodogsplash: Backport Version 4.0.1. (#494)
a93e684 alfred: Merge bugfixes from 2019.3
6ea9e9b batctl: Upgrade hardif settings patches to upstream version
d65d6f1 batctl: Merge bugfixes from 2019.3
9d559fd batman-adv: Merge bugfixes from 2019.3
784ae0e Merge pull request #496 from ecsv/batadv-for-19.07
@SvenRoederer
Contributor

@SvenRoederer
Contributor

There is a nice report that identifies "ethernet pause frames" as the cause of the problem: http://lists.infradead.org/pipermail/openwrt-devel/2020-February/021742.html

@SvenRoederer
Contributor

openwrt/openwrt@c8f8e59 sounds like a fix for this issue. Can anyone test it?

@pmelange
Contributor

According to https://forum.openwrt.org/t/mtk-soc-eth-watchdog-timeout-after-r11573/50000/59 it didn't make a difference.

But I am currently building with this patch. I don't have high hopes though.


After 7757 seconds of uptime:

[ 7757.227823] ------------[ cut here ]------------
[ 7757.232533] WARNING: CPU: 1 PID: 0 at include/net/dst.h:256 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[ 7757.242372] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c iptable_mangle iptable_filter ip_tables compat gpio_beeper input_core nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ipip tunnel4 ip_tunnel leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
[ 7757.309857] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.171 #0
[ 7757.315947] Stack : 00000000 00000000 00000000 00000003 00000000 00000000 00000000 00000000
[ 7757.324313]         00000000 00000000 00000000 00000000 00000000 00000001 8fc0bb20 ac07f5b2
[ 7757.332676]         8fc0bbb8 00000000 00000000 00000000 00000038 804997f8 00000008 00000000
[ 7757.341044]         00000000 00000000 0004ba61 ffffffff 00000000 8fc0bb00 00000000 8f14455c
[ 7757.349429]         8f1447fc 00000100 00000001 00000003 00000000 802c096c 00000004 80690004
[ 7757.357803]         ...
[ 7757.360277] Call Trace:
[ 7757.360357] [<804997f8>] 0x804997f8
[ 7757.366327] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[ 7757.373136] [<802c096c>] 0x802c096c
[ 7757.376672] [<8000bf28>] 0x8000bf28
[ 7757.380167] [<8000bf30>] 0x8000bf30
[ 7757.383669] [<80560000>] 0x80560000
[ 7757.387174] [<80482754>] 0x80482754
[ 7757.390687] [<800773c4>] 0x800773c4
[ 7757.394189] [<8002ed30>] 0x8002ed30
[ 7757.397704] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[ 7757.404546] [<8002e9d4>] 0x8002e9d4
[ 7757.408094] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[ 7757.414931] [<803ad4e4>] 0x803ad4e4
[ 7757.418461] [<803b7310>] 0x803b7310
[ 7757.421994] [<803b6f10>] 0x803b6f10
[ 7757.425517] [<803b6020>] 0x803b6020
[ 7757.429033] [<80461630>] 0x80461630
[ 7757.432576] [<803b5670>] 0x803b5670
[ 7757.436098] [<8036f60c>] 0x8036f60c
[ 7757.439620] [<80371d7c>] 0x80371d7c
[ 7757.443147] [<803ba4cc>] 0x803ba4cc
[ 7757.446668] [<8046b2a8>] 0x8046b2a8
[ 7757.450198] [<8046b5f0>] 0x8046b5f0
[ 7757.453724] [<8046b318>] 0x8046b318
[ 7757.457224] [<8036f308>] 0x8036f308
[ 7757.460751] [<8036f91c>] 0x8036f91c
[ 7757.464293] [<8037220c>] 0x8037220c
[ 7757.467801] [<8007d0c4>] 0x8007d0c4
[ 7757.471339] [<8049f950>] 0x8049f950
[ 7757.474835] [<800336c8>] 0x800336c8
[ 7757.478330] [<80275f24>] 0x80275f24
[ 7757.481868] [<80007388>] 0x80007388
[ 7757.485371] 
[ 7757.487000] ---[ end trace 8822d76274df4638 ]---
[ 7757.492211] dst_release: dst:8e0d3700 refcnt:-1

@pmelange
Contributor

The system is still running, but at 32978 seconds I got another kernel error:

[32978.876756] ------------[ cut here ]------------
[32978.881419] WARNING: CPU: 2 PID: 0 at include/net/dst.h:256 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[32978.891227] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c iptable_mangle iptable_filter ip_tables compat gpio_beeper input_core nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ipip tunnel4 ip_tunnel leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
[32978.958675] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W       4.14.171 #0
[32978.965973] Stack : 00000000 00000000 00000000 00000003 00000000 00000000 00000000 00000000
[32978.974335]         00000000 00000000 00000000 00000000 00000000 00000001 8fc0db20 ac07f5b2
[32978.982671]         8fc0dbb8 00000000 00000000 00000000 00000038 804997f8 00000008 00000000
[32978.991010]         00000000 00000000 000ea0d3 ffffffff 00000000 8fc0db00 00000000 8f14455c
[32978.999345]         8f1447fc 00000100 00000001 00000003 00000002 802c096c 00000008 80690008
[32979.007680]         ...
[32979.010117] Call Trace:
[32979.010170] [<804997f8>] 0x804997f8
[32979.016071] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[32979.022828] [<802c096c>] 0x802c096c
[32979.026315] [<8000bf28>] 0x8000bf28
[32979.029787] [<8000bf30>] 0x8000bf30
[32979.033255] [<80560000>] 0x80560000
[32979.036726] [<80482754>] 0x80482754
[32979.040198] [<800773c4>] 0x800773c4
[32979.043667] [<8002ed30>] 0x8002ed30
[32979.047149] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[32979.053907] [<8002e9d4>] 0x8002e9d4
[32979.057391] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[32979.064154] [<803ad4e4>] 0x803ad4e4
[32979.067638] [<803b7310>] 0x803b7310
[32979.071114] [<803b6f10>] 0x803b6f10
[32979.074589] [<803b6020>] 0x803b6020
[32979.078060] [<80461630>] 0x80461630
[32979.081535] [<803b5670>] 0x803b5670
[32979.085009] [<8036f60c>] 0x8036f60c
[32979.088487] [<803b9128>] 0x803b9128
[32979.091970] [<803bb14c>] 0x803bb14c
[32979.095444] [<80371d7c>] 0x80371d7c
[32979.098916] [<803ba4cc>] 0x803ba4cc
[32979.102393] [<8046b2a8>] 0x8046b2a8
[32979.105865] [<803b6f10>] 0x803b6f10
[32979.109346] [<8046b5f0>] 0x8046b5f0
[32979.112820] [<8046b318>] 0x8046b318
[32979.116292] [<8036f308>] 0x8036f308
[32979.119772] [<8036f91c>] 0x8036f91c
[32979.123244] [<8037220c>] 0x8037220c
[32979.126718] [<8007d0c4>] 0x8007d0c4
[32979.130201] [<8049f950>] 0x8049f950
[32979.133671] [<800336c8>] 0x800336c8
[32979.137143] [<80275f24>] 0x80275f24
[32979.140616] [<80007388>] 0x80007388
[32979.144084] 
[32979.145677] ---[ end trace 8822d76274df4639 ]---
[32979.150503] dst_release: dst:8d4b4b00 refcnt:-1
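
Since these traces have no symbol names, the only usable hint is the bracketed suffix on the warning address. Assuming that suffix means module@load-base+size (an assumption about this build's trace format, not something stated in the thread), the offset of the warning inside the module is simple arithmetic. The helper `module_offset` below is a hypothetical name for illustration.

```python
import re

# Matches e.g. "0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]"
SYM_RE = re.compile(r"0x([0-9a-f]+) \[(\w+)@([0-9a-f]+)\+0x([0-9a-f]+)\]")


def module_offset(text):
    """Return (module name, offset of the address inside the module), or None if no match."""
    m = SYM_RE.search(text)
    if m is None:
        return None
    addr = int(m.group(1), 16)
    name = m.group(2)
    base = int(m.group(3), 16)   # assumed: load address of the module
    size = int(m.group(4), 16)   # assumed: size of the module's code region
    offset = addr - base
    if not 0 <= offset < size:
        raise ValueError("address lies outside the module range")
    return name, offset


line = ("[ 7757.232533] WARNING: CPU: 1 PID: 0 at include/net/dst.h:256 "
        "0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]")
name, offset = module_offset(line)
print(name, hex(offset))
```

Both traces above point at the same address, so they resolve to the same offset inside nf_conntrack_rtcache. With the module object from the matching build, that offset could be looked up in a disassembly to find the calling function.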

@SvenRoederer
Contributor

I just noticed that there are two sources of the kernel error:

  • PID: 0 at include/net/dst.h:256: in recent comments, related to "nf_conntrack"
  • PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324: related to "eth0 (mtk_soc_eth): transmit queue 0 timed out"

So are these two separate issues, or really one issue that causes different errors?
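
To keep the two kinds of warning apart when going through longer logs, the WARNING lines can be grouped by their source location. A minimal sketch, using only WARNING lines already posted in this thread; the function name `warning_sources` is made up for illustration.

```python
import re
from collections import Counter

# WARNING lines copied from the logs posted earlier in this thread.
LOG = """\
[ 2776.754179] WARNING: CPU: 3 PID: 0 at ./include/net/dst.h:256 0x8e8cd4d8
[48546.631924] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
[49547.058664] WARNING: CPU: 1 PID: 0 at ./include/net/dst.h:256 0x8e8cd4d8
[ 7757.232533] WARNING: CPU: 1 PID: 0 at include/net/dst.h:256 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
[32978.881419] WARNING: CPU: 2 PID: 0 at include/net/dst.h:256 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
"""

# Capture "path:line"; an optional leading "./" is dropped so both spellings group together.
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (?:\./)?(\S+):(\d+)")


def warning_sources(log):
    """Count kernel WARNING lines per source location (file:line)."""
    return Counter(f"{m.group(1)}:{m.group(2)}" for m in WARN_RE.finditer(log))


counts = warning_sources(LOG)
for location, n in counts.most_common():
    print(n, location)
```

Counting alone cannot show whether the two warnings share a cause, but it does make it easy to check whether a given router ever shows one without the other.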

@pmelange
Contributor

I don't want to see any kernel dumps of any kind :)

I'm leaving the router online until it crashes. Then I'll go back to the good old trusty WDR4900 with gonzo-rc2. I just hope I'm around when the router crashes and that the ca. 70 people who use freifunk around here won't be cut off from their youtube/facebook/ebay for too long.


Here is a kernel log from another rb350gr3. It has a mix of include/net/dst.h:256, net/sched/sch_generic.c:320 and mtk_soc_eth 1e100000.ethernet eth0: transmit timed out. This router suffers from the "dies or sthg" issue.

[20157.340982] ------------[ cut here ]------------
[20157.345672] WARNING: CPU: 2 PID: 0 at ./include/net/dst.h:256 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[20157.355717] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache libcrc32c iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables compat act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress gpio_beeper input_core
[20157.426928]  ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb ipip tunnel4 ip_tunnel veth leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
[20157.450316] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.98 #0
[20157.456345] Stack : 00000000 00000000 00000000 00000003 00000000 00000000 00000000 00000000
[20157.464745]         00000000 00000000 00000000 00000000 00000000 00000001 8fc11c30 ac07f57b
[20157.473113]         8fc11cc8 00000000 00000000 00003f00 00000038 8044e3d8 00000008 00000000
[20157.481469]         00000000 804e0000 0006df0c 00000000 8fc11c10 00000000 80500000 8e8a14c8
[20157.489816]         00000009 00000100 00000001 00000003 00000003 8027faf4 00000008 80540008
[20157.498169]         ...
[20157.500622] Call Trace:
[20157.500689] [<8044e3d8>] 0x8044e3d8
[20157.506615] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[20157.513379] [<8027faf4>] 0x8027faf4
[20157.516880] [<80010050>] 0x80010050
[20157.520359] [<80010058>] 0x80010058
[20157.523835] [<8043762c>] 0x8043762c
[20157.527322] [<80071254>] 0x80071254
[20157.530819] [<8002ee48>] 0x8002ee48
[20157.534319] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[20157.541111] [<8002ef0c>] 0x8002ef0c
[20157.544604] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[20157.551445] [<8ef84300>] 0x8ef84300 [ip_tables@8ef84000+0x2830]
[20157.557395] [<80368794>] 0x80368794
[20157.560906] [<8037226c>] 0x8037226c
[20157.564384] [<80368794>] 0x80368794
[20157.567883] [<80371eb0>] 0x80371eb0
[20157.571376] [<80370be8>] 0x80370be8
[20157.574858] [<8041443c>] 0x8041443c
[20157.578386] [<803703dc>] 0x803703dc
[20157.581881] [<80328024>] 0x80328024
[20157.585360] [<80015550>] 0x80015550
[20157.588865] [<8032aae4>] 0x8032aae4
[20157.592354] [<8032e314>] 0x8032e314
[20157.595824] [<80076bd0>] 0x80076bd0
[20157.599310] [<80454810>] 0x80454810
[20157.602792] [<800335ac>] 0x800335ac
[20157.606271] [<80235c68>] 0x80235c68
[20157.609790] [<8000b4c8>] 0x8000b4c8
[20157.613281] 
[20157.614909] ---[ end trace abc5a3d60b545c8d ]---
[20157.619801] dst_release: dst:8ec8d500 refcnt:-1
[52394.114464] ------------[ cut here ]------------
[52394.119113] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 0x80354ec8
[52394.126182] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[52394.133132] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache libcrc32c iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables compat act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress gpio_beeper input_core
[52394.204218]  ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb ipip tunnel4 ip_tunnel veth leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
[52394.227599] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W       4.14.98 #0
[52394.234801] Stack : 00000000 00000000 00000000 8fe73440 00000000 00000000 00000000 00000000
[52394.243156]         00000000 00000000 00000000 00000000 00000000 00000001 8fc15d60 ac07f57b
[52394.251500]         8fc15df8 00000000 00000000 00004c10 00000038 8044e3d8 00000008 00000000
[52394.259836]         00000000 804e0000 0003790f 00000000 8fc15d40 00000000 80500000 80354ec8
[52394.268173]         00000009 00000140 00000003 8fe73440 00000001 8027faf4 0000000c 8054000c
[52394.276506]         ...
[52394.278942] Call Trace:
[52394.279010] [<8044e3d8>] 0x8044e3d8
[52394.284916] [<80354ec8>] 0x80354ec8
[52394.288393] [<8027faf4>] 0x8027faf4
[52394.291867] [<80010050>] 0x80010050
[52394.295340] [<80010058>] 0x80010058
[52394.298813] [<8043762c>] 0x8043762c
[52394.302292] [<80070304>] 0x80070304
[52394.305790] [<8002ee48>] 0x8002ee48
[52394.309264] [<80354ec8>] 0x80354ec8
[52394.312746] [<8002eeac>] 0x8002eeac
[52394.316237] [<80354ec8>] 0x80354ec8
[52394.319710] [<80097870>] 0x80097870
[52394.323195] [<80354d1c>] 0x80354d1c
[52394.326670] [<80087074>] 0x80087074
[52394.330149] [<80087288>] 0x80087288
[52394.333628] [<80077850>] 0x80077850
[52394.337121] [<80454810>] 0x80454810
[52394.340593] [<800335ac>] 0x800335ac
[52394.344062] [<80235c68>] 0x80235c68
[52394.347543] [<8000b4c8>] 0x8000b4c8
[52394.351016] 
[52394.352590] ---[ end trace abc5a3d60b545c8e ]---
[52394.357270] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[52394.363451] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[52394.369553] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0eed0000, max=0, ctx=3398, dtx=3398, fdx=3397, next=3398
[52394.380516] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0e070000, max=0, calc=1412, drx=1413
[52394.394715] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5a60000c, 0x10c = 0x80818
[52394.410152] mtk_soc_eth 1e100000.ethernet: PPE started
[78432.202961] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[78432.209142] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[78432.215184] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f240000, max=0, ctx=3977, dtx=3977, fdx=3976, next=3977
[78432.226039] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0cfa0000, max=0, calc=2375, drx=2376
[78432.240695] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[78432.255306] mtk_soc_eth 1e100000.ethernet: PPE started
[94287.287041] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[94287.293232] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[94287.299288] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0cc30000, max=0, ctx=2725, dtx=2725, fdx=2724, next=2725
[94287.310261] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0ce00000, max=0, calc=2090, drx=2091
[94287.324265] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[94287.339236] mtk_soc_eth 1e100000.ethernet: PPE started
[130272.363155] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[130272.369432] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[130272.375549] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ea60000, max=0, ctx=948, dtx=948, fdx=947, next=948
[130272.386211] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0ed50000, max=0, calc=1107, drx=1108
[130272.400214] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5e60000c, 0x10c = 0x80818
[130272.414209] mtk_soc_eth 1e100000.ethernet: PPE started
[174322.500623] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[174322.506900] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[174322.513042] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f2e0000, max=0, ctx=797, dtx=797, fdx=796, next=797
[174322.523726] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0ce50000, max=0, calc=2701, drx=2702
[174322.537464] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[174322.551527] mtk_soc_eth 1e100000.ethernet: PPE started
[241107.744361] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[241107.750655] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[241107.756804] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0d960000, max=0, ctx=334, dtx=334, fdx=333, next=334
[241107.767423] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0cc20000, max=0, calc=1867, drx=1868
[241107.787661] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[241107.802788] mtk_soc_eth 1e100000.ethernet: PPE started
[382128.124724] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[382128.130997] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[382128.137129] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0cc80000, max=0, ctx=3821, dtx=3821, fdx=3820, next=3821
[382128.148126] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c160000, max=0, calc=576, drx=577
[382128.161844] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[382128.177572] mtk_soc_eth 1e100000.ethernet: PPE started
[408753.174559] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[408753.180827] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[408753.186982] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0e230000, max=0, ctx=1009, dtx=1009, fdx=1008, next=1009
[408753.198542] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0cdc0000, max=0, calc=4039, drx=4040
[408753.212265] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5e60000c, 0x10c = 0x80818
[408753.226380] mtk_soc_eth 1e100000.ethernet: PPE started
[436068.295884] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[436068.302158] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[436068.308296] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0c110000, max=0, ctx=2654, dtx=2654, fdx=2653, next=2654
[436068.319370] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c250000, max=0, calc=3828, drx=3830
[436068.333100] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[436068.347219] mtk_soc_eth 1e100000.ethernet: PPE started
[453368.346248] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[453368.352523] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[453368.358630] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0eed0000, max=0, ctx=3044, dtx=3044, fdx=3043, next=3044
[453368.369556] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c1d0000, max=0, calc=3659, drx=3660
[453368.383524] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5b60000c, 0x10c = 0x80818
[453368.398532] mtk_soc_eth 1e100000.ethernet: PPE started
[460258.369148] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[460258.375432] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[460258.381545] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ea20000, max=0, ctx=1432, dtx=1432, fdx=1431, next=1432
[460258.392620] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c210000, max=0, calc=3940, drx=3941
[460258.406803] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[460258.421713] mtk_soc_eth 1e100000.ethernet: PPE started
[510304.273624] ------------[ cut here ]------------
[510304.278398] WARNING: CPU: 3 PID: 0 at ./include/net/dst.h:256 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[510304.288468] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache libcrc32c iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables compat act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress gpio_beeper input_core
[510304.359545]  ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb ipip tunnel4 ip_tunnel veth leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
[510304.382947] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W       4.14.98 #0
[510304.390235] Stack : 00000000 00000000 00000000 00000003 00000000 00000000 00000000 00000000
[510304.398673]         00000000 00000000 00000000 00000000 00000000 00000001 8fc15c30 ac07f57b
[510304.407104]         8fc15cc8 00000000 00000000 00007b40 00000038 8044e3d8 00000008 00000000
[510304.415530]         00000000 804e0000 0005d7e3 20202020 8fc15c10 00000000 80500000 8e8a14c8
[510304.423960]         00000009 00000100 00000001 00000003 00000002 8027faf4 0000000c 8054000c
[510304.432400]         ...
[510304.434938] Call Trace:
[510304.435007] [<8044e3d8>] 0x8044e3d8
[510304.441105] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[510304.447953] [<8027faf4>] 0x8027faf4
[510304.451525] [<80010050>] 0x80010050
[510304.455080] [<80010058>] 0x80010058
[510304.458660] [<8043762c>] 0x8043762c
[510304.462233] [<80071254>] 0x80071254
[510304.465817] [<8002ee48>] 0x8002ee48
[510304.469416] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[510304.476320] [<8002ef0c>] 0x8002ef0c
[510304.479892] [<8e8a14c8>] 0x8e8a14c8 [nf_conntrack_rtcache@8e8a1000+0xaa0]
[510304.486758] [<80368794>] 0x80368794
[510304.490361] [<8037226c>] 0x8037226c
[510304.493953] [<80371eb0>] 0x80371eb0
[510304.497562] [<80370be8>] 0x80370be8
[510304.501142] [<8041443c>] 0x8041443c
[510304.504715] [<803703dc>] 0x803703dc
[510304.508297] [<80328024>] 0x80328024
[510304.511861] [<80015550>] 0x80015550
[510304.515439] [<8032aae4>] 0x8032aae4
[510304.519020] [<8032e314>] 0x8032e314
[510304.522584] [<80076bd0>] 0x80076bd0
[510304.526184] [<80454810>] 0x80454810
[510304.529746] [<800335ac>] 0x800335ac
[510304.533300] [<80235c68>] 0x80235c68
[510304.536868] [<8000b4c8>] 0x8000b4c8
[510304.540427] 
[510304.542062] ---[ end trace abc5a3d60b545c8f ]---
[510304.596544] dst_release: dst:8f7b7b80 refcnt:-1
[552153.577508] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[552153.583779] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[552153.589920] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ed50000, max=0, ctx=2821, dtx=2821, fdx=2820, next=2821
[552153.600961] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c200000, max=0, calc=904, drx=905
[552153.614498] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5c60000c, 0x10c = 0x80818
[552153.628660] mtk_soc_eth 1e100000.ethernet: PPE started
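To get a feel for how often the "transmit timed out" events in the log above recur, the kernel timestamps (seconds since boot) can be extracted and compared. This is only a minimal sketch: the function names `timeout_timestamps` and `gaps_in_hours` are made up for illustration, and the sample lines are copied from the log above rather than read from a live `dmesg`.

```python
import re

# Sample lines copied from the log above; in practice this text would
# come from the saved dmesg output of the affected router.
LOG = """\
[52394.357270] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[78432.202961] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[94287.287041] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[130272.363155] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
"""

# Matches "[<seconds since boot>] ... transmit timed out".
TIMEOUT_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s.*transmit timed out\s*$")


def timeout_timestamps(text):
    """Return the kernel timestamp of every 'transmit timed out' line."""
    stamps = []
    for line in text.splitlines():
        match = TIMEOUT_RE.match(line)
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def gaps_in_hours(stamps):
    """Return the time between consecutive timeouts, in hours."""
    return [(later - earlier) / 3600 for earlier, later in zip(stamps, stamps[1:])]


for gap in gaps_in_hours(timeout_timestamps(LOG)):
    print(f"{gap:.1f} h between timeouts")
```

Running the same two functions over the complete log would show whether the timeouts cluster at particular times or are spread evenly over the uptime.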

@pmelange
Contributor

pmelange commented Feb 26, 2020

And here, an ERX-SFP (coloniaallee)

[54115.113725] ------------[ cut here ]------------
[54115.122941] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:306 0x802a0ba0()
[54115.137357] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[54115.151252] Modules linked in: ifb iptable_nat nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache iptable_mangle iptable_filter ipt_ECN ip_tables act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress i2c_dev batman_adv libcrc32c cfg80211 compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables l2tp_ip6 l2tp_ip l2tp_eth l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ipip tunnel4 ip_tunnel leds_gpio gpio_button_hotplug crc32c_generic [last unloaded: ifb]
[54115.326904] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.167 #0
[54115.338847] Stack : 00000000 00000000 80436882 00000034 00000000 00000000 00000000 00000000
[54115.338847] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[54115.338847] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[54115.338847] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[54115.338847] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[54115.338847] 	  ...
[54115.409584] Call Trace:[<8001653c>] 0x8001653c
[54115.418461] [<8001653c>] 0x8001653c
[54115.425391] [<801a72cc>] 0x801a72cc
[54115.432331] [<8002bb90>] 0x8002bb90
[54115.439263] [<802a0ba0>] 0x802a0ba0
[54115.446192] [<8002bbec>] 0x8002bbec
[54115.453133] [<802a0ba0>] 0x802a0ba0
[54115.460060] [<802a0948>] 0x802a0948
[54115.466989] [<80070ca0>] 0x80070ca0
[54115.473918] [<8025cd7c>] 0x8025cd7c
[54115.480843] [<8006df94>] 0x8006df94
[54115.487769] [<80070ef0>] 0x80070ef0
[54115.494707] [<8002e6b4>] 0x8002e6b4
[54115.501651] [<8002e994>] 0x8002e994
[54115.508589] [<801cd270>] 0x801cd270
[54115.515531] [<80005988>] 0x80005988
[54115.522460] 
[54115.525577] ---[ end trace 0ef5542dd3a7a2f3 ]---
[54115.534808] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[54115.547186] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[54115.559230] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f31e000, max=512, ctx=440, dtx=440, fdx=439, next=440
[54115.580593] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0f33c000, max=512, calc=13, drx=14
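In the `tx_ring` dumps quoted in this thread, the `ctx`, `dtx` and `next` fields always show the same value when the timeout fires. A small sketch like the following could check that across a larger set of logs. The helper names `parse_ring_dump` and `indices_agree` are hypothetical, and the code compares the fields only as printed; what the driver means by each field is not stated in this thread.

```python
import re

# One ring dump line copied from the log above.
LINE = (
    "[54115.559230] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, "
    "base=0f31e000, max=512, ctx=440, dtx=440, fdx=439, next=440"
)


def parse_ring_dump(line):
    """Collect the key=value fields of a ring dump line as strings.

    Values are kept as strings because `base` is printed in hex while
    the other fields appear to be decimal (an assumption).
    """
    return dict(re.findall(r"(\w+)=([0-9a-fA-F]+)", line))


def indices_agree(fields):
    """True if ctx, dtx and next were printed with the same value."""
    return fields["ctx"] == fields["dtx"] == fields["next"]


fields = parse_ring_dump(LINE)
print(fields)
print("ctx/dtx/next agree:", indices_agree(fields))
```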

@pmelange
Contributor

The test router had another kernel warning and rebooted itself. Uptime: 114195 seconds (just under 32 hours).

Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.012727] ------------[ cut here ]------------
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.017472] WARNING: CPU: 1 PID: 0 at include/net/dst.h:256 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.027391] Modules linked in: batman_adv nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c iptable_mangle iptable_filter ip_tables compat gpio_beeper input_core nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ipip tunnel4 ip_tunnel leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.095014] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.14.171 #0
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.102392] Stack : 00000000 00000000 00000000 00000003 00000000 00000000 00000000 00000000
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.110822]         00000000 00000000 00000000 00000000 00000000 00000001 8fc0bb20 ac07f5b2
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.119254]         8fc0bbb8 00000000 00000000 00000000 00000038 804997f8 00000008 00000000
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.127701]         00000000 00000000 00017326 20202020 00000000 8fc0bb00 00000000 8f14455c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.136142]         8f1447fc 00000100 00000001 00000003 00000003 802c096c 00000004 80690004
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.144592]         ...
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.147124] Call Trace:
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.147177] [<804997f8>] 0x804997f8
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.153276] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.160136] [<802c096c>] 0x802c096c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.163716] [<8000bf28>] 0x8000bf28
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.167275] [<8000bf30>] 0x8000bf30
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.170835] [<80560000>] 0x80560000
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.174402] [<80482754>] 0x80482754
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.177976] [<800773c4>] 0x800773c4
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.181545] [<8002ed30>] 0x8002ed30
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.185121] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.191999] [<8002e9d4>] 0x8002e9d4
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.195566] [<8f14455c>] 0x8f14455c [nf_conntrack_rtcache@8f144000+0xaa0]
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.202429] [<803ad4e4>] 0x803ad4e4
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.206005] [<803b7310>] 0x803b7310
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.209577] [<803b6f10>] 0x803b6f10
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.213161] [<803b6020>] 0x803b6020
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.216732] [<80461630>] 0x80461630
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.220303] [<803b5670>] 0x803b5670
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.223892] [<8036f60c>] 0x8036f60c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.227468] [<80371d7c>] 0x80371d7c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.231025] [<803ba4cc>] 0x803ba4cc
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.234602] [<8046b2a8>] 0x8046b2a8
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.238177] [<8046b5f0>] 0x8046b5f0
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.241741] [<8046b318>] 0x8046b318
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.245306] [<8036f308>] 0x8036f308
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.248894] [<8036f91c>] 0x8036f91c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.252468] [<8037220c>] 0x8037220c
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.256036] [<8007d0c4>] 0x8007d0c4
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.259618] [<8049f950>] 0x8049f950
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.263187] [<800336c8>] 0x800336c8
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.266751] [<80275f24>] 0x80275f24
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.270323] [<80007388>] 0x80007388
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.273885]
Wed Feb 26 20:46:09 2020 kern.warn kernel: [114195.275612] ---[ end trace 8822d76274df463a ]---
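The uptime figure mentioned in this comment can be read straight off the kernel timestamp in the syslog lines above. A minimal sketch of that conversion follows; the helper names `uptime_seconds` and `format_uptime` are made up for illustration, and the sample line is the first line of the log above.

```python
import re

# First line of the log above (logread format: date, facility, then dmesg text).
LINE = (
    "Wed Feb 26 20:46:09 2020 kern.warn kernel: "
    "[114195.012727] ------------[ cut here ]------------"
)


def uptime_seconds(line):
    """Extract the kernel timestamp (seconds since boot) from a syslog line."""
    match = re.search(r"kernel: \[\s*(\d+\.\d+)\]", line)
    if match is None:
        raise ValueError("no kernel timestamp found")
    return float(match.group(1))


def format_uptime(seconds):
    """Render whole seconds as hours, minutes and seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


print(format_uptime(uptime_seconds(LINE)))  # prints "31h 43m 15s"
```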

@SvenRoederer
Contributor

> another rb750gr3.

So the MikroTik RB750Gr3 devices are affected too? Even though we don't see a complete network freeze here.

@pmelange
Contributor

> So the MikroTik RB750Gr3 devices are affected too?

All mt7621 devices are affected.

@pmelange
Contributor

> Even though we don't see a complete network freeze here.

Verklarung-core has almost no traffic. That's probably why it's not causing problems.

Perleberger36 and coloniaallee have a lot of traffic. If you look at their uptimes, every reboot was caused by a kernel crash: either the router rebooted itself or the watchdog rebooted it.

The test was done at scherer8, which also has a lot of traffic. After 32 hours it rebooted itself.

@SvenRoederer changed the title from "ubnt EdgeRouterX switch dies or sthg" to "ubnt EdgeRouterX switch dies or sthg (affects ramips-mt7621)" on May 26, 2020
@SvenRoederer
Contributor

Commit 179c140 references OpenWrt commit 498f1f4f5d, whose description suggests it might fix the cause of the problem.

@pmelange
Contributor

pmelange commented Jun 5, 2020

A new patch made it into the OpenWrt master branch: openwrt/openwrt#2942 (comment)

@SvenRoederer
Contributor

As usual, these OpenWrt commits were added automatically to the "daily/upstream-master" branch in 1d4f5c9.

So some tests need to be carried out.

@hmh

hmh commented Jan 23, 2023

This seems to have been fixed since OpenWrt 21.02. Should it be closed as fixed?

@Akira25
Member

Akira25 commented Jan 29, 2023

> This seems to have been fixed since OpenWrt 21.02. Should it be closed as fixed?

We could do that if you want. In any case, this project has not been maintained for some years. We use the falter firmware now:
https://github.com/freifunk-berlin/falter-packages
