FS#441 - Kernel crash: eth0 (ag71xx): transmit queue 0 timed out #5447

Open
openwrt-bot opened this issue Jan 28, 2017 · 8 comments

openwrt-bot commented Jan 28, 2017

amain:

Device: TL-WR1043ND v1
LEDE: snapshot r3189-12db207

During a simultaneous bidirectional iperf load test, the kernel crashes after about 20 minutes. I have reproduced this several times with this setup:

Server 1 <---> 1043ND <---> Laptop via wireless N

LEDE is running the default setup. The only changes are:

  • Wireless encryption set to PSK with a password
  • A DNAT rule so that server 1 can reach the iperf server on the laptop

This bug was actually discovered while testing fixes for FS#13 ("Ath9k AP stays up for connected clients but doesn't show in scan on new ones").

Serial console output:

[ 1294.022551] ------------[ cut here ]------------
[ 1294.027247] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:306 dev_watchdog+0x1dc/0x260()
[ 1294.035754] NETDEV WATCHDOG: eth0 (ag71xx): transmit queue 0 timed out
[ 1294.042319] Modules linked in: ath9k ath9k_common pppoe ppp_async ath9k_hw ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nn
[ 1294.106287] CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.45 #0
[ 1294.112066] Stack : 803e4844 00000000 00000001 80440000 8042f1dc 8042ee63 803c5e64 00000000 804a378c 8042d4fc 00000200 00100000 0000000a 800a7618 803cb554 80430000 00000003 8042d4fc 803c9960 81809e34 0000000a 800a5594 00000006 00000000 00000000 801f5400 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ...
[ 1294.148122] Call Trace:
[ 1294.150617] [<800a7618>] vprintk_default+0x24/0x30
[ 1294.155474] [<800a5594>] printk+0x2c/0x38
[ 1294.159515] [<801f5400>] wait_for_xmitr+0x84/0xcc
[ 1294.164289] [<80081c3c>] warn_slowpath_common+0xa0/0xd0
[ 1294.169564] [<801a72dc>] dump_stack+0x14/0x28
[ 1294.173975] [<80071eb0>] show_stack+0x50/0x84
[ 1294.178376] [<80081c3c>] warn_slowpath_common+0xa0/0xd0
[ 1294.183661] [<8028ef3c>] dev_watchdog+0x1dc/0x260
[ 1294.188408] [<80081c98>] warn_slowpath_fmt+0x2c/0x38
[ 1294.193450] [<8028ef3c>] dev_watchdog+0x1dc/0x260
[ 1294.198191] [<8028ed60>] dev_watchdog+0x0/0x260
[ 1294.202782] [<800b08d0>] call_timer_fn.isra.5+0x24/0x80
[ 1294.208051] [<800b0b54>] run_timer_softirq+0x1b4/0x1fc
[ 1294.213248] [<800a89f0>] handle_irq_event_percpu+0x154/0x188
[ 1294.218960] [<800841b8>] __do_softirq+0x250/0x298
[ 1294.223721] [<800abdac>] handle_percpu_irq+0x50/0x80
[ 1294.228746] [<8006a9e0>] plat_irq_dispatch+0xd4/0x10c
[ 1294.233848] [<80060bf4>] handle_int+0x134/0x140
[ 1294.238400]
[ 1294.239904] ---[ end trace 17bad011a41ccba7 ]---
[ 1294.244567] eth0: tx timeout
[ 1299.022570] eth0: tx timeout
[ 1304.022581] eth0: tx timeout
[ 1309.022588] eth0: tx timeout

The eth0: tx timeout line is repeated every 5 seconds.
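A minimal Python sketch to check that cadence from the console output itself: it pulls the printk timestamps (seconds since boot) of the tx timeout lines and returns the gaps between them. The regular expression and the function name tx_timeout_gaps are illustrative assumptions, not part of any LEDE tool; the pattern also tolerates a line break between a timestamp and its message.

```python
import re

# The tx timeout lines from the serial console log in this report.
LOG = """\
[ 1294.244567] eth0: tx timeout
[ 1299.022570] eth0: tx timeout
[ 1304.022581] eth0: tx timeout
[ 1309.022588] eth0: tx timeout
"""

# Matches "[ <seconds since boot>] <iface>: tx timeout"; \s+ also spans a newline.
TX_TIMEOUT = re.compile(r"\[\s*(\d+\.\d+)\]\s+(\S+): tx timeout")


def tx_timeout_gaps(log: str, iface: str = "eth0") -> list[float]:
    """Return the gaps in seconds between consecutive tx timeout messages for iface."""
    stamps = [float(m.group(1)) for m in TX_TIMEOUT.finditer(log) if m.group(2) == iface]
    # Pair each timestamp with the next one and round to milliseconds.
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(tx_timeout_gaps(LOG))
```

With the four lines from this report it returns [4.778, 5.0, 5.0]. The first gap is presumably shorter because that message was printed only after the warning dump, which began at 1294.022551; measured from there, the watchdog fired every 5 seconds.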

openwrt-bot commented Jan 29, 2017

IronicSven:

@johan: Regarding your question in FS#13: I can't run the tests simultaneously on the smartphone, but I will try to borrow a laptop to run them again.

openwrt-bot commented Jan 29, 2017

IronicSven:

Is your device overclocked, Johan?
I was able to reproduce a kernel crash/reboot on a TL-WR1043ND v1 unit overclocked to 430 MHz by running a bidirectional iperf load test for a few minutes. Reverting it to 400 MHz fixed it for me.

openwrt-bot commented Jan 29, 2017

amain:

Sven, thanks for trying this out too. No, I haven't been overclocking; I'm using the device as is, with only a serial console added.

[ 0.000000] Clocks: CPU:400.000MHz, DDR:400.000MHz, AHB:200.000MHz, Ref:5.000MHz

How long did you run the test without overclocking?
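The stock-versus-overclocked question can be read straight from the "Clocks:" boot line. A small Python sketch, assuming the line format shown above and taking 400 MHz as the stock CPU clock (the value in this log and the one Sven reverted to); parse_clocks and overclock_percent are hypothetical helper names.

```python
import re

# Boot log line as posted in this thread.
CLOCKS_LINE = "[ 0.000000] Clocks: CPU:400.000MHz, DDR:400.000MHz, AHB:200.000MHz, Ref:5.000MHz"

# Assumed stock CPU clock, taken from this thread.
STOCK_CPU_MHZ = 400.0


def parse_clocks(line: str) -> dict[str, float]:
    """Map each clock name in a 'Clocks:' boot line to its frequency in MHz."""
    _, _, body = line.partition("Clocks:")
    return {name: float(mhz) for name, mhz in re.findall(r"(\w+):([\d.]+)MHz", body)}


def overclock_percent(cpu_mhz: float, stock_mhz: float = STOCK_CPU_MHZ) -> float:
    """Percentage by which the CPU runs above the stock clock (0.0 means stock)."""
    return round((cpu_mhz / stock_mhz - 1.0) * 100.0, 1)


if __name__ == "__main__":
    clocks = parse_clocks(CLOCKS_LINE)
    print(clocks["CPU"], overclock_percent(clocks["CPU"]))
    print(overclock_percent(430.0))  # Sven's overclocked unit
```

For this log the CPU comes out 0.0 % above stock, while a 430 MHz unit is 7.5 % over.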

openwrt-bot commented Jan 29, 2017

IronicSven:

About 20 minutes. I can repeat the test if that wasn't long enough.

openwrt-bot commented Jan 29, 2017

amain:

If you have the time, please let the test run longer, and also generate some normal CPU load during the test. While the iperf test was running, I've also been using the router for normal Internet browsing, running top, etc. I'm hoping the issue will surface with some extra CPU load.

Installing packages using opkg seems to cause another issue (https://bugs.lede-project.org/index.php?do=details&task_id=120); in my case it once just rebooted without printing anything to the console. All in all, the 1043ND doesn't seem stable on master yet.

openwrt-bot commented Feb 1, 2017

IronicSven:

Johan, sorry, but I still can't reproduce this issue. I just tested the current snapshot for 40 minutes with a bidirectional iperf load, multiple PuTTY windows running top, and multiple browser windows open on LuCI.
I started from a default setup and only enabled Wi-Fi with WPA2-PSK, Force CCMP (AES) and a password.
Could you please explain what you mean by DNAT rule? Is it a port forwarding rule in the firewall settings?

openwrt-bot commented Feb 1, 2017

amain:

Thanks, Sven, for having another look. I don't want to put you through yet another round of tests; it looks like this is more a hardware issue with my device than a software issue. But in case you're still interested:

First I start iperf -c (client) on the laptop (192.168.1.152), which connects over Wi-Fi and is then NATed to my test server (192.168.100.0/24 network). Because of the NAT, the server can't connect back to the laptop without some help. So after the first iperf is started, I enter:

iptables -t nat -I PREROUTING -p tcp --dport 5001 -j DNAT --to 192.168.1.152

iptables -t filter -I FORWARD -j ACCEPT

Then the second iperf -c (client) is started on the server, which connects to the laptop.

I've been performing this test in my mini lab because this is how the 1043ND is going to be used once it's connected to the Internet.
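To repeat this lab setup, the two iptables commands from this comment can be generated, and later removed, from a script. A Python sketch, assuming the laptop address and iperf port from this comment; lab_rules and undo_rules are hypothetical names. It only builds the argument lists; running them (for example with subprocess.run) has to happen as root on the router. Each undo command repeats the rule with -D in place of -I, which makes iptables delete the matching rule.

```python
import shlex

# Values from this thread: the laptop's LAN address and the iperf port.
LAPTOP_IP = "192.168.1.152"
IPERF_PORT = 5001


def lab_rules(laptop_ip: str = LAPTOP_IP, port: int = IPERF_PORT) -> list[list[str]]:
    """Return the two iptables invocations used in this test, as argument lists."""
    return [
        # Rewrite the destination of incoming TCP connections to <port> so that
        # the server's iperf client reaches the laptop behind the router's NAT.
        ["iptables", "-t", "nat", "-I", "PREROUTING", "-p", "tcp",
         "--dport", str(port), "-j", "DNAT", "--to", laptop_ip],
        # Accept all forwarded traffic (lab only: this opens forwarding entirely).
        ["iptables", "-t", "filter", "-I", "FORWARD", "-j", "ACCEPT"],
    ]


def undo_rules(rules: list[list[str]]) -> list[list[str]]:
    """Turn each '-I' (insert) invocation into the matching '-D' (delete) one."""
    return [["-D" if arg == "-I" else arg for arg in rule] for rule in rules]


if __name__ == "__main__":
    for rule in lab_rules() + undo_rules(lab_rules()):
        print(shlex.join(rule))
```

Keeping the rules as argument lists rather than strings avoids shell quoting issues if they are later passed to subprocess.run.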

openwrt-bot commented Mar 5, 2017

psyborg55:

Have you both used the same revision for testing?
