MT7603 - persistent issues #167

Closed
CR-Ryan opened this issue Apr 17, 2018 · 51 comments

Comments

@CR-Ryan

CR-Ryan commented Apr 17, 2018

After pulling the latest commit (ca5cc9a), we are still seeing instability on 7603

Behavior is almost identical to what was described in this post

The device I am currently testing with is an LG V20, which uses a dual-band 11ac chipset. We have seen this same behavior across many other devices though - it's not just this one.

After initially connecting, the device remains connected for about 1 - 10 minutes, during which browsing and speedtests work fine. Eventually, the wireless connection gets "stuck". The device shows as connected to the network, but it cannot ping the AP, and the AP can't ping it. Obviously browsing doesn't work in this state. Disconnecting and reconnecting will typically fix this temporarily. Also, setting htmode to 'none' appears to prevent the issue entirely - but this is not a great fix.

There is interesting behavior showing in the station dumps. When in this "stuck" state, the rx bitrate will continuously jump between 24 Mbit/s and 144 Mbit/s. The two station dumps below were taken during a speedtest, 1 second apart, and display this behavior:

root@ph:/etc/config# iw dev wlan0 station dump
Station a8:b8:6e:7d:06:fe (on wlan0)
inactive time: 110 ms
rx bytes: 1102876
rx packets: 11322
tx bytes: 26489082
tx packets: 9515
tx retries: 333
tx failed: 9
rx drop misc: 7
signal: -46 [-47, -46] dBm
signal avg: -47 [-49, -47] dBm
tx bitrate: 144.4 MBit/s MCS 15 short GI
rx bitrate: 24.0 MBit/s
expected throughput: 46.875Mbps
authorized: yes
authenticated: yes
associated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
DTIM period: 2
beacon interval:100
short preamble: yes
short slot time:yes
connected time: 134 seconds

root@ph:/etc/config# iw dev wlan0 station dump
Station a8:b8:6e:7d:06:fe (on wlan0)
inactive time: 60 ms
rx bytes: 1102964
rx packets: 11324
tx bytes: 26489144
tx packets: 9516
tx retries: 333
tx failed: 9
rx drop misc: 7
signal: -53 [-54, -53] dBm
signal avg: -48 [-49, -48] dBm
tx bitrate: 144.4 MBit/s MCS 15 short GI
rx bitrate: 144.4 MBit/s MCS 15 short GI
expected throughput: 46.875Mbps
authorized: yes
authenticated: yes
associated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
DTIM period: 2
beacon interval:100
short preamble: yes
short slot time:yes
connected time: 135 seconds

This is in contrast to when it is working well, where the rx bitrate stays on 144Mbps continuously. Here is a station dump from when it was working fine:

Station a8:b8:6e:7d:06:fe (on wlan0)
inactive time: 10 ms
rx bytes: 1085972
rx packets: 11015
tx bytes: 26449459
tx packets: 9430
tx retries: 253
tx failed: 5
rx drop misc: 3
signal: -49 [-49, -51] dBm
signal avg: -47 [-47, -50] dBm
tx bitrate: 144.4 MBit/s MCS 15 short GI
rx bitrate: 144.4 MBit/s MCS 15 short GI
expected throughput: 87.158Mbps
authorized: yes
authenticated: yes
associated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
DTIM period: 2
beacon interval:100
short preamble: yes
short slot time:yes
connected time: 22 seconds
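
For anyone who wants to watch for the rx bitrate flapping automatically, here is a minimal Python sketch (not part of the original report). It parses `iw dev <iface> station dump` text and flags a station whose rx bitrate line carries no MCS index. The helper names and the abridged sample are illustrative assumptions. Also note that a legacy rx rate on its own may only mean the last received frame was a low-rate control or null frame, so treat it as a hint rather than proof of the stuck state.

```python
import re

# Abridged sample copied from the "stuck" station dump in this report.
SAMPLE = """\
Station a8:b8:6e:7d:06:fe (on wlan0)
tx bitrate: 144.4 MBit/s MCS 15 short GI
rx bitrate: 24.0 MBit/s
connected time: 134 seconds
"""

STATION_RE = re.compile(r"Station ([0-9a-fA-F:]{17}) \(on (\S+)\)")


def parse_station_dump(text):
    """Parse station dump text into {mac: {field name: raw value string}}."""
    stations = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = STATION_RE.match(line)
        if match:
            # A new "Station ..." header starts a new record.
            current = stations.setdefault(match.group(1), {"iface": match.group(2)})
        elif current is not None and ":" in line:
            # Every other line is "key: value" (also handles "beacon interval:100").
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return stations


def rx_is_legacy(fields):
    """True when the rx bitrate shows no 'MCS' index, i.e. a non-HT rate."""
    return "MCS" not in fields.get("rx bitrate", "")


if __name__ == "__main__":
    for mac, fields in parse_station_dump(SAMPLE).items():
        state = "legacy" if rx_is_legacy(fields) else "HT"
        print(mac, fields.get("rx bitrate"), state)
```

Feeding it the output of repeated `iw dev wlan0 station dump` runs would show how often the rate drops back to a legacy value.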

Here is our wireless config:

config wifi-device 'radio0'
option type 'mac80211'
option hwmode '11g'
option path 'pci0000:00/0000:00:01.0/0000:02:00.0'
option disabled '0'
option channel '6'
option country 'US'
option txpower '23'
option noscan '1'
option htmode 'HT40'
option log_level '1'

config wifi-iface 'default_radio0'
option device 'radio0'
option network 'lan'
option mode 'ap'
option hidden '0'
option disassoc_low_ack '0'
option ssid 'CleanRouter'
option encryption 'psk2'
option key '3453453456'

config wifi-device 'radio1'
option type 'mac80211'
option hwmode '11a'
option path 'pci0000:00/0000:00:00.0/0000:01:00.0'
option htmode 'VHT80'
option channel '149'
option country 'US'
option txpower '30'
option noscan '1'
option disabled '0'
option log_level '1'

config wifi-iface 'default_radio1'
option device 'radio1'
option hidden '0'
option encryption 'psk2'
option network 'lan'
option ssid 'CleanRouter5GHz'
option disassoc_low_ack '0'
option mode 'ap'
option key '3453453456'

Dmesg and logread look fine - no errors present. This is true even when the connection gets "stuck".

I can reproduce this reliably - I am happy to provide more info if needed.

Thanks!

@CR-Ryan
Author

CR-Ryan commented Apr 24, 2018

Update: Issues are still happening on e2eedc9. There is a definite improvement though - it is self-recovering now. Typically, it will recover in about 30 seconds. Just like before, the station dump shows odd behavior during the connection issues. The three dumps below were taken consecutively over 3 seconds, and you can see the rx bitrate bounce around.

root@ph:/etc/config# iw dev wlan0 station dump
Station a8:b8:6e:7d:06:fe (on wlan0)
inactive time: 30 ms
rx bytes: 9034283
rx packets: 98952
tx bytes: 266642551
tx packets: 98723
tx retries: 6356
tx failed: 117
rx drop misc: 63
signal: -39 [-39, -41] dBm
signal avg: -38 [-38, -41] dBm
tx bitrate: 130.0 MBit/s MCS 15
rx bitrate: 24.0 MBit/s
expected throughput: 33.507Mbps
authorized: yes
authenticated: yes
associated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
DTIM period: 2
beacon interval:100
short preamble: yes
short slot time:yes
connected time: 2224 seconds

root@ph:/etc/config# iw dev wlan0 station dump
Station a8:b8:6e:7d:06:fe (on wlan0)
inactive time: 40 ms
rx bytes: 9036852
rx packets: 98973
tx bytes: 266650403
tx packets: 98733
tx retries: 6358
tx failed: 117
rx drop misc: 63
signal: -38 [-38, -39] dBm
signal avg: -38 [-38, -39] dBm
tx bitrate: 130.0 MBit/s MCS 15
rx bitrate: 130.0 MBit/s MCS 15
expected throughput: 37.994Mbps
authorized: yes
authenticated: yes
associated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
DTIM period: 2
beacon interval:100
short preamble: yes
short slot time:yes
connected time: 2225 seconds

root@ph:/etc/config# iw dev wlan0 station dump
Station a8:b8:6e:7d:06:fe (on wlan0)
inactive time: 180 ms
rx bytes: 9037100
rx packets: 98981
tx bytes: 266650497
tx packets: 98734
tx retries: 6358
tx failed: 117
rx drop misc: 63
signal: -40 [-40, -40] dBm
signal avg: -39 [-39, -39] dBm
tx bitrate: 130.0 MBit/s MCS 15
rx bitrate: 24.0 MBit/s
expected throughput: 40.191Mbps
authorized: yes
authenticated: yes
associated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
DTIM period: 2
beacon interval:100
short preamble: yes
short slot time:yes
connected time: 2226 seconds
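
To put numbers on how little traffic moves between these dumps, the following Python sketch (an illustration added here, not the reporter's tooling) computes per-interval byte rates from the counters copied out of the three dumps. It uses the "connected time" field as a clock, which only has one-second resolution, so the rates are rough.

```python
# Counter values copied from the three consecutive station dumps above.
SNAPSHOTS = [
    {"connected": 2224, "rx bytes": 9034283, "tx bytes": 266642551},
    {"connected": 2225, "rx bytes": 9036852, "tx bytes": 266650403},
    {"connected": 2226, "rx bytes": 9037100, "tx bytes": 266650497},
]


def per_second_rates(snapshots):
    """Return rx/tx byte rates for each pair of consecutive snapshots."""
    rates = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        dt = cur["connected"] - prev["connected"]
        if dt <= 0:
            # Skip pairs where the clock did not advance.
            continue
        rates.append({
            "rx_Bps": (cur["rx bytes"] - prev["rx bytes"]) / dt,
            "tx_Bps": (cur["tx bytes"] - prev["tx bytes"]) / dt,
        })
    return rates


if __name__ == "__main__":
    for rate in per_second_rates(SNAPSHOTS):
        print(rate)
```

With these values the AP transmits roughly 7.9 kB in the first second and under 100 bytes in the next, which is consistent with a link that is associated but barely passing data.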

@Tom-Brouwer

Tom-Brouwer commented Apr 25, 2018

I'm seeing the same behaviour as CR-Ryan, using a Xiaomi Router 3G. Some time after connecting (a few minutes), loading web sites becomes very slow, and eventually none of them load anymore.

Weirdly enough, some existing connections appear to keep functioning during this behaviour. E.g. I can access the LuCI web interface, and if I start streaming Live TV right after connecting, I can continue streaming without a problem. Like with CR-Ryan, setting htmode to none, effectively disabling Wireless N, gets rid of the problem. I also experience none of these problems on my 5 GHz radio (MT7612). The wireless config for 2.4 GHz that gives me the problems is as follows:

config wifi-device 'radio0'
        option type 'mac80211'
        option hwmode '11g'
        option path 'pci0000:00/0000:00:00.0/0000:01:00.0'
        option htmode 'HT20'
        option country 'NL'
        option channel '13'
        option log_level '4'
        option txpower '16'
        option legacy_rates '0'

config wifi-iface 'default_radio0'
        option device 'radio0'
        option network 'lan'
        option mode 'ap'
        option encryption 'psk2+ccmp'
        option key 'WIFIPW'
        option ssid 'WIFINAME'

@mastum

mastum commented Apr 28, 2018

I have updated to OpenWrt SNAPSHOT r6772 and now I have the same problem with an iPad connected to 2.4 GHz.
When it happens, I find "disconnected due to excessive missing ACKs" in the log; the device shows as connected to the network but it cannot ping other devices.
I solved it for the moment with option disassoc_low_ack 0.

@mastum

mastum commented Apr 29, 2018

Correction: option disassoc_low_ack 0 did not solve the problem.

@CR-Ryan
Author

CR-Ryan commented Apr 30, 2018

@Mafesa - We are seeing the same thing - using the disassoc_low_ack 0 option doesn't fix the issue for us either.

@mastum

mastum commented Apr 30, 2018

@CR-Ryan I do not know if it's a coincidence, maybe nbd knows why, but I have updated to r6787, set option legacy to 0 and disabled Flow Offload.
For now I have no problems for 22 hours with 10 clients connected.

@CR-Ryan
Author

CR-Ryan commented Apr 30, 2018

@Mafesa - I just tried option legacy_rates '0' (and '1'), and neither seemed to fix the issue for me. I hadn't heard of Flow Offload before, and couldn't find anything on the wiki about it. Is that a wireless option?

@Tom-Brouwer

Tom-Brouwer commented May 1, 2018

@CR-Ryan I think he means this: https://forum.lede-project.org/t/xiaomi-wifi-router-3g/5377/696 . It's a firewall feature. I continue to have the problems whether it's on or off, although I had it switched on in my most recent tests. If I have time, I'll try it in combination with legacy_rates 0 soon, to see if it has any effect for me...

@CR-Ryan
Author

CR-Ryan commented May 2, 2018

@Tom-Brouwer Thanks for that - it looks like that feature is off by default, but I set it to 0 just to be safe. Made no difference, issues persist. I just noticed the config you posted was missing this option:

option disassoc_low_ack '0'

Have you tried running with that set to zero?

@mastum

mastum commented May 2, 2018

> For now I have no problems for 22 hours with 10 clients connected.

This morning the router rebooted itself and now it is very unstable; 2.4 GHz is not usable.

> Have you tried running with that set to zero?

@CR-Ryan @Tom-Brouwer

Do not waste time, it's useless.

Does anyone remember when the problems started?
With which commit?

@slthomason

slthomason commented May 2, 2018 via email

@mastum

mastum commented May 3, 2018

I'll try to figure out where the problem is. I'll start by testing the commits since April 12th.

@Tom-Brouwer

@CR-Ryan I'm not sure, but I think I tried option disassoc_low_ack '0' before. I'll try to test that this weekend.

@slthomason For me, this issue also occurs with HTMODE set to 20...

@mastum

mastum commented May 4, 2018

I tried HT20 and HT40 with no results; the only way to get a stable 2.4 GHz connection is to set "Legacy" in LuCI and disable HT mode.

My stable config for 2.4

config wifi-device 'radio0'
	option type 'mac80211'
	option hwmode '11g'
	option path 'pci0000:00/0000:00:00.0/0000:01:00.0'
	option channel '1'
	option country '00'
	option legacy_rates '1'

@slthomason

slthomason commented May 4, 2018 via email

@CR-Ryan
Author

CR-Ryan commented May 4, 2018

@Tom-Brouwer Yeah, HT20 has the same issues. We have been fighting these issues for over a year now.

@Mafesa The legacy option didn't do anything for us, but setting the HTmode to none does fix it.

@mastum

mastum commented May 4, 2018

@CR-Ryan
Sorry, I explained myself badly...
The legacy option I meant isn't option legacy_rates '0' but the LuCI option that sets HTmode to none.

@mastum

mastum commented May 4, 2018

> We have been fighting these issues for over a year now.

I have had this problem only since April 12th.

@CR-Ryan
Author

CR-Ryan commented May 7, 2018

@Mafesa Ok - thx for clarifying on the legacy option

We first started using MT7603 hardware in April of last year, and had problems from day 1. The HTmodes didn't work correctly back then either. It is definitely better now, but it is still not 100% stable on 7603. We also tested the stock firmware provided by the manufacturer of our equipment, and the drivers they are using work fine. The issue is they are on a really old kernel, so we can't use their firmware.

@slthomason

@nbd168 - see my comments in #152. But I believe these are the same issues. Can we implement the same checks and behavior for mt7603?

@jsantala

Yes, I'm also using mt7603e (I think) on VoCore2 and I have packet loss issues (using only adhoc), so if the fix for #152 by @nbd168 also affects mt7603 it would be great news!

@nbd168
Member

nbd168 commented May 18, 2018

I ran some tests; MT7603 does not have the same issue.

@slthomason

slthomason commented May 18, 2018 via email

@nbd168
Member

nbd168 commented May 18, 2018

You could make a pcap of the issue in action, with a separate device for monitoring

@slthomason

slthomason commented May 18, 2018 via email

@CR-Ryan
Author

CR-Ryan commented May 18, 2018

@nbd168 - Here is our first stab at a packet capture. The tcpdump below was taken while a device was in a "bad state". It was connected to the AP, but couldn't browse or ping the AP.

tcpdump -vv host android-7638b9f93e1c93d0 -i wlan0 > /etc/tcpdump

13:25:49.688688 ARP, Ethernet (len 6), IPv4 (len 4), Reply is-at 78:a3:51:2d:a1:26 (oui Unknown), length 28
13:25:50.779824 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has tell android-7638b9f93e1c93d0.lan, length 28
13:25:50.779965 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has tell android-7638b9f93e1c93d0.lan, length 28
13:25:50.780138 ARP, Ethernet (len 6), IPv4 (len 4), Reply is-at 78:a3:51:2d:a1:26 (oui Unknown), length 28
13:25:51.698146 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has tell android-7638b9f93e1c93d0.lan, length 28
13:25:51.698265 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has tell android-7638b9f93e1c93d0.lan, length 28
13:25:51.698455 ARP, Ethernet (len 6), IPv4 (len 4), Reply is-at 78:a3:51:2d:a1:26 (oui Unknown), length 28
13:25:52.697837 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has tell android-7638b9f93e1c93d0.lan, length 28
(This output repeated itself several times over)

I only know of two ways to capture packets: tcpdump and wireshark. Is there a better way for us to do this? I know wireshark can capture radio layer packets, but that requires "monitor mode", which I have not been able to enable. I don't think my device's wireless driver supports it.

Please let us know if there is any other info we can grab for you - we are anxious to fix this - we have thousands of customers impacted by this bug. We can easily reproduce this when using any htmode that is not "NONE". (So HT20/40) We are totally willing to spin up remote access to both the router AND the affected device during one of these issues if that would help you.

We know you are busy, and so we appreciate any time you can spend on this. In fact, we would be more than happy to buy you pizza + a sixer of your favorite beer if you can fix this. Seriously. : )

@nbd168
Member

nbd168 commented May 19, 2018

Only real monitor mode captures are helpful for debugging this. The output should be a real .pcap file.

By the way, I've pushed some more fixes for MT7603 and MT7628, please test the latest version from mt76.git (not pushed to OpenWrt yet).

@Rising-Sun

Rising-Sun commented May 20, 2018

The problem persists with driver versions d87e4b0 and 792dbe0.
Tested with a Xiaomi mir3g as router and an iPhone 7 as client. (Additionally, an iPhone 5s and an Android box are connected to the router, but usually on 5 GHz.)

Thanks for the hard work.

@intriguedlife

intriguedlife commented May 20, 2018

Hello,

I also experience similar problems with MT7603. The issues start immediately after connecting. As has been mentioned, these problems go away when setting htmode to none. I am using a Xiaomi 3G router with openwrt master and the latest mt76 driver: 792dbe0.

I have captured WLAN frames using the WLAN card of a second device in monitor mode:

mt7603.pcapng is a capture when htmode is set to HT40 and running a speedtest. Initially there is no internet connection, after disconnecting from the AP and reconnecting a connection is possible but the speeds fluctuate and then go down to no connectivity yet again.

mt7603_htmode_none.pcapng is a capture when htmode is set to none and running the same speedtest. I experienced no connectivity or stability problems at all.

There are a lot of "Null function" frames present in both captures. I am no wifi expert, but this looks weird to me.

Data packets are often supplied to the packet capture mechanism, by default, as "fake" Ethernet packets, synthesized from the 802.11 header; you don't see the real 802.11 link-layer header.
https://wiki.wireshark.org/CaptureSetup/WLAN#Data_Packets

(You can use the following display filter in wireshark to hide the beacon frames that otherwise clutter the view: wlan.fc.type_subtype != 0x08 )

captures.zip

Hopefully these captures help to solve these issues.
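
As background for the display filter mentioned above, this small Python sketch (added for illustration, not taken from the captures) shows how the first Frame Control octet of an 802.11 frame maps to the value Wireshark exposes as wlan.fc.type_subtype. Beacons come out as 0x08 and "Null function" data frames as 0x24. The function and constant names are my own.

```python
BEACON = 0x08          # management frame (type 0), subtype 8
NULL_FUNCTION = 0x24   # data frame (type 2), subtype 4: no payload
QOS_DATA = 0x28        # data frame (type 2), subtype 8


def wireshark_type_subtype(fc_first_octet):
    """Combine the 802.11 type and subtype bits as (type << 4) | subtype.

    In the first Frame Control octet, bits 0-1 are the protocol version,
    bits 2-3 the frame type and bits 4-7 the subtype.
    """
    frame_type = (fc_first_octet >> 2) & 0x3
    subtype = (fc_first_octet >> 4) & 0xF
    return (frame_type << 4) | subtype


def is_beacon(fc_first_octet):
    """True for frames the filter 'wlan.fc.type_subtype != 0x08' would hide."""
    return wireshark_type_subtype(fc_first_octet) == BEACON


if __name__ == "__main__":
    for octet in (0x80, 0x48, 0x88):
        print(hex(octet), hex(wireshark_type_subtype(octet)))
```

A filter such as wlan.fc.type_subtype == 0x24 would therefore isolate the Null function frames noted in both captures.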

@CR-Ryan
Author

CR-Ryan commented May 22, 2018

We just pulled the latest OpenWRT code for 18.06, and are having build issues. Both wireless radios won't come up, and we get some nasty error messages. Is anyone else seeing this?

https://github.com/openwrt/mt76/issues/173

@jsantala

VoCore2 with mt7603e running OpenWrt SNAPSHOT, r7050-9c409cb:
--- 192.168.12.1 ping statistics ---
172 packets transmitted, 30 packets received, 82% packet loss
round-trip min/avg/max = 55134.079/62646.031/67986.789 ms
Nothing in the logs as far as I can tell. The hosts are right next to each other. The ping will at first drop all packets, but eventually start getting some packets through with huge round-trip values.
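
For comparing runs like this one, here is a small Python sketch (an added illustration, not the poster's tooling) that pulls the counts out of a ping summary line in the format shown above and recomputes the loss percentage with integer division. The regular expression assumes exactly this "packets transmitted, ... packets received" wording.

```python
import re

SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) packets received")


def parse_ping_summary(line):
    """Return (sent, received, loss_percent) or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if not match:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    # Integer percentage, rounded down; guard against division by zero.
    loss = (sent - received) * 100 // sent if sent else 0
    return sent, received, loss


if __name__ == "__main__":
    line = "172 packets transmitted, 30 packets received, 82% packet loss"
    print(parse_ping_summary(line))
```

Run over the summary line above, it reproduces the reported 82% loss from the raw counts.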

@jsantala

Ah, ok, it seems to happen with other wifi too, tried with RT5370 against RTL8192CU, at first anyway:
--- 192.168.13.1 ping statistics ---
85 packets transmitted, 40 packets received, 52% packet loss
round-trip min/avg/max = 2950.277/4330.621/5806.895 ms

But wait, there's more: after I took the mt7603e interfaces down (ifconfig wlan0 down) and waited a bit, the results on the other usb wifi devices got better, at least part of the time:
--- 192.168.13.1 ping statistics ---
115 packets transmitted, 107 packets received, 6% packet loss
round-trip min/avg/max = 1.689/657.916/2421.064 ms

The RT5370 host had some of this in the logs though:
Mon May 28 08:42:08 2018 kern.warn kernel: [ 607.710597] ieee80211 phy1: rt2800usb_txdone: Warning - Data pending for entry 7 in queue 2

So I swapped the RT5370 for a RTL8188CUS and it seems to talk to RTL8192CU quite ok right away:
--- 192.168.13.1 ping statistics ---
101 packets transmitted, 101 packets received, 0% packet loss
round-trip min/avg/max = 2.281/115.664/385.810 ms

Then going back to the mt7603e to mt7603e I get very little again.

All devices being tested at the same adhoc network:
phy#1
Interface wlan1
ifindex 5
wdev 0x100000002
addr 00:13:ef:50:13:a0
ssid vad-B9gRYA
type IBSS
channel 11 (2462 MHz), width: 20 MHz (no HT), center1: 2462 MHz
txpower 20.00 dBm
phy#0
Interface wlan0
ifindex 6
wdev 0x2
addr b8:d8:12:67:68:19
ssid vad-B9gRYA
type IBSS
channel 11 (2462 MHz), width: 20 MHz (no HT), center1: 2462 MHz

I also have batman set up among all the interfaces, but for these tests I used the interfaces directly.

@jsantala

Just saw this immediately after reboot:
[ 20.982146] wlan0: Created IBSS using preconfigured BSSID 02:50:45:4c:xx:xx
[ 20.989291] wlan0: Creating new IBSS network, BSSID 02:50:45:4c:xx:xx
[ 21.033805] ------------[ cut here ]------------
[ 21.038801] WARNING: CPU: 0 PID: 9 at backports-2017-11-01/net/mac80211/ibss.c:1087 ieee80211_get_vht_mask_from_cap+0x1784/0x1a9c [mac80211]
[ 21.051603] Modules linked in: pppoe ppp_async cdc_mbim rtl8192cu rtl8192c_common rtl_usb rt2800usb rt2800lib rndis_host pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE huawei_cdc_ncm cdc_subset cdc_ncm cdc_ether xt_time xt_tcpudp xt_nat xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD usbserial usbnet slhc rtlwifi rt2x00usb rt2x00lib nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack mt76x2e mt7603e mt76 mac80211 iptable_mangle iptable_filter ip_tables crc_itu_t crc_ccitt cdc_wdm cdc_acm ledtrig_usbport batman_adv libcrc32c cfg80211 compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common
[ 21.124061] ip6table_mangle ip6table_filter ip6_tables x_tables mmc_block usb_storage mtk_sd mmc_core leds_gpio ohci_platform ohci_hcd ehci_platform sd_mod scsi_mod ehci_hcd gpio_button_hotplug usbcore nls_base usb_common crc16 mii crc32c_generic crypto_hash
[ 21.147421] CPU: 0 PID: 9 Comm: kworker/u2:1 Not tainted 4.14.43 #0
[ 21.153893] Workqueue: phy0 ieee80211_ibss_leave [mac80211]
[ 21.159563] Stack : 874edbf0 874ecb80 00000000 8004e5b4 87c22ed4 8041b8e7 803c98ec 00000009
[ 21.168066] 803c9858 87ccfacc 874edbf0 8004f290 00000000 00000001 87ccfaa8 0844f86a
[ 21.176563] 00000000 00000000 80430a70 000000b5 00000000 31313230 7362695f 656c5f73
[ 21.185054] 65656569 00000000 00000006 31323038 00000000 00000000 8759210c 875d9dd0
[ 21.193555] 00000009 0000043f 874edbf0 874ecb80 00000002 801d1ce8 00000000 80470000
[ 21.202053] ...
[ 21.204535] Call Trace:
[ 21.207048] [<8000e384>] show_stack+0x74/0x104
[ 21.211576] [<800244f0>] __warn+0x110/0x118
[ 21.215823] [<8002458c>] warn_slowpath_null+0x1c/0x30
[ 21.221091] [<8759210c>] ieee80211_get_vht_mask_from_cap+0x1784/0x1a9c [mac80211]
[ 21.228808] ---[ end trace c2db3b5d0024ee5e ]---

@CR-Ryan
Author

CR-Ryan commented May 29, 2018

Update:
For the last week we have been having issues described here.

Fortunately, a fix was pushed earlier today and wireless is working correctly again. However, we are still seeing instability on 7603, when using any htmodes. This is on the latest commit. (792dbe0)

@jsantala

I also confirm that on openwrt commit 6c81c27 things are back to "normal" - mt7603e wifi works somewhat, but not enough to maintain a stable ssh connection, for example, even with htmode NONE. I run IBSS only with batman on top. If I switch to known-good USB sticks everything works ok while keeping everything else the same.

@WBINVD

WBINVD commented Jun 1, 2018

I find the output of tcpdump on wlan0 interesting. When the wireless interface becomes unstable, EAPOL and DHCP work fine, but mDNS and ARP only work in one direction - from the wireless client to the rest of the network.

The ARP responses from the LAN network do not show up in the wlan0 tcpdump but do on br-lan. Something is dropping the ARP traffic, possibly before the wireless driver can process it.

I tested this on an MT7620 which I believe uses the rt2800pci driver and not mt76. Setting htmode to none didn't help in my case.

@lukasz1992

I found out that IEEE80211_TX_STAT_ACK can be set even with MT_TXS0_ACK_TIMEOUT set.

I don't know if repairing it could help, but could you try replacing

if (!fixed_rate && !ack_timeout)
	sta->ampdu_acked++;
info->flags |= IEEE80211_TX_STAT_ACK;

with

if (!fixed_rate && !ack_timeout) {
	sta->ampdu_acked++;
	info->flags |= IEEE80211_TX_STAT_ACK;
}

?
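
To make the difference between the two variants quoted above concrete, here is a small Python model (an illustration of the control flow only, not driver code). It treats fixed_rate and ack_timeout as booleans and lists the input combinations where the unbraced and braced versions disagree about setting the ACK flag.

```python
from itertools import product


def ack_flag_without_braces(fixed_rate, ack_timeout):
    """Unbraced variant: only the counter update is guarded by the 'if'.

    The flag assignment sits outside the condition, so it always runs.
    """
    return True


def ack_flag_with_braces(fixed_rate, ack_timeout):
    """Braced variant: the flag is set only when both conditions are false."""
    return (not fixed_rate) and (not ack_timeout)


def differing_cases():
    """Return the (fixed_rate, ack_timeout) pairs where the variants differ."""
    return [
        (fixed_rate, ack_timeout)
        for fixed_rate, ack_timeout in product([False, True], repeat=2)
        if ack_flag_without_braces(fixed_rate, ack_timeout)
        != ack_flag_with_braces(fixed_rate, ack_timeout)
    ]


if __name__ == "__main__":
    print(differing_cases())
```

One thing the model makes visible: the braced version also withholds the flag for fixed-rate frames that did not time out, which may or may not be the intended behaviour.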

@jsantala

jsantala commented Jun 6, 2018

Current state at openwrt 7590c3c is that if I have just one VoCore2 mt7603e device in my mesh (adhoc with batman on top, HT NONE) things seem to be quite ok, at least initially. However, if I add another mt7603e device things get hairy. New device won't work at all and the one next to it that was ok just a moment ago goes bad as well. I can have many devices in the same mesh with non-mt76 chips, but only one mt7603e at a time. Don't know if this helps at all, but that's my observation as of 7590c3c.

@jsantala

jsantala commented Jun 11, 2018

This is what happens when I have a ping against 8.8.8.8 and a second mt7603e device joins the same adhoc+batman network:
64 bytes from 8.8.8.8: seq=24 ttl=56 time=11.771 ms
64 bytes from 8.8.8.8: seq=25 ttl=56 time=11.807 ms
64 bytes from 8.8.8.8: seq=26 ttl=56 time=14.153 ms
64 bytes from 8.8.8.8: seq=27 ttl=56 time=39.726 ms
64 bytes from 8.8.8.8: seq=28 ttl=56 time=1080.775 ms
64 bytes from 8.8.8.8: seq=29 ttl=56 time=2680.045 ms
64 bytes from 8.8.8.8: seq=30 ttl=56 time=3522.285 ms
64 bytes from 8.8.8.8: seq=31 ttl=56 time=4724.123 ms
64 bytes from 8.8.8.8: seq=32 ttl=56 time=5309.847 ms

The pings will get even higher and I have to reboot the first device before it works again:
64 bytes from 8.8.8.8: seq=176 ttl=56 time=11824.587 ms
64 bytes from 8.8.8.8: seq=179 ttl=56 time=11968.502 ms
64 bytes from 8.8.8.8: seq=180 ttl=56 time=11353.341 ms
64 bytes from 8.8.8.8: seq=182 ttl=56 time=12228.260 ms

After reboot that single mt7603e device is happy again - as long as it's the only one:
64 bytes from 8.8.8.8: seq=91 ttl=56 time=14.940 ms
64 bytes from 8.8.8.8: seq=92 ttl=56 time=12.170 ms
64 bytes from 8.8.8.8: seq=93 ttl=56 time=12.416 ms
64 bytes from 8.8.8.8: seq=94 ttl=56 time=11.646 ms
64 bytes from 8.8.8.8: seq=96 ttl=56 time=27.127 ms
64 bytes from 8.8.8.8: seq=97 ttl=56 time=11.264 ms

@laoshaw

laoshaw commented Jul 25, 2018

From my study into mt7603en, it seems this is not purely a software/driver issue but a hardware one. In short, this 2.4 GHz chip itself is not robust; a router replacement (one that does not use mt7603en for 2.4 GHz) is the only dependable fix.

@slthomason

slthomason commented Jul 25, 2018 via email

@laoshaw

laoshaw commented Aug 30, 2018

Running the newest 18.06.1 on a zbt1326, 2.4 GHz still fails often with htmode set to either HT20 or none (legacy 11bg), to the point of being unusable; using legacy (11bg) alone does not give me a stable connection. Another non-mt7603e 2.4 GHz router running 18.06.1 stays connected 24x7.

@nbd168
Member

nbd168 commented Aug 30, 2018

Which git revision are you running?

@laoshaw

laoshaw commented Aug 30, 2018

For openwrt 18.06 git:
commit 159a52e1c2d0889bbb137c42df0062e7df24cac3
Author: Giuseppe Lippolis giu.lippolis@gmail.com
Date: Sun Aug 26 10:52:27 2018 +0200
comgt: increase timeout on runcommands

for mt76:
commit 14580aaf81c692b0f54b4e7aa003f20eeb8705f6
Author: Felix Fietkau nbd@nbd.name
Date: Wed Aug 22 12:31:55 2018 +0200

@laoshaw

laoshaw commented Aug 30, 2018

I drew a live graph to monitor /proc/net/wireless and ping on the PC in parallel; in the background I download some data from the router constantly to generate the traffic for my testing. My router is a zbt1326, which has mt7603en for 2.4G.
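
For readers who want to reproduce this kind of monitoring, here is a minimal Python sketch for reading link quality out of /proc/net/wireless (my own illustration, not the script used for the graph). The sample text uses made-up values in the usual layout of that file: two header lines, then one line per interface.

```python
# Hypothetical sample in the usual /proc/net/wireless layout (values invented).
SAMPLE = """\
Inter-| sta-|   Quality        |   Discarded packets               | Missed | WE
 face | tus | link level noise |  nwid  crypt   frag  retry   misc | beacon | 22
 wlan0: 0000   70.  -40.  -256        0      0      0      0      0        0
"""


def parse_proc_net_wireless(text):
    """Return {interface: {"status", "link", "level"}} from the file's text."""
    result = {}
    # The first two lines are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 3:
            continue
        result[name.strip()] = {
            "status": fields[0],
            # Quality values are printed with a trailing dot, e.g. "70."
            "link": float(fields[1].rstrip(".")),
            "level": float(fields[2].rstrip(".")),
        }
    return result


if __name__ == "__main__":
    print(parse_proc_net_wireless(SAMPLE))
```

On a live system the text would come from reading /proc/net/wireless once per sampling interval and logging the link and level values next to the ping result.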

Final testing result:
On a laptop with ath9k the connection to the mt76 AP is stable; it lost network a few times while wifi was still active, and it recovers the network in about 4 minutes. Not perfect but acceptable.
On a PC with bcm4313 running ubuntu18.04, which loads the 'wl' driver by default, it's very unstable: network will disconnect once in a while, sometimes even wifi will also disconnect, and most of the time I have to restart network to get things back. Switching to the 'brcmsmac' driver dramatically stabilized all of this.
Note, the same 'wl' driver worked well with openwrt running ath9k 2.4Ghz.

Both STAs lose network 3~4 times per day (wifi signal stays, just no network); roughly 1 of them will never resume the network connection, and I had to re-connect to the Openwrt to get both wifi associated and network working.

In short, the mt7603en AP has issues when the 'wl' STA is used; it works well with the ath9k STA. The 'wl' STA works well with an ath9k AP however.

@nbd168
Member

nbd168 commented Sep 29, 2018

Please test the latest version from OpenWrt master or the 18.06 branch.

@laoshaw

laoshaw commented Sep 29, 2018

With brcmsmac, which had not had even a single issue for weeks with a tplink C7 running 18.06, I just upgraded a zbt1326 to the newest 18.06 build and connected via the same brcmsmac STA. The network (not the wireless) was gone after 20 minutes (downloading traffic in the background via 2.4Ghz), and it did not recover (for another 20 minutes). The wifi speed is 5Mbps with wget for a 50MB image file.

On another laptop with an ath9k STA it has now been running for 40 minutes and the connection stays solid, with downloads at 24Mbps via the same 2.4Ghz (channel 11, which is not crowded here). Everything else, as far as network configuration etc. goes, is the same for both test scenarios; the only difference is the STA.

In short, the zbt1326 is still not robust, especially when I use it with the brcmsmac STA. It is probably worse now, as in previous tests I did not lose network this quickly, and when it was lost, most of the time it could recover.

@nbd168
Member

nbd168 commented Sep 30, 2018

I need to know, which of the last commits made it worse for you. Please try this:

Edit package/kernel/mt76/Makefile, set PKG_SOURCE_VERSION to these values, one by one, in order, and run make (no make clean required) and test again with the brcmsmac sta after each change:
27af7a570f8eb71b66e98dab8e5a0f6683f292c1
497c30431b8635b6865ef80d72cc89784257c9ca
6e1898d60a780c9d89dff9cbb3569f267db13e21
980c60666eb57daa144d8725712ba303e876e677
7daf9621ada28638831beca3073ee3e2b8e6609e

Please stop at the first hash value that makes it work again, and let me know which one that was.

Thanks
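
The edit-and-rebuild loop described above could be partly scripted. The Python sketch below is only an illustration: it rewrites the PKG_SOURCE_VERSION line in Makefile text for each candidate hash from the list above. The helper name and the sample Makefile fragment are assumptions, and the build and the test with the brcmsmac sta still have to be done by hand after each change.

```python
import re

# Candidate mt76 commits from the list above, in the order to try them.
CANDIDATES = [
    "27af7a570f8eb71b66e98dab8e5a0f6683f292c1",
    "497c30431b8635b6865ef80d72cc89784257c9ca",
    "6e1898d60a780c9d89dff9cbb3569f267db13e21",
    "980c60666eb57daa144d8725712ba303e876e677",
    "7daf9621ada28638831beca3073ee3e2b8e6609e",
]

VERSION_RE = re.compile(r"^(PKG_SOURCE_VERSION[ \t]*:?=[ \t]*).*$", re.MULTILINE)


def set_pkg_source_version(makefile_text, commit):
    """Return makefile_text with the PKG_SOURCE_VERSION value replaced by commit."""
    new_text, count = VERSION_RE.subn(
        lambda m: m.group(1) + commit, makefile_text, count=1
    )
    if count != 1:
        raise ValueError("PKG_SOURCE_VERSION line not found")
    return new_text


if __name__ == "__main__":
    # Hypothetical fragment standing in for package/kernel/mt76/Makefile.
    sample = "PKG_NAME:=mt76\nPKG_SOURCE_VERSION:=0000000\n"
    print(set_pkg_source_version(sample, CANDIDATES[0]))
```

In practice one would read the real Makefile, write the result back, run make, test, and stop at the first hash that works, as requested.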

@laoshaw

laoshaw commented Oct 3, 2018

I ran each for 2+ hours, and there is not much difference between them: all lost network (not wireless) 2~3 times, and all caused a reboot one or two times (not sure this is totally wireless related though). 6e18 (first four digits of the hash) had one case that failed to self-recover the network connection. Download speed for all of them is about 6Mbps on average (which is much slower compared to the ath9k STA). For 7daf I got a kernel hang within 10 minutes and ended up power-cycling the device. Also with 7daf my PC/STA could no longer get wireless to work; I had to reboot my PC and reload it with "modprobe wl". It looks like the STA has some issues too, but I never had problems with other openwrt routers (e.g. tplink) in the last few years while using the same PC.

I will hook a serial cable and do more tests later, the above tests are preliminary. The STA is a ubuntu 18.04 with brcmsmac driver.

Using wl driver instead of brcmsmac(since brcmsmac was totally dead once and ubuntu 18.04 default to wl anyways), I captured the first kernel log:

[ 8060.898725] CPU 0 Unable to handle kernel paging request at virtual address 07406000, epc == 801078cc, ra == 80316aec
[ 8060.909331] Oops[#1]:
[ 8060.911597] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.72 #0
[ 8060.917578] task: 8055c9c0 task.stack: 8054a000
[ 8060.922084] $ 0   : 00000000 00000001 00000000 81498600
[ 8060.927296] $ 4   : 8055a1b8 00000001 00000001 07406000
[ 8060.932505] $ 8   : 00000d46 00000d45 8d4fdc5c 0000040b
[ 8060.937713] $12   : 18d6b384 6c088197 ffffffff 40000000
[ 8060.942922] $16   : 8fc20e00 01080020 8e838000 8fc20e00
[ 8060.948131] $20   : 00000001 00000000 00000000 00000b12
[ 8060.953339] $24   : 0e0b3000 00000190
[ 8060.958548] $28   : 8054a000 8fc09df0 0000000f 80316aec
[ 8060.963757] Hi    : 00000587
[ 8060.966617] Lo    : 0000000f
[ 8060.969511] epc   : 801078cc kmem_cache_alloc+0x128/0x17c
[ 8060.974904] ra    : 80316aec __alloc_skb+0x74/0x180
[ 8060.979752] Status: 11008403 KERNEL EXL IE
[ 8060.983923] Cause : 40800008 (ExcCode 02)
[ 8060.987907] BadVA : 07406000
[ 8060.990770] PrId  : 0001992f (MIPS 1004Kc)
[ 8060.994839] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 mt76x2e mt7603e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE ebtable_nat ebtable_filter ebtable_broute cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_iprange xt_conntrack xt_connlabel xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_netlink nf_conntrack iptable_mangle iptable_filter ip_tables ebtables ebt_vlan ebt_stp ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_among ebt_802_$
 crc_ccitt compat arptable_filter arpt_mangle arp_tables xt_set ip_set_list_set
[ 8061.065842]  ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_$
ash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port$
p_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_manglei$
6table_filter ip6_tables x_tables tun mmc_block mtk_sd mmc_core leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_bu$
ton_hotplug usbcore nls_base usb_common
[ 8061.112211] Process swapper/0 (pid: 0, threadinfo=8054a000, task=8055c9c0, tls=00000000)
[ 8061.120258] Stack : 00000003 805c0000 805c0000 805c0000 01080020 00000100 01080020 80316aec
[ 8061.128591]         8f7a9b18 8f7a9570 81495520 8f7a9b18 8f7a9560 00000001 000000a4 01080020
[ 8061.136923]         00000000 8f121880 00000001 00001c48 00000000 803817d0 8f6a3b40 80560000
[ 8061.145253]         8fc09e50 00000000 00000000 8f121880 8e721200 000005a8 0000fe88 80391fec
[ 8061.153583]         8f7a95e8 8f7a9560 81495520 00000040 8f7a95a0 8f788ed0 8f121d98 00010001
[ 8061.161916]         ...
[ 8061.164351] Call Trace:
[ 8061.166788] [<801078cc>] kmem_cache_alloc+0x128/0x17c
[ 8061.171821] [<80316aec>] __alloc_skb+0x74/0x180
[ 8061.176350] [<803817d0>] sk_stream_alloc_skb+0x80/0x17c
[ 8061.181557] [<80391fec>] tcp_write_xmit+0xb38/0x1104
[ 8061.186497] [<80394258>] tcp_tsq_handler.part.13+0x1bc/0x1cc
[ 8061.192129] [<803944c8>] tcp_tasklet_func+0x134/0x194
[ 8061.197171] [<80032744>] tasklet_action+0x104/0x1d0
[ 8061.202029] [<80451358>] __do_softirq+0x128/0x2ec
[ 8061.206708] [<80032b34>] irq_exit+0xac/0xc8
[ 8061.210896] [<8023be6c>] plat_irq_dispatch+0xfc/0x138
[ 8061.215931] [<8000b5e8>] except_vec_vi_end+0xb8/0xc4
[ 8061.220875] [<8000cfb0>] r4k_wait_irqoff+0x1c/0x24
[ 8061.225667] [<800666ac>] do_idle+0xe4/0x168
[ 8061.229833] [<80066928>] cpu_startup_entry+0x24/0x2c
[ 8061.234776] [<80583bf0>] start_kernel+0x484/0x4a4
[ 8061.239460] Code: 00000000  8e020014  00e23821 <8ce20000> 10000009  cc400000  1040ffbd  00000000  8e060010
[ 8061.249183]
[ 8061.250877] ---[ end trace 482c422d8c80c16a ]---
Thu Oct  4 13:57[ 8061.257911] Kernel panic - not syncing: Fatal exception in interrupt
[ 8061.267188] Rebooting in 3 seconds..

This is with the 7daf962 build. I have seen similar reboots in the past, so the problem might not be that sensitive to these SHA1 checkpoints. The reboot seems to be triggered by downloading over 2.4 GHz Wi-Fi through the ZBT1326. I have 3 ZBT1326 routers and all of them have the same issues.

Second kernel reboot log, same software:

CPU 1 Unable to handle kernel paging request at virtual address 07406000, epc == 80108bd8, ra == 80108abc
[ 3814.110985] Oops[#1]:
[ 3814.113258] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.72 #0
[ 3814.119238] task: 8fc43e80 task.stack: 8fc64000
[ 3814.123744] $ 0   : 00000000 00000001 00000000 814a56b0
[ 3814.128957] $ 4   : 8055a1b8 00000001 00000001 07406000
[ 3814.134171] $ 8   : 0003cbe2 0003cbe1 00000000 00000000
[ 3814.139381] $12   : 00ffffff 052add91 005191f0 00000000
[ 3814.144592] $16   : 8fc02a00 01090220 8f270000 803188b0
[ 3814.149804] $20   : 00000800 000005a8 8f093100 01080020
[ 3814.155014] $24   : 00000000 80376d40                  
[ 3814.160224] $28   : 8fc64000 8fc0d8e8 0000002d 80108abc
[ 3814.165434] Hi    : 00000002
[ 3814.168299] Lo    : 00000001
[ 3814.171202] epc   : 80108bd8 __kmalloc_track_caller+0x1d4/0x228
[ 3814.177097] ra    : 80108abc __kmalloc_track_caller+0xb8/0x228
[ 3814.182905] Status: 11007c03	KERNEL EXL IE 
[ 3814.187078] Cause : 40800008 (ExcCode 02)
[ 3814.191064] BadVA : 07406000
[ 3814.193928] PrId  : 0001992f (MIPS 1004Kc)
[ 3814.197998] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 mt76x2e mt7603e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE ebtable_nat ebtable_filter ebtable_broute cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_iprange xt_conntrack xt_connlabel xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_netlink nf_conntrack iptable_mangle iptable_filter ip_tables ebtables ebt_vlan ebt_stp ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_among ebt_802_3 crc_ccitt compat arptable_filter arpt_mangle arp_tables xt_set ip_set_list_set
[ 3814.269007]  ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables tun mmc_block mtk_sd mmc_core leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common
[ 3814.315383] Process swapper/1 (pid: 0, threadinfo=8fc64000, task=8fc43e80, tls=00000000)
[ 3814.323434] Stack : 0000002d 800e46bc 80560000 00000004 8fc9e6b8 00000000 01080020 80316a14
[ 3814.331773]         8057a960 8041e500 ff7d9040 8ee3836c 8fc9e6b8 000005a8 00000000 00000140
[ 3814.340107]         00000801 803188b0 8e742798 8fc9e6b8 8ee3aaf4 8e742808 8e742798 8fc9e6b8
[ 3814.348439]         000005a8 8fc9e6b8 ffdd7888 00000801 000005a8 8f093100 8fc9f380 8031a650
[ 3814.356774]         fffffff0 8032daf8 00000010 00000003 8fc9e6b8 8055a1b8 8fc1e000 ffdd7888
[ 3814.365110]         ...
[ 3814.367548] Call Trace:
[ 3814.369997] [<80108bd8>] __kmalloc_track_caller+0x1d4/0x228
[ 3814.375577] [<80316a14>] __kmalloc_reserve.isra.7+0x40/0xa4
[ 3814.381133] [<803188b0>] pskb_expand_head+0x88/0x318
[ 3814.386075] [<8031a650>] __pskb_pull_tail+0x90/0x3f4
[ 3814.391027] [<8032df20>] validate_xmit_skb+0x2d4/0x334
[ 3814.396147] [<8032e8f0>] __dev_queue_xmit+0x688/0x85c
[ 3814.401188] [<80374c10>] ip_finish_output2+0x258/0x2e8
[ 3814.406303] [<80376d98>] ip_output+0x58/0xc8
[ 3814.410557] [<80391400>] __tcp_transmit_skb+0xa58/0xb0c
[ 3814.415763] [<803921cc>] tcp_write_xmit+0xd18/0x1104
[ 3814.420704] [<803925f4>] __tcp_push_pending_frames+0x3c/0xc0
[ 3814.426340] [<8038d4c8>] tcp_rcv_established+0x79c/0x824
[ 3814.431638] [<803977b4>] tcp_v4_do_rcv+0x98/0x1d0
[ 3814.436323] [<8039a2c8>] tcp_v4_rcv+0x8ac/0xd70
[ 3814.440832] [<80371430>] ip_local_deliver_finish+0x120/0x184
[ 3814.446469] [<803719d8>] ip_local_deliver+0x78/0xdc
[ 3814.451329] [<80371c9c>] ip_rcv+0x260/0x2e4
[ 3814.455517] [<803294e8>] __netif_receive_skb_core+0xa94/0xc5c
[ 3814.461240] [<8032f384>] netif_receive_skb_internal+0xd8/0xf0
[ 3814.466970] [<804200c0>] br_pass_frame_up+0xe8/0x154
[ 3814.471917] [<804206b0>] br_handle_frame_finish+0x52c/0x570
[ 3814.477468] [<80420a24>] br_handle_frame+0x330/0x3dc
[ 3814.482419] [<803291dc>] __netif_receive_skb_core+0x788/0xc5c
[ 3814.488137] [<8032bfb0>] process_backlog+0x98/0x160
[ 3814.492996] [<8032f7f0>] net_rx_action+0x150/0x30c
[ 3814.497769] [<80451358>] __do_softirq+0x128/0x2ec
[ 3814.502470] [<80032b34>] irq_exit+0xac/0xc8
[ 3814.506659] [<8023be6c>] plat_irq_dispatch+0xfc/0x138
[ 3814.511698] [<8000b5e8>] except_vec_vi_end+0xb8/0xc4
[ 3814.516642] [<8000cfb0>] r4k_wait_irqoff+0x1c/0x24
[ 3814.521439] [<800666ac>] do_idle+0xe4/0x168
[ 3814.525604] [<80066928>] cpu_startup_entry+0x24/0x2c
[ 3814.530543] Code: 00000000  8e020014  00e23821 <8ce20000> 10000009  cc400000  1040ffbd  00000000  8e060010 
[ 3814.540270] 
[ 3814.541939] ---[ end trace 908d93d068c072b3 ]---
[ 3814.548430] Kernel panic - not syncing: Fatal exception in interrupt
[ 3814.556350] Rebooting in 3 seconds..
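
Both oopses so far report the same BadVA (07406000), so it may help to pull the key fields out of each log and compare them side by side. The following is only a rough sketch: the function name `summarize_oops` and the regular expression are my own, written against the MIPS oops layout shown in the logs above, and are not part of any kernel or OpenWrt tool.

```python
import re

# Matches the per-field lines of a MIPS oops, e.g.
#   "[ 8060.969511] epc   : 801078cc kmem_cache_alloc+0x128/0x17c"
#   "[ 8060.987907] BadVA : 07406000"
# The "epc == ..." summary on the first oops line is deliberately not matched.
FIELD_RE = re.compile(r"\b(epc|ra|BadVA)\s*:\s*([0-9a-f]{8})(?:\s+(\S+))?")


def summarize_oops(text):
    """Return {field: (address, symbol_or_None)} for epc, ra and BadVA."""
    summary = {}
    for line in text.splitlines():
        match = FIELD_RE.search(line)
        if match and match.group(1) not in summary:
            name, address, symbol = match.groups()
            summary[name] = (address, symbol)
    return summary
```

Running it over each saved log and comparing the resulting dictionaries shows at a glance whether the faulting address and the allocator entry point repeat between crashes.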

Another one. This is a default build, meaning I used the default profile without any other packages/modules. I used 5 scp streams (download and upload) to stress the ZBT1326 instead of the one-way wget method; this crash happens fast.

[ 2737.258544] CPU 1 Unable to handle kernel paging request at virtual address 07406000, epc == 801078cc, ra == 8004daa4
[ 2737.273004] Oops[#1]:
[ 2737.275305] CPU: 1 PID: 2624 Comm: sh Not tainted 4.14.72 #0
[ 2737.280962] task: 8fdc44c0 task.stack: 8e106000
[ 2737.285486] $ 0   : 00000000 00000001 00000000 814a5670
[ 2737.290726] $ 4   : 805521b8 00000001 00000001 07406000
[ 2737.295966] $ 8   : 0000654b 0000654a 00ef5000 ffffff80
[ 2737.301198] $12   : 7fbef648 77ff02c0 00000000 00000000
[ 2737.306437] $16   : 8fc02e00 014000c0 8e4a0000 00000000
[ 2737.311681] $20   : 00000000 8fdc44c0 00000000 00000000
[ 2737.316921] $24   : 00000000 77fa8ee0                  
[ 2737.322165] $28   : 8e106000 8e107dd0 00000000 8004daa4
[ 2737.327404] Hi    : 0000001b
[ 2737.330286] Lo    : 0000005b
[ 2737.333202] epc   : 801078cc kmem_cache_alloc+0x128/0x17c
[ 2737.338619] ra    : 8004daa4 prepare_creds+0x28/0x90
[ 2737.343573] Status: 11007c03	KERNEL EXL IE 
[ 2737.347779] Cause : 40800008 (ExcCode 02)
[ 2737.351784] BadVA : 07406000
[ 2737.354663] PrId  : 0001992f (MIPS 1004Kc)
[ 2737.358749] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 mt76x2e mt7603e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables mmc_block mtk_sd mmc_core leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common
[ 2737.427655] Process sh (pid: 2624, threadinfo=8e106000, task=8fdc44c0, tls=77ff1dc0)
[ 2737.435361] Stack : 80578340 00000000 80578348 00000000 8fe65140 8fdc44c0 8e9dc000 8004daa4
[ 2737.443696]         00000000 80107780 00000000 800f0b6c 00000003 8fe65140 00000012 8004e004
[ 2737.452033]         00000000 00000000 00000000 8fe65140 8fe65140 8fe65140 00000012 8002c204
[ 2737.460369]         8fc02c00 00000000 00000020 8012b384 8f9d88f8 00000001 8e107e98 805b0000
[ 2737.468704]         00000000 8010ee74 8e107ef0 801277b4 5bb6a9e0 00000004 8fd3b460 00000020
[ 2737.477045]         ...
[ 2737.479496] Call Trace:
[ 2737.481960] [<801078cc>] kmem_cache_alloc+0x128/0x17c
[ 2737.487004] [<8004daa4>] prepare_creds+0x28/0x90
[ 2737.491604] [<8004e004>] copy_creds+0x80/0x12c
[ 2737.496030] [<8002c204>] copy_process.part.9+0x288/0x151c
[ 2737.501410] [<8002d62c>] _do_fork+0xe0/0x304
[ 2737.505660] [<8002d8b0>] sys_fork+0x24/0x30
[ 2737.509843] [<80019578>] syscall_common+0x34/0x58
[ 2737.514525] Code: 00000000  8e020014  00e23821 <8ce20000> 10000009  cc400000  1040ffbd  00000000  8e060010 
[ 2737.524250] 
[ 2737.536706] ---[ end trace 7af6a177123a7a5d ]---
Thu Oct  4 19:01[ 2737.544003] Kernel panic - not syncing: Fatal exception
:37 2018 kern.al[ 2737.552099] Rebooting in 3 seconds..
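
For anyone who wants to reproduce the 5-stream scp load, here is a minimal sketch of how it could be scripted. The host, file paths and the alternating download/upload split are assumptions for illustration only; the exact commands used in the test above were not recorded. It relies on nothing beyond the standard `scp` client and Python's `subprocess` module.

```python
import subprocess


def build_scp_commands(host, remote_file, local_file, streams=5):
    """Build argv lists for `streams` scp transfers, alternating direction.

    host, remote_file and local_file are hypothetical placeholders.
    """
    commands = []
    for i in range(streams):
        if i % 2 == 0:
            # Download: remote file to a per-stream local path.
            commands.append(["scp", f"{host}:{remote_file}", f"/tmp/stress-down-{i}"])
        else:
            # Upload: local file to a per-stream remote path.
            commands.append(["scp", local_file, f"{host}:/tmp/stress-up-{i}"])
    return commands


def run_parallel(commands):
    """Start every transfer at once and return the exit codes in order."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]
```

Calling `run_parallel(build_scp_commands("user@host", "/tmp/big.bin", "/tmp/big.bin"))` from a wireless client starts three downloads and two uploads in parallel; key-based SSH login is assumed so that scp does not prompt for a password.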

Now this has become a ZBT1326 problem; it keeps rebooting:

[   10.137800] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[   10.152558] procd: - early -
[   10.155568] procd: - watchdog -
[   10.824599] procd: - watchdog -
[   10.828081] procd: - ubus -
[   10.887068] procd: - init -
Please press Enter to activate this console.
[   11.141841] kmodloader: loading kernel modules from /etc/modules.d/*
[   11.152186] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   11.490098] Loading modules backported from Linux version wt-2017-11-01-0-gfe248fc2c180
[   11.498119] Backport generated by backports.git v4.14-rc2-1-31-g86cf0e5d
[   11.506655] ip_tables: (C) 2000-2006 Netfilter Core Team
[   11.829156] nf_conntrack version 0.5.0 (8192 buckets, 32768 max)
[   12.041060] xt_time: kernel timezone is -0000
[   12.311561] bus=0x2, slot = 0x1, irq=0xff
<reboot loop>

@laoshaw

laoshaw commented Oct 6, 2018

I used an ath9k STA for an overnight iperf3 stress test: one reboot and one self-recovered network connection were observed; otherwise the connection remained solid. In my case the 7603e wireless problem seems more related to Broadcom STAs.

When using an Archer C7 router running 18.06 with both STAs (bcm43xx and ath9k), they are 100% solid for 10+ hours: no loss of connection, no reboots, etc.
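
An overnight run like this is easier to compare across routers if each failed round is timestamped. Below is a rough sketch of such a loop; the server address, round length and pause are assumptions, and the actual iperf3 invocation used for the test above may have differed. Only the standard iperf3 client flags `-c`, `-t` and `-R` are used.

```python
import subprocess
import time


def iperf3_command(server, seconds=60, reverse=False):
    """Build an iperf3 client command; -R makes the server send (download)."""
    command = ["iperf3", "-c", server, "-t", str(seconds)]
    if reverse:
        command.append("-R")
    return command


def stress(server, rounds, run=subprocess.run, pause=5):
    """Run iperf3 repeatedly, alternating direction.

    Returns the local timestamps of rounds that exited non-zero, which
    roughly marks when the link dropped or the router rebooted.
    """
    failures = []
    for i in range(rounds):
        result = run(iperf3_command(server, reverse=(i % 2 == 1)))
        if result.returncode != 0:
            failures.append(time.strftime("%Y-%m-%d %H:%M:%S"))
            time.sleep(pause)  # give the AP time to recover before retrying
    return failures
```

With the default 60-second rounds, `stress("192.168.1.1", 600)` covers roughly ten hours; the address is a placeholder for wherever the iperf3 server is running.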

@nbd168
Member

nbd168 commented Jan 26, 2019

Should be fixed in current versions

@nbd168 nbd168 closed this as completed Jan 26, 2019