FS#2176 - ubiquiti loco xw (AR9342 Rev.2) stops receiving on wireless #7016

Open
openwrt-bot opened this issue Mar 10, 2019 · 7 comments
Labels
flyspray kernel


@openwrt-bot openwrt-bot commented Mar 10, 2019

dangowrt:

This bug has existed for as long as the Ubiquiti loco XW 5 GHz hardware has been around; we are probably dealing with a hardware bug in the AR9342 Rev. 2 chip. It is somewhat hard to reproduce on demand, but it reliably hits us every couple of hours, sometimes days, running any version of OpenWrt up to today's master branch. It just happened again, and this time I decided to check whether the bug is actually listed on FS, in addition to creating the usual cron job that executes iw dev wlan0 scan every minute. There is even a watchdog in the community LibreMesh repository designed to catch exactly this bug:
https://github.com/libremesh/lime-packages/blob/master/packages/cotonete/Makefile#L21

To reproduce it, we have two Ubiquiti NanoBeam M5 devices running OpenWrt ar71xx/generic loco-m-xw, pointing at each other over a distance of roughly 2 km. The link is acceptable, but not perfect, and slightly asymmetric.

Device A (worse RX SNR):

Station f0:9f:c2:xx:xx:7a (on wlan0-mesh)
inactive time: 0 ms
rx bytes: 1433997029
rx packets: 8589913
tx bytes: 23160025785
tx packets: 15667682
tx retries: 677924
tx failed: 0
rx drop misc: 30076
signal: -76 [-79, -79] dBm
signal avg: -74 [-77, -77] dBm
Toffset: 4887897799 us
tx bitrate: 43.3 MBit/s MCS 10 short GI
rx bitrate: 43.3 MBit/s MCS 4 short GI
expected throughput: 24.536Mbps
mesh llid: 0
mesh plid: 0
mesh plink: ESTAB
mesh local PS mode: ACTIVE
mesh peer PS mode: ACTIVE
mesh non-peer PS mode: ACTIVE
authorized: yes
authenticated: yes
associated: yes
preamble: long
WMM/WME: yes
MFP: yes
TDLS peer: no
DTIM period: 2
beacon interval:100
connected time: 18117 seconds

Device B (better RX SNR):

Station fc:ec:da:xx:xx:8c (on wlan0-mesh)
inactive time: 20 ms
rx bytes: 23442553203
rx packets: 16047526
tx bytes: 1274688573
tx packets: 8251479
tx retries: 1337700
tx failed: 2932
rx drop misc: 45195
signal: -72 [-79, -73] dBm
signal avg: -71 [-78, -73] dBm
Toffset: 18446744068821653812 us
tx bitrate: 57.8 MBit/s MCS 11 short GI
rx bitrate: 43.3 MBit/s MCS 10 short GI
last ack signal:24 dBm
expected throughput: 24.536Mbps
mesh llid: 0
mesh plid: 0
mesh plink: ESTAB
mesh local PS mode: ACTIVE
mesh peer PS mode: ACTIVE
mesh non-peer PS mode: ACTIVE
authorized: yes
authenticated: yes
associated: yes
preamble: long
WMM/WME: yes
MFP: yes
TDLS peer: no
DTIM period: 2
beacon interval:100
connected time: 18179 seconds
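
For comparing the two dumps above, here is a minimal sketch (not part of the original report) that parses `iw dev <ifname> station dump` output and derives two figures: the ratio of tx retries to tx packets, and the Toffset read as a signed 64-bit value. The function names are hypothetical, and the signed reading of Toffset is an assumption: Device B's 18446744068821653812 us looks like a negative offset printed as unsigned, which would make it -4887897804 us, almost exactly the mirror of Device A's 4887897799 us. With the counters above, the retry ratio comes out at roughly 4% on Device A and 16% on Device B.

```python
import re


def parse_station_dump(text):
    """Parse `iw dev <ifname> station dump` output into {mac: {field: value}}."""
    stations = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"Station (\S+) \(on ([^)\s]+)\)", line)
        if header:
            current = {"ifname": header.group(2)}
            stations[header.group(1)] = current
        elif current is not None and ":" in line:
            # Fields look like "tx retries: 677924" or "beacon interval:100".
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return stations


def retry_ratio(station):
    """tx retries divided by tx packets (an illustrative metric only)."""
    return int(station["tx retries"]) / int(station["tx packets"])


def signed_toffset_us(station):
    """Reinterpret the printed Toffset as a signed 64-bit value (assumption)."""
    raw = int(station["Toffset"].split()[0])
    return raw - (1 << 64) if raw >= (1 << 63) else raw
```

Feeding it the output of `iw dev wlan0-mesh station dump` from each device makes the asymmetry of the link easier to see than reading the raw counters side by side.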

Now it so happens that device A (and always only device A!) becomes deaf after some hours of mostly sending lots of traffic to device B. It continues to send beacons, but loses all associations. Device B keeps trying to set up a link, but keeps ending up in the 'BLOCKED' state, and dumps taken on a monitor-mode interface show that device A simply doesn't react at all to any of the frames sent by device B. A simple iw dev wlan0 scan on device A (which doesn't return any results) fixes the problem.
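
The workaround described here can be sketched as a small watchdog. This is a hypothetical illustration, not the LibreMesh implementation: it assumes that "deaf" can be detected as the station list staying empty for several consecutive checks after peers had been seen, and it then triggers the scan that is reported above to fix the problem. The class name, the threshold and the interval are all made up for the example; only the two `iw` commands come from this report.

```python
import subprocess
import time


def count_stations(station_dump):
    """Count peers in `iw dev <ifname> station dump` output."""
    return sum(1 for line in station_dump.splitlines()
               if line.startswith("Station "))


class DeafnessWatchdog:
    """Decide when to trigger a recovery scan (heuristic is an assumption)."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # consecutive empty checks before acting
        self.had_peers = False
        self.empty_checks = 0

    def update(self, n_stations):
        """Feed the current peer count; return True if a scan should run."""
        if n_stations > 0:
            self.had_peers = True
            self.empty_checks = 0
            return False
        if not self.had_peers:
            return False  # never had a link, so nothing to recover
        self.empty_checks += 1
        if self.empty_checks >= self.threshold:
            self.empty_checks = 0
            return True
        return False


def run(mesh_if="wlan0-mesh", scan_if="wlan0", interval=60):
    """Poll the station list and scan when the radio appears deaf."""
    watchdog = DeafnessWatchdog()
    while True:
        dump = subprocess.run(["iw", "dev", mesh_if, "station", "dump"],
                              capture_output=True, text=True).stdout
        if watchdog.update(count_stations(dump)):
            subprocess.run(["iw", "dev", scan_if, "scan"],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
        time.sleep(interval)
```

Calling `run()` starts the polling loop. Compared with the cron job that scans unconditionally every minute, this only scans when the peer list has been empty for a while, at the cost of relying on a detection heuristic that has not been validated against this bug.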

The channel seems otherwise rather unused, and signal quality varies only with weather conditions. Interestingly, this seems to happen on non-DFS channels only. It happens equally in Ad-Hoc mode (unencrypted; never tried encrypted) and 802.11s (open, i.e. set up via the iw tool, as well as with SAE, i.e. running wpa_supplicant). It doesn't happen on all nodes, but only on those with a rather bad signal or at least one far-off neighbor.

Maybe related to FS#1246

I saw this occurring on the ubnt NanoStation loco M5 XW as well as on all NanoBeam M5 variants (which are supposedly compatible with the loco-m-xw image).

ieee80211 phy0: Atheros AR9340 Rev:2 mem=0xb8100000, irq=47

WiFi EEPROM of the devices:

*
00001000 02 02 F0 9F C2 XX XX XX 00 30 3a 31 35 3a 36 64 |.....XXX.0:15:6d|
00001010 3a 64 64 3a 64 65 3a 61 64 00 00 00 00 00 1f 00 |:dd:de:ad.......|
00001020 33 01 00 00 00 00 04 00 00 00 2d 04 03 00 08 ff |3.........-.....|
00001030 20 01 00 00 00 20 02 00 00 cc cc 0c 00 50 01 50 | .... .......P.P|
00001040 01 50 01 00 00 00 00 00 00 21 00 a4 00 00 00 00 |.P.......!......|
00001050 ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001060 0e 0e 03 00 2c e2 00 02 0e 1c e0 e0 00 0c e0 e0 |....,...........|
00001070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001080 00 00 00 00 00 00 00 00 00 00 70 89 ac 00 00 00 |..........p.....|
00001090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000010c0 00 00 00 70 ac 70 89 ac 70 89 ac 70 89 ac 22 22 |...p.p..p..p..""|
000010d0 22 22 22 22 22 22 22 22 20 1c 22 22 20 1c 22 22 |"""""""" ."" .""|
000010e0 20 1c 24 24 20 18 16 14 20 16 14 12 20 20 1c 14 | .$$ ... ...  ..|
000010f0 24 24 20 18 16 14 20 16 14 12 20 20 1c 14 24 24 |$$ ... ...  ..$$|
00001100 20 18 16 14 20 16 14 12 20 20 1c 14 22 22 1e 16 | ... ...  ..""..|
00001110 14 12 1e 14 12 10 20 20 1c 14 22 22 1e 16 14 12 |......  ..""....|
00001120 1e 14 12 10 20 20 1c 14 22 22 1e 16 14 12 1e 14 |....  ..""......|
00001130 12 10 20 20 1c 14 11 12 15 17 41 42 45 47 31 32 |..  ......ABEG12|
00001140 35 37 70 75 ac b8 70 75 ac b8 70 75 ac b8 70 75 |57pu..pu..pu..pu|
00001150 ac b8 70 75 ac b8 70 75 ac b8 70 75 ac b8 70 75 |..pu..pu..pu..pu|
*
00001170 ac b8 3c 7c 3c 7c 3c 7c 3c 7c 3c 7c 3c 7c 3c 7c |..<|<|<|<|<|<|<||
00001180 3c 7c 3c 7c 3c 7c 3c 7c 3c 7c 3c 7c 3c 7c 3c 7c |<|<|<|<|<|<|<|<||
*
000011a0 3c 7c 10 01 00 00 22 22 02 00 00 00 00 00 00 00 |<|....""........|
000011b0 00 00 00 00 00 00 44 00 00 00 00 00 00 ff 00 00 |......D.........|
000011c0 00 00 00 00 00 00 00 00 00 00 00 00 ff 0e 0e 03 |................|
000011d0 00 2d e2 00 02 0e 1c 00 00 00 00 00 00 00 00 00 |.-..............|
000011e0 00 00 00 00 00 00 00 00 00 44 44 00 00 00 00 00 |.........DD.....|
000011f0 00 00 00 00 00 00 00 4c 58 68 8c a4 b4 bd cd d9 |.......LXh......|
00001200 00 89 00 00 00 dc 00 89 00 00 00 e0 00 8a 00 00 |................|
00001210 00 e2 00 8b 00 00 00 de 00 8b 00 00 00 de 00 8b |................|
00001220 00 00 00 dc 00 89 00 00 00 da 00 8b 00 00 00 e0 |................|
00001230 00 89 00 00 00 e4 00 8a 00 00 00 e7 00 8b 00 00 |................|
00001240 00 e6 00 8b 00 00 00 e2 00 8c 00 00 00 e1 00 8c |................|
00001250 00 00 00 df 00 8b 00 00 00 dd 00 8b 00 00 00 00 |................|
00001260 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4c |...............L|
00001290 54 68 78 8c a0 b4 c5 4c 54 68 78 8c a0 b4 c5 4c |Thx....LThx....L|
000012a0 54 68 78 8c a0 b4 c5 26 20 1e 1c 26 20 1e 1c 26 |Thx....& ..& ..&|
000012b0 20 1e 1c 26 20 1e 1c 26 20 1e 1c 26 20 1e 1c 26 | ..& ..& ..& ..&|
000012c0 20 1e 1c 26 20 1e 1c 26 22 20 1e 1c 1a 20 1e 1c | ..& ..&" ... ..|
000012d0 1a 00 00 00 00 26 22 20 1e 1c 1a 20 1e 1c 1a 00 |.....&" ... ....|
000012e0 00 00 00 26 22 20 1e 1c 1a 20 1e 1c 1a 00 00 00 |...&" ... ......|
000012f0 00 26 22 20 1e 1c 1a 20 1e 1c 1a 00 00 00 00 26 |.&" ... .......&|
00001300 22 20 1e 1c 1a 20 1e 1c 1a 00 00 00 00 26 22 20 |" ... .......&" |
00001310 1e 1c 1a 20 1e 1c 1a 00 00 00 00 26 22 20 1e 1c |... .......&" ..|
00001320 1a 20 1e 1c 1a 00 00 00 00 26 22 20 1e 1c 1a 20 |. .......&" ... |
00001330 1e 1c 1a 00 00 00 00 26 22 20 1e 1c 1a 20 1e 1c |.......&" ... ..|
00001340 1a 00 00 00 00 26 22 20 1e 1c 1a 20 1e 1c 1a 00 |.....&" ... ....|
00001350 00 00 00 26 22 20 1e 1c 1a 20 1e 1c 1a 00 00 00 |...&" ... ......|
00001360 00 26 22 20 1e 1c 1a 20 1e 1c 1a 00 00 00 00 26 |.&" ... .......&|
00001370 22 20 1e 1c 1a 20 1e 1c 1a 00 00 00 00 26 22 20 |" ... .......&" |
00001380 1e 1c 1a 20 1e 1c 1a 00 00 00 00 26 22 20 1e 1c |... .......&" ..|
00001390 1a 20 1e 1c 1a 00 00 00 00 26 22 20 1e 1c 1a 20 |. .......&" ... |
000013a0 1e 1c 1a 00 00 00 00 10 16 18 40 46 48 30 36 38 |..........@FH068|
000013b0 4c 54 68 78 8c a0 b9 cd 4c 54 68 78 8c a0 b9 cd |LThx....LThx....|
*
000013f0 4c 54 68 78 8c a0 b9 cd 3c 7c 3c 7c 3c 7c 3c 7c |LThx....<|<|<|<||
00001400 3c 7c 3c 7c 3c 7c 3c 7c 3c 7c 3c 7c 3c 7c 3c 7c |<|<|<|<|<|<|<|<||
*
00001440 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*


@openwrt-bot openwrt-bot commented Mar 10, 2019

dangowrt:

I'm speculating that bgscan may indirectly prevent it from occurring (and thereby may have hidden it from QA)

@openwrt-bot openwrt-bot commented May 3, 2019

sumpfralle:

Does this issue occur on the master or the client side of the connection?

(I am just curious, since we are having a similar issue - but only on the master side)

@openwrt-bot openwrt-bot commented Jun 5, 2019

psyborg:

To rule out the ath9k DFS code, try building an image with the DFS flags disabled/removed and then run it on a DFS channel.

@openwrt-bot openwrt-bot commented Oct 7, 2019

xback:

Is there any warning in dmesg when this occurs?
Does it still occur in latest 19.07 or master?

Thanks

@openwrt-bot openwrt-bot commented Oct 31, 2020

dangowrt:

Still occurs with 19.07.4, on both DFS and non-DFS channels, in Ad-Hoc and 802.11s/mesh modes (probably in AP mode as well, though I never tested that).
The LibreMesh folks made a new workaround called 'wifi-unstuck-wa' which also mentions this happening on AP interfaces.

@openwrt-bot openwrt-bot commented Dec 13, 2020

ewtoombs:

I have observed this same bug on the TP-Link Archer C7 v2, running OpenWrt 19.07.4 r11208-ce6496d796. The SoC is a QCA9558 ver 1 rev 0. The wifi radio is an AR9550 rev. 0 and it also uses the ath9k module. Issuing a ubus call network restart fixes it.

I've also observed days of good performance followed by a sudden onset of severely degraded performance (~10% packet loss). The above ubus call fixes it too.

@openwrt-bot openwrt-bot commented Jun 24, 2021

argonym:

Possible workaround and related discussions: freifunk-gluon/gluon#2114

@aparcar aparcar added the kernel label Feb 22, 2022