In some rare RF conditions, no traffic on IBSS connections #18
Comments
I think this might be a supplicant bug. I saw this message in other testing scenarios:
[349951.140828] sta5002: 04:f0:21:67:53:95 authenticate with f8:a0:97:44:94:55 at: 1520282299.350687
The supplicant does not know how to work around the EEXIST errno (-17). Please manually hack in the check for EEXIST. This patch will not apply to upstream supplicant, and I have not re-spun it quite yet, but I think it will be enough for you to make the change:
Have you been able to test with supplicant patched as suggested? |
The Linux 4.9 kernel, at least, can return EEXIST when trying to auth a station that already exists. We have seen this bug in multiple places, but it is difficult to reproduce. Here is a link to someone else that appears to have hit this issue: greearb/ath10k-ct#18 Signed-off-by: Ben Greear <greearb@candelatech.com>
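To make the failure mode above concrete, here is a minimal Python sketch of telling the EEXIST case apart from other failures. It is not the supplicant patch (that patch is C and is not reproduced in this thread); the function name and the kernel-style negative return convention are assumptions made for illustration only.

```python
import errno


def classify_auth_result(ret: int) -> str:
    """Classify a kernel-style return code from an authenticate request.

    Hypothetical helper: 0 means success and negative values are -errno.
    What the caller does in the "station-exists" case is deliberately left open.
    """
    if ret == 0:
        return "ok"
    if ret == -errno.EEXIST:
        # The station entry was already present when auth was requested.
        return "station-exists"
    return "error"
```

A caller could then handle "station-exists" separately instead of treating it like any other authentication failure.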
I added the suggested patch. I'm now using latest 4.9 driver, mac80211 backported wt-2017-01-31-0-ge882dff19e7 (same as LEDE-17.01), wpa_supplicant 2016-12-19 (same as LEDE-17.01) and it still does not work. Here is the log: [ 11.740000] ath10k 4.9 driver, optimized for CT firmware, probing pci device: 0x3c. |
Some additional information: I'm building the driver without DFS support (CONFIG_PACKAGE_ATH_DFS in OpenWrt)
There are no obvious errors in the logs, though there is one failed-to-transmit message. In this particular case that generated the logs, you are seeing no pkts on air from the ath10k system? Have you tried testing the same setup without encryption to see if the problem is related to encryption? In the case you just reported, does the problem always happen on this system, or is it transient? And finally, please show me output of: (Adjust for your phy name) |
It seems to be related to the encryption. Broadcast traffic is exchanged unencrypted successfully, but unicast traffic is sent and received encrypted and does not seem to be decrypted. It always happens on these systems in this particular location (I guess there are some specific RF conditions in this area). I already changed all the HW several times (electronics, antenna, cables) and I always got the same result. This is working fine using official ath10k + 802.11s + WPA2-SAE.
cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_regs
Those register values must have been taken when the node was inactive? I need to see them And, what you describe sounds like key setting issues, which I have previously seen, but in the cases I have debugged, it appears the stack and/or supplicant was doing the wrong thing. I now suspect the previous 'dmesg' logs were also from a case that was not actively failing? I need to see as complete of a boot log as possible for the case that is actually failing in order to debug this. |
[ 0.000000] Linux version 3.18.92 (tpateloup@build) (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 5.3.0b1-10-gcfc70d75da) ) #209 Fri May 25 11:12:05 CEST 2018 |
Is that log from a working case, or a broken case? If you restart that same node a few times, will it sometimes work and sometimes not? If so, maybe it would be helpful if you could provide one log of broken and one log of the working case so I can compare. Please clearly note any symptoms you see (or do not see) when posting logs so that I know what to look for. I see no obvious errors in the log you posted above. |
This is always broken on this node. In fact, it always happens to any node placed in some specific area. Station 06:f0:21:31:5c:14 (on adhoc0) On broken nodes, there is always an asymmetric number of streams between RX & TX
And moreover it works fine using ath10k + 802.11s +WPA-SAE on these nodes |
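The RX/TX stream asymmetry mentioned above can be checked mechanically from the `rx bitrate` / `tx bitrate` lines of `iw dev <if> station dump`. The sketch below is one possible way to do it; the helper names are made up, the sample lines in the usage note are illustrative and not taken from this report, and HT MCS indexes are assumed to be in the 0-31 range.

```python
import re


def spatial_streams(bitrate_line: str):
    """Return the stream count implied by an iw bitrate line, or None."""
    # VHT rates carry an explicit "VHT-NSS n" field.
    m = re.search(r"VHT-NSS (\d+)", bitrate_line)
    if m:
        return int(m.group(1))
    # HT rates: MCS 0-7 is 1 stream, 8-15 is 2 streams, and so on.
    m = re.search(r"(?<!VHT-)MCS (\d+)", bitrate_line)
    if m:
        return int(m.group(1)) // 8 + 1
    # Legacy (non-HT) rates have no stream information.
    return None


def nss_asymmetric(rx_line: str, tx_line: str) -> bool:
    """True when both directions report a stream count and they differ."""
    rx, tx = spatial_streams(rx_line), spatial_streams(tx_line)
    return rx is not None and tx is not None and rx != tx
```

For example, `nss_asymmetric("rx bitrate: 130.0 MBit/s MCS 15", "tx bitrate: 65.0 MBit/s MCS 7")` flags a 2-stream RX against a 1-stream TX.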
You could send me the fw_regs from when the radio is running, that might possibly show some In the past, similar issues have been related to the on-air packet size. The firmware might send a frame 16 bytes too large or too small, for instance. I have previously debugged this by sending frames of a known size and then looked on-air to see if they were the correct size or not. You might also be able to grab a packet sniff and use wireshark to decode it and look for errors. If you can somehow reproduce this in a lab (maybe using attenuators?) then it would be a lot easier to debug and fix. |
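For the known-size test described above, it helps to know what length to expect in the sniff. The following is a rough sketch that assumes a UDP/IPv4 payload in a three-address 802.11 data frame, optionally with a QoS control field and CCMP-128 encryption; the function and its parameters are illustrative, and captures often omit the FCS and prepend a radiotap header, so compare accordingly.

```python
MAC_HEADER = 24    # three-address 802.11 data frame header
QOS_CONTROL = 2    # present only in QoS data frames
CCMP_HEADER = 8    # CCMP header (packet number, key ID)
CCMP_MIC = 8       # CCMP-128 message integrity code
LLC_SNAP = 8       # LLC/SNAP encapsulation
IPV4_HEADER = 20   # assumes no IP options
UDP_HEADER = 8
FCS = 4            # frame check sequence


def expected_frame_len(udp_payload: int, qos: bool = True,
                       encrypted: bool = True, with_fcs: bool = True) -> int:
    """Expected 802.11 frame length in bytes for a given UDP payload size."""
    length = MAC_HEADER + LLC_SNAP + IPV4_HEADER + UDP_HEADER + udp_payload
    if qos:
        length += QOS_CONTROL
    if encrypted:
        length += CCMP_HEADER + CCMP_MIC
    if with_fcs:
        length += FCS
    return length
```

Under these assumptions a 100-byte UDP payload gives 182 bytes encrypted and 166 bytes unencrypted, so a frame that deviates from the expected value stands out quickly in wireshark.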
It seems to happen when there is a mismatch on the NSS. For example: side B: |
Hello! This looks quite like the same issue we faced and reported in dec 2016. @oamouroux After some time, "inactive time:" switches to a normal state, but no traffic can be sent. |
Hello, @oamouroux and @greearb. We've just triggered this issue in our lab again.
This issue looks beacon-interval related, but I can't explain why. I also found a bug in the kernel mac80211 code and want to submit a commit for it later. Best regards,
Hello, Ben! @greearb, if we look into fwstats, there is some strange behavior. The fwstats PEER stats log is below.
We also noticed that the first peer should not exist at all.
ip addr shows 3a:ed is the first two octets of the interface MAC address and the last two octets of the phantom peer. printf %x\n 53063 It's the reverse order of the last two octets of the IBSS interface MAC. So maybe there are some bugs with pointers inside the firmware. I don't know.
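The decimal-to-hex check above, including the byte swap, can be reproduced with a few lines of Python. This is only a convenience sketch; the helper name is made up and it says nothing about which stats field the value comes from.

```python
def swapped_octets(value: int) -> str:
    """Format a 16-bit integer as two colon-separated octets, low byte first."""
    return ":".join(f"{b:02x}" for b in value.to_bytes(2, "little"))


# Same conversion as `printf %x\n 53063`, then with the two bytes swapped.
print(format(53063, "x"))     # cf47
print(swapped_octets(53063))  # 47:cf
```

The swapped form is what to compare against the last two octets of the interface MAC reported by `ip addr`.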
The stats parsing in ath10k is complex and bug prone. The issue could easily be in the driver as well. Do you see this weird first peer only in cases where you are having connectivity issues, or does it happen all the time? And, the 'beacon int' that fixes this is sending around 100 beacons per second, or did I mis-understand that? |
We see this phantom peer all the time, even with AP interface without any sta connected. For example:
ap_5g_1_1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
root@OpenWrt:~# printf %x\n 45976
You understand that 100% right. Well, I don't want to say "Fixed", because we see a lot of overruns in dmesg:
[23211.095047] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
But traffic can pass normally through many IBSS hops without the visible issues and throughput drops described in this report. Also, as we can see, this bug can be reproduced with both 10.1 and 10.4 firmware in IBSS mode.
So that peer thing is probably not related to the IBSS bug. Please open a new bug for the peer stats problem. For the IBSS problem, I cannot think of any reason why sending beacons more often would fix this particular bug report so I don't have any new ideas on how to fix this more properly in the firmware or driver. @klukonin Earlier you mentioned you forced the system to 2x2 NSS as part of your fix. Are both the 2x2 force and the beacon hack needed, or will one or the other by itself mitigate the problem? |
I can confirm that max_nss = 2 with beacon_int = 10 helps. Ok. I will open a new bug report |
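As a quick sanity check on the "around 100 beacons per second" figure, the arithmetic is sketched below. It assumes `beacon_int` is expressed in time units (1 TU = 1024 microseconds), as in the wpa_supplicant configuration; the function name is illustrative.

```python
TU_MICROSECONDS = 1024  # one 802.11 time unit


def beacons_per_second(beacon_int_tu: int) -> float:
    """Nominal beacon rate for a beacon interval given in TU."""
    return 1_000_000 / (beacon_int_tu * TU_MICROSECONDS)


print(beacons_per_second(10))   # 97.65625
print(beacons_per_second(100))  # 9.765625
```

So `beacon_int = 10` is roughly ten times the beacon rate of the common 100 TU interval.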
Hello @oamouroux and @greearb. We have a 100% working recipe for how to trigger this issue and how to avoid it.
So maybe this issue is not beacon related, but TSF related. I remember a lot of weird things in ath10k and firmware about TSF.
Also, this may be interesting for you: with the original 10.2.6 firmware in mesh mode, we sometimes see the TSF offset rise to as much as 34 days right after we power on our lab.
Under specific RF conditions seen only in the field, some WPA2 IBSS links can be established but no traffic is received
OpenWRT Chaos Calmer, Kernel 3.18.45, mac80211 backported from v4.4-rc5-1913-gc8fdf68, driver ath10k-ct 4.7 2f3171d, fW-020-64bb715
This is also reproducible on ath10k-ct 4.7 1df6b26, fw 18
Hardware (NIC chipset, platform, etc)
QCA9882
Logs (dmesg, maybe supplicant and/or hostap)
dmesg.txt