FS#4098 - MESH-SAE-AUTH-BLOCKED #9082

Open
openwrt-bot opened this issue Oct 20, 2021 · 3 comments

@openwrt-bot openwrt-bot commented Oct 20, 2021

nemesisdev:

  • Device problem occurs on: reported by multiple users on [[https://github.com/libremesh/lime-packages/issues/837|different devices]], I am using [[http://www.win-star.com/en_us/product/WS_WN552K1_WN552K2_WN552K3.html|a mediatek based one]]

  • I am experiencing this on current master, revision r2857+4-9d994f35b4

  • Steps to reproduce: the problem occurs randomly when the root node of a mesh using plain 802.11s (mesh mode) with SAE/PSK2 authentication is rebooted or loses power. To reproduce it, one would have to keep rebooting the root node aggressively until it happens. Turning Wi-Fi off and on may reproduce it as well.

What happens?

Sometimes the devices in a mesh can't connect to each other after a power outage or a reboot of the root node (the node that is connected to the gateway and gives the rest of the mesh access to the internet).

Log lines:

Oct 20 13:04:40 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1a
Oct 20 13:04:47 OpenWrt wpa_supplicant[1335]: mesh1: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1b
Oct 20 13:04:59 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1a
Oct 20 13:05:01 OpenWrt wpa_supplicant[1335]: mesh1: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1b
Oct 20 13:05:11 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1a
Oct 20 13:05:12 OpenWrt wpa_supplicant[1335]: mesh1: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1b
Oct 20 13:05:24 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1a
Oct 20 13:05:24 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-BLOCKED addr=*0:3f:5d:::1a duration=300
Oct 20 13:05:26 OpenWrt wpa_supplicant[1335]: mesh1: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1b
Oct 20 13:05:26 OpenWrt wpa_supplicant[1335]: mesh1: MESH-SAE-AUTH-BLOCKED addr=*0:3f:5d:::1b duration=300
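
To make it easier to see how many failures precede a block, here is a minimal Python sketch that summarizes such log lines per interface and peer. The regular expression is written against the lines quoted above only; the function name `summarize` is hypothetical, and the sample is a subset of the log, not the full sequence.

```python
import re
from collections import defaultdict

# Matches the two wpa_supplicant events quoted in this report.
# The address is treated as an opaque token because it is redacted above.
EVENT_RE = re.compile(
    r"(?P<ifname>\S+): MESH-SAE-AUTH-(?P<kind>FAILURE|BLOCKED) "
    r"addr=(?P<addr>\S+)(?: duration=(?P<duration>\d+))?"
)


def summarize(lines):
    """Count FAILURE events and record BLOCKED events per (ifname, addr)."""
    summary = defaultdict(
        lambda: {"failures": 0, "blocked": False, "duration": None}
    )
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue  # unrelated log line
        entry = summary[(m["ifname"], m["addr"])]
        if m["kind"] == "FAILURE":
            entry["failures"] += 1
        else:
            entry["blocked"] = True
            if m["duration"] is not None:
                entry["duration"] = int(m["duration"])
    return dict(summary)


# Subset of the log lines from this report (mesh0 only).
SAMPLE = [
    "Oct 20 13:05:11 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1a",
    "Oct 20 13:05:24 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1a",
    "Oct 20 13:05:24 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-BLOCKED addr=*0:3f:5d:::1a duration=300",
]

if __name__ == "__main__":
    for key, entry in summarize(SAMPLE).items():
        print(key, entry)
```

Feeding it the full log above instead of the subset should show each peer's failure count before its block.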

When this happens, the links show up in "iw mesh0 station dump" or "iw mesh1 station dump" but in BLOCKED state.
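
A small sketch for spotting blocked links automatically, assuming the station dump prints a `Station <addr> (on <ifname>)` header per peer followed by a `mesh plink:` line carrying the state. The sample text and its MAC addresses are made up for illustration, and `blocked_peers` is a hypothetical helper name.

```python
import re

# Assumed layout of `iw <ifname> station dump` output: one header line per
# peer, then indented "key:\tvalue" lines, one of which is "mesh plink".
STATION_RE = re.compile(r"^Station (?P<addr>\S+) \(on (?P<ifname>\S+)\)")
PLINK_RE = re.compile(r"^\s*mesh plink:\s*(?P<state>\S+)")


def blocked_peers(dump_text):
    """Return the addresses of peers whose mesh plink state is BLOCKED."""
    blocked = []
    current = None
    for line in dump_text.splitlines():
        m = STATION_RE.match(line)
        if m:
            current = m["addr"]
            continue
        m = PLINK_RE.match(line)
        if m and current is not None and m["state"] == "BLOCKED":
            blocked.append(current)
    return blocked


# Illustrative input only; not captured from a real device.
SAMPLE_DUMP = (
    "Station 02:00:00:00:00:01 (on mesh0)\n"
    "\tinactive time:\t20 ms\n"
    "\tmesh plink:\tESTAB\n"
    "Station 02:00:00:00:00:02 (on mesh0)\n"
    "\tinactive time:\t4310 ms\n"
    "\tmesh plink:\tBLOCKED\n"
)

if __name__ == "__main__":
    print(blocked_peers(SAMPLE_DUMP))
```

On a node, the text could come from running the `iw` command through `subprocess.run(..., capture_output=True, text=True)` and passing its `stdout` to the function.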

Rebooting, at the same time, the nodes whose links are blocked fixes the issue. This seems to rule out interference, because a reboot would not fix an interference problem.

I also tried setting "cell_density '1'" in the configuration of the radios, but the problem keeps happening. It doesn't happen often, but when it does it can wreak havoc.

The mesh configuration is the following:

config wifi-device 'radio0'
option type 'mac80211'
option channel '11'
option hwmode '11g'
option path '1e140000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
option htmode 'HT20'
option disabled '0'
option log_level '0'
option legacy_rates '0'
option country 'US'
option cell_density '1'

config wifi-device 'radio1'
option type 'mac80211'
option hwmode '11a'
option path '1e140000.pcie/pci0000:00/0000:00:01.0/0000:02:00.0'
option htmode 'VHT80'
option disabled '0'
option log_level '0'
option channel '40'
option country 'US'
option cell_density '1'

config wifi-iface 'wifi_mesh0'
option device 'radio0'
option ifname 'mesh0'
option mode 'mesh'
option encryption 'psk2+ccmp'
option key ''
option mesh_id ''
option network 'lan'
option mesh_fwding '1'
option mesh_rssi_threshold '-80'

config wifi-iface 'wifi_mesh1'
option device 'radio1'
option ifname 'mesh1'
option mode 'mesh'
option encryption 'psk2+ccmp'
option key ''
option mesh_id ''
option network 'lan'
option mesh_fwding '1'
option mesh_rssi_threshold '-80'

config wifi-iface 'wifi_wlan0'
option device 'radio0'
option ifname 'wlan0'
option mode 'ap'
option encryption 'psk2'
option key ''
option ssid ''
option network 'lan'
option ieee80211r '1'
option ft_psk_generate_local '1'
option rsn_preauth '1'
option reassociation_deadline '20000'
option ft_over_ds '1'

config wifi-iface 'wifi_wlan1'
option device 'radio1'
option ifname 'wlan1'
option mode 'ap'
option encryption 'psk2'
option key ''
option ssid ''
option network 'lan'
option ieee80211r '1'
option ft_psk_generate_local '1'
option rsn_preauth '1'
option reassociation_deadline '20000'
option ft_over_ds '1'


@openwrt-bot openwrt-bot commented Oct 20, 2021

nemesisdev:

The exact commit of my OpenWrt master build is ade56b8d9e.


@openwrt-bot openwrt-bot commented Dec 8, 2021

Steve-Newcomb:

We have this problem too. It occurs in two of our three meshes. It is much more frequent lately. I do not know whether it is merely coincidental that we recently upgraded from 21.01 to 21.02.

My current solution is to maintain a pair of openssh tunnels between each dhcp server (in which gw_mode='server') and each client (in which gw_mode='client'). If a dhcp server finds itself with no clients that are (still?) in contact with it, it reboots. If a client finds itself with no dhcp server that is (still?) in contact with it, it reboots. It's a ridiculously heavy solution which is a lot of trouble to set up in a secure manner, but it has the advantage that each node can detect whether it is in contact with the node(s) with which it has one or more critical relationships.
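
The reboot decision described above can be sketched roughly as follows. This is not the actual setup (which relies on openssh tunnels); it only models the rule "reboot when none of the critical peers is in contact". The `Watchdog` class, the `max_misses` grace counter, and the peer names are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Decide when a node should reboot because it lost all critical peers.

    For a DHCP server the critical peers are its clients; for a client they
    are the DHCP servers. `max_misses` is a hypothetical grace period so a
    single failed check does not trigger a reboot.
    """

    max_misses: int = 3
    misses: int = 0

    def observe(self, peer_reachable):
        """Record one round of checks; return True if a reboot is due.

        `peer_reachable` maps a peer name to whether it answered this round.
        An empty mapping counts as "no peer in contact".
        """
        if any(peer_reachable.values()):
            self.misses = 0
            return False
        self.misses += 1
        return self.misses >= self.max_misses


if __name__ == "__main__":
    wd = Watchdog(max_misses=2)
    print(wd.observe({"client-a": False, "client-b": False}))  # first miss
    print(wd.observe({"client-a": False, "client-b": False}))  # second miss
```

How reachability is measured (tunnel liveness, ping, or something else) is left out on purpose; the sketch only captures the decision rule.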

I suspect this problem is actually a driver issue. These are all Archer [CA]7 v [245] routers (affordable!) with QCA "wave1" radios. I haven't been able to use the -CT (Candela Technologies) driver for those radios in a mesh; perhaps I haven't understood the advice I've received about that, or perhaps the advice just doesn't work. Therefore, I have to use the stock (QCA) driver's inherent 802.11s implementation, which has quirks. For example, it always fails, usually within hours or minutes, if I have tweaked the radio's built-in MAC address. Therefore, I suspect the QCA firmware may be insufficiently hardened against the depredations of real-world environments.

On the other hand, this could be a real OpenWRT bug. I have no explanation as to why it is suddenly so much more frequent. If anyone can suggest debugging instrumentation that I haven't already tried, I'll be grateful for the advice.


@openwrt-bot openwrt-bot commented Dec 20, 2021

EelcoV:

Currently I am not using OpenWrt, but I have/had a similar issue. It had to do with "too" many clients trying to connect to the mesh peer at the same time. The peer link then also ended up in the PLINK_BLOCKED state.

First of all, I removed the code that sets the PLINK_BLOCKED state when authentication fails several times (I couldn't find that behaviour in the IEEE 802.11 standard anyway...). Then I noticed a lot of "anti-clogging" messages (see also chapter 12.4.6 in the IEEE 802.11 standard). This mechanism starts sending tokens along with frames to reduce the number of peers that are allowed to perform authentication at the same time. This then led to peers getting blocked because they were not allowed to authenticate.

Maybe you can check your logs for this kind of message. Also, when you try to reproduce the issue, make sure you have a lot of peers (I had to have more than 5 peers...)
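
To illustrate the idea behind anti-clogging, here is a toy Python model. It is not wpa_supplicant's implementation: the class name, the HMAC-based token, and the threshold value are all assumptions chosen for illustration. It only shows the principle that once enough authentications are in progress, a new peer must first echo back a token bound to its address.

```python
import hashlib
import hmac
import os


class AntiClogging:
    """Toy model of a token requirement that kicks in above a threshold."""

    def __init__(self, threshold=5):
        # Illustrative threshold; real implementations define their own.
        self.threshold = threshold
        self.secret = os.urandom(32)   # known only to this node
        self.open_instances = set()    # peers with authentication in progress

    def _token(self, addr):
        # Token bound to the sender's address and this node's secret.
        return hmac.new(self.secret, addr.encode(), hashlib.sha256).digest()

    def on_commit(self, addr, token=None):
        """Handle a first authentication message from `addr`.

        Returns ("accept", None), or ("token_required", token) when the
        peer must retry with the returned token.
        """
        busy = len(self.open_instances) >= self.threshold
        if busy and addr not in self.open_instances:
            expected = self._token(addr)
            if token is None or not hmac.compare_digest(token, expected):
                return ("token_required", expected)
        self.open_instances.add(addr)
        return ("accept", None)


if __name__ == "__main__":
    node = AntiClogging(threshold=2)
    node.on_commit("peer-1")
    node.on_commit("peer-2")
    status, tok = node.on_commit("peer-3")      # above threshold
    print(status)
    print(node.on_commit("peer-3", tok)[0])     # retry with the token
```

In this model a peer that never retries with the token is never admitted, which is one way a burst of simultaneous peers could turn into repeated authentication failures.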

I have posted my original issue here, maybe this helps to get more insight into the issue. http://lists.infradead.org/pipermail/hostap/2021-December/040095.html
