[23.05-SNAPSHOT] filogic: WPA2+WPA3 Unable to connect periodically #13156

Open
1 task done
soxrok2212 opened this issue Jul 25, 2023 · 60 comments
Labels
bug (issue report with a confirmed bug), release/23.05 (pull request/issue targeted (also) for OpenWrt 23.05 release), target/mediatek (pull request/issue for mediatek target)

Comments

@soxrok2212

soxrok2212 commented Jul 25, 2023

Describe the bug

Running recent builds, I'm sometimes unable to connect with WPA2+WPA3. On 22.03 snapshots, this was never an issue. Resolving requires restarting the radios on the AP. Tested with wpad-mesh-openssl and wpad-mesh-mbedtls on MT76 XDR-6086. Looks like a loop of this:

Mon Jul 24 23:36:26 2023 daemon.info hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
Mon Jul 24 23:36:26 2023 daemon.info hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 2)

This has mostly been seen on an iPhone with iOS 16 and a MacBook with macOS 13.4.1. I haven't verified with other devices, as sometimes others can connect without issue while one fails to connect.

OpenWrt version

r23290-339e71cbd3

OpenWrt target/subtarget

mediatek/filogic

Device

TP-Link TL-XDR6086

Image kind

Self-built image

Steps to reproduce

Toggle the radio on/off a few times.

Actual behaviour

Clients cannot authenticate with the AP.

Expected behaviour

Clients connect to the AP without issue.

Additional info

No response

Diffconfig

CONFIG_TARGET_mediatek=y
CONFIG_TARGET_mediatek_filogic=y
CONFIG_TARGET_mediatek_filogic_DEVICE_tplink_tl-xdr6086=y
CONFIG_BATMAN_ADV_BATMAN_V=y
CONFIG_BATMAN_ADV_BLA=y
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_MCAST=y
CONFIG_HTOP_LMSENSORS=y
CONFIG_LIBCURL_COOKIES=y
CONFIG_LIBCURL_FILE=y
CONFIG_LIBCURL_FTP=y
CONFIG_LIBCURL_HTTP=y
CONFIG_LIBCURL_MBEDTLS=y
CONFIG_LIBCURL_NGHTTP2=y
CONFIG_LIBCURL_NO_SMB="!"
CONFIG_LIBCURL_PROXY=y
CONFIG_LIBCURL_UNIX_SOCKETS=y
CONFIG_NFS_KERNEL_SERVER_V4=y
CONFIG_OPENSSL_ENGINE=y
CONFIG_OPENSSL_WITH_ASM=y
CONFIG_OPENSSL_WITH_CHACHA_POLY1305=y
CONFIG_OPENSSL_WITH_CMS=y
CONFIG_OPENSSL_WITH_DEPRECATED=y
CONFIG_OPENSSL_WITH_ERROR_MESSAGES=y
CONFIG_OPENSSL_WITH_IDEA=y
CONFIG_OPENSSL_WITH_MDC2=y
CONFIG_OPENSSL_WITH_PSK=y
CONFIG_OPENSSL_WITH_SEED=y
CONFIG_OPENSSL_WITH_SRP=y
CONFIG_OPENSSL_WITH_TLS13=y
CONFIG_OPENSSL_WITH_WHIRLPOOL=y
CONFIG_PACKAGE_adblock=y
CONFIG_PACKAGE_batctl-default=y
CONFIG_PACKAGE_block-mount=y
CONFIG_PACKAGE_ca-certificates=y
CONFIG_PACKAGE_cgi-io=y
CONFIG_PACKAGE_coreutils=y
CONFIG_PACKAGE_coreutils-sort=y
CONFIG_PACKAGE_curl=y
CONFIG_PACKAGE_ddns-scripts=y
CONFIG_PACKAGE_ddns-scripts-services=y
CONFIG_PACKAGE_ethtool-full=y
CONFIG_PACKAGE_hd-idle=y
CONFIG_PACKAGE_htop=y
CONFIG_PACKAGE_iperf3=y
CONFIG_PACKAGE_iptables-mod-ipopt=y
CONFIG_PACKAGE_iptables-nft=y
CONFIG_PACKAGE_kmod-asn1-decoder=y
CONFIG_PACKAGE_kmod-asn1-encoder=y
CONFIG_PACKAGE_kmod-batman-adv=y
CONFIG_PACKAGE_kmod-crypto-arc4=y
CONFIG_PACKAGE_kmod-crypto-cbc=y
CONFIG_PACKAGE_kmod-crypto-cts=y
CONFIG_PACKAGE_kmod-crypto-ecb=y
CONFIG_PACKAGE_kmod-crypto-kpp=y
CONFIG_PACKAGE_kmod-crypto-lib-chacha20=y
CONFIG_PACKAGE_kmod-crypto-lib-chacha20poly1305=y
CONFIG_PACKAGE_kmod-crypto-lib-curve25519=y
CONFIG_PACKAGE_kmod-crypto-lib-poly1305=y
CONFIG_PACKAGE_kmod-crypto-user=y
CONFIG_PACKAGE_kmod-dax=y
CONFIG_PACKAGE_kmod-dm=y
CONFIG_PACKAGE_kmod-dnsresolver=y
CONFIG_PACKAGE_kmod-fs-exportfs=y
CONFIG_PACKAGE_kmod-fs-nfs=y
CONFIG_PACKAGE_kmod-fs-nfs-common=y
CONFIG_PACKAGE_kmod-fs-nfs-common-rpcsec=y
CONFIG_PACKAGE_kmod-fs-nfs-v4=y
CONFIG_PACKAGE_kmod-fs-nfsd=y
CONFIG_PACKAGE_kmod-ifb=y
CONFIG_PACKAGE_kmod-ipt-core=y
CONFIG_PACKAGE_kmod-ipt-ipopt=y
CONFIG_PACKAGE_kmod-keys-encrypted=y
CONFIG_PACKAGE_kmod-keys-trusted=y
CONFIG_PACKAGE_kmod-lib-crc16=y
CONFIG_PACKAGE_kmod-nf-ipt=y
CONFIG_PACKAGE_kmod-nft-compat=y
CONFIG_PACKAGE_kmod-oid-registry=y
CONFIG_PACKAGE_kmod-random-core=y
CONFIG_PACKAGE_kmod-sched-cake=y
CONFIG_PACKAGE_kmod-sched-core=y
CONFIG_PACKAGE_kmod-scsi-core=y
CONFIG_PACKAGE_kmod-tpm=y
CONFIG_PACKAGE_kmod-udptunnel4=y
CONFIG_PACKAGE_kmod-udptunnel6=y
CONFIG_PACKAGE_kmod-usb-storage=y
CONFIG_PACKAGE_kmod-wireguard=y
CONFIG_PACKAGE_libblkid=y
CONFIG_PACKAGE_libcurl=y
CONFIG_PACKAGE_libdevmapper=y
CONFIG_PACKAGE_libiperf3=y
CONFIG_PACKAGE_libiptext=y
CONFIG_PACKAGE_libiptext-nft=y
CONFIG_PACKAGE_libiptext6=y
CONFIG_PACKAGE_libkeyutils=y
CONFIG_PACKAGE_liblua=y
CONFIG_PACKAGE_liblucihttp=y
CONFIG_PACKAGE_liblucihttp-lua=y
CONFIG_PACKAGE_liblucihttp-ucode=y
CONFIG_PACKAGE_libmount=y
CONFIG_PACKAGE_libncurses=y
CONFIG_PACKAGE_libnghttp2=y
CONFIG_PACKAGE_libopenssl=y
CONFIG_PACKAGE_libopenssl-conf=y
CONFIG_PACKAGE_libpcap=y
CONFIG_PACKAGE_libqrencode=y
CONFIG_PACKAGE_librt=y
CONFIG_PACKAGE_libsmartcols=y
CONFIG_PACKAGE_libtirpc=y
CONFIG_PACKAGE_libubus-lua=y
CONFIG_PACKAGE_libustream-mbedtls=m
CONFIG_PACKAGE_libustream-openssl=y
CONFIG_PACKAGE_libuuid=y
CONFIG_PACKAGE_libwolfssl=y
CONFIG_PACKAGE_libwrap=y
CONFIG_PACKAGE_libxtables=y
CONFIG_PACKAGE_lsblk=y
CONFIG_PACKAGE_lua=y
CONFIG_PACKAGE_luci=y
CONFIG_PACKAGE_luci-app-adblock=y
CONFIG_PACKAGE_luci-app-ddns=y
CONFIG_PACKAGE_luci-app-firewall=y
CONFIG_PACKAGE_luci-app-hd-idle=y
CONFIG_PACKAGE_luci-app-opkg=y
CONFIG_PACKAGE_luci-app-sqm=y
CONFIG_PACKAGE_luci-base=y
CONFIG_PACKAGE_luci-lib-base=y
CONFIG_PACKAGE_luci-lib-ip=y
CONFIG_PACKAGE_luci-lib-jsonc=y
CONFIG_PACKAGE_luci-lib-nixio=y
CONFIG_PACKAGE_luci-light=y
CONFIG_PACKAGE_luci-lua-runtime=y
CONFIG_PACKAGE_luci-mod-admin-full=y
CONFIG_PACKAGE_luci-mod-network=y
CONFIG_PACKAGE_luci-mod-status=y
CONFIG_PACKAGE_luci-mod-system=y
CONFIG_PACKAGE_luci-proto-batman-adv=y
CONFIG_PACKAGE_luci-proto-ipv6=y
CONFIG_PACKAGE_luci-proto-ppp=y
CONFIG_PACKAGE_luci-proto-wireguard=y
CONFIG_PACKAGE_luci-theme-bootstrap=y
CONFIG_PACKAGE_nfs-kernel-server=y
CONFIG_PACKAGE_nfs-kernel-server-utils=m
CONFIG_PACKAGE_nfs-utils-libs=y
CONFIG_PACKAGE_openssl-util=y
CONFIG_PACKAGE_qrencode=y
CONFIG_PACKAGE_rpcbind=y
CONFIG_PACKAGE_rpcd=y
CONFIG_PACKAGE_rpcd-mod-file=y
CONFIG_PACKAGE_rpcd-mod-iwinfo=y
CONFIG_PACKAGE_rpcd-mod-luci=y
CONFIG_PACKAGE_rpcd-mod-rrdns=y
CONFIG_PACKAGE_rpcd-mod-ucode=y
CONFIG_PACKAGE_sqm-scripts=y
CONFIG_PACKAGE_tc-tiny=y
CONFIG_PACKAGE_tcpdump=y
CONFIG_PACKAGE_terminfo=y
CONFIG_PACKAGE_trusted-firmware-a-mt7986-spim-nand-ddr4=y
CONFIG_PACKAGE_ucode-mod-html=y
CONFIG_PACKAGE_ucode-mod-lua=y
CONFIG_PACKAGE_ucode-mod-math=y
CONFIG_PACKAGE_uhttpd=y
CONFIG_PACKAGE_uhttpd-mod-ubus=y
CONFIG_PACKAGE_wireguard-tools=y
# CONFIG_PACKAGE_wpad-basic-mbedtls is not set
CONFIG_PACKAGE_wpad-mesh-mbedtls=y
CONFIG_PACKAGE_xtables-nft=y
CONFIG_RPCBIND_LIBWRAP=y
CONFIG_RPCBIND_RMTCALLS=y
CONFIG_WOLFSSL_HAS_NO_HW=y

Terms

  • I am reporting an issue for OpenWrt, not an unsupported fork.
@soxrok2212 soxrok2212 added the bug issue report with a confirmed bug label Jul 25, 2023
@brada4

brada4 commented Jul 25, 2023

What is in your /etc/config/wireless (minus passwords and AP names)

@soxrok2212
Author

soxrok2212 commented Jul 25, 2023

config wifi-device 'radio0'
	option type 'mac80211'
	option path 'platform/soc/18000000.wifi'
	option channel '1'
	option band '2g'
	option htmode 'HE20'
	option country 'US'
	option cell_density '0'

config wifi-device 'radio1'
	option type 'mac80211'
	option path 'platform/soc/18000000.wifi+1'
	option channel '161'
	option band '5g'
	option htmode 'HE80'
	option country 'US'
	option cell_density '0'

config wifi-iface 'wifinet1'
	option device 'radio0'
	option mode 'ap'
	option ssid 'x'
	option encryption 'sae-mixed'
	option network 'lan'
	option key 'x'
	option ieee80211w '1'

config wifi-iface 'wifinet2'
	option device 'radio0'
	option mode 'ap'
	option ssid 'x'
	option encryption 'sae-mixed'
	option key 'x'
	option network 'iot'

config wifi-iface 'wifinet3'
	option device 'radio0'
	option mode 'ap'
	option ssid 'x'
	option encryption 'sae-mixed'
	option hidden '1'
	option key 'x'
	option network 'security'

config wifi-iface 'wifinet4'
	option device 'radio0'
	option mode 'ap'
	option ssid 'x'
	option encryption 'sae-mixed'
	option key 'x'
	option network 'guest'

config wifi-iface 'wifinet5'
	option device 'radio0'
	option mode 'ap'
	option ssid 'x'
	option encryption 'none'
	option network 'public'
	option isolate '1'

config wifi-iface 'wifinet11'
	option device 'radio1'
	option mode 'ap'
	option ssid 'x'
	option encryption 'sae-mixed'
	option network 'lan'
	option key 'x'
	option ieee80211w '1'

config wifi-iface 'wifinet12'
	option device 'radio1'
	option mode 'ap'
	option ssid 'x'
	option encryption 'sae-mixed'
	option key 'x'
	option network 'iot'

config wifi-iface 'wifinet13'
	option device 'radio1'
	option mode 'ap'
	option ssid 'x'
	option encryption 'sae-mixed'
	option hidden '1'
	option key 'x'
	option network 'security'

config wifi-iface 'wifinet14'
	option device 'radio1'
	option mode 'ap'
	option ssid 'x'
	option encryption 'sae-mixed'
	option key 'x'
	option network 'guest'

config wifi-iface 'wifinet15'
	option device 'radio1'
	option mode 'ap'
	option ssid 'x'
	option encryption 'none'
	option network 'public'
	option isolate '1'

config wifi-iface 'wifinet10'
	option device 'radio1'
	option mode 'mesh'
	option mesh_id 'x'
	option mesh_fwding '0'
	option mesh_rssi_threshold '0'
	option key 'x'
	option encryption 'sae'
	option network 'batmesh'

config wifi-iface 'wifinet18'
	option device 'radio0'
	option mode 'ap'
	option encryption 'sae-mixed'
	option key 'x'
	option network 'sat1'
	option ssid 'x'
	option hidden '1'

config wifi-iface 'wifinet19'
	option device 'radio1'
	option mode 'ap'
	option encryption 'sae-mixed'
	option key 'x'
	option network 'sat1'
	option ssid 'x'
	option hidden '1'

config wifi-iface 'wifinet20'
	option device 'radio0'
	option mode 'ap'
	option ssid 'x'
	option encryption 'sae-mixed'
	option network 'sat2'
	option key 'x'
	option hidden '1'

config wifi-iface 'wifinet21'
	option device 'radio1'
	option mode 'ap'
	option ssid 'x'
	option encryption 'sae-mixed'
	option key 'x'
	option network 'sat2'
	option hidden '1'

@soxrok2212
Author

FWIW, on the device, it reports the PSK is wrong. Restarting the AP works without changing the PSK on the device.

@neheb
Contributor

neheb commented Jul 25, 2023

That is a big hint as to what's wrong. I was running a snapshot build recently that caused similar behavior for my iPhone with regular AP mode where after a while it was reporting wrong password. @stintel pointed out this is a hostapd issue.

@brada4

brada4 commented Jul 25, 2023

Try setting up two access points with the same name and passphrase in place of the one mixed AP: one with WPA2-PSK (AES, with optional 802.11w protection to match the interop minimum security) and the other with WPA3-SAE. It is more interoperable that way.
Then check whether the observed glitches continue.
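
As a rough illustration of the split setup suggested here, the shell helper below prints two `wifi-iface` sections that share one SSID and key, one WPA2-PSK and one WPA3-SAE. The function name `print_split_ap` and its arguments are made up for this sketch. The option names and the `psk2+ccmp`, `sae` and `ieee80211w` values follow the usual OpenWrt wireless configuration, so treat the output as a starting point under those assumptions and not as a verified fix.

```shell
#!/bin/sh
# Hypothetical helper: print two UCI wifi-iface sections for one SSID.
# Arguments: radio name, SSID, key, network.
# Values containing a single quote are not handled by this sketch.
print_split_ap() {
    radio=$1 ssid=$2 key=$3 network=$4
    cat <<EOF
config wifi-iface
	option device '$radio'
	option mode 'ap'
	option ssid '$ssid'
	option encryption 'psk2+ccmp'
	option ieee80211w '1'
	option key '$key'
	option network '$network'

config wifi-iface
	option device '$radio'
	option mode 'ap'
	option ssid '$ssid'
	option encryption 'sae'
	option ieee80211w '2'
	option key '$key'
	option network '$network'
EOF
}
```

For example, `print_split_ap radio1 x x lan >> /etc/config/wireless` followed by `wifi reload` would add both sections. The existing `sae-mixed` section for that SSID would have to be removed or disabled first.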

@ynezz ynezz changed the title [23.05-rc2] WPA2+WPA3 Unable to connect periodically [23.05-rc2] filogic: WPA2+WPA3 Unable to connect periodically Jul 25, 2023
@ynezz ynezz added target/mediatek pull request/issue for mediatek target release/23.05 pull request/issue targeted (also) for OpenWrt 23.05 release labels Jul 25, 2023
@ynezz
Member

ynezz commented Jul 25, 2023

@soxrok2212 recompile images with CONFIG_WPA_MSG_MIN_PRIORITY=0 and enable debug logging, it should hopefully provide more details. I would as well recommend to reproduce the issue with wpad-full-openssl.

@stintel pointed out this is a hostapd issue.

That is probably not that correct and might be misleading, IIRC then he is using snapshots and was pointing fingers at cd804c1 (hostapd: update to 2023-06-22) which is not in 23.05.0-rc2 yet. Reverting that update on main branch fixes the problems.
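
A minimal sketch of the build-side half of the debugging suggestion above, assuming a standard OpenWrt buildroot with a `.config` file. The helper name `set_kconfig` is hypothetical. It only rewrites one symbol in a config file so that the setting is not duplicated, and it does not run the build.

```shell
#!/bin/sh
# Hypothetical helper: set NAME=VALUE in a .config-style file, replacing
# any earlier assignment or "# NAME is not set" marker for that symbol.
set_kconfig() {
    file=$1 name=$2 value=$3
    [ -f "$file" ] || : > "$file"
    tmp=$(mktemp) || return 1
    # Keep every line except earlier settings of this symbol.
    grep -v -e "^$name=" -e "^# $name is not set\$" "$file" > "$tmp" || true
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

From the buildroot this would be called as `set_kconfig .config CONFIG_WPA_MSG_MIN_PRIORITY 0`, followed by `make defconfig` and a rebuild. On the running device, debug logging is usually raised per radio with `option log_level '1'` in the `wifi-device` section. That option name comes from the OpenWrt wireless documentation, not from this thread.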

@stintel
Member

stintel commented Jul 25, 2023

he is using snapshots and was pointing fingers at cd804c1 (hostapd: update to 2023-06-22) which is not in 23.05.0-rc2 yet. Reverting that update on main branch fixes the problems.

Correct. Haven't had time to debug that further and probably won't any time soon. I have a small co-working space at home and someone is working here for the rest of the month.

@soxrok2212
Author

@soxrok2212 recompile images with CONFIG_WPA_MSG_MIN_PRIORITY=0 and enable debug logging, it should hopefully provide more details. I would as well recommend to reproduce the issue with wpad-full-openssl.

Will do. Seems a pretty good trigger is to make a change on a VAP or sometimes even just restarting the radio.

I'm unsure if this is related at all, but I see this in the logs when simply restarting a radio. Seems related to 6603748

Tue Jul 25 07:47:21 2023 daemon.notice netifd: Network device 'phy0-ap3' link is down
Tue Jul 25 07:47:21 2023 daemon.notice hostapd: nl80211: Failed to remove interface phy0-ap3 from bridge br: No such device
Tue Jul 25 07:47:21 2023 daemon.notice hostapd: phy0-ap2: AP-DISABLED
Tue Jul 25 07:47:21 2023 daemon.notice hostapd: phy0-ap2: CTRL-EVENT-TERMINATING
Tue Jul 25 07:47:21 2023 daemon.err hostapd: rmdir[ctrl_interface=/var/run/hostapd]: Permission denied
Tue Jul 25 07:47:21 2023 daemon.notice netifd: Network device 'phy0-ap2' link is down
Tue Jul 25 07:47:21 2023 kern.info kernel: [27778.660701] br: port 6(phy0-ap2) entered disabled state
Tue Jul 25 07:47:21 2023 kern.info kernel: [27778.709090] device phy0-ap2 left promiscuous mode
Tue Jul 25 07:47:21 2023 kern.info kernel: [27778.713799] br: port 6(phy0-ap2) entered disabled state
Tue Jul 25 07:47:22 2023 daemon.notice hostapd: nl80211: Failed to remove interface phy0-ap2 from bridge br: No such device

That is probably not that correct and might be misleading, IIRC then he is using snapshots and was pointing fingers at cd804c1 (hostapd: update to 2023-06-22) which is not in 23.05.0-rc2 yet. Reverting that update on main branch fixes the problems.

Are you positive? Looks like it made it into 23.05-rc2 5 days ago. I'm on the openwrt-23.05 branch using the latest commit.

@PussAzuki

@soxrok2212 recompile images with CONFIG_WPA_MSG_MIN_PRIORITY=0 and enable debug logging, it should hopefully provide more details. I would as well recommend to reproduce the issue with wpad-full-openssl.

Is wpad-full-openssl the same thing as wpad-openssl? I checked the OpenWrt wiki, but its description of them is very poor.

I've been manually selecting wpad-openssl and compiling, and it was working fine until June 3. After I waited until July 9 for mt76 to do a big update, I also started a big update (I did re-pull the source code), and then the wireless key errors and disconnected wireless connection issues started happening more often! I also do use wpa2-psk/wpa3-sae encryption.

Over the past few days, my tests have suggested that it wasn't mt76 that was causing the wireless issues(?). At least, rolling back to an older package version didn't fix the problem.

@soxrok2212
Author

@ynezz with CONFIG_WPA_MSG_MIN_PRIORITY=0 and wpad-full-openssl I didn't seem to see anything new when the AP decides to stop working. Same loop of

Mon Jul 24 23:36:26 2023 daemon.info hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
Mon Jul 24 23:36:26 2023 daemon.info hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 2)

@ynezz ynezz changed the title [23.05-rc2] filogic: WPA2+WPA3 Unable to connect periodically [23.05-SNAPSHOT] filogic: WPA2+WPA3 Unable to connect periodically Jul 25, 2023
@ynezz
Member

ynezz commented Jul 25, 2023

I'm on the openwrt-23.05 branch using the latest commit.

Ok, thanks for clearing this out, but then you're not using 23.05-rc2 as the issue title would suggest. Then simply git revert 8d7d9aa4a46366 and report back.

with CONFIG_WPA_MSG_MIN_PRIORITY=0 and wpad-full-openssl I didn't seem to see anything new when the AP decides to stop working

perhaps you've missed the enable debug logging part?

I also started a big update (I did re-pull the source code), and then the wireless key errors and disconnected wireless connection issues started happening

Ok, then we're probably looking at the wrong place (hostapd) and should focus on that mt76 updates? Which update broke it, 01885bc ?

@soxrok2212
Author

Ok, thanks for clearing this out, but then you're not using 23.05-rc2 as the issue title would suggest. Then simply git revert 8d7d9aa4a46366 and report back.

Yep, that's correct, my mistake.

perhaps you've missed the enable debug logging part?

I totally missed that. Will report back.

@soxrok2212
Author

@ynezz I can confirm this bug can be triggered by simply toggling the radio on/off (maybe a few times). Looks like only the 1st message in the handshake is sent before it times out.

Tue Jul 25 10:07:44 2023 daemon.info hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: deauthenticated due to local deauth request
Tue Jul 25 10:07:47 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authentication OK (open system)
Tue Jul 25 10:07:47 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-AUTHENTICATE.indication(xx:xx:xx:xx:xx:xx, OPEN_SYSTEM)
Tue Jul 25 10:07:47 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DELETEKEYS.request(xx:xx:xx:xx:xx:xx)
Tue Jul 25 10:07:47 2023 daemon.info hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
Tue Jul 25 10:07:47 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: association OK (aid 1)
Tue Jul 25 10:07:47 2023 daemon.info hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 1)
Tue Jul 25 10:07:47 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-ASSOCIATE.indication(xx:xx:xx:xx:xx:xx)
Tue Jul 25 10:07:47 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DELETEKEYS.request(xx:xx:xx:xx:xx:xx)
Tue Jul 25 10:07:47 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: binding station to interface 'phy1-ap0'
Tue Jul 25 10:07:47 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: event 1 notification
Tue Jul 25 10:07:47 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: start authentication
Tue Jul 25 10:07:47 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.1X: unauthorizing port
Tue Jul 25 10:07:47 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: sending 1/4 msg of 4-Way Handshake
Tue Jul 25 10:07:48 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: EAPOL-Key timeout
Tue Jul 25 10:07:48 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: sending 1/4 msg of 4-Way Handshake
Tue Jul 25 10:07:49 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: EAPOL-Key timeout
Tue Jul 25 10:07:49 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: sending 1/4 msg of 4-Way Handshake
Tue Jul 25 10:07:50 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: EAPOL-Key timeout
Tue Jul 25 10:07:50 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: sending 1/4 msg of 4-Way Handshake
Tue Jul 25 10:07:51 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: EAPOL-Key timeout
Tue Jul 25 10:07:51 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: PTKSTART: Retry limit 4 reached
Tue Jul 25 10:07:51 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: event 3 notification
Tue Jul 25 10:07:51 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.1X: unauthorizing port
Tue Jul 25 10:07:51 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DEAUTHENTICATE.indication(xx:xx:xx:xx:xx:xx, 15)
Tue Jul 25 10:07:51 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DELETEKEYS.request(xx:xx:xx:xx:xx:xx)
Tue Jul 25 10:07:51 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: event 3 notification
Tue Jul 25 10:07:51 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.1X: unauthorizing port
Tue Jul 25 10:07:51 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: deauthenticated
Tue Jul 25 10:07:51 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DEAUTHENTICATE.indication(xx:xx:xx:xx:xx:xx, 3)
Tue Jul 25 10:07:51 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DELETEKEYS.request(xx:xx:xx:xx:xx:xx)
Tue Jul 25 10:07:56 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authentication OK (open system)
Tue Jul 25 10:07:56 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-AUTHENTICATE.indication(xx:xx:xx:xx:xx:xx, OPEN_SYSTEM)
Tue Jul 25 10:07:56 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DELETEKEYS.request(xx:xx:xx:xx:xx:xx)
Tue Jul 25 10:07:56 2023 daemon.info hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
Tue Jul 25 10:07:56 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: association OK (aid 1)
Tue Jul 25 10:07:56 2023 daemon.info hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 1)
Tue Jul 25 10:07:56 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-ASSOCIATE.indication(xx:xx:xx:xx:xx:xx)
Tue Jul 25 10:07:56 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx MLME: MLME-DELETEKEYS.request(xx:xx:xx:xx:xx:xx)
Tue Jul 25 10:07:56 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: binding station to interface 'phy1-ap0'
Tue Jul 25 10:07:56 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: event 1 notification
Tue Jul 25 10:07:56 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: start authentication
Tue Jul 25 10:07:56 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx IEEE 802.1X: unauthorizing port
Tue Jul 25 10:07:56 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: sending 1/4 msg of 4-Way Handshake
Tue Jul 25 10:07:57 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: EAPOL-Key timeout
Tue Jul 25 10:07:57 2023 daemon.debug hostapd: phy1-ap0: STA xx:xx:xx:xx:xx:xx WPA: sending 1/4 msg of 4-Way Handshake
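
The stuck-handshake pattern in the log above can be spotted mechanically. The following is a small sketch, assuming hostapd debug output in the format shown in this thread arrives on stdin. The function name `handshake_summary` is made up, and it matches only the three message strings that appear in the log above.

```shell
#!/bin/sh
# Hypothetical helper: count 4-way handshake events in hostapd debug output.
# Reads log lines on stdin and prints a single summary line.
handshake_summary() {
    awk '
        /WPA: sending 1\/4 msg of 4-Way Handshake/ { msg1++ }
        /WPA: EAPOL-Key timeout/                   { timeouts++ }
        /WPA: PTKSTART: Retry limit/               { giveups++ }
        END {
            # Unset awk counters print as 0 with %d.
            printf "msg1_sent=%d timeouts=%d retry_limit_hits=%d\n", msg1, timeouts, giveups
        }
    '
}
```

On the device this could be fed with `logread | grep hostapd | handshake_summary`. Many message 1/4 transmissions, each followed by a timeout, together with retry-limit hits, match the behaviour reported here.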

@ynezz
Member

ynezz commented Jul 25, 2023

@ynezz I can confirm this bug can be triggered by simply toggling the radio on/off (maybe a few times). Looks like only the 1st message in the handshake is sent before it times out.

Does git revert 8d7d9aa4a46366 fix this issue?

@soxrok2212
Author

Looks like that commit is the culprit. I can't seem to trigger the bug any more.

@ynezz
Member

ynezz commented Jul 25, 2023

Looks like that commit is the culprit. I can't seem to trigger the bug any more.

@PolynomialDivision @dhewg FYI this is about 8d7d9aa (backport of cd804c1)

@PolynomialDivision
Member

PolynomialDivision commented Jul 26, 2023

I just tested on latest master with latest hostapd, and I can not reproduce it:
https://github.com/PolynomialDivision/openwrt/tree/hostapd-2023-07-21

config wifi-device 'radio0'
	option type 'mac80211'
	option path '1e140000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
	option channel '1'
	option band '2g'
	option htmode 'HT20'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'OpenWrtTest'
	option encryption 'sae-mixed'
	option key 'test1234567'

config wifi-device 'radio1'
	option type 'mac80211'
	option path '1e140000.pcie/pci0000:00/0000:00:01.0/0000:02:00.0'
	option channel '36'
	option band '5g'
	option htmode 'VHT80'

config wifi-iface 'default_radio1'
	option device 'radio1'
	option network 'lan'
	option mode 'ap'
	option ssid 'OpenWrtTest'
	option encryption 'sae-mixed'
	option key 'test1234567'

Let's see if I can reproduce it on 23.05. Update: I have no issues.

@stintel
Member

stintel commented Jul 26, 2023

I just tested on latest master with latest hostapd, and I can not reproduce it:
https://github.com/PolynomialDivision/openwrt/tree/hostapd-2023-07-21

Did a quick test and my phone doesn't seem to be disconnecting with this. Can you open a PR?

@PussAzuki

Ok, then we're probably looking at the wrong place (hostapd) and should focus on that mt76 updates? Which update broke it, 01885bc ?

Nope, it does not seem to be mt76, as I have just tested that ...

@stintel
Member

stintel commented Jul 26, 2023

I just tested on latest master with latest hostapd, and I can not reproduce it:
https://github.com/PolynomialDivision/openwrt/tree/hostapd-2023-07-21

Did a quick test and my phone doesn't seem to be disconnecting with this. Can you open a PR?

Scratch that. Started happening again. I propose we revert the bump that causes the problem until someone can fix it properly?

@dhewg
Contributor

dhewg commented Jul 26, 2023 via email

@gca31

gca31 commented Jul 27, 2023

@ynezz I can confirm this bug can be triggered by simply toggling the radio on/off (maybe a few times). Looks like only the 1st message in the handshake is sent before it times out.

FYI, on my side I discovered that this issue occurs when STA devices switch from a 2.4 GHz wifi connection to a 5 GHz wifi connection. I can reproduce the issue by rebooting the AP. All STA devices then connect to 2.4 GHz wifi (802.11n). When the 5 GHz wifi becomes available (approx. 1 min later, due to DFS), some devices try to switch to 5 GHz wifi (802.11ac/ax) and boom! The STA devices ask for the wifi password.
My AP config: BPI-R3 (mt7986) with OpenWrt SNAPSHOT r23610-844bb4bfad / LuCI Master git-23.158.78004-23a246e
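
To check whether a failing client really moved between AP interfaces just before the failure, the association lines in the hostapd log can be compared per station. This is a sketch only, assuming the `hostapd: <iface>: STA <mac> IEEE 802.11: associated (aid N)` line format seen earlier in this thread. The function name `iface_moves` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report stations that associate on a different
# AP interface than the one they last associated on.
# Reads hostapd log lines on stdin.
iface_moves() {
    awk '
        / IEEE 802\.11: associated \(aid / {
            for (i = 2; i < NF; i++) {
                if ($i == "STA") {
                    iface = $(i - 1)
                    sub(/:$/, "", iface)      # "phy1-ap0:" -> "phy1-ap0"
                    mac = $(i + 1)
                    if ((mac in last) && last[mac] != iface)
                        print mac " moved " last[mac] " -> " iface
                    last[mac] = iface
                    break
                }
            }
        }
    '
}
```

Running `logread | iface_moves` right after a failed connection attempt shows whether the station had just moved, for example from a `phy0-*` interface to a `phy1-*` one.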

@stintel
Member

stintel commented Jul 27, 2023

I could bisect the hostapd commit required to fix the issue I ran into

Would be interesting to know how you would do that. We can't really use CONFIG_SRC_TREE_OVERRIDE afaik because of all our patches ...

@PolynomialDivision
Member

I guess we could revert it until it is fixed in master? Can someone also report the issue to the hostap mailing list?

@dhewg
Contributor

dhewg commented Jul 28, 2023

Would be interesting to know how you would do that. We can't really use CONFIG_SRC_TREE_OVERRIDE afaik because of all our patches ...

Usually I'd git am the patches/ folder and then go from there, but some hostapd patches are plain diffs, so patch -p1 everything, commit that and rebase on top of a bisect?

openwrt-bot pushed a commit that referenced this issue Aug 19, 2023
Commit e978072baaca ("Do prune_association only after the STA is
authorized") causes issues when an STA roams from one interface to
another interface on the same PHY. The mt7915 driver is not able to
handle this properly. While the commit fixes a DoS, there are other
devices and drivers with the same limitation, so revert to the original
behavior for now, until we have a better solution in place.

Fixes: #13156
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
(cherry picked from commit 3246739)
@gca31

gca31 commented Aug 20, 2023

I merged @stintel's change yesterday to master. Could someone please check if the problem is gone in master, then I will backport it to 23.05.

If mt76 gets fixed we can revert the change to hostapd again.

Tested today with the latest OpenWrt SNAPSHOT r23769-324673914d / LuCI Master git-23.223.85458-f7583b6 on a Banana Pi BPI-R3, and the issue is still there.

@pktpls

pktpls commented Aug 20, 2023

Issue still present on avm_fritzbox-4040 ipq40xx/generic - tested just now with snapshots r23782-ac68fbf526 and 23.05-SNAPSHOT r23389-5deed175a5

I thought this Fritzbox had a sudden wifi-death the other day until I found this GH issue. Was about to take a marker and write "broken" on it :)

Happens both with Open and OWE / Enhanced Open; I'm not even using WPA2/WPA3 here.

> uname -r ; wpa_supplicant -v ; nmcli -v ; echo ; dmesg -w
6.4.10-200.fc38.x86_64
wpa_supplicant v2.10
nmcli tool, version 1.42.8-1.fc38
...
[ 9868.029615] wlp3s0: authenticate with f2:b0:14:e2:b1:d6
[ 9868.029674] wlp3s0: 80 MHz not supported, disabling VHT
[ 9868.046000] wlp3s0: send auth to f2:b0:14:e2:b1:d6 (try 1/3)
[ 9868.081675] wlp3s0: authenticated
[ 9868.083697] wlp3s0: associate with f2:b0:14:e2:b1:d6 (try 1/3)
[ 9868.090671] wlp3s0: RX AssocResp from f2:b0:14:e2:b1:d6 (capab=0x431 status=0 aid=1)
[ 9868.154682] wlp3s0: associated
[ 9870.956272] wlp3s0: Connection to AP f2:b0:14:e2:b1:d6 lost
[ 9880.895468] wlp3s0: authenticate with f2:b0:14:e2:b1:d6
[ 9880.895514] wlp3s0: 80 MHz not supported, disabling VHT
[ 9880.911431] wlp3s0: send auth to f2:b0:14:e2:b1:d6 (try 1/3)
[ 9880.947277] wlp3s0: authenticated
[ 9880.949648] wlp3s0: associate with f2:b0:14:e2:b1:d6 (try 1/3)
[ 9880.958767] wlp3s0: RX AssocResp from f2:b0:14:e2:b1:d6 (capab=0x431 status=0 aid=1)
[ 9881.025889] wlp3s0: associated
[ 9883.551482] wlp3s0: Connection to AP f2:b0:14:e2:b1:d6 lost

@gca31

gca31 commented Aug 21, 2023

@soxrok2212 Can this issue be reopened, as it does not seem to be solved? Or do we need to open a new one?

@csharper2005
Contributor

And one more bug with updated hostapd: deauthenticated due to local deauth request even after 10 seconds of connection. Seen on android devices.

My quick test showed that this was probably fixed in 23.05.0-rc3.

@soxrok2212
Author

I am running the latest SNAPSHOT (not 23.05 snapshot) with the hostapd revert, issue seems to have come back.

@soxrok2212
Author

I unfortunately can not re-open. Ping @hauke

@stintel
Member

stintel commented Aug 21, 2023

Are we sure we're not confusing two different issues here?

@soxrok2212
Author

I believe it's the same. I checked the logs; it's looping over message 1 of the 4-way handshake, just like it was originally. Rebooting the AP (or restarting just wifi a few times) does the trick. A few others have reported it's still not working as well, and one is on ipq4xxx, so it's not localized to MT76.

@professor-jonny

Since I'm the only one who reported this issue on the IPQ4xxx platform, is there any debugging or the like I could do that may help?
I built from the latest snapshot and I still seem to have the issue.

@neheb
Contributor

neheb commented Aug 22, 2023

Weird. I was having this issue so I applied 3246739 and flashed my own build to fix. Works great now. I'm guessing someone else broke it.

@stintel
Member

stintel commented Aug 22, 2023

So maybe I experienced a different issue. It's definitely fixed for me. I'll reopen this one and someone still experiencing it shall have to bisect.

@stintel stintel reopened this Aug 22, 2023
@soxrok2212
Author

Last night I disabled my 5 GHz radio to ensure devices don’t flip between bands. Haven’t had any problems since then, so yeah, it seems like this still exists.

Vladdrako pushed a commit to Vladdrako/openwrt that referenced this issue Aug 23, 2023
Commit e978072baaca ("Do prune_association only after the STA is
authorized") causes issues when an STA roams from one interface to
another interface on the same PHY. The mt7915 driver is not able to
handle this properly. While the commit fixes a DoS, there are other
devices and drivers with the same limitation, so revert to the original
behavior for now, until we have a better solution in place.

Fixes: openwrt#13156
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
@Entropy512

And one more bug with updated hostapd: deauthenticated due to local deauth request even after 10 seconds of connection. Seen on Android devices.

Did you file a separate bug report on this?

That sounds like an issue unrelated to the one being discussed here (it merits its own bug report), but the behavior of "passes traffic for 10 seconds then times out with authentication failure" (when it clearly did auth properly, since it passed traffic) sounds like #12634 is affecting AP mode in addition to client mode.

If you can confirm you're seeing behavior in AP mode similar to my client mode issues, maybe drop a note on my issue and I'll amend the title/description.

In general, I'm questioning why hostapd has been updated to pull in random git commits that have yet to be released upstream. All of the multilink stuff seems to have introduced a variety of regressions in link state handling on non-MLO systems. On a Raspberry Pi, the hostapd updates somehow cause the wifi driver to stop properly reporting link state up from the kernel, and I haven't yet figured out HOW hostapd is causing that. Maintaining a series of reverts is getting harder and harder as more patches are dropped on top of the broken update in 304423a. As others have mentioned, OpenWrt's pile of patches on top of hostapd makes it REALLY hard to bisect upstream to find the exact commit that caused a regression.

@pktpls

pktpls commented Aug 25, 2023

Issue still present on avm_fritzbox-4040 ipq40xx/generic - tested just now with snapshots r23782-ac68fbf526 and 23.05-SNAPSHOT r23389-5deed175a5

I thought this Fritzbox had a sudden wifi-death the other day until I found this GH issue. Was about to take a marker and write "broken" on it :)

Happens both with Open and OWE / Enhanced Open; I'm not even using WPA2/WPA3 here.

My apologies - this Fritzbox indeed has its wifi hardware broken, apparently.

Works fine with a different device of the same model.

@soxrok2212
Author

Issue still present on avm_fritzbox-4040 ipq40xx/generic - tested just now with snapshots r23782-ac68fbf526 and 23.05-SNAPSHOT r23389-5deed175a5
I thought this Fritzbox had a sudden wifi-death the other day until I found this GH issue. Was about to take a marker and write "broken" on it :)
Happens both with Open and OWE / Enhanced Open; I'm not even using WPA2/WPA3 here.

My apologies - this Fritzbox indeed has its wifi hardware broken, apparently.

Works fine with a different device of the same model.

So this means it may again be specific to MT76.

@gca31

gca31 commented Aug 27, 2023

I merged @stintel's change to master yesterday. Could someone please check whether the problem is gone in master? Then I will backport it to 23.05.
If mt76 gets fixed, we can revert the change to hostapd again.

Tested today with the latest OpenWrt SNAPSHOT r23769-324673914d / LuCI Master git-23.223.85458-f7583b6 on a Banana Pi BPI-R3, and the issue is still there.

Yesterday I downloaded the latest snapshot and the issue was still there.
Today I built my own firmware on a fresh VirtualBox/Ubuntu development platform, which let me confirm that the patch from @stintel, committed by @hauke, has been applied, and the issue seems to have vanished.
Weird! Can anyone confirm that? Any ideas?

@domodwyer

This used to affect my setup, but after bumping to a build from HEAD ~a week ago I've had no problems since.

@professor-jonny

Recent patches fixed my problem as well.

cotequeiroz pushed a commit to cotequeiroz/hostap that referenced this issue Feb 10, 2024
Commit e978072 ("Do prune_association only after the STA is
authorized") causes issues when an STA roams from one interface to
another interface on the same PHY. The mt7915 driver is not able to
handle this properly. While the commit fixes a DoS, there are other
devices and drivers with the same limitation, so revert to the original
behavior for now, until we have a better solution in place.

Ref: openwrt/openwrt#13156
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>