FS#3947 - Wifi STA that loses AP signal takes down the whole router, sometimes rebooting it #8943

Open
openwrt-bot opened this issue Jul 24, 2021 · 4 comments
Labels
flyspray kernel pull request/issue with Linux kernel related changes release/21.02 pull request/issue targeted (also) for OpenWrt 21.02 release

openwrt-bot commented Jul 24, 2021

slick_diligence:

Initially reported on forum: https://forum.openwrt.org/t/wifi-client-disconnecting-takes-the-whole-wifi-ap-down-on-21-02-snapshot-how-to-debug/102094/8

Device: Asus RT-N56U
Branch: openwrt-21.02, initial report commit 60fad8f (v21.02.0-rc3-74-g60fad8f82b)

Observation: when a wifi client that appears to see the AP with low signal strength disconnects, the whole AP goes down. Verified with airmon/tcpdump that AP beacons stop. A reboot of the router was also observed when log_level was set to 2 or lower.

Bisecting, commit https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=a078037ace50 (v21.02.0-rc3-35-ga078037ace) seems to be the issue:

mac80211: improve rate control performance
Call rate control handler after intermediate queueing
Includes follow-up fixes

Test case:

  1. Connect to AP
  2. Move far away or shield mobile so the AP signal drops significantly as seen by mobile
  3. View the wifi networks on the mobile (Android). When the signal drops below some threshold, one of two things happens:
    3.1 AP disappears from list: FAIL [the router also reboots]
    3.2 AP moves from "Connected" to "Saved": PASS

I did not have these issues at all in May 2021 using the dev snapshot. I updated to the July 2021 dev snapshot, observed the issue, then built openwrt-21.02 and still observed the issue.

openwrt-bot commented Jul 24, 2021

slick_diligence:

More information in case it matters:

  • wifi-device: 802.11n HT40 5GHz channel, txpower is limited
  • wifi-iface: I mucked with these options, but none of them made a difference in seeing the issue: ieee80211w, isolate, wpa_disable_eapol_key_retries

My 23 July 2021 build of openwrt-21.02 with "git revert ccbe535; git revert a07803" has not encountered the issue in 1 day, compared to seeing the issue 5+ times per day before the reverts.

openwrt-bot commented Nov 23, 2021

dbpalan:

Exactly the same symptom on openwrt-21.02.0 as well as openwrt-21.02.1:

(1) One wifi client moves far away (low signal) from the AP
(2) ALL wifi clients connected to that AP disconnect and cannot find that AP (another AP on the same router has no problem, i.e. a disconnect from the 2.4GHz AP will not affect the 5GHz AP)
(3) After around 1 minute, the disconnected AP reappears and clients are able to connect again

The same symptom occurred on two different routers.

Image used: https://downloads.openwrt.org/releases/21.02.1/targets/ramips/mt7621/lenovo_newifi-d1-squashfs-sysupgrade.bin

openwrt-bot commented Nov 24, 2021

slick_diligence:

It sounds like there may be some progress from:

https://lkml.org/lkml/2021/11/18/539

The reporter on the kernel list identified the exact same commit that I did: "mac80211: call ieee80211_tx_h_rate_ctrl() when dequeue".

There appears to be a patch available from Felix Fietkau:

https://lkml.org/lkml/2021/11/21/252

---
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1822,15 +1822,15 @@ static int invoke_tx_handlers_late(struct ieee80211_tx_data *tx)
 	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx->skb);
 	ieee80211_tx_result res = TX_CONTINUE;

+	if (!ieee80211_hw_check(&tx->local->hw, HAS_RATE_CONTROL))
+		CALL_TXH(ieee80211_tx_h_rate_ctrl);
+
 	if (unlikely(info->flags & IEEE80211_TX_INTFL_RETRANSMISSION)) {
 		__skb_queue_tail(&tx->skbs, tx->skb);
 		tx->skb = NULL;
 		goto txh_done;
 	}

-	if (!ieee80211_hw_check(&tx->local->hw, HAS_RATE_CONTROL))
-		CALL_TXH(ieee80211_tx_h_rate_ctrl);
-
 	CALL_TXH(ieee80211_tx_h_michael_mic_add);
 	CALL_TXH(ieee80211_tx_h_sequence);
 	CALL_TXH(ieee80211_tx_h_fragment);

I will give this a try and see if it improves.
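
To illustrate why the position of the rate control call might matter, here is a toy model in plain C. It is not kernel code: every name in it (toy_frame, toy_rate_ctrl, and so on) is hypothetical, and it assumes, as I understand the upstream discussion, that a frame requeued for retransmission no longer carries valid rate information. Under that assumption, the old ordering lets a retransmitted frame leave the handler without a valid rate, while the patched ordering always selects one first.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RATE_INVALID (-1)

/* Hypothetical, heavily simplified stand-in for a queued frame. */
struct toy_frame {
    bool retransmission; /* models IEEE80211_TX_INTFL_RETRANSMISSION */
    int rate_idx;        /* models the rate picked by rate control */
};

/* Assumption: requeueing for retransmission wipes the rate info. */
void toy_mark_retransmission(struct toy_frame *f)
{
    f->retransmission = true;
    f->rate_idx = TOY_RATE_INVALID;
}

/* Models rate control: selects some valid rate index. */
void toy_rate_ctrl(struct toy_frame *f)
{
    f->rate_idx = 0;
}

/* Old ordering: retransmissions exit before rate control runs. */
int toy_late_handlers_old(struct toy_frame *f)
{
    if (f->retransmission)
        return f->rate_idx; /* early exit, like "goto txh_done" */
    toy_rate_ctrl(f);
    return f->rate_idx;
}

/* Patched ordering: rate control runs before the early exit. */
int toy_late_handlers_new(struct toy_frame *f)
{
    toy_rate_ctrl(f);
    if (f->retransmission)
        return f->rate_idx; /* early exit, now with a valid rate */
    return f->rate_idx;
}

/* Small self-check comparing both orderings on a retransmitted frame. */
void toy_selfcheck(void)
{
    struct toy_frame a = { false, 0 };
    struct toy_frame b = { false, 0 };

    toy_mark_retransmission(&a);
    toy_mark_retransmission(&b);

    assert(toy_late_handlers_old(&a) == TOY_RATE_INVALID);
    assert(toy_late_handlers_new(&b) != TOY_RATE_INVALID);
}
```

Calling toy_selfcheck() from a main() exercises both orderings. In the real driver path, the invalid rate would be handed to hardware-specific code, which is presumably where the AP hang or reboot comes from.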

openwrt-bot commented Nov 24, 2021

slick_diligence:

The above patch as used in OpenWrt at commit https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=d1ea575baa1b53bb477a020974afcec1b1193edc fixes the issue.

I had a 100% success rate crashing the AP with my test case on the commit prior, and a 0% success rate crashing the AP with my test case on the above commit.

The commit indicates:

"This showed up primarily on rt2x00"

But based on my report, @dbpalan's, and the LKML report, it occurred with:

  • rt2x00usb (Raspberry Pi, not OpenWrt)
  • ramips/mt7621 (lenovo newifi-d1)
  • ramips/rt3883 (asus rt-n56u)

@aparcar aparcar added release/21.02 pull request/issue targeted (also) for OpenWrt 21.02 release kernel pull request/issue with Linux kernel related changes labels Feb 22, 2022