WI-FI is unstable at 2.4 GHz #793
Same issue on GL MT300N v2 with mt7628. |
Known issue, caused by a faulty Ethernet driver. |
Please try the latest OpenWrt master or the 23.05 branch. |
@nbd168 Lines 91 to 99 in c19b62f
My debug patch:
Kernel log:
|
This function looks suspicious. Line 1569 in c19b62f
The vendor driver checks it 10 times before resetting, and resets the PSE counter if the 0x4244 register meets certain conditions. MT7603 vendor driver: in mt7603_wifi\common\cmm_data_pci.c
MT7628 vendor driver: in mt7628_wifi\hw_ctrl\cmm_chip_mt.c
|
mt76 will also check this multiple times. The function is called via a wrapper that keeps the counter. |
Sadly it doesn't work. Maybe we need additional check for
|
Could you please apply this patch for printing register debug values and show me the output around reset? |
Sure, this is the log:
|
I finally understand the bug now, and this patch should fix it: https://nbd.name/p/4ece22b2 The way it works is this: in the vendor code, the function on its own does not detect an rx hang; it only detects whether rx is busy, which can also happen during normal rx activity. The missing part was that the vendor driver resets the counter on rx irqs (which indicate real activity), so that it only issues a reset if rx has really hung. In my patch, I adjusted the mt76 code accordingly. |
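The counter logic described above can be modelled in a few lines. This is only a toy sketch in Python, not the mt76 or vendor code: the class name, method names and the default threshold (taken from the "10 times" mentioned earlier in the thread) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed threshold: the thread mentions the vendor driver checking
# 10 times before it resets. The real drivers may use other values.
BUSY_CHECKS_BEFORE_RESET = 10


@dataclass
class RxHangDetector:
    """Toy model of a watchdog that separates 'busy' from 'hung'."""

    threshold: int = BUSY_CHECKS_BEFORE_RESET
    busy_count: int = 0

    def on_rx_irq(self) -> None:
        # An RX interrupt means frames are really arriving, so earlier
        # 'busy' readings were normal activity rather than a hang.
        self.busy_count = 0

    def on_watchdog_tick(self, rx_busy: bool) -> bool:
        """Return True when the model would request a reset."""
        if not rx_busy:
            self.busy_count = 0
            return False
        self.busy_count += 1
        if self.busy_count >= self.threshold:
            # Busy for 'threshold' consecutive checks with no RX
            # interrupt in between: treat it as a real hang.
            self.busy_count = 0
            return True
        return False
```

Without the `on_rx_irq` step, a link that is merely busy for `threshold` consecutive checks would be reset even though traffic is flowing, which matches the false resets discussed in this thread.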
The watchdog will still reset the chip. It takes approximately one minute to recover from the hang state. It only took about one second before.
|
I tried to skip the RX PSE reset with your patch https://nbd.name/p/4ece22b2, but that causes the client to disconnect after the reset request.
Therefore, I believe the reset is the expected behavior. Something else must be causing RX_PSE_BUSY. Perhaps in some functions the MT7628 requires a different operation than the MT7603. |
I found a few more 7628 specific things, here's a new combined patch: https://nbd.name/p/883e48cf |
Thanks for your hard work. It seems that we still need some fixes. With the patch https://nbd.name/p/883e48cf, when I start downloading something, the WiFi signal/SSID will disappear about 30 seconds after watchdog reset.
|
Sorry, had a copy&paste bug in there. Fixed version: https://nbd.name/p/40608ec5 |
Still no luck. The SSID disappears after reset.
@Linaro1985 suspects that there are some DMA issues with the Ethernet driver of mt7628. openwrt/openwrt#10074 (comment) |
I found mt7628 vendor driver has some additional DMA related code in
|
I did more testing and rework to make the reset and the beacon stuck check more reliable. Please try this patch: https://nbd.name/p/54045fb4 |
It is still broken. I used
|
When it triggers, does it break the connection? |
Yes, the SSID disappears for several seconds. |
Does bumping MT7603_RX_RING_SIZE in mt7603.h to 256 help? |
Edit: |
Update: |
Hope this issue will have a fix soon. 5G wifi is the only stable connection as of now. |
@nbd168 Hi! Finally, I know what caused the PSE reset. It's WiFi fragmentation threshold. Setting any frag value instead of using the default one (off?) can avoid PSE reset. I'm not sure if this is a problem with mt76 or some OpenWrt core packages.
|
@DragonBluep, wow, nice find. I suspect that the main effect of that knob is that it likely disables A-MSDU tx. Please try removing this line from mac80211.c in mt76: If that makes it work reliably as well, please put the line back in and try gradually reducing the value of Thanks. |
@nbd168 Good news, removing These tests are based on patch #793 (comment)
Summaries:
|
Please try this patch on top of current mt76: https://nbd.name/p/762e9946 |
@shown19 MT7612 hardware restart issue is caused by USB3.0 port. Please focus on this issue as it goes beyond the topic here. #457 (comment) |
I actually commented in that post dated July 21 and don't even have hdd inserted but still got that issue. Anyway, this is unrelated issue to this topic here. I will try to figure this out. Thank you. |
Hi, has anyone tried long-term tests on the MT7628 with the most recent version of the driver? |
@nachalni no, we thought that you would do them :) Joking - fix is too recent to have long-term test |
We did some tests with multiple clients running heavy http traffic for 24hrs to/from wan. Reset file output only had this one exception: |
@shown19 just for the record, I have a similar board:
but remember having "Hardware restart was requested" in logs just once or twice since 19.07. I suspect in your case there may be HW interference or a short circuit, etc. |
That might be the case. Also, whenever I lower the txpower from 20 dBm to 12 dBm, it does not log "Hardware restart was requested", so my guess is that the 5 GHz chipset can no longer handle higher power. I had not seen this for months; it suddenly started happening frequently. |
It was reported that this can cause the PSE hang issues, even with a low number of fragments. Link: #793 (comment) Signed-off-by: Felix Fietkau <nbd@nbd.name> (cherry picked from commit b14c235)
@DragonBluep @nbd168 I am still testing mt7628 devices under heavy conditions with many connected clients (about 20) on an OpenWrt 22.03 snapshot. Sometimes Wi-Fi stops working and the "Beacon stuck" counter in /sys/kernel/debug/ieee80211/phy0/mt76/reset keeps incrementing until I run wifi down && wifi up. After that, Wi-Fi works again. Update: after a while, Wi-Fi recovers by itself. |
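To see which counter moved between two readings of the reset file, something like the following could be used. It is a hedged sketch: it assumes the file prints one `name: count` pair per line, which should be checked against real output, and the function names and sample text are made up for illustration.

```python
from typing import Dict


def parse_reset_counters(text: str) -> Dict[str, int]:
    """Parse 'name: count' lines; lines in any other shape are skipped."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue  # no colon on this line
        try:
            counters[name.strip()] = int(value.strip())
        except ValueError:
            continue  # the value is not an integer
    return counters


def increased(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return only the counters that grew, mapped to their increase."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }


# Illustrative input only, not a capture from a real device.
earlier = parse_reset_counters("Beacon stuck: 2\nMCU hang: 0\n")
later = parse_reset_counters("Beacon stuck: 3\nMCU hang: 0\n")
print(increased(earlier, later))
```

On a device, the text would come from reading the debugfs path quoted above at two different times, for example before and after the SSID disappears.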
Yes. It contains. |
@Linaro1985 Have you tested the main branch? Did the "Beacon stuck" counter increase before or after the SSID disappeared? In the previous PSE reset issue, about one minute after the SSID disappeared, the PSE reset counter increased by 1, and WiFi finally returned to a normal state. It seems that the AP enters an uncontrolled state under some conditions and the watchdog doesn't catch the MCU hang. |
@Linaro1985 add these 3 lines to the wifi-device section of /etc/config/wireless
Stable Wi-Fi using 23.05-SNAPSHOT r23400 & 22.03-SNAPSHOT r20213 :) |
@Dahhyunnee thanks! I will try it. Have you tried without these parameters with the latest changes on OpenWrt 22.03/23.05/main?
@DragonBluep Only on devices that are near me, and there are no problems there. But I have multiple access points in a remote office, with many connected clients. After updating from version 19.07 to the latest 22.03 snapshot, I have this problem. Because this is a working office, I cannot install a version higher than 22.03 to check the 23.05/main branches.
Good question, but I can't reproduce that moment myself. It may happen after some time. The WiFi SSID disappears and no client exchanges data with the AP. As a temporary workaround I wrote a small script (launched by cron every minute):
|
@DragonBluep before "Beacon stuck" occurs in the log there are the following suspicious lines
/etc/config/wireless
Maybe this will help in finding the cause. |
@Linaro1985 It seems that the SSID has already returned to normal state before crontab runs your script. These |
I don't know why there are so many authenticated messages from one client. The client's signal level is normal. And without the script, WiFi no longer works normally. I'll try to update the device to a 23.05 snapshot, but it will take time to make my own build with the configs. |
@Linaro1985 Not sure whether this new firmware will help, but it's worth trying. I cannot reproduce your problem in daily use; perhaps it only appears in complex scenarios. |
Just curious: does this option require a specific hostapd?
|
Yes, it does not work anymore. I removed that option and this is my final configuration. config wifi-device 'radio0' |
It was already disabled by default.
|
2G is now stable but 5G is dropping under heavy load. [299289.533871] mt76x2e 0000:01:00.0: Firmware Version: 0.0.00 |
Please try to apply this patch |
Hello, I have a Xiaomi Router 4A (R4AC) running OpenWrt SNAPSHOT r23454-01885bc6a3 / LuCI Master git-23.158.78004-23a246e.
From time to time, when the 2.4 GHz Wi-Fi is under load, the network disappears and then comes back after a couple of seconds. The log contains nothing other than the devices disconnecting from and reconnecting to Wi-Fi. I also caught a driver crash once, but I don't think it is related to the problem (it never showed up again).
Disabling WMM mode helps, but the network speed drops below 20 Mbps.
This problem does not appear on the 5 GHz network.
Below is the driver crash log. Keep in mind that it is not reproducible and appeared only once.
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.656151] ------------[ cut here ]------------
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.660885] WARNING: CPU: 0 PID: 511 at target-mipsel_24kc_musl/linux-ramips_mt76x8/mt76-2023-05-13-969b7b5e/mt7603/mac.c:208 mt7603_filter_tx+0x178/0x180 [mt7603e]
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.675870] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt76x2e mt76x2_common mt76x02_lib mt7603e mt76 mac80211 lzo cfg80211 slhc nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 lzo_rle lzo_decompress lzo_compress libcrc32c crc_ccitt compat sha512_generic sha256_generic libsha256 seqiv jitterentropy_rng drbg hmac cmac crypto_acompress leds_gpio gpio_button_hotplug crc32c_generic
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.744421] CPU: 0 PID: 511 Comm: napi/phy0-3 Not tainted 5.15.118 #0
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.750972] Stack : 00000000 00000000 81a39c7c 808e0000 80720000 8066c410 80e33d00 8071de83
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.759499] 808e33b4 000001ff 00000000 80061ae4 80665a7c 00000001 81a39c38 1a20d335
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.768015] 00000000 00000000 8066c410 81a39ad0 ffffefff 00000000 00000000 ffffffea
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.776537] 00000000 81a39adc 000000d7 807242f8 808e0000 00000009 00000000 81a04688
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.785059] 00000009 00000000 00003a98 80000000 00000018 80340db8 00000000 808e0000
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.793577] ...
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.796060] Call Trace:
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.798535] [<8000702c>] show_stack+0x28/0xf0
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.802998] [<800261c0>] __warn+0x9c/0x124
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.807165] [<800262a4>] warn_slowpath_fmt+0x5c/0xac
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.812230] [<81a04688>] mt7603_filter_tx+0x178/0x180 [mt7603e]
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.818272] [<81a04818>] mt7603_wtbl_set_ps+0x12c/0x134 [mt7603e]
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.824492] [<81a01a90>] mt7603_sta_ps+0x38/0x434 [mt7603e]
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.830184] [<81a75984>] mt76_rx_poll_complete+0x520/0x638 [mt76]
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.836417] [<81a72288>] mt76_dma_rx_poll+0x284/0x4fc [mt76]
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.842204] [<803f773c>] __napi_poll+0x70/0x1f8
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.846817] [<803f7a00>] napi_threaded_poll+0x13c/0x188
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.852145] [<8004604c>] kthread+0x140/0x164
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.856505] [<80002478>] ret_from_kernel_thread+0x14/0x1c
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.862005]
Thu Jun 29 18:54:00 2023 kern.warn kernel: [ 210.863519] ---[ end trace 64b883a3276bd278 ]---
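For comparing traces like the one above across reports, the call-trace frames can be pulled out with a small script. This is a minimal sketch in Python, assuming the `[<address>] function+0xoff/0xsize [module]` frame layout seen in this log; the function name and regular expression are illustrative and may need adjusting for other kernel versions.

```python
import re
from typing import List, Optional, Tuple

# Matches frames such as:
#   [<81a04688>] mt7603_filter_tx+0x178/0x180 [mt7603e]
# The trailing "[module]" part is optional (built-in kernel symbols).
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<func>\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def extract_frames(log: str) -> List[Tuple[str, Optional[str]]]:
    """Return (function, module) pairs in the order they appear."""
    frames = []
    for line in log.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("func"), match.group("module")))
    return frames
```

Filtering the result for frames whose module is `mt7603e` or `mt76` gives the driver-side path, which in this log runs from `mt76_dma_rx_poll` to `mt7603_filter_tx`.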