Huge ping value increase after running 5g wifi for some time #152

updatede · 2018-02-17T04:22:38Z

device: xiaomi router 3g
target: ramips/mt7621
system: OpenWrt SNAPSHOT r6150-dc7a1e8
signal strength: -60dbm
I use just only one client to connect router: a ipad air2.
When wlan just start:

After play game for some time, network get stucked, at the same place, close all programs on pad, ping value increase huge, won't decrease untill restart wlan on router, after restart ping value decrease to normal.

iancd · 2018-02-17T10:26:57Z

This should be expected behaviour from iOS. It would not be possible for an iPad to achieve 10 hours of battery life if it behaved like a desktop computer.

When the device is idle the wifi hardware goes into a powersave mode because it isn't critical to get data quickly. When network data is needed the wifi hardware is taken out of power save mode until network traffic is reduced to an idle state again so that power save mode is kicked back in. Source link

Maybe streaming audio from the internet in the background of your game will prevent the iPad powersaving.

updatede · 2018-02-17T12:10:20Z

no, the device is not idle and hardware does not go to powersave mode. because it's never happen when using wndr3800 with this same ipad doing same thing at same place. and this "idle" can't wake up whatever you do to ipad, unless you do restart on mt76 router. so it's mt76 driver not ios.

mastum · 2018-02-17T12:18:17Z

@updatede I can't reproduce it with my iPad, what is for you "for some time" ?
Mine is connected (5 GHz) from 5 days without problems.
Xiaomi 3G OpenWrt SNAPSHOT r6022

updatede · 2018-02-17T12:20:18Z

@Mafesa for some time is a duration of time , not fixed, 3mins,5mins,10mins or just connected.

updatede · 2018-02-17T12:24:41Z

@Mafesa what's your ping value from route to client ?

mastum · 2018-02-17T12:25:56Z

192.168.0.98 is your router IP?
I will test r6150

updatede · 2018-02-17T12:29:53Z

@Mafesa I don't know what's your client ip. you need ping from the router to client.
192.168.0.89 is my ipad's ip.

mastum · 2018-02-17T12:37:19Z

OK I understand, I was doing the opposite from iPad to router with this app.

mastum · 2018-02-17T14:28:34Z

Tested now OpenWrt SNAPSHOT r6152-f4e5880 and ping router -> iPad 5GHz is constant with ~50 ms

updatede · 2018-02-19T04:51:23Z

@nbd168 because lack of debug info, I want to add some debug info in my copy of source code to find the reason why ping value get so high to 1000 or 2000 ms, any advice for places to add ?

barhom · 2018-02-22T23:41:31Z

I can say that I see something similar on both mt7603 and mt7612.
After running a while ping increases a lot.
I still don't know the cause.

updatede · 2018-02-25T02:24:13Z

Still unfixed on latest snapshot r6302-0f54d96, 7% packet loss on this test

Mushoz · 2018-03-06T08:27:56Z

Seeing similar behavior on my Dir-860L. 1000-2000ms pings, high packet loss and very slow speeds. Only rebooting the router solves the issue for me. Lede 17.01.4 is completely fine.

nbd168 · 2018-03-08T20:49:50Z

Please try to build the latest version from my staging tree at https://git.openwrt.org/?p=openwrt/staging/nbd.git;a=summary

updatede · 2018-03-08T23:05:41Z

Because I rely on snapshot sdk to build, and this tree switch to kernel 4.14 before mac80211 patch, I don't know how to build it, I will test as soon as it commit to master tree. Thank very much @nbd168

moonwatcher · 2018-03-09T05:34:25Z

nbd, I have been experimenting with mir3g and the new 4.14 kernel. i have been seeing similar behavior. Would love to test patches. I am currently running 6c194078db6958e349114c8605304a7b87edae3e on 4.09, but I can build 4.14 with patches tomorrow if you want me to try stuff (I have 6 of those units...).

How do I build your staging, is that the entire build system or a package repo?

updatede · 2018-03-09T05:48:41Z

@nbd168 I have modified function ieee80211_amsdu_aggregate according to your patch, and compile mac80211.ko on my old snapshot sdk with 4.9.77 kernel. I replace mac80211.ko on my router which run a 4.9.86 kernel snapshot, after restart, it always popup error of wifi password not correct. I'll try with same snapshot version system and sdk later.

Mushoz · 2018-03-09T10:08:23Z

@nbd168 I am leaving for my holiday tomorrow. I will be back the 19th of March. If you still need testing to be done I will help after that. I'm sorry that I can't help at the moment.

updatede · 2018-03-09T11:57:02Z

@nbd168, I manually patched tx.c in mac80211 directory of backports-2017-11-01, and rebuild mac80211.ko file with latest snapshot sdk. After testing, the same problem still exist. But it take about 20mins to trigger, it usually take 3-5mins to trigger before this patch. There are other similar problems, move to weak signal place, client will stuck in high ping value and packet loss, move back to strong signal place won't get out of stuck. Turn off wifi or screen on client, wait about 5mins, turn on wifi or screen again, client will stuck in high ping value and packet loss either. The common problem of all above is, once high ping and packet loss appears, you can't let it disappear by any means unless restart wifi on router.

nbd168 · 2018-03-09T14:15:22Z

Any chance you could give me SSH access to your device while it's in the faulty state? Maybe I can find out a bit more about this bug that way.

updatede · 2018-03-09T14:32:39Z

@nbd168 of course, but I need private way to tell you how. is there private message at github ? it is 22:28 at my time, if you can do it tonight, I’ll tell you privately as soon as stuck appear.

nbd168 · 2018-03-09T14:37:22Z

You could send me an email: nbd@nbd.name

updatede · 2018-03-09T14:39:15Z

ok , wait my email

updatede · 2018-03-09T15:24:23Z

have you receive my email ? I have not receive any response.

nbd168 · 2018-03-09T15:24:53Z

I'm logged in and running some tests already. Thanks!

Mushoz · 2018-03-19T10:19:10Z

I am back from my skiing holiday. Do you still need testers, @nbd168 ?

updatede · 2018-04-09T17:41:17Z

@nbd168 The latest r6653-c0bbb97 still has this bug unfixed after issue opened for about two months.
After analysing captured 80211 packets, I found that while my IPAD in normal state, AP often received sleep frame from my IPAD, after AP send TIM+ (TIM indicating buffered frame for IPAD) in beacon, AP received wakeup frame from my IPAD immediately.
But when my IPAD enter wifi doze state(turn off screen for about 10 mins), then I turn on screen and ping my IPAD, AP can't receive my IPAD's wakeup frame right after it sent TIM+ , after sending many many TIM+, AP received wakeup frame from IPAD. No matter what you do (even reboot IPAD) , this state will not change unless restart wifi on AP. When repeat this doze then turn on process to my wndr3800 with openwrt 15.05.1, it can wake up correctly just like normal state.
I don't know why. Maybe wifi doze STA let AP enter a false state and possibly:

make AP can't transmit TIM+ to STA.( I can just see TIM+ in monitor interface by tcpdump, maybe these frames not really sent to air or something).
make AP can't receive wakeup frame from STA.
AP notify STA stay in wifi doze state forever (weird，because I can't get any ping response in doze state, but I can get ping response when turn on screen from doze state, just massive latency, don't know if there is mechanism in 80211).

Mushoz · 2018-04-09T17:47:17Z

Thank you very much for digging into this! @nbd if you need testers, please let me know.

Mushoz · 2018-04-09T17:51:26Z

I meant to tag @nbd168 :) sorry about that

nbd168 · 2018-04-15T09:15:49Z

Please try the latest version

updatede · 2018-04-15T10:36:24Z

@nbd168
It improves since commit 7a31731. Turn off screen for 10 mins seems have no latency any more. But for 15~20 mins, the same problem is back.
I checked ath9k code , when sta is about sleeping, ath9k call ieee80211_sta_set_buffered with "true", and it has a ath_tx_aggr_sleep function that call ieee80211_sta_set_buffered either, after releasing buffer, it call ieee80211_sta_set_buffered with "false". But in mt76 code , there is only place call ieee80211_sta_set_buffered with "false", no "true" call by checking sleeping, is it the reason of issue ?

updatede · 2018-04-15T11:54:33Z

"If the driver buffers frames in the driver for aggregation in any way, it must use the ieee80211_sta_set_buffered() call when it is notified of the station going to sleep to inform mac80211 of any TIDs that have frames buffered." from Linux 802.11 Driver Developer’s Guide.
Maybe need some aggregation-sleeping logic.

updatede · 2018-04-16T05:32:21Z

@nbd168
I add some printk() in mt76_txq_schedule_list(), the doze time need to trigger bug increase again.
So, maybe it's newly added code increase the execution time of mt76_txq_schedule_list(), and the trigger of the bug need to call mt76_txq_schedule_list() a number of times.

updatede · 2018-04-17T09:50:25Z

@nbd168
I reverted mt76 driver to first commit 9e7049b "Initial import" and the same problem still exist.
I made some modification to fit latest mac80211/cfg80211 because I couldn't get old mac80211 source code.
After that I wonder if mir3g's hardware is not compatible with ios or ipad air2 hardware, so I restore mir3g to stock firmware, the issue disppeared.
So I think bug is in initial driver code, or mac80211 code before/after "Initial import".

updatede · 2018-04-18T13:36:54Z

@nbd168
I disable 80211n/ac by setting mode to legacy to see if issue still exist.
From captures I can see ping frame from AP to STA now because of no aggregation.
When bug triggered I saw once a ping frame sent to STA before STA sent wakup null frame to AP.
If it's not a monitor interface issue, I think maybe it's the reason of this bug.
I've sent packet capture to your email address.

updatede · 2018-04-21T06:00:20Z

@nbd168
I have some new findings about this bug.
I find new ways to reproduce the symptom:

Switch off wifi on iPad air2 , restart wifi on AP, don't connect any STA to AP, let AP run wifi for 15~20 mins, then connect iPad air2 to AP, the symptom appears.
Restart wifi on AP, connect iPad air2 to AP, keep pinging from AP to iPad air2 for 15-20 mins, the symptom appears.
So, even we can see later wakeup null frame, but this bug is not about sleep, not about sleep time, it's only about running time of AP after restarting wifi.
Old version ipad that only support 80211N seems not affected , not very sure.
Android client that support 80211AC also affected, ping echo time after triggered is 500-600ms, it's less than iPad's but significantly larger than normal.

updatede · 2018-05-06T03:46:52Z

@nbd168
I don't know if you test after my last comment. But I found that changing "beacon_int" can change bug trigger time. For example, change "beacon_int" to 50 will trigger this bug 10 mins after restarting wifi, 100 is 20 mins, 200 is 40 mins, and so on. So I think this bug is in beacon or packet sending related code.

updatede · 2018-05-16T07:49:36Z

this project does not want to fix this bug and i switch to stock driver to abandon this driver.

nbd168 · 2018-05-16T07:57:06Z

I do want to fix this bug, I just don't have much time on my hands at the moment.

updatede · 2018-05-18T06:34:04Z

no, thanks

jsantala · 2018-05-18T06:35:19Z

Just keep the ticket open, please. There's other people who want to see this fixed. Thanks!

Mushoz · 2018-05-18T08:09:14Z

Agreed. Please leave this ticket open. There is valuable information in here and there are more people who would like to see this fixed.

jedi7 · 2018-05-18T08:18:27Z

And want help. Because @nbd168 do great work. (It is still opensource)

nbd168 · 2018-05-18T15:15:32Z

I think I may have found a possible cause for this. Apparently the beacon timer drifts by one microsecond every time it fires. The minimum configurable interval is 64us, so it has to be corrected every 64 beacons. This would also explain why changing the beacon interval affects the time it takes for this issue to manifest.
When the client is in powersave mode, it regularly wakes up before the beacon transmission time and waits for a short period of time before going back to sleep again. After some time, the drift becomes so big that the client goes back to sleep again before the AP sends its beacon. The client then only notices after beacon loss detection kicks in and forces the client out of powersave.
I will let you know when I've implemented a fix for this and verified the timing.
In the mean time, is there anybody here except for @updatede that can reliably reproduce the issue?

Mushoz · 2018-05-18T15:19:23Z

I was seeing the same issue 2 months ago when I tried the master branch. I returned back to 17.01.4 where this issue does not exist for me.

I haven't tried the master branch in the mean time, but I plan on flashing the 18.06 branch this Monday when I come back from my holiday to help testing. Will let you know if I am able to reliably reproduce this.

nbd168 · 2018-05-18T16:16:08Z

I've made a fix and verified the timings. It's in the master branch now, and I intend to push it to the 18.06 branch soon.

slthomason · 2018-05-18T16:28:29Z

@nbd168 - I am not trying to cross issues - but the description here in the email as well as in the commit perfectly describe behavior that we are getting really badly also on the mt7603 (2Ghz) chipset that we reported in this issue: #167 Any chance there would be a way to implement similar checks and behavior for the mt7603 chips as well? Could they both be suffering from the same? We are only able to replicate the behavior when devices are coming out of power save mode ....

…

On Fri, May 18, 2018 at 9:16 AM, nbd168 ***@***.***> wrote: I've made a fix and verified the timings. It's in the master branch now, and I intend to push it to the 18.06 branch soon. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#152 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AB0TcvpjHM534DvjyJ0oyE2PQNQNwK9wks5tzvPJgaJpZM4SJL20> .

nbd168 · 2018-05-18T16:55:52Z

I ran some tests, MT7603 does not have the same issue

araujorm · 2018-05-19T17:00:31Z

Hello.

As I have mentioned in #139 during tests for mt76x2u (USB variant) that is sharing some of the code, this exact issue happens everytime one scans for networks. TX and RX are simply blocked for about 15 seconds, which is not normal. I've just had the chance of confirming that it happens exactly the same with mt76x2 on latest master.

Just try it, launch some kind of speed test, iperf or ping and then do a iw dev wlan1 scan inside a shell on your openwrt router, or invoke a scan with the 5GHz radio in LuCI.

As I've also mentioned there, if you go to tx.c and comment both of the if (...) return -EBUSY lines the issue no longer happens, although performance is subpar during the scan.

My guess is that the original issue reporter has something in his configuration that triggers a network scan once in a while (in my case with the USB variant, it was happening very frequently since I was in a PC with Fedora/MATE and NetworkManager triggers a scan every other minute if the signal isn't what they would consider perfect).

Hope this helps @nbd168 and @LorenzoBianconi to find out what is going on.

nbd168 · 2018-05-20T06:01:22Z

I've fixed the issue with latency during scanning in current mt76 git. Please test.

araujorm · 2018-05-20T13:32:36Z

@nbd168 Looking good, I no longer have total RX/TX loss when scanning. Ping responses are now normal when doing so. Iperf3 performance isn't still what I'd call optimal on that situation, but that may involve something specific to my environment, so it would be nice if other people also tested and gave feedback :)

Thank you.

Mushoz · 2018-05-24T07:05:28Z

I promised to get back to you @nbd168 to report whether I could still reproduce this issue. I have been running the 18.06 branch since Tuesday (which does not contain the beacon timer drift fix) and I am unable to reproduce the issue. So in my case, the flaky 5ghz connection must have been caused by something else. So therefor, I am unable to test whether the beacon timer drift fix actually fixed anything regarding this reported issue. It would be best if @updatede could confirm/deny whether the fix is successful for him.

updatede mentioned this issue Feb 26, 2018

5g wifi packet stuck issue #150

Closed

LorenzoBianconi mentioned this issue May 10, 2018

MT7612U support #139

Closed

updatede closed this as completed May 16, 2018

nbd168 reopened this May 16, 2018

updatede closed this as completed May 18, 2018

nbd168 reopened this May 18, 2018

slthomason mentioned this issue May 18, 2018

MT7603 - persistent issues #167

Closed

nbd168 closed this as completed Jun 18, 2018

Huge ping value increase after running 5g wifi for some time #152

Huge ping value increase after running 5g wifi for some time #152

Comments

updatede commented Feb 17, 2018 • edited Loading

iancd commented Feb 17, 2018

updatede commented Feb 17, 2018

mastum commented Feb 17, 2018

updatede commented Feb 17, 2018

updatede commented Feb 17, 2018

mastum commented Feb 17, 2018 • edited Loading

updatede commented Feb 17, 2018

mastum commented Feb 17, 2018

mastum commented Feb 17, 2018 • edited Loading

updatede commented Feb 19, 2018

barhom commented Feb 22, 2018

updatede commented Feb 25, 2018

Mushoz commented Mar 6, 2018 • edited Loading

nbd168 commented Mar 8, 2018

updatede commented Mar 8, 2018

moonwatcher commented Mar 9, 2018 • edited Loading

updatede commented Mar 9, 2018

Mushoz commented Mar 9, 2018

updatede commented Mar 9, 2018 • edited Loading

nbd168 commented Mar 9, 2018

updatede commented Mar 9, 2018

nbd168 commented Mar 9, 2018

updatede commented Mar 9, 2018

updatede commented Mar 9, 2018

nbd168 commented Mar 9, 2018

Mushoz commented Mar 19, 2018

updatede commented Apr 9, 2018 • edited Loading

Mushoz commented Apr 9, 2018

Mushoz commented Apr 9, 2018

nbd168 commented Apr 15, 2018

updatede commented Apr 15, 2018 • edited Loading

updatede commented Apr 15, 2018

updatede commented Apr 16, 2018 • edited Loading

updatede commented Apr 17, 2018 • edited Loading

updatede commented Apr 18, 2018 • edited Loading

updatede commented Apr 21, 2018 • edited Loading

updatede commented May 6, 2018

updatede commented May 16, 2018

nbd168 commented May 16, 2018

updatede commented May 18, 2018

jsantala commented May 18, 2018

Mushoz commented May 18, 2018

jedi7 commented May 18, 2018

nbd168 commented May 18, 2018

Mushoz commented May 18, 2018

nbd168 commented May 18, 2018

slthomason commented May 18, 2018 via email

nbd168 commented May 18, 2018

araujorm commented May 19, 2018

nbd168 commented May 20, 2018

araujorm commented May 20, 2018

Mushoz commented May 24, 2018

updatede commented Feb 17, 2018 •

edited

Loading

mastum commented Feb 17, 2018 •

edited

Loading

mastum commented Feb 17, 2018 •

edited

Loading

Mushoz commented Mar 6, 2018 •

edited

Loading

moonwatcher commented Mar 9, 2018 •

edited

Loading

updatede commented Mar 9, 2018 •

edited

Loading

updatede commented Apr 9, 2018 •

edited

Loading

updatede commented Apr 15, 2018 •

edited

Loading

updatede commented Apr 16, 2018 •

edited

Loading

updatede commented Apr 17, 2018 •

edited

Loading

updatede commented Apr 18, 2018 •

edited

Loading

updatede commented Apr 21, 2018 •

edited

Loading