[RFC] Fixes for long-standing bugs #11
Most powersave related code has already been removed previously, so it was not functional anyway. Forcing WSM_PSM_ACTIVE fixes delays/losses on incoming frames.
5.5Mb and 11Mb do work in CCK mode, only PBCC modulation is unsupported
Without info element, probe requests are invalid and most APs do not respond to them. On passive scan: listen for a full 100TU to get all standard beacons
Leave rate/retry selection to the minstrel rate adaptation algorithm. Fixes wrong-item bug in TX retry policy cache due to questionable optimizations. Rewrote feedback parsing to pass correct retry counts back to minstrel. Seems to increase data rate and stability.
|
Awesome work. I looked through the commits and aside from one white space addition, which I don't really mind, it all looks good. I'm tempted to just merge it now. :) |
|
I have been wondering how significant the changes in the LTS are, as far as the WiFi performance goes. I think they redesigned part of the power supply circuitry, because it injects a lot of noise into the WiFi in the older versions. I've got my own LTS on the way, so I can see for myself whenever the postman makes it through the apocalypse. :)
I'm wondering though, @dbeinder, if you get the same missed interrupt problem we have over here: #10 (Not that I have any reason to believe the interrupt issue is related to stray noise, it's just the most evident problem I see when I look at logs, so it's the one I'm hoping someone will magically resolve somehow. :))
As far as this PR goes, I'm not seeing any significant changes to throughput:
Just to have it on the same page for comparison, cut and paste from what I reported in the previous PR (#10 (comment)):
(As an aside, when I initially tested through the wired interface by mistake tonight, I was seeing differences in bitrate as I waved my arms about. That made sense for WiFi, but now I realise that was the wired speed I'm somewhat perplexed. :) Hopefully it was just coincidental congestion as some data moved about. Either way, I sat here waving my arms about like a lunatic for no good reason for a while earlier. :))
It's possible that it's taking me longer to authenticate with the AP than it previously was. I'll have to have a closer look when I switch back the build without this PR included, but as it tries to come up after boot there's quite a lot of this sort of thing:
Until, eventually:
I don't recall offhand if it struggled so hard when it initially came up before. I'll have to check that out when I switch builds. However, I recalled that when I did a channel hop on the previous build it didn't struggle so much (#10 (comment)), so thought it might be worth a mention even though it's not the exact same scenario.
There's no change to the constant missed interrupt messages, they're all over the place and I didn't collect latency data previously. I should have done before I popped this build on, but didn't think to. If I remember I'll post pings for the previous build in the next day or two, when I get around to putting it back on. This is what I'm seeing right now though:
Ping in:
Ping out:
I don't know a huge amount about WiFi down at that layer, so I don't know why I'm getting 0.3ms and @dbeinder is getting 3ms. Presumably the signal strength difference plays a role. Either that or I'm measuring it differently, because it's unlikely the speed of light is different for us. :) Regardless, it seems slightly slower to ping out from the OPiZero for me.
Signal strength seems good, I'm quite close to this particular router though:
I still see the same number of APs but I'm in a brick house with too many computers so I don't usually see neighbours' attenuated signals above the noise. Not with the little stock antenna on the thing, anyway. I'm not sure it's worth the effort of me testing this aspect any further. I could put a better antenna on and see if this PR changes the number of APs I see when I'm making a sensible effort to look for APs, but realistically I think if I start down that road I'm going to end up on the roof of the house waving my yagi around for fun. :) Which is all well and good, but realistically it'd be a distraction from other things I should be doing.
So, overall, I'm not seeing a huge difference. I've got a good WiFi signal though, so it's not surprising if the improvements are aimed at more tenuous links. There's a couple of things I don't have data for direct comparison, but I should have in the next few days. |
|
I put the older build back on. It still seems slow to authenticate/associate at startup, so that's no different. Latency seems about the same, maybe a 10% improvement in incoming pings with the PR applied. This is what I'm seeing at the moment:
Ping in:
Ping out:
So, in conclusion, nothing's obviously working significantly better or worse for me. The changes made in the PR sound sensible though, and I don't doubt they could make a difference for others. They don't break anything that I can see, so based on the tests I've done there's no reason not to merge it that I'm aware of. |
|
@moonbuggy Thanks for taking the time to test it out. My opinion is if it doesn't make anything worse it should be merged. |
|
@moonbuggy thanks for checking it out! I'm not sure about the auth retries. Over here I'm getting auth on the first try even at -80dBm, on my version and the old build, so it probably really is a separate issue.
Those 10MBytes/sec+ are definitely through the wired interface: the XR819 supports only up to 802.11n 65MBit/s, and after protocol overheads, that's about 3.x MBytes/sec in the very best circumstances. The problem with testing it while on SSH through Ethernet is this: you now have two IP addresses on the same subnet on eth0 & wlan0. Linux and your router simply figure out that Ethernet works better and route all traffic through the wired connection even if you ping/iperf the IP of the wlan0 interface. So you'd have to disconnect Ethernet first.
I've done some power measurements and it looks like powersave mode does save almost 500mW on idle; on a small board like the OPZ that can be 50% of the total power. So I tried to fix it instead of shutting it off, and I believe I got some improvements in ping times and dropped frames.
WiFi powersave works like this: the device signals to the AP that it will sleep, and the AP then buffers all traffic for it. Shortly before the next beacon from the AP (usually every 100ms) arrives, the client wakes up to receive it. The AP sets a bit in the beacon if there is buffered traffic for the client. If so, the client retrieves it from the AP. So incoming pings will always be delayed by 0-100ms with powersave. But it is adaptive and will switch off if there is a lot of traffic. So you will only see this if you send pings spaced at least 200ms apart.
I found some code about beacon wakeup that looked really wrong, and I'd say the fix made powersave mode more reliable. I haven't seen the old problem where I couldn't ping from the outside until the XR819 sent a ping out by itself.
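As a rough illustration of the buffering delay described above, here is a minimal Python sketch. It assumes a beacon interval of 100 TU (one 802.11 time unit is 1024 µs, so about 102.4ms rather than exactly 100ms) and that a buffered frame is delivered right at the next beacon. The function names are made up for this example, and the model deliberately ignores DTIM periods, the retrieval exchange itself, and the adaptive switch-off under load.

```python
TU_US = 1024  # one 802.11 time unit (TU) is 1024 microseconds


def beacon_interval_us(interval_tu=100):
    """Beacon interval in microseconds (100 TU is the common default)."""
    return interval_tu * TU_US


def buffered_delay_us(arrival_us, interval_us):
    """Wait until the next beacon for a frame that reaches the AP at arrival_us.

    Beacons are assumed to be sent at 0, interval_us, 2 * interval_us, ...
    A frame arriving exactly on a beacon is assumed to wait 0 us.
    """
    return (-arrival_us) % interval_us


def mean_delay_us(interval_us, step_us=100):
    """Average wait over evenly spaced arrival times within one interval."""
    arrivals = range(0, interval_us, step_us)
    return sum(buffered_delay_us(a, interval_us) for a in arrivals) / len(arrivals)


if __name__ == "__main__":
    interval = beacon_interval_us()
    print("beacon interval (us):", interval)
    print("mean extra latency (us):", mean_delay_us(interval))
```

Under these assumptions the extra latency is spread evenly between zero and one beacon interval, so on average an idle client sees roughly half a beacon interval added to each incoming ping.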
In any case, for those who know they'll have frequent traffic, wouldn't benefit from powersave, and would like instant incoming frames: I have revived the code to allow setting powersave from userspace, as is usual for WiFi drivers. It can now be set using
I also backported a small change from cw1200 that was needed because of a kernel API change that broke unsecured networks. Other than getting rid of the whitespace and the commented-out experimental portion in tx.c, everything new is in the powersave commit. |
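The exact command did not survive in this copy of the thread. Assuming the driver exposes powersave through the standard nl80211 interface, the usual tool is `iw`, whose `set power_save` and `get power_save` subcommands take the interface name. The Python helper below only builds the command lines; the helper names and the `wlan0` interface are assumptions for illustration, and running the `set` command normally needs root.

```python
import shlex


def power_save_command(ifname, enabled):
    """Build the iw command that switches powersave on or off for ifname."""
    state = "on" if enabled else "off"
    return ["iw", "dev", ifname, "set", "power_save", state]


def power_save_query(ifname):
    """Build the iw command that reports the current powersave state."""
    return ["iw", "dev", ifname, "get", "power_save"]


if __name__ == "__main__":
    # Print the commands instead of executing them; pass the lists to
    # subprocess.run(...) as root to actually apply the setting.
    print(shlex.join(power_save_command("wlan0", False)))
    print(shlex.join(power_save_query("wlan0")))
```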
|
My only worry at the moment is, now that powersave is controlled by userspace, I've been unable to set it ON as default from the driver side. It seems on Armbian at least, without changing NetworkManager config, it will switch powersave off, and not honor this flag: dbeinder@c909de1#diff-2045016cb90d1e65d71c2407a2570927R291-R295 Total power consumption of my OPZ:
In use, power consumption with my patch is the same as with the current version. So it is definitely an improvement, but I can already see people complaining of overheating if there is no way to change the default. Edit: about "missed interrupt", unless the message happens 10-100x each second, this is probably not a real problem. Most likely, the interrupt happens while these lines are executed: https://github.com/fifteenhex/xradio/blob/master/bh.c#L765-L778 and then we get the message simply because the IRQ happened while we were checking if we missed it. |
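On the NetworkManager side mentioned above, powersave is governed by the `wifi.powersave` connection property, where 2 means disable and 3 means enable. A minimal sketch of generating a drop-in file that turns it on follows. The file name in the comment is hypothetical, and whether a distribution already ships an override that takes precedence is something to check on the system itself.

```python
import configparser
import io

# NetworkManager values for wifi.powersave
POWERSAVE_DISABLE = 2
POWERSAVE_ENABLE = 3


def powersave_dropin(value):
    """Render a NetworkManager drop-in that sets wifi.powersave to value."""
    parser = configparser.ConfigParser()
    parser["connection"] = {"wifi.powersave": str(value)}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical destination: /etc/NetworkManager/conf.d/wifi-powersave.conf
    print(powersave_dropin(POWERSAVE_ENABLE), end="")
```

NetworkManager has to be restarted, or the connection re-activated, before a changed drop-in takes effect.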
|
@dbeinder, I had assumed 802.11n meant it had 150Mbps available so didn't think much of it when I saw 10MBps rates being reported. I was definitely instructing iperf3 and ping to use the wlan0 interface in the data I did paste in my comment but, yeah, if it's only capable of 65Mbps then something isn't right. Now that I've got the LAN cable physically disconnected I'm getting very different results (this is the build without your PR applied): Ping in: Ping out: Basically, all the data I presented earlier is rubbish, it turns out. That's what I get for assuming software is using the interfaces I've told it to use and it claims it's using, I suppose. :) Apologies for that. I have some different SBCs running multiple interfaces on the same subnet that are independent of each other but I set them up a while ago and had forgotten that I had to specifically configure them to be independent. So I didn't really think about that aspect of it when I was doing some quick and dirty tests on this OPiZero. Given that my WiFi link, when tested sensibly, is performing significantly worse than yours (even with a strong signal) I wouldn't be at all surprised if I could now also see the improvements you saw. I'm now actually wondering if I've screwed something up in my kernel build that's crippling the WiFi beyond what you see in Armbian. Unfortunately I won't have time for at least a few days to pop this PR back on and test it again. However, even though I wasn't doing valid throughput/latency tests, my conclusion that it didn't break anything remains true. It's now just a conclusion based on significantly less robust data. :) Again, apologies for the nonsense. My focus so far hasn't really been on the WiFi, beyond successfully building the module against kernels 5+, and I've clearly not been giving it an appropriate level of thought when I do occasionally poke at it. I'm kind of annoyed and embarrassed by the outcome of this lack of thought. 
Obviously don't wait on me to decide if it gets merged or not, but if things go as planned I should have some time to test it again towards the end of the week. |
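The sanity check that exposed the wired-path measurements can be written as quick arithmetic. 65MBit/s is the top single-stream 802.11n rate on a 20MHz channel with the long guard interval, while 150MBit/s needs a 40MHz channel with the short guard interval. The sketch below converts a PHY rate into a byte-rate ceiling; the 0.4 efficiency factor is an assumption picked only to land near the "3.x MBytes/sec" figure quoted earlier, not a measured value.

```python
def phy_rate_to_mbytes(phy_mbit, efficiency=1.0):
    """Convert a PHY rate in Mbit/s to MBytes/s, scaled by an efficiency factor."""
    return phy_mbit * efficiency / 8.0


def exceeds_phy(observed_mbytes, phy_mbit):
    """True if an observed byte rate is impossible at the given PHY rate."""
    return observed_mbytes * 8.0 > phy_mbit


if __name__ == "__main__":
    print("raw ceiling at 65 Mbit/s (MBytes/s):", phy_rate_to_mbytes(65))
    print("assumed 40% efficiency (MBytes/s):", phy_rate_to_mbytes(65, 0.4))
    print("10 MBytes/s possible over this link?", not exceeds_phy(10, 65))
```

Any iperf result above the raw ceiling is a strong hint that the traffic took another path, such as the wired interface.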
|
@fifteenhex @moonbuggy No worries, I've been bitten by this myself and couldn't find a neat solution either. I think one other problem with setting the interface in ping/iperf is that even if that works, you still have no control over the path the returning packets take. |
|
I'll reset master to the point before your commits so you can create a new pull request if that's easier? |
|
Sure, that'd be ideal. I don't think this repo is active enough to inconvenience anyone ;) |
|
It's reset now. |
@dbeinder, unrelated to this radio module, but the case where I have multiple interfaces on a device on the same subnet involves multiple MACVLAN interfaces. I'm still a bit busy and distracted, so I haven't thought that much about whether it makes sense in this specific scenario, but ensuring each of those interfaces only uses the desired IP/MAC required some ARP settings to be changed. I thought I'd mention it on the off chance it was a neat solution for you in some applications.
Anyway, that's the configuration I'd done but forgotten about, that I mentioned earlier. It's not often that I'm messing about with the link layer, which is probably why it didn't pop into my mind before. Basically, I know ARP is a thing, but for the most part I just leave it to sort itself out. :) Like I say though, I haven't properly thought about it. It works for my MACVLAN interfaces, but I don't know first-hand if it directly translates to physical interfaces. It obviously won't help if the return path is determined upstream of the interfaces and is incorrect (my MACVLAN interfaces are, of course, all attached to the same copper as the physical interface so there's only a single physical return path involved). It's something I'm planning to have a play with when I next get a chance to look at my OPiZero (which, btw, now looks like it will be next week, rather than the end of this week). |
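The specific ARP settings were not preserved in this copy of the thread. The ones commonly used for several interfaces on one subnet are the Linux `arp_ignore` and `arp_announce` sysctls, so the sketch below assumes those and should not be read as the configuration actually used above. It only builds the `sysctl -w` command lines, and the helper name is invented for this example.

```python
# Assumed settings, not taken from the thread:
#   arp_ignore=1   reply only if the target IP is configured on the
#                  interface the ARP request arrived on
#   arp_announce=2 always use the best local source address for the target
ARP_SETTINGS = {
    "arp_ignore": 1,
    "arp_announce": 2,
}


def arp_sysctl_commands(scope="all"):
    """Build sysctl commands for the given scope ('all' or an interface name)."""
    return [
        ["sysctl", "-w", f"net.ipv4.conf.{scope}.{key}={value}"]
        for key, value in ARP_SETTINGS.items()
    ]


if __name__ == "__main__":
    for cmd in arp_sysctl_commands():
        print(" ".join(cmd))
```

These settings only influence which interface answers and sends ARP. As noted above, they do not control a return path that is chosen upstream.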
This should fix:
Tested on OrangePi Zero LTS
Let's see if we can get some testers before merging this. @moonbuggy @sunzone93