[RFC] Fixes for long-standing bugs #11

Merged
merged 5 commits into from Apr 20, 2020


@dbeinder dbeinder commented Apr 17, 2020

This should fix:

  • delayed & inconsistent ping in vs. out: I'm now seeing median pings <3ms in both directions
  • unreliable scanning: scan used to show only half the APs in range, this is definitely fixed
  • wonky rate selection: Not 100% sure about this, but I've gotten about 2x higher throughput on a connection with bad signal (-78dBm, about 12Mbit).
    Tested on OrangePi Zero LTS

Let's see if we can get some testers before merging this. @moonbuggy @sunzone93

dbeinder added 5 commits Apr 17, 2020
Most powersave related code has already been removed previously, so it was not functional anyway.
Forcing WSM_PSM_ACTIVE fixes delays/losses on incoming frames.
5.5Mb and 11Mb do work in CCK mode, only PBCC modulation is unsupported
Without info element, probe requests are invalid and most APs do not respond to them. On passive scan: listen for a full 100TU to get all standard beacons
Leave rate/retry selection to the minstrel rate adaption algorithm. Fixes wrong-item bug in TX retry policy cache due to questionable optimizations. Rewritten feedback parsing to pass correct retry counts back to minstrel. Seems to increase data rate and stability.
Owner

@fifteenhex fifteenhex commented Apr 18, 2020

Awesome work. I looked through the commits and aside from one white space addition, which I don't really mind, it all looks good. I'm tempted to just merge it now. :)


@moonbuggy moonbuggy commented Apr 18, 2020

I have been wondering how significant the changes in the LTS are, as far as the WiFi performance goes. I think they redesigned part of the power supply circuitry, because it injects a lot of noise into the WiFi in the older versions.

I've got my own LTS on the way, so I can see for myself whenever the postman makes it through the apocalypse. :) I'm wondering though, @dbeinder, if you get the same missed interrupt problem we have over here: #10 (Not that I have any reason to believe the interrupt issue is related to stray noise, it's just the most evident problem I see when I look at logs, so it's the one I'm hoping someone will magically resolve somehow. :))

As far as this PR goes, I'm not seeing any significant changes to throughput:

[  5] local 192.168.30.50 port 33304 connected to 192.168.30.139 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  12.2 MBytes  12.2 MBytes/sec   50    225 KBytes
[  5]   1.00-2.00   sec  11.0 MBytes  11.0 MBytes/sec    0    258 KBytes
[  5]   2.00-3.00   sec  11.0 MBytes  11.0 MBytes/sec   40    210 KBytes
[  5]   3.00-4.00   sec  11.6 MBytes  11.6 MBytes/sec    0    247 KBytes
[  5]   4.00-5.00   sec  11.1 MBytes  11.1 MBytes/sec   48    197 KBytes
[  5]   5.00-6.00   sec  11.0 MBytes  11.0 MBytes/sec    0    235 KBytes
[  5]   6.00-7.00   sec  11.6 MBytes  11.6 MBytes/sec    0    271 KBytes
[  5]   7.00-8.00   sec  11.0 MBytes  11.0 MBytes/sec   48    227 KBytes
[  5]   8.00-9.00   sec  11.6 MBytes  11.6 MBytes/sec    0    261 KBytes
[  5]   9.00-10.00  sec  11.0 MBytes  11.0 MBytes/sec   46    215 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   113 MBytes  11.3 MBytes/sec  232             sender
[  5]   0.00-10.05  sec   112 MBytes  11.2 MBytes/sec                  receiver

Just to have it on the same page for comparison, cut and paste from what I reported in the previous PR (#10 (comment)):

Connecting to host 192.168.30.132, port 5201
[  5] local 192.168.30.50 port 39054 connected to 192.168.30.132 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  12.8 MBytes  12.8 MBytes/sec   51    215 KBytes
[  5]   1.00-2.00   sec  11.0 MBytes  11.0 MBytes/sec    0    252 KBytes
[  5]   2.00-3.00   sec  11.6 MBytes  11.6 MBytes/sec   48    202 KBytes
[  5]   3.00-4.00   sec  11.2 MBytes  11.1 MBytes/sec    0    242 KBytes
[  5]   4.00-5.00   sec  11.6 MBytes  11.6 MBytes/sec    0    275 KBytes
[  5]   5.00-6.00   sec  11.0 MBytes  11.0 MBytes/sec   42    231 KBytes
[  5]   6.00-7.00   sec  11.6 MBytes  11.6 MBytes/sec    0    265 KBytes
[  5]   7.00-8.00   sec  11.2 MBytes  11.1 MBytes/sec   48    220 KBytes
[  5]   8.00-9.00   sec  11.0 MBytes  11.0 MBytes/sec    0    255 KBytes
[  5]   9.00-10.00  sec  11.6 MBytes  11.7 MBytes/sec   40    207 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   115 MBytes  11.5 MBytes/sec  229             sender
[  5]   0.00-10.05  sec   113 MBytes  11.3 MBytes/sec                  receiver

iperf Done.

(As an aside, when I initially tested through the wired interface by mistake tonight, I was seeing differences in bitrate as I waved my arms about. That made sense for WiFi, but now that I realise it was the wired speed, I'm somewhat perplexed. :) Hopefully it was just coincidental congestion as some data moved about. Either way, I sat here waving my arms about like a lunatic for no good reason for a while earlier. :))

It's possible that it's taking me longer to authenticate with the AP than it previously was. I'll have to have a closer look when I switch back to the build without this PR included, but as it tries to come up after boot there's quite a lot of this sort of thing:

[   17.967175] wlan0: authenticate with 14:dd:a9:d0:ab:70
[   17.967687] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 1/3)
[   18.134137] wlan0: authenticated
[   18.144746] wlan0: associate with 14:dd:a9:d0:ab:70 (try 1/3)
[   18.996777] wlan0: associate with 14:dd:a9:d0:ab:70 (try 2/3)
[   18.996932] wlan0: Connection to AP 14:dd:a9:d0:ab:70 lost
[   19.837077] wlan0: associate with 14:dd:a9:d0:ab:70 (try 3/3)
[   20.646851] wlan0: association with 14:dd:a9:d0:ab:70 timed out
[   21.491694] wlan0: authenticate with 14:dd:a9:d0:ab:70
[   21.492157] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 1/3)
[   22.329014] wlan0: authenticate with 14:dd:a9:d0:ab:70
[   22.329039] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 1/3)
[   23.004651] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 2/3)
[   23.288798] wlan0: authenticate with 14:dd:a9:d0:ab:70
[   23.288821] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 1/3)
[   24.034639] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 2/3)
[   24.236556] wlan0: authenticate with 14:dd:a9:d0:ab:70
[   24.236577] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 1/3)
[   24.355296] wlan0: authenticated
[   25.355328] wlan0: authenticate with 14:dd:a9:d0:ab:70
[   25.355350] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 1/3)
[   25.356352] wlan0: authenticated
[   26.318087] wlan0: authenticate with 14:dd:a9:d0:ab:70
[   26.318114] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 1/3)

Until, eventually:

[  108.181719] wlan0: authenticate with 14:dd:a9:d0:ab:70
[  108.181749] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 1/3)
[  108.183351] wlan0: authenticate with 14:dd:a9:d0:ab:70
[  108.183373] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 1/3)
[  108.706251] wlan0: send auth to 14:dd:a9:d0:ab:70 (try 2/3)
[  109.267946] wlan0: authenticated
[  109.276245] wlan0: associate with 14:dd:a9:d0:ab:70 (try 1/3)
[  110.696127] wlan0: associate with 14:dd:a9:d0:ab:70 (try 2/3)
[  111.656081] wlan0: associate with 14:dd:a9:d0:ab:70 (try 3/3)
[  112.056867] wlan0: RX AssocResp from 14:dd:a9:d0:ab:70 (capab=0x411 status=0 aid=4)
[  112.097514] wlan0: associated

I don't recall offhand if it struggled so hard when it initially came up before. I'll have to check that out when I switch builds. However, I recalled that when I did a channel hop on the previous build it didn't struggle so much (#10 (comment)), so thought it might be worth a mention even though it's not the exact same scenario.

There's no change to the constant missed interrupt messages, they're all over the place and dmesg is happy to tell me about them. (Not that the PR claimed to fix them, of course. Just reporting on them now because they stand out in logs, as mentioned.) I removed them from the logs I pasted above.

I didn't collect latency data previously. I should have done before I popped this build on, but didn't think to. If I remember I'll post pings for the previous build in the next day or two, when I get around to putting it back on. This is what I'm seeing right now though:

Ping in:

PING 192.168.30.139 (192.168.30.139) 56(84) bytes of data.
64 bytes from 192.168.30.139: icmp_seq=1 ttl=64 time=0.239 ms
64 bytes from 192.168.30.139: icmp_seq=2 ttl=64 time=0.196 ms
64 bytes from 192.168.30.139: icmp_seq=3 ttl=64 time=0.199 ms
64 bytes from 192.168.30.139: icmp_seq=4 ttl=64 time=0.197 ms
64 bytes from 192.168.30.139: icmp_seq=5 ttl=64 time=0.224 ms
64 bytes from 192.168.30.139: icmp_seq=6 ttl=64 time=0.237 ms
64 bytes from 192.168.30.139: icmp_seq=7 ttl=64 time=0.274 ms
64 bytes from 192.168.30.139: icmp_seq=8 ttl=64 time=0.295 ms
64 bytes from 192.168.30.139: icmp_seq=9 ttl=64 time=0.289 ms
64 bytes from 192.168.30.139: icmp_seq=10 ttl=64 time=0.289 ms

--- 192.168.30.139 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 194ms
rtt min/avg/max/mdev = 0.196/0.243/0.295/0.043 ms

Ping out:

PING 192.168.30.50 (192.168.30.50) from 192.168.30.139: 56 data bytes
64 bytes from 192.168.30.50: seq=0 ttl=64 time=0.426 ms
64 bytes from 192.168.30.50: seq=1 ttl=64 time=0.352 ms
64 bytes from 192.168.30.50: seq=2 ttl=64 time=0.339 ms
64 bytes from 192.168.30.50: seq=3 ttl=64 time=0.398 ms
64 bytes from 192.168.30.50: seq=4 ttl=64 time=0.403 ms
64 bytes from 192.168.30.50: seq=5 ttl=64 time=0.318 ms
64 bytes from 192.168.30.50: seq=6 ttl=64 time=0.361 ms
64 bytes from 192.168.30.50: seq=7 ttl=64 time=0.361 ms
64 bytes from 192.168.30.50: seq=8 ttl=64 time=0.402 ms
64 bytes from 192.168.30.50: seq=9 ttl=64 time=0.367 ms

--- 192.168.30.50 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 0.318/0.372/0.426 ms

I don't know a huge amount about WiFi down at that layer, so I don't know why I'm getting 0.3ms and @dbeinder is getting 3ms. Presumably the signal strength difference plays a role. Either that or I'm measuring it differently, because it's unlikely the speed of light is different for us. :) Regardless, it seems slightly slower to ping out from the OPiZero for me.

Signal strength seems good, I'm quite close to this particular router though:

wlan0     IEEE 802.11  ESSID:"xxxxxxxx"
          Mode:Managed  Frequency:2.457 GHz  Access Point: 14:DD:A9:D0:AB:70
          Bit Rate=65 Mb/s   Tx-Power=20 dBm
          Retry short limit:7   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:on
          Link Quality=70/70  Signal level=-30 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:2  Invalid misc:7   Missed beacon:0

I still see the same number of APs but I'm in a brick house with too many computers so I don't usually see neighbours' attenuated signals above the noise. Not with the little stock antenna on the thing, anyway. I'm not sure it's worth the effort of me testing this aspect any further. I could put a better antenna on and see if this PR changes the number of APs I see when I'm making a sensible effort to look for APs, but realistically I think if I start down that road I'm going to end up on the roof of the house waving my yagi around for fun. :) Which is all well and good, but realistically it'd be a distraction from other things I should be doing.

So, overall, I'm not seeing a huge difference. I've got a good WiFi signal though, so it's not surprising if the improvements are aimed at more tenuous links. There's a couple of things I don't have data for direct comparison, but I should have in the next few days.


@moonbuggy moonbuggy commented Apr 19, 2020

I put the older build back on.

It still seems slow to authenticate/associate at startup, so that's no different.

Latency seems about the same, maybe a 10% improvement in incoming pings with the PR applied. This is what I'm seeing at the moment:

Ping in:

PING 192.168.30.139 (192.168.30.139) 56(84) bytes of data.
64 bytes from 192.168.30.139: icmp_seq=1 ttl=64 time=0.310 ms
64 bytes from 192.168.30.139: icmp_seq=2 ttl=64 time=0.278 ms
64 bytes from 192.168.30.139: icmp_seq=3 ttl=64 time=0.301 ms
64 bytes from 192.168.30.139: icmp_seq=4 ttl=64 time=0.299 ms
64 bytes from 192.168.30.139: icmp_seq=5 ttl=64 time=0.305 ms
64 bytes from 192.168.30.139: icmp_seq=6 ttl=64 time=0.234 ms
64 bytes from 192.168.30.139: icmp_seq=7 ttl=64 time=0.320 ms
64 bytes from 192.168.30.139: icmp_seq=8 ttl=64 time=0.224 ms
64 bytes from 192.168.30.139: icmp_seq=9 ttl=64 time=0.295 ms
64 bytes from 192.168.30.139: icmp_seq=10 ttl=64 time=0.298 ms

--- 192.168.30.139 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 217ms
rtt min/avg/max/mdev = 0.224/0.286/0.320/0.034 ms

Ping out:

PING 192.168.30.50 (192.168.30.50) from 192.168.30.139: 56 data bytes
64 bytes from 192.168.30.50: seq=0 ttl=64 time=0.460 ms
64 bytes from 192.168.30.50: seq=1 ttl=64 time=0.390 ms
64 bytes from 192.168.30.50: seq=2 ttl=64 time=0.398 ms
64 bytes from 192.168.30.50: seq=3 ttl=64 time=0.318 ms
64 bytes from 192.168.30.50: seq=4 ttl=64 time=0.387 ms
64 bytes from 192.168.30.50: seq=5 ttl=64 time=0.415 ms
64 bytes from 192.168.30.50: seq=6 ttl=64 time=0.418 ms
64 bytes from 192.168.30.50: seq=7 ttl=64 time=0.405 ms
64 bytes from 192.168.30.50: seq=8 ttl=64 time=0.376 ms
64 bytes from 192.168.30.50: seq=9 ttl=64 time=0.368 ms

--- 192.168.30.50 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 0.318/0.393/0.460 ms

So, in conclusion, nothing's obviously working significantly better or worse for me. The changes made in the PR sound sensible though, I don't doubt they could make a difference for others. And they don't break anything that I can see, so there's no reason not to merge it that I'm aware of based on the tests I've done.

Owner

@fifteenhex fifteenhex commented Apr 19, 2020

@moonbuggy Thanks for taking the time to test it out.

My opinion is if it doesn't make anything worse it should be merged.

Author

@dbeinder dbeinder commented Apr 19, 2020

@moonbuggy thanks for checking it out! I'm not sure about the auth retries, over here I'm getting auth on the first try even at -80dBm, on my version and the old build, so it probably really is a separate issue.

Those 10MBytes/sec+ are definitely through the wired interface: the XR819 supports only up to 802.11n 65MBit/s, and after protocol overheads, that's about 3.x MBytes/sec in the very best circumstances. The problem with testing it while on SSH through Ethernet is this: you now have two IP addresses on the same subnet, one on eth0 and one on wlan0. Linux and your router simply figure out that Ethernet works better and route all traffic through the wired connection, even if you ping/iperf the IP of the wlan0 interface. So you'd have to disconnect Ethernet first.
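
As a rough sanity check of the numbers above, the following sketch converts a PHY rate into a best-case payload rate and flags an iperf result that cannot have gone over the WiFi link. The function names are hypothetical, and the 45% efficiency figure is only an illustrative assumption chosen to land in the "3.x MBytes/sec" range mentioned above; it is not a measured value.

```python
# Sanity-check an iperf result against the PHY rate of the WiFi link.
# ASSUMPTION: mac_efficiency=0.45 is an illustrative guess at protocol
# overhead, not a measured figure for the XR819.

def phy_ceiling_mbytes(phy_mbit: float) -> float:
    """Raw PHY rate in MBytes/sec, before any protocol overhead."""
    return phy_mbit / 8.0


def expected_payload_mbytes(phy_mbit: float, mac_efficiency: float = 0.45) -> float:
    """Rough best-case payload rate after an assumed protocol overhead."""
    return phy_ceiling_mbytes(phy_mbit) * mac_efficiency


def plausible_over_wifi(measured_mbytes: float, phy_mbit: float) -> bool:
    """A measurement above the raw PHY ceiling cannot have used this link."""
    return measured_mbytes <= phy_ceiling_mbytes(phy_mbit)


if __name__ == "__main__":
    phy = 65.0  # Mbit/s, the 802.11n rate quoted for the XR819
    print(f"PHY ceiling:      {phy_ceiling_mbytes(phy):.2f} MBytes/sec")
    print(f"assumed payload:  {expected_payload_mbytes(phy):.2f} MBytes/sec")
    print("11.3 MBytes/sec plausible over WiFi?", plausible_over_wifi(11.3, phy))
```

Even with no overhead at all, 65MBit/s is only about 8.1 MBytes/sec, so a reported 11 MBytes/sec has to have taken another path.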

I've done some power measurements and it looks like powersave mode does save almost 500mW on idle. On a small board like the OPZ that can be 50% of the total power. So I tried to fix it instead of shutting it off, and I believe I got some improvements in ping times and dropped frames.

WiFi powersave works like this: the device signals to the AP that it will sleep, and the AP then buffers all traffic for it. Shortly before the next beacon from the AP (usually every 100ms) arrives, the client wakes up to receive it. The AP sets a bit in the beacon if there is buffered traffic for the client. If so, the client retrieves it from the AP. So incoming pings will always be delayed by 0-100ms with powersave. But it is adaptive and will switch off if there is a lot of traffic. So you will only see this if you send pings spaced at least 200ms apart.
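
The 0-100ms delay described above can be illustrated with a small simulation. This is a deliberately simplified model, assuming a fixed 100ms beacon interval and frames that reach the AP at uniformly random times; it ignores the time needed to fetch the buffered frame and the adaptive switch-off under load. All names are hypothetical.

```python
import random
import statistics

# ASSUMPTION: fixed beacon interval, taken from "usually every 100ms" above.
BEACON_INTERVAL_MS = 100.0


def buffered_delay_ms(arrival_ms: float,
                      beacon_interval_ms: float = BEACON_INTERVAL_MS) -> float:
    """How long a frame reaching the AP at arrival_ms waits for the next beacon.

    Beacons are modelled at t = 0, 100, 200, ... ms; a sleeping client only
    learns about buffered traffic when it wakes for one of them.
    """
    return (-arrival_ms) % beacon_interval_ms


def simulate(n: int, seed: int = 0) -> list[float]:
    """Delays for n frames arriving at uniformly random times."""
    rng = random.Random(seed)
    return [buffered_delay_ms(rng.uniform(0.0, 10_000.0)) for _ in range(n)]


if __name__ == "__main__":
    delays = simulate(10_000)
    print(f"min  {min(delays):6.2f} ms")
    print(f"mean {statistics.mean(delays):6.2f} ms")
    print(f"max  {max(delays):6.2f} ms")
```

Under these assumptions the added latency averages roughly half a beacon interval, which is why an idle link in powersave looks much slower to ping from outside than one kept active.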

I found some code about beacon wakeup that looked really wrong, and I'd say fixing it made powersave mode more reliable. I haven't since seen the old problem where the device couldn't be pinged from the outside until the XR819 sent a ping out by itself.

In any case, for those who know they'll have frequent traffic, wouldn't benefit from powersave, and would like instant incoming frames: I have revived the code to allow setting powersave from userspace, as is usual for WiFi drivers. It can now be set using iwconfig wlan0 power on/off, or permanently using NetworkManager.

I also backported a small change from cw1200 that was needed because of a kernel API change that broke unsecured networks. Other than getting rid of the whitespace and the commented-out experimental portion in tx.c, everything new is in the powersave commit.
I have put the revised patches here: https://github.com/dbeinder/xradio/tree/revised

Author

@dbeinder dbeinder commented Apr 19, 2020

My only worry at the moment is, now that powersave is controlled by userspace, I've been unable to set it ON as default from the driver side. It seems on Armbian at least, without changing NetworkManager config, it will switch powersave off, and not honor this flag: dbeinder@c909de1#diff-2045016cb90d1e65d71c2407a2570927R291-R295

Total power consumption of my OPZ:
WiFi OFF: 600mW
Connected to WiFi & idle:

  • Current driver on fifteenhex/xradio: 920mW (slightly broken powersave)
  • My patch: 740mW (powersave ON)
  • My patch: 1220mW (powersave OFF)

In use, power consumption with my patch is the same as with the current version. So it is definitely an improvement, but I can already see people complaining of overheating if there is no way to change the default.
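
To make the comparison explicit, the sketch below does the arithmetic on the idle figures listed above: what the WiFi costs on top of the 600mW baseline in each configuration, and how much powersave saves with the patch. The dictionary keys and function names are just labels invented for this example.

```python
# Idle power figures quoted above, in milliwatts.
IDLE_POWER_MW = {
    "wifi_off": 600,
    "current_driver": 920,        # slightly broken powersave
    "patch_powersave_on": 740,
    "patch_powersave_off": 1220,
}


def wifi_overhead_mw(config: str) -> int:
    """Extra idle power of a configuration relative to WiFi switched off."""
    return IDLE_POWER_MW[config] - IDLE_POWER_MW["wifi_off"]


def powersave_saving_mw() -> int:
    """Idle power saved by enabling powersave with the patched driver."""
    return IDLE_POWER_MW["patch_powersave_off"] - IDLE_POWER_MW["patch_powersave_on"]


if __name__ == "__main__":
    for config in ("current_driver", "patch_powersave_on", "patch_powersave_off"):
        print(f"{config:22s} +{wifi_overhead_mw(config)} mW over WiFi off")
    print(f"powersave saving with patch: {powersave_saving_mw()} mW")
```

With these numbers the patched driver saves 480mW at idle when powersave is on, which matches the "almost 500mW" estimate, while powersave off costs 300mW more than the current driver.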

Edit: about "missed interrupt", unless the message happens 10-100x each second, this is probably not a real problem. Most likely, the interrupt happens while these lines are executed: https://github.com/fifteenhex/xradio/blob/master/bh.c#L765-L778 and then we get the message simply because the IRQ happened while we were checking if we missed it.


@moonbuggy moonbuggy commented Apr 20, 2020

@dbeinder, I had assumed 802.11n meant it had 150Mbps available so didn't think much of it when I saw 10MBps rates being reported. I was definitely instructing iperf3 and ping to use the wlan0 interface in the data I did paste in my comment but, yeah, if it's only capable of 65Mbps then something isn't right.

Now that I've got the LAN cable physically disconnected I'm getting very different results (this is the build without your PR applied):

[  5] local 192.168.30.50 port 40886 connected to 192.168.30.132 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   793 KBytes  0.77 MBytes/sec   20   21.4 KBytes
[  5]   1.00-2.00   sec   753 KBytes  0.74 MBytes/sec    2   22.8 KBytes
[  5]   2.00-3.00   sec   627 KBytes  0.61 MBytes/sec    2   21.4 KBytes
[  5]   3.00-4.00   sec   627 KBytes  0.61 MBytes/sec    2   22.8 KBytes
[  5]   4.00-5.00   sec   878 KBytes  0.86 MBytes/sec    6   21.4 KBytes
[  5]   5.00-6.00   sec   753 KBytes  0.74 MBytes/sec    2   22.8 KBytes
[  5]   6.00-7.00   sec   753 KBytes  0.74 MBytes/sec    2   24.2 KBytes
[  5]   7.00-8.00   sec   502 KBytes  0.49 MBytes/sec    4   22.8 KBytes
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 MBytes/sec    2   22.8 KBytes
[  5]   9.00-10.00  sec   125 KBytes  0.12 MBytes/sec    1   21.4 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.68 MBytes  0.57 MBytes/sec   43             sender
[  5]   0.00-10.52  sec  5.55 MBytes  0.53 MBytes/sec                  receiver

Ping in:

PING 192.168.30.132 (192.168.30.132) 56(84) bytes of data.
64 bytes from 192.168.30.132: icmp_seq=1 ttl=64 time=22.5 ms
64 bytes from 192.168.30.132: icmp_seq=2 ttl=64 time=27.1 ms
64 bytes from 192.168.30.132: icmp_seq=3 ttl=64 time=23.5 ms
64 bytes from 192.168.30.132: icmp_seq=4 ttl=64 time=27.10 ms
64 bytes from 192.168.30.132: icmp_seq=5 ttl=64 time=23.9 ms
64 bytes from 192.168.30.132: icmp_seq=6 ttl=64 time=20.3 ms
64 bytes from 192.168.30.132: icmp_seq=7 ttl=64 time=25.8 ms
64 bytes from 192.168.30.132: icmp_seq=8 ttl=64 time=21.9 ms
64 bytes from 192.168.30.132: icmp_seq=9 ttl=64 time=26.5 ms
64 bytes from 192.168.30.132: icmp_seq=10 ttl=64 time=30.10 ms

--- 192.168.30.132 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 19ms
rtt min/avg/max/mdev = 20.345/25.047/30.986/3.057 ms

Ping out:

PING 192.168.30.50 (192.168.30.50): 56 data bytes
64 bytes from 192.168.30.50: seq=0 ttl=64 time=40.684 ms
64 bytes from 192.168.30.50: seq=1 ttl=64 time=40.580 ms
64 bytes from 192.168.30.50: seq=2 ttl=64 time=32.076 ms
64 bytes from 192.168.30.50: seq=3 ttl=64 time=38.138 ms
64 bytes from 192.168.30.50: seq=4 ttl=64 time=36.456 ms
64 bytes from 192.168.30.50: seq=5 ttl=64 time=1024.813 ms
64 bytes from 192.168.30.50: seq=6 ttl=64 time=960.309 ms
64 bytes from 192.168.30.50: seq=7 ttl=64 time=1008.966 ms
64 bytes from 192.168.30.50: seq=8 ttl=64 time=40.649 ms
64 bytes from 192.168.30.50: seq=9 ttl=64 time=158.469 ms

--- 192.168.30.50 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 32.076/338.114/1024.813 ms
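
With a few one-second stalls in a ten-packet run, the average says little about typical latency. A minimal sketch that pulls the round-trip times out of the ping output above and compares median and mean; the `rtts_ms` helper is a hypothetical name and the regular expression only assumes the `time=... ms` field visible in the output.

```python
import re
import statistics

# The "ping out" lines from the run above, copied verbatim.
PING_OUT = """\
64 bytes from 192.168.30.50: seq=0 ttl=64 time=40.684 ms
64 bytes from 192.168.30.50: seq=1 ttl=64 time=40.580 ms
64 bytes from 192.168.30.50: seq=2 ttl=64 time=32.076 ms
64 bytes from 192.168.30.50: seq=3 ttl=64 time=38.138 ms
64 bytes from 192.168.30.50: seq=4 ttl=64 time=36.456 ms
64 bytes from 192.168.30.50: seq=5 ttl=64 time=1024.813 ms
64 bytes from 192.168.30.50: seq=6 ttl=64 time=960.309 ms
64 bytes from 192.168.30.50: seq=7 ttl=64 time=1008.966 ms
64 bytes from 192.168.30.50: seq=8 ttl=64 time=40.649 ms
64 bytes from 192.168.30.50: seq=9 ttl=64 time=158.469 ms
"""


def rtts_ms(ping_output: str) -> list[float]:
    """Extract every round-trip time (in ms) from ping output."""
    return [float(t) for t in re.findall(r"time=([0-9.]+) ms", ping_output)]


if __name__ == "__main__":
    rtts = rtts_ms(PING_OUT)
    print(f"median {statistics.median(rtts):.1f} ms")
    print(f"mean   {statistics.mean(rtts):.1f} ms")
```

For this run the median is about 40.7ms while the mean is 338.1ms, so the three stalled packets dominate the average.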

Basically, all the data I presented earlier is rubbish, it turns out. That's what I get for assuming software is using the interfaces I've told it to use and it claims it's using, I suppose. :) Apologies for that.

I have some different SBCs running multiple interfaces on the same subnet that are independent of each other but I set them up a while ago and had forgotten that I had to specifically configure them to be independent. So I didn't really think about that aspect of it when I was doing some quick and dirty tests on this OPiZero.
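
One way to check which interface the kernel would actually pick for a destination, before trusting a benchmark, is to ask `ip route get` and read the `dev` field. The sketch below is only that: a small parser plus a wrapper around the command. The helper names are hypothetical, the sample line is made up for illustration in the format the command prints, and the check says nothing about the path the returning packets take.

```python
import re
import subprocess
from typing import Optional


def egress_device(route_get_output: str) -> Optional[str]:
    """Return the interface named after 'dev' in `ip route get` output."""
    match = re.search(r"\bdev\s+(\S+)", route_get_output)
    return match.group(1) if match else None


def route_device_for(dest_ip: str) -> Optional[str]:
    """Ask the kernel which interface it would use to reach dest_ip.

    Requires the iproute2 `ip` tool; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        ["ip", "route", "get", dest_ip],
        capture_output=True, text=True, check=True,
    )
    return egress_device(result.stdout)


if __name__ == "__main__":
    # Illustrative sample only, not captured from the boards in this thread.
    sample = "192.168.30.132 dev eth0 src 192.168.30.50 uid 0"
    print(egress_device(sample))
```

If the reported device is eth0 while the test is meant to exercise wlan0, the results describe the wired link.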

Given that my WiFi link, when tested sensibly, is performing significantly worse than yours (even with a strong signal) I wouldn't be at all surprised if I could now also see the improvements you saw. I'm now actually wondering if I've screwed something up in my kernel build that's crippling the WiFi beyond what you see in Armbian.

Unfortunately I won't have time for at least a few days to pop this PR back on and test it again. However, even though I wasn't doing valid throughput/latency tests, my conclusion that it didn't break anything remains true. It's now just a conclusion based on significantly less robust data. :)

Again, apologies for the nonsense. My focus so far hasn't really been on the WiFi, beyond successfully building the module against kernels 5+, and I've clearly not been giving it an appropriate level of thought when I do occasionally poke at it. I'm kind of annoyed and embarrassed by the outcome of this lack of thought.

Obviously don't wait on me to decide if it gets merged or not, but if things go as planned I should have some time to test it again towards the end of the week.

fifteenhex added a commit that referenced this issue Apr 20, 2020
[RFC] Fixes for long-standing bugs
@fifteenhex fifteenhex merged commit 4498d2b into fifteenhex:master Apr 20, 2020
Author

@dbeinder dbeinder commented Apr 20, 2020

@fifteenhex
Sorry, I guess what I did wasn't how it is supposed to be done on GitHub. I should have force-pushed to update the commits linked to the PR instead of creating a new branch and writing a link to it in the comments.
Do you want me to rebase the newest changes?
https://github.com/dbeinder/xradio/commits/revised

@moonbuggy No worries, I've been bitten by this myself and couldn't find a neat solution either. I think one other problem with setting the interface in ping/iperf is that even if that works, you still have no control over the path the returning packets take.

Owner

@fifteenhex fifteenhex commented Apr 20, 2020

I'll reset master to the point before your commits so you can create a new pull request if that's easier?

Author

@dbeinder dbeinder commented Apr 20, 2020

Sure, that'd be ideal. I don't think this repo is active enough to inconvenience anyone ;)

Owner

@fifteenhex fifteenhex commented Apr 20, 2020

It's reset now.


@moonbuggy moonbuggy commented Apr 22, 2020

> No worries, I've been bitten by this myself and couldn't find a neat solution either.

@dbeinder, unrelated to this radio module, but the case where I have multiple interfaces on a device on the same subnet involves multiple MACVLAN interfaces. I'm still a bit busy and distracted, so I haven't thought that much about if it makes sense in this specific scenario, but to ensure each of those interfaces only uses the desired IP/MAC required some ARP settings to be changed. I thought I'd mention it on the off chance it was a neat solution for you in some applications.

Anyway, /etc/sysctl.d/50-macvlan-arp.conf on my NanoPi:

net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.rp_filter = 2

That's the configuration I'd done but forgotten about, that I mentioned earlier. It's not often that I'm messing about with the link layer, which is probably why it didn't pop into my mind before. Basically, I know ARP is a thing, but for the most part I just leave it to sort itself out. :)
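
For reference, the kernel's ip-sysctl documentation describes these values as follows: arp_ignore = 1 replies to an ARP request only if the target IP is configured on the interface the request arrived on, arp_announce = 2 always uses the best local source address for the target in ARP announcements, and rp_filter = 2 selects loose reverse-path filtering. The sketch below parses the three lines above and reports which of them differ from the running kernel's values under /proc/sys. It is a hypothetical helper, not part of any tool mentioned in this thread.

```python
from pathlib import Path

# The settings quoted above, as they appear in the sysctl.d file.
SYSCTL_CONF = """\
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.rp_filter = 2
"""


def parse_sysctl(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and # or ; comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a sysctl key to its file under /proc/sys.

    Simplification: breaks for interface names that contain dots.
    """
    return Path(root) / key.replace(".", "/")


def mismatches(settings: dict[str, str], root: str = "/proc/sys") -> dict:
    """Return {key: (current, wanted)} for settings not currently in effect."""
    bad = {}
    for key, wanted in settings.items():
        path = proc_path(key, root)
        current = path.read_text().strip() if path.exists() else None
        if current != wanted:
            bad[key] = (current, wanted)
    return bad


if __name__ == "__main__":
    for key, (current, wanted) in mismatches(parse_sysctl(SYSCTL_CONF)).items():
        print(f"{key}: currently {current}, wanted {wanted}")
```

An empty result means all three values are already active on the running system.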

Like I say though, I haven't properly thought about it. It works for my MACVLAN interfaces, I don't know first-hand if it directly translates to physical interfaces. It obviously won't help if the return path is determined upstream of the interfaces and is incorrect (my MACVLAN interfaces are, of course, all attached to the same copper as the physical interface so there's only a single physical return path involved).

It's something I'm planning to have a play with when I next get a chance to look at my OPiZero (which, btw, now looks like it will be next week, rather than the end of this week).

3 participants