Adds WSARecvFrom as a Windows fallback #882
Conversation
Thanks for the contribution! Before we can merge this, we need @lewishazell to sign the Salesforce Inc. Contributor License Agreement.
Force-pushed from b8df553 to 48ab464
I've now tested this implementation on a faster link between two servers and it's capable of receiving over 200Mb/s. This is slightly slower than the actual link speed of 250Mb/s, but I think the bottleneck is the CPU on the low-powered Linux host.
This does look promising! I have been spending some time lately looking at Windows performance again. I have an old branch that I am preparing a pull request for that does registered IO on Windows. The trouble with these APIs is that they were introduced in ~Windows 8.1/Server 2012 R2. This is where your PR comes in handy. Overlapped IO has been in Windows since long before Windows 7 and its performance is generally close to RIO. It is a fantastic fallback over the current Golang stdlib implementation and captures the remaining Windows market that RIO misses. I'm far from a Windows expert, though, and when I was testing this PR I noticed a couple of issues.
With all that said, it would be fantastic to land this PR. Any chance you can join the Nebula OSS Slack group?
Thanks for checking it out @nbrownus! You definitely caught some things I didn't see. I had a feeling you would mention backwards compatibility and I agree this would make a good fallback here. I noticed there is a udp-interface branch; would this be required for runtime fallback functionality? I'm also not a Windows (or Golang) expert and am just trying to help fix an issue we face! That also explains some of the oddities you see. There isn't any particular reason I chose […]. I'll join the Slack group and try to find some time 😄
Hey @nbrownus, I looked into the issues. I was able to reproduce the high retransmission rate by setting up a test network between a Windows host and WSL. As it was, I was seeing up to 1500 retransmissions per second. Fixing the code by making use of the […] helped. The real improvement happened when I tweaked the buffer sizes in the config file; Windows' default buffer size is much too small here. I also haven't seen […]. Please see the pushed changes 👍
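For anyone reproducing this, the socket buffer sizes mentioned above live under the `listen` section of the Nebula config. The values below are only illustrative (roughly 10 MiB each), not the ones used in this test:

```yaml
listen:
  host: 0.0.0.0
  port: 4242
  # Socket buffer sizes in bytes. Windows' defaults are small, so raising
  # these is what reduced the retransmission rate here. Example values only.
  read_buffer: 10485760
  write_buffer: 10485760
```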
I left a load test running for a few hours and saw an error when attempting to write
I'm not sure creating and closing handles so frequently is a great idea, and it only made the code more complicated. I've reimplemented […].
@lewishazell Do you have a fork/build that I can test on Server 2012R2? I'm testing a nightly that includes #905 but I'm still stuck around 80 MPS according to netcps. Without Nebula, I'm hitting almost 300 MPS. |
Thanks for the contribution! Unfortunately we can't verify the commit author(s): Lewis Hazell <l***@p***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, sign the Salesforce Inc. Contributor License Agreement and this Pull Request will be revalidated.
- This initial implementation only supports IPv4
- Uses WSA methods as an equivalent to SYS_RECVMMSG
- Respects configuration values for buffers and batch sizes
- Refactor ReadMulti to read the buffer at the index given by WSAWaitForMultipleEvents
- WSASendTo method eventually gave "WSAWaitForMultipleEvents: winapi error slackhq#6" under load
- Simplified code with similar performance
…ws-wsa-fallback
Force-pushed from 5857720 to 8980eb2
Hey @robdplatt, my branch was a bit behind so I've merged in changes. I'll just test it out and get a build ready. |
@robdplatt I've created a release on my fork including the WSA fallback. From my testing on 2012 R2 this evening, I'm not sure my changes will help. When RIO isn't supported, the following will appear in the logs:

> Falling back to standard udp sockets
I didn't see this and saw speeds of almost 300Mbps (over WiFi, don't have a great test rig to hand) which suggests RIO is supported in 2012R2. My changes will only help if you see the above message. |
Thank you for sharing these changes. However, I'm not seeing those speeds yet, even between two Windows 11 instances. I also didn't see "Falling back to standard udp sockets" in the logs when testing between two Server 2012 R2 instances. Maybe I'm missing something. Did you happen to specify the buffer sizes in your config? |
No problem! That's strange. As far as I know, RIO (which Windows 11 will use) doesn't use the buffer size config, so if you're not seeing any fallbacks, buffer size won't help. For WSA, I used buffer sizes of […]. You could also try adjusting the MTU if you haven't already. That should apply to all three methods, so it might be your best bet here.
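For reference, the MTU is set under the `tun` section of the Nebula config; 1300 is the shipped default, and the value here is just a placeholder to show where the knob lives:

```yaml
tun:
  # Nebula's default MTU is 1300. Raising it can reduce per-packet overhead
  # on links that support larger frames; tune to your network.
  mtu: 1300
```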
Changing the MTU settings seems to make it worse. Your build and the nightly are giving me the same results. Win11 <--> Win11 on the same switch is getting me about 50MB/s. Without Nebula, I'm getting 100MB/s. Win11 <--> Server2012 R2 (again same switch) drops to about half that. ~25MB/s. This is, however, a lot better than the 5MB/s I was getting last year or so. For these tests I disabled the lighthouse and relays to ensure the clients were talking directly to each other. I don't see any strange errors in the logs either. |
Thanks for the work here. I want to apologize that this never made it into a release. However, the next release of Nebula, v1.9.0, will be dropping support for any version of Windows older than Windows 10 / Server 2016 - this is done because Go 1.21 has also dropped support for older versions. Nebula already has support for RIO which should improve performance for Windows 8+. Therefore, merging this PR would not improve the situation. |
As described in #589, the performance on Windows is very slow. From my experience, this seems to affect reading more than sending. This issue was partially solved in #410 with increased buffer sizes, although this didn't seem to do much on my machine.

Looking at the Linux-specific socket code, I saw there are a number of optimizations. It uses system calls directly instead of the standard library and allows batching by making use of `SYS_RECVMMSG`. So, this pull request makes use of the closest Windows equivalent, `WSARecvFrom`, to allow batching. It also respects buffer sizes from the `listen.write_buffer` and `listen.read_buffer` configuration values.

My home connection is 50-60Mb/s and I am now able to consistently run iperf3 at full speed with these modifications:

Versus almost 4Mb/s with the recently released v1.7.1:

It may be worth testing this implementation with a faster connection, but these results are very promising.