SCTP throughput ~8x slower than TCP when the connection has latency #218
I found this issue from a few years back which looks potentially related? #62
Fantastic testing, @iamcalledrob, thanks for reporting! I don't know the latest, but @enobufs mentioned he was looking at adding RACK to the implementation. Maybe some other things exist that we can do in the short term? I defer to him on what the best thing to do is :)
@iamcalledrob Thanks for the detailed report. As I mentioned in #62, a modern TCP implementation automatically adjusts its receive buffer size, while pion/sctp uses a fixed buffer size, currently set to 1024 KB. The receive buffer size sets the hard limit on the amount of data in flight (the un-ack'd segments). It appears the throughput you observed is the result of this limit. I think pion/sctp should implement this auto-tuning feature. Let me dig up more information about the algorithm.
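For intuition on that limit: with a fixed receive window, throughput is bounded by the window divided by the round-trip time. A minimal sketch of the arithmetic (illustrative constants, not pion/sctp code):

```go
package main

import "fmt"

func main() {
	// With at most rwnd bytes un-ack'd per round trip, throughput can
	// never exceed rwnd / RTT, regardless of link capacity.
	const rwndBytes = 1024 * 1024 // pion/sctp's fixed receive buffer (1024 KB)
	const rttSeconds = 0.3        // illustrative 300 ms round-trip latency

	maxMbps := float64(rwndBytes) * 8 / rttSeconds / 1e6
	fmt.Printf("window-limited throughput ≈ %.0f Mbps\n", maxMbps) // ≈ 28 Mbps
}
```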
@Sean-Der, @enobufs Thank you both for looking into this! @enobufs, regarding the buffer size, I was curious about that as well and decided to do some more testing! I benchmarked the effect of changing the buffer size via your sctptest tool. Here's what I saw with a round-trip latency of 300 ms:
As you can see, the throughput maxes out with a buffer size of 256 KB, smaller than the default 1 MB! Increasing the buffer size beyond 256 KB has no effect. I'm curious about this: I would have assumed that manually setting a larger buffer would improve throughput in this particular test, but that has not been the case. I'm happy to help with getting to the bottom of this, though I have to admit I'm not coming from a place of expertise here, so pointers are very helpful :) I've attached trace logs for both the client and server in case they're helpful context.
That indicates there may be another limiting factor. From my quick glance, the sender's congestion window kept growing during the 10-second period, but it appears to grow too slowly, which smells like a bug... I may be wrong. Let me look at this more carefully later.
I just found a bug in sctptest: the receive buffer size passed on the command line was not actually being applied. What's going on is, the client side has a ssthresh (slow-start threshold) which should initially be set to the receive buffer window advertised by the peer. This value did not change even though I was increasing the configured buffer size.
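To illustrate why a stale, small ssthresh caps throughput, here is a minimal sketch of slow start vs. congestion avoidance growth (illustrative constants, not pion/sctp's actual code):

```go
package main

import "fmt"

func main() {
	const mtu = 1200       // rough bytes per packet, illustrative only
	cwnd := 4 * mtu        // small initial congestion window
	ssthresh := 128 * 1024 // stuck at a small advertised receive window

	for rtt := 1; rtt <= 20; rtt++ {
		if cwnd < ssthresh {
			cwnd *= 2 // slow start: cwnd roughly doubles every RTT
		} else {
			cwnd += mtu // congestion avoidance: ~1 MTU per RTT
		}
	}
	fmt.Printf("cwnd after 20 RTTs: %d KB\n", cwnd/1024)
	// Once cwnd crosses the stale, small ssthresh, growth drops to ~1 MTU
	// per RTT, far too slow to fill a high-latency path in a 10 s benchmark.
}
```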
After fixing the above in sctptest, I was still seeing the throughput capped at a 1 MB receive buffer size (increasing the size beyond 1 MB had no effect). From the trace log, I noticed that even though there was more room in the congestion window, the inflightQueue was not fully utilized. Then I realized this was because sctptest has maxBufferedAmount set to 1 MB (with the threshold at 512 KB, both hardcoded), which was effectively limiting the send buffer size to 1 MB. This is why the throughput did not grow beyond a 1 MB receive buffer size. There was another related issue: bufferedAmount in the current pion/sctp includes the number of bytes in flight. Looking at Firefox's implementation, it does not include the inflight size (i.e., the data already written to usrsctp). After fixing all of the above, I now see the throughput grow proportionally to the size of the buffers, as shown below. I will write some unit tests, then submit a pull request. Once all these fixes land, we should consider whether we should dynamically adjust the receive buffer size.
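For context, the pacing pattern in question looks roughly like this against pion/sctp's Stream API (a sketch only; the 1 MB / 512 KB constants are the hardcoded sctptest values mentioned above):

```go
package example

import "github.com/pion/sctp"

// If bufferedAmount also counts inflight bytes, this cap throttles the
// sender even when the congestion window still has room.
const (
	maxBufferedAmount          = 1024 * 1024 // 1 MB cap
	bufferedAmountLowThreshold = 512 * 1024  // resume writing below 512 KB
)

func pump(stream *sctp.Stream, chunk []byte) error {
	writable := make(chan struct{}, 1)
	stream.SetBufferedAmountLowThreshold(bufferedAmountLowThreshold)
	stream.OnBufferedAmountLow(func() {
		select {
		case writable <- struct{}{}:
		default:
		}
	})
	for {
		for stream.BufferedAmount() >= maxBufferedAmount {
			<-writable // block until the send buffer drains past the threshold
		}
		if _, err := stream.Write(chunk); err != nil {
			return err
		}
	}
}
```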
The following updates have been made:
TODO:
Thank you for looking into this! I'm excited to see what the throughput looks like when the inflight size is excluded :)
With your updated version of go-rudp/sctp-test (passing buffer size into pion/sctp assoc), I see dramatically better throughput. For the curious:
This is faster than TCP. Very excited for the use-cases this unlocks for pion/sctp! |
That's so exciting! Nice work @iamcalledrob and @enobufs :) We should add this to the SettingEngine.
@Sean-Der I think it is a brilliant idea to allow changing the send/recv buffer size from the SettingEngine, as I think it may take some time for us to implement auto buffer size tuning.
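For readers landing here later, a minimal sketch of what that configuration could look like (the method name matches the setting pion/webrtc later exposed for this; verify it against your pion/webrtc version):

```go
package example

import "github.com/pion/webrtc/v3"

// newAPI returns a pion/webrtc API with a larger SCTP receive buffer,
// so more data can be in flight on high-latency paths.
func newAPI() *webrtc.API {
	se := webrtc.SettingEngine{}
	// 8 MB receive buffer; roughly, max throughput <= buffer / RTT.
	se.SetSCTPMaxReceiveBufferSize(8 * 1024 * 1024)
	return webrtc.NewAPI(webrtc.WithSettingEngine(se))
}
```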
@enobufs I'm curious if the commit to exclude the in-flight size is ready to land (or could be!). I've been testing datachannels with the larger buffer offered through the SettingEngine, but am still seeing very low throughput when there's even a tiny amount of packet loss or latency. Testing with sctptest over LAN and the Network Link Conditioner looked really positive, with near 100% link utilisation. However, in the real world I'm still seeing <20% link utilisation over WiFi or coast-to-coast levels of latency. For example, I have a TURN server which is 80 ms away. iperf with a few TCP streams will get 600+ Mbps throughput. Pion datachannels with an 8 MB buffer will get 20 Mbps throughput. I was hoping the larger buffer would be a decent enough fix by itself in the "real world", but that doesn't seem to be the case!
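As a sanity check, the bandwidth-delay product for that path suggests the 8 MB buffer itself should be sufficient; a minimal sketch of the arithmetic, assuming the figures quoted above:

```go
package main

import "fmt"

func main() {
	// Bandwidth-delay product: bytes that must be in flight to sustain a
	// target rate at a given RTT.
	const targetMbps = 600.0 // the iperf TCP figure quoted above
	const rttSeconds = 0.08  // the 80 ms TURN server path

	bdpBytes := targetMbps * 1e6 / 8 * rttSeconds
	fmt.Printf("required window ≈ %.1f MB\n", bdpBytes/(1024*1024)) // ≈ 5.7 MB
	// An 8 MB buffer covers this, so the ~20 Mbps result points at loss
	// recovery / congestion-window growth rather than buffer size.
}
```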
v1.8.2 does …, v1.8.3 fixed …
@Sean-Der Is MaxReceiveBuffer arbitrary or is it supposed to match a UDP socket configuration? What are the tradeoffs if the value is arbitrary?
Your environment.
What did you do?
Test setup:
I ran sctp-test in both TCP and SCTP modes (sctp-test refers to SCTP as "udp") and compared the throughput with varying levels of latency:

- `./sctptest -network tcp -s <addr>` [client] and `./sctptest -network tcp` [server]
- `./sctptest -network udp -s <addr>` [client] and `./sctptest -network udp` [server]

What did you expect?
I expected the TCP and SCTP throughput to be in the same ballpark.
What happened?
SCTP overwhelmingly underperformed TCP. Approximate performance over gigabit ethernet:
Background
I ran into this performance issue when trying to diagnose why DataChannel performance was very slow when using a TURN server. My TURN server is about 100ms away, and I was seeing very poor performance (< 10 Mbps) relative to the speed of the internet connection.
(Video attachment: SCTP.throughput.and.latency.mp4)