
SCTP throughput ~8x slower than TCP when the connection has latency #218

Open
iamcalledrob opened this issue Mar 26, 2022 · 15 comments

@iamcalledrob

iamcalledrob commented Mar 26, 2022

Your environment.

  • Version: sctp v1.8.2
  • Other Information: tested between macOS<–>macOS and macOS<–>debian bullseye

What did you do?

Test setup:

  • Throughput tested with @enobufs's sctptest, dependencies updated to use the latest version of pion/sctp
  • Latency controlled using macOS "Network Link Conditioner"
  • Tested between separate physical devices connected by gigabit ethernet

I ran sctp-test in both TCP and SCTP modes (sctp-test labels SCTP mode "udp", since it runs SCTP over a UDP conn) and compared the throughput at varying levels of latency:

  • For TCP: ./sctptest -network tcp -s <addr> [client] and ./sctptest -network tcp [server]
  • For SCTP: ./sctptest -network udp -s <addr> [client] and ./sctptest -network udp [server]

What did you expect?

I expected the TCP and SCTP throughput to be in the same ballpark.

What happened?

SCTP overwhelmingly underperformed TCP. Approximate performance over gigabit ethernet:

| Latency | TCP      | SCTP     |
| ------- | -------- | -------- |
| 2 ms    | 850 Mbps | 850 Mbps |
| 20 ms   | 800 Mbps | 200 Mbps |
| 50 ms   | 600 Mbps | 85 Mbps  |
| 100 ms  | 320 Mbps | 45 Mbps  |
| 300 ms  | 105 Mbps | 17 Mbps  |

[chart: TCP vs. SCTP throughput at each latency]

Background

I ran into this performance issue when trying to diagnose why DataChannel performance was very slow when using a TURN server. My TURN server is about 100ms away, and I was seeing very poor performance (< 10 Mbps) relative to the speed of the internet connection.

SCTP.throughput.and.latency.mp4
@iamcalledrob
Author

I found this issue from a few years back which looks potentially related? #62

@Sean-Der
Member

Fantastic testing @iamcalledrob thanks for reporting!

I don't know the latest but @enobufs mentioned he was looking at adding RACK to the implementation. Maybe some other things exist that we can do in the short term? I defer to him on what the best thing to do is :)

@enobufs
Member

enobufs commented Mar 27, 2022

@iamcalledrob Thanks for the detailed report. As I mentioned in #62, a modern TCP implementation automatically adjusts its receive buffer size, while pion/sctp uses a fixed buffer size, currently set to 1024 KB. The receive buffer size sets a hard limit on the amount of data in flight (the un-ack'd segments). It appears the throughput you observed is the result of this limit. I think pion/sctp should implement this auto-tuning feature. Let me dig up more details on the algorithm.

@iamcalledrob
Author

iamcalledrob commented Mar 28, 2022

@Sean-Der, @enobufs Thank you both for looking into this!

@enobufs regarding the buffer size, I was curious about that as well and decided to do some more testing!

I benchmarked the effect of changing the buffer size via your sctp-test utility, i.e. sctp-test -network udp -b N. I believe this sets both the read and write buffer sizes to N bytes on the SCTP association as well as on the underlying UDP conn.

Here's what I saw with round-trip latency of 300ms

| Recv buffer size | Throughput |
| ---------------- | ---------- |
| 16 KB            | 0.5 Mbps   |
| 64 KB            | 13 Mbps    |
| 256 KB           | 17 Mbps    |
| 1 MB             | 17 Mbps    |
| 4 MB             | 17 Mbps    |

As you can see, the throughput maxes out at a buffer size of 256 KB, smaller than the default 1 MB! Increasing the buffer size beyond 256 KB has no effect.

I'm curious about this—I would have assumed that manually setting a larger buffer would have improved throughput in this particular test, but that has not been the case.

I'm happy to help with getting to the bottom of this, though I have to admit I'm not coming from a place of expertise here so pointers are very helpful :)

I've attached trace logs for both the client and server in case it's helpful context.

client.log
server.log

@enobufs
Member

enobufs commented Mar 29, 2022

That indicates there is another limiting factor. From my quick glance, the sender's congestion window kept growing during the 10-sec period, but it appears to grow too slowly. Smells like a bug... I may be wrong. Let me look at this more carefully later.

@enobufs
Member

enobufs commented Apr 3, 2022

I just found a bug in sctptest. The value passed with the -b option on the server did not propagate down to pion/sctp :(

What's going on: the client side's ssthresh (slow-start threshold) should initially be set to the receive buffer window advertised by the peer. This value did not change even though I was increasing it with -b, which is how I found the bug. I will fix it shortly.
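For context, that initialization follows RFC 4960 §7.2.1. A simplified sketch (not pion/sctp's actual code) of why the peer's advertised window caps the sender:

```go
package main

import "fmt"

// initialSlowStart sketches RFC 4960 §7.2.1 (simplified):
// cwnd starts at min(4*MTU, max(2*MTU, 4380 bytes)), and ssthresh starts
// at the peer's advertised receive window (a_rwnd). Slow start only grows
// cwnd toward ssthresh, so a peer advertising a small window limits the
// sender regardless of how much capacity the path has.
func initialSlowStart(mtu, peerRwnd uint32) (cwnd, ssthresh uint32) {
	cap := 2 * mtu
	if cap < 4380 {
		cap = 4380
	}
	cwnd = 4 * mtu
	if cwnd > cap {
		cwnd = cap
	}
	return cwnd, peerRwnd
}

func main() {
	cwnd, ssthresh := initialSlowStart(1200, 1024*1024)
	fmt.Println(cwnd, ssthresh)
}
```

With the server-side -b bug, ssthresh stayed pinned at the default advertised window no matter what was passed on the command line.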

@enobufs
Member

enobufs commented Apr 3, 2022

After fixing the above in sctptest, I was still seeing throughput capped at a 1 MB receive buffer size (increasing the size beyond 1 MB had no effect).

sctptest RTT=300ms
Screen Shot 2022-04-03 at 1 40 29 AM

From the trace log, I noticed that even though there was more room in the congestion window, the inflightQueue was not fully utilized. Then I realized this was because sctptest has maxBufferedAmount set to 1 MB (with the threshold at 512 KB, both hardcoded), which was effectively limiting the send buffer size to 1 MB. This is why the throughput did not grow beyond a 1 MB receive buffer size.
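The send-side back-pressure sctptest applies can be sketched like this (the constants are the hardcoded ones mentioned above; the function is illustrative, not sctptest's actual code):

```go
package main

import "fmt"

// Hardcoded values from sctptest, per the comment above.
const (
	maxBufferedAmount          = 1024 * 1024 // stop writing at 1 MB buffered
	bufferedAmountLowThreshold = 512 * 1024  // resume below 512 KB buffered
)

// canWrite sketches the writer's back-pressure decision: writing pauses
// when bufferedAmount hits the cap and stays paused until it falls to the
// low threshold, so the effective send buffer never exceeds
// maxBufferedAmount no matter how large the receive window is.
func canWrite(bufferedAmount int, paused bool) (write, stillPaused bool) {
	if paused && bufferedAmount > bufferedAmountLowThreshold {
		return false, true
	}
	if bufferedAmount >= maxBufferedAmount {
		return false, true
	}
	return true, false
}

func main() {
	fmt.Println(canWrite(maxBufferedAmount, false)) // at the cap: pause
	fmt.Println(canWrite(500*1024, true))           // below threshold: resume
}
```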

There was another related issue. bufferedAmount in the current pion/sctp includes the number of bytes in flight. Looking at Firefox's implementation, it does not include the inflight size (the bytes already written to usrsctp).
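A minimal sketch of the two accounting styles (field and function names are illustrative, not pion/sctp's actual internals):

```go
package main

import "fmt"

// sendState is an illustrative model of a sender's outgoing data.
type sendState struct {
	pendingBytes  int // queued by the application, not yet sent
	inflightBytes int // sent, awaiting SACK
}

// bufferedAmountOld is the pre-fix accounting: inflight bytes count
// against maxBufferedAmount, so a large congestion window alone can
// stall the application writer.
func bufferedAmountOld(s sendState) int { return s.pendingBytes + s.inflightBytes }

// bufferedAmountNew follows the Firefox/usrsctp convention described
// above: only data not yet handed to the transport counts.
func bufferedAmountNew(s sendState) int { return s.pendingBytes }

func main() {
	s := sendState{pendingBytes: 256 << 10, inflightBytes: 768 << 10}
	fmt.Println(bufferedAmountOld(s), bufferedAmountNew(s))
}
```

Under the old accounting this state already hits a 1 MB cap; under the new one the application can keep writing while the window fills.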

After fixing all of the above, the throughput now grows proportionally to the size of the buffers, as shown below:
Screen Shot 2022-04-03 at 2 09 41 AM

Other related metrics:
Screen Shot 2022-04-03 at 2 18 25 AM

Screen Shot 2022-04-03 at 2 18 46 AM

I will write some unit tests then submit a pull-request.

Once all these fixes land, we should consider whether we should dynamically adjust the receive buffer size.

@enobufs
Member

enobufs commented Apr 3, 2022

The following updates have been made:

  • go-rudp v0.0.3
  • sctp-test v0.0.2

TODO:

  • pion/sctp: exclude inflight size from bufferedAmount

@iamcalledrob
Author

Thank you for looking into this! I'm excited to see what the throughput looks like when inflight size is excluded :)

@iamcalledrob
Author

With your updated version of go-rudp/sctp-test (passing buffer size into pion/sctp assoc), I see dramatically better throughput.

For the curious:

| RTT    | Recv buffer size | Throughput   |
| ------ | ---------------- | ------------ |
| 20 ms  | 1 MB             | 370 Mbps     |
| 20 ms  | 2 MB             | 700 Mbps     |
| 20 ms  | 4 MB             | 880 Mbps     |
| 50 ms  | 1 MB             | 150 Mbps     |
| 50 ms  | 2 MB             | 300 Mbps     |
| 50 ms  | 4 MB             | 590 Mbps     |
| 50 ms  | 8 MB             | 800 Mbps     |
| 300 ms | 1 MB             | 25 Mbps      |
| 300 ms | 2 MB             | 55 Mbps      |
| 300 ms | 4 MB             | 100 Mbps     |
| 300 ms | 8 MB             | 200 Mbps     |
| 300 ms | 24 MB            | 310 Mbps     |
| 300 ms | 32 MB+           | (Unreliable) |

This is faster than TCP.

To repro these tests, I had to make a small fix to cap conn.SetReadBuffer/SetWriteBuffer to ~5 MB on macOS (the OS UDP limit) while allowing the SCTP buffer to be the specified size.
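The cap described might look roughly like this (the 5 MB limit is the assumed macOS ceiling from the comment above, and the function names are illustrative, not the actual patch):

```go
package main

import (
	"fmt"
	"net"
)

// maxOSBuffer is the ~5 MB macOS UDP socket cap mentioned above
// (an assumed constant here, not queried from the OS).
const maxOSBuffer = 5 * 1024 * 1024

// clampedBuffer returns a size safe to hand to the OS socket, while the
// SCTP association keeps the full requested buffer.
func clampedBuffer(requested int) int {
	if requested > maxOSBuffer {
		return maxOSBuffer
	}
	return requested
}

// configureConn applies the clamp to the underlying UDP conn only.
func configureConn(conn *net.UDPConn, sctpBufSize int) error {
	osBuf := clampedBuffer(sctpBufSize)
	if err := conn.SetReadBuffer(osBuf); err != nil {
		return err
	}
	return conn.SetWriteBuffer(osBuf)
}

func main() {
	fmt.Println(clampedBuffer(24*1024*1024), clampedBuffer(1024))
}
```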

Very excited for the use-cases this unlocks for pion/sctp!

@Sean-Der
Member

That's so exciting! Nice work @iamcalledrob and @enobufs :)

We should add this to the SettingEngine today and publish the findings!

@enobufs
Member

enobufs commented Apr 20, 2022

@Sean-Der I think it is a brilliant idea to allow changing the send/recv buffer size from the SettingEngine, as it may take some time for us to implement automatic buffer size tuning.

enobufs added a commit that referenced this issue Apr 23, 2022
enobufs added a commit to pion/webrtc that referenced this issue Apr 23, 2022
enobufs added a commit to pion/webrtc that referenced this issue Apr 28, 2022
@iamcalledrob
Author

@enobufs I'm curious if the commit to exclude the in-flight size is ready to land (or could be!)

I've been testing data channels with the larger buffer offered through the SettingEngine, but am still seeing very low throughput when there's even a tiny amount of packet loss or latency.

Testing with sctptest over LAN and the network link conditioner looked really positive--near 100% link utilisation.

However in the real world I'm still seeing <20% link utilisation over WiFi or coast-to-coast levels of latency.

For example, I have a TURN server which is 80 ms away. iperf with a few TCP streams gets 600+ Mbps throughput; Pion data channels with an 8 MB buffer get 20 Mbps.

I was hoping the larger buffer would be a decent enough fix by itself in the "real world", but that doesn't seem to be the case!

@suconghou

@enobufs

v1.8.2 performs worse than TCP with even a tiny amount of packet loss or latency.

Does v1.8.3 fix, or at least improve, the remaining TODO ("pion/sctp: exclude inflight size from bufferedAmount")?

@MousyBusiness

@Sean-Der Is MaxReceiveBuffer arbitrary or is it supposed to match a UDP socket configuration? What are the tradeoffs if the value is arbitrary?
