Inconsistent TURN throughput #1036
I'm glad libdatachannel has been useful to you!
I assume the video is generated at a given rate and carried as non-real-time data over data channels. As a side note, this can be problematic if you want to play the video in real time at reception, especially in reliable mode: if the data channel can't sustain that rate through the network at some point, it will start buffering outgoing data, introducing uncontrolled delays. Is there a reason not to use the actual media transport?
About the throughput issue itself, what is the OS? Is the issue specific to one OS or is it independent of it?
If swapping the servers does not change the behavior, I guess the issue is probably somewhere else. TCP transport should be avoided in practice, especially for running data channels. It's really just a hack for networks blocking UDP and it may not perform well.
Are you referring to the receive rate? Could you please post the output? The buffer size setting in the client (corresponding to the buffered amount threshold for the data channel) should only have a marginal impact on network behavior, even if setting it higher than zero can help sustain the throughput, as it makes sure the application always sends new data before the data channel runs out of data to send.
When you run tests, what do you compare the fluctuating TURN throughput to? For instance, the throughput across the Internet is expected to be lower than on a local network, which in itself will be somewhat lower than on the same machine. Are you sure the issue happens only when using a TURN server, and not between two peers across the Internet?
What is the RTT between the clients and the TURN server? Just pings from the clients to the TURN server would give an idea. Is it consistent with what client-benchmark reports?
You could start by checking CPU usage on clients, to ensure a core is not maxed out. Especially if your connection has high capacity, depending on your OS and system settings, you could then try to increase the maximum UDP socket buffer size to 1MB on all machines and use libjuice/violet to test (for Linux it can be done with [...]).
Eventually, if the sum of RTTs is higher than a few 10s of ms, you could try to increase the SCTP buffers used by data channels with
rtc::SctpSettings settings;
settings.recvBufferSize = 10 * 1024 * 1024;
settings.sendBufferSize = 10 * 1024 * 1024;
rtc::SetSctpSettings(settings);
You may also want to test with current master to see if it changes the behavior.
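As a rough guide for sizing those SCTP buffers, the bandwidth-delay product gives the number of bytes that must be in flight to sustain a given rate over a given round-trip time. The sketch below is only an estimate under that simple model; `bdpBytes` is a hypothetical helper, not part of libdatachannel, and the 10 MiB/s target used in the note is an assumed figure, not a measurement from this thread.

```cpp
#include <cstdint>

// Bandwidth-delay product: bytes that must be in flight to sustain
// `bytesPerSecond` over a path whose round-trip time is `rttMs`.
// Hypothetical helper for illustration only.
constexpr std::uint64_t bdpBytes(std::uint64_t bytesPerSecond, std::uint64_t rttMs) {
    return bytesPerSecond * rttMs / 1000;
}
```

For example, `bdpBytes(10ull * 1024 * 1024, 100)` is 1048576, so sustaining an assumed 10 MiB/s at 100 ms RTT needs at least 1 MiB of buffer on each side, while the same rate at 50 ms RTT needs half of that. Buffers smaller than this value cap the throughput regardless of link capacity.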
Only the API was refactored to make it more consistent with the JavaScript WebRTC API; the actual behavior should not change. However, independently of the TURN throughput issue, you should definitely look into it (especially
-
Thanks very much for your response. I should've clarified a few things.
We transmit live and recorded video over data channels. In this case, the customer is reporting a fluctuating frame rate on live video, which we attempt to deliver as close to real time as possible, i.e., we monitor the buffered amount and "skip" segments as needed. We are very interested in implementing audio and video tracks in the future and appreciate the streamer example.
We ship Ubuntu Server on our appliances, but we have tested client-benchmark on several macOS and Ubuntu client machines.
Agreed, TCP transport was a "long shot" we could test with minimal code changes after eliminating our coturn server and reproducing the issue using xirsys.
Below are the results I captured from our existing coturn server and an upgraded droplet. This is (slightly modified) client-benchmark v0.18.5.
Below are results after switching to my Linux box and swapping out coturn for violet on the same droplet. We also tested with v0.19.3 and with 1MB buffer size. The first run was promising, but the next run was like the others.
Finally, I tested with violet on the same machine and then on my local network, but I didn't capture the output beyond what I described initially.
When we run tests, we're comparing the throughput across each second of the run. Agreed regarding the same machine vs. local network vs. Internet throughput, but the worsening fluctuation is the issue. This has us suspecting latency. This customer has no issue with the video performance on their local network, but I'm happy to compare the tests.
A quick ping test of our upgraded droplet shows an RTT similar to what client-benchmark reports: ~50ms.
I will rerun the client-benchmark test a few more times and watch CPU more closely. I will adjust the maximum UDP socket buffer size too. I also need to run these tests with v0.19.3 and/or master on an Ubuntu 22.04 machine I set up over the weekend.
After I posted, we started down the latency path and came across a few issues/discussions that led us to congestion control: #354 and sctplab/usrsctp#297. To test SCTP_CC_HTCP, we upgraded our application image from bullseye to bookworm to get the latest libusrsctp and ran a speed test (1MB blob) in our player. Dan did report an RTT of 70-200ms (100ms on average) in this test. We also tried setting
-
If you want to be close to real-time, you should also make the datachannel unreliable, preferably with
Thank you for the results. Variations between each second are expected as other traffic might share the same network path during the test, especially at such a high RTT. The higher the RTT, the longer the recovery time after a congestion event. Have you tried deploying the TURN server closer to the peers to reduce RTT? Or have you compared with a scenario involving two peers across the Internet by any chance? To me, it looks like the throughput has a pattern of increasing for a few seconds then dropping randomly, which might indicate spurious packet loss. It would be interesting to look at runs with a larger UDP send buffer, as a small UDP buffer might cause such losses.
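To put a number on how recovery time grows with RTT, here is a minimal sketch of the textbook loss-recovery model used by standard SCTP congestion control: a loss halves the congestion window, which then grows by about one MTU per round trip. This is a simplification (no slow start, no delayed acknowledgements, a single isolated loss), `recoveryAfterLoss` is a hypothetical helper, and the 1200-byte MTU in the note is an assumed value.

```cpp
#include <cstdint>

struct Recovery {
    std::uint64_t rtts; // round trips needed to regain the pre-loss window
    double seconds;     // the same duration as wall-clock time
};

// Simplified model: one loss halves the window, then the window grows
// by one MTU per round trip until it is back at its previous size.
inline Recovery recoveryAfterLoss(std::uint64_t windowBytes,
                                  std::uint64_t mtuBytes,
                                  double rttSeconds) {
    const std::uint64_t deficit = windowBytes - windowBytes / 2;
    const std::uint64_t rtts = (deficit + mtuBytes - 1) / mtuBytes; // round up
    return {rtts, static_cast<double>(rtts) * rttSeconds};
}
```

With a 1 MiB window (about 10 MiB/s at 100 ms RTT), `recoveryAfterLoss(1048576, 1200, 0.1)` gives 437 round trips, roughly 44 seconds, which is most of a 60 s benchmark run. At 50 ms RTT the same rate only needs a 512 KiB window, and `recoveryAfterLoss(524288, 1200, 0.05)` gives 219 round trips, roughly 11 seconds. In this model, halving the RTT at constant throughput cuts the recovery time by about four.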
Keep me updated about the results!
The conclusion of the congestion control discussion in #354 was actually that alternative congestion control algorithms in libusrsctp are not well tested and should be avoided for the sake of performance, so you should stick to the default one. Debian does not ship the latest libusrsctp: it looks like the version shipped in bookworm is 0.9.5.0, which is nearly 3 years old. I guess that you set
As I understand from the graphs, you send to a Chrome browser, while the previous tests were performed with client-benchmark. Tuning the send buffer size probably won't have any effect when sending to a browser, as you'll end up constrained by the receive buffer size on the browser side (which is 1MB if I'm not mistaken). Have you tested setting the SCTP send and receive buffers to 10MB on both sides with client-benchmark?
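The reason a larger send buffer does not help against a small receive buffer can be sketched as follows: the amount of data in flight per round trip is bounded by the smaller of the two buffers. This is a rough upper bound under that assumption, not a description of any particular SCTP implementation, and `throughputCeiling` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstdint>

// Upper bound on throughput in bytes per second when at most
// min(send buffer, receive buffer) bytes can be in flight per round trip.
// Hypothetical helper for illustration only.
inline std::uint64_t throughputCeiling(std::uint64_t sendBufBytes,
                                       std::uint64_t recvBufBytes,
                                       std::uint64_t rttMs) {
    return std::min(sendBufBytes, recvBufBytes) * 1000 / rttMs;
}
```

At 100 ms RTT, a 10 MiB send buffer facing a 1 MiB receive buffer gives the same ceiling as 1 MiB on both sides (10485760 bytes per second), whereas 10 MiB on both sides raises it tenfold. That is why the comparison is only meaningful when both ends are configurable, as with client-benchmark.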
-
Thanks for the suggestion regarding real-time data channels and for confirming tracks are the way to go.
Sorry for the libusrsctp confusion. We moved to bookworm to test with the latest version of libnice and I completely forgot that libusrsctp is a submodule. We have never compiled with system usrsctp, so it was a no-op.
I reset and went through 6 different test scenarios with today's version of master. Great point about the browser. This is all client-benchmark with one side send-only.
2023-11-21
libdatachannel master
First practice run showed fluctuations down and up and CPU was 10-20%.
2023-11-21-01.txt baseline, CPU 5-15%
sudo sysctl -w net.core.rmem_max=1048576
2023-11-21-04.txt even better, similar CPU
Switched from coturn to violet on same droplet
2023-11-21-07.txt starts strong, craters
root@coturn-nonprod-highperf:
2023-11-21-10.txt pretty bad
Increase SCTP send and recv buffer sizes to 10MB
2023-11-21-13.txt still bad
diff.patch contains all client-benchmark changes
Restored default SCTP and UDP buffer sizes
2023-11-21-17.txt much higher bandwidth, CPU 55-65%, 15% fluctuation
Even with the larger UDP and SCTP buffer sizes, we still seem to encounter congestion events and struggle to recover. Short of deploying the TURN server even closer, is there anything else we should try before switching to tracks or unreliable data channels? I have not tested/compared with two peers across the Internet yet.
Thanks for your time and consideration on this.
-
Hi Paul-Louis,
I'm an engineer at Qumulex, Inc. and we have been using libdatachannel in our commercial physical security platform for several years now. Overall, it has performed very well and we appreciate all the updates and fixes that come along.
Recently, one of our customers reported fluctuating video performance when outside their firewall. We learned the traffic was being relayed through our TURN server, which is coturn running on a DigitalOcean droplet. We began troubleshooting and ended up upgrading our droplet to dedicated CPU with a 10Gbps network, swapping out libjuice for libnice, enabling TCP transport, evaluating xirsys, swapping out coturn for violet, etc., and could not get consistent throughput.
Finally, we started testing with the client-benchmark example and observed the same fluctuating throughput. (We tweaked it to accept a TURN server and use relay only.) Over a 60s run, it starts out very well (1000s of KB/s), but then decreases (100s of KB/s) and struggles to recover. We experimented with the buffer size, which seemed to increase total throughput, but we still saw drops. When I run client-benchmark and violet on the same box, throughput is very high (10,000s of KB/s) and consistent. When I move violet to another box on my home network, it becomes less consistent and overall throughput goes down 25%.
Are there any other troubleshooting tips you would recommend? We transmit H.264, so we haven't really experimented with unordered delivery or the reliability settings. I did notice that was recently refactored. I believe my colleague Luke P. has reached out to you on Discord regarding this, but I wanted to add a little more detail. I'm also happy to provide the client-benchmark diff and output from test runs. Lastly, we ship v0.18.5, but tested with v0.19.3 and violet v0.4.4.
Thanks in advance,
TJ