private-octopus / picoquic
Cubic performance drops on Satellite link with random packet losses #1221
From looking at the logs, I think I know what is happening. I see in the tests that the window drops immediately after a packet loss. It eventually converges to roughly 1/loss-rate packets, which is far too slow. I believe the Linux TCP implementation of Cubic has some kind of filter, and only reduces the window when it sees several packet losses. That is a way to disambiguate between "congestion" losses, which should trigger a slowdown, and "random" losses caused by transmission issues, which should not. My implementation of Cubic does not do that; that is a bug that should be fixed.
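The filter idea described above can be sketched as follows. This is an illustration, not picoquic's actual code: the struct layout, the function names, and the threshold of 3 losses per RTT are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical sketch of a packet-loss filter: treat losses as a
 * congestion signal only once several losses are seen within one RTT,
 * so that a single random transmission loss does not shrink the window.
 * Threshold and state layout are illustrative, not picoquic's code. */

#define LOSS_FILTER_THRESHOLD 3 /* assumed: 3+ losses => congestion */

typedef struct {
    uint64_t last_loss_time;   /* time of the previous loss, microseconds */
    uint64_t rtt;              /* current smoothed RTT, microseconds */
    int recent_loss_count;     /* losses seen within the last RTT */
} loss_filter_t;

/* Returns 1 if this loss should trigger a congestion response. */
int loss_filter_on_loss(loss_filter_t *f, uint64_t now)
{
    if (now - f->last_loss_time > f->rtt) {
        f->recent_loss_count = 0; /* old losses age out after one RTT */
    }
    f->last_loss_time = now;
    f->recent_loss_count++;
    return f->recent_loss_count >= LOSS_FILTER_THRESHOLD;
}
```

With a 2.56% random loss rate, isolated losses pass through the filter without reducing the window, while a genuine congestion event, which produces a cluster of losses within one RTT, still triggers the usual Cubic reduction.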
Can you look at the ACK frequency on the return link? One difference I see between QUIC and TCP is the ACK frequency. If your QUIC stack tries to minimize the acknowledgement flow to 1:10 or less (as lsquic does when implementing draft-ietf-quic-ack-frequency with a target of 1 ACK per RTT), it could explain bursts of losses and, more importantly, it would not "sense" congestion through the RTT increase at the right time.
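An ack-frequency policy of the kind mentioned above can be sketched roughly as follows. The names, the struct, and the defaults are assumptions for illustration, in the spirit of draft-ietf-quic-ack-frequency; this is not lsquic's or picoquic's implementation.

```c
#include <stdint.h>

/* Illustrative ack-frequency policy: acknowledge every Nth
 * ack-eliciting packet, or once max_ack_delay has elapsed since the
 * first unacknowledged packet. A 1:10 ratio corresponds to
 * packet_tolerance = 10. */

typedef struct {
    uint64_t packet_tolerance;    /* e.g. 10 for a 1:10 ACK ratio */
    uint64_t max_ack_delay_us;    /* cap on how long an ACK may wait */
    uint64_t unacked_count;       /* ack-eliciting packets not yet acked */
    uint64_t oldest_unacked_time; /* arrival of the first unacked packet */
} ack_policy_t;

/* Called on each ack-eliciting packet; returns 1 if an ACK goes out now. */
int ack_policy_on_packet(ack_policy_t *p, uint64_t now)
{
    if (p->unacked_count == 0) {
        p->oldest_unacked_time = now;
    }
    p->unacked_count++;
    if (p->unacked_count >= p->packet_tolerance ||
        now - p->oldest_unacked_time >= p->max_ack_delay_us) {
        p->unacked_count = 0; /* ACK is sent, state resets */
        return 1;
    }
    return 0;
}
```

The concern raised above is visible in the sketch: with a large `packet_tolerance`, the sender gets fewer and sparser RTT samples, so a delay-based congestion signal arrives late.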
@FlavienRJ Please look at PR #1222. It fixes the issue by inserting a filter on packet-loss events, so that isolated or double losses do not affect congestion control. This also fixes the variability issues found with Hystart, because early losses were causing an early exit from Hystart. The ACK frequency does not affect matters much, as long as ACKs are not too far apart.
Fixed in PR #1222
On 8/19/2021 3:38 AM, PETROU Matthieu wrote:
I’m currently running some tests to compare the download times of QUIC (picoquic) and TCP (iperf3) on a satellite connection, and I saw some large differences in the results that I wanted to share with you.
I use OpenSAND to emulate the satellite link. The bandwidth between the server and the client is 12 Mb/s on the forward link and 3 Mb/s on the return link. The buffer size on the gateway is fixed at a bit more than the BDP (1.2 times the BDP). I emulate packet losses with time traces generated with the Gilbert-Elliott model, with p = 0.01 and q = 0.167. Overall, the link is in a loss state 2.56% of the time, with an average burst length of 16.70 ms and a maximum burst length of 43 ms.
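For reference, the two-state loss model named above can be sketched as follows. This is an illustration, not the OpenSAND trace generator: the function names and the xorshift generator are assumptions, and treating every Bad-state step as a loss is a simplification (the full Gilbert-Elliott model also drops only a fraction of packets within each state, which is why the measured 2.56% loss time can differ from the raw p/(p+q) ≈ 5.6% Bad-state fraction).

```c
#include <stdint.h>

/* Minimal sketch of a two-state Gilbert-Elliott loss process:
 * p is the Good->Bad transition probability per step, q the
 * Bad->Good probability. Steps taken in the Bad state are losses.
 * Illustrative only; not the OpenSAND trace generator. */

typedef struct { int bad; uint64_t rng; } ge_state_t;

static double ge_rand(ge_state_t *s) /* xorshift64, uniform in [0,1) */
{
    s->rng ^= s->rng << 13;
    s->rng ^= s->rng >> 7;
    s->rng ^= s->rng << 17;
    return (double)(s->rng >> 11) / 9007199254740992.0; /* 2^53 */
}

/* Advance one step; returns 1 if this step is a loss (Bad state). */
int ge_step(ge_state_t *s, double p, double q)
{
    if (s->bad) {
        if (ge_rand(s) < q) s->bad = 0;
    } else {
        if (ge_rand(s) < p) s->bad = 1;
    }
    return s->bad;
}
```

With p = 0.01 and q = 0.167 the chain has a mean Bad-state sojourn of 1/q ≈ 6 steps, which is what gives the bursty loss pattern rather than uniformly scattered losses.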
For the tests, I download 20 MB of data with either iperf3 or picoquic. Either there is a single flow downloading 20 MB, or five concurrent flows, each downloading 20 MB. To make the concurrent flows comparable, I launch 5 separate iperf3 or picoquic processes, one per flow (on different ports). Both QUIC and TCP use CUBIC and have Hystart disabled. I’m using iperf 3.7+ (cJSON 1.5.2) and picoquic v0.34f.
Here are the median download times and the standard deviation over 30 tests in those conditions:
If I remove all link losses, here are the median download times and the standard deviation over 30 tests:
With those results, it seems that QUIC (picoquic) is much more affected by losses on a satellite link. However, I don’t really understand why, as both use the CUBIC algorithm and Hystart is disabled.
I launched 30 tests in the same conditions with CUBIC and Hystart enabled, on both TCP and QUIC; here are the results:
The results are worse for TCP CUBIC with Hystart. For QUIC, they are worse with a single flow and slightly better with five concurrent flows.
Of course, there are packet losses without Hystart, mainly because slow start needs to overshoot to exit the start phase. Hystart would avoid that by measuring the RTT, which increases as the buffer fills. However, the issue is that the flow exits the start phase when the RTT exceeds the minimum RTT by 16 ms. With a satellite link the jitter is significant, and a 16 ms difference is easily reached. That’s why we decided to disable Hystart for our tests with TCP and QUIC. We care more about the download time than about packet losses.
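The delay-based exit condition described above boils down to a single comparison. A minimal sketch, using the 16 ms figure from the discussion (real Hystart implementations typically clamp this threshold as a function of the RTT, so the constant here is a simplification):

```c
#include <stdint.h>

/* Sketch of the Hystart delay-based exit check: leave slow start once
 * a measured RTT exceeds the minimum RTT by more than a threshold.
 * 16 ms is the figure from the discussion; on a satellite link with
 * high jitter this condition fires too easily. Illustrative only. */

#define HYSTART_DELAY_THRESHOLD_US 16000 /* 16 ms */

/* Returns 1 if slow start should end based on RTT inflation. */
int hystart_delay_exit(uint64_t rtt_sample_us, uint64_t min_rtt_us)
{
    return rtt_sample_us > min_rtt_us + HYSTART_DELAY_THRESHOLD_US;
}
```

On a geostationary path with a minimum RTT around 600 ms, a fixed 16 ms margin is well within normal jitter, so the check can trip long before the bottleneck buffer actually starts filling.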