Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Congestion controller fixes #975

Merged
merged 1 commit into from
Jan 15, 2021
Merged

Congestion controller fixes #975

merged 1 commit into from
Jan 15, 2021

Conversation

Matthias247
Copy link
Contributor

This change fixes 2 issues in the congestion controller:

Extra slow slow start

The congestion controller had an issue where it didn't ramp up by
doubling the window in the slow start phase as expected. The reason
here was the usage of the "congestion_blocked" field, which aims to
remove the avoid growing the window when there is no data to send.
The issue with the field was that the first incoming ACK for every batch
would increase the congestion window, which would let the next
call to congestion_blocked() return false, and thereby let the
congestion controller ignore all of the following ACKs in a received
batch up the point where the next poll_transmit() call is made and
the window is filled again. Since often all ACKs for one round-trip
arrive in a single batch that means only the first ACK had an effect
and the others where ignored -> which lead to increasing the
window by 1 MTU instead of doubling it in the slow start phase.

The new approach introduces a new app_limited field which
caches whether the last poll_transmit attempt couldn't produce
data because there was no application data available.

Numeric precision issue in congestion avoidance

The formula

self.window += self.config.max_datagram_size * bytes / self.window;

which is specified in the RFC requires floating point arithmetic.
If integer arithmetic is used, the multiplication of the first 2 values
yields about 1.7MB for 1300 byte packets. As soon as the window is
bigger than this value, the window won't change at all due the
division leading to 0.

The new implementation follow this guidance from the quic recovery
specification:

In congestion avoidance, implementers that use an integer
representation for congestion_window should be careful with division,
and can use the alternative approach suggested in Section 2.1 of
[RFC3465].

Copy link
Collaborator

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but it probably also needs a review from @Ralith.

quinn-proto/src/connection/mod.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@Ralith Ralith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is much needed!

if self.bytes_acked >= self.window {
self.bytes_acked -= self.window;
self.window += self.config.max_datagram_size;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be helpful to cite the RFC we got this logic from.

@@ -422,6 +428,8 @@ where
|| self.path_response.is_some()
});

let mut was_congestion_blocked = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere (e.g. handle_packet) we use "was_*" to refer to the state prior to running some logic which may change that state, whereas this is referring to the state after we're done. Could we say is_ here instead? Also, maybe transport_blocked (as opposed to application) since pacing can be responsible as well as congestion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with removing was_. But I think even with pacing congestion_blocked is fine, because pacing is a form of congestion control

if self.window >= self.ssthresh {
// Exiting slow start
// Initialize `bytes_acked` for congestion avoidance
self.bytes_acked = self.window - self.ssthresh
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this logic based on? Also, style nit:

Suggested change
self.bytes_acked = self.window - self.ssthresh
self.bytes_acked = self.window - self.ssthresh;

This change fixes 2 issues in the congestion controller:

Extra slow slow start
===

The congestion controller had an issue where it didn't ramp up by
doubling the window in the slow start phase as expected. The reason
here was the usage of the "congestion_blocked" field, which aims to
remove the avoid growing the window when there is no data to send.
The issue with the field was that the first incoming ACK for every batch
would increase the congestion window, which would let the next
call to `congestion_blocked()` return false, and thereby let the
congestion controller ignore all of the following ACKs in a received
batch up the point where the next `poll_transmit()` call is made and
the window is filled again. Since often all ACKs for one round-trip
arrive in a single batch that means only the first ACK had an effect
and the others where ignored -> which lead to increasing the
window by 1 MTU instead of doubling it in the slow start phase.

The new approach introduces a new `app_limited` field which
caches whether the last `poll_transmit` attempt couldn't produce
data because there was no application data available.

Numeric precision issue in congestion avoidance
===

The formula
```
self.window += self.config.max_datagram_size * bytes / self.window;
```
which is specified in the RFC requires floating point arithmetic.
If integer arithmetic is used, the multiplication of the first 2 values
yields about 1.7MB for 1300 byte packets. As soon as the window is
bigger than this value, the window won't change at all due the
division leading to 0.

The new implementation follow this guidance from the quic recovery
specification:

> In congestion avoidance, implementers that use an integer
> representation for congestion_window should be careful with division,
> and can use the alternative approach suggested in Section 2.1 of
> [RFC3465].
Copy link
Collaborator

@Ralith Ralith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants