Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gorry's ECN rewrite #4059

Merged
merged 16 commits into from Sep 15, 2020
54 changes: 28 additions & 26 deletions draft-ietf-quic-transport.md
Expand Up @@ -3845,8 +3845,8 @@ packet {{?RFC8087}}. Endpoints react to reported congestion by reducing their
sending rate in response, as described in {{QUIC-RECOVERY}}.

To enable ECN, a sending QUIC endpoint first determines whether a path supports
ECN marking and whether the peer reports the ECN values in received IP headers; see
{{ecn-validation}}.
ECN marking and whether the peer reports the ECN values in received IP headers;
see {{ecn-validation}}.


### ECN Counts
martinthomson marked this conversation as resolved.
Show resolved Hide resolved
Expand All @@ -3873,35 +3873,36 @@ received in the associated IP header are incremented once for each QUIC packet.

For example, if one each of an Initial, 0-RTT, Handshake, and 1-RTT QUIC packet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bad example as there's no practical use case where 0-RTT would be coalesced with 1-RTT as one would never have both keys at once. I understand the point of the example, but I think it's important to keep to practical examples.

Suggested change
For example, if one each of an Initial, 0-RTT, Handshake, and 1-RTT QUIC packet
For example, if one each of an Initial, Handshake, and 1-RTT QUIC packet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, except that this invalidates the other part of this paragraph... This needs more thought.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, happy to fix this in a separate PR if that makes sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is technically legal, so I'm suggesting a note about how it's extremely unlikely.

are coalesced, the corresponding counts for the Initial and Handshake packet
number spaces will be incremented by one each and the counts for the application data
packet number space will be increased by two.
number spaces will be incremented by one each and the counts for the
application data packet number space will be increased by two.

An ECN count is not incremented when no QUIC packets from the received IP
martinthomson marked this conversation as resolved.
Show resolved Hide resolved
packet are processed. A QUIC packet detected by a receiver as a
duplicate also does not increase an ECN count; see {{security-ecn}} for relevant
security concerns.
packet are processed. A QUIC packet detected by a receiver as a duplicate also
does not increase an ECN count; see {{security-ecn}} for relevant security
martinthomson marked this conversation as resolved.
Show resolved Hide resolved
concerns.


### ECN Validation {#ecn-validation}

It is possible for faulty network devices to corrupt or erroneously drop packets
that carry a non-zero ECN codepoint. To provide robust connectivity in the presence of
such devices, an endpoint validates the ECN counts for each network path and
disables use of ECN on that path if errors are detected.
It is possible for faulty network devices to corrupt or erroneously drop
packets that carry a non-zero ECN codepoint. To ensure connectivity in the
presence of such devices, an endpoint validates the ECN counts for each network
path and disables use of ECN on that path if errors are detected.

To perform ECN validation for a new path:

* The endpoint sets an ECT(0) or ECT(1) codepoint in the IP header of early
outgoing packets sent on a new path to the peer ({{!RFC8311}}).
* The endpoint sets an ECT(0) codepoint in the IP header of early outgoing
packets sent on a new path to the peer ({{!RFC8311}}).

* The endpoint monitors whether all packets sent with an ECT codepoint are
eventually deemed lost (Section 6 of {{QUIC-RECOVERY}}), indicating
that ECN validation has failed.

To reduce the chances of misinterpreting loss of packets dropped by a faulty
network element, an endpoint could set an ECT codepoint for only the first ten
outgoing packets on a path, or for a period of three RTTs, whichever occurs
first.
If an endpoint has cause to expect that IP packets with an ECT codepoint might
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting ECN on 10 packets if you think they're going to be lost seems odd and I can't imagine anyone would do that. This text is such a departure from the existing text, I'm unsure how to fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a departure from existing text -- see lines 3953-3956 in the old text.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I’m going to chime-in here. Indeed it was in the “original” text, which is why I included this. However, we now have several people (me included) who doubt this is yet correct, so here are my thoughts:

(1) I think it is important to be robust and describe how the protocol falls back in the corner case where the path drops IP packets with a non-zero ECN field - and we should very clearly say this.
(2) I think this can actually happen, but we should not make it sound like this often happens in general. I’m confident this is not “normal”.
(3) If we need to measure the path characteristic, using 10 packets seems a good test to me. However, as a fall back detection mechanism in a transport protocol this seems odd to stop using ECN after 10 packets when we expect that ECT is forwarded. Note: The text says {{ecn-ack}} says 10, the current text does not.

So.... This is a proposed change: We seem to be converging on clear text, would it be possible to remove the "10 packets" and revise this mechanism to say that the ECN counts need to increment, and pathological loss with no incrementing of counts is an indication that the sender should become “failed” for this connection? I think the text already virtually says this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Pathological loss" isn't going to be good enough as a mechanism, and there's a rathole waiting there to define precisely what "pathological" means. We chose 10 packets based on the IW, and also because we wanted to start somewhere and then change the mechanism as we learned more through experience.

I suggest that we don't muddle this particular (large) PR with an additional, potentially-contentious discussion about a mechanism change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like Jana's proposal to not muddle this particular (large) PR with this "issue". Would others agree to finalise the PR and then discuss this as a separate topic - afterwards?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we discussed this all ready once. I'm fine with 10 packet as it's the initial window (but maybe that's what we should say), however, given the result is the same I would rather state it as enable ECT marking and if all packet of your first flight are lost validation fails and you disable. Describing this specifically as probably makes it just sounds more complicated that it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The github diff didn't line up well, hence my confusion. The way this is described still seems a bit unrealistic to me, but I can see this PR doesn't substantively change it, so I'm fine with that.

be dropped by faulty network element, the endpoint could set an ECT codepoint
martinthomson marked this conversation as resolved.
Show resolved Hide resolved
for only the first ten outgoing packets on a path, or for a period of three
RTTs. If all packets marked with non-zero ECN codepoints are subsequently lost,
it can disable marking on the assumption that the marking causes in loss.

An endpoint thus attempts to use ECN and validates this for each new connection,
when switching to a server's preferred address, and on active connection
Expand All @@ -3924,9 +3925,10 @@ ECN markings that were applied to packets that are newly acknowledged in the ACK
frame.

If an ACK frame newly acknowledges a packet that the endpoint sent setting
either the ECT(0) or ECT(1) codepoint, ECN validation fails if the corresponding ECN counts
are not present in the ACK frame. This check detects a network element that
zeroes the ECN field or a peer that is unable to access ECN the markings.
either the ECT(0) or ECT(1) codepoint, ECN validation fails if the
corresponding ECN counts are not present in the ACK frame. This check detects a
network element that zeroes the ECN field or a peer that is unable to access
ECN the markings.
martinthomson marked this conversation as resolved.
Show resolved Hide resolved

ECN validation also fails if the sum of the increase in ECT(0) and ECN-CE counts
is less than the number of newly acknowledged packets that were originally sent
Expand All @@ -3936,14 +3938,14 @@ acknowledged packets sent with an ECT(1) marking. These checks can detect
remarking of ECN-CE markings by the network.
ianswett marked this conversation as resolved.
Show resolved Hide resolved

An endpoint could miss acknowledgements for a packet when ACK frames are lost.
It is therefore possible for the total increase in the ECT(0), ECT(1), and
ECN-CE counts to be greater than the number of packets acknowledged in an ACK
frame. This is why ECN counts are permitted to be larger than the value
It is therefore possible for the total increase in ECT(0), ECT(1), and ECN-CE
counts to be greater than the number of packets that are newly acknowledged by
an ACK frame. This is why ECN counts are permitted to be larger than the value
corresponding to the largest acknowledged packet number.

Out of order processing of the ECN counts can result in a validation failure.
An endpoint SHOULD skip ECN validation for an ACK frame that does not increase
the largest acknowledged packet number.
Validating ECN counts from reordered ACK frames can result in failure. An
endpoint MUST NOT fail ECN validation as a result of processing an ACK frame
that does not increase the largest acknowledged packet number.

ECN validation can fail if the received total count for either ECT(0) or ECT(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can fail? Isn't it required to fail?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reword of lines 3992-3996. I don't remember why this was written as a MAY earlier (as against a MUST, or just "fails"), but the new language is changing phrasing to "can" from "MAY". @martinthomson might remember why this isn't a failure requirement.

exceeds the total number of packets sent with each corresponding ECT codepoint.
Expand Down