Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Third ECN validation check and lost acknowledgments #3778

Closed
martinthomson opened this issue Jun 18, 2020 · 4 comments
Closed

Third ECN validation check and lost acknowledgments #3778

martinthomson opened this issue Jun 18, 2020 · 4 comments

Comments

@martinthomson
Copy link
Member

QUIC doesn't guarantee that every received packet will be acknowledged. At the same time, ECN validation uses counts of every received packet. The net effect of this is captured in this statement:

An endpoint could miss acknowledgements for a packet when ACK frames are lost. It is therefore possible for the total increase in ECT(0), ECT(1), and CE counts to be greater than the number of packets acknowledged in an ACK frame. When this happens, and if validation succeeds, the local reference counts MUST be increased to match the counts in the ACK frame.
-- https://quicwg.org/base-drafts/draft-ietf-quic-transport.html#section-13.4.2.2-4

For example, if a sender sends 10 ECT-marked packets and all of them arrive, they might only receive acknowledgments for 8 of these. The receiver will still report an increase of 10 for the corresponding markings. The sender compensates for this error by allowing this deviation and using the new reference number for future packets.

Note that validation also depends on the receiver counting but not acknowledging a packet, then later changing its mind and acknowledging the packet. We don't state this, but we shouldn't have to.

But that allowance only applies to the validation performed on the sum of ECT(0), ECT(1) and ECN-CE counts. The validation for cross-marking of ECT(0) to ECT(1) or vice versa does not include this same protection. Thus, failure to acknowledge a packet will probably result in ECN validation failing.

There are a few ways we might fix this. We could just allow for an increase here too. The net effect though is that we don't get to watch for cross-marking. We don't get any protection against cross marking like this. In other words, we can remove the check.

It might be better to limit the allowed increase by the total excess in the total. That is, if total_unaccounted = sum(ECT0+ECT1+ECN-CE) - newly_acknowledged.was_marked(ECT0 or ECT1) then the check is unaccounted_ect0 = sum(ECT0+ECN-CE) - newly_acknowledged.was_marked(ECT0); unaccounted_ect0 >= 0 and unaccounted_ect1 = sum(ECT1+ECN-CE) - newly_acknowledged.was_marked(ECT1); unaccounted_ect1 >= 0.

The combined check becomes total_unaccounted <= unaccounted_ect0 + unaccounted_ect1 <= total_unaccounted + ECN-CE. That is, increases to ECT(X) + CE above what was newly acknowledged has to be represented in the total unaccounted value. If all unaccounted packets were CE marked, then they will be unaccounted for twice in the individual counts. If none of the unaccounted packets were CE marked, then they will be counted just once.

That suggests the following text for the third check:

ECN validation fails if the sum of the increase in ECT(0) and ECT(1) counts, plus any increase in the ECN-CE count is less than the number of packets sent with the corresponding ECT codepoint. This step is a more specific check for remarking to Not-ECT.

This third check is now more precise than the second check, so I would suggest removing the combined check that we have current as step 2.

A new combined check might detect remarking between ECT(0) and ECT(1):

ECN validation fails if any unaccounted for increase in the sum of ECT(0) and ECN-CE counts, plus any unaccounted for increase in the sum of ECT(1) and ECN-CE counts, less any unaccounted for increase in the sum of the ECT(0), ECT(1), and ECN-CE counts is not between zero and the increase to the ECN-CE count. This step might detect erroneous remarking from ECT(0) to ECT(1) (or vice versa) by the network.

And finally, because most stacks only apply one type of marking:

An endpoint MAY also validate total counts based on the total number of packets that it sent with with a given marking. ECN validation fails if any increase in ECT(0) or ECT(1) counts exceeds the total number of packets sent with the corresponding marking over the lifetime of the connection. In particular, an endpoint that never uses a particular marking can fail validation when a count increases for that marking. This step detects any erroneous network application of ECT(0) or ECT(1) markings.

martinthomson added a commit that referenced this issue Jun 18, 2020
This attempts to create rules that won't break down unnecessarily.  It's
a little overwrought though, so I'm OK if we go with something less than
this.  The new check number 3 could be removed if we were comfortable
with the simplified check number 2 and the addition of the check for
unused markings.

Closes #3778.
@janaiyengar
Copy link
Contributor

janaiyengar commented Jun 18, 2020

I'm not convinced this needs fixing. To your point above:

But that allowance only applies to the validation performed on the sum of ECT(0),
ECT(1) and ECN-CE counts. The validation for cross-marking of ECT(0) to ECT(1) or
vice versa does not include this same protection. Thus, failure to acknowledge a
packet will probably result in ECN validation failing.

It is an exceedingly rare corner case, since this requires an ACK for a packet to not be sent or received, and then for there to be cross-marking within that fraction of connections. And in this case, ECN validation fails and is turned off.

We knew corner cases existed, but we also had the escape hatch of turning off ECN when things were not "quite right". Why is that answer not good enough any more? (Or am I missing something?)

@martinthomson
Copy link
Member Author

The problem is that we fixed this in one place, but not another. And we explicitly call out the need for the fix.

The goal of validation was - by my understanding - finding obvious mistakes. But this creates false positives.

If we want to do something minimal, then I would suggest that we simplify. That is, reduce the checks to incr_count[ECT0] + incr_count[CE] < marked[ECT0] or incr_count[ECT1] + incr_count[CE] < marked[ECT1].

@janaiyengar
Copy link
Contributor

That's how I read the third validation check:

  Any increase in either ECT(0) or ECT(1) counts, plus any increase in the CE
  count, MUST be no smaller than the number of packets sent with the
  corresponding ECT codepoint that are newly acknowledged in this ACK frame.
  This step detects any erroneous network remarking from ECT(0) to ECT(1) (or
  vice versa).

but I agree it's not very clear. I think the intent was to do the minimal version you suggest above (which is why it's "either" instead of "both" in the text). I would be happy with us clarifying the text to say that more clearly.

@martinthomson
Copy link
Member Author

Before this gets stuck. I'm abandoning this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants