Third ECN validation check and lost acknowledgments #3778

martinthomson · 2020-06-18T06:43:42Z

QUIC doesn't guarantee that every received packet will be acknowledged. At the same time, ECN validation uses counts of every received packet. The net effect of this is captured in this statement:

An endpoint could miss acknowledgements for a packet when ACK frames are lost. It is therefore possible for the total increase in ECT(0), ECT(1), and CE counts to be greater than the number of packets acknowledged in an ACK frame. When this happens, and if validation succeeds, the local reference counts MUST be increased to match the counts in the ACK frame.
-- https://quicwg.org/base-drafts/draft-ietf-quic-transport.html#section-13.4.2.2-4

For example, if a sender sends 10 ECT-marked packets and all of them arrive, they might only receive acknowledgments for 8 of these. The receiver will still report an increase of 10 for the corresponding markings. The sender compensates for this error by allowing this deviation and using the new reference number for future packets.

Note that validation also depends on the receiver counting but not acknowledging a packet, then later changing its mind and acknowledging the packet. We don't state this, but we shouldn't have to.

But that allowance only applies to the validation performed on the sum of ECT(0), ECT(1) and ECN-CE counts. The validation for cross-marking of ECT(0) to ECT(1) or vice versa does not include this same protection. Thus, failure to acknowledge a packet will probably result in ECN validation failing.

There are a few ways we might fix this. We could just allow for an increase here too. The net effect though is that we don't get to watch for cross-marking. We don't get any protection against cross marking like this. In other words, we can remove the check.

It might be better to limit the allowed increase by the total excess in the total. That is, if total_unaccounted = sum(ECT0+ECT1+ECN-CE) - newly_acknowledged.was_marked(ECT0 or ECT1) then the check is unaccounted_ect0 = sum(ECT0+ECN-CE) - newly_acknowledged.was_marked(ECT0); unaccounted_ect0 >= 0 and unaccounted_ect1 = sum(ECT1+ECN-CE) - newly_acknowledged.was_marked(ECT1); unaccounted_ect1 >= 0.

The combined check becomes total_unaccounted <= unaccounted_ect0 + unaccounted_ect1 <= total_unaccounted + ECN-CE. That is, increases to ECT(X) + CE above what was newly acknowledged has to be represented in the total unaccounted value. If all unaccounted packets were CE marked, then they will be unaccounted for twice in the individual counts. If none of the unaccounted packets were CE marked, then they will be counted just once.

That suggests the following text for the third check:

ECN validation fails if the sum of the increase in ECT(0) and ECT(1) counts, plus any increase in the ECN-CE count is less than the number of packets sent with the corresponding ECT codepoint. This step is a more specific check for remarking to Not-ECT.

This third check is now more precise than the second check, so I would suggest removing the combined check that we have current as step 2.

A new combined check might detect remarking between ECT(0) and ECT(1):

ECN validation fails if any unaccounted for increase in the sum of ECT(0) and ECN-CE counts, plus any unaccounted for increase in the sum of ECT(1) and ECN-CE counts, less any unaccounted for increase in the sum of the ECT(0), ECT(1), and ECN-CE counts is not between zero and the increase to the ECN-CE count. This step might detect erroneous remarking from ECT(0) to ECT(1) (or vice versa) by the network.

And finally, because most stacks only apply one type of marking:

An endpoint MAY also validate total counts based on the total number of packets that it sent with with a given marking. ECN validation fails if any increase in ECT(0) or ECT(1) counts exceeds the total number of packets sent with the corresponding marking over the lifetime of the connection. In particular, an endpoint that never uses a particular marking can fail validation when a count increases for that marking. This step detects any erroneous network application of ECT(0) or ECT(1) markings.

The text was updated successfully, but these errors were encountered:

This attempts to create rules that won't break down unnecessarily. It's a little overwrought though, so I'm OK if we go with something less than this. The new check number 3 could be removed if we were comfortable with the simplified check number 2 and the addition of the check for unused markings. Closes #3778.

janaiyengar · 2020-06-18T21:25:09Z

I'm not convinced this needs fixing. To your point above:

But that allowance only applies to the validation performed on the sum of ECT(0),
ECT(1) and ECN-CE counts. The validation for cross-marking of ECT(0) to ECT(1) or
vice versa does not include this same protection. Thus, failure to acknowledge a
packet will probably result in ECN validation failing.

It is an exceedingly rare corner case, since this requires an ACK for a packet to not be sent or received, and then for there to be cross-marking within that fraction of connections. And in this case, ECN validation fails and is turned off.

We knew corner cases existed, but we also had the escape hatch of turning off ECN when things were not "quite right". Why is that answer not good enough any more? (Or am I missing something?)

martinthomson · 2020-06-19T03:58:50Z

The problem is that we fixed this in one place, but not another. And we explicitly call out the need for the fix.

The goal of validation was - by my understanding - finding obvious mistakes. But this creates false positives.

If we want to do something minimal, then I would suggest that we simplify. That is, reduce the checks to incr_count[ECT0] + incr_count[CE] < marked[ECT0] or incr_count[ECT1] + incr_count[CE] < marked[ECT1].

janaiyengar · 2020-06-19T04:21:17Z

That's how I read the third validation check:

  Any increase in either ECT(0) or ECT(1) counts, plus any increase in the CE
  count, MUST be no smaller than the number of packets sent with the
  corresponding ECT codepoint that are newly acknowledged in this ACK frame.
  This step detects any erroneous network remarking from ECT(0) to ECT(1) (or
  vice versa).

but I agree it's not very clear. I think the intent was to do the minimal version you suggest above (which is why it's "either" instead of "both" in the text). I would be happy with us clarifying the text to say that more clearly.

martinthomson · 2020-06-23T22:13:27Z

Before this gets stuck. I'm abandoning this.

martinthomson mentioned this issue Jun 18, 2020

Try to make ECN validation more robust #3779

Closed

larseggert added the -transport label Jun 18, 2020

martinthomson closed this as completed Jun 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Third ECN validation check and lost acknowledgments #3778

Third ECN validation check and lost acknowledgments #3778

martinthomson commented Jun 18, 2020

janaiyengar commented Jun 18, 2020 •

edited

Loading

martinthomson commented Jun 19, 2020

janaiyengar commented Jun 19, 2020

martinthomson commented Jun 23, 2020

Third ECN validation check and lost acknowledgments #3778

Third ECN validation check and lost acknowledgments #3778

Comments

martinthomson commented Jun 18, 2020

janaiyengar commented Jun 18, 2020 • edited Loading

martinthomson commented Jun 19, 2020

janaiyengar commented Jun 19, 2020

martinthomson commented Jun 23, 2020

janaiyengar commented Jun 18, 2020 •

edited

Loading