Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ACK_ECN frame encoding (e.g., use bit-vector) #1439

Closed
kazuho opened this issue Jun 12, 2018 · 42 comments
Closed

Improve ACK_ECN frame encoding (e.g., use bit-vector) #1439

kazuho opened this issue Jun 12, 2018 · 42 comments
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@kazuho
Copy link
Member

kazuho commented Jun 12, 2018

The encoding of ACK_ECN frame (proposed in #1372) has following issues.

  • size of ACK_ECN frame becomes bigger as more and more packets are exchanged, since the counters represent the total number of packets being exchanged using the connection
  • requires the receiver of the ACK_ECN frames to detect duplicates to avoid duplicate packet injection attacks
  • does not have full protection against duplicate packet injection attacks, since detection on the receiver cannot be perfect

As discussed in Kista, these issues are not blockers for merging #1372, but we might want to consider adopting different encoding scheme to address them.

@kazuho
Copy link
Member Author

kazuho commented Jun 12, 2018

A straw-man that addresses the issues is to use a bit-vector for marking CE packets.

Instead of carrying three counters, an ACK_ECN frame will carry a bit-vector that indicates which PNs in the ACK were CE-marked.

For example, when 3 packets (PN=8,3,2) are acknowledged, the bit vector will contain three bits that indicate whether the packet identified by the PN was CE-marked. A bit vector with a value 100 will indicate that PN=8 was CE-marked.

The straw-man addresses all of the three issues.

Re space efficiency of the encoding, the proposed approach consumes one bit per packet, whereas #1372 consumes 3 to 17 octets1 even when only one packet is being acknowledged.

Duplicate detection becomes the responsibility of the sender of the packet rather than that of the receiver. It also becomes reliable.

While processing an acknowledgement, an endpoint consults it's inflight packet map to see which packets have been newly acknowledged. With the proposed encoding, the endpoint can see if the CE bit has been set as part of that process, and only utilize the CE bits associated to a newly acknowledged packet.

The downside of the straw-man is that ACK_ECN frame does not indicate how many packets were ECN(1) marked. If we need to communicate that, we could for either retain the counter for ECN(1) marked packets, or use a different encoding in the bit vector (i.e. 0 - ECN(0), 10 - ECN(1), 11 - ECN-CE).

[1] It is assumed that the number of ECN(1) marked packets would remain zero.

@gloinul
Copy link
Contributor

gloinul commented Jun 12, 2018

I don't understand how the sender will be able to handle duplications: "Duplicate detection becomes the responsibility of the sender of the packet rather than that of the receiver. It also becomes reliable."

To my understanding the current ACK practice to include all packets whose ACK hasn't been ACKed already. Thus, a single PN will thus can be included in multiple consecutive ACK frames. Even without duplication a received CE will thus be included in several packets.

Next this doesn't suppress or explain how a sender can suppress a duplicate packet arriving to be ACKed in the same ACK frame. If a packet first arrives with a ECT(0) marking and then a duplicate with ECN-CE mark, the later triggering an immediate ACK. Then how is it accounted that both has been received? This is the particular scenario where the second CE mark may be on-the-side attack or not.

When it comes to efficiency, one could apply the older Packet Number encoding format to the ECT counters. Thus include the N lowest bits of the counters using the variable length format. Requiring the sender to track the higher bits based on the lower bits. The receiver will have to use enough bit-depth that the counter field doesn't wrap until multiple counter values in different ACKs have been ACKed by the sender. That way the counter's size would relate to the congestion window and RTT rather than connection lifetime.

And as you say the ECN-CE marks are not sufficient to make the verification to work. The ECT counters are still needed to track that ECN is working properly and not are bleached by the network. Thus, you need at least counters and a CE bit-vector. From my perspective, the CE bit-vector needs to be compared only with the CE counter, not all the counters.

I would also note that the size of the bit-vector is very much depending on the throughput and the RTT.
An implementation doing fully reliable ACKing will result in the ACKs containing all the packets sent the last RTT. Under this assumption the bit-vector actually will be worse then even the max length CE counters (8 bytes) in a variety of bandwidths and RTT as can be seen in below table. Even if not using full reliable ACKing, then we still likely have several ACKing intervals worth of information. How many packets that is very dependent on the ACK strategy regarding delayed ACKs.

Assuming 1200 bytes packet, the below table indicates packets per RTT for a combination of RTT and bandwidth:

  RTT (ms)        
Bandwidth (Mbps) 1 10 50 100 200
0,1 0,01 0,10 0,52 1,04 2,08
1 0,10 1,04 5,21 10,42 20,83
5 0,52 5,21 26,04 52,08 104,17
10 1,04 10,42 52,08 104,17 208,33
25 2,60 26,04 130,21 260,42 520,83
50 5,21 52,08 260,42 520,83 1041,67
100 10,42 104,17 520,83 1041,67 2083,33
250 26,04 260,42 1302,08 2604,17 5208,33
500 52,08 520,83 2604,17 5208,33 10416,67
1000 104,17 1041,67 5208,33 10416,67 20833,33
2500 260,42 2604,17 13020,83 26041,67 52083,33
5000 520,83 5208,33 26041,67 52083,33 104166,67
10000 1041,67 10416,67 52083,33 104166,67 208333,33

@larseggert
Copy link
Member

Half-baked idea: can we send Bloom filters instead of counts?

@mikkelfj
Copy link
Contributor

Half-baked idea: can we send Bloom filters instead of counts?

Not likely, they are not very precise and take up a lot of space, though there are compressed forms.

There are various compressed bitmap forms mixing runlength encoding. One consideration could be to send small bitmaps of 8 to 64-bits mixed with RLE. That is largely how I keep track of duplicates internally.

@martinthomson
Copy link
Member

If the concern is too many bits in the frame for high throughput scenarios, the bitmask could be capped at 64. In those high throughput cases, you should have enough ACKs to reassemble the markings, it's just that you wouldn't have a whole lot of redundancy for the marking signal. That bounds the size of the frame.

@kazuho
Copy link
Member Author

kazuho commented Jun 13, 2018

@gloinul

I don't understand how the sender will be able to handle duplications: "Duplicate detection becomes the responsibility of the sender of the packet rather than that of the receiver. It also becomes reliable."

It is possible, because the sender has the list of unacknowledged packets that it has sent. When receiving an acknowledgement, the sender updates the list at the same time using the information to update the connection state (e.g., retire acknowledged data from send buffer).

The core idea here is to use that mechanism to determine if a PN with the CE bit set is acknowledged for the first time. Use of a bit-vector is one of the possible approaches.

And as you say the ECN-CE marks are not sufficient to make the verification to work. The ECT counters are still needed to track that ECN is working properly and not are bleached by the network.

I would suggest detecting bleaching on the receiver side. If the receiver notices bleaching, it can send an ACK frame instead. I'd assume that that could (or should) be done in the encoding proposed in #1372 as well.

I would also note that the size of the bit-vector is very much depending on the throughput and the RTT.

I agree that the size of the ACK frames could be an issue on high bandwidth connection, although I am not sure if we need to care about spending 0.01% of bandwidth (1 bit / 1280 octets) in response to a high-speed flow. But if we do care about that, capping the bitmask to certain size as @martinthomson suggests can be a solution.

@kazuho
Copy link
Member Author

kazuho commented Jun 13, 2018

If the concern is too many bits in the frame for high throughput scenarios, the bitmask could be capped at 64. In those high throughput cases, you should have enough ACKs to reassemble the markings, it's just that you wouldn't have a whole lot of redundancy for the marking signal. That bounds the size of the frame.

@martinthomson Or, considering the fact that we are requiring recipients of the ECN-CE flag to send back the information as soon as possible, and that we also want to restrict the frequency of feeding in the CE event to the congestion controller to maybe once per RTT, defining an ACK_WITH_CE frame that carries just one PN that had the CE bit set might be sufficient.

+ 0                   1                   2                   3
		
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
		
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
		
+|                     Largest Acknowledged (i)                ...
		
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
		
+|                          ACK Delay (i)                      ...
		
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
		
+|                       ACK Block Count (i)                   ...
		
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
		
+|                          ACK Blocks (*)                     ...
		
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
		
+|                         CE ACK Index (i)                    ...
		
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
		

In case of this example, "CE ACK Index" will designate the index of the PN among the PNs that were carried using the ACK_WITH_CE frame. Ordinary ACK frame can be used when none of the packets being acknowledged had the ECN-CE bit set.

@mikkelfj
Copy link
Contributor

This could interfere with redundancy in repeat ACK frames: if there is a new CE bit the old PN cannot be present or would have CE cleared. But then again, recent ACK design tends toward not sending redundant ACK's in favor of retransmission on ACK loss, so ...? What is the take in this?

@gloinul
Copy link
Contributor

gloinul commented Jun 13, 2018

@kazuho

The core idea here is to use that mechanism to determine if a PN with the CE bit set is acknowledged for the first time. Use of a bit-vector is one of the possible approaches.

But, that doesn't work unless you are able to ACK each duplication in separate ACKs, as well as having no redundancy in the ACKing. I think that will make QUIC work very poorly if any ACK packet loss is experienced. There need to be some redundancy to ensure that ACK loss, doesn't equal forward loss from the senders perspective.

I would suggest detecting bleaching on the receiver side. If the receiver notices bleaching, it can send an ACK frame instead. I'd assume that that could (or should) be done in the encoding proposed in #1372 as well.

I don't see that working reliably during capability checking, especially not when we add ECN black hole mitigation into the mix. It is the sender that knows which of the packets that are marked during the capability check. If all packets are marked, then yes a trustworthy receiver could detect bleaching. But then we need a signal where the receiver tells the sender to stop mark.

I think I see one advantage for something like the bit-vector reporting of CE marks. That is in cases where one doesn't use the normal ECN marking scheme, such as L4S. L4S marks at much higher frequency, thus more CE marks will be received. In this case immediate ACK on CE marks will result in a bit excessive ACK rates. So, from my perspective this would be a reason for a solution with more detailed CE information. However, I don't see your proposal working out when it comes to duplication handling nor the capability checks.

My strawman if one like to address providing more detailed information on which packets are CE marked, ensure robustness around recovery periods and ensure that sender side verifications work is the following:

  • Receiver side suppression of duplicates
  • An ACK format for ECN where
    • ECT(0) and ECT(1) are counters. However, they are variable encoding only the least number of bits needed to detect roll over, i.e. receiver track ACKs of ACKs to determine how many bits this needs to be.
    • ECN-CE marks as bit-vector. Only needing to go as far back as there are CE marks. i.e. bit-vector may be shorter than full PN space covered by ACK. Truncation can be considered.
  • ACKs are assumed to have redundancy, either all the way to latest ACK of ACKs or at least across the last couple of ACKs.
  • ACK loss handling for when there are wholes in ACK information received by the sender due to ACK loss and ECN-CE vector truncation. This is to handle continuous verification. ACK holes will trigger loss response also.

I would note that these changes would be geared toward making ACKing possible to be delayed in general to as happen as seldom as only four times per RTT assuming having something to ACK in each interval.

@kazuho
Copy link
Member Author

kazuho commented Jun 14, 2018

tl;dr: In this comment I explain that the concerns raised against the use of bit-vector is not true, and at the bottom propose a new encoding that IMO meets all the requirements that we have argued for.

@gloinul

The core idea here is to use that mechanism to determine if a PN with the CE bit set is acknowledged for the first time. Use of a bit-vector is one of the possible approaches.

But, that doesn't work unless you are able to ACK each duplication in separate ACKs, as well as having no redundancy in the ACKing. I think that will make QUIC work very poorly if any ACK packet loss is experienced. There need to be some redundancy to ensure that ACK loss, doesn't equal forward loss from the senders perspective.

I do not see why you think that it does not work when folding duplication into single ACK. That is something that should have been handled by existing implementations. Folding of the CE bit is a natural extension to that. If I sounded like there would be no deduplication on the receiver side, then I apologize. The requirement to send exactly which PN arrived to with the CE bit enforces the receiver to deduplicate using the ACK queue.

Re your second point, the proposed encoding does work under loss, because the sender can tell exactly which newly acknowledged packet was received by the peer with the CE bit set.

Consider the case you are sending an ACK_ECN frame for PN 2,3,4 where 3 had the CE bit. The bit vector of the ACK_ECN frame will be 010. When the ACK_ECN frame is retransmitted, the bit vector will once again contain 010. Either of the two ACK_ECN frames that first reaches the sender will be considered as the frame that newly acknowledges PN 3, and the CE bit will be taken into consideration.

If the receiver has no duplication detection and an attacker injects a duplicate of packet 4 with CE bit set, the receiver will send ACK_ECN frame with the bit vector set to 011. If that packet reaches the sender first, the sender will take that bit into consideration.

However, this is not an issue. It is just that where deduplication is applied is changing from the receiver to the server. And as stated in my first comment, detecting duplication on the sender side is a better guard against an injection attack, because the sender has the logic to recognize an acknowledgement at most once for every one packet it sends, whereas it is hard to implement such complete guard on the receiver side.

I don't see that working reliably during capability checking, especially not when we add ECN black hole mitigation into the mix. It is the sender that knows which of the packets that are marked during the capability check. If all packets are marked, then yes a trustworthy receiver could detect bleaching. But then we need a signal where the receiver tells the sender to stop mark.

Sorry for my confusion. Bleaching here is the term that means ECN-CE turning into ECN-CT(0) or ECN-CT(1).

For that case and also for blackhole detection, what you would do in the proposed encoding scheme is this:

  • sender remembers in it's inflight packet map which packets were sent with ECN or with the CE bit set
  • if sender receives an ACK frame or an ACK_ECN frame that does not indicate that packet had the CE bit set, bleaching is happening
  • if sender detects loss of the packet that was sent with ECN, there is a blackhole

As you can see, support for ECN can be nicely tied to the inflight packet map and the detection of issues become accurate, if we allow the receiver to send back exactly which packet was received with the CE bit set.

Note that we do not even need to have a separate phase for detecting these issues. Sender can just periodically send packets with CE bit set, and ignore the signal from receiver expressing that a certain packet was received with the CE bit set, if that packet was sent with the CE bit set. Or if the sender detects a consecutive loss of the packet that had the CE bit set, there is a blackhole.

And, thank you for the strawman. My comments in-line.

My strawman if one like to address providing more detailed information on which packets are CE marked, ensure robustness around recovery periods and ensure that sender side verifications work is the following:

  • Receiver side suppression of duplicates

A limited form of that is inherently part of the bit-vector-based approach. Please see the first lines of this comment.

  • An ACK format for ECN where
    • ECT(0) and ECT(1) are counters. However, they are variable encoding only the least number of bits needed to detect roll over, i.e. receiver track ACKs of ACKs to determine how many bits this needs to be.

Assuming that the sender needs to know the breakdown of the number of packets received without the CE bit set, I would prefer sending that information rather than using a counter, because it simplifies (or eliminates the need for) the logic to detect bleaching / blackhole.

* ECN-CE marks as bit-vector. Only needing to go as far back as there are CE marks. i.e. bit-vector may be shorter than full PN space covered by ACK. Truncation can be considered.
  • ACKs are assumed to have redundancy, either all the way to latest ACK of ACKs or at least across the last couple of ACKs.
  • ACK loss handling for when there are wholes in ACK information received by the sender due to ACK loss and ECN-CE vector truncation. This is to handle continuous verification. ACK holes will trigger loss response also.

I can understand the tendency to have those requirements.

Having all that said, let me propose another encoding, that IMO will have all the properties both of us have been looking for. How about expressing ECN information the same way we express gaps?

In the current draft, ACK Block is a repetition of two types of "blocks": PNs that are acked and PNs that are not acked (i.e. gap). We can extend it to express 5 types of "blocks": non-ECT acked, ECT(0) acked, ECT(1) acked, ECN-CE acked, gap.

In the approach, the definition of ACK Block will become:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Block (i)                        ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The value of the Block will be calculated as:
block_mode = gap (0), non-ECT (1), ECT0 (2), ECT1 (3), ECT-CE(4)
block_size = the number of contiguous PNs that were received (or not) under the block_mode
block_value = ((block_size - 1) << 3) | block_mode

For example, ACK frame that is encoded as 04 00 03 0a 00 04 means:

  • Largest Acknowledged: 4
  • ACK Delay: 0
  • ACK Block Count: 3
  • first Block (0a): PN 4 and 3 received with ECT0
  • second Block (00): gap for one packet
  • third Block (04): PN 1 received with ECT-CE

The benefits of this encoding are:

  • sender will have complete information on how the packets were received
    • therefore it is easy to see if there is bleaching / blackhole (see the discussion above)
  • the overhead of sending ECN information is as same as sending gap
    • therefore the overhead will be a non-issue, assuming that ECN bit flips will happen as likely as gaps will appear on ECN-unaware networks
  • no additional overhead for supporting ECN
  • only one ACK frame type
  • definition of the ACK frame becomes simpler!

The downside is that a single-octet Block can only express up to 8 packet numbers, because block_mode takes 3 bits and the varint length bits take 2 bits. I do not think that is an issue considering the fact that the ACK_ECN frame proposed in #1372 increases the size of an acknowlegement by at least 3 octets. But if we want to optimize, we can adopt the PN encoding used in the packet header.

@martinthomson
Copy link
Member

martinthomson commented Jun 14, 2018

I'm not sure that this is simpler. If we were going this way, maybe it needs a better encoding than bit-stuffing a varint. Not sure what that looks like though.

You could save a whole bit by making it impossible to send two blocks of the same type in a row and encoding the mode as n = (next_mode - last_mode + 4) % 5 and decoding with next_mode = (last_mode + n + 1) % 5. The cost here is that a single ECN-CE-marked datagram costs 3 octets to encode. The benefit is that you have one ACK frame; and if you don't have ECN markings, the overhead is limited to a slightly less efficient encoding of longer sequences of packets (>16 with my hack).

Seems a little complicated

@kazuho
Copy link
Member Author

kazuho commented Jun 14, 2018

@martinthomson

I'm not sure that this is simpler.

Yeah. What I was trying to say by stating that the ACK frame becomes simpler is that we can eliminate the distinction between "first ack block", "additional ack block", "gap" by having type assigned to each block. And that we will have only one ACK encoding rather than having ACK and ACK_ECN.

You could save a whole bit by making it impossible to send two blocks of the same type in a row and encoding the mode as n = (next_mode - last_mode + 4) % 5 and decoding with next_mode = (last_mode + n + 1) % 5.

That definitely works. I had some hesitation against making the interpretation of the mode depend on the mode of the previous block, but that might be fine considering the fact that where the block begins depends on the bit pattern of the previous block (which is a varint).

The cost here is that a single ECN-CE-marked datagram costs 3 octets to encode. The benefit is that you have one ACK frame; and if you don't have ECN markings, the overhead is limited to a slightly less efficient encoding of longer sequences of packets (>16 with my hack).

Yes. And my argument regarding the cost is that spending 3 octets should be fine, because we spend the same amount to express a gap. If it is not fine, we should reconsider how we express gaps.

@gloinul
Copy link
Contributor

gloinul commented Jun 14, 2018

@kazuho I think I better understand your thinking, but I am still uncertain that I fully understand. So lets see if I understand correctly how the receiver would act in the case of a packet duplications happening.

So the receiver at the point of sending an ACK has received PN=2,3,4 where 4 has a CE mark. Then the ACK will say: Largest PN=4, length=3, CE vector = 001.

But then arrives PN=4, 5, 5 before it is time to send the next ACK. So the duplicate of PN=4 has ECN=ECT(0), and first 5 has ECT(0) and second ECN-CE set. Thus the ACK will be:
Largest PN=5, length 4, CE vector 0001?

In this case the sender doesn't see that PN=5 was duplicated with different CE values. Or do you say that this must be sent as two ACKs, where the first is:
Largest PN=5, length 4, CE vector 0000 and the next:
Largest PN=5, length 4, CE vector 0001
So any duplication must trigger ACKing?

The issues I see with this, if I have correctly understood it are:

  • Packet duplication results in additional ACKs
  • In the first case where the CE marks are in the first arrival and then not in the late coming duplication an ACK loss of the first will mask the CE mark.
  • Also in the case of ACK loss of the first PN=5 ACK will result in that the sender can't know if this CE mark was present in the lost ACK or not, thus resulting in an uncertainty for a sender if an CE mark is more likely an attack packet. As late remarked packets a bigger risk for the attack included in the security consideration.

When it comes to bleaching, we are not only considering ECN-CE to ECT but also ECT to Not-ECT. The later more relevant when it comes to the capability check as networks that don't handle ECN correctly usually sets all the ECN bits to Not-ECT independently of value. Thus we do need to determine also when some packets has not been marked at the receiver. So, I see your proposed format work for the purpose of the sender learn the received value as long as there are no packet duplication. I still don't see this format helping with the issues that arise with packet duplication. Then you would need yet another set of modes saying duplicate with which of the 4 ECN values.

From my perspective receiver suppressing duplicated packets are a work saver for the receiver. It doesn't need to do more than PN decryption for the packet. The ECN field of the later packets are of very limited value. And the receiver requirement for implementing this is very low. And if you are not suppressing duplicated packets early, then you will have to consider what you do with each individual frame type. Is it safe to process a duplicate or do you need to suppress it at frame type level. To me it opens up the implementation to much more attacks.

@mikkelfj
Copy link
Contributor

mikkelfj commented Jun 14, 2018

I don't think I get the entire concept here but I'm wondering:

Assume receiver does not detect or reject duplicate packets, but it does track an ACK backlog and there has some natural ACK dectection capability.

If packet 4 in the above example first sets CE, this will be visible in the backlog. A second packet 4 without CE can then see that 4 is already pending ACK and can choose to a) overwrite the CE bit, b) do nothing since a packet is present, or c) use inclusive OR on the CE bit.

If packet 4 is not in the backlog when packet 4 v2 arrives then it has already been ACK'ed leaving it to the sender, or it never received packet 4 v1. In either case it it business as usual.

Now, assume the receiver does reject duplicates by tracking more than the latest ACK backlog and that it simply drops any later receiver packets. In this case this corresponds to case b) in the above.

The question is then - if the above is reasonable - how should a second update to the CE bit be handled receiver side, when the sender is prepared to handle duplicate ACKS / CE. Is replace, drop, or OR the proper approach?

@mikkelfj
Copy link
Contributor

On the more general question of receiver duplicate tracking: yes I agree that not tracking duplicates probably opens up a lot of vulnerabilities and places an additional burden on the implementation (though a hostile peer could just repeat frames in new packets, so some robustness is required regardless).

That said, duplicate detection is a major burden of the receiver because packet numbers can in principle have arbitrary gaps so a hash table is not safe, a lookup table will not cover the full range, and a binary tree/range tree can consume a lot of memory if every second packet is a gap.

By allowing duplicates to slip through, a duplicate filter can be made much simpler, for example by hashing the last N packets and dropping any hash table collisions, or use any other simplified robust imprecise detector.

The sender generally has more book-keeping, but it knows exactly which packets it sends can therefore optimize its data structure accordingly. Only the ACKs add some uncertainty.

@kazuho
Copy link
Member Author

kazuho commented Jun 14, 2018

So the receiver at the point of sending an ACK has received PN=2,3,4 where 4 has a CE mark. Then the ACK will say: Largest PN=4, length=3, CE vector = 001.

But then arrives PN=4, 5, 5 before it is time to send the next ACK. So the duplicate of PN=4 has ECN=ECT(0), and first 5 has ECT(0) and second ECN-CE set. Thus the ACK will be:
Largest PN=5, length 4, CE vector 0001?
#1439 (comment) by @gloinul

I'd assume that the second ACK will contain CE vector of 0010.

Quoting what @mikkelfj says, the specification can require one of the three behaviors. My understanding is that b is the behavior we want in order to mitigate the impact of injection attacks racing against the original packet.

Assume receiver does not detect or reject duplicate packets, but it does track an ACK backlog and there has some natural ACK dectection capability.
If packet 4 in the above example first sets CE, this will be visible in the backlog. A second packet 4 without CE can then see that 4 is already pending ACK and can choose to a) overwrite the CE bit, b) do nothing since a packet is present, or c) use inclusive OR on the CE bit.
#1439 (comment) @mikkelfj

From my perspective receiver suppressing duplicated packets are a work saver for the receiver.
#1439 (comment) @gloinul

A receiver can suppress duplicates to some extent, but it can never be perfect.

I do not think it is a good idea to require receivers to implement complicated countermeasures against packet injection attacks for two reasons: 1) it's not enforceable 2) having complex requirement will take people away from implementing ECN support.

Sending back the signal that tells which PN arrived with CE bit set simplifies the architecture at the same time giving us a guaranteed level of protection. See #1439 (comment) by @mikkelfj.

It doesn't need to do more than PN decryption for the packet.
#1439 (comment) by @gloinul

The receiver is required to decrypt the payload even if it has seen the PN before, unless it knows that the payload is identical in some other ways (e.g. by comparing the hash of the payload). Please refer to QUIC-TLS draft section 10.3.

And if you are not suppressing duplicated packets early, then you will have to consider what you do with each individual frame type. Is it safe to process a duplicate or do you need to suppress it at frame type level. To me it opens up the implementation to much more attacks.
#1439 (comment) by @gloinul

It does not. Frames are guaranteed to be idempotent. See #1424.

They need to be idempotent because there is no reliable way for a receiver to tell if a packet it has received is a duplicate.

@mikkelfj
Copy link
Contributor

A receiver can suppress duplicates to some extent, but it can never be perfect.

and

They need to be idempotent because there is no reliable way for a receiver to tell if a packet it has received is a duplicate.

this is not quite true if one choose not to receive packets with packet numbers below a certain threshold. Above the threshold duplicates can be perfectly filtered and below the threshold they can also be perfectly filtered, but below the threshold non-duplicates might also get filtered. However, in this case fast retransmission will likely already have a new packet inbound with similar content.

@kazuho
Copy link
Member Author

kazuho commented Jun 14, 2018

@mikkelfj That's true. Thank you for pointing that out. OTOH, I would argue that we do not want every QUIC stack to implement such duplication detection logic, because it is complex, memory-consuming, and fragile (i.e. there is a risk of triggering a loss event if you set the threshold too small).

@mikkelfj
Copy link
Contributor

Right, just pointing out facts. It is new to me that an hash table based cache could filter duplicates becuase frames are now idempotent. I think this could be a really good solution, but it does require that CE does not require filtering, or at least that it is tolerant to a degree.

@mikkelfj
Copy link
Contributor

@larseggert this is where your bloom filter comes in handy: just define epochs for the filters lifetime and check duplicates against the two most recent bloom filter epochs and let idempentency deal with false positives. Any encrypted bits could be used directly to index a bloom filter. For example if you have width 256 bits, you can choose the first four encrypted bytes to index a bit in the bloom filter (do some math to get the probabilities right). Eventually the filter gets too densely populated and a new epoch is needed.

@mikkelfj
Copy link
Contributor

mikkelfj commented Jun 14, 2018

doh - bloomfilters won't work. A filter can have false negatives, but not false positives unless one wants to increase packet loss.

@ianswett
Copy link
Contributor

I'm going to reiterate: I don't think duplicate detection is a hard problem. You have an IntervalSet that is used to generate an ack. That goes back some distance in packet number space. At some point, you drop older/smaller blocks off that data structure. When you do, I'm suggesting you stop trying to decrypt packets with packet numbers than whatever packet number you've decided you're no longer waiting for(whether due to acks of acks or some other algorithm). There might be some cases when this InteralSet needs to be a bit larger than what you serialize to an ack frame, but that's not too difficult to deal with.

The truncated packet number encoding is designed with increasing packet numbers in mind, so the only thing we're trying to allow is accepting packets within some reordering threshold. Certainly a receiver can decide to have a longer reordering threshold than an RTT, but at some point not too much longer than an RTT, it's reasonable to assume the peer retransmitted the data anyway.

@kazuho
Copy link
Member Author

kazuho commented Jun 14, 2018

@ianswett

I'm going to reiterate: I don't think duplicate detection is a hard problem. You have an IntervalSet that is used to generate an ack.

That is true, however I do not think that the fact that we can have a bullet-proof duplicate detection means that we should require every QUIC stack to have it, or that the fact justifies the encoding proposed in #1372.

Using the ACK frame (or ACK_ECN frame) to notify which PN had the CE flag set (or other bits of ECN) will simplify the sender implementation, because you can use the signal to determine certain things that the encoding proposed in #1372 cannot (or hard to) tell; e.g., when 1-RTT passes since you've seen the last CE, if there was a consecutive loss of a ECN packet (i.e. blackhole detection).

On the receiver side, I might argue that reapplying the frames without detecting a duplicate could also be a valid approach, considering the fact that we cannot decide to skip decrypting the payload just by checking if the decrypted PN collides with a previously seen PN (see QUIC-TLS draft section 10.3). Using the ACK frame that carries PNs with their ECN flags attached allows this kind of approach.

To summarize, I still think that we should consider one of the following encodings, assuming that the size of the ACK (or ACK_ECN) frame will be acceptable, if not smaller.

@mikkelfj
Copy link
Contributor

A 10Gbps link with 1500 byte packets would have 10**9 / 1500 / 10 bytes in flight if RTT is 100ms which equals 2/3 * 100,000. If each duplicate entry is 32 bytes in a RB tree (left, right, color, PN) as is a common representation, a single connection consumes 32 * 2/3 * 100,000 = 2MB.

Even if range maps are more efficient, they do not handle malicious gaps and they actually consume more space in that case. Other data strutures are more effective, but also more complex and less likely to be implemented.

Granted, it is limited how many 10Gbps connections you can maintain concurrently with current tech, so 2MB might not be that bad but why place such a constraint if it can be avoided?

@ad-l
Copy link
Contributor

ad-l commented Jun 14, 2018

Throwing away a packet after decrypting its packet number is quite clearly forbidden by the spec.

An attacker can guess values for packet numbers and have an endpoint
confirm guesses through timing side channels. If the recipient of a
packet discards packets with duplicate packet numbers without
attempting to remove packet protection they could reveal through
timing side-channels that the packet number matches a received
packet. For authentication to be free from side-channels, the entire
process of packet number protection removal, packet number recovery,
and packet protection removal MUST be applied together without timing
and other side-channels.

This text is there for a reason - packet number encryption (which is already very brittle) becomes mostly useless if you do this. In any case, @kazuho ECN proposal does not require packet deduplication from what I understand.

@ianswett
Copy link
Contributor

@mikkelfj I'm ok with 2MB per connection given I'd be spending a lot more than that on send/retransmit buffer. And if I wanted to limit state for malicious attackers, I can just limit the number of ranges and the only cost is some very old packets(ie: 1000 ranges means over 2000 packets ago) will not be decryptable, which seems fine.

@kazuho
Copy link
Member Author

kazuho commented Jun 14, 2018

@ad-l That's true. OTOH, I assume that we do not disallow throwing the packet away without applying the frames after decrypting the payload when observing a PN collision?

@mikkelfj
Copy link
Contributor

mikkelfj commented Jun 14, 2018

2000 packets is then 3% of the inflight data.

EDIT: but who is counting - it is the attacker that gets to retransmit.

@gloinul
Copy link
Contributor

gloinul commented Jun 15, 2018

@kazuho

So the receiver at the point of sending an ACK has received PN=2,3,4 where 4 has a CE mark. Then the ACK will say: Largest PN=4, length=3, CE vector = 001.

But then arrives PN=4, 5, 5 before it is time to send the next ACK. So the duplicate of PN=4 has ECN=ECT(0), and first 5 has ECT(0) and second ECN-CE set. Thus the ACK will be:
Largest PN=5, length 4, CE vector 0001?
#1439 (comment) by @gloinul

I'd assume that the second ACK will contain CE vector of 0010.

So the receiver would thus indicate that for PN=4 you still have a CE mark, and report only on the first PN=5 packet that it is not marked. And then send an additional ACK with a CE vector of 0011?

Sorry, I find your duplication handling to be very inconsistent. It appears to do suppression in some cases, and reporting others. I don't quite see what rules you have. Could you please clarify what rules the receiver uses and what the reporting and ACKing rules truly are?

I had missed the side channel attack, if the protection against this is to spend cycles on verifying the packet, so be it then. I still don't see why throwing it away directly after that is not simply the easiest option. Handling ACKs that are duplicate, sure due to the tracking of "newly" acked should yield 0 new ACKed packets and thus no impact on the congestion window state. The data is supposed to be the same, so if you copy it to the same buffer the result should be the same, except what happens is that receiver buffer is already consumed by the application? Which I guess means tracking and knowing that the buffer is not any longer available. I think this shows that although it is possible to handle duplicated packets on a frame type basis, one still need to consider what checks against existing state one has to do to avoid affecting that state again.

When it comes to the requirement on packet duplication handling in QUIC, from my perspective the ECN part is only a minor part of this. I think it would be good to settle that, and that has its own issue #1405. ECN can be made to cope with counting packet duplicates the things needed are:

  • Avoiding the duplicates causing slack in the verification mechanisms. For a counter mechanism this works as long as one also know how many duplicates has been counted. This is either its own counter or another mechanism in the ACK indicating which PNs has been received multiple times.
  • Being able to determine in which order different markings has been received. This to combat the attacker on the side that receives copies of stream packets and then manipulates the packets and forwards them.

So the requirement on the encoding for ECN will be dependent if duplicated packets are processed or not. However, it is something that can be dealt with, but it will affect the required overhead.

@mikkelfj
Copy link
Contributor

The data is supposed to be the same, so if you copy it to the same buffer the result should be the same, except what happens is that receiver buffer is already consumed by the application? Which I guess means tracking and knowing that the buffer is not any longer available. I think this shows that although it is possible to handle duplicated packets on a frame type basis, one still need to consider what checks against existing state one has to do to avoid affecting that state again.

Same content in the receive buffer of a stream happens frequently with retransmissions in new packets so duplicate packets are just a special case of this. Therefore frame based deduplication is a must have and does not get any easier with duplicate packet filtering.

@kazuho
Copy link
Member Author

kazuho commented Jun 15, 2018

@gloinul

Sorry, I find your duplication handling to be very inconsistent. It appears to do suppression in some cases, and reporting others. I don't quite see what rules you have. Could you please clarify what rules the receiver uses and what the reporting and ACKing rules truly are?

I would argue that what truly matters is if sender recognizes a congestion event. Not what is sent in ACK.

By using ACK frames to carry the PNs along with their CE bits, it is possible for the sender to find and reject duplicate signals. The information that reaches the sender first is adopted (in contrast to the approach of #1372 that adopts the information that reaches the receiver first). That is the principle.

But note that the encodings proposed on this issue does not forbid receivers from detecting and rejecting duplicates the same way that is required in #1372. It is just that such behavior becomes optional.

When it comes to the requirement on packet duplication handling in QUIC, from my perspective the ECN part is only a minor part of this. I think it would be good to settle that, and that has its own issue #1405.

I disagree.

My understanding is that the encoding introduced in #1372 is the only thing that requires the receiver to implement duplication detection. Part of my motivation for proposing an alternative encoding is to remove that requirement, thereby closing #1405 or relaxing the requirements that will be introduced there.

@kazuho
Copy link
Member Author

kazuho commented Jun 15, 2018

@martinthomson

If we were going this way, maybe it needs a better encoding than bit-stuffing a varint. Not sure what that looks like though.

An alternative approach will be to use something similar to the 6+ encoding of HPACK / QPACK, like below:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
|M M|  Length   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    [Additional Length (i)]                  ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

In this example,

  • next mode is represented as two bits (M), using the formula proposed in Improve ACK_ECN frame encoding (e.g., use bit-vector) #1439 (comment).
  • block length between 1 to 63 is directly encoded using the 6 bits of the Length field. In that case, the Additional Length field is omitted.
  • if the block length is greater than 63, then Length field is encoded as zero, and block_length - 64 is encoded as a varint in the Additional Length field.

There will be no additional overhead on a non-ECN-capable path for short blocks (blocks below 64), and one octet overhead for longer blocks. And the encoding overhead of ECN flags will be exactly the same as what we will have with gaps.

If we consider mode depending on the previous mode is an issue, we could split the first octet into 5*51 instead of 4*64.

@gloinul
Copy link
Contributor

gloinul commented Jun 18, 2018

@kazuho

By using ACK frames to carry the PNs along with their CE bits, it is possible for the sender to find and reject duplicate signals. The information that reaches the sender first is adopted (in contrast to the approach of #1372 that adopts the information that reaches the receiver first). That is the principle.

Okay, if that is the general principle to make this work, you can't only provide CE to PN mappings. You need to indicate which packets get ECT or not-ECT marks also. i.e. you need to provide the ECN field value for all the packets included in the ACK. Then the other mechanisms needed for ECN than congestion event detection also will work.

Yes, as long as we have any counters in the ECN encoding, the receiver will need to either suppress the duplicates from being counted, or include a counter for how many duplicate packets it has included in the counters for the various ECN field values received.

So in regards to duplicated packets, what is your policy for handling that truly when a receiver don't suppress the duplicate prior to ECN processing? [Additional required for counter handling when have CE vector and counters for ECT]
A) The receiver reports what was first received for a PN, i.e. a duplicate is not allowed to overwrite any already stored ECN field value. [Any duplicate ECN field value is not added to counters]
B) The receiver reports what is stored in receiver when generating the ACK, i.e. the latest received information. Thus duplicates may overwrite ECN field values of the first, if ACK hasn't been sent yet. [Added to counter in the interest of simplicity]
C) Reception of an ECN field value that is different from what already exists forces an ACK, then the field value is overwritten and reported in the next ACK packet. [Counter values added for all packets]
D) The ACK reports the ECN Field values explicit for first received and then any duplicates making it clear that this is multiple received packets for the PN? [Using counters this require a counter for duplicates]

My comment on these policies are the following:
A) Works and is an ECN level suppression of duplicate packets
B) Sensitive to on-the-side ECN field value attacks. [Duplication causes slack in detection]
C) Forces a more complex handling of packets and triggering of ACKs and in cases of ACK loss an on-the-side attack may get its information in. [Duplication causes slack in detection when subject to ACK loss]
D) Full capability for sender to interpret information, more complex encoding the ECN information to enable multiple copies of a PN to be correctly represented. [Works when counting duplicates]

So beyond the primary purpose of ECN to detect congestion events, the mechanism needs to have capability also to:

  • Detect working paths
  • Detect bleaching in mid-connection
  • Detect working path after connection migration
  • Mitigate attacks, at least off-path or on-side attacks. On path attackers can cause the same response as well as additional issues by dropping packets for a connection.

I hope this clarifies my view why I don't think CE vectors are sufficient. For a solution that doesn't suppress duplicates prior to ECN processing, one either need explicit counting of number of duplicates if using counters, or a per packet information that that includes both bits of the ECN field as well as indicating any duplicates not suppressed by the receiver. I think a efficiency comparison both in regards to size of ACKs as well as implementation complexity between these two are fine and should be done.

@kazuho
Copy link
Member Author

kazuho commented Jun 18, 2018

@gloinul

Okay, if that is the general principle to make this work, you can't only provide CE to PN mappings. You need to indicate which packets get ECT or not-ECT marks also. i.e. you need to provide the ECN field value for all the packets included in the ACK. Then the other mechanisms needed for ECN than congestion event detection also will work.

That's a fair point. And I think that the two approaches being discussed in this issue are capable of carrying the necessary signal.

In the bit-vector approach, assuming that only the distinction between three types (i.e. non-ECT, ECT, ECT-CE) is necessary, a receiver can send an ACK frame (instead of an ACK_ECN frame) when it receives a packet (that carries a not-yet-processed PN) with a non-ECT mark. ACK_ECN frame that carries the bit vector (that uses 1-bit per PN showing the distinction between ECT and ECT-CE) will be used only when all the packets that the receiver has received with one of the ECT flags set are being acknowledged.

In the ACK-block-with-mode approach (1, 2, 3), all 4 types (ECT, ECT(0), ECT(1), ECT-CE) are carried in the ACK frame for each PN.

So in regards to duplicated packets, what is your policy for handling that truly when a receiver don't suppress the duplicate prior to ECN processing?

The "sender" of the packet uses the ECN information that was first reported for the PN that it has sent, which is provided by ACK (or ACK_ECN) frame that is sent back from the receiver.

This is very simple to implement, because a sender iterates though the PNs found in an ACK frame to see if each of them has been newly reported, to process acknowledgements.

So beyond the primary purpose of ECN to detect congestion events, the mechanism needs to have capability also to:

  • Detect working paths

Assuming that you are referring to ECN blackhole, this is something that can be detected in either of the two schemes proposed in this PR. Note that you can detect a blackhole in any ECN scheme, because not seeing an ACK for a packet that was sent with a ECN flag is the signal we will be looking for.

  • Detect bleaching in mid-connection

This can be done in either of the two schemes proposed in this PR. Please refer to the discussion on top of this comment.

  • Detect working path after connection migration

This would be indifferent from the process we would run for the initial path.

  • Mitigate attacks, at least off-path or on-side attacks. On path attackers can cause the same response as well as additional issues by dropping packets for a connection.

I agree that we do not need to care about on-path attacks. My understanding is that for off-path and on-side attacks, detecting duplicates on the "sender" side is sufficient.

Besides, note that I am not opposed to implementations detecting duplicates on the receiver-side. In fact, we can expect that even a naive QUIC stack, would delay sending an ACK. And if such stack receives a packet that carries the same PN as one that is found in the ACK queue, it can discard the newly received packet along with the ECN information.

Let me restate that transmitting PNs along with their ECN bits has benefits. It improves the quality of the signal at the same time simplifying the sender code (because the sender can accurately tell which packet carried which ECN flag), and also has the chance of making the ACK frame encoding smaller / simpler.

@gloinul
Copy link
Contributor

gloinul commented Jun 19, 2018

@kazuho Yes, now we are getting to proposals that appear to be working from my perspective. I think the one bit proposal is actually quite interesting, but I think there need to be feedback from people who better understand L4S and any other proposals for using ECT(1).

Why I say I think it is workable is that if the assumption holds that a packet sender will mark with only one of the ECT codepoints. And the use of ECT(1) and the response function is negotiated, likely in Transport Parameters then we don't usually need to care about which ECT codepoint it was. The only thing that likely should be there is that if a receiver sees a packet being remarked from the assumed ECT(1) to ECT(0) for example then it could report that remarking using the ACK frame, i.e. make it equivalent to a bleaching to Not-ECT to indicate that change and thus likely result in ECN being turned off due to the bleaching occurring.

However, I would really prefer if we can get #1372 landed first, then draft a detailed proposal for the change, so that anyone can clearly evaluate the pro and cons of the change.

@kazuho
Copy link
Member Author

kazuho commented Jun 20, 2018

@gloinul Thank you for your comments and your efforts on this issue. Using TP to indicate the use of ECT(1) sounds like a novel approach.

However, I would really prefer if we can get #1372 landed first, then draft a detailed proposal for the change, so that anyone can clearly evaluate the pro and cons of the change.

Agreed.

@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -transport labels Jul 11, 2018
@martinthomson
Copy link
Member

Coming back to this (because I want this closed, and the current situation is annoying me a little).

In general, there are three common things that we might want to signal:

  1. a packet was received and it had the default marking (whatever the expectations are regarding that)
  2. a packet was not received (a gap)
  3. a packet was received with an ECN-CE marking

Receiving a packet with a non-default marking (either no marking at all, or ECT(1) if ECT(0) is expected) is an error condition and one that we don't necessarily want to optimize for.

So here is my suggestion, stealing the design @kazuho proposed, which includes bit-packing varints (something I'm not super-happy about, but the alternatives all seem to be worse, so I'm holding my nose a little).

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Largest Acknowledged (i)                ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          ACK Delay (i)                      ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Marking(8)  |        ACK Block Count (i)                  ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           ACK (i)                          ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          [ACK (i) ...]                     ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Largest Acknowledged and ACK Delay are the same as always.

The Marking field takes three values that describes what the "default marking" is for packets that are acknowledged in this frame. It takes three values: 0 means that it's ECT(0), 1 means ECT(1), and 2 means that there is no marking. This means that if you get non-default markings, or you lose markings, then you might need to send multiple ACK frames. (This is just for the sake of an example; I imagine that it would be better to define three types of ACK frame here to save the octet.)

The ACK Block Count indicates how many additional ACK blocks are signaled past the first.

Each ACK is a single varint. A type is encoded into the lowest bit of the integer value. Recovering type is stateful. The observation is that though there are three types, the same type can't be repeated. So, starting with the assumption that things start with a type of "gap", you can cover all possible angles.

Thus, a frame can be decoded using the following pseudocode:

largest_acknowledged = read_varint()
ack_delay = read_varint()
default_marking = read_octet()
ack_count = read_varint()

ack_type = 0
default_types = [ 'ECT(0)', 'ECT(1)', 'unmarked' ]
ack_types  = ['gap', default_types[default_marking], 'marked']
start = largest_acknowledged
for i in range(0, ack_count+1):
   ack = read_varint()
   ack_type =  (ack_type + 1 + (ack&1)) % 3
   end = start + 1 + (ack>>1)
   print(f'{ack_types[ack_type]} from {start} to {end}\n')
   start = end + 1

The consequences of a change like this that I can see:

  • It is much less efficient to signal changes in default marking (a design goal, see above)
  • Markings are expressed perfectly (no counts and the associated problems with overcounting we have today)
  • A gap can be signaled at the end of a block (we might want to forbid this explicitly: I can't see any point in allowing it)
  • There is just one ACK frame format, even if it takes three frame type codepoints to express

Yes, the processing is a tiny bit more complicated, but I think that these changes represent a net win.

@mikkelfj
Copy link
Contributor

mikkelfj commented Aug 1, 2018

I like the idea of avoiding counters, but depending on how packet history is handled, couldn't this lead to more overhead?

Consider a sliding bitmap tracking all packets that are not yet ACK'ed. If ECN state is explicit, more bits might be required in the bitmap - or they could just be dropped - but I am not sufficiently into ECN processing to tell.

@kazuho
Copy link
Member Author

kazuho commented Aug 1, 2018

@martinthomson That is an interesting design.

I can understand the motivation for having more bits to represent the size of an ACK block using just one octet. OTOH, the complexity of the design makes me anxious.

The issue with the design is that, if there is an exception (i.e. packets received with more than two types of flags) you need to either iterate though the PN list more than once or to generate multiple ACK frames concurrently while iterating though the PN list.

Honestly speaking, I think that the multi-modal block approach that uses 2 bits per block (the one you suggested in
#1439 (comment)) is dense enough. It would mean that we can pack up to 16 packet numbers into a single-octet ACK block. I do not think that spending a second octet for the 17th packet number is an issue, because it means spending at worst 0.5 bit more for every packet that you receive.

If we need to care that much about the bandwidth consumed by ACKs, I think we should rather consider how to reduce the frequency of sending acknowledgements, considering the fact that we have at least 18-octet overhead for every packet that we send.

@gloinul
Copy link
Contributor

gloinul commented Aug 22, 2018

I spent some time analyzing the overhead impact of these two proposals and compared them against the original. In this analysis I have looked at cost of encoding gaps/loss and ECN-CE marks for a varying number of packets included in a single acking. As loss gaps as well as CE marks are expected to result in immediate acking the most realistic cases are that a single ACK even with one or two RTTs of overlap actually will not contain more than one or two loss or CE mark sequences. Thus I focused on these. But I also looked at the cost if one gets a lot of transitions back and forth. I also would like to note that this is focused on the case with working ECN so no strange behaviors.

The overhead for what is currently in -14:

Loss/gap CE Packets per ACK              
1 2 5 10 21 56 112 319 2000 8000
0 0 19 19 19 19 19 19 21 21 21 21
0 1 19 19 19 19 19 19 21 21 21 21
1 0 - 21 21 21 21 21 21 25 25 25
1 1 - 21 21 21 21 21 21 25 25 25
2 1 - 23 23 23 23 23 23 29 29 29
2 2 - 23 23 23 23 23 23 29 29 29
0 2 - 19 19 19 19 19 21 21 21 21
0 All 19 19 19 19 19 19 21 21 21 21
5 15 -   - 29 29 29 29 29 41 41
20 50 -       59 59 59 59 59 101
100 300             219 219 219 219

For Martins proposal above but encoding marking in the frame type:

Loss / GAP CE Packets per ACK                
    1 2 5 10 21 56 112 319 2000 8000
0 0 7 7 7 7 7 7 8 8 8 8
0 1 14 14 14 16 16 16 22 22 22 22
1 0   9 9 9 9 9 13 13 13 13
1 1   18 18 18 18 18 20 26 26 26
2 1   20 20 20 20 20 22 30 30 30
2 2   24 24 24 24 24 24 38 38 38
0 2   7 20 20 20 20 20 30 30 30
0 All 7 7 7 7 7 7 8 8 8 8
5 15         82 82 82 82 154 154
20 50           252 252 252 252 494
100 300               1412 1412 1412

For the single octet blocks with 2-bit marking type:

Loss / GAP CE Packets per ACK                
    1 2 5 10 21 56 112 319 2000 8000
0 0 7 7 7 7 7 7 8 11 38 131
0 1 9 9 9 9 9 9 9 12 39 132
1 0 9 9 9 9 9 9 9 12 39 132
1 1 11 11 11 11 11 11 11 11 41 131
2 1 13 13 13 13 13 13 13 13 41 132
2 2 15 15 15 15 15 15 15 15 42 132
0 2 11 11 11 11 11 11 11 11 41 131
0 All 7 7 7 7 7 7 8 11 38 131
5 15         47 47 47 47 47 170
20 50           148 148 148 148 148
100 300                 808 808

The conclusions I can draw that is that Martin's scheme for one or two changes in the range being acked are basically equivalent to the current scheme while providing more explicit information about which packets being marked. The single byte ack block outperforms both the current and martins proposal for low rate of changes and for ack vector lengths up to a few hundred packets. After that the ACK vector growth becomes larger due to the one byte per 64 packets. However, this growth is low enough that a single minimal MTU (1280 bytes) will be able to ack a range that is more than 70000 packets. Compare this to the Bandwidth delay product to packet rate table above for perspective.

For cases with high frequency of alternating marking the one byte scheme will be more efficient. This because if one have mix of packet losses and CE marks (likely under severe congestion) then one ends up encoding both CE marks and losses as gaps in the ECT ack vector. Thus martins proposal will have a higher total number of transitions needed to be encoded than the one byte proposal. I will note that calculation performed is the worst case when it comes to transitions with them evenly spread between arriving packets both gaps and CE marks.

There is also a difference when it comes to performing verification of the ECN between these solutions. So Martin's proposal require for different markings or frame types to express if a packet arrived with Not-Ect, ECT(0), ECT(1), and ECN-CE. The 1-byte block can only do marking in two bits by employing information loss for a non-expected case. The non-default ECT marking will have to be reported in the Not-ECT type to indicate this non-expected behavior. And if anyone like to use this for a QUIC connection that uses both ECT(0) and ECT(1) it will not be possible to carry through, even if one could define a transport parameter to say that there is no default ECT and both shall be reported as ECT.

In the end I think this comes down to the question of how much processing overhead the schemes incur versus the differences there actually are between them. I personally need to think a bit more about that.

@gloinul
Copy link
Contributor

gloinul commented Aug 24, 2018

After having read Martin's updated proposal on the mailing list I wanted to post overhead figures for that encoding also. So this is the case of using 2 bits per block to encode the change delta value of the type so that all 5 types can be used, and then have a variant of the variable length encoding but with 2 less usable bits for values.

Loss / GAP CE Packets per ACK                
    1 2 5 10 21 56 112 319 2000 8000
0 0 7 7 7 7 8 8 8 8 8 9
0 1 9 9 9 9 9 12 12 12 12 12
1 0 9 9 9 9 9 12 12 12 12 12
1 1 11 11 11 11 11 11 16 16 16 16
2 1 13 13 13 13 13 13 13 20 20 20
2 2 15 15 15 15 15 15 15 24 24 24
0 2 11 11 11 11 11 11 16 16 16 16
0 All 7 7 7 7 7 7 8 8 8 9
5 15         47 47 47 47 88 88
20 50             148 148 148 289
100 300                 808 808

As I see it this has the best properties when it comes to overhead. More efficient for small ack vectors, efficient encoding for larger ranges. Yes the additional byte comes at 16 rather than 64 consecutive packets but the the other choices are worse from my perspective.

martinthomson added a commit that referenced this issue Aug 27, 2018
This takes the design discussed on the list and enacts it.  This turns
out to be quite simple.  Lots of red in this change.

This treats ECT(1) as equivalent to unmarked, meaning that QUIC won't be
able to use ECT(1) without an extension or revision.  That keeps the
encoding simpler.

Closes #1439.
@martinthomson
Copy link
Member

Discussed in NYC and we decided that the failings of the existing scheme are the least worst option.

@mnot mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Mar 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.
Projects
None yet
Development

No branches or pull requests

8 participants