Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fragility in ECN counters with lost acks #1481

Closed
janaiyengar opened this issue Jun 26, 2018 · 6 comments
Closed

Fragility in ECN counters with lost acks #1481

janaiyengar opened this issue Jun 26, 2018 · 6 comments
Labels
-recovery -transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@janaiyengar
Copy link
Contributor

janaiyengar commented Jun 26, 2018

The text for ECN says: "The increase in ECT(0) and ECT(1) counters MUST be no greater than the number of packets newly acknowledged that were sent with the corresponding codepoint."

As @gloinul observed in his PR and @martinthomson noted in his review, this can cause ECN to be disabled when acks are lost. A sender may never receive an ack for a packet that was in fact acknowledged, and might see an increase in ECN counters without a corresponding packet being newly acked. This should be addressed.

@martinthomson writes: "Say a marked packet gets through and gets acknowledged, but the acknowledgment is lost. And the recipient abandons the ACK. The counters will increase, but the number of newly acknowledged packets will be lower than the increase to the counters. That suggests a tweak, though it complicates things a little:

If the acknowledgments cover a range of packet numbers that extend past (or adjacent to) the largest acknowledged packet number that was previously acknowledged, then the counters must match precisely. If the acknowledgments do not extend to the largest previously acknowledged packet number, then the numbers MAY increase by the number of unacknowledged packets between the previously largest acknowledged and the newly lowest acknowledged.

That is, if an endpoint has seen an ACK_ECN for packet 10, then receives a new ACK_ECN for packet 20-30, it needs to allow for an increase in the counters for packets 11 through 19 (inclusive). It can then reset both the largest acknowledged and the values it tracks for each counter."

@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -transport -recovery labels Jun 26, 2018
@gloinul
Copy link
Contributor

gloinul commented Jun 27, 2018

I think the general approach of handling this is correct. I don't think the MAY in the fourth paragraph is appropriate. The counters can increase due to their markings, not something that is specified by a RFC 2119 MAY.

"It can then reset both the largest acknowledged and the values it tracks for each counter." I don't think reset is the appropriate word for something that is in practice baseline offsets to the comparison.

@janaiyengar
Copy link
Contributor Author

I don't think this is adequate. @ianswett pointed out that a sender may choose to not retain state about a packet that is not yet acknowledged because, for example, the contents were retransmitted in a new packet. When an ack for such a "forgotten" packet arrives, the endpoint won't know it to be an as yet unacknowledged packet, since there's no state for the packet. At the same time, the ECN counters would have increased.

@martinthomson
Copy link
Member

Right now the text requires an exact match - the number of markings on newly acknowledged packets need to exactly match the increase in counters. If you forgot a packet (and so don't know if it's newly acknowledged), then add one to the upper limit of the number. If there is a gap between the lowest acknowledged packet number in this ACK and the largest packet number previously acknowledged, increase the upper limit by the same amount.

@gloinul
Copy link
Contributor

gloinul commented Jul 3, 2018

I agree with @martinthomson here. Assuming that ACK losses resulting in that the sender don't know what happened to some PNs at all, then adding slack corresponding to the unknown packets ensures that ECN usage does not terminate due to the loss of the ACKs. And by updating the comparison baseline on such an event one can correctly detect issues going forward. And I think implementations will want to ensure that that ACK loss that results that no indication of reception for a particular PN is rare as that would be treated as a loss, even if the packet was received.

Regarding the sender memory of what it has seen ACKs for and what not. If the sender has chosen to forget about if a PN was received or not, then it has passed beyond some Horizon. And if the packets data has been retransmitted, the sender have received some indication that the packet was lost. Which means it is likely not unaccounted for. We have to assume that the sender has a deep enough history of what it sent and has outstanding. And if the sender anyway are declaring packet loss and going into recovery, then potentially having missed some ECN-CE does not matter. Resetting the ECN counter state is thus safe in this case.

@martinthomson
Copy link
Member

I realize that my example was flawed. You have to allow for an increase for any unacknowledged packets less than the minimum acknowledged packet number in the ACK. In the example above, the last ACK stopped at 10 and the new one starts at 20. The naive interpretation says that 9 packets might have arrived, been acknowledged, and had the acknowledgment lost and subsequently abandoned. However, packet 8 might have arrived after 10 was acknowledged.

The other thing is that this assumes that if you acknowledge packet 20, then you would also acknowledge packets with higher numbers. This is probably a safe assumption, but we would have to require it.

@gloinul
Copy link
Contributor

gloinul commented Jul 12, 2018

I have drafted an initial text to resolve this. See PR #1555. I took the simpler way out that if one detects ACK loss one simple skip to do the counter comparisons. That should be less error prone and is something that should happen very seldom.

@mnot mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Mar 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-recovery -transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.
Projects
None yet
Development

No branches or pull requests

4 participants