NEW_CONNECTION_ID text is too restrictive on receivers #3534

martinduke · 2020-03-19T19:48:02Z

This is a small point, but it's pointlessly creating pain in our implementation and possibly some others. In Sec 19.15 (NEW_CONNECTION_ID frame):

An endpoint that receives a NEW_CONNECTION_ID frame with a sequence number smaller than the Retire Prior To field of a previously received NEW_CONNECTION_ID frame MUST immediately send a corresponding RETIRE_CONNECTION_ID frame that retires the newly received connection ID.

IMO this is poorly formulated. If the receiver has already sent an RCID frame for the sequence number and that RCID has been acked, it may have thrown away state for that sequence number and shouldn't have to resend a frame the peer has received. In particular, a buggy or malicious peer might send a very old sequence number, or the NCID receiver might have preemptively retired a sequence number it hadn't seen when it got the retire-prior-to field.

Simply appending 'if has not already done so' to the text above would do the trick I think. I'm happy to file a PR unless most people think this is a terrible idea.

The text was updated successfully, but these errors were encountered:

martinduke · 2020-03-19T19:51:47Z

As a related point, it would be useful to say that an NCID receiver SHOULD NOT immediately retire old CIDs solely as the result of the NCID frame, except when instructed by Retire-Prior-to. This can lead to infinite loops if the peer tries to keep a specific number of CIDs available.

kazuho · 2020-03-19T23:07:49Z

Simply appending 'if has not already done so' to the text above would do the trick I think.

I think this would be a nice clarification.

As a related point, it would be useful to say that an NCID receiver SHOULD NOT immediately retire old CIDs solely as the result of the NCID frame, except when instructed by Retire-Prior-to. This can lead to infinite loops if the peer tries to keep a specific number of CIDs available.

I think we already prohibit such behavior through the use of active_connection_id_limit TP. Section 5.1.1 states: endpoints store received connection IDs for future use and advertise the number of connection IDs they are willing to store with the active_connection_id_limit transport parameter.

To paraphrase, by using the TP, an endpoint willing to retain N active connection IDs prohibits the peer from sending more than N.

martinduke · 2020-03-19T23:12:46Z

I think we already prohibit such behavior through the use of active_connection_id_limit TP.

I was probably unclear. For a while, our implementation was doing something dumb: each time it got a new CID, it started using it and retired all previous ones regardless of Retire Prior To. This created infinite loops where the client was trying to make sure that we always had two CIDs. We should probably warn against that.

Please see if the text in the PR is clear enough.

erickinnear · 2020-03-19T23:44:57Z

Taking a look at the PR, I'm finding the proposed text to be a bit unclear, too.

Based on the discussion on the list, this seems to be a misreading of the text itself.
From the list:

I MUST send RCID for everything below retire-prior-to, whether or not I've seen the NCID for it.

I don't believe that this is actually what the text says. The text says:

An endpoint that receives a NEW_CONNECTION_ID frame with a sequence number 
smaller than the Retire Prior To field of a previously received NEW_CONNECTION_ID 
frame MUST immediately send a corresponding RETIRE_CONNECTION_ID frame 
that retires the newly received connection ID.

This to me reads as "If you said everything below CID 10 is done, and then you send me CID 8, I must immediately send a RCID frame for 8 since I can't use it".

CID handling that doesn't require potentially large amounts of state to keep track of long-retired CIDs

If the receiver has already sent an RCID frame for the sequence number and that RCID has been acked, it may have thrown away state for that sequence number and shouldn't have to resend a frame the peer has received. In particular, a buggy or malicious peer might send a very old sequence number, or the NCID receiver might have preemptively retired a sequence number it hadn't seen when it got the retire-prior-to field.

I don't believe that there is any CID-specific state required at all for this. You do need to store the current highest value that you've ever seen for RPT, which is a single integer, I'll call it currentRPT.

Beyond that, when you get a NCID frame, you check the sequence number. If it's less than currentRPT, you should immediately enqueue (which I think might be a reasonable wording change to make, since immediately seems to have caused concern when the intent was that you put it on the tracks leaving the station, not that you force the train out the door before it's allowed) a RCID frame for that sequence number.

Note that a RCID frame contains a single value, the sequence number of the connection ID being retired, which incidentally is contained by the NCID frame you're responding to.

We can argue as to whether or not you should bother to send RCID again when the same frame has been acked, but (a) storing that is actually more state and (b) we're working pretty hard to avoid tracking ACKs of packets.

I might be missing something here -- can you help me understand the concern about storing state in order to meet the requirements of this text?

martinduke · 2020-03-19T23:49:51Z

Yes, I concede that the original email was garbled. I read the original text the same way you do.

The issue is that if I decide to retire immediately without actually having the sequence number, that should be sufficient. Making me send it again after the NCID is gratuitous, and in at least one implementation creates state solely to track this corner case.

erickinnear · 2020-03-19T23:51:43Z

Ah cool, that's excellent then! :)

What's the state that gets created to track that case? If you retired them all upon getting RPT without checking to see if you had them yet, don't you still have to keep track of the RPT value or which ones you've already retired? I'm probably missing some other piece of state in this case...

kazuho · 2020-03-19T23:59:33Z

I think what an endpoint is required to track are:

active connection IDs
sequence number of connection IDs that are being retired (but are yet to be acknowledged)

When receiving a NCID frame with RPT, an endpoint checks if there are any active CIDs below the RPT, and retires them. I do not think an endpoint has to track RPT, though admittedly, I haven't implement them (yet).

martinduke · 2020-03-20T00:27:43Z

@kazuho what you suggest would certainly work. However, a malicious peer could send an infinite number of NCIDs as long as it kept advancing rpt and didn't ack RCIDs, spiraling the state upwards.

To avoid this, we keep a window of unacked but retired sequence numbers and construct RCID from that. That's why having an NCID come in well below the window results in additional state for us.

kazuho · 2020-03-20T00:31:02Z

@martinduke Exactly. We have an issue for that problem: #3509. Agreeing on what to do to resolve the unbounded state problem might help resolving this issue too.

martinthomson · 2020-03-20T00:46:17Z

Taking this as editorial based on the proposal (and promised tweaks).

martinduke mentioned this issue Mar 19, 2020

Added more text to NCID section #3535

Merged

martinthomson added -transport editorial An issue that does not affect the design of the protocol; does not require consensus. labels Mar 20, 2020

larseggert added this to Triage in Late Stage Processing via automation Mar 20, 2020

larseggert moved this from Triage to Editorial Issues in Late Stage Processing Mar 20, 2020

martinthomson closed this as completed in #3535 Mar 23, 2020

Late Stage Processing automation moved this from Editorial Issues to Text Incorporated Mar 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NEW_CONNECTION_ID text is too restrictive on receivers #3534

NEW_CONNECTION_ID text is too restrictive on receivers #3534

martinduke commented Mar 19, 2020

martinduke commented Mar 19, 2020

kazuho commented Mar 19, 2020 •

edited

martinduke commented Mar 19, 2020

erickinnear commented Mar 19, 2020

martinduke commented Mar 19, 2020

erickinnear commented Mar 19, 2020

kazuho commented Mar 19, 2020

martinduke commented Mar 20, 2020 •

edited

kazuho commented Mar 20, 2020

martinthomson commented Mar 20, 2020

NEW_CONNECTION_ID text is too restrictive on receivers #3534

NEW_CONNECTION_ID text is too restrictive on receivers #3534

Comments

martinduke commented Mar 19, 2020

martinduke commented Mar 19, 2020

kazuho commented Mar 19, 2020 • edited

martinduke commented Mar 19, 2020

erickinnear commented Mar 19, 2020

martinduke commented Mar 19, 2020

erickinnear commented Mar 19, 2020

kazuho commented Mar 19, 2020

martinduke commented Mar 20, 2020 • edited

kazuho commented Mar 20, 2020

martinthomson commented Mar 20, 2020

kazuho commented Mar 19, 2020 •

edited

martinduke commented Mar 20, 2020 •

edited