Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEW_CONNECTION_ID text is too restrictive on receivers #3534

Closed
martinduke opened this issue Mar 19, 2020 · 10 comments · Fixed by #3535
Closed

NEW_CONNECTION_ID text is too restrictive on receivers #3534

martinduke opened this issue Mar 19, 2020 · 10 comments · Fixed by #3535
Labels
-transport editorial An issue that does not affect the design of the protocol; does not require consensus.

Comments

@martinduke
Copy link
Contributor

This is a small point, but it's pointlessly creating pain in our implementation and possibly some others. In Sec 19.15 (NEW_CONNECTION_ID frame):

An endpoint that receives a NEW_CONNECTION_ID frame with a sequence number smaller than the Retire Prior To field of a previously received NEW_CONNECTION_ID frame MUST immediately send a corresponding RETIRE_CONNECTION_ID frame that retires the newly received connection ID.

IMO this is poorly formulated. If the receiver has already sent an RCID frame for the sequence number and that RCID has been acked, it may have thrown away state for that sequence number and shouldn't have to resend a frame the peer has received. In particular, a buggy or malicious peer might send a very old sequence number, or the NCID receiver might have preemptively retired a sequence number it hadn't seen when it got the retire-prior-to field.

Simply appending 'if has not already done so' to the text above would do the trick I think. I'm happy to file a PR unless most people think this is a terrible idea.

@martinduke
Copy link
Contributor Author

As a related point, it would be useful to say that an NCID receiver SHOULD NOT immediately retire old CIDs solely as the result of the NCID frame, except when instructed by Retire-Prior-to. This can lead to infinite loops if the peer tries to keep a specific number of CIDs available.

@kazuho
Copy link
Member

kazuho commented Mar 19, 2020

Simply appending 'if has not already done so' to the text above would do the trick I think.

I think this would be a nice clarification.

As a related point, it would be useful to say that an NCID receiver SHOULD NOT immediately retire old CIDs solely as the result of the NCID frame, except when instructed by Retire-Prior-to. This can lead to infinite loops if the peer tries to keep a specific number of CIDs available.

I think we already prohibit such behavior through the use of active_connection_id_limit TP. Section 5.1.1 states: endpoints store received connection IDs for future use and advertise the number of connection IDs they are willing to store with the active_connection_id_limit transport parameter.

To paraphrase, by using the TP, an endpoint willing to retain N active connection IDs prohibits the peer from sending more than N.

@martinduke
Copy link
Contributor Author

I think we already prohibit such behavior through the use of active_connection_id_limit TP.

I was probably unclear. For a while, our implementation was doing something dumb: each time it got a new CID, it started using it and retired all previous ones regardless of Retire Prior To. This created infinite loops where the client was trying to make sure that we always had two CIDs. We should probably warn against that.

Please see if the text in the PR is clear enough.

@erickinnear
Copy link
Contributor

Taking a look at the PR, I'm finding the proposed text to be a bit unclear, too.

Based on the discussion on the list, this seems to be a misreading of the text itself.
From the list:

I MUST send RCID for everything below retire-prior-to, whether or not I've seen the NCID for it.

I don't believe that this is actually what the text says. The text says:

An endpoint that receives a NEW_CONNECTION_ID frame with a sequence number 
smaller than the Retire Prior To field of a previously received NEW_CONNECTION_ID 
frame MUST immediately send a corresponding RETIRE_CONNECTION_ID frame 
that retires the newly received connection ID. 

This to me reads as "If you said everything below CID 10 is done, and then you send me CID 8, I must immediately send a RCID frame for 8 since I can't use it".

CID handling that doesn't require potentially large amounts of state to keep track of long-retired CIDs

If the receiver has already sent an RCID frame for the sequence number and that RCID has been acked, it may have thrown away state for that sequence number and shouldn't have to resend a frame the peer has received. In particular, a buggy or malicious peer might send a very old sequence number, or the NCID receiver might have preemptively retired a sequence number it hadn't seen when it got the retire-prior-to field.

I don't believe that there is any CID-specific state required at all for this. You do need to store the current highest value that you've ever seen for RPT, which is a single integer, I'll call it currentRPT.

Beyond that, when you get a NCID frame, you check the sequence number. If it's less than currentRPT, you should immediately enqueue (which I think might be a reasonable wording change to make, since immediately seems to have caused concern when the intent was that you put it on the tracks leaving the station, not that you force the train out the door before it's allowed) a RCID frame for that sequence number.

Note that a RCID frame contains a single value, the sequence number of the connection ID being retired, which incidentally is contained by the NCID frame you're responding to.

We can argue as to whether or not you should bother to send RCID again when the same frame has been acked, but (a) storing that is actually more state and (b) we're working pretty hard to avoid tracking ACKs of packets.

I might be missing something here -- can you help me understand the concern about storing state in order to meet the requirements of this text?

@martinduke
Copy link
Contributor Author

Yes, I concede that the original email was garbled. I read the original text the same way you do.

The issue is that if I decide to retire immediately without actually having the sequence number, that should be sufficient. Making me send it again after the NCID is gratuitous, and in at least one implementation creates state solely to track this corner case.

@erickinnear
Copy link
Contributor

Ah cool, that's excellent then! :)

What's the state that gets created to track that case? If you retired them all upon getting RPT without checking to see if you had them yet, don't you still have to keep track of the RPT value or which ones you've already retired? I'm probably missing some other piece of state in this case...

@kazuho
Copy link
Member

kazuho commented Mar 19, 2020

I think what an endpoint is required to track are:

  • active connection IDs
  • sequence number of connection IDs that are being retired (but are yet to be acknowledged)

When receiving a NCID frame with RPT, an endpoint checks if there are any active CIDs below the RPT, and retires them. I do not think an endpoint has to track RPT, though admittedly, I haven't implement them (yet).

@martinduke
Copy link
Contributor Author

martinduke commented Mar 20, 2020

@kazuho what you suggest would certainly work. However, a malicious peer could send an infinite number of NCIDs as long as it kept advancing rpt and didn't ack RCIDs, spiraling the state upwards.

To avoid this, we keep a window of unacked but retired sequence numbers and construct RCID from that. That's why having an NCID come in well below the window results in additional state for us.

@kazuho
Copy link
Member

kazuho commented Mar 20, 2020

@martinduke Exactly. We have an issue for that problem: #3509. Agreeing on what to do to resolve the unbounded state problem might help resolving this issue too.

@martinthomson martinthomson added -transport editorial An issue that does not affect the design of the protocol; does not require consensus. labels Mar 20, 2020
@martinthomson
Copy link
Member

Taking this as editorial based on the proposal (and promised tweaks).

@larseggert larseggert added this to Triage in Late Stage Processing via automation Mar 20, 2020
@larseggert larseggert moved this from Triage to Editorial Issues in Late Stage Processing Mar 20, 2020
Late Stage Processing automation moved this from Editorial Issues to Text Incorporated Mar 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-transport editorial An issue that does not affect the design of the protocol; does not require consensus.
Projects
Late Stage Processing
  
Issue Handled
Development

Successfully merging a pull request may close this issue.

4 participants