Timer interval for retire the connection IDs #3215

Closed
gorryfair opened this issue Nov 11, 2019 · 11 comments
Labels
-transport design has-consensus

Comments

Contributor

@gorryfair gorryfair commented Nov 11, 2019

This could be linked to #2130?

/Failing to
retire the connection IDs within approximately one PTO can cause
packets to be delayed, lost, or cause the original endpoint to send a
stateless reset in response to a connection ID it can no longer route
correctly./

  • I am unsure what was intended. I’ll argue this is the wrong period: one PTO interval seems like it could be too small. To me, the PTO is an estimate of the likely response time of the remote endpoint, which makes it an effective period for commencing a retransmission or stimulating a probe/refresh. Sending a stateless reset is not a minor event, and as such it should be tolerant to the MSL, or at least to what in TCP is the RTO. In addition, I don’t have a precise understanding of “route” in this context, and expected something about the path rather than these words.

Contributor

@MikeBishop MikeBishop commented Nov 11, 2019

It's tied up in why you would retire CIDs. Various conditions mean that a CID issuer's infrastructure might no longer be able to honor old CIDs (i.e. recognize them as valid and get them to the correct recipient). However, you usually either know that's coming or can extend the period with some temporary extra state.

The CID issuer will stop accepting the CIDs in question 3 PTO after sending the frame requesting retirement. That 3-PTO period covers transit time to the peer, any time the peer decides to wait before it stops using the old CIDs, the return transit time for any packets that were already in flight before the peer stopped, and any extra time those packets might have been delayed. If packets arrive late, they will not be processed and will generate stateless resets. Those shouldn't be connection-fatal; if the peer has dropped the CIDs and corresponding tokens correctly, then they won't recognize the stateless resets as matching the current connection and will drop them. But you've caused packet "loss" in excess of what happened on the network if you let it get to that point.
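The issuer-side timing described above can be sketched roughly as follows. This is only an illustration of the 3-PTO grace period, not text from the draft or code from any QUIC implementation: the class `IssuerCidTable`, its method names, and the string return values are all hypothetical, and times are plain floats in seconds.

```python
from dataclasses import dataclass, field


@dataclass
class IssuerCidTable:
    """Hypothetical issuer-side CID table; all names here are assumptions."""
    pto: float                                    # current PTO estimate, seconds
    active: set = field(default_factory=set)      # CIDs still fully valid
    retiring: dict = field(default_factory=dict)  # cid -> time it stops being accepted

    def request_retirement(self, cids, now):
        # Called when the frame requesting retirement is sent.
        # The CIDs stay routable for a further 3 PTO.
        deadline = now + 3 * self.pto
        for cid in list(cids):
            if cid in self.active:
                self.active.discard(cid)
                self.retiring[cid] = deadline

    def on_packet(self, cid, now):
        if cid in self.active:
            return "process"
        deadline = self.retiring.get(cid)
        if deadline is not None and now < deadline:
            return "process"  # late but still inside the grace period
        # Past the grace period (or never known, which this sketch treats
        # the same way): the packet is not processed and draws a reset.
        self.retiring.pop(cid, None)
        return "stateless_reset"
```

With `pto=1.0` and retirement requested at `now=0.0`, a packet on the old CID at `2.9` is still processed, while one arriving at `3.1` is answered with a stateless reset.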

This is advice that the peer not wait more than 1 PTO, to make it likely that the issuer is not still receiving packets with old CIDs after the 3-PTO timer has expired. It's an arbitrary period, no doubt, but it's a better arbitrary period than anything else we've come up with. Alternative suggestions welcome.

Member

@martinthomson martinthomson commented Nov 11, 2019

Now that I see this all written down like this, I wonder whether we should be insisting on the issuer waiting for acknowledgment first.

Contributor

@MikeBishop MikeBishop commented Nov 11, 2019

Waiting for the retirement, or waiting for the ACK of the NCID frame that carried the demand for retirement?

Member

@martinthomson martinthomson commented Nov 11, 2019

Before starting any timer. At least that way any propagation of the request to retire is taken out of consideration.

Contributor

@erickinnear erickinnear commented Nov 11, 2019

The text itself around 1PTO is just a general note of "if you don't retire then your traffic might stop working". Would it be better to remove the specifics of 1PTO (then the 3PTO timer can either wait for ACK or not)?

Contributor

@MikeBishop MikeBishop commented Nov 12, 2019

In the other place, it just says "SHOULD... in a timely manner," but there's some value in defining what constitutes "timely."

Member

@kazuho kazuho commented Nov 12, 2019

I might argue that the 3 PTO suggestion for the issuer makes sense. We need some guidance for the issuer that is easy to implement and also tolerant to reordering. 3 PTO is a good fit for that.

The question IMO is if we need something more than that. It might be possible to argue that the current recommendation of 1 PTO is a rough derivation of the 3 PTO rule, but it's confusing.

Maybe something like "SHOULD ... in a timely manner, as the issuer of the CIDs might drop the retired CIDs as early as 3 PTO" would be enough.

@mnot mnot added this to Triage in Late Stage Processing Nov 12, 2019
@larseggert larseggert added the design label Feb 4, 2020
@project-bot project-bot bot moved this from Triage to Design Issues in Late Stage Processing Feb 4, 2020
Member

@larseggert larseggert commented Feb 5, 2020

Discussed in ZRH. Proposed resolution is to talk it over over lunch.

Member

@kazuho kazuho commented Feb 5, 2020

To me it seems that there is confusion regarding what "retire" means.

I'd argue that when receiving an NCID frame carrying a retirement request, the receiver can promptly stop using the CIDs that are to be retired, and stop recognizing the Stateless Reset Tokens that are associated with those CIDs. At the same time, it could well be the case that the receiver is not able to send RETIRE_CONNECTION_ID frames at that point (due to CWND being restricted, etc.).

Considering this, I think what we should do is be clear about the distinction. To be precise we can change the following existing text:

Upon receipt, the peer MUST first retire the corresponding connection IDs using RETIRE_CONNECTION_ID frames and then add the newly provided connection ID to the set of active connection IDs. Failure to retire the connection IDs within approximately one PTO can cause packets to be delayed, lost, or cause the original endpoint to send a stateless reset in response to a connection ID it can no longer route correctly.

to:

Upon receipt, the peer MUST promptly stop using the corresponding connection IDs and stop recognizing Stateless Reset Tokens associated with those connection IDs. Failure to do so within approximately one PTO can cause packets to be delayed, lost, or cause the original endpoint to send a stateless reset in response to a connection ID it can no longer route correctly. After stopping use of those connection IDs and Stateless Reset Tokens, the receiver MUST send RETIRE_CONNECTION_ID frames to indicate to the peer that those have been retired.

We can also change the 1PTO and 3PTO requirement, but I do not see a reason.
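The distinction drawn in this comment, between ceasing to use a CID and announcing that it has been retired, might look something like the sketch below. It assumes the retirement request arrives as the Retire Prior To value of a NEW_CONNECTION_ID frame; the class `PeerCidState`, its methods, and the `can_send` callback standing in for congestion control are hypothetical names invented for illustration.

```python
class PeerCidState:
    """Hypothetical receiver-side state for CIDs issued by the other endpoint."""

    def __init__(self):
        self.active = {}          # sequence number -> (cid, stateless_reset_token)
        self.pending_retire = []  # sequence numbers awaiting RETIRE_CONNECTION_ID

    def add(self, seq, cid, token):
        self.active[seq] = (cid, token)

    def on_retire_prior_to(self, retire_prior_to):
        # Step 1: immediately stop using the CIDs and forget their tokens.
        for seq in sorted(s for s in self.active if s < retire_prior_to):
            del self.active[seq]
            self.pending_retire.append(seq)

    def recognizes_reset(self, token):
        # Only tokens of CIDs still in use can match a stateless reset.
        return any(t == token for _, t in self.active.values())

    def flush(self, can_send):
        # Step 2: send RETIRE_CONNECTION_ID frames whenever sending is
        # possible, which may be later than step 1.
        sent = []
        while self.pending_retire and can_send():
            sent.append(self.pending_retire.pop(0))
        return sent
```

In this sketch, a stateless reset carrying a dropped token no longer matches the connection as soon as `on_retire_prior_to` returns, even if `flush` cannot send anything yet.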

Member

@larseggert larseggert commented Feb 5, 2020

Discussed in ZRH. Proposed resolution is to close with no action. In addition, potential new Editorial issue to clean up existing text. Discussion on removing Retire-Prior-To is in #3420.

@larseggert larseggert added the proposal-ready label Feb 5, 2020
@project-bot project-bot bot moved this from Design Issues to Consensus Emerging in Late Stage Processing Feb 5, 2020
Contributor

@MikeBishop MikeBishop commented Feb 5, 2020

@kazuho: The reason for "first retire... and then...." is to enable working within the CID limit. If you're at the limit already, the server can ask you to retire everything and give you a new one. That does mean there are "limit + 1" CIDs outstanding at that time. Looks like Martin has opened #3422 to make this clearer.
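The ordering argument here (retire first, then add, so the active set never exceeds the limit) can be illustrated with a small sketch. The function `apply_new_connection_id` and its parameters are hypothetical, and raising `ValueError` is merely this example's way of flagging a limit violation.

```python
def apply_new_connection_id(active, new_seq, retire_prior_to, limit):
    """Return (new active set, retired sequence numbers).

    `active` is the set of sequence numbers currently in use. Older CIDs
    are retired before the new one is counted against the limit.
    """
    remaining = {s for s in active if s >= retire_prior_to}
    retired = sorted(set(active) - remaining)
    remaining.add(new_seq)
    if len(remaining) > limit:
        # Illustrative error handling only.
        raise ValueError("connection ID limit exceeded")
    return remaining, retired
```

With `active={0, 1}` and `limit=2`, a frame carrying sequence number 2 and asking to retire everything below 2 succeeds because 0 and 1 are dropped first; the same frame without the retirement request would exceed the limit. Until the retirements are actually communicated, the issuer may still be tracking "limit + 1" CIDs, as noted above.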

@LPardue LPardue moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Feb 19, 2020
@LPardue LPardue removed the proposal-ready label Feb 19, 2020
@LPardue LPardue added the call-issued label Feb 26, 2020
@LPardue LPardue added has-consensus and removed call-issued labels Mar 4, 2020
@project-bot project-bot bot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Mar 4, 2020
Late Stage Processing automation moved this from Consensus Declared to Text Incorporated Mar 5, 2020