quicwg / base-drafts Public
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Which DCID do Handshake retransmissions use? #3348
Comments
|
As I can see, there are three options:
I prefer (c) (or maybe (b)), because (a) has corner cases even though it might seem easy. Consider the case where a server sends TP.preferred_address then as the first packet, sends the NCID frame with RTP set to 2. If we adopt (a), this would mean that the client would have only one CID that it can use (the one being embedded in the NCID frame) even though it needs two (one for the original path, and another for the path specified by TP.preferred_address). As you can see, the design of the NCID has been based on the assumption that providing one new CID alongside a retirement request guarantees that the receiver would have enough active CIDs. But in this particular case, that is not the case. Based on this observation, and based on my assumption that a non-zero RTP would be sent by only some of the endpoints (the necessity of CID retirement is questionable for HTTP/3, the scope of QUIC v1, as the HTTP connections are typcially short-lived and can most likely be gracefully terminated to induce a new connection when retirement of CIDs is necessary), my preference goes to (c): an endpoint that uses RTP should cover the cost. |
|
@kazuho: (I think you mean RPT, not RTP) Thank you for the analysis. I prefer (c) as well, except that I do not want to condition it on non-zero RPT, to keep things simpler. Otherwise we're still left with what to do at the receiver. Ensuring that the handshake confirmation is acked means that there are no HS packets pending. |
|
I'll also note that this issue was found through a resilience test we've built for the interop runner -- high loss handshake test. We're excited to see how everyone's handshake machinery performs under 50% loss :-) |
|
At the time of this writing, ngtcp2 does (b). |
|
@kazuho for you problem with (a) isn't that just a general problem with having multiple paths and then getting a NEW_CONNECTION_ID with RPT for all previous CIDs? You end up in a scenario where you only have 1 CID (the new one) and multiple paths that just had all their CIDs retired. If you can get away with using that one CID for the active path, you can still make forward progress. If not, the connection dies. |
|
I believe we specify (b) already: |
|
@ianswett that is just indicating when a client should change DCID in response to receiving a Initial/Retry packet with a new CID. I don't think anything restricts either side from sending NEW_CONNECTION_ID frames in their first 1-RTT packets and their peers immediately using them for any packets (including handshake) they send. Personally, I don't have a problem with (a). I think that no matter what receivers will have to handle the multiple paths but single CID scenario, so I would prefer not to special case the handshake. I also don't think we have handshake confirmation problems because an endpoint should only send a NEW_CONNECTION_ID out when it's able to accept it. Then, if the peer can decrypt the packet with the new CID, I see no reason why it shouldn't be allowed to use it immediately. |
|
@nibanks my reading of the text I quoted is that you cannot use any DCIDs in long header packets that weren't either the original client DCID/SCID or from a server Initial or Retry. |
@kazuho There's a fourth option:
As soon as an endpoint confirms the handshake, it drops the Handshake keys (I'm assuming #3145 here). As a result, the endpoint obviously doesn't care about Handshake packets any longer (and especially doesn't care which DCID was used to send the packet). |
|
@ianswett: The part about what to do with DCID changes is underspecified. From the part of the spec you quote -- "Any additional changes are not permitted; if subsequent packets of those types include a different Source Connection ID, they MUST be discarded." -- it's not clear what these additional changes are limited to. I think the intent was for all long-form packets, but that's certainly not clear here. In any case, the problem with that approach is that the client will have to carefully construct HS packets with the old DCID while sending everything else with the new DCID. |
|
@marten-seemann : (d) works too, but an endpoint still has to decide which DCID to use for sending HS packets in the case that the HANDSHAKE_DONE frame is lost. I'd rather not leave that unspecified. |
"of those types" => Retry or Initial, since that's what it was talking about. That's trying to avoid the case where the server statelessly generates a new Retry (or Initial, in the rare case that would be stateless) with a second SCID upon receiving a retransmission or duplication of the client's Initial. But it doesn't say anything about the server issuing new CIDs via an NCID frame in 1-RTT before the handshake is confirmed. I think (a) or (b) would be fine; the server is clearly intending to accept the other CID if it issued it, but could conceivably use separate logic for long-header versus short-header routing. |
|
Can someone explain why using the new connection ID is a problem? |
|
@martinthomson I think that that is a good way at looking into the issue. I agree that the endpoints can start using the new CIDs during handshake, if the peer sends NCID frames. The only problem that I can think of is related to retirement (see #3348 (comment)). When the server provides TP.preferred_address, and also decides to retire the CIDs immediately after sending all the handshake transcript, there's going to be a very small time window in which it has to send two CIDs (one for the current path and another for the path using the preferred address). Unless the QUIC stack allows bundling of two NCID frames when building a QUIC packet, the client might end up not having sufficient amount of CIDs, in which case it might terminate the connection. It is my view that this is very different from the ordinary case of retiring CIDs, in which case there would be enough time window to guarantee that a handful number of new generation CIDs are delivered before the older ones are delivered (cc @nibanks). That said, I now tend to think that the most we might have to do is the following two:
|
@kazuho Wouldn't we be able to solve this by using my proposed option d? |
|
Artificial constraints on when the frame can be sent would be terrible to enforce and would introduce a performance cost for using NEW_CONNECTION_ID. In a great many cases, the 0.5RTT from the server is idle, which makes it a great way to get things like NEW_CONNECTION_ID sent without taking capacity from real work. Asking the server to defer sending would mean that NEW_CONNECTION_ID could compete with HTTP responses and the like. I see no reason not to allow use of connection IDs when they are available. They apply to the connection as a whole. You will note that we allow changing them with the Initial/Retry mini-protocol in ways that don't correlate with other connection-level events, so this isn't any different in that regard. Forcing the connection ID state to synchronize with other state changes seems more likely to complicate things than help. The caution @kazuho mentions is always true. Whenever an endpoint is forced to retire connection IDs, a short supply is always a risk to the connection. |
I think QUIC allows more than 1 NCID frames in a single QUIC packet. Now my preference is (a). If server utilizes retire-prior-to, it should know the consequences and should provide sufficient backup connection IDs. Otherwise, it would be misconfiguration of server software. |
The problem is that the client might end up receiving only some of the NCID frames, unless the server always sends (and retransmits) the set of NCID frames as a whole. But anyways, I agree that (a) is probably fine. That is because it is trivial to design a server that never requests CID retirement during the handshake. All you need to do is start issuing the new generation of CIDs at least N seconds before sending NCID frames that asks for the retirement of the previous generation, where N is your handshake timeout. Knowing that this design is why I initially preferred (c), not realizing that it works equally well with (a). But know that I understand that, I prefer (a) as it is simpler. |
|
Given a server has to sent NCID to have the CID change, I'm fine with (a), but I thought it was not what we'd previously agreed to and specified, so I believed (b) was the status quo. |
|
I don't think this is a serious problem, because if the server doesn't want the long header DCID to change, it can wait before sending NEW_CONNECTION_ID. That said, we do have an issue here in that different readers of the draft had different interpretations of what was allowed. All that to say I don't much care if this is allowed or prohibited, but we should at least make an editorial change to clarify things. |
|
I agree with @DavidSchinazi here. Any alternative is ok, but the draft should say whether servers can expect that the client might switch DCID and whether the clients can expect the server to send NCID w/ RPT>0 before the handshake is complete. |
|
Let's keep it simple: no migration before Handshake Done, period. |
|
@huitema The problem is not that simple. Consider the case where the server receives ClientFinished and sends two 1-RTT packets, the first packet containing HANDSHAKE_DONE, and the second packet containing an NCID with RPT set to 1. If the first 1-RTT packet gets lost, the client receives a request to retire the connection ID before it confirms the handshake. |
|
I think I'd prefer a design that meets the two requirements stated below:
The intention is to minimize additional complexity to existing (and future) QUIC stacks. One solution that meets these requirements is to state that a NCID frame confirms the handshake, exactly the same way as HANDSHAKE_DONE frame does. That would be a trivial change to any of the QUIC stacks, and there would be no ambiguity regarding the state changes. |
|
Discussed in ZRH. Proposed resolution is to clarify that issuing a CID means being prepared to seeing it used at any time. Additional Editorial issue on packetization issues. |
|
Editorial PR on packetization challenges with retransmitted frames is #3424. |
… during the handshake
#3348 Clarify that you can use any issued CID at any time, even during the handshake
|
Ugh, editorial changes for a design issue. Reopened for consensus calling. |
We're currently not specifying which DCID is supposed to be used on retransmissions of Handshake packets.
A client might receive a NEW_CONNECTION_ID frame in the first couple of 1-RTT packets it receives, before receiving either a HANDSHAKE_DONE or an acknowledgement for a 1-RTT packet is sent itself. Therefore, it's still running loss recovery on the Handshake packets it sent out earlier.
If there's an acknowledgement outstanding for one of those Handshake packets, the client will have to retransmit that packet. Which DCID is it supposed to use on the retransmission? Is it the DCID that was used on the original packet? Or is it the CID provided in the NEW_CONNECTION_ID frame?
Note that the NEW_CONNECTION_ID frame might already have requested the retiring of the CID used during the Handshake.
The text was updated successfully, but these errors were encountered: