Required state for retaining unacked RETIRE_CONNECTION_ID frames is unbound #3509

kazuho · 2020-03-10T23:14:08Z

When an endpoint sends a RETIRE_CONNECTION_ID frame carrying a connection ID that has been active until that point, that endpoint is expected to open room for accepting a new connection ID.

That means that if the peer does not acknowledge the packets that contains RETIRE_CONNECTION_ID frames, but keeps on sending NEW_CONNECTION_ID frames with increasing values in the Retire Prior To field, the number of unacknowledged RETIRE_CONNECTION_ID frames (or to be precise, the number of unacked retiring connection IDs) keeps on increasing.

kazuho · 2020-03-10T23:17:55Z

To address the issue, I think we need to do one of the following:

State that the attack exists, and that QUIC stacks should implement mitigations.
Introduce MAX_CONNECTION_IDS frame. The reason we do not have similar issue in stream concurrency control is because the credit is controlled by MAX_STREAMS frames, rather than using FIN (or reset) as an implicit signal to indicate availability of new credit. We could do the same for connection IDs.

martinthomson · 2020-03-11T03:28:22Z

Mathematically, this might be unbounded, but it's more likely that in practice the limit is 2*active_connection_id_limit as it would be unlikely that unacknowledged RETIRE_CONNECTION_ID frames would remain so once more NEW_CONNECTION_ID frames start arriving. More so as the issuer of those NEW_CONNECTION_ID frames has to have received the RETIRE_CONNECTION_ID frames.

Adding new frames for this seems a little unwieldy, so I'd prefer to just acknowledge that the problem exists.

Note also that Retire Prior To has the effect of making this limit squishy in other ways.

(This is a design issue.)

marten-seemann · 2020-03-11T03:49:31Z

I'm not sure I understand the attack here. Why would an implementation need to keep track of acknowledgements of RETIRE_CONNECTION_ID frames (other than for purposes of loss recovery, but then this applies to any retransmittable frame type)?

An endpoint sends a RETIRE_CONNECTION_ID frame when it decides to not use the respective CID any longer. At this point, it can forget about the connection ID altogether. It might make sense to hold on to the stateless reset token for a while (3 PTO?) longer, in order to be able to detect stateless resets, but this is an optimization and wouldn't lead to unbounded state anyway.

kazuho · 2020-03-11T03:50:28Z

@martinthomson

Mathematically, this might be unbounded, but it's more likely that in practice the limit is 2*active_connection_id_limit as it would be unlikely that unacknowledged RETIRE_CONNECTION_ID frames would remain so once more NEW_CONNECTION_ID frames start arriving. More so as the issuer of those NEW_CONNECTION_ID frames has to have received the RETIRE_CONNECTION_ID frames.

I think 2*active_connection_id_limit is fine, assuming that the endpoint would just stop retransmitting oldest RETIRE_CONNECTION_ID frames. That limit would be too stern to be used for detecting potential attacks (and therefore closing the connection with PROTOCOL_VIOLATION).

kazuho · 2020-03-11T03:53:30Z

@marten-seemann

I'm not sure I understand the attack here. Why would an implementation need to keep track of acknowledgements of RETIRE_CONNECTION_ID frames (other than for purposes of loss recovery, but then this applies to any retransmittable frame type)?

This is about keeping state for retransmission for recovering loss. As stated in my first comment, the issue is unique to RETIRE_CONNECTION_ID frames, because this frame type is (ab)used for providing the peer additional credit, compared to other flow control that uses dedicated frame types (e.g., MAX_*) for communicating credit.

marten-seemann · 2020-03-11T04:45:47Z

Thank you for the explanation, @kazuho. I think I now understood the problem.

Mathematically, this might be unbounded, but it's more likely that in practice the limit is 2*active_connection_id_limit as it would be unlikely that unacknowledged RETIRE_CONNECTION_ID frames would remain so once more NEW_CONNECTION_ID frames start arriving. More so as the issuer of those NEW_CONNECTION_ID frames has to have received the RETIRE_CONNECTION_ID frames.

This would require endpoints to make assumptions about how the peer handles the active_connection_id_limit and how it issues new CIDs. This can get complicated really quickly and leave the connection in an unrecoverable state (or at the least a state with less CIDs than it should have). Introducing a new frame, as @kazuho suggested, seems a lot cleaner to me.

kazuho · 2020-03-12T02:09:11Z

Just to be clear, while I used RPT as a way of explaining the design issue, I think it is not a requirement.

A malicious client can repeatedly migrate to a new address, initiating the use of new CID pairs, at the same time intentionally not acknowledging packets containing RETIRE_CONNECTION_ID frames sent from the server. If the client does that, the number of CIDs that the server has to track for retirement increases as time goes.

erickinnear · 2020-03-12T02:28:09Z

Clarification question: If the person sending RCID isn't getting ACKs back, I think we still need to assume that those CIDs are being replaced? If not, then the non-RPT side of this would just mean the person sending RCID frames runs out of CIDs to use and has to stop.

So the problem statement is that: this person is running over their max_active_cid limit from the perspective of the person withholding ACKs, but each side has their own view and so the person retiring them stopped counting them as active once the frame was sent (but not acknowledged).
And that can happen via RPT as well, just because it means that they're going to be instructed to send the RCID, but the problem is the same either way?

(Just want to make sure I'm understanding the problem correctly)

kazuho · 2020-03-12T02:46:54Z

@erickinnear Yes, I think your problem statement is correct.

Or to be even more concise, the problem is that the issuer of CIDs is given additional credit when it receives RCID, which happens before the consumer of CIDs drops the state it needs to retain for retransmitting RCID.

For the attack to work, an attacker has to let the peer consume and retire CIDs. RPT and intentional migration are examples that makes that happen.

MikeBishop · 2020-03-23T20:11:59Z

So the issue here is fundamentally that you can induce the peer to send retransmittable frames which aren't subject to some kind of flow control, and this is the only instance we have of that.

STOP_SENDING and failing to acknowledge the RESET_STREAM appears to have a similar profile, except that the number of RESET_STREAMs you could have outstanding is bounded by the number of streams the peer has opened, which they control.

You could execute the same attack at a larger scale by choosing particular regions of stream data to never acknowledge, forcing the peer to keep them in memory for retransmission, but the peer could stop sending until that region gets through (applying backpressure) or simply reset the stream. Since a RESET_STREAM is smaller than the data region and only one is needed per stream, this collapses into the previous case.

What's different here is that the endpoint being attacked doesn't choose whether to retire these CIDs and has no mechanism to apply backpressure.

It seems like there are a couple paths to resolving this.

As you suggest, we could define an explicit frame that grants the additional CID space, and implementations would not send / could withhold this frame until the RCID is ACK'd. This creates a backpressure mechanism and has the side benefit of allowing an implementation to vary the number of CIDs it wants to track over time, but the disadvantage of requiring an extra round trip before CIDs can be replaced in the normal case.
We could mimic the "collapsing" behavior above by defining a frame that retires all CIDs up to a given sequence number. Then the implementation under memory pressure chooses to retire a few extra CIDs and collapses the whole set waiting to be retransmitted to a single frame.

kazuho · 2020-03-24T02:31:16Z

@MikeBishop I think your summarization of the problem is accurate.

Regarding an attacker selectively acking packets carrying STREAM frames, yes, the attack exists, but it the sender can mitigate that problem in certain ways. We already state that in section 21.5.

This issue is the only case that a protocol cannot be implemented properly, with a bounded amount of state.

Regarding how we should fix the problem, if we are going make a protocol change, my preference goes to applying the same design principle that we have in other places (i.e. introduce MAX_CONNECTIONS_IDS frame). I do not like designing a mechanism specific for this purpose, as there is a chance that we might make mistakes.

I can also live with what @martinthomson proposed in #3509 (comment) (i.e. encourage leaky behavior).

ianswett · 2020-03-24T16:38:43Z

Similar to @martinthomson I'd prefer to acknowledge the problem and indicate that peers may want to limit the number of unacknowledged RETIRE_CONNECTION_ID frames to the number of connection IDs they support.

The other option I can imagine is to make the RETIRE_CONNECTION_ID frame cumulative, which provides a strong incentive to use connection IDs in order. Though we may have to make an exception for the connection ID in the received packet if we did that.

martinthomson · 2020-03-24T22:36:19Z

So the bounds on this aren't that terrible. Say you had to retire 1000 connection IDs all at once. You could track each one individually as you send RETIRE_CONNECTION_ID. That's 1000 things to track, which might be more state than you want (it's not 1000 bits because you need to maintain relationships with packet numbers).

Or you could just keep CID[smallest unacknowledged retire frame] - CID[largest acknowledged retire frame] strictly limited, no matter how many connection IDs you need to retire. Tracking that, plus what you have been requested to retire should bound the state required to implement retirement.

In other words, you do control how much state you commit to retirement, just like everything else.

martinduke · 2020-03-25T17:22:05Z

@martinthomson

Or you could just keep CID[smallest unacknowledged retire frame] - CID[largest acknowledged retire frame] strictly limited, no matter how many connection IDs you need to retire. Tracking that, plus what you have been requested to retire should bound the state required to implement retirement.

That's my preferred solution, but in my reading of the spec this is not allowed. We MUST send RCID for each sequence number for which the peer has sent retire-prior-to. I think the language in Section 5.1.2 needs to change.

martinduke · 2020-03-25T17:34:51Z

@ianswett's idea of allowing RCID to be cumulative is also good and computationally cleaner, although we'd need a different frame type or an additional field in the frame (I don't think we can get rid of retiring non-continuous sequence numbers)

martinduke · 2020-03-25T17:35:50Z

I'm happy to file a PR for the @martinthomson or @ianswett approach based on the overall sentiment.

MikeBishop · 2020-03-25T19:18:06Z

One purely implementation option: It's always legitimate to repeat a frame. Therefore, if you track only the smallest and largest un-acked RCID sequence numbers, you can just keep retiring the whole range every time you retransmit. ACKs will shrink the range (hopefully to zero). There's no harm beyond unnecessary bytes in retransmitting something a few extra times to avoid excessive state tracking.

kazuho · 2020-03-25T20:42:03Z

@ianswett

The other option I can imagine is to make the RETIRE_CONNECTION_ID frame cumulative, which provides a strong incentive to use connection IDs in order. Though we may have to make an exception for the connection ID in the received packet if we did that.

While I won't say that's impossible, I'd argue that the design is going to be complex.

When CIDs are used in a legitimate manner, they are not going to be retired one by one. A CID is retired when a path using that CID is abandoned.

Consider the case where an endpoint has 4 active CIDs (CID0 to CID3), and two paths. One path could be using CID0, the other path could be using CID3. When the endpoint abandon the path that is using CID3, it has to retire CID3 but it cannot retire others.

@MikeBishop The amount of CIDs that can be sent on wire at a time is restricted by the CWND, therefore there is no guarantee that you can send RCID frames for all CIDs that you want to retire at once. In other words, I think the approach at best collapses into what @martinthomson has proposed (i.e. leaky behavior).

ianswett · 2020-03-25T22:07:40Z

At this point, I also think @martinthomson solution is the best approach, since it is a very minor change, and changing RETIRE_CONNECTION_ID to be cumulative is a larger change than I want to accept at this point. @kazuho, thanks for your examples.

ianswett · 2020-03-26T21:56:36Z

One idea: Can we simplify everything by not requiring sending RETIRE_CONNECTION_ID for CIDs less than the Retire Prior To? RETIRE_CONNECTION_ID would then only be used when a CID was used and needed to be retired so more CIDs could be issued to the user of the CID.

martinduke · 2020-03-26T22:06:06Z

@ianswett see my comment in the PR. More disruptive to current implementations, and eliminates a useful signal. The current proposal will not matter for well-behaved clients but will protect servers from some attacks.

ianswett · 2020-03-26T22:22:59Z

We haven't implemented this yet, so it's not disruptive to our implementation FWIW

I wonder how useful the signal is. Retire Prior To is really saying "These are going away sometime soon, whether you like it or not." to the peer. If an implementation never sent RETIRE_CONNECTION_ID, would that create a problem of some sort, because I suspect someone will do just that.

kazuho · 2020-03-26T23:46:58Z

@ianswett As stated in #3509 (comment), this attack can be mounted without using Retire Prior To.

martinduke · 2020-03-27T00:08:19Z

@kazuho in this case isn't the client limited by active_connection_id_limit? As a server, if the client has in no way indicated it has accepted my retirement, I'm not going to accept new CIDs.

The NCID/RPT case is different because the spec specifically allows the client to NCID speculatively assuming the server will immediately retire the CID.

kazuho · 2020-03-27T00:25:03Z

@martinduke

in this case isn't the client limited by active_connection_id_limit?

The attack here is that when a malicious client retires a CID to which the server has responded, that server would retire the CID that it has used on that path, and also provide a new CID to the client. But the client never ACKs the RCID frames that the server sends.

As an example, consider the case where a client uses sCID1 on a new path, the server responds on that path using cCID1, the client abandons that path and sends RCID (seq=sCID1). When receiving this RCID frame, the server would send RCID (seq=cCID1), and also send NCID(sCID2). The client intentionally does not ack the packet carrying these frames, but uses sCID2 on a new path, that carries NCID(cCID2), repeating this procedure.

martinduke · 2020-03-27T00:42:35Z

Ah, I see now that my defense doesn't work, because if I reject the NCID I have to kill the connection, and it may just be because of unlucky ack losses.

This whole section is not very precisely written and suffers from two-generals problems about when a CID is actually retired.

I will revise to account for this, but I'm not sure how.

kazuho · 2020-03-30T04:09:21Z

@martinthomson

I know that you have characterized this as "leaky", but I'm not sure that I understand exactly what is being leaked. Nothing is lost in #3547, except that the endpoint requesting retirement has to tolerate a potentially extended period without old connection IDs being retired.

In the approach we take in #3547, there is a possibility that the issuer of the CID might not receive the sequence numbers of all the CIDs that have been retired by the peer, when the issuer is issuing new CIDs at high rate and the receiver is also consuming them at a hight rate. That is the consequence of endpoints limiting the amount of inflight RCID frames it tracks.

I think we can provide advice so that the "leak" would not be an issue in practice, and assuming that would happen, I am happy with #3547 being the solution.

I think #3553 is cleaner, but I acknowledge that it is more disruptive than #3547. Part of my intent behind creating #3553 is to see how a clean fix (in terms of state machinery) would look like, so that it can be compared against #3547.

In case this helps, I have no big preference between #3547 and #3550. There is a certain elegance to #3550 that might tip me toward that, but it is slightly more disruptive.

As stated in #3550 (comment), #3550 by itself only partly fixes the problem. I am not against adopting #3550 along with #3547 though.

#3548 is also a partial fix, I also think that it is a non-starter.

martinthomson · 2020-03-30T04:45:24Z

I think that I understand your point; thanks for your patience.

I didn't really classify it as a leak though, so the choice of words threw me a little, it's more of a lag.

Let's say that you have a pathological endpoint that sends an infinite series of NEW_CONNECTION_ID frames with Sequence Number N and Retire Prior To of (N-1). That's legal and will never violate any limits. But unless they maintain acknowledgments for RETIRE_CONNECTION_ID frames at a similar rate (which, to be fair should be easy as sizeof(NEW_CONNECTION_ID) > sizeof(RETIRE_CONNECTION_ID) > sizeof(ACK) for the same values), they could get far ahead of their peer.

I don't think that it is strictly a leak, just a lag between the connection IDs being pushed and the connection IDs being successfully retired. Without #3547 or something else this could be problematic. And I now understand why #3550 is really just an orthogonal refinement, though it might make other defenses less likely to be required by making sizeof(sum(RETIRE_CONNECTION_ID)) zero in cases where an endpoint isn't using discretionary retirement.

The defense in #3553 is to make sizeof(sum(RETIRE_CONNECTION_ID)) effectively constant, no matter how many frames need to be retired. My main objection is really that it is a maximal design where only a minimal one is warranted.

It's probably worth pointing out that there is also the simplest defense: If you can't keep up with RETIRE_CONNECTION_ID, stop talking to the peer that does this to you. This is just another example from that category of abuse we don't have specific protocol mechanisms to mitigate.

ianswett · 2020-03-30T15:40:18Z

I agree that I'd like to see a minimal solution, so I'm leery of heading towards #3553.

In order for a peer to need to send a lot of RETIRE_CONNECTION_ID frames without Retire Prior To, a lot of 5-tuple + CID changes need to be coming in. One way to rate limit that naturally is to stop giving out CIDs if the peer hasn't acknowledged your RETIRE_CONNECTION_ID frames.

ie: Limit the number of NEW_CONNECTION_ID frames in flight to:
max_active_connection_ids +1 - num(RETIRE_CONNECTION_ID frames)

So if the peer stops acknowledging your retirements, you stop giving them new CIDs. I think this avoids the need for any other limits? I added 1 in my example so one NCID could be sent even if all of the peer's CIDs had just been retired, though that concern disappears if we decided to also do something like #3550

martinduke · 2020-03-30T17:00:38Z

What a mess we've made! Is there interest in doing a Zoom about this? Maybe 0900 Tokyo/1100 Sydney/1700 Seattle/2000 Boston on Tue/Wed?

The root cause is that the flow control mechanism isn't all that well designed. IMO #3553 is the actual correct fix to this problem, but as people say, it's disruptive.

#3548 uses ACKs as an implicit signal, which seems like a problem. #3553 has a different implicit signal, and I don't think it solves the case where the client is withholding acks for unsolicited RCIDs.

I really think we should just use #3547 or #3553 depending on our appetite for changing the wire image. #3547 is great for the RCID sender, and if the receiver won't miss any RCIDs unless it's doing pathological stuff or the ack loss patterns are tremendously unlucky.

#3553, OTOH, is a comprehensive fix to the problem.

ianswett · 2020-03-30T17:09:42Z

I think #3547 with some added text about limiting outstanding NCIDs when there are lots of RCIDs would be sufficiently robust.

kazuho · 2020-03-31T02:05:58Z

I'm fine with trying to see if we can reach an agreement on #3547.

marten-seemann · 2020-03-31T02:50:36Z

I'd prefer #3553 over #3547. I can't really see why it would more disruptive than #3547.

The problem with #3547 is that it will likely work find in most cases, but there are some corner case (when the ACKs for RETIRE_CONNECTION_ID frames are lost), in which things will break. Properly accounting for these corner cases requires some non-trivial tracking logic in order to avoid risking the connection being closed with a CONNECTION_ID_LIMIT_ERROR.

On the other hand, #3553 seems to be conceptually simpler. I'd rather implement a few changes to my frame parser than introduce some ACK-tracking logic.

kazuho · 2020-03-31T05:27:53Z

Just to give another example of how an endpoint might end up required to track the retirement of more than active_connection_id_limit, when talking to a well-behaving peer:

A server tries to supply three active CIDs to the peer. At the moment, it has provided CID₀, CID₁, CID₂ to the client.
The client sends RCID(CID₁), RCID(CID₂). The server provides NCID(CID₃), NCID(CID₄). The two NCID frames are delivered to the client, but the ACK for the RCID frames gets lost.
The client decides to probe two paths concurrently, by using CID₃ and CID₄, but shortly after the packets are sent using those paths, the underlying network for those two paths disappear.
The client now needs to make sure that RCID frames carrying 4 CIDs reach the peer.

In this example, we might argue that it would be possible for the client to figure out that CID₃ and CID₄ were provided as substitutions for CID₁ and CID₂. But do we want to implement logic that detects such condition into our code? Note that tracking of substitutions can become tricky; if RCID(CID₁) and RCID(CID₂) were sent in different packets, it would be impossible for the client to see if CID₄ was provided as a supplement for CID₁ or for CID₂.

nibanks · 2020-03-31T14:51:17Z

I agree with @kazuho that if we go with #3547 the limit should be higher than active_connection_id_limit.

janaiyengar · 2020-03-31T23:13:10Z

I finally caught up with this issue, and implementer advice is definitely my preferred way forward. We're going to need some complexity to (and some magic numbers) no matter what, let's do this while limiting mechanism to a minimum. To that end, I agree with #3547 as a good resolution.

marten-seemann · 2020-04-01T08:46:50Z

While increasing the limit will certainly help, this introduces a really unpleasant flakiness in the protocol. QUIC, so far, has the property that no amount or pattern of packet loss will lead to a protocol violation (it might lead to a connection timeout if all packets are blackholed, but that's a different kind of error). This means that when running QUIC in production, screening for (transport-level) errors will be very useful for finding bugs in implementations.

My fear regarding #3547 is that no matter how high we choose the limit, there will always be a loss pattern that will trigger a protocol violation.

kazuho · 2020-04-01T12:30:27Z

@marten-seemann

My fear regarding #3547 is that no matter how high we choose the limit, there will always be a loss pattern that will trigger a protocol violation.

Thank you for raising the concern, I think the recommended behavior when exceeding that limit is to stop sending RCID frames for some sequence numbers rather than triggering a protocol violation. That's actually what I had in mind hence called it "leaky."

The benefit of stopping retiring some sequence numbers is that the worst case becomes endpoints running out of CIDs - something that is expected to happen on ordinary connections too. Note also that in HTTP/3, reestablishing a connection in 0-RTT is always a viable option.

To summarize, I think that the concern can be resolved for HTTP/3 by changing the recommended behavior from resetting the connection to stop sending RCIF frames for some sequence numbers.

ianswett · 2020-04-01T13:36:11Z

@kazuho I don't think not sending RCID frames is that easy. Tracking something that needs to be sent consumes comparable state to tracking it in flight. Or are you suggesting a different algorithm that indicates a RCID frame doesn't need to be sent?

If the peer has to wait for one Retire Prior To to take effect before it's increased again, that leaves only the peer migrating really quickly and lost ACKs as the unbounded case. As @kazuho said, I think it's more likely a well-behaved peer would run out of CIDs in this case than any other failure mode. I previously suggested stopping sending out NCID frames when the RCID frame limit had been reached to force the issue, but that just changes who has to close the connection.

FWIW, we already have sanity checks where our implementation closes the connection when a datastructure becomes too large. The limits are high enough that they're almost never hit unless there is a bug, but they've been helpful in finding bugs. I would advocate everyone have these. @marten-seemann You'll never get to 0 transport errors. Some implementations will have a bug, and some peers or servers will have faulty or corrupted memory. For example, in Google QUIC we found we received the ClientHello on a stream besides 1 surprisingly often.

martinthomson · 2020-04-01T22:56:24Z

I think that the idea is to reduce the set of RCID frames needed to a range of sequence numbers. You only have to remember the few exceptions for sequence numbers that are in use, sequence numbers for which you have outstanding frames, and a range of sequence numbers that require retirement.

As long as your discretionary use is sequential, this is bounded state. As mentioned, the risk is that lack of acknowledgment leads to potentially large delays in getting new connection IDs, but if you ever run short, abandon the connection. Either you have a pathological loss pattern or your peer is messing with you. In either case, it is probably not worth continuing.

kazuho · 2020-04-01T23:38:18Z

@martinthomson

I think that the idea is to reduce the set of RCID frames needed to a range of sequence numbers. You only have to remember the few exceptions for sequence numbers that are in use, sequence numbers for which you have outstanding frames, and a range of sequence numbers that require retirement.

As long as your discretionary use is sequential, this is bounded state.

Yes, this is one method of limiting the state that is being used for retaining sequence numbers that need to be sent. And it works even when discretionary use is not sequential, as the number of "exceptions" (i.e. gaps) is bounded by max_connection_id_limit. The other assumption of using this method is that you would somehow bound the state that is used to track RCID frames being inflight. There are various ways of accomplishing that, e.g., have a limit on the number of RCID frames sent at once, track bunch of inflight RCID frames as one meta-frame.

And it is correct that the outcome would be a "lag" in this method.

The downside of this method is that it might be a disruptive change to existing implementations. It is essentially equivalent to what #3553 proposes, with the exception being that tracking RCID frames are difficult (as you track to send many).

For existing implementations, it would be easier to adopt a "leaky" method, i.e.:

let endpoints have a counter counting on the number of sequence numbers that are retired but are yet to be acknowledged
when a CID is being retired, if the value of the counter is below a certain threshold (i.e. 2 * max_connection_id_limit), the counter is incremented, and a RCID frame carrying the sequence number of that CID is added to the outstanding queue. If the counter was not below the threshold, no RCID frame will be sent (leak!)
when an RCID frame is deemed lost, it will be resubmitted into the outstanding queue
when an RCID frame is acknowledged before deemed lost, the counter is decremented
when an RCID frame is late-acked, we do nothing

Assuming that the solution we prefer is the least-disruptive approach, I think we need to recommend the latter. If we have distaste in the latter and prefer the former, I think we'd better switch to #3553, as it more cleanly reflects the design of the former.

@ianswett I think the latter is not that a big change to existing implementations.

martinthomson · 2020-04-01T23:43:33Z

Yep, I think that is a good approach. Obviously, the range encoding is superior in terms of how much state it can hold, but a simple list is fine. And the effects of lag will be limited by the tolerance for retaining retired connection IDs, which seems proportional to me.

martinduke · 2020-04-01T23:50:32Z

@kazuho I doubt that's less impactful on existing implementations, certainly from an interoperability standpoint. Today endpoints should hold on to incoming CIDs until they are formally retired. Making them leaky can lead to trapped state.

Sending them all, but with a delay, is no added state but solves this problem.

martinthomson · 2020-04-08T02:13:54Z

After a lot of back-and-forth, I think that we've all generally converged on #3547.

mnot added this to Triage in Late Stage Processing Mar 11, 2020

larseggert added the -transport label Mar 17, 2020

kazuho mentioned this issue Mar 20, 2020

NEW_CONNECTION_ID text is too restrictive on receivers #3534

Closed

martinduke mentioned this issue Mar 26, 2020

Limit RCID state #3547

Merged

LPardue added the design An issue that affects the design of the protocol; resolution requires consensus. label Mar 31, 2020

project-bot bot moved this from Triage to Design Issues in Late Stage Processing Mar 31, 2020

martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Apr 8, 2020

martinthomson moved this from Design Issues to Consensus Emerging in Late Stage Processing Apr 14, 2020

LPardue added call-issued An issue that the Chairs have issued a Consensus call for. and removed proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. labels Apr 29, 2020

project-bot bot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Apr 29, 2020

LPardue added has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. and removed call-issued An issue that the Chairs have issued a Consensus call for. labels May 13, 2020

project-bot bot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing May 13, 2020

martinthomson closed this as completed in #3547 May 17, 2020

Late Stage Processing automation moved this from Consensus Declared to Issue Handled May 17, 2020

Required state for retaining unacked RETIRE_CONNECTION_ID frames is unbound #3509

Required state for retaining unacked RETIRE_CONNECTION_ID frames is unbound #3509

Comments

kazuho commented Mar 10, 2020 • edited

kazuho commented Mar 10, 2020

martinthomson commented Mar 11, 2020

marten-seemann commented Mar 11, 2020

kazuho commented Mar 11, 2020 • edited

kazuho commented Mar 11, 2020 • edited

marten-seemann commented Mar 11, 2020

kazuho commented Mar 12, 2020 • edited

erickinnear commented Mar 12, 2020

kazuho commented Mar 12, 2020

MikeBishop commented Mar 23, 2020

kazuho commented Mar 24, 2020

ianswett commented Mar 24, 2020

martinthomson commented Mar 24, 2020

martinduke commented Mar 25, 2020

martinduke commented Mar 25, 2020

martinduke commented Mar 25, 2020

MikeBishop commented Mar 25, 2020

kazuho commented Mar 25, 2020

ianswett commented Mar 25, 2020

ianswett commented Mar 26, 2020

martinduke commented Mar 26, 2020

ianswett commented Mar 26, 2020

kazuho commented Mar 26, 2020

martinduke commented Mar 27, 2020

kazuho commented Mar 27, 2020

martinduke commented Mar 27, 2020

kazuho commented Mar 30, 2020 • edited

martinthomson commented Mar 30, 2020

ianswett commented Mar 30, 2020

martinduke commented Mar 30, 2020

ianswett commented Mar 30, 2020

kazuho commented Mar 31, 2020

marten-seemann commented Mar 31, 2020

kazuho commented Mar 31, 2020 • edited

nibanks commented Mar 31, 2020

janaiyengar commented Mar 31, 2020

marten-seemann commented Apr 1, 2020

kazuho commented Apr 1, 2020

ianswett commented Apr 1, 2020

martinthomson commented Apr 1, 2020

kazuho commented Apr 1, 2020 • edited

martinthomson commented Apr 1, 2020

martinduke commented Apr 1, 2020

martinthomson commented Apr 8, 2020

kazuho commented Mar 10, 2020 •

edited

kazuho commented Mar 11, 2020 •

edited

kazuho commented Mar 11, 2020 •

edited

kazuho commented Mar 12, 2020 •

edited

kazuho commented Mar 30, 2020 •

edited

kazuho commented Mar 31, 2020 •

edited

kazuho commented Apr 1, 2020 •

edited