Required state for retaining unacked RETIRE_CONNECTION_ID frames is unbound #3509
To address the issue, I think we need to do one of the following:
Mathematically, this might be unbounded, but it's more likely that in practice the limit is modest. Adding new frames for this seems a little unwieldy, so I'd prefer to just acknowledge that the problem exists. Note also that Retire Prior To has the effect of making this limit squishy in other ways. (This is a design issue.)
I'm not sure I understand the attack here. Why would an implementation need to keep track of acknowledgements of RETIRE_CONNECTION_ID frames (other than for purposes of loss recovery, but then this applies to any retransmittable frame type)? An endpoint sends a RETIRE_CONNECTION_ID frame when it decides not to use the respective CID any longer. At this point, it can forget about the connection ID altogether. It might make sense to hold on to the stateless reset token for a while (3 PTO?) longer, in order to be able to detect stateless resets, but this is an optimization and wouldn't lead to unbounded state anyway.
This is about keeping state for retransmission in case of loss. As stated in my first comment, the issue is unique to RETIRE_CONNECTION_ID frames, because this frame type is (ab)used to provide the peer additional credit, whereas other flow-control mechanisms use dedicated frame types (e.g., MAX_*) for communicating credit.
Thank you for the explanation, @kazuho. I think I now understand the problem.
This would require endpoints to make assumptions about how the peer handles the retirement.
Just to be clear, while I used RPT as a way of explaining the design issue, I think it is not a requirement. A malicious client can repeatedly migrate to a new address, initiating the use of new CID pairs, while intentionally not acknowledging packets containing RETIRE_CONNECTION_ID frames sent from the server. If the client does that, the number of CIDs that the server has to track for retirement grows over time.
Clarification question: if the person sending RCID isn't getting ACKs back, I think we still need to assume that those CIDs are being replaced? If not, then the non-RPT side of this would just mean the person sending RCID frames runs out of CIDs to use and has to stop. So the problem statement is: this person is running over their max_active_cid limit from the perspective of the person withholding ACKs, but each side has their own view, and so the person retiring them stopped counting them as active once the frame was sent (but not acknowledged). (Just want to make sure I'm understanding the problem correctly.)
@erickinnear Yes, I think your problem statement is correct. Or, to be even more concise, the problem is that the issuer of CIDs is given additional credit when it receives RCID, which happens before the consumer of CIDs drops the state it needs to retain for retransmitting RCID. For the attack to work, an attacker has to let the peer consume and retire CIDs. RPT and intentional migration are examples that make that happen.
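As a toy illustration of the reasoning above, the server-side state growth under a repeatedly migrating, never-acking client can be sketched like this (a hypothetical model; the class and method names are made up for this sketch and come from no real QUIC stack):

```python
# Toy model of server state when a client migrates repeatedly but never
# acknowledges packets carrying RETIRE_CONNECTION_ID (RCID) frames.
# Illustrative only: all names are invented for this sketch.

class ServerState:
    def __init__(self):
        self.next_seq = 0
        self.unacked_retirements = set()  # RCID frames awaiting an ACK

    def on_client_migration(self):
        # The client moved to a new path with a fresh CID pair. The server
        # retires the CID it used on the old path; that RCID frame must be
        # retained for retransmission until acknowledged. It also issues a
        # new CID, giving the client the credit to repeat the procedure.
        self.unacked_retirements.add(self.next_seq)
        self.next_seq += 1

    def on_ack(self, seq):
        # A well-behaved client triggers this; the attacker never does.
        self.unacked_retirements.discard(seq)

server = ServerState()
for _ in range(1000):  # the attacker migrates 1000 times, acking nothing
    server.on_client_migration()

# Retransmission state grows linearly with the number of migrations.
print(len(server.unacked_retirements))  # 1000
```

With a well-behaved client, `on_ack` fires for each retirement and the set stays small; the attack consists entirely of never letting it fire.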
So the issue here is fundamentally that you can induce the peer to send retransmittable frames which aren't subject to some kind of flow control, and this is the only instance we have of that.

STOP_SENDING and failing to acknowledge the RESET_STREAM appear to have a similar profile, except that the number of RESET_STREAMs you could have outstanding is bounded by the number of streams the peer has opened, which they control. You could execute the same attack at a larger scale by choosing particular regions of stream data to never acknowledge, forcing the peer to keep them in memory for retransmission, but the peer could stop sending until that region gets through (applying backpressure) or simply reset the stream. Since a RESET_STREAM is smaller than the data region and only one is needed per stream, this collapses into the previous case.

What's different here is that the endpoint being attacked doesn't choose whether to retire these CIDs and has no mechanism to apply backpressure. It seems like there are a couple paths to resolving this.
@MikeBishop I think your summarization of the problem is accurate. Regarding an attacker selectively acking packets carrying STREAM frames: yes, the attack exists, but the sender can mitigate that problem in certain ways, and we already state that in Section 21.5. This issue is the only case in which the protocol cannot be implemented properly, with a bounded amount of state. Regarding how we should fix the problem, if we are going to make a protocol change, my preference goes to applying the same design principle that we have in other places (i.e., introduce a MAX_CONNECTION_IDS frame). I do not like designing a mechanism specific to this purpose, as there is a chance that we might make mistakes. I can also live with what @martinthomson proposed in #3509 (comment) (i.e., encourage leaky behavior).
Similar to @martinthomson, I'd prefer to acknowledge the problem and indicate that peers may want to limit the number of unacknowledged RETIRE_CONNECTION_ID frames to the number of connection IDs they support. The other option I can imagine is to make the RETIRE_CONNECTION_ID frame cumulative, which provides a strong incentive to use connection IDs in order. Though we may have to make an exception for the connection ID in the received packet if we did that.
So the bounds on this aren't that terrible. Say you had to retire 1000 connection IDs all at once. You could track each one individually as you send RETIRE_CONNECTION_ID. That's 1000 things to track, which might be more state than you want (it's not 1000 bits, because you need to maintain relationships with packet numbers). Or you could just keep a bounded amount of state and retransmit more aggressively. In other words, you do control how much state you commit to retirement, just like everything else.
That's my preferred solution, but in my reading of the spec this is not allowed. We MUST send RCID for each sequence number for which the peer has sent retire-prior-to. I think the language in Section 5.1.2 needs to change.
@ianswett's idea of allowing RCID to be cumulative is also good and computationally cleaner, although we'd need a different frame type or an additional field in the frame (I don't think we can get rid of retiring non-contiguous sequence numbers).
I'm happy to file a PR for the @martinthomson or @ianswett approach based on the overall sentiment.
One purely implementation option: it's always legitimate to repeat a frame. Therefore, if you track only the smallest and largest un-acked RCID sequence numbers, you can just keep retiring the whole range every time you retransmit. ACKs will shrink the range (hopefully to zero). There's no harm beyond unnecessary bytes in retransmitting something a few extra times to avoid excessive state tracking.
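The range-based bookkeeping described above might look roughly like this (an illustrative sketch; the class and method names are hypothetical):

```python
# Track only the edges of the un-acked RCID range and retransmit the whole
# range each time; repeating a frame is always legitimate, so the only
# cost is extra bytes on the wire. Sketch only, not a real QUIC API.

class RangeRetirer:
    def __init__(self):
        self.lo = None  # smallest un-acked RCID sequence number
        self.hi = None  # largest un-acked RCID sequence number

    def retire(self, seq):
        self.lo = seq if self.lo is None else min(self.lo, seq)
        self.hi = seq if self.hi is None else max(self.hi, seq)

    def frames_to_send(self):
        # Retransmit every sequence number in the range, including ones
        # already acked: that repetition is the trick.
        return [] if self.lo is None else list(range(self.lo, self.hi + 1))

    def on_ack(self, seq):
        # Only acks at the edges shrink the range; interior acks are
        # ignored and those CIDs are harmlessly retired again.
        if self.lo is None:
            return
        if seq == self.lo:
            self.lo += 1
        if seq == self.hi:
            self.hi -= 1
        if self.lo > self.hi:
            self.lo = self.hi = None

r = RangeRetirer()
for s in (3, 5, 7):
    r.retire(s)
print(r.frames_to_send())  # [3, 4, 5, 6, 7]
r.on_ack(3)
print(r.frames_to_send())  # [4, 5, 6, 7]
```

Note that this retires every sequence number between the edges, so it only works when those intermediate CIDs are themselves safe to retire.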
While I won't say that's impossible, I'd argue that the design is going to be complex. When CIDs are used in a legitimate manner, they are not going to be retired one by one. A CID is retired when a path using that CID is abandoned. Consider the case where an endpoint has 4 active CIDs (CID0 to CID3) and two paths. One path could be using CID0, the other path could be using CID3. When the endpoint abandons the path that is using CID3, it has to retire CID3, but it cannot retire the others. @MikeBishop The number of CIDs that can be sent on the wire at a time is restricted by the CWND, therefore there is no guarantee that you can send RCID frames for all CIDs that you want to retire at once. In other words, I think the approach at best collapses into what @martinthomson has proposed (i.e., leaky behavior).
At this point, I also think @martinthomson's solution is the best approach, since it is a very minor change, and changing RETIRE_CONNECTION_ID to be cumulative is a larger change than I want to accept at this point. @kazuho, thanks for your examples.
One idea: Can we simplify everything by not requiring sending RETIRE_CONNECTION_ID for CIDs less than the Retire Prior To? RETIRE_CONNECTION_ID would then only be used when a CID was used and needed to be retired, so more CIDs could be issued to the user of the CID.
@ianswett see my comment in the PR. More disruptive to current implementations, and eliminates a useful signal. The current proposal will not matter for well-behaved clients but will protect servers from some attacks.
We haven't implemented this yet, so it's not disruptive to our implementation, FWIW. I wonder how useful the signal is. Retire Prior To is really saying "These are going away sometime soon, whether you like it or not" to the peer. If an implementation never sent RETIRE_CONNECTION_ID, would that create a problem of some sort? Because I suspect someone will do just that.
@ianswett As stated in #3509 (comment), this attack can be mounted without using Retire Prior To.
@kazuho in this case isn't the client limited by active_connection_id_limit? As a server, if the client has in no way indicated it has accepted my retirement, I'm not going to accept new CIDs. The NCID/RPT case is different because the spec specifically allows the client to NCID speculatively assuming the server will immediately retire the CID.
The attack here is that when a malicious client retires a CID to which the server has responded, that server would retire the CID that it has used on that path, and also provide a new CID to the client. But the client never ACKs the RCID frames that the server sends. As an example, consider the case where a client uses sCID1 on a new path, the server responds on that path using cCID1, and the client then abandons that path and sends RCID(seq=sCID1). When receiving this RCID frame, the server would send RCID(seq=cCID1), and also send NCID(sCID2). The client intentionally does not ack the packet carrying these frames, but uses sCID2 on a new path, in a packet that carries NCID(cCID2), repeating this procedure.
Ah, I see now that my defense doesn't work, because if I reject the NCID I have to kill the connection, and it may just be because of unlucky ack losses. This whole section is not very precisely written and suffers from two-generals problems about when a CID is actually retired. I will revise to account for this, but I'm not sure how.
In the approach we take in #3547, there is a possibility that the issuer of the CID might not receive the sequence numbers of all the CIDs that have been retired by the peer, when the issuer is issuing new CIDs at a high rate and the receiver is also consuming them at a high rate. That is the consequence of endpoints limiting the number of in-flight RCID frames they track. I think we can provide advice so that the "leak" would not be an issue in practice, and assuming that happens, I am happy with #3547 being the solution. I think #3553 is cleaner, but I acknowledge that it is more disruptive than #3547. Part of my intent behind creating #3553 is to see what a clean fix (in terms of state machinery) would look like, so that it can be compared against #3547.
As stated in #3550 (comment), #3550 by itself only partly fixes the problem. I am not against adopting #3550 along with #3547, though. #3548 is also only a partial fix, and I think it is a non-starter.
I think that I understand your point; thanks for your patience. I didn't really classify it as a leak, though, so the choice of words threw me a little; it's more of a lag.

Let's say that you have a pathological endpoint that sends an infinite series of NEW_CONNECTION_ID frames with Sequence Number N and Retire Prior To of N-1. That's legal and will never violate any limits. But unless they maintain acknowledgments for RETIRE_CONNECTION_ID frames at a similar rate (which, to be fair, should be easy), state accumulates. I don't think that it is strictly a leak, just a lag between the connection IDs being pushed and the connection IDs being successfully retired. Without #3547 or something else this could be problematic.

And I now understand why #3550 is really just an orthogonal refinement, though it might make other defenses less likely to be required. The defense in #3553 is to change the wire image so that the state is explicitly bounded.

It's probably worth pointing out that there is also the simplest defense: if you can't keep up with RETIRE_CONNECTION_ID, stop talking to the peer that does this to you. This is just another example from the category of abuse we don't have specific protocol mechanisms to mitigate.
I agree that I'd like to see a minimal solution, so I'm leery of heading towards #3553. In order for a peer to need to send a lot of RETIRE_CONNECTION_ID frames without Retire Prior To, a lot of 5-tuple + CID changes need to be coming in. One way to rate limit that naturally is to stop giving out CIDs if the peer hasn't acknowledged your RETIRE_CONNECTION_ID frames, i.e., limit the number of NEW_CONNECTION_ID frames in flight based on the retirements the peer has acknowledged, plus one. So if the peer stops acknowledging your retirements, you stop giving them new CIDs. I think this avoids the need for any other limits? I added 1 in my example so one NCID could be sent even if all of the peer's CIDs had just been retired, though that concern disappears if we decided to also do something like #3550.
What a mess we've made! Is there interest in doing a Zoom about this? Maybe 0900 Tokyo/1100 Sydney/1700 Seattle/2000 Boston on Tue/Wed? The root cause is that the flow control mechanism isn't all that well designed. IMO #3553 is the actual correct fix to this problem, but as people say, it's disruptive. #3548 uses ACKs as an implicit signal, which seems like a problem. #3550 has a different implicit signal, and I don't think it solves the case where the client is withholding acks for unsolicited RCIDs. I really think we should just use #3547 or #3553 depending on our appetite for changing the wire image. #3547 is great for the RCID sender, and the receiver won't miss any RCIDs unless it's doing pathological stuff or the ack loss patterns are tremendously unlucky. #3553, OTOH, is a comprehensive fix to the problem.
I think #3547 with some added text about limiting outstanding NCIDs when there are lots of RCIDs would be sufficiently robust. |
I'm fine with trying to see if we can reach an agreement on #3547. |
I'd prefer #3553 over #3547. I can't really see why it would be more disruptive than #3547. The problem with #3547 is that it will likely work fine in most cases, but there are some corner cases (when the ACKs for RETIRE_CONNECTION_ID frames are lost) in which things will break. Properly accounting for these corner cases requires some non-trivial tracking logic in order to avoid risking the connection being closed with a CONNECTION_ID_LIMIT_ERROR. On the other hand, #3553 seems to be conceptually simpler. I'd rather implement a few changes to my frame parser than introduce some ACK-tracking logic.
Just to give another example of how an endpoint might end up required to track the retirement of more than active_connection_id_limit CIDs when talking to a well-behaving peer: consider a client that retires CID1 and CID2 and receives CID3 and CID4 as replacements, while the ACKs for the packets carrying those RETIRE_CONNECTION_ID frames are lost.
In this example, we might argue that it would be possible for the client to figure out that CID3 and CID4 were provided as substitutions for CID1 and CID2. But do we want to implement logic that detects such a condition in our code? Note that tracking of substitutions can become tricky; if RCID(CID1) and RCID(CID2) were sent in different packets, it would be impossible for the client to see whether CID4 was provided as a supplement for CID1 or for CID2.
I finally caught up with this issue, and implementer advice is definitely my preferred way forward. We're going to need some complexity (and some magic numbers) no matter what; let's do this while limiting mechanism to a minimum. To that end, I agree with #3547 as a good resolution.
While increasing the limit will certainly help, this introduces a really unpleasant flakiness in the protocol. QUIC, so far, has the property that no amount or pattern of packet loss will lead to a protocol violation (it might lead to a connection timeout if all packets are blackholed, but that's a different kind of error). This means that when running QUIC in production, screening for (transport-level) errors will be very useful for finding bugs in implementations. My fear regarding #3547 is that no matter how high we choose the limit, there will always be a loss pattern that will trigger a protocol violation. |
Thank you for raising the concern. I think the recommended behavior when exceeding that limit is to stop sending RCID frames for some sequence numbers rather than triggering a protocol violation; that's actually what I had in mind, hence I called it "leaky." The benefit of stopping retiring some sequence numbers is that the worst case becomes endpoints running out of CIDs - something that is expected to happen on ordinary connections too. Note also that in HTTP/3, reestablishing a connection in 0-RTT is always a viable option. To summarize, I think that the concern can be resolved for HTTP/3 by changing the recommended behavior from resetting the connection to stopping sending RCID frames for some sequence numbers.
@kazuho I don't think not sending RCID frames is that easy. Tracking something that needs to be sent consumes comparable state to tracking it in flight. Or are you suggesting a different algorithm that indicates a RCID frame doesn't need to be sent? If the peer has to wait for one Retire Prior To to take effect before it's increased again, that leaves only the peer migrating really quickly and lost ACKs as the unbounded case. As @kazuho said, I think it's more likely a well-behaved peer would run out of CIDs in this case than hit any other failure mode. I previously suggested stopping sending out NCID frames when the RCID frame limit had been reached to force the issue, but that just changes who has to close the connection.

FWIW, we already have sanity checks where our implementation closes the connection when a data structure becomes too large. The limits are high enough that they're almost never hit unless there is a bug, but they've been helpful in finding bugs. I would advocate everyone have these.

@marten-seemann You'll never get to 0 transport errors. Some implementations will have a bug, and some peers or servers will have faulty or corrupted memory. For example, in Google QUIC we found we received the ClientHello on a stream other than 1 surprisingly often.
I think that the idea is to reduce the set of RCID frames needed to a range of sequence numbers. You only have to remember the few exceptions for sequence numbers that are in use, sequence numbers for which you have outstanding frames, and a range of sequence numbers that require retirement. As long as your discretionary use is sequential, this is bounded state. As mentioned, the risk is that lack of acknowledgment leads to potentially large delays in getting new connection IDs, but if you ever run short, abandon the connection. Either you have a pathological loss pattern or your peer is messing with you. In either case, it is probably not worth continuing. |
Yes, this is one method of limiting the state that is used for retaining sequence numbers that need to be sent. And it works even when discretionary use is not sequential, as the number of "exceptions" (i.e., gaps) is bounded by max_connection_id_limit. The other assumption of using this method is that you would somehow bound the state that is used to track RCID frames being in flight. There are various ways of accomplishing that, e.g., having a limit on the number of RCID frames sent at once, or tracking a bunch of in-flight RCID frames as one meta-frame. And it is correct that the outcome would be a "lag" in this method. The downside of this method is that it might be a disruptive change to existing implementations. It is essentially equivalent to what #3553 proposes, with the exception that tracking RCID frames is difficult (as you might have many to send). For existing implementations, it would be easier to adopt a "leaky" method, i.e., cap the amount of retirement state being tracked and forget the rest.
Assuming that the solution we prefer is the least-disruptive approach, I think we need to recommend the latter. If we have a distaste for the latter and prefer the former, I think we'd better switch to #3553, as it more cleanly reflects the design of the former. @ianswett I think the latter is not that big a change to existing implementations.
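A minimal sketch of the leaky method, assuming a fixed cap on tracked retirements (the names and the cap value are illustrative, not from the spec or any PR):

```python
# Leaky retirement tracking: once the cap on pending retirements is hit,
# further retirements are forgotten instead of queued, so the worst case
# becomes running out of CIDs rather than unbounded state. Illustrative
# sketch only; the cap value is an assumption.

class LeakyRetirer:
    def __init__(self, cap=32):
        self.cap = cap
        self.pending = []  # sequence numbers waiting to be sent or acked
        self.leaked = 0    # retirements we gave up on ever signaling

    def retire(self, seq):
        if len(self.pending) < self.cap:
            self.pending.append(seq)
        else:
            self.leaked += 1  # "leak": no RCID will be sent for this seq

    def on_ack(self, seq):
        if seq in self.pending:
            self.pending.remove(seq)

r = LeakyRetirer(cap=2)
for s in range(5):
    r.retire(s)
print(len(r.pending), r.leaked)  # 2 3
```

The leaked retirements are exactly the "trapped state" concern raised in the next comment: the peer never learns those CIDs were abandoned.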
Yep, I think that is a good approach. Obviously, the range encoding is superior in terms of how much state it can hold, but a simple list is fine. And the effects of lag will be limited by the tolerance for retaining retired connection IDs, which seems proportional to me. |
@kazuho I doubt that's less impactful on existing implementations, certainly from an interoperability standpoint. Today, endpoints should hold on to incoming CIDs until they are formally retired. Making them leaky can lead to trapped state. Sending them all, but with a delay, is no added state but solves this problem.
After a lot of back-and-forth, I think that we've all generally converged on #3547. |
When an endpoint sends a RETIRE_CONNECTION_ID frame carrying a connection ID that has been active until that point, that endpoint is expected to make room for accepting a new connection ID.
That means that if the peer does not acknowledge the packets that contain RETIRE_CONNECTION_ID frames, but keeps on sending NEW_CONNECTION_ID frames with increasing values in the Retire Prior To field, the number of unacknowledged RETIRE_CONNECTION_ID frames (or, to be precise, the number of unacked retiring connection IDs) keeps on increasing.