Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow reuse of stateless reset tokens #2785

Closed
DavidSchinazi opened this issue Jun 12, 2019 · 15 comments
Closed

Disallow reuse of stateless reset tokens #2785

DavidSchinazi opened this issue Jun 12, 2019 · 15 comments
Assignees
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@DavidSchinazi
Copy link
Contributor

As part of the discussions on #2645 / #2769, we've found that reusing stateless reset tokens (SRTs) can allow attackers that delay packets to close connections. Even though the text in #2733 does a good job explaining why you shouldn't reuse SRTs, I don't see any reason why one would want to reuse SRTs across connection IDs in the same connection. Therefore I would prefer we ban reusing them, and make it a PROTOCOL_VIOLATION to do so. Apologies for not raising this on #2732, I had not thought about the problem at that time.

@kazuho
Copy link
Member

kazuho commented Jun 12, 2019

Apologies for not raising this on #2732, I had not thought about the problem at that time.

I do not think you missed anything. Instead, you spotted the issue at the exact moment; it's my understanding that #2769 is introducing this issue. There is no problem in the current design, because receiving a packet with an outdated CID is not expected to cause a stateless reset.

Regarding why you'd want to reuse a SRT, the simple reason is because you do not need to create different SRTs for every Connection ID.

A QUIC stack that generates a CID by a construction like AES_ECB(cid_key, conn_id || path_index)1 can use HMAC(srt_key, conn_id) as a construction for generating SRT.

For QUIC stacks that aren't interested in retiring CIDs (which seems to be the majority for me), this construction makes sense. I think my preference would be to retain the design; it's just about adding a warning when you want to retire a CID.

1: path_index is a sequence number of path for each connection

@DavidSchinazi
Copy link
Contributor Author

I don't think #2769 introduces this issue. It does make it worse but the issue was already present. It's reasonable for a server implementation to drop a CID from their state when the client retires it.

There are two options to guarantee safety:

  1. make sure each CID has a distinct SRT
  2. make sure once a CID is sent in NEW_CONNECTION_ID it always routes back to this connection

I think (1) is simpler and giving implementations two options here not only makes the spec more complicated, but can also lead to accidental vulnerabilities.

I think it would be safer for the implementations you mention to use HMAC(srt_key, conn_id || path_index) to generate their SRT. That also makes it simpler to generate the SRT from the cleartext CID.

In general, it's been this spec philosophy to add constraints that can be verified by the peer whenever possible. Forcing distinct CIDs ensures verifiable security whereas requiring that CIDs get routed appropriately for the duration of the connection is not enforceable and therefore easier to get wrong.

@MikeBishop
Copy link
Contributor

MikeBishop commented Jun 12, 2019

Essentially, the current text says that an issuer can't actually forget a retired CID until all CIDs with the same SRT have been retired. At that point and not before, you can forget all of them at once. #2769 doesn't change this at all -- doesn't even really make it worse.

That PR provides a mechanism for asking a client to retire a certain set of CIDs; if the set only partially overlaps with the set of CIDs covered by an SRT, that mechanism is not terribly useful because you can't actually forget all the CIDs you asked the peer to retire. It could be very useful to ask the peer to retire the last CID in one of these groups because you've got 200 of them you'd really like to clear out.

But that doesn't change the core fact: An issuer can't actually forget a retired CID until all CIDs with the same SRT have been retired. If each SRT maps uniquely to a CID, this is a simple requirement to satisfy. If you choose to do something else, that's your problem.

@kazuho
Copy link
Member

kazuho commented Jun 13, 2019

I think what @MikeBishop says is correct.

OTOH, I now think that forbidding reuse of SRT might help the receiver of the stateless resets.

Without the prohibition, when retiring a CID, an endpoint needs to consult other SRTs that have been issued for the same connection to see if it can unregister the SRT corresponding to the CID being retired, or use a ref-counted hashmap for maintaining the mapping from SRTs to connections.

We might argue that this requirement is easy to miss, and hard to test.

Having the prohibition removes this requirement and therefore can be considered as a simplification.

Contrary to that, we might argue that allowing the reuse of SRT has marginal benefit. It is trivial to construct different SRT for each CID, as pointed out by @DavidSchinazi.

@martinthomson
Copy link
Member

@MikeBishop's analysis is entirely correct.

I don't think that we should put in rules to prevent self-harm through idiocy. And this is definitely a case of that. An endpoint that forgets a CID when the associated SRT is active only hurts themselves.

The text is sufficient as it stands. A prohibition won't prevent implementations from doing dangerous things. As it stands, you need to take extraordinary steps to put yourself in position to mess this up. The construction @kazuho describes is a totally sensible one, but it is very hard to forget connection IDs with that scheme. More relevant to this, the SRT construction process described in the spec won't have this problem.

What exactly do you think this prohibition will achieve?

@kazuho
Copy link
Member

kazuho commented Jun 13, 2019

What exactly do you think this prohibition will achieve?

If there is a prohibition, an endpoint can remove the SRT of the CID being retired, without consulting the SRTs of other CIDs belonging to the same connection.

While incorrect in terms of current spec, such design works flawlessly with endpoints that issue different SRT for each CID. It seems that most implementations would issue different SRT for each CID. Then, there's chance that we might see stacks implementing this incorrect but simpler approach.

Having the prohibition eliminates the chance of seeing these "incorrectness."

@marten-seemann
Copy link
Contributor

I agree that we should remove the text that allows the reuse of SRTs from the spec (In fact, last night I was planning to open exactly this issue, and woke up this morning to see that @DavidSchinazi already did so while I was asleep).

I don’t see any reason why you’d want to reuse a SRT in the first place - computing one HMAC shouldn’t be more expensive than applying the packe protection to send a single packet. With regards to how you store connection IDS, this design is only simpler if you do the wrong thing and don‘t keep track of expired connection IDs that still have active SRTs. Doing it the right way will most likely result in a more complex design than associating each CID with its own SRT.

In short: it’s easy to mess up, and there’s little benefit, so let’s remove it.

@mikkelfj
Copy link
Contributor

I'm not totally against a unique SRT per CID, but I am concerned about the static memory use of otherwise passive connections.

If a CID and SRT can be 255 bytes each (depending on where discussions go) that is half a kilobyte per CID, and there might be 8 of those, possibly more. That is 4K per connection even if you retire all active buffers because there isn't currently any traffic.

@DavidSchinazi
Copy link
Contributor Author

@mikkelfj SRTs are 16 bytes long

@mikkelfj
Copy link
Contributor

@DavidSchinazi I'm not sure about that - isn't there text about hiding SRT among ordinary traffic. If so it cannot be shorter than the CID + overhead, well the packet that is. But I'm probably mixing up packet size with storage requirements.

@ianswett
Copy link
Contributor

The SRT is only 16 bytes, but the packet it's sent in is supposed to look like a typical QUIC short header packet, and therefore needs to be of a minimum size.

@MikeBishop
Copy link
Contributor

MikeBishop commented Jun 13, 2019

So essentially, we're saying that we maybe didn't really have consensus on the answer to #2732 being "yes." But since there was a consensus call that said we did, now the question is whether we have consensus to reverse that decision. Let me suggest that be taken to the list.

@mnot mnot added this to Triage in Late Stage Processing Jun 14, 2019
@mnot mnot added the design An issue that affects the design of the protocol; resolution requires consensus. label Jun 27, 2019
@mnot mnot moved this from Triage to Design Issues in Late Stage Processing Jun 27, 2019
@larseggert
Copy link
Member

Discussed in Montreal, next step is for @martinthomson to do a PR and follow normal process

@martinthomson
Copy link
Member

Conclusion: prohibit the use of duplicate tokens. Allow, but not require the use of PROTOCOL_VIOLATION if a duplicate is detected.

@martinthomson martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Sep 3, 2019
@mnot mnot moved this from Design Issues to Consensus Emerging in Late Stage Processing Sep 13, 2019
@mnot mnot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Sep 25, 2019
@mnot mnot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Oct 15, 2019
@mnot mnot added has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. and removed proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. labels Oct 15, 2019
@martinthomson
Copy link
Member

#2968 covered this.

Late Stage Processing automation moved this from Consensus Declared to Text Incorporated Oct 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.
Projects
Late Stage Processing
  
Issue Handled
Development

No branches or pull requests

9 participants