Stateless Reset needs "on-path" proof #1230

Closed
igorlord opened this issue Mar 18, 2018 · 11 comments
Assignees
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@igorlord
Contributor

igorlord commented Mar 18, 2018

Stateless Reset contains a proof that the Server sent it. However, if the server's key is compromised, Stateless Reset can be forged wholesale off-path. Stateless Reset must also contain a proof that the sender observed the original packet that caused Stateless Reset. Clearly, there are many simple ways to do so.

@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -transport labels Mar 18, 2018
@MikeBishop
Contributor

MikeBishop commented Apr 4, 2018

Another attack to consider along similar lines: If an active attacker observes the CID being used by a client, sends a packet with that CID to a different server endpoint that shares the same SR token algorithm/key, then it can get the other endpoint to effectively be an oracle for the stateless reset token. It can then inject a Stateless Reset with the token into the current connection.

One mitigation would be to say that it's the server deployment's problem -- you should have different SR token-generation keys between different endpoints. Another is to require that a Stateless Reset was sent by someone who had both a specific packet you sent and the token, without disclosing the token in a form that could be adapted to a different rejected packet.

(This is a similar case to #1264; since the generator of the SR can't validate the crypto, they also can't know whether it was generated by the client, misdirected by an attacker, or fully generated by an attacker. That means they also can't avoid being a CID->SR oracle.)
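To make the oracle concrete, here is a minimal sketch. It assumes the token is derived as a truncated HMAC of the connection ID under a static key; the `Endpoint` class, the 16-byte truncation, and the sample key and CID values are hypothetical and only serve the illustration.

```python
import hashlib
import hmac
from typing import Optional


def reset_token(static_key: bytes, connection_id: bytes) -> bytes:
    """Derive a 16-byte token from a static key and a CID (assumed construction)."""
    return hmac.new(static_key, connection_id, hashlib.sha256).digest()[:16]


class Endpoint:
    """Hypothetical server endpoint with its own connection table."""

    def __init__(self, static_key: bytes) -> None:
        self.static_key = static_key
        self.connections = set()  # CIDs this endpoint holds state for

    def handle_packet(self, connection_id: bytes) -> Optional[bytes]:
        # Known connection: the packet is processed normally, no reset is sent.
        if connection_id in self.connections:
            return None
        # Unknown connection: answer with a stateless reset carrying the token.
        return reset_token(self.static_key, connection_id)


shared_key = b"k" * 32
endpoint_a = Endpoint(shared_key)  # holds the client's connection
endpoint_b = Endpoint(shared_key)  # same key, no shared connection state

cid = bytes.fromhex("0a0b0c0d")
endpoint_a.connections.add(cid)
token_client_expects = reset_token(shared_key, cid)

# The attacker replays the observed CID towards endpoint B and reads the token.
leaked = endpoint_b.handle_packet(cid)
oracle_works = leaked == token_client_expects

# Mitigation from the comment above: a different key per endpoint.
endpoint_c = Endpoint(b"j" * 32)
mitigated = endpoint_c.handle_packet(cid) != token_client_expects
```

With a shared key, endpoint B discloses exactly the token the client will accept; with a distinct key the response is useless to the attacker.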

@martinthomson
Member

Yes, this is a deployment issue. If packets with the same connection ID can end up at a node that shares a stateless reset key, but can't access connection state so that it generates a stateless reset, we have an oracle. We should document that attack.

@kazuho
Member

kazuho commented Apr 10, 2018

While I agree that this can be considered a deployment issue, I think we should forbid deploying the same static secret for generating stateless reset tokens across servers that do not share connection state, because doing so not only undermines what an "authenticated" reset is meant to be but also raises privacy concerns.

My understanding is that many of the server-side deployments that care about this are those that use BGP to distribute their packets among multiple POPs. Those operators tend to serve multiple hostnames.

That means that unless we require such server operators to provide stateless reset tokens only in a secure manner, attackers can force a connection to terminate (by using a stateless reset token obtained from a different POP) and then see the SNI carried in the handshake of a new connection.

Therefore, I think that we should either forbid server deployments from sharing the static key without sharing connection state, or, look for a technical approach to prevent the attack.

One such approach would be something like below:

  • for every QUIC connection, let the server advertise a "state-store ID" that designates the ID of the state-store that the connection is bound to
  • let a Stateless Reset packet carry encrypted data, which is encrypted by a key (other than stateless reset token) derived from the server CID and the static key used to generate the stateless reset token
    • that encryption key will be sent together with the stateless reset token in the NEW_CONNECTION_ID frame
  • a Stateless Reset packet will carry the "state-store ID" of the server sending that packet in an encrypted form
  • a client can compare the "state-store ID" of the connection with the value found in the Stateless Reset, and determine that:
    • if the values match, the connection has been reset
    • if the values do not match, the path is being rejected

There could be other ways, but I think sending encrypted data in a stateless reset and verifying that on the client side would be the necessary thing to do here.
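The client-side comparison in the list above can be sketched roughly as follows. This is a toy under stated assumptions: the key derivation label, the 4-byte state-store ID, and the XOR-pad-plus-HMAC sealing are all hypothetical stand-ins for whatever authenticated encryption a real design would use, and the fixed pad is not secure for more than illustration.

```python
import hashlib
import hmac
from typing import Optional


def derive_key(static_key: bytes, server_cid: bytes) -> bytes:
    """Key derived from the static key and the server CID (assumed derivation)."""
    return hmac.new(static_key, b"store-id key" + server_cid, hashlib.sha256).digest()


def _pad(key: bytes) -> bytes:
    # Toy keystream: fixed per key, so NOT safe for real use.
    return hmac.new(key, b"pad", hashlib.sha256).digest()[:4]


def _tag(key: bytes, ciphertext: bytes) -> bytes:
    return hmac.new(key, b"tag" + ciphertext, hashlib.sha256).digest()[:16]


def seal_store_id(key: bytes, store_id: int) -> bytes:
    """Server side: hide the 4-byte state-store ID and append a tag."""
    plaintext = store_id.to_bytes(4, "big")
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, _pad(key)))
    return ciphertext + _tag(key, ciphertext)


def open_store_id(key: bytes, sealed: bytes) -> Optional[int]:
    """Client side: verify the tag, then recover the state-store ID."""
    ciphertext, tag = sealed[:4], sealed[4:]
    if not hmac.compare_digest(tag, _tag(key, ciphertext)):
        return None
    return int.from_bytes(bytes(c ^ k for c, k in zip(ciphertext, _pad(key))), "big")


def classify(key: bytes, connection_store_id: int, sealed: bytes) -> str:
    sender = open_store_id(key, sealed)
    if sender is None:
        return "ignore"  # not a valid reset for this CID
    if sender == connection_store_id:
        return "connection-reset"
    return "path-rejected"
```

The key point is only the last function: a valid tag proves a server sent the packet, and the recovered ID tells the client whether that server should have known the connection.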

WDYT?

@martinthomson
Member

martinthomson commented Apr 10, 2018

Yes, if the extent of the state storage distribution matches the extent of the key distribution, which is a natural design, then this is naturally OK.

@kazuho, I'm struggling with your state store ID design. If the state and key storage is co-extant, then the packet that hits a different store will generate a response that the client won't recognize as a valid stateless reset. So I'm failing to see how the extra steps you describe help with the problem.

Unless you are suggesting that this key needs to be global (unlike the token, which is scoped to a particular zone). I don't think going global works though. An attacker can obtain a key for a given connection ID in one cluster and use it - in combination with the stateless reset oracle Mike described - to attack another.

If we wanted to prove that the server can generate the Stateless Reset without compromising later uses of the Stateless Reset, then we could feed the incoming packet into the process. For example, rather than include the token in the stateless reset directly, it includes a constant value that is generated by using HMAC(token, some fixed content) (or use HKDF[1]). A client can use that to detect that this is a stateless reset. In addition, the reset might also include a separate proof of receipt such as HMAC(token, incoming packet or part thereof). A client might choose to ignore a stateless reset that had an invalid secondary proof.

Of course, the risk here is that clients don't track enough from their outbound packets to reconstruct the input to the HMAC. After all, it requires tracking packet ciphertext - no other part of a packet is unpredictable enough. So they might have to treat receipt of a possible stateless reset as a trigger to enable whatever tracking they need, after which they send another packet and hopefully get a stateless reset. Aside from adding a whole round trip to the process, it just more than doubled the cost of a stateless reset at the server.

[1] We originally had another hash for stateless resets, but removed that in favour of the current, simpler design.
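A minimal sketch of the two-value construction described in this comment, assuming HMAC-SHA256 truncated to 16 bytes for both values. The label `FIXED_LABEL`, the function names, and the idea of the client keeping a short list of recently sent packets are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable

FIXED_LABEL = b"stateless reset"  # hypothetical "fixed content"


def detection_value(token: bytes) -> bytes:
    """Constant value that lets a client recognize a stateless reset."""
    return hmac.new(token, FIXED_LABEL, hashlib.sha256).digest()[:16]


def receipt_proof(token: bytes, packet: bytes) -> bytes:
    """Proof that the sender saw a specific packet (or part of it)."""
    return hmac.new(token, packet, hashlib.sha256).digest()[:16]


def build_reset(token: bytes, incoming_packet: bytes) -> bytes:
    """Server side: the token itself never appears on the wire."""
    return detection_value(token) + receipt_proof(token, incoming_packet)


def check_reset(token: bytes, reset: bytes, recently_sent: Iterable[bytes]) -> str:
    """Client side: returns 'not-reset', 'reset-unproven', or 'reset-proven'."""
    if len(reset) != 32:
        return "not-reset"
    if not hmac.compare_digest(reset[:16], detection_value(token)):
        return "not-reset"
    proof = reset[16:]
    for packet in recently_sent:
        if hmac.compare_digest(proof, receipt_proof(token, packet)):
            return "reset-proven"
    # A client might choose to ignore this case, as suggested above.
    return "reset-unproven"
```

The loop over `recently_sent` is where the tracking cost mentioned above shows up: the client has to retain packet ciphertext to recompute the proof.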

@kazuho
Member

kazuho commented Apr 10, 2018

@martinthomson

> Unless you are suggesting that this key needs to be global (unlike the token, which is scoped to a particular zone).

That is exactly what I am suggesting, under the assumption that some would be willing to do so (if none are, we should simply ban a single key shared among the servers that do not share the connection state, instead of discussing how a client should act against such servers as proposed in #1259).

In my proposal, a stateless reset token becomes an identifier of the connection (or path) that is being reset. It is the tag of the encrypted data (which contains the "state-store ID") that proves that the stateless reset has been sent by a server. A client then uses the "state-store ID" to see whether the reset was sent by a server that should have known the connection (meaning a connection reset) or not (meaning a path is being rejected).

In other words, a stateless reset token becomes usable once per "state-store", instead of once per connection.

PS. maybe I am too abstract (I am trying to not go into certain design decisions). I can come up with a more concrete description if that's preferable.

@martinthomson
Member

OK, thanks for clearing that up. I think that you have an attack, though it might not be that interesting.

Connection IDs are scoped to a state store, whereas your proposed key is global. That means that an attacker that can learn the key for a given connection ID in any state store can use it to attack all other state stores. For high entropy connection IDs, that takes some doing, but it isn't generically safe, especially now we allow as little as 32 bits.

What is generically safe is co-extant state. That is, the connection ID has to be valid everywhere the static key is used and thus a packet with a given connection ID either causes a valid stateless reset for that connection or it is accepted.

Given the complexity of the additional mechanism and that exposure, I'd rather concentrate on the attack that this issue was originally raised to address: the absence of a proof-of-receipt in the stateless reset packet. Given that we now have as little as 32 bits of entropy (or less) in a connection ID, the potential for a stateless reset oracle is bothering me a little.

@kazuho
Member

kazuho commented Apr 10, 2018

> OK, thanks for clearing that up. I think that you have an attack, though it might not be that interesting.
>
> Connection IDs are scoped to a state store, whereas your proposed key is global. That means that an attacker that can learn the key for a given connection ID in any state store can use it to attack all other state stores. For high entropy connection IDs, that takes some doing, but it isn't generically safe, especially now we allow as little as 32 bits.

Thank you for considering the approach and pointing out the issue. I had not considered that attack vector.

And considering that attack vector, I realize that there is a less complex approach.

Assuming that the ID of the POP (i.e. the "state-store") is included in the DCID, a server can determine whether it should have known the state of the connection that the DCID designates, and send a connection reset or a path rejection accordingly.

With that said, I am personally fine with requiring co-extant state. As expressed in #1230 (comment), my proposal was made on the premise that some might want to reject path creation when packets arrive at a POP that does not have access to the state store.
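The simpler approach could look roughly like this. The sketch assumes, purely for illustration, that the first byte of the DCID carries the POP ID; the function names and return strings are hypothetical.

```python
def pop_id_of(dcid: bytes) -> int:
    """Extract the POP ("state-store") ID, assumed to be the first DCID byte."""
    if not dcid:
        raise ValueError("empty DCID carries no POP ID")
    return dcid[0]


def on_unknown_connection(local_pop_id: int, dcid: bytes) -> str:
    """Decide how a POP answers a packet whose connection it cannot find."""
    if pop_id_of(dcid) == local_pop_id:
        # This POP should have known the connection, so the state is gone.
        return "connection-reset"
    # The connection belongs to another POP: reject the path, not the connection.
    return "path-rejection"
```

For example, `on_unknown_connection(3, bytes([3, 0x11, 0x22, 0x33]))` yields a connection reset, while the same DCID arriving at POP 5 yields a path rejection.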

@martinthomson
Member

Yeah, I don't think that path rejection is going to work that well. Packets that arrive at a POP that doesn't have access to a state store will either enter a black hole (if the POP recognizes the connection ID, in which case the packet is dropped after packet protection removal fails) or generate a stateless reset. More of the latter as connection IDs get longer and more sparse.

That's what makes me think that we need some sort of verification of intent in addition to a token. A routing flap, misconfiguration, or attack might cause a storm of stateless resets that might be reusable by an attacker. Adding a liveness check is probably a good idea to avoid that.

@martinthomson martinthomson self-assigned this Jun 4, 2018
@martinthomson
Member

@martinduke made a good point about this, which made me reconsider this idea. A man-on-the-side attacker can copy whatever details they need from packets an endpoint sends in its use of a stateless reset oracle. That is, if the stateless reset depends on data in a packet that an endpoint sends (and it can't depend on anything more than that), then the attacker simply copies whatever it needs from a genuine packet. A liveness check therefore only really makes the attacker's job harder. Since the fix here that forces the attacker to be live is to make the stateless reset more complex, we should be very careful to consider the trade-off.
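A small sketch of why a packet-bound proof does not stop an observer. It assumes a 4-byte CID at the start of each packet, an HMAC-derived token, and a reset that is an HMAC of the incoming packet under that token; all names and sample values are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def token_for(static_key: bytes, cid: bytes) -> bytes:
    return hmac.new(static_key, cid, hashlib.sha256).digest()[:16]


def oracle_reset(static_key: bytes, incoming_packet: bytes) -> bytes:
    """A node with the key but no state: resets whatever packet it receives."""
    cid = incoming_packet[:4]  # assumed layout: 4-byte CID, then ciphertext
    token = token_for(static_key, cid)
    return hmac.new(token, incoming_packet, hashlib.sha256).digest()[:16]


def victim_accepts(token: bytes, reset: bytes, sent_packets: Iterable[bytes]) -> bool:
    """The client accepts a reset only if it is bound to a packet it sent."""
    return any(
        hmac.compare_digest(reset, hmac.new(token, p, hashlib.sha256).digest()[:16])
        for p in sent_packets
    )


static_key = b"s" * 32
cid = b"\x01\x02\x03\x04"
genuine_packet = cid + b"opaque ciphertext"
client_token = token_for(static_key, cid)

# Man-on-the-side: copy the genuine packet off the wire and feed the oracle.
observed_copy = bytes(genuine_packet)
forged_reset = oracle_reset(static_key, observed_copy)
attack_succeeds = victim_accepts(client_token, forged_reset, [genuine_packet])

# A blind attacker who cannot see the packet gets a reset the client rejects.
blind_reset = oracle_reset(static_key, cid + b"attacker guess")
blind_fails = not victim_accepts(client_token, blind_reset, [genuine_packet])
```

The binding only excludes attackers who cannot observe traffic, which matches the point that the check merely raises the bar.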

@mikkelfj
Contributor

mikkelfj commented Jun 6, 2018

A loose thought, but intuition suggests:

With more roundtrips it would probably be possible to design a challenge response scheme. Dead endpoint issues a stateless reset challenge. Live endpoint responds by hashing the challenge with a negotiated reset key. Dead endpoint issues a new reset by hashing the live response.

MITM cannot use observed packets to issue new resets without triggering a roundtrip with proper routing between the endpoints, and the supposedly dead endpoint would not cooperate if it isn't dead.
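One possible reading of this three-step exchange, as a sketch only. It assumes the "dead" endpoint can re-derive the negotiated reset key without connection state, that the live endpoint remembers the challenge it answered, and that both hashes are HMAC-SHA256 with hypothetical domain-separation labels.

```python
import hashlib
import hmac
import secrets


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def issue_challenge() -> bytes:
    """Step 1: the dead endpoint sends a fresh random challenge."""
    return secrets.token_bytes(16)


def live_response(reset_key: bytes, challenge: bytes) -> bytes:
    """Step 2: the live endpoint hashes the challenge with the reset key."""
    return mac(reset_key, b"response" + challenge)


def final_reset(reset_key: bytes, response: bytes) -> bytes:
    """Step 3: the dead endpoint hashes the live response into the actual reset."""
    return mac(reset_key, b"reset" + response)


def live_accepts(reset_key: bytes, challenge: bytes, final: bytes) -> bool:
    """The live endpoint recomputes the chain for the challenge it answered."""
    expected = final_reset(reset_key, live_response(reset_key, challenge))
    return hmac.compare_digest(final, expected)
```

Because the final reset depends on a fresh challenge, a reset captured from one exchange does not verify against another, at the cost of the extra round trip.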

@martinthomson
Member

Discussed in Kista. Conclusion was that the marginal benefit was minor, if not non-existent, and the cost was significant. A man-on-the-side could build the oracle. We will instead concentrate on documenting the constraints on the routing infrastructure.

@mnot mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Mar 5, 2019
6 participants