-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preventing KEY_PHASE bit from being used as a tool to correlate CIDs #1322
Comments
The other way to address the issue is to remove the KEY_PHASE bit. For example, we can remove the bit and instead state that the key phase can be changed only when a new CID is used. When an endpoint receives a new CID, it trial decrypts the packet using the current generation key and the next generation key to see if the key phase has shifted. |
I agree that we should do both. The intent of the text was to state that preventing leak of information from KEY_PHASE bit that could be used to correlate CIDs, makes sense only if we prevent leak from PN in some way. |
I like the idea of using migration to a new CID as an alternative to key rollover. Come to think of it, we could transmit the key phase bit as part of the NEW CONNECTION ID frame. Could even make it a key rollover sequence. But that requires allowing different paths to use different key phases. If all paths migrate to new CID at the same time, we still have leakage. |
That sounds like budget encryption, which I think we should avoid like the plague. See #1317 (comment) for an alternative design. |
The positive side of the current design is that key updates are unilateral. Using NEW_CONNECTION_ID for signalling means that an endpoint will instruct the peer to update the key. I am not sure if doing so is a good thing.
I agree that that is an issue. One trivial way of avoiding such an issue is to implement the following strategy:
We might want to suggest something like that. |
Key updates require reciprocation. Using a new connection ID would be the same. So no change there. One advantage of coupling NEW_CONNECTION_ID to key updates is that it means that there is a single mechanism that supports migration (and multipath) and key update. You can't support one without also enabling the other. One thing I've been concerned about with key update is that no one will implement it. NSS only very recently grew that capability in TLS 1.3; it's a small feature, but one that took ages to implement because there wasn't much incentive to build it. We still don't have the DTLS variant because it messes with our timers in ways that are hard to reason about. I agree that simultaneous migration would be a mistake. Better to drive rollovers based on packet counts as @kazuho suggests. Though I think that we can probably ignore the 262 limit and throw connections away if they ever get to that point. |
I support the idea of changing keys only when changing CID, but make it optional until the packet count reaches at most 2^32. CID change can be probalisitic after say 2^31 packets to reduce linkage. |
FWIW, picotls doesn't yet have key updates (and I wonder if I would ever implement it). Considering that, I am perfectly fine with dropping key update entirely from QUIC v1, leaving it as an issue to be solved by an extension or v2 issue. |
@kazuho I don't have the numbers fresh in memory, but if you don't do key updates, you limit yourself to a few GB of transfer, or you create a serious security violation because GCM is weak when you send too much data. Or am I missing something about key updates? If it isn't already stated, an endpoint should close a conneciton with protocol violation if it does not refresh keys in due time. |
@mikkelfj Thank you for the link. I have actually been looking for it. And rereading it, I agree that we need to have some form of key update in QUIC (or change PN to a 32-bit value, considering that AES-GCM is the default cipher). |
Now that #1079 has been merged, and assuming that we would not have per-CID PN, let me propose another simple solution: never update PN protection key throughout the connection, and define the generation of payload protection key as PN mod 232. Since the sample used for PNE is 128 bits long, I would assume (or hope) that we can encrypt 262 packet numbers using a single key. In other words, there is no need to update the PN protection key. Then, we can use the value of the decrypted PN to determine the generation of the payload protection key. |
I like that proposal. It makes key rotation straightforward, and it does solve the phase bit leakage issue. |
That could work. There is an on path attack where large packet numbers are skipped to expend resources on frequent rekeying, so perhaps som text to guard against this - notably in relation to optimistic ACK packet skipping. I was considering switching between CID's instead of skipping packets - but that is no good if the ACK doesn't care about the path and PN's have one linear space. |
That’s an interesting proposal. |
@kazuho: I still believe that per-CID PN is the best way to get to multipath. Even if we don't do it yet, I expect that we will get there eventually. I wouldn't want to make the assumption that we use a single PN space yet. I like the idea of tying key updates to new CIDs, but there's a gotcha. Both endpoints might decide to do zero-length CIDs, and you then lose any ability to do a key update. Why not do both? That is, KEY_PHASE bit used as normal, but CID resets the key phase. In other words, CID represents the high-order bits of the key phase, and the KEY_PHASE bit represents the lowest order bit. Does this make sense? |
I am not sure if that is a realistic attack considering the fact that attacker would be required to spend same amount of CPU cost, and that an attacker cannot trigger key update once every 2 RTT (note: a sender cannot send PN that is greater than the largest acknowledged value + 231). |
I share the concern, and I think that we can grease the feature. One way would be to occasionally start with a PN near 232. The other way would be to for some portion of connections inject huge gaps during the early stages of the connection so that the PN would soon become above 232. |
Is there any limit on how many key updates can safely be done? If not, another option would be to lower the threshold to e.g. 2^16 (this would correspond to roughly 80MB of data transferred). This would ensure that a significant fraction of QUIC connections would do a key update. |
Agreed.
I think that I at least agree with you that my proposal of eliminating the KEY_PHASE bit by using PN mod 232 as the generation number is orthogonal to the discussion of #1317.
That should be possible, but it might be better to derive the first generation of the encryption key using HKDF that takes CID as the input, then update the key for every 232 PN. This is because we cannot skip key generations in the current derivation formula, which is: |
I would prefer setting the threshold to something greater than the largest delta that can be represented by the PN field of a QUIC packet. That property ensures you that having 2 generations of encryption keys is enough, regardless of how the client injects gaps. That is why I proposed 232. |
I don’t know if many implementations would actually implement a key update that only allows ales place after 5 TB of transferred data. In most cases, it would probably be easier to shut down the connection and reconnect. |
@marten-seemann Maybe you missed #1322 (comment)? I think you can enforce the use in some ways. |
I don't think that PN mod 232 works without trial decryption. Let's say that we have an 8 bit encoding and we are approaching the crossover point. When the crossover happens, the recipient uses the old PN key and gets a random 8 bit value, which might be a valid value less than the crossover point. That packet will fail to be decrypted. I also agree with @marten-seemann that we should favour designs that are based on explicit signals rather than simple counts. 232 is large enough that some implementations might not encounter the rollover point. I have experience with fixed rollovers in NSS and testing them is ugly. On the other hand, sending KeyUpdate is easy. I don't think that a randomized packet number is the right fix for that. Finally, though I suspect @janaiyengar is right with respect to per-CID spaces, but the decision to roll keys based on a change in connection ID can be made without going that far. All we need to do is decide to keep the sequencing of connection IDs. We also need to recognize that two sets of keys might be maintained for some time during migration; that would be somewhat longer than the text currently suggests because of probing, but it's not that much. |
@martinthomson Maybe you missed the point that I proposed keeping the PN key stable across the entire connection? #1322 (comment) |
No, I saw that - and it's a necessary precondition for this. Nonetheless, the problem remains. |
Then, would you mind clarifying what you mean by “old PN key”? |
Oh, I'm being daft. I still think that the other reasons are enough. |
PS. assuming that we can use one PN key across the entire connection, the other solution would be to steal one bit from the varint encoding of the PN (in clear) and use that as the KEY_PHASE bit. The bit will then become unobservable. |
Yes, I have considered that option. I still prefer coupling connection ID change to key migration. It has the benefit of consolidating mechanisms for migration and key update, which means that the mechanism is more likely to see use. |
I can see the benefit. OTOH, there is a downside. As @janaiyengar has pointed out, a server with a zero-length CID cannot let the Connection ID change. |
FWIW, in #989 (comment) I have proposed encrypting certain bits (including the KEY_PHASE bit) of the first octet in-place. The pros / cons of the approach is discussed on the comment there. |
Skipping packet numbers to induce key rotation seems like (danish proverb) crossing the creek to fetch water since PNE just gained a linear number space. |
Yes, sort of. If there is a 1/N chance of a successfull attack on a key, and you have M keys, the chance is improved to M/N. As I understand this ties more to AES than to GCM and since AES is much harder than GCM, this is not an immediate concern, but too many keys is not a good idea. See |
In #1322 (comment):
@martinthomson I think I missed one important aspect in my comment here. The issue with only allowing key update on CID change would mean that an endpoint (both the client and the server) cannot update the key if the peer advertises a zero-length CID. I'd assume that most clients will be advertising zero-length CIDs. That would mean that for most connections a server would not be able to trigger a key update, unless we allow key updates to happen without migrating the connection. |
Unless keyphase is a degenerate CID. If so, CID's are not zero length, they are at least one bit. |
I think that all of this is too complicated. Here's a simple solution for this issue: NEW_CONNECTION_ID can carry an extra hard-to-guess bit that masks the KEY_PHASE bit so that it can't be used to correlate flows. We also recommend that endpoints that don't use connection IDs can randomly do a key update when they migrate to a new path to break linkability. That creates uncertainty about which half of the population they are correlated with. |
I am not sure if that is a good solution. While it is true that we can scramble the initial value, I do not think that it prevents the leak, because an observer can use the fact of the bit belonging to multiple CID's being flipped at the same time as the correlator. Considering that, I think that we need to encrypt the information in some way. So far, we have seen three proposals:
As discussed, a does not work well when the peer advertises a zero-length CID (which would be the typical behavior for clients). I personally prefer |
Oh, I see, you are also trying to solve the problem for multipath at the same time. I agree that what I suggested doesn't work for that. I think that your b and c are shades of the same design. Both require that the PN protection key is fixed (or maybe dependent on connection ID). I'm OK with c, but the reduced ability to force an update in b is still a concern. Why not just encrypt phase along with the packet number? |
Yes. That's correct. And we have a restricted form of multipath in v1 (i.e. path probing).
I am fine with c. The trade-offs between b and c (i.e. spending one-bit vs. ease of updating the key) seem minor to me. |
Aside from the architectural correctness of this proposal -- the path really doesn't need to see KEY_PHASE, so we shouldn't expose it -- this discussion does seem to take as given that protecting the KEY_PHASE bit will have actual benefits in preventing linkability, which seems to be to be a very dubious assertion. Very simple timing analysis in most cases will be so much more useful in achieving CID linkability than any of the edge cases we're optimizing away now that I'm afraid any additional benefit here is lost in the underflow -- I've filed details on this on ops-drafts as quicwg/ops-drafts#31 Again, this seems like a perfectly reasonable thing to do for the sake of cleanliness and correctness, but I wouldn't want us to do this with any hope that it is actually useful against linkability. |
Fixed in #2006. |
Once we decide to either encrypt PN (#1079) or have per-CID PN (#1317), KEY_PHASE bit will be left as a tool for an observer to correlate CIDs in order to determine which of those CIDs belong to a particular QUIC connection.
Let's consider the case where a client, in an attempt to grease the bit, flips the KEY_PHASE bit for every 65,536 packets it sends. That would mean that it is effectively leaking the 17th bit (from LSB) of an unencrypted PN. The leak would be sufficient for an observer to determine which CIDs belonged to the same connection, for connections consisting of CIDs that carry more than 65,356 packets.
Another bad example is a client that greases the bit for some minor portion of the connections (e.g. 1%). Then, an observer will have a better chance of tracking that 1% of connection.
Considering these two attacks, I think that we should require implementations to flip the KEY_PHASE bit at randomly, and that we should flip either 50% or 100% of the connections that we handle.
The questions are:
The text was updated successfully, but these errors were encountered: