
Preventing KEY_PHASE bit from being used as a tool to correlate CIDs #1322

Closed
kazuho opened this issue Apr 24, 2018 · 43 comments
Labels
-tls -transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@kazuho (Member) commented Apr 24, 2018

Once we decide either to encrypt the PN (#1079) or to have per-CID PNs (#1317), the KEY_PHASE bit will be left as a tool for an observer to correlate CIDs, i.e. to determine which CIDs belong to a particular QUIC connection.

Let's consider the case where a client, in an attempt to grease the bit, flips the KEY_PHASE bit for every 65,536 packets it sends. That would mean that it is effectively leaking the 17th bit (from the LSB) of an unencrypted PN. The leak would be sufficient for an observer to determine which CIDs belonged to the same connection, for connections whose CIDs carry more than 65,536 packets.
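This first example can be made concrete with a toy model (illustrative only; the function below is not any implementation's API): the observer simply reads bit 16 of the otherwise hidden packet number off the wire.

```python
# Toy model of the leak: a client that flips KEY_PHASE every 65,536 packets
# is effectively publishing bit 16 of its packet number in the clear.
def observed_key_phase(pn: int) -> int:
    """KEY_PHASE as sent by a client that flips the bit every 2**16 packets."""
    return (pn >> 16) & 1

# Two CIDs whose observed bits flip in lockstep are trivially linkable:
assert observed_key_phase(65535) == 0
assert observed_key_phase(65536) == 1
assert observed_key_phase(131072) == 0
```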

Another bad example is a client that greases the bit for only a small fraction of connections (e.g. 1%). An observer then has a better chance of tracking that 1% of connections.

Considering these two attacks, I think that we should require implementations to flip the KEY_PHASE bit at random, and that the behavior should cover either 50% or 100% of the connections that we handle.

The questions are:

  • do we need to document such requirements?
  • are there any other ways for an observer to use the bit?
@kazuho (Member, Author) commented Apr 24, 2018

The other way to address the issue is to remove the KEY_PHASE bit.

For example, we can remove the bit and instead state that the key phase can be changed only when a new CID is used. When an endpoint receives a new CID, it trial decrypts the packet using the current generation key and the next generation key to see if the key phase has shifted.
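A minimal sketch of this trial-decryption idea, using a toy HMAC-based stand-in for the real AEAD (names such as `toy_seal` and `decrypt_on_new_cid` are hypothetical, not from any QUIC API):

```python
import hashlib
import hmac

TAG_LEN = 16

def toy_seal(key: bytes, payload: bytes) -> bytes:
    # Toy AEAD: append an HMAC tag; stands in for real packet protection.
    return payload + hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]

def toy_open(key: bytes, packet: bytes):
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expect = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]
    return payload if hmac.compare_digest(tag, expect) else None

def decrypt_on_new_cid(packet: bytes, current_key: bytes, next_key: bytes):
    """On the first packet of a new CID, learn the key phase by trial."""
    for phase, key in enumerate((current_key, next_key)):
        payload = toy_open(key, packet)
        if payload is not None:
            return phase, payload
    return None  # neither generation authenticates: drop the packet

# The peer rolled its key before switching CIDs:
cur, nxt = b"k0" * 16, b"k1" * 16
phase, payload = decrypt_on_new_cid(toy_seal(nxt, b"hello"), cur, nxt)
assert (phase, payload) == (1, b"hello")
```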

@martinthomson added the design, -transport, and -tls labels Apr 26, 2018
@martinthomson (Member)

Once we decide to either encrypt PN (#1079) or have per-CID PN (#1317),

Why not both?!?

@kazuho (Member, Author) commented Apr 26, 2018

I agree that we should do both.

The intent of the text was to state that preventing the KEY_PHASE bit from leaking information that could be used to correlate CIDs makes sense only if we also prevent the leak from the PN in some way.

@huitema (Contributor) commented Apr 30, 2018

I like the idea of using migration to a new CID as an alternative to key rollover. Come to think of it, we could transmit the key phase bit as part of the NEW_CONNECTION_ID frame. We could even make it a key rollover sequence. But that requires allowing different paths to use different key phases. If all paths migrate to new CIDs at the same time, we still have leakage.

@martinthomson (Member)

That sounds like budget encryption, which I think we should avoid like the plague. See #1317 (comment) for an alternative design.

@kazuho (Member, Author) commented May 1, 2018

Come to think of it, we could transmit the key phase bit as part of the NEW_CONNECTION_ID frame. Could even make it a key rollover sequence.

The positive side of the current design is that key updates are unilateral. Using NEW_CONNECTION_ID for signalling means that an endpoint will instruct the peer to update the key. I am not sure if doing so is a good thing.

If all paths migrate to new CID at the same time, we still have leakage.

I agree that that is an issue. One trivial way of avoiding such an issue is to implement the following strategy:

  • endpoints should switch to a new path after sending 2^32 packets on that path
  • endpoints should start using an updated key for every 2^61 packets sent over one QUIC connection

We might want to suggest something like that.

@martinthomson (Member)

Key updates require reciprocation. Using a new connection ID would be the same. So no change there.

One advantage of coupling NEW_CONNECTION_ID to key updates is that it means that there is a single mechanism that supports migration (and multipath) and key update. You can't support one without also enabling the other.

One thing I've been concerned about with key update is that no one will implement it. NSS only very recently grew that capability in TLS 1.3; it's a small feature, but one that took ages to implement because there wasn't much incentive to build it. We still don't have the DTLS variant because it messes with our timers in ways that are hard to reason about.

I agree that simultaneous migration would be a mistake. Better to drive rollovers based on packet counts as @kazuho suggests. Though I think that we can probably ignore the 2^62 limit and throw connections away if they ever get to that point.

@mikkelfj (Contributor) commented May 1, 2018

I support the idea of changing keys only when changing CIDs, but make it optional until the packet count reaches at most 2^32. The CID change can be probabilistic after, say, 2^31 packets to reduce linkage.

@kazuho (Member, Author) commented May 1, 2018

@martinthomson

One thing I've been concerned about with key update is that no one will implement it. NSS only very recently grew that capability in TLS 1.3; it's a small feature, but one that took ages to implement because there wasn't much incentive to build it. We still don't have the DTLS variant because it messes with our timers in ways that are hard to reason about.

I agree that simultaneous migration would be a mistake. Better to drive rollovers based on packet counts as @kazuho suggests. Though I think that we can probably ignore the 2^62 limit and throw connections away if they ever get to that point.

FWIW, picotls doesn't yet have key updates (and I wonder if I will ever implement it). Considering that, I am perfectly fine with dropping key update entirely from QUIC v1, leaving it as an issue to be solved by an extension or in v2.

@mikkelfj (Contributor) commented May 1, 2018

@kazuho I don't have the numbers fresh in memory, but if you don't do key updates, you either limit yourself to a few GB of transfer or you create a serious security problem, because GCM is weak when you send too much data under one key.

Or am I missing something about key updates?

If it isn't already stated, an endpoint should close a connection with a protocol violation if the peer does not refresh keys in due time.

@mikkelfj (Contributor) commented May 1, 2018

@kazuho (Member, Author) commented May 2, 2018

@mikkelfj Thank you for the link. I have actually been looking for it. And rereading it, I agree that we need to have some form of key update in QUIC (or change PN to a 32-bit value, considering that AES-GCM is the default cipher).

@kazuho (Member, Author) commented May 3, 2018

Now that #1079 has been merged, and assuming that we would not have per-CID PNs, let me propose another simple solution: never update the PN protection key throughout the connection, and define the generation of the payload protection key as PN mod 2^32.

Since the sample used for PNE is 128 bits long, I would assume (or hope) that we can encrypt 2^62 packet numbers using a single key. In other words, there is no need to update the PN protection key. Then, we can use the value of the decrypted PN to determine the generation of the payload protection key.
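Reading the proposal as "the payload-key generation advances once every 2^32 packet numbers" (an assumption about its intent, not spec text), a minimal sketch:

```python
# With the PN-protection key fixed for the connection's lifetime, the
# decrypted PN alone selects which payload-protection key generation to use.
KEY_UPDATE_INTERVAL = 2 ** 32

def payload_key_generation(pn: int) -> int:
    # The generation advances once per 2**32 packet numbers.
    return pn // KEY_UPDATE_INTERVAL

assert payload_key_generation(2 ** 32 - 1) == 0
assert payload_key_generation(2 ** 32) == 1
```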

@huitema (Contributor) commented May 3, 2018

I like that proposal. It makes key rotation straightforward, and it does solve the phase bit leakage issue.

@mikkelfj (Contributor) commented May 3, 2018

That could work. There is an on-path attack where large packet numbers are skipped to make the peer expend resources on frequent rekeying, so perhaps some text to guard against this - notably in relation to optimistic-ACK packet skipping. I was considering switching between CIDs instead of skipping packets - but that is no good if the ACK doesn't care about the path and PNs share one linear space.

@marten-seemann (Contributor)

That’s an interesting proposal.
I see a problem that the key rotation is exercised very infrequently. 2^32 packets allow the transfer of roughly 5 TB of data, so only a tiny fraction of QUIC connections will use it at all. And code that doesn’t get used is prone to bugs.
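A quick sanity check of that estimate, assuming an average packet size of roughly 1200 bytes (the exact figure is an assumption; real packets vary):

```python
# 2**32 packets at ~1200 bytes each is on the order of 5 TB, matching the
# estimate above -- so only very large transfers ever reach the rollover.
packets = 2 ** 32
total_bytes = packets * 1200
assert round(total_bytes / 1e12, 1) == 5.2  # ~5 TB
```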

@janaiyengar (Contributor)

@kazuho: I still believe that per-CID PN is the best way to get to multipath. Even if we don't do it yet, I expect that we will get there eventually. I wouldn't want to make the assumption that we use a single PN space yet.

I like the idea of tying key updates to new CIDs, but there's a gotcha. Both endpoints might decide to do zero-length CIDs, and you then lose any ability to do a key update.

Why not do both? That is, KEY_PHASE bit used as normal, but CID resets the key phase. In other words, CID represents the high-order bits of the key phase, and the KEY_PHASE bit represents the lowest order bit. Does this make sense?

@kazuho (Member, Author) commented May 4, 2018

@mikkelfj

There is an on-path attack where large packet numbers are skipped to expend resources on frequent rekeying, so perhaps some text to guard against this - notably in relation to optimistic-ACK packet skipping.

I am not sure that is a realistic attack, considering that the attacker would be required to spend the same amount of CPU, and that an attacker cannot trigger a key update more than once every 2 RTT (note: a sender cannot send a PN that is greater than the largest acknowledged value + 2^31).

@kazuho (Member, Author) commented May 4, 2018

@marten-seemann

I see a problem that the key rotation is exercised very infrequently. 2^32 packets allow the transfer of roughly 5 TB of data, so only a tiny fraction of QUIC connections will use it at all. And code that doesn’t get used is prone to bugs.

I share the concern, and I think that we can grease the feature. One way would be to occasionally start with a PN near 2^32. The other way would be, for some portion of connections, to inject huge gaps during the early stages of the connection so that the PN soon rises above 2^32.

@marten-seemann (Contributor)

Is there any limit on how many key updates can safely be done? If not, another option would be to lower the threshold to e.g. 2^16 (this would correspond to roughly 80MB of data transferred). This would ensure that a significant fraction of QUIC connections would do a key update.

@kazuho (Member, Author) commented May 4, 2018

@janaiyengar

I like the idea of tying key updates to new CIDs, but there's a gotcha. Both endpoints might decide to do zero-length CIDs, and you then lose any ability to do a key update.

Agreed.

Why not do both?

I think that I at least agree with you that my proposal of eliminating the KEY_PHASE bit by using PN mod 2^32 as the generation number is orthogonal to the discussion of #1317.

That is, KEY_PHASE bit used as normal, but CID resets the key phase. In other words, CID represents the high-order bits of the key phase, and the KEY_PHASE bit represents the lowest order bit. Does this make sense?

That should be possible, but it might be better to derive the first generation of the encryption key using an HKDF that takes the CID as input, then update the key for every 2^32 PNs. This is because we cannot skip key generations in the current derivation formula, which is: client_pp_secret<N+1> = QHKDF-Expand(client_pp_secret<N>, "client 1rtt", Hash.length).
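The no-skipping property can be sketched as follows. The `qhkdf_expand_sketch` helper is a simplification (the real QHKDF-Expand label and length encoding is omitted); only the chain structure matters here:

```python
import hashlib
import hmac

def qhkdf_expand_sketch(secret: bytes, label: bytes, length: int) -> bytes:
    # Single-block HKDF-Expand stand-in (assumes length <= hash output size).
    assert length <= hashlib.sha256().digest_size
    return hmac.new(secret, label + b"\x01", hashlib.sha256).digest()[:length]

def secret_for_generation(secret_0: bytes, n: int) -> bytes:
    # Generation N is reachable only by walking through every generation
    # before it: the chain cannot be skipped.
    s = secret_0
    for _ in range(n):
        s = qhkdf_expand_sketch(s, b"client 1rtt", 32)
    return s

s0 = b"\x00" * 32
assert secret_for_generation(s0, 3) == qhkdf_expand_sketch(
    secret_for_generation(s0, 2), b"client 1rtt", 32)
```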

@kazuho (Member, Author) commented May 4, 2018

@marten-seemann

Is there any limit on how many key updates can safely be done? If not, another option would be to lower the threshold to e.g. 2^16 (this would correspond to roughly 80MB of data transferred). This would ensure that a significant fraction of QUIC connections would do a key update.

I would prefer setting the threshold to something greater than the largest delta that can be represented by the PN field of a QUIC packet. That property ensures that having two generations of encryption keys is enough, regardless of how the client injects gaps. That is why I proposed 2^32.

@marten-seemann (Contributor)

I don’t know if many implementations would actually implement a key update that only takes place after 5 TB of transferred data. In most cases, it would probably be easier to shut down the connection and reconnect.
I also don’t see how we could possibly integration-test and interop-test this without manually modifying the constant every time we run a test (which kind of defeats the purpose of such a test anyway).

@kazuho (Member, Author) commented May 4, 2018

@marten-seemann Maybe you missed #1322 (comment)? I think you can enforce the use in some ways.

@martinthomson (Member)

I don't think that PN mod 2^32 works without trial decryption.

Let's say that we have an 8-bit encoding and we are approaching the crossover point. When the crossover happens, the recipient uses the old PN key and gets a random 8-bit value, which might be a valid value less than the crossover point. That packet will fail to be decrypted.

I also agree with @marten-seemann that we should favour designs that are based on explicit signals rather than simple counts. 2^32 is large enough that some implementations might never encounter the rollover point. I have experience with fixed rollovers in NSS, and testing them is ugly. On the other hand, sending KeyUpdate is easy. I don't think that a randomized packet number is the right fix for that.

Finally, I suspect @janaiyengar is right with respect to per-CID spaces, but the decision to roll keys based on a change in connection ID can be made without going that far. All we need to do is decide to keep the sequencing of connection IDs. We also need to recognize that two sets of keys might be maintained for some time during migration; that would be somewhat longer than the text currently suggests because of probing, but not that much.

@kazuho (Member, Author) commented May 4, 2018

I don't think that PN mod 2^32 works without trial decryption.
Let's say that we have an 8-bit encoding and we are approaching the crossover point. When the crossover happens, the recipient uses the old PN key and gets a random 8-bit value, which might be a valid value less than the crossover point. That packet will fail to be decrypted.

@martinthomson Maybe you missed the point that I proposed keeping the PN key stable across the entire connection? #1322 (comment)

@martinthomson (Member)

No, I saw that - and it's a necessary precondition for this. Nonetheless, the problem remains.

@kazuho (Member, Author) commented May 4, 2018

Then, would you mind clarifying what you mean by “old PN key”?

@martinthomson (Member)

Oh, I'm being daft. I still think that the other reasons are enough.

@kazuho (Member, Author) commented May 4, 2018

PS. assuming that we can use one PN key across the entire connection, the other solution would be to steal one bit from the varint encoding of the PN (in clear) and use that as the KEY_PHASE bit. The bit will then become unobservable.

@martinthomson (Member)

Yes, I have considered that option. I still prefer coupling connection ID change to key migration. It has the benefit of consolidating mechanisms for migration and key update, which means that the mechanism is more likely to see use.

@kazuho (Member, Author) commented May 4, 2018

I still prefer coupling connection ID change to key migration. It has the benefit of consolidating mechanisms for migration and key update, which means that the mechanism is more likely to see use.

I can see the benefit. OTOH, there is a downside. As @janaiyengar has pointed out, a server with a zero-length CID cannot let the Connection ID change.

@kazuho (Member, Author) commented May 4, 2018

FWIW, in #989 (comment) I have proposed encrypting certain bits (including the KEY_PHASE bit) of the first octet in-place. The pros / cons of the approach are discussed in the comment there.

@mikkelfj (Contributor) commented May 4, 2018

Skipping packet numbers to induce key rotation seems like (Danish proverb) crossing the creek to fetch water, since PNE just gained a linear number space.

@mikkelfj (Contributor) commented May 4, 2018

@marten-seemann

Is there any limit on how many key updates can safely be done? If not, another option would be to lower the threshold to e.g. 2^16 (this would correspond to roughly 80MB of data transferred). This would ensure that a significant fraction of QUIC connections would do a key update.

Yes, sort of. If there is a 1/N chance of a successful attack on a key, and you have M keys, the chance improves to M/N. As I understand it, this ties more to AES than to GCM, and since attacking AES is much harder than attacking GCM this is not an immediate concern, but too many keys is not a good idea.

See §1.3, "Security Degradation for Multiple Keys", in http://www.isg.rhul.ac.uk/~kp/TLS-AEbounds.pdf

@kazuho (Member, Author) commented May 4, 2018

In #1322 (comment):

I still prefer coupling connection ID change to key migration. It has the benefit of consolidating mechanisms for migration and key update, which means that the mechanism is more likely to see use.

I can see the benefit. OTOH, there is a downside. As @janaiyengar has pointed out, a server with a zero-length CID cannot let the Connection ID change.

@martinthomson I think I missed one important aspect in my comment here.

The issue is that allowing key updates only on CID change would mean that an endpoint (both the client and the server) cannot update the key if the peer advertises a zero-length CID.

I'd assume that most clients will be advertising zero-length CIDs. That would mean that for most connections a server would not be able to trigger a key update, unless we allow key updates to happen without migrating the connection.

@mikkelfj (Contributor) commented May 4, 2018

Unless the key phase is a degenerate CID. If so, CIDs are not zero-length; they are at least one bit.

@martinthomson (Member)

I think that all of this is too complicated. Here's a simple solution for this issue:

NEW_CONNECTION_ID can carry an extra hard-to-guess bit that masks the KEY_PHASE bit so that it can't be used to correlate flows.

We could also recommend that endpoints that don't use connection IDs randomly do a key update when they migrate to a new path, to break linkability. That creates uncertainty about which half of the population they are correlated with.
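A hedged sketch of that masking idea (the mask bit carried in NEW_CONNECTION_ID is a hypothetical extension of the frame, not a v1 field):

```python
import secrets

class CidState:
    """Per-CID state; the mask bit would travel with NEW_CONNECTION_ID."""

    def __init__(self) -> None:
        self.mask_bit = secrets.randbits(1)  # agreed when the CID is issued

    def wire_key_phase(self, key_phase: int) -> int:
        # The on-wire bit is KEY_PHASE XOR mask, so its absolute value no
        # longer identifies the phase across different CIDs.
        return key_phase ^ self.mask_bit

cid = CidState()
# The mask hides the value of the phase, but a phase change still flips
# the wire bit, regardless of the mask:
assert cid.wire_key_phase(0) ^ cid.wire_key_phase(1) == 1
```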

@kazuho (Member, Author) commented May 7, 2018

NEW_CONNECTION_ID can carry an extra hard-to-guess bit that masks the KEY_PHASE bit so that it can't be used to correlate flows.

I am not sure that is a good solution. While it is true that we can scramble the initial value, I do not think that it prevents the leak, because an observer can use the fact that the bit is flipped for multiple CIDs at the same time as a correlator.

Considering that, I think that we need to encrypt the information in some way. So far, we have seen three proposals:

  • a. tie key phase to CID
  • b. tie key phase to PN (which is now encrypted)
  • c. encrypt key phase using PN key

As discussed, a does not work well when the peer advertises a zero-length CID (which would be the typical behavior for clients). I personally prefer b because it does not waste a bit on the wire; however, I can understand people looking for a more flexible way of flipping the bit, which is c.

@martinthomson (Member)

Oh, I see, you are also trying to solve the problem for multipath at the same time. I agree that what I suggested doesn't work for that.

I think that your b and c are shades of the same design. Both require that the PN protection key is fixed (or maybe dependent on connection ID). I'm OK with c, but the reduced ability to force an update in b is still a concern. Why not just encrypt phase along with the packet number?

@kazuho (Member, Author) commented May 7, 2018

Oh, I see, you are also trying to solve the problem for multipath at the same time. I agree that what I suggested doesn't work for that.

Yes. That's correct. And we have a restricted form of multipath in v1 (i.e. path probing).

I think that your b and c are shades of the same design. Both require that the PN protection key is fixed (or maybe dependent on connection ID). I'm OK with c, but the reduced ability to force an update in b is still a concern. Why not just encrypt phase along with the packet number?

I am fine with c. The trade-offs between b and c (i.e. spending one bit vs. ease of updating the key) seem minor to me.

@britram (Contributor) commented May 7, 2018

Aside from the architectural correctness of this proposal -- the path really doesn't need to see KEY_PHASE, so we shouldn't expose it -- this discussion does seem to take as given that protecting the KEY_PHASE bit will have actual benefits in preventing linkability, which seems to me to be a very dubious assertion.

Very simple timing analysis in most cases will be so much more useful in achieving CID linkability than any of the edge cases we're optimizing away now that I'm afraid any additional benefit here is lost in the underflow -- I've filed details on this on ops-drafts as quicwg/ops-drafts#31

Again, this seems like a perfectly reasonable thing to do for the sake of cleanliness and correctness, but I wouldn't want us to do this with any hope that it is actually useful against linkability.

@martinthomson (Member)

Fixed in #2006.

@mnot added the has-consensus label Mar 5, 2019

8 participants