Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include epoch in the AAD or the nonce? #3661

Closed
ekr opened this issue May 15, 2020 · 15 comments
Closed

Include epoch in the AAD or the nonce? #3661

ekr opened this issue May 15, 2020 · 15 comments
Labels
-tls design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@ekr
Copy link
Collaborator

ekr commented May 15, 2020

Over in TLS we have been discussing the data which is included in the AEAD transform.

  1. Some parts of the context for encryption (epoch, length, etc.) are implicit.

  2. The AD only covers the bits on the wire.

This intuitively seems like it ought to be OK because these values do affect how the protection is done, but only implicitly. For instance, length delineates the data, the packet number/record sequence number controls the nonce, etc., but nevertheless only some of these values are in the AD. It turns out that there is a bit of a gap between analysis we have of this so far and the
current state, and specifically we don't have analysis of the epoch, which is sent only partially in both QUIC and DTLS [0]. Note that QUIC is actually worse than DTLS here because in DTLS the epoch is folded into the sequence number used to make the nonce, but in QUIC the nonce is just the packet number. This issue is intended to raise the question of if we should revisit that, for instance by putting the epoch in the AD.

[0] https://mailarchive.ietf.org/arch/msg/tls/Ri33zZmFNb1kZ4HToDHs9cDDTPY/

@ekr
Copy link
Collaborator Author

ekr commented May 16, 2020

@kazuho
Copy link
Member

kazuho commented May 17, 2020

In case of QUIC, I wonder if epoch is included already in AAD, as the packet type in the first byte.

@martinthomson
Copy link
Member

As Kazuho says, the short value is covered by the AAD; it's only the full value that we need to worry about.

This is only an intuition, but my expectation is that the analysis offered in https://www.felixguenther.info/docs/QUIPS2020_RobustChannels.pdf could extend to epochs. That is, the packet contains the key phase, which is analogous to the packet number. Endpoints only accept packets within a very narrow window. For packet numbers, this is determined by the length of the packet number encoding in that analysis; for epochs, this would be the two epochs you keep live, which matches the length of the key phase. I can make an argument that keeping the number of keys less than the number of epoch values is no worse than one key and no encoded epoch values.

That's not formal though; there might be a trick needed to resolve the odd loop where the value you are relying on is authenticated by the key that it identifies (I don't see that question being addressed in the referenced thread).

The question for me is whether you believe that this requires formal analysis. Obviously it would be great if someone could find a way to subject this to more formal analysis, but we need to determine how much of a deal-breaker this could be.

Folks in TLS seem to be pushing toward adding more to the AAD, but that doesn't seem to be well supported by anything other than an excess of caution in the absence of any real analysis. I don't think we should change something without formal analysis supporting additional measures.

@LPardue LPardue added this to Triage in Late Stage Processing May 18, 2020
@ekr
Copy link
Collaborator Author

ekr commented May 19, 2020

I think where TLS is going to land is with no change, because the nonce includes the full epoch. I have the same intuitions as you, but as you say they aren't formal.

With that said, as you observed in our call today, it seems like there is an easy fix: both AES_GCM and ChaCha20/Poly1305 take a 96-bit nonce which we are just left-padding with 0s. If we are willing to restrict to <2^32 key changes, we could encode the epoch in those 0s, right?

@martinthomson
Copy link
Member

Given the limits we have on usage, 2^32 would mean exhausting epochs before we exhaust packet numbers. Both are pretty large values, and maybe we can avoid that limit for some future AEAD. For AES, that means 2^56.5, which equates to roughly 1e20 bytes (100 exabytes) at a 1400 byte MTU, which seems like it would be tolerable.

That said, this is a fairly fundamental change with fairly thin rationale. If people don't want to do this, then I'm not inclined to push very hard for a change.

@LPardue LPardue added the design An issue that affects the design of the protocol; resolution requires consensus. label May 19, 2020
@project-bot project-bot bot moved this from Triage to Design Issues in Late Stage Processing May 19, 2020
@kazuho
Copy link
Member

kazuho commented Jun 2, 2020

As as implementer, I hope that this issue will be closed with no action, assuming that we do not have strong reason for making the change.

Up until now, we have tried hard to have minimum divergence in the record encryption algorithms being used in QUIC and TLS. We use HKDF and AEAD the same way, with the only difference being the labels. The proposed change would be a departure, as it means that the AEAD layer of QUIC would no longer be using a sequence number, but rather a tuple of epoch and a sequence number.

It does break the existing AEAD API of picotls, as it assumes the use of a 64-bit sequence number (this might have been a shortsighted decision, but anyways).

@ekr

With that said, as you observed in our call today, it seems like there is an easy fix: both AES_GCM and ChaCha20/Poly1305 take a 96-bit nonce which we are just left-padding with 0s. If we are willing to restrict to <2^32 key changes, we could encode the epoch in those 0s, right?

Am I correct to understand that this is effectively the same as XORing the first four bytes of IV with the epoch? I think that is the case because the moment an endpoint changes the send epoch is when it moves to a new set of key and IV.

Based on that view, I wonder what the benefit is by changing IV from HKDFepoch to HKDFepoch XOR epoch is. I'm eager to admit that I do not understand the problem, but I would assume that, when applying HKDF multiple times, we do need to retain a counter counting how many times HKDF was applied, and then mix that into the output.

@martinthomson
Copy link
Member

Kazuho, you have the gist of it. There is no material performance impact. The base nonce that you xor the packet number with is still constant. But the theory would be that you don't get confusion about which epoch was used. The counter-argument being that the keys (and the nonce) should be completely different anyway.

As you can see, the concern is that there is a risk inherent.

As for divergences, it's not divergent from DTLS, which is where this is coming from. Because the analytical framework for the record/packet layer in DTLS is more similar to QUIC, that makes this somewhat more compelling.

All that said, I'm inclined toward no action myself, but I want that to have a firmer basis than we currently have. I am prepared to accept that we won't get that though, in which case we will just have to make a call.

@ianswett
Copy link
Contributor

ianswett commented Jun 2, 2020

Without evidence this is a problem, I strongly prefer the status quo which aligns more closely with TLS.

If we get new information, we can always fix this in QUICv2.

@fxguenther
Copy link
Contributor

As already mentioned, there's little analytical support regarding key updates in QUIC (and DTLS), including our analysis. As stated for DTLS, my understanding is that some form of authentication of the epoch number would be helpful for a clean analysis.

My thought is: with QUIC's current approach, you could in principle have that the key+IV across two epochs collide, which then would mean you can replay packets across those two epochs. If the epoch is authenticated through the AAD, that's prevented. With XOR-ing into the nonce it's more messy, as now you could have a colliding IV 'cancel out' the XOR (the same affects DTLS, actually).

That said, I would expect the concrete loss for key+IV collisions to probably be an additional birthday-bound term in the order of #epochs^2 / 2^{|key|+|IV|}. Here, I am thinking of #epochs = number of epochs in a connection, coming from an/our analysis viewpoint on the full channel. In extending that to multiple keys one would usually want to abstract out key generation and assume key/IV across all epochs to be independent, hence the birthday bound.

Martin pointed out to me that (a) #epochs might be reduced to the number of currently active epochs (2-3), and (b) |IV| might be too much due to the sequence number range, which may lead to nonce collisions between ranges.

So, overall, while having the epochs in the AAD might be cleaner from an analysis point of view, I don't have a strong basis for that and think no-change likely comes with a small loss only in comparison.

@larseggert
Copy link
Member

It sounds like the proposal is to close with no action, or alternatively, to park this issue until new information is available? It would be great if we could resolve this soon, since we are hoping WGLC the next set of documents w/o open design issues.

@ekr
Copy link
Collaborator Author

ekr commented Jun 4, 2020

Thanks @fxguenther. Based on this I am happy to close with no action

@larseggert
Copy link
Member

Great, I will mark this proposal-ready, where the proposal is to close with no action. We'll confirm that together with the WGLC on -29.

@larseggert larseggert added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Jun 4, 2020
@project-bot project-bot bot moved this from Design Issues to Consensus Emerging in Late Stage Processing Jun 4, 2020
@DavidSchinazi
Copy link
Contributor

I support close with no action based on the analysis in this thread.

@LPardue
Copy link
Member

LPardue commented Jun 9, 2020

Just reiterating that based on the discussion here, the proposal is to close this issue with no action.

@LPardue LPardue added call-issued An issue that the Chairs have issued a Consensus call for. and removed proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. labels Jun 10, 2020
@project-bot project-bot bot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Jun 10, 2020
@LPardue LPardue added has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. and removed call-issued An issue that the Chairs have issued a Consensus call for. labels Jul 9, 2020
@project-bot project-bot bot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Jul 9, 2020
@LPardue
Copy link
Member

LPardue commented Jul 12, 2020

Closed with no action.

@LPardue LPardue closed this as completed Jul 12, 2020
Late Stage Processing automation moved this from Consensus Declared to Issue Handled Jul 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-tls design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.
Projects
Late Stage Processing
  
Issue Handled
Development

No branches or pull requests

8 participants