mlswg / mls-protocol Public
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify epoch secret derivation? #325
Comments
|
I think a number of people want to revisit the key schedule:
At this point it would be good to collect all feedback first and do one big change instead of many small ones. |
|
|
Not sure if this is everything, but I know I wanted to:
|
|
Konrad and I have a draft PR for a number of changes, mostly relating to version upgrades (with group branching support considered if we want that feature in the future). Additional: derive authentication secret per-epoch separate from exporter key (i.e. secrets used in and for the protocol should be derived separately from secrets used beyond the protocol). |
|
It seems to me that most of these changes to the key schedule are orthogonal, so maybe everyone should just make a PR for their preferred change? Regarding the changes proposed by @Bren2010, I think we should be careful such that we (1) still get the properties we need from the keys and (2) can prove that we actually do. I'm all for optimizing MLS for ease of implementation, but I think it at least warrants a discussion. My take is that 1. does not impact key pseudorandomness but rather the collision resistance properties of that key (and the keys derived from it). From a proof-standpoint (using state-separating proofs ymmv with other techniques) we need that property at least when we next use the corresponding group key in an HKDF-extract function. If we include the transcript (which includes the groupstate) into the key at some point before, it should be doable as well. That being said, it would be much easier if we could make that argument "up the tree" as opposed to just in the end. So I think we have an "ease-of-implementation" vs. "ease-of-proof" trade-off here. My experience with a proof of the TLS key schedule we did recently is that this sort of step complicates the proof quite a bit (although that probably depends on your proof technique). I'd be interested what the other parties think that are doing (or are going to do) proofs. Regarding 2.: Are we not just using HKDF-expand in most cases? If the output of the hash function used to instantiate the hmac used in the HKDF is equal to the desired length of the key, there should only be one hash operation per call to HKDF-expand and it should not be much of a performance penalty if we include context at those points. Please correct me if I'm wrong. |
I don't understand the argument about collision resistance here. We're including explicit data in the call to HKDF-Expand (particularly, node id and generation) that's implicitly authenticated by the structure of HKDF-Expand calls used to build the tree. The tree's root is given by the key schedule, which is integrating all of the context you need. If the hash function is weak, how would including more data in its input mitigate an attack?
The HKDF-Expand is one HMAC call, and HMAC is two hashes. HMAC would be three hashes if the key were a different length. |
My impression was that you wanted to remove the node-id from the HKDF-expand call. Is that not the case? Again, I'm not saying it's impossible to prove uniqueness of the resulting group key if we remove the node-id. I'm just saying it might make it harder to prove (just as leaving them in might make it harder to implement).
It's not about the properties of the hash function. It's about ensuring that the output key is distinct from the keys derived in the other nodes. If we assume that the hash function is ideal with regard to collision resistance, the input uniquely determines the output. Thus, if we include the unique node id, we get a unique key, even if the rest of the input is potentially the same. This allows us to assume that the output key is unique, even if the adversary chooses the rest of the input.
Ah, I was under the impression that HMAC was just one hash operation. In any case, I'm not an expert regarding properties of hash functions, but my understanding from [1] is that simple hashing is not advisable as a KDF, even if the input is already pseudorandom. Are there any further results on this? I'm happy to go to just a simple hash, but we need to be sure that we actually get good keys as a result. [1] Krawczyk, Cryptographic Extraction and Key Derivation: |
Yes, that's what I want to do. I want to remove
If the output key for different nodes is the same, that clearly implies you've found a collision in the hash function, so the hash function is weak. That's because you are hashing different things at every point, in particular,
We can generate the bulk of the tree with a hash, and still use HKDF for the chain at the leaves of the ASTree. But perhaps just one call to HKDF-Extract to generate both the key and nonce at once. |
But if we're not including that, then how is that "implicitly authenticated by the structure of HKDF-Expand calls used to build the tree" as you said in your earlier reply?
I'm talking about arguments in the context of a proof, where the goal is to figure out if we get good keys in the end, under a set of assumptions. One of the assumptions is that we can replace the hash function we use by an ideal one, without an adversary "noticing the difference". It would make the proof easier if we can assume that the output of the key derivation step doesn't collide even if the adversary can control the input key material. Then the inclusion of the node-specific label would give us that guarantee. We might be able to do the proof iteratively across the layers of the tree, but I'm not certain about that. That is why I wouldn't want to throw out that extra information in the key derivation step. I'm happy to explain our proof methodology in more detail to make clear what I'm worried about, but I'm not going to do that in a git issue.
I'm not sure that works, because for HKDF-Extract to give you a good (pseudorandom) key, you need the input to be pseudorandom. And I'm not sure there is solid basis to assume that the output of a hash function is indeed a pseudorandom string. If there was, I'd be happy to agree to just replace all HKDF-Expand calls with hash function calls. |
A hash is definitely not enough for a key derivation. Just a quick look at the research should be enough to make that statement convincing. There is a reason why we use a KDF for key derivation to achieve sufficient pseudorandomness, and switching it out with a hash as a quick fix will be asking for problems down the line. If we use just a hash, then the output should NOT be called a secret or used as a key, as a general rule (exceptions depending on input). As to removing context from the key derivation: I think that there is an underlying assumption being presented in the argument for removal that the remaining inputs are unique and tied sufficiently to individual nodes already. Hopefully this is the case, but there is no guarantee. Consequently the result of removing them is not simply that proofs may become more difficult, but that the ensuing reductions (for computational proofs) may become less tight. In (quite loose) terms, this means that there are more conditions on the contexts in which security is achieved, and increased possibilities for breaking it. |
Take for example, the chain at each leaf of the ASTree. We include a
People have been complaining about the difficulty of proving properties of the key schedule the whole time the wg has existed, trying to make it more expensive and materially less secure, to simplify proofs. In reality, it's one of the few "obviously secure" parts of the protocol and there are still vulnerabilities elsewhere. Proof efforts should be directed to parts of the protocol that are actually difficult for humans to reason about.
I think it's very fair to assume that a secure hash function is a random oracle. HKDF is just the output of a hash. We would want to keep HKDF-Expand at the leaves for pragmatic reasons: infinite stream derivation (because key and nonce size are variable), and safely combining the label ("key", "nonce", "secret") with the IKM. I believe Britta's comments are addressed above. |
I agree with the assessment that there are many potential vulnerabilities elsewhere that need focus. However, the above statement not only lacks core reasoning on the problem, but indicates misunderstanding of analysis methods and results. You are making an ad hominem argument for ignoring what could have serious implications. See my comment above about tightness of security reductions, which you did not address. Tightness is not about ease of proof, it is about what claims can be proven. Your changes will likely have an effect on that.
OK, but a decade of cryptographic research disagrees with that general statement. Making wide claims about what you think should be assumed does not progress the decision; it would be helpful if you founded your argument instead on the what the exact inputs are to the different hash uses (and there is a difference based on node height) and explain why you think there is sufficient randomness in the input to not use a KDF. No one so far has said your suggestion is wrong, only that you have not put forth any supporting arguments, i.e. Konrad's comment:
It is in everyone's interest to progress with decisions as scientifically as possible, with descriptive, shared reasoning behind changes - especially ones that have the potential to affect the randomness of keys - and without taking requests for justification personally. Given the extended discussion on this issue, I suggest that it be moved to an interim meeting where it can be given appropriate attention. Engaging further in a git issue thread on the topic is unlikely to achieve satisfactory results, and it clearly warrants further attention. |
I'm saying we should de-prioritize provable security in the key schedule for three reasons:
This way, we can make the key schedule much faster and easier to implement. |
It doesn't seem like the
Derive-Secretstep between theearly_secretand theepoch_secretis helping anything. Could we remove it?The text was updated successfully, but these errors were encountered: