Keystores and key derivation #1361
* dev: MAX_PERSISTENT_COMMITTEE_SIZE -> TARGET_PERSISTENT_COMMITTEE_SIZE Update specs/networking/p2p-interface.md Minor corrections and clarifications to the network specification
* dev: Update specs/simple-serialize.md Add summaries and expansions to simple-serialize.md
I would specifically appreciate feedback on the key derivation mechanism defined by
This means that a public key can be derived for an arbitrary Can I please get a sanity check here?
Just realised that A simple solution is to use:
Also note that because this is a linear combination, knowledge of privkeys at two different |
Can't you just do |
@dankrad Thanks for taking a look.
That works for the hardened case (and is basically what is done in these specs), but in the non-hardened case it doesn't give the property that it is possible for someone else to derive a public key for an arbitrary
Yup, the bilinear pairing ensures that this is the case, unfortunately.
What is the rationale behind changes in key derivation wrt BIP-32?
If this is done in order to utilize KeyGen function from a draft BLS standard then I would suggest another way of doing that.
Replace: The returned child key k_i is parse256(I_L) + k_par (mod n).
With: The returned child key k_i is hkdf_mod_n(I_L) + k_par (mod n).
Similar replacement for public key derivation: The returned child key K_i is point(hkdf_mod_n(I_L)) + K_par.
If we'd like to use the other way of key derivation according to some security or other kind of considerations then could you, please, elaborate on that.
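A minimal sketch of the suggested replacement for the hardened private-key case, assuming `hkdf_mod_n` is HKDF-SHA256 stretched to 48 bytes and reduced mod n (the salt and length are borrowed from the `bytes_to_privkey` snippet elsewhere in this thread, not from BIP-32). The names `hkdf_sha256` and `derive_hardened_child` are hypothetical, and the HMAC-SHA512 step is the usual BIP-32 hardened derivation:

```python
import hashlib
import hmac
from typing import Tuple

# Order of the BLS12-381 prime-order subgroup (the "n" in the quoted formulas).
CURVE_ORDER = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001


def hkdf_sha256(ikm: bytes, salt: bytes, length: int, info: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256, built from the standard library."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf_mod_n(ikm: bytes) -> int:
    """Map bytes to a scalar mod n. Salt and 48-byte length are assumptions."""
    okm = hkdf_sha256(ikm, salt=b"BLS-SIG-KEYGEN-SALT-", length=48)
    return int.from_bytes(okm, "big") % CURVE_ORDER


def derive_hardened_child(k_par: int, chain_code: bytes, index: int) -> Tuple[int, bytes]:
    """BIP-32 hardened private derivation with parse256(I_L) swapped for hkdf_mod_n(I_L)."""
    if not 2**31 <= index < 2**32:
        raise ValueError("hardened indices satisfy 2**31 <= index < 2**32")
    # BIP-32 hardened input: 0x00 || ser256(k_par) || ser32(index)
    data = b"\x00" + k_par.to_bytes(32, "big") + index.to_bytes(4, "big")
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    i_left, i_right = digest[:32], digest[32:]
    k_child = (hkdf_mod_n(i_left) + k_par) % CURVE_ORDER
    return k_child, i_right  # child key and child chain code
```

Because `hkdf_mod_n` always returns a value below n, no candidate key has to be rejected, which is the point of the substitution.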
This document is the same as BIP39 save for the following exceptions:

* HMAC-SHA512 is replaced with HKDF-SHA256
👍 on the format. A JSON-Schema for file format would be a really nice way to make it precise. Otherwise, a clear and exact description of the file structure will go a long way.
What's the value in making changes from the base BIP39 format? Are the security and simplicity gains worth deviating and so requiring development of new libraries instead of just allowing use of existing ones? Also, could be interesting to specify a non-BIP39-derived alternative standard for deriving signing keys from withdrawal keys; even something like |
I had the same thoughts. Diverging from BIP39 seems like something we should have a really good reason for doing since it's quite a mature and broadly adopted standard.
On the deviation from BIP39
As @pipermerriam and @vbuterin mention, BIP39 is a well adopted standard with lots of tooling surrounding it. As a map from a mnemonic to a 32-byte seed, it fulfills its requirements effectively. There are a few niggles I have against it:
On the deviation from BIP32
BIP32 needs to be changed due to the decreased sub-group size of BLS12-381. BIP32 takes the approach of ignoring keys that fall outside of the curve order (2**-127 chance of happening), which is infeasible for BLS12-381 as ~54.7% of random 32-byte values are greater than the sub-group order. Furthermore, the use of chain codes to derive values is an unnecessary complication. BIP32 also uses the string "BITCOIN_SEED" as a part of the derivation of the master node, which irks me.
In general:
Using BIP32, 39, 44, and Eth1 keystores results in using:
The new constructions in this PR reduce that to:
The idea is for this standard to be simple to understand and implement, both in an Eth2 context and for the greater BLS-using blockchain community and I think one of the best ways to do this is to reduce the number of crypto constructions one needs to find implementations of in order to build out the standard.
(Assuming you mean BIP32, @vbuterin:) While this is simple and would work, I think it is preferable to avoid having another way of deriving keys as that would require yet another standard to implement.
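The two rejection probabilities quoted in the BIP32 discussion above can be checked with exact arithmetic. This is only a sanity-check sketch; `rejection_probability` is a hypothetical helper name, and the two constants are the published group orders of BLS12-381 and secp256k1:

```python
from fractions import Fraction

# Order of the BLS12-381 prime-order subgroup.
BLS12_381_ORDER = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001
# Order of the secp256k1 group used by Bitcoin and BIP32.
SECP256K1_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def rejection_probability(order: int, num_bytes: int = 32) -> Fraction:
    """Chance that a uniformly random num_bytes-long integer is >= order."""
    space = 2 ** (8 * num_bytes)
    return Fraction(space - order, space)


if __name__ == "__main__":
    print(float(rejection_probability(BLS12_381_ORDER)))   # roughly 0.547
    print(float(rejection_probability(SECP256K1_ORDER)))   # below 2**-127
```

With more than half of all candidates rejected, a "skip invalid keys" rule would leave most indices in a BLS12-381 key tree empty, whereas for secp256k1 the rule is never expected to trigger in practice.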
specs/keys/eth2.md
Outdated
## Validator withdrawal and signing keys

A validator's withdrawal and signing keys are elements of the key-tree as described below. They are designed to be stored in [keystore files](./eth2.md) with the idea being that clients need only concern themselves with ingesting a signing-key keystore and that this sufficient for a to launch a validator instance.
A validator's withdrawal and signing keys are elements of the key-tree as described below. They are designed to be stored in [keystore files](./eth2.md) with the idea being that clients need only concern themselves with ingesting a signing-key keystore and that this sufficient for a to launch a validator instance.
A validator's withdrawal and signing keys are elements of the key-tree as described below. They are designed to be stored in [keystore files](./eth2.md) with the idea being that clients need only concern themselves with ingesting a signing-key keystore and that this is sufficient to launch a validator instance.
I'm a bit concerned about enshrining signing and withdrawal keys to come from the same keystore file. In the normal secure case, the active keys would be generated from some hot keystore, whereas the withdrawal key would be generated offline and kept securely in a cold keystore. I'm worried that deriving them from the same source promotes the keystore (and source of withdrawal keys) being accessible on an online computer.
The way I was envisioning handling this problem is as follows:
- There is a single mnemonic underlying both the withdrawal keys and signing keys, knowing this mnemonic provides access to all keys.
- There is a single keystore file which also retains access to all the keys as it contains the master key from the tree. (Users don't necessarily need to generate or store this file if they feel safer just holding onto the mnemonics)
- Each validator instance gets its own keystore file which contains the signing key for just that validator instance.
The above provides separation into a cold keystore and hot keystores, but comes at the cost of needing one file per validator. This could be seen as a benefit, as it is easier to ensure separation between instances when they use different signing files than when they are issued just an index in the key tree.
- Pros:
  - Knowledge of withdrawal key yields signing key
- Cons:
  - 1 keystore per validator instance for the signing keys.
Alternatives
The alternatives, as far as I can tell, are:
- Using the same base mnemonic but appending some (standardised) bytes to each to separate signing and withdrawal key-trees. This would mean that a single mnemonic produces both the signing and withdrawal master nodes in their respective key-trees which could be stored in their own keystores.
  - Pros:
    - Single mnemonic
    - Hard separation between signing & withdrawal keys
  - Cons:
    - Signing keys cannot be derived from withdrawal keys. The mnemonic becomes the only relation between the two key types.
    - It will be harder to get other projects involved in the BLS standardisation effort to use this derivation standard with such application specific machinery built in.
- Having signing and withdrawal keytrees be sub-trees in the same greater keytree. This is a family of solutions, the best variant of which (that I can see) has the root of the signing sub-tree be a sibling of the withdrawal keys (with its index offset by a constant). This allows for a single tree-node to provide all the signing and withdrawal keys while offering separable signing keys which don't leak any information about the withdrawal keys.
  - Pros:
    - Single withdrawal keystore yields all signing and withdrawal keys
    - Separable hot keystore for signing keys.
    - Uses key-tree as intended - for the separation of keys for different purposes.
  - Cons:
    - Knowledge of a withdrawal key does not provide knowledge of the corresponding signing key, knowledge of the parent node is required instead.
## Deriving the seed from the mnemonic

The seed is derived from the mnemonic and is used as the building point for obtaining more keys. The seed is designed to be the source of entropy for the master node in the [tree KDF](./tree_kdf.md). The seed is derived by passing the mnemonic and the password `"mnemonic" + password` (where `password=''` if the user does not supply one) into the scrypt key derivation function (defined in [RFC 7914](https://tools.ietf.org/html/rfc7914)). The mnemonic and password are both encoded in Unicode NFKD format.
Had to look up the `"mnemonic" + password` thing in BIP39 and it's there! So weird... Do you know the value of concatenating with "mnemonic" here? Is it because `scrypt` can't handle an empty string as `salt`?
I am not entirely sure, the reason I included it was its use in BIP39. The best explanation I can think of is that if no input is supplied to the KDF, then some libraries autogenerate a salt and that there may be some confusion between `salt=None` and `salt=""`.
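The seed derivation discussed above could look roughly like the following sketch. It assumes the mnemonic is passed as scrypt's password and `"mnemonic" + password` as its salt (mirroring BIP39's use of PBKDF2); the cost parameters `n`, `r`, `p` and the 32-byte output length are placeholders, since the quoted text does not state them, and `mnemonic_to_seed` is a hypothetical name:

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, password: str = "",
                     n: int = 2**14, r: int = 8, p: int = 1,
                     dklen: int = 32) -> bytes:
    """Derive a seed from a mnemonic with scrypt (parameters are assumptions)."""
    # Both inputs are NFKD-normalised before being encoded, as in BIP39.
    secret = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + password).encode("utf-8")
    return hashlib.scrypt(secret, salt=salt, n=n, r=r, p=p, dklen=dklen)
```

One visible effect of the `"mnemonic"` prefix is that the salt is never empty, even when the user supplies no password.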
## Hardened keys

This specification provides the functionality of both *non-hardened* and *hardened* keys. A hardened key has the property that given the parent public key and the siblings of the desired child, it is not possible to derive any information about the child key. Hardened keys should be considered the default key type and should be used unless there is an explicit reason not to do so. Hardened keys are defined as all keys with index `i >= 2**31`. For ease of notation, hardened keys are indicated with a `'` where `i'` means the key at index `i + 2**31`, thus `m / 0 / 42'` should be parsed as `m / 0 / 2147483690`.
> given the parent public key and the siblings of the desired child

Should this read "given the parent public key and the pubkeys of the siblings of the desired child"?
I don't understand the hardened key logic. They are still related by a relatively small factor. I mean 2**32 keys are easy to try. That's only 4 billion EC operations to find the parent.
> Should this read "given the parent public key and the pubkeys of the siblings of the desired child"

It's actually a stronger assumption: knowledge of the parent pubkey and the sibling privkeys and pubkeys is assumed. (I clarified this in a recent commit.)

> I don't understand the hardened key logic. They are still related by a relatively small factor. I mean 2**32 keys are easy to try. That's only 4 billion EC operations to find the parent.

But even knowledge of all 2**31 hardened key-pairs will still not leak any information about the parent. Hardened keys provide the property that knowing ((sk_i, pk_i) ∀ 2**31 ≤ i < 2**32, i ≠ j) and pk_parent yields no information about (sk_j, pk_j) nor sk_parent. (Assuming SHA256 and ECDLP)
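As a small illustration of the index notation in the quoted spec text, the following sketch converts a path string into numeric indices, adding `2**31` to every component marked with `'`. The function name `parse_path` and its error handling are hypothetical, not part of the proposal:

```python
from typing import List

HARDENED_OFFSET = 2**31  # i' denotes index i + 2**31


def parse_path(path: str) -> List[int]:
    """Turn a path such as "m / 0 / 42'" into a list of child indices."""
    parts = [part.strip() for part in path.split("/")]
    if not parts or parts[0] != "m":
        raise ValueError("path must start at the master node 'm'")
    indices = []
    for part in parts[1:]:
        hardened = part.endswith("'")
        i = int(part[:-1] if hardened else part)
        if not 0 <= i < HARDENED_OFFSET:
            raise ValueError("each component must satisfy 0 <= i < 2**31")
        indices.append(i + HARDENED_OFFSET if hardened else i)
    return indices


if __name__ == "__main__":
    print(parse_path("m / 0 / 42'"))  # [0, 2147483690]
```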
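```python
def bytes_to_privkey(ikm: bytes) -> int:
    # hkdf, sha256 and curve_order are assumed to be provided by the surrounding spec.
    # 48 output bytes are requested so that the reduction below has negligible bias.
    okm = hkdf(master=ikm, salt=b"BLS-SIG-KEYGEN-SALT-", key_len=48, hashmod=sha256)
    return int.from_bytes(okm, byteorder="big") % curve_order
```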
huh, does modulo the curve_order successfully remove the modulo bias?
I was under the impression it did not, but spec seems to say otherwise..
Yes, this derivation function does suffer from modulo bias and initially I took issue with this too, but the bias is very very small here. The trick lies in the use of 48 bytes, which means the output space of the KDF is `>2**129` times larger than the space of integers less than the curve order and thus the bias is vanishingly small.
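The size of that bias can be bounded with exact arithmetic. The sketch below assumes the KDF output is uniform over all `num_bytes`-long strings; `modulo_bias_bound` is a hypothetical helper that returns how much more likely the most favoured residue is than under a perfectly uniform distribution, as a relative excess:

```python
from fractions import Fraction

# Order of the BLS12-381 prime-order subgroup.
CURVE_ORDER = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001


def modulo_bias_bound(order: int, num_bytes: int) -> Fraction:
    """Relative excess probability of the most likely residue of X % order."""
    space = 2 ** (8 * num_bytes)
    quotient = space // order
    # The most likely residues have quotient + 1 preimages out of `space`,
    # whereas a uniform distribution would give each residue 1 / order.
    return Fraction((quotient + 1) * order, space) - 1


if __name__ == "__main__":
    print(float(modulo_bias_bound(CURVE_ORDER, 32)))  # large: a plain 32-byte reduction is heavily biased
    print(float(modulo_bias_bound(CURVE_ORDER, 48)))  # tiny: below 2**-129
```

Under that assumption, reducing a 32-byte value would favour some keys by more than 30%, while reducing a 48-byte value keeps the excess below `2**-129`.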
specs/keys/keystore.md
Outdated
## Definition

Private key is obtained by taking the bitwise XOR of the `ciphertext` and the `derived_key`. The `derived_key` is obtained by running scrypt with the user-provided password and the `scryptparams` obtained from within the keystore file as parameters. If a keystore file is being generated for the first time, the `salt` KDF parameter must be obtained from a CSPRNG. The `ciphertext` is simply read from the keystore file. The length of the `ciphertext` and the output key length of scrypt.
> The length of the `ciphertext` and the output key length of scrypt.

What do you mean here? Seems like this sentence is missing something.
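For concreteness, the XOR construction in the quoted definition might be sketched as follows. The dictionary layout, the field names inside `scryptparams`, and the default cost parameters are illustrative assumptions rather than the proposed file format; the sketch also assumes the scrypt output length is set equal to the key length, which XOR needs and which the incomplete sentence may have been meant to state:

```python
import hashlib
import secrets
from typing import Any, Dict


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("XOR operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def derive_key(password: bytes, params: Dict[str, Any]) -> bytes:
    """Run scrypt with the parameters stored alongside the ciphertext."""
    return hashlib.scrypt(password, salt=params["salt"], n=params["n"],
                          r=params["r"], p=params["p"], dklen=params["dklen"])


def encrypt_privkey(privkey: bytes, password: bytes,
                    n: int = 2**14, r: int = 8, p: int = 1) -> Dict[str, Any]:
    """Create a keystore-like dict; the salt comes from a CSPRNG."""
    params = {"salt": secrets.token_bytes(32), "n": n, "r": r, "p": p,
              "dklen": len(privkey)}
    ciphertext = xor_bytes(privkey, derive_key(password, params))
    return {"scryptparams": params, "ciphertext": ciphertext}


def decrypt_privkey(keystore: Dict[str, Any], password: bytes) -> bytes:
    """Recover the private key as ciphertext XOR derived_key."""
    derived_key = derive_key(password, keystore["scryptparams"])
    return xor_bytes(keystore["ciphertext"], derived_key)
```

Note that in this sketch a wrong password is not detected: decryption silently returns a different byte string, because nothing in the structure authenticates the result.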
specs/keys/keystore.md
Outdated
**Why use scrypt over PBKPRF2?**

scrypt and PBKPRF2 both rely on the security of their underlying hash-function for their safety (SHA256), however scrypt additionally provides memory hardness. The benefit of this is greater ASIC resistance meaning brute-force attacks against scrypt are generally slower and harder.
Do you mean PBKDF2 (https://en.wikipedia.org/wiki/PBKDF2)?
Also, you say that scrypt has greater ASIC resistance, but ASICs that can brute-force scrypt have been on the market for years. For example, this is the latest one: https://cryptomining-blog.com/10810-new-innosilicon-a6-ltc-master-2-2-ghs-scrypt-asic-miner/.
> Do you mean PBKDF2 (https://en.wikipedia.org/wiki/PBKDF2)?

Yup, definitely didn't mean "PBKPRF2". 😆

> Also, you say that scrypt has greater ASIC resistance, but ASICs that can brute-force scrypt have been on the market for years. For example, this is the latest one: https://cryptomining-blog.com/10810-new-innosilicon-a6-ltc-master-2-2-ghs-scrypt-asic-miner/.

Indeed, scrypt ASICs have been around for years and they have dramatically increased scrypt hashrates. That said, scrypt should retain a lower throughput than PBKDF2 in spite of the presence of ASICs & FPGAs because of its exponential memory blow up. I am not arguing that scrypt is ASIC-proof, only that it offers better resistance than PBKDF2.
Eth1 keystore is much more flexible than what is proposed here. Currently Eth1 keystore supports 2 KDF algorithms (scrypt and pbkdf2) but can be easily extended to use more modern KDF algorithms (for example Argon2). Eth1 keystore also supports various PRF schemes (most often HMAC-SHA256 is used), and it can be easily extended with more digest algorithms and block sizes. In this specification, by contrast, we are locked into using only scrypt and SHA256. scrypt is already an ASIC-friendly KDF, and the current brute-force speed is 2.2 GH/s. I don't think it's a good idea to put all the eggs (KDF, PRF) into one basket of (scrypt, SHA256); maybe it is better to take key features from Eth1 keystore and just add more schemes (Argon2 as KDF, and some digests for PRF)?
* dev: (94 commits) Added misc beacon chain updates to ToC explicitly cast to bool fix tests remove bad length checks from process attestation; ensure committee count and committee size not equal fix linter error Shard slot -> slot for PHASE_1_FORK_SLOT part2 Shard slot -> slot for PHASE_1_FORK_SLOT Update specs/core/1_beacon-chain-misc.md Update specs/core/1_beacon-chain-misc.md Update sync_protocol.md lint Update specs/core/1_beacon-chain-misc.md Update specs/core/1_beacon-chain-misc.md Use `get_previous_power_of_two` from merkle proofs spec `MINOR_REWARD_QUOTIENT` for rewarding the proposer for including shard receipt proof Update specs/core/0_beacon-chain.md Update specs/core/0_beacon-chain.md Persistent -> period, process_shard_receipt: add _proofs Fix ToC remove Optional None from get_generalized_index. instead throw ...
The level of security is still the 126 bits provided by the BLS 12-381 scheme. While there are only 2**31 hardened child-keys at a level, they are indistinguishable from a random integer mod the curve order. There is no way to enumerate the hardened keys without knowing the parent sk.
I don't see how this is different from what is being proposed here (aside from the use of + instead of *). If you define your prf(T, n) to be (n + 1) * T then your solution is basically what is done by In the webserver scenario, the keystore generates s and t (what I call
The problem is that given three points, you can figure out sG and tG using a number of guesses that is much smaller than the security parameter. If And if you have more points, I think you can probably do much more efficient things, but I would need to think about those. I think a PRF seems like the safer option. |
What about security reasons? What if the proposed algorithm is declared vulnerable or unreliable? Billions of keystores would be vulnerable. If you have a "choice" you can simply switch from one scheme to another (from
My stance after having spoken with @CarlBeek about this a bit.
|
* dev: (25 commits) Update README.md Update README.md Update sync_protocol.md Update sync_protocol.md Update sync_protocol.md Deposit contract fixes (#1362) fix minor testing bug Update specs/networking/p2p-interface.md add note on local aggregation for interop Fix ssz-generic bitvector tests: those invalid cases of higher length than type describes, but same byte size, should have at least 1 bit set in the overflow to be invalid minor formatting Minor corrections and clarifications to the network specification doc standardization for networking spec (#1338) discuss length-prefixing pro/con, consider for removal, add link cleanups Updates apply more editorial suggestions. apply editorial suggestions. fmt. document doctoc command for posterity. ...
Deprecating this PR in favour of the BLS key standards repo. The reason for this move is that the key store-age and derivation standards are becoming inter-chain standards with (hopefully) a similar level of adoption to the BLS standardisation efforts.
Proposal for keystores and key derivation for Eth2.0 and BLS12-381-supporting projects. It is designed to be maximally simple and relies on as few crypto assumptions as possible.
The specification can be broken down into two parts:
The following tasks need to be completed before merge of this PR: