Keystores and key derivation #1361

Closed
wants to merge 40 commits into from

CarlBeek
Contributor

@CarlBeek CarlBeek commented Aug 14, 2019

Proposal for keystores and key derivation for Eth2.0 and BLS12-381-supporting projects. It is designed to be maximally simple and relies on as few crypto assumptions as possible.

The specification can be broken down into two parts:

  • Keystores: For the storage and exchange of private keys
  • Key Derivation: Which describes how to take a Mnemonic and turn it into a practically limitless number of private keys

The following tasks need to be completed before merge of this PR:

  • Test Vectors
  • README.md explaining how everything is laid out and what each component does.
  • Decide on how to derive Lamport signature pubkey from key_tree

* dev:
  MAX_PERSISTENT_COMMITTEE_SIZE -> TARGET_PERSISTENT_COMMITTEE_SIZE
  Update specs/networking/p2p-interface.md
  Minor corrections and clarifications to the network specification
* dev:
  Update specs/simple-serialize.md
  Add summaries and expansions to simple-serialize.md
@CarlBeek
Contributor Author

I would specifically appreciate feedback on the key derivation mechanism defined by derive_child_privkey() in specs/keys/key_derivation/tree_kdf.md. In the non-hardened case, the goal is to have a pubkey that, when given extra information, allows for the derivation of any of the children without leaking information about the privkey. The current strategy is as follows (for a key at index i):

a = int(hash(seed))
b = int(hash(a))
privkey = (a + b*i) mod q

This means that a public key can be derived for an arbitrary i given a*P and b*P. Furthermore, knowledge of privkey_i, a*P, and b*P does not leak any information about privkey_j (with j != i).

Can I please get a sanity check here?
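The public-derivation property described above can be sketched as follows. This is an illustration only: it replaces the BLS12-381 curve with a toy multiplicative group modulo the Mersenne prime 2**127 - 1, so `G**x mod p` stands in for the point `x*P`. The names `derive_scalars`, `child_privkey` and `child_pubkey_from_public` are hypothetical and are not taken from `tree_kdf.md`.

```python
import hashlib
from typing import Tuple

# Toy group (assumption, for illustration only): integers modulo a Mersenne
# prime under multiplication. G**x plays the role of the curve point x*P.
P_MOD = 2**127 - 1
ORDER = P_MOD - 1  # exponents can be reduced modulo the group order
G = 3


def h(data: bytes) -> int:
    """Hash bytes to an integer with SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def derive_scalars(seed: bytes) -> Tuple[int, int]:
    """a = int(hash(seed)), b = int(hash(a)), both reduced into the exponent range."""
    a = h(seed) % ORDER
    b = h(a.to_bytes(16, "big")) % ORDER
    return a, b


def child_privkey(seed: bytes, i: int) -> int:
    """privkey_i = (a + b*i) mod q, as in the comment above."""
    a, b = derive_scalars(seed)
    return (a + b * i) % ORDER


def child_pubkey_from_public(a_pub: int, b_pub: int, i: int) -> int:
    """Compute "a*P + i*(b*P)" from public values only (written multiplicatively)."""
    return (a_pub * pow(b_pub, i, P_MOD)) % P_MOD


seed = b"example seed"
a, b = derive_scalars(seed)
a_pub, b_pub = pow(G, a, P_MOD), pow(G, b, P_MOD)
for i in range(5):
    # The holder of the seed and the holder of (a*P, b*P) agree on every child pubkey.
    assert pow(G, child_privkey(seed, i), P_MOD) == child_pubkey_from_public(a_pub, b_pub, i)
```

The sketch only demonstrates that the two derivation routes agree; it says nothing about whether the scheme leaks information.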

@CarlBeek
Contributor Author

CarlBeek commented Aug 15, 2019

a = int(hash(seed))
b = int(hash(a))
privkey = (a + b*i) mod q

Just realised that (a + b*i) mod q is not a great strategy because in the case i=0, privkey = a and therefore knowledge of the 0th privkey grants knowledge of all the sibling privkeys.

A simple solution is to use:

privkey = (a + b + b*i) mod q

Also note that because this is a linear combination, knowledge of privkeys at two different values of i is sufficient to calculate all the other siblings.
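The linear-combination remark can be made concrete. The sketch below assumes `q` is the order of the BLS12-381 prime-order subgroup (a well-known constant, not quoted in this thread) and shows how two leaked sibling keys determine `a` and `b`, and therefore every other sibling. The function names are hypothetical.

```python
# Order of the BLS12-381 prime-order subgroup (assumed to be the "q" above).
Q = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001


def sibling(a: int, b: int, i: int) -> int:
    """privkey_i = (a + b + b*i) mod q."""
    return (a + b + b * i) % Q


def recover_all(i: int, sk_i: int, j: int, sk_j: int):
    """Recover (a, b) from two sibling private keys at distinct indices i != j."""
    # sk_i - sk_j = b * (i - j)  (mod q), so b follows from one modular inverse.
    b = (sk_i - sk_j) * pow((i - j) % Q, -1, Q) % Q
    # sk_i = a + b * (i + 1)  (mod q)
    a = (sk_i - b * (i + 1)) % Q
    return a, b


# Two leaked siblings are enough to predict a third one.
a, b = 0x1234567890ABCDEF, 0x0FEDCBA987654321
leaked_a, leaked_b = recover_all(2, sibling(a, b, 2), 9, sibling(a, b, 9))
assert sibling(leaked_a, leaked_b, 5) == sibling(a, b, 5)
```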

specs/keys/key_derivation/path.md Outdated Show resolved Hide resolved
specs/keys/key_derivation/tree_kdf.md Outdated Show resolved Hide resolved
specs/keys/keystore.md Show resolved Hide resolved
Co-Authored-By: Diederik Loerakker <proto@protolambda.com>
@dankrad
Contributor

dankrad commented Aug 15, 2019

a = int(hash(seed))
b = int(hash(a))
privkey = (a + b*i) mod q

Just realised that (a + b*i) mod q is not a great strategy because in the case i=0, privkey = a and therefore knowledge of the 0th privkey grants knowledge of all the sibling privkeys.

A simple solution is to use:

privkey = (a + b + b*i) mod q

Also note that because this is a linear combination, knowledge of privkeys at two different values of i is sufficient to calculate all the other siblings.

Can't you just do hash(seed, i) to generate the i-th subkey? That seems to be the standard thing. I think any linear scheme leads to some form of relation between the keys.
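A minimal sketch of the `hash(seed, i)` idea, assuming SHA-256, a 4-byte big-endian encoding of the index, and reduction modulo the BLS12-381 subgroup order. None of these choices are specified in the comment above, and `hashed_child` is a hypothetical name.

```python
import hashlib

# Order of the BLS12-381 prime-order subgroup (assumed modulus).
Q = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001


def hashed_child(seed: bytes, i: int) -> int:
    """i-th subkey as hash(seed, i); the index encoding is an assumption."""
    digest = hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
    # Reducing only 32 bytes modulo Q is noticeably biased; a wider output
    # (such as the 48 bytes discussed later in this thread) would reduce that.
    return int.from_bytes(digest, "big") % Q


keys = [hashed_child(b"example seed", i) for i in range(3)]
assert len(set(keys)) == 3
```

Because each subkey comes from an independent hash output, there is no algebraic relation between siblings, which is also why a third party cannot derive the matching public keys without the seed.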

@CarlBeek
Contributor Author

@dankrad Thanks for taking a look.

Can't you just do hash(seed, i) to generate the i-th subkey? That seems to be the standard thing.

That works for the hardened case (and is basically what is done in these specs), but in the non-hardened case it doesn't give the property that someone else can derive a public key for an arbitrary i non-interactively. The idea here is for exchanges/shops to allow customers to derive a pubkey locally and have the private key be derivable later for withdrawal by the service provider.

I think any linear scheme leads to some form of relation between the keys.

Yup, the bilinear pairing ensures that this is the case, unfortunately.

Collaborator

@mkalinin mkalinin left a comment

What is the rationale behind changes in key derivation wrt BIP-32?

If this is done in order to utilize KeyGen function from a draft BLS standard then I would suggest another way of doing that.

Replace:
The returned child key ki is parse256(IL) + kpar (mod n).

With:
The returned child key ki is hkdf_mod_n(IL) + kpar (mod n).

Similar replacement for public key derivation:
The returned child key Ki is point(hkdf_mod_n(IL)) + Kpar.

If we'd like to use a different way of key derivation because of security or other considerations, could you please elaborate on that?

specs/keys/eth2.md Outdated Show resolved Hide resolved

This document is the same as BIP39 save for the following exceptions:

* HMAC-SHA512 is replaced with HKDF-SHA256
Collaborator

There is only one occurrence of HMAC-SHA512 in BIP39, which says that it's used as the pseudo-random function for PBKDF2.

If we use scrypt, whose specification is tightly coupled with PBKDF2-HMAC-SHA-256, then where does this replacement take place?

specs/keys/key_derivation/tree_kdf.md Outdated Show resolved Hide resolved
Member

@pipermerriam pipermerriam left a comment

👍 on the format. A JSON-Schema for file format would be a really nice way to make it precise. Otherwise, a clear and exact description of the file structure will go a long way.

@vbuterin
Contributor

What's the value in making changes from the base BIP39 format? Are the security and simplicity gains worth deviating and so requiring development of new libraries instead of just allowing use of existing ones?

Also, could be interesting to specify a non-BIP39-derived alternative standard for deriving signing keys from withdrawal keys; even something like k + hash(k) would work.

@pipermerriam
Member

What's the value in making changes from the base BIP39 format?

I had the same thoughts. Diverging from BIP39 seems like something we should have a really good reason for doing since it's quite a mature and broadly adopted standard.

@CarlBeek
Contributor Author

On the deviation from BIP39

As @pipermerriam and @vbuterin mention, BIP39 is a well-adopted standard with lots of tooling surrounding it. As a map from a mnemonic to a 64-byte seed, it fulfills its requirements effectively. There are a few niggles I have with it:

  • Support for too many mnemonic lengths
  • PBKDF2 is not memory-hardened and can be dramatically sped up with ASICs/GPUs (it is repeated hashing at its core)
  • I am trying to use as few constructions as possible

On the deviation from BIP32

BIP32 needs to be changed due to the decreased sub-group size of BLS12-381. BIP32 takes the approach of ignoring keys that fall outside of the curve order (2**-127 chance of happening), which is infeasible for BLS12-381 as ~54.7% of random 32-byte values are greater than the sub-group order. Furthermore, the use of chain codes to derive values is an unnecessary complication. BIP32 also uses the string "Bitcoin seed" as a part of the derivation of the master node, which irks me.
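The two probabilities quoted above can be checked with exact integer arithmetic. The sketch below uses the published group orders of secp256k1 (the curve BIP32 targets) and of the BLS12-381 subgroup; these constants are standard values, not numbers taken from this PR, and `fraction_out_of_range` is a hypothetical helper.

```python
from fractions import Fraction

# Group orders (standard published constants).
N_SECP256K1 = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
R_BLS12_381 = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001


def fraction_out_of_range(order: int, n_bytes: int = 32) -> Fraction:
    """Fraction of uniformly random n_bytes-long values that are >= order."""
    space = 2 ** (8 * n_bytes)
    return Fraction(space - order, space)


bls_reject = fraction_out_of_range(R_BLS12_381)
secp_reject = fraction_out_of_range(N_SECP256K1)

print(float(bls_reject))  # roughly 0.547, so rejection sampling fails over half the time
# For secp256k1 the rejection probability lies between 2**-128 and 2**-127.
assert Fraction(1, 2**128) < secp_reject < Fraction(1, 2**127)
```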

In general:

Using BIP32, 39, 44, and Eth1 keystores results in using:

  • PBKDF2
  • HMAC-SHA512
  • SHA256
  • scrypt
  • AES-CTR-128
  • Keccak 256

The new constructions in this PR reduce that to:

  • scrypt (This implicitly requires SHA256, HMAC-SHA256 and can be defined using PBKDF2, but can be treated as a black-box for most purposes)
  • SHA256

The idea is for this standard to be simple to understand and implement, both in an Eth2 context and for the greater BLS-using blockchain community and I think one of the best ways to do this is to reduce the number of crypto constructions one needs to find implementations of in order to build out the standard.

Also, could be interesting to specify a non-BIP39-derived alternative standard for deriving signing keys from withdrawal keys; even something like k + hash(k) would work.

(Assuming you mean BIP32, @vbuterin:) While this is simple and would work, I think it is preferable to avoid having another way of deriving keys as that would require yet another standard to implement.


## Validator withdrawal and signing keys

A validator's withdrawal and signing keys are elements of the key-tree as described below. They are designed to be stored in [keystore files](./eth2.md) with the idea being that clients need only concern themselves with ingesting a signing-key keystore and that this sufficient for a to launch a validator instance.
Contributor

Suggested change
- A validator's withdrawal and signing keys are elements of the key-tree as described below. They are designed to be stored in [keystore files](./eth2.md) with the idea being that clients need only concern themselves with ingesting a signing-key keystore and that this sufficient for a to launch a validator instance.
+ A validator's withdrawal and signing keys are elements of the key-tree as described below. They are designed to be stored in [keystore files](./eth2.md) with the idea being that clients need only concern themselves with ingesting a signing-key keystore and that this is sufficient to launch a validator instance.

Contributor

I'm a bit concerned about enshrining signing and withdrawal keys to come from the same keystore file. In the normal secure case, the active keys would be generated from some hot keystore, whereas the withdrawal key would be generated offline, securely in a cold keystore. I'm worried that deriving them from the same source promotes the keystore (and source of withdrawal keys) being accessible on an online computer.

Contributor Author

The way I was envisioning handling this problem is as follows:

  • There is a single mnemonic underlying both the withdrawal keys and signing keys, knowing this mnemonic provides access to all keys.
  • There is a single keystore file which also retains access to all the keys as it contains the master key from the tree. (Users don't necessarily need to generate or store this file if they feel safer just holding onto the mnemonics)
  • Each validator instance gets its own keystore file which contains the signing key for just that validator instance.

The above provides separation into a cold keystore and hot keystores, but comes at the cost of needing one file per validator. This could be seen as a benefit, since it is easier to ensure separation between instances when they use different signing files than when they are issued just an index in the key tree.

  • Pros:
    • Knowledge of withdrawal key yields signing key
  • Cons:
    • 1 keystore per validator instance for the signing keys.

Alternatives

The alternatives, as far as I can tell, are:

  • Using the same base mnemonic but appending some (standardised) bytes to each to separate signing and withdrawal key-trees. This would mean that a single mnemonic produces both the signing and withdrawal master nodes in their respective key-trees which could be stored in their own keystores.

    • Pros:
      • Single mnemonic
      • Hard separation between signing & withdrawal keys
    • Cons:
      • Signing keys cannot be derived from withdrawal keys. The mnemonic becomes the only relation between the two key types.
      • It will be harder to get other projects involved in the BLS standardisation effort to use this derivation standard with such application specific machinery built in.
  • Having signing and withdrawal keytrees be sub-trees in the same greater keytree. This is a family of solutions, the best variant of which (that I can see) has the root of the signing sub-tree be a sibling of the withdrawal keys (with its index offset by a constant). This allows for a single tree-node to provide all the signing and withdrawal keys while offering separable signing keys which don't leak any information about the withdrawal keys.

    • Pros:
      • Single withdrawal keystore yields all signing and withdrawal keys
      • Separable hot keystore for signing keys.
      • Uses key-tree as intended - for the separation of keys for different purposes.
    • Cons:
      • Knowledge of a withdrawal key does not provide knowledge of the corresponding signing key, knowledge of the parent node is required instead.


## Deriving the seed from the mnemonic

The seed is derived from the mnemonic and is used as the building point for obtaining more keys. The seed is designed to be the source of entropy for the master node in the [tree KDF](./tree_kdf.md). The seed is derived by passing the mnemonic and the password `"mnemonic" + password` (where `password=''` if the user does not supply one) into the scrypt key derivation function (defined in [RFC 7914](https://tools.ietf.org/html/rfc7914)). The mnemonic and password are both encoded in Unicode NFKD format.
Contributor

Had to look up the "mnemonic" + password thing in BIP39 and it's there! So weird... Do you know the value of concatenating with "mnemonic" here? Is it because scrypt can't handle an empty string as salt?

Contributor Author

I am not entirely sure, the reason I included it was its use in BIP39. The best explanation I can think of is that if no input is supplied to the KDF, then some libraries autogenerate a salt and that there may be some confusion between salt=None and salt="".
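For reference, the seed derivation under discussion can be sketched with Python's standard library. The scrypt cost parameters (`n`, `r`, `p`), the 32-byte output length and the use of `"mnemonic" + password` as the scrypt salt are assumptions made for this illustration; the spec excerpt above does not state them. `mnemonic_to_seed` is a hypothetical name.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, password: str = "") -> bytes:
    """Derive a seed from a mnemonic with scrypt (illustrative parameters)."""
    # Both inputs are NFKD-normalised before encoding, as the spec text requires.
    mnemonic_bytes = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    # The constant prefix guarantees a non-empty salt even with no password.
    salt = unicodedata.normalize("NFKD", "mnemonic" + password).encode("utf-8")
    # n, r, p and dklen are placeholders, not values taken from the spec.
    return hashlib.scrypt(
        mnemonic_bytes, salt=salt, n=2**14, r=8, p=1, maxmem=2**26, dklen=32
    )


seed = mnemonic_to_seed("example words only not a real mnemonic")
assert len(seed) == 32
```

With the prefix in place, an omitted password and an empty password map to the same well-defined salt, which sidesteps the `salt=None` versus `salt=""` ambiguity mentioned above.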

specs/keys/eth2.md Outdated Show resolved Hide resolved
specs/keys/eth2.md Outdated Show resolved Hide resolved
specs/keys/key_derivation/path.md Outdated Show resolved Hide resolved

## Hardened keys

This specification provides the functionality of both *non-hardened* and *hardened* keys. A hardened key has the property that given the parent public key and the siblings of the desired child, it is not possible to derive any information about the child key. Hardened keys should be considered the default key type and should be used unless there is an explicit reason not to do so. Hardened keys are defined as all keys with index `i >= 2**31`. For ease of notation, hardened keys are indicated with a `'` where `i'` means the key at index `i + 2**31`, thus `m / 0 / 42'` should be parsed as `m / 0 / 2147483690`.
Contributor

given the parent public key and the siblings of the desired child

Should this read "given the parent public key and the pubkeys of the siblings of the desired child"

Contributor

I don't understand the hardened key logic. They are still related by a relatively small factor. I mean 2**32 keys are easy to try. That's only 4 billion EC operations to find the parent.

Contributor Author

Should this read "given the parent public key and the pubkeys of the siblings of the desired child"

It's actually a stronger assumption, knowledge of the parent pubkey and sibling priv and pubkeys is assumed. (I clarified this in a recent commit.)

I don't understand the hardened key logic. They are still related by a relatively small factor. I mean 2**32 keys are easy to try. That's only 4 billion EC operations to find the parent.

But even knowledge of all 2**31 hardened key-pairs will still not leak any information about the parent. Hardened keys provide the property that knowing ((sk_i, pk_i) ∀ 2**31 ≤ i < 2**32, i ≠ j) and pk_parent yields no information about (sk_j, pk_j) nor sk_parent. (Assuming SHA256 and ECDLP)
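The `'` notation for hardened indices from the spec excerpt at the top of this thread can be sketched as a small path parser. `parse_path` is a hypothetical helper, not a function from `path.md`; it only applies the rule that `i'` means index `i + 2**31` (so `42'` is 2147483690).

```python
from typing import List

HARDENED_OFFSET = 2**31


def parse_path(path: str) -> List[int]:
    """Turn a path such as "m / 0 / 42'" into a list of integer indices."""
    parts = [part.strip() for part in path.split("/")]
    if parts[0] != "m":
        raise ValueError("path must start at the master node 'm'")
    indices = []
    for part in parts[1:]:
        hardened = part.endswith("'")
        index = int(part.rstrip("'"))
        if not 0 <= index < HARDENED_OFFSET:
            raise ValueError(f"index out of range: {part}")
        # A trailing ' selects the hardened half of the index space.
        indices.append(index + HARDENED_OFFSET if hardened else index)
    return indices


assert parse_path("m / 0 / 42'") == [0, 2147483690]
```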

```python
def bytes_to_privkey(ikm: bytes) -> int:
    # `hkdf`, `sha256` and `curve_order` are assumed to be defined elsewhere in the spec.
    okm = hkdf(master=ikm, salt="BLS-SIG-KEYGEN-SALT-", key_len=48, hashmod=sha256)
    return int.from_bytes(okm, byteorder="big") % curve_order
```
Contributor

huh, does modulo the curve_order successfully remove the modulo bias?
I was under the impression it did not, but spec seems to say otherwise..

Contributor Author

Yes, this derivation function does suffer from modulo bias and initially I took issue with this too, but the bias is very very small here. The trick lies in the use of 48 bytes which means the output space of the KDF is >2**129 times larger than the space of integers less than the curve order and thus the bias is vanishingly small.
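The size of that bias can be estimated directly. The sketch below is a self-contained variant of `bytes_to_privkey` that implements HKDF-SHA256 (RFC 5869) with the standard library; `hkdf_sha256` is a hypothetical helper, and an empty `info` string is an assumption. It then compares the 48-byte output space with the BLS12-381 subgroup order.

```python
import hashlib
import hmac

# Order of the BLS12-381 prime-order subgroup.
CURVE_ORDER = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001


def hkdf_sha256(ikm: bytes, salt: bytes, length: int, info: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def bytes_to_privkey(ikm: bytes) -> int:
    """Reduce 48 bytes of HKDF output modulo the curve order."""
    okm = hkdf_sha256(ikm, salt=b"BLS-SIG-KEYGEN-SALT-", length=48)
    return int.from_bytes(okm, byteorder="big") % CURVE_ORDER


# The 48-byte output space is more than 2**129 times larger than the range of
# valid keys, so no key is favoured by more than about one part in 2**129.
ratio = 2**384 // CURVE_ORDER
assert ratio > 2**129
```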

specs/keys/keystore.md Outdated Show resolved Hide resolved

## Definition

Private key is obtained by taking the bitwise XOR of the `ciphertext` and the `derived_key`. The `derived_key` is obtained by running scrypt with the user-provided password and the `scryptparams` obtained from within the keystore file as parameters. If a keystore file is being generated for the first time, the `salt` KDF parameter must be obtained from a CSPRNG. The `ciphertext` is simply read from the keystore file. The length of the `ciphertext` and the output key length of scrypt.
Contributor

The length of the ciphertext and the output key length of scrypt.

What do you mean here? Seems like this sentence is missing something
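One reading of the definition above is sketched below: the scrypt output length is set equal to the ciphertext length so that the two can be XORed byte for byte. That equal-length requirement is an assumption about what the truncated sentence intends, the scrypt parameters are placeholders rather than values from the spec, and all function names are hypothetical.

```python
import hashlib
import secrets
from typing import Tuple


def derive_key(password: bytes, salt: bytes, dklen: int) -> bytes:
    # Placeholder scrypt parameters; a real keystore would read them from
    # the `scryptparams` stored in the file.
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, maxmem=2**26, dklen=dklen)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(privkey: bytes, password: bytes) -> Tuple[bytes, bytes]:
    """Return (salt, ciphertext) for a freshly generated keystore."""
    salt = secrets.token_bytes(32)  # the salt comes from a CSPRNG
    return salt, xor_bytes(privkey, derive_key(password, salt, len(privkey)))


def decrypt(ciphertext: bytes, password: bytes, salt: bytes) -> bytes:
    """privkey = ciphertext XOR derived_key."""
    return xor_bytes(ciphertext, derive_key(password, salt, len(ciphertext)))


privkey = secrets.token_bytes(32)
salt, ciphertext = encrypt(privkey, b"correct horse")
assert decrypt(ciphertext, b"correct horse", salt) == privkey
```

Note that a plain XOR gives no way to detect a wrong password: decryption simply yields a different 32-byte value.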

specs/keys/keystore.md Outdated Show resolved Hide resolved

**Why use scrypt over PBKPRF2?**\

scyrpt and PBKPRF2 both rely on the security of their underlying hash-function for their safety (SHA256), however scrypt additionally provides memory hardness. The benefit of this is greater ASIC resistance meaning brute-force attacks against scyrpt are generally slower and harder.

Do you mean PBKDF2 https://en.wikipedia.org/wiki/PBKDF2?

Also, you say that scrypt has greater ASIC resistance, but ASICs that can brute-force scrypt have been on the market for years. For example, this is the latest one: https://cryptomining-blog.com/10810-new-innosilicon-a6-ltc-master-2-2-ghs-scrypt-asic-miner/.

Contributor Author

Do you mean PBKDF2 https://en.wikipedia.org/wiki/PBKDF2?

Yup, definitely didn't mean "PBKPRF2". 😆

Also, you say that scrypt has greater ASIC resistance, but ASICs that can brute-force scrypt have been on the market for years. For example, this is the latest one: https://cryptomining-blog.com/10810-new-innosilicon-a6-ltc-master-2-2-ghs-scrypt-asic-miner/.

Indeed, scrypt ASICs have been around for years and they have dramatically increased scrypt hashrates. That said, scrypt should retain a lower throughput than PBKDF2 in spite of the presence of ASICs & FPGAs because of its exponential memory blow up. I am not arguing that scrypt is ASIC-proof, only that it offers better resistance than PBKDF2.
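The memory argument can be made concrete. Per RFC 7914, scrypt's ROMix step keeps a table of N blocks of 128 * r bytes, whereas PBKDF2 only needs a handful of hash states. The parameter values below are illustrative assumptions, not values proposed in this PR, and `scrypt_memory_bytes` is a hypothetical helper.

```python
def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of the scrypt ROMix table: N blocks of 128 * r bytes."""
    return 128 * n * r


MIB = 1024 * 1024

# Each guess at n=2**14, r=8 needs a 16 MiB table; n=2**18 needs 256 MiB.
assert scrypt_memory_bytes(2**14, 8) == 16 * MIB
assert scrypt_memory_bytes(2**18, 8) == 256 * MIB
```

The table size grows linearly with N, so doubling N doubles the memory an attacker must provision per parallel guess.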

@cheatfate

Eth1 keystore is much more flexible than what is proposed here. Currently Eth1 keystore supports 2 KDF algorithms (scrypt and pbkdf2) but can be easily extended to use more modern KDF algorithms (for example Argon2).

Eth1 keystore also supports various PRF schemes (most often HMAC-SHA256 is used), but it can be easily extended with more digest algorithms and block sizes.

In this specification, by contrast, we are locked into using only scrypt and SHA256. scrypt is already an ASIC-friendly KDF, and the current brute-force speed is 2.2 GH/s.

I don't think it's a good idea to put all the eggs (KDF, PRF) into one basket (scrypt, SHA256). Maybe it is better to take the key features from the Eth1 keystore and just add more schemes (Argon2 as KDF, and some digests for PRF)?

* dev: (94 commits)
  Added misc beacon chain updates to ToC
  explicitly cast to bool
  fix tests
  remove bad length checks from process attestation; ensure committee count and committee size not equal
  fix linter error
  Shard slot -> slot for PHASE_1_FORK_SLOT part2
  Shard slot -> slot for PHASE_1_FORK_SLOT
  Update specs/core/1_beacon-chain-misc.md
  Update specs/core/1_beacon-chain-misc.md
  Update sync_protocol.md
  lint
  Update specs/core/1_beacon-chain-misc.md
  Update specs/core/1_beacon-chain-misc.md
  Use `get_previous_power_of_two` from merkle proofs spec
  `MINOR_REWARD_QUOTIENT` for rewarding the proposer for including shard receipt proof
  Update specs/core/0_beacon-chain.md
  Update specs/core/0_beacon-chain.md
  Persistent -> period, process_shard_receipt: add _proofs
  Fix ToC
  remove Optional None from get_generalized_index. instead throw
  ...
@CarlBeek
Contributor Author

CarlBeek commented Sep 3, 2019

The non-hardened keys are not the problem. What I don't understand is the hardened keys -- if they are related to each other at a security level of just 2**32, then calling them hardened might be misleading.

The level of security is still the 126 bits provided by the BLS12-381 scheme. While there are only 2**31 hardened child-keys at a level, they are indistinguishable from a random integer mod the curve order. There is no way to enumerate the hardened keys without knowing the parent sk.

So I am suggesting: Keystore generates secret s and t. It gives s*G and t to the webserver. Then the webserver can generate an arbitrary number of cryptographically secure keys using prf(t, n) * s * G. The keystore knows the private keys to all of these because it knows s and t.

I don't see how this is different from what is being proposed here (aside from the use of + instead of *). If you define your prf(T, n) to be (n + 1) * T then your solution is basically what is done by derive_child_privkey().

In the webserver scenario, the keystore generates s and t (what I call parent_hash and parent_double_hash) and the server is given s * G and t * G.

@dankrad
Contributor

dankrad commented Sep 3, 2019

In the webserver scenario, the keystore generates s and t (what I call parent_hash and parent_double_hash) and the server is given s * G and t * G.

The problem is that given three points, you can figure out sG and tG using a number of guesses that is much smaller than the security parameter. If A = sG + a * tG, B = sG + b * tG, C = sG + c * tG, then (A - B) / (a - b) == (A - C) / (a - c). Then you can start guessing a-b and a-c and when you've found a match you've got tG. That takes (2**32)**2 = 2**64 guesses. Then another 2**32 guesses you've got sG as well.

And if you have more points, I think you can probably do much more efficient things, but I would need to think about those. I think a PRF seems like the safer option.

@cheatfate

Eth1 keystore is much more flexible than what is proposed here. Currently Eth1 keystore supports 2 KDF algorithms (scrypt and pbkdf2) but can be easily extended to use more modern KDF algorithms (for example Argon2).
Eth1 keystore also supports various PRF schemes (most often HMAC-SHA256 is used), but it can be easily extended with more digest algorithms and block sizes.

Eth1 keystores are definitely more flexible, which is very nice from a customisability and future-proofing standpoint. On the flip side, however, this customisability dramatically increases implementation complexity as many variants need to be supported in order to be compliant, and the first item on the Eth2.0 design principles is "to minimize complexity, even at the cost of some losses in efficiency". Another source of complexity is the design of the Eth1 keystores themselves: there are interacting parts even at a high level.

What about security reasons? What if the proposed algorithm is declared vulnerable or unreliable? Billions of keystores would be vulnerable.

If you have "choice" you can simply switch from one scheme to another (from scrypt to argon2 or pbkdf2). But what happens if you do not have "choice" and you are a validator?

@pipermerriam
Member

My stance after having spoken with @CarlBeek about this a bit.

  1. The deviation from the BIP32 spec for HD wallets appears necessary and justified. 👍
  2. The deviation from the BIP39 mnemonic spec doesn't seem necessary. 👎
  3. I'm supportive of the latest v4 keystore format and am in favor of supporting both scrypt and pbkdf2: 👍

My review of these is high level and shouldn't be considered as having evaluated the specs for correctness or any deep evaluation of the security. It might be appropriate to look into including these in bug bounties or something to incentivize external audits.

* dev: (25 commits)
  Update README.md
  Update README.md
  Update sync_protocol.md
  Update sync_protocol.md
  Update sync_protocol.md
  Deposit contract fixes (#1362)
  fix minor testing bug
  Update specs/networking/p2p-interface.md
  add note on local aggregation for interop
  Fix ssz-generic bitvector tests: those invalid cases of higher length than type describes, but same byte size, should have at least 1 bit set in the overflow to be invalid
  minor formatting
  Minor corrections and clarifications to the network specification
  doc standardization for networking spec (#1338)
  discuss length-prefixing pro/con, consider for removal, add link
  cleanups
  Updates
  apply more editorial suggestions.
  apply editorial suggestions.
  fmt.
  document doctoc command for posterity.
  ...
@CarlBeek
Contributor Author

CarlBeek commented Oct 7, 2019

Deprecating this PR in favour of the BLS key standards repo. The reason for this move is that the key storage and derivation standards are becoming inter-chain standards with (hopefully) a similar level of adoption to the BLS standardisation efforts.
