Extended mnemonic codes #602

Closed
143 changes: 143 additions & 0 deletions EIPS/eip-draft-extended-mnemonics.md
@@ -0,0 +1,143 @@
---
eip: <to be assigned>
title: Extended mnemonic codes
author: Nick Johnson <nick@ethereum.org>, Micah Zoltu (@MicahZoltu)
discussions-to: <URL>
status: Draft
type: Standards Track
category: ERC
created: 2017-04-13
requires: eip-draft-hdwallets
---

## Abstract

This EIP specifies a method for generating mnemonic codes based on [BIP39](https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki). Mnemonic codes provide an easy-to-remember, easy-to-transcribe sequence of words that acts as a seed for a deterministic wallet.

This EIP extends BIP39 by providing additional metadata for applications. This metadata makes new mnemonic-based applications possible, while also helping to standardise derivation paths in Ethereum-based applications.

## Motivation

BIP39 defines a method for generating mnemonic codes that are easily written down and entered, and for deriving HD wallet seeds from these mnemonic codes. This allows for easy backup and recovery of cryptographic keys, and reduces the chance of user error when recording and entering them.

Several Ethereum wallets have adopted BIP39 mnemonics for use on the Ethereum platform, as have some other products such as paper wallets. As a rule, adopting BIP39 without changes works well, but there are two shortcomings:

- Confusion over derivation paths for the resulting HD wallets, as described in eip-draft-hdwallets.

  > **Review comment (Contributor):** Can you add a link to eip-draft-hdwallets here? It's this: #601 ?
  >
  > **Reply (Contributor Author):** Yes, it's #601. I can't easily link to it since it's currently only in my repo.

- No extension mechanism to support new kinds of mnemonics.

Examples of new kinds of mnemonics that would be enabled by the provision of an extension mechanism include multi-part mnemonics based on secret sharing schemes, and trustless encrypted paper wallets similar to those described in [BIP38](https://github.com/bitcoin/bips/blob/master/bip-0038.mediawiki), derived from mnemonics instead of binary data.

The specification consists of two parts: generating the mnemonic, and converting it into a binary seed. This seed can be later used to generate deterministic wallets using BIP-0032 or similar methods.

## Specification

Mnemonics must encode entropy in a multiple of 32 bits. More entropy improves security but increases the sentence length.

### Encoding standard (BIP39) mnemonics

First, an initial entropy of ENT bits is generated. The allowable length of ENT is from 128 to 256 bits. A checksum is generated by taking the first `ENT / 32` bits of its SHA256 hash. This checksum is appended to the end of the initial entropy. Next, these concatenated bits are split into groups of 11 bits, each encoding a number from 0-2047, serving as an index into a wordlist. Finally, we convert these numbers into words and use the joined words as a mnemonic sentence.

The following table describes the relation between the initial entropy length (ENT), the checksum length (CS) and the length of the generated mnemonic sentence (MS) in words.

```
CS = ENT / 32
MS = (ENT + CS) / 11
```

| ENT | CS | ENT+CS | MS |
| ----- | -- | ------ | ---- |
| 128 | 4 | 132 | 12 |
| 160 | 5 | 165 | 15 |
| 192 | 6 | 198 | 18 |
| 224 | 7 | 231 | 21 |
| 256 | 8 | 264 | 24 |
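
The encoding steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `entropy_to_indices` is hypothetical, and the function returns wordlist indices rather than words so that no particular wordlist has to be embedded.

```python
import hashlib

def entropy_to_indices(entropy: bytes) -> list:
    """Split ENT bits plus a CS-bit checksum into 11-bit wordlist indices."""
    ent = len(entropy) * 8
    if ent % 32 != 0 or not 128 <= ent <= 256:
        raise ValueError("ENT must be a multiple of 32 between 128 and 256 bits")
    cs = ent // 32
    # The checksum is the first CS bits of SHA256(entropy); CS <= 8,
    # so the first byte of the digest is enough.
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs)
    bits = (int.from_bytes(entropy, "big") << cs) | checksum
    ms = (ent + cs) // 11
    # Read the 11-bit groups from most significant to least significant.
    return [(bits >> (11 * (ms - 1 - i))) & 0x7FF for i in range(ms)]
```

Each returned index would then be looked up in the chosen wordlist and the resulting words joined to form the sentence.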

### Generating seeds from standard (BIP39) mnemonics

To create a binary seed from the mnemonic, we use the PBKDF2 function with a mnemonic sentence (in UTF-8 NFKD) used as the password and the string "mnemonic" (again in UTF-8 NFKD) used as the salt. The iteration count is set to 2048 and HMAC-SHA512 is used as the pseudo-random function. The length of the derived key is 512 bits (= 64 bytes).
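
As a rough sketch of this step, using only the Python standard library (the function name `mnemonic_to_seed` is an assumption made for illustration):

```python
import hashlib
import unicodedata

def mnemonic_to_seed(mnemonic: str) -> bytes:
    """Derive a 64-byte seed from a mnemonic sentence with PBKDF2-HMAC-SHA512."""
    # Both the password and the salt are normalised to NFKD and UTF-8 encoded.
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic").encode("utf-8")
    # 2048 iterations, 512-bit (64-byte) derived key.
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)
```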

This seed can be later used to generate deterministic wallets using BIP-0032 or similar methods.

The conversion of the mnemonic sentence to a binary seed is completely independent from generating the sentence. This results in rather simple code; there are no constraints on sentence structure and clients are free to implement their own wordlists or even whole sentence generators, allowing for flexibility in wordlists for typo detection or other purposes.

Although it is possible to use a mnemonic that was not generated by the algorithm described in "Encoding standard (BIP39) mnemonics", this is not advised. Software must compute a checksum for the mnemonic sentence using a wordlist and issue a warning if it is invalid.

### Encoding extended mnemonics

An extended mnemonic follows the same process as a standard mnemonic, but the checksum is logically inverted. As a result, the set of valid standard mnemonics and the set of valid extended mnemonics are disjoint. This allows client software to detect the type of mnemonic entered by a user by computing the expected checksum and comparing it to the one provided: if it matches, the mnemonic is a standard mnemonic; if the inverse matches, the mnemonic is an extended mnemonic; and if neither matches, the mnemonic is invalid.
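
The detection logic might look like the sketch below. It assumes the words have already been mapped to their 11-bit wordlist indices, each in the range 0 to 2047; the function name `classify_mnemonic` is hypothetical.

```python
import hashlib

def classify_mnemonic(indices) -> str:
    """Return 'standard', 'extended' or 'invalid' for a list of 11-bit indices."""
    total_bits = len(indices) * 11
    cs = total_bits // 33  # ENT + CS = 32 * CS + CS = 33 * CS
    if total_bits % 33 != 0 or not 4 <= cs <= 8:
        return "invalid"
    bits = 0
    for index in indices:
        bits = (bits << 11) | index
    mask = (1 << cs) - 1
    entropy = (bits >> cs).to_bytes(cs * 4, "big")
    provided = bits & mask
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs)
    if provided == expected:
        return "standard"
    if provided == expected ^ mask:  # logically inverted checksum
        return "extended"
    return "invalid"
```

Because a checksum can never equal its own inverse, no sentence can satisfy both branches.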

Extended mnemonics further assign meaning to the leftmost (most significant) bits of the data. The first bits encode a 'type' for the mnemonic, as follows:

| Binary representation |
| --- |
| 0b0*** **** |
| 0b10xx xxxx |

Here, '*' represents a data bit, while 'x' represents a type bit. All mnemonics starting with `0b11xx xxxx` are reserved for future expansion.

The following mnemonics are presently defined:

| First byte | Name | EIP |
| --- | --- | --- |
| 0b0*** **** | Basic extended mnemonic | eip-draft-extended-mnemonics |
| 0b1000 0000 | Password protected extended mnemonic | eip-draft-extended-mnemonics |

EIPs may assign themselves mnemonic IDs by amending this EIP with the required assignments.

Software implementations MUST check the type ID in the decoded entropy data and behave appropriately, refusing to accept mnemonics with a type ID they do not recognise or support.
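
A minimal sketch of this check, covering only the two types defined in the table above (the function name `mnemonic_type` and the returned labels are assumptions made for illustration):

```python
def mnemonic_type(entropy: bytes) -> str:
    """Identify the extended mnemonic type from the first byte of the decoded data."""
    first = entropy[0]
    if first & 0x80 == 0:  # 0b0*** ****: the remaining 7 bits are data
        return "basic"
    if first == 0x80:      # 0b1000 0000
        return "password-protected"
    # Unassigned 0b10xx xxxx IDs and the reserved 0b11xx xxxx range are refused.
    raise ValueError(f"unsupported mnemonic type ID: {first:#010b}")
```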

#### Encoding basic extended mnemonics

Basic extended mnemonics act in all ways like standard mnemonics, with the exception that when used to derive a deterministic wallet address for use with Ethereum, they MUST conform to eip-draft-hdwallets. Implementations MUST NOT derive BIP44-type HD wallets from a basic extended mnemonic if that wallet is to be used with an Ethereum network.

Further, basic extended mnemonics MUST NOT use a password as described in BIP39; for password protection use a 'password protected extended mnemonic' described below.

To generate a basic extended mnemonic, generate a data field with the most significant bit set to 0, followed by 32n - 1 bits of entropy (eg, 127, 159, 191, 223 or 255). Encode this as a mnemonic phrase as described above in "Encoding standard (BIP39) mnemonics", logically inverting the checksum bits as described in "Encoding extended mnemonics".
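
One way to sketch this in Python is shown below. The function name `encode_basic_extended` and its `raw` parameter (which lets a caller supply fixed bytes for testing instead of fresh randomness) are assumptions for illustration; the result is a list of wordlist indices rather than words.

```python
import hashlib
import secrets

def encode_basic_extended(ent_bits: int = 128, raw: bytes = None) -> list:
    """Encode a basic extended mnemonic as a list of 11-bit wordlist indices."""
    if ent_bits % 32 != 0 or not 128 <= ent_bits <= 256:
        raise ValueError("data length must be a multiple of 32 between 128 and 256 bits")
    raw = secrets.token_bytes(ent_bits // 8) if raw is None else raw
    if len(raw) * 8 != ent_bits:
        raise ValueError("raw must be exactly ent_bits long")
    # Clear the most significant bit, leaving 32n - 1 bits of entropy.
    data = bytes([raw[0] & 0x7F]) + raw[1:]
    cs = ent_bits // 32
    mask = (1 << cs) - 1
    # BIP39 checksum, logically inverted to mark the mnemonic as extended.
    checksum = (hashlib.sha256(data).digest()[0] >> (8 - cs)) ^ mask
    bits = (int.from_bytes(data, "big") << cs) | checksum
    ms = (ent_bits + cs) // 11
    return [(bits >> (11 * (ms - 1 - i))) & 0x7FF for i in range(ms)]
```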

#### Encoding password protected extended mnemonics

Password protected extended mnemonics are encoded the same way as basic extended mnemonics, with the exception of the type ID and the presence of a password. Passwords are used in the seed-generation step.

To generate a password protected extended mnemonic, generate a data field with 32n - 8 bits of entropy, and prefix it with the byte 0x80.
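
The data field for this type could be built as in the sketch below (the function name `password_protected_data` and the `raw` parameter are hypothetical). The resulting bytes would then be encoded with an inverted checksum in the same way as a basic extended mnemonic.

```python
import secrets

def password_protected_data(total_bits: int = 128, raw: bytes = None) -> bytes:
    """Build the data field: type byte 0x80 followed by 32n - 8 bits of entropy."""
    if total_bits % 32 != 0 or not 128 <= total_bits <= 256:
        raise ValueError("data length must be a multiple of 32 between 128 and 256 bits")
    n_random = total_bits // 8 - 1  # one byte is taken by the type ID
    raw = secrets.token_bytes(n_random) if raw is None else raw
    if len(raw) != n_random:
        raise ValueError("raw must be exactly 32n - 8 bits long")
    return b"\x80" + raw
```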

### Generating seeds from extended mnemonics

The procedure to generate a seed from an extended mnemonic depends on the type of extended mnemonic being decoded.

#### Generating seeds from basic extended mnemonics

Follow the procedure described in "Generating seeds from standard (BIP39) mnemonics" with no variations.

#### Generating seeds from password protected extended mnemonics

Solicit a password from the user, then follow the procedure described in "Generating seeds from standard (BIP39) mnemonics", replacing the salt value with the string "mnemonic" concatenated with the user's password in UTF-8 NFKD form.
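
A sketch of the salted derivation, using only the Python standard library. The function name `password_protected_seed` is an assumption, and the sketch normalises the concatenated salt string as a whole, which is one reading of the sentence above.

```python
import hashlib
import unicodedata

def password_protected_seed(mnemonic: str, password: str) -> bytes:
    """Derive a 64-byte seed, salting PBKDF2 with "mnemonic" + password."""
    key = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + password).encode("utf-8")
    # Same parameters as the standard procedure: 2048 iterations, 64-byte output.
    return hashlib.pbkdf2_hmac("sha512", key, salt, 2048, dklen=64)
```

Different passwords yield unrelated seeds from the same mnemonic, so a wrong password silently produces a different wallet rather than an error.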

### Wordlists
Wordlists are the same as those used in BIP39, enumerated [here](https://github.com/bitcoin/bips/blob/master/bip-0039/bip-0039-wordlists.md).

## Rationale

Because BIP39 mnemonics lack a version or type field, it was necessary to define some mechanism by which one could be added. Defining a checksum for extended mnemonics that is the inverse of the BIP39 checksum ensures that extended mnemonics will never be mistaken for BIP39 mnemonics, and vice versa, at the cost of one bit of checksum data. We believe this to be a worthwhile tradeoff, given the number of contextual checks available to mnemonics (e.g. the use of a word list), and given that the extension mechanism provided in this EIP removes the need to repeat this process and sacrifice further checksum fidelity.

The assignment of the one-bit type prefix 0 to the basic extended mnemonic type, which behaves like a standard mnemonic, ensures that existing applications transitioning to extended mnemonics sacrifice only a single bit of randomness compared to an equal-length BIP39 mnemonic, so switching involves no significant security compromise.

The use of a variable length integer encoding similar to that in UTF-8 permits assigning a large number of type IDs (larger than is likely to be necessary), while also minimising the size of metadata, which is critical given manual entry of mnemonic codes.

## Backwards Compatibility

Extended mnemonics are deliberately designed to be incompatible with existing code that expects BIP39 mnemonics. This prevents users from entering extended mnemonics into existing implementations and receiving incompatible derived addresses.

Existing libraries for decoding and encoding mnemonics will need to be updated or rewritten to support extended mnemonics. We recommend that when doing so, they provide a method that returns the entropy field of a decoded mnemonic, so applications wishing to add new mnemonic types can do so without updating the library in question.

## Test Cases
TBD

## Implementation
None yet.

## Reference

Sections of this EIP were adapted from [BIP39](https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki).

## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).