Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect mnemonic calculated from entropy #52

Closed
Bertrand256 opened this issue Jan 23, 2017 · 9 comments
Closed

Incorrect mnemonic calculated from entropy #52

Bertrand256 opened this issue Jan 23, 2017 · 9 comments

Comments

@Bertrand256
Copy link

When choosing "Supply my own source of entropy" option and entering mnemonic length any from 12 to 24 words, mnemonic calculated is incorrect.

Let's use 128 bit entropy example from Andreas Antonopolous book: 0c1e24e5917779d297e14d45f14e1a1a. Enter this to the "Entropy" field and then, from combo "Mnemonic length" choose "12 words" option. After that, mnemonic calculated is as follows: "enact swarm curious dash next scorpion couple fabric hour loop can diamond" instead of the correct one: "army van defense carry jealous true garbage claim echo media make crunch".

The problem is inside the function setMnemonicFromEntropy. If user selects any option other than the first one in the "Mnemonic Length" combobox, all bits from user's entropy are overwritten with the SHA256 hash of it. After that, calculation of mnemonic is based on hash instead of entropy.

@Bertrand256 Bertrand256 changed the title Incorrect mnemonics calculated from entropy Incorrect mnemonic calculated from entropy Jan 23, 2017
@Bertrand256
Copy link
Author

Bertrand256 commented Jan 23, 2017

Fix is quite simple, but I don't know the intention of all this hashing inside setMnemonicFromEntropy function. Form my purpose I changed this part of code:

        if (mnemonicLength != "raw") {
            // Get bits by hashing entropy with SHA256
            var hash = sjcl.hash.sha256.hash(entropy.cleanStr);
            var hex = sjcl.codec.hex.fromBits(hash);
            bits = BigInteger.parse(hex, 16).toString(2);
            while (bits.length % 256 != 0) {
                bits = "0" + bits;
            }
            // Truncate hash to suit number of words
            mnemonicLength = parseInt(mnemonicLength);
            var numberOfBits = 32 * mnemonicLength / 3;
            bits = bits.substring(0, numberOfBits);
        }

with this:

        if (mnemonicLength != "raw") {
            mnemonicLength = parseInt(mnemonicLength);
            var numberOfBits = 32 * mnemonicLength / 3;
            if(bits.length < numberOfBits) {
                // Get lacking bits by hashing entropy with SHA256
                var hash = sjcl.hash.sha256.hash(entropy.cleanStr);
                var hex = sjcl.codec.hex.fromBits(hash);
                var bitsHash = BigInteger.parse(hex, 16).toString(2);
                while (bitsHash.length % 256 != 0) {
                    bitsHash = "0" + bitsHash;
                }
                // Truncate hash to suit number of words and append it to bits provided
                bits = bits + bitsHash.substring(0, numberOfBits-bits.length);
            }
            else
                bits = bits.substring(0, numberOfBits);
        }

@iancoleman
Copy link
Owner

The reason for using a hash is given in this comment of issue 33 - using shuffled cards loses entropy
#33 (comment)

and further context for entropy is in #21 and #33

The hash of entropy is used because when sourcing entropy from a deck of cards, the total bits of entropy the user has entered is unknown. By hashing the entropy value, the user can generate their desired length of mnemonic at their desired strength, regardless of the calculated number of bits their entropy may or may not represent.

There is no 'correct' conversion of entropy to mnemonic. If a standard exists I would be glad to know of it. You can see from the linked issues that I greatly favor the first option of 'raw entropy', however it isn't always appropriate.

Since it's possible to create the 'correct' mnemonic from Andreas's entropy using the tool, I'm unclear what change is desired from this issue.

Steps to produce the 'correct' behavior:

  • enter Andreas's entropy
  • select 'use raw entropy'
  • see the mnemonic matches the expected value

@Bertrand256
Copy link
Author

The problem is with the second step, because there is no "use raw entropy" option in the combo. There is option ("From entropy length (3 words per 32 bits)"), which produces expected result, but I think its name is not obvious for the user.

Maybe I don't clearly understand the purpose of the tool's input controls labeled "Entropy" and "Mnemonic length", but I assumed, that the meaning of the former is the same as the concept of 'entropy' from BIP39 specification (https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki). According to this (and my understanding), entropy is always the last stage before converting it's value to mnemonic (and is always 'raw'). Yes, generating entropy has no the only and 'correct' way and should be as random as possible, but generating mnemonics from existing entropy seems to be strict and simple:

  1. create a checksum of the entropy by taking the first x bits of its SHA256 hash (x = entropy length/32)
  2. append checksum to the entropy
  3. divide sequence of bits of concatenated entropy into sections of 11 bits
  4. convert each of 11-bit sections to an integer number, which is then used as an index in wordlist
  5. for each index pick a word from predefined BIP39 dictionary

Your tool was mentioned in a Andreas' Antonopolous book (second edition of "Mastering Bitcoin"). When reading "Mnemonic Code Words (BIP-39)" chapter I decided to verify examples provided by the author and found that the results differ, so I was a bit confused. I had to debug code to deduct what I should do to generate expected results.

I also verified python-mnemonic library of Satoshi Labs (AFAIK they issued BIP39) and it gives result corresponding with my understanding of BIP39 and identical with this from and Adreas' book.
Here is example python code:
import mnemonic m = mnemonic.Mnemonic('english') mnemonic = m.to_mnemonic('0c1e24e5917779d297e14d45f14e1a1a'.decode('hex')) print(mnemonic)

Maybe such behavior was intended, but as a user I expected, that selecting specific number of words in the combo will end in the following behavior:

  • if entropy is too long, it will be truncated
  • if it is too short, some random bits will be appended and the end or a message is displayed
    But I didn't expect it will be modified by hashing.

From my point of view entropy provided by the user should not be modified, regardless what user selects in the combobox or the it's label should be different.

BTW thanks for the great and quite helpful tool.

@iancoleman
Copy link
Owner

Good points and I agree with what you say. Raw entropy should be the default.

Just to clarify one aspect of entropy

generating mnemonics from existing entropy seems to be strict and simple:

  1. create a checksum of the entropy...

'the entropy' is not as simple or strict in the context of this tool vs that of BIP39. 'the entropy' used by BIP39 should be a binary string (or equivalent encoding), but this tool accepts inputs that are not necessarily an encoding of binary. This happens for card entropy (specifically drawing cards without replacement). There's no clear way to convert that sequence of events to binary entropy.

Consider the sequence of Entropy Events > Entropy As Binary > Mnemonic. This tool has code for all three stages, but BIP39 only has code for the last two. Hence the need for hashing (sometimes).

I think appropriate steps to resolve this issue would be

  • rename 'from entropy length' to 'from raw entropy'
  • use raw entropy by default

@dooglus your thoughts on the proposed changes would be welcome.

A final aside:
Card entropy is already hashed in entropy.js : processCardEntropy so the additional hashing for producing a fixed word length is simply a convenience for the user and should have no impact on the security of the generated mnemonic relative to their entropy.

@Bertrand256
Copy link
Author

Yes, from my perspective both proposals are OK. Renaming combobox item to "from raw entropy" would eliminate my assumption, that all combobox items relate to an entropy in a raw form.
Thanx.

@iancoleman
Copy link
Owner

See 5ed50bd - Raw Entropy is the default for mnemonic length

@Bertrand256
Copy link
Author

Great. Thanx.

@leafcutterant
Copy link

It was confusing for me as well that providing 128 bits of entropy, I get different result with "Raw entropy" and "12 words".

If my entropy source is reliably pseudo-random, which option is considered to be more secure?

@iancoleman
Copy link
Owner

The 'X words' options in the select is a way to force 'Y bits of entropy' into 'X words'. I personally don't like it because it isn't following any sort of standard (I prefer only using raw entropy), but it does allow encoding of any amount of entropy into any amount of words, and I can understand why people might want that.

The process of converting any entropy into 'X words' is to sha256 the entropy and take the appropriate number of bits to form the mnemonic.

Neither is more secure than the other, although raw entropy is closer to the bip39 standard so for that reason I say it's the superior option. In practice they are equally secure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants