Incorrect mnemonic calculated from entropy #52

Bertrand256 · 2017-01-23T20:57:19Z

When choosing "Supply my own source of entropy" option and entering mnemonic length any from 12 to 24 words, mnemonic calculated is incorrect.

Let's use 128 bit entropy example from Andreas Antonopolous book: 0c1e24e5917779d297e14d45f14e1a1a. Enter this to the "Entropy" field and then, from combo "Mnemonic length" choose "12 words" option. After that, mnemonic calculated is as follows: "enact swarm curious dash next scorpion couple fabric hour loop can diamond" instead of the correct one: "army van defense carry jealous true garbage claim echo media make crunch".

The problem is inside the function setMnemonicFromEntropy. If user selects any option other than the first one in the "Mnemonic Length" combobox, all bits from user's entropy are overwritten with the SHA256 hash of it. After that, calculation of mnemonic is based on hash instead of entropy.

Bertrand256 · 2017-01-23T21:04:41Z

Fix is quite simple, but I don't know the intention of all this hashing inside setMnemonicFromEntropy function. Form my purpose I changed this part of code:

        if (mnemonicLength != "raw") {
            // Get bits by hashing entropy with SHA256
            var hash = sjcl.hash.sha256.hash(entropy.cleanStr);
            var hex = sjcl.codec.hex.fromBits(hash);
            bits = BigInteger.parse(hex, 16).toString(2);
            while (bits.length % 256 != 0) {
                bits = "0" + bits;
            }
            // Truncate hash to suit number of words
            mnemonicLength = parseInt(mnemonicLength);
            var numberOfBits = 32 * mnemonicLength / 3;
            bits = bits.substring(0, numberOfBits);
        }

with this:

        if (mnemonicLength != "raw") {
            mnemonicLength = parseInt(mnemonicLength);
            var numberOfBits = 32 * mnemonicLength / 3;
            if(bits.length < numberOfBits) {
                // Get lacking bits by hashing entropy with SHA256
                var hash = sjcl.hash.sha256.hash(entropy.cleanStr);
                var hex = sjcl.codec.hex.fromBits(hash);
                var bitsHash = BigInteger.parse(hex, 16).toString(2);
                while (bitsHash.length % 256 != 0) {
                    bitsHash = "0" + bitsHash;
                }
                // Truncate hash to suit number of words and append it to bits provided
                bits = bits + bitsHash.substring(0, numberOfBits-bits.length);
            }
            else
                bits = bits.substring(0, numberOfBits);
        }

iancoleman · 2017-01-24T00:19:15Z

The reason for using a hash is given in this comment of issue 33 - using shuffled cards loses entropy
#33 (comment)

and further context for entropy is in #21 and #33

The hash of entropy is used because when sourcing entropy from a deck of cards, the total bits of entropy the user has entered is unknown. By hashing the entropy value, the user can generate their desired length of mnemonic at their desired strength, regardless of the calculated number of bits their entropy may or may not represent.

There is no 'correct' conversion of entropy to mnemonic. If a standard exists I would be glad to know of it. You can see from the linked issues that I greatly favor the first option of 'raw entropy', however it isn't always appropriate.

Since it's possible to create the 'correct' mnemonic from Andreas's entropy using the tool, I'm unclear what change is desired from this issue.

Steps to produce the 'correct' behavior:

enter Andreas's entropy
select 'use raw entropy'
see the mnemonic matches the expected value

Bertrand256 · 2017-01-24T14:36:58Z

The problem is with the second step, because there is no "use raw entropy" option in the combo. There is option ("From entropy length (3 words per 32 bits)"), which produces expected result, but I think its name is not obvious for the user.

Maybe I don't clearly understand the purpose of the tool's input controls labeled "Entropy" and "Mnemonic length", but I assumed, that the meaning of the former is the same as the concept of 'entropy' from BIP39 specification (https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki). According to this (and my understanding), entropy is always the last stage before converting it's value to mnemonic (and is always 'raw'). Yes, generating entropy has no the only and 'correct' way and should be as random as possible, but generating mnemonics from existing entropy seems to be strict and simple:

create a checksum of the entropy by taking the first x bits of its SHA256 hash (x = entropy length/32)
append checksum to the entropy
divide sequence of bits of concatenated entropy into sections of 11 bits
convert each of 11-bit sections to an integer number, which is then used as an index in wordlist
for each index pick a word from predefined BIP39 dictionary

Your tool was mentioned in a Andreas' Antonopolous book (second edition of "Mastering Bitcoin"). When reading "Mnemonic Code Words (BIP-39)" chapter I decided to verify examples provided by the author and found that the results differ, so I was a bit confused. I had to debug code to deduct what I should do to generate expected results.

I also verified python-mnemonic library of Satoshi Labs (AFAIK they issued BIP39) and it gives result corresponding with my understanding of BIP39 and identical with this from and Adreas' book.
Here is example python code:
import mnemonic m = mnemonic.Mnemonic('english') mnemonic = m.to_mnemonic('0c1e24e5917779d297e14d45f14e1a1a'.decode('hex')) print(mnemonic)

Maybe such behavior was intended, but as a user I expected, that selecting specific number of words in the combo will end in the following behavior:

if entropy is too long, it will be truncated
if it is too short, some random bits will be appended and the end or a message is displayed
But I didn't expect it will be modified by hashing.

From my point of view entropy provided by the user should not be modified, regardless what user selects in the combobox or the it's label should be different.

BTW thanks for the great and quite helpful tool.

iancoleman · 2017-02-02T23:32:18Z

Good points and I agree with what you say. Raw entropy should be the default.

Just to clarify one aspect of entropy

generating mnemonics from existing entropy seems to be strict and simple:

create a checksum of the entropy...

'the entropy' is not as simple or strict in the context of this tool vs that of BIP39. 'the entropy' used by BIP39 should be a binary string (or equivalent encoding), but this tool accepts inputs that are not necessarily an encoding of binary. This happens for card entropy (specifically drawing cards without replacement). There's no clear way to convert that sequence of events to binary entropy.

Consider the sequence of Entropy Events > Entropy As Binary > Mnemonic. This tool has code for all three stages, but BIP39 only has code for the last two. Hence the need for hashing (sometimes).

I think appropriate steps to resolve this issue would be

rename 'from entropy length' to 'from raw entropy'
use raw entropy by default

@dooglus your thoughts on the proposed changes would be welcome.

A final aside:
Card entropy is already hashed in entropy.js : processCardEntropy so the additional hashing for producing a fixed word length is simply a convenience for the user and should have no impact on the security of the generated mnemonic relative to their entropy.

Bertrand256 · 2017-02-09T19:57:55Z

Yes, from my perspective both proposals are OK. Renaming combobox item to "from raw entropy" would eliminate my assumption, that all combobox items relate to an entropy in a raw form.
Thanx.

iancoleman · 2017-02-10T07:17:42Z

See 5ed50bd - Raw Entropy is the default for mnemonic length

Bertrand256 · 2017-02-12T16:51:19Z

Great. Thanx.

leafcutterant · 2017-12-14T20:15:14Z

It was confusing for me as well that providing 128 bits of entropy, I get different result with "Raw entropy" and "12 words".

If my entropy source is reliably pseudo-random, which option is considered to be more secure?

iancoleman · 2017-12-18T02:29:20Z

The 'X words' options in the select is a way to force 'Y bits of entropy' into 'X words'. I personally don't like it because it isn't following any sort of standard (I prefer only using raw entropy), but it does allow encoding of any amount of entropy into any amount of words, and I can understand why people might want that.

The process of converting any entropy into 'X words' is to sha256 the entropy and take the appropriate number of bits to form the mnemonic.

Neither is more secure than the other, although raw entropy is closer to the bip39 standard so for that reason I say it's the superior option. In practice they are equally secure.

Bertrand256 changed the title ~~Incorrect mnemonics calculated from entropy~~ Incorrect mnemonic calculated from entropy Jan 23, 2017

iancoleman closed this as completed Feb 10, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect mnemonic calculated from entropy #52

Incorrect mnemonic calculated from entropy #52

Bertrand256 commented Jan 23, 2017

Bertrand256 commented Jan 23, 2017 •

edited by iancoleman

iancoleman commented Jan 24, 2017

Bertrand256 commented Jan 24, 2017

iancoleman commented Feb 2, 2017

Bertrand256 commented Feb 9, 2017

iancoleman commented Feb 10, 2017

Bertrand256 commented Feb 12, 2017

leafcutterant commented Dec 14, 2017

iancoleman commented Dec 18, 2017

Incorrect mnemonic calculated from entropy #52

Incorrect mnemonic calculated from entropy #52

Comments

Bertrand256 commented Jan 23, 2017

Bertrand256 commented Jan 23, 2017 • edited by iancoleman

iancoleman commented Jan 24, 2017

Bertrand256 commented Jan 24, 2017

iancoleman commented Feb 2, 2017

Bertrand256 commented Feb 9, 2017

iancoleman commented Feb 10, 2017

Bertrand256 commented Feb 12, 2017

leafcutterant commented Dec 14, 2017

iancoleman commented Dec 18, 2017

Bertrand256 commented Jan 23, 2017 •

edited by iancoleman