Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid using PRNG? #21

Closed
dooglus opened this issue Apr 21, 2016 · 22 comments
Closed

avoid using PRNG? #21

dooglus opened this issue Apr 21, 2016 · 22 comments

Comments

@dooglus
Copy link
Contributor

dooglus commented Apr 21, 2016

I recently saw a reddit post in which the author was expressing concern about entropy collection in your bip39 code.

I replied here https://www.reddit.com/r/Bitcoin/comments/4fomsb/how_secure_are_the_hd_wallets_generated_by_bip39/d2bid2j with a modified version of your site in which the user can enter a long string which is hashed to provide the 'randomness' rather than using the PRNG. This allows the user to generate an HD wallet by rolling dice, flipping coins, or just hammering on the keyboard for ten minutes.

Is this something that you would consider adding to the site?

I can see it could be misused. If you use a short memorable phrase then it is just as insecure as a traditional single key 'brain wallet' - but for people who know what they are doing it allows them to use true physical randomness, avoiding any worry that their PRNG is somehow weak.

My change is here: dooglus@bed8b774

@iancoleman
Copy link
Owner

I would not consider adding this to the site. Using a long string is essentially a brainwallet, and we all know how many problems there's been with that.

Thanks for the input and the reply on reddit.

@dooglus
Copy link
Contributor Author

dooglus commented Nov 2, 2016

This came up again on reddit:

https://www.reddit.com/r/Bitcoin/comments/5abf1f/mentor_monday_october_31_2016_ask_all_your/d9fbjq8/

Again I referred the poster here.

Using a long string isn't essentially a brain wallet unless you attempt to memorize the long string, which I don't think these people are intending to do.

The use case is the following:

"I want to generate a secure 12 word phrase. I don't trust the pseudo-random number generator in Android, or in Chrome, so I'm going to roll a 6 sided dice 100 times and type the rolls in to make a long string. I want to generate my 12 word seed from that 100 digit number, then throw the number away. Then I know my 12 word seed is properly random and not limited to the range produced by a potentially broken PRNG."

I personally did exactly that to generate my 24 word seed, except that I used playing cards rather than dice to generate my "long string". I believe the resulting seed is safer than if I had generated it in a web browser, since I trust the randomness of my shuffling more than I trust whatever Google Chrome uses.

Anyway, I just thought I'd comment again since you seem open to suggestions and may have simply misunderstood the use case the first time I brought this up. I'm not suggesting that people should use it as a "brain wallet". You could require the "long string" to be at least 100 characters or something to make sure it's not something they are going to try to remember. That allows for 100 dice rolls or 52 two-character playing cards.

@iancoleman
Copy link
Owner

Since this has been requested multiple times, I will add it. I think combined with zxcvbn or some other measure of strength it should allow users to appreciate if they're shooting themselves in the foot.

Using .toMnemonic() as per your implementation should work well. Is sha256 of the entropy standard? For example, bip32jp uses the raw entropy. I would prefer to do that rather than a 'magical' step of sha256 the phrase. Using a sha256 means all entropy will be 12 words, perhaps giving the wrong impression of how strong the phrase is with low entropy. What's your thoughts?

Thanks for persisting with this and helping make this tool great.

@iancoleman iancoleman reopened this Nov 2, 2016
@dooglus
Copy link
Contributor Author

dooglus commented Nov 3, 2016

Using a sha256 means all entropy will be 12 words

I think each set of 3 words gives 32 bits of entropy (and 1 bit of checksum). So 12 words is 128 bits.

Wouldn't that mean that sha256 (256 bits) gives enough entropy to generate 24 words (assuming the input to the hash has enough entropy)?

Edit: another redditor looking for a secure way of generating offline keys: https://www.reddit.com/r/Bitcoin/comments/5am3vn/what_is_more_secure_bitaddress_bulk_wallet_or/ -- I pointed him your way.

@iancoleman
Copy link
Owner

This feature is now live.

See c6624d5

It's mostly compatible with bip32jp entropy, see the test for compatibility and caveats.

There are a lot of tests specifying how this feature works, see tests.js lines 1963-2568

Thanks for the suggestions and persistence with pointing out feature requests from users.

@dooglus
Copy link
Contributor Author

dooglus commented Nov 4, 2016

That's great, thanks. Now I can point people at your tool and not have to worry about the entropy generation.

A few initial thoughts for what they're worth:

  1. When I load the page and click "Supply my own source of entropy" it immediately provides a seed and a set of addresses ("m/44'/0'/0'/0/0 1LSCRUdhpiUvR6QnCxHHTLK54ixMeRURiB ...") before I've entered any entropy at all. That's presumably a bug. The naive user might end up using that address and have his coins stolen.
  2. If I type 8 hex digits, I get a set of addresses generated (as expected). If I backspace over those 8 hex digits, the addresses go back to the set described above (1LSCR...), but if I select the 8 hex digits by typing control-A or double-clicking them, and then hit backspace to delete them all at once, the displayed seeds and addresses stay as they were before I hit backspace.
  3. Rather than rolling dice I typically shuffle a deck of cards. 52 randomly ordered cards gives 52! permutations, for ~225 bits of entropy. I'm unable to get your tool to generate a mnemonic based on my shuffled deck strings. That's presumably by design although I'd like to be able to continue using my card-shuffling entropy generation technique. My deck strings are 104 characters long, and match /^([A2-9TJQK][CDHS]){52}$/i. Maybe that's too 'niche' to support. I understand you don't want to be sha256'ing arbitrary text strings because people will use song lyrics, but you also don't want to be parsing decks of cards.
  4. It's a little counter intuitive that entering 56 bits of entropy ("f00d1234123456") gives the exact same output as entering just 32 bits of entropy ("f00d1234"). I may not have an exact multiple of 32 bits of entropy, but would still like them all used.
  5. I was surprised to see that if I kept typing hex digits I could go beyond 24 word mnemonics, and was further surprised to see that longer mnemonics were accepted by the tool as input too. I thought BIP39 limited us to 24 words, but that's apparently not the case.
  6. I don't seem to be able to select the number of words in the mnemonic. Maybe I have 100 dice rolls but only want a 12 word seed.
  7. The entropy calculation seems a little off. A single dice roll has 2.585 bits of entropy, but your tool shows it as 3. Is it rounding up? Rounding down would be safer. Three dice rolls have 7.755 bits of entropy, but your tool shows it as 8 and considers it enough to generate a 3 word seed. It appears to be rounding up - for 65 dice rolls there are 168.02 bits of entropy, but your tool shows 169.
  8. I entered "66666666666666666666666666666666666666666666666666666666666666666". It told me I had 169 bits of entropy, but the real count is much lower. Not all those 6's were randomly generated. I don't know if it's worth using zxcvbn or similar to more accurately detect such silliness to protect the user.
  9. It would be useful to see a count of the number of dice rolls I have entered, and the number of words in the generated mnemonic.

@iancoleman
Copy link
Owner

Great insights.

  1. I'll fix this bug where blank entropy incorrectly generates addresses.
  2. Correctly handling input events for entropy will also be fixed.
  3. I will add a 'cards' parser to the entropy detection, which will prevent arbitrary strings but will allow cards to be used. Thanks for specifying the desired format, it makes it easier for me to implement. I like the card shuffling technique, but wanted to get mvp for this feature before implementing it. I may also add octal and base 20 for those types of dice.
  4. I'm not sure how to use all the bits of entropy without losing the 1:1 mapping between characters entered and bits of entropy. It's possible the leftover entropy could be padded with zeros then XORed with the prior entropy, but this doesn't increase the total bits of entropy or strength of the phrase. I guess it comes down to 'can there be a mnemonic that's between 33 to 63 bits strong', to which I would say 'no'. Open to suggestions here.
  5. Mnemonics longer than 24 words reminds me of this issue about mnemonic length and is a valid point, but retaining a consistent mapping between 'amount of entropy' and 'mnemonic length' is an important aspect of the user experience. This design means the possibility of mnemonics more than 24 words long is unavoidable.
  6. If a user wants 12 words, they enter the correct amount of entropy. Why do 100 dice rolls if 50 suffice? It's a user experience thing. Also it's an 'entropy fungibility' thing: 256 bits of entropy to generate 128 bit strong mnemonic is not any stronger than 128 bits of entropy, so just use 128 bits of entropy in the first place.
  7. Yes the entropy is rounded. As you say, flooring would be safer, so I'll do that instead.
  8. I agree, and something like zxcvbn should probably be used by the entropy detection. Trying the example of 65 repeated 6s on the zxcvbn demo page returned a score of 0/4 so it does provide useful results. I'll add zxcvbn to the entropy detection and some text to indicate the quality of the entropy beyond just the number of bits.
  9. Agreed, I'll add a 'total event count' and indicate it to the user.

So, changes resulting from this are:

  • Fix bug for zero-length entropy
  • Improve user-input-event detection for entropy field
  • Add card-parsing to entropy detection
  • Change placeholder text indicating valid entropy types to a more detailed explanation
  • Change entropy measure from round to floor
  • Use zxcvbn to measure entropy strength and indicate the result to the user
  • Show total dice rolls / events to the user

@iancoleman iancoleman reopened this Nov 7, 2016
@dooglus
Copy link
Contributor Author

dooglus commented Nov 7, 2016

  1. I'm not sure how to use all the bits of entropy without losing the 1:1 mapping between characters entered and bits of entropy

Why do you want this 1:1 mapping? Nobody ever wants to convert their mnemonic back to the entropy that generated it.

can there be a mnemonic that's between 33 to 63 bits strong

Take a string with 50 bits of entropy. Put it though sha256. Use the result to generate a mnemonic. So long as the mnemonic is 6 words or longer it has 50 bits of entropy, no?

  1. retaining a consistent mapping between 'amount of entropy' and 'mnemonic length' is an important aspect of the user experience

I don't know that it is. I don't care that the 24 word mnemonic that I generated from a shuffled deck of cards doesn't have the full 256 bits of entropy. It has enough bits for my purposes, and that's all I care about. The (hypothetical) wallet I was importing it into only accepted 12 or 24 word mnemonics, and 12 wasn't enough to capture all the entropy in the shuffled deck.

@iancoleman
Copy link
Owner

Why do you want this 1:1 mapping? Nobody ever wants to convert their mnemonic back to the entropy that generated it.

  • This relationship is outlined in the bip39 spec: "The following table describes the relation between the initial entropy length (ENT)... and the length of the generated mnemonic sentence (MS) in words." Having word length vary with entropy length is definitely appropriate for a BIP39 implementation.
  • Seeing a mnemonic of a certain length causes the user to think it has a certain strength. To see a 12 word mnemonic and be unaware it's derived from only a few bits of entropy is potentially dangerous. Preserving the relationship between entropy length and mnemonic length is important for preserving the relationship between mnemonic length and mnemonic strength.

So long as the mnemonic is 6 words or longer it has 50 bits of entropy, no?

No. A mnemonic generated from the sha256 of a single bit does not contain 50 bits of entropy. So even though a mnemonic with six words or more should imply at least 50 bits of entropy, it may not. This ambiguity can be removed if raw entropy is used, and not hashed entropy. Users should be able to trust the relationship between entropy length, mnemonic length and mnemonic strength. Using hashed entropy breaks that trust.

From the BIP39 Motivation: "It's not a way to process user-created sentences (also known as brainwallets) into a wallet seed." Is it fair to replace 'user-created sentences' with 'weak entropy'?

As always I am open to suggestions here but I am very wary of introducing weak entropy.

@dooglus
Copy link
Contributor Author

dooglus commented Nov 7, 2016

So long as the mnemonic is 6 words or longer it has 50 bits of entropy, no?

No. A mnemonic generated from the sha256 of a single bit does not contain 50 bits of entropy

I was referring to the mnemonic generated by this recipe:

Take a string with 50 bits of entropy. Put it though sha256. Use the result to generate a mnemonic.

So long as that specific mnemonic is 6 words or longer it has 50 bits of entropy, no?

Is it fair to replace 'user-created sentences' with 'weak entropy'?

Some user-created sentences have weak entropy. Some have strong entropy. They are generally weaker than the creator would guess though. Allowing me to create a wallet from 100 sixes is worse than letting me create a wallet from a long enough nonsense sentence (see 'correct horse battery staple'). The important thing is to estimate the amount of entropy in the input and disallow the user of weak entropy that way.

@iancoleman
Copy link
Owner

Your points have caused me to realize the mistake in my logic. I have been thinking mnemonic strength is directly related to the entropy length and thus the number of words. But the strength is only at most as strong as the number of words. It may be weaker. You can't know just from the number of words in the mnemonic how strong it is. Assuming a 12 word mnemonic always has 128 bits of underlying entropy can be dangerous.

So I will also add to the todo list

  • Allow user to select word length from raw entropy length or as a fixed number of words from the hash of the entropy
  • Reformat the feedback about entropy strength to be easier to read (because it will have more information now).

Thanks for working through it with me. I'll be working on these changes over the next few days.

@dooglus
Copy link
Contributor Author

dooglus commented Nov 7, 2016

Thanks for taking my feedback as it was intended rather than getting offended by it. My goal is only to have your tool be as great as it can be.

Thanks also for all your hard work on this very useful project.

@iancoleman
Copy link
Owner

These changes have been implemented, starting with 52e6eb8 ending with 18abe53

I'll leave the issue open for a week in case of any further feedback.

@dooglus
Copy link
Contributor Author

dooglus commented Nov 13, 2016

That's looking pretty good now.

For what it's worth:

  1. The little suit icons when I type cards are a little too small for me to see clearly. You might consider making them larger and/or coloring them (lots of card games use a 4 color deck for visibility: C=green, D=blue, H=red, S=black (google "4 color deck" and I'm sure you'll see lots of examples).
  2. There's a lot of vertical space used, making it hard to have the entropy I'm typing and the mnemonic that's generated on the screen at the same time. Maybe put "Type" and "Strength" side by side on one line, and "Event Count" and "Bits per Event" side by side on the next. Also the gap between those lines is large and could comfortably be reduced without things looking squashed. Also, putting the "This is an advanced feature" text above the entropy textarea would help.

That's all for now. When I can only find 2 things to complain about you're doing it right! ;)

Edit:

  1. My original point 9 was "It would be useful to see a count of the number of dice rolls I have entered, and the number of words in the generated mnemonic". I'm still having to count the words in the mnemonic to see if I have rolled enough dice to get 12 words yet. The menu entry "from entropy length (3 words per 32 bits)" could be updated to include the current number of words that the current entropy length corresponds to: "from entropy length (3 words per 32 bits) - currently 9 words"

@iancoleman
Copy link
Owner

Good suggestions, these changes have been made and are live.

I also changed the way entropy is truncated, using modulo instead of truncating from the right. This was the result of conversations with /u/kinoshitajona and improves compatibility with bip32jp.github.io

Changes made:

  • Card suits are easier to read (larger font and unique colors)
  • Layout of entropy feedback is more compact
  • Excess entropy is removed from the left of the binary string rather than the right

@dooglus
Copy link
Contributor Author

dooglus commented Nov 14, 2016

That's better, thanks.

I notice when I enter a shuffled deck of 52 cards it tells me I have 296 bits of entropy, but I calculate it as 225.5 bits:

>>> math.log(52, 2) * 52
296.4228653433368
>>> math.log(factorial(52), 2)
225.5810031237028

I guess you are assuming I am picking 52 cards out of the deck with replacement, but I'm not. The first card is 1 in 52, but the 2nd is only 1 in 51, etc.

I'm thinking it would be useful if your tool could detect typos in the deck. I expect to type each card exactly once. If I typo one of the cards it would be easy to detect, because you'd end up with a duplicate. Maybe tag "(full deck)" on the end of the "filtered entropy" report if it's a full deck, or "duplicate 2D" or "missing 6S" if there's a duplicate or missing card. Not that it's necessarily an error to have a missing or duplicate card, but for people using a full shuffled deck it is, and having the tool check these things is quite an effective safeguard against typos. It wouldn't detect the transposition of two cards of course, but that's unlikely to happen when typing a deck in.

In general you have the same problem as my "6666666" example earlier. Your tool tells me that "6666666" has 18 bits of entropy when it doesn't really. Without knowing how I generated those 7 digits you really can't say how much entropy they have (though you can guess and be pretty sure that I just hit the same key seven times).

@iancoleman
Copy link
Owner

Great suggestions, I'm enjoying the subtlety of this entropy issue.

  • 02f05d3 - Entropy strength for cards assumes no replacement
  • 391c7f2 - Card duplicates and use of full deck is detected
  • fc8c404 - Multiple decks of cards are possible (but also a needless oversupply of entropy)

zxcvbn has limits, for example a perfectly sorted deck isn't detected as weak. Nor is [0,1,2,3,...,25].join("") detected as weak. Not sure there's much can be done about that except adding custom checks, which could easily get out of hand!

@dooglus
Copy link
Contributor Author

dooglus commented Nov 16, 2016

I agree. It's a never-ending job to try to detect all 'obvious' patterns. Better not to start down that road.

  • I found a small bug: "AH JD 6D 6H 3C KH 5S JC 4C 7C TC AD QS 8H 3H 8D TS 4H 5D 3S 2S 2H 4D 2C 3D 8S KC QD 7H JS 9H 5C 9C 8C AC 9D JH QC AS 6C 4S 2D KD KS 7D TD 7S 6S QH 5H TH as" is detected as a 'full deck', but it has both 'AS' and 'as' in it. It seems that mixing up the case breaks the duplicate test. I guess you need to normalize the case before checking for dups.
  • If I delete the last 'as', it doesn't tell me which card I'm missing for the full deck. I'm not sure if it's meant to.
  • When I enter a full deck I see "event count: 52, bits per event: 5.7, total bits: 226". The first two don't multiply to give the 3rd like you would naively expect. Perhaps it's better to say "bits per event: 5.7 or less" or something, to indicate that not all events contribute the same amount of entropy.

As an aside, I have been using the following one-line shell script to generate shuffled decks for testing:

$ echo $(for r in {2..9} T J Q K A; do for s in C D H S; do echo $r$s; done; done | sort --random-sort --random-source=/dev/random)
8D 3C 5H AC QH 9D QS TC 8C 2D JD JS 2C 9S 7C 3S 4D TH 4C TD 6D 6H KH 9C 4S 4H KC 5C 2S 7H 5D 5S 7S 3D AH AD QC KS 9H JH 8S 2H TS 8H 6C JC 7D AS KD QD 3H 6S
$ 

Obviously for generating proper entropy I'd use a physical deck, but I find that command useful for testing your new code.

@dooglus dooglus closed this as completed Nov 16, 2016
@dooglus
Copy link
Contributor Author

dooglus commented Nov 16, 2016

I just missed the 'comment' button and hit 'close and comment' by mistake. I'll reopen it now. Sorry!

@dooglus dooglus reopened this Nov 16, 2016
@dooglus
Copy link
Contributor Author

dooglus commented Nov 16, 2016

When I enter a full deck I see "event count: 52, bits per event: 5.7, total bits: 226"

Actually it's worse than that. By default it is generating a 27 word mnemonic from a full deck. But 226 bits should only give 21 words. Looks like it's using the full 52*5.7 = 296 bits to decide how many words to create.

@iancoleman
Copy link
Owner

I should have done it 'right' from the start by putting it in the entropy library rather than screw around mutating it in the ui logic...

  • 5c653a1 - Duplicate card detection is case insensitive
  • bbc29c8 - Missing cards are detected
  • b0c07d7 - Bits Per Event says 'or less' for card entropy

And the correction to converting cards to binary is in

  • 6422c1c - Entropy library assumes cards are discarded (and removed the duplicate logic from the UI logic.)

I'm not sure about the calculation of the binary value for cards. It scales the 296 bit value down to 226 bits - source code and illustration:

Concept to convert 296 bits to 226:

0 |-------------X--------------| 52^n
0 |---------X-----------| 52!/(52-n)!

Ratio of X between 0 and MAX is retained

@iancoleman
Copy link
Owner

To be continued using separate issues as they arise. See #33.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants