Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checking your math... #1

Closed
bwbug opened this issue Aug 6, 2022 · 2 comments
Closed

Checking your math... #1

bwbug opened this issue Aug 6, 2022 · 2 comments

Comments

@bwbug
Copy link

bwbug commented Aug 6, 2022

In The Similar-Words Problem in your Readme, you wrote:

If we assumed a hypothetical 18,000 word list that was just 9,000 words and their plurals, I think the odds of getting at least one "awkward double" in a 4-word passphrase is (1/18000) * (2/18000) * (3/18000), which is a really small number. But check my math!

Although your conclusion is correct ("the odds...is a really small number"), the odds of this happening is over 600 million times more probable than what you have estimated.

The correct probability is 1/9000 + 2/9000 + 3/9000 - 11/9000**2 + 6/9000**3.

To prove this for a word list containing N words and their plurals (2 N words total), if P1 is the probability of getting at least one "awkward double", and if P0 is the probability of getting no awkward doubles, then

P1 =1 - P0

The probability if getting no awkward doubles (P0) is the number of passphrases containing only unique stems (i.e., once a word has been selected, it cannot be reselected itself, and neither can its conjugate -- the plural or singular form, whichever was not picked in the previous selection), divided by the total number of possible passphrases. For a passphrase consisting of k words, the total number of passphrases is

Ntotal = (2 N)k

To compute the number of passphrases containing only unique stems, the size of the word pool is reduced by 2 each time a word is selected (because the word itself is eliminated from further consideration, as is the plural/singular form of that word):

Nunique = (2 N)(2 N - 2)(2 N - 4) ... (2 N - 2 k +2)

Therefore, the probability of getting only unique stems is

P0 = Nunique/Ntotal = (1 - (1/N))(1 - (2/N))...(1 - (k-1)/N)

Therefore, the general solution for the probability of getting at least one "awkward double" is

P1 = 1 - (1 - (1/N))(1 - (2/N))...(1 - (k-1)/N)

For k=4, the math works out to the following result:

P1 = 6/N - 11/N2 + 6/N3

If, in the general solution, one neglects higher-order terms (N-2, N-3, etc.), the following approximate solution is obtained:

P1 ≈ 1/N + 2/N + ... + (k - 1)/N = (k(k - 1))/(2 N)

@sts10
Copy link
Owner

sts10 commented Aug 6, 2022

I can't quite follow this all the way through right now, but I trust your work! README has been updated. Thank you so much!

@sts10 sts10 closed this as completed Aug 6, 2022
@bwbug
Copy link
Author

bwbug commented Aug 6, 2022

Thanks! Let me know if you want me to try to clarify any particular steps of the derivation -- I did skip over some algebra steps in a few of the equations above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants