Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only produce Unknown names that are just long enough #7

Closed
rhagenson opened this issue Nov 1, 2019 · 3 comments
Closed

Only produce Unknown names that are just long enough #7

rhagenson opened this issue Nov 1, 2019 · 3 comments

Comments

@rhagenson
Copy link
Owner

rhagenson commented Nov 1, 2019

Produce names that are just long enough to cover all randomness. The necessary randomness would be enough to cover separating every known individual from every other known individual by --max-distance number of unknown individuals.

Characters needed  =ceiling( log(Number of possible unknown) / log(Number of possible characters) )

Number of possible unknown = maxDist * nKnown * (nKnown - 1) * 1/2

Number of possible characters = 26 = len("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
@rhagenson
Copy link
Owner Author

Because it is cheap and significantly reduces the possibility of random name clashes I would likely also add 1 to this value so there is always Number of possible characters * Characters needed possibilities.

@rhagenson
Copy link
Owner Author

gonum provides combinatorics calculations: https://godoc.org/gonum.org/v1/gonum/stat/combin#Binomial

@rhagenson
Copy link
Owner Author

Calculating the number of necessary characters to represent all possible unknowns with: 500 known individuals, all separated by a distance of 9, and 26 possible characters needs only 5 characters. How many are needed to represent 1000 known individuals, all separated by a distance of 9, and 26 possible characters?...5. 5^{26} = 1.49e18

The answer to having enough randomness is to just use 5 characters. Hand-rolling a way to produce 5 random characters also proved slow so I wen back to using xid, but I only take the final 6 (changing) characters. Why 6, because 6^{26} = 1.70e20 and I might as well add an additional character as it cheap to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant