Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[9.x] Random function doesn't generate evenly distributed random chars #45916

Merged
merged 4 commits into from
Feb 2, 2023

Conversation

janh-kramer
Copy link
Contributor

The random function does not generate evenly distributed random characters particularly for short strings (<12 chars). For example, if exactly 1 character is generated, the letters A, Q, g and w are significantly overrepresented.

This is due to the conversion of the randomly generated bytes using the base64 function. If the number of bytes is not divisible by three, zeros are added at the end in the algorithm (see https://en.wikipedia.org/wiki/Base64).

This leads to the increased frequency of A (000000), Q (010000), g (100000) and w (110000).

Evenly distributed random characters are obtained if it is ensured that the number of generated random bytes modulo 3 is equal to 0:

$bytesSize = (int) ceil(($size) / 3) * 3;

@driesvints driesvints changed the title Bugfix: random function doesn't generate evenly distributed random chars [10.x] Random function doesn't generate evenly distributed random chars Feb 2, 2023
@taylorotwell taylorotwell merged commit 4f2e46c into laravel:9.x Feb 2, 2023
@janh-kramer
Copy link
Contributor Author

@taylorotwell, When changing the test, you forgot to adjust the delta, so now the test will always pass.

@taylorotwell
Copy link
Member

ok, what should I change the delta to?

@janh-kramer
Copy link
Contributor Author

At most 200, could even be less

@janh-kramer
Copy link
Contributor Author

Should I target this to 10.x?

@taylorotwell
Copy link
Member

Sure

@giggsey
Copy link
Contributor

giggsey commented Feb 3, 2023

It's a minor complaint, but can the comments be updated to match the change in the iterations in 30681bc

        // take 6.200.000 samples, because there are 62 different characters
        for ($i = 0; $i < 620000; $i++) {

@driesvints driesvints changed the title [10.x] Random function doesn't generate evenly distributed random chars [9.x] Random function doesn't generate evenly distributed random chars Feb 3, 2023
@driesvints driesvints removed the bug-fix label Feb 3, 2023
@driesvints
Copy link
Member

@janh-kramer this is already merged into 10.x 👍

@janh-kramer
Copy link
Contributor Author

Is this PR then accepted as a candidate for the Bughunt?

@driesvints
Copy link
Member

It was sent to 9.x so no sorry.

@janh-kramer
Copy link
Contributor Author

come on ..

@driesvints
Copy link
Member

I'm very sorry @janh-kramer but the rules are very clear on this. Only bug fixes sent to 10.x are eligible. The contest is specifically to figure out breaking changes in Laravel v10. Bugs that already exist in 9.x need to be solved on that version.

Please see https://blog.laravel.com/laravel-v10-bug-hunt

@janh-kramer
Copy link
Contributor Author

no problem thanks, it was just a try. Glad to contribute the first time

@ziming
Copy link
Contributor

ziming commented Feb 3, 2023

I dont see any harm in allowing. not like 1 counted PR will win the contest. plus 7 feb is near. but that's just an opinion

@driesvints
Copy link
Member

@janh-kramer we very much appreciate your contribution 👍

@ziming the problem with allowing one now would be unfair to everyone else who has sent in bug fixes to 9.x in the past weeks.

@ziming
Copy link
Contributor

ziming commented Feb 3, 2023

Ah i see. :)

@GrahamCampbell
Copy link
Member

This seems overly complex? Why not just base62 encode the output (by which i mean, divide and ciel the number of chars to determine the number of bytes we need, gen the bytes, base62 encode the output, then trim to the correct length - which will remove either 0 or 1 chars off of the end)?

@GrahamCampbell
Copy link
Member

The reason this is correct is that bsse62 can be thought of as stretching out the bytes over the charset we want, which means we get the perfect distribution we want.

@taylorotwell
Copy link
Member

@janh-kramer let us think about the bug hunt stuff - so far we only have 3 bug fixes contributed to 10.x which feels a bit lame to me and I think it would be better to include 9.x as well personally. Will revisit this.

chu121su12 pushed a commit to chu121su12/framework that referenced this pull request Feb 6, 2023
…rs (laravel#45916)

* Bugfix: The random function does not generate evenly distributed random characters particularly for short strings (<12 chars)

* fixed styling

* fixed styling

* formatting

---------

Co-authored-by: Jan Kramer <j.kramer@codenker.de>
Co-authored-by: Taylor Otwell <taylor@laravel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants