[9.x] Use secure randomness in Arr:random and Arr:shuffle #46105
A quick benchmark showed this version of I would suggest adding a dedicated method |
src/Illuminate/Collections/Arr.php
Outdated
$keys = array_keys($array);
$shuffled = [];

for ($i = count($keys) - 1; $i >= 0; $i--) {
Given a zero-based array, the Fisher-Yates algorithm iterates from count - 1 down to 1.
Therefore, it should be: for ($i = count($keys) - 1; $i >= 1; $i--) {
Ah, I just noticed that in 9f39ac6 you tweaked the original algorithm, so I can't say for sure…
Yep, the core counts down to 1, not 0, but since it shuffles in place while I'm injecting into a new array to preserve keys, the final element was always dropped off the end. This change ensures the final element is added into the new array.
Since it shouldn't be preserving keys, I will go back to the original algorithm.
src/Illuminate/Collections/Arr.php
Outdated
$shuffled = [];

foreach ($keys as $key) {
    $shuffled[$key] = $array[$key];
I wrote the code from scratch on my end, and got the same result, except that in this line I had:
$shuffled[] = $array[$key];
Indeed, shuffle() does not preserve the keys, not even the string keys.
As the keys should not be preserved, we could use array_values(), so that $shuffled can be made in one iteration, instead of needing a second iteration to build it from the shuffled keys:
$shuffled = array_values($array);
for ($i = count($shuffled) - 1; $i >= 1; $i--) {
    $j = random_int(0, $i);
    $temp = $shuffled[$i];
    $shuffled[$i] = $shuffled[$j];
    $shuffled[$j] = $temp;
}
return $shuffled;
It is shorter, and a bit faster.
src/Illuminate/Collections/Arr.php
Outdated
}

return $results;
return array_slice(static::shuffle($array), 0, $number, $preserveKeys);
Your current implementation of Arr::shuffle() changed from not preserving the keys to preserving them. If it is changed back to not preserving the keys, the above line in Arr::random(), which makes use of Arr::shuffle() and array_slice() (with the preserve_keys parameter), would no longer work, so the code would need to be adjusted.
Yeah, my bad. I added in preserving keys because I needed it for the Arr::random($num) usage, and overlooked the fact that shuffle didn't preserve them. I'll split them out so random() can preserve its keys and shuffle() doesn't.
Awesome feedback, thanks @vlakoff!
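For illustration, a minimal sketch of how the split described above might look: the keys are shuffled with random_int() and then mapped back to values, so keys can be kept or dropped on request. The helper name secure_random_items() and its parameters are hypothetical and not part of Laravel or this PR.

```php
<?php

/**
 * Hypothetical helper (not Laravel's API): pick $number random items
 * using random_int(), optionally preserving the original keys.
 */
function secure_random_items(array $array, int $number, bool $preserveKeys = false): array
{
    $count = count($array);

    if ($number < 0 || $number > $count) {
        throw new InvalidArgumentException(
            "You requested {$number} items, but there are only {$count} items available."
        );
    }

    // Shuffle the keys (not the values) with an in-place Fisher-Yates pass,
    // so the original key of every picked item is still known afterwards.
    $keys = array_keys($array);

    for ($i = $count - 1; $i >= 1; $i--) {
        $j = random_int(0, $i);
        [$keys[$i], $keys[$j]] = [$keys[$j], $keys[$i]];
    }

    $picked = [];

    // Take the first $number shuffled keys and look up their values.
    foreach (array_slice($keys, 0, $number) as $key) {
        if ($preserveKeys) {
            $picked[$key] = $array[$key];
        } else {
            $picked[] = $array[$key];
        }
    }

    return $picked;
}
```

Calling secure_random_items(['a' => 1, 'b' => 2, 'c' => 3], 2, true) would return two of the original key/value pairs, while omitting the third argument returns the two values re-indexed from 0.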
Personally, I'm ok with the performance hit because I prefer secure by default, and I know Laravel's helpers are never going to be as efficient as a core PHP function. However I can completely understand this isn't everyone's view, and the 16x 👍 votes your comment has is showing agreement with your point, so I'm happy to concede the point. I'm not a fan of |
@vlakoff can you share your benchmarking details, showing the 10x performance hit? |
I've run my own benchmarks. Shuffling 1,000 items: Shuffling 1,000,000 items: Random with 1,000 items: Random with 1,000,000 items: Shuffle definitely sees a hit in performance for large numbers of items. Random less so, but it does depend on number of items you're pulling out (for obvious reasons). |
(indeed, the 16 thumbs up surprised me as well)
I just ran the following:
$nb = 1000;
$data = range(1, 5000);
// Time $nb runs of the native shuffle() wrapper.
$t1 = microtime(true);
for ($ii = $nb; $ii--; ) {
    shuffle_v1($data);
}
$t2 = microtime(true);
// Time $nb runs of the random_int()-based Fisher-Yates shuffle.
for ($ii = $nb; $ii--; ) {
    shuffle_v2($data);
}
$t3 = microtime(true);
echo $t2 - $t1;
echo "\n";
echo $t3 - $t2;
echo "\n\n";
// Time of shuffle_v2 as a percentage of the time of shuffle_v1.
echo $t2 - $t1 == 0 ? 'N/A' : ($t3 - $t2) * 100 / ($t2 - $t1);
function shuffle_v1($array) {
    shuffle($array);
    return $array;
}
function shuffle_v2($array) {
    $values = array_values($array);
    for ($i = count($values) - 1; $i >= 1; --$i) {
        $j = random_int(0, $i);
        $temp = $values[$i];
        $values[$i] = $values[$j];
        $values[$j] = $temp;
    }
    return $values;
}
Results:
|
Shuffle is broken. Try this: dd(Arr::shuffle(['foo', 'bar', 'baz'])); |
I would still suggest adding new methods Case in point, currently the |
We don't currently document the |
This is a major breaking change! The current random implementation of Laravel is highly dependent on the state of the PHP VM, and making this change is expected to break numerous systems. (includes laravel-octane!) These problems are not only in Laravel, but in PHP as well, and the Randomizer class was implemented in PHP 8.2 to solve the problem. https://wiki.php.net/rfc/rng_extension Here's an example of how it might break things.
Before
srand(1);
Arr::random([1, 2, 3]);
dump(rand(0, 100)); // 25
dump(rand(0, 100)); // 37
dump(rand(0, 100)); // 100
After
srand(1);
Arr::random([1, 2, 3]);
dump(rand(0, 100)); // 23
dump(rand(0, 100)); // 25
dump(rand(0, 100)); // 37
To clarify, you're saying any app that uses |
@zeriyoshi your example just shows a different generated number. How does this break anything? |
I have code that stores seeds in the database so I can achieve the same results every time. Adding this change will alter the expected results. |
Please read the RFC on the PHP side for details, but anything that depends on the result will be broken. In fact, as taylor indicates, the $seed parameter exists to achieve repeatability. The value generated from the passed seed value is fully reproducible. I understand that in many cases, shuffle and random methods do not require repeatability. However, since the $seed argument already exists and is used for repeatability (yes, I already use it that way), I don't think such a change is appropriate. |
@taka-oyama can you explain how this works? We're talking about generating random numbers. How do you use something like this to expect the same number each time you use it? |
Thanks, I now see how this can be breaking. I'll follow this up with Taylor. |
Even in the absence of seeding, the random state of the PHP VM is not changed when the CSPRNG is used, resulting in a BC Break. It also extends beyond the boundaries of Laravel and even affects PHP functions.
mt_srand(1234);
mt_rand(); // 411284887
collect(range(0, 10))->shuffle();
mt_rand();
// before: UNKNOWN (shuffle() is re-seeding finishing itself)
// after: 1068724585 (fixed)
@valorin Also, in spite of recent questionable destructive changes to PHP, there is a move to deprecate srand / rand itself in PHP 8.3. |
@zeriyoshi Does the polyfill solve that problem, or is it still an issue that needs to wait for 8.2? Couldn't this still be reverted in 9.x but kept in 10.x as a breaking change? |
@valorin I am not familiar with Laravel, but do we need to get this fix in ASAP? PHP's random implementation has been messed up for a long time. Can't we just change Randomizer to accept optional arguments in 11.x? (or Accept BC with polyfill in 10.x) I am sorry my English is so messed up. I have no intention to offend you in any way. |
Is the breaking change primarily with Arr::random or with Arr::shuffle? |
Both. Simply calling the following functions changes PHP's internal random number sequence. Calls to them should not be added, nor should existing calls be removed.
|
Not at all! You've raised a very important issue, and I've been asking many questions to try and understand the extent of it. 🙂 To put the issue another way: every time one of those randomness methods is called, it increments the internal random sequence. Normally this isn't an issue, but if you've seeded the random generator (using The only way to fix it in 9.x is to rollback the change and use the original methods. We can still use the new methods in 10.x, but it would need to be marked down as a breaking change. Anyone who relies on seed values would need to reimplement the original methods to regain the original expected sequence. Note that the So I guess the question is: does it go into 10 as a breaking change, or be rolled back? |
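To make the point about the internal sequence concrete, here is a small sketch using only native PHP functions. It assumes nothing about Laravel's internals: it just compares what a seeded mt_rand() returns after an interleaved random_int() call versus an interleaved native shuffle() call.

```php
<?php

// Baseline: the first two values of the seeded Mersenne Twister sequence.
mt_srand(1234);
$first = mt_rand();
$second = mt_rand();

// random_int() reads from the CSPRNG and leaves the global
// Mersenne Twister state alone, so the seeded sequence is undisturbed.
mt_srand(1234);
mt_rand();
random_int(0, 10);
$afterCsprng = mt_rand();

// Native shuffle() draws from the global Mersenne Twister state,
// so the next mt_rand() lands further along the seeded sequence.
mt_srand(1234);
mt_rand();
$data = range(0, 10);
shuffle($data);
$afterShuffle = mt_rand();

var_dump($afterCsprng === $second);  // sequence unchanged by random_int()
var_dump($afterShuffle === $second); // sequence shifted by shuffle()
```

Swapping an implementation between these two families therefore changes every later value an application reads from a seeded generator, which is the breakage described in this thread.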
I'm personally of the opinion that the change to
|
@taylorotwell Of particular concern to me is the impact on Laravel Octane. This is done using Swoole, which takes over the random number state at the time the process fork occurs. There is an unintentional bias of some sort, and something could break as a result due to this modification. I think it would be better to leave it as is until 10.x and modify it in 11.x to accept the Randomizer as an argument. This would provide the user with an appropriate migration path. |
I’m sorry - I’m not following at all how Arr::random and shuffle are going to break Laravel Octane? |
@taylorotwell |
$keys = array_keys($array);
// ...
$shuffled[] = $array[$keys[0]];
$items = collect($searchResults)
->filter->hasLicenseIn($state)
->shuffle(); // may be empty |
@derekmd fixing now |
@derekmd fixed and new patch tagged. |
@zeriyoshi glad to keep looking at this further but I don't feel like we have a clear example of how Laravel Octane is broken by this? |
I am not affected by this PR personally, chiming in as one of the listed maintainers for PHP's randomness functionality and the person who authored the randomness documentation for the new API in PHP 8.2. I've used Laravel before, but don't have deep knowledge about its internals.
Hyrum's Law likely applies here. I would recommend reverting this for now, because it certainly is an unexpected change for a minor release and might break user applications that rely on this behavior, even if it is not explicitly guaranteed. Once you raise the minimum PHP version to PHP 8.2, it would likely make sense to make the The The |
@taylorotwell https://wiki.swoole.com/#/getting_started/notice?id=mt_rand%e9%9a%8f%e6%9c%ba%e6%95%b0 In Swoole, the concept of parent and child processes exists. The parent process is PHP itself, and the child processes are processes created from it by Currently PHP stores the internal state of the random number generator, Mersenne Twister, in the PHP VM (think of it as the global variables). Also, once Mersenne Twister has been seeded, it will reuse the generated sequence for a while. This means that if the parent process initializes the global Mersenne Twister on the PHP VM (global variables), the child process will always generate the same sequence of random numbers. This is very dangerous behavior, and to prevent it, the child process must manually re-seed with mt_srand as appropriate. This problem probably applies to Laravel Octane on Swoole as well. The current Laravel implementation calls mt_rand / mt_srand, but if this is replaced by random_int and the state of Mersenne Twister on the PHP VM is no longer changed, child processes that have not been properly reseeded with mt_srand may unintentionally continue to generate the same random numbers. |
You might think that it would be better if the parent process did not seed Mersenne Twister, but the problem is more complex, and PHP often unintentionally initializes the state of Mersenne Twister. For example, |
This change broke some tests in Statamic, just FYI. I'm sure we can rewrite/rework things because this release will be a breaking change for our users, but just pointing out that it might be a little bit of a smoking gun that a minor patch update broke our build. Edit: It wasn't actually this change, per se, that broke the test. |
In 11.x, I recommend to change (with BC Break) This ensures perfect migration routes for all and allows users to write secure code even if they do not intend to. |
@taylorotwell
<?php
spl_autoload_register(function (string $_): void {
    mt_srand(random_int(0, mt_getrandmax()));
});
PHP's current random number behavior is based on a miraculous balance. Do you really want to break this? |
@zeriyoshi This argument makes no sense though. You're saying that Laravel Octane uses
Which is why it needs to be fixed. It should be purely random values by default, and a proper way to seed values implemented. @taylorotwell I think it's probably best to rollback this change in both 9.x and 10.x, and work on a proper fix for 11.x with PHP 8.2's new stuff. There isn't a critical issue that we're fixing by keeping it in 10.x - the random numbers aren't cryptographically secure, but predicting them isn't trivial, so it's not overly exploitable. We can save the weird pain now, and do it properly next time. |
Reverting this for now ... can introduce a |
Thank you for your understanding. Now many applications will be able to migrate properly! |
The Arr::random() and Arr::shuffle() methods relied upon PHP's array_rand() and shuffle() functions, neither of which is a cryptographically secure algorithm. As such, they were not safe to use for any secure purposes, yet because they are core Laravel functions, they were being used as such.
To make things simpler for everyone, this changes those methods to use secure implementations, allowing them to be safely used for secure purposes. I've PR'ed this on 9.x as it's technically a security fix, and I believe that's the oldest supported version?
random_int() and array_slice() to pull out a random value securely.
shuffle() method to shuffle the array and then slice off the requested number of items.
I'm pretty confident this can now be considered a secure implementation, but please pick holes in it and check my implementations. It's better to be paranoid than make assumptions.
Note, I've left the seeded shuffle with the insecure methods for backwards compatibility. I haven't looked into securely seeding shuffles. It's probably something that could be done with the new https://www.php.net/manual/en/class.random-randomizer.php in PHP 8.2.
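As a rough sketch of what a Randomizer-based approach could look like on PHP 8.2 or later: the helper below accepts an optional Random\Randomizer, falling back to the default (CSPRNG-backed) engine when none is given, and stays reproducible when the caller passes a seeded engine. The function name shuffle_with() is hypothetical; it is not an existing Laravel method and not something this PR adds.

```php
<?php

use Random\Engine\Mt19937;
use Random\Randomizer;

/**
 * Hypothetical helper (requires PHP 8.2+): shuffle with a caller-supplied
 * Randomizer. With no argument, Randomizer uses its default Secure engine.
 */
function shuffle_with(array $array, ?Randomizer $randomizer = null): array
{
    $randomizer ??= new Randomizer();

    // shuffleArray() returns a new, re-indexed array; keys are not preserved.
    return $randomizer->shuffleArray($array);
}

$items = range(1, 10);

// Secure by default: no seed, no dependence on global mt_rand() state.
$secure = shuffle_with($items);

// Reproducible on request: the same seeded engine yields the same order.
$a = shuffle_with($items, new Randomizer(new Mt19937(1234)));
$b = shuffle_with($items, new Randomizer(new Mt19937(1234)));

var_dump($a === $b);
```

Because each Randomizer carries its own engine state, seeded and unseeded callers would not interfere with each other or with the global generator.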