Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[9.x] Use secure randomness in Arr:random and Arr:shuffle #46105

Merged
merged 8 commits into from
Feb 16, 2023

Conversation

valorin
Copy link
Contributor

@valorin valorin commented Feb 14, 2023

The Arr::random() and Arr:shuffle() methods relied upon PHP's array_rand() and shuffle() methods, neither of which were are cryptographically secure algorithms. As such, they were were not safe to use for any secure purposes, yet because they are core Laravel functions, they were being used as such.

To make things simpler for everyone, this changes those methods to use secure implementations. Allowing them to be safely used for secure purposes. I've PR'ed this on 9.x as it's technically a security fix, and I believe that's the oldest supported version?

  • Single value randomness uses random_int() and array_slice() to pull out a random value securely.
  • Multi-value randomness uses the new shuffle() method to shuffle the array and then slice off the requested number of items.
  • Shuffle uses a slightly modified version of the Fisher-Yates shuffle, to securely shuffle the values. I modified the algorithm to support PHP's magical array keys, since the pure algorithm expects numerical keys only.

I'm pretty confident this can now be considered a secure implementation, but please pick holes in it and check my implementations. It's better to be paranoid than make assumptions.

Note, I've left the seeded shuffle with the insecure methods for backwards compatibility. I haven't looked into securely seeding shuffles. It's probably something that could be done with the new https://www.php.net/manual/en/class.random-randomizer.php in PHP 8.2.

@driesvints driesvints changed the title Use secure randomness in Arr:random and Arr:shuffle [9.x] Use secure randomness in Arr:random and Arr:shuffle Feb 14, 2023
@vlakoff
Copy link
Contributor

vlakoff commented Feb 14, 2023

A quick benchmark showed this version of Arr::shuffle() is about 10x slower, and most of the time I don't think the user needs a cryptographically secure shuffle.

I would suggest adding a dedicated method Arr::secureShuffle(), which would also have the benefit of explicitly showing in user code a cryptographically secure shuffle is requested.

$keys = array_keys($array);
$shuffled = [];

for ($i = count($keys) - 1; $i >= 0; $i--) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given a zero-based array, the Fisher-Yates algorithm iterates from count - 1 down to 1.

Therefore, it should be: for ($i = count($keys) - 1; $i >= 1; $i--) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I just noticed that in 9f39ac6 you tweaked the original algorithm, so I can't say for sure…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the core counts down to 1 not 0, but since it shuffles in-place while I'm injecting into a new array to preserve keys, the final element was always dropped off the end. This change ensures the final element is added into the new array.

Since it shouldn't be preserving keys, I will go back to the original algorithm.

$shuffled = [];

foreach ($keys as $key) {
$shuffled[$key] = $array[$key];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote the code from scratch on my end, and got the same result, except that in this line I had:

$shuffled[] = $array[$key];

Indeed, shuffle() does not preserve the keys, not even the string keys.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the keys should not be preserved, we could use array_values(), so that $shuffled can be made in one iteration, instead of needing a second iteration to build it from the shuffled keys:

$shuffled = array_values($array);
for ($i = count($shuffled) - 1; $i >= 1; $i--) {
    $j = random_int(0, $i);
    $temp = $shuffled[$i];
    $shuffled[$i] = $shuffled[$j];
    $shuffled[$j] = $temp;
}
return $shuffled;

It is shorter, and a bit faster.

}

return $results;
return array_slice(static::shuffle($array), 0, $number, $preserveKeys);
Copy link
Contributor

@vlakoff vlakoff Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your current implementation of Arr::shuffle() changed from not preserving the keys, to preserving the keys. Changing it back to not preserving the keys, the above line in Arr::random() which makes use of Arr::shuffle() and array_slice() (with preserve_keys parameter) would no longer work, the code would need to be adjusted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my bad. I added in preserving keys because I needed it for the Arr:random($num) usage, and overlooked the fact that shuffle didn't preserve them. I'll split them out so random() can preserve it's keys and shuffle() doesn't.

@valorin
Copy link
Contributor Author

valorin commented Feb 15, 2023

Awesome feedback, thanks @vlakoff!

A quick benchmark showed this version of Arr::shuffle() is about 10x slower, and most of the time I don't think the user needs a cryptographically secure shuffle.
I would suggest adding a dedicated method Arr::secureShuffle(), which would also have the benefit of explicitly showing in user code a cryptographically secure shuffle is requested.

Personally, I'm ok with the performance hit because I prefer secure by default, and I know Laravel's helpers are never going to be as efficient as a core PHP function. However I can completely understand this isn't everyone's view, and the 16x 👍 votes your comment has is showing agreement with your point, so I'm happy to concede the point.

I'm not a fan of secureShuffle() because it feels like mysql_real_escape_string()... But I can't do much about that if the original function should be left insecure. I'll tweak it.

@valorin
Copy link
Contributor Author

valorin commented Feb 15, 2023

@vlakoff can you share your benchmarking details, showing the 10x performance hit?

@valorin
Copy link
Contributor Author

valorin commented Feb 15, 2023

I've run my own benchmarks.

Shuffling 1,000 items:
Original => 0.052ms
Updated => 0.373ms

Shuffling 1,000,000 items:
Original => 59.034m
Updated => 443.084ms

Random with 1,000 items:
Original => single item 0.003ms, 100 items 0.011ms
Updated=> single item 0.003ms, 100 items 0.043ms

Random with 1,000,000 items:
Original => single item 0.007ms, 1,000 items 3.789ms
Updated => single item 0.035ms, 1,000 items 4.453ms

Shuffle definitely sees a hit in performance for large numbers of items. Random less so, but it does depend on number of items you're pulling out (for obvious reasons).

@vlakoff
Copy link
Contributor

vlakoff commented Feb 15, 2023

(indeed, the 16 thumbs up surprised me as well)

can you share your benchmarking details, showing the 10x performance hit?

I just ran the following:

$nb = 1000;
$data = range(1, 5000);

$t1 = microtime(true);
for ($ii = $nb; $ii--; ) {
    shuffle_v1($data);
}
$t2 = microtime(true);
for ($ii = $nb; $ii--; ) {
    shuffle_v2($data);
}
$t3 = microtime(true);

echo $t2 - $t1;
echo "\n";
echo $t3 - $t2;
echo "\n\n";
echo $t2 - $t1 == 0 ? 'N/A' : ($t3 - $t2) * 100 / ($t2 - $t1);

function shuffle_v1($array) {
    shuffle($array);
    return $array;
}

function shuffle_v2($array) {
    $values = array_values($array);
    for ($i = count($values) - 1; $i >= 1; --$i) {
        $j = random_int(0, $i);
        $temp = $values[$i];
        $values[$i] = $values[$j];
        $values[$j] = $temp;
    }
    return $values;
}

Results:

0.11835885047913
1.1667959690094

985.81218412152

@taylorotwell
Copy link
Member

Shuffle is broken. Try this:

dd(Arr::shuffle(['foo', 'bar', 'baz']));

@taylorotwell taylorotwell marked this pull request as draft February 16, 2023 21:36
@valorin valorin marked this pull request as ready for review February 16, 2023 22:25
@taylorotwell taylorotwell merged commit 3300fed into laravel:9.x Feb 16, 2023
@vlakoff
Copy link
Contributor

vlakoff commented Feb 16, 2023

I would still suggest adding new methods secureRandom() and secureShuffle().

Case in point, currently the shuffle() method is secure without seed, but not secure with seed. I think for a given method, the level of security shouldn't be impacted by the presence (or not) of an optional parameter.

@taylorotwell
Copy link
Member

We don't currently document the seed parameter at all - but we can mention when passed it falls back to native PHP shuffle (which is not "secure").

@zeriyoshi
Copy link

@taylorotwell

This is a major breaking change!

The current random implementation of Laravel is highly dependent on the state of the PHP VM, and making this change is expected to break numerous systems. (includes laravel-octane!)

These problems are not only in Laravel, but in PHP as well, and the Randomizer class was implemented in PHP 8.2 to solve the problem.

https://wiki.php.net/rfc/rng_extension
https://wiki.php.net/rfc/random_extension_improvement

Here's an example of how it might break things.

Before

srand(1);
Arr::random([1, 2, 3]);
dump(rand(0, 100)); // 25
dump(rand(0, 100)); // 37
dump(rand(0, 100)); // 100

After

srand(1);
Arr::random([1, 2, 3]);
dump(rand(0, 100)); // 23
dump(rand(0, 100)); // 25
dump(rand(0, 100)); // 37

@valorin
Copy link
Contributor Author

valorin commented Feb 21, 2023

To clarify, you're saying any app that uses srand() to seed the output from Arr::random() is going to break?
How often does that actually happen? What's the use-case within laravel-octane that requires it?

@driesvints
Copy link
Member

@zeriyoshi your example just shows a different generated number. How does this break anything?

@taka-oyama
Copy link
Contributor

taka-oyama commented Feb 21, 2023

I have code that stores seeds in the database so I can achieve the same results everytime.

Adding this change will alter the expected results.

@zeriyoshi
Copy link

Please read the RFC on the PHP side for details, but anything that depends on the result will be broken.

In fact, as taylor indicates, the $seed parameter exists to achieve repeatability. The value generated from the passed seed value is fully reproducible.

I understand that in many cases, shuffle and random methods do not require repeatability. However, since the $seed argument already exists and is used for repeatability (yes, I already use it that way), I don't think such a change is appropriate.

@driesvints
Copy link
Member

@taka-oyama can you explain how this works? We're talking about generating random numbers. How do you use something like this to expect the same number each time you use it?

@driesvints
Copy link
Member

Thanks, I now see how this can be breaking. I'll follow this up with Taylor.

@zeriyoshi
Copy link

@driesvints

Even in the absence of seeding, the random state of the PHP VM is not changed when the CSPRNG is used, resulting in a BC Break.

It also extends beyond the boundaries of Laravel and even affects PHP functions.

mt_srand(1234);
mt_rand(); // 411284887
collect(range(0, 10))->shuffle();
mt_rand();
// before: UNKNOWN (shuffle() is re-seeding finishing itself)
// after: 1068724585 (fixed)

@zeriyoshi
Copy link

@valorin
In my opinion, these are important issues to the extent that they needed to be fixed in PHP 8.2. Unless there is a real world security issue, I don't think these should be changed unnecessarily until PHP 8.2 shows a safe way to migrate.

Also, in spite of recent questionable destructive changes to PHP, there is a move to deprecate srand / rand itself in PHP 8.3.

https://wiki.php.net/rfc/deprecations_php_8_3

@valorin
Copy link
Contributor Author

valorin commented Feb 21, 2023

@zeriyoshi Does the polyfill solve that problem, or is it still an issue that needs to wait for 8.2?

Couldn't this still be reverted in 9.x but kept in 10.x as a breaking change?

@zeriyoshi
Copy link

@valorin
Unfortunately, it is not possible to resolve this issue while maintaining full compatibility....
This is due to the PHP implementation in the first place.

I am not familiar with Laravel, but do we need to get this fix in ASAP?

PHP's random implementation has been messed up for a long time. srand / rand is one of them and has always been used incorrectly and dangerously.

Can't we just change Randomizer to accept optional arguments in 11.x? (or Accept BC with polyfill in 10.x)

I am sorry my English is so messed up. I have no intention to offend you in any way.

@taylorotwell
Copy link
Member

Is the breaking change primarily with Arr::random or with Arr::shuffle?

@zeriyoshi
Copy link

zeriyoshi commented Feb 21, 2023

Both. Simply call the following function to change PHP's internal random number sequence. They should not be call, nor should you stop calling them.

  • array_rand
  • str_shuffle
  • shuffle
  • mt_rand
  • rand
  • mt_srand
  • srand

@valorin
Copy link
Contributor Author

valorin commented Feb 21, 2023

I am sorry my English is so messed up. I have no intention to offend you in any way.

Not at all! You've raised a very important issue, and I've been asking many questions to try and understand the extent of it. 🙂
(Also, your English is great!)

To put the issue another way: every time one of those randomness methods is called, it increments the internal random sequence. Normally this isn't an issue, but if you've seeded the random generator (using srand()), then the sequence is supposed to be predictable and repeatable. By changing the implementation of Arr::shuffle() and Arr::ranndom(), I've changed what methods are called, resulting in unexpected values to be returned from those methods specifically and any subsequent calls to those randomness methods.

The only way to fix it in 9.x is to rollback the change and use the original methods.

We can still use the new methods in 10.x, but it would need to be marked down as a breaking change. Anyone who relies on seed values would need to reimplement the original methods to regain the original expected sequence.
We could provide Arr::fastShuffle() and Arr::fastRandom() to provide access to those legacy implementations for an easier upgrader, or we could just roll back the change in 10 too, and look at Laravel 11 with the new Randomizer class on PHP 8.2. (We may be able to bring this forward into 10 with the polyfill and avoid a breaking change.)

Note that the $seed value on shuffle() isn't relevant here as srand() is global seed across all randomness method calls.

So I guess the question is: does it go into 10 as a breaking change, or be rolled back?

@taylorotwell
Copy link
Member

taylorotwell commented Feb 21, 2023

I'm personally of the opinion that the change to Arr::random isn't breaking. We never advertised how it worked internally - only that it returns random elements. The fact it could be tricked to return the same result every time is sort of a coincidence in my view.

Arr::shuffle is a bit more interesting since we did expose a "$seed" variable.

@zeriyoshi
Copy link

@taylorotwell
Indeed, But if there is no need to guarantee against that "randomness", perhaps using the existing MT would be sufficient. Should this implementation be changed, even to the point of losing compatibility with projects using Laravel?

Of particular concern to me is the impact on Laravel Octane. This is done using Swoole, which takes over the random number state at the time the process fork occurs. There is an unintentional bias of some sort, and something could break as a result due to this modification.

I think it would be better to leave it as is until 10.x and modify it in 11.x to accept the Randomizer as an argument. This would provide the user with an appropriate migration path.

@taylorotwell
Copy link
Member

I’m sorry - I’m not following at all how Arr::random and shuffle are going to break Laravel Octane?

@zeriyoshi
Copy link

@taylorotwell
Is it possible to at least not include this in the v9.52.1 & v10.1.0 releases? I will try to explain what the problem is in as much detail as possible in writing, but apparently there is no time right now. Once the release is complete, there is no going back.

@derekmd
Copy link
Contributor

derekmd commented Feb 21, 2023

$keys = array_keys($array);

// ...

$shuffled[] = $array[$keys[0]];

Arr::shuffle([]) now throws an exception when empty since it assumes at least one item exists. Makes more sense in this context:

$items = collect($searchResults)
    ->filter->hasLicenseIn($state)
    ->shuffle(); // may be empty

@taylorotwell
Copy link
Member

@derekmd fixing now

@taylorotwell
Copy link
Member

@derekmd fixed and new patch tagged.

@taylorotwell
Copy link
Member

@zeriyoshi glad to keep looking at this further but I don't feel like we have a clear example of how Laravel Octane is broken by this?

@TimWolla
Copy link

I am not affected by this PR personally, chiming in as one of the listed maintainers for PHP's randomness functionality and the person who authored the randomness documentation for the new API in PHP 8.2. I've used Laravel before, but don't have deep knowledge about its internals.

I'm personally of the opinion that the change to Arr::random isn't breaking. We never advertised how it worked internally - only that it returns random elements. The fact it could be tricked to return the same result every time is sort of a coincidence in my view.

Hyrum's Law likely applies here. I would recommend reverting this for now, because it certainly is an unexpected change for a minor release and might break user applications that rely on this behavior, even if it is not explicitly guaranteed.

Once you raise the minimum PHP version to PHP 8.2, it would likely make sense to make the Random\Engine available in the DI container and use that to create a Random\Randomizer that does the heavy lifting (possibly also stored in the DI container).

The Random\Engine\Secure engine would be a useful default for production (it allows you to cryptographically shuffle an array / pick an array key, but likely more efficient than this purely userland implementation). The engine in the DI container can then be replaced by a seedable engine for tests and an optional engine parameter for a single method call would allow users to select a specific engine for special use cases without also replacing the global Secure engine that is used to e.g. generate passwords.

The Random\Engine\Mt19937 engine is fully compatible with global Mersenne Twister (mt_rand, rand, shuffle, array_rand, str_shuffle) when combined with Random\Randomizer.

@zeriyoshi
Copy link

@taylorotwell
The mt_srand / mt_rand notes on Swoole are described in the Swoole documentation. (I understand what happens, but note that I am not familiar with the whole Swoole)

https://wiki.swoole.com/#/getting_started/notice?id=mt_rand%e9%9a%8f%e6%9c%ba%e6%95%b0

In Swoole, the concept of parent and child processes exists. The parent process is PHP itself, and the child processes are processes created from it by fork(2).

Currently PHP stores the internal state of the random number generator, Mersenne Twister, in the PHP VM (think of it as the global variables).

Also, once Mersenne Twister has seeded, it will reuse the generated sequence for a while.

This means that if the parent process initializes the global Mersenne Twister on the PHP VM (global variables), the child process will always generate the same sequence of random numbers. This is a very dangerous behavior, and to prevent it, the child process must manually re-seed with mt_srand as appropriate in the child process to prevent it.

This problem probably applies to Laravel Octane on Swoole as well.

The current Laravel implementation calls mt_rand / mt_srand, but if this is replaced by random_int and the state of Mersenne Twister on the PHP VM is not changed, the child processes that were not properly reseeded by mt_srand that have not been properly reseeded with mt_srand may unintentionally continue to generate the same random numbers.

@zeriyoshi
Copy link

You might think that it would be better if the parent process did not seed Mersenne Twister, but the problem is more complex, and PHP often unintentionally initializes the state of Mersenne Twister. For example, str_shuffle("foobar"); just this.

@jackmcdade
Copy link

jackmcdade commented Feb 21, 2023

Hyrum's Law likely applies here. I would recommend reverting this for now, because it certainly is an unexpected change for a minor release and might break user applications that rely on this behavior, even if it is not explicitly guaranteed.

This change broke some tests in Statamic, just FYI. I'm sure we can rewrite/rework things because this release will be a breaking change for our users, but just pointing out that it might be a little bit of a smoking gun that a minor patch update broke our build.

Edit: Wasn't actually this change, per-sé that broke the test.

@zeriyoshi
Copy link

  • The current PHP random implementation is completely incorrect. (I believe that a workaround is not reasonable without a CVE)
  • Therefore, until PHP 8.2 is available, no changes should be made to the random number handling.
  • Polyfill cannot reproduce the exact same behavior due to PHP implementation reasons, and is much slower to execute.

In 11.x, I recommend to change (with BC Break) ?int $seed = null in Arr/Collection/LazyCollection::shuffle to ?Randomizer = null and add a new ?Randomizer = null argument to Arr/Collection/LazyCollection::random.

This ensures perfect migration routes for all and allows users to write secure code even if they do not intend to.

@zeriyoshi
Copy link

@taylorotwell
Just adding composer.json -> autoload.files in the following code broke the test for the Laravel framework.

<?php
spl_autoload_register(function (string $_): void {
    mt_srand(random_int(0, mt_getrandmax()));
});
1) Illuminate\Tests\Foundation\FoundationHelpersTest::testFakeUsesLocale
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-'Arkansas'
+'Ohio'

/Users/g-kudo/Development/framework/tests/Foundation/FoundationHelpersTest.php:263

PHP's current random number behavior is based on a miraculous balance. Do you really want to break this?

@valorin
Copy link
Contributor Author

valorin commented Feb 22, 2023

This problem probably applies to Laravel Octane on Swoole as well.
The current Laravel implementation calls mt_rand / mt_srand, but if this is replaced by random_int and the state of Mersenne Twister on the PHP VM is not changed, the child processes that were not properly reseeded by mt_srand that have not been properly reseeded with mt_srand may unintentionally continue to generate the same random numbers.

@zeriyoshi This argument makes no sense though. You're saying that Laravel Octane uses srand() to ensure randomness when it spawns off a child process, because PHP's implementation of mt_rand() is flawed. But by swapping out mt_rand() for random_int(), we're introducing more randomness not less. We're not breaking the seeding, that's still in-place for any insecure implementations.

PHP's current random number behavior is based on a miraculous balance. Do you really want to break this?

Which is why it needs to be fixed. It should be purely random values by default, and a proper way to seed values implemented.
(This is where the 8.2 Randomizer comes in.)

@taylorotwell I think it's probably best to rollback this change in both 9.x and 10.x, and work on a proper fix for 11.x with PHP 8.2's new stuff. There isn't a critical issue that we're fixing by keeping it in 10.x - the random numbers aren't cryptographically secure, but predicting them isn't trivial, so it's not overly exploitable. We can save the weird pain now, and do it properly next time.

@taylorotwell
Copy link
Member

Reverting this for now ... can introduce a Illuminate\Support\Random class with methods like array and shuffle.

@zeriyoshi
Copy link

Thank you for your understanding. Now many applications will be able to migrate properly!
That said, a secure implementation would be nice to have too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants