Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #71152: mt_rand() returns the different values from original mt19937ar.c #1681

Merged
merged 2 commits into from
Feb 17, 2016

Conversation

kusano
Copy link
Contributor

@kusano kusano commented Dec 17, 2015

No description provided.

@cmb69
Copy link
Contributor

cmb69 commented Dec 24, 2015

Comparing this with Richard J. Wagner's code there's obviously a typo, so thanks for pointing that out. However, fixing the bug would introduce a BC break. Not sure, what to do.

@smalyshev smalyshev added the Bug label Dec 29, 2015
Conflicts:
	ext/standard/rand.c
@kusano
Copy link
Contributor Author

kusano commented Feb 16, 2016

In my opinion, breaking backward compatibility in this case is not a large issue, since commit a930751 have changed the behavior of the function and no one have complained about that until now.

@laruence
Copy link
Member

I agree, it's weird people would rely on what the random number self is...

@php-pulls php-pulls merged commit 07cae46 into php:master Feb 17, 2016
@kusano kusano deleted the fix-mt_rand branch February 17, 2016 07:54
@laruence
Copy link
Member

Unfortunately, This was reverted by: a0724d3 , you may want to reopen this. sorry & thanks

@lt
Copy link
Contributor

lt commented Feb 18, 2016

Sorry I was unaware this had a PR with some discussion already when I reverted. So here's a few things to add to the discussion

  • "It was broken before" is not a good reason to break it (BC) again.
  • Repeatable pseudo-random streams based on a seed have uses (the obvious example being a game of some sort with seeded levels). There are arguably better choices than MT but that doesn't mean people wont have used it
  • mt_srand actually has quite a high number of references when searching github, no idea how many of these are actually relevant
  • What is actually gained by making this change?

@andreasdotorg
Copy link

I think the main argument here is that Mersenne Twister, or more precisely, MT19337, is a well-known PRNG with well understood properties. Of particular interest is the high period of 2^19337 - 1, and the high level of equidistribution. The paper, having been published in 1998, has not only seen peer review, but also withstood the test of time since then. For reference, it's here: http://ww3.ticaret.edu.tr/mckasapbasi/files/2013/10/p3-matsumoto.pdf

Now the current implementation might or might have not the same properties. I don't know, you don't know, and even if somebody with a sufficient clue sits down and finds out, who knows if that person gets it right without sufficient peer review?

What this boils down to is that existing code has been written, assuming the documented properties of the code. You might have been breaking that code all along and don't even know about it. Making the change makes sure the algorithm is implemented as documented, and thus has the properties promised by the documentation and literature.

If it were my call, I'd introduce a set of legacy functions implementing the old behavior, to support users who absolutely have to rely on the old code. But mt_rand() really should work as advertised.

@kusano kusano restored the fix-mt_rand branch February 19, 2016 04:02
@jerrygrey
Copy link

@Andreas23 I disagree with making legacy functions, there is already too many functions for random numbers anyway, however, the typo still needs to be fixed. I think we should just treat it like any other BC break, push it to the master (which is 7.1) and document the change.

@kusano
Copy link
Contributor Author

kusano commented Feb 19, 2016

@lt As I noted in the bug ticket, this typo possibly makes a bad effect to randomness quality.
https://bugs.php.net/bug.php?id=71152
In addition, Mersenne Twister is a useful way to get the same random numbers in different languages.
People will be able to port games with the same randomly generated levels.

I think that mt_rand() should return numbers generated by "Mersenne Twister" and this fix should be merged eventually.
But I have no idea about when and how.
Changing the behavior in the same revision (7.0) might be too hasty.

@lt
Copy link
Contributor

lt commented Feb 19, 2016

@Andreas23 @kusano After looking at it for a bit, I'm inclined to agree that this probably does negatively affect the randomness. I initially hoped that although the numbers are different, the rest of the properties of MT are preserved, but I don't think that is the case.

After some quick and dirty analysis the entropy doesn't seem to be affected.

@Andreas23 I'd be against legacy functions in core, the implementation is trivial and can be implemented in userland if required.

@kusano If we're going to change the output of this function, there's a few more things we should look at to make sure we get it all done at once.

Firstly, opinions on switching to mt19937-64 (at least on 64 bit platforms - including your fix as-is for 32 bit)?

As it stands there is no good way to obtain 64 bit pseudo-random without resorting to the cryptographic random functions - this feels unnecessarily heavy handed. (binary concat of mt_rand outputs is another way, but you actually require 3 calls to it to get the full 63 bits as we're returning 31 bit values)

Which brings us onto the current scaling mechanism, which is pretty horrific.

Lastly, whatever changes are made, a note in UGRADING is a good idea.

@cmb69
Copy link
Contributor

cmb69 commented Feb 19, 2016

Which brings us onto the current scaling mechanism, which is pretty horrific.

I'd call it broken, see PR #1416.

@lt
Copy link
Contributor

lt commented Feb 19, 2016

Only just got around to catching up on internals, I see last night @nikic proposed a few things too.

If we're going to fix this "typo", we should fix all of the loose ends at the same time and get as much benefit out of the BC break as possible. I can get behind that.

@kusano
Copy link
Contributor Author

kusano commented Feb 20, 2016

@lt I want mt19937-64 with a different name like C++ to keep the same results in 32 bit and 64 bit platform.
http://www.cplusplus.com/reference/random/mt19937/
http://www.cplusplus.com/reference/random/mt19937_64/

@hikari-no-yume
Copy link
Contributor

Firstly, opinions on switching to mt19937-64 (at least on 64 bit platforms - including your fix as-is for 32 bit)?

I'd prefer consistency between 32-bit and 64-bit. Since fixing our MT implementation means breakage, we might as well go 64-bit on both platforms while we're at it. Modern compilers can emulate 64-bit when targeting 32-bit.

@pierrejoye
Copy link
Contributor

Just a quick note, since 5.2.1 we can change the results at wish as the same seed won't produce the same sequence anymore.

There was a need or maybe still has using rand () to generate same sets of "random" values like maps for games and similar predictable series.

@hikari-no-yume
Copy link
Contributor

Well, yes, but code might have relied on the post-5.2.1 behaviour. It's pretty unlikely, though. I'm probably overly cautious.

@pierrejoye
Copy link
Contributor

Wondering how it could rely on it as it is supposed to be not (easily)
predictable also one may rely on the sequences or frequency of some numbers
but that would be a call for troubles anyway :)
On Feb 21, 2016 5:49 PM, "Andrea Faulds" notifications@github.com wrote:

Well, yes, but code might have relied on the post-5.2.1 behaviour. It's
pretty unlikely, though. I'm probably overly cautious.


Reply to this email directly or view it on GitHub
#1681 (comment).

@hikari-no-yume
Copy link
Contributor

You would rely on it by assuming that, given a specific seed, it would always give the same sequence of numbers. You don't care about predictability so much as reproducibility.

@jesseschalken
Copy link

If it helps, here is an example: We use a mt_rand() to shuffle the items resulting from a search, but the results are paginated, so the same shuffled list has to be held between requests as the user changes pages. To achieve this, we store the seed in the database are re-use it to shuffle the list with each request, and once they hit the "Search" button again, a new seed is used and the list reshuffles.

In this case a change in mt_rand() output at worse surprises some users temporarily, but it is nonetheless a minor example of depending on post-5.2.1 mt_rand() output.

@pierrejoye
Copy link
Contributor

On Feb 21, 2016 7:38 PM, "Andrea Faulds" notifications@github.com wrote:

You would rely on it by assuming that, given a specific seed, it would
always give the same sequence of numbers. You don't care about
predictability so much as reproducibility.

Right and it does not, nor it should since 5.2.1, gives the same result or
sequence using the same seed.

And while it is a different topic, predictability is quite important (if it
is or it is not predictable).


Reply to this email directly or view it on GitHub.

@hikari-no-yume
Copy link
Contributor

Right and it does not, nor it should since 5.2.1, gives the same result or sequence using the same seed.

No, it gives the same sequence, just not the same ones < 5.2.1 produced:

$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)
$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)
$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)
$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)

mt_rand is a pseudo-random number generator. Given a particular seed, it will always produce the same sequence of numbers.

@pierrejoye
Copy link
Contributor

On Feb 22, 2016 2:49 AM, "Andrea Faulds" notifications@github.com wrote:

Right and it does not, nor it should since 5.2.1, gives the same result
or sequence using the same seed.

No, it gives the same sequence, just not the same ones < 5.2.1 produced:

Oh, that's bad and the doc is wrong too:

http://php.net/manual/en/function.mt-srand.php

See the note.

$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)
$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)
$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)
$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)


Reply to this email directly or view it on GitHub.

@pierrejoye
Copy link
Contributor

On Feb 22, 2016 11:50 AM, "Pierre Joye" pierre.php@gmail.com wrote:

On Feb 22, 2016 2:49 AM, "Andrea Faulds" notifications@github.com wrote:

Right and it does not, nor it should since 5.2.1, gives the same
result or sequence using the same seed.

No, it gives the same sequence, just not the same ones < 5.2.1 produced:

Oh, that's bad and the doc is wrong too:

http://php.net/manual/en/function.mt-srand.php

See the note.

In any case, what is the appealing reason to change the implementation?

If it is making it "more secure", then I cannot agree on changing it now :)

$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)
$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)
$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)
$ php -r 'mt_srand(11); var_dump(mt_rand(), mt_rand(), mt_rand());'
int(1750656698)
int(146687839)
int(41822760)


Reply to this email directly or view it on GitHub.

@jesseschalken
Copy link

@pierrejoye If mt_rand() is not intended to produce the same sequence of numbers for a given seed, then mt_srand() shouldn't even exist and this whole pull request is moot.

And then I will submit a feature request for a seedable PRNG.

@pierrejoye
Copy link
Contributor

On Mon, Feb 22, 2016 at 11:58 AM, Jesse Schalken notifications@github.com
wrote:

If mt_rand() is not intended to produce the same sequence of numbers for
a given seed, then mt_srand() shouldn't even exist and this whole pull
request is moot.

And then I will submit a feature request for a seedable PRNG.


Reply to this email directly or view it on GitHub
#1681 (comment).

Right, I misread the notice. Sorry :) The sequences changed after 5.2.1
when a given seed is used. I read it "does not produce anymore the same
result using the same seed" instead.

So yes, we should change that as there are known use cases of using same
seed to get the same sequences, changing the implementation again will
break them without any gains (mt_rand remains unsafe to use for anything
security related).

Pierre

@pierrejoye | http://www.libgd.org

@cmb69
Copy link
Contributor

cmb69 commented Feb 22, 2016

@pierrejoye The documentation states:

It uses a random number generator with known characteristics using the Mersenne Twister, […]

Also the name of the function hints at that. However, the current implementation doesn't implement the Mersenne Twister algorithm. If we don't change the implementation, at least the documentation has to be fixed.

@hikari-no-yume
Copy link
Contributor

I think we should fix the function. Mersenne Twister is a tried-and-tested algorithm, our buggy one which just happens to work isn't. I just think we shouldn't change it in 7.0.x or 5.6.x.

@youkidearitai
Copy link
Contributor

I think we should fix this function in 7.0.x or 5.6.x. Because PHP minor / major version up is ambigous that I saw language specification change in minor version up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet