Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Mersenne Twister RNG with a PCG family algorithm #82948

Closed
dyedgreen mannequin opened this issue Nov 11, 2019 · 7 comments
Closed

Replace Mersenne Twister RNG with a PCG family algorithm #82948

dyedgreen mannequin opened this issue Nov 11, 2019 · 7 comments
Labels
3.9 only security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@dyedgreen
Copy link
Mannequin

dyedgreen mannequin commented Nov 11, 2019

BPO 38767
Nosy @tim-one, @rhettinger, @mdickinson, @dyedgreen

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2019-11-11.18:38:00.817>
created_at = <Date 2019-11-11.17:08:46.341>
labels = ['extension-modules', 'type-feature', '3.9']
title = 'Replace Mersenne Twister RNG with a PCG family algorithm'
updated_at = <Date 2019-11-17.18:53:56.776>
user = 'https://github.com/dyedgreen'

bugs.python.org fields:

activity = <Date 2019-11-17.18:53:56.776>
actor = 'tim.peters'
assignee = 'none'
closed = True
closed_date = <Date 2019-11-11.18:38:00.817>
closer = 'rhettinger'
components = ['Extension Modules']
creation = <Date 2019-11-11.17:08:46.341>
creator = 'dyedgreen'
dependencies = []
files = []
hgrepos = []
issue_num = 38767
keywords = []
message_count = 7.0
messages = ['356371', '356375', '356376', '356377', '356379', '356380', '356826']
nosy_count = 4.0
nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'dyedgreen']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue38767'
versions = ['Python 3.9']

@dyedgreen
Copy link
Mannequin Author

dyedgreen mannequin commented Nov 11, 2019

Currently, the random.random() function / the random module uses the Mersenne Twister algorithm for generating random numbers.

While this algorithm has acceptable statistical properties for most use-cases (and does feature a ridiculously large period), it is slow and very memory intensive when compared with algorithms from the PCG family (see http://www.pcg-random.org). PCG family algorithms also have much better statistical properties than the Mersenne Twister.

I would propose to replace the underlying generator in random with a suitable PCG family algorithm, while not changing any of the external interfaces of the module. I think that the change would make the module faster, better in terms of memory usage, and improve the statistical properties of Python's default random numbers.

I have not done anything in the direction in terms of writing any code, but if this sounds like something that would be sensible, I would be happy to work on the change and submit a PR.

Also, this is the first time I am contributing to Python/ writing an issue here, so apologies if this is not the correct place to make a suggestion like this.

@dyedgreen dyedgreen mannequin added 3.9 only security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Nov 11, 2019
@mdickinson
Copy link
Member

See discussion in bpo-30880. That issue was closed, though it's possible that it's time to reconsider.

@mdickinson
Copy link
Member

Also worth noting that NumPy 1.17 has now adopted PCG for their default BitGenerator.

@tim-one
Copy link
Member

tim-one commented Nov 11, 2019

Python is a general-purpose language, and as such I believe it's inappropriate for it to be "a leader" in adopting new PRNGs. That's for area specialists to pioneer.

If NumPy switched, that is a good reason to evaluate this again. But for backward compatibility we'd probably still need to support the current Twister anyway (for programs that set the seed, we try to keep results bit-for-bit reproducible across releases).

Note that Python didn't switch _to_ the Twister before it was, in effect, a de facto standard across many scientific libraries.

Something not noted in the earlier report: TOMS rejected the PCG paper, so it will probably never be published. I don't much care, but those who do care about peer-reviewed publication may see that as a deal killer.

@rhettinger
Copy link
Contributor

We had looked at this once before and the proposal was rejected (better to stick with the devil you know than than one that hasn't been extensively studied).

Also, there is some value to having a large state space, shuffle() and sample() consume a lot of entropy to assure that a selection from the entire population is possible.

At best, PCG would be an additional algorithm rather than a replacement -- it is already possible to subclass Random and substitute other generators (the docs link to an example for the complementary-multiply-with-carry generator).

@mdickinson
Copy link
Member

FWIW, here's the NumPy GitHub issue that led to PCG64 being chosen as the default BitGenerator. Warning: the comment thread is long (with many contributions from the PCG author).

numpy/numpy#13635

@tim-one
Copy link
Member

tim-one commented Nov 17, 2019

Thanks for the NumPy discussion link, Mark! Did that set a world record for an issue report's length? ;-) Not complaining - it's a very high quality and informative discussion.

I'd be comfortable adopting whichever PRNGs numpy uses. numpy has better brainpower to apply to "due diligence" in this area, and the discussion made clear too that they're acutely aware of that most users know next to nothing about the pathologies, so that the defaults have to be ignorance-resistant.

It's cute that you raised good questions about how "independent" PCG streams are, and that PCG's creator invented a new member of the family to address the pathologies your line of questioning uncovered. No "proof" that the new member is robust, but lots of testing. That appears to be as good as the state of art allows for now.

I had/have similar concerns about the Twister, but never pursued them. Much like PCG, in fact, it mixes a simple generator with a more-elaborate permutation of the underlying generator's output sequence (which they call "tempering").

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.9 only security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants