Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-108322: Optimize statistics.NormalDist.samples() #108324

Merged
merged 6 commits into from Aug 27, 2023

Conversation

rhettinger
Copy link
Contributor

@rhettinger rhettinger commented Aug 22, 2023

Timings show a 3.8x speedup:

# Baseline
% ./python.exe -m timeit -s 'from random import random' -s 'from statistics import NormalDist as ND' -s 'Z = ND()' 'Z.samples(1000)'
2000 loops, best of 5: 196 usec per loop

# Patched
% ./python.exe -m timeit -s 'from random import random' -s 'from statistics import NormalDist as ND' -s 'Z = ND()' 'Z.samples(1000)'
5000 loops, best of 5: 51.6 usec per loop

Besides being faster, the PR makes the statistics module more self contained and less vulnerable to concurrency issues (because gauss() is stateful between successive calls).

@stevendaprano
If you have comments or suggestions, feel free to send them to me privately.


📚 Documentation preview 📚: https://cpython-previews--108324.org.readthedocs.build/

@rhettinger rhettinger added performance Performance or resource usage 3.13 bugs and security fixes labels Aug 22, 2023
@rhettinger rhettinger self-assigned this Aug 22, 2023
@AlexWaygood AlexWaygood changed the title gh-108322: Optimize NormalDist.samples() gh-108322: Optimize statistics.NormalDist.samples() Aug 22, 2023
@rhettinger rhettinger merged commit 042aa88 into python:main Aug 27, 2023
22 checks passed
@rhettinger rhettinger deleted the samples_with_invcdf branch August 27, 2023 13:59
@serhiy-storchaka
Copy link
Member

It looks impressive. But I wonder why not use this implementation in Random.normalvariate()?

@gpshead
Copy link
Member

gpshead commented Aug 29, 2023

Tangentially: This PR would've been improved by code review from an active core dev as it borders on feature territory. Unfortunately stevendaprano has not yet chosen to return and thus may not even have been able to see the request let alone offer visible feedback.

@gpshead
Copy link
Member

gpshead commented Aug 29, 2023

I see one potential problem with it from an API behavior change point of view in that people do rely on consistent output over time from functions like this when they specify an explicit seed value. We should retain the older slower implementation by default when an explicit seed value is specified.

We're free to use a new faster version that returns different results when no explicit seed is provided given the nature of the function being randomness based.

I'll make a PR.

gpshead added a commit to gpshead/cpython that referenced this pull request Aug 29, 2023
…les()`

python#108324 switched to a faster
implementation, but a caveat is that it changes the specific outputs.

This preserves backwards compatibility when the user has asked for
specific outputs by supplying a seed value.  With an option for them
to ask for the new algorithm implementation if desired.

When return random values anyway (no seed), the new faster algorithm is
used by default.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants