Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-108322: Preserve backwards compatibility in NormalDist.samples() when a seed is given #108658

Closed

Conversation

gpshead
Copy link
Member

@gpshead gpshead commented Aug 29, 2023

#108324 switched to a faster implementation of statistics.NormalDist.samples(), but a caveat is that it changes the specific outputs.

This preserves backwards compatibility when the user has asked for specific outputs by supplying a seed value. With an option for them to ask for the new algorithm implementation if desired.

When returning uniquely random values anyway (no seed), the new faster algorithm is used by default.


📚 Documentation preview 📚: https://cpython-previews--108658.org.readthedocs.build/

…les()`

python#108324 switched to a faster
implementation, but a caveat is that it changes the specific outputs.

This preserves backwards compatibility when the user has asked for
specific outputs by supplying a seed value.  With an option for them
to ask for the new algorithm implementation if desired.

When return random values anyway (no seed), the new faster algorithm is
used by default.
@gpshead gpshead added type-feature A feature request or enhancement 3.13 bugs and security fixes labels Aug 29, 2023
@gpshead gpshead self-assigned this Aug 29, 2023
@@ -830,8 +830,12 @@ of applications in statistics.

.. versionchanged:: 3.13
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing documentation above this says of seed "this is useful for creation reproducible results, even in a multi-threading context." and the PR this is modifying mentions that random.gauss() is problematic describing the non-gauss version as better for being "more self contained and less vulnerable to concurrency issues (because gauss() is stateful between successive calls)."

I presume we might want call that implementation detail out with a sentence added to the above paragraph recommending the new use_gauss=False argument to people if that caveat matters to them? @rhettinger

@serhiy-storchaka
Copy link
Member

It looks as an over-complication to me. Do you really expect users will use use_gauss=False in their code? It is not compatible with Python <=3.12, so it cannot be used in mass for a long time.

The reproducibility warranty in the random module extends only to random() (I would extend it also to randrange(), getrandbits() and few derived methods). Implementations and the output of more complex distributions was changed in the past (otherwise we could not fix bugs which caused losing precision, bias or hanging).

The statistics module do not even provide such warranty. Reproducibility only means that when you run the same Python (same minor version) on the same platform with the same seed you will get the same result. If it was not clear from the documentation, I think that the documentation should be made more clear.

@gpshead
Copy link
Member Author

gpshead commented Aug 30, 2023

It could be an over complication. I'm proposing it because this is the kind of conversation that could've happened within the origin unreviewed PR.

Taking a look at how often statistics.NormalDist gets used in our huge codebase at work, it really doesn't (I see 2-3 uses, even including third party PyPI/github python code in my search corpus). And none of them specify a seed.

That's another decent indication that caution about behavior changes on this API aren't worthwhile. Thanks!

@gpshead gpshead closed this Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes awaiting core review type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants