Random.seed does not affect string hash randomization leading to non-intuitive results #84505
Comments
The following code gives different results on each run, even though "
presumably because of string hash randomization (see also bpo-27706). However, this is non-intuitive, especially as this random aspect of Python is not mentioned in I would suggest this is either fixed (using the provided seed for string hash randomization as well) or documented. |
String hash randomization is a security feature so it may be better to not disable it unless explicitly asked for. Maybe a note in random's documentation could be added? |
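A minimal sketch of the behaviour under discussion, not the reporter's original snippet: it runs the same seeded program in fresh interpreters and varies only the `PYTHONHASHSEED` environment variable. The child program, the helper name `run_child`, and the word list are assumptions made for illustration.

```python
import os
import subprocess
import sys

# Child program: seed the random module, then sample from a list built
# from a set of strings. The list order depends on string hashing.
CHILD = (
    "import random\n"
    "random.seed(42)\n"
    "words = {'w%d' % i for i in range(50)}\n"
    "print(random.sample(list(words), 5))\n"
)


def run_child(hash_seed):
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Same hash seed: the output repeats from run to run.
    print(run_child(0))
    print(run_child(0))
    # Different hash seed: the output will typically differ,
    # even though random.seed(42) is unchanged.
    print(run_child(1))
```

Fixing `PYTHONHASHSEED` is the explicit opt-in mentioned above: hash randomization stays on by default and `random.seed()` alone does not touch it.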
I'm going to deprecate the support for sets. It was a design mistake at several levels. Better to just remove it. |
Raymond, I think that removing sample(set) support is a different issue. This report will just change its final example line to
or
and have the same complaint. |
I think the thing we can fix is the automatic set support which is intrinsically broken with respect to reproducibility and which was likely not a good idea to begin with (because it adds an implicit and possibly unexpected O(n) conversion step and because it doesn't match the API for choice()). If someone converts a set to a list or tuple upstream from sample(), there isn't much we can do about it. That wouldn't be much different from list(s)[0] giving different output from run to run. That is a general FAQ and would apply to just about anything that takes a sequence or iterator to run. |
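One way a caller could sidestep the upstream-conversion problem is to impose an order that does not depend on hashing before sampling. The sketch below is an illustration, not something proposed in this thread: `reproducible_sample` is a hypothetical helper, and it assumes the set's elements are mutually comparable so that `sorted()` works.

```python
import random


def reproducible_sample(population, k, seed):
    """Sample k items in a way that does not depend on set iteration order.

    Hypothetical helper for illustration; assumes set elements can be sorted.
    """
    if isinstance(population, (set, frozenset)):
        # sorted() yields the same order regardless of hash randomization
        items = sorted(population)
    else:
        items = list(population)
    # A private generator keeps the module-level random state untouched
    rng = random.Random(seed)
    return rng.sample(items, k)


if __name__ == "__main__":
    print(reproducible_sample({"pear", "apple", "fig", "plum"}, 2, seed=1))
```

The explicit `sorted()` call also makes the O(n log n) conversion cost visible at the call site instead of hiding it inside `sample()`.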
Yup, I agree sample(set) is a misfeature. |
Yuval, thanks for the report. |
Thank you for the attention and the quick fix. However, the current documentation for "Notes on Reproducibility" should still address this issue of hash randomization. Not only
or, this
will still produce non-reproducible results even after the fix. Here is my suggestion for documentation:
My vote would be to keep hash randomization ties to |
@rhettinger checking software against 3.9 there's a little issue with the way the check is done: if passed something which is both a sequence and a set (e.g. an ordered set), Should I open a new issue for that? Fix seems simple: just move the check for _Set inside the check for _Sequence, and raise if that doesn't pass either. |
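The reordering suggested in the comment above can be sketched roughly as follows. This is not the actual CPython code: `check_population` and the toy `OrderedSet` class are hypothetical names used only to show the order of the `isinstance` checks, with the set branch reached only when the object is not already a sequence.

```python
import warnings
from collections.abc import Sequence, Set


class OrderedSet(Sequence, Set):
    """Toy container that is both a Sequence and a Set (illustration only)."""

    def __init__(self, items):
        # dict.fromkeys drops duplicates while keeping first-seen order
        self._items = list(dict.fromkeys(items))

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __contains__(self, value):
        return value in self._items

    def __iter__(self):
        return iter(self._items)


def check_population(population):
    """Hypothetical check: accept any Sequence first, then fall back to Set."""
    if not isinstance(population, Sequence):
        if isinstance(population, Set):
            # Only pure sets reach this branch, so ordered sets are left alone
            warnings.warn(
                "sampling from a set is deprecated; convert it to a sequence first",
                DeprecationWarning,
                stacklevel=2,
            )
            population = tuple(population)
        else:
            raise TypeError("population must be a sequence")
    return population


if __name__ == "__main__":
    ordered = OrderedSet("abca")
    print(check_population(ordered) is ordered)
```

With this ordering, an object that is both a sequence and a set is passed through unchanged and triggers no warning.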
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.