DOC: Improve error messages for random.choice #25521
@rkern thought you might have an opinion about whether the |
numpy/random/_generator.pyx
raise ValueError("Probabilities do not sum to 1. "
                 "You can typically solve this issue with "
                 "`p = p / np.sum(p)`. "
                 "In rare cases this may not work due to round-off error, "
Are there examples of normalization by np.sum(p) in double precision arithmetic not working in modern NumPy? Assuming kahan_sum is not so different from np.sum, I would have expected the tolerance atol = max(atol, np.sqrt(np.finfo(p.dtype).eps)) to be pretty generous compared to typical roundoff error in float64, even for large arrays.
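For a sense of scale, here is a minimal sketch comparing that tolerance with the residual left after normalizing a large float64 array. The array size and seed are arbitrary assumptions, and plain np.sum stands in for the internal kahan_sum, so this illustrates the order of magnitude rather than the exact check NumPy performs.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only

# Tolerance from the expression quoted above, evaluated for float64.
atol = np.sqrt(np.finfo(np.float64).eps)  # roughly 1.5e-8

# Unnormalized float64 weights, rescaled so that they sum to (nearly) 1.
w = rng.random(1_000_000)
p = w / np.sum(w)

# np.sum is used here as a stand-in for the internal kahan_sum.
residual = abs(np.sum(p) - 1.0)
print(f"atol = {atol:.3e}, residual = {residual:.3e}")
```

The residual should come out many orders of magnitude below the tolerance, which is the point being made about float64 input.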
Assuming that p came in as float64, yes. When the user provides a float32 array that sums reasonably close to 1 in float32 arithmetic, we're simply casting it to float64, and that's one of the more common sources of "spuriously" hitting this exception.
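The effect can be reproduced with arithmetic alone. The sketch below, with values chosen purely for illustration, builds a float32 array whose float32 sum is exactly 1 but whose float64 sum is not. It deliberately does not call choice, because whether the exception fires depends on the tolerance applied to the input.

```python
import numpy as np

# float32(1/3) is slightly larger than 1/3.
p32 = np.full(3, 1 / 3, dtype=np.float32)

sum32 = np.sum(p32)                     # accumulated in float32
sum64 = np.sum(p32.astype(np.float64))  # same values, accumulated in float64

deviation = abs(sum64 - 1.0)

print(sum32 == 1.0)  # True: the float32 sum rounds to exactly 1
print(deviation)     # 2**-25, about 2.98e-8

# The two candidate tolerances from the expression quoted earlier.
atol64 = np.sqrt(np.finfo(np.float64).eps)
atol32 = np.sqrt(np.finfo(np.float32).eps)
print(atol64 < deviation < atol32)
```

The deviation sits above the float64-based tolerance and below the float32-based one, so the outcome hinges on which dtype the tolerance is derived from.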
Right. I was going to suggest that if we were to provide a recommendation, it would be sufficient to recommend that the user normalize by np.sum(p, dtype=np.float64) (assuming NEP 50) rather than providing a primary strategy and backup strategy.
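A minimal sketch of that recommendation follows. The helper name normalize_in_float64 is hypothetical, and it casts the weights to float64 explicitly before dividing. That cast is an assumption added here so the result does not depend on which promotion rules (NEP 50 or legacy) are in effect.

```python
import numpy as np

def normalize_in_float64(weights):
    """Hypothetical helper: rescale weights to sum to 1 using float64 arithmetic."""
    # Cast first, so both the sum and the division happen in float64.
    w = np.asarray(weights, dtype=np.float64)
    return w / np.sum(w, dtype=np.float64)

rng = np.random.default_rng(0)  # arbitrary seed
w32 = rng.random(1000).astype(np.float32)  # stand-in for user-supplied float32 weights

p = normalize_in_float64(w32)
sample = rng.choice(len(p), size=5, p=p)
print(p.dtype, abs(p.sum() - 1.0), sample)
```

Without the explicit cast, w32 / np.sum(w32, dtype=np.float64) may come back as float32 or float64 depending on the NumPy version's promotion rules, which is presumably why the comment adds "assuming NEP 50".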
I feel like the possible remedies are more context-dependent than what should fit into an exception message. The advice to rescale works when the user provided just implicitly-scaled weights (that do not sum at all to 1) or when they are close enough to 1 (maybe in a reduced-precision arithmetic) that the rescaling does negligible damage to the values. But sometimes, you just want to fix up the last value and leave the rest alone. Or maybe the first. Or some other one that can tolerate the deviation from what was requested better than the others because of problem-specific circumstances.
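To make the "fix up the last value" alternative concrete, here is a small sketch. The helper fix_up_last is hypothetical and the input values are invented for illustration. It is one possible remedy among those listed, not a recommendation.

```python
import numpy as np

def fix_up_last(p):
    """Hypothetical helper: keep all but the last entry, let the last absorb the deviation."""
    p = np.array(p, dtype=np.float64)  # copy, so the caller's data is untouched
    p[-1] = 1.0 - np.sum(p[:-1])
    if p[-1] < 0:
        raise ValueError("leading probabilities already sum to more than 1")
    return p

p = fix_up_last([0.2, 0.3, 0.49999997])
print(p, p.sum())
```

Unlike rescaling, this leaves the first two probabilities exactly as requested and moves the whole correction into the final entry.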
@matheussouza88 Welcome to the project. Please look at our contribution guide on ways to contribute to the project. NumPy is an old, mature project, and some ways to contribute are better suited to newcomers than others. PR approval is one of the things that requires longer experience with the project, its forward-looking goals, and its backwards-looking history. Thanks.
I agree that we cannot provide exhaustive or one-size-fits-all advice. Perhaps a compromise would be to provide a short, lightly worded suggestion to those who are confused by the error:
Those who have considered it and decided that it is not appropriate for their use case can do whatever they deem appropriate. Another possibility is to add an example in the documentation.
A |
@MilesCranmer would that address the issue, and if so, would you change the PR accordingly?
I don't think this would help, because requiring the user to google their error is the poor experience that this PR attempts to address. Whether that google search goes to stackoverflow (current) or the docs is not really a big difference imo. Avoiding placing the burden on the user is really the PR's goal (however that ends up). I suppose the p / sum(p) advice is more obvious from the existing message, so it isn't needed, but the float32 -> float64 cast, which changes the sum away from 1, is subtle and a real pain (which I ran into myself; hence this PR). In this case a helpful error message would be the best form of documentation.
I'm happy to expand on the message, for instance to mention that the calculation is done in |
Good idea. Let me try to implement that.
de1c387
to
a031206
Compare
@rkern Were you OK with changing the error message of Would you like to trim it to:
And move the bit about needing normalization to the Notes?
Only change The content doesn't necessarily have to move to the |
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
Failures look unrelated. Thanks @MilesCranmer!
Hi i am sorry to ask this kind of question , |
Questions like this are best asked on the mailing list or the Scientific Python Discourse, preferably not on unrelated Github issues. The current recommended implementation is to use |
Thank you.
This improves the error messages for random.choice by suggesting the user use p = p / np.sum(p). It also suggests what to do for round-off issues if the sum of probabilities is, for example, 0.99999997 (due to precision issues in division or summation), which can otherwise be very confusing (see for example [1], [2], and [3]).