Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: stats.rv_discrete.pmf: should be zero at non-integer argument #17166

Merged
merged 2 commits into from
Oct 15, 2022

Conversation

mdhaber
Copy link
Contributor

@mdhaber mdhaber commented Oct 7, 2022

Reference issue

gh-17146
gh-15932

What does this implement/fix?

There was a report in gh-17146 that rv_discrete.pmf does not return 0 at non-integer arguments as it used to. This was broken because a line in gh-15932 was out of the intended order. This fixes the order and adds tests.

@mdhaber mdhaber added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats labels Oct 7, 2022
@mdhaber mdhaber marked this pull request as draft October 7, 2022 17:44
@mdhaber mdhaber marked this pull request as ready for review October 7, 2022 20:55
@mdhaber
Copy link
Contributor Author

mdhaber commented Oct 13, 2022

@ev-br, could you take a look here?

Also, you mentioned that you could look into gh-8057 - I'd be happy to review a PR for that!

@mdhaber mdhaber requested a review from ev-br October 13, 2022 20:31
@ev-br
Copy link
Member

ev-br commented Oct 15, 2022

LGTM! Shall we merge it as is, or is there anything you're adding here still?

@mdhaber
Copy link
Contributor Author

mdhaber commented Oct 15, 2022

All set, if the last commit looks safe enough to you.

@ev-br ev-br merged commit ca85aa8 into scipy:main Oct 15, 2022
@ev-br ev-br added this to the 1.10.0 milestone Oct 15, 2022
@ev-br
Copy link
Member

ev-br commented Oct 15, 2022

LGTM, let's roll with it. Merged, thanks Matt

@tylerjereddy tylerjereddy added the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Oct 16, 2022
@tylerjereddy
Copy link
Contributor

This has a defect label and the two commits can apparently be cleanly cherry-picked to maintenace/1.9.x, so I'll add a backport label. If you know of a reason not to backport, just remove the label soon-ish.

@mdhaber
Copy link
Contributor Author

mdhaber commented Oct 16, 2022

Good thing to backport. Thanks!

@tylerjereddy tylerjereddy modified the milestones: 1.10.0, 1.9.3 Oct 16, 2022
@josef-pkt
Copy link
Member

my guess is that this is too restrictive

AFAIU:
I guess this kills the use of pmf/log-pmf for e.g. quasi-poisson (with continuous data)
This might also cause problems with floating point error e.g. pmf for y = p * n where p is some float.

It might not cause problems for statsmodels because we use our own implementation of logpdf/loglike for (quasi-)poisson for estimation.

@mdhaber
Copy link
Contributor Author

mdhaber commented Oct 18, 2022

For types other than rv_sample, this simply reverts this behavior to the way it was before #15932. But yes, I agree that the current infrastructure can be too restrictive.

@tylerjereddy tylerjereddy removed the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Oct 19, 2022
@aloctavodia
Copy link
Contributor

I have been using this "feature" here just as a "trick" for plotting (the dots are evaluated at integers values and the dashed line for floats). I also have being using this "feature " for a project where I approximate discrete variables (actually their pmf) as continuous.

I understand that the pmf should be zero at non-integer argument and thus this should be the default behaviour. But I will really appreciate an argument to allow results for non-integer values. Is that possible?

@rkern
Copy link
Member

rkern commented Nov 2, 2022

The behavior that you plot is not intentional, just an accident that the formulae used to compute the integral-input PMFs happen to also be evaluatable on non-integer values. It doesn't necessarily have any particular properties that would make it suitable for approximating PMFs as PDFs or for plots.

@aloctavodia
Copy link
Contributor

Thanks for the feedback. In those visualizations the dashed line is used just as visual aid, knowing thatthe dots are the actual values of the pmf. They seem to work reasonable well in practice, at least so far.

There has been some previous interest on evaluating the pmf at non-integers values like this paper https://arxiv.org/abs/2102.04509.

@rkern
Copy link
Member

rkern commented Nov 3, 2022

What's required in that paper are particular continuous extensions of certain ways to formulate the discrete PMFs. I'm pretty sure that we don't want to take on the burden to guarantee those. It's quite a niche use case, and would likely involve tradeoffs with the main use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants