Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH:stats:Use explicit formula for gamma.fit('mm') #19932

Merged
merged 3 commits into from Jan 28, 2024

Conversation

fancidev
Copy link
Contributor

Reference issue

Closes gh-19884.

What does this implement/fix?

Code explicit formula for method of moment fitting of gamma distribution.

Additional information

@github-actions github-actions bot added scipy.stats enhancement A new feature or improvement labels Jan 21, 2024
@lucascolley lucascolley added this to the 1.13.0 milestone Jan 21, 2024
Copy link
Contributor

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to check the formulas carefully, but if they are correct this looks pretty good. I solved for the formulas based on the moments listed on Wikipedia, and I checked the code against them. Looks good!

Please add tests like in gh-18824 that show this produces fits at least as good as the generic implementation.

scipy/stats/_continuous_distns.py Show resolved Hide resolved
@fancidev
Copy link
Contributor Author

Thanks for reviewing. Let me work on adding some tests.

As for the NB point about checking for data within support, technically it is not necessary for MM because the formulas do not rely on data within the support. This is unlike MLE, where data outside the support has a likelihood of zero, making the total likelihood identically zero regardless of parameter value, and therefore cannot be solved.

So data range check for MM is not a technical necessity but more of a sanity check on behalf of the user. I’m not keen to include such a check for MM, but if you think it’s worth having it I can add it too.

A side note with respect to MM: there’s no guarantee that the fitted parameters are within the range of valid parameters. This is a well known limitation / characteristic of MM. I currently do not check that the fitted parameters are within range, so that user has a chance to see what are the values, and delay the exception until they actually use the parameters (if they do). In a sense, this is arguably more prudent than returning some within-range but absurd parameters.

What do you think?

@mdhaber
Copy link
Contributor

mdhaber commented Jan 22, 2024

In a sense, this is arguably more prudent than returning some within-range but absurd parameters.

Sure - we would not do that. After I write that, I realize that's essentially what we're doing with yeojohnson_normmax and boxcox_normax by default... but I did suggest (#18852 (comment)) it should not be the default.

What do you think?

I forgot about this. We added the following to the documentation when we added method of moments:

Note that the standard method of moments can produce parameters for which some data are outside the support of the fitted distribution; this implementation does nothing to prevent this.

So yes, we should keep it like you have it, consistent with the documentation.

@fancidev
Copy link
Contributor Author

fancidev commented Jan 23, 2024

Please add tests like in gh-18824 that show this produces fits at least as good as the generic implementation.

Do you mean to include a script in the GitHub page to compare the old (generic) MM and the new (closed-form) MM and list the result?

@mdhaber
Copy link
Contributor

mdhaber commented Jan 23, 2024

Oops, no, I mean add property-based tests to test_distributions.py. Normally I'd also ask for results of more thorough testing here in the issue (as you see in that one), but I don't think they're needed in this case because everything is closed form. These will undoubtedly be faster, more accurate, and more reliable; I don't need to be convinced.

@ilayn ilayn changed the title ENH: use explicit formula for scipy.stats.gamma.fit('mm') (see #19884) ENH:stats:Use explicit formula for gamma.fit('mm') Jan 25, 2024
@mdhaber
Copy link
Contributor

mdhaber commented Jan 27, 2024

Thanks for adding tests. These are different from the tests in the linked PR in that they use hard-coded samples instead of samples that are generated at random, and they use hard-coded reference values (from where?) instead of using _assert_less_or_close_loglike, which is used by most of our fit tests (Oops, can't use _assert_less_or_close_loglike here; the analog for method of moments would be to check that the moments of the fitted distribution match the moments of the sample). Please adjust to match these features of the tests in the referenced PR, or make the case for why this test is better. Thanks.

The test generates a random sample from a gamma distribution and
fits a gamma distribution to it.  It then checks that the first r
moments of the fitted distribution is equal to those of the sample,
where r is the number of estimated parameters.

The simulation parameters are restricted to a small-ish range to
ensure the MM estimates lie within the range of valid parameter
values, so that "round-trip" testing can be performed.
@fancidev
Copy link
Contributor Author

Thanks for reviewing. It makes a lot of sense. I have updated the test accordingly.

Sorry about the messy "force push"es .. I seem to have a hard time getting it right!

@mdhaber
Copy link
Contributor

mdhaber commented Jan 27, 2024

Sorry about the messy "force push"es .. I seem to have a hard time getting it right!

In the future, just regular push. We can clean b up the history at the end if we want to, but almost all PRs in stats get squash-merged anyway.

@mdhaber
Copy link
Contributor

mdhaber commented Jan 28, 2024

Thanks @fancidev; this looks great now.

@mdhaber mdhaber merged commit 39e8aee into scipy:main Jan 28, 2024
27 of 28 checks passed
@mdhaber
Copy link
Contributor

mdhaber commented Jan 28, 2024

@fancidev a recent SO question brought to my attention that loglaplace documentation does not mention how it is related to laplace. Also, when the location is provided, stats.laplace.fit (which implements analytical formulas) could be used on the log-transformed data, then the parameters can be converted to those of loglaplace. Does addressing these interest you (after gh-19977 is done)?

Comment on lines +4697 to +4699
with pytest.raises(ValueError, match=error_msg):
stats.halfcauchy.fit(data, method='mm', **kwds)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why halfcauchy here if the PR is about gamma? Copy Paste hickup? @fancidev

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why halfcauchy here if the PR is about gamma? Copy Paste hickup? @fancidev

Oops, yes exactly. Let me make a PR to correct it.

@fancidev
Copy link
Contributor Author

@fancidev a recent SO question brought to my attention that loglaplace documentation does not mention how it is related to laplace. Also, when the location is provided, stats.laplace.fit (which implements analytical formulas) could be used on the log-transformed data, then the parameters can be converted to those of loglaplace. Does addressing these interest you (after gh-19977 is done)?

First time I heard of log-laplace :-) Let me look it up after the other Issues/PRs are closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: scipy.stats.gamma.fit: return globally optimal result for arbitrary input
4 participants