ENH:stats:Use explicit formula for gamma.fit('mm') #19932

fancidev · 2024-01-21T13:46:16Z

Reference issue

Closes gh-19884.

What does this implement/fix?

Code explicit formula for method of moment fitting of gamma distribution.

Additional information

mdhaber

~~I'll have to check the formulas carefully, but if they are correct this looks pretty good.~~ I solved for the formulas based on the moments listed on Wikipedia, and I checked the code against them. Looks good!

Please add tests like in gh-18824 that show this produces fits at least as good as the generic implementation.

scipy/stats/_continuous_distns.py

fancidev · 2024-01-22T10:55:06Z

Thanks for reviewing. Let me work on adding some tests.

As for the NB point about checking for data within support, technically it is not necessary for MM because the formulas do not rely on data within the support. This is unlike MLE, where data outside the support has a likelihood of zero, making the total likelihood identically zero regardless of parameter value, and therefore cannot be solved.

So data range check for MM is not a technical necessity but more of a sanity check on behalf of the user. I’m not keen to include such a check for MM, but if you think it’s worth having it I can add it too.

A side note with respect to MM: there’s no guarantee that the fitted parameters are within the range of valid parameters. This is a well known limitation / characteristic of MM. I currently do not check that the fitted parameters are within range, so that user has a chance to see what are the values, and delay the exception until they actually use the parameters (if they do). In a sense, this is arguably more prudent than returning some within-range but absurd parameters.

What do you think?

mdhaber · 2024-01-22T15:53:25Z

In a sense, this is arguably more prudent than returning some within-range but absurd parameters.

Sure - we would not do that. After I write that, I realize that's essentially what we're doing with yeojohnson_normmax and boxcox_normax by default... but I did suggest (#18852 (comment)) it should not be the default.

What do you think?

I forgot about this. We added the following to the documentation when we added method of moments:

Note that the standard method of moments can produce parameters for which some data are outside the support of the fitted distribution; this implementation does nothing to prevent this.

So yes, we should keep it like you have it, consistent with the documentation.

fancidev · 2024-01-23T05:17:27Z

Please add tests like in gh-18824 that show this produces fits at least as good as the generic implementation.

Do you mean to include a script in the GitHub page to compare the old (generic) MM and the new (closed-form) MM and list the result?

mdhaber · 2024-01-23T05:25:08Z

Oops, no, I mean add property-based tests to test_distributions.py. Normally I'd also ask for results of more thorough testing here in the issue (as you see in that one), but I don't think they're needed in this case because everything is closed form. These will undoubtedly be faster, more accurate, and more reliable; I don't need to be convinced.

mdhaber · 2024-01-27T09:43:53Z

Thanks for adding tests. These are different from the tests in the linked PR in that they use hard-coded samples instead of samples that are generated at random, and they use hard-coded reference values (from where?) ~~instead of using _assert_less_or_close_loglike, which is used by most of our fit tests~~ (Oops, can't use _assert_less_or_close_loglike here; the analog for method of moments would be to check that the moments of the fitted distribution match the moments of the sample). Please adjust to match these features of the tests in the referenced PR, or make the case for why this test is better. Thanks.

…19884)

The test generates a random sample from a gamma distribution and fits a gamma distribution to it. It then checks that the first r moments of the fitted distribution is equal to those of the sample, where r is the number of estimated parameters. The simulation parameters are restricted to a small-ish range to ensure the MM estimates lie within the range of valid parameter values, so that "round-trip" testing can be performed.

fancidev · 2024-01-27T13:45:32Z

Thanks for reviewing. It makes a lot of sense. I have updated the test accordingly.

Sorry about the messy "force push"es .. I seem to have a hard time getting it right!

mdhaber · 2024-01-27T17:03:26Z

Sorry about the messy "force push"es .. I seem to have a hard time getting it right!

In the future, just regular push. We can clean b up the history at the end if we want to, but almost all PRs in stats get squash-merged anyway.

mdhaber · 2024-01-28T04:40:55Z

Thanks @fancidev; this looks great now.

mdhaber · 2024-01-28T04:48:49Z

@fancidev a recent SO question brought to my attention that loglaplace documentation does not mention how it is related to laplace. Also, when the location is provided, stats.laplace.fit (which implements analytical formulas) could be used on the log-transformed data, then the parameters can be converted to those of loglaplace. Does addressing these interest you (after gh-19977 is done)?

dschmitz89 · 2024-01-28T07:56:34Z

scipy/stats/tests/test_distributions.py

+            with pytest.raises(ValueError, match=error_msg):
+                stats.halfcauchy.fit(data, method='mm', **kwds)
+            return


Why halfcauchy here if the PR is about gamma? Copy Paste hickup? @fancidev

Why halfcauchy here if the PR is about gamma? Copy Paste hickup? @fancidev

Oops, yes exactly. Let me make a PR to correct it.

fancidev · 2024-01-28T09:09:46Z

@fancidev a recent SO question brought to my attention that loglaplace documentation does not mention how it is related to laplace. Also, when the location is provided, stats.laplace.fit (which implements analytical formulas) could be used on the log-transformed data, then the parameters can be converted to those of loglaplace. Does addressing these interest you (after gh-19977 is done)?

First time I heard of log-laplace :-) Let me look it up after the other Issues/PRs are closed.

github-actions bot added scipy.stats enhancement A new feature or improvement labels Jan 21, 2024

lucascolley added this to the 1.13.0 milestone Jan 21, 2024

mdhaber reviewed Jan 22, 2024

View reviewed changes

scipy/stats/_continuous_distns.py Show resolved Hide resolved

mdhaber mentioned this pull request Jan 24, 2024

DOC: clarify scale parameter of scipy.stats.invgauss #19955

Closed

fancidev force-pushed the my-gamma-fix branch from 3dac624 to d3c3a63 Compare January 25, 2024 14:06

ilayn changed the title ~~ENH: use explicit formula for scipy.stats.gamma.fit('mm') (see #19884)~~ ENH:stats:Use explicit formula for gamma.fit('mm') Jan 25, 2024

mdhaber mentioned this pull request Jan 27, 2024

DOC: document a common alternative parameterization of invgauss. #19977

Merged

fancidev force-pushed the my-gamma-fix branch from 4289b99 to fa92f51 Compare January 27, 2024 13:38

fancidev added 3 commits January 27, 2024 21:41

ENH: use explicit formula for scipy.stats.gamma.fit('mm') (see scipy#…

3f08747

…19884)

TST: add test case for scipy.stats.gamma.fit('mm')

11e56a9

fancidev force-pushed the my-gamma-fix branch from fa92f51 to e307aab Compare January 27, 2024 13:42

mdhaber merged commit 39e8aee into scipy:main Jan 28, 2024
27 of 28 checks passed

dschmitz89 reviewed Jan 28, 2024

View reviewed changes

fancidev mentioned this pull request Jan 28, 2024

TST: correct typo in TestGamma.test_fit_mm function. #19981

Merged

fancidev mentioned this pull request Jan 31, 2024

ENH: use analytic formula to fit log-Laplace distribution #19996

Closed

fancidev deleted the my-gamma-fix branch February 4, 2024 02:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH:stats:Use explicit formula for gamma.fit('mm') #19932

ENH:stats:Use explicit formula for gamma.fit('mm') #19932

fancidev commented Jan 21, 2024

mdhaber left a comment •

edited

fancidev commented Jan 22, 2024

mdhaber commented Jan 22, 2024 •

edited

fancidev commented Jan 23, 2024 •

edited

mdhaber commented Jan 23, 2024 •

edited

mdhaber commented Jan 27, 2024 •

edited

fancidev commented Jan 27, 2024

mdhaber commented Jan 27, 2024

mdhaber commented Jan 28, 2024

mdhaber commented Jan 28, 2024

dschmitz89 Jan 28, 2024

fancidev Jan 28, 2024

fancidev commented Jan 28, 2024

ENH:stats:Use explicit formula for gamma.fit('mm') #19932

ENH:stats:Use explicit formula for gamma.fit('mm') #19932

Conversation

fancidev commented Jan 21, 2024

Reference issue

What does this implement/fix?

Additional information

mdhaber left a comment • edited

Choose a reason for hiding this comment

fancidev commented Jan 22, 2024

mdhaber commented Jan 22, 2024 • edited

fancidev commented Jan 23, 2024 • edited

mdhaber commented Jan 23, 2024 • edited

mdhaber commented Jan 27, 2024 • edited

fancidev commented Jan 27, 2024

mdhaber commented Jan 27, 2024

mdhaber commented Jan 28, 2024

mdhaber commented Jan 28, 2024

dschmitz89 Jan 28, 2024

Choose a reason for hiding this comment

fancidev Jan 28, 2024

Choose a reason for hiding this comment

fancidev commented Jan 28, 2024

mdhaber left a comment •

edited

mdhaber commented Jan 22, 2024 •

edited

fancidev commented Jan 23, 2024 •

edited

mdhaber commented Jan 23, 2024 •

edited

mdhaber commented Jan 27, 2024 •

edited