New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: stats.moment: enable non-central moment calculation #18152
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @doronbehar. An early suggestion.
b625879
to
e7eb749
Compare
scipy/stats/_mstats_basic.py
Outdated
@@ -2551,7 +2551,7 @@ def _winsorize1D(a, low_limit, up_limit, low_include, up_include, | |||
upinc, contains_nan, nan_policy) | |||
|
|||
|
|||
def moment(a, moment=1, axis=0): | |||
def moment(a, moment=1, axis=0, *, central=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@doronbehar mstats.moment
is not deprecated, but it is essentially obsolete now that stats.moment
supports masked arrays. To reduce the scope of the PR with little impact to users, let's only add this parameter to stats.moment
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have totally missed the fact that I was editing the mstats.moment
function and not the stats.moment
function! However, Maybe it is mandatory to change moment from mstats_basic
? Note my review comment in the upcoming push, that edits both mstats.moment
and stats.moment
similarly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to mstats.moment
in stats.moment
is probably no longer necessary. The decorator on stats.moment
should take care of this. Please try removing the call to mstats.moment
from stats.moment
and edit only stats.moment
.
If you run into any test failures because of this, we'll deal with them later.
Let's also only change only stats.moment
(public function). _moment
is private, and it looks like changing the name of the existing parameter mean
is causing a lot of problems. I guess mean
is not the best name for that function either, but it already exists and it's private, so I'm not too concerned about it. We can fix that separately, if you'd like, but I'd rather not add it to the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I think I got it. I pushed again, rebased onto main
, but not squashed. I had to rebase in order to make sure I changed as less as possible, as you requested. Haven't tested anything yet.
scipy/stats/_mstats_basic.py
Outdated
The point about which moments are taken. This can be the sample mean, the | ||
origin, or any other be point. If None, compute from the sample. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point about which moments are taken. This can be the sample mean, the | |
origin, or any other be point. If None, compute from the sample. | |
The point about which moments are taken. This can be the sample mean, | |
the origin, or any other point. If None (default), use the mean | |
computed from the sample. |
Need to specify what is computed from the sample. Also, we need to specify that computing the mean from the sample is the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to specify what is computed from the sample. Also, we need to specify that computing the mean from the sample is the default.
Done!
scipy/stats/_mstats_basic.py
Outdated
@@ -2551,7 +2551,7 @@ def _winsorize1D(a, low_limit, up_limit, low_include, up_include, | |||
upinc, contains_nan, nan_policy) | |||
|
|||
|
|||
def moment(a, moment=1, axis=0): | |||
def moment(a, moment=1, axis=0, *, central=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"central" is an adjective, which suggests that the user would pass a boolean (e.g. "yes, I want the central moment").
I suggested the noun, "center". The user passes the center about which the moment is taken. (e.g. "take the moment about 0").
In any case, this should be a noun of some sort if it expects the user to pass a real number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"central" is an adjective, which suggests that the user would pass a boolean (e.g. "yes, I want the central moment").
I suggested the noun, "center". The user passes the center about which the moment is taken. (e.g. "take the moment about 0").
In any case, this should be a noun of some sort if it expects the user to pass a real number.
Totally agree, it was probably a mistake.
scipy/stats/_mstats_basic.py
Outdated
# for array_like moment input, return a value for each. | ||
if not np.isscalar(moment): | ||
mean = a.mean(axis, keepdims=True) | ||
mmnt = [_moment(a, i, axis, mean=mean) for i in moment] | ||
mmnt = [_moment(a, i, axis, mean) for i in moment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmnt = [_moment(a, i, axis, mean) for i in moment] | |
mmnt = [_moment(a, i, axis, center) for i in moment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed mean
-> center
!
scipy/stats/_mstats_basic.py
Outdated
return ma.array(mmnt) | ||
else: | ||
return _moment(a, moment, axis) | ||
return _moment(a, moment, axis, mean) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return _moment(a, moment, axis, mean) | |
return _moment(a, moment, axis, center) |
scipy/stats/_mstats_basic.py
Outdated
if central is None: | ||
mean = a.mean(axis, keepdims=True) | ||
else: | ||
mean = central |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if central is None: | |
mean = a.mean(axis, keepdims=True) | |
else: | |
mean = central | |
if center is None: | |
center = a.mean(axis, keepdims=True) |
Using mean
for this variable suggests that it is always the mean. It is not always the mean (which is the purpose of the PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
mean
for this variable suggests that it is always the mean. It is not always the mean (which is the purpose of the PR).
Thanks for the suggestion! The code I wrote was probably a functional programming habit leftover 😛.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding mstats.moment
and stats.moment
and the associated _moment
functions, perhaps could there be less code duplicacy?
scipy/stats/_mstats_basic.py
Outdated
@@ -2551,7 +2551,7 @@ def _winsorize1D(a, low_limit, up_limit, low_include, up_include, | |||
upinc, contains_nan, nan_policy) | |||
|
|||
|
|||
def moment(a, moment=1, axis=0): | |||
def moment(a, moment=1, axis=0, *, central=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have totally missed the fact that I was editing the mstats.moment
function and not the stats.moment
function! However, Maybe it is mandatory to change moment from mstats_basic
? Note my review comment in the upcoming push, that edits both mstats.moment
and stats.moment
similarly.
scipy/stats/_mstats_basic.py
Outdated
@@ -2551,7 +2551,7 @@ def _winsorize1D(a, low_limit, up_limit, low_include, up_include, | |||
upinc, contains_nan, nan_policy) | |||
|
|||
|
|||
def moment(a, moment=1, axis=0): | |||
def moment(a, moment=1, axis=0, *, central=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"central" is an adjective, which suggests that the user would pass a boolean (e.g. "yes, I want the central moment").
I suggested the noun, "center". The user passes the center about which the moment is taken. (e.g. "take the moment about 0").
In any case, this should be a noun of some sort if it expects the user to pass a real number.
Totally agree, it was probably a mistake.
scipy/stats/_mstats_basic.py
Outdated
The point about which moments are taken. This can be the sample mean, the | ||
origin, or any other be point. If None, compute from the sample. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to specify what is computed from the sample. Also, we need to specify that computing the mean from the sample is the default.
Done!
scipy/stats/_mstats_basic.py
Outdated
if central is None: | ||
mean = a.mean(axis, keepdims=True) | ||
else: | ||
mean = central |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
mean
for this variable suggests that it is always the mean. It is not always the mean (which is the purpose of the PR).
Thanks for the suggestion! The code I wrote was probably a functional programming habit leftover 😛.
scipy/stats/_mstats_basic.py
Outdated
# for array_like moment input, return a value for each. | ||
if not np.isscalar(moment): | ||
mean = a.mean(axis, keepdims=True) | ||
mmnt = [_moment(a, i, axis, mean=mean) for i in moment] | ||
mmnt = [_moment(a, i, axis, mean) for i in moment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed mean
-> center
!
scipy/stats/_stats_py.py
Outdated
@@ -1078,19 +1084,19 @@ def moment(a, moment=1, axis=0, nan_policy='propagate'): | |||
|
|||
if contains_nan and nan_policy == 'omit': | |||
a = ma.masked_invalid(a) | |||
return mstats_basic.moment(a, moment, axis) | |||
return mstats_basic.moment(a, moment, axis, center) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since mstats_basic.moment
is called here, this PR requires changing it as well?
Also include review requested `mean` -> `center` argument rename.
The call to mstats.moment in stats.moment is no longer necessary.
5c9becd
to
1052a2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @doronbehar, this looks much better.
As we continue, please don't rebase or force-push. We'll_ squash-merge this at the end, so there is no need to rewrite history until then.
Merging is fine, and then you can push without forcing. In fact, I'd suggest merging upstream/main
, and then I think the existing CI errors will disappear.
The changes to _stats_py.py
look great; just one suggestion about the description of center
. I want to specify what is calculated from the sample - the sample mean.
One thing this needs is a test. Please add a new test_moment_center
method to the TestMoments
class of test_stats.py
. I would suggest two cases:
- Calculate the second moment about the origin, and compare against the naive implementation (i.e.
np.sum(x**2)/x.size
) - Calculate the third moment about an arbitrary point and compare against the naive implementation.
Another thing to consider is edge cases. What if the order of the moment is 1? Does this do the right thing? (I don't think so.) How about order zero? These would need to be tested, too.
* main: (52 commits) ENH: stats.vonmises.fit: treat case in which location likelihood equation has no solution(scipy#18190) MAINT: stats.kendalltau: avoid overflow (scipy#18193) DOC: Optimize: Fix for side bar rendering on top of Hessian (scipy#18189) MAINT: optimize.linprog: fix bound checks for integrality > 1 (scipy#18160) MAINT: Windows distutils cdist/pdist shims (scipy#18169) BUG: interpolate: add x-y length validation for `make_smoothing_spline`. (scipy#18188) ENH: Added `_sf` method for anglit distribution (scipy#17832) (scipy#18178) DOC: Fixed missing curly bracket in scipy.css DOC: Improving wording and docs for legacy directive DOC: Move legacy directive to not be first in the file DOC: Legacy directive custom styling DOC: Add optional argument to Legacy directive DOC: Ignore legacy directive in refguide_check DOC: Documenting the usage of the legacy directive DOC: Add legacy directive for documentation MAINT: stats.ecdf: store number at risk just before events (scipy#18187) DOC: cite pip issue about multiple `--config-settings` (scipy#18174) DOC: update links for ARPACK to point to ARPACK-NG (scipy#18173) MAINT: stats.logistic.fit: simplify MAINT: optimize.root_scalar: return gracefully when callable returns NaN (scipy#18172) ...
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
I tried to add some tests, but most of them seem to fail, and I'm not sure why :(. For moments of order greater then 1 I get this weird
And for moments of order 1, I get simply an assertion error - no matter what is the center I use, Haven't looked into the edge cases yet, trying to get the simple ones working.. |
This command will be handy:
|
There are special cases in For the from scipy.stats import moment
moment([1, 2, 3, 4, 5], moment=1) # no ValueError
moment([1 2, 3, 4, 5], moment=1) # ValueError From here on, please do not rebase or force push; merging main and regular-push is fine if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see doronbehar#1. After merging, please address remaining comments.
* main: (51 commits) ENH: stats.ttest_ind: add degrees of freedom and confidence interval (scipy#18210) DOC fix link in release notes v1.7 (scipy#18258) BLD: fix missing build dependency on cython signature .txt files DOC: Clarify executable tutorials and MyST/Jupytext ENH: stats: Add relativistic Breit-Wigner Distribution (scipy#17505) MAINT: setup.sh as executable file DOC: remove content related to `setup.py` usage from the docs (scipy#18245) DOC: orthogonal_procrustes fix date of reference paper and DOI (scipy#18251) BLD: implement version check for minimum Cython version ci: touch up wheel build action DOC: link to info about Gitpod. ENH: ndimage.gaussian_filter: add `axes` keyword-only argument (scipy#18016) BUG: sparse: Fix LIL full-matrix assignment (scipy#18211) STY: Remove unnecessary semicolon DOC: Pin docutils and filter additional sphinx warnings BUG: vq.kmeans() compares signed diff to a threshold. (scipy#8727) MAINT: stats: consistently return NumPy numbers (scipy#18217) TST: stats.dunnett: fix seed and filter warnings in test_shapes DOC: add info to use codespaces. MAINT: add devcontainer configuration ...
From some reason, I'm getting pytest errors unrelated to this PR:
Merged branch |
They are related to this PR; in fact, they are related to the changes you removed from my PR. (You were right that |
Weird. Indeed that test failure doesn't appear on branch |
It fails when the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @doronbehar, this LGTM. The new CI failure is actually unrelated; I've seen it elswehere. Mailing list post is here.
@tirthasheshpatel, since I worked on this a fair amount, I'm not sure mine should be the only review. Would you be willing to take a look and merge if you approve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you both. Changes and tests LGTM. The review and discussion was quite intensive, I don't have more to add. CI seems unrelated. Let's get this in now.
Reference issue
Closes gh-17749
What does this implement/fix?
Described in gh-17749.
Additional information
No unit tests and documentation added, just opened for reviewers to comment on the API - hence it's a draft.