ENH: Return array scalars for skew and kurtosis #12634
This PR returns array scalars for the mstats version of skew and kurtosis and adds a test that the returned value is indeed a scalar value.
This commit changes ``describe(...)`` to use the new array scalars returned by ``skew(...)`` and ``kurtosis(...)`` instead of the original masked arrays.
@WarrenWeckesser I was wondering if you could take a quick look at this? The solution was to change
@@ -2300,7 +2300,9 @@ def skew(a, axis=0, bias=True):
m3 = np.extract(can_correct, m3)
nval = ma.sqrt((n-1.0)*n)/(n-2.0)*m3/m2**1.5
np.place(vals, can_correct, nval)
return vals
# Add 0 to ensure a scalar result is returned
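The `+ 0` in this diff relies on how NumPy's masked-array arithmetic treats 0-d results: when a binary operation produces a 0-d, unmasked result, it comes back as a plain NumPy scalar rather than wrapped in a `MaskedArray`. A minimal sketch, assuming the value reaching `return` is a 0-d masked array with nothing masked (`vals_0d` is an illustrative name, not from the PR):

```python
import numpy as np

# 0-d masked array with no masked entries, standing in for skew()'s result on 1-D input
vals_0d = np.ma.masked_array(2.0)

# Masked-array arithmetic unwraps a 0-d, unmasked result into a NumPy scalar
scalar_result = vals_0d + 0

print(type(vals_0d).__name__, "->", type(scalar_result).__name__)
```

The trick only affects 0-d results; results with one or more dimensions stay `MaskedArray`s, which is the case raised in the 2-D example in the next comment.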
Should the docstring at L2280 be updated?
Returns
-------
skewness : ndarray
The skewness of values along an axis, returning 0 where all values are
equal.
It seems that in this case it returns a scalar instead of an ndarray.
In [30]: skew([[1,5,4],[5,1,3],[4,7,nan]], nan_policy="omit") + 0
Out[30]:
masked_array(data=[-0.52800498, -0.38180177, 0. ],
mask=False,
fill_value=1e+20)
Will the new code handle this case? Maybe `vals.data` is what we actually want?
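For the 2-D case above, here is a minimal sketch of what each option gives, using a hand-built 1-D masked array as a stand-in for `vals` (values rounded from the output above; this is not the PR's code):

```python
import numpy as np

# Stand-in for the per-row skewness of the 2-D input above (values rounded)
vals = np.ma.masked_array([-0.528, -0.382, 0.0])

plus_zero = vals + 0  # still a 1-D MaskedArray: adding 0 does not reduce dimensions
via_data = vals.data  # the underlying 1-D ndarray, with the mask dropped
```

Neither gives a scalar here, which is expected: the input has two dimensions, so the result holds one value per row. Note that `.data` discards the mask, so any masked entries would be exposed as their raw underlying values.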
Edit: Yes, I think the docstring should be updated, but let's finish the rest of the discussion before I make any more changes.
I'm not sure what we want in that case. To me, the problem wasn't the masked array itself; it's that an array is returned at all when there is only a single dimension, whereas other ufuncs return a scalar there. I don't have a strong preference either way, but I'd like to be consistent.
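For comparison, this sketch shows the NumPy reduction behaviour the comment alludes to: fully reducing 1-D input gives a NumPy scalar, while reducing one axis of 2-D input gives an array. `np.add.reduce` and `np.mean` are used here only as representative reductions:

```python
import numpy as np

x1 = np.array([1.0, 5.0, 4.0])
x2 = np.array([[1.0, 5.0, 4.0], [5.0, 1.0, 3.0]])

# Reducing 1-D input over its only axis yields a NumPy scalar, not a 0-d array
total_1d = np.add.reduce(x1)
mean_1d = np.mean(x1)

# Reducing 2-D input along one axis yields a 1-D array, one value per row
total_2d = np.add.reduce(x2, axis=1)
mean_2d = np.mean(x2, axis=1)
```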
`vals.data` seems to correctly be a scalar rather than an ndarray. Spitballing, it looks like:
    vals = vals.data
    if fisher:
        return vals - 3
    else:
        return vals
Seems to work.
I've just run the above, and it still gives a float for `fisher=True` and an array (not a masked array) for `fisher=False`.
>>> import numpy as np
>>> from scipy.stats import kurtosis
>>> kurtosis([1, 5, 4, 6, np.nan], nan_policy="omit")
-1.0
>>> kurtosis([1, 5, 4, 6, np.nan], fisher=False, nan_policy="omit")
array(2.)
Would we expect both results to be floats instead? i.e.:
    vals = vals.data
    if fisher:
        return vals - 3
    else:
        return vals + 0
The masked array's `data` attribute accesses the underlying data and returns it as an ndarray.
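Here is a minimal sketch of the proposed ending as a standalone function, assuming `vals` arrives as a 0-d `MaskedArray`. The name `_finish_kurtosis` is hypothetical and does not come from SciPy. `.data` yields a 0-d ndarray, and arithmetic on a 0-d ndarray returns a NumPy scalar. That would explain why `vals - 3` already gave a float while returning `vals` unchanged gave `array(2.)`:

```python
import numpy as np

def _finish_kurtosis(vals, fisher):
    """Hypothetical helper: turn a 0-d masked kurtosis result into a NumPy scalar."""
    vals = vals.data  # plain 0-d ndarray view of the underlying data (mask dropped)
    if fisher:
        return vals - 3  # arithmetic on a 0-d ndarray returns a NumPy scalar
    else:
        return vals + 0  # adding 0 forces the same 0-d -> scalar conversion

# 0-d masked array matching the non-Fisher kurtosis in the session above
raw = np.ma.masked_array(2.0)
with_fisher = _finish_kurtosis(raw, fisher=True)
without_fisher = _finish_kurtosis(raw, fisher=False)
unconverted = raw.data  # what returning vals unchanged would give: a 0-d ndarray
```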
def test_skew_omit_nan(self):
    # Test that skew returns a scalar when nan_policy is omit
    # https://github.com/scipy/scipy/issues/12548
    s = stats.skew([1, 5, 4, 6, np.nan], nan_policy='omit')
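The test needs to check that the result is a scalar and not any kind of array, including a 0-d one. Here is a minimal sketch of such a check. `assert_array_scalar` is a hypothetical helper and is not part of SciPy's or NumPy's testing utilities:

```python
import numpy as np

def assert_array_scalar(value):
    """Hypothetical check: value is a scalar, not an ndarray or MaskedArray."""
    # MaskedArray subclasses ndarray, so this rejects both, including 0-d arrays
    assert not isinstance(value, np.ndarray), f"got array of type {type(value)}"
    assert np.isscalar(value), f"{value!r} is not a scalar"

assert_array_scalar(np.float64(-1.0))  # passes: NumPy scalar

try:
    assert_array_scalar(np.ma.masked_array(-1.0))  # 0-d masked array
except AssertionError:
    rejected_masked = True
else:
    rejected_masked = False
```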
s = stats.skew([1, 5, 4, 6, np.nan], nan_policy='omit')
s = stats.skew([1, 5, 4, 6, np.nan], fisher=True, nan_policy='omit')
This suggestion is primarily to make explicit that both options for the `fisher` arg are tested.
@rlucas7 sorry about taking so long to reply; I've been away for a long time, but I'm trying to get more involved again now. `stats.skew` doesn't actually accept a `fisher` kwarg; only `kurtosis` does. Perhaps this should be changed, but that should be another PR.
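Because only `kurtosis` has a `fisher` option, that is where both settings can be exercised. Here is a NumPy-only sketch for checking expected values. The function `kurtosis_omit_nan` is hypothetical and is not SciPy's implementation. It computes the biased kurtosis `m4 / m2**2` after dropping NaNs. The two settings should differ by exactly 3 and reproduce the values from the REPL session earlier in this thread:

```python
import numpy as np

def kurtosis_omit_nan(x, fisher=True):
    """Hypothetical reference: biased kurtosis of 1-D data with NaNs omitted."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]  # equivalent of nan_policy='omit' for 1-D input
    d = x - x.mean()
    m2 = np.mean(d**2)  # second central moment
    m4 = np.mean(d**4)  # fourth central moment
    k = m4 / m2**2  # Pearson's kurtosis
    return k - 3.0 if fisher else k  # Fisher's definition subtracts 3

data = [1, 5, 4, 6, np.nan]
results = {fisher: kurtosis_omit_nan(data, fisher=fisher) for fisher in (True, False)}
```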
@Kai-Striega thanks for this changeset.
This looks like it fixes the issue and makes masked arrays consistent in shape.
I left one suggestion inline. The rest of the changes seem OK.
Also, can you add something to the docstring to indicate that the skewness is -3 or 0, depending on the truthiness of `fisher`, when n=1?
Something like:
When `a` has 1 record, skew returns -3 or 0 if `fisher` is true or false, respectively.
When `a` is empty, skew returns `nan`.
and similarly for kurtosis?
Other than those small docstring changes, this seems to be good.
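The edge cases described in this review can be sketched for `kurtosis`, the function that takes `fisher`. The sketch assumes that a zero second moment yields 0 before the Fisher shift, which is what the described -3/0 behaviour implies. `kurtosis_edge_sketch` is hypothetical and only mirrors the behaviour described above. It is not the masked implementation:

```python
import numpy as np

def kurtosis_edge_sketch(x, fisher=True):
    """Hypothetical sketch of the edge-case behaviour described in the review."""
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return np.float64(np.nan)  # empty input: no moments, so NaN
    d = x - x.mean()
    m2 = np.mean(d**2)
    m4 = np.mean(d**4)
    # Zero variance (e.g. a single record): report 0 instead of computing 0 / 0
    k = np.float64(0.0) if m2 == 0 else m4 / m2**2
    return k - 3.0 if fisher else k

single_fisher = kurtosis_edge_sketch([7.0], fisher=True)
single_pearson = kurtosis_edge_sketch([7.0], fisher=False)
empty = kurtosis_edge_sketch([], fisher=True)
```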
@Kai-Striega I don't think this is needed now that the decorator has been applied to
Now, in main:
    import numpy as np
    from scipy.stats import kurtosis
    kurtosis([1, 5, 4, 6, np.nan], fisher=False, nan_policy="omit")  # 2.0
Please re-open if you think the masked versions themselves should be fixed, but I think we're working toward deprecating mstats.
Reference issue
fixes #12548
What does this implement/fix?
The masked skew and kurtosis functions have branches where the resulting masked array is not converted to an array scalar, so a masked array is returned, as below:
This PR adds an additional step so that, in the case of a 1-D array, a scalar value is returned.
Additional information
I'm not sure whether this is the correct approach, and I haven't worked in the mstats modules enough to know whether there is a better one. I'd appreciate it if someone with more experience could give their opinion on whether there is a better way to handle it.