ENH: Return array scalars for skew and kurtosis #12634

Kai-Striega · 2020-07-29T12:45:43Z

This PR returns array scalars for the mstats version of skew and kurtosis and adds a test that the returned value is indeed a scalar value.

Reference issue

fixes #12548

What does this implement/fix?

The masked skew and kurtosis functions have branches where the resulting masked array is not converted to an array scalar, so a masked array is returned, as below:

>>> kurtosis([1,5,4,6,nan], fisher=False, nan_policy="omit")
masked_array(data=2.,  #  Correct value but a masked array
             mask=False,
       fill_value=1e+20)

This PR adds an additional step so that, in the case of a 1-d array, a scalar value is returned.

>>> kurtosis([1,5,4,6,nan], fisher=False, nan_policy="omit")
2.0
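A minimal NumPy-only sketch of why an extra arithmetic step can produce a scalar. It assumes the 1-d branch ends up with a 0-d masked array like the one shown above; the array is built directly here rather than by calling scipy. Arithmetic on an unmasked 0-d masked array goes through NumPy's masked binary operation, which returns a plain NumPy scalar.

```python
import numpy as np
import numpy.ma as ma

# Stand-in for the value the mstats branch returns for 1-d input:
# a 0-d masked array (built directly here, not via scipy).
vals = ma.masked_array(2.0)
print(type(vals).__name__, vals.ndim)   # MaskedArray 0

# Arithmetic on an unmasked 0-d masked array hands back a plain NumPy scalar.
scalar = vals + 0
print(type(scalar).__name__, scalar)    # float64 2.0
```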

Additional information

I'm not sure whether this is the correct approach, and I haven't worked in the mstats modules enough to know if there is a better one. I'd appreciate it if someone with more experience could give their opinion on whether there is a better way to handle it.


Kai-Striega · 2020-07-30T10:04:57Z

@WarrenWeckesser I was wondering if you could take a quick look at this? The solution was to change `vals` to `vals + 0`. Since `vals` is a masked array, adding 0 converts it to a scalar if the number of dimensions is 0. It's passing (the failing tests are due to memory errors), but I'm not sure whether it would have been better to just use an if statement.
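For comparison, the explicit if-statement alternative might look like the sketch below. `as_scalar_if_0d` is a hypothetical helper, not part of scipy or NumPy; it assumes the goal is a NumPy scalar for 0-d input and an unchanged array otherwise.

```python
import numpy as np
import numpy.ma as ma


def as_scalar_if_0d(vals):
    """Hypothetical helper: explicit-branch alternative to ``vals + 0``.

    0-d input becomes a NumPy scalar (NaN if the single entry is masked);
    anything with ndim >= 1 is returned unchanged.
    """
    if np.ndim(vals) == 0:
        # filled() returns a plain 0-d ndarray; indexing with () unwraps it.
        return ma.filled(vals, np.nan)[()]
    return vals


print(as_scalar_if_0d(ma.masked_array(2.0)))                        # 2.0
print(as_scalar_if_0d(ma.masked_array(2.0, mask=True)))             # nan
print(type(as_scalar_if_0d(ma.masked_array([1.0, 2.0]))).__name__)  # MaskedArray
```

One difference from `vals + 0`: a masked 0-d entry comes back as NaN here, whereas `vals + 0` returns the `np.ma.masked` constant in that case. Which behavior is preferable is a judgment call.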

sethtroisi · 2020-07-30T10:01:59Z

scipy/stats/mstats_basic.py

@@ -2300,7 +2300,9 @@ def skew(a, axis=0, bias=True):
            m3 = np.extract(can_correct, m3)
            nval = ma.sqrt((n-1.0)*n)/(n-2.0)*m3/m2**1.5
            np.place(vals, can_correct, nval)
-    return vals
+    # Add 0 to ensure a scalar result is returned


Should the docstring at L2280 be updated?

Returns
-------
skewness : ndarray
    The skewness of values along an axis, returning 0 where all values are equal.

It seems that in this case it's returning a scalar instead of an ndarray.

In [30]: skew([[1,5,4],[5,1,3],[4,7,nan]], nan_policy="omit") + 0
Out[30]:
masked_array(data=[-0.52800498, -0.38180177,  0.        ],
             mask=False,
       fill_value=1e+20)
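A small NumPy-only check of the 2-d case, assuming the per-column result is an ordinary 1-d masked array (values copied from the session above, array built directly rather than via scipy). Adding 0 only unwraps 0-d results, so a 1-d result stays a masked array.

```python
import numpy.ma as ma

# Per-column result like the one in the session above
# (values copied from that output; built directly, not via scipy).
vals = ma.masked_array([-0.52800498, -0.38180177, 0.0])
out = vals + 0
print(type(out).__name__, out.shape)   # MaskedArray (3,)
```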

Will the new code handle this case? Maybe `vals.data` is what we actually want?

Edit: Yes, I think the docstring should be updated, but let's finish the rest of the discussion before I make any more changes.

I'm not sure what we want in that case. To me, the problem wasn't the masked array itself; it's that an array is returned at all when there is only a single dimension. It should behave like other ufuncs. I don't have a strong preference either way, but I'd like to be consistent.

`vals.data` seems to correctly be a scalar vs. an ndarray. Spitballing, it looks like:

vals = vals.data
if fisher:
    return vals - 3
else:
    return vals

Seems to work.

I've just run the above, and it still gives a float value for `fisher=True` and an array (not a masked array) for `fisher=False`.

>>> import numpy as np
>>> from scipy.stats import kurtosis
>>> kurtosis([1, 5, 4, 6, np.nan], nan_policy="omit")
-1.0
>>> kurtosis([1, 5, 4, 6, np.nan], fisher=False, nan_policy="omit")
array(2.)

Would we expect both results to be a float instead? i.e.

vals = vals.data
if fisher:
    return vals - 3
else:
    return vals + 0

The masked array's `data` attribute accesses the underlying data and returns it as an ndarray.
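A sketch of what `.data` gives for 1-d input, assuming `vals` is a 0-d masked array at that point (constructed directly here, not taken from the scipy source). It would account for the mixed results in the session above: `.data` is a 0-d ndarray, and only arithmetic on it produces a NumPy scalar.

```python
import numpy as np
import numpy.ma as ma

# Stand-in for ``vals`` inside mstats kurtosis for 1-d input
# (built directly here; not the actual scipy internals).
vals = ma.masked_array(2.0)

data = vals.data
print(type(data).__name__, data.ndim)   # ndarray 0

# Arithmetic on a 0-d ndarray returns a NumPy scalar, which would explain
# why the fisher=True branch (vals - 3) came back as a float while the
# fisher=False branch (plain vals) stayed array(2.).
print(type(data - 3).__name__)          # float64
print(type(data + 0).__name__)          # float64
```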

rlucas7 · 2020-11-15T23:01:31Z

scipy/stats/tests/test_stats.py

+    def test_skew_omit_nan(self):
+        # Test that skew returns a scalar when nan_policy is omit
+        # https://github.com/scipy/scipy/issues/12548
+        s = stats.skew([1, 5, 4, 6, np.nan], nan_policy='omit')


Suggested change

s = stats.skew([1, 5, 4, 6, np.nan], nan_policy='omit')

s = stats.skew([1, 5, 4, 6, np.nan], fisher=True, nan_policy='omit')

This suggestion is primarily to make it explicit that both options for the `fisher` argument are tested.

Kai-Striega · 2021-10-10

@rlucas7 sorry about taking so long to reply; I've been away for a long time, but I'm trying to get more involved again now. `stats.skew` doesn't actually accept a `fisher` kwarg; only `kurtosis` does. Perhaps this should be changed, but that belongs in another PR.

rlucas7

@Kai-Striega thanks for this changeset

This looks like it fixes the issue and makes masked arrays consistent in shape.

I left 1 suggestion inline. The rest of the changes seem ok.

Also, can you add something to the docstring to indicate that, when n=1, the skewness is -3 or 0 depending on the truthiness of `fisher`?

Something like:

When `a` has 1 record, skew returns -3 or 0 if fisher is true or false, respectively.
When `a` is empty skew returns a `nan`.

and similarly for kurtosis?
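A hypothetical sketch of the edge cases the suggested docstring text describes. Because `fisher` is only a `kurtosis` parameter, it is written for kurtosis. `kurtosis_sketch` is an illustrative function, not scipy's implementation; it assumes zero variance yields a Pearson kurtosis of 0 (mirroring the "returning 0 where all values are equal" wording in the skew docstring) and that empty input yields `nan`.

```python
import numpy as np


def kurtosis_sketch(x, fisher=True):
    """Hypothetical illustration, not scipy's code.

    Assumed conventions: empty input -> nan; zero variance -> Pearson
    kurtosis of 0; fisher=True subtracts 3 (excess kurtosis).
    """
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return np.nan
    d = x - x.mean()
    m2 = np.mean(d**2)   # biased second central moment
    m4 = np.mean(d**4)   # biased fourth central moment
    # Assumed convention: report 0 instead of 0/0 when all values are equal.
    pearson = 0.0 if m2 == 0 else m4 / m2**2
    return pearson - 3.0 if fisher else pearson


print(kurtosis_sketch([7.0]), kurtosis_sketch([7.0], fisher=False))  # -3.0 0.0
print(kurtosis_sketch([]))                                           # nan
print(kurtosis_sketch([1, 5, 4, 6], fisher=False))                   # 2.0
```

Under these assumptions, one record gives -3.0 (`fisher=True`) or 0.0 (`fisher=False`), and `[1, 5, 4, 6]` gives 2.0 with `fisher=False`, matching the value in the PR description.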

Other than those small docstring changes this seems to be good.


mdhaber · 2022-02-20T08:16:21Z

@Kai-Striega I don't think this is needed now that the decorator has been applied to skew and kurtosis.

Now in main:

import numpy as np
from scipy.stats import kurtosis
kurtosis([1, 5, 4, 6, np.nan], fisher=False, nan_policy="omit")  # 2.0

Please re-open if you think the masked versions themselves should be fixed, but I think we're working toward deprecating mstats.

ENH: Return array scalars for skew and kurtosis

26305b0


Kai-Striega added the scipy.stats label Jul 29, 2020

Kai-Striega added 3 commits July 29, 2020 21:05

TST: Add test for kurtosis with fisher=False

602e593

ENH: Fix describe(...) new values

9cfb31c

This commit changes ``describe(...)`` to use the new array scalars returned by ``skew(...)`` and ``kurtosis(...)`` instead of the original masked arrays.

Merge branch 'master' of https://github.com/scipy/scipy into fix_mstats_array_scalar

df3545e

sethtroisi reviewed Jul 30, 2020


Kai-Striega added 2 commits July 31, 2020 10:40

DOC: Document float as possible return type

f25c26f

ENH: Use data attribute in skew and kurtosis

e07832f


rlucas7 reviewed Nov 15, 2020


rlucas7 requested changes Nov 15, 2020


mdhaber closed this Feb 20, 2022

Kai-Striega deleted the fix_mstats_array_scalar branch February 20, 2022 08:17
