ENH: stats.ansari: add axis / nan_policy / keepdims support #19371

mdhaber · 2023-10-11T15:59:17Z

Reference issue

What does this implement/fix?

Adds axis / nan_policy / keepdims support to scipy.stats.ansari.

Additional information

One thing I noticed: currently, for N-d input, ansari returns a scalar result. However, that result appears to be meaningless: it differs from what you get with axis=None, i.e. raveling the arrays before passing them to the function. So I think it's safe to make axis=0 the default, since there's no useful behavior to preserve.

rgommers · 2023-10-12T13:26:24Z

The two CI failures are unrelated; I'll open a PR for those now.

tirthasheshpatel · 2023-10-19T15:44:02Z

Slightly modified code to test the ansari func:

import numpy as np
from scipy import stats
from numpy.ma import masked_array as ma

fun = stats.ansari

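def try_case(x, y, message, kwds):
    # Run the decorated function (this PR) and report the result and dtypes.
    try:
        print(message)
        res = fun(x, y, **kwds)
        print(res)
        print(type(res), res.statistic.dtype, res.pvalue.dtype)
    except AttributeError:
        # The result attributes have no dtype (e.g. plain Python floats).
        print(type(res))
    except Exception as e:
        print(f"{type(e)}: {str(e)}")

    # Run again with _no_deco=True, which skips the axis/nan_policy
    # decorator so the output reflects the behavior on master.
    try:
        print("vs (master)")
        res = fun(x, y, **kwds, _no_deco=True)
        print(res)
        print(type(res), res.statistic.dtype, res.pvalue.dtype)
    except AttributeError:
        print(type(res))
    except Exception as e:
        print(f"{type(e)}: {str(e)}")

    print("----")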

np.random.seed(0)
random_array = np.random.rand(4, 6)
random_mask = np.random.rand(4, 6) > 0.8
random_nan_array = random_array.copy()
random_nan_array[random_mask] = np.nan
random_ma = np.ma.masked_array(random_array, mask=random_mask)
random_ma32 = np.ma.masked_array(random_array, mask=random_mask, dtype=np.float32)

cases = [
          ([], [], 'Zero Observations:', {}),
          ([1.], [1.], 'One Observation:', {}),
          ([1., 2.], [1., 2.], 'Two Observations', {}),
          (random_array, random_array, '2D Array, no axis', {}),
          (random_array, random_array, '2D Array, axis=0', {'axis': 0}),
          (random_array, random_array, '2D Array, axis=1', {'axis': 1}),
          (np.matrix(random_array), np.matrix(random_array), 'Matrix, axis=1', {'axis': 1}, ),
          (random_array, random_array, '2D Array, axis=None', {'axis': None}),
          (random_array, random_array, '2D Array, axis=(0,1)', {'axis': (0, 1)}),
          ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, No nan_policy', {}),
          ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, raise', {'nan_policy': 'raise'}),
          ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, propagate', {'nan_policy': 'propagate'}),
          ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, omit', {'nan_policy': 'omit'}),
          ([np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan], '1D Array with All NaNs, omit', {'nan_policy': 'omit'}),
          ([np.nan, np.nan, 3.], [np.nan, np.nan, 3.], '1D Array with all but one NaN, omit', {'nan_policy': 'omit'}),
          (ma([0., 1., 2., 3.], mask=[True, False, False, False]), ma([0., 1., 2., 3.], mask=[True, False, False, False]), '1D Masked array, one masked', {}),
          (ma([1., 2., 3.], mask=[True, True, True]), ma([1., 2., 3.], mask=[True, True, True]), '1D Masked array, all masked', {}),
          (ma([1., 2., 3.], mask=[True, True, False]), ma([1., 2., 3.], mask=[True, True, False]), '1D Masked array, all but one masked', {}),
          (random_nan_array, random_nan_array, '2D Array with NaNs, propagate, axis=1', {'axis': 1, 'nan_policy': 'propagate'}, ),
          (random_nan_array, random_nan_array, '2D Array with NaNs, omit, axis=1', {'axis': 1, 'nan_policy': 'omit'}, ),
          (random_ma, random_ma, '2D Masked array, axis=1', {'axis': 1}, ),
          (random_array.astype(np.float32), random_array.astype(np.float32), '2D Array, float32, axis=1', {'axis': 1}),
          (random_nan_array.astype(np.float32), random_nan_array.astype(np.float32), '2D Array with NaNs, float32, propagate, axis=1', {'axis': 1, 'nan_policy': 'propagate'}, ),
          (random_ma32, random_ma32, '2D Masked Array, float32, axis=1', {'axis': 1}, ),
          ]

for case in cases:
    try_case(*case)

For 2D arrays, Ansari seems to be raveling the arrays, so I think passing default_axis=None in the decorator would resolve this:

2D Array, no axis
AnsariResult(statistic=array([10., 10., 10., 10., 10., 10.]), pvalue=array([1., 1., 1., 1., 1., 1.]))
<class 'scipy.stats._morestats.AnsariResult'> float64 float64
vs (master)
AnsariResult(statistic=-52.0, pvalue=0.41830285225706465)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64

Ansari didn't respect masks before, so different results here:

1D Masked array, one masked
AnsariResult(statistic=6.5, pvalue=0.7670968684102772)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64
vs (master)
AnsariResult(statistic=10.0, pvalue=1.0)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64

1D Masked array, all masked
<class 'ValueError'>: Not enough other observations.
vs (master)
AnsariResult(statistic=6.5, pvalue=0.7670968684102772)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64

1D Masked array, all but one masked
AnsariResult(statistic=1.5, pvalue=0.6547208460185769)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64
vs (master)
AnsariResult(statistic=6.5, pvalue=0.7670968684102772)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64

One dtype discrepancy:

2D Masked Array, float32, axis=1
AnsariResult(statistic=array([15.5, 21. , 15.5, 21. ]), pvalue=array([0.85895492, 1.        , 0.85895492, 1.        ]))
<class 'scipy.stats._morestats.AnsariResult'> float64 float64
vs (master)
<class 'TypeError'>: ansari() got an unexpected keyword argument 'axis'

mdhaber · 2023-10-19T16:00:38Z

> For 2D arrays, Ansari seems to be raveling the arrays, so I think passing default_axis=None in the decorator would resolve this:

Please see top post.

> Ansari didn't respect masks before, so different results here:

Yes, this is to be expected. It's only a problem if it did try to respect masks before but somehow the new behavior is different.

> One dtype discrepancy:

ansari didn't respect dtypes to begin with.

import scipy
print(scipy.__version__)  # 1.11.3
import numpy as np
from scipy import stats
rng = np.random.default_rng(2345265324562)
x = rng.random(6).astype(np.float32)
y = rng.random(7).astype(np.float32)
res = stats.ansari(x, y)
print(res.statistic.dtype, res.pvalue.dtype)  # float64 float64

I didn't plan on fixing dtype issues on this pass.

tirthasheshpatel

Played around a bit with this and can't find any obvious issues. So, LGTM!

ENH: stats.ansari: add axis / nan_policy / keepdims support

106ac4d

mdhaber added the scipy.stats and enhancement labels Oct 11, 2023

mdhaber requested a review from tirthasheshpatel October 13, 2023 11:15

Merge branch 'main' into ansari_anp

eeeae75

mdhaber added this to the 1.12.0 milestone Oct 15, 2023

mdhaber mentioned this pull request Oct 16, 2023

ENH: stats.bartlett: add axis / nan_policy / keepdims support #19392

Merged

tirthasheshpatel approved these changes Oct 19, 2023

tirthasheshpatel merged commit ae390d7 into scipy:main Oct 19, 2023
23 checks passed
