ENH: stats.ansari: add axis / nan_policy / keepdims support #19371

Merged
tirthasheshpatel merged 2 commits into scipy:main on Oct 19, 2023

mdhaber
Contributor

@mdhaber mdhaber commented Oct 11, 2023

Reference issue

Toward gh-14651

What does this implement/fix?

Adds axis / nan_policy / keepdims support to scipy.stats.ansari.

Additional information

The one thing I noticed is that currently, for Nd input, ansari returns a scalar result. However, the result seems to be garbage - it's different from axis=None / raveling the arrays before passing them to the function. So I think it's safe to let axis=0 be the default since there's no useful behavior to maintain.

@mdhaber mdhaber added the scipy.stats and enhancement (A new feature or improvement) labels Oct 11, 2023
@rgommers
Member

Two CI failures are unrelated, I'll open a PR for those now.

@tirthasheshpatel
Member

tirthasheshpatel commented Oct 19, 2023

Slightly modified code to test the ansari func:
import numpy as np
from scipy import stats
from numpy.ma import masked_array as ma

fun = stats.ansari

def try_case(x, y, message, kwds):
    try:
        print(message)
        res = fun(x, y, **kwds)
        print(res)
        print(type(res), res.statistic.dtype, res.pvalue.dtype)
    except AttributeError:
        print(type(res))
    except Exception as e:
        print(f"{type(e)}: {str(e)}")

    try:
        print("vs (master)")
        # _no_deco=True bypasses the axis/nan_policy decorator (pre-PR behavior)
        res = fun(x, y, **kwds, _no_deco=True)
        print(res)
        print(type(res), res.statistic.dtype, res.pvalue.dtype)
    except AttributeError:
        print(type(res))
    except Exception as e:
        print(f"{type(e)}: {str(e)}")

    print("----")

np.random.seed(0)
random_array = np.random.rand(4, 6)
random_mask = np.random.rand(4, 6) > 0.8
random_nan_array = random_array.copy()
random_nan_array[random_mask] = np.nan
random_ma = np.ma.masked_array(random_array, mask=random_mask)
random_ma32 = np.ma.masked_array(random_array, mask=random_mask, dtype=np.float32)

cases = [
          ([], [], 'Zero Observations:', {}),
          ([1.], [1.], 'One Observation:', {}),
          ([1., 2.], [1., 2.], 'Two Observations', {}),
          (random_array, random_array, '2D Array, no axis', {}),
          (random_array, random_array, '2D Array, axis=0', {'axis': 0}),
          (random_array, random_array, '2D Array, axis=1', {'axis': 1}),
          (np.matrix(random_array), np.matrix(random_array), 'Matrix, axis=1', {'axis': 1}, ),
          (random_array, random_array, '2D Array, axis=None', {'axis': None}),
          (random_array, random_array, '2D Array, axis=(0,1)', {'axis': (0, 1)}),
          ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, No nan_policy', {}),
          ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, raise', {'nan_policy': 'raise'}),
          ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, propagate', {'nan_policy': 'propagate'}),
          ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, omit', {'nan_policy': 'omit'}),
          ([np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan], '1D Array with All NaNs, omit', {'nan_policy': 'omit'}),
          ([np.nan, np.nan, 3.], [np.nan, np.nan, 3.], '1D Array with all but one NaN, omit', {'nan_policy': 'omit'}),
          (ma([0., 1., 2., 3.], mask=[True, False, False, False]), ma([0., 1., 2., 3.], mask=[True, False, False, False]), '1D Masked array, one masked', {}),
          (ma([1., 2., 3.], mask=[True, True, True]), ma([1., 2., 3.], mask=[True, True, True]), '1D Masked array, all masked', {}),
          (ma([1., 2., 3.], mask=[True, True, False]), ma([1., 2., 3.], mask=[True, True, False]), '1D Masked array, all but one masked', {}),
          (random_nan_array, random_nan_array, '2D Array with NaNs, propagate, axis=1', {'axis': 1, 'nan_policy': 'propagate'}, ),
          (random_nan_array, random_nan_array, '2D Array with NaNs, omit, axis=1', {'axis': 1, 'nan_policy': 'omit'}, ),
          (random_ma, random_ma, '2D Masked array, axis=1', {'axis': 1}, ),
          (random_array.astype(np.float32), random_array.astype(np.float32), '2D Array, float32, axis=1', {'axis': 1}),
          (random_nan_array.astype(np.float32), random_nan_array.astype(np.float32), '2D Array with NaNs, float32, propagate, axis=1', {'axis': 1, 'nan_policy': 'propagate'}, ),
          (random_ma32, random_ma32, '2D Masked Array, float32, axis=1', {'axis': 1}, ),
          ]

for case in cases:
    try_case(*case)

For 2D arrays, Ansari seems to be raveling the arrays, so I think passing default_axis=None in the decorator would resolve this:

2D Array, no axis
AnsariResult(statistic=array([10., 10., 10., 10., 10., 10.]), pvalue=array([1., 1., 1., 1., 1., 1.]))
<class 'scipy.stats._morestats.AnsariResult'> float64 float64
vs (master)
AnsariResult(statistic=-52.0, pvalue=0.41830285225706465)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64

Ansari didn't respect masks before, so different results here:

1D Masked array, one masked
AnsariResult(statistic=6.5, pvalue=0.7670968684102772)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64
vs (master)
AnsariResult(statistic=10.0, pvalue=1.0)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64
1D Masked array, all masked
<class 'ValueError'>: Not enough other observations.
vs (master)
AnsariResult(statistic=6.5, pvalue=0.7670968684102772)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64
1D Masked array, all but one masked
AnsariResult(statistic=1.5, pvalue=0.6547208460185769)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64
vs (master)
AnsariResult(statistic=6.5, pvalue=0.7670968684102772)
<class 'scipy.stats._morestats.AnsariResult'> float64 float64

One dtype discrepancy:

2D Masked Array, float32, axis=1
AnsariResult(statistic=array([15.5, 21. , 15.5, 21. ]), pvalue=array([0.85895492, 1.        , 0.85895492, 1.        ]))
<class 'scipy.stats._morestats.AnsariResult'> float64 float64
vs (master)
<class 'TypeError'>: ansari() got an unexpected keyword argument 'axis'

@mdhaber
Contributor Author

mdhaber commented Oct 19, 2023

> For 2D arrays, Ansari seems to be raveling the arrays, so I think passing default_axis=None in the decorator would resolve this:

Please see top post.

> Ansari didn't respect masks before, so different results here:

Yes, this is to be expected. It's only a problem if it did try to respect masks before but somehow the new behavior is different.

> One dtype discrepancy:

ansari didn't respect dtypes to begin with.

import scipy
print(scipy.__version__)  # 1.11.3
import numpy as np
from scipy import stats
rng = np.random.default_rng(2345265324562)
x = rng.random(6).astype(np.float32)
y = rng.random(7).astype(np.float32)
res = stats.ansari(x, y)
print(res.statistic.dtype, res.pvalue.dtype)  # float64 float64

I didn't plan on fixing dtype issues on this pass.

Member

@tirthasheshpatel tirthasheshpatel left a comment

Played around a bit with this and can't find any obvious issues. So, LGTM!

@tirthasheshpatel tirthasheshpatel merged commit ae390d7 into scipy:main Oct 19, 2023
23 checks passed

3 participants