ENH: stats.ansari: add axis / nan_policy / keepdims support #19371
Two CI failures are unrelated; I'll open a PR for those now.
Slightly modified code to test the ansari func:
import numpy as np
from scipy import stats
from numpy.ma import masked_array as ma
fun = stats.ansari
def try_case(x, y, message, kwds):
    try:
        print(message)
        res = fun(x, y, **kwds)
        print(res)
        print(type(res), res.statistic.dtype, res.pvalue.dtype)
    except AttributeError:
        print(type(res))
    except Exception as e:
        print(f"{type(e)}: {str(e)}")
    try:
        print("vs (master)")
        res = fun(x, y, **kwds, _no_deco=True)
        print(res)
        print(type(res), res.statistic.dtype, res.pvalue.dtype)
    except AttributeError:
        print(type(res))
    except Exception as e:
        print(f"{type(e)}: {str(e)}")
    print("----")
np.random.seed(0)
random_array = np.random.rand(4, 6)
random_mask = np.random.rand(4, 6) > 0.8
random_nan_array = random_array.copy()
random_nan_array[random_mask] = np.nan
random_ma = np.ma.masked_array(random_array, mask=random_mask)
random_ma32 = np.ma.masked_array(random_array, mask=random_mask, dtype=np.float32)
cases = [
    ([], [], 'Zero Observations:', {}),
    ([1.], [1.], 'One Observation:', {}),
    ([1., 2.], [1., 2.], 'Two Observations', {}),
    (random_array, random_array, '2D Array, no axis', {}),
    (random_array, random_array, '2D Array, axis=0', {'axis': 0}),
    (random_array, random_array, '2D Array, axis=1', {'axis': 1}),
    (np.matrix(random_array), np.matrix(random_array), 'Matrix, axis=1', {'axis': 1}, ),
    (random_array, random_array, '2D Array, axis=None', {'axis': None}),
    (random_array, random_array, '2D Array, axis=(0,1)', {'axis': (0, 1)}),
    ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, No nan_policy', {}),
    ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, raise', {'nan_policy': 'raise'}),
    ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, propagate', {'nan_policy': 'propagate'}),
    ([np.nan, 1., 2., 3.], [np.nan, 1., 2., 3.], '1D Array with NaN, omit', {'nan_policy': 'omit'}),
    ([np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan], '1D Array with All NaNs, omit', {'nan_policy': 'omit'}),
    ([np.nan, np.nan, 3.], [np.nan, np.nan, 3.], '1D Array with all but one NaN, omit', {'nan_policy': 'omit'}),
    (ma([0., 1., 2., 3.], mask=[True, False, False, False]), ma([0., 1., 2., 3.], mask=[True, False, False, False]), '1D Masked array, one masked', {}),
    (ma([1., 2., 3.], mask=[True, True, True]), ma([1., 2., 3.], mask=[True, True, True]), '1D Masked array, all masked', {}),
    (ma([1., 2., 3.], mask=[True, True, False]), ma([1., 2., 3.], mask=[True, True, False]), '1D Masked array, all but one masked', {}),
    (random_nan_array, random_nan_array, '2D Array with NaNs, propagate, axis=1', {'axis': 1, 'nan_policy': 'propagate'}, ),
    (random_nan_array, random_nan_array, '2D Array with NaNs, omit, axis=1', {'axis': 1, 'nan_policy': 'omit'}, ),
    (random_ma, random_ma, '2D Masked array, axis=1', {'axis': 1}, ),
    (random_array.astype(np.float32), random_array.astype(np.float32), '2D Array, float32, axis=1', {'axis': 1}),
    (random_nan_array.astype(np.float32), random_nan_array.astype(np.float32), '2D Array with NaNs, float32, propagate, axis=1', {'axis': 1, 'nan_policy': 'propagate'}, ),
    (random_ma32, random_ma32, '2D Masked Array, float32, axis=1', {'axis': 1}, ),
]
for case in cases:
    try_case(*case)
For 2D arrays, Ansari seems to be raveling the arrays, so I think passing
Ansari didn't respect masks before, so different results here:
One dtype discrepancy:
Please see top post.
Yes, this is to be expected. It's only a problem if it did try to respect masks before but somehow the new behavior is different.
import scipy
print(scipy.__version__) # 1.11.3
import numpy as np
from scipy import stats
rng = np.random.default_rng(2345265324562)
x = rng.random(6).astype(np.float32)
y = rng.random(7).astype(np.float32)
res = stats.ansari(x, y)
print(res.statistic.dtype, res.pvalue.dtype)  # float64 float64
I didn't plan on fixing dtype issues on this pass.
Played around a bit with this and can't find any obvious issues. So, LGTM!
Reference issue
Toward gh-14651
What does this implement/fix?
Adds axis / nan_policy / keepdims support to scipy.stats.ansari.
Additional information
The one thing I noticed is that currently, for Nd input, ansari returns a scalar result. However, the result seems to be garbage - it's different from axis=None / raveling the arrays before passing them to the function. So I think it's safe to let axis=0 be the default since there's no useful behavior to maintain.