Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Fix mstats.kurtosistest, and test coverage for skewtest/normaltest #3008

Merged
merged 6 commits into from

6 participants

Ralf Gommers Coveralls Josef Perktold Warren Weckesser katherinehuang Evgeni Burovski
Ralf Gommers
Owner

Supercedes gh-2673

Coveralls

Coverage Status

Coverage remained the same when pulling 2dabd6d on rgommers:pull-2673 into b7aa678 on scipy:master.

Josef Perktold josef-pkt commented on the diff
scipy/stats/mstats_basic.py
@@ -1682,10 +1681,15 @@ def kurtosistest(a, axis=0):
A = 6.0 + 8.0/sqrtbeta1 * (2.0/sqrtbeta1 + np.sqrt(1+4.0/(sqrtbeta1**2)))
term1 = 1 - 2./(9.0*A)
denom = 1 + x*ma.sqrt(2/(A-4.0))
- denom[denom < 0] = masked
+ if np.ma.isMaskedArray(denom):
+ # For multi-dimensional array input
+ denom[denom < 0] = masked
Josef Perktold Collaborator

I don't see masked defined in this function.

Ralf Gommers Owner
In [1]: np.ma.masked
Out[1]: masked
Warren Weckesser Collaborator

@josef-pkt: see line 45

Josef Perktold Collaborator

I found it finally, I had searched for masked with an additional whitespace

Josef Perktold Collaborator

I don't see from reading the code when denom < 0.
However, I think this could be replaced by n < 5 or add this as additional condition for cases that should be masked.

The same lines are missing in skewtest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Josef Perktold
Collaborator

Change looks good, I'm not sure the functions are good.

Do they behave correctly if the number of valid observations in a column is too small?
(warning is when the approximation of the distribution of the test statistic is not good. If the number of valid observations is very small, then the calculations don't make sense, nobs - 3, ...)

The stats version raise an exception when the sample is too small.
I don't see that the mstats version check this, and I doubt it's implicitly masked.

Josef Perktold
Collaborator
Ralf Gommers
Owner

@EvgeniBurovski sent me a PR to make the small value behavior identical. EDIT: PR merged to here

I didn't intend to close the Statistics Review tickets on these functions yet, just fix them to not be hopelessly broken and finish up gh-2673.

Coveralls

Coverage Status

Coverage remained the same when pulling ae17bb44647dc714d28a2cc66de493ba12e27a2a on rgommers:pull-2673 into b7aa678 on scipy:master.

Josef Perktold
Collaborator

I just prefer if we get functions fixed so we don't have to look at them anymore for a few years, unless it's just a quick one-line fix (which this was initially).

Ralf Gommers
Owner

@josef-pkt I agree in principle, but for me the priority should be to fix up the stats functions first and the mstats versions after that. So I fixed the obvious issues and added decent tests. If you want the full review/doc/fixes then this PR may sit here for a while.

katherinehuang and others added some commits
katherinehuang katherinehuang BUG: fix bug in mstats.kurtosistest
Deleted astype(), because it is not an attribute of int (which could
be returned instead of an ndarray).
7e7aa7d
Ralf Gommers rgommers BUG: fix several bugs in the mstats normality tests.
Also provide decent test coverage.
ff7470e
Ralf Gommers rgommers TST: clean up test_mstats_basic.py for docstrings in tests 2d87ba7
Ralf Gommers rgommers BUG: fix errors in mstats.ttest_rel and mstats.ttest_ind.
Change default of axis kw to 0 in ttest_rel.  This is OK without warning
because the function didn't work before anyway.

Also provide basic test coverage (comparison against nonmasked version).
Testing with various masked array inputs to be done.

Closes gh-3047.
32051ab
Ralf Gommers rgommers BUG: fix issues in mstats.ttest_1samp. Add axis keyword. 9511568
Ralf Gommers
Owner

Rebased and fixed ttest issues in gh-3047 plus a bunch of other ones in the ttest functions.

@josef-pkt I'd like to merge this. Can't fix every last thing about these mstats functions now but that's no good reason to not merge bug fixes that are good to go.

Josef Perktold
Collaborator

The only question: is it "standard" to return nan nan for empty arrays?
(I would have returned empty.)

 +    if a.size == 0 or b.size == 0:
 +        return (np.nan, np.nan)

Otherwise looks fine (I'm not going to look for missing pieces.)

Coveralls

Coverage Status

Coverage remained the same when pulling c175623 on rgommers:pull-2673 into 8594f29 on scipy:master.

Ralf Gommers
Owner

It depends on the function. It was already returning nan (or crashing in a few cases), so I didn't think too hard about it. But I think nan makes sense here. Empty makes sense usually for functions that return an array of the same shape as the input array. Here t, prob are scalars (or 1-D arrays for n-D input), returning empty instead of a scalar result would be odd.

Coveralls

Coverage Status

Coverage remained the same when pulling c175623 on rgommers:pull-2673 into 8594f29 on scipy:master.

Ralf Gommers
Owner

Travis error was a server glitch, can be ignored.

Josef Perktold
Collaborator

I can think of both use cases, where I want nan and where I want empty. But, I don't have currently a use case for empty arrays.

I think then that the masked function should return masked instead of nan in this case, to follow the pattern of mean() versus ma.mean()

>>> np.ma.mean([])
masked
>>> np.ma.sum([])
0.0
>>> np.ma.var([])
0.0
>>> np.ma.std([])
0.0
>>> np.std([])
0.0
>>> np.__version__
'1.6.1'

(I don't know why std is not nan.)

Ralf Gommers
Owner

Currently masked is never returned, also not if all input is masked:

In [32]: x
Out[32]: 
masked_array(data = [-- -- --],
             mask = [ True  True  True],
       fill_value = 1e+20)


In [33]: stats.mstats.ttest_rel(x, x)
Warning: divide by zero encountered in double_scalars
Out[33]: 
(array(1.0),
 masked_array(data = nan,
             mask = False,
       fill_value = 1e+20)
)

np.ma.mean returns nan not masked (1.6.x change was reverted apparently):

In [36]: np.__version__
Out[36]: '1.5.1'

In [37]: np.ma.mean([])
Warning: invalid value encountered in double_scalars
Out[37]: nan


In [1]: np.__version__
Out[1]: '1.9.0.dev-96dd69c'

In [2]: np.ma.mean([])
/home/rgommers/Code/numpy/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/home/rgommers/Code/numpy/numpy/core/_methods.py:65: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)
Out[2]: nan

So stick to nan?

Josef Perktold
Collaborator

Stick to nan.
I don't have an opinion, and would just follow np.ma.

Ralf Gommers
Owner

OK. Thanks for the review.

Ralf Gommers rgommers merged commit b73ec23 into from
Ralf Gommers rgommers deleted the branch
Warren Weckesser
Collaborator

Sorry for being late to the party. In ttest_1samp, returning (nan, nan) is not quite right (imho) for an array of size 0 in the n-dimensional case. An n-dimensional input with n > 1 is a collection of data sets that happens to be stored in an array. When axis is not None, the t and p values returned by ttest_1samp are (n-1)-dimensional arrays, holding the corresponding statistics for each data set in the input. This generalizes down to the edge case where one or more dimensions of the input has length 0.

Assume (nan, nan) is the correct result for a single empty data set, and suppose x has shape (3,0). Computing ttest_1samp(x, 0, axis=1) means we want to apply ttest_1samp to 3 data sets, each of which has length 0. So the result should be ([nan, nan, nan], [nan, nan, nan]) (I'm dropping the array or masked array wrappers, and just showing the expected values).

On the other hand, ttest_1samp(x, 0, axis=0) means we are applying the test to an empty collection of data sets. So in that case, the result should be ([], []).

This is how the regular ttest_1samp works:

In [2]: from scipy.stats import ttest_1samp

In [3]: x = np.zeros((3,0))

In [4]: ttest_1samp(x, 0, axis=1)
/home/warren/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py:55: RuntimeWarning: invalid value encountered in true_divide
  out=ret, casting='unsafe', subok=False)
/home/warren/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py:72: RuntimeWarning: invalid value encountered in true_divide
  out=arrmean, casting='unsafe', subok=False)
/home/warren/local_scipy/lib/python2.7/site-packages/scipy/stats/stats.py:3148: RuntimeWarning: invalid value encountered in true_divide
  denom = np.sqrt(v / float(n))
Out[4]: (array([ nan,  nan,  nan]), array([ nan,  nan,  nan]))

In [5]: ttest_1samp(x, 0, axis=0)
Out[5]: (array([], dtype=float64), array([], dtype=float64))

(A separate issue is to fix the regular ttest_1samp function to get rid of the spurious warnings in this edge case.)

Ralf Gommers
Owner

Thanks Warren. I won't forget about this one (but a bit short on time now).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Nov 10, 2013
  1. katherinehuang Ralf Gommers

    BUG: fix bug in mstats.kurtosistest

    katherinehuang authored rgommers committed
    Deleted astype(), because it is not an attribute of int (which could
    be returned instead of an ndarray).
  2. Ralf Gommers

    BUG: fix several bugs in the mstats normality tests.

    rgommers authored
    Also provide decent test coverage.
  3. Ralf Gommers
  4. Ralf Gommers

    BUG: fix errors in mstats.ttest_rel and mstats.ttest_ind.

    rgommers authored
    Change default of axis kw to 0 in ttest_rel.  This is OK without warning
    because the function didn't work before anyway.
    
    Also provide basic test coverage (comparison against nonmasked version).
    Testing with various masked array inputs to be done.
    
    Closes gh-3047.
  5. Ralf Gommers
  6. Evgeni Burovski Ralf Gommers
This page is out of date. Refresh to see the latest.
74 scipy/stats/mstats_basic.py
View
@@ -49,12 +49,10 @@
import itertools
import warnings
-
-# import scipy.stats as stats
from . import stats
+from . import distributions
import scipy.special as special
import scipy.misc as misc
-# import scipy.stats.futil as futil
from . import futil
@@ -68,23 +66,24 @@
def _chk_asarray(a, axis):
+ # Always returns a masked array, raveled for axis=None
+ a = ma.asanyarray(a)
if axis is None:
a = ma.ravel(a)
outaxis = 0
else:
- a = ma.asanyarray(a)
outaxis = axis
return a, outaxis
def _chk2_asarray(a, b, axis):
+ a = ma.asanyarray(a)
+ b = ma.asanyarray(b)
if axis is None:
a = ma.ravel(a)
b = ma.ravel(b)
outaxis = 0
else:
- a = ma.asanyarray(a)
- b = ma.asanyarray(b)
outaxis = axis
return a, b, outaxis
@@ -763,38 +762,48 @@ def sen_seasonal_slopes(x):
#---- --- Inferential statistics ---
#####--------------------------------------------------------------------------
-def ttest_onesamp(a, popmean):
- a = ma.asarray(a)
- x = a.mean(axis=None)
- v = a.var(axis=None,ddof=1)
- n = a.count(axis=None)
- df = n-1
- svar = ((n-1)*v) / float(df)
- t = (x-popmean)/ma.sqrt(svar*(1.0/n))
- prob = betai(0.5*df,0.5,df/(df+t*t))
- return t,prob
-ttest_onesamp.__doc__ = stats.ttest_1samp.__doc__
-ttest_1samp = ttest_onesamp
+def ttest_1samp(a, popmean, axis=0):
+ a, axis = _chk_asarray(a, axis)
+ if a.size == 0:
+ return (np.nan, np.nan)
+
+ x = a.mean(axis=axis)
+ v = a.var(axis=axis, ddof=1)
+ n = a.count(axis=axis)
+ df = n - 1.
+ svar = ((n - 1) * v) / df
+ t = (x - popmean) / ma.sqrt(svar / n)
+ prob = betai(0.5 * df, 0.5, df / (df + t*t))
+ return t, prob
+ttest_1samp.__doc__ = stats.ttest_1samp.__doc__
+ttest_onesamp = ttest_1samp
def ttest_ind(a, b, axis=0):
a, b, axis = _chk2_asarray(a, b, axis)
+ if a.size == 0 or b.size == 0:
+ return (np.nan, np.nan)
+
(x1, x2) = (a.mean(axis), b.mean(axis))
(v1, v2) = (a.var(axis=axis, ddof=1), b.var(axis=axis, ddof=1))
(n1, n2) = (a.count(axis), b.count(axis))
- df = n1+n2-2
- svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
+ df = n1 + n2 - 2.
+ svar = ((n1-1)*v1+(n2-1)*v2) / df
t = (x1-x2)/ma.sqrt(svar*(1.0/n1 + 1.0/n2)) # N-D COMPUTATION HERE!!!!!!
t = ma.filled(t, 1) # replace NaN t-values with 1.0
- probs = betai(0.5*df,0.5,float(df)/(df+t*t)).reshape(t.shape)
+ probs = betai(0.5 * df, 0.5, df/(df + t*t)).reshape(t.shape)
return t, probs.squeeze()
ttest_ind.__doc__ = stats.ttest_ind.__doc__
-def ttest_rel(a,b,axis=None):
+def ttest_rel(a, b, axis=0):
a, b, axis = _chk2_asarray(a, b, axis)
if len(a) != len(b):
raise ValueError('unequal length arrays')
+
+ if a.size == 0 or b.size == 0:
+ return (np.nan, np.nan)
+
(x1, x2) = (a.mean(axis), b.mean(axis))
(v1, v2) = (a.var(axis=axis, ddof=1), b.var(axis=axis, ddof=1))
n = a.count(axis)
@@ -1650,9 +1659,9 @@ def skewtest(a, axis=0):
b2 = skew(a,axis)
n = a.count(axis)
if np.min(n) < 8:
- warnings.warn(
- "skewtest only valid for n>=8 ... continuing anyway, n=%i" %
- np.min(n))
+ raise ValueError(
+ "skewtest is not valid with less than 8 samples; %i samples"
+ " were given." % np.min(n))
y = b2 * ma.sqrt(((n+1)*(n+3)) / (6.0*(n-2)))
beta2 = (3.0*(n*n+27*n-70)*(n+1)*(n+3)) / ((n-2.0)*(n+5)*(n+7)*(n+9))
W2 = -1 + ma.sqrt(2*(beta2-1))
@@ -1666,7 +1675,11 @@ def skewtest(a, axis=0):
def kurtosistest(a, axis=0):
a, axis = _chk_asarray(a, axis)
- n = a.count(axis=axis).astype(float)
+ n = a.count(axis=axis)
+ if np.min(n) < 5:
+ raise ValueError(
+ "kurtosistest requires at least 5 observations; %i observations"
+ " were given." % np.min(n))
if np.min(n) < 20:
warnings.warn(
"kurtosistest only valid for n>=20 ... continuing anyway, n=%i" %
@@ -1680,10 +1693,15 @@ def kurtosistest(a, axis=0):
A = 6.0 + 8.0/sqrtbeta1 * (2.0/sqrtbeta1 + np.sqrt(1+4.0/(sqrtbeta1**2)))
term1 = 1 - 2./(9.0*A)
denom = 1 + x*ma.sqrt(2/(A-4.0))
- denom[denom < 0] = masked
+ if np.ma.isMaskedArray(denom):
+ # For multi-dimensional array input
+ denom[denom < 0] = masked
Josef Perktold Collaborator

I don't see masked defined in this function.

Ralf Gommers Owner
In [1]: np.ma.masked
Out[1]: masked
Warren Weckesser Collaborator

@josef-pkt: see line 45

Josef Perktold Collaborator

I found it finally, I had searched for masked with an additional whitespace

Josef Perktold Collaborator

I don't see from reading the code when denom < 0.
However, I think this could be replaced by n < 5 or add this as additional condition for cases that should be masked.

The same lines are missing in skewtest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+ elif denom < 0:
+ denom = masked
+
term2 = ma.power((1-2.0/A)/denom,1/3.0)
Z = (term1 - term2) / np.sqrt(2/(9.0*A))
- return Z, (1.0-stats.zprob(Z))*2
+ return Z, 2 * distributions.norm.sf(np.abs(Z))
kurtosistest.__doc__ = stats.kurtosistest.__doc__
8 scipy/stats/stats.py
View
@@ -3150,7 +3150,7 @@ def ttest_1samp(a, popmean, axis=0):
t = np.divide(d, denom)
t, prob = _ttest_finish(df, t)
- return t,prob
+ return t, prob
def _ttest_finish(df,t):
@@ -3251,6 +3251,9 @@ def ttest_ind(a, b, axis=0, equal_var=True):
"""
a, b, axis = _chk2_asarray(a, b, axis)
+ if a.size == 0 or b.size == 0:
+ return (np.nan, np.nan)
+
v1 = np.var(a, axis, ddof=1)
v2 = np.var(b, axis, ddof=1)
n1 = a.shape[axis]
@@ -3335,6 +3338,9 @@ def ttest_rel(a, b, axis=0):
if a.shape[axis] != b.shape[axis]:
raise ValueError('unequal length arrays')
+ if a.size == 0 or b.size == 0:
+ return (np.nan, np.nan)
+
n = a.shape[axis]
df = float(n - 1)
218 scipy/stats/tests/test_mstats_basic.py
View
@@ -14,13 +14,13 @@
from scipy import stats
from numpy.testing import TestCase, run_module_suite
from numpy.ma.testutils import (assert_equal, assert_almost_equal,
- assert_array_almost_equal, assert_array_almost_equal_nulp, assert_)
+ assert_array_almost_equal, assert_array_almost_equal_nulp, assert_,
+ assert_allclose, assert_raises)
class TestMquantiles(TestCase):
- """Regression tests for mstats module."""
def test_mquantiles_limit_keyword(self):
- """Ticket #867"""
+ # Regression test for Trac ticket #867
data = np.array([[6., 7., 1.],
[47., 15., 2.],
[49., 36., 3.],
@@ -64,10 +64,10 @@ def test_2D(self):
actual = mstats.gmean(a)
desired = np.array((1,2,3,4))
assert_array_almost_equal(actual, desired, decimal=14)
- #
+
desired1 = mstats.gmean(a,axis=0)
assert_array_almost_equal(actual, desired1, decimal=14)
- #
+
actual = mstats.gmean(a, -1)
desired = ma.array((np.power(1*2*3*4,1./4.),
np.power(2*3,1./2.),
@@ -83,7 +83,7 @@ def test_1D(self):
assert_almost_equal(actual, desired, decimal=14)
desired1 = mstats.hmean(ma.array(a),axis=-1)
assert_almost_equal(actual, desired1, decimal=14)
- #
+
a = ma.array((1,2,3,4),mask=(0,0,0,1))
actual = mstats.hmean(a)
desired = 3. / (1./1 + 1./2 + 1./3)
@@ -97,7 +97,7 @@ def test_2D(self):
actual = mstats.hmean(a)
desired = ma.array((1,2,3,4))
assert_array_almost_equal(actual, desired, decimal=14)
- #
+
actual1 = mstats.hmean(a,axis=-1)
desired = (4./(1/1.+1/2.+1/3.+1/4.),
2./(1/2.+1/3.),
@@ -160,27 +160,27 @@ def test_pearsonr(self):
assert_almost_equal(p, 1.0/3)
def test_spearmanr(self):
- "Tests some computations of Spearman's rho"
+ # Tests some computations of Spearman's rho
(x, y) = ([5.05,6.75,3.21,2.66],[1.65,2.64,2.64,6.95])
assert_almost_equal(mstats.spearmanr(x,y)[0], -0.6324555)
(x, y) = ([5.05,6.75,3.21,2.66,np.nan],[1.65,2.64,2.64,6.95,np.nan])
(x, y) = (ma.fix_invalid(x), ma.fix_invalid(y))
assert_almost_equal(mstats.spearmanr(x,y)[0], -0.6324555)
- #
+
x = [2.0, 47.4, 42.0, 10.8, 60.1, 1.7, 64.0, 63.1,
1.0, 1.4, 7.9, 0.3, 3.9, 0.3, 6.7]
- y = [22.6, 08.3, 44.4, 11.9, 24.6, 0.6, 5.7, 41.6,
+ y = [22.6, 8.3, 44.4, 11.9, 24.6, 0.6, 5.7, 41.6,
0.0, 0.6, 6.7, 3.8, 1.0, 1.2, 1.4]
assert_almost_equal(mstats.spearmanr(x,y)[0], 0.6887299)
x = [2.0, 47.4, 42.0, 10.8, 60.1, 1.7, 64.0, 63.1,
1.0, 1.4, 7.9, 0.3, 3.9, 0.3, 6.7, np.nan]
- y = [22.6, 08.3, 44.4, 11.9, 24.6, 0.6, 5.7, 41.6,
+ y = [22.6, 8.3, 44.4, 11.9, 24.6, 0.6, 5.7, 41.6,
0.0, 0.6, 6.7, 3.8, 1.0, 1.2, 1.4, np.nan]
(x, y) = (ma.fix_invalid(x), ma.fix_invalid(y))
assert_almost_equal(mstats.spearmanr(x,y)[0], 0.6887299)
def test_kendalltau(self):
- "Tests some computations of Kendall's tau"
+ # Tests some computations of Kendall's tau
x = ma.fix_invalid([5.05, 6.75, 3.21, 2.66,np.nan])
y = ma.fix_invalid([1.65, 26.5, -5.93, 7.96, np.nan])
z = ma.fix_invalid([1.65, 2.64, 2.64, 6.95, np.nan])
@@ -197,7 +197,7 @@ def test_kendalltau(self):
assert_almost_equal(np.asarray(result), [-0.1585188, 0.4128009])
def test_kendalltau_seasonal(self):
- "Tests the seasonal Kendall tau."
+ # Tests the seasonal Kendall tau.
x = [[nan,nan, 4, 2, 16, 26, 5, 1, 5, 1, 2, 3, 1],
[4, 3, 5, 3, 2, 7, 3, 1, 1, 2, 3, 5, 3],
[3, 2, 5, 6, 18, 4, 9, 1, 1,nan, 1, 1,nan],
@@ -220,7 +220,6 @@ def test_pointbiserial(self):
class TestTrimming(TestCase):
def test_trim(self):
- "Tests trimming"
a = ma.arange(10)
assert_equal(mstats.trim(a), [0,1,2,3,4,5,6,7,8,9])
a = ma.arange(10)
@@ -231,12 +230,12 @@ def test_trim(self):
a = ma.arange(10)
assert_equal(mstats.trim(a,limits=(0.1,0.2),relative=True),
[None,1,2,3,4,5,6,7,None,None])
- #
+
a = ma.arange(12)
a[[0,-1]] = a[5] = masked
assert_equal(mstats.trim(a,(2,8)),
[None,None,2,3,4,None,6,7,8,None,None,None])
- #
+
x = ma.arange(100).reshape(10,10)
trimx = mstats.trim(x,(0.1,0.2),relative=True,axis=None)
assert_equal(trimx._mask.ravel(),[1]*10+[0]*70+[1]*20)
@@ -244,7 +243,7 @@ def test_trim(self):
assert_equal(trimx._mask.ravel(),[1]*10+[0]*70+[1]*20)
trimx = mstats.trim(x,(0.1,0.2),relative=True,axis=-1)
assert_equal(trimx._mask.T.ravel(),[1]*10+[0]*70+[1]*20)
- #
+
x = ma.arange(110).reshape(11,10)
x[1] = masked
trimx = mstats.trim(x,(0.1,0.2),relative=True,axis=None)
@@ -255,7 +254,6 @@ def test_trim(self):
assert_equal(trimx.T._mask.ravel(),[1]*20+[0]*70+[1]*20)
def test_trim_old(self):
- "Tests trimming."
x = ma.arange(100)
assert_equal(mstats.trimboth(x).count(), 60)
assert_equal(mstats.trimtail(x,tail='r').count(), 80)
@@ -269,7 +267,6 @@ def test_trim_old(self):
assert_equal(mstats.trimtail(x).count(), 80)
def test_trimmedmean(self):
- "Tests the trimmed mean."
data = ma.array([77, 87, 88,114,151,210,219,246,253,262,
296,299,306,376,428,515,666,1310,2611])
assert_almost_equal(mstats.trimmed_mean(data,0.1), 343, 0)
@@ -277,14 +274,12 @@ def test_trimmedmean(self):
assert_almost_equal(mstats.trimmed_mean(data,(0.2,0.2)), 283, 0)
def test_trimmed_stde(self):
- "Tests the trimmed mean standard error."
data = ma.array([77, 87, 88,114,151,210,219,246,253,262,
296,299,306,376,428,515,666,1310,2611])
assert_almost_equal(mstats.trimmed_stde(data,(0.2,0.2)), 56.13193, 5)
assert_almost_equal(mstats.trimmed_stde(data,0.2), 56.13193, 5)
def test_winsorization(self):
- "Tests the Winsorization of the data."
data = ma.array([77, 87, 88,114,151,210,219,246,253,262,
296,299,306,376,428,515,666,1310,2611])
assert_almost_equal(mstats.winsorize(data,(0.2,0.2)).var(ddof=1),
@@ -318,7 +313,6 @@ class TestMoments(TestCase):
[False, False, True, False, False]], dtype=np.bool))
def test_moment(self):
- # mean((testcase-mean(testcase))**power,axis=0),axis=0))**power))
y = mstats.moment(self.testcase,1)
assert_almost_equal(y,0.0,10)
y = mstats.moment(self.testcase,2)
@@ -329,14 +323,10 @@ def test_moment(self):
assert_almost_equal(y,2.5625)
def test_variation(self):
- #variation = samplestd/mean
-## y = stats.variation(self.shoes[0])
-## assert_almost_equal(y,21.8770668)
y = mstats.variation(self.testcase)
assert_almost_equal(y,0.44721359549996, 10)
def test_skewness(self):
- # sum((testmathworks-mean(testmathworks,axis=0))**3,axis=0)/((sqrt(var(testmathworks)*4/5))**3)/5
y = mstats.skew(self.testmathworks)
assert_almost_equal(y,-0.29322304336607,10)
y = mstats.skew(self.testmathworks,bias=0)
@@ -345,10 +335,8 @@ def test_skewness(self):
assert_almost_equal(y,0.0,10)
def test_kurtosis(self):
- # sum((testcase-mean(testcase,axis=0))**4,axis=0)/((sqrt(var(testcase)*3/4))**4)/4
- # sum((test2-mean(testmathworks,axis=0))**4,axis=0)/((sqrt(var(testmathworks)*4/5))**4)/5
- # Set flags for axis = 0 and
- # fisher=0 (Pearson's definition of kurtosis for compatibility with Matlab)
+ # Set flags for axis = 0 and fisher=0 (Pearson's definition of kurtosis
+ # for compatibility with Matlab)
y = mstats.kurtosis(self.testmathworks,0,fisher=0,bias=1)
assert_almost_equal(y, 2.1658856802973,10)
# Note that MATLAB has confusing docs for the following case
@@ -436,29 +424,19 @@ class TestVariability(TestCase):
testcase = ma.fix_invalid([1,2,3,4,np.nan])
def test_signaltonoise(self):
- """
- this is not in R, so used
- mean(testcase,axis=0)/(sqrt(var(testcase)*3/4)) """
- #y = stats.signaltonoise(self.shoes[0])
- #assert_approx_equal(y,4.5709967)
+ # This is not in R, so used:
+ # mean(testcase, axis=0) / (sqrt(var(testcase)*3/4))
y = mstats.signaltonoise(self.testcase)
assert_almost_equal(y,2.236067977)
def test_sem(self):
- """
- this is not in R, so used
- sqrt(var(testcase)*3/4)/sqrt(3)
- """
- #y = stats.sem(self.shoes[0])
- #assert_approx_equal(y,0.775177399)
+ # This is not in R, so used: sqrt(var(testcase)*3/4) / sqrt(3)
y = mstats.sem(self.testcase)
assert_almost_equal(y,0.6454972244)
def test_zmap(self):
- """
- not in R, so tested by using
- (testcase[i]-mean(testcase,axis=0))/sqrt(var(testcase)*3/4)
- """
+ # This is not in R, so tested by using:
+ # (testcase[i]-mean(testcase,axis=0)) / sqrt(var(testcase)*3/4)
y = mstats.zmap(self.testcase, self.testcase)
desired_unmaskedvals = ([-1.3416407864999, -0.44721359549996,
0.44721359549996, 1.3416407864999])
@@ -466,10 +444,8 @@ def test_zmap(self):
y.data[y.mask == False], decimal=12)
def test_zscore(self):
- """
- not in R, so tested by using
- (testcase[i]-mean(testcase,axis=0))/sqrt(var(testcase)*3/4)
- """
+ # This is not in R, so tested by using:
+ # (testcase[i]-mean(testcase,axis=0)) / sqrt(var(testcase)*3/4)
y = mstats.zscore(self.testcase)
desired = ma.fix_invalid([-1.3416407864999, -0.44721359549996,
0.44721359549996, 1.3416407864999, np.nan])
@@ -479,7 +455,6 @@ def test_zscore(self):
class TestMisc(TestCase):
def test_obrientransform(self):
- "Tests Obrien transform"
args = [[5]*5+[6]*11+[7]*9+[8]*3+[9]*2+[10]*2,
[6]+[7]*2+[8]*4+[9]*9+[10]*16]
result = [5*[3.1828]+11*[0.5591]+9*[0.0344]+3*[1.6086]+2*[5.2817]+2*[11.0538],
@@ -488,7 +463,6 @@ def test_obrientransform(self):
result,4)
def test_kstwosamp(self):
- "Tests the Kolmogorov-Smirnov 2 samples test"
x = [[nan,nan, 4, 2, 16, 26, 5, 1, 5, 1, 2, 3, 1],
[4, 3, 5, 3, 2, 7, 3, 1, 1, 2, 3, 5, 3],
[3, 2, 5, 6, 18, 4, 9, 1, 1,nan, 1, 1,nan],
@@ -504,7 +478,6 @@ def test_kstwosamp(self):
(0.1818,0.6744))
def test_friedmanchisq(self):
- "Tests the Friedman Chi-square test"
# No missing values
args = ([9.0,9.5,5.0,7.5,9.5,7.5,8.0,7.0,8.5,6.0],
[7.0,6.5,7.0,7.5,5.0,8.0,6.0,6.5,7.0,7.0],
@@ -524,7 +497,7 @@ def test_friedmanchisq(self):
def test_regress_simple():
- """Regress a line with sinusoidal noise. Test for #1273."""
+ # Regress a line with sinusoidal noise. Test for #1273.
x = np.linspace(0, 100, 100)
y = 0.2 * np.linspace(0, 100, 100) + 10
y += np.sin(np.linspace(0, 20, 100))
@@ -535,10 +508,147 @@ def test_regress_simple():
def test_plotting_positions():
- """Regression test for #1256"""
+ # Regression test for #1256
pos = mstats.plotting_positions(np.arange(3), 0, 0)
assert_array_almost_equal(pos.data, np.array([0.25, 0.5, 0.75]))
+class TestNormalitytests():
+
+ def test_vs_nonmasked(self):
+ x = np.array((-2,-1,0,1,2,3)*4)**2
+ assert_array_almost_equal(mstats.normaltest(x), stats.normaltest(x))
+ assert_array_almost_equal(mstats.skewtest(x), stats.skewtest(x))
+ assert_array_almost_equal(mstats.kurtosistest(x),
+ stats.kurtosistest(x))
+
+ funcs = [stats.normaltest, stats.skewtest, stats.kurtosistest]
+ mfuncs = [mstats.normaltest, mstats.skewtest, mstats.kurtosistest]
+ x = [1, 2, 3, 4]
+ for func, mfunc in zip(funcs, mfuncs):
+ assert_raises(ValueError, func, x)
+ assert_raises(ValueError, mfunc, x)
+
+ def test_axis_None(self):
+ # Test axis=None (equal to axis=0 for 1-D input)
+ x = np.array((-2,-1,0,1,2,3)*4)**2
+ assert_allclose(mstats.normaltest(x, axis=None), mstats.normaltest(x))
+ assert_allclose(mstats.skewtest(x, axis=None), mstats.skewtest(x))
+ assert_allclose(mstats.kurtosistest(x, axis=None),
+ mstats.kurtosistest(x))
+
+ def test_maskedarray_input(self):
+ # Add some masked values, test result doesn't change
+ x = np.array((-2,-1,0,1,2,3)*4)**2
+ xm = np.ma.array(np.r_[np.inf, x, 10],
+ mask=np.r_[True, [False] * x.size, True])
+ assert_allclose(mstats.normaltest(xm), stats.normaltest(x))
+ assert_allclose(mstats.skewtest(xm), stats.skewtest(x))
+ assert_allclose(mstats.kurtosistest(xm), stats.kurtosistest(x))
+
+ def test_nd_input(self):
+ x = np.array((-2,-1,0,1,2,3)*4)**2
+ x_2d = np.vstack([x] * 2).T
+ for func in [mstats.normaltest, mstats.skewtest, mstats.kurtosistest]:
+ res_1d = func(x)
+ res_2d = func(x_2d)
+ assert_allclose(res_2d[0], [res_1d[0]] * 2)
+ assert_allclose(res_2d[1], [res_1d[1]] * 2)
+
+
+#TODO: for all ttest functions, add tests with masked array inputs
+class TestTtest_rel():
+
+ def test_vs_nonmasked(self):
+ np.random.seed(1234567)
+ outcome = np.random.randn(20, 4) + [0, 0, 1, 2]
+
+ # 1-D inputs
+ res1 = stats.ttest_rel(outcome[:, 0], outcome[:, 1])
+ res2 = mstats.ttest_rel(outcome[:, 0], outcome[:, 1])
+ assert_allclose(res1, res2)
+
+ # 2-D inputs
+ res1 = stats.ttest_rel(outcome[:, 0], outcome[:, 1], axis=None)
+ res2 = mstats.ttest_rel(outcome[:, 0], outcome[:, 1], axis=None)
+ assert_allclose(res1, res2)
+ res1 = stats.ttest_rel(outcome[:, :2], outcome[:, 2:], axis=0)
+ res2 = mstats.ttest_rel(outcome[:, :2], outcome[:, 2:], axis=0)
+ assert_allclose(res1, res2)
+
+ # Check default is axis=0
+ res3 = mstats.ttest_rel(outcome[:, :2], outcome[:, 2:])
+ assert_allclose(res2, res3)
+
+ def test_invalid_input_size(self):
+ assert_raises(ValueError, mstats.ttest_rel,
+ np.arange(10), np.arange(11))
+ x = np.arange(24)
+ assert_raises(ValueError, mstats.ttest_rel,
+ x.reshape(2, 3, 4), x.reshape(2, 4, 3), axis=1)
+ assert_raises(ValueError, mstats.ttest_rel,
+ x.reshape(2, 3, 4), x.reshape(2, 4, 3), axis=2)
+
+ def test_empty(self):
+ res1 = mstats.ttest_rel([], [])
+ assert_(np.all(np.isnan(res1)))
+
+
+class TestTtest_ind():
+
+ def test_vs_nonmasked(self):
+ np.random.seed(1234567)
+ outcome = np.random.randn(20, 4) + [0, 0, 1, 2]
+
+ # 1-D inputs
+ res1 = stats.ttest_ind(outcome[:, 0], outcome[:, 1])
+ res2 = mstats.ttest_ind(outcome[:, 0], outcome[:, 1])
+ assert_allclose(res1, res2)
+
+ # 2-D inputs
+ res1 = stats.ttest_ind(outcome[:, 0], outcome[:, 1], axis=None)
+ res2 = mstats.ttest_ind(outcome[:, 0], outcome[:, 1], axis=None)
+ assert_allclose(res1, res2)
+ res1 = stats.ttest_ind(outcome[:, :2], outcome[:, 2:], axis=0)
+ res2 = mstats.ttest_ind(outcome[:, :2], outcome[:, 2:], axis=0)
+ assert_allclose(res1, res2)
+
+ # Check default is axis=0
+ res3 = mstats.ttest_ind(outcome[:, :2], outcome[:, 2:])
+ assert_allclose(res2, res3)
+
+ def test_empty(self):
+ res1 = mstats.ttest_ind([], [])
+ assert_(np.all(np.isnan(res1)))
+
+
+class TestTtest_1samp():
+
+ def test_vs_nonmasked(self):
+ np.random.seed(1234567)
+ outcome = np.random.randn(20, 4) + [0, 0, 1, 2]
+
+ # 1-D inputs
+ res1 = stats.ttest_1samp(outcome[:, 0], 1)
+ res2 = mstats.ttest_1samp(outcome[:, 0], 1)
+ assert_allclose(res1, res2)
+
+ # 2-D inputs
+ res1 = stats.ttest_1samp(outcome[:, 0], outcome[:, 1], axis=None)
+ res2 = mstats.ttest_1samp(outcome[:, 0], outcome[:, 1], axis=None)
+ assert_allclose(res1, res2)
+ res1 = stats.ttest_1samp(outcome[:, :2], outcome[:, 2:], axis=0)
+ res2 = mstats.ttest_1samp(outcome[:, :2], outcome[:, 2:], axis=0)
+ assert_allclose(res1, res2)
+
+ # Check default is axis=0
+ res3 = mstats.ttest_1samp(outcome[:, :2], outcome[:, 2:])
+ assert_allclose(res2, res3)
+
+ def test_empty(self):
+ res1 = mstats.ttest_1samp([], 1)
+ assert_(np.all(np.isnan(res1)))
+
+
if __name__ == "__main__":
run_module_suite()
Something went wrong with that request. Please try again.