ENH: Add k-sample Anderson-Darling test to stats module #3183

Closed · wants to merge 31 commits

Changes shown are from 2 commits.
95ad13d
ENH: Add k-sample Anderson-Darling test to stats module
joergdietrich Jan 3, 2014
3f421bc
Speed up and implement both version for discrete distributions
joergdietrich Jan 6, 2014
23e1f77
add anderson_ksamp
joergdietrich Feb 2, 2014
5612e42
Replace & with and in citation and add year
joergdietrich Feb 2, 2014
ea78602
Replace another & with and
joergdietrich Feb 2, 2014
f1f3770
fix typo in docstring
joergdietrich Feb 2, 2014
79b9da6
add blank lines after if blocks
joergdietrich Feb 2, 2014
b7023b6
Use ... for line continuation in docstring
joergdietrich Feb 2, 2014
45a7447
Add comment to explain interpolation values
joergdietrich Feb 2, 2014
445febf
capital letter for each parameter description and period at end
joergdietrich Feb 2, 2014
ab1216e
combine a few simple statements
joergdietrich Feb 2, 2014
bb3ec01
Change call signature of anderson_ksamp
joergdietrich Feb 2, 2014
9b0cb39
Add missing word to docstring
joergdietrich Feb 2, 2014
908e281
avoid computation of lj for continuous distributions; fix docstring f…
joergdietrich Feb 3, 2014
6688108
actually re-use saved searchsorted array instead of just saving it an…
joergdietrich Feb 3, 2014
24931cb
More verbose explanation of midrank parameter
joergdietrich Feb 6, 2014
8a92e25
ENH: Add k-sample Anderson-Darling test to stats module
joergdietrich Jan 3, 2014
d9d46af
API: Speed up and implement both version for discrete distributions f…
joergdietrich Jan 6, 2014
b2c5ef9
STY: Replace "&" with "and: in citation and add year in stats.anderso…
joergdietrich Feb 2, 2014
371dcc7
MAINT: fix typo in docstring stats.anderson_ksamp
joergdietrich Feb 2, 2014
b2682cd
STY: add blank lines after if blocks in k-sample Anderson Darling rou…
joergdietrich Feb 2, 2014
b138007
STY: Use ... for line continuation in docstring in stats.anderson_ksamp
joergdietrich Feb 2, 2014
a12c26f
DOC: Add comment to explain interpolation values stats.anderson_ksamp
joergdietrich Feb 2, 2014
f90390d
STY: capital letter for each parameter description and period at end …
joergdietrich Feb 2, 2014
a2a077b
MAINT: combine a few simple statements in stats.anderson_ksamp
joergdietrich Feb 2, 2014
1626e0f
API: Change call signature of stats.anderson_ksamp
joergdietrich Feb 2, 2014
aad0972
MAINT: Add missing word to docstring of stats.anderson_ksamp
joergdietrich Feb 2, 2014
11b5a01
MAINT: avoid computation of lj for continuous distributions in stats.…
joergdietrich Feb 3, 2014
7e8e03c
DOC: More verbose explanation of midrank parameter in stats.anderson_…
joergdietrich Feb 6, 2014
6bce442
Merge branch 'k-sample-AD' of github.com:joergdietrich/scipy into k-s…
joergdietrich Feb 11, 2014
6231a8d
DOC: Adapt calls in example to changed signature from 1626e0f
joergdietrich Feb 23, 2014
1 change: 1 addition & 0 deletions THANKS.txt
@@ -125,6 +125,7 @@ Juan Luis Cano for refactorings in lti, sparse docs improvements and some
Pawel Chojnacki for simple documentation fixes.
Gert-Ludwig Ingold for contributions to special functions.
Joris Vankerschaver for multivariate Gaussian functionality.
Jörg Dietrich for the k-sample Anderson Darling test.


Institutions
7 changes: 7 additions & 0 deletions doc/release/0.14.0-notes.rst
@@ -39,6 +39,13 @@ Multivariate random variables
A new class `scipy.stats.multivariate_normal` with functionality for
multivariate normal random variables has been added.

k-sample Anderson-Darling test
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The new function `scipy.stats.anderson_ksamp` computes the k-sample
Anderson-Darling test for the null hypothesis that k samples come from
the same parent population.
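As a quick illustration of what the new function tests (this sketch is not part of the diff; plain Python with made-up data), the k-sample statistic is built from the pooled ordered sample Z and its distinct values Zstar, in the notation of Scholz and Stephens:

```python
# Hypothetical samples; Z is the pooled ordered sample, Zstar its
# distinct values -- the two arrays the test statistic is computed from.
samples = [[38.7, 41.5, 43.8], [39.2, 39.3, 39.7], [34.0, 35.0, 39.0]]
Z = sorted(x for s in samples for x in s)
Zstar = sorted(set(Z))
k = len(samples)               # number of samples
n = [len(s) for s in samples]  # observations per sample
N = len(Z)                     # total number of observations
```

With no ties in the data, Zstar coincides with Z.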

``scipy.signal`` improvements
-----------------------------

213 changes: 212 additions & 1 deletion scipy/stats/morestats.py
@@ -28,7 +28,7 @@
'boxcox_llf', 'boxcox', 'boxcox_normmax', 'boxcox_normplot',
'shapiro', 'anderson', 'ansari', 'bartlett', 'levene', 'binom_test',
'fligner', 'mood', 'wilcoxon',
'pdf_fromgamma', 'circmean', 'circvar', 'circstd',
'pdf_fromgamma', 'circmean', 'circvar', 'circstd', 'anderson_ksamp'
]


@@ -1131,6 +1131,217 @@ def rootfunc(ab,xj,N):
return A2, critical, sig


def _anderson_ksamp_both(samples, Z, Zstar, k, n, N):
Member: isn't this discrete? it uses midrank

"""
Compute A2akN equation 7 of Scholz & Stephens.

Parameters
----------
samples : array_like
array of sample arrays
Z : array_like
sorted array of all observations
Zstar : array_like
sorted array of unique observations
k : int
number of samples
n : array_like
number of observations in each sample
N : int
total number of observations

Returns
-------
A2aKN : float
The A2aKN statistics of Scholz & Stephens
Member: The & should be escaped, or use a raw docstring. Results in Sphinx warnings (or errors, can't remember) otherwise.

Member: spell it out and plus year

Contributor: This is another case where wider-than-function scope references in sphinx-formatted docstrings would be helpful. Then you could just add a ref link to https://github.com/joergdietrich/scipy/blob/k-sample-AD/scipy/stats/morestats.py#L1267

"""

A2akN = 0.
lj = Z.searchsorted(Zstar, 'right') - Z.searchsorted(Zstar, 'left')
Bj = Z.searchsorted(Zstar) + lj / 2.
Member: same as searchsorted 'left' (default), save and reuse

for i in arange(0, k):
s = np.sort(samples[i])
Mij = s.searchsorted(Zstar, side='right').astype(np.float)
fij = s.searchsorted(Zstar, 'right') - s.searchsorted(Zstar, 'left')
Member: reuse searchsorted 'right'

Mij -= fij / 2.
inner = lj / float(N) * (N * Mij - Bj * n[i])**2 / \
(Bj * (N - Bj) - N * lj / 4.)
A2akN += inner.sum() / n[i]
A2akN *= (N - 1.) / N
return A2akN
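The searchsorted arithmetic above can be mirrored with the stdlib bisect module: for a sorted Z, bisect_right minus bisect_left counts how often a value occurs (lj), and bisect_left plus half that count gives the midrank quantity Bj. A small sketch with illustrative values (not the PR's code), which also shows why the 'left' result can be saved and reused:

```python
from bisect import bisect_left, bisect_right

Z = [1.0, 2.0, 2.0, 2.0, 3.0]   # pooled sorted sample with a tie at 2.0
Zstar = [1.0, 2.0, 3.0]         # distinct values

# lj: multiplicity of each distinct value in Z ('right' minus 'left')
left = [bisect_left(Z, z) for z in Zstar]
lj = [bisect_right(Z, z) - lo for z, lo in zip(Zstar, left)]
# Bj: count of observations below z plus half of z's multiplicity (midrank)
Bj = [lo + l / 2.0 for lo, l in zip(left, lj)]
```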


def _anderson_ksamp_discrete(samples, Z, Zstar, k, n, N):
Member: should this be both? it uses 'right' and not the midrank

"""
Compute A2akN equation 6 of Scholz & Stephens.

Parameters
----------
samples : array_like
array of sample arrays
Z : array_like
sorted array of all observations
Zstar : array_like
sorted array of unique observations
k : int
number of samples
n : array_like
number of observations in each sample
N : int
total number of observations

Returns
-------
A2KN : float
The A2KN statistics of Scholz & Stephens
"""

A2kN = 0.
lj = Z.searchsorted(Zstar[:-1], 'right') - Z.searchsorted(Zstar[:-1],
'left')
Bj = lj.cumsum()
for i in arange(0, k):
s = np.sort(samples[i])
Mij = s.searchsorted(Zstar[:-1], side='right')
inner = lj / float(N) * (N * Mij - Bj * n[i])**2 / (Bj * (N - Bj))
Member: for continuous: my impression is that we should replace lj by 1, and use the original sorted series, not the uniques; for no ties, Z == Zstar. And for discrete (without tie handling) we can take the 'right' count. This would give us some speed up (minus unique and minus two searchsorted).

A2kN += inner.sum() / n[i]
return A2kN
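For intuition, equation 6 can be transliterated into plain Python with stdlib bisect in place of searchsorted (a sketch only, not the PR's vectorized routine):

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

def a2kn_eq6(samples):
    # Equation 6 of Scholz & Stephens (1987): statistic not adjusted
    # for ties, summing over all distinct pooled values but the last.
    Z = sorted(x for s in samples for x in s)
    Zstar = sorted(set(Z))
    N = len(Z)
    lj = [bisect_right(Z, z) - bisect_left(Z, z) for z in Zstar[:-1]]
    Bj = list(accumulate(lj))            # cumulative counts
    total = 0.0
    for s in samples:
        s = sorted(s)
        ni = len(s)
        # Mij = number of observations in sample i not exceeding Zstar_j
        inner = sum(
            l / N * (N * bisect_right(s, z) - b * ni) ** 2 / (b * (N - b))
            for z, l, b in zip(Zstar[:-1], lj, Bj))
        total += inner / ni
    return total

A2kN = a2kn_eq6([[1, 2, 3], [4, 5, 6]])  # two disjoint toy samples
```

For identical samples the statistic collapses to zero, and it grows as the samples separate.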


def anderson_ksamp(samples, discrete=False):
"""The Anderson-Darling test for k-samples.

The k-sample Anderson-Darling test is a modification of the
one-sample Anderson-Darling test. It tests the null hypothesis
that k-samples are drawn from the same population without having
to specify the distribution function of that population. The
critical values depend on the number of samples.

Parameters
----------
samples : array_like
Member: This is actually a sequence of 1-D array_like, right? From the description here you'd think a single 2-D array is needed.

array of sample data in arrays
Member: style nit: can you start each description with a capital letter and end it with a period?

Member: No blank line needed.

discrete : bool, optional
type of Anderson-Darling test which is computed. Default is a test
applicable to discrete and continous distributions.

Returns
-------
Tk : float
Normalized k-sample Anderson-Darling test statistic, not adjusted for
ties
tm : array
Member: Could you use some more descriptive variable names? p is so widely used that it may be OK, but Tk and tm are not. Same comment for the internal variables; they're almost all one-letter, which isn't readable.

Member: Actually, why not name the return values consistently with anderson?

Contributor Author: I changed the return values to be as consistent with anderson as possible. I prefer to keep the internal variable names unchanged. The implementation of methods from the literature can only be understood together with the paper describing the method. In such cases I find it much more helpful to have the variable names in the code match the variable names in the paper, rather than coming up with descriptive names that anybody trying to match the code to the paper then has to translate back.

Member: OK, the paper is available online for free, so for internal variables that should be fine.

The critical values for significance levels 25%, 10%, 5%, 2.5%, 1%
p : float
An approximate significance level at which the null hypothesis for the
provided samples can be rejected

Raises
------
ValueError
If less than 2 samples are provided, a sample is empty, or no
distinct observations are in the samples.

See Also
--------
ks_2samp : 2 sample Kolmogorov-Smirnov test
anderson : 1 sample Anderson-Darling test

Notes
-----
[1]_ Define three versions of the k-sample Anderson-Darling test:
Member: Define --> defines

one for continous distributions and two for discrete
distributions, in which ties between samples may occur. The latter
variant of the test is also applicable to continuous data. By
default, this routine computes the test for continuous and
Member: This statement looks incorrect (only one p-value returned) and would also be strange. Default is continuous only, right?

discrete data. If discrete is set to True, the test for discrete
data is computed. According to [1]_, the two test statistics
Member: insert "discrete" in "two test"

differ only slightly if a few collisions due to round-off errors
occur in the test not adjusted for ties between samples.

.. versionadded:: 0.14.0

References
----------
.. [1] Scholz, F. W & Stephens, M. A. (1987), K-Sample Anderson-Darling
Member: write and instead of &

Tests, Journal of the American Statistical Association, Vol. 82,
pp. 918-924.

Examples:
---------
>>> from scipy import stats
>>> np.random.seed(314159)

The null hypothesis that the two random samples come from the same
distribution can be rejected at the 5% level because the returned
test value is greater than the critical value for 5% (1.961) but
not at the 2.5% level. The interpolation gives an approximate
significance level of 3.1%:

>>> stats.anderson_ksamp(np.random.normal(size=50), \
np.random.normal(loc=0.5, size=30))
Member: Use ... for the line continuation instead of \.

(2.4632469079409978, array([ 0.325, 1.226, 1.961, 2.718, 3.752]),
0.03130207656720708)

The null hypothesis cannot be rejected for three samples from an
identical distribution. The approximate p-value (87%) has to be
computed by extrapolation and may not be very accurate:

>>> stats.anderson_ksamp(np.random.normal(size=50), \
np.random.normal(size=30), np.random.normal(size=20))
(-0.72478622084152444,
array([ 0.44925884, 1.3052767, 1.9434184, 2.57696569, 3.41634856]),
0.8732440333177699)

"""

k = len(samples)
if (k < 2):
raise ValueError("anderson_ksamp needs at least two samples")
Member: blank lines below this and the next couple of if statements

Member: only 2 samples? IIRC other tests need 5 to continue with a warning and 20 for no warning. 2 certainly isn't enough for useful results.

Member: never mind, figured this one out. The wording is a bit confusing; I propose "two sets of samples". And then there should be a check for the number of values per set of samples.

Member: samples is used and defined in the first line of the docstring; 2-samp, k-samp, tests for k samples. I think it should be clear that we mean 2 samples and not 2 observations per sample.

samples = list(map(np.asarray, samples))
Z = np.hstack(samples)
N = Z.size
Z.sort()
Member: I'd combine this with the line above: Z = np.sort(np.hstack(samples)).

Zstar = np.unique(Z)
L = Zstar.size
if not L > 1:
Member: L is only used here, and 'not >' is '<', so I'd rewrite the above two lines as if Zstar.size < 2:

raise ValueError("anderson_ksamp needs more than one distinct "
"observation")
n = np.array([sample.size for sample in samples])
if any(n == 0):
raise ValueError("anderson_ksamp encountered sample without "
"observations")
if discrete:
A2kN = _anderson_ksamp_discrete(samples, Z, Zstar, k, n, N)
else:
A2kN = _anderson_ksamp_both(samples, Z, Zstar, k, n, N)

h = (1. / arange(1, N)).sum()
H = (1. / n).sum()
g = 0
for l in arange(1, N-1):
inner = np.array([1. / ((N - l) * m) for m in arange(l+1, N)])
g += inner.sum()
a = (4*g - 6) * (k - 1) + (10 - 6*g)*H
b = (2*g - 4)*k**2 + 8*h*k + (2*g - 14*h - 4)*H - 8*h + 4*g - 6
c = (6*h + 2*g - 2)*k**2 + (4*h - 4*g + 6)*k + (2*h - 6)*H + 4*h
d = (2*h + 6)*k**2 - 4*h*k
sigmasq = (a*N**3 + b*N**2 + c*N + d) / ((N - 1.) * (N - 2.) * (N - 3.))
Member: this is based on equation 4?

Contributor Author: Yes.

m = k - 1
Tk = (A2kN - m) / math.sqrt(sigmasq)

b0 = np.array([0.675, 1.281, 1.645, 1.96, 2.326])
b1 = np.array([-0.245, 0.25, 0.678, 1.149, 1.822])
b2 = np.array([-0.105, -0.305, -0.362, -0.391, -0.396])
Member: This deserves a comment above the line b0 =, otherwise it's unclear what the magic numbers mean (and yes, I did figure it out from the docstring after some frowning).

tm = b0 + b1 / math.sqrt(m) + b2 / m
pf = np.polyfit(tm, log(np.array([0.25, 0.1, 0.05, 0.025, 0.01])), 2)
if Tk < tm.min() or Tk > tm.max():
warnings.warn("approximate p-value will be computed by extrapolation")
Member: Is this warning needed? It shows up in most of the test cases, so I'm guessing it's not that uncommon (didn't check). If so, adding a note in the docstring might make more sense. If the warning has to be kept, it shouldn't show up in the test output (it can be silenced within a with warnings.catch_warnings() block if needed).

Member: I'm not sure what the best pattern for cases like this is. I don't know how good the extrapolation is; it might have quite a large error in some ranges. I have something similar for tables of p-values without extrapolation:

  • mention only in the docstring the range of extrapolation (it's just lower precision than interpolation)
  • keep the warning as here
  • truncate (without extrapolation some packages, and some of my functions, just return the boundary value 0.25 or 0.01; for text return it would be '<0.01' or '>0.25')

For most use cases the exact p-value outside [0.01, 0.25] doesn't really matter, and just mentioning it in the docstring would be enough. But I guess there would be multiple-testing applications where smaller p-values are relevant and users need to be aware that those are not very precise.

Contributor Author: I don't think the quality of the interpolation is known. Scholz & Stephens vary the polynomial order depending on the number of samples and provide no guidance for what a general procedure should use. The test cases are taken from Scholz and Stephens and happen to be cases where the null hypothesis can be rejected at better than the 1% level. Given the unknown level of accuracy I'd prefer to keep the warning, unless there's a strong preference to move it to the docstring.

Member: OK, that's fine with me then.

Member: warning is fine with me too. I don't have a strong opinion given I don't know how good the extrapolation is.

p = math.exp(np.polyval(pf, Tk))
return Tk, tm, p
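The normalization of A2kN above divides by the standard deviation from equation 4, whose variance is assembled from the harmonic-type sums h, H, and g. A plain-Python sketch of just those sums (toy sample sizes; helper name is made up), mirroring the loops in the diff:

```python
def hHg(n, N):
    # h, H, g of Scholz & Stephens eq. 4: h sums 1/i over the pooled
    # sample, H sums reciprocals of sample sizes, and g is the double
    # sum over pairs (l, m) with 1 <= l < m < N.
    h = sum(1.0 / i for i in range(1, N))
    H = sum(1.0 / ni for ni in n)
    g = sum(1.0 / ((N - l) * m)
            for l in range(1, N - 1)
            for m in range(l + 1, N))
    return h, H, g

h, H, g = hHg([2, 2], 4)   # two samples of size 2, N = 4
```

These three numbers then feed the a, b, c, d coefficients and sigmasq in the routine above.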


def ansari(x,y):
"""
Perform the Ansari-Bradley test for equal scale parameters
107 changes: 106 additions & 1 deletion scipy/stats/tests/test_morestats.py
@@ -10,7 +10,7 @@
from numpy.random import RandomState
from numpy.testing import (TestCase, run_module_suite, assert_array_equal,
assert_almost_equal, assert_array_less, assert_array_almost_equal,
assert_raises, assert_, assert_allclose, assert_equal, dec)
assert_raises, assert_, assert_allclose, assert_equal, dec, assert_warns)

from scipy import stats

@@ -83,6 +83,111 @@ def test_bad_arg(self):
assert_raises(ValueError, stats.anderson, [1], dist='plate_of_shrimp')


class TestAndersonKSamp(TestCase):
def test_example1a(self):
# Example data from Scholz & Stephens (1987), originally
# published in Lehmann (1995, Nonparametrics, Statistical
# Methods Based on Ranks, p. 309)
# Pass a mixture of lists and arrays
t1 = [38.7, 41.5, 43.8, 44.5, 45.5, 46.0, 47.7, 58.0]
t2 = np.array([39.2, 39.3, 39.7, 41.4, 41.8, 42.9, 43.3, 45.8])
t3 = np.array([34.0, 35.0, 39.0, 40.0, 43.0, 43.0, 44.0, 45.0])
t4 = np.array([34.0, 34.8, 34.8, 35.4, 37.2, 37.8, 41.2, 42.8])
Tk, tm, p = assert_warns(UserWarning, stats.anderson_ksamp, (t1, t2,
t3, t4), discrete=True)
assert_almost_equal(Tk, 4.449, 3)
assert_array_almost_equal([0.4985, 1.3237, 1.9158, 2.4930, 3.2459],
tm, 4)
assert_almost_equal(p, 0.0021, 4)
Member: Are the Tk and p-values in the unit tests from Scholz and Stephens or "regression test" numbers?

Contributor Author: The Tk values are from Scholz and Stephens. The p-values differ by one at the last digit because I used a second-order polynomial instead of a linear one. The choice was motivated by looking at the interpolation of the two test cases.


def test_example1b(self):
# Example data from Scholz & Stephens (1987), originally
# published in Lehmann (1995, Nonparametrics, Statistical
# Methods Based on Ranks, p. 309)
# Pass arrays
t1 = np.array([38.7, 41.5, 43.8, 44.5, 45.5, 46.0, 47.7, 58.0])
t2 = np.array([39.2, 39.3, 39.7, 41.4, 41.8, 42.9, 43.3, 45.8])
t3 = np.array([34.0, 35.0, 39.0, 40.0, 43.0, 43.0, 44.0, 45.0])
t4 = np.array([34.0, 34.8, 34.8, 35.4, 37.2, 37.8, 41.2, 42.8])
Tk, tm, p = assert_warns(UserWarning, stats.anderson_ksamp, (t1, t2,
t3, t4), discrete=False)
assert_almost_equal(Tk, 4.480, 3)
assert_array_almost_equal([0.4985, 1.3237, 1.9158, 2.4930, 3.2459],
tm, 4)
assert_almost_equal(p, 0.0020, 4)

def test_example2a(self):
# Example data taken from an earlier technical report of
# Scholz and Stephens
# Pass lists instead of arrays
t1 = [194, 15, 41, 29, 33, 181]
t2 = [413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201, 118]
t3 = [34, 31, 18, 18, 67, 57, 62, 7, 22, 34]
t4 = [90, 10, 60, 186, 61, 49, 14, 24, 56, 20, 79, 84, 44, 59, 29,
118, 25, 156, 310, 76, 26, 44, 23, 62]
t5 = [130, 208, 70, 101, 208]
t6 = [74, 57, 48, 29, 502, 12, 70, 21, 29, 386, 59, 27]
t7 = [55, 320, 56, 104, 220, 239, 47, 246, 176, 182, 33]
t8 = [23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21, 42, 20, 5,
12, 120, 11, 3, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52, 95]
t9 = [97, 51, 11, 4, 141, 18, 142, 68, 77, 80, 1, 16, 106, 206, 82,
54, 31, 216, 46, 111, 39, 63, 18, 191, 18, 163, 24]
t10 = [50, 44, 102, 72, 22, 39, 3, 15, 197, 188, 79, 88, 46, 5, 5, 36,
22, 139, 210, 97, 30, 23, 13, 14]
t11 = [359, 9, 12, 270, 603, 3, 104, 2, 438]
t12 = [50, 254, 5, 283, 35, 12]
t13 = [487, 18, 100, 7, 98, 5, 85, 91, 43, 230, 3, 130]
t14 = [102, 209, 14, 57, 54, 32, 67, 59, 134, 152, 27, 14, 230, 66,
61, 34]
Tk, tm, p = assert_warns(UserWarning, stats.anderson_ksamp,
(t1, t2, t3, t4, t5, t6, t7, t8, t9, t10,
t11, t12, t13, t14), discrete=True)
assert_almost_equal(Tk, 3.288, 3)
assert_array_almost_equal([0.5990, 1.3269, 1.8052, 2.2486, 2.8009],
tm, 4)
assert_almost_equal(p, 0.0041, 4)

def test_example2b(self):
# Example data taken from an earlier technical report of
# Scholz and Stephens
t1 = [194, 15, 41, 29, 33, 181]
t2 = [413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201, 118]
t3 = [34, 31, 18, 18, 67, 57, 62, 7, 22, 34]
t4 = [90, 10, 60, 186, 61, 49, 14, 24, 56, 20, 79, 84, 44, 59, 29,
118, 25, 156, 310, 76, 26, 44, 23, 62]
t5 = [130, 208, 70, 101, 208]
t6 = [74, 57, 48, 29, 502, 12, 70, 21, 29, 386, 59, 27]
t7 = [55, 320, 56, 104, 220, 239, 47, 246, 176, 182, 33]
t8 = [23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21, 42, 20, 5,
12, 120, 11, 3, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52, 95]
t9 = [97, 51, 11, 4, 141, 18, 142, 68, 77, 80, 1, 16, 106, 206, 82,
54, 31, 216, 46, 111, 39, 63, 18, 191, 18, 163, 24]
t10 = [50, 44, 102, 72, 22, 39, 3, 15, 197, 188, 79, 88, 46, 5, 5, 36,
22, 139, 210, 97, 30, 23, 13, 14]
t11 = [359, 9, 12, 270, 603, 3, 104, 2, 438]
t12 = [50, 254, 5, 283, 35, 12]
t13 = [487, 18, 100, 7, 98, 5, 85, 91, 43, 230, 3, 130]
t14 = [102, 209, 14, 57, 54, 32, 67, 59, 134, 152, 27, 14, 230, 66,
61, 34]
Tk, tm, p = assert_warns(UserWarning, stats.anderson_ksamp,
(t1, t2, t3, t4, t5, t6, t7, t8, t9, t10,
t11, t12, t13, t14), discrete=False)
assert_almost_equal(Tk, 3.294, 3)
assert_array_almost_equal([0.5990, 1.3269, 1.8052, 2.2486, 2.8009],
tm, 4)
assert_almost_equal(p, 0.0041, 4)
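The critical values asserted in the tests above come from the interpolation tm = b0 + b1/sqrt(m) + b2/m with m = k - 1, using the coefficient arrays from the diff. A plain-Python sketch (the helper name is made up):

```python
import math

# Coefficients from the diff, for significance levels
# 25%, 10%, 5%, 2.5%, 1%.
b0 = [0.675, 1.281, 1.645, 1.96, 2.326]
b1 = [-0.245, 0.25, 0.678, 1.149, 1.822]
b2 = [-0.105, -0.305, -0.362, -0.391, -0.396]

def critical_values(k):
    # tm = b0 + b1/sqrt(m) + b2/m with m = k - 1 samples.
    m = k - 1
    return [a + b / math.sqrt(m) + c / m for a, b, c in zip(b0, b1, b2)]

tm = critical_values(4)   # four samples, as in test_example1a above
```

For k = 4 this reproduces the [0.4985, 1.3237, 1.9158, 2.4930, 3.2459] array checked in the tests.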

def test_not_enough_samples(self):
assert_raises(ValueError, stats.anderson_ksamp, np.ones(5))

def test_no_distinct_observations(self):
assert_raises(ValueError, stats.anderson_ksamp,
(np.ones(5), np.ones(5)))

def test_empty_sample(self):
assert_raises(ValueError, stats.anderson_ksamp, (np.ones(5), []))


class TestAnsari(TestCase):

def test_small(self):