Add t-test with unequal variances, bug fix for edge case on all tests #235

Closed · wants to merge 4 commits
Contributor

MrCreosote commented Jun 3, 2012

Added method for t-test with unequal variances. ttest_ind assumes
population variances are equal.

Refactor common code in all 4 t-test methods to _ttest_ind_setup and
_ttest_finish.

Fix bug in all 4 methods. Previous implementation would, if the
difference in means and the denominator of the t-statistic were zero,
set the t-statistic equal to 1. If the difference in means is zero, the
t-statistic should always be zero. Code and respective tests changed to
reflect this.

Added regression tests and tests that compare output to results of R
t.test() calls.
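
For reference, a minimal sketch of the statistic the new method computes (the standalone function name welch_ttest is illustrative only; the PR's actual addition is ttest_ind_uneq_var in scipy/stats/stats.py):

import numpy as np
from scipy import stats

def welch_ttest(a, b):
    # Welch's t-test: no equal-variance assumption.
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    v1, v2 = a.var(ddof=1), b.var(ddof=1)   # unbiased sample variances
    n1, n2 = a.size, b.size
    vn1, vn2 = v1 / n1, v2 / n2
    t = (a.mean() - b.mean()) / np.sqrt(vn1 + vn2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (vn1 + vn2) ** 2 / (vn1 ** 2 / (n1 - 1) + vn2 ** 2 / (n2 - 1))
    p = stats.t.sf(np.abs(t), df) * 2       # two-sided p-value
    return t, p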

@MrCreosote MrCreosote Added method for t-test with unequal variances
311e440
Contributor

MrCreosote commented Jun 3, 2012

Gmane discussion here and here.

Member

josef-pkt commented Jun 3, 2012

0/0=1 is a feature not a bug. http://article.gmane.org/gmane.comp.python.scientific.devel/9622
The only alternative I like is returning NaN's. I sent an email to scipy-user about returning nans.

I think it would be better just to add equal_var as a keyword argument (as in R) instead of creating new functions.

(sorry about the delay, I haven't used R in a while and didn't have time to test the changes)

Contributor

MrCreosote commented Jun 3, 2012

> 0/0=1 is a feature not a bug. http://article.gmane.org/gmane.comp.python.scientific.devel/9622
> The only alternative I like is returning NaN's. I sent an email to scipy-user about returning nans.
The problem with returning t = 0/0 = 1 is that fundamentally, it's the wrong answer. If someone, for whatever reason, uses a large alpha as their cut-off, they'll erroneously reject the null hypothesis and decide that identical arrays do not have the same mean. I think the return value has to be either 0 or NaN. R returns NaN, as you know.

0 makes more sense to me as it's the right answer - you wind up with p = 1, i.e. there's 100% certainty that the arrays have the same means. That being said, I think NaN is an acceptable answer.
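
As a quick sanity check of that arithmetic (the two-sided p-value is computed as 2 * sf(|t|), so t = 0 gives p = 1 for any df):

>>> from scipy import stats
>>> 2 * stats.t.sf(0.0, 10)
1.0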

> I think it would be better just to add equal_var as a keyword argument (as in R) instead of creating new functions.
If we're modeling the t-test package after R, shouldn't all four t-test functions be combined in one large function with keywords determining the t-test type? I added a function for 2 sample unequal variances as all the other test types (paired, 1 sample, 2 sample equal variances) had their own functions as well. It seems odd to me that the Welch test wouldn't have its own function but the other 3 types would.

Personally, I prefer 4 separate functions rather than keywords, but that's just a matter of style.

> (sorry about the delay, I haven't used R in a while and didn't have time to test the changes)
No worries.

Member

josef-pkt commented Jun 3, 2012

I find your style of arguing fundamentally annoying.

If you have a reference that states that 0/0 = 0 is the "right" answer in this case, then I would much appreciate it if you told me. Then I wouldn't have to worry anymore about what kind of model is appropriate in a zero-probability event!

Contributor

MrCreosote commented Jun 3, 2012

Whoa, OK. It certainly wasn't my intent to be offensive or annoying. In my mind, I was trying to concisely explain my understanding of the problem, but I suppose I came off as dogmatic - sorry.

As you probably know, there don't seem to be any authoritative references that deal with this particular case. I spent a chunk of time looking for one when I was confused by the t = 0/0 = 1 code, and just took another quick look and was unable to find anything.

Anyway, at the risk of annoying you further (hopefully I can avoid doing so this time), here's another attempt at resolving the issue.

Given two data samples with identical means, do you agree that in the context of a 2 sample t-test, the null hypothesis should never be rejected, regardless of variance? If not, could you please explain why, as, to be quite honest, this appears to be uncontroversial to me. That will help me understand our differences on the issue.

Thanks.

Member

josef-pkt commented Jun 3, 2012

To reject the null, alpha would have to be > 0.317, and I don't see anyone using that.

The main point that I was arguing on the mailing list (with myself, the first time, before I changed to 0/0=1) is that exactly equal is a zero-probability event, given that the underlying distribution is assumed to be normal. The results should be continuous for some sequence that goes to 0/0. The limit that I decided on is that it should be the same as adding a tiny noise, that is:

the 0/0 case should be close to this result (for any small disturbance like 1e-50 here)

for n in range(1, 20): print(2**n, stats.ttest_ind(np.zeros(2**n), np.hstack(([1e-50], np.zeros(2**n - 1)))))
Owner

rgommers commented Jun 6, 2012

Are we going to be able to resolve 0/0? It seems that NaN is the only thing that no one finds unreasonable.

Did you look at the rest, Josef? I agree one function to unify all t-tests would be nice, but that could be done later.

@rgommers rgommers commented on an outdated diff Jun 9, 2012

scipy/stats/stats.py
return t, prob
+def ttest_ind_uneq_var(a, b, axis=0):
+ """Calculates the T-test for the means of TWO INDEPENDENT samples of scores with
+ unequal variances (or situations where it is unknown if the variances are equal).
@rgommers

rgommers Jun 9, 2012

Owner

This needs to be rephrased such that the summary fits on one line. The rest has to be added to the next paragraph.

@rgommers rgommers and 1 other commented on an outdated diff Jun 9, 2012

scipy/stats/stats.py
return t, prob
+def ttest_ind_uneq_var(a, b, axis=0):
+ """Calculates the T-test for the means of TWO INDEPENDENT samples of scores with
+ unequal variances (or situations where it is unknown if the variances are equal).
+
+ This is a two-sided test for the null hypothesis that 2 independent samples with unequal variances
+ have identical average (expected) values.
+
+ Parameters
+ ----------
+ a, b : sequence of ndarrays
@rgommers

rgommers Jun 9, 2012

Owner

a and b are each a single array, not a sequence of arrays. This type descriptor should be array_like.

@MrCreosote

MrCreosote Jun 9, 2012

Contributor

This needs to be checked/fixed for all 4 methods.

@rgommers rgommers and 1 other commented on an outdated diff Jun 9, 2012

scipy/stats/stats.py
+ we observe a large p-value, for example larger than 0.05 or 0.1,
+ then we cannot reject the null hypothesis of identical average scores.
+ If the p-value is smaller than the threshold, e.g. 1%, 5% or 10%,
+ then we reject the null hypothesis of equal averages.
+
+ References
+ ----------
+
+ http://en.wikipedia.org/wiki/T-test#Independent_two-sample_t-test
+
+
+ Examples
+ --------
+
+ >>> from scipy import stats
+ >>> import numpy as np
@rgommers

rgommers Jun 9, 2012

Owner

import numpy as np is not necessary to include in any example; it's assumed to already be available.

@MrCreosote

MrCreosote Jun 9, 2012

Contributor

Again needs to be checked/fixed for all 4 methods.

@rgommers rgommers commented on an outdated diff Jun 9, 2012

scipy/stats/stats.py
+ Examples
+ --------
+
+ >>> from scipy import stats
+ >>> import numpy as np
+
+ >>> #fix seed to get the same result
+ >>> np.random.seed(12345678)
+
+ test with sample with identical means
+
+ >>> rvs1 = stats.norm.rvs(loc=5,scale=10,size=500)
+ >>> rvs2 = stats.norm.rvs(loc=5,scale=10,size=500)
+ >>> stats.ttest_ind(rvs1,rvs2)
+ (0.26833823296239279, 0.78849443369564776)
+ >>> stats.ttest_ind(rvs1,rvs2)
@rgommers

rgommers Jun 9, 2012

Owner

The exact same as 2 lines above.

@rgommers rgommers and 1 other commented on an outdated diff Jun 9, 2012

scipy/stats/stats.py
+ >>> stats.ttest_ind(rvs1, rvs5)
+ (-3.6578361690197507, 0.00026764922615141298)
+ >>> stats.ttest_ind_uneq_var(rvs1, rvs5)
+ (-3.6578361690197512, 0.00026764941377692148)
+
+ >>> rvs6 = stats.norm.rvs(loc=8, scale=20, size=500)
+ >>> stats.ttest_ind(rvs1, rvs6)
+ (-3.3134712196511447, 0.00095456933477067706)
+ >>> stats.ttest_ind_uneq_var(rvs1, rvs6)
+ (-3.3134712196511447, 0.00096749377255367309)
+
+ >>> rvs7 = stats.norm.rvs(loc=8, scale=20, size=100)
+ >>> stats.ttest_ind(rvs1, rvs7)
+ (-0.28592251716234818, 0.77503648327796393)
+ >>> stats.ttest_ind_uneq_var(rvs1, rvs7)
+ (-0.18159713446343606, 0.85623941474785026)
@rgommers

rgommers Jun 9, 2012

Owner

Better to remove a few examples, I think; this is quite long. How about keeping one example per point you want to make? So keep rvs1, rvs2, rvs3, rvs5.

@MrCreosote

MrCreosote Jun 9, 2012

Contributor

There are 6 scenarios I wanted to explain in the documentation:

  1. equal means equal variances (rvs2)
  2. equal means unequal variance (rvs3)
  3. equal means unequal variance unequal n (rvs4) <- this difference is pretty drastic depending on n. If n1=n2, then the t-statistic is the same whether you use Welch's t-test or the standard 2 sample independent t-test. If n1 != n2 the t-statistic can be very different. I think this is an important point.

rvs5, 6, and 7 are repeating scenarios 1, 2, 3, with unequal means.

I'd suggest keeping rvs4, but we could lose two of 5, 6, and 7. I'd keep 7, I suppose.

I've clarified the documentation to explain the point I was trying to make.

@rgommers

rgommers Jun 9, 2012

Owner

Removing 5 and 6 sounds good. I would put some more of what you want to get across, like your note 3) above, into words. That's much clearer than letting the user draw the conclusion from a lot of numbers.

@rgommers rgommers and 1 other commented on an outdated diff Jun 9, 2012

scipy/stats/tests/test_stats.py
+
+ # check vs. R
+ a = (1, 2, 3)
+ b = (1.1, 2.9, 4.2)
+ pr = 0.53619490753126731
+ tr = -0.68649512735572582
+ t, p = stats.ttest_ind_uneq_var(a, b)
+ assert_array_almost_equal([t,p], [tr, pr])
+
+ a = (1, 2, 3, 4)
+ pr = 0.84354139131608286
+ tr = -0.2108663315950719
+ t, p = stats.ttest_ind_uneq_var(a, b)
+ assert_array_almost_equal([t,p], [tr, pr])
+
+ #regression test
@rgommers

rgommers Jun 9, 2012

Owner

Regression test for what bug/ticket?

@MrCreosote

MrCreosote Jun 9, 2012

Contributor

I just copied this from the equal-variance t-test test code. I assumed it was just marking that these were general regression tests against the current behavior. I can take the line out if people wish.

@rgommers

rgommers Jun 9, 2012

Owner

Ah, okay. Not a big deal, so let's leave it as is then.

@rgommers rgommers commented on an outdated diff Jun 9, 2012

scipy/stats/stats.py
return t, prob
+def ttest_ind_uneq_var(a, b, axis=0):
+ """Calculates the T-test for the means of TWO INDEPENDENT samples of scores with
+ unequal variances (or situations where it is unknown if the variances are equal).
+
+ This is a two-sided test for the null hypothesis that 2 independent samples with unequal variances
@rgommers

rgommers Jun 9, 2012

Owner

Line should be wrapped to <80 characters.

Owner

rgommers commented Jun 9, 2012

There's indeed quite a large overlap with the other t-tests. Since there are already 3 separate functions, I think adding this as a 4th is OK, and then later adding one new function to unify them, with equal_var=False as the default, as discussed on the mailing list.

Member

josef-pkt commented Jun 9, 2012

I would find it much easier to read the code if this were just one function, not a one-function wrapper around 3 or 4 main functions plus additional helper functions.
My argument for reducing the number of functions was also about reducing the separation and splitting up of code.

Using the helper functions separates out crucial parts of the code and makes it harder to check what's going on.
For example, I'd rather have prob = distributions.t.sf(np.abs(t), df) * 2 as boilerplate in each function than have to go look up what distribution and what tail is used.

(Aside: when I rewrote ttest initially, I added many examples because I was using them like a doctest to check that it works. I now think this was overkill for docstrings.)

@josef-pkt josef-pkt and 1 other commented on an outdated diff Jun 9, 2012

scipy/stats/stats.py
+ return t, prob
+
+def _ttest_ind_setup(a, b, axis):
+ a, b, axis = _chk2_asarray(a, b, axis)
+ v1 = np.var(a, axis, ddof=1)
+ v2 = np.var(b, axis, ddof=1)
+ n1 = a.shape[axis]
+ n2 = b.shape[axis]
+ return v1, n1, v2, n2, a, axis, b
+
+def _ttest_finish(df,t):
+ prob = distributions.t.sf(np.abs(t), df) * 2 #use np.abs to get upper tail
#distributions.t.sf currently does not propagate nans
#this can be dropped, if distributions.t.sf propagates nans
#if this is removed, then prob = prob[()] needs to be removed
prob = np.where(np.isnan(t), np.nan, prob)
@josef-pkt

josef-pkt Jun 9, 2012

Member

I think this part can be dropped now. IIRC, the distributions now propagate nans; this needs checking:

>>> stats.t.sf([1,0, np.nan], 5)
array([ 0.18160873,  0.5       ,  0.        ])   #last should be nan
>>> import scipy
>>> scipy.__version__
'0.9.0'
@MrCreosote

MrCreosote Jun 9, 2012

Contributor

In 0.10.1, if t = nan then p = 0, so assuming I understand the issue correctly, this needs to stay.

@josef-pkt

josef-pkt Jun 9, 2012

Member

Yes, then it needs to stay. I thought this had changed in distributions. (I'm not able to build scipy, so I cannot check the latest code.)

@MrCreosote

MrCreosote Jun 9, 2012

Contributor

Dangit, sorry, I had imported the release version, not the dev version. You're right, this can go.

Owner

rgommers commented Jun 9, 2012

Josef, your comment on readability makes sense, but can you clarify what, to you, would be the desired end result for t-tests?

Right now we have ttest_1samp, ttest_ind and ttest_rel.

Once we agree on where we want to go, we can figure out the best way to get there, with the least amount of work for users.

Member

josef-pkt commented Jun 9, 2012

The easiest in terms of keeping related code together, and what I would have done, is just to add the keyword to ttest_ind and add an if statement that sets denom and df.
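
A rough sketch of that shape (names and structure are illustrative only, not the eventual patch):

import numpy as np
from scipy import stats

def ttest_ind_sketch(a, b, axis=0, equal_var=True):
    a, b = np.asarray(a), np.asarray(b)
    v1, v2 = np.var(a, axis, ddof=1), np.var(b, axis, ddof=1)
    n1, n2 = a.shape[axis], b.shape[axis]
    if equal_var:
        # standard test: pooled variance, df = n1 + n2 - 2
        df = n1 + n2 - 2
        svar = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        denom = np.sqrt(svar * (1.0 / n1 + 1.0 / n2))
    else:
        # Welch: unpooled variance, Welch-Satterthwaite df
        vn1, vn2 = v1 / n1, v2 / n2
        df = (vn1 + vn2) ** 2 / (vn1 ** 2 / (n1 - 1) + vn2 ** 2 / (n2 - 1))
        denom = np.sqrt(vn1 + vn2)
    t = np.divide(np.mean(a, axis) - np.mean(b, axis), denom)  # 0/0 -> nan
    return t, stats.t.sf(np.abs(t), df) * 2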

I was reading the SAS manual for ttest again. (They have one more option, COCHRAN, but I don't think it's a good addition.) ttest_rel could just delegate the calculations to ttest_1samp on the differences.
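
That delegation would be essentially a one-liner; a sketch, assuming the existing ttest_1samp signature:

import numpy as np
from scipy import stats

def ttest_rel_sketch(a, b, axis=0):
    # Paired test == one-sample test of the pairwise differences against 0.
    return stats.ttest_1samp(np.asarray(a) - np.asarray(b), 0, axis=axis)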

Merging all of them like R would be possible, but _1samp, _ind and _rel are 3 different problem categories (different chapters or sections in books), and we have more 1samp versus 2samp functions in stats.

Member

josef-pkt commented Jun 9, 2012

Just as detail: SAS has one procedure TTEST with options.

The sections in the SAS documentation are: One-Sample Design, Paired Design, Two-Independent-Sample Design, AB/BA Crossover Design and TOST Equivalence Test. The last two are not available yet in scipy or statsmodels.

Owner

rgommers commented Jun 9, 2012

Besides ttest there's only kstest and ks_2samp as far as I can see.

Member

josef-pkt commented Jun 9, 2012

I thought there were more.
In many cases the 2-sample versus k-sample versions of tests have different names, which makes it difficult to keep an overview of the functions: zmap and zscore are kind of one- versus 2-sample, entropy has an optional second argument, f_oneway is the k-sample version of the ttest, and some non-parametric tests have both 2-sample and k-sample versions.

(the never finished stats review)

Owner

rgommers commented Jun 9, 2012

Ah, yes. Hard to see that now. There's a tiny bit of info in http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html, but it would be good to have a complete overview at some point.

Member

josef-pkt commented Jun 9, 2012

Ah, the never finished tutorial. :) At "Comparing two samples" I ran out of patience or time.
scipy.stats.stats has an overview list as a source comment, but it's not updated and not complete.

Owner

rgommers commented Jun 9, 2012

Now that we're talking about tests, I also found a Mann-Whitney U fixup lying around. I think it's mostly done; I will submit it in a minute.

Contributor

MrCreosote commented Jun 9, 2012

I'm changing the return value at 0/0 to NaN for all four functions as we speak, but before I can fix anything else, we need to decide what the interface to the four types of t-test is going to look like. Given that Ralf wants to tag a beta in 5 hours, I'd advocate keeping the changes as minimal as possible. I'll commit to changing the interface to whatever the list agrees on (within reason) for the next version, if you wish.

So who is going to make the decision and how is it going to get made?

Member

josef-pkt commented Jun 9, 2012

Ralf is the release manager.

I would stick with the current 3 functions, as functions by "problem topic", and just add the option for unequal variance and change 0/0 to np.nan for all 3 functions.

Then, add a single ttest function in the future that defaults to the unequal-variance test (equal_var=False).

The only disadvantage is that we cannot change the equal_var default for ttest_ind without a deprecation.

Contributor

MrCreosote commented Jun 9, 2012

Josef said:
> Ralf is the release manager.
>
> I would stick with the current 3 functions, as functions by "problem topic", and just add the option for unequal variance and change 0/0 to np.nan for all 3 functions.
>
> Then, add a single ttest function in the future that defaults to the unequal-variance test (equal_var=False).
>
> The only disadvantage is that we cannot change the equal_var default for ttest_ind without a deprecation.

OK, if Ralf has the final call, then as soon as he decides what we're going to do, I'll go do it. I'm not religious on this issue.

Contributor

MrCreosote commented Jun 9, 2012

Actually, here's what I suggest:

Since we're running out of time and the current PR has four methods (and everything is fixed at this point other than the documentation), let's go with the four separate methods for this release. That at least gets the functionality into the upcoming release.

For the next release I'll write a combined function with keyword args paired=False, equal_var=False, and mu=None to select the test type, roughly as sketched below. The four original functions will be converted to deprecated stubs that call the single function.
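
Something along these lines (a hypothetical sketch only, assuming the equal_var keyword from this PR; this is not merged code):

from scipy import stats

def ttest(a, b=None, paired=False, equal_var=False, mu=None, axis=0):
    # Hypothetical unified front end dispatching to the existing functions.
    if b is None:
        return stats.ttest_1samp(a, 0.0 if mu is None else mu, axis=axis)
    if paired:
        return stats.ttest_rel(a, b, axis=axis)
    return stats.ttest_ind(a, b, axis=axis, equal_var=equal_var)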

As soon as I get the word from Ralf I'll fix my code to match and push the commit.

Owner

rgommers commented Jun 9, 2012

Everyone agrees on 0/0 --> nan now I think, so that one is fine.

As for who decides, the decision should be taken on the mailing list with everyone with an interest in the topic agreeing. I'm only going to make the branch tomorrow night.

If we're not going for the R-like interface (4-in-1), then I agree with Josef that just adding a keyword equal_var is the way to go. Actually, if we do, it doesn't matter in the long run anyway. So, new proposal: go with equal_var. It seems to be the path of least resistance, and then we can get this done by tomorrow.

Contributor

MrCreosote commented Jun 9, 2012

Ok. I'm assuming that the default is equal_var = True.

Owner

rgommers commented Jun 9, 2012

Yes, it has to be, so as not to break backwards compatibility.

Whether or not to aim for a change to False later can be discussed on the mailing list.

@MrCreosote MrCreosote Merge ttest_ind and ttest_ind_uneq_var functions, t = 0 / 0 = NaN
Merged the 2 sample independent t-test functions & added equal_var kw
arg.
t = 0 / 0 now returns NaN in all 3 t-test functions.
Cleaned up the docs slightly in all 3 functions.
Changed tests to reflect new behavior.
df54831
Contributor

MrCreosote commented Jun 9, 2012

Ok, this commit should address all the issues thus far, I think. Let me know if I missed anything.

@rgommers rgommers commented on an outdated diff Jun 9, 2012

scipy/stats/stats.py
This is a two-sided test for the null hypothesis that 2 independent samples
- have identical average (expected) values.
+ with have identical average (expected) values.
@rgommers

rgommers Jun 9, 2012

Owner

This sentence doesn't parse anymore.

@rgommers rgommers commented on an outdated diff Jun 9, 2012

scipy/stats/stats.py
The arrays must have the same shape, except in the dimension
corresponding to `axis` (the first, by default).
axis : int, optional
Axis can equal None (ravel array first), or an integer (the axis
over which to operate on a and b).
+ equal_var : boolean, optional
+ The default is True, which performs a standard indepentent 2 sample test
@rgommers

rgommers Jun 9, 2012

Owner

boolean --> bool

indepentent --> independent

@rgommers rgommers commented on an outdated diff Jun 9, 2012

scipy/stats/stats.py
- d = (a-b).astype('d')
- v = np.var(d,axis,ddof=1)
+ d = (a - b).astype('d')
@rgommers

rgommers Jun 9, 2012

Owner

astype('d') --> astype(np.float64) would be more readable.

Contributor

MrCreosote commented Jun 9, 2012

Arrg. Fixed.

When we return nan, the np.where check is not necessary anymore; 0./0. is automatically np.nan. The only problem would be if dm and denom were both Python floats; using t = np.divide(dm, denom) forces the use of the numpy function, which knows about nans.
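
A quick illustration of the plain-float versus numpy behavior:

>>> import numpy as np
>>> # 0.0 / 0.0 with plain Python floats raises ZeroDivisionError
>>> np.divide(0.0, 0.0)   # numpy division: emits a RuntimeWarning, returns nan
nan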

Member

josef-pkt commented Jun 10, 2012

Add http://en.wikipedia.org/wiki/Welch%27s_t_test to the references for ttest_ind.

I think this looks good now (except for the redundant where check).

@MrCreosote MrCreosote Remove redundant np.where() checks from t-tests
0 / 0 is already NaN in numpy, so no need to check for 0 / 0 cases.
Changed / to np.divide for Python floats, which will then behave the same.
Small docs fix.
a3b3a35
Contributor

MrCreosote commented Jun 10, 2012

I think we're done at this point. Yes?

Owner

rgommers commented Jun 10, 2012

Yes, looks good now. Merged in fd91947. Thanks Gavin and Josef.

@rgommers rgommers closed this Jun 10, 2012

Owner

rgommers commented Jun 10, 2012

I'll send an update to the mailing list. We still need to resolve whether or not to add a single ttest method that combines the other ones, and whether or not to change the default to equal_var=False.

Owner

rgommers commented Jun 10, 2012

Also made some style changes in 9e0353a and updated the release notes and author list for this PR.
