Default value of equal_var parameter should be False (scipy.stats.ttest_ind) #10889

Closed
anandna123 opened this issue Oct 1, 2019 · 5 comments
Labels
needs-decision (Items that need further discussion before they are merged or closed), scipy.stats

Comments

@anandna123

anandna123 commented Oct 1, 2019

Is your feature request related to a problem? Please describe.
I was looking into ttest_ind and saw that the default value of the equal_var parameter is True, which is appropriate only when the population variances are equal.

Describe the solution you'd like
Since in most cases the variances will differ, doesn't it make sense to change the default value of the equal_var parameter to False, perhaps in the next version? Users without a statistical or mathematical background are often unaware that the value should be False if the variances are different.

Describe alternatives you've considered
Backward compatibility shouldn't be an issue for newer versions, I believe. Please let me know your thoughts...

Additional context (e.g. screenshots)

@anandna123 anandna123 changed the title Default value of equal_var paramenter should be False (scipy.stats.ttest_ind) Default value of equal_var parameter should be False (scipy.stats.ttest_ind) Oct 1, 2019
@tylerjereddy
Contributor

Backward compatibility shouldn't be an issue for newer versions

It looks to me like changing the default value of that kwarg would break backward compatibility. Not saying it can't be done if stats people agree though. But it would typically have to be worth the disruption it might cause.

@anandna123
Author

anandna123 commented Oct 14, 2019

Hey @tylerjereddy, thanks for the comment.

I said that backward compatibility won't be affected because I believe the two settings (equal_var = True or False) lead to the same, or a very similar, t-score if the variances are actually equal.

The only time the Student's t-test result (equal_var = True) would differ from the Welch's t-test result (equal_var = False) is when the variances differ, and in that case Student's t-test shouldn't be used. People who are aware of this will already have set equal_var = False (where the variances are unequal or unknown); those who weren't aware are relying on a default that is wrong for their data. So the proposed change is either backward compatible or it changes the result to the correct one. Please let me know if my understanding is incorrect.
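To make the comparison concrete, here is a minimal sketch of the two statistics using the textbook formulas and only the Python standard library. It is not SciPy's implementation; the helper names `student_t` and `welch_t` and all the sample values are made up for illustration. It also shows one detail worth keeping in mind: with equal sample sizes the two t statistics coincide (only the degrees of freedom differ), so the gap shows up mainly when both the variances and the sample sizes are unequal.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


def welch_t(a, b):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    se = math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return (mean(a) - mean(b)) / se, df


# Hypothetical data: a small, high-variance sample and a larger, low-variance one.
noisy = [1.0, 9.5, 3.2, 12.1]
tight = [4.1, 5.3, 6.0, 5.5, 4.8, 5.9, 5.1, 4.6, 5.7, 5.0]

# Unequal sizes and unequal variances: the two statistics disagree.
t_student, df_student = student_t(noisy, tight)
t_welch, df_welch = welch_t(noisy, tight)
print(f"unequal n: Student t={t_student:.3f} (df={df_student}), "
      f"Welch t={t_welch:.3f} (df={df_welch:.2f})")

# Equal sizes: the t statistics match even though the variances differ.
t_student_eq, _ = student_t(noisy, tight[:4])
t_welch_eq, _ = welch_t(noisy, tight[:4])
print(f"equal n:   Student t={t_student_eq:.3f}, Welch t={t_welch_eq:.3f}")
```

Even when the statistics match, the p-values can still differ because Welch's test uses fewer degrees of freedom.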

My 2 cents is that changing the default (to equal_var = False) would probably give the right results in most future cases. It would also be a step in the right direction, and it could be accompanied by a clear notice in the documentation. This is why it feels like it's worth the disruption the change might cause.

I would let the stats people decide which is best and the right thing to do...

Thanks,
Anand

(Edited a few times to make some snippets more explicit; apologies for the multiple edits.)

@tylerjereddy
Contributor

I would let the stats people decide which is best and the right thing to do

sounds good

@mdhaber mdhaber added the needs-decision Items that need further discussion before they are merged or closed label Dec 11, 2020
@mdhaber
Contributor

mdhaber commented Dec 11, 2020

@josef-pkt I see you have worked with ttest_ind for a long time, so I thought you might be a good person to ask about this?

@mdhaber
Contributor

mdhaber commented Dec 20, 2020

To maintain backwards compatibility, we can't really change this unless we were to create a new function, and I don't think that's warranted here. Thanks for the suggestion @anandna123. Do let us know if there's something we can do to make the documentation even clearer; we wouldn't want the existing default to cause users to make mistakes.
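For readers landing here: since the default stays at equal_var=True, one way to avoid the mistake discussed above is to always pass equal_var explicitly. A short sketch, assuming SciPy is installed; the sample values are made up for illustration.

```python
from scipy import stats

# Hypothetical data: a small, high-variance sample and a larger, low-variance one.
noisy = [1.0, 9.5, 3.2, 12.1]
tight = [4.1, 5.3, 6.0, 5.5, 4.8, 5.9, 5.1, 4.6, 5.7, 5.0]

# Default call: equal_var=True, i.e. Student's t-test with a pooled variance.
student = stats.ttest_ind(noisy, tight)

# Explicit Welch's t-test: does not assume equal population variances.
welch = stats.ttest_ind(noisy, tight, equal_var=False)

print(f"Student: t={student.statistic:.3f}, p={student.pvalue:.3f}")
print(f"Welch:   t={welch.statistic:.3f}, p={welch.pvalue:.3f}")
```

Spelling out equal_var in every call also makes the choice of test visible to whoever reviews the analysis later.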

@mdhaber mdhaber closed this as completed Dec 20, 2020