ENH: stats: Add new ANOVA functions (WIP) #13783

Closed
wants to merge 2 commits into from


WarrenWeckesser
Member

@WarrenWeckesser WarrenWeckesser commented Apr 1, 2021

Expand the capabilities for analysis of variance in SciPy.

This is a very rough draft; issues include a rough (i.e. inconsistent and incomplete) API, no parameter validation tests (but lots of basic tests), incomplete docstrings, and more. I want to get what I have so far in a PR and run it through CI and the docs build.

@WarrenWeckesser WarrenWeckesser marked this pull request as draft April 1, 2021 02:40
@WarrenWeckesser WarrenWeckesser added the enhancement and scipy.stats labels Apr 1, 2021
@mdhaber
Contributor

mdhaber commented Apr 1, 2021

Please let me know what you do to fix test_warning_calls_filters!

@mdhaber
Contributor

mdhaber commented Apr 5, 2021

Also, when you are ready for this to start being reviewed, can we split this up into smaller PRs (ideally one per function)?

@mdhaber mdhaber requested a review from josef-pkt March 10, 2023 18:47
                high=means[i] + delta)))
        return cis

    def deltas(self, confidence_level=0.95):
Member

I wouldn't add those.
They will be misleading. Users should call tukey_hsd instead.
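
For context, a minimal sketch of the alternative suggested here, using `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later) after an omnibus `f_oneway` test. The three groups are made-up illustrative data, not taken from this PR.

```python
import numpy as np
from scipy import stats

# Illustrative data only: three made-up groups, not taken from the PR.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=1.0, size=12)
group_b = rng.normal(loc=10.5, scale=1.0, size=12)
group_c = rng.normal(loc=12.0, scale=1.0, size=12)

# Omnibus test: are all group means equal?
omnibus = stats.f_oneway(group_a, group_b, group_c)

# Pairwise comparisons with the family-wise error rate controlled.
result = stats.tukey_hsd(group_a, group_b, group_c)
ci = result.confidence_interval(confidence_level=0.95)

print(f"F = {omnibus.statistic:.3f}, p = {omnibus.pvalue:.4f}")
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"groups {i} vs {j}: p = {result.pvalue[i, j]:.4f}, "
          f"95% CI = [{ci.low[i, j]:.3f}, {ci.high[i, j]:.3f}]")
```

The intervals returned by `confidence_interval` are simultaneous intervals for the differences in means, which is what separates them from per-group intervals computed one at a time.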

@josef-pkt
Member

oneway looks fine, but anova users will look at pairwise comparisons and not think about multiple testing problems.

I never looked at two-way anova. It sounds too painful.
(Statsmodels supports multiway anova only for regression models)
(I just looked again at two- and three-way MANOVA, and the interpretation of main and interaction effects is tricky. There are two long issues, one on why statsmodels' results differ from SPSS and one on why they differ from R, each of which took me several days to figure out. :)
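
To put a rough number on the multiple testing concern raised above, here is a small sketch of how the chance of at least one false positive grows with the number of pairwise comparisons. The helper `familywise_error_rate` is hypothetical, and the formula assumes independent tests, which pairwise comparisons that share groups are not, so treat the result as an approximation only.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Return (number of pairwise tests, approximate family-wise error rate).

    Assumes the tests are independent. Pairwise comparisons that share
    groups are correlated, so this is a rough guide, not an exact value.
    """
    n_comparisons = comb(n_groups, 2)
    return n_comparisons, 1.0 - (1.0 - alpha) ** n_comparisons


for k in (3, 5, 10):
    m, fwer = familywise_error_rate(k)
    print(f"{k} groups -> {m} comparisons, FWER ~ {fwer:.3f}")
```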

@WarrenWeckesser
Member Author

WarrenWeckesser commented Mar 10, 2023

Thanks @josef-pkt. Obviously this PR has stagnated for too long.

I've been tempted to close this, and instead work with statsmodels so that it is the place to go for ANOVA and its variants. It would be good to avoid duplication if we can, and statsmodels already has quite a bit more than SciPy's f_oneway. @josef-pkt, what do you think?

@tupui
Member

tupui commented Mar 13, 2023

I would support moving this work to statsmodels. (If we do, we should also update the roadmap.)

@josef-pkt
Member

I'm not really interested in two-way anova since we support general anova hypothesis testing through OLS. Similarly, MANOVA is only a special case of multivariate tests in the multivariate linear model.
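
As an illustration of ANOVA as a hypothesis test on a linear model, the sketch below reproduces the one-way F statistic by comparing a dummy-coded least-squares fit against an intercept-only fit. It uses plain NumPy rather than the statsmodels API, the data are made up, and the helper `rss` is hypothetical. The same nested-model comparison extends to two-way designs by adding more dummy columns.

```python
import numpy as np
from scipy import stats

# Illustrative data only: three made-up groups of equal size.
rng = np.random.default_rng(1)
groups = [rng.normal(loc=mu, scale=1.0, size=8) for mu in (0.0, 0.5, 1.5)]

y = np.concatenate(groups)
labels = np.repeat(np.arange(len(groups)), [len(g) for g in groups])

# Full model: intercept plus one dummy column per non-reference group.
X_full = np.column_stack(
    [np.ones_like(y)]
    + [(labels == k).astype(float) for k in range(1, len(groups))]
)
# Restricted model: intercept only, i.e. all group means are equal.
X_null = np.ones((y.size, 1))


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid


# F test for the nested models.
df_num = X_full.shape[1] - X_null.shape[1]
df_den = y.size - X_full.shape[1]
f_ols = ((rss(X_null, y) - rss(X_full, y)) / df_num) / (rss(X_full, y) / df_den)
p_ols = stats.f.sf(f_ols, df_num, df_den)

# Reference: the classical one-way ANOVA.
f_ref, p_ref = stats.f_oneway(*groups)
print(f"OLS: F = {f_ols:.4f}, p = {p_ols:.4f}")
print(f"f_oneway: F = {f_ref:.4f}, p = {p_ref:.4f}")
```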

I had added a lot of support for oneway anovas, but I wouldn't want to do the same for two-way as a standalone function, unless there are some extras for two-way that I'm not (yet) aware of.
https://www.statsmodels.org/dev/stats.html#oneway-anova

Two-way still fits in scipy.stats. It's requested every once in a while.

The main reason for me to add standalone hypothesis tests that can also be done with models is better small sample statistics.
e.g. Satterthwaite degrees of freedom for t-test and oneway. Models almost exclusively rely on the generic and asymptotic p-values, which are not very good in many cases for small or very small samples.
e.g. statsmodels/statsmodels#8727
That's a reason to support special design matrices, but my background is not good enough yet to know what to do.
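
For readers unfamiliar with the small-sample correction mentioned above, here is a minimal sketch of the Welch-Satterthwaite degrees of freedom for the two-sample t-test with unequal variances. The function name `welch_satterthwaite` is hypothetical and the code uses only the standard library. It is not the statsmodels or SciPy implementation.

```python
from statistics import mean, variance


def welch_satterthwaite(x, y):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n1, n2 = len(x), len(y)
    # Squared standard errors of the two sample means.
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / (v1 + v2) ** 0.5
    # Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Illustrative data only.
t, df = welch_satterthwaite([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
print(f"t = {t:.4f}, df = {df:.4f}")
```

The resulting degrees of freedom always lie between min(n1, n2) - 1 and n1 + n2 - 2, and they reach the upper bound only when the two squared standard errors and the two sample sizes are equal.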

A related request is within and between effects in repeated measures anova. statsmodels has univariate repeated measures anova but it's restrictive, either because of the theory or because of our understanding. I never understood much in this area. (The literature works a lot with sum of squares, but I never really understood or found the theoretical background behind it. Some statisticians argue to drop anova completely and just use the corresponding "modern" models directly.)

@WarrenWeckesser
Member Author

I closed the pull request. For now, the code resides in a separate repo, https://github.com/WarrenWeckesser/yanova

Labels
enhancement, scipy.stats

4 participants