FAQ: related and complementary packages #6400

Open
josef-pkt opened this issue Jan 9, 2020 · 20 comments

@josef-pkt
Member

josef-pkt commented Jan 9, 2020

This should be a curated list of related packages, especially for statistics and econometrics topics that statsmodels does not (yet) provide.

(not complete; adding links as found, not sorted or organized yet)

https://github.com/scikit-learn/scikit-learn The closest machine learning library. It is easy to find lists of popular machine learning libraries in Python.
Below are packages that are mainly oriented towards more traditional statistics and econometrics topics.

https://github.com/scipy/scipy statsmodels depends on many algorithms in scipy; scipy.stats has distributions and hypothesis tests (and is our older brother)

https://github.com/bashtage/linearmodels panel data and multivariate linear models
https://github.com/bashtage/arch GARCH models, unit root tests and bootstrap

https://github.com/maximtrp/scikit-posthocs Pairwise multiple comparisons (post hoc) tests

https://github.com/CamDavidsonPilon/lifelines survival analysis

https://github.com/pysal Python Spatial Analysis Library

https://github.com/lmfit/lmfit-py Non-Linear Least Squares

@bashtage
Member

bashtage commented Jan 9, 2020

https://github.com/pysal Python Spatial Analysis Library

@josef-pkt
Member Author

There are also stalled or abandoned packages that include features that are not available in other packages

https://github.com/RJT1990/pyflux time series including GAS models

@josef-pkt
Member Author

@bashtage I think you have permission to edit my comments, i.e. add to first comment

@josef-pkt
Member Author

https://github.com/MaxHalford/Prince Python factor analysis library (PCA, CA, MCA, MFA, FAMD)
I just saw this: 350 stars, built on scikit-learn with an sklearn-style API, going by the description

found the link on a French (ENSAE) teaching site http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/ml2a/td2a_mlbasic_acp_acm_anova.html

@josef-pkt
Member Author

a bit specialized: dynamic discrete choice models and similar in several packages
https://github.com/OpenSourceEconomics
started out at the University of Chicago, AFAIK

@josef-pkt
Member Author

extreme value analysis
https://github.com/kikocorreoso/scikit-extremes
(I haven't looked closely) see also #5185

@josef-pkt
Member Author

josef-pkt commented Jul 3, 2020

functional data analysis with machine learning focus, BSD-3
https://github.com/GAA-UAM/scikit-fda

not much statistics in it, but should have the basics for functional data analysis, also depth measures similar to ours (?)

(based on 10 minutes browsing)

update
another FDA package
https://github.com/StevenGolovkine/FDApy
Golovkine, Steven. 2021. “FDApy: A Python Package for Functional Data.” ArXiv:2101.11003 [Cs, Stat], January. http://arxiv.org/abs/2101.11003.

@josef-pkt
Member Author

cubic spline smoothing https://github.com/espdev/csaps
looks similar in style to scipy splines

https://github.com/cjekel/piecewise_linear_fit_py
looks interesting, has some inferential statistics, standard errors, ..., and global optimization

@josef-pkt
Member Author

some stats packages; I was searching for nonparametric methods

https://github.com/aschleg/hypothetical
https://pypi.org/project/hypothetical/
I didn't see online docs for it
I looked at it before because it has Games-Howell, and because he has good blogposts with explanations of some technical, theoretical details https://aaronschlegel.me/
also includes Van der Waerden normal-scores test, which I haven't done yet (AFAIR, I have normal scores methods only in robust cov)
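For reference, a minimal sketch of the Van der Waerden normal-scores statistic mentioned above, using only the standard library. This is not the `hypothetical` API and not statsmodels code; the function names are made up for illustration. It assumes the usual textbook form: pool the samples, rank them (average ranks for ties), map rank R to the normal score Φ⁻¹(R / (N + 1)), and compare group mean scores against the pooled score variance.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def van_der_waerden_statistic(groups):
    """Normal-scores test statistic for k independent samples (hypothetical helper)."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    inv_cdf = NormalDist().inv_cdf
    scores = [inv_cdf(r / (n_total + 1)) for r in average_ranks(pooled)]
    # Pooled variance of the scores (their mean is zero without ties).
    s2 = sum(a * a for a in scores) / (n_total - 1)
    stat, start = 0.0, 0
    for g in groups:
        n_j = len(g)
        mean_score = sum(scores[start:start + n_j]) / n_j
        stat += n_j * mean_score ** 2
        start += n_j
    return stat / s2


stat_separated = van_der_waerden_statistic([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
stat_balanced = van_der_waerden_statistic([[1, 4], [2, 3]])
```

Under the null hypothesis the statistic is approximately chi-square distributed with k - 1 degrees of freedom; the p-value is left out here to keep the sketch free of dependencies.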

https://github.com/citiususc/stac
http://tec.citius.usc.es/stac/doc/
this is more application oriented, but includes some tests that scipy and we don't have, especially for multiple testing, post-hoc problems
"Moreover, statistical tests for multiple comparison (ANOVA, Friedman, etc.) are reimplemented in STAC. This is due to the lack of information returned by the corresponding implementation in scipy.stats"

related: scikit-posthocs, linked in the first message above.

https://pypi.org/project/pingouin/ is becoming popular because it has a nice interface, nice returns, but is GPL

@josef-pkt
Member Author

https://github.com/sdv-dev/Copulas Copulas including vine copulas, MIT licensed, currently work in progress, pre-alpha

https://github.com/blent-ai/pycopula development seems to have stopped, Apache license

@josef-pkt
Member Author

structural equation modelling https://pypi.org/project/semopy/
switched from GPL to MIT

requirements.txt numpy pandas scipy sympy sklearn statsmodels numdifftools

@josef-pkt
Member Author

outlier detection https://github.com/yzhao062/pyod
large number of algorithms

@josef-pkt
Member Author

https://github.com/DoubleML/doubleml-for-py

Bach, Philipp, Victor Chernozhukov, Malte S. Kurz, and Martin Spindler. 2021. “DoubleML -- An Object-Oriented Implementation of Double Machine Learning in Python.” ArXiv:2104.03220 [Cs, Econ, Stat], April. http://arxiv.org/abs/2104.03220.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21 (1): C1–68. https://doi.org/10.1111/ectj.12097.

I didn't look at what this does.

@josef-pkt
Member Author

variations on Granger causality
a more narrowly focused package
https://github.com/tedinburgh/causality-review

Edinburgh, Tom, Stephen J. Eglen, and Ari Ercole. 2021. “Causality Indices for Bivariate Time Series Data: A Comparative Review of Performance.” ArXiv:2104.00718 [Math, Stat], April. http://arxiv.org/abs/2104.00718.

@josef-pkt
Member Author

tools: find nearest correlation matrix, new package with several algorithms
https://pypi.org/project/near-corr-mat/

ours is mostly find nearest positive (semi-)definite matrix.
I didn't look closely, and don't know whether there is a difference in target matrix/space.
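To illustrate the distinction, here is a rough sketch of the simplest "make it positive definite" approach: clip the eigenvalues, then rescale back to a unit diagonal. This is an illustrative assumption, not the near-corr-mat API and not a copy of the statsmodels implementation (statsmodels has related helpers such as `corr_nearest` and `corr_clipped` in `statsmodels.stats.correlation_tools`). The function name and `threshold` default are made up. The result is a valid correlation matrix, but it is generally not the nearest one in Frobenius norm, which is what dedicated nearest-correlation algorithms target.

```python
import numpy as np


def clip_to_pd_correlation(corr, threshold=1e-8):
    """Eigenvalue-clipping sketch (hypothetical helper, not a library function)."""
    corr = np.asarray(corr, dtype=float)
    sym = (corr + corr.T) / 2  # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    clipped = np.clip(eigvals, threshold, None)  # lift negative eigenvalues
    pd = (eigvecs * clipped) @ eigvecs.T  # V diag(clipped) V^T
    d = np.sqrt(np.diag(pd))
    return pd / np.outer(d, d)  # rescale so the diagonal is 1 again


# An indefinite "correlation" matrix: the pairwise values are mutually inconsistent.
bad = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])
fixed = clip_to_pd_correlation(bad)
```

The rescaling step is a congruence transformation, so it keeps the clipped matrix positive definite while restoring ones on the diagonal.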

@josef-pkt
Member Author

causal machine learning, treatment effect estimation, package by Microsoft Research, MIT licensed
looks like scikit-learn type interface to model classes

https://github.com/microsoft/EconML https://econml.azurewebsites.net/index.html

@josef-pkt
Member Author

another survival/lifetime analysis package, MIT licensed
https://github.com/derrynknife/SurPyval

good collection of parametric distribution methods, e.g. hazard rates, ... for standard scipy distributions used in survival analysis
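As a reminder of what these methods compute, here is a small standard-library sketch for the Weibull case. It is not SurPyval's API; the function names and the shape/scale parametrization are assumptions for illustration. The hazard rate is h(t) = f(t) / S(t) and the cumulative hazard is H(t) = -log S(t).

```python
import math


def weibull_pdf(t, shape, scale):
    """Weibull density f(t) for t > 0."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_sf(t, shape, scale):
    """Survival function S(t) = P(T > t)."""
    return math.exp(-((t / scale) ** shape))


def hazard(t, shape, scale):
    """Hazard rate h(t) = f(t) / S(t)."""
    return weibull_pdf(t, shape, scale) / weibull_sf(t, shape, scale)


def cumulative_hazard(t, shape, scale):
    """Cumulative hazard H(t) = -log S(t)."""
    return -math.log(weibull_sf(t, shape, scale))


# shape = 1 is the exponential distribution: constant hazard 1 / scale.
h_exponential = hazard(2.0, shape=1.0, scale=4.0)
# shape = 2 gives a hazard that grows linearly in t: (2 / scale) * (t / scale).
h_increasing = hazard(1.5, shape=2.0, scale=3.0)
H_increasing = cumulative_hazard(1.5, shape=2.0, scale=3.0)
```

The same ratio works for any distribution that exposes a density and a survival function, which is why it maps naturally onto the standard scipy distributions.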

@josef-pkt
Member Author

an online handbook, MIT licensed, with notebooks
not a package, but many recipes for treatment effects analysis

@josef-pkt
Member Author

epidemiology
https://github.com/pzivich/zepid

found via https://github.com/migariane/TutorialComputationalCausalInferenceEstimators/blob/main/PythonCodeBoxes.ipynb

Smith, Matthew J., Mohammad A. Mansournia, Camille Maringe, Paul N. Zivich, Stephen R. Cole, Clémence Leyrat, Aurélien Belot, Bernard Rachet, and Miguel A. Luque-Fernandez. “Introduction to Computational Causal Inference Using Reproducible Stata, R, and Python Code: A Tutorial.” Statistics in Medicine 41, no. 2 (2022): 407–32. https://doi.org/10.1002/sim.9234.
