SUMM/Overview/Roadmap: multivariate #4179

Open
josef-pkt opened this issue Dec 23, 2017 · 0 comments
Labels
comp-multivariate roadmap roadmaps, list of todos, overall and by topic type-enh

What are the status and roadmap for multivariate statistics?

main methods

either available or started in statsmodels (or related packages)

  • Models

    • _MultivariateOLS
    • SUR
    • ...
    • GMM (can we get more explicit or direct multivariate, multi-equation support?)
    • panel data, GEE, systems of equations, VAR, VECM, multivariate statespace models,
      two-part or multi-part models (e.g. Heckman type, bivariate Probit);
      many related models that are outside the narrow definition of multivariate statistics
    • large sparse versions of models
  • Principal Components Analysis (PCA)

  • Canonical Correlation Analysis (CCA)

  • Factor Analysis
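
As a point of reference for the PCA item above, here is a minimal sketch of PCA via the singular value decomposition, using only numpy. The function name `pca_svd` and its return values are hypothetical and chosen for illustration; this is not the statsmodels implementation or its API.

```python
import numpy as np


def pca_svd(x, ncomp):
    """Hypothetical helper: PCA of a 2-D array x (rows are observations)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)                 # center each column
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    loadings = vt[:ncomp].T                 # shape (nvar, ncomp), orthonormal columns
    scores = xc @ loadings                  # projections on the leading components
    eigvals = s**2 / (x.shape[0] - 1)       # variances of all components
    return scores, loadings, eigvals


# Simulated correlated data, for illustration only
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))
scores, loadings, eigvals = pca_svd(x, ncomp=2)
```

The component variances sum to the trace of the sample covariance matrix, which is a convenient sanity check for any PCA implementation.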

other methods

Some of these are available in scikit-learn; those are not high priority unless there are specific results statistics that are important.

  • Partial Least Squares (PLS)
  • Classical Multidimensional Scaling (MDS)
  • Linear Discriminant Analysis (LDA)
  • Multiclass LDA
  • Independent Component Analysis (ICA), FastICA
  • Probabilistic PCA
  • Kernel PCA
  • Correspondence Analysis
  • Cluster Analysis (scipy)
  • analogues for ordered variables: ?, e.g. item response models
  • Redundancy Analysis
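
To make one of the items above concrete, the following is a rough sketch of classical (Torgerson) MDS with numpy: double-center the squared distance matrix and take the leading eigenvectors. The function name `classical_mds` is hypothetical, and the example data are simulated.

```python
import numpy as np


def classical_mds(dist, ndim=2):
    """Hypothetical helper: embed points given a matrix of pairwise distances."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ d2 @ j                    # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:ndim]     # indices of the largest eigenvalues
    lam = np.clip(evals[idx], 0, None)       # guard against tiny negative values
    return evecs[:, idx] * np.sqrt(lam)


# Simulated 2-D points; MDS should reproduce their pairwise distances
rng = np.random.default_rng(0)
pts = rng.standard_normal((10, 2))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords = classical_mds(dist, ndim=2)
dist_hat = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The recovered coordinates are only determined up to rotation, reflection and translation, so the comparison is made on the distances rather than on the coordinates.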

standalone and supporting hypothesis tests

hypothesis tests and related statistics including confidence intervals

  • tests on mean, Hotelling, multivariate control charts
  • MANOVA
  • tests of covariances and correlation, covariance structure analysis
  • gof tests for distributions
  • rank tests
  • tests for non-continuous data, e.g. multinomial, multivariate GLM
  • non-Pearson correlation and association measures (polychoric, robust, regularized, ...),
    (kendalltau and spearman in scipy)
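
For the "tests on mean, Hotelling" item above, a minimal sketch of the one-sample Hotelling T² statistic and its F transformation, assuming i.i.d. multivariate normal rows. The function name `hotelling_t2_1samp` and its return signature are hypothetical, not an existing statsmodels function.

```python
import numpy as np


def hotelling_t2_1samp(x, mean0):
    """Hypothetical helper: one-sample Hotelling T^2 test of H0: mean == mean0."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    diff = x.mean(axis=0) - np.asarray(mean0, dtype=float)
    cov = np.atleast_2d(np.cov(x, rowvar=False))   # sample covariance, ddof=1
    t2 = n * diff @ np.linalg.solve(cov, diff)
    # Under H0, f_stat follows an F distribution with (p, n - p) degrees of freedom
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat, (p, n - p)


# Simulated data with a shift in the first coordinate, for illustration only
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 3)) + np.array([0.5, 0.0, 0.0])
t2, f_stat, df = hotelling_t2_1samp(x, np.zeros(3))
```

A p-value would then follow from the upper tail of the F distribution, for example `scipy.stats.f.sf(f_stat, *df)`.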

supporting code

  • factor rotation, procrustes
  • matrix tools
  • multivariate transformation, e.g. compositional
  • correlation_tools, closest positive definite
  • missing values, imputation (what and where does it belong?)
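
As an illustration of the procrustes item above, here is a small sketch of the orthogonal Procrustes problem solved through the SVD, using only numpy. The function name `orthogonal_procrustes_rotation` is hypothetical; `scipy.linalg.orthogonal_procrustes` provides a library version.

```python
import numpy as np


def orthogonal_procrustes_rotation(a, b):
    """Hypothetical helper: orthogonal r minimizing the Frobenius norm of a @ r - b."""
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt


# Simulated check: rotate a by a known orthogonal matrix q and recover it
rng = np.random.default_rng(0)
a = rng.standard_normal((20, 3))
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal matrix
b = a @ q
r = orthogonal_procrustes_rotation(a, b)
```

The same building block could be used for target rotation of factor loadings, where `a` is the unrotated loading matrix and `b` is the target.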

The R task view also lists multivariate distributions, including copulas.

list of methods partially taken from https://github.com/JuliaStats/MultivariateStats.jl, TOC of Stata mv, and R taskview https://cran.r-project.org/web/views/Multivariate.html

Todo, Priorities

???
whatever someone is working on and contributes
