WIP ENH: cov and matrix tools to work with covariances and correlations #4143
Codecov Report
@@ Coverage Diff @@
## master #4143 +/- ##
=========================================
- Coverage 79.31% 79.3% -0.02%
=========================================
Files 546 548 +2
Lines 81806 82105 +299
Branches 9268 9288 +20
=========================================
+ Hits 64887 65113 +226
- Misses 14824 14890 +66
- Partials 2095 2102 +7
An idea for organizing the matrix tools and reusing the pieces is to create a SymmTools class, e.g. using indexing and masks instead of linalg operators, plus labels.
var0, var1, covar, nobs = g1, g2, g3, 20
ztest_correlated(diff, var0, var1, covar, nobs)
(-2.3642128782748038, 0.018068426888646225)
zt1 = -2.30
The ztest is an application in Steiger and Hakstian. It is not converted to unit tests yet, and there is no other support for hypothesis tests yet.
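A minimal sketch of what such a z-test for the difference of two correlated estimates could look like. The function name `ztest_correlated_sketch` is hypothetical, and it is an assumption that `var0`, `var1` and `covar` are asymptotic (not yet divided by `nobs`) moments; the PR's `ztest_correlated` may scale differently.

```python
import math


def ztest_correlated_sketch(diff, var0, var1, covar, nobs):
    """z-test for the difference of two correlated estimates.

    Hypothetical helper, not the function in this PR.
    """
    # Assumption: var0, var1 and covar are asymptotic moments, so the
    # variance of the difference is divided by nobs here.
    var_diff = (var0 + var1 - 2 * covar) / nobs
    zstat = diff / math.sqrt(var_diff)
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    pvalue = math.erfc(abs(zstat) / math.sqrt(2))
    return zstat, pvalue
```

The two-sided normal p-value for z = -2.3642 is about 0.0181, which is consistent with the pair printed in the snippet above, so the reported p-value appears to be based on the normal distribution.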
assert_equal(cr1.shape, (n_low, n_low))
assert_allclose(cr2, cr1, rtol=1e-14)
assert_allclose(cr3, cr1, rtol=1e-14)
assert_allclose(cr4, cr1, rtol=1e-14)
There are two underlying cov_corr functions; the other variation is in the selection and the selection options used to get to the minimal set of unique correlations.
Still missing: comparison to the Steiger and Hakstian example, a single-element function, and reference numbers. Only tested for the normal case; the elliptical case with small kurtosis should be close. _cov_corr works for arbitrary cov_cov, but the scaling of mom4 (1/nobs) is not checked for consistency.
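For the single-element cross-check, a sketch based on the classical normal-theory formula for the asymptotic covariance of two sample correlations (often attributed to Pearson and Filon, and used in the Steiger papers in the reference list). The function name is hypothetical and the result is not divided by nobs; this is not the implementation in this PR.

```python
def cov_corr_normal_element_sketch(corr, j, k, l, m):
    """Asymptotic covariance of sqrt(nobs) * (r_jk, r_lm) under normality.

    Hypothetical helper; `corr` is a correlation matrix indexable as
    corr[a][b] (nested lists work).
    """
    r = corr
    term1 = 0.5 * r[j][k] * r[l][m] * (
        r[j][l] ** 2 + r[j][m] ** 2 + r[k][l] ** 2 + r[k][m] ** 2)
    term2 = r[j][l] * r[k][m] + r[j][m] * r[k][l]
    term3 = (r[j][k] * r[j][l] * r[j][m]
             + r[k][j] * r[k][l] * r[k][m]
             + r[l][j] * r[l][k] * r[l][m]
             + r[m][j] * r[m][k] * r[m][l])
    return term1 + term2 - term3
```

For j == l and k == m this reduces to the familiar variance (1 - r_jk**2)**2, which makes a convenient sanity check against the matrix version.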
def cov_cov_fisherz(corr, cov_cov):
    """covariance of Fisher's Z-transformed correlation matrix
Not used yet; it will be needed for the Fisher-transformed cases of hypothesis tests.
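A sketch of the delta-method step behind a Fisher z covariance. Since z = arctanh(r) has derivative 1 / (1 - r**2), the covariance of the transformed correlations is D @ cov @ D with a diagonal Jacobian D. The name `cov_fisherz_sketch` and the vector-valued argument are assumptions; the signature of `cov_cov_fisherz` in the diff may expect different inputs.

```python
import numpy as np


def cov_fisherz_sketch(corr_vec, cov_corr):
    """Delta-method covariance of Fisher z-transformed correlations.

    Hypothetical helper: `corr_vec` holds the unique correlations and
    `cov_corr` is their covariance matrix in the same order.
    """
    corr_vec = np.asarray(corr_vec, dtype=float)
    cov_corr = np.asarray(cov_corr, dtype=float)
    # derivative of z = arctanh(r) with respect to r
    jac = 1.0 / (1.0 - corr_vec ** 2)
    # D @ cov_corr @ D for diagonal D, computed by broadcasting
    return jac[:, None] * cov_corr * jac[None, :]
```

For a single correlation with asymptotic variance (1 - r**2)**2 the transformed variance is 1, which is the usual variance-stabilizing property of Fisher's z.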
    return m4 / (nobs - ddof)


def _mom4_normal(i, j, k, h, cov):
Useful mainly for cross-checking; in most cases we want the Kronecker product version.
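To illustrate the cross-check, a sketch that compares an element-wise normal fourth moment (Isserlis' theorem for centered normal variables) with a Kronecker product version built from the commutation matrix. All function names are hypothetical and nothing is scaled by nobs; whether `_mom4_normal` in the diff returns the raw fourth moment or the covariance of the products is not visible in the hunk.

```python
import numpy as np


def mom4_normal_sketch(i, j, k, h, cov):
    """E[x_i x_j x_k x_h] for centered normal x (Isserlis' theorem)."""
    return (cov[i, j] * cov[k, h] + cov[i, k] * cov[j, h]
            + cov[i, h] * cov[j, k])


def cov_cov_normal_element_sketch(i, j, k, h, cov):
    """cov(x_i x_j, x_k x_h) under normality, not divided by nobs."""
    return mom4_normal_sketch(i, j, k, h, cov) - cov[i, j] * cov[k, h]


def cov_cov_normal_kron_sketch(cov):
    """(I + K) (cov kron cov), with rows and columns ordered as cov.ravel()."""
    cov = np.asarray(cov, dtype=float)
    p = cov.shape[0]
    # commutation matrix K: K @ a.ravel() equals a.T.ravel()
    idx = np.arange(p * p).reshape(p, p)
    commut = np.eye(p * p)[idx.T.ravel()]
    return (np.eye(p * p) + commut) @ np.kron(cov, cov)
```

Element (i * p + j, k * p + h) of the Kronecker version should agree with the element-wise function for every index combination.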
    return ii.ravel(order='C'), jj.ravel(order='C')


def _cov_cov(cov, nobs, assume='elliptical', kurt=0, data=None):
replace data by mom4
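For orientation, a sketch of the textbook covariance of the vectorized sample covariance matrix under an elliptical distribution with relative kurtosis parameter kurt (kurt=0 gives the normal case). The function name is hypothetical, dividing by nobs is an assumption, and this is not the `_cov_cov` implementation from the diff.

```python
import numpy as np


def cov_cov_elliptical_sketch(cov, nobs, kurt=0.0):
    """Asymptotic covariance of vec(S) for elliptical data (hypothetical).

    Uses (1 + kurt) * (I + K) (cov kron cov) + kurt * vec(cov) vec(cov)',
    divided by nobs. Rows and columns are ordered as cov.ravel().
    """
    cov = np.asarray(cov, dtype=float)
    p = cov.shape[0]
    # commutation matrix K: K @ a.ravel() equals a.T.ravel()
    idx = np.arange(p * p).reshape(p, p)
    commut = np.eye(p * p)[idx.T.ravel()]
    normal_part = (np.eye(p * p) + commut) @ np.kron(cov, cov)
    vec_cov = cov.ravel()
    m4 = (1 + kurt) * normal_part + kurt * np.outer(vec_cov, vec_cov)
    return m4 / nobs
```

In the univariate case this gives (2 + 3 * kurt) * sigma**4 / nobs, which matches the usual variance of a sample variance when the fourth moment is 3 * (1 + kurt) * sigma**4.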
(force-pushed from cd72aaa to ef29651)
(All red: I fixed a typo in the unit tests, where the wrong matrix was used, then amended and force-pushed.) This now finishes the basic tools part; unit tests are still incomplete. The next parts will be two applications and one additional tools refactoring or enhancement (but not now):
Some defaults or options, like whether to divide by nobs or not, can wait until we have applications and see what's more convenient.
One piece that I haven't figured out yet: what is the impact of using standardized variables to compute cov_corr? Neudecker uses the cov as the starting point in the general case; _cov_corr needs the raw data, or mom4 of the raw data, in the non-elliptical case. It looks like it might be possible to rewrite Neudecker and Wesselman to use only corr and covcov based on standardized mom4. The example illustrates the scale invariance of cov_corr: with raw data x and cov
with standardized data z and corr
how to replicate cov_corr at a constrained estimate corr_r (from test case)
that means we can (not so *) easily compute quadratic forms for goodness of fit or tests of constrained correlations, e.g. for
*) I can't manage it right now, see e.g. Browne 1984, eq. (2.20b), Proposition 4.
This approximately reproduces the Steiger and Hakstian example, continuing from above
Just seen this (after a question about unfinished SVAR parts)
It doesn't say what edit
Notation: Bai Li referenced in #4156 use
Not sure if this is the right place, but here is a suggestion/request for checks/assertions that covariance matrices are positive semidefinite. A failure mode I see occasionally involves this check failing due to floating point error (often but not always accompanied by convergence failure).
About positive definiteness: what's the context?
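One possible shape for such a check, as a sketch only: test the smallest eigenvalue against a small negative tolerance so that floating point noise in a singular or nearly singular covariance matrix does not trigger a failure. The function name and the default tolerance are assumptions, not an existing statsmodels API.

```python
import numpy as np


def check_psd_sketch(cov, rtol=1e-10):
    """Return (is_psd, smallest_eigenvalue) for a symmetric matrix.

    Hypothetical helper; rtol is relative to the largest absolute
    eigenvalue (with a floor of 1.0 on the scale).
    """
    cov = np.asarray(cov, dtype=float)
    if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
        raise ValueError("cov must be a square matrix")
    if not np.allclose(cov, cov.T):
        raise ValueError("cov must be symmetric")
    # symmetrize to remove tiny asymmetries before the eigenvalue solve
    eigvals = np.linalg.eigvalsh((cov + cov.T) / 2)
    tol = rtol * max(np.abs(eigvals).max(), 1.0)
    # eigenvalues slightly below zero are treated as numerical noise
    return bool(eigvals.min() >= -tol), float(eigvals.min())
```

A rank-deficient matrix such as [[1, 1], [1, 1]] passes this check, while [[1, 2], [2, 1]], which has an eigenvalue of -1, does not.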
It would be good to get this merged as private, experimental tools. Then we can gradually write functions on top of it. Also, this is one of the main PRs that uses kurtosis correction, i.e. extending cov/corr inference to elliptical distributions.
A semi-random collection of matrix functions and estimators for the covariance of sample correlation and covariance matrices.
Very unfinished. I didn't even try out all functions, and the API needs a lot of work. This is partially an "academic" exercise in understanding how to work with and compute the asymptotic covariance of covariance and correlation matrices.
In contrast to #2765, I didn't find much in multivariate textbooks and nothing in Stata. There is a psych package in R that has many correlation functions and hypothesis tests. This PR is based on a collection of articles going back to 1970 (references follow).
I'm not sure yet how and where this fits in. Currently everything is in the private module cov_tools. Overall this goes in the direction of modeling second moments; so far statsmodels is mostly focused on first moments.
Related and not in here: GMM/unrestricted 4th moments can have bad small-sample properties. There are some articles that use the F instead of the usual chi-square distribution to improve this.
Related, but nothing available yet: kurtosis measures to get a better (approximation to the) asymptotic distribution of test statistics in the elliptically symmetric case, e.g. #3280 (there are several, many articles for multivariate methods). (I don't know if we need multivariate Mardia kurtosis or some average univariate measure.)
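As a reference point for the kurtosis question, a sketch of Mardia's multivariate kurtosis and the relative kurtosis derived from it. Under normality the expected value of Mardia's b2 is approximately p * (p + 2), so b2 / (p * (p + 2)) - 1 is one candidate for the elliptical kurtosis parameter. The function name is hypothetical and nothing like it is part of this PR yet.

```python
import numpy as np


def mardia_kurtosis_sketch(data):
    """Mardia's multivariate kurtosis b2 and relative kurtosis kappa.

    Hypothetical helper; `data` has shape (nobs, p).
    """
    x = np.asarray(data, dtype=float)
    nobs, p = x.shape
    xc = x - x.mean(axis=0)
    # maximum likelihood covariance estimate (divides by nobs)
    cov_mle = xc.T @ xc / nobs
    # squared Mahalanobis distance of each observation from the mean
    d2 = np.einsum('ij,ij->i', xc @ np.linalg.inv(cov_mle), xc)
    b2 = np.mean(d2 ** 2)
    # relative kurtosis: zero for the normal distribution
    kappa = b2 / (p * (p + 2)) - 1
    return b2, kappa
```

An average of univariate excess kurtosis values divided by 3 would be the alternative mentioned above; which of the two behaves better in small samples is an open question here.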
list of references (incomplete, but most of what I looked at)
Magnus, Jan R., and H. Neudecker. 1986. “Symmetry, 0-1 Matrices and Jacobians: A Review.” Econometric Theory 2 (2):157–90.
Neudecker, H., and A. M. Wesselman. 1990. “The Asymptotic Variance Matrix of the Sample Correlation Matrix.” Linear Algebra and Its Applications 127 (January):589–99. https://doi.org/10.1016/0024-3795(90)90363-H.
Neudecker, Heinz. 1996. “The Asymptotic Variance Matrices of the Sample Correlation Matrix in Elliptical and Normal Situations and Their Proportionality.” Linear Algebra and Its Applications, Linear Algebra and Statistics: In Celebration of C. R. Rao’s 75th Birthday (September 10, 1995), 237 (April):127–32. https://doi.org/10.1016/0024-3795(95)00351-7.
Satorra, Albert, and Heinz Neudecker. 1997. “Compact Matrix Expressions for Generalized Wald Tests of Equality of Moment Vectors.” Journal of Multivariate Analysis 63 (2):259–76. https://doi.org/10.1006/jmva.1997.1696.
Steiger, James H., and A. Ralph Hakstian. 1982. “The Asymptotic Distribution of Elements of a Correlation Matrix: Theory and Application.” British Journal of Mathematical and Statistical Psychology 35 (2):208–15. https://doi.org/10.1111/j.2044-8317.1982.tb00653.x.
Browne, M. W. 1974. “Generalized Least Squares Estimators in the Analysis of Covariance Structures.” South African Statistical Journal 8 (1):1–24.
———. 1977. “The Analysis of Patterned Correlation Matrices by Generalized Least Squares.” British Journal of Mathematical and Statistical Psychology 30 (1):113–24. https://doi.org/10.1111/j.2044-8317.1977.tb00730.x.
———. 1984. “Asymptotically Distribution-Free Methods for the Analysis of Covariance Structures.” British Journal of Mathematical and Statistical Psychology 37 (1):62–83. https://doi.org/10.1111/j.2044-8317.1984.tb00789.x.
Huang, Yafei, and Peter M. Bentler. 2015. “Behavior of Asymptotically Distribution Free Test Statistics in Covariance Versus Correlation Structure Analysis.” Structural Equation Modeling: A Multidisciplinary Journal 22 (4):489–503. https://doi.org/10.1080/10705511.2014.954078.
Steiger, James H. 1980a. “Tests for Comparing Elements of a Correlation Matrix.” Psychological Bulletin 87 (2):245–51. https://doi.org/10.1037/0033-2909.87.2.245.
———. 1980b. “Testing Pattern Hypotheses On Correlation Matrices: Alternative Statistics And Some Empirical Results.” Multivariate Behavioral Research 15 (3):335–52. https://doi.org/10.1207/s15327906mbr1503_7.
Yuan, K.-H., and P. M. Bentler. 1999. “F Tests for Mean and Covariance Structure Analysis.” Journal of Educational and Behavioral Statistics 24 (3):225–43. https://doi.org/10.3102/10769986024003225.
Hayakawa, Kazuhiko. 2016. “On the Effect of Weighting Matrix in GMM Specification Test.” Journal of Statistical Planning and Inference 178 (November):84–98. https://doi.org/10.1016/j.jspi.2016.06.003.
Prokhorov, Artem. 2009. “On Relative Efficiency of Quasi-MLE and GMM Estimators of Covariance Structure Models.” Economics Letters 102 (1):4–6. https://doi.org/10.1016/j.econlet.2008.08.019.
edit
I found a LISREL book that has a summary with a collection of formulas in the appendix:
16.1.5 General Covariance Structures (in appendix F)
Jöreskog, Karl G., Ulf H. Olsson, and Fan Y. Wallentin. 2016. Multivariate Analysis with LISREL. Springer Series in Statistics. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-33153-9.
It also includes multivariate Mardia skew and kurtosis (chapter 12, appendix B).