ENH: enhance _MultivariateOLS, MANOVA, code duplication, #8722

josef-pkt · 2023-03-07T20:37:35Z

I thought MANOVA was using _MultivariateOLS.
However, it looks like they only share helper functions; MANOVA doesn't reuse the _MultivariateOLS class.
There is also quite a bit of code duplication.

I was looking for access to the _MultivariateOLS instance in the MANOVA and its test result instances, but it's not available.

_MultivariateOLS does not have a summary implemented, which makes it difficult to get a quick overview of results.

context #8713 trying to figure out usage and problems with multi-way manova.

based on an example: _MultivariateOLS runs an identical test to MANOVA

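    from statsmodels.multivariate.multivariate_ols import _MultivariateOLS

    # p_df is the DataFrame from the example in #8713
    formula = 'PC1 + PC2 + PC3 + PC4 ~ C(Genotype, Helmert) * C(Temp, Helmert) * C(Time, Helmert)'
    mod = _MultivariateOLS.from_formula(formula, data=p_df)
    res = mod.fit()
    tt = res.mv_test()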

but res does not have any of the usual results attributes and methods, not even params

    [i for i in dir(res) if not i.startswith("__")]
    ['_fittedmod',
     'design_info',
     'endog_names',
     'exog_names',
     'mv_test',
     'summary']


josef-pkt · 2023-03-09T20:08:08Z

I'm trying to figure out more generally what we need for a multivariate linear model.

mv_test with eigenvalue based tests: it looks like the multivariate linear model only supports this.
However, it works also for a single restriction on one or two parameters, but likely not for arbitrary linear restrictions.
e.g. https://stats.stackexchange.com/questions/526672/linear-hypothesis-test-for-multivariate-linear-model-mlm-object-in-r
wald_test for individual parameter
- need cov_params, AFAICS for cross-equation restriction we need the full GLS cov_params for flattened params. For within equation restrictions we would only need the var of each residual as scale
- actual hypothesis test methods would be inherited as in MNLogit
summary for model results is missing
robust cov_types: not clear, e.g. for SUR see #4121 (ENH: SUR and sandwich robust covariance - cov_type)
I might have found more references but did not look at those yet, e.g. HC
...

Do the inferential results differ from OLS with cluster robust standard errors?
The params will be the same, and I guess cov_params will be the same or similar (except for df, small sample corrections)
What are the "rank" conditions between MultivariateOLS and OLS with cluster robust standard errors?

MultivariateOLS might be a misnomer if we add GLS inference, i.e. only params and within-equation inference are equivalent to OLS.
"MultivariateLinearModel" which might mean a likelihoodmodel, gaussian or quasi-gaussian
maybe "MultivariateLS"

Do we need a "blown up", memory inefficient version as reference, using kronecker product exog?
It would not be a memory problem in small samples as in experimental data.
But, I think we get into the SUR case if we allow for restrictions or penalization (#7255) of individual parameters.
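
A minimal sketch of the "blown up" reference version mentioned above, assuming simulated data and plain NumPy rather than any statsmodels class. It stacks endog column by column and uses a Kronecker product exog, then checks that the long-form least squares params match the equation-by-equation (wide) params. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
nobs, k_vars, k_endog = 30, 3, 2

# same regressors for every endog (constant plus two random columns)
exog = np.column_stack([np.ones(nobs), rng.normal(size=(nobs, k_vars - 1))])
endog = rng.normal(size=(nobs, k_endog))

# wide form: one least squares solve, params has shape (k_vars, k_endog)
params_wide, *_ = np.linalg.lstsq(exog, endog, rcond=None)

# long form: block-diagonal exog via Kronecker product, column-stacked endog
exog_long = np.kron(np.eye(k_endog), exog)   # (nobs * k_endog, k_vars * k_endog)
endog_long = endog.ravel(order="F")          # stack equation by equation
params_long, *_ = np.linalg.lstsq(exog_long, endog_long, rcond=None)

# undo the column stacking to compare with the wide params
params_from_long = params_long.reshape((k_vars, k_endog), order="F")
max_abs_diff = np.max(np.abs(params_wide - params_from_long))
print(max_abs_diff)
```

The long-form exog has nobs * k_endog rows and k_vars * k_endog columns, which is why memory only matters for larger samples or many endog.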

What about GMM equivalent model?
Would not be too difficult with horizontal stacking of moment conditions, and robust cov_types would be inherited.

Note: this is all for balanced groups/panel case, i.e. same number of obs for each equation.

aside:
nice proof that within-equation cov_params are identical between single equation OLS and GLS
https://economics.stackexchange.com/questions/45753/seemingly-unrelated-regression-estimation-equivalent-to-ols-standard-errors
However, it does not look at cross-equation cov, cov(beta_i, beta_j) for i != j
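
A numerical sketch of the covariance structure discussed here, assuming simulated data, identical regressors in every equation, and a residual covariance estimate with nobs - k_vars in the denominator (that df choice is an assumption for illustration). With column-stacked params, the GLS covariance of the long form equals the Kronecker product of sigma and inv(X'X), so the within-equation blocks match single equation OLS and the cross-equation blocks are sigma[i, j] * inv(X'X).

```python
import numpy as np

rng = np.random.default_rng(1)
nobs, k_vars, k_endog = 20, 3, 2

exog = np.column_stack([np.ones(nobs), rng.normal(size=(nobs, k_vars - 1))])
# correlated errors across the two equations
errors = rng.normal(size=(nobs, k_endog)) @ np.array([[1.0, 0.5], [0.0, 1.0]])
endog = exog @ rng.normal(size=(k_vars, k_endog)) + errors

xtx_inv = np.linalg.inv(exog.T @ exog)
params = xtx_inv @ exog.T @ endog
resid = endog - exog @ params
sigma = resid.T @ resid / (nobs - k_vars)   # assumed df choice

# compact form: cov of column-stacked params
cov_kron = np.kron(sigma, xtx_inv)

# brute-force GLS on the long form, error covariance is kron(sigma, I_nobs)
exog_long = np.kron(np.eye(k_endog), exog)
weights = np.kron(np.linalg.inv(sigma), np.eye(nobs))
cov_gls = np.linalg.inv(exog_long.T @ weights @ exog_long)

# within-equation block for equation 0 and cross-equation block (0, 1)
cov_within_0 = cov_kron[:k_vars, :k_vars]
cov_ols_0 = sigma[0, 0] * xtx_inv
cov_cross_01 = cov_kron[:k_vars, k_vars:2 * k_vars]
```

Cross-equation restrictions need the off-diagonal blocks such as cov_cross_01, which single equation OLS results do not provide.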

josef-pkt · 2023-03-10T18:35:53Z

Inference in Multivariate linear model "MultivariateGLS"?
same regressors for each endog.

this article looks useful, includes the eigenvalue based tests Rao, ...
and standard Wald on raveled params
for row-column hypothesis as in MANOVA

Stewart, Kenneth G. “Exact Testing in Multivariate Regression.” Econometric Reviews 16, no. 3 (January 1, 1997): 321–52. https://doi.org/10.1080/07474939708800390.

a quick try comparing t_test with mv_test
mvGLS t_test with mv_test: test statistic t**2 and F are very close, however not identical.
mvGLS t_test with single equation OLS t_test: test statistics, tvalues are identical but p-values are only close if use_t=True.

problem is how to define consistent df_resid

for _MultivariateOLS, I used res.df_resid = nobs * k_groups - res.params.size (corresponding to long form of OLS/GLS)
single equation uses nobs - k_vars
aside: k_groups - k_params is negative in the example, k_params = k_groups * k_vars

stats analogue would be 2 or k paired, correlated samples. What's the df for t-test?
It should be nobs - 1 if we just t-test the observationwise (pair) diff

update
df_resid = nobs - k_vars looks better
justification would be that params are equivalent to single equation regression

The mv_test results have df_denom (df_resid) that are neither of the two above.
In small samples, Roy's greatest root test differs quite a bit from the other three, both in p-values and df_num, df_denom (for multi-parameter joint hypothesis)

aside: multivariate L B M hypothesis only allows for within equation hypotheses, AFAICS, but joint over all or several equations.
correction: that is wrong, M can do multi-equation comparisons. The only restriction is that hypotheses are on a rectangular block of params.
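
A rough sketch of the eigenvalue based statistics for a hypothesis L B M = 0, written in plain NumPy on simulated data. This is not the statsmodels implementation; the function name is hypothetical, and Roy's greatest root is taken here as the largest eigenvalue of inv(E) H (some references use the eigenvalue of inv(E + H) H instead). L selects rows of the params (regressors), M selects or combines columns (equations), so the hypothesis covers a rectangular block.

```python
import numpy as np

def mv_test_statistics(endog, exog, L, M=None):
    """Hypothetical helper: statistics for H0: L @ B @ M = 0."""
    y = np.asarray(endog, dtype=float)
    x = np.asarray(exog, dtype=float)
    if M is None:
        M = np.eye(y.shape[1])          # test all equations jointly
    xtx_inv = np.linalg.inv(x.T @ x)
    B = xtx_inv @ x.T @ y               # params, shape (k_vars, k_endog)
    resid = y - x @ B
    E = M.T @ resid.T @ resid @ M       # error SSCP matrix
    LBM = L @ B @ M
    H = LBM.T @ np.linalg.solve(L @ xtx_inv @ L.T, LBM)  # hypothesis SSCP
    eigvals = np.linalg.eigvals(np.linalg.solve(E, H)).real
    eigvals = np.clip(eigvals, 0, None)  # guard against tiny negative values
    stats = {
        "wilks": np.prod(1 / (1 + eigvals)),
        "pillai": np.sum(eigvals / (1 + eigvals)),
        "hotelling_lawley": np.sum(eigvals),
        "roy": np.max(eigvals),
    }
    return stats, E, H

rng = np.random.default_rng(2)
nobs = 50
exog = np.column_stack([np.ones(nobs), rng.normal(size=(nobs, 2))])
endog = exog @ rng.normal(size=(3, 2)) + rng.normal(size=(nobs, 2))

# both slope rows are zero in all equations
L = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
stats, E, H = mv_test_statistics(endog, exog, L)
print(stats)
```

Passing an M such as `np.array([[1.0], [-1.0]])` would compare the two equations, which is the multi-equation case mentioned above.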

josef-pkt · 2023-03-10T23:34:11Z

aside: Roy's greatest root

df is not the same as in Stewart 1997, it uses max(p, q)

quote
"where r=max(p, q) is an upper bound on F that yields a lower bound on the significance level. Degrees of freedom are r for the numerator and v - r + q for the denominator. "
where "Let v be the error degrees of freedom"

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_introreg_sect038.htm#statug_introreg002005

    sigma = results.loc["Roy's greatest root", 'Value']
    r = np.max([p, q])
    df1 = r
    df2 = v - r + q
    F = df2 / df1 * sigma
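
The snippet above depends on an existing results table. A self-contained version of the same arithmetic, following the quoted SAS formula, could look like this; the function name and the example numbers are made up for illustration.

```python
def roy_f_upper_bound(theta, p, q, v):
    """F upper bound for Roy's greatest root as in the quoted SAS formula.

    theta : Roy's greatest root statistic
    p, q  : dimensions as defined in the SAS documentation
    v     : error degrees of freedom
    """
    r = max(p, q)
    df_num = r
    df_denom = v - r + q
    f_value = theta * df_denom / df_num
    return f_value, df_num, df_denom

# illustrative numbers only
f_value, df_num, df_denom = roy_f_upper_bound(theta=0.5, p=4, q=2, v=20)
print(f_value, df_num, df_denom)
```

Because F is an upper bound, the p-value from the F distribution with (df_num, df_denom) is a lower bound on the significance level, as stated in the quote.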

josef-pkt · 2023-03-15T17:54:16Z

aside:
I should add the analogue to wald_test_terms to MANOVA, MultivariateGLS
specifically all terms that involve a factor are zero under null

current MANOVA is type 3, i.e. main factor is tested in the model that also includes interaction terms

http://users.stat.umn.edu/~helwig/notes/aov2-Notes.pdf for univariate anova
p. 57 type 2 anova tests main effect in the model without interaction effect (section for unbalanced anova)
this is different from testing that both main and interaction effects are zero in full model.

josef-pkt · 2023-03-15T18:01:15Z

back to the roots

Berndt, Ernst R., and N. Eugene Savin. “Conflict among Criteria for Testing Hypotheses in the Multivariate Linear Regression Model.” Econometrica 45, no. 5 (1977): 1263–77. https://doi.org/10.2307/1914072.

One application for multivariate models are cost and consumption share estimation.
This should get us closer to one of the original demands for multivariate regression in compositional analysis #3560
(related MNLogit does not handle fractional data, AFAIK)

josef-pkt added type-enh type-refactor comp-multivariate labels Mar 7, 2023

josef-pkt added this to the 0.15 milestone Mar 7, 2023

josef-pkt mentioned this issue Mar 9, 2023

Results of n-way MANOVA with interactions differ significantly from results from R's stats::manova #8713

Open

This was referenced Mar 11, 2023

ENH: df_resid for exog with structural zeros #8727

Open

ENH collect tools for linear restrictions, constraints, contrasts #1668

Open

josef-pkt mentioned this issue Mar 18, 2023

DOC: unstructured covariance is missing in GEE docs #8741

Open
