
Robust covariances: integrate with models, normal based Wald tests #1189

Closed
josef-pkt wants to merge 34 commits into statsmodels:master from josef-pkt:robust_cov

Conversation

josef-pkt
Member

First round of integrating robust covariances into the models, and making the cov_type handling more consistent across models.
Also adds normal-distribution-based Wald statistics.

changes behavior of RLMResults, see #1164

Currently implemented for OLS only; it still needs to be extended to other models (and might currently be wrongly inherited by WLS/GLS).
The OLS robust covariance integration is tested against Stata, especially ivreg2.

WIP: there is a lot more to do, but this should be merged soonish so it can be combined with @bashtage's PRs.

main discussion and list of issues is in #1158

(I need TravisCI to check because I'm not always running the full test suite; I made a mistake in RLM.)

for changelog

  • Add normal and chisquare based Wald tests to all models.
    t_test and the new wald_test in LikelihoodModelResults can now also use the normal or chisquare distribution, respectively.
    The results instances have a use_t boolean attribute that indicates whether the t or the normal distribution is used in t_test, and whether the F or chisquare distribution is used in wald_test. use_t also determines the distribution used in pvalues and conf_int of a model.
    The default use_t corresponds to the previous use of the t or normal distribution in pvalues: all models except the linear models (OLS, WLS and GLS) use the normal distribution by default.
    TODO: some models (RLM, ...) still use a hardcoded normal distribution.
  • Improved ContrastResults after t_test, which now also creates a parameter summary table.
  • Integrate robust covariance matrices with RegressionResults. get_robustcov_results was added as a method on RegressionResults that creates and returns a new results instance that uses the requested robust covariance for all inferential statistics and tests: pvalues, conf_int, f_test, t_test.

refactoring:

  • RegressionResults: covHCx and HCx_se are now cached attributes.
  • robust.RLMResults now sets the requested covariance matrix of the parameters as the default, which is used for all inferential statistics and tests. Before this change, bse, tvalues and pvalues were already based on the robust covariance matrix of the parameter estimates, bcov_scaled.

@coveralls

Coverage Status

Coverage remained the same when pulling 87684b7 on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@josef-pkt
Member Author

Error:
`import results.results_macro_ols_robust as res` (in test_robustcov.py)
is invalid syntax in Python 3.

@josef-pkt
Member Author

The import works when I type it in Python 3.3, so maybe it is a problem with 2to3 (I need to start toxing across versions again).

@coveralls

Coverage Status

Coverage remained the same when pulling 958839b on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@josef-pkt
Member Author

OK, the last commit fixed the import for Python 3.

@coveralls

Coverage Status

Coverage remained the same when pulling e14b4ee on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@bashtage
Member

One thing I don't quite get is the use of t_test and wald_test. Both of these are Wald tests, although the t-test is usually restricted to a single hypothesis, which can easily be implemented as a one-sided or two-sided test, while the usual quadratic Wald test statistic is more difficult to interpret as a one-sided test.

Once this gets in, my PR will need some further work as there is some overlap.

@josef-pkt
Member Author

@bashtage
The difference between t_test and f_test/wald_test is that the former treats each hypothesis (each row of the restriction matrix) as a separate hypothesis, running several t-tests in parallel, while f_test/wald_test treat them as a single joint hypothesis.
(f_test is now obsolete as a special case of wald_test, but I decided, so far, to keep it for backwards compatibility and name recognition.)

t_test can then also be used for multiple testing (but it is not connected to the p-value corrections yet).
The vectorized version of t_test was initially just a byproduct of how numpy broadcasting and the linear algebra work, but once I saw it we decided to make a feature out of it.

I still need to get unit tests for cluster-robust covariances, then we should merge your PRs and this one, so we have a common code base again.

@josef-pkt
Member Author

As a followup to this: I'm now much more in favor of the idea of putting the selection of the robust or non-robust cov_type into model.fit() and results.__init__.

The method in this PR is still useful when we want to switch cov_type without reestimating the model.

@bashtage
Member

I see that now. It might be useful to allow one-sided t-tests eventually, something like type="upper" with a default of two-sided.

@josef-pkt
Member Author

Yes, I thought about adding one-sided tests when I was working on the basic t_test.
I forgot about it, and have now opened #1193.

@josef-pkt
Member Author

Strange observation, needs checking:

With cluster-robust standard errors I match Stata's bse, but for the confidence interval it looks like Stata uses 9 degrees of freedom instead of 197; ivreg2 produces the same result with the option small.

>>> stats.t.ppf(0.975, 9)
2.2621571627409915

Grunfeld data

. regress invest mvalue kstock, vce(cluster company)

Linear regression                                      Number of obs =     200
                                                       F(  2,     9) =   51.59
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8124
                                                       Root MSE      =  94.408

                               (Std. Err. adjusted for 10 clusters in company)
------------------------------------------------------------------------------
             |               Robust
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1155622   .0158943     7.27   0.000     .0796067    .1515176
      kstock |   .2306785   .0849671     2.71   0.024     .0384695    .4228874
       _cons |  -42.71437    20.4252    -2.09   0.066    -88.91939    3.490649
------------------------------------------------------------------------------

edit

Just saw: the F-statistic uses df F(2, 9), while the non-cluster-robust one is F(2, 197).
ivreg2 without small reports a normal-based params table but also has F(2, 9).
ivreg2 without small uses the normal distribution, and the jump from t(9) to normal is much larger than from t(197) to normal.
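The size of that jump can be checked directly from the critical values (Stata's small-sample cluster adjustment uses G - 1 degrees of freedom, and the Grunfeld data has 10 companies, hence 9):

```python
from scipy import stats

print(stats.t.ppf(0.975, 9))    # 2.2621571627409915 (cluster df = 10 - 1)
print(stats.t.ppf(0.975, 197))  # roughly 1.97, already close to the normal
print(stats.norm.ppf(0.975))    # 1.959963984540054
```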

@josef-pkt
Member Author

The regression results summary still has use_t=True hardcoded;
we need a unit test for summary to catch this.

@coveralls

Coverage Status

Coverage remained the same when pulling eeeb326 on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@josef-pkt
Member Author

TODO: We should also add 'nonrobust' as an option to get_robustcov_results if we allow setting the cov_type in fit(), so we can still get and use the non-robust cov_params even if we picked a robust version in fit().
It is also easier if we treat all covariance types the same way.

@coveralls

Coverage Status

Coverage remained the same when pulling 2032dc2 on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@coveralls

Coverage Status

Coverage remained the same when pulling e8e6775 on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@coveralls

Coverage Status

Coverage remained the same when pulling 4958ed0 on josef-pkt:robust_cov into fb72fe4 on statsmodels:master.

@bashtage
Member

This is looking close. I have been thinking about "nonrobust" as the name for the standard estimator, and am wondering if standard or classic might be better. In (Q)MLE models the classic estimator is the inverse Hessian, while the robust estimator is a sandwich, so it is unclear what these names should be.

@josef-pkt
Member Author

Yes, I thought of stopping here and merging this so I can go back to other PRs.

I did a lot of reading in the last weeks and now have a much better idea of where to go from here, and have opened a large number of new issues.

About "nonrobust":
I liked the name because it signals something. standard or classic requires that we remember the context, and in the Stock Watson undergraduate textbook "standard", i.e. the default, is heteroscedasticity robust.

I didn't see any generally applicable name in Stata for nonrobust/classic. The name is not used much because it's the default without giving any special arguments.

Difficult example for finding names: Poisson

  • our case: scale=1, the full Poisson specification, "nonrobust" to any deviation from "Poissonness"
  • "over-/underdispersed": scale != 1, the Poisson heteroscedasticity is correctly specified up to scale (exponential model); Wooldridge and Cameron & Trivedi call this the standard/usual case for GLM
  • "heteroscedasticity robust": the full HC sandwich
  • other sandwiches that we have available (cluster, ...)

"nonrobust" also sounds a bit negative, which might be good for educational purposes.
"Why do you want to be nonrobust, when you can be robust?"

@coveralls

Coverage Status

Coverage remained the same when pulling 847fefe on josef-pkt:robust_cov into fb72fe4 on statsmodels:master.

@josef-pkt
Member Author

I'm planning to merge this today.

So far it's (almost?) completely backwards compatible, consisting largely of new methods.
Some follow-up refactoring should be more "invasive".

@bashtage
Member

Sounds good. Then I can go back to my PR.

@josef-pkt josef-pkt closed this in a14d989 Nov 29, 2013
@josef-pkt
Member Author

merged after rebase in a14d989

@josef-pkt josef-pkt mentioned this pull request Dec 11, 2013
@josef-pkt josef-pkt deleted the robust_cov branch December 16, 2013 00:08
PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014
@josef-pkt josef-pkt mentioned this pull request Dec 17, 2014