
Robust covariances: integrate with models, normal based Wald tests #1189

Closed
josef-pkt wants to merge 34 commits into statsmodels:master from josef-pkt:robust_cov

Conversation

josef-pkt
Member

First round of integrating robust covariances into the models, and making the cov_type handling more consistent across models.
Also adds normal-distribution-based Wald statistics.

changes behavior of RLMResults, see #1164

Currently implemented for OLS only; it still needs to be extended to other models (and might currently be wrongly inherited by WLS/GLS).
The OLS robust covariance integration is tested against Stata, especially ivreg2.

WIP: there is a lot more to do, but this should be merged soonish so it can be combined with @bashtage's PRs.

main discussion and list of issues is in #1158

(I need TravisCI to check because I'm not always running the full test suite; I made a mistake in RLM.)

for changelog

  • Add normal and chisquare based Wald tests to all models.
    t_test and the new wald_test in LikelihoodModelResults can now also use the normal or chisquare distribution, respectively.
    The results instances have a use_t boolean attribute that indicates whether the t or the normal distribution is used in t_test, and whether the F or chisquare distribution is used in wald_test. use_t also determines the distribution used in pvalues and conf_int of a model.
    The default use_t corresponds to the previous use of the t or normal distribution in pvalues: all models except the linear models (OLS, WLS and GLS) use the normal distribution by default.
    TODO: some models (RLM, ...) still use a hardcoded normal distribution.
  • Improved ContrastResults after t_test, which now also creates a parameter summary table.
  • Integrate robust covariance matrices with RegressionResults. get_robustcov_results was added as a method on RegressionResults that creates and returns a new results instance that uses the requested robust covariance for all inferential statistics and tests: pvalues, conf_int, f_test, t_test.

refactoring:

  • RegressionResults: covHCx and HCx_se are now cached attributes.
  • robust.RLMResults now sets the requested covariance matrix of the parameters as the default, which is used for all inferential statistics and tests. Before this change, bse, tvalues and pvalues were already based on the robust covariance matrix of the parameter estimates, bcov_scaled.

@coveralls

Coverage Status

Coverage remained the same when pulling 87684b7 on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@josef-pkt
Member Author

Error:
`import results.results_macro_ols_robust as res` (in test_robustcov.py)
is invalid syntax in Python 3.

@josef-pkt
Member Author

The import works when I type it in Python 3.3, so maybe it is a problem with 2to3 (I need to start toxing across versions again).

@coveralls

Coverage Status

Coverage remained the same when pulling 958839b on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@josef-pkt
Member Author

OK, the last commit fixed the import for Python 3.

@coveralls

Coverage Status

Coverage remained the same when pulling e14b4ee on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@bashtage
Member

One thing I don't quite get is the use of t_test and wald_test. Both of these are Wald tests, although the t-test is usually restricted to a single hypothesis, which can easily be implemented as a one-sided or two-sided test, while the usual quadratic Wald test statistic is more difficult to interpret as a one-sided test.

Once this gets in, my PR will need some further work as there is some overlap.

@josef-pkt
Member Author

@bashtage
The difference between t_test and f_test/wald_test is that the former treats each hypothesis (each row of the restriction matrix) as a separate hypothesis, running several t-tests in parallel, while f_test/wald_test treat them as a single joint hypothesis.
(f_test is now obsolete as a special case of wald_test, but I decided, so far, to keep it for backwards compatibility and name recognition.)

t_test can then also be used for multiple testing (but it is not connected to the p-value corrections yet).
The vectorized version of t_test was initially just a byproduct of how numpy broadcasting and the linear algebra work, but once I saw it we decided to make a feature out of it.

I still need to get unit tests for cluster-robust covariances, then we should merge your PRs and this one, so we have a common code base again.

@josef-pkt
Member Author

As a followup to this: I'm now much more in favor of the idea of putting the selection of the robust or non-robust cov_type into model.fit() and results.__init__.

The method in this PR is still useful when we want to switch cov_type without reestimating the model.

@bashtage
Member

I see that now. It might be useful to allow one-sided t-tests eventually, something like type="upper" with a default of two-sided.

@josef-pkt
Member Author

Yes, I thought about adding one-sided tests when I was working on the basic t_test.
I forgot about it, and have now opened #1193.

@josef-pkt
Member Author

Strange observation, needs checking:

With cluster-robust standard errors I match Stata's bse, but for the confidence interval it looks like Stata uses 9 degrees of freedom instead of 197; ivreg2 produces the same result with the option small.

>>> stats.t.ppf(0.975, 9)
2.2621571627409915

Grunfeld data

. regress invest mvalue kstock, vce(cluster company)

Linear regression                                      Number of obs =     200
                                                       F(  2,     9) =   51.59
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8124
                                                       Root MSE      =  94.408

                               (Std. Err. adjusted for 10 clusters in company)
------------------------------------------------------------------------------
             |               Robust
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1155622   .0158943     7.27   0.000     .0796067    .1515176
      kstock |   .2306785   .0849671     2.71   0.024     .0384695    .4228874
       _cons |  -42.71437    20.4252    -2.09   0.066    -88.91939    3.490649
------------------------------------------------------------------------------

edit

Just saw: the F-statistic uses df F(2, 9), while the non-cluster-robust one is F(2, 197).
ivreg2 without small reports a normal-based params table but also has F(2, 9).
ivreg2 without small uses the normal distribution, and the jump from t(9) to normal is much larger than from t(197) to normal.
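The size of that jump can be checked directly from the critical values (Stata's small-sample cluster adjustment uses G - 1 degrees of freedom, and the Grunfeld data has 10 companies, hence 9):

```python
from scipy import stats

print(stats.t.ppf(0.975, 9))    # 2.2621571627409915 (cluster df = 10 - 1)
print(stats.t.ppf(0.975, 197))  # roughly 1.97, already close to the normal
print(stats.norm.ppf(0.975))    # 1.959963984540054
```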

@josef-pkt
Member Author

The regression results summary still has use_t=True hardcoded;
we need a unit test for summary to catch this.

@coveralls

Coverage Status

Coverage remained the same when pulling eeeb326 on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@josef-pkt
Member Author

TODO: We should also add 'nonrobust' as an option to get_robustcov_results if we allow setting the cov_type in fit(), so we can still get and use the non-robust cov_params even if we picked a robust version in fit().
It is also easier if we treat all covariance types the same way.

@coveralls

Coverage Status

Coverage remained the same when pulling 2032dc2 on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@coveralls

Coverage Status

Coverage remained the same when pulling e8e6775 on josef-pkt:robust_cov into 9d4b1f8 on statsmodels:master.

@coveralls

Coverage Status

Coverage remained the same when pulling 4958ed0 on josef-pkt:robust_cov into fb72fe4 on statsmodels:master.

@bashtage
Member

This is looking close. I have been thinking about "nonrobust" as the name for the standard estimator, and am wondering if standard or classic might be better. In (Q)MLE models the classic estimator is the inverse Hessian, while the robust estimator is a sandwich, so it is unclear what these names should be.

@josef-pkt
Member Author

Yes, I thought of stopping here and merging this so I can go back to other PRs.

I did a lot of reading in the last weeks and now have a much better idea of where to go from here, and have opened a large number of new issues.

About "nonrobust":
I liked the name because it signals something. standard or classic requires that we remember the context, and in the Stock Watson undergraduate textbook "standard", i.e. the default, is heteroscedasticity robust.

I didn't see any generally applicable name in Stata for nonrobust/classic. The name is not used much because it's the default without giving any special arguments.

Difficult example for finding names: Poisson

  • our case: scale=1, the full Poisson specification, "nonrobust" to any deviation from "Poissonness"
  • "over-/underdispersed": scale != 1, the Poisson heteroscedasticity is correctly specified up to scale (exponential model); Wooldridge and Cameron & Trivedi call this the standard/usual case for GLM
  • "heteroscedasticity robust": the full HC sandwich
  • other sandwiches that we have available (cluster, ...)

"nonrobust" also sounds a bit negative, which might be good for educational purposes.
"Why do you want to be nonrobust, when you can be robust?"

@coveralls

Coverage Status

Coverage remained the same when pulling 847fefe on josef-pkt:robust_cov into fb72fe4 on statsmodels:master.

@josef-pkt
Member Author

I'm planning to merge this today.

So far it's (almost?) completely backwards compatible, consisting largely of new methods.
Some follow-up refactoring should be more "invasive".

@bashtage
Member

Sounds good. Then I can go back to my PR.

@josef-pkt josef-pkt closed this in a14d989 Nov 29, 2013
@josef-pkt
Member Author

merged after rebase in a14d989

@josef-pkt josef-pkt mentioned this pull request Dec 11, 2013
@josef-pkt josef-pkt deleted the robust_cov branch December 16, 2013 00:08
PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014
@josef-pkt josef-pkt mentioned this pull request Dec 17, 2014