[ENH] Add CRV3 Inference via the Cluster Jackknife to OLS and WLS #8461 #8596
base: main
I'm soon going skiing for a few days. Will look at the details in January.
I think unit tests for OLS and WLS should go into statsmodels.regression.tests.test_robustcov. In statsmodels.regression.tests.test_robustcov there are test classes with names that end in …
The integration into the current sandwich cov looks good (based on skimming the changes).
Thanks for the PR.
The current unit tests fail in tests unrelated to this PR.
I merged the disabling of the two failing unit tests. If you rebase on …
Thanks for your feedback, @josef-pkt!
Enjoy your skiing vacation!
Best, Alex
Style check failures are mostly trailing whitespace, and …
Do you have the Stata package summclust to write unit tests against? I read partway through their working paper, but not enough yet to understand the details.
Aside: GEE has an option for bias-reduced cluster-robust standard errors. OLS is a special case of GEE, so CRV should have similar standard errors. An old script of mine: …
Yes, I should still have access to Stata (unless my license has expired), so I can prepare tests against summclust and against the HC3 standard errors produced by statsmodels. I'll also take a look at the GEE implementation and your script. My goal is to do most of this by the end of the week.
When you have unit tests, I will merge this without going through the math and theory. There is too much of it, and too many articles, for me to go over it now.
Question: …
Hi @josef-pkt, sorry for taking so long to get back to you about this PR. In early January I was traveling, then there was more work to do than expected, and outside of work I had to fix/push some things in my R projects.
Indeed, there are two variants of the cluster jackknife discussed in MNW. The difference is whether one point of the vcov formula uses the "mean" of the jackknifed regression coefficients or the full-sample estimate. What is implemented in the PR is the "regression coefficient" approach, which is equal to the HC3 estimator when each cluster is a singleton. I will add an option for the mean version as well, as it is easy to do.
Regarding tests: I have to admit that I struggle a little bit with the tests in …
Regarding all the math in MNW: it is indeed a lot of linear algebra. But what I effectively implement is quite straightforward: equation (31) (and potentially (30)) in their summclust paper.
To speed things up, I could create a separate test file for the new vcov estimator, which you could then refactor into the appropriate test file. I'd also be happy to do this myself, but would need some guidance =)
Best, Alex
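For readers following the two variants, here is a minimal NumPy sketch, assuming plain OLS and a one-way cluster variable. The helper cluster_jackknife_vcov and its center/small_sample arguments are hypothetical names for illustration only, not the statsmodels API and not the option added in this PR. The (G-1)/G factor follows the usual CV3 definition as I understand it.

```python
import numpy as np


def cluster_jackknife_vcov(y, X, groups, center="estimate", small_sample=True):
    """Leave-one-cluster-out jackknife vcov for OLS (hypothetical helper).

    center="estimate": deviations from the full-sample estimate beta_hat
    center="mean":     deviations from the mean of the jackknife estimates
    """
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    labels = np.unique(groups)
    G = len(labels)
    # Re-estimate OLS once per cluster, dropping that cluster's rows
    beta_jk = np.array([
        np.linalg.lstsq(X[groups != g], y[groups != g], rcond=None)[0]
        for g in labels
    ])
    if center == "estimate":
        ref = beta_hat
    elif center == "mean":
        ref = beta_jk.mean(axis=0)
    else:
        raise ValueError("center must be 'estimate' or 'mean'")
    dev = beta_jk - ref
    vcov = dev.T @ dev
    if small_sample:
        # (G - 1) / G scaling, as in the usual CV3 definition
        vcov *= (G - 1) / G
    return vcov


# Simulated data: 20 equal-sized clusters with a cluster-level random effect
rng = np.random.default_rng(0)
n, G = 200, 20
groups = np.repeat(np.arange(G), n // G)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=G)[groups] + rng.normal(size=n)

v_est = cluster_jackknife_vcov(y, X, groups, center="estimate")
v_mean = cluster_jackknife_vcov(y, X, groups, center="mean")
print("SE (estimate-centred):", np.sqrt(np.diag(v_est)))
print("SE (mean-centred):    ", np.sqrt(np.diag(v_mean)))
```

Because a sum of squared deviations is minimised at the mean, v_est - v_mean equals (G-1)(β̄ - β̂)(β̄ - β̂)', where β̄ is the mean of the jackknife estimates. The "mean" variant is therefore never larger than the "estimate" variant.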
No problem with the delay. I was busy with work for our release, which is already late (and got distracted by other topics).
About the unit tests: I can help with this, and look into it if you have the R code and statsmodels code that should produce equivalent results.
Hi @josef-pkt, I have finally added support for the second jackknife type in MNW and added a new option, …
We have an ado file that exports the model data. I will look for my latest version of #1716 and an example. |
The version that I used last is now in #8691. I don't think we have a proper help file or docstring for it.
I have added a first set of tests for OLS and one-way clustering, which both pass on my laptop. I test the CRV3 estimator against the HC3 estimator when all clusters are singletons, and against my R implementation using the Stata schema (unfortunately, my Stata license has expired; the R version is tested against Stata here). Let me know if you'd prefer that I integrate the new tests into the old test structure. I will add tests for WLS and two-way clustering over the weekend.
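The singleton-cluster check can be reproduced without the PR code. This is a sketch assuming OLS with every observation in its own cluster. It relies on the delete-one identity β̂^(i) - β̂ = -(X'X)^{-1} x_i û_i / (1 - h_ii), under which the jackknife centred on β̂ equals the textbook HC3 sandwich. Equality holds only if the (G-1)/G factor is left out; with the factor, the result is (N-1)/N times HC3. Both function names are hypothetical.

```python
import numpy as np


def hc3_vcov(y, X):
    """Textbook HC3: (X'X)^-1 X' diag(u_i^2 / (1 - h_ii)^2) X (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ (X.T @ y))
    # Leverages h_ii = x_i' (X'X)^-1 x_i
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    w = (resid / (1.0 - h)) ** 2
    # X.T * w scales column i of X' by w_i, giving sum_i w_i x_i x_i'
    return XtX_inv @ (X.T * w) @ X @ XtX_inv


def leave_one_out_jackknife(y, X):
    """Unscaled delete-one jackknife centred on the full-sample estimate."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    idx = np.arange(len(y))
    dev = np.array([
        np.linalg.lstsq(X[idx != i], y[idx != i], rcond=None)[0] - beta_hat
        for i in idx
    ])
    return dev.T @ dev


rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.3, 1.0, -1.0]) + rng.standard_t(df=5, size=n)

v_hc3 = hc3_vcov(y, X)
v_jk = leave_one_out_jackknife(y, X)
print(np.allclose(v_hc3, v_jk))  # True
```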
Hi @josef-pkt, I think this PR is now ready for review. Two things are potentially still missing: tests for two-way clustering (as I have not yet implemented this in my R package) and a cleanup of the test files. Plus, of course, anything else that you spot =)
Thanks. Did you try to compare it with the GEE version?
Style check fails; it needs PEP 8 corrections.
I have not yet compared with the GEE version. I'll try to take a closer look tomorrow. So far, the statsmodels implementation is tested against my R package (which in turn is tested against the Stata package + the cluster jackknife in …
Comparison with GEE is more for reference. I might have to answer questions about differences, or compare them when extending CRV3 to GLM.
Hi @josef-pkt, I have some free time in the next week (vacation coming up). Would you be so kind as to retrigger the CI workflow so that I can see what I still need to fix? I vaguely recall that it was mostly style checks that were failing.
I have not found a way to restart an old, deleted test run on Azure.
I'm finally done with 0.14 (after floating around for 3 months trying to figure out topics for 0.15).
Ok, great, I will then push an "empty" commit =) |
Looks like the style failures are all in the results module (plus one unrelated test failure).
It only took me four attempts, but it looks like I managed to fix all style checks!
I'm running some examples from the test cases to try out the PR. CRV3 without the correction is the same as the GEE bias-reduced cov_type.
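Why CRV3 without the correction matches a bias-reduced sandwich can be checked numerically for OLS. This is a plain NumPy sketch, not the statsmodels GEE code. Dropping cluster g changes the estimate by -(X'X)^{-1} X_g'(I - H_gg)^{-1} û_g, where H_gg is the cluster-g block of the hat matrix. Consequently, the unscaled leave-one-cluster-out jackknife equals a sandwich built from (I - H_gg)^{-1}-adjusted residuals, the adjustment used by Mancl-DeRouen-type bias-reduced estimators. Function names and the simulated data are hypothetical.

```python
import numpy as np


def crv3_jackknife_unscaled(y, X, groups):
    """Leave-one-cluster-out OLS jackknife centred on beta_hat, without (G-1)/G."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    dev = np.array([
        np.linalg.lstsq(X[groups != g], y[groups != g], rcond=None)[0] - beta_hat
        for g in np.unique(groups)
    ])
    return dev.T @ dev


def bias_reduced_sandwich(y, X, groups):
    """Cluster sandwich with residuals adjusted by (I - H_gg)^-1 per cluster."""
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ (X.T @ y))
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        Xg, ug = X[groups == g], resid[groups == g]
        # Cluster block of the hat matrix
        Hgg = Xg @ XtX_inv @ Xg.T
        # Adjusted residuals (I - H_gg)^-1 u_g
        adj = np.linalg.solve(np.eye(len(ug)) - Hgg, ug)
        score = Xg.T @ adj
        meat += np.outer(score, score)
    return XtX_inv @ meat @ XtX_inv


rng = np.random.default_rng(2)
n = 120
# Unbalanced cluster sizes
groups = rng.integers(0, 12, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=12)[groups] + rng.normal(size=n)

v_jk = crv3_jackknife_unscaled(y, X, groups)
v_br = bias_reduced_sandwich(y, X, groups)
print(np.allclose(v_jk, v_br))  # True
```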
Let me know if I can be of any help, if still required =)
Hi all!
This PR closes #8461.
It adds support for CRV3 cluster robust inference for OLS and WLS via a cluster jackknife.
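For WLS, one way to see that the cluster jackknife carries over is through whitening. statsmodels' WLS works with data whose rows are scaled by the square root of the weights. Refitting WLS without each cluster should therefore give the same jackknife as running the OLS version on the whitened data. This is a sketch under that assumption, with hypothetical helper names. It is not the PR's implementation.

```python
import numpy as np


def wls_fit(y, X, w):
    """WLS coefficients from the weighted normal equations X'WX b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)


def wls_cluster_jackknife(y, X, w, groups):
    """Unscaled leave-one-cluster-out jackknife for WLS, refitting with weights."""
    beta_hat = wls_fit(y, X, w)
    dev = np.array([
        wls_fit(y[groups != g], X[groups != g], w[groups != g]) - beta_hat
        for g in np.unique(groups)
    ])
    return dev.T @ dev


def ols_cluster_jackknife(y, X, groups):
    """The same jackknife for OLS, applied here to the whitened data."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    dev = np.array([
        np.linalg.lstsq(X[groups != g], y[groups != g], rcond=None)[0] - beta_hat
        for g in np.unique(groups)
    ])
    return dev.T @ dev


rng = np.random.default_rng(3)
n, G = 150, 15
groups = np.repeat(np.arange(G), n // G)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
w = rng.uniform(0.5, 2.0, size=n)
y = X @ np.array([0.0, 1.0, 1.0]) + rng.normal(size=n) / np.sqrt(w)

# Whitening: scale each row of y and X by sqrt(w_i)
sw = np.sqrt(w)
v_wls = wls_cluster_jackknife(y, X, w, groups)
v_white = ols_cluster_jackknife(sw * y, X * sw[:, None], groups)
print(np.allclose(v_wls, v_white))  # True
```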
In short, the PR adds a new cov_type, cluster-jackknife, which can be run in the following way:
I have updated the docstring in linear_model.py.
I have not yet added tests, as I'd first like to get feedback on where and how to test the new cov_type (and I also have to get up to speed with pytest). There are multiple strategies for testing the new feature:
Below is a small example of the latter test, using the new cluster-jackknife cov_type and HC3 inference:
Let me know which types of tests you'd like to see =)