Clustered SE different from that produced by reghdfe #1
Comments
Can you please provide a reproducible example with both the R and Stata code and output?
The data is private to the professor I've done work for. I can ask if I can share it for the purpose of this ticket. Do you have an email address I can send the data to if the prof says yes?
Here's an example with very slight differences.
R:
library(lfe)
library(nycflights13)
library(haven)
# Write out for Stata in the next block.
haven::write_dta(nycflights13::flights, "nycflights.dta")
lfe::felm(arr_delay ~ distance | tailnum + origin | 0 | tailnum, data = nycflights13::flights)$cse
Stata:
// ssc install reghdfe
use nycflights // created by the R block above
encode tailnum, gen(tailnum_n)
encode origin, gen(origin_n)
reghdfe arr_delay distance, absorb(tailnum_n origin_n) vce(cluster tailnum_n) cformat(%9.8f)
I think @karldw is correct about the discrepancy being due to the treatment of the degrees-of-freedom adjustment. Gormley and Matsa (RFS 2014) describe the difference in the last section, "Stata programs that can be used to estimate models with multiple high-dimensional FE". They also include a description of how to manually adjust the standard errors. It might be useful to include an option in felm to apply this adjustment.
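To make that concrete, here is a minimal sketch of that kind of manual rescaling (my own reading, not G&M's code); the numbers plugged in at the end come from @karldw's simulation in the next comment, and the K counts are my assumptions:
# When the absorbed FE are nested within clusters, reghdfe treats those levels
# as redundant in the small-sample adjustment ("treated as redundant for DoF
# computation" in its output); felm does not. Rescaling felm's clustered SE by
# sqrt((N - K_full) / (N - K_nested)) converts one convention into the other.
rescale_nested_se <- function(se_felm, n_obs, k_full, k_nested) {
  se_felm * sqrt((n_obs - k_full) / (n_obs - k_nested))
}
# Simulation below: N = 2500, K_full = 501 (500 absorbed FE levels + x),
# K_nested = 2 (x plus the constant):
rescale_nested_se(0.01484326, 2500, 501, 2)  # ~0.0132782, reghdfe's clustered SE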
Here's a more precise take on the issue above. The Cameron and Miller PDF has more details on the degrees-of-freedom correction. Below is a contrived example where the degrees-of-freedom calculation changes the significance level of the result (assuming you're particularly interested in a null hypothesis significance test at 5%).
R with lfe:
set.seed(0)
# Generate 500 FE levels, with 5 obs each, nested within 50 clusters
n_clusters <- 50
n_obs_per_fe_level <- 5
n_fe_levels <- 500
n_obs <- n_obs_per_fe_level * n_fe_levels
n_fe_per_cluster <- n_fe_levels / n_clusters
n_obs_per_cluster <- n_obs / n_clusters
# Make 500 FE levels
fe <- sort(rep(1:n_fe_levels, length.out = n_obs))
# Make clusters that nest FE levels
cl <- ceiling(fe / n_fe_per_cluster)
draw_one_cluster <- function(N) {
# Each cluster has its own mean and se for epsilon.
cluster_eps_mean <- rnorm(1)
cluster_eps_sd <- 10 * runif(1)
rnorm(N, mean=cluster_eps_mean, sd = cluster_eps_sd)
}
eps <- unlist(
replicate(n_clusters, draw_one_cluster(n_obs_per_cluster), simplify = FALSE)
)
x <- rnorm(n_obs, 0, 10)
y <- 0.02 * x + fe / 100 + eps
DF <- data.frame(cl, y, x, fe)
haven::write_dta(DF, "~/test_data.dta")
print(broom::tidy(lfe::felm(y ~ x | fe | 0 | cl, data=DF)))
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> x 0.0284 0.0148 1.91 0.0562
# Edit (2019-08-27): I previously had these values:
#> x 0.00936 0.0148 0.630 0.529
Stata with reghdfe:
use "~/test_data", clear
reghdfe y x, absorb(fe) vce(cluster cl)
/*
(MWFE estimator converged in 1 iterations)
HDFE Linear regression Number of obs = 2,500
Absorbing 1 HDFE group F( 1, 49) = 4.56
Statistics robust to heteroskedasticity Prob > F = 0.0377
R-squared = 0.2373
Adj R-squared = 0.0466
Within R-sq. = 0.0018
Number of clusters (cl) = 50 Root MSE = 6.4882
(Std. Err. adjusted for 50 clusters in cl)
------------------------------------------------------------------------------
| Robust
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .0283565 .0132782 2.14 0.038 .001673 .0550401
_cons | 2.372491 .0022073 1074.83 0.000 2.368056 2.376927
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
fe | 500 500 0 *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
Edit (2019-08-27): I previously had these values:
x | .0283565 .0132782 2.14 0.038 .001673 .0550401
_cons | 2.372491 .0022073 1074.83 0.000 2.368056 2.376927
Edit (2019-09-12): The "update" was incorrect, based on an old version of reghdfe.
Those values were:
x | .0283565 .0132755 2.14 0.038 .0016783 .0550347
*/
FWIW, this FE-nested-within-clusters issue was recently addressed in Julia's analogous FixedEffectModels.jl package. See the discussion and PRs here: https://github.com/matthieugomez/FixedEffectModels.jl/pull/48
I imagine everyone agrees that consistency between all of these packages is desirable. However, as Cameron, Gelbach & Miller (2011; section 2.3) make clear in their original article, there are several (potentially?) valid bias correction measures when it comes to nested and multiway clusters. A note along those lines in the package documentation would probably be helpful for users. @karldw are you still thinking of putting in a PR?
@grantmcdermott, I'm up for writing a PR, but wanted to hear back from @sgaure before sinking a bunch of time into it.
@grantmcdermott @karldw @weilu Is it possible that the discrepancy is on Stata's side? Following @karldw's simulation, I found the same standard errors from lfe::felm() and estimatr::lm_robust() (with se_type = "stata"):
# Add magrittr
library(magrittr)
# With lfe::felm()
lfe::felm(
y ~ x | fe | 0 | cl, data=DF
) %>% summary() %>% coef()
# Estimate Cluster s.e. t value Pr(>|t|)
# x 0.02835652 0.01484326 1.910397 0.05622501
# With estimatr::lm_robust()
estimatr::lm_robust(
y ~ x, fixed_effects = ~ fe, clusters = cl, data = DF, se_type = "stata"
) %>% summary() %>% coef()
# Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
# x 0.02835652 0.01484326 1.910397 0.06194321 -0.001472122 0.05818516 49
Neither set of estimates matches Stata's output (from @karldw's estimates). Has anyone run the same analysis in Julia? I guess it is still worth noting that the two R packages agree with each other. Finally, there are two other options that might cause differences between Stata and R.
@edrubin Here's the Julia version. Everything is the same as Stata... except, bizarrely enough, the 95% confidence intervals [UPDATE: and p-values]. (The plot thickens!)
#= Skipping the RCall.jl part where I replicate @karldw's simulation and
then transfer the DF object to Julia...
=#
using DataFrames, FixedEffectModels
DF.fe = categorical(DF.fe)
DF.cl = categorical(DF.cl)
reg(DF, @model(y ~ x, fe = fe, vcov = cluster(cl)))
#=
Fixed Effect Model
=============================================================
Number of obs: 2500 Degrees of freedom: 2
R2: 0.237 R2 within: 0.002
F-Statistic: 4.56065 p-value: 0.038
Iterations: 1 Converged: true
=============================================================
Estimate Std.Error t value Pr(>|t|) Lower 95% Upper 95%
-------------------------------------------------------------
x 0.0283565 0.0132782 2.13557 0.033 0.0023191 0.0543939
=============================================================
=#
FWIW, I'm not surprised a priori that the confidence intervals differ; Cameron and Miller (2015) have a relevant section on which degrees of freedom to use for cluster-robust inference.
I'm rushing this morning so I may well be missing something, but I wanted to flag this before I head out. PS -- Does anyone know what the constant (_cons) in the reghdfe output above represents, given that the fixed effects are absorbed?
Thanks, @grantmcdermott! The p-values differ too. Stata and Julia are using different degrees of freedom when calculating p-values and confidence intervals: Stata appears to use clusters - 1 = 49, while Julia appears to use something close to the residual degrees of freedom. So I guess any attempt to align the estimates needs to keep in mind the DF used in inference, in addition to the DF adjustment for the CRVE. I'm probably getting off track from the original question/issue.
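A quick arithmetic check of that point (my own sketch; the estimate and SE are taken from the outputs above, and the df choices are my reading of what each program appears to do):
b  <- 0.0283565
se <- 0.0132782
tval <- b / se                            # ~2.136
2 * pt(-abs(tval), df = 50 - 1)           # ~0.038, matches Stata (clusters - 1)
2 * pt(-abs(tval), df = 2500 - 2)         # ~0.033, matches the Julia p-value
b + c(-1, 1) * qt(0.975, df = 49) * se    # ~[0.0017, 0.0550], Stata's 95% CI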
Good catch @edrubin! I'll let the FixedEffectModels.jl package maintainer know about the CI discrepancy in a separate issue. Out of interest, I just ran estimatr::lm_robust() with its default (CR2) standard errors:
estimatr::lm_robust(y ~ x, fixed_effects = ~ fe, clusters = cl, data = DF)
# Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
# x 0.02835652 0.0132763 2.135875 0.0379786 0.001641456 0.05507158 46.56475
I don't want to get too far off track here with other packages, except to say I'm pretty confident the remaining differences come down to these DoF and small-sample choices.
I guess the question for @sgaure is whether lfe should adopt this kind of nested-cluster DoF correction, or at least offer it as an option.
@grantmcdermott and @edrubin, I just ran my code again and came up with different results. See the updated numbers above. (This resolves some of the questions above.)
@karldw Thanks for the update. I probably should have double checked too. I guess it's (yet another) good argument in favour of using the reprex package. In other news, I've got a bunch of things on my plate, but I'm up for helping on a PR as much as I can if you want to distribute the workload. At least, until we hear back from Simen...
Okay, I finally sat down and wrote a simple correction myself, using the Gormley and Matsa approach already mentioned above. I can confirm that this now produces the correct SEs using @karldw's simulation example. However, I'd like other folks to kick the tyres with some more complicated examples before I submit a PR. (I haven't tested with multiple clusters, for example.) Here's the relevant fork: https://github.com/grantmcdermott/lfe/tree/clusternest
Of course, I'm happy to accept any PRs on this branch if you spot any mistakes. PS. Here's a reprex using my fork. As you can see, the standard errors, t-stats, p-values, and CIs now all match the Stata output from earlier.
devtools::load_all(path = "~/Documents/Projects/lfe") ## Change to path where you cloned the repo
#> Loading lfe
#> Loading required package: Matrix
set.seed(0)
# Generate 500 FE levels, with 5 obs each, nested within 50 clusters
n_clusters <- 50
n_obs_per_fe_level <- 5
n_fe_levels <- 500
n_obs <- n_obs_per_fe_level * n_fe_levels
n_fe_per_cluster <- n_fe_levels / n_clusters
n_obs_per_cluster <- n_obs / n_clusters
# Make 500 FE levels
fe <- sort(rep(1:n_fe_levels, length.out = n_obs))
# Make clusters that nest FE levels
cl <- ceiling(fe / n_fe_per_cluster)
draw_one_cluster <- function(N) {
# Each cluster has its own mean and se for epsilon.
cluster_eps_mean <- rnorm(1)
cluster_eps_sd <- 10 * runif(1)
rnorm(N, mean=cluster_eps_mean, sd = cluster_eps_sd)
}
eps <- unlist(
replicate(n_clusters, draw_one_cluster(n_obs_per_cluster), simplify = FALSE)
)
x <- rnorm(n_obs, 0, 10)
y <- 0.02 * x + fe / 100 + eps
DF <- data.frame(cl, y, x, fe)
# haven::write_dta(DF, "~/test_data.dta")
mod = lfe::felm(y ~ x | fe | 0 | cl, data=DF)
summary(mod)
#>
#> Call:
#> lfe::felm(formula = y ~ x | fe | 0 | cl, data = DF)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -23.7582 -2.6064 -0.0222 2.5281 24.1545
#>
#> Coefficients:
#> Estimate Cluster s.e. t value Pr(>|t|)
#> x 0.02836 0.01328 2.136 0.0377 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 6.488 on 1999 degrees of freedom
#> Multiple R-squared(full model): 0.2373 Adjusted R-squared: 0.04657
#> Multiple R-squared(proj model): 0.001831 Adjusted R-squared: -0.2478
#> F-statistic(full model, *iid*):1.244 on 500 and 1999 DF, p-value: 0.0007646
#> F-statistic(proj model): 4.561 on 1 and 49 DF, p-value: 0.03774
confint(mod)
#> 2.5 % 97.5 %
#> x 0.001672966 0.05504007
Created on 2019-09-05 by the reprex package (v0.3.0)
Thanks for doing this. I haven't had the chance to really put it through its paces yet, but looking at the output for @karldw's simulation led to a very minor question. At line 682: why the minus one in the denominator? Shouldn't the adjustment just add back in the absorbed df for the nested fe, i.e. use z$df / (z$df + totvar)? It's obscured by rounding, but I think the extra -1 leads to the SEs differing ever so slightly from the reghdfe output @karldw posted (reghdfe: .0132755 vs. updated felm: 0.0132782), which also propagates to the CIs. Without the -1 they should match.
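For what it's worth, here is a quick numeric check of that claim (my own sketch; I'm assuming z$df = 1999 and totvar = 500 in this simulation, and that dfadj scales the variance, hence the square root):
se_felm <- 0.01484326                    # felm's uncorrected clustered SE from above
se_felm * sqrt(1999 / (1999 + 500 - 1))  # with the -1:    ~0.0132782
se_felm * sqrt(1999 / (1999 + 500))      # without the -1: ~0.0132755
So the two candidate formulas do reproduce the two reported values; which one reghdfe itself produces appears to depend on the reghdfe version, per the edits earlier in the thread.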
Hey @grantmcdermott, a couple of super small things below. I'll try this out with more interesting examples next week.
fe_cl_grid <- expand.grid(fe_k=seq_along(z$fe), cl_g=seq_along(cluster))
any_nested <- vapply(
  seq_len(nrow(fe_cl_grid)), function(n) {
fe_k <- fe_cl_grid$fe_k[n]
cl_g <- fe_cl_grid$cl_g[n]
is_nested(z$fe[[fe_k]], cluster[[cl_g]])
},
FUN.VALUE = logical(1)
)
if (any(any_nested)) {
# Will use the simple correction proposed by Gormley and Matsa. See:
# https://www.kellogg.northwestern.edu/faculty/matsa/htm/fe.htm
dfadj <- dfadj * z$df / (z$df + totvar - 1)
# In addition to the above, matching Stata's output in the case of
# nested clusters also requires that the regressor p-values are
# calculated according to the Wald Test "df2" degrees of freedom; i.e.
# reduce the degrees of freedom to the number of clusters-1
z$df <- min(nlevels(z$clustervar[[1]]) - 1L, z$df)
z$df.residual <- z$df
}
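One small thing the snippet above leaves implicit is is_nested(). A minimal sketch of what such a helper could look like (my own assumption; the fork may implement it differently):
# Hypothetical helper: an FE is nested within a cluster variable if every FE
# level maps to exactly one cluster.
is_nested <- function(fe_var, cl_var) {
  all(tapply(cl_var, fe_var, function(x) length(unique(x))) == 1L)
}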
@zweiven Don't you still lose one degree of freedom with the nested case? At least, that's how I read the Gormley and Matsa correction. (It's a little bit confusing because of the way their footnote is worded.) FWIW, I did a sanity check by comparing my results locally with Stata and Julia output... But here's where things get weird, because I still get @karldw's original Stata output (e.g. SE of 0.0132782 instead of the updated value 0.0132755). Karl, can you double check to make sure the update that you mention here wasn't in error? FWIW, I'm running Stata 13 with reghdfe version 5.7.2 (the latest AFAIK).
Thanks, those suggestions look good to me. (Ugh, I had meant to fully swap that out before pushing.) In the meantime, I've also realised that the "df2" calculation here is not robust to a change in cluster variable order (i.e. when there is more than one cluster variable), since currently it just goes by whatever is the first object. Following CGM (2011, sec 2.3), I think I'll change this to default to whichever cluster variable has the fewest clusters. Actually... come to think of it, I stole this bit of "df2" code from elsewhere in the lfe package (here, to be specific). I wonder, then, whether this is another source of discrepancy between lfe::felm and reghdfe in cases where there is more than one cluster?
Thanks @grantmcdermott. I'll have to defer to you and others with Stata as to its output. Also, I realize this is immaterial for many datasets -- I don't mean to waste your time. I agree with your read of the Gormley and Matsa footnote. My question was motivated by my perhaps incorrect interpretation of Cameron and Miller (section IIIB, bottom of page 330 to middle of 331), which seems to suggest that the total number of groups is to be added back, at least in the simple case with a single (e.g. unit-level) fixed effect. If I'm interpreting both sources correctly, it's possible their recommendations differ because of the model formulation: G&M include an intercept -- and I assume drop one of the FE levels rather than the intercept -- while C&M omit the intercept and keep all fixed effects. I think felm is doing the latter, while Stata (for xtreg, at least) seems to keep the intercept and impose a mean constraint on (rather than drop one of) the fixed effects. Fun times?
Not at all, @zweiven. I think catching and documenting these minor discrepancies is really valuable. Personally, I get unreasonably frustrated when I can't reconcile results exactly when trying to reproduce some analysis that was run in another language. Or the same language, for that matter... Speaking of which, I think you could well be right about the intercept issue. My suitably lazy heuristic at the moment is to match the estimates across platforms and ask questions later (or never!).
Okay everyone, I think we're just about there. Yesterday, I was playing around with multiway cluster cases and managed to plug what I believe is the only remaining gap between felm and reghdfe.
Details
CGM (2011, Section 2.3) describe two possible small cluster corrections that are relevant to multiway clustering.
Both approaches are described as valid and they will obviously yield exactly the same results when there is only a single cluster variable. Still, CGM2011 adopt the first approach in their paper and their companion cgmreg routine. This is also the approach that felm currently uses. (As an aside, a quick search suggests that this discrepancy has been a point of confusion for Stata users too. See here and here, for example.) So, my proposed solution is to simply add a new "reghdfe" option to felm's (silent) cmethod argument.
Summary
Here's my summary of everything this GH thread has uncovered thus far: the DoF adjustment when fixed effects are nested within clusters, the degrees of freedom used for p-values and confidence intervals (clusters - 1 in Stata), and the small-sample correction used with multiway clustering.
There are some minor changes that still need to be made before I issue a PR (e.g. add the new cmethod to the documentation). However, I think we're pretty much there from the perspective of reconciling any differences between felm and reghdfe.
Example
As promised, here is a simple multiway cluster example comparing Stata (for reference) and R.
First cgmreg:
. use "http://www.stata-press.com/data/r14/nlswork.dta", clear
. // Need to download and install cmreg ado manually.
. // See: http://cameron.econ.ucdavis.edu/research/cgmreg.ado
. qui cgmreg ln_wage age i.race, cluster(idcode year)
. preserve
. // ssc install regsave
. qui regsave age, tstat pval ci
. li
+--------------------------------------------------------------------------------------+
| var coef stderr tstat pval ci_lower ci_upper N r2 |
|--------------------------------------------------------------------------------------|
1. | age .0196731 .0014538 13.53187 0 .0168237 .0225226 28510 .0946353 |
+--------------------------------------------------------------------------------------+
. restore
Then reghdfe:
. qui reghdfe ln_wage age, absorb(race) cluster(idcode year)
. preserve
. qui regsave age, tstat pval ci
. li
+------------------------------------------------------------------------------------------+
| var coef stderr tstat pval ci_lower ci_upper N r2 |
|------------------------------------------------------------------------------------------|
1. | age .0196731 .0014594 13.48067 2.07e-09 .0165431 .0228032 28510 .0946353 |
+------------------------------------------------------------------------------------------+
. restore
R felm()
Reminder: Clone my fork and switch to the clusternest branch to run the below.
devtools::load_all(path = "~/Documents/Projects/lfe") ## Change to path where you cloned the repo
#> Loading lfe
#> Loading required package: Matrix
nlswork = haven::read_dta("http://www.stata-press.com/data/r14/nlswork.dta")
## Vanilla felm. Will give same results as cgmreg in Stata
est1 =
felm(
ln_wage ~ age | race | 0 | idcode + year,
data = nlswork
)
data.frame(broom::tidy(est1, conf.int=T))
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 age 0.01967314 0.001453812 13.53211 1.359083e-41 0.0168236 0.02252268
## New cmethod="reghdfe" argument if you want same results as reghdfe
est2 =
felm(
ln_wage ~ age | race | 0 | idcode + year,
data = nlswork,
cmethod="reghdfe" ## Added
)
data.frame(broom::tidy(est2, conf.int=T))
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 age 0.01967314 0.001459359 13.48067 2.072479e-09 0.01654312 0.02280315
Created on 2019-09-11 by the reprex package (v0.3.0)
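To make the "two corrections" point above concrete, here is a rough schematic in R of two ways a small-sample factor can enter a two-way-clustered CRVE (a sketch of my own; the bread/meat objects are placeholders, and I'm not asserting which of these corresponds exactly to cgmreg, felm, or reghdfe internally):
# V = bread %*% (meat_id + meat_yr - meat_both) %*% bread is the usual
# two-way-clustered sandwich. A small-sample factor can be applied either per
# component (each with its own cluster count G) or once to the combined
# matrix (typically using the smallest G).
ssc <- function(G, N, K) (G / (G - 1)) * ((N - 1) / (N - K))
vcov_per_component <- function(bread, meat_id, meat_yr, meat_both, N, K, G_id, G_yr, G_both) {
  bread %*% (ssc(G_id, N, K) * meat_id + ssc(G_yr, N, K) * meat_yr -
             ssc(G_both, N, K) * meat_both) %*% bread
}
vcov_combined <- function(bread, meat_id, meat_yr, meat_both, N, K, G_id, G_yr) {
  ssc(min(G_id, G_yr), N, K) * (bread %*% (meat_id + meat_yr - meat_both) %*% bread)
}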
Hi folks, reghdfe author here; thanks for all the work in ensuring consistency between implementations. I'm compiling a few thoughts below that might be useful.
Why might SEs differ
I agree with Grant that there are several things that can lead to different results. In fact, I think there's a fourth way of ending up with different SEs: whether or not singletons are dropped. In summary, the number of observations in the regression itself can change when singletons are dropped, and that number feeds into the small-sample adjustment. So in total there are five things to keep in mind if we want to have the same SEs.
How to pin down why results differ
When I was writing reghdfe I tried to always compare it against other estimators, so there are ways to turn most of these options off, and also to see exactly how the DoF are computed. For instance:
sysuse auto
reghdfe price weight length, a(turn trunk) verbose(1) vce(cluster turn)
Here, the verbose(1) option prints a bit of intermediate output showing exactly how the DoF are computed.
And from the standard output, you can see that there are 9 singleton observations, and that turn is "nested within cluster". We can turn these off:
sysuse auto
reghdfe price weight length, a(turn trunk) verbose(1) vce(cluster turn) keepsingletons dof(none)
Now we have 74 obs instead of 64, and 37 parameters instead of 15. The dof() option also allows you to turn off only some components, but it gets a bit complex. One final note in terms of software: if you use cgmreg for benchmarking, note that they actually compute K-1 instead of K, so in the test folder of reghdfe I have a fixed version (see the diff here). Cheers,
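For R users who want to mimic the singleton-dropping step by hand before comparing SEs, here is a rough sketch (my own, not lfe's or reghdfe's actual code; auto_df is a hypothetical data frame):
# Iteratively drop observations that are the only member of some FE level;
# dropping one singleton can create new ones, so repeat until none remain.
drop_singletons <- function(df, fe_vars) {
  repeat {
    keep <- rep(TRUE, nrow(df))
    for (v in fe_vars) {
      keep <- keep & ave(rep(1L, nrow(df)), df[[v]], FUN = length) > 1L
    }
    if (all(keep)) return(df)
    df <- df[keep, , drop = FALSE]
  }
}
# e.g. drop_singletons(auto_df, c("turn", "trunk")) before running felm()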
Finally, I think a broader and still not fully solved problem is that when you have heteroskedasticity and many fixed effects (K/N does not go to zero asymptotically), inference is biased even asymptotically. This problem is a generalization of the one in Stock and Watson, "Heteroskedasticity-robust standard errors for fixed-effects panel-data regression," Econometrica 76 (2008): 155-174. AFAIK it cannot be solved by the usual methods (wild bootstrap, jackknife, clustering) and is covered by several recent papers.
The problem of course is that none of the solutions that I have looked up (from these papers) appear to be feasible from a computational point of view.
So, to recap, there's a problem with inference that kinda makes our discussion of degrees of freedom seem trivial. At the same time, there's no clear solution to this problem, so we are in a tough spot, and any suggestions would be welcome. PS: The papers linked above that discuss this are actually extremely good (if Valentin or any other author reads this, this is not at all a rant on the papers, but on just how difficult it is to come up with a practical solution to the problem).
Thanks @sergiocorreia, that's really helpful background and context. As an aside: a huge proportion of the applied economics seminars/papers that I see rely on either of these two packages.
Thanks Grant, at least in my case it was not really intended but started as a by-product of more concrete needs. BTW, one thing where the Stata and R communities have more in common than you would have thought, compared to e.g. Julia and Python, is that projects in the latter are more collaborative. For instance, I think all commits to reghdfe have been from me, and most commits to lfe have been from Simen (compare with e.g. Julia's GLM).
This is an interesting discussion! Do you have any insights on which degrees of freedom to use for the robust F test in the case of multi-way clustering? (See also this question on Cross Validated: https://stats.stackexchange.com/questions/229678/double-clustered-standard-errors-and-degrees-of-freedom-in-wald-style-f-test-for)
Great work here folks. Just adding a multiwayvcov comparison for reference:
library(lfe)
#> Loading required package: Matrix
library(multiwayvcov)
nlswork = haven::read_dta("http://www.stata-press.com/data/r14/nlswork.dta")
## Vanilla felm. Will give same results as cgmreg in Stata
est1 =
felm(
ln_wage ~ age | race | 0 | idcode + year,
data = nlswork
)
data.frame(broom::tidy(est1, conf.int=T))
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 age 0.01967314 0.001453812 13.53211 1.359083e-41 0.0168236 0.02252268
## New cmethod="reghdfe" argument if you want same results as reghdfe
est2 =
felm(
ln_wage ~ age | race | 0 | idcode + year,
data = nlswork,
cmethod="reghdfe" ## Added
)
data.frame(broom::tidy(est2, conf.int=T))
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 age 0.01967314 0.001459359 13.48067 2.072479e-09 0.01654312 0.02280315
est3 =
lm(
ln_wage ~ age + factor(race),
data = nlswork,
)
# The multiwayvcov standard error here matches est1 i.e. cmethod = "cgm"
sqrt(multiwayvcov::cluster.vcov(model = est3, cluster = nlswork[, c("idcode", "year")])[2,2])
#> [1] 0.001453812
Created on 2020-01-09 by the reprex package (v0.3.0)
My model is something like
felm(y ~ x1 + x2 | prefecture + province_year | 0 | province, data, weights=data$population)
, which was translated from the Stata code:
reghdfe y x1 x2 [aw=population], absorb(prefect province_year) cluster(province)
The point estimates are identical, but the clustered SEs are quite different between R and Stata. I've manually removed the singletons from the data so the number of observations matches that reported by Stata, but the resulting clustered SE is still higher than what's reported by reghdfe. I suspect the higher SEs arise because the fixed effects swept away by the within-group transformation are nested within clusters. I've tried various DoF adjustments I could find on Google, like this and this, but none produced matching clustered SEs. I started reading your implementation and then found a comment in the code about this case.
Could you shed some light on what causes the difference and how I can adjust for it?
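In case it helps while waiting for an answer: a rough sketch (my own, and it is only an assumption that the unweighted logic carries over to the weighted case) of (a) checking which absorbed FE are nested within the cluster variable, and (b) the Gormley-Matsa-style rescaling discussed above. Variable names come from the model in the question; data is the questioner's data frame:
# (a) Is each absorbed FE nested within the cluster variable?
nested <- function(fe, cl) all(tapply(cl, fe, function(x) length(unique(x))) == 1L)
nested(data$prefecture,    data$province)  # prefecture nested within province?
nested(data$province_year, data$province)  # province-year nested within province?
# (b) If so, one candidate adjustment is to rescale felm's clustered SE by
# sqrt((N - K_full) / (N - K_nested)), where K_full counts all absorbed FE
# levels and K_nested drops the levels nested within the cluster:
# se_adj <- se_felm * sqrt((N - K_full) / (N - K_nested))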