Difference between predict() and average_treatment_effect() for calculating CATEs in honest causal_forest #1031
Comments
Hi again @njawadekar, for a recent overview of suggested practices for HTE estimation, have a look at the August 31 talk in the Online Causal Inference Seminar, which includes a tutorial on estimating heterogeneous treatment effects in R.
Thanks for sending this information. After watching the video, I still have a follow-up question to double-check whether I am estimating my CATEs correctly. While it is clear to me how to calculate individual-specific CATEs using the predict function, I am still not crystal clear about how to calculate the CATE (and corresponding 95% CI) across a group of multiple individuals. For example, let's say that my training dataset contained 100 individuals, 30 of whom were male. If I wanted to calculate the CATE on this specific subgroup of individuals (the 30 males), then I believe I would need to follow something like the steps below (but please confirm):
(1) First, I would write the standard predict function to generate predictions on my entire training dataset, e.g.
(2) Next, I would subset my c.pred to only those males, e.g.
(3) Finally, I would calculate the average CATE in the subgroup (the 30 males) by doing this:
(4) And then to compute the lower and upper 95% CI, I would just manually compute it after calculating the S.D. and sample size (N) for this stratum... 95% CI = Mean +/- S.D. / sqrt(N)
Does this make sense?
Hi @njawadekar, |
Thanks @erikcs, and what if I wanted to compute these CATEs without this doubly robust approach? Are the steps I outlined above (using the predict() function) okay?
The mean CATE will be very similar, but the 95% CI won't give correct coverage
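To make the comparison in this exchange concrete, here is a minimal sketch in R of both routes for a subgroup: averaging the out-of-bag predict() output, and calling average_treatment_effect() with the subset argument. The simulated data, the "male" indicator in column 1 of X, and the variable names are assumptions made up for illustration; they are not taken from the attached code.

```r
library(grf)

set.seed(42)
n <- 2000
p <- 5
X <- matrix(rnorm(n * p), n, p)
X[, 1] <- rbinom(n, 1, 0.3)      # hypothetical indicator: 1 = male
W <- rbinom(n, 1, 0.5)           # randomized treatment
tau <- 1 + X[, 1]                # assumed true effect: 2 for males, 1 otherwise
Y <- tau * W + X[, 2] + rnorm(n)

cf <- causal_forest(X, Y, W)
male <- X[, 1] == 1

# Route 1: average the out-of-bag CATE predictions within the subgroup.
# This gives a point estimate only; a CI built from sd/sqrt(N) of these
# predictions is not expected to have correct coverage.
tau.hat <- predict(cf)$predictions
naive.mean <- mean(tau.hat[male])

# Route 2: doubly robust (AIPW) estimate with a standard error.
ate.male <- average_treatment_effect(cf, target.sample = "all", subset = male)
ci.male <- ate.male[["estimate"]] +
  c(-1, 1) * qnorm(0.975) * ate.male[["std.err"]]
```

In a setup like this the two point estimates should be close, while only the second route comes with a standard error intended for inference. The interval here uses a normal approximation with qnorm(0.975), roughly 1.96.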
Ok, thank you for the insights @erikcs!
Last question @erikcs: does the average_treatment_effect function conduct out-of-bag estimation (i.e., for each observation, use only the trees that were not built with that observation), like the predict() function does? I am currently running a simulation study on the causal_forest, to see how close to the truth it can get under different model parameters. However, if average_treatment_effect does not take an honest approach to estimation, then I'm not sure I should be using it for my analysis.
Yes, average_treatment_effect uses OOB
Thanks @erikcs. Related to this, can you also please explain scenarios in which we would want to use the AIPW approach embedded in the average_treatment_effect function, as opposed to the quasi-oracle estimation described by Nie & Wager (e.g. the Y.hat and W.hat arguments in the causal_forest function)? I realize that both are doubly robust approaches intended to address unconfoundedness... however, I was wondering if your group has any general guidelines on when either/both should be used when implementing a causal forest.
causal_forest is GRF + R-learner (Nie/Wager) by default, i.e. Y.hat/W.hat are by default estimated separately. This "orthogonalization" step helps when treatment assignment is confounded, see the last two columns in Table 1 of https://arxiv.org/pdf/1610.01271.pdf to see the empirical performance of causal_forest without/with orthogonalization
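As an illustration of that centering step, the sketch below fits the two nuisance models explicitly with regression_forest and passes their out-of-bag predictions to causal_forest through the Y.hat and W.hat arguments. This is intended to mirror roughly what happens when those arguments are left at their defaults. The confounded data-generating process is an assumption for illustration only.

```r
library(grf)

set.seed(1)
n <- 1000
p <- 5
X <- matrix(rnorm(n * p), n, p)
e <- 1 / (1 + exp(-X[, 1]))      # assumed propensity: depends on X1 (confounding)
W <- rbinom(n, 1, e)
Y <- X[, 1] + 0.5 * W + rnorm(n) # assumed constant treatment effect of 0.5

# Outcome model E[Y | X] and treatment model E[W | X], predicted out-of-bag.
Y.hat <- predict(regression_forest(X, Y))$predictions
W.hat <- predict(regression_forest(X, W))$predictions

# The causal forest is then grown on the residuals Y - Y.hat and W - W.hat.
cf <- causal_forest(X, Y, W, Y.hat = Y.hat, W.hat = W.hat)
```

Supplying Y.hat and W.hat yourself is mainly useful when you want a different nuisance model than the default regression forests, or when the propensities are known from the study design.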
Thanks! So just to clarify, when I calculate the conditional treatment effects using average_treatment_effect on a causal_forest object, these treatment effects have been estimated using two doubly robust approaches (orthogonalization (i.e. R-learner) as well as AIPW)?
Orthogonalization here refers to the centering step in causal forest (the "R-learner"). Double robustness is a "post-fitting" correction to give a "better" estimate of an ATE, which is what average_treatment_effect does (section 2.1 in https://arxiv.org/pdf/1902.07409.pdf, briefly: augmented inverse propensity weighting can "cancel out" estimation errors in W.hat and Y.hat)
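For readers who want the correction written out, the following is a sketch of the AIPW score as I understand it from the referenced section, not a statement of the exact implementation. The notation is assumed here: $\hat m(x)$ is the outcome model (Y.hat), $\hat e(x)$ is the propensity model (W.hat), $\hat\tau(x)$ is the out-of-bag CATE prediction, and $S$ is the subgroup of interest.

$$
\hat\Gamma_i = \hat\tau(X_i) + \frac{W_i - \hat e(X_i)}{\hat e(X_i)\,\bigl(1 - \hat e(X_i)\bigr)}\,\Bigl(Y_i - \hat m(X_i) - \bigl(W_i - \hat e(X_i)\bigr)\,\hat\tau(X_i)\Bigr),
\qquad
\hat\tau_S = \frac{1}{|S|} \sum_{i \in S} \hat\Gamma_i
$$

$$
\widehat{\mathrm{se}}(\hat\tau_S) = \sqrt{\frac{1}{|S|\,(|S| - 1)} \sum_{i \in S} \bigl(\hat\Gamma_i - \hat\tau_S\bigr)^2}
$$

The first term is what averaging predict() output would give. The second term is the residual-based correction, which averages to roughly zero when the nuisance models are accurate and offsets their errors otherwise. The standard error is taken over the scores $\hat\Gamma_i$ rather than over the predictions $\hat\tau(X_i)$, which is one way to see why a CI built from the spread of the predictions need not have correct coverage.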
I have had the same doubts for a while. However, what does it mean if the CATEs estimated using mean(predict(forest)$predictions) and average_treatment_effect(forest) are different?
@erikcs: When estimating a conditional ATE using the average_treatment_effect function and the subset argument, which covariates actually go into the treatment model and the outcome model used to construct the AIPW estimate? By default, does it just include all Xi covariates that originally went into the causal forest? e.g.: average_treatment_effect(cforest, target.sample = "all", subset = (cforest$X[,2] == 1 & cforest$X[,14] == 0))[1] In addition, do the treatment and outcome models used for the orthogonalization to build the causal forest, by default, also just use all of the Xi covariates?
Yes
Yes, it's the same treatment and outcome model used above
Hi,
I am writing this to better understand the predict() and average_treatment_effect() functions, particularly with regard to when one should be used over the other to estimate conditional average treatment effects (CATEs) in an honest causal forest. Additional details related to this query are listed below:
(1) Research Goal:
After building an honest causal_forest on my dataset, I would now like to calculate Conditional Average Treatment Effects (CATEs) within specific strata of covariates on the same dataset.
(2) Initial Plan:
Based on this application paper by Athey & Wager, it seems that I should be using the predict() function in order to estimate these CATEs using an "honest" approach. Based on the documentation on predict(), it appears that by default, this function estimates the treatment effects such that these effects are estimated for every observation using only the trees in the forest which did not use that particular observation when it was modeled--so, out-of-bag estimation.
(3) Question:
However, I understand that there is additionally an average_treatment_effect function, which can also supposedly estimate Conditional Average Treatment Effects in a causal forest in a doubly robust fashion. I would like to better understand the differences between these two functions (predict() and average_treatment_effect()), and the different circumstances in which one function should be used over the other to estimate CATEs on data within an honest causal forest. Evidently, the math behind each function differs, as shown in my attached code that I ran on a mock dataset. This attached R code can be used to reproduce the very different conditional average treatment effects that I calculated for a specific subset of individuals when I used the predict() approach vs. average_treatment_effect().
In addition to explaining which function is better for calculating conditional average treatment effects in various circumstances, could someone also please explain in layman's terms what each function is doing behind the scenes? For example, is the average_treatment_effect function using all of the trees to estimate the treatment effects (and not out-of-bag?)? Also - how are propensity scores utilized for the average_treatment_effect function?
Thanks!
Steps to reproduce
Please find the attached code.
cates_cf.txt
GRF version
2.0.2