Difference between predict() and average_treatment_effect() for calculating CATEs in honest causal_forest #1031

Closed
njawadekar opened this issue Aug 10, 2021 · 16 comments

Comments

@njawadekar

njawadekar commented Aug 10, 2021

Hi,
I am writing to better understand the predict() and average_treatment_effect() functions, particularly when one should be used over the other to estimate conditional average treatment effects (CATEs) in an honest causal forest. Additional details are listed below:

(1) Research Goal:
After building an honest causal_forest on my dataset, I would now like to calculate Conditional Average Treatment Effects (CATEs) within specific strata of covariates on the same dataset.

(2) Initial Plan:
Based on this application paper by Athey & Wager, it seems that I should be using the predict() function in order to estimate these CATEs using an "honest" approach. According to the documentation for predict(), by default the treatment effect for each observation is estimated using only the trees in the forest that did not use that observation during training, i.e. out-of-bag estimation.

(3) Question:
However, I understand that there is additionally an average_treatment_effect function, which can also supposedly estimate Conditional Average Treatment Effects in a causal forest in a doubly robust fashion. I would like to better understand the differences between these two functions (predict() and average_treatment_effect()), and the different circumstances in which one function should be used over the other to estimate CATEs on data within an honest causal forest. Evidently, the math behind each function differs, as shown in my attached code that I ran on a mock dataset. This attached R code can be used to reproduce the very different conditional average treatment effects that I calculated for a specific subset of individuals when I used the predict() approach vs. average_treatment_effect().

In addition to explaining which function is better for calculating conditional average treatment effects in various circumstances, could someone also please explain in layman's terms what each function is doing behind the scenes? For example, is the average_treatment_effect function using all of the trees to estimate the treatment effects (and not out-of-bag?)? Also - how are propensity scores utilized for the average_treatment_effect function?

Thanks!

Steps to reproduce
Please find the attached code.
cates_cf.txt

GRF version
2.0.2

@erikcs
Member

erikcs commented Aug 14, 2021

Hi, predict(causal.forest) gives estimates of E[Y(1) - Y(0)|X=x], average_treatment_effect gives an estimate (based on augmented inverse probability weighting, (8) in https://arxiv.org/pdf/1902.07409.pdf) of E[Y(1) - Y(0)].
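
For reference, the two estimands and the usual form of the AIPW score can be written out as below. This is a sketch: the notation (\hat{\tau} for the forest's CATE estimate, \hat{e} for the estimated propensity W.hat, \hat{m} for the estimated conditional mean Y.hat, \hat{\Gamma}_i for the score) is chosen here for illustration and assumes a binary treatment W; see the linked paper for the authors' exact statement.

```latex
\begin{align*}
\tau(x) &= \mathbb{E}\left[Y(1) - Y(0) \mid X = x\right]
  && \text{(CATE, targeted by predict())} \\
\tau &= \mathbb{E}\left[Y(1) - Y(0)\right]
  && \text{(ATE, targeted by average\_treatment\_effect())} \\
\hat{\Gamma}_i &= \hat{\tau}(X_i)
  + \frac{W_i - \hat{e}(X_i)}{\hat{e}(X_i)\,\{1 - \hat{e}(X_i)\}}
    \left( Y_i - \hat{m}(X_i) - \{W_i - \hat{e}(X_i)\}\,\hat{\tau}(X_i) \right) \\
\hat{\tau}_{\mathrm{AIPW}} &= \frac{1}{n} \sum_{i=1}^{n} \hat{\Gamma}_i
\end{align*}
```

The first term of the score is the plug-in CATE prediction; the second is a propensity-weighted residual that corrects for errors in the plug-in estimate. Averaging the scores over a subgroup rather than the full sample gives the subgroup version.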

@erikcs
Member

erikcs commented Sep 1, 2021

Hi again @njawadekar, for a recent overview on suggested practices for HTE estimation you can have a look at the August 31 talk on the Online Causal Inference Seminar where there is a tutorial on Estimating heterogeneous treatment effects in R.

erikcs closed this as completed Sep 1, 2021
@njawadekar
Author

Thanks for sending this information.

After watching the video, I still have a follow-up question to double-check whether I am estimating my CATEs correctly. While it is clear to me how to calculate individual-specific CATEs using the predict function, I am still not crystal clear about how to calculate CATE (and corresponding 95% CI) across a group of multiple individuals. For example, let's say that my training dataset contained 100 individuals, 30 of whom were male. If I wanted to calculate the CATE on this specific subgroup of individuals (the 30 males), then I believe I would need to write something like the following steps to calculate the CATE (but please confirm):

(1) First, I would write the standard predict function to generate predictions on my entire training dataset, e.g.
c.pred <- predict(object, newdata = NULL)

(2) Next, I would subset my c.pred to only those males, e.g.
males_strata <- c.pred %>% filter(gender == "M")

(3) Finally, I would calculate the average CATE in the subgroup (the 30 males) by doing this:
mean(males_strata$predictions)

(4) And then to compute the lower and upper 95% CI, I would just manually compute it after calculating the S.D. and sample size (N) for this stratum... 95% CI = Mean +/- 1.96 * S.D. / sqrt(N)

Does this make sense?

@erikcs
Member

erikcs commented Oct 11, 2021

Hi @njawadekar, average_treatment_effect(forest, subset = gender == "M") (doc) will give you a doubly robust estimate of the ATE in that subgroup along with std errors
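
A minimal sketch of that call on simulated data, for anyone who wants to try it. The data-generating process, the `gender` vector, and all variable names are made up for illustration; only `causal_forest()`, `predict()`, and `average_treatment_effect()` with its `subset` argument come from grf.

```r
library(grf)

set.seed(1)
n <- 1000
p <- 5
X <- matrix(rnorm(n * p), n, p)
gender <- sample(c("M", "F"), n, replace = TRUE)  # hypothetical grouping variable
W <- rbinom(n, 1, 0.5)                            # randomized binary treatment
tau <- ifelse(gender == "M", 2, 0.5)              # true effect differs by group
Y <- X[, 1] + tau * W + rnorm(n)

# Gender enters the forest as a numeric indicator column.
X.all <- cbind(X, male = as.numeric(gender == "M"))
forest <- causal_forest(X.all, Y, W)

# Doubly robust (AIPW) estimate and standard error for the male subgroup.
ate.male <- average_treatment_effect(forest, subset = gender == "M")

# Normal-approximation 95% confidence interval built from the returned std.err.
ci.male <- ate.male[["estimate"]] + c(-1, 1) * qnorm(0.975) * ate.male[["std.err"]]

# For comparison: plain average of the out-of-bag CATE predictions in the subgroup.
mean.pred.male <- mean(predict(forest)$predictions[gender == "M"])

print(ate.male)
print(ci.male)
print(mean.pred.male)
```

The `subset` argument takes a logical vector with one entry per training observation, so any grouping variable aligned with the training rows can be used this way.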

@njawadekar
Author

Thanks @erikcs , and what about if I wanted to compute these CATEs without this doubly robust approach? Are the steps I outlined above (using the predict() function) okay?

@erikcs
Member

erikcs commented Oct 12, 2021

The mean CATE will be very similar, but the 95% CI won't give correct coverage

@njawadekar
Author

Ok, thank you for the insights @erikcs !

@njawadekar
Author

Last question @erikcs , does the average_treatment_effect function conduct out-of-bag estimation (i.e., produce each observation's estimate using only the trees that were built without that observation), like the predict() function does? I am currently running a simulation study on the causal_forest, to see how close to the truth the causal_forest can get under different model parameters. However, if average_treatment_effect does not take an honest approach to estimation, then I'm not sure I should be using it for my analysis.

@erikcs
Member

erikcs commented Oct 13, 2021

Yes, average_treatment_effect uses OOB

@njawadekar
Author

Thanks @erikcs. Related to this, can you also please explain scenarios in which we would want to use the AIPW approach embedded in this average_treatment_effect function, as opposed to the quasi-oracle estimation described by Nie & Wager (e.g. the Y.hat and W.hat arguments in the causal_forest function)? I realize that both are doubly robust approaches intended to address confounding... however, I was wondering if your group has any general guidelines on when either/both should be used when implementing a causal forest.

@erikcs
Member

erikcs commented Oct 13, 2021

causal_forest is GRF + R-learner (Nie/Wager) by default, i.e. Y.hat/W.hat are by default estimated separately. This "orthogonalization" step helps when treatment assignment is confounded, see the last two columns in Table 1 of https://arxiv.org/pdf/1610.01271.pdf to see the empirical performance of causal_forest without/with orthogonalization
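
To make the role of Y.hat/W.hat concrete, here is a small sketch on simulated, confounded data. The data-generating process and variable names are assumptions for illustration. It fits the forest once with the defaults and once with the nuisance estimates supplied explicitly through the `Y.hat` and `W.hat` arguments, using out-of-bag predictions from separate regression forests.

```r
library(grf)

set.seed(2)
n <- 1000
p <- 5
X <- matrix(rnorm(n * p), n, p)
e <- 1 / (1 + exp(-X[, 1]))                    # propensity depends on X1: confounding
W <- rbinom(n, 1, e)
Y <- 2 * X[, 1] + W * (1 + X[, 2]) + rnorm(n)  # true ATE is 1

# Default: Y.hat = E[Y | X] and W.hat = E[W | X] are estimated internally.
cf.default <- causal_forest(X, Y, W)

# Same idea done by hand: fit the nuisance models separately and pass in
# their out-of-bag predictions.
Y.hat <- predict(regression_forest(X, Y))$predictions
W.hat <- predict(regression_forest(X, W))$predictions
cf.manual <- causal_forest(X, Y, W, Y.hat = Y.hat, W.hat = W.hat)

print(average_treatment_effect(cf.default))
print(average_treatment_effect(cf.manual))
```

Passing `Y.hat`/`W.hat` yourself is mainly useful when you want a different nuisance model (for example a parametric propensity model) than the default regression forests.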

@njawadekar
Author

Thanks! So just to clarify, when I calculate the conditional treatment effects using average_treatment_effect on a causal_forest object, these treatment effects have been estimated using two doubly robust approaches (orthogonalization (i.e. R-learner) as well as AIPW)?

@erikcs
Member

erikcs commented Oct 13, 2021

Orthogonalization here refers to the centering step in causal forest (the "R-learner"). Double robustness is a "post fitting" correction to give a "better" estimate of an ATE, which is what average_treatment_effect does (section 2.1 in https://arxiv.org/pdf/1902.07409.pdf, briefly: augmented inverse propensity weighting can "cancel out" estimation errors in W.hat and Y.hat)
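
As an illustration of that "post fitting" correction, the sketch below assembles AIPW scores by hand from a fitted forest and compares their mean with the output of `average_treatment_effect()`. The simulated data and variable names are assumptions, and the hand computation is meant to mirror the usual AIPW form for a binary treatment, not to reproduce grf's internal code line by line.

```r
library(grf)

set.seed(3)
n <- 1000
p <- 5
X <- matrix(rnorm(n * p), n, p)
e <- 1 / (1 + exp(-X[, 1]))                    # confounded treatment assignment
W <- rbinom(n, 1, e)
Y <- 2 * X[, 1] + W * (1 + X[, 2]) + rnorm(n)  # true ATE is 1

forest <- causal_forest(X, Y, W)

tau.hat <- predict(forest)$predictions  # out-of-bag CATE estimates
e.hat <- forest$W.hat                   # propensity estimates used for centering
m.hat <- forest$Y.hat                   # estimates of E[Y | X] used for centering

# Implied conditional mean of Y at each unit's observed treatment.
mu.hat <- m.hat + (W - e.hat) * tau.hat

# AIPW score: plug-in CATE plus a propensity-weighted residual correction.
scores <- tau.hat + (W - e.hat) / (e.hat * (1 - e.hat)) * (Y - mu.hat)

ate.by.hand <- mean(scores)
se.by.hand <- sd(scores) / sqrt(n)

ate.grf <- average_treatment_effect(forest, target.sample = "all")

print(c(estimate = ate.by.hand, std.err = se.by.hand))
print(ate.grf)
print(mean(tau.hat))  # plain average of CATE predictions, without the correction
```

The residual term is what lets errors in W.hat and Y.hat partially offset each other; the plain average of `tau.hat` has no such correction.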

@katianak

I have had the same doubts for a while. However, what does it mean if the CATE estimated using mean(predict(forest)$predictions) and average_treatment_effect(forest) are different?

@njawadekar
Author

njawadekar commented Oct 18, 2021

@erikcs : When estimating a conditional ATE using the average_treatment_effect function and the subset argument, what are the specific covariates that actually go into building the treatment model as well as the outcome model for constructing the AIPW? By default, does it just include all Xi covariates that originally went into the causal forest?

e.g.: average_treatment_effect(cforest, target.sample = "all", subset = (cforest$X[,2] == 1 & cforest$X[,14] == 0))[1]

In addition, do the treatment and outcome models used for the orthogonalization to build the causal forest, by default, also just use all of the Xi covariates?

@erikcs
Member

erikcs commented Oct 18, 2021

> By default, does it just include all Xi covariates that originally went into the causal forest?

Yes

> In addition, do the treatment and outcome models used for the orthogonalization to build the causal forest, by default, also just use all of the Xi covariates?

Yes, it's the same treatment and outcome model used above
