Using causal_forest to estimate average treatment effects over subgroups #238
Hi Mark, I have the same question as you. The best answer that I've found was published by a group at Mt. Sinai Medical School using a neutral randomized controlled trial of a weight loss intervention in patients with type 2 diabetes mellitus. To cut to the chase, their analysis led them to conclude that the neutral overall finding masked heterogeneous treatment effects within the overall cohort, and that those heterogeneous treatment effects were driven by two parameters (hemoglobin A1c and self-care ability). The reference is Baum A, et al. Lancet Diabetes Endocrinol 2017;5:808-15. I'd be interested to learn whether you've made any progress on this question on your own. Please let me know. Leo
Mark, excellent questions. Wondering the same things.
@lbuckley13 @adeldaoud what I have been doing so far is "aggregating up" to get effects conditional on one variable (e.g., gender = male, race = Black, etc.). First, I'll recreate the data and model above, and then show how I aggregate up:
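A sketch of the kind of aggregation being described, assuming a fitted forest `mod` and a dummy-coded covariate matrix `X` with a hypothetical `race_black` column (this is not the poster's exact code):

```r
# Sketch: average individual-level CATE estimates within a subgroup,
# with a naive normal-approximation confidence interval.
library(grf)

preds <- predict(mod, estimate.variance = TRUE)
tau.hat <- preds$predictions
sigma.hat <- sqrt(preds$variance.estimates)

# Subgroup defined by one dummy-coded covariate (hypothetical name):
in.group <- X[, "race_black"] == 1
est <- mean(tau.hat[in.group])
# Naive SE that treats the individual estimates as independent;
# as discussed below, this likely makes the interval too wide:
se <- sqrt(mean(sigma.hat[in.group]^2) / sum(in.group))
c(estimate = est, lower = est - 1.96 * se, upper = est + 1.96 * se)
```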
The result is a table of subgroup estimates with confidence intervals.
A plot (not reproduced here) shows these subgroup estimates with their intervals. The only thing I'm wondering is: are the confidence intervals too wide, since they are based on individual-level estimates? Also, this does not get me closer to answering the question of when it is warranted to look at a subgroup effect (sort of akin to telling me whether the interaction is significant or not). But I thought I'd share it with you both in hopes that we can move it forward.
@markhwhiteii thank you for the very detailed post. This type of subgroup analysis seems like a very useful thing to be able to do. In terms of the CIs, the short answer is that, yes, as you suspect, the intervals are too wide. As an extreme example: if you wanted to estimate the average treatment effect for everyone, averaging the individual-level estimates like this would give you much wider intervals than a method that targets the average directly. One solution would be to extend the function for ATEs so that it allows focusing on a subgroup, with syntax something like this:

```r
average_treatment_effect(mod, subset = (X$race == "black"))
```

Would something like this be helpful?
Also, as to the original question of looking for variables that are associated with treatment heterogeneity, I often try something along the following lines, which can help surface such variables.
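The check being described might look something like this, assuming a fitted forest `mod` and covariate matrix `X` (names hypothetical, and using the subset-style `average_treatment_effect` syntax discussed in this thread; this is a guess at the kind of diagnostic meant, not the original code):

```r
# Sketch: two quick heterogeneity diagnostics.
library(grf)

# 1. Rank covariates by the forest's split-frequency-based importance:
varimp <- variable_importance(mod)
colnames(X)[order(varimp, decreasing = TRUE)]

# 2. Compare the ATE in the halves with above- vs. below-median
#    estimated CATE; a clear gap suggests real heterogeneity:
tau.hat <- predict(mod)$predictions
high <- tau.hat > median(tau.hat)
average_treatment_effect(mod, subset = high)
average_treatment_effect(mod, subset = !high)
```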
@swager responding to each comment, in turn:
@swager, @markhwhiteii, thanks for your input. My follow-up questions:
How can we capture interactive effects between two or more variables by looking at variable importance, or by searching manually over the CATE distribution (as you suggested)? I know that trees build interactions between variables as the algorithm grows them, but how does that show up when we want to retrieve the most important interactions? Would it be possible, for example, to build a variable-importance measure for interactive effects? It might indicate, say, that gender and race are important only together, but not separately.
In the original tutorial (https://github.com/swager/grf/, Usage Examples), we specify the test set (X.test) as a desired sequence that reflects the variable of interest.
(3a) My first question is: we set all the other variables p to zero, and I wonder if this is equivalent to setting covariates to their mean in a regression framework? And how does that affect our interpretation of the average effect for the X.test sample? We then plot the point estimate and the SE, which leads to (3b) my second question. I am interpreting this as 101 ITEs, one predicted for each point in the desired sequence, and not as a CATE. In other words, I am interpreting this in a similar way to giving the model one observation for which I want one ITE estimate with a corresponding SE. So in @markhwhiteii's example this could be the ITEs for the first two rows:
From this interpretation of ITEs, I would now like to get group-averaged ITEs, which leads me to the questions @markhwhiteii has raised. If my interpretation of the ITE is correct, then I am not sure how we get a CATE that is group-averaged (e.g., the ATE for gender = female & race = black) and sensitive to the group size at each value; maybe we can call it the GATE, or group-averaged treatment effect. You have already indicated an answer to this, but could you please show what the mathematics behind it would look like (e.g., is it an averaging of trees, or sampling for the GATE given its group size)?
I created a version of average_treatment_effect that takes precomputed predictions as a parameter, so they don't have to be recomputed on every call. You can find my version of the R file here: https://github.com/kingsharaman/grf/blob/master/r-package/grf/R/average_treatment_effect.R. @swager should I create a pull request?
Thanks a lot, @kingsharaman! Yes, please create a PR. I like the idea of not having to compute predictions each time you want to call the average_treatment_effect function. However, I don't like passing the predictions separately (as they should really be part of the forest object). How about doing something like this instead? Once you compute OOB predictions the first time you call average_treatment_effect, you add them to the object in the parent scope, so the next time you can just grab them from there. Some prototyping:

```r
foo <- function(cf) {
  if (is.null(cf$predictions)) {
    preds <- 1:3
    cf.with.preds <- cf
    cf.with.preds$predictions <- preds
    eval.parent(substitute(cf <- cf.with.preds))
  }
}

a <- list(forest = "forest")
foo(a)
```

@jtibshirani let me know what you think. p.s. We should probably move this conversation to a new thread with the PR.
@adeldaoud the main idea is that if you want to estimate an average effect over a subgroup, then you can just use the same method as you'd use for an ATE, but only considering those observations in the subgroup. For example, AIPW computes the ATE by averaging scores of the form

Gamma[i] = mu.hat.1[i] - mu.hat.0[i] + W[i] / e.hat[i] * (Y[i] - mu.hat.1[i]) - (1 - W[i]) / (1 - e.hat[i]) * (Y[i] - mu.hat.0[i])

where mu.hat.w[i] is an estimate of E[Y | X = X[i], W = w] and e.hat[i] is the estimated propensity score P[W = 1 | X = X[i]]. To get a subgroup effect, you average the Gamma[i] over the subgroup only.
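The averaging step can be sketched as follows for a fitted grf causal forest `cf` with data `X`, `Y`, `W`. The reconstruction of mu.hat.0 and mu.hat.1 from the forest's stored nuisance estimates follows the identity E[Y | X] = mu.0 + e * tau; treat this as an illustration of the formula above, not the package's exact internals, and the subgroup column name is hypothetical:

```r
# Sketch: AIPW scores from a causal forest, then a subgroup ("GATE")
# estimate by averaging the scores over the subgroup.
library(grf)

tau.hat <- predict(cf)$predictions   # OOB CATE estimates
e.hat <- cf$W.hat                    # estimated propensities
m.hat <- cf$Y.hat                    # estimated marginal outcomes E[Y | X]
mu.hat.0 <- m.hat - e.hat * tau.hat
mu.hat.1 <- m.hat + (1 - e.hat) * tau.hat

Gamma <- mu.hat.1 - mu.hat.0 +
  W / e.hat * (Y - mu.hat.1) -
  (1 - W) / (1 - e.hat) * (Y - mu.hat.0)

in.group <- X[, "gender_female"] == 1   # hypothetical subgroup indicator
gate <- mean(Gamma[in.group])
se <- sd(Gamma[in.group]) / sqrt(sum(in.group))
c(estimate = gate, std.err = se)
```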
@kingsharaman a couple thoughts on the discussion around subsetting:
Concerning the OOB predictions, here's what I propose: by default, we should calculate OOB predictions automatically during training and return them as part of the forest object. As an aside to @swager: if we calculate OOB predictions during training by default, I wonder if we can now avoid hanging on to the training matrix in the forest object.
@swager, thanks for the reply. So the new subsetting function will also be sensitive to the size (and clustering) of the subgroup, and calibrate the standard errors accordingly? My 3rd question above (parts requoted below) aims at understanding the difference between an SE produced for an ITE (a prediction for one individual conditional on all the features) and an SE produced for a CATE (a prediction for more than one individual conditional on all the features). With subsetting we seem to get what I would name a group average treatment effect (GATE), which produces a prediction for more than one individual conditional on some of the features (e.g., race or age).
@kingsharaman great function. I've been trying it out. All of my variables are factors, so I turn them all into dummy variables first:

Then I fit the causal forest:

Then I use the most recent function from the pull request:

That calculates the average treatment effect for each level of each covariate variable.
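The three steps above might look something like this (data and column names are hypothetical, not @markhwhiteii's actual code, and the subset-aware `average_treatment_effect` follows the syntax proposed earlier in this thread):

```r
# Sketch of: dummy-coding factors, fitting the forest, and averaging
# the treatment effect within each level of each covariate.
library(grf)

# 1. Dummy-code the factor covariates (hypothetical data frame `dat`):
X <- model.matrix(~ . - 1, data = dat[, c("gender", "race", "noise1")])

# 2. Fit the causal forest:
cf <- causal_forest(X, Y = dat$Y, W = dat$W)

# 3. ATE within each level of each dummy variable:
results <- lapply(colnames(X), function(v) {
  average_treatment_effect(cf, subset = (X[, v] == 1))
})
names(results) <- colnames(X)
results
```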
Hi, I am looking to calculate ATEs (and their SEs) for predicted deciles based on predictions from a causal forest. A version of this is done in the uplift package, but in that package 1) causal forests aren't used, and 2) you have to bootstrap SEs. Here is an example of a paper that used this approach: http://journals.ama.org/doi/10.1509/jmr.16.0163?code=amma-site
@Zaw5009 very interesting. I'd love it if someone could explain the difference between a causal forest and an uplift random forest; from reading the algorithms and pseudocode in their respective papers, they look to be very similar. What do you figure is the best way to compute bootstrap SEs? It seems like bootstrapping an algorithm that already uses bagging would be prohibitively expensive computationally.
@markhwhiteii I think they're basically the same. What I want is to be able to reproduce something like the decile-level results in that paper.
@Zaw5009 if the groups are categorical, you can subset using this function: https://github.com/kingsharaman/grf/blob/master/r-package/grf/R/average_treatment_effect.R. But yes, I'm looking for the same type of functionality. Have you tried bootstrapping the SEs?
I have not tried the bootstrapping approach, but this solves my problem. I can just plug in the values from the holdout group to obtain lift deciles and then calculate the ATEs and their accompanying SEs for each predicted lift decile. Thanks!
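The decile workflow being described might be sketched as follows, assuming a fitted forest `cf` and a subset-aware `average_treatment_effect` as in the linked fork (an illustrative reconstruction, not the commenter's code):

```r
# Sketch: ATEs with SEs by predicted-lift decile.
library(grf)

tau.hat <- predict(cf)$predictions   # OOB CATE estimates on training data
decile <- cut(tau.hat,
              breaks = quantile(tau.hat, probs = seq(0, 1, 0.1)),
              include.lowest = TRUE, labels = FALSE)

# Estimate and standard error within each decile:
by.decile <- sapply(1:10, function(d)
  average_treatment_effect(cf, subset = (decile == d)))
by.decile
```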
Thanks a lot @markhwhiteii! We'll make sure this functionality is included in the next release. In terms of uplift forests: It looks like the main difference between uplift forests and causal forests is that uplift forests try to find regions in feature space with a large divergence between the treated and control outcome distributions, whereas causal forests directly target treatment heterogeneity. Thus, although the algorithms are similar, the statistical motivation seems quite different. The CCIF function in the uplift package looks more similar to causal forests. Another difference is that causal forests, as implemented here, are locally robust in a way that increases accuracy in observational studies: See Section 6.2 of https://arxiv.org/pdf/1610.01271.pdf, and also https://arxiv.org/pdf/1712.04912.pdf for a broader discussion.
@swager As you noted before, a large subgroup sample size is required when averaging Gamma over a subgroup. Is there any guideline for deciding whether the size is "large" enough to get a credible result?
@philipy1219 the subset-specific CIs should be valid even over small-ish subsets; however, the CIs may be very wide.
@swager This method (dividing the training set into quintiles, etc.) also works for a binary outcome, right? It seems the estimated CATE from causal_forest will be a continuous value between 0 and 1 for a binary outcome (0, 1).
Yes -- everything with causal forests works the same for both continuous and binary outcomes. If you have binary outcomes, then the CATE is estimated on the "difference in probabilities" scale.
Thanks a lot!
Does causal forest work for recurrent events? E.g., assessing comparative effectiveness of treatment A vs. treatment B for an asthma exacerbation outcome, where the majority of patients will have 0 events, some will have only 1 event, and a smaller portion will have multiple events during a fixed follow-up period (e.g., 1 year).
After reading a few of the papers on causal forests, it seems like the idea is that the trees will decide to split more often on variables that cause the treatment effect to vary (that is, moderators of the experimental effect). However, the object returned by `causal_forest` seems to work like any other machine learning algorithm focused on prediction, not causal statistical inference, in the way its associated `predict` function works.

How can I use the `causal_forest` function to find where the treatment effect varies? For example, I simulate a randomized experiment with a binary outcome where there is a positive effect for women and a negative effect for black men. There are also nuisance variables in here. How would I use `causal_forest` to tell me that "the trees are tending to split most on gender, and when a split leads to male, the trees are also more likely to split on race"? This would help show me where the treatment effects are occurring. Here are the data:
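A toy simulation matching this description might look like the following (my own sketch of the stated design, not the original poster's code):

```r
# Sketch: randomized experiment with a binary outcome, a positive
# treatment effect for women, a negative effect for black men, and
# nuisance covariates that should not matter.
set.seed(1839)
n <- 2000
gender <- rbinom(n, 1, 0.5)   # 1 = female
race <- rbinom(n, 1, 0.5)     # 1 = black
noise1 <- rnorm(n)
noise2 <- rnorm(n)
W <- rbinom(n, 1, 0.5)        # randomized binary treatment

# Treatment effect on the probability scale:
# +0.2 for women, -0.2 for black men, 0 otherwise.
tau <- 0.2 * gender - 0.2 * (1 - gender) * race
p <- 0.5 + tau * W
Y <- rbinom(n, 1, p)
X <- cbind(gender, race, noise1, noise2)
```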
And then I fit a model using `grf::causal_forest`:

Looking at `mod`, I see some variable importance output that hints at where the splits are occurring most:

However, this doesn't capture how multiple `X` variables may depend on one another in their interaction with `W` on `Y` (what might be called a three-way interaction in the general linear model world).

If I use the `predict()` function, I can only look at individual-level conditional treatment effects and their associated variance estimates (for example, for just the first row of my data). However, how could I estimate the treatment effect and variance for the category of women, collapsing across all other variables? Or for the combination of male and Black, collapsing across all other variables? Is this as simple as averaging the treatment effects on the training set within the groups of interest? And if so, how are confidence intervals calculated from those?
Additionally, can the `causal_forest` function tell me where these variations are most likely to occur? It seems like this is possible from the papers I have read on causal forests (as well as earlier papers on causal trees, transformed outcome trees, etc.), but I may very well be mistaken.
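The fitting and prediction steps described in the question can be sketched as follows, using the toy variable names from the simulated design (illustrative only, not the poster's exact code):

```r
# Sketch: fit a causal forest, inspect split-based variable importance,
# and get an individual-level CATE with a variance estimate.
library(grf)

mod <- causal_forest(X, Y, W)

# Split-frequency-based variable importance for each column of X:
variable_importance(mod)

# CATE estimate and variance for just the first row of the data:
predict(mod, newdata = X[1, , drop = FALSE], estimate.variance = TRUE)
```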