-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GRF refuse to report average treatment effect when the dataset is imbalanced #406
Comments
I tried running this on both the old and new versions of the package. First, on this dataset, the propensity scores get very close to 0 (they get down to 0.0005). So even though the old version gives a point estimate for the ATE, it also warns The difference with the release version is that the forest that estimates the propensity scores appears to be splitting more aggressively, and now returns some propensity estimates that are exactly (rather than very nearly) zero. So that's why the ATE estimate comes out as NaN (because we're dividing by a zero propensity). I'm removing the "bug" label, as what the development version is doing still appears to be reasonable (it's just estimating a small propensity score by exactly zero instead of a very small positive number). However, it may be worth investigating further why the development version makes more splits when estimating the propensity score here. p.s. In cases like these, with very small propensity scores, it's often a good idea to focus on the average treatment effect on the treated instead of the overall average treatment effect. |
@swager Thanks for you reply! I guess what you mean by release version is actually the development version? Because it seems that only the old version is released (0.10.2) and the one on GitHub is still not released yet. (Correct me if I am wrong). I agree with you, when the data has propensities that are close to 0, it is better to use just treated treatment effects. What was puzzling me was the NaN, but I think what you said is right - it is because of the dividing by a value that is almost 0. It is indeed interesting to find out why is it the case that the new version splits more vigorously ... |
Thanks, I fixed the typo in my previous comment. I discussed with @jtibshirani, and it appears that the more aggressive splitting is due to a subtle interaction between early stopping rules for tree splitting and honesty. It's not clear to me whether such more aggressive splitting is better or worse in practice (my hunch is that slightly more aggressive splitting may in fact be good, in that it gives us more power to detect failures in overlap), so I'm closing this ticket for now. However, if anyone comes across adverse effects of the more aggressive splitting in the development version, please let me know. |
@swager Just curious - because the calculation of ATE is actually from the following formula:
|
Unless it is dividing by 0 in a step prior to calculating the average treatment effect ... |
This formula defines the average treatment effect, but it cannot be used to calculate it, as it involves potential outcomes that we do not observe. For each i, we observe either Y_i(0) [if W_i=0] or Y_i(1) [if W_i=1] but not both. Our estimator does divide by the propensity score --- take a look at lines 183-210 of average_treatment_effect.R |
@davidahirshberg Thanks! I will take a look. |
@davidahirshberg if the estimator does divide by the propensity score, does it mean that the ATEs are by default doubly robust? |
@swager are you reopening this issue? 😂 |
Hi @ginward, I reopened the issue while there's still discussion on it. We only actively monitor open tickets. For the question -- the function |
Thanks @swager I have indeed read the application paper you wrote, and at the time I thought the doubly robust is only implemented for the cluster robust estimators. But anyways - it looks like, when the fitted propensity is 0, the doubly robust method will not work ( |
Yes that's right. If there are some people whose treatment propensity is (essentially) zero, then the ATE is (essentially) not identified (because the treatment effect function is not identified in the region with poor overlap). So that's why any non-parametric estimate of the ATE will blow up in this case. The only way around this is to change estimands, so that the target of inference is still identified even when some propensities are 0. One simple way to do this is to use the average treatment effect on the treated. Because the ATT only cares about the CATE function in regions where some people are treated, it doesn't matter that the CATE function is not identified in the region with 0 propensity of treatment (because that region just gets ignored by the ATT anyways). |
@swager I see. This cleared things up a lot. Much appreciated! |
As a note, I walked back part of the change that caused forests to split more aggressively (#415). The behavior of the development version should now match that of I'm going to close this out, since it seems that this issue has been resolved. Thanks again @ginward for reporting the issue, it helped us catch an unintentional change in behavior (before we released it)! |
@jtibshirani Sure thing! |
Dear all, I have read all the posts above, and learned a lot. Very appreciated. My question is similar, the output of average_treatment_effect is [1] NaN. However, I am using IV forest insteas of conventional causal forest, and the estimand of my case is the Average (Conditional) Local Average Treatment (ACLATE): average_treatment_effect(ivforest, target.sample = "all") The warning message also indicated poor overlap, but I can't change to estimate ATE on the treated due to IV forest. The error message is as below: Error in average_treatment_effect(ivforest, target.sample = "treated") : I am eager to know how to handle this condition. Because IV forest is very essential in my research paper. Thank you very much.! Warm regards, |
The first thing I'd do in a situation like this is to make a histogram of the estimated compliance scores, i.e., estimates of Delta(x) = P[W = 1 | Z = 1, X = x] - P[W = 0 | Z = 0, X = x]. Are there any regions of X-space where these are reliably away from 0 or not? If yes, then focusing on those regions can help; but if no, then you have a weak IV problem and using AIPW for the ACLATE is simply not going to work. Algorithmically, one thing you can do to focus on well-identified regions is to use the `subset' argument to focus on regions of X-space where treatment effects are well identified. For example, in the case of ATE estimation under unconfoundedness, the following paper considers subsetting to regions of X-space where propensity scores don't get too close to 0 or 1: Crump, R. K., Hotz, V. J., Imbens, G. W., & Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199. The important thing to remember in doing so is that your subsetting should only be a function of X, not of Z / W / Y. |
Dear Dr. Swager, Thanks! But I am NOT sure that in the case of IV forest, I should use the specific range of Z.hat or of W.hat as the subsample? And set it as subset in the average_treatment_effect( ). Many thanks. |
Description of the bug
Download the dataset at this link. This is a dataset by LaLonde (1986), Dehejia and Wahba (1999) on the NSW training program. I am not sure if there is copyright - if so it belongs to them.
The data set is very imbalanced - there are around 10000 untreated example and less than 300 treated example.
When I run the code below in the development version, the
average_treatment_effect(cps.forest, target.sample = "all")
returns NaN. However, the code returns numbers in the releasedversion 0.10.2.
It only returns NaN in the development branch.Is this supposed to be a feature (i.e. returns NaN when data is very imbalanced), or is it actually a bug?
The average treatment effect for the treated and the overlap weighted treatment effects are all normal. Although the numbers differ slightly from the released branch as well.
Steps to reproduce
Output in Development Version
Output in Release Version
GRF version
development
The text was updated successfully, but these errors were encountered: