GRF refuse to report average treatment effect when the dataset is imbalanced #406

ginward · 2019-04-23T22:14:04Z

Description of the bug
Download the dataset at this link. This is a dataset by LaLonde (1986), Dehejia and Wahba (1999) on the NSW training program. I am not sure if there is copyright - if so it belongs to them.

The data set is very imbalanced - there are around 10000 untreated example and less than 300 treated example.

When I run the code below in the development version, the average_treatment_effect(cps.forest, target.sample = "all") returns NaN. However, the code returns numbers in the released version 0.10.2. It only returns NaN in the development branch.

Is this supposed to be a feature (i.e. returns NaN when data is very imbalanced), or is it actually a bug?

The average treatment effect for the treated and the overlap weighted treatment effects are all normal. Although the numbers differ slightly from the released branch as well.

Steps to reproduce

#compile the grf development code and include it into the library
library(dplyr)
select <- dplyr::select
TREES_CPS=1000
SEED=6000
set.seed(SEED)
dat=read.csv("bug.csv")
forest=causal_forest(seed=SEED,num.trees = TREES_CPS, X=as.matrix(select(dat, re74, re75, age, education, black, hispanic, married, nodegree)), Y=dat$re78, W=dat$treat)
average_treatment_effect(forest, target.sample = "all")

Output in Development Version

> average_treatment_effect(forest, target.sample = "all")
estimate  std.err 
     NaN      NaN

Output in Release Version

> average_treatment_effect(forest, target.sample = "all")
 estimate   std.err 
-2058.140  1401.162

GRF version
development

The text was updated successfully, but these errors were encountered:

swager · 2019-04-26T04:25:38Z

I tried running this on both the old and new versions of the package. First, on this dataset, the propensity scores get very close to 0 (they get down to 0.0005). So even though the old version gives a point estimate for the ATE, it also warns Estimated treatment propensities go as low as 0 which means that treatment effects for some controls may not be well identified...

The difference with the release version is that the forest that estimates the propensity scores appears to be splitting more aggressively, and now returns some propensity estimates that are exactly (rather than very nearly) zero. So that's why the ATE estimate comes out as NaN (because we're dividing by a zero propensity).

I'm removing the "bug" label, as what the development version is doing still appears to be reasonable (it's just estimating a small propensity score by exactly zero instead of a very small positive number). However, it may be worth investigating further why the development version makes more splits when estimating the propensity score here.

p.s. In cases like these, with very small propensity scores, it's often a good idea to focus on the average treatment effect on the treated instead of the overall average treatment effect.

ginward · 2019-04-26T14:17:12Z

@swager Thanks for you reply! I guess what you mean by release version is actually the development version? Because it seems that only the old version is released (0.10.2) and the one on GitHub is still not released yet. (Correct me if I am wrong).

I agree with you, when the data has propensities that are close to 0, it is better to use just treated treatment effects. What was puzzling me was the NaN, but I think what you said is right - it is because of the dividing by a value that is almost 0.

It is indeed interesting to find out why is it the case that the new version splits more vigorously ...

swager · 2019-04-26T17:05:52Z

Thanks, I fixed the typo in my previous comment. I discussed with @jtibshirani, and it appears that the more aggressive splitting is due to a subtle interaction between early stopping rules for tree splitting and honesty. It's not clear to me whether such more aggressive splitting is better or worse in practice (my hunch is that slightly more aggressive splitting may in fact be good, in that it gives us more power to detect failures in overlap), so I'm closing this ticket for now. However, if anyone comes across adverse effects of the more aggressive splitting in the development version, please let me know.

ginward · 2019-04-28T09:27:01Z

I tried running this on both the old and new versions of the package. First, on this dataset, the propensity scores get very close to 0 (they get down to 0.0005). So even though the old version gives a point estimate for the ATE, it also warns Estimated treatment propensities go as low as 0 which means that treatment effects for some controls may not be well identified...

The difference with the release version is that the forest that estimates the propensity scores appears to be splitting more aggressively, and now returns some propensity estimates that are exactly (rather than very nearly) zero. So that's why the ATE estimate comes out as NaN (because we're dividing by a zero propensity).

I'm removing the "bug" label, as what the development version is doing still appears to be reasonable (it's just estimating a small propensity score by exactly zero instead of a very small positive number). However, it may be worth investigating further why the development version makes more splits when estimating the propensity score here.

p.s. In cases like these, with very small propensity scores, it's often a good idea to focus on the average treatment effect on the treated instead of the overall average treatment effect.

@swager Just curious - because the calculation of ATE is actually from the following formula:

target.sample = "all": the ATE on the whole population, sum_{i = 1}^n E[Y(1) - Y(0) | X = X_i] / n.
How is that "dividing by 0 propensity"? My understanding is that, because now there are more propensities that are 0, the 0s will go into the denominator. But it seems that, from the formula, the propensities doesn't go into the denominator at all.

ginward · 2019-04-28T09:30:09Z

Unless it is dividing by 0 in a step prior to calculating the average treatment effect ...

davidahirshberg · 2019-04-28T10:21:50Z

This formula defines the average treatment effect, but it cannot be used to calculate it, as it involves potential outcomes that we do not observe. For each i, we observe either Y_i(0) [if W_i=0] or Y_i(1) [if W_i=1] but not both. Our estimator does divide by the propensity score --- take a look at lines 183-210 of average_treatment_effect.R

ginward · 2019-04-28T10:23:28Z

@davidahirshberg Thanks! I will take a look.

ginward · 2019-05-09T19:05:19Z

This formula defines the average treatment effect, but it cannot be used to calculate it, as it involves potential outcomes that we do not observe. For each i, we observe either Y_i(0) [if W_i=0] or Y_i(1) [if W_i=1] but not both. Our estimator does divide by the propensity score --- take a look at lines 183-210 of average_treatment_effect.R

@davidahirshberg if the estimator does divide by the propensity score, does it mean that the ATEs are by default doubly robust?

ginward · 2019-05-09T19:10:04Z

@swager are you reopening this issue? 😂

swager · 2019-05-09T19:14:07Z

Hi @ginward, I reopened the issue while there's still discussion on it. We only actively monitor open tickets.

For the question -- the function average_treatment_effect implements a doubly robust ATE estimator. When estimating ATEs with machine learning methods, you essentially always want to use a doubly robust (or related) method to avoid bias due to regularization (see, e.g., the double machine learning paper from Chernozhukov et al.) For a discussion of what we implement exactly, see https://arxiv.org/pdf/1902.07409.pdf.

ginward · 2019-05-09T19:18:45Z

Thanks @swager

I have indeed read the application paper you wrote, and at the time I thought the doubly robust is only implemented for the cluster robust estimators.

But anyways - it looks like, when the fitted propensity is 0, the doubly robust method will not work (+inf basically). Is this what doubly robust meant to behave? What if the person, indeed, has lower propensity of treatment that is close to 0?

swager · 2019-05-09T19:25:16Z

Yes that's right. If there are some people whose treatment propensity is (essentially) zero, then the ATE is (essentially) not identified (because the treatment effect function is not identified in the region with poor overlap). So that's why any non-parametric estimate of the ATE will blow up in this case.

The only way around this is to change estimands, so that the target of inference is still identified even when some propensities are 0. One simple way to do this is to use the average treatment effect on the treated. Because the ATT only cares about the CATE function in regions where some people are treated, it doesn't matter that the CATE function is not identified in the region with 0 propensity of treatment (because that region just gets ignored by the ATT anyways).

ginward · 2019-05-09T19:26:23Z

@swager I see. This cleared things up a lot. Much appreciated!

jtibshirani · 2019-05-10T01:53:04Z

As a note, I walked back part of the change that caused forests to split more aggressively (#415). The behavior of the development version should now match that of 0.10.2 more closely.

I'm going to close this out, since it seems that this issue has been resolved. Thanks again @ginward for reporting the issue, it helped us catch an unintentional change in behavior (before we released it)!

ginward · 2019-05-11T15:05:53Z

@jtibshirani Sure thing!

minhengw · 2022-05-12T15:49:22Z

Dear all,

I have read all the posts above, and learned a lot. Very appreciated.

My question is similar, the output of average_treatment_effect is [1] NaN. However, I am using IV forest insteas of conventional causal forest, and the estimand of my case is the Average (Conditional) Local Average Treatment (ACLATE):

average_treatment_effect(ivforest, target.sample = "all")

The warning message also indicated poor overlap, but I can't change to estimate ATE on the treated due to IV forest. The error message is as below:

Error in average_treatment_effect(ivforest, target.sample = "treated") :
For any forest type other than causal_forest, the only implemented option is method=AIPW and target.sample=all

I am eager to know how to handle this condition. Because IV forest is very essential in my research paper. Thank you very much.!

Warm regards,
David

swager · 2022-05-15T17:47:38Z

The first thing I'd do in a situation like this is to make a histogram of the estimated compliance scores, i.e., estimates of Delta(x) = P[W = 1 | Z = 1, X = x] - P[W = 0 | Z = 0, X = x]. Are there any regions of X-space where these are reliably away from 0 or not? If yes, then focusing on those regions can help; but if no, then you have a weak IV problem and using AIPW for the ACLATE is simply not going to work.

Algorithmically, one thing you can do to focus on well-identified regions is to use the `subset' argument to focus on regions of X-space where treatment effects are well identified. For example, in the case of ATE estimation under unconfoundedness, the following paper considers subsetting to regions of X-space where propensity scores don't get too close to 0 or 1:

Crump, R. K., Hotz, V. J., Imbens, G. W., & Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.

The important thing to remember in doing so is that your subsetting should only be a function of X, not of Z / W / Y.

minhengw · 2022-05-16T06:18:02Z

Dear Dr. Swager,

Thanks! But I am NOT sure that in the case of IV forest, I should use the specific range of Z.hat or of W.hat as the subsample? And set it as subset in the average_treatment_effect( ).

Many thanks.
David

swager added the bug label Apr 25, 2019

swager removed the bug label Apr 26, 2019

swager closed this as completed Apr 26, 2019

swager reopened this May 9, 2019

jtibshirani closed this as completed May 10, 2019

erikcs added the question label May 15, 2022

njawadekar mentioned this issue Jun 1, 2022

causal_forest without orthogonalization #1160

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GRF refuse to report average treatment effect when the dataset is imbalanced #406

GRF refuse to report average treatment effect when the dataset is imbalanced #406

ginward commented Apr 23, 2019 •

edited

Loading

swager commented Apr 26, 2019 •

edited

Loading

ginward commented Apr 26, 2019

swager commented Apr 26, 2019

ginward commented Apr 28, 2019

ginward commented Apr 28, 2019

davidahirshberg commented Apr 28, 2019

ginward commented Apr 28, 2019

ginward commented May 9, 2019

ginward commented May 9, 2019

swager commented May 9, 2019

ginward commented May 9, 2019

swager commented May 9, 2019

ginward commented May 9, 2019

jtibshirani commented May 10, 2019

ginward commented May 11, 2019

minhengw commented May 12, 2022

swager commented May 15, 2022

minhengw commented May 16, 2022

GRF refuse to report average treatment effect when the dataset is imbalanced #406

GRF refuse to report average treatment effect when the dataset is imbalanced #406

Comments

ginward commented Apr 23, 2019 • edited Loading

swager commented Apr 26, 2019 • edited Loading

ginward commented Apr 26, 2019

swager commented Apr 26, 2019

ginward commented Apr 28, 2019

ginward commented Apr 28, 2019

davidahirshberg commented Apr 28, 2019

ginward commented Apr 28, 2019

ginward commented May 9, 2019

ginward commented May 9, 2019

swager commented May 9, 2019

ginward commented May 9, 2019

swager commented May 9, 2019

ginward commented May 9, 2019

jtibshirani commented May 10, 2019

ginward commented May 11, 2019

minhengw commented May 12, 2022

swager commented May 15, 2022

minhengw commented May 16, 2022

ginward commented Apr 23, 2019 •

edited

Loading

swager commented Apr 26, 2019 •

edited

Loading