I am trying to understand the causal forest's behavior in the tails of the covariate distribution. I run simulations like the one below and often find that CF estimates are constant towards the tails (see plot), which means that the CF is biased there. In the simulation below I compare this to the naïve approach of fitting a separate random forest to each of Y0 and Y1 to predict the outcomes, and then obtaining tau as the difference in predictions \hat{Y1} - \hat{Y0}.
That approach seemingly does not suffer from the bias problem in the tails (but has larger variance throughout). Even in larger samples (e.g. n = 1e4) the problem persists.
Is there anything I can do to get a better fit in the tails? Thanks.
library(grf)
library(ranger)
## Simulate non-linear ps-scores and y-models
e.x  = function(x) 1 / (1 + exp(-3 * x))
mu.0 = function(x) sin(-1/2 - 4*x)
mu.1 = function(x) sin(1/2 + 4*x)
tau  = function(x) mu.1(x) - mu.0(x)
m.x  = function(x) mu.0(x) + e.x(x) * tau(x)
## Sample training data
set.seed(2023)
n = 1000
sd.y = 0.6
Z = rnorm(n, mean=0, sd = 0.3)
y0 = mu.0(Z) + rnorm(n, sd = sd.y)
y1 = mu.1(Z) + rnorm(n, sd = sd.y)
e = e.x(Z)
W = rbinom(n, 1, e)
y = W*y1 + (1-W) * y0
df = data.frame(Z = Z, y, y0, y1, e, W = factor(W), W.num = W)
## Estimate causal forest with tuning on
cf = causal_forest(X = cbind(df$Z), Y = df$y, W = df$W.num, num.trees = 2e3, tune.parameters = 'all')
## Estimate heterogeneity using conditional-mean random forests
rf0 = ranger(y ~ Z, data = df[df$W.num==0,], num.trees = 5e3 )
rf1 = ranger(y ~ Z, data = df[df$W.num==1,], num.trees = 5e3 )
## Create test data
Z.test = seq(-1,1,0.01)
df.test = data.frame(Z = Z.test)
tau.test = tau(Z.test)
tau.test.cf = predict(cf, newdata = df.test)[,1]
tau.test.cdmrf = predict(rf1, data= df.test)$predictions - predict(rf0, data= df.test)$predictions
## Compare fits
par(mfrow=c(1,2))
plot(Z.test,tau.test, ylim=c(-3,3),ty='l', main='Predictions vs true effect')
lines(Z.test,tau.test.cf,col=2)
lines(Z.test,tau.test.cdmrf,col=4)
plot(Z.test,tau.test.cf-tau.test, ylim=c(-3,3),col=2, ty='l', main= 'Error')
lines(Z.test,tau.test.cdmrf-tau.test,col=4)
abline(h=0)
legend('bottomright', legend = c('Causal Forest', 'Standard Forests'),lty=c(1,1),col=c(2,4))
thomasklausch2 changed the title from "Performance of Causal Forests in the tails of the covariate space" to "Performance of Causal Forests in the tails of the covariate distribution" on Dec 5, 2022.
Hi @thomasklausch2, you could try a local linear forest, i.e. replace tau.test.cf with tau.test.cf = predict(cf, newdata = df.test, linear.correction.variables = 1)[,1].
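To make the suggestion concrete, here is a minimal sketch of the local linear prediction dropped into the simulation above (the argument takes the column indices of X to correct on; here X has a single column, so the index is 1):

```r
## Local linear prediction: fits a ridge-regularized linear correction
## within each test point's forest neighborhood, which can reduce the
## boundary bias visible in the tails.
tau.test.llcf = predict(cf,
                        newdata = df.test,
                        linear.correction.variables = 1)[, 1]

## Overlay on the earlier comparison plot
lines(Z.test, tau.test.llcf, col = 3)
```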
Thanks! That indeed solves the issue to some extent. Is this the same thing that has been called 'local centering' in the literature? (e.g. here on p.142)
EDIT: oh no, maybe it's not the same, because your addition only changes the prediction step, not the estimator? I understand from the reference that local centering means replacing Y_i with Y_i - mu(X_i) and D_i with D_i - e(X_i) before estimation. Is that a different feature of causal_forest?
What you are highlighting here is "boundary bias", which is actually quite normal; the local linear prediction applies a correction that may help. (Note that the question of what constitutes a boundary becomes tricky as the dimension grows, and correction methods like these will then have a hard time.)
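As a side note on the local-centering question above: that orthogonalization is a separate feature and is on by default in causal_forest, which internally fits regression forests for E[Y|X] and E[W|X] and then works with the residuals. A sketch making the centering explicit (up to tuning, this should match the default behavior):

```r
## Explicit local centering, as causal_forest does internally by default:
## estimate the marginal outcome and propensity models with regression
## forests, then pass the fitted values in via Y.hat and W.hat.
Y.forest = regression_forest(X = cbind(df$Z), Y = df$y)
Y.hat    = predict(Y.forest)$predictions   # out-of-bag estimates of E[Y|X]
W.forest = regression_forest(X = cbind(df$Z), Y = df$W.num)
W.hat    = predict(W.forest)$predictions   # out-of-bag estimates of E[W|X]

cf.centered = causal_forest(X = cbind(df$Z), Y = df$y, W = df$W.num,
                            Y.hat = Y.hat, W.hat = W.hat,
                            num.trees = 2e3)
```

So local centering addresses confounding-induced bias in the estimator, while the local linear correction addresses boundary bias at prediction time; they are complementary.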