
Question: How can I use the package to build a single causal tree? #548

Closed
ferlocar opened this issue Oct 26, 2019 · 14 comments

@ferlocar

I'm trying to build a single causal tree using the following code:

model <- causal_forest(X_train, y_train, z_train, num.trees = 1)

However, I noticed that the causal_forest method has a sample.fraction parameter, which defines the fraction of the data used to build each tree (0.5 by default). Since I want to use the entire data set to build the causal tree, I set this parameter to 1, but when I run the following code:

model <- causal_forest(X_train, y_train, z_train, num.trees = 1, sample.fraction=1)

I get the following error message:

"Error in causal_train(data$default, data$sparse, outcome.index, treatment.index, :
When confidence intervals are enabled, the sampling fraction must be less than 0.5."

Could you please tell me how to disable confidence intervals in order to build a tree using the entire sample? Thanks in advance!

@erikcs (Member) commented Oct 26, 2019

You can disable confidence intervals with ci.group.size=1 (see the last paragraph of the reference on variance estimates).
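For concreteness, a minimal sketch of that call (the variable names are the asker's; the grf package is assumed to be loaded):

library(grf)

# ci.group.size = 1 disables the confidence-interval machinery,
# which in turn allows sample.fraction = 1.
model <- causal_forest(X_train, y_train, z_train,
                       num.trees = 1,
                       sample.fraction = 1,
                       ci.group.size = 1)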

@ferlocar (Author)

Thanks for your fast response.

I tried that, and it removed the error message, but then the predictions no longer work. The following code:

predict(model, X_test)$predictions

returns NaN for all predictions. I also tried passing estimate.variance = FALSE to the predict method, but I got the same result.

Any idea why?

@erikcs (Member) commented Oct 26, 2019

By default, causal_forest estimates Y.hat and W.hat with separate regression forests using out-of-bag (OOB) predictions, but with sample.fraction=1 all of these OOB predictions will be NaN, and so your causal_forest predictions are NaN as well.

You can avoid this by instead supplying Y.hat and W.hat to causal_forest. (e.g. W.hat = predict(regression_forest(X, W))$predictions; Y.hat = predict(regression_forest(X, Y))$predictions)
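Spelled out with the asker's variable names (a sketch; the first-stage forests keep their default sample.fraction = 0.5, so their OOB predictions are well defined):

# First-stage nuisance estimates, fitted separately so the main
# forest no longer needs its own internal OOB regression forests:
W.hat <- predict(regression_forest(X_train, z_train))$predictions
Y.hat <- predict(regression_forest(X_train, y_train))$predictions
model <- causal_forest(X_train, y_train, z_train,
                       Y.hat = Y.hat, W.hat = W.hat,
                       num.trees = 1, sample.fraction = 1, ci.group.size = 1)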

@ferlocar (Author)

Thanks again for your response.

Could you please elaborate on why Y.hat and W.hat are required to make predictions?

My understanding about a causal tree is that (assuming I use the entire sample to build the tree) a fraction of the data is used to learn the tree structure and the remaining fraction is used to fill the leaves in the tree (I'll call this the prediction fraction).

Then, the prediction for 'observation i' corresponds to the estimated average treatment effect of the observations in the prediction fraction that are also in the leaf of 'observation i'. The reference on predictions also seems to agree with this. So, what's the role of Y.hat and W.hat in making predictions?

My concern about using Y.hat as you propose is that I would be using the entire sample to estimate the regression forest, and I'm not sure how that interacts with 'honesty'.

Again, I appreciate your quick and helpful responses!

@susanathey (Collaborator)

You may do better using the https://github.com/susanathey/causaltree package for your use case. The residualizing comes in if you have an observational study, but you wouldn't necessarily want to use it for a single tree with a randomized experiment. You can set Y.hat and W.hat to constants.
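In a randomized experiment with a known assignment probability, that amounts to something like the following sketch (0.5 stands in for the known treatment probability; mean(y_train) is one reasonable constant baseline):

model <- causal_forest(X_train, y_train, z_train,
                       W.hat = 0.5,            # known randomization probability
                       Y.hat = mean(y_train),  # constant outcome baseline
                       num.trees = 1, sample.fraction = 1, ci.group.size = 1)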

@erikcs (Member) commented Oct 26, 2019

Could you please elaborate on why Y.hat and W.hat are required to make predictions?

As @susanathey mentions above, this is not related to honesty but to orthogonalization. (W.hat and Y.hat can be estimated with an arbitrary estimator, not necessarily a regression forest.)
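For instance, a linear first stage would also work (a sketch; note these are in-sample fits, whereas cross-fitting is generally preferred for orthogonalization):

df <- as.data.frame(X_train)
# Hypothetical linear nuisance models in place of regression forests:
Y.hat <- predict(lm(y_train ~ ., data = df))
W.hat <- predict(glm(z_train ~ ., data = df, family = binomial), type = "response")
model <- causal_forest(X_train, y_train, z_train, Y.hat = Y.hat, W.hat = W.hat)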

@ferlocar (Author)

@susanathey Thanks, I'll check that out!
@erikcs Thanks for the orthogonalization reference, that clears up my question.

By the way, I'm really impressed with the package and the fast responses. Thanks a lot for this!

@sudonghua91

You can avoid this by instead supplying Y.hat and W.hat to causal_forest. (e.g. W.hat = predict(regression_forest(X, W))$predictions; Y.hat = predict(regression_forest(X, Y))$predictions)

@erikcs I followed your advice, but the predictions are still all NaN... Why? Thanks.

@erikcs (Member) commented Aug 23, 2021

Hi @sudonghua91, could you please give some more details? If you train a forest with, for example, only 1 tree, then some OOB (out-of-bag) predictions may be NaN by construction.
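A quick way to see this on simulated data (a sketch; with one tree and the default sample.fraction = 0.5, roughly half the observations are in bag and therefore get NaN OOB predictions):

n <- 100
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- X[, 1] * W + rnorm(n)
cf <- causal_forest(X, Y, W, W.hat = 0.5, Y.hat = 0,
                    num.trees = 1, ci.group.size = 1)
# NaN for every observation the single tree saw during training:
head(predict(cf)$predictions)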

@sudonghua91 commented Aug 23, 2021

Hi @erikcs, thanks for your response.
As I understand it, sample.fraction is used to split the whole sample into a training sample and a hold-out sample. Now I am trying to do the split before passing the training sample to causal_forest, so I want to set sample.fraction=1 and keep only honesty.fraction=0.5. But I need 20,000 trees, so setting ci.group.size=1 cannot help in this case. I am wondering how else I can set sample.fraction=1? Thanks.

@erikcs (Member) commented Aug 24, 2021

With sample.fraction=1 you can predict on a test set with predict(forest, X.test), but OOB predictions on X.train via predict(forest) will naturally be all NaN.

@sudonghua91

@erikcs Thanks. Actually, I did predict on a test set, but the predictions are still NaN (you can of course try it yourself). Btw, does predict(forest) cause overfitting?

@erikcs (Member) commented Aug 25, 2021

@erikcs Thanks. Actually, I did predict on a test set, but the predictions are still NaN (you can of course try it yourself). Btw, does predict(forest) cause overfitting?

# Simulated data: the true treatment effect is pmax(X1, 0).
n <- 500
p <- 10
X <- matrix(rnorm(n * p), n, p)
X.test <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)               # randomized treatment assignment
Y <- pmax(X[, 1], 0) * W + rnorm(n)
# Constant W.hat/Y.hat skip the first-stage forests, and
# ci.group.size = 1 allows sample.fraction = 1.
cf <- causal_forest(X, Y, W, W.hat = 0.5, Y.hat = 0, ci.group.size = 1, sample.fraction = 1)
head(predict(cf, X.test)$predictions)
# [1] 0.6489251 0.7139381 0.3271824 0.3057034 0.3641508 0.3184822
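For contrast (not part of the reply above, but following directly from it): OOB predictions on the training sample from this same forest are all NaN, since with sample.fraction = 1 no observation is ever out of bag.

head(predict(cf)$predictions)
# [1] NaN NaN NaN NaN NaN NaN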

@sudonghua91

@erikcs well thanks! I got it.
