Question: How can I use the package to build a single causal tree? #548
Comments
You can disable confidence intervals with …
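A hedged sketch of a call that disables confidence intervals, assuming the grf R API's `ci.group.size` parameter (this parameter name is an assumption on my part, not a quote of the reply above):

```r
library(grf)

# ci.group.size = 1 turns off the grouped sub-sampling that grf uses
# for confidence intervals, which is what enforces the
# sample.fraction < 0.5 restriction. (Assumed API, see lead-in.)
model <- causal_forest(X_train, y_train, z_train,
                       num.trees = 1,
                       sample.fraction = 1,
                       ci.group.size = 1)
```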
Thanks for your fast response. I tried that, and it removed the error message, but then the predictions no longer work: my code now gives NaN for all predictions. I also tried including the parameter … Any idea why?
`causal_forest` by default predicts `Y.hat` and `W.hat` with a separate `regression_forest`, but since `sample.fraction = 1` all of these OOB predictions will be NaN, and so your `causal_forest` predictions are NaN. You can avoid this by instead supplying `Y.hat` and `W.hat` to `causal_forest`.
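The inline example in this reply was truncated; a minimal sketch of what supplying the nuisance estimates could look like, assuming the grf R API (the separate nuisance forests `forest.Y` and `forest.W` are illustrative names):

```r
library(grf)

# Fit the nuisance models separately, with default sub-sampling,
# so their OOB predictions are well defined.
forest.Y <- regression_forest(X_train, y_train)
Y.hat <- predict(forest.Y)$predictions

forest.W <- regression_forest(X_train, z_train)
W.hat <- predict(forest.W)$predictions

# With Y.hat and W.hat supplied, causal_forest no longer needs to
# estimate them internally, so sample.fraction = 1 becomes usable.
model <- causal_forest(X_train, y_train, z_train,
                       Y.hat = Y.hat, W.hat = W.hat,
                       num.trees = 1, sample.fraction = 1,
                       ci.group.size = 1)
```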
Thanks again for your response. Could you please elaborate on why `Y.hat` and `W.hat` are required to make predictions? My understanding of a causal tree is that (assuming I use the entire sample to build the tree) a fraction of the data is used to learn the tree structure and the remaining fraction is used to fill the leaves of the tree (I'll call this the prediction fraction). Then, the prediction for observation i corresponds to the estimated average treatment effect over the observations in the prediction fraction that fall in the same leaf as observation i. The reference on predictions also seems to agree with this. So, what's the role of `Y.hat` and `W.hat` in making predictions? My concern about using `Y.hat` as you propose is that I would be using the entire sample to estimate the regression forest, and I'm not sure how that interacts with 'honesty'. Again, I appreciate your quick and helpful responses!
You may do better using the https://github.com/susanathey/causaltree package for your use case. The residualizing comes in if you have an observational study, but you wouldn't necessarily want to use it for a single tree with a randomized experiment. You can set the `y.hat` and `w.hat` to be constants.
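In a randomized experiment "constants" could mean the marginal outcome mean and the known treatment probability. A hedged sketch under that assumption (`p.treat` is an illustrative name for the design's treatment probability; grf API assumed):

```r
library(grf)

# In a randomized experiment the propensity is known by design,
# so the nuisance estimates can simply be constant vectors.
p.treat <- 0.5  # illustrative: the known treatment probability
model <- causal_forest(X_train, y_train, z_train,
                       Y.hat = rep(mean(y_train), length(y_train)),
                       W.hat = rep(p.treat, length(z_train)),
                       num.trees = 1, sample.fraction = 1,
                       ci.group.size = 1)
```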
As @susanathey mentions above, this is not related to honesty but to orthogonalization. (`W.hat` and `Y.hat` could be estimated with an arbitrary estimator, not necessarily a regression forest.)
@susanathey Thanks, I'll check that out! By the way, I'm really impressed with the package and the fast responses. Thanks a lot for this! |
@erikcs I followed your suggestion, but the predictions are still all NaN. Why? Thanks.
Hi @sudonghua91, could you please give some more details? If you train a forest with for example only 1 tree, then some OOB (out of bag) predictions may be NaN by construction. |
Hi @erikcs, thanks for your response.
With …
@erikcs Thanks. Actually I did predict on a test set, but the predictions are still NaN (you can of course try it yourself). Btw, does `predict(forest)` cause overfitting?
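The distinction at play, sketched under the assumption that the grf `predict` methods behave as documented: calling `predict` on the forest alone returns out-of-bag predictions on the training sample (each observation predicted only by trees that did not use it, so no overfitting, but entries can be NaN when a forest has very few trees), while passing new data uses every tree:

```r
# OOB predictions on the training data: no overfitting, but with
# num.trees = 1 an in-bag observation has no tree left to predict
# it, so its entry is NaN.
oob.pred <- predict(model)$predictions

# Predictions on held-out data use all trees, so they are not
# subject to that source of NaN.
test.pred <- predict(model, X_test)$predictions
```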
|
@erikcs well thanks! I got it. |
I'm trying to build a single causal tree using the following code:

```r
model <- causal_forest(X_train, y_train, z_train, num.trees = 1)
```

However, I noticed that the `causal_forest` method has the parameter `sample.fraction`, which defines the fraction of the data that is used to build each tree (and is 0.5 by default). Because I want to use the entire data set to build the causal tree, I want to set this to 1, but when I run the following code:

```r
model <- causal_forest(X_train, y_train, z_train, num.trees = 1, sample.fraction = 1)
```

I get the following error message:

```
Error in causal_train(data$default, data$sparse, outcome.index, treatment.index, :
  When confidence intervals are enabled, the sampling fraction must be less than 0.5.
```

Could you please tell me how to disable confidence intervals in order to build a tree using the entire sample? Thanks in advance!