Add support for factor variables. #109
Comments
When X contains a categorical variable X1, a child node may end up with observations from only one category, i.e. every value of X1 in that node is the same. In this case, how can we get the pseudo-outcomes? They are obtained through the inverse of Ap, which may then be singular, may it not?
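The singularity concern in this question can be illustrated with a small sketch. It assumes, purely for illustration, that Ap has the form of a Gram matrix X'X built from an intercept column plus one 0/1 indicator for a category of X1; the thread does not define Ap, and `gram_determinant` is a hypothetical helper, not part of grf.

```python
def gram_determinant(indicator):
    """Determinant of X'X for a design with an intercept and one 0/1 indicator.

    Assumption: X = [1, d], so X'X = [[n, sum(d)], [sum(d), sum(d^2)]].
    """
    n = len(indicator)
    s = sum(indicator)
    ss = sum(d * d for d in indicator)
    return n * ss - s * s


# Node where both categories are present: the Gram matrix is invertible.
mixed_node = [1, 0, 1, 0]
# Node where every observation falls in the same category: the indicator
# column duplicates the intercept, so the determinant is zero.
single_category_node = [1, 1, 1, 1]

print(gram_determinant(mixed_node))
print(gram_determinant(single_category_node))
```

Under this assumption, a node with a single category makes the indicator collinear with the intercept, which is what would make the inverse undefined without regularization or dropping the column.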
The X-values aren't used to compute the pseudo-outcomes in the leaf in the standard GRF formulation; rather, only the "outcomes" matter (e.g., W and Y for …
Thanks for your reply, @swager! I understand that in your causal_forest the pseudo-outcomes involve only W and Y. But if we want to do local linear regression, our psi should be:
We added the sufrep package, which contains a collection of methods for handling categorical variables, and a tutorial for how to use sufrep with grf.
Thanks
We should likely implement the approach suggested in ESL, where in each node the levels of a factor variable are ordered by their mean outcome before performing the split. This would then need to be generalized to handle non-regression forests.
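For the regression case, the ordering idea can be sketched as follows. This is a minimal illustration, not grf code: the function names are hypothetical, it handles a single factor in a single node, and it uses the squared-error criterion only.

```python
from collections import defaultdict


def order_categories_by_mean(categories, outcomes):
    """Return per-level sums, counts, and the levels sorted by mean outcome."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, outcomes):
        sums[c] += y
        counts[c] += 1
    order = sorted(sums, key=lambda c: sums[c] / counts[c])
    return sums, counts, order


def best_ordered_split(categories, outcomes):
    """Scan the K-1 splits along the mean-ordered levels of one factor.

    Returns (order, left_levels); left_levels is None if only one level
    is present in the node, since no split is possible then.
    """
    sums, counts, order = order_categories_by_mean(categories, outcomes)
    n, total = len(outcomes), sum(outcomes)
    best_score, best_left = None, None
    left_sum, left_n = 0.0, 0
    for k in range(len(order) - 1):
        left_sum += sums[order[k]]
        left_n += counts[order[k]]
        right_sum, right_n = total - left_sum, n - left_n
        # Maximizing this quantity is equivalent to minimizing the
        # within-child sum of squared errors.
        score = left_sum ** 2 / left_n + right_sum ** 2 / right_n
        if best_score is None or score > best_score:
            best_score, best_left = score, set(order[: k + 1])
    return order, best_left


cats = ["a", "b", "c", "a", "b", "c"]
ys = [1.0, 5.0, 1.2, 0.8, 5.2, 1.0]
order, left = best_ordered_split(cats, ys)
print(order, left)
```

The point of the ordering is that only K-1 candidate splits are scanned for a factor with K levels instead of all 2^(K-1) - 1 subsets. For forests that do not split on a plain outcome, the sort key would presumably have to be some per-node quantity playing the role of the outcome, which is the generalization this issue asks for.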