NA handling regression forest vs. local linear forest covariate matrices #1240

Closed
spocksdad opened this issue Nov 17, 2022 · 5 comments

@spocksdad

Description of the bug
Regression forest allows the X matrix to have some incomplete cases, but local linear forest returns an error. Not sure if there's some technical reason why ll forests can't handle incomplete cases?

Steps to reproduce

library(grf)

# Toy data
Y <- as.vector(rnorm(100))
X <- data.frame(x1 = rnorm(100), x2 = rnorm(100))

# Add NAs: non-positive values of x1 become missing
X$x1 <- ifelse(X$x1 > 0, X$x1, NA)

# Let's try a regression forest
regression_forest(Y = Y, X = X)
# Regression forest runs fine

# Now let's try an ll forest
ll_regression_forest(Y = Y, X = X)
# ll forest returns: Error in validate_X(X) : The feature matrix X contains at least one NA.

GRF version
GRF version 2.2.0

@erikcs
Member

erikcs commented Nov 17, 2022

Hi @spocksdad, ll forest doesn't support NA in X because it runs OLS with X.
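To make the OLS point concrete, here is a minimal base-R sketch. It is only an illustration and not grf's internal code: the ridge-style fit, `lambda`, and all variable names are assumptions chosen for the example. It shows that a single NA in one column contaminates the cross-product matrix that a least-squares fit has to solve, while a fit restricted to NA-free columns still goes through.

```r
# Illustration only: a ridge-regularised least-squares fit in base R,
# not grf's internal implementation.
set.seed(42)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
Y <- rnorm(n)
X[X[, "x1"] <= 0, "x1"] <- NA   # same NA pattern as in the report

A <- cbind(1, X)                # design matrix with an intercept column
gram <- crossprod(A)            # t(A) %*% A, the matrix the fit must solve
gram.has.na <- anyNA(gram)      # NAs in x1 propagate into the cross-products

# Restrict the regression to columns without missing values
complete.cols <- colSums(is.na(X)) == 0
A.ok <- cbind(1, X[, complete.cols, drop = FALSE])
lambda <- 0.1                                        # assumed penalty, for illustration
penalty <- diag(c(0, rep(lambda, ncol(A.ok) - 1)))   # intercept left unpenalised
beta <- solve(crossprod(A.ok) + penalty, crossprod(A.ok, Y))
```

Here `gram.has.na` is `TRUE`, whereas `beta` contains finite coefficients for the intercept and `x2`.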

@erikcs
Member

erikcs commented Nov 18, 2022

Btw, you could stitch together your own ll forest that allows NAs in the columns Xj that are not in ll.split.variables: remove that input check and make sure you only call the forest with missing values in features that are not used in the ll corrections. (We will likely not modify grf to support this anytime soon.)
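If you go down this route, a small guard can stand in for the removed input check. The sketch below is an assumption for illustration: `nas_only_outside` is a hypothetical helper, not a grf function. It only verifies that the columns you intend to use for the linear part contain no missing values.

```r
# Hypothetical helper (not part of grf): TRUE when none of the columns
# listed in `vars` contains a missing value.
nas_only_outside <- function(X, vars) {
  X <- as.matrix(X)
  !anyNA(X[, vars, drop = FALSE])
}

X <- data.frame(x1 = c(1, NA, 3), x2 = c(0.5, 0.1, 0.2))
ok.x2 <- nas_only_outside(X, 2)   # TRUE: x2 is complete
ok.x1 <- nas_only_outside(X, 1)   # FALSE: x1 has an NA
```

Calling it on the intended `ll.split.variables` before training would give an early, explicit failure instead of a silent problem later on.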

@joepvdburg

> Btw, you could stitch together your own ll forest that allows NAs in the columns Xj that are not in ll.split.variables: remove that input check and make sure you only call the forest with missing values in features that are not used in the ll corrections. (We will likely not modify grf to support this anytime soon.)

I tried to do this: I removed the input check and deleted the rows with NA values in 'll.split.variables', so that NA values only exist outside 'll.split.variables'. But my out-of-bag predictions are now NULL. How can this be solved? Thank you in advance.

@erikcs
Member

erikcs commented Sep 20, 2023

I guess you'd have to call predict(ll.forest, linear.correction.variables = <indices of the variables without missing values>).
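A rough sketch of what that call looks like. Note the assumptions: the forest here is trained on complete data, because stock grf rejects NAs in `ll_regression_forest`; column 1 merely plays the role of the variable that would contain missing values in a patched build; and `ll.lambda = 0.1` is an arbitrary value picked to skip tuning.

```r
library(grf)

set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)
Y <- X[, 2] + rnorm(n)

# Stock grf: X must be complete here. In a patched build, column 1 would be
# the one allowed to contain NAs (assumption for this sketch).
ll.forest <- ll_regression_forest(X = X, Y = Y, num.trees = 200)

# Use only columns 2 and 3 for the local linear correction
correction.vars <- c(2, 3)
preds <- predict(ll.forest,
                 linear.correction.variables = correction.vars,
                 ll.lambda = 0.1)
oob <- preds$predictions   # out-of-bag predictions, one per training row
```

Without `linear.correction.variables`, the correction would be run on every column, including any that hold missing values in a patched setup.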

@joepvdburg

Ah yes I missed that indeed. Thanks

@erikcs erikcs closed this as completed Feb 21, 2024