What does ranger do with new factor levels in prediction? #116
Comments
Both things are on my TODO list for a while:
On 2.: In |
Oh I see. So I guess the same strategy (always to the left) is also used when calculating the OOB predictions. Maybe you can leave it like that until we have strategies of handling missing values in predictors? Then you could start to offer an option like |
Error fixed in #120. |
In randomForest all new factor levels go to the left. I'm not sure about this |
Yes, you are right. I was checking for factor levels being present in the levels but not in the data. However, I've just checked that again and it was changed in a recent version of |
I used this dataset https://cloud.mail.ru/public/5XwH/oR6FP8qwq and got an error in randomForest 4.6-12:
library(randomForest)
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.
data <- read.csv2("C:/Trees/Exp.csv")
data$x1 <- as.factor(data$x1)
data$x2 <- as.factor(data$x2)
data$x3 <- as.factor(data$x3)
data$x4 <- as.factor(data$x4)
data$x6 <- as.factor(data$x6)
data$y <- as.factor(data$y)
set.seed(100)  # randomForest() has no seed argument; seed the RNG beforehand instead
model <- randomForest(y ~ ., data)
newdata <- read.csv2("C:/Trees/Exp_new.csv")
newdata$x1 <- as.factor(newdata$x1)
newdata$x2 <- as.factor(newdata$x2)
newdata$x3 <- as.factor(newdata$x3)
newdata$x4 <- as.factor(newdata$x4)
newdata$x6 <- as.factor(newdata$x6)
score <- predict(model, newdata, type="prob")
Error in predict.randomForest(model, newdata, type = "prob") :
New factor levels not present in the training data
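The error above means that at least one factor column in newdata contains a level the model never saw during training. A minimal sketch of a pre-check in base R follows; `find_new_levels` and the toy data frames are hypothetical stand-ins (not part of randomForest or ranger, and not the contents of Exp.csv):

```r
# Hypothetical helper: for each factor column of `train`, list the levels that
# occur in `newdata` but were not present among the training levels.
find_new_levels <- function(train, newdata) {
  factor_cols <- names(train)[vapply(train, is.factor, logical(1))]
  factor_cols <- intersect(factor_cols, names(newdata))
  unseen <- lapply(factor_cols, function(col) {
    vals <- unique(as.character(newdata[[col]]))
    vals <- vals[!is.na(vals)]          # missing values are not "new levels"
    setdiff(vals, levels(train[[col]]))
  })
  names(unseen) <- factor_cols
  # Keep only the columns that actually contain unseen levels
  unseen[lengths(unseen) > 0]
}

# Toy data, made up for illustration only
train   <- data.frame(x1 = factor(c("a", "b", "a")),
                      x2 = factor(c("u", "v", "u")))
newdata <- data.frame(x1 = factor(c("a", "c")),
                      x2 = factor(c("v", "u")))

new_levels <- find_new_levels(train, newdata)
print(new_levels)
```

Running such a check before predict() shows which columns and levels trigger the error. Whether the affected rows should then be dropped, recoded, or set to NA depends on the data and is not decided here.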
|
Hello rangers
I recently stumbled over the error message
It was the "classic" problem of having a new factor level in a categorical predictor during prediction, which seems to happen with
respect.unordered.factors = ["order"/TRUE]
only. For "partition" and "ignore"/FALSE, there is no such message. I think the behaviour in the cases "order" and "ignore" (FALSE) is clear, although the error message for "order" could be more specific, like "new or unknown factor levels in regressor". But what does ranger do in the last case,
respect.unordered.factors = "partition"
(no error)? Below is a small example to test this: