"Error: Missing data in columns" due to `fix.factors.prediction=TRUE` #2611

feinmann · 2019-06-26T12:00:14Z

library(mlr)

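# Training data: factor B has only the levels "A" and "B"
train_data <- data.frame(
  A = runif(100), B = factor(sample(c("A", "B"), 100, replace = TRUE)))

# Test data: factor B additionally contains the unseen level "C"
test_data <- data.frame(
  A = runif(100), B = factor(sample(c("A", "B", "C"), 100, replace = TRUE)))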

lrn <- makeLearner("regr.ranger", fix.factors.prediction = TRUE)

train_task <- makeRegrTask(
  data = train_data,
  target = "A"
)

model <- train(lrn, train_task)

predictions <- predict(model, newdata = test_data)

This gives `Error: Missing data in columns: B.`, although there is no missing data in test_data. The same happens for classification.

Kind regards


pat-s · 2019-06-26T14:59:57Z

This issue sounds familiar to me - I think it hit me some time in the past as well.
I cannot tell you when I will have time to look into this.

Thanks a lot for the reprex in the first place!

jakob-r · 2020-09-11T09:29:07Z

The reason is that the new factor level C is converted to NA because of fix.factors.prediction = TRUE.
As you can see in the documentation, this feature was intended for cases where the test data has fewer factor levels than the training set. However, it has the side effect that it reduces the levels to the ones seen in training, and R then just sets new factor levels to NA. Maybe some learners can deal better with an NA than with an unseen factor level? However, this is not really intended and definitely has to go into the documentation of the fix.factors.prediction argument.
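
The NA conversion can be reproduced in base R without mlr. The following is a minimal sketch, assuming the fix amounts to re-applying the training levels to the test factor; the variable names are illustrative and not taken from the mlr source.

```r
# Levels seen during training
train_levels <- c("A", "B")

# Test factor that contains the unseen level "C"
test_b <- factor(c("A", "B", "C", "A"))

# Rebuilding the factor with only the training levels
# silently turns every "C" into NA
fixed_b <- factor(as.character(test_b), levels = train_levels)

print(fixed_b)
anyNA(fixed_b)  # TRUE
```

A learner that does not accept missing values then fails on the rebuilt column, even though the data passed to predict() had no NA in it.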

Also we might want to deal with it better.
#2771 is kind of related.

pat-s · 2020-10-28T16:38:19Z

The PR from Jakob provides a good approach to the problem. Most likely there is not much else we can do in such situations to account for all possible issues with missing data in prediction scenarios.
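
Until such a check is available in your installed version, unseen levels can be detected by hand before calling predict(). This is a rough base R sketch, not the code from the PR; `find_new_levels` is a hypothetical helper introduced here for illustration.

```r
# Hypothetical helper: for every factor column of the training data,
# list the levels that occur in the test data but not in training.
find_new_levels <- function(train, test) {
  factor_cols <- names(train)[vapply(train, is.factor, logical(1))]
  factor_cols <- intersect(factor_cols, names(test))
  new_levels <- lapply(factor_cols, function(col) {
    # factor() drops unused levels and ignores NA values
    setdiff(levels(factor(test[[col]])), levels(train[[col]]))
  })
  names(new_levels) <- factor_cols
  # Keep only columns that actually have unseen levels
  Filter(length, new_levels)
}

train_data <- data.frame(A = c(0.1, 0.2, 0.3), B = factor(c("A", "B", "A")))
test_data  <- data.frame(A = c(0.4, 0.5, 0.6), B = factor(c("A", "B", "C")))

unseen <- find_new_levels(train_data, test_data)
print(unseen)
```

An empty result means every factor level in the test data was seen during training. Otherwise the affected rows can be dropped or recoded before prediction, depending on what makes sense for the application.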

feinmann added the type-bug label Jun 26, 2019

jakob-r added prio-medium type-documentation labels Sep 11, 2020

jakob-r mentioned this issue Oct 26, 2020

add error and warning on new factor levels for prediction #2794

Merged

pat-s closed this as completed Oct 28, 2020
