
Error with graph learner with preprocessing #1115

Closed
larskotthoff opened this issue Aug 23, 2024 · 8 comments

Comments

@larskotthoff commented Aug 23, 2024

This snippet from the book (https://mlr3book.mlr-org.com/chapters/chapter9/preprocessing.html) currently fails:

glrn_rf_impute_hist = as_learner(impute_hist %>>% lrn("regr.ranger"))
glrn_rf_impute_hist$id = "RF_imp_Hist"

glrn_rf_impute_oor = as_learner(po("imputeoor") %>>% lrn("regr.ranger"))
glrn_rf_impute_oor$id = "RF_imp_OOR"

design = benchmark_grid(tsk_ames,
  c(glrn_rf_impute_hist, glrn_rf_impute_oor), rsmp_cv3)
bmr_new = benchmark(design)

Stack trace from rendering the book is:

Error:
! Learner 'regr.ranger' received task with different column info during train and predict.
This happened PipeOp regr.ranger's $predict()
Backtrace:

  1. mlr3::benchmark(design)
  2. mlr3:::future_map(...)
  3. future.apply::future_mapply(...)
  4. future.apply:::future_xapply(...)
  5. future:::value.list(fs)
  6. future:::resolve.list(...)
  7. future (local) signalConditionsASAP(obj, resignal = FALSE, pos = ii)
  8. future:::signalConditions(...)

Quitting from lines 238-249 [preprocessing-016] (preprocessing.qmd)
Execution halted

I can reproduce this locally. traceback():

11: stop(condition)
10: signalConditions(obj, exclude = getOption("future.relay.immediate", 
        "immediateCondition"), resignal = resignal, ...)
9: signalConditionsASAP(obj, resignal = FALSE, pos = ii)
8: resolve.list(y, result = TRUE, stdout = stdout, signal = signal, 
       force = TRUE)
7: resolve(y, result = TRUE, stdout = stdout, signal = signal, force = TRUE)
6: value.list(fs)
5: value(fs)
4: future_xapply(FUN = FUN, nX = nX, chunk_args = dots, MoreArgs = MoreArgs, 
       get_chunk = function(X, chunk) lapply(X, FUN = `chunkWith[[`, 
           chunk), expr = expr, envir = envir, future.envir = future.envir, 
       future.globals = future.globals, future.packages = future.packages, 
       future.scheduling = future.scheduling, future.chunk.size = future.chunk.size, 
    ...
3: future.apply::future_mapply(FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE, 
       USE.NAMES = FALSE, future.globals = FALSE, future.packages = "mlr3", 
       future.seed = TRUE, future.scheduling = scheduling, future.chunk.size = chunk_size, 
       future.stdout = stdout)
2: future_map(n, workhorse, task = grid$task, learner = grid$learner, 
       resampling = grid$resampling, iteration = grid$iteration, 
       param_values = grid$param_values, mode = grid$mode, MoreArgs = list(store_models = store_models, 
           lgr_threshold = lgr_threshold, pb = pb, unmarshal = unmarshal))
1: benchmark(design)
@sebffischer commented Aug 23, 2024

This was introduced by me: 738b7bd

Maybe this check is too strict?

@sebffischer commented Aug 23, 2024

The commit that caused the book failure fixed #943 (comment)

@sebffischer commented Aug 23, 2024

I debugged this; for RF_imp_Hist, the levels of the feature "Electrical" are:

  • during train:

               id   type                               levels
           <char> <char>                               <list>
    1: Electrical factor FuseA,FuseF,FuseP,Mix,SBrkr,.MISSING
    
  • during predict:

    Key: <id>
               id   type                      levels
           <char> <char>                      <list>
    1: Electrical factor FuseA,FuseF,FuseP,Mix,SBrkr
    

In this case we are lucky that .MISSING is the last level, so dropping it does not change how as.integer() maps the remaining factor levels. If it were not the last level, dropping it would silently shift the integer codes, causing a bug for learners that rely on them, e.g. when used in conjunction with lrn("classif.ksvm").
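The as.integer() point can be illustrated with plain base R (a hypothetical illustration using the levels from the debug output above, not the actual PipeOp code):

```r
# Integer codes of a factor depend on the full level set, so removing a
# level that is NOT last shifts the codes of the remaining values.
x <- c("FuseA", "SBrkr")

# Train-time levels, with ".MISSING" appended last:
f_train <- factor(x, levels = c("FuseA", "FuseF", "FuseP", "Mix", "SBrkr", ".MISSING"))
as.integer(f_train)  # 1 5 -- unchanged if ".MISSING" is simply dropped

# If the extra level had been placed first instead of last:
f_shifted <- factor(x, levels = c(".MISSING", "FuseA", "FuseF", "FuseP", "Mix", "SBrkr"))
as.integer(f_shifted)  # 2 6 -- codes shift, silently breaking learners
#                           that work on the integer representation
```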

@sebffischer commented Aug 23, 2024

I think the PipeOp should ensure that the .MISSING level is either also present during prediction or not present during training @mb706.

@sebffischer commented Aug 23, 2024

I guess the error message should still be improved.

@larskotthoff (Author) commented Aug 23, 2024

Am I right that this now assumes the factor levels are the same during train and predict? That seems too strict -- there is no reason to assume that all classes seen during training will also occur during prediction.

@sebffischer commented Aug 23, 2024

Then they need to be handled by a PipeOp anyway.
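One possible user-side workaround (a sketch, assuming the standard mlr3pipelines operators; not verified against the affected mlr3 version) is po("fixfactors"), which records the factor levels seen during training and re-applies them at prediction time:

```r
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

# po("fixfactors") stores the factor levels seen during $train() and
# re-applies them during $predict(), so the downstream learner receives
# identical column info in both phases.
graph <- po("imputeoor") %>>% po("fixfactors") %>>% lrn("regr.ranger")
glrn_rf_impute_oor_fixed <- as_learner(graph)
glrn_rf_impute_oor_fixed$id <- "RF_imp_OOR_fixed"
```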

@be-marc commented Aug 29, 2024

The error has been downgraded to a warning.

@be-marc closed this as completed Aug 29, 2024