
Error with graph learner with preprocessing #1115

Closed
larskotthoff opened this issue Aug 23, 2024 · 8 comments

Comments

@larskotthoff commented Aug 23, 2024

This snippet from the book (https://mlr3book.mlr-org.com/chapters/chapter9/preprocessing.html) currently fails:

glrn_rf_impute_hist = as_learner(impute_hist %>>% lrn("regr.ranger"))
glrn_rf_impute_hist$id = "RF_imp_Hist"

glrn_rf_impute_oor = as_learner(po("imputeoor") %>>% lrn("regr.ranger"))
glrn_rf_impute_oor$id = "RF_imp_OOR"

design = benchmark_grid(tsk_ames,
  c(glrn_rf_impute_hist, glrn_rf_impute_oor), rsmp_cv3)
bmr_new = benchmark(design)

Stack trace from rendering the book is:

Error:
! Learner 'regr.ranger' received task with different column info during train and predict.
This happened PipeOp regr.ranger's $predict()
Backtrace:

  1. mlr3::benchmark(design)
  2. mlr3:::future_map(...)
  3. future.apply::future_mapply(...)
  4. future.apply:::future_xapply(...)
  5. future:::value.list(fs)
  6. future:::resolve.list(...)
  7. future (local) signalConditionsASAP(obj, resignal = FALSE, pos = ii)
  8. future:::signalConditions(...)

Quitting from lines 238-249 [preprocessing-016] (preprocessing.qmd)
Execution halted

I can reproduce this locally. traceback():

11: stop(condition)
10: signalConditions(obj, exclude = getOption("future.relay.immediate", 
        "immediateCondition"), resignal = resignal, ...)
9: signalConditionsASAP(obj, resignal = FALSE, pos = ii)
8: resolve.list(y, result = TRUE, stdout = stdout, signal = signal, 
       force = TRUE)
7: resolve(y, result = TRUE, stdout = stdout, signal = signal, force = TRUE)
6: value.list(fs)
5: value(fs)
4: future_xapply(FUN = FUN, nX = nX, chunk_args = dots, MoreArgs = MoreArgs, 
       get_chunk = function(X, chunk) lapply(X, FUN = `chunkWith[[`, 
           chunk), expr = expr, envir = envir, future.envir = future.envir, 
       future.globals = future.globals, future.packages = future.packages, 
       future.scheduling = future.scheduling, future.chunk.size = future.chunk.size, 
    ...
3: future.apply::future_mapply(FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE, 
       USE.NAMES = FALSE, future.globals = FALSE, future.packages = "mlr3", 
       future.seed = TRUE, future.scheduling = scheduling, future.chunk.size = chunk_size, 
       future.stdout = stdout)
2: future_map(n, workhorse, task = grid$task, learner = grid$learner, 
       resampling = grid$resampling, iteration = grid$iteration, 
       param_values = grid$param_values, mode = grid$mode, MoreArgs = list(store_models = store_models, 
           lgr_threshold = lgr_threshold, pb = pb, unmarshal = unmarshal))
1: benchmark(design)
@sebffischer commented Aug 23, 2024

This was introduced by me: 738b7bd

Maybe this check is too strict?

@sebffischer commented Aug 23, 2024

The commit that caused the book failure fixed #943 (comment)

@sebffischer commented Aug 23, 2024

I debugged this; for RF_imp_Hist, the levels of the feature "Electrical" are:

  • during train:

               id   type                               levels
           <char> <char>                               <list>
    1: Electrical factor FuseA,FuseF,FuseP,Mix,SBrkr,.MISSING
    
  • during predict:

    Key: <id>
               id   type                      levels
           <char> <char>                      <list>
    1: Electrical factor FuseA,FuseF,FuseP,Mix,SBrkr
    

In this case we are lucky that .MISSING is the last level, so dropping it does not change how as.integer() maps the remaining factor levels. If it were not the last level, dropping it would silently shift the integer codes, causing a bug for learners that rely on them, e.g. when used in conjunction with lrn("classif.ksvm").
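The as.integer() point can be illustrated with plain base R (a hypothetical illustration using the levels from the debug output above, not the actual PipeOp code):

```r
# Integer codes of a factor depend on the full level set, so removing a
# level that is NOT last shifts the codes of the remaining values.
x <- c("FuseA", "SBrkr")

# Train-time levels, with ".MISSING" appended last:
f_train <- factor(x, levels = c("FuseA", "FuseF", "FuseP", "Mix", "SBrkr", ".MISSING"))
as.integer(f_train)  # 1 5 -- unchanged if ".MISSING" is simply dropped

# If the extra level had been placed first instead of last:
f_shifted <- factor(x, levels = c(".MISSING", "FuseA", "FuseF", "FuseP", "Mix", "SBrkr"))
as.integer(f_shifted)  # 2 6 -- codes shift, silently breaking learners
#                           that work on the integer representation
```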

@sebffischer commented Aug 23, 2024

I think the PipeOp should ensure that the .MISSING level is either also present during prediction or not present during training @mb706.

@sebffischer commented Aug 23, 2024

I guess the error message should still be improved.

@larskotthoff (Author) commented Aug 23, 2024

Am I right that this now assumes the factor levels are the same during train and predict? That seems too strict -- there is no reason to assume that all classes seen during training will also occur during prediction.

@sebffischer commented Aug 23, 2024

Then they need to be handled by a PipeOp anyway.
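One possible user-side workaround (a sketch, assuming the standard mlr3pipelines operators; not verified against the affected mlr3 version) is po("fixfactors"), which records the factor levels seen during training and re-applies them at prediction time:

```r
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

# po("fixfactors") stores the factor levels seen during $train() and
# re-applies them during $predict(), so the downstream learner receives
# identical column info in both phases.
graph <- po("imputeoor") %>>% po("fixfactors") %>>% lrn("regr.ranger")
glrn_rf_impute_oor_fixed <- as_learner(graph)
glrn_rf_impute_oor_fixed$id <- "RF_imp_OOR_fixed"
```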

@be-marc commented Aug 29, 2024

The error has been downgraded to a warning.

@be-marc closed this as completed Aug 29, 2024