Failure models not handled correctly in benchmarking #1981

Closed
dagola opened this issue Aug 15, 2017 · 2 comments · Fixed by #1984

Comments

@dagola

dagola commented Aug 15, 2017

When a learner fails during a benchmark, the benchmark function stops with an error regardless of the on.learner.error setting.

Consider the following (minimal?) reproducible example:

set.seed(123)

# Load mlr ----
library(mlr)
configureMlr("on.learner.error" = "warn")

# Generate data ----
data = data.frame(y = factor(c(1, sample(c(0), 99, TRUE)), 
                              levels = c(0,1), labels = c("control", "case")), 
                   x = matrix(rnorm(1000), nrow = 100))
task = makeClassifTask("test", data = data, target = "y")

# Define measures ----
measures = list(mlr::auc)

# Define filter parameter set ----
filter_ps = ParamHelpers::makeParamSet(ParamHelpers::makeIntegerParam("fw.abs",
                                                                       lower = 1,
                                                                       upper = mlr::getTaskNFeats(task)))

# Define tuning control ----
ctrl = mlr::makeTuneControlRandom(maxit = 10)

# Define resampling strategies ----
inner = mlr::makeResampleDesc("CV", stratify = FALSE, iters = 3)
outer = mlr::makeResampleDesc("CV", stratify = FALSE, iters = 3)

# Define learner ----
glm_lrn = mlr::makeLearner("classif.binomial", predict.type = "prob", config = list("on.learner.error" = "warn"))
glm_lrn = mlr::makeFilterWrapper(glm_lrn, fw.method = "chi.squared")
glm_lrn = mlr::makeDownsampleWrapper(glm_lrn, dw.perc = 1/5, dw.stratify = FALSE)

glm_ps = ParamHelpers::makeParamSet(ParamHelpers::makeDiscreteParam("link",
                                                                     values = c("logit",
                                                                                "probit",
                                                                                "cloglog")))

glm_lrn = mlr::makeTuneWrapper(glm_lrn,
                                resampling = inner,
                                control = ctrl,
                                measures = measures,
                                par.set = c(filter_ps, glm_ps),
                                show.info = TRUE)

# Benchmark ----
bmr = benchmark(learners = list(glm_lrn), 
                 tasks = task, 
                 resamplings = outer, 
                 measures = measures, 
                 keep.pred = FALSE, models = FALSE, show.info = TRUE)

Task: test, Learner: classif.binomial.filtered.downsampled.tuned
[Resample] cross-validation iter 1: [Tune] Started tuning learner classif.binomial.filtered.downsampled for parameter set:
           Type len Def               Constr Req Tunable Trafo
fw.abs  integer   -   -              1 to 10   -    TRUE     -
link   discrete   -   - logit,probit,cloglog   -    TRUE     -
With control class: TuneControlRandom
Imputation value: -0
[Tune-x] 1: fw.abs=8; link=logit
Warning in train(.learner$next.learner, .task, weights = .task$weights) :
  Could not train learner classif.binomial.filtered: Error in .jcall(filter, "Z", "setInputFormat", instances) : 
  weka.core.UnsupportedAttributeTypeException: weka.filters.supervised.attribute.Discretize: Cannot handle unary class!

Warning in train(learner, task, subset = train.i, weights = weights[train.i]) :
  Could not train learner classif.binomial.filtered.downsampled.tuned: Error : $ operator is invalid for atomic vectors

Error: $ operator is invalid for atomic vectors

Given the on.learner.error setting, I expected many FailureModels during the benchmark and, in the end, poor performance values, but not an abort due to an error.
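
For instance, had the benchmark run to completion, I would expect the failed iterations to surface as missing or imputed performance values rather than as an abort. A sketch of how I would check this (assuming bmr exists, i.e. benchmark() returned; the auc column name is an assumption based on the measure id):

# Per-iteration and aggregated performances; failed fits should show up as NA
perf = getBMRPerformances(bmr, as.df = TRUE)
aggr = getBMRAggrPerformances(bmr, as.df = TRUE)
any(is.na(perf$auc))  # TRUE would only mean some iterations failed, not an error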

I think the error stems from this line. Here, isFailureModel is called recursively on next.model, but at some point the wrapped learner.model is no longer a BaseWrapperModel or any other model, just a character string:

str(model$learner.model$next.model$learner.model)
chr "Error in .jcall(filter, \"Z\", \"setInputFormat\", instances) : \n  weka.core.UnsupportedAttributeTypeException: weka.filters.s"| __truncated__

Is my expectation wrong or is this a bug?

Session info

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mlrMBO_1.1.0      smoof_1.5.1       checkmate_1.8.3   BBmisc_1.11      
[5] mlr_2.11          ParamHelpers_1.10

loaded via a namespace (and not attached):
 [1] parallelMap_1.3     Rcpp_0.12.12        RColorBrewer_1.1-2 
 [4] plyr_1.8.4          RWekajars_3.9.1-3   lhs_0.14           
 [7] tools_3.3.1         digest_0.6.12       viridisLite_0.2.0  
[10] jsonlite_1.0        tibble_1.3.3        gtable_0.2.0       
[13] lattice_0.20-33     DiceKriging_1.5.5   rlang_0.1.1        
[16] Matrix_1.2-6        DBI_0.7             parallel_3.3.1     
[19] rJava_0.9-8         dplyr_0.5.0         httr_1.2.1         
[22] htmlwidgets_0.9     plot3D_1.1          grid_3.3.1         
[25] data.table_1.10.4   R6_2.1.2            plotly_4.7.1       
[28] survival_2.39-4     mco_1.0-15.1        RJSONIO_1.3-0      
[31] RWeka_0.4-34        FSelector_0.21      ggplot2_2.2.1      
[34] purrr_0.2.3         tidyr_0.5.1         magrittr_1.5       
[37] backports_1.1.0     scales_0.4.1        htmltools_0.3.5    
[40] splines_3.3.1       randomForest_4.6-12 assertthat_0.1     
[43] misc3d_0.8-4        colorspace_1.2-6    entropy_1.2.1      
[46] stringi_1.1.5       lazyeval_0.2.0      munsell_0.4.3 

Bug report

  • [x] Start a new R session
  • [x] Install the latest version of mlr: update.packages(oldPkgs="mlr", ask=FALSE) or if you use a GitHub install of mlr: devtools::install_github(c("BBmisc", "ParamHelpers", "mlr"))
  • [x] Run sessionInfo()
  • [x] Give a minimal reproducible example
@larskotthoff
Member

This is a bug in mlr, which is why it isn't handled by on.learner.error. Traceback:

12: isFailureModel(model$learner.model$next.model)
11: isFailureModel.BaseWrapperModel(m)
10: isFailureModel(m)
9: calculateResampleIterationResult(learner = learner, task = task, 
       train.i = train.i, test.i = test.i, measures = measures, 
       weights = weights, rdesc = rin$desc, model = model, extract = extract, 
       show.info = show.info)
8: (function (learner, task, rin, i, measures, weights, model, extract, 
       show.info) 
   {
       setSlaveOptions()
       if (show.info) 
    ...
7: mapply(fun2, ..., MoreArgs = more.args, SIMPLIFY = FALSE, USE.NAMES = FALSE)
6: parallelMap(doResampleIteration, seq_len(rin$desc$iters), level = "mlr.resample", 
       more.args = more.args)
5: resample(lrn, tasks[[task]], resamplings[[task]], measures = measures, 
       models = models, extract = extract.this, keep.pred = keep.pred, 
       show.info = show.info)
4: (function (task, learner, learners, tasks, resamplings, measures, 
       keep.pred = TRUE, models = TRUE, show.info) 
   {
       setSlaveOptions()
       if (show.info) 
    ...
3: mapply(fun2, ..., MoreArgs = more.args, SIMPLIFY = FALSE, USE.NAMES = FALSE)
2: parallelMap(benchmarkParallel, task = grid$task, learner = grid$learner, 
       more.args = list(learners = learners, tasks = tasks, resamplings = resamplings, 
           measures = measures, keep.pred = keep.pred, models = models, 
           show.info = show.info), level = plevel)
1: benchmark(learners = list(glm_lrn), tasks = task, resamplings = outer, 
       measures = measures, keep.pred = FALSE, models = FALSE, show.info = TRUE)
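
One way to make isFailureModel.BaseWrapperModel robust against this would be to stop recursing once the wrapper's own fit has failed, i.e. when learner.model is just the stored error string. This is only a sketch of the idea; the actual change merged in #1984 may look different:

isFailureModel.BaseWrapperModel = function(model) {
  # training of this wrapper failed: learner.model holds the error message,
  # so there is nothing to recurse into
  if (is.character(model$learner.model))
    return(TRUE)
  isFailureModel(model$learner.model$next.model)
}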

@dagola
Author

dagola commented Aug 15, 2017

Maybe the ordering of the classes is wrong?

class(model$learner.model$next.model)
[1] "FilterModel"      "BaseWrapperModel" "FailureModel"     "WrappedModel" 

So isFailureModel.BaseWrapperModel is called instead of isFailureModel.FailureModel.
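
A small base-R illustration of that dispatch behaviour (toy generic and dummy object, nothing mlr-specific):

probe = function(x) UseMethod("probe")
probe.BaseWrapperModel = function(x) "BaseWrapperModel method"
probe.FailureModel = function(x) "FailureModel method"

x = structure(list(), class = c("FilterModel", "BaseWrapperModel",
                                "FailureModel", "WrappedModel"))
probe(x)
# [1] "BaseWrapperModel method"
# The first class with a matching method wins, so the FailureModel method never runs.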
