bug in resample using predict = "train" #1284

giuseppec · 2016-10-13T08:56:31Z

library(mlr)
lrn = makeLearner("classif.rpart", predict.type = "prob")
rdesc = makeResampleDesc("CV", iters = 3, predict = "train")
mmce.train = setAggregation(mmce, train.mean)
res = resample(lrn, binaryclass.task, rdesc, mmce.train)

Issue 1: printing res$pred yields

Resampled Prediction for:
Resample description: cross-validation with 3 iterations.
Predict: train
Stratification: FALSE

threshold: 
time (mean): NA
Error in as.data.frame.default(x) : 
  cannot coerce class "c("ResamplePrediction", "NULL")" to a data.frame
In addition: Warning message:
In mean.default(x$time) : argument is not numeric or logical: returning NA

Issue 2: res$pred$predict.type seems to be NULL but should still have the same value as lrn$predict.type!
For example, using predict = "both" seems to work:

lrn = makeLearner("classif.rpart", predict.type = "prob")
rdesc = makeResampleDesc("CV", iters = 3, predict = "both")
mmce.train = setAggregation(mmce, train.mean)
res = resample(lrn, binaryclass.task, rdesc, mmce.train)
res$pred$predict.type
# [1] "prob"


MariaErdmann · 2016-10-13T14:42:42Z

The problem is in makeResamplePrediction. See my (unfortunately not so short) example below:

learner = makeLearner("classif.rpart")
task = binaryclass.task
resampling = makeResampleDesc("CV", iters = 2, predict = "train")
resampling = makeResampleInstance(resampling, task = task)
rin = resampling

mmce.train = setAggregation(mmce, train.mean)
extract = function(model) {}
more.args = list(learner = learner, task = task, rin = rin, weights = NULL,
  measures = list(mmce.train), model = FALSE, extract = extract, show.info = getMlrOption("show.info"))

library(parallelMap)
parallelLibrary("mlr", master = FALSE, level = "mlr.resample", show.info = FALSE)
exportMlrOptions(level = "mlr.resample")
iter.results = parallelMap(doResampleIteration, seq_len(rin$desc$iters), level = "mlr.resample", more.args = more.args)

ms.train = as.data.frame(extractSubList(iter.results, "measures.train", simplify = "rows"))
ms.train
ms.test = extractSubList(iter.results, "measures.test", simplify = FALSE)
ms.test = as.data.frame(do.call(rbind, ms.test))
ms.test

preds.test = extractSubList(iter.results, "pred.test", simplify = FALSE)
preds.test
preds.train = extractSubList(iter.results, "pred.train", simplify = FALSE)
preds.train

pred = makeResamplePrediction(instance = rin, preds.test = preds.test, preds.train = preds.train)
# printing pred reproduces the error
pred

# looking into makeResamplePrediction we see where it comes from
tenull = sapply(preds.test, is.null)
trnull = sapply(preds.train, is.null)
if (any(tenull)) pr.te = preds.test[!tenull] else pr.te = preds.test
if (any(trnull)) pr.tr = preds.train[!trnull] else pr.tr = preds.train

data = setDF(rbind(
  rbindlist(lapply(seq_along(pr.te), function(X) cbind(pr.te[[X]]$data, iter = X, set = "test"))),
  rbindlist(lapply(seq_along(pr.tr), function(X) cbind(pr.tr[[X]]$data, iter = X, set = "train")))
))

# the problem is p1, which is NULL because we only calculated measures for train
p1 = preds.test[[1L]]
p1

# printing the S3 object fails because some 'slots' are NULL (see below)

makeS3Obj(c("ResamplePrediction", class(p1)),
  instance = rin,
  predict.type = p1$predict.type,
  data = data,
  threshold = p1$threshold,
  task.desc = p1$task.desc,
  time = extractSubList(preds.test, "time")
)

p1$predict.type
p1$threshold
p1$task.desc
extractSubList(preds.test, "time")
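
To make the failure mode above easier to see without running mlr, here is a minimal base R sketch. It assumes nothing beyond what the snippet above shows: `p1` is NULL, so every `p1$...` field is NULL and `class(p1)` contributes the string "NULL" to the class vector. The `obj` list is only a stand-in for the real ResamplePrediction object.

```r
# Base R only; p1 stands in for preds.test[[1L]] when predict = "train"
p1 = NULL

# Subsetting NULL with $ silently returns NULL instead of failing
stopifnot(is.null(p1$predict.type), is.null(p1$threshold))

# class(NULL) is the string "NULL", which produces the class vector
# seen in the error message
cls = c("ResamplePrediction", class(p1))
print(cls)

# Stand-in object carrying that class vector (not a real mlr object)
obj = structure(list(data = NULL, time = NULL), class = cls)

# No as.data.frame method matches either class, so the default method
# is used and stops with a "cannot coerce class" error
msg = tryCatch(as.data.frame(obj), error = function(e) conditionMessage(e))
print(msg)

# mean() of a NULL time slot gives NA plus the warning from the report
time.mean = suppressWarnings(mean(obj$time))
print(time.mean)
```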

So my suggestion for the task description and predict.type would be to pass the learner and the task to the makeResamplePrediction function, which is no big deal because makeResamplePrediction is only called in the function `mergeResampleResult`, where the learner and task are passed anyway.
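
As a rough illustration of this suggestion, the base R sketch below reads the two fields from the learner and the task instead of from the first test prediction. The plain lists and the helper `makePredMeta` are hypothetical stand-ins, not mlr objects or mlr functions, and this is not necessarily how the actual fix was implemented.

```r
# Hypothetical stand-ins holding only the fields used here
learner = list(predict.type = "prob")
task = list(task.desc = list(id = "example", type = "classif"))

# Hypothetical helper: take the metadata from learner and task, so it
# does not matter whether any test predictions exist
makePredMeta = function(learner, task) {
  list(predict.type = learner$predict.type, task.desc = task$task.desc)
}

meta = makePredMeta(learner, task)
print(meta$predict.type)
```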

For the threshold and the time, I am not sure how to handle this.
Is it possible to have different thresholds for train and predict measures? If so, then we need to pass a vector, right? Regarding the time slot: which time should be displayed here?
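
One possible way to handle these two slots, sketched in base R under stated assumptions: take the threshold from the first prediction that is not NULL (test or train), and collect the time per set with NA where a prediction is missing. The helpers `firstNonNull` and `getTimes` and the example values are hypothetical, and this does not claim to be what the eventual fix does.

```r
# Hypothetical per-iteration predictions for predict = "train":
# test predictions are NULL, train predictions carry threshold and time
preds.test = list(NULL, NULL)
preds.train = list(
  list(threshold = c(a = 0.5, b = 0.5), time = 0.02),
  list(threshold = c(a = 0.5, b = 0.5), time = 0.03)
)

# Hypothetical helper: first non-NULL element across several lists
firstNonNull = function(...) {
  candidates = Filter(Negate(is.null), c(...))
  if (length(candidates) == 0L) NULL else candidates[[1L]]
}

# Hypothetical helper: one time per iteration, NA where nothing was predicted
getTimes = function(preds) {
  vapply(preds, function(p) if (is.null(p)) NA_real_ else p$time, numeric(1))
}

p1 = firstNonNull(preds.test, preds.train)
threshold = p1$threshold
time.test = getTimes(preds.test)
time.train = getTimes(preds.train)
print(threshold)
print(time.train)
```

Keeping the two time vectors separate leaves the choice of which one to display to the print method.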

MariaErdmann · 2016-10-28T10:43:03Z

Fixed in #1315


giuseppec added type-bug hiwi prio-high labels Oct 13, 2016

giuseppec assigned MariaErdmann Oct 13, 2016

MariaErdmann mentioned this issue Oct 13, 2016

Aggregation of measure timetrain is made on test set #1286

Closed

larskotthoff pushed a commit that referenced this issue Oct 31, 2016

Fix bug in resampling when using predict = "train" (#1284) (#1315)

4dbec91

* enable printing pred of resample results when predict type is train
* Finish bug fix and adapt test
* better tests

larskotthoff closed this as completed Oct 31, 2016

MariaErdmann mentioned this issue Nov 7, 2016

Tiny improvments for ResamplePredition and an important question #1324

Closed
