
Tiny improvements for ResamplePrediction and an important question #1324

Closed
wants to merge 1 commit into from

Conversation

MariaErdmann (Contributor)

After discussing issue #1284 and PR #1315 at our last meeting, I made some small cosmetic improvements to ResamplePrediction:

  • vapply instead of sapply (see the short sketch after this list)
  • changed the object passed to p1 (which is "safer")
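
A minimal sketch of why the swap matters (not part of the diff, just an illustration): sapply() simplifies its result opportunistically and can silently return a list, while vapply() enforces the declared return type and length.

# sapply() guesses the output shape; vapply() checks it against FUN.VALUE.
x = list(a = 1, b = NULL, c = 3)
sapply(x, is.null)                          # happens to be logical here, but not guaranteed in general
vapply(x, is.null, FUN.VALUE = logical(1))  # always logical(length(x)), errors otherwise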

One major question arose regarding the ResamplePrediction function, and depending on the answer some further changes may need to be implemented.

Is it possible for resampling, e.g. cross-validation, to produce a NULL result in one of the iterations? The small code snippet below demonstrates what we mean:


lrn = makeLearner("classif.rpart", predict.type = "prob")
rdesc = makeResampleDesc("CV", iters = 4, predict = "train")
mmce.train = setAggregation(mmce, train.mean)
res = resample(lrn, binaryclass.task, rdesc, mmce.train)

head(res$pred$data)
# here every iteration has a result. Is it possible that one iteration has no result?

# Since I do not have a use case I will manipulate the results of doResampleIteration such
# that it shows what I mean
rin = makeResampleInstance(rdesc, task = binaryclass.task)

extract = function(model) {}
more.args = list(learner = lrn, task = binaryclass.task, rin = rin, weights = NULL,
  measures = list(mmce), model = FALSE, extract = extract, show.info = getMlrOption("show.info"))

library(parallelMap)
parallelLibrary("mlr", master = FALSE, level = "mlr.resample", show.info = FALSE)
exportMlrOptions(level = "mlr.resample")
iter.results = parallelMap(doResampleIteration, seq_len(rin$desc$iters), level = "mlr.resample", more.args = more.args)

preds.test = extractSubList(iter.results, "pred.test", simplify = FALSE)
preds.train = extractSubList(iter.results, "pred.train", simplify = FALSE)
head(preds.train, n = 2L)
preds.train[1] = list(NULL)
head(preds.train, n = 2L)

Is there any use case you can think of where this might happen?
Thank you!

-tenull = sapply(preds.test, is.null)
-trnull = sapply(preds.train, is.null)
+tenull = vapply(preds.test, is.null, FUN.VALUE = logical(1))
+trnull = vapply(preds.train, is.null, FUN.VALUE = logical(1))
 if (any(tenull)) pr.te = preds.test[!tenull] else pr.te = preds.test
 if (any(trnull)) pr.tr = preds.train[!trnull] else pr.tr = preds.train
Contributor

If we assume that preds.test and preds.train never contain NULL (as we discussed in our weekly meeting), we could even remove lines 19 - 22. If not, I think we can at least prettify the code and replace lines 19 - 22 with pr.te = filterNull(preds.test) and pr.tr = filterNull(preds.train).
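
For context (assuming filterNull refers to the BBmisc helper that simply drops NULL elements from a list), the suggested replacement would look like this:

# BBmisc::filterNull(li) returns li with all NULL elements removed, so the
# explicit any()/subsetting branches above become unnecessary.
pr.te = filterNull(preds.test)
pr.tr = filterNull(preds.train)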

Contributor

@MariaErdmann what is the status here? Is filterNull appropriate here?

@@ -16,8 +16,8 @@ NULL

 makeResamplePrediction = function(instance, preds.test, preds.train) {
-tenull = sapply(preds.test, is.null)
-trnull = sapply(preds.train, is.null)
+tenull = vapply(preds.test, is.null, FUN.VALUE = logical(1))
Contributor

use vlapply
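
For reference (assuming vlapply means the BBmisc shortcut for vapply with FUN.VALUE fixed to logical(1)), the suggested change would read:

# vlapply() is vapply() with the per-element return type fixed to logical(1).
tenull = vlapply(preds.test, is.null)
trnull = vlapply(preds.train, is.null)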

-tenull = sapply(preds.test, is.null)
-trnull = sapply(preds.train, is.null)
+tenull = vapply(preds.test, is.null, FUN.VALUE = logical(1))
+trnull = vapply(preds.train, is.null, FUN.VALUE = logical(1))
 if (any(tenull)) pr.te = preds.test[!tenull] else pr.te = preds.test
 if (any(trnull)) pr.tr = preds.train[!trnull] else pr.tr = preds.train
Contributor

@MariaErdmann what is the status here? Is filterNull appropriate here?

@larskotthoff (Sponsor Member) left a comment

See changes requested by Giuseppe.

@berndbischl (Sponsor Member)

Is it possible for resampling, e.g. cross-validation, to produce a NULL result in one of the iterations? The small code snippet below demonstrates what we mean:

No. An iteration will either work and produce a normal prediction object, or fail with an exception.
For the latter case you can use on.learner.error, but that will still produce a prediction object (with NAs).

So if you don't see further reasons in the code why a NULL could be produced, I am very sure that it is impossible.
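
A minimal sketch of that failure path (reusing lrn, rdesc, mmce.train and binaryclass.task from the snippet above; nothing actually fails here unless a fold errors):

# With on.learner.error = "warn", a learner error in a fold yields a FailureModel
# instead of aborting resample(); the affected predictions are filled with NAs,
# so the per-iteration prediction objects are never NULL.
configureMlr(on.learner.error = "warn")
res = resample(lrn, binaryclass.task, rdesc, measures = mmce.train)
class(res$pred)                  # still a ResamplePrediction, even if folds failed
anyNA(res$pred$data$response)    # TRUE only if some fold actually failed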

@larskotthoff (Sponsor Member)

@MariaErdmann what's the status here?

@pat-s (Member) commented Mar 17, 2019

I guess this PR won't be finished?

@pat-s (Member) commented Apr 8, 2019

Priority seems to be low and the improvements are tiny. Closing. Feel free to re-open if you want to finish it :)

@pat-s closed this Apr 8, 2019
@pat-s deleted the improve.ResamplePrediction branch November 15, 2019 13:03