mlr3measures::mse() unexpected result #328

Closed

MarcelMiche opened this issue Feb 9, 2022 · 2 comments

MarcelMiche commented Feb 9, 2022

mse() returns an unexpected result when applied to the combined prediction of an inner resampling (extended_archive$...prediction("test")), whereas it returns the expected result when applied to a single fold of the same inner resampling (...predictions("test")[[1]]).

Date: 2022-02-09. R Version: 4.0.3 (2020-10-10).
Platform: x86_64-apple-darwin17.0 (64-bit)

Setup:
Nested resampling (inner: rsmp("cv", folds = 4), outer: rsmp("repeated_cv", repeats = 2, folds = 3)) with regr.glmnet as the only learner, auto-tuning s (random search, terminated at n_evals = 7), predict_sets = c("train", "test"), performance measure regr.mse. After resampling, the fitted models are extracted with rrMap <- mlr3misc::map(as.data.table(rr)$learner, "model").

Reproducible example with dummy data:

set.seed(123)
for(i in 1:9) {
    assign(paste0("x", i), rnorm(n=100, mean = sample(50:100,1), sd = sample(5:60,1)))
} # Make predictors
err <- rnorm(n=100, 1000, 100) # error term
y <- 42 - (2*x1) + 3*x2 - .66*x3 + .03*x4 + 1.7*x5 + .085*x6 + .1*x7 - .008*x8 + x9 + err
dat <- data.frame(y, x1, x2, x3, x4, x5, x6, x7, x8, x9)

library(mlr3verse)
tskReg <- TaskRegr$new(id="Check", backend=dat, target="y")
LASSO <- lrn("regr.glmnet", alpha = 1, predict_sets=c("train", "test"))
inner_rsmp = rsmp("cv", folds = 4)
measure = msr("regr.mse")
search_space = ps(s = p_dbl(lower = 0.001, upper = 0.8))
terminator = trm("evals", n_evals = 7)
tuner = tnr("random_search")
at = AutoTuner$new(LASSO, inner_rsmp, measure, terminator, tuner, search_space, store_models = TRUE) # at = auto tuning
outer_rsmp <- rsmp("repeated_cv", repeats = 2, folds = 3)
rr = resample(tskReg, at, outer_rsmp, store_models = TRUE)

rrMap <- mlr3misc::map(as.data.table(rr)$learner, "model")

rrMap[[1]]$tuning_instance$archive$extended_archive # Overview
rrMap[[1]]$tuning_instance$archive$best() # Best prediction

rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$prediction("test") # resample_result[[3]] has the best regr.mse (13083.91); this is its combined prediction result

truthVals <- rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$prediction("test")$truth
respVals <- rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$prediction("test")$response # Extract truth and response values

### Unexpected result - why?
mlr3measures::mse(truth = truthVals, response = respVals) # Expected result = 13083.91, actual result = 13009.43

rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$score()[1,] # The discrepancy does not occur when scoring a single inner fold instead of all four folds per tuning value of s.

truthVals <- rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$predictions("test")[[1]]$truth
respVals <- rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$predictions("test")[[1]]$response

mlr3measures::mse(truth = truthVals, response = respVals) # Expected and actual result of 15541.3 agree.

Thank you very much in advance, not just for answering, but also for all the effort you have put into mlr3.

mllg transferred this issue from mlr-org/mlr3 Feb 10, 2022
be-marc self-assigned this Feb 10, 2022

be-marc commented Feb 10, 2022

rr = rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]

This is a ResampleResult object with 4 iterations. It therefore contains 4 Prediction objects, which you can score individually:

rr$score()

#>             task task_id                 learner  learner_id         resampling resampling_id iteration           prediction  regr.mse
#>1: <TaskRegr[46]>   Check <LearnerRegrGlmnet[36]> regr.glmnet <ResamplingCV[19]>            cv         1 <PredictionRegr[19]> 15541.300
#>2: <TaskRegr[46]>   Check <LearnerRegrGlmnet[36]> regr.glmnet <ResamplingCV[19]>            cv         2 <PredictionRegr[19]>  5710.667
#>3: <TaskRegr[46]>   Check <LearnerRegrGlmnet[36]> regr.glmnet <ResamplingCV[19]>            cv         3 <PredictionRegr[19]> 15986.089
#>4: <TaskRegr[46]>   Check <LearnerRegrGlmnet[36]> regr.glmnet <ResamplingCV[19]>            cv         4 <PredictionRegr[19]> 15097.602

Internally, we call rr$aggregate(), which calls mlr3measures::mse() on each Prediction object and then takes the mean of the four regr.mse scores. The result, 13083.91, is logged to the archive.
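
For reference, that archived value can be reproduced directly from the ResampleResult; a minimal sketch, assuming the rr object defined above (run in the same session as the reprex) and the standard mlr3 methods $aggregate() and $score():

# Mean of the four per-fold MSE scores -- the value logged to the archive (13083.91)
rr$aggregate(msr("regr.mse"))

# The same value computed by hand from the per-fold scores
mean(rr$score(msr("regr.mse"))$regr.mse)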

You used rr$prediction("test"), which is the combined Prediction object of the four resampling iterations, and then called mlr3measures::mse() on that combined prediction result.
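
In other words, the archive holds the mean of the per-fold scores (a macro average), while mse() on the pooled predictions is a micro average over all test rows. The two agree when every fold has exactly the same size, but here the inner 4-fold CV runs on an outer training set of about two thirds of the 100 rows, so the fold sizes differ and so do the two numbers. A minimal numeric sketch with made-up squared errors:

# Hypothetical squared errors for two folds of unequal size
fold1 <- c(1, 1, 1)  # 3 observations, fold MSE = 1
fold2 <- c(9, 9)     # 2 observations, fold MSE = 9

mean(c(mean(fold1), mean(fold2)))  # macro average: (1 + 9) / 2 = 5
mean(c(fold1, fold2))              # micro average: 21 / 5 = 4.2

(The pooled value from the issue can also be obtained as rr$prediction("test")$score(msr("regr.mse")), which should again give 13009.43.)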

mllg commented Feb 17, 2022

mlr-org/mlr3@5cfdd9f tries to clarify the difference.

mllg closed this as completed Feb 17, 2022