Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregation of measure timetrain is made on test set #1286

Closed
MariaErdmann opened this issue Oct 13, 2016 · 7 comments
Closed

Aggregation of measure timetrain is made on test set #1286

MariaErdmann opened this issue Oct 13, 2016 · 7 comments

Comments

@MariaErdmann
Copy link
Contributor

Hi,
while debugging #1284 I stumbled over this and this might be a bug:
If one wants to assess only the training time of a learner and uses a resampling strategy like 'cv' he/she will most likely choose to set predict = "train" in the makeResampleDesc. But the the resampling won't work because the aggregation function is set to test.mean. Shouldn't it be set to train.mean?

lrn = makeLearner("classif.rpart")
rdesc = makeResampleDesc("CV", predict = "train")
resample(lrn, binaryclass.task, rdesc, measures = list(timetrain))

#  Error in FUN(X[[i]], ...) : 
#  Aggregation 'test.mean' not compatible with resampling! You have to set arg 
# 'predict' to 'test' or 'both' in your resample object, instead it is 'train'! 

timetrain$aggr
# Aggregation function: test.mean

Taking a closer look at the measure I saw that the default setting for aggr = test.mean is passed.
If you agree that this is a bug, than I would fix this. Additionally, I think it would be great if the note does also tell that the time is measured in seconds (right now it does only say: "Time of fitting the model. ")

@larskotthoff
Copy link
Sponsor Member

Agreed on both points. Could you make a PR for this please (including tests)?

@MariaErdmann
Copy link
Contributor Author

MariaErdmann commented Oct 13, 2016

Yes, I will! Shall the test be in test_base_measures.R?
Aggregation for timeboth is also made on the test sets, is there something like aggr = both.mean ?

@larskotthoff
Copy link
Sponsor Member

We don't have both.mean (and it wouldn't really make sense), so maybe put out an error or a warning in this case?

I would put the tests in test_base_resample.R.

@MariaErdmann
Copy link
Contributor Author

We discussed this issue last week in our mlr meeting and @ja-thomas (ping :-)) had some arguments which argue against my idea of changing àggrtotrain`

@ja-thomas
Copy link
Contributor

Just some thoughts on why I think setting it to train.mean is a bad idea.

I would argue that in most cases we use predict="test" (thats why it is the default). Generally we will have more than just one the measure timetrain, because measuring the time without the performance doesn't make too much sense (except for a certain benchmark thesis ;) ).

So a "normal" thing to do is something like:

lrn = makeLearner("classif.rpart")
rdesc = makeResampleDesc("CV")
resample(lrn, binaryclass.task, rdesc, measures = list(acc, timetrain))

If we change the aggregation of timetrain to train.mean, we would have to set the aggregation by hand for that "standard" case, which should work imo. out of the box.

@berndbischl
Copy link
Sponsor Member

please dont do anything here. i am not sure what the best solution is and have assigned myself. i also dont see this as too urgent.

@stale
Copy link

stale bot commented Dec 19, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 19, 2019
@stale stale bot closed this as completed Dec 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants