[MRG+1] Changes default for return_train_score to False #9677

Merged
merged 73 commits into from Oct 17, 2017

Contributor

thechargedneutron commented Sep 2, 2017

Reference Issue

Fixes #9621

What does this implement/fix? Explain your changes.

Changes the default value of return_train_score to warn. Raises a warning if train score takes more time.

Any other comments?

The warning message needs improvement.

Contributor

thechargedneutron commented Sep 2, 2017

@jnothman Kindly suggest a suitable warning message.
Also, please review. I guess this would need a lot of changes, mostly in the documentation.

Contributor

thechargedneutron commented Sep 3, 2017

@jnothman Added a parameter in the function which takes care of the actual start time of GridSearchCV. Please see if this is valid or not.

@jnothman

It's not a bad approach, but I suspect it won't receive the consensus needed to get merged, just because it makes the _fit_and_score interface messier...

@amueller, what do you think of this approach?

Outdated review comment on sklearn/model_selection/_validation.py
Contributor

thechargedneutron commented Sep 3, 2017

@jnothman I also agree that it makes the _fit_and_score method messier. But I could not find an alternative way to keep track of the total time that GridSearchCV would take. You or @amueller may suggest a way which serves the purpose without changing the _fit_and_score interface.

Member

jnothman commented Sep 3, 2017

Contributor

thechargedneutron commented Sep 4, 2017

@amueller @lesteve Any suggestions on whether I should go ahead and change the _fit_and_score interface, or would it be better to raise the warning only after the call to _fit_and_score ends?
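
For illustration only, the second option (warning once, after all the per-split calls have returned) could look roughly like the sketch below. This is not code from the PR: the helper name warn_if_train_scoring_slow and the max_ratio threshold are assumptions, and the timings are passed in as plain lists rather than taken from _fit_and_score.

```python
import warnings


def warn_if_train_scoring_slow(fit_times, train_score_times, max_ratio=0.5):
    """Warn once if train scoring took more than max_ratio of the fit time.

    Hypothetical helper: both the name and the max_ratio threshold are
    assumptions made for this sketch, not part of scikit-learn.
    """
    total_fit = sum(fit_times)
    total_train_score = sum(train_score_times)
    if total_fit > 0 and total_train_score > max_ratio * total_fit:
        warnings.warn(
            "Computing training scores took {:.1f}s compared with {:.1f}s "
            "for fitting; consider return_train_score=False.".format(
                total_train_score, total_fit),
            UserWarning,
        )
        return True
    return False


# Timings (in seconds) as they might be collected from each CV split.
# Train scoring is cheap here, so this call does not warn.
warn_if_train_scoring_slow([1.0, 1.2], [0.1, 0.1])
```

Aggregating the timings in the caller keeps the per-split function's signature unchanged, which is the trade-off being asked about.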

Contributor

thechargedneutron commented Sep 6, 2017

@jnothman Should I continue with changing the function interface, or raise the warning only after the _fit_and_score call completes?

Member

jnothman commented Sep 7, 2017

Member

lesteve commented Sep 7, 2017

What about the seemingly simpler solution of changing the default to return_train_score=False, potentially by adding a FutureWarning that the default is going to be return_train_score=False in 0.22?
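
A minimal sketch of that idea, assuming a 'warn' sentinel as the transitional default (the value this PR's description mentions). The helper name resolve_return_train_score and the message wording are hypothetical, not the code that was merged.

```python
import warnings


def resolve_return_train_score(return_train_score="warn"):
    """Return the effective boolean for a 'warn' / True / False setting.

    Hypothetical helper for this sketch; the message text is made up.
    """
    if isinstance(return_train_score, str) and return_train_score == "warn":
        warnings.warn(
            "The default value of return_train_score will change from True "
            "to False in a future release. Set it explicitly to silence "
            "this warning.",
            FutureWarning,
        )
        return True  # keep the old behaviour during the transition
    return bool(return_train_score)


# An explicit value never triggers the warning.
print(resolve_return_train_score(False))
```

The sentinel lets the code tell "the user chose True" apart from "the user did not choose", so only the second case is warned about.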

Contributor

thechargedneutron commented Sep 7, 2017

@lesteve Yes, this is also a simple solution; I will do it once the others agree.

Member

jnothman commented Sep 7, 2017

Contributor

thechargedneutron commented Sep 7, 2017

@jnothman Kindly review and suggest a suitable FutureWarning message.

thechargedneutron added some commits Sep 7, 2017

@thechargedneutron thechargedneutron changed the title from [WIP] Adds warning in GridSearchCV if calculating train score is unduly expensive. to [MRG] Adds warning in GridSearchCV if calculating train score is unduly expensive. Sep 7, 2017

Contributor

thechargedneutron commented Sep 7, 2017

Suggestions for a "nicer" warning message are welcome. Otherwise, I think this will work. :)

@jnothman jnothman changed the title from [MRG] Adds warning in GridSearchCV if calculating train score is unduly expensive. to [MRG] Changes default for return_train_score to False Sep 8, 2017

Contributor

thechargedneutron commented Oct 12, 2017

Even after making the required changes (the tests are still not updated), the following lines of code do not produce a deprecation warning. Is this how DeprecationDict is supposed to behave, or am I missing something?

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear','rbf'), 'C':[1,10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)
sorted(clf.cv_results_.keys())

Member

lesteve commented Oct 12, 2017

A warning should be produced when accessing the cv_results_ key for training scores:

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear','rbf'), 'C':[1,10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)
clf.cv_results_['split0_train_score']  # this is the line that should produce a warning

Note that you should not get any warning if you use return_train_score=True.

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear','rbf'), 'C':[1,10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters, return_train_score=True)
clf.fit(iris.data, iris.target)
clf.cv_results_['split0_train_score']  # no warning because return_train_score is set to True

Basically, we want users who do not set return_train_score and then access the train scores in cv_results_ to get a warning telling them to set return_train_score to True, because training scores will not be present by default in 0.20.
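
The behaviour described above (warn on reading a specific key, stay silent when listing keys) can be sketched with a small dict subclass. This is a simplified stand-in written for illustration; the class name WarnOnReadDict and its add_warning method are assumptions here and should not be read as the DeprecationDict implementation from this PR.

```python
import warnings


class WarnOnReadDict(dict):
    """Hypothetical dict that warns when selected keys are read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._warn_keys = {}

    def add_warning(self, key, message, category=FutureWarning):
        """Register a warning to emit whenever `key` is looked up."""
        self._warn_keys[key] = (message, category)

    def __getitem__(self, key):
        if key in self._warn_keys:
            message, category = self._warn_keys[key]
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(message, category, stacklevel=2)
        return super().__getitem__(key)


results = WarnOnReadDict(
    split0_test_score=[0.9, 0.8],
    split0_train_score=[1.0, 0.99],
)
results.add_warning(
    "split0_train_score",
    "Training scores will not be available by default in a future "
    "release; set return_train_score=True if you need them.",
)

# Listing keys never goes through __getitem__, so nothing is warned here.
print(sorted(results.keys()))
```

In this sketch only results['split0_train_score'] would emit the FutureWarning, which is why a snippet that merely calls sorted(clf.cv_results_.keys()) stays silent.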

Contributor

thechargedneutron commented Oct 12, 2017

I am not sure how DeprecationDict works. I need help with how to implement this.

Member

lesteve commented Oct 13, 2017

You need to help us help you!

Can you be specific about what your problem is? What have you tried? If there is a failure that you don't understand, can you copy and paste it here?

Member

jnothman commented Oct 16, 2017

See my PR at thechargedneutron#2

Member

lesteve commented Oct 16, 2017

I restarted the failing build in Travis hoping the timeout was just a glitch.

Member

jnothman commented Oct 16, 2017

Do you think this approach is decent, @lesteve? Too messy? It implies that a user no longer gets a warning informing them why the fit is so slow which was the original point...!

Member

lesteve commented Oct 16, 2017

I have not looked at the diff of this PR (will do shortly). To be perfectly honest, I think your DeprecationDict suggestion is really neat and this is the best we can do:

  • We tried to warn only if scoring on the training set was slow, and we failed because it was a bit messy to implement. Personally I think return_train_score=False is a reasonable default (compute more things only if asked by the user).
  • I think the DeprecationDict approach is the best way to transition from return_train_score=True to return_train_score=False. People will get the warning if they do not set return_train_score and then access the train scores, and then they can decide what they really want.
  • In 0.21 return_train_score=False will be the default so the code will be fast by default.

lesteve added some commits Oct 17, 2017

Member

lesteve commented Oct 17, 2017

@jnothman I pushed two main changes, it would be nice to have your opinion on these:

  • I simplified the warning message, since it only happens when accessing a training score
  • DeprecationDict is only used when return_train_score='warn'. This is me being overly cautious/pessimistic mainly and preferring to use a plain dict when return_train_score is set explicitly. I can revert this if you feel this is too much.

I am going to reset the LGTM count and add a +1 from me.

@lesteve lesteve changed the title from [MRG+2?] Changes default for return_train_score to False to [MRG+1] Changes default for return_train_score to False Oct 17, 2017

'which will not be available by default '
'any more in 0.21. If you need training scores, '
'please set return_train_score=True').format(key)
train_score = assert_warns_message(FutureWarning, msg,

amueller Oct 17, 2017

Member

Shouldn't we assert that there is no warning for the other values, and that there is no warning for the other keys for 'warn'?

Otherwise LGTM.
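
A rough sketch of such negative assertions, using only the standard warnings module. The categories_raised helper and the lookup stand-in are hypothetical names introduced for this example; the real tests would exercise cv_results_ from a fitted search instead.

```python
import warnings


def categories_raised(func, *args, **kwargs):
    """Call func and return the categories of all warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return [w.category for w in caught]


def lookup(results, key, warn_keys=()):
    """Stand-in for reading cv_results_: warns only for keys in warn_keys."""
    if key in warn_keys:
        warnings.warn("'{}' is deprecated".format(key), FutureWarning)
    return results[key]


results = {"mean_test_score": [0.9], "mean_train_score": [1.0]}
train_keys = ("mean_train_score",)

# With the default ('warn'), only train-score keys should warn ...
assert categories_raised(
    lookup, results, "mean_train_score", train_keys) == [FutureWarning]
assert categories_raised(
    lookup, results, "mean_test_score", train_keys) == []
# ... and with an explicit True/False no key should warn at all.
assert categories_raised(lookup, results, "mean_train_score") == []
```

Recording with the "always" filter avoids false negatives from Python's once-per-location warning suppression.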

@jnothman jnothman merged commit 766ba93 into scikit-learn:master Oct 17, 2017

1 of 4 checks passed

  • continuous-integration/appveyor/pr: Waiting for AppVeyor build to complete
  • continuous-integration/travis-ci/pr: The Travis CI build is in progress
  • lgtm analysis: Python: Running analyses for revisions
  • ci/circleci: Your tests passed on CircleCI!

jnothman added a commit that referenced this pull request Oct 17, 2017

Member

jnothman commented Oct 17, 2017

Thanks all

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Oct 17, 2017


@thechargedneutron thechargedneutron deleted the thechargedneutron:searchcv branch Oct 18, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017


jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

