[MRG+2] Timing and training score in GridSearchCV #7325

Merged
merged 3 commits into scikit-learn:master from raghavrv:timing_training_score on Sep 27, 2016

Conversation

@raghavrv
Member

raghavrv commented Sep 1, 2016

Finishing up @eyc88's #7026

Resolves #6894 and #6895

TODO

  • Rebase after merge of #7324 and make all tests pass
  • Better tests for training score
  • What's new
  • Split training and scoring time as mean|std_fit_time vs mean|std_score_time

Have an example at examples/? Address in separate PR

@jnothman @amueller @vene @ogrisel (To be reviewed after merge of #7324)

NOTE: Make sure @eyc88 is credited when squashing and merging.

@raghavrv

Member

raghavrv commented Sep 1, 2016

@amueller @jnothman Do we have to get the weighted average for mean_train_score across splits if i.i.d.?

@jnothman

Member

jnothman commented Sep 1, 2016

It doesn't really make sense to weight it by test set size. Perhaps by training set size. I'd leave it unweighted.

@GaelVaroquaux

Member

GaelVaroquaux commented Sep 1, 2016

It doesn't really make sense to weight it by test set size. Perhaps by
training set size. I'd leave it unweighted.

+1.

@raghavrv

Member

raghavrv commented Sep 1, 2016

Yes, I was asking about weighting it by the training set size only... Okay! Let's leave it unweighted...

@amueller amueller added this to the 0.18 milestone Sep 6, 2016

@amueller

Member

amueller commented Sep 6, 2016

Are you working on this?

@raghavrv

Member

raghavrv commented Sep 7, 2016

@amueller yes. I was waiting for #7324 to get merged. I'll push in a while...

@raghavrv

Member

raghavrv commented Sep 11, 2016

I've made the keys "mean_time" and "std_time", as the word test/train is irrelevant here...

@raghavrv

Member

raghavrv commented Sep 11, 2016

@amueller @jnothman added some basic tests for training scores... Reviews please...

@raghavrv

Member

raghavrv commented Sep 11, 2016

About the example, do we want one that compares the train and test scores and also plots the training time for each candidate? (Also ping @vene for suggestions)

@@ -746,6 +784,10 @@ class GridSearchCV(BaseSearchCV):
FitFailedWarning is raised. This parameter does not affect the refit
step, which will always raise the error.
return_train_score: boolean, default=True
If ``'False'``, the results_ attribute will not include training

@jnothman

jnothman Sep 11, 2016

Member

space before colon

@@ -72,6 +72,17 @@ Model Selection Enhancements and API Changes
and :class:`model_selection.StratifiedShuffleSplit` is now renamed
to ``n_splits``.
- Training scores and Timing information
The ``cv_results_`` arrays now include the training scores for each

@jnothman

jnothman Sep 11, 2016

Member

drop "The" "arrays"

"now" -> "also"

"include" -> "includes"

The ``cv_results_`` arrays now include the training scores for each
cross-validation split (with keys such as ``'split0_train_score'``), as
well as their mean (``'mean_train_score'``) and standard deviation
(``'std_train_score'``).

@jnothman

jnothman Sep 11, 2016

Member

"To avoid the cost of evaluating training score, set return_train_score=False.

@@ -1006,6 +1059,10 @@ class RandomizedSearchCV(BaseSearchCV):
FitFailedWarning is raised. This parameter does not affect the refit
step, which will always raise the error.
return_train_score: boolean, default=True

@jnothman

jnothman Sep 11, 2016

Member

space before colon

@@ -1006,6 +1059,10 @@ class RandomizedSearchCV(BaseSearchCV):
FitFailedWarning is raised. This parameter does not affect the refit
step, which will always raise the error.
return_train_score: boolean, default=True
If ``'False'``, the results_ attribute will not include training

'params' : [{'kernel' : 'rbf', 'gamma' : 0.1}, ...],
}
NOTE that the key ``'params'`` is used to store a list of parameter
settings dict for all the parameter candidates.
settings dict for all the parameter candidates. Besides,

@jnothman

jnothman Sep 11, 2016

Member

Drop "besides ..." This does not need redundant documentation

'params' : [{'kernel': 'poly', 'degree': 2}, ...],
}
NOTE that the key ``'params'`` is used to store a list of parameter
settings dict for all the parameter candidates.
settings dict for all the parameter candidates. Besides,

@jnothman

jnothman Sep 11, 2016

Member

Drop "Besides, ..."

time = np.array(time, dtype=np.float64).reshape(n_candidates, n_splits)
time_means = np.average(time, axis=1)
time_stds = np.sqrt(np.average((time - time_means[:, np.newaxis]) ** 2,
axis=1))
cv_results = dict()

@jnothman

jnothman Sep 11, 2016

Member

I should have insisted you keep this local var as results!

train_scores[:, split_i])
cv_results["mean_train_score"] = train_means
cv_results["std_train_score"] = train_stds
cv_results["rank_train_score"] = np.asarray(rankdata(-train_means,

@jnothman

jnothman Sep 11, 2016

Member

I don't generally think ranking by training score would be useful / well advised. I'd happily be wrong if someone stated otherwise.

@raghavrv

raghavrv Sep 11, 2016

Member

I'm removing it for now... Adding it later should be easier than deprecating the keys...

if self.return_train_score:
train_means = np.average(train_scores, axis=1)
train_stds = np.sqrt(

@jnothman

jnothman Sep 11, 2016

Member

np.std?

@raghavrv

raghavrv Sep 11, 2016

Member

Ah yes. Thanks!

time = np.array(time, dtype=np.float64).reshape(n_candidates, n_splits)
time_means = np.average(time, axis=1)
time_stds = np.sqrt(np.average((time - time_means[:, np.newaxis]) ** 2,

@jnothman

jnothman Sep 11, 2016

Member

np.std?

weights=weights))
time = np.array(time, dtype=np.float64).reshape(n_candidates, n_splits)
time_means = np.average(time, axis=1)

@jnothman

jnothman Sep 11, 2016

Member

I'm wondering, but not certain, whether it's worth refactoring some of this triplication.

@raghavrv

raghavrv Sep 11, 2016

Member

It's triplication only if you would be okay with taking the weighted average for all of the test scores, train scores and times...

@raghavrv

raghavrv Sep 11, 2016

Member

Or hmm... we could add a helper taking in the weights too...

@jnothman

Member

jnothman commented Sep 11, 2016

I'm not altogether happy that in _fit_and_score, scoring of training data is included within the timing. Should we consider pulling it out?

@jnothman

Member

jnothman commented Sep 11, 2016

Regarding example, @amueller plotted train score and time in a variant of plot_rbf_parameters at #1742.

@jnothman

Member

jnothman commented Sep 11, 2016

I.e. that was @amueller's attempt at this PR's feature, never merged.

@jnothman

Member

jnothman commented Sep 11, 2016

Otherwise, re example, I think an example of train scores explicitly introducing over/under-fitting would be interesting, but adding it to the wishlist for a later PR is fine.

@raghavrv

Member

raghavrv commented Sep 11, 2016

I'm not altogether happy that in _fit_and_score, scoring of training data is included within the timing. Should we consider pulling it out?

I think if we want to... then mean_test_time would make sense... WDYT?

EDIT: Or maybe not. I feel test still does not have a place here... We are timing fit + score (on testing set).

@raghavrv raghavrv changed the title from [WIP] Timing and training score in GridSearchCV to [MRG] Timing and training score in GridSearchCV Sep 11, 2016

@raghavrv

Member

raghavrv commented Sep 11, 2016

@ogrisel Now I can bug you here ^_^

@raghavrv

Member

raghavrv commented Sep 23, 2016

Apologies for the delay. I got busy configuring my laptop ^_^.

And

oh, you're saying it registered 0. Of course >= 0 is what you want.

No, I speculate that to be the reason...

I tried running on my Windows machine and found it not to be the case... time.time is as accurate as it is on Linux. I don't know how older versions of Windows handle it, though...

I still haven't had any success installing scikit-learn on Windows ;(

And if we go with >= 0, that would mean we are allowing fit_time to be 0 on Linux too. Such a test would mask a bug where fit_time is always 0...

In the next couple of hours I'll somehow figure out a way to install scikit-learn on my Windows machine and see if this test fails...

@raghavrv

Member

raghavrv commented Sep 23, 2016

I did >= 0 and the AppVeyor build passes. I am unable to install on Windows (I set too small a partition size, and it bit me by not letting me install the whopping 8 GB of Visual Studio libraries. I don't understand why mingw requires vs13).

Anyway, @jnothman @amueller, what do you think we should do? Is the current test setup fine?

@jnothman

Member

jnothman commented Sep 24, 2016

You can use a less trivial fit, then? But I'm okay with this being written off as "hard to test".


@raghavrv

Member

raghavrv commented Sep 24, 2016

Finally I succeeded in installing it on Windows (I had to resort to a 32-bit Windows in VirtualBox). Now I can confirm that it is the timer precision that makes the test fail.

This section of the timeit documentation explains it clearly. I would say let's leave it as is, like you suggested before. The alternative would be to add a utils.fixes function, safe_time, that always gives us the highest-precision clock (which could mean checking the platform and choosing between time.clock, time.time, or the newer time.perf_counter).

@amueller @jnothman Final reviews and merge?

@jnothman

Otherwise LGTM

'split0_train_score' : [0.8, 0.9, 0.7],
'split1_train_score' : [0.82, 0.5, 0.7],
'mean_train_score' : [0.81, 0.7, 0.7],
'std_train_score' : [0.00073, 0.00063, 0.00043],

@jnothman

jnothman Sep 24, 2016

Member

Can we make these numbers more readable? Pretend it's slower.

@jnothman jnothman changed the title from [MRG] Timing and training score in GridSearchCV to [MRG+1] Timing and training score in GridSearchCV Sep 24, 2016

@@ -249,19 +258,22 @@ def _fit_and_score(estimator, X, y, scorer, train, test, verbose,
else:
test_score = _score(estimator, X_test, y_test, scorer)
score_time = time.time() - start_time - fit_time

@amueller

amueller Sep 25, 2016

Member

Ah ok, that makes more sense. Before, this was just time.time() - start_time or something like that, and I was really confused.

@amueller

Member

amueller commented Sep 25, 2016

lgtm.

@amueller amueller changed the title from [MRG+1] Timing and training score in GridSearchCV to [MRG+2] Timing and training score in GridSearchCV Sep 25, 2016

@raghavrv

Member

raghavrv commented Sep 25, 2016

Thanks everyone for the reviews. @jnothman Does it look fine now? Merge?

ENH/FIX timing and training score.
* ENH separate fit / score times
* Make score_time=0 if errored; Ignore warnings in test
* Cleanup docstrings
* ENH Use helper to store the results
* Move fit time computation to else of try...except...else
* DOC readable sample scores
* COSMIT Add a comment on why the time test is >= 0 instead of > 0
  (Windows time.time precision is not accurate enough to be non-zero
   for trivial fits)
@raghavrv

Member

raghavrv commented Sep 27, 2016

@jnothman bump :) Anything more to be done here?

@jnothman

Member

jnothman commented Sep 27, 2016

Thanks for asking. The last thing to do: mention somewhere that timings are in seconds in cv_results_.

@raghavrv

Member

raghavrv commented Sep 27, 2016

Done :)

@jnothman jnothman merged commit b444cc9 into scikit-learn:master Sep 27, 2016

3 checks passed

ci/circleci: Your tests passed on CircleCI!
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@jnothman

Member

jnothman commented Sep 27, 2016

And done. Backport please, @amueller...

@raghavrv raghavrv deleted the raghavrv:timing_training_score branch Sep 27, 2016

amueller added a commit to amueller/scikit-learn that referenced this pull request Sep 27, 2016

[MRG+2] Timing and training score in GridSearchCV (scikit-learn#7325)
* Resolved issue scikit-learn#6894 and scikit-learn#6895:
    Now *SearchCV.results_ includes both timing and training scores.

wrote new test (sklearn/model_selection/test_search.py)
and new doctest (sklearn/model_selection/_search.py)
added a few more lines in the docstring of GridSearchCV and RandomizedSearchCV.
Revised code according to suggestions.
Add a few more lines to test_grid_search_results():
    1. check test_rank_score always >= 1
    2. check all regular scores (test/train_mean/std_score) and timing >= 0
    3. check all regular scores <= 1
Note that timing can be greater than 1 in general, and std of regular scores
always <= 1 because the scores are bounded between 0 and 1.

* ENH/FIX timing and training score.

* ENH separate fit / score times
* Make score_time=0 if errored; Ignore warnings in test
* Cleanup docstrings
* ENH Use helper to store the results
* Move fit time computation to else of try...except...else
* DOC readable sample scores
* COSMIT Add a comment on why the time test is >= 0 instead of > 0
  (Windows time.time precision is not accurate enough to be non-zero
   for trivial fits)

* Convey that times are in seconds

# Conflicts:
#	doc/whats_new.rst

TomDLT added a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016

[MRG+2] Timing and training score in GridSearchCV (scikit-learn#7325)

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Nov 10, 2016

Merge tag '0.18' into releases
* tag '0.18': (1286 commits)
  [MRG + 1] More versionadded everywhere! (scikit-learn#7403)
  minor doc fixes
  fix lbfgs rename (scikit-learn#7503)
  minor fixes to whatsnew
  fix scoring function table
  fix rebase messup
  DOC more what's new subdivision
  DOC Attempt to impose some order on What's New 0.18
  no fixed width within bold
  REL changes for release in 0.18.X branch (scikit-learn#7414)
  [MRG+2] Timing and training score in GridSearchCV (scikit-learn#7325)
  DOC: Added Nested Cross Validation Example (scikit-learn#7111)
  Sync docstring and definition default argument in kneighbors (scikit-learn#7476)
  added contributors for 0.18, minor formatting fixes.
  Fix typo in whats_new.rst
  [MRG+2] FIX adaboost estimators not randomising correctly (scikit-learn#7411)
  Addressing issue scikit-learn#7468. (scikit-learn#7472)
  Reorganize README
  clean up deprecation warning stuff in common tests
  [MRG+1] Fix regression in silhouette_score for clusters of size 1 (scikit-learn#7438)
  ...

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Nov 10, 2016

Merge branch 'releases' into dfsg
* releases: (1286 commits)

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Nov 10, 2016

Merge branch 'dfsg' into debian
* dfsg: (1286 commits)

afiodorov added a commit to unravelin/scikit-learn that referenced this pull request Apr 25, 2017

[MRG+2] Timing and training score in GridSearchCV (scikit-learn#7325)

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG+2] Timing and training score in GridSearchCV (scikit-learn#7325)

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG+2] Timing and training score in GridSearchCV (scikit-learn#7325)