MRG Training Score in Gridsearch #1742

Closed · wants to merge 1 commit · 3 participants

@amueller
scikit-learn member

This PR adds training scores to the GridSearchCV output, as wished for by @ogrisel.
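
As a rough sketch of how the new output could be consumed (the cv_scores_ attribute name comes from the commit message at the bottom of this thread, and the tuple field names from the diff below; the merged API may differ, e.g. if *_test_score is renamed back to *_validation_score):

# Illustration only: attribute and field names are taken from this PR's
# diff and commit message, not from a released API.
from sklearn.datasets import load_iris
from sklearn.grid_search import GridSearchCV  # module path at the time of this PR
from sklearn.svm import SVC

iris = load_iris()
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}
grid = GridSearchCV(SVC(), param_grid, cv=3).fit(iris.data, iris.target)

for score in grid.cv_scores_:
    # Each entry is a CVScoreTuple; a large gap between the training score and
    # the test/validation score for a setting hints at overfitting.
    print(score.parameters, score.mean_training_score, score.mean_test_score)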

@ogrisel ogrisel and 1 other commented on an outdated diff Mar 6, 2013
sklearn/grid_search.py
@@ -483,12 +511,12 @@ def _fit(self, X, y, parameter_iterator, **params):
self._set_methods()
# Store the computed scores
- CVScoreTuple = namedtuple('CVScoreTuple', ('parameters',
- 'mean_validation_score',
- 'cv_validation_scores'))
+ CVScoreTuple = namedtuple('CVScoreTuple',
+ ('parameters', 'mean_test_score',
+ 'mean_training_score', 'cv_test_scores'))
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 6, 2013

Why did you rename *_validation_score to *_test_score? "validation" sounds more correct in a CV setting, don't you think?

@amueller
scikit-learn member
amueller added a line comment Mar 6, 2013

First I thought training and test made a nicer pair. Then I thought validation would be better but didn't change it back. Will do once my slides are done ;)

@ogrisel
scikit-learn member
ogrisel added a line comment Mar 6, 2013

Alright, as you wish; I don't have a strong opinion on this either.

@ogrisel
scikit-learn member

What about measuring the training_duration and testing_duration as well? That should be cheap and add essentially no overhead.

@amueller
scikit-learn member

It's on the todo. Is there a better way than using time.time?

@ogrisel
scikit-learn member

I think time.time is good enough for a start. I don't see any better way.
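
A minimal sketch of what that timing could look like (not code from this PR; the helper name is made up):

import time

def fit_and_score_with_timing(estimator, X_train, y_train, X_test, y_test):
    # Hypothetical helper illustrating the time.time approach discussed above.
    start = time.time()
    estimator.fit(X_train, y_train)
    training_duration = time.time() - start

    start = time.time()
    test_score = estimator.score(X_test, y_test)
    testing_duration = time.time() - start

    return test_score, training_duration, testing_duration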

@amueller
scikit-learn member

Fixed doctests, rebased and squashed. Should be good to go.

@ogrisel ogrisel and 1 other commented on an outdated diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
-pl.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)
-pl.yticks(np.arange(len(C_range)), C_range)
+# We extract validation and training scores, as well as training and prediction
+# times
+_, val_scores, _, train_scores, train_time, pred_time = zip(*score_dict)
+
+arrays = [val_scores, train_scores, train_time, pred_time]
+titles = ["Validation Score", "Training Score", "Training Time",
+ "Prediction Time"]
+
+# for each value draw heatmap as a function of gamma and C
+pl.figure(figsize=(8, 8))
+for i, (arr, title) in enumerate(zip(arrays, titles)):
+ pl.subplot(2, 2, i + 1)
+ arr = np.array(arr).reshape(len(C_range), len(gamma_range))
+ #pl.subplots_adjust(left=0.05, right=0.95, bottom=0.15, top=0.95)
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

Is this a leftover from some experiment? It should be removed if it's not useful.

@amueller
scikit-learn member
amueller added a line comment Mar 11, 2013

Whoops. Actually I still need to have a look at how it renders on the website.
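
For reference, the reshape-and-heatmap pattern in the diff above boils down to the following standalone sketch (with dummy scores; it assumes the flat score list is ordered with gamma varying fastest, as in the example):

import numpy as np
import matplotlib.pyplot as plt

C_range = np.logspace(-2, 2, 5)
gamma_range = np.logspace(-3, 1, 5)
# Dummy stand-in for the flat list of mean validation scores from the search.
val_scores = np.random.RandomState(0).rand(len(C_range) * len(gamma_range))

# One score per (C, gamma) combination, reshaped into a 2D grid for plotting.
score_grid = val_scores.reshape(len(C_range), len(gamma_range))

plt.imshow(score_grid, interpolation='nearest')
plt.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)
plt.yticks(np.arange(len(C_range)), C_range)
plt.xlabel('gamma')
plt.ylabel('C')
plt.title('Validation Score')
plt.colorbar()
plt.show()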

@ogrisel ogrisel commented on the diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
@@ -14,10 +14,19 @@
the decision surface smooth, while a high C aims at classifying
all training examples correctly.
-Two plots are generated. The first is a visualization of the
-decision function for a variety of parameter values, and the second
-is a heatmap of the classifier's cross-validation accuracy as
-a function of `C` and `gamma`.
+Two plots are generated. The first is a visualization of the decision function
+for a variety of parameter values, and the second is a heatmap of the
+classifier's cross-validation accuracy and training time as a function of `C`
+and `gamma`.
+
+An interesting observation on overfitting can be made when comparing validation
+and training error: higher C always results in lower training error, as it
+increases the complexity of the classifier.
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

You should add a note here on which areas are underfitting / overfitting.

@ogrisel ogrisel and 1 other commented on an outdated diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
@@ -14,10 +14,19 @@
the decision surface smooth, while a high C aims at classifying
all training examples correctly.
-Two plots are generated. The first is a visualization of the
-decision function for a variety of parameter values, and the second
-is a heatmap of the classifier's cross-validation accuracy as
-a function of `C` and `gamma`.
+Two plots are generated. The first is a visualization of the decision function
+for a variety of parameter values, and the second is a heatmap of the
+classifier's cross-validation accuracy and training time as a function of `C`
+and `gamma`.
+
+An interesting observation on overfitting can be made when comparing validation
+and training error: higher C always results in lower training error, as it
+increases the complexity of the classifier.
+For the validation set on the other hand, there is a tradeoff of goodness of
+fit and generalization.
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

Actually, better here: this is basically just an alternative phrasing of what you already say, but I think it's better to repeat those concepts over and over again to teach them (and to increase googlability).

@amueller
scikit-learn member
amueller added a line comment Mar 11, 2013

I added something, but I'm not entirely sure what you wanted. You can always edit later ;)

@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

Something like:

We can observe that the lower-right half of the parameters (below the diagonal, when both C and gamma are high) is characteristic of parameters that yield an overfitting model: the training score is very high but there is a wide gap.

The top and top-left parts of the parameter plots show underfitting models: the C and gamma values can, individually or in conjunction, constrain the model too much, leading to low training scores (hence low validation scores too, as validation scores are on average upper bounded by training scores).

@amueller
scikit-learn member
amueller added a line comment Mar 11, 2013

Done and also made the plots look better.
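
The gap described above is easy to reproduce outside the example; an illustrative sketch (using today's module paths rather than those from 2013):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_val, y_train, y_val = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0)

for C in np.logspace(-2, 4, 7):
    clf = SVC(C=C, gamma=0.1).fit(X_train, y_train)
    # The training score keeps rising with C, while the validation score
    # eventually flattens or drops: the widening gap is the overfitting regime.
    print("C=%g  train=%.3f  validation=%.3f"
          % (C, clf.score(X_train, y_train), clf.score(X_val, y_val)))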

@ogrisel
scikit-learn member

Please add some smoke tests for the new tuple items: for instance, check that all of them are positive and that train_score is lower than 1.0.
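
A smoke test in that spirit might look roughly like this (the cv_scores_ attribute and field names are assumptions taken from this PR, not the tests that were actually added):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.grid_search import GridSearchCV  # module path at the time
from sklearn.svm import SVC

def test_grid_search_training_scores_smoke():
    iris = load_iris()
    grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=3)
    grid.fit(iris.data, iris.target)
    for score in grid.cv_scores_:
        assert score.mean_training_score > 0
        assert score.mean_training_score <= 1.0
        assert np.all(np.asarray(score.cv_test_scores) > 0)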

@ogrisel
scikit-learn member

Other than the above comments this looks good to me.

@amueller
scikit-learn member

Also added some tests.

@ogrisel ogrisel commented on the diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
-a function of `C` and `gamma`.
+Two plots are generated. The first is a visualization of the decision function
+for a variety of parameter values, and the second is a heatmap of the
+classifier's cross-validation accuracy and training time as a function of `C`
+and `gamma`.
+
+An interesting observation on overfitting can be made when comparing validation
+and training error: higher C always results in lower training error, as it
+increases the complexity of the classifier.
+
+For the validation set on the other hand, there is a tradeoff of goodness of
+fit and generalization.
+
+We can observe that the lower right half of the parameters (below the diagonal
+with high C and gamma values) is characteristic of parameters that yields an
+overfitting model: the trainin score is very high but there is a wide gap. The
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

typo: trainin (my fault)

@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

" wide gap ... with the validation score"

@ogrisel ogrisel commented on the diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
+An interesting observation on overfitting can be made when comparing validation
+and training error: higher C always results in lower training error, as it
+increases the complexity of the classifier.
+
+For the validation set on the other hand, there is a tradeoff of goodness of
+fit and generalization.
+
+We can observe that the lower right half of the parameters (below the diagonal
+with high C and gamma values) is characteristic of parameters that yields an
+overfitting model: the trainin score is very high but there is a wide gap. The
+top and left parts of the parameter plots show underfitting models: the C and
+gamma values can individually or in conjunction constrain the model too much
+leading to low training scores (hence low validation scores too as validation
+scores are on average upper bounded by training scores).
+
+
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

Please remove one of the blank lines. I'll let you choose which one :)

@ogrisel ogrisel commented on the diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
+
+For the validation set on the other hand, there is a tradeoff of goodness of
+fit and generalization.
+
+We can observe that the lower right half of the parameters (below the diagonal
+with high C and gamma values) is characteristic of parameters that yields an
+overfitting model: the trainin score is very high but there is a wide gap. The
+top and left parts of the parameter plots show underfitting models: the C and
+gamma values can individually or in conjunction constrain the model too much
+leading to low training scores (hence low validation scores too as validation
+scores are on average upper bounded by training scores).
+
+
+We can also see that the training time is quite sensitive to the parameter
+setting, while the prediction time is not impacted very much. This is probably
+a consequence of the small size of the data set.
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

We can also notice that the time plots look noisy for the same reason. A higher number of cross-validation iterations would be required to properly evaluate the impact of the parameters on the training and prediction times.
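
For instance (illustration only, using today's module paths), averaging over many shuffled splits would give more timing samples per parameter setting:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

iris = load_iris()
# Many shuffled splits: per-split timings get averaged over more repetitions,
# at the cost of extra compute.
cv = StratifiedShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
grid = GridSearchCV(SVC(), {'C': [1, 10], 'gamma': [0.01, 0.1]}, cv=cv)
grid.fit(iris.data, iris.target)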

@jnothman
scikit-learn member

See an alternative patch at https://github.com/jnothman/scikit-learn/tree/grid_search_more_info

Note I have chosen different field names, aiming for consistency and memorability, if not precision of naming.

@amueller
scikit-learn member

@jnothman btw, does your version work with lists of dicts as param_grid and with RandomizedSearchCV?
Thinking about it a bit more, I'm not sure your interface is better if the parameter space has a more complicated form. Could you maybe issue a PR? That would make tracking the changes easier.

@jnothman
scikit-learn member

I don't think it's better, but it's certainly no worse: it provides exactly the same ordering according to parameter_iterator as your solution did. If that ordering is meaningful, then the data can be reshaped! If it is not, then you've lost nothing.

It doesn't do anything particular to GridSearchCV, though I see now why you might not want to call the attribute grid_results_. But params_results_ is not nice; point_results_ might work, but fit_grid_point actually fits one fold, not one point.

PR forthcoming.

@amueller
scikit-learn member
@amueller amueller ENH add training score to GridSearchCV.cv_scores_
add docstrings for GridSearchCV, RandomizedSearchCV and fit_grid_point. In "fit_grid_point" I used test_score rather than validation_score, as the split is given to the function.
The RBF SVM grid search example now also shows training scores, which illustrates overfitting for high C, as well as training/prediction times, which basically serve to illustrate that this is possible. Maybe random forests would be better for evaluating training times?
52ceff3
@amueller
scikit-learn member

Superseded by #7026

@amueller amueller closed this Jul 29, 2016