MRG Training Score in Gridsearch #1742

Closed · wants to merge 1 commit · 3 participants

@amueller
scikit-learn member

This PR adds training scores to the GridSearchCV output, as wished for by @ogrisel.
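
As a rough sketch of how the new output could be consumed (the cv_scores_ attribute name comes from the commit message at the bottom of this thread, and the tuple field names from the diff below; the merged API may differ, e.g. if *_test_score is renamed back to *_validation_score):

# Illustration only: attribute and field names are taken from this PR's
# diff and commit message, not from a released API.
from sklearn.datasets import load_iris
from sklearn.grid_search import GridSearchCV  # module path at the time of this PR
from sklearn.svm import SVC

iris = load_iris()
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}
grid = GridSearchCV(SVC(), param_grid, cv=3).fit(iris.data, iris.target)

for score in grid.cv_scores_:
    # Each entry is a CVScoreTuple; a large gap between the training score and
    # the test/validation score for a setting hints at overfitting.
    print(score.parameters, score.mean_training_score, score.mean_test_score)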

@ogrisel ogrisel and 1 other commented on an outdated diff Mar 6, 2013
sklearn/grid_search.py
@@ -483,12 +511,12 @@ def _fit(self, X, y, parameter_iterator, **params):
self._set_methods()
# Store the computed scores
- CVScoreTuple = namedtuple('CVScoreTuple', ('parameters',
- 'mean_validation_score',
- 'cv_validation_scores'))
+ CVScoreTuple = namedtuple('CVScoreTuple',
+ ('parameters', 'mean_test_score',
+ 'mean_training_score', 'cv_test_scores'))
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 6, 2013

Why did you rename *_validation_score to *_test_score? "validation" sounds more correct in a CV setting, don't you think?

@amueller
scikit-learn member
amueller added a line comment Mar 6, 2013

First I thought training and test made a nicer pair. Then I thought validation would be better but didn't change it back. Will do once my slides are done ;)

@ogrisel
scikit-learn member
ogrisel added a line comment Mar 6, 2013

Alright, as you wish; I don't have a strong opinion on this either.

@ogrisel
scikit-learn member

What about measuring the training_duration and testing_duration as well? That should be cheap and add essentially no overhead.

@amueller
scikit-learn member

It's on the todo. Is there a better way than using time.time?

@ogrisel
scikit-learn member

I think time.time is good enough for a start. I don't see any better way.
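
A minimal sketch of what that timing could look like (not code from this PR; the helper name is made up):

import time

def fit_and_score_with_timing(estimator, X_train, y_train, X_test, y_test):
    # Hypothetical helper illustrating the time.time approach discussed above.
    start = time.time()
    estimator.fit(X_train, y_train)
    training_duration = time.time() - start

    start = time.time()
    test_score = estimator.score(X_test, y_test)
    testing_duration = time.time() - start

    return test_score, training_duration, testing_duration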

@amueller
scikit-learn member

Fixed doctests, rebased and squashed. Should be good to go.

@ogrisel ogrisel and 1 other commented on an outdated diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
-pl.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)
-pl.yticks(np.arange(len(C_range)), C_range)
+# We extract validation and training scores, as well as training and prediction
+# times
+_, val_scores, _, train_scores, train_time, pred_time = zip(*score_dict)
+
+arrays = [val_scores, train_scores, train_time, pred_time]
+titles = ["Validation Score", "Training Score", "Training Time",
+ "Prediction Time"]
+
+# for each value draw heatmap as a function of gamma and C
+pl.figure(figsize=(8, 8))
+for i, (arr, title) in enumerate(zip(arrays, titles)):
+ pl.subplot(2, 2, i + 1)
+ arr = np.array(arr).reshape(len(C_range), len(gamma_range))
+ #pl.subplots_adjust(left=0.05, right=0.95, bottom=0.15, top=0.95)
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

Is this a leftover from some experiment? It should be removed if it's not useful.

@amueller
scikit-learn member
amueller added a line comment Mar 11, 2013

Whoops. Actually I still need to have a look at how it renders on the website.
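
For reference, the reshape-and-heatmap pattern in the diff above boils down to the following standalone sketch (with dummy scores; it assumes the flat score list is ordered with gamma varying fastest, as in the example):

import numpy as np
import matplotlib.pyplot as plt

C_range = np.logspace(-2, 2, 5)
gamma_range = np.logspace(-3, 1, 5)
# Dummy stand-in for the flat list of mean validation scores from the search.
val_scores = np.random.RandomState(0).rand(len(C_range) * len(gamma_range))

# One score per (C, gamma) combination, reshaped into a 2D grid for plotting.
score_grid = val_scores.reshape(len(C_range), len(gamma_range))

plt.imshow(score_grid, interpolation='nearest')
plt.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)
plt.yticks(np.arange(len(C_range)), C_range)
plt.xlabel('gamma')
plt.ylabel('C')
plt.title('Validation Score')
plt.colorbar()
plt.show()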

@ogrisel ogrisel commented on the diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
@@ -14,10 +14,19 @@
the decision surface smooth, while a high C aims at classifying
all training examples correctly.
-Two plots are generated. The first is a visualization of the
-decision function for a variety of parameter values, and the second
-is a heatmap of the classifier's cross-validation accuracy as
-a function of `C` and `gamma`.
+Two plots are generated. The first is a visualization of the decision function
+for a variety of parameter values, and the second is a heatmap of the
+classifier's cross-validation accuracy and training time as a function of `C`
+and `gamma`.
+
+An interesting observation on overfitting can be made when comparing validation
+and training error: higher C always results in lower training error, as it
+increases the complexity of the classifier.
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

You should add a note here on which areas are underfitting / overfitting.

@ogrisel ogrisel and 1 other commented on an outdated diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
@@ -14,10 +14,19 @@
the decision surface smooth, while a high C aims at classifying
all training examples correctly.
-Two plots are generated. The first is a visualization of the
-decision function for a variety of parameter values, and the second
-is a heatmap of the classifier's cross-validation accuracy as
-a function of `C` and `gamma`.
+Two plots are generated. The first is a visualization of the decision function
+for a variety of parameter values, and the second is a heatmap of the
+classifier's cross-validation accuracy and training time as a function of `C`
+and `gamma`.
+
+An interesting observation on overfitting can be made when comparing validation
+and training error: higher C always results in lower training error, as it
+increases the complexity of the classifier.
+For the validation set on the other hand, there is a tradeoff of goodness of
+fit and generalization.
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

Actually, better here: this is basically just an alternative phrasing of what you already say, but I think it's better to repeat those concepts over and over again to teach them (and to increase googlability).

@amueller
scikit-learn member
amueller added a line comment Mar 11, 2013

I added something, but I'm not entirely sure what you wanted. You can always edit later ;)

@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

Something like:

We can observe that the lower-right half of the parameters (below the diagonal, when both C and gamma are high) is characteristic of parameters that yield an overfitting model: the training score is very high but there is a wide gap.

The top and top-left parts of the parameter plots show underfitting models: the C and gamma values can, individually or in conjunction, constrain the model too much, leading to low training scores (hence low validation scores too, as validation scores are on average upper bounded by training scores).

@amueller
scikit-learn member
amueller added a line comment Mar 11, 2013

Done and also made the plots look better.
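
The gap described above is easy to reproduce outside the example; an illustrative sketch (using today's module paths rather than those from 2013):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_val, y_train, y_val = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0)

for C in np.logspace(-2, 4, 7):
    clf = SVC(C=C, gamma=0.1).fit(X_train, y_train)
    # The training score keeps rising with C, while the validation score
    # eventually flattens or drops: the widening gap is the overfitting regime.
    print("C=%g  train=%.3f  validation=%.3f"
          % (C, clf.score(X_train, y_train), clf.score(X_val, y_val)))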

@ogrisel
scikit-learn member

Please add some smoke tests for the new tuple items: for instance, check that all of them are positive and that train_score is lower than 1.0.
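
A smoke test in that spirit might look roughly like this (the cv_scores_ attribute and field names are assumptions taken from this PR, not the tests that were actually added):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.grid_search import GridSearchCV  # module path at the time
from sklearn.svm import SVC

def test_grid_search_training_scores_smoke():
    iris = load_iris()
    grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=3)
    grid.fit(iris.data, iris.target)
    for score in grid.cv_scores_:
        assert score.mean_training_score > 0
        assert score.mean_training_score <= 1.0
        assert np.all(np.asarray(score.cv_test_scores) > 0)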

@ogrisel
scikit-learn member

Other than the above comments this looks good to me.

@amueller
scikit-learn member

Also added some tests.

@ogrisel ogrisel commented on the diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
-a function of `C` and `gamma`.
+Two plots are generated. The first is a visualization of the decision function
+for a variety of parameter values, and the second is a heatmap of the
+classifier's cross-validation accuracy and training time as a function of `C`
+and `gamma`.
+
+An interesting observation on overfitting can be made when comparing validation
+and training error: higher C always results in lower training error, as it
+increases the complexity of the classifier.
+
+For the validation set on the other hand, there is a tradeoff of goodness of
+fit and generalization.
+
+We can observe that the lower right half of the parameters (below the diagonal
+with high C and gamma values) is characteristic of parameters that yields an
+overfitting model: the trainin score is very high but there is a wide gap. The
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

typo: trainin (my fault)

@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

" wide gap ... with the validation score"

@ogrisel ogrisel commented on the diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
+An interesting observation on overfitting can be made when comparing validation
+and training error: higher C always results in lower training error, as it
+increases the complexity of the classifier.
+
+For the validation set on the other hand, there is a tradeoff of goodness of
+fit and generalization.
+
+We can observe that the lower right half of the parameters (below the diagonal
+with high C and gamma values) is characteristic of parameters that yields an
+overfitting model: the trainin score is very high but there is a wide gap. The
+top and left parts of the parameter plots show underfitting models: the C and
+gamma values can individually or in conjunction constrain the model too much
+leading to low training scores (hence low validation scores too as validation
+scores are on average upper bounded by training scores).
+
+
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

Please remove one of the blank lines. I'll let you choose which one :)

@ogrisel ogrisel commented on the diff Mar 11, 2013
examples/svm/plot_rbf_parameters.py
+
+For the validation set on the other hand, there is a tradeoff of goodness of
+fit and generalization.
+
+We can observe that the lower right half of the parameters (below the diagonal
+with high C and gamma values) is characteristic of parameters that yields an
+overfitting model: the trainin score is very high but there is a wide gap. The
+top and left parts of the parameter plots show underfitting models: the C and
+gamma values can individually or in conjunction constrain the model too much
+leading to low training scores (hence low validation scores too as validation
+scores are on average upper bounded by training scores).
+
+
+We can also see that the training time is quite sensitive to the parameter
+setting, while the prediction time is not impacted very much. This is probably
+a consequence of the small size of the data set.
@ogrisel
scikit-learn member
ogrisel added a line comment Mar 11, 2013

We can also notice that the time plots look noisy for the same reason. A higher number of cross-validation iterations would be required to properly evaluate the impact of the parameters on the training and prediction times.
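
For instance (illustration only, using today's module paths), averaging over many shuffled splits would give more timing samples per parameter setting:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

iris = load_iris()
# Many shuffled splits: per-split timings get averaged over more repetitions,
# at the cost of extra compute.
cv = StratifiedShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
grid = GridSearchCV(SVC(), {'C': [1, 10], 'gamma': [0.01, 0.1]}, cv=cv)
grid.fit(iris.data, iris.target)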

@jnothman
scikit-learn member

See an alternative patch at https://github.com/jnothman/scikit-learn/tree/grid_search_more_info

Note I have chosen different field names, aiming for consistency and memorability, if not precision of naming.

@amueller
scikit-learn member

@jnothman btw, does your version work with lists of dicts as param_grid and with RandomizedSearchCV?
Thinking about it a bit more, I'm not sure your interface is better if the parameter space has a more complicated form. Could you maybe issue a PR? That would make tracking the changes easier.

@jnothman
scikit-learn member

I don't think it's better, but it's certainly no worse: it provides exactly the same ordering according to parameter_iterator as your solution did. If that ordering is meaningful, then the data can be reshaped! If it is not, then you've lost nothing.

It doesn't do anything particular to GridSearchCV, though I see now why you might not want to call the attribute grid_results_. But params_results_ is not nice; point_results_ might work, but fit_grid_point actually fits one fold, not one point.

PR forthcoming.

@amueller
scikit-learn member
@amueller amueller ENH add training score to GridSearchCV.cv_scores_
add docstrings for GridSearchCV, RandomizedSearchCV and fit_grid_point. In "fit_grid_point" I used test_score rather than validation_score, as the split is given to the function.
The RBF SVM grid search example now also shows training scores, which illustrates overfitting for high C, as well as training/prediction times, which basically serve to illustrate that this is possible. Maybe random forests would be better for evaluating training times?
52ceff3
@amueller
scikit-learn member

Superseded by #7026

@amueller amueller closed this Jul 29, 2016