DOC score() of `BaseSearchCV` used different scorer than defined in `scoring` of `GridSearchCV` #3185

Closed
t-pfaff opened this Issue May 22, 2014 · 5 comments

5 participants

@t-pfaff
t-pfaff commented May 22, 2014

I'm not sure whether this is intended, but I was confused to find that score() on a fitted grid search uses a different scorer than the one passed to GridSearchCV through the scoring parameter.

Example:

from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

digits = datasets.load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

hyperparams = [{'fit_intercept':[True, False]}]
algo = LinearRegression()

grid = GridSearchCV(algo, hyperparams, cv=5, scoring='mean_squared_error')
grid.fit(X_train, y_train)
print grid.score(X_test, y_test)
print mean_squared_error(y_test, grid.best_estimator_.predict(X_test))
print r2_score(y_test, grid.best_estimator_.predict(X_test))

I expected grid.score() to use mean_squared_error, because that is what I passed as the scoring option. It took me some time to find out that the number is actually the r2_score, which seems to be the default score function of LinearRegression.
The documentation of score() in BaseSearchCV says:
"The score function of the best estimator is used, or the scoring parameter where unavailable."
Maybe the documentation of score() in BaseSearchCV could be changed to make it clear that the calculated score is not necessarily the one defined by the scoring parameter of GridSearchCV.
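
To make the difference concrete, here is a minimal sketch that scores the held-out data with the same metric that was passed as scoring, instead of relying on the estimator's own score(). It assumes a recent scikit-learn release, where the imports live in sklearn.model_selection and the scorer string is 'neg_mean_squared_error' (sign-flipped so that greater is better). Those names are assumptions about the installed version and differ from the 2014 example above.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Assumption: modern scorer name; the original report used 'mean_squared_error'.
grid = GridSearchCV(LinearRegression(), {'fit_intercept': [True, False]},
                    cv=5, scoring='neg_mean_squared_error')
grid.fit(X_train, y_train)
best = grid.best_estimator_

# Apply the scorer named in `scoring` explicitly to the refit estimator.
scorer = get_scorer('neg_mean_squared_error')
scorer_value = scorer(best, X_test, y_test)

# The estimator's own score method, which is R^2 for LinearRegression.
estimator_value = best.score(X_test, y_test)

y_pred = best.predict(X_test)
print(scorer_value, -mean_squared_error(y_test, y_pred))
print(estimator_value, r2_score(y_test, y_pred))
```

Calling the scorer directly removes any doubt about which metric is reported, whatever score() happens to do in the installed version.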

@jnothman
scikit-learn member
@GaelVaroquaux
scikit-learn member
@GaelVaroquaux GaelVaroquaux added the Bug label May 23, 2014
@GaelVaroquaux GaelVaroquaux added this to the 0.15 milestone May 23, 2014
@jnothman
scikit-learn member

No, it preceded the scoring API:

$ git show 0.13:sklearn/grid_search.py
...
    def score(self, X, y=None):
        if hasattr(self.best_estimator_, 'score'):
            return self.best_estimator_.score(X, y)
        if self.score_func is None:
            raise ValueError("No score function explicitly defined, "
                             "and the estimator doesn't provide one %s"
                             % self.best_estimator_)
        y_predicted = self.predict(X)
        return self.score_func(y, y_predicted)
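
The precedence in that 0.13 method can be seen in isolation with a small stand-in. The sketch below is not scikit-learn code: ToySearch, MeanRegressor and neg_mse are hypothetical names made up for illustration, and only the branch order mirrors the snippet above.

```python
class MeanRegressor:
    """Hypothetical estimator that provides its own score method."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

    def score(self, X, y):
        # Stand-in default metric: negative mean absolute error.
        pred = self.predict(X)
        return -sum(abs(a - b) for a, b in zip(y, pred)) / len(y)


def neg_mse(y_true, y_pred):
    # Stand-in for a user-supplied score function.
    return -sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


class ToySearch:
    """Hypothetical wrapper that copies the branch order of the 0.13 score()."""

    def __init__(self, best_estimator, score_func=None):
        self.best_estimator_ = best_estimator
        self.score_func = score_func

    def score(self, X, y=None):
        # The estimator's own score wins whenever it exists.
        if hasattr(self.best_estimator_, 'score'):
            return self.best_estimator_.score(X, y)
        if self.score_func is None:
            raise ValueError("No score function explicitly defined")
        return self.score_func(y, self.best_estimator_.predict(X))


X = [[0], [1], [2], [3]]
y = [0.0, 1.0, 2.0, 5.0]
est = MeanRegressor().fit(X, y)
search = ToySearch(est, score_func=neg_mse)

search_value = search.score(X, y)
own_value = est.score(X, y)
func_value = neg_mse(y, est.predict(X))
print(search_value, own_value, func_value)
```

Because the hasattr branch comes first, the user-supplied function is only reached for estimators without a score method, which matches the behaviour described in the report.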
@GaelVaroquaux
scikit-learn member
@ogrisel ogrisel removed this from the 0.15 milestone Jun 4, 2014
@amueller amueller added this to the 0.15.1 milestone Jul 18, 2014
@amueller
scikit-learn member

Duplicate of #1831

@amueller amueller closed this Jul 18, 2014