
[MRG+1] Add best_score_ attribute to RidgeCV and RidgeClassifierCV #4790

Closed
wants to merge 13 commits

Conversation

pianomania
Contributor

Issue #4667
Add a best_score_ attribute, and fix a bug where _BaseRidgeCV used GridSearchCV without passing scoring to it.

@agramfort
Member

please add a test

@pianomania
Contributor Author

Is there any other code I should modify?
Also, I'm not sure whether best_score_ needs to be made positive: if scoring is a loss function, the scorer sign-flips the loss so that greater is better. Or should this just be explained in more detail in the docstring?
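For background on the sign flip mentioned here: scikit-learn scorers built from loss functions negate the loss so that "greater is better" holds uniformly. A minimal sketch using the public make_scorer API (illustration only, not PR code):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = X @ rng.randn(3)

# greater_is_better=False tells make_scorer to negate the loss, so the
# uniform "higher score is better" convention holds for error metrics too.
neg_mse = make_scorer(mean_squared_error, greater_is_better=False)

model = Ridge(alpha=1.0).fit(X, y)
score = neg_mse(model, X, y)
assert score <= 0  # MSE is non-negative, so the sign-flipped score is <= 0
```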

@amueller
Member

amueller commented Jun 4, 2015

I don't think you should worry about that for this PR.

@@ -867,10 +871,12 @@ def fit(self, X, y, sample_weight=None):
         #fit_params = {'sample_weight' : sample_weight}
         fit_params = {}
         gs = GridSearchCV(Ridge(fit_intercept=self.fit_intercept),
-                          parameters, fit_params=fit_params, cv=self.cv)
+                          parameters, fit_params=fit_params, cv=self.cv,
+                          scoring = self.scoring)
Member

please add a regression test for this.
That is, please compare the output of RidgeCV with cv != None to GridSearchCV, and make sure that the test fails on master.

Also, there shouldn't be a space before and after =.
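The requested regression test could be sketched roughly as follows (hypothetical code written against current scikit-learn, where the scoring name is now 'neg_mean_absolute_error' rather than the 'mean_absolute_error' used in this thread; on the unfixed code the two alphas could disagree because scoring was silently dropped):

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = X @ rng.randn(4) + 0.1 * rng.randn(30)

alphas = (0.1, 1.0, 10.0)
scoring = "neg_mean_absolute_error"

# Both searches get the same grid, the same folds, and the same scorer.
gs = GridSearchCV(Ridge(fit_intercept=False), {"alpha": list(alphas)},
                  cv=5, scoring=scoring).fit(X, y)
rcv = RidgeCV(alphas=alphas, fit_intercept=False,
              scoring=scoring, cv=5).fit(X, y)

# With the fix, both select the same alpha; before it, RidgeCV ignored
# scoring and could pick a different one.
assert gs.best_estimator_.alpha == rcv.alpha_
```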

Contributor Author

I think RidgeCV with cv != None should give the same outcome as using GridSearchCV if they are given the same cv. Or did I misunderstand what you mean?

Member

Yes exactly.

Contributor Author

Sorry, but I'm still confused about what I should compare. Is it a comparison between the outcome of RidgeCV with a specified cv and the outcome of directly using GridSearchCV? Or something else?

Member

The current regression test already checks whether the scoring function is taken into account (through the reg3/reg4 comparison of self.best_score_).
To fail on the master branch, the test should go through self.alpha_, but it is not easy to find an example that gives two different alphas for two different scoring functions.

Contributor Author

Why does it need to fail on the master branch? (So do I need to change the test on master?)
And I'm still curious why the test should compare alpha_ of RidgeCV with a specific cv against alpha_ of GridSearchCV (and expect them not to be equal?).
The reason I'm curious is that RidgeCV with a specified cv actually uses GridSearchCV to find alpha_, so using GridSearchCV directly with the same settings (i.e. cv, alphas and scoring) should give the same result.

@amueller
Member

amueller commented Jun 9, 2015

Yes, comparing RidgeCV with a specific cv to GridSearchCV.

reg.fit(X_diabetes, y_diabetes)
assert_equal(type(reg.best_score_), np.float64)

reg = RidgeCV()
Member

Please don't add spaces to align equal signs

@amueller
Member

Yes, RidgeCV and GridSearchCV should give the same result. However, because of the bug of not passing scoring, they did not give the same results when scoring was passed. You fixed that bug.
We want to add a test so that this bug doesn't happen again.

@pianomania
Contributor Author

OK, I got it. But the test already compares reg3 and reg4; isn't that sufficient?

@TomDLT
Member

TomDLT commented Jun 22, 2015

It is sufficient when best_score_ exists, but if we want the test to fail on master branch, we need to compare something else.

For example, this test fails on master but not on your branch:

rng = np.random.RandomState(0)
n_samples, n_features = 10, 4
X = rng.randn(n_samples, n_features)
w = rng.randn(n_features)
y = np.dot(X, w)
y[:n_samples // 2] = -y[:n_samples // 2]  # floor division so the slice index is an int

alphas = (10, 21, 83)
for scoring in ('mean_absolute_error', 'mean_squared_error', 'r2'):
    gs = GridSearchCV(Ridge(fit_intercept=False), {'alpha': alphas},
                      fit_params={}, cv=5, scoring=scoring)
    rcv = RidgeCV(alphas=alphas, fit_intercept=False, scoring=scoring, cv=5)

    assert_equal(gs.fit(X, y).best_estimator_.alpha, rcv.fit(X, y).alpha_)

@pianomania
Contributor Author

I finally got the key point. Thank you for patiently answering my questions!

@amueller
Member

amueller commented Oct 7, 2016

@pianomania sorry for the slow reply. Can you please rebase?

@pianomania
Contributor Author

@amueller OK, I will do that.

@@ -1014,6 +1014,7 @@ def fit(self, X, y, sample_weight=None):

         if error:
             best = cv_values.mean(axis=0).argmin()
+            best_score = cv_values.mean(axis=0)[best]
Member

please avoid recalculating this mean.

Also, I wonder if we need to follow the scoring convention that says errors should be negated so that greater is better. Perhaps in documenting best_score_ we should be clear that if scoring is None we get an error that is being minimised, and otherwise, as with *SearchCV.best_score_, it is maximised.
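The convention jnothman refers to can be illustrated with a small sketch (against current scikit-learn; note the scoring names in this thread were later renamed, e.g. 'mean_squared_error' became 'neg_mean_squared_error'):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = X @ rng.randn(4)

gs = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]},
                  cv=5, scoring="neg_mean_squared_error").fit(X, y)

# Under the "greater is better" convention an error metric is negated:
# the best (smallest) error appears as the largest, least negative score.
assert gs.best_score_ <= 0
```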

@@ -1195,6 +1200,10 @@ class RidgeCV(_BaseRidgeCV, RegressorMixin):
     alpha_ : float
         Estimated regularization parameter.

+    best_score_ : float
+        Score of best_estimator on the left out data.
Member

best_estimator -> best_estimator_ or just "best estimator"

?"left out" -> "held out"

from sklearn.utils import check_random_state
from sklearn.datasets import make_multilabel_classification
=======
Member

merge error


def test_best_score_():

    def fit_and_get_best_score_(reg):
Member

please inline this code rather than a function.


def fit_and_get_best_score_(reg):
    reg.fit(X_diabetes, y_diabetes)
    assert_equal(type(reg.best_score_), np.float64)
Member

I wonder if we should test this more precisely, e.g. by retraining for the best alpha with cross_val_score.
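jnothman's suggestion might look like the following sketch (hypothetical; it assumes, as in current scikit-learn, that RidgeCV with cv set reports GridSearchCV's best_score_, which equals the mean cross_val_score at the selected alpha):

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(40, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(40)

rcv = RidgeCV(alphas=(0.1, 1.0, 10.0), cv=5, scoring="r2").fit(X, y)

# Re-score a plain Ridge at the chosen alpha with the same folds and
# scorer; the mean should reproduce best_score_.
scores = cross_val_score(Ridge(alpha=rcv.alpha_), X, y, cv=5, scoring="r2")
assert np.isclose(scores.mean(), rcv.best_score_)
```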

reg2 = RidgeCV(scoring='mean_absolute_error')
reg3 = RidgeCV(fit_intercept=False, cv=5, scoring='mean_absolute_error')
reg4 = RidgeCV(fit_intercept=False, cv=5, scoring='median_absolute_error')
reg5 = GridSearchCV(Ridge(fit_intercept=False), {'alpha': reg3.alphas},
Member

I'm not sure what you're trying to test here.

for i in (reg, reg2, reg3, reg4, reg5):
    fit_and_get_best_score_(i)

assert_true(reg3.best_score_ is not reg4.best_score_, True)
Member

use `!=`, not `is not`. `is not` essentially means "does not have the same address in memory". Please also check that reg's score differs.
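The difference the reviewer points out can be shown in a couple of lines (illustration only, not PR code):

```python
import numpy as np

a = np.float64(1.5)
b = np.float64(1.5)

# `==` compares values; `is` compares object identity (memory address).
# Two separately constructed floats are equal but are distinct objects,
# so an `is not` assertion would pass even for identical scores.
assert a == b
assert a is not b
```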

Member

@jnothman left a comment

Otherwise LGTM

reg1.fit(X_diabetes, y_diabetes)
reg2.fit(X_diabetes, y_diabetes)

assert_equal(reg1.alpha_, reg2.best_estimator_.alpha)
Member

this is quite a weak test; at least also check the equality of best_score_

reg.fit(X_diabetes, y_diabetes)
assert_equal(type(reg.best_score_), np.float64)

def test_ridgeCV_when_scoring_is_used_():
Member

we don't usually use camel case in test names

@jnothman jnothman changed the title Add best_score_ attribute to RidgeCV and RidgeClassifierCV [MRG+1] Add best_score_ attribute to RidgeCV and RidgeClassifierCV Mar 5, 2017
@jnothman jnothman added this to the 0.19 milestone Jun 18, 2017
@rth rth closed this in #15655 Dec 10, 2019
5 participants