
[MRG+1] Add best_score_ attribute to RidgeCV and RidgeClassifierCV #4790

Closed
wants to merge 13 commits

Conversation

pianomania
Contributor

Issue #4667
Add a best_score_ attribute, and fix a bug where _BaseRidgeCV used GridSearchCV without passing scoring to it.

@agramfort
Member

please add a test

@pianomania
Contributor Author

Is there any other code I should modify?
Also, I'm not sure whether best_score_ needs to be made positive: if scoring is a loss function, the scorer sign-flips the loss so that greater is better. Or should this just be explained in more detail in the docstring?
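For background on the sign flip mentioned here: scikit-learn scorers built from loss functions negate the loss so that "greater is better" holds uniformly. A minimal sketch using the public make_scorer API (illustration only, not PR code):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = X @ rng.randn(3)

# greater_is_better=False tells make_scorer to negate the loss, so the
# uniform "higher score is better" convention holds for error metrics too.
neg_mse = make_scorer(mean_squared_error, greater_is_better=False)

model = Ridge(alpha=1.0).fit(X, y)
score = neg_mse(model, X, y)
assert score <= 0  # MSE is non-negative, so the sign-flipped score is <= 0
```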

@amueller
Member

amueller commented Jun 4, 2015

I don't think you should worry about that for this PR.

@@ -867,10 +871,12 @@ def fit(self, X, y, sample_weight=None):
         #fit_params = {'sample_weight' : sample_weight}
         fit_params = {}
         gs = GridSearchCV(Ridge(fit_intercept=self.fit_intercept),
-                          parameters, fit_params=fit_params, cv=self.cv)
+                          parameters, fit_params=fit_params, cv=self.cv,
+                          scoring = self.scoring)
Member

please add a regression test for this.
That is, please compare the output of RidgeCV with cv != None to GridSearchCV, and make sure that the test fails on master.

Also, there shouldn't be a space before and after =.
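The requested regression test could be sketched roughly as follows (hypothetical code written against current scikit-learn, where the scoring name is now 'neg_mean_absolute_error' rather than the 'mean_absolute_error' used in this thread; on the unfixed code the two alphas could disagree because scoring was silently dropped):

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = X @ rng.randn(4) + 0.1 * rng.randn(30)

alphas = (0.1, 1.0, 10.0)
scoring = "neg_mean_absolute_error"

# Both searches get the same grid, the same folds, and the same scorer.
gs = GridSearchCV(Ridge(fit_intercept=False), {"alpha": list(alphas)},
                  cv=5, scoring=scoring).fit(X, y)
rcv = RidgeCV(alphas=alphas, fit_intercept=False,
              scoring=scoring, cv=5).fit(X, y)

# With the fix, both select the same alpha; before it, RidgeCV ignored
# scoring and could pick a different one.
assert gs.best_estimator_.alpha == rcv.alpha_
```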

Contributor Author

I think RidgeCV with cv != None should give the same outcome as using GridSearchCV if they are given the same cv. Or did I misunderstand what you mean?

Member

Yes exactly.

Contributor Author

Sorry, but I'm still confused about what I should compare. Is it a comparison between the outcome of RidgeCV with a specified cv and the outcome of directly using GridSearchCV? Or something else?

Member

The current regression test already checks whether the scoring function is taken into account (through the reg3/reg4 comparison of self.best_score_).
To fail on the master branch, the test should go through self.alpha_, but it is not easy to find an example that gives two different alphas for two different scoring functions.

Contributor Author

Why does it need to fail on the master branch? (So do I need to change the test on master?)
And I'm still curious why the test should compare alpha_ of RidgeCV with a specific cv against alpha_ of GridSearchCV (and expect them not to be equal?).
The reason I'm curious is that RidgeCV with a specified cv actually uses GridSearchCV to find alpha_, so using GridSearchCV directly with the same settings (i.e. cv, alphas and scoring) should give the same result.

@amueller
Member

amueller commented Jun 9, 2015

Yes, comparing RidgeCV with a specific cv to GridSearchCV.

reg.fit(X_diabetes, y_diabetes)
assert_equal(type(reg.best_score_), np.float64)

reg = RidgeCV()
Member

Please don't add spaces to align equal signs

@amueller
Member

Yes, RidgeCV and GridSearchCV should give the same result. However, because of the bug of not passing scoring, they did not give the same results when scoring was passed. You fixed that bug.
We want to add a test so that this bug doesn't happen again.

@pianomania
Contributor Author

OK, I got it. But the test already compares reg3 and reg4; isn't that sufficient?

@TomDLT
Member

TomDLT commented Jun 22, 2015

It is sufficient when best_score_ exists, but if we want the test to fail on master branch, we need to compare something else.

For example, this test fails on master but not on your branch:

rng = np.random.RandomState(0)
n_samples, n_features = 10, 4
X = rng.randn(n_samples, n_features)
w = rng.randn(n_features)
y = np.dot(X, w)
y[:n_samples // 2] = -y[:n_samples // 2]  # floor division so the slice index is an int

alphas = (10, 21, 83)
for scoring in ('mean_absolute_error', 'mean_squared_error', 'r2'):
    gs = GridSearchCV(Ridge(fit_intercept=False), {'alpha': alphas},
                      fit_params={}, cv=5, scoring=scoring)
    rcv = RidgeCV(alphas=alphas, fit_intercept=False, scoring=scoring, cv=5)

    assert_equal(gs.fit(X, y).best_estimator_.alpha, rcv.fit(X, y).alpha_)

@pianomania
Contributor Author

I finally got the key point. Thank you for patiently answering my questions!

@amueller
Member

amueller commented Oct 7, 2016

@pianomania sorry for the slow reply. Can you please rebase?

@pianomania
Contributor Author

@amueller OK, I will do that.

@@ -1014,6 +1014,7 @@ def fit(self, X, y, sample_weight=None):

         if error:
             best = cv_values.mean(axis=0).argmin()
+            best_score = cv_values.mean(axis=0)[best]
Member

please avoid recalculating this mean.

Also, I wonder if we need to follow the scoring convention that says errors should be negated so that greater is better. Perhaps in documenting best_score_ we should be clear that if scoring is None we get an error that is being minimised, and otherwise, as with *SearchCV.best_score_, it is maximised.
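The convention jnothman refers to can be illustrated with a small sketch (against current scikit-learn; note the scoring names in this thread were later renamed, e.g. 'mean_squared_error' became 'neg_mean_squared_error'):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = X @ rng.randn(4)

gs = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]},
                  cv=5, scoring="neg_mean_squared_error").fit(X, y)

# Under the "greater is better" convention an error metric is negated:
# the best (smallest) error appears as the largest, least negative score.
assert gs.best_score_ <= 0
```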

@@ -1195,6 +1200,10 @@ class RidgeCV(_BaseRidgeCV, RegressorMixin):
     alpha_ : float
         Estimated regularization parameter.

+    best_score_ : float
+        Score of best_estimator on the left out data.
Member

best_estimator -> best_estimator_ or just "best estimator"

?"left out" -> "held out"

from sklearn.utils import check_random_state
from sklearn.datasets import make_multilabel_classification
=======
Member

merge error


def test_best_score_():

    def fit_and_get_best_score_(reg):
Member

please inline this code rather than a function.


def fit_and_get_best_score_(reg):
    reg.fit(X_diabetes, y_diabetes)
    assert_equal(type(reg.best_score_), np.float64)
Member

I wonder if we should test this more precisely, e.g. by retraining for the best alpha with cross_val_score.
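jnothman's suggestion might look like the following sketch (hypothetical; it assumes, as in current scikit-learn, that RidgeCV with cv set reports GridSearchCV's best_score_, which equals the mean cross_val_score at the selected alpha):

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(40, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(40)

rcv = RidgeCV(alphas=(0.1, 1.0, 10.0), cv=5, scoring="r2").fit(X, y)

# Re-score a plain Ridge at the chosen alpha with the same folds and
# scorer; the mean should reproduce best_score_.
scores = cross_val_score(Ridge(alpha=rcv.alpha_), X, y, cv=5, scoring="r2")
assert np.isclose(scores.mean(), rcv.best_score_)
```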

reg2 = RidgeCV(scoring='mean_absolute_error')
reg3 = RidgeCV(fit_intercept=False, cv=5, scoring='mean_absolute_error')
reg4 = RidgeCV(fit_intercept=False, cv=5, scoring='median_absolute_error')
reg5 = GridSearchCV(Ridge(fit_intercept=False), {'alpha': reg3.alphas},
Member

I'm not sure what you're trying to test here.

for i in (reg, reg2, reg3, reg4, reg5):
    fit_and_get_best_score_(i)

assert_true(reg3.best_score_ is not reg4.best_score_, True)
Member

use `!=`, not `is not`. `is not` essentially means "does not have the same address in memory". Please also check that reg's score differs.
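The difference the reviewer points out can be shown in a couple of lines (illustration only, not PR code):

```python
import numpy as np

a = np.float64(1.5)
b = np.float64(1.5)

# `==` compares values; `is` compares object identity (memory address).
# Two separately constructed floats are equal but are distinct objects,
# so an `is not` assertion would pass even for identical scores.
assert a == b
assert a is not b
```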

Member

@jnothman left a comment

Otherwise LGTM

reg1.fit(X_diabetes, y_diabetes)
reg2.fit(X_diabetes, y_diabetes)

assert_equal(reg1.alpha_, reg2.best_estimator_.alpha)
Member

this is quite a weak test; at least also check the equality of best_score_

reg.fit(X_diabetes, y_diabetes)
assert_equal(type(reg.best_score_), np.float64)

def test_ridgeCV_when_scoring_is_used_():
Member

we don't usually use camel case in test names

@jnothman jnothman changed the title Add best_score_ attribute to RidgeCV and RidgeClassifierCV [MRG+1] Add best_score_ attribute to RidgeCV and RidgeClassifierCV Mar 5, 2017
@jnothman jnothman added this to the 0.19 milestone Jun 18, 2017
@rth rth closed this in #15655 Dec 10, 2019
5 participants