[MRG+1] Add best_score_ attribute to RidgeCV and RidgeClassifierCV #4790
Conversation
please add a test
(force-pushed from bd95b12 to 663fe25)
Is there any code I should modify?
I don't think you should worry about that for this PR.
sklearn/linear_model/ridge.py
Outdated
```diff
@@ -867,10 +871,12 @@ def fit(self, X, y, sample_weight=None):
         #fit_params = {'sample_weight' : sample_weight}
         fit_params = {}
         gs = GridSearchCV(Ridge(fit_intercept=self.fit_intercept),
-                          parameters, fit_params=fit_params, cv=self.cv)
+                          parameters, fit_params=fit_params, cv=self.cv,
+                          scoring = self.scoring)
```
please add a regression test for this.
That is, please compare the output of RidgeCV with cv != None to GridSearchCV, and make sure that the test fails on master.
Also, there shouldn't be a space before and after `=`.
I think RidgeCV with cv != None should have the same outcome as using GridSearchCV if they are given the same cv. Or did I misunderstand what you mean?
Yes exactly.
Sorry, but I'm still confused about what I should compare. Is it a comparison between the outcome of RidgeCV with a specified cv and the outcome of directly using GridSearchCV? Or something else?
The current regression test already tests whether the scoring function is taken into account (through the reg3 and reg4 comparison of self.best_score_).
To fail on the master branch, the test should test it through self.alpha_, but it is not easy to find an example which gives 2 different alphas for 2 different scoring functions.
Why does it need to fail on the master branch? (So do I need to change the test on master?)
And I'm still curious why the test should compare the alpha_ of RidgeCV with a specific cv against the alpha_ of GridSearchCV (and expect them not to be equal?).
The reason I'm curious is that RidgeCV with a specified cv actually uses GridSearchCV to find alpha_, so if I use GridSearchCV with the same settings (i.e. cv, alphas and scoring), it should provide the same result.
yes, comparing RidgeCV with a specific cv with GridSearchCV
```python
reg.fit(X_diabetes, y_diabetes)
assert_equal(type(reg.best_score_), np.float64)

reg = RidgeCV()
```
Please don't add spaces to align equal signs
Yes, RidgeCV and GridSearchCV should provide the same result. However, because there was a bug (the scoring was not passed through), they did not give the same results when scoring was specified. You fixed that bug.
OK, I got it. But the test has compared
It is sufficient when … For example, this test fails on master but not on your branch:

```python
rng = np.random.RandomState(0)
n_samples, n_features = 10, 4
X = rng.randn(n_samples, n_features)
w = rng.randn(n_features)
y = np.dot(X, w)
y[:n_samples // 2] = -y[:n_samples // 2]
alphas = (10, 21, 83)
for scoring in ('mean_absolute_error', 'mean_squared_error', 'r2'):
    gs = GridSearchCV(Ridge(fit_intercept=False), {'alpha': alphas},
                      fit_params={}, cv=5, scoring=scoring)
    rcv = RidgeCV(alphas=alphas, fit_intercept=False, scoring=scoring, cv=5)
    assert_equal(gs.fit(X, y).best_estimator_.alpha, rcv.fit(X, y).alpha_)
```
I finally got the key point. Thank you for patiently answering my questions!
@pianomania sorry for the slow reply. Can you please rebase?
@amueller OK, I will do that.
(force-pushed from 0b786e6 to 2593046)
sklearn/linear_model/ridge.py
Outdated
```diff
@@ -1014,6 +1014,7 @@ def fit(self, X, y, sample_weight=None):

         if error:
             best = cv_values.mean(axis=0).argmin()
+            best_score = cv_values.mean(axis=0)[best]
```
please avoid recalculating this mean.
Also, I wonder if we need to follow the scoring convention that says errors should be negated so that greater is better. Perhaps in documenting best_score_ we should be clear that if scoring is None we get an error that is being minimised and otherwise, as with *SearchCV.best_score_, it is maximised.
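The sign convention being discussed can be illustrated with a small sketch. This is hedged against current scikit-learn, where RidgeCV exposes best_score_ and error scorers use the negated `neg_` spelling (e.g. 'neg_mean_squared_error' rather than the era-appropriate 'mean_squared_error'):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=50, n_features=4, random_state=0)

# With an explicit scorer, best_score_ follows the *SearchCV convention:
# greater is better, so a negated error scorer is maximised across the
# alpha grid and the resulting value is <= 0.
reg = RidgeCV(alphas=(0.1, 1.0, 10.0),
              scoring='neg_mean_squared_error').fit(X, y)
print(reg.best_score_ <= 0)
```

With scoring=None the leave-one-out path instead tracks an error that is minimised, which is exactly the documentation ambiguity raised above.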
sklearn/linear_model/ridge.py
Outdated
```diff
@@ -1195,6 +1200,10 @@ class RidgeCV(_BaseRidgeCV, RegressorMixin):
     alpha_ : float
         Estimated regularization parameter.

+    best_score_ : float
+        Score of best_estimator on the left out data.
```
best_estimator -> best_estimator_, or just "best estimator"?
"left out" -> "held out"
```python
from sklearn.utils import check_random_state
from sklearn.datasets import make_multilabel_classification
=======
```
merge error
```python
def test_best_score_():

def fit_and_get_best_score_(reg):
```
please inline this code rather than a function.
```python
def fit_and_get_best_score_(reg):
    reg.fit(X_diabetes, y_diabetes)
    assert_equal(type(reg.best_score_), np.float64)
```
I wonder if we should test this more precisely, e.g. by retraining for the best alpha with cross_val_score.
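The suggestion above could look like the following sketch. It is hedged: it uses current scikit-learn names ('neg_mean_squared_error' stands in for the old 'mean_squared_error' scorer) and assumes that RidgeCV with cv=5 and a plain cross_val_score(cv=5) use the same KFold splits:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=60, n_features=5, random_state=0)

reg = RidgeCV(alphas=(0.1, 1.0, 10.0), cv=5,
              scoring='neg_mean_squared_error').fit(X, y)

# Retrain a plain Ridge at the selected alpha and recompute the CV score;
# it should agree with best_score_ up to the internal CV averaging.
scores = cross_val_score(Ridge(alpha=reg.alpha_), X, y, cv=5,
                         scoring='neg_mean_squared_error')
print(np.isclose(scores.mean(), reg.best_score_))
```

This checks the value of best_score_ itself, not merely its type.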
```python
reg2 = RidgeCV(scoring='mean_absolute_error')
reg3 = RidgeCV(fit_intercept=False, cv=5, scoring='mean_absolute_error')
reg4 = RidgeCV(fit_intercept=False, cv=5, scoring='median_absolute_error')
reg5 = GridSearchCV(Ridge(fit_intercept=False), {'alpha': reg3.alphas},
```
I'm not sure what you're trying to test here.
```python
for i in (reg, reg2, reg3, reg4, reg5):
    fit_and_get_best_score_(i)

assert_true(reg3.best_score_ is not reg4.best_score_, True)
```
use `!=`, not `is not`. `is not` essentially means "does not have the same address in memory". Please also check that reg's score differs.
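A tiny illustration of why `is not` is the wrong check here (plain floats are used for the sketch; the same applies to the np.float64 values that best_score_ returns):

```python
# Two separately computed floats are distinct objects even when their
# values are equal, so an `is not` "test" passes no matter what.
a = float("1000.5")
b = float("1000.5")

print(a == b)       # the values are equal
print(a is not b)   # yet they are different objects in memory

# What the test actually needs is value inequality:
# assert reg3.best_score_ != reg4.best_score_
```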
Otherwise LGTM
```python
reg1.fit(X_diabetes, y_diabetes)
reg2.fit(X_diabetes, y_diabetes)

assert_equal(reg1.alpha_, reg2.best_estimator_.alpha)
```
this is quite a weak test. At least also check the equality of best_score_.
```python
reg.fit(X_diabetes, y_diabetes)
assert_equal(type(reg.best_score_), np.float64)

def test_ridgeCV_when_scoring_is_used_():
```
we don't usually use camel case in test names
Issue #4667
Add the best_score_ attribute and fix a bug where _BaseRidgeCV used GridSearchCV without passing scoring into it.