API Replacing grid_scores_ by cv_results_ in _rfe.py and test_rfe.py #20161
Changes from 26 commits
@@ -16,6 +16,7 @@
 from ..utils._tags import _safe_tags
 from ..utils.validation import check_is_fitted
 from ..utils.fixes import delayed
+from ..utils.deprecation import deprecated
 from ..base import BaseEstimator
 from ..base import MetaEstimatorMixin
 from ..base import clone
@@ -497,6 +498,24 @@ class RFECV(RFE):
         ``grid_scores_[i]`` corresponds to
         the CV score of the i-th subset of features.
 
+        .. deprecated:: 1.0
+            The `grid_scores_` attribute is deprecated in version 1.0 in favor
+            of `cv_results_` and will be removed in version 1.2.
+
+    cv_results_ : dict of ndarrays
+        A dict with keys:
+
+        split(k)_score : ndarray of shape (n_features,)
+            The cross-validation scores across (k)th fold.
+
+        mean_score : ndarray of shape (n_features,)
+            Mean of scores over the folds.
+
+        std_score : ndarray of shape (n_features,)
+            Standard deviation of scores over the folds.
+
+        .. versionadded:: 1.0
+
     n_features_ : int
         The number of selected features with cross-validation.
@@ -650,9 +669,10 @@ def fit(self, X, y, groups=None):
             for train, test in cv.split(X, y, groups)
         )
 
-        scores = np.sum(scores, axis=0)
-        scores_rev = scores[::-1]
-        argmax_idx = len(scores) - np.argmax(scores_rev) - 1
+        scores = np.array(scores)
+        scores_sum = np.sum(scores, axis=0)
+        scores_sum_rev = scores_sum[::-1]
+        argmax_idx = len(scores_sum) - np.argmax(scores_sum_rev) - 1
         n_features_to_select = max(
             n_features - (argmax_idx * step), self.min_features_to_select
         )
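The reversed-`argmax` in the hunk above is a tie-breaking trick: `np.argmax` returns the *first* index of the maximum, so reversing the array and mapping the index back selects the *last* occurrence instead. Since the raw scores run from the full feature set toward the minimal subset, this prefers the candidate with the fewest features among ties. A minimal sketch with made-up scores:

```python
import numpy as np

# Toy summed-CV-scores array; index 0 = full feature set, later indices
# = progressively smaller subsets. The maximum 0.90 appears twice.
scores_sum = np.array([0.80, 0.90, 0.85, 0.90, 0.70])

first = np.argmax(scores_sum)                             # first max -> 1
last = len(scores_sum) - np.argmax(scores_sum[::-1]) - 1  # last max -> 3
print(first, last)  # 1 3
```

With the "last" index, `n_features_to_select = n_features - argmax_idx * step` resolves ties toward the smaller feature subset.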
@@ -675,7 +695,27 @@ def fit(self, X, y, groups=None):
         self.estimator_ = clone(self.estimator)
         self.estimator_.fit(self.transform(X), y)
 
-        # Fixing a normalization error, n is equal to get_n_splits(X, y) - 1
-        # here, the scores are normalized by get_n_splits(X, y)
-        self.grid_scores_ = scores[::-1] / cv.get_n_splits(X, y, groups)
+        # reverse to stay consistent with before
+        scores_rev = scores[:, ::-1]
+        self.cv_results_ = {}
+        self.cv_results_["mean_score"] = np.mean(scores_rev, axis=0)
+        self.cv_results_["std_score"] = np.std(scores_rev, axis=0)
+
+        for i in range(scores.shape[0]):
+            self.cv_results_[f"split{i}_score"] = scores_rev[i]
+
         return self
 
+    # TODO: Remove in v1.2 when grid_scores_ is removed
+    # mypy error: Decorated property not supported
+    @deprecated(  # type: ignore
+        "The grid_scores_ attribute is deprecated in version 1.0 in favor "
+        "of cv_results_ and will be removed in version 1.2."
+    )
+    @property
+    def grid_scores_(self):
+        # remove 2 for mean_score, std_score
+        grid_size = len(self.cv_results_) - 2
+        return np.asarray(
+            [self.cv_results_["split{}_score".format(i)] for i in range(grid_size)]
+        ).T

Inline review comments on this hunk:

- "I think that what @thomasjpfan mentioned in his previous post was to change both entries using …"
- Reply: "I'm sorry, I was mistaken about that."
- Reply: "Yes, it should be enough."
- Reply: "I understand. Thank you so much."

wowry marked this conversation as resolved.
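The dict assembly and the deprecated `grid_scores_` property above can be sketched with a toy `(n_splits, n_subsets)` score matrix in place of real cross-validation output (key names follow this revision of the PR: `split{i}_score` / `mean_score` / `std_score`):

```python
import numpy as np

# Toy per-fold scores: 2 CV splits, 3 candidate feature-subset sizes,
# ordered from the full feature set down to the smallest subset.
scores = np.array([[0.7, 0.8, 0.9],   # fold 0
                   [0.6, 0.9, 0.8]])  # fold 1

# Reverse along the subset axis so index 0 = smallest subset,
# matching the old grid_scores_ ordering.
scores_rev = scores[:, ::-1]
cv_results = {
    "mean_score": np.mean(scores_rev, axis=0),
    "std_score": np.std(scores_rev, axis=0),
}
for i in range(scores.shape[0]):
    cv_results[f"split{i}_score"] = scores_rev[i]

# grid_scores_ equivalent: stack the per-split rows and transpose to
# shape (n_subsets, n_splits), as in the deprecated property.
grid_size = len(cv_results) - 2  # subtract mean_score and std_score
grid_scores = np.asarray(
    [cv_results[f"split{i}_score"] for i in range(grid_size)]
).T
print(grid_scores.shape)  # (3, 2)
```

Note the shape change versus the old attribute: `grid_scores_` held one row per subset size with one column per fold, which is exactly what the transpose restores.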
Review thread:

- "I am wondering if we should not use `mean_test_score`/`std_test_score`/`split{i}_test_score` instead, as in `GridSearchCV`. Edit: actually ignore that remark. #16392 already uses the `mean_score` notation..."
- "Being consistent with `GridSearchCV` looks like a net win to me. I am okay with deprecating `mean_score` in `GraphicalLassoCV` and switching to using `*_test_score` instead. @ogrisel Do you think deprecating would be too much of a hassle?"
- "I am fine with both approaches."
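For comparison, the `GridSearchCV` convention the reviewers mention uses `*_test_score` key names in its `cv_results_` dict; a small sketch (dataset and estimator are arbitrary placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# GridSearchCV names its per-fold and aggregate score entries with a
# "test_score" suffix: split{k}_test_score, mean_test_score,
# std_test_score (plus rank_test_score).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}, cv=3
).fit(X, y)

test_score_keys = [k for k in search.cv_results_ if k.endswith("test_score")]
print(sorted(test_score_keys))
```

This is the naming that `RFECV.cv_results_` would adopt if the `*_test_score` suggestion from the thread were followed.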