
[MRG] Replacing grid_scores_ by cv_results_ in _rfe.py #16961

Conversation

@arka204 (Contributor) commented Apr 18, 2020

Reference Issues/PRs

Partially fixes #11198
Based on #16392

What does this implement/fix? Explain your changes.

This PR replaces grid_scores_ with cv_results_ in _rfe.py.
It also adds a temporary grid_scores_ property.

Any other comments?

I plan to change the tests in a similar way (replacing grid_scores_ with the code of the grid_scores_ property) after confirming that this is the correct way to do so.
Am I mistaken, or is grid_scores_ a one-dimensional array?

Merging changes from the main repository
@KumarGanesha1996 left a comment:

lgtm

grid_size = len(self.cv_results_) - 2
return np.asarray(
[self.cv_results_["split{}_score".format(i)]
for i in range(grid_size)]).T


I think the indentation is wrong, and a newline is missing at the end of the file.

sklearn/feature_selection/_rfe.py:607:13: E128 continuation line under-indented for visual indent
    for i in range(grid_size)]).T
sklearn/feature_selection/_rfe.py:607:42: W292 no newline at end of file
    for i in range(grid_size)]).T

Exited with code exit status 1

grid_scores = scores[::-1] / cv.get_n_splits(X, y, groups)
self.cv_results_ = {}
for i in range(grid_scores.shape[0]):
    key = "split{}_score".format(i)


key = "split%d_score" % i

@thomasjpfan (Member) left a comment:


This needs tests for grid_scores_ and cv_results_.

@KumarGanesha1996

I think your tests fail because you need to rebase. Check out this great tutorial by my dear friend: https://www.youtube.com/watch?v=Gjd44YpucEA

@KumarGanesha1996

@arka204 are you still working on this?

@cmarmo (Member) left a comment:

A Sphinx warning in the documentation is preventing your build from completing.
Once the Sphinx warning is fixed, and if you think this PR is ready for review, do you mind changing the title from [WIP] to [MRG], as specified in the documentation? Thanks!

@@ -457,6 +458,24 @@ class RFECV(RFE):
``grid_scores_[i]`` corresponds to
the CV score of the i-th subset of features.

.. deprecated:: 0.23
The `grid_scores_` attribute is deprecated in version 0.23 in favor
Member

Suggested change
The `grid_scores_` attribute is deprecated in version 0.23 in favor
    The `grid_scores_` attribute is deprecated in version 0.23 in favor

Member

This indentation and the next one will fix the 'unexpected unindent' sphinx warning (see artifacts)

@@ -457,6 +458,24 @@ class RFECV(RFE):
``grid_scores_[i]`` corresponds to
the CV score of the i-th subset of features.

.. deprecated:: 0.23
The `grid_scores_` attribute is deprecated in version 0.23 in favor
of `cv_results_` and will be removed in version 0.25
Member

Suggested change
of `cv_results_` and will be removed in version 0.25
    of `cv_results_` and will be removed in version 0.25

Comment on lines 467 to 475

split(i)_score : float
corresponds to the CV score of the i-th subset of features

mean_score : float
mean of split(i)_score values in dict

std_score : float
std of split(i)_score values in dict
Member

Maybe the dictionary keys could be listed as a bullet list?

Member

I am okay with this for now, we use this formatting for other places where we return dicts (fetch_openml).

@arka204 changed the title [WIP] Replacing grid_scores_ by cv_results_ in _rfe.py → [MRG] Replacing grid_scores_ by cv_results_ in _rfe.py on May 30, 2020
@arka204 (Contributor, Author) commented Jun 6, 2020

Can I have your opinion on this, @jnothman, @thomasjpfan?

@thomasjpfan (Member) left a comment:

We need a nontrivial test to make sure cv_results_["mean_score"] and cv_results_["std_score"] are computed correctly.
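For illustration only, a tiny NumPy sketch (with made-up numbers, not taken from the PR) of the property such a test should pin down: mean_score and std_score are aggregates over the per-split scores, one value per feature-subset size.

import numpy as np

# Toy scores: 3 CV splits x 4 feature-subset sizes (illustrative values only).
split_scores = np.array([
    [0.70, 0.80, 0.90, 0.85],
    [0.72, 0.78, 0.88, 0.84],
    [0.68, 0.82, 0.92, 0.86],
])

# What cv_results_["mean_score"] and cv_results_["std_score"] should contain:
# one aggregate per subset size, computed over the splits (axis 0).
mean_score = split_scores.mean(axis=0)  # array([0.70, 0.80, 0.90, 0.85])
std_score = split_scores.std(axis=0)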

sklearn/feature_selection/_rfe.py (resolved)
"The grid_scores_ attribute is deprecated in version 0.24 in favor "
"of cv_results_ and will be removed in version 0.25"
)
@property # type: ignore
Member

Suggested change
@property # type: ignore
@property


A dict with keys:

split(i)_score : float
corresponds to the CV score of the i-th subset of features
Member

Suggested change
corresponds to the CV score of the i-th subset of features
corresponds to the CV score of the i-th subset of features

corresponds to the CV score of the i-th subset of features

mean_score : float
mean of split(i)_score values in dict
Member

Suggested change
mean of split(i)_score values in dict
mean of split(i)_score values

mean of split(i)_score values in dict

std_score : float
std of split(i)_score values in dict
Member

Suggested change
std of split(i)_score values in dict
standard deviation of split(i)_score values in dict

@thomasjpfan (Member) left a comment:

This is looking good!

We can have test_rfecv be the only one that explicitly checks for the deprecation warning.

The other tests can be decorated with ignore_warnings and have their pytest.warns removed. For example:

# TODO: Remove in 0.25 when grid_scores_ is deprecated
@ignore_warnings(category=FutureWarning)
def test_rfecv_cv_results_size():
    ...

return self

# mypy error: Decorated property not supported
@deprecated(
Member

Suggested change
@deprecated(
@deprecated( # type: ignore
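Putting the pieces discussed in this review together, the deprecated alias might look roughly like the sketch below. This is only an illustration: the class stub _RFECVSketch is hypothetical, the decorator placement follows the suggestion above, the message text comes from the diff earlier in this thread, and the body mirrors the reconstruction snippet reviewed at the top of the conversation.

import numpy as np
from sklearn.utils import deprecated


class _RFECVSketch:
    """Hypothetical stand-in for RFECV; only the deprecated alias is shown."""

    def __init__(self, cv_results_):
        self.cv_results_ = cv_results_

    # mypy error: Decorated property not supported
    @deprecated(  # type: ignore
        "The grid_scores_ attribute is deprecated in version 0.24 in favor "
        "of cv_results_ and will be removed in version 0.25"
    )
    @property
    def grid_scores_(self):
        # Rebuild the old grid_scores_ array from the split{i}_score entries;
        # mean_score and std_score are the two non-split keys.
        grid_size = len(self.cv_results_) - 2
        return np.asarray(
            [self.cv_results_["split{}_score".format(i)]
             for i in range(grid_size)]).T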

with pytest.warns(FutureWarning, match=msg):
assert len(rfecv.grid_scores_) == score_len

assert (len(rfecv.cv_results_) - 2) == score_len
Member

Suggested change
assert (len(rfecv.cv_results_) - 2) == score_len
assert len(rfecv.cv_results_) - 2 == score_len

formula1(n_features, n_features_to_select, step))
assert (rfecv.grid_scores_.shape[0] ==
assert ((len(rfecv.cv_results_) - 2) ==
Member

Suggested change
assert ((len(rfecv.cv_results_) - 2) ==
assert (len(rfecv.cv_results_) - 2 ==

sklearn/feature_selection/tests/test_rfe.py (outdated, resolved)

assert (rfecv_cv_results_.keys() == rfecv.cv_results_.keys())
for key in rfecv_cv_results_.keys():
    assert (rfecv_cv_results_[key] == rfecv.cv_results_[key])
Member

This is comparing floats:

Suggested change
assert (rfecv_cv_results_[key] == rfecv.cv_results_[key])
assert rfecv_cv_results_[key] == pytest.approx(rfecv.cv_results_[key])

arka204 and others added 2 commits June 9, 2020 01:38
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@thomasjpfan (Member) left a comment:

The computation of cv_results_ seems a little off.

for i in range(grid_scores.shape[0]):
    key = "split%d_score" % i
    self.cv_results_[key] = grid_scores[i]
self.cv_results_["mean_score"] = np.mean(grid_scores, axis=0)
Member

The scores are already summed along axis 0, so this mean would not be over the splits.

We need to keep the original scores as returned by:

scores = parallel(
    func(rfe, self.estimator, X, y, train, test, scorer)
    for train, test in cv.split(X, y, groups))
scores = np.sum(scores, axis=0)

so we can compute the mean and std correctly.

(Also, the grid_scores_ values are already the means over the splits.)

If we hold on to the scores:

scores = parallel(
    func(rfe, self.estimator, X, y, train, test, scorer)
    for train, test in cv.split(X, y, groups))

scores = np.array(scores)
scores_sum = np.sum(scores, axis=0)  # technically could use mean here
...

# reverse to stay consistent with before
scores_rev = scores[:, ::-1]
self.cv_results_ = {}
self.cv_results_["mean_score"] = np.mean(scores_rev, axis=0)
self.cv_results_["std_score"] = np.std(scores_rev, axis=0)

for i in range(scores.shape[0]):
    self.cv_results_[f"split{i}_score"] = scores_rev[i]

And then grid_scores_ is just self.cv_results_["mean_score"].

Comment on lines +518 to +522
values = np.asarray(
    [rfecv.cv_results_["split{}_score".format(i)]
     for i in range(results_size - 2)]).T
assert rfecv.cv_results_["mean_score"] == np.mean(values)
assert rfecv.cv_results_["std_score"] == np.std(values)
Member

This looks like 'mean_score' is a single number and not a vector of means, one for each feature subset.

Comment on lines +184 to +188
for key in rfecv.cv_results_.keys():
    if key == 'std_score':
        assert (rfecv.cv_results_[key] == 0)
    else:
        assert (rfecv.cv_results_[key] == 1)
Member

Let's remove this and use test_std_and_mean to explicitly test this.
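A minimal sketch of what such a test_std_and_mean could look like, assuming the split{i}_score / mean_score / std_score key layout proposed in this PR (not the released scikit-learn API); the dataset and estimator below are illustrative, not the ones used in the actual test.

import numpy as np
import pytest
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression


def test_std_and_mean():
    # Illustrative data; any small classification problem would do.
    X, y = make_classification(n_samples=60, n_features=6, random_state=0)
    rfecv = RFECV(estimator=LogisticRegression(), cv=3)
    rfecv.fit(X, y)

    # Every key except mean_score/std_score is a per-split score vector.
    n_splits = len(rfecv.cv_results_) - 2
    split_scores = np.asarray(
        [rfecv.cv_results_["split{}_score".format(i)] for i in range(n_splits)])

    assert rfecv.cv_results_["mean_score"] == pytest.approx(split_scores.mean(axis=0))
    assert rfecv.cv_results_["std_score"] == pytest.approx(split_scores.std(axis=0))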

@cmarmo (Member) commented Aug 23, 2020

Hi @arka204, would you be able to finish this pull request? Thanks!

@wowry (Contributor) commented May 28, 2021

Working on it in #20161.

@cmarmo added the Superseded label (PR has been replaced by a newer PR) and removed the help wanted label on May 28, 2021
@cmarmo (Member) commented Jul 30, 2021

Closed by #20161.

@cmarmo closed this on Jul 30, 2021
Development

Successfully merging this pull request may close these issues.

Deprecate grid_scores_ everywhere, replace by cv_results_