[MRG] BUG Fixes voting named_estimator bug #15375

Merged


thomasjpfan (Member) commented Oct 28, 2019

Reference Issues/PRs

Fixes #15374

thomasjpfan added 3 commits Oct 28, 2019
thomasjpfan (Member, Author) commented Oct 28, 2019

@@ -560,3 +560,21 @@ def test_deprecate_none_transformer(Voter, BaseEstimator):
"Use the string 'drop' instead.")
with pytest.warns(DeprecationWarning, match=msg):
est.fit(X, y)


# TODO: Remove drop pparametrize in 0.24 when None is removed in Voting*

qinhanmin2014 (Member), Oct 28, 2019

typo parametrize :)
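The hunk above checks that passing None as an estimator emits a DeprecationWarning pointing users to the string 'drop'. A minimal stdlib-only sketch of that deprecation pattern follows; `normalize_estimator`, `emits_deprecation`, and the warning text are hypothetical and are not scikit-learn's implementation.

```python
import warnings


def normalize_estimator(est):
    """Hypothetical helper: treat None as a deprecated alias for 'drop'."""
    if est is None:
        warnings.warn(
            "Using None to drop an estimator is deprecated. "
            "Use the string 'drop' instead.",
            DeprecationWarning,
        )
        return "drop"
    return est


def emits_deprecation(est):
    """Return True if normalizing `est` raises a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as rec:
        warnings.simplefilter("always")  # make sure the warning is recorded
        normalize_estimator(est)
    return any(issubclass(w.category, DeprecationWarning) for w in rec)


# Only the None spelling should warn; 'drop' is the supported spelling.
for drop in [None, "drop"]:
    assert emits_deprecation(drop) == (drop is None)
```

Once None support is removed (the TODO in the hunk), only the 'drop' case would remain to parametrize.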

NicolasHug (Contributor) left a comment

IIUC, named_estimators_ was previously wrong whenever an estimator was dropped? If so, the what's new entry could be more explicit.

LGTM otherwise
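The bug can be reproduced without scikit-learn. A minimal sketch, assuming (as the discussion implies) that the pre-fix code paired the full list of names with the fitted models, which skip dropped entries; the helper names are hypothetical and strings stand in for estimator objects. The "fixed" variant is one possible approach that keeps the 'drop' placeholder, consistent with the test's `named_estimators_['lr'] == drop` assertion, and is not necessarily the PR's exact code.

```python
def named_estimators_buggy(estimators, fitted):
    # Pairs every name with `fitted`, which skips dropped entries, so the
    # names after a dropped estimator shift onto the wrong model.
    names = [name for name, _ in estimators]
    return dict(zip(names, fitted))


def named_estimators_fixed(estimators, fitted):
    # Walk the original list; consume a fitted model only for kept entries
    # and keep the placeholder for dropped ones.
    fitted_iter = iter(fitted)
    mapping = {}
    for name, est in estimators:
        mapping[name] = est if est in (None, "drop") else next(fitted_iter)
    return mapping


estimators = [("lr", "drop"), ("tree", "Tree"), ("knn", "KNN")]
fitted = ["fitted-" + est for _, est in estimators if est not in (None, "drop")]

assert named_estimators_buggy(estimators, fitted) == {
    "lr": "fitted-Tree", "tree": "fitted-KNN"}  # 'knn' is lost entirely
assert named_estimators_fixed(estimators, fitted) == {
    "lr": "drop", "tree": "fitted-Tree", "knn": "fitted-KNN"}
```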

(VotingRegressor, DecisionTreeRegressor)]
)
@pytest.mark.parametrize("drop", [None, 'drop'])
def test_correct_named_estimator_with_drop(Voter, BaseEstimator, drop):

glemaitre (Contributor), Oct 28, 2019

Since named_estimators_ is now shared by the classes inheriting from _BaseHeterogeneousEnsemble, we could move the test into test_common.py and run it on both Stacking and Voting.

thomasjpfan (Author, Member), Oct 28, 2019

I prefer to keep bug-fix PRs limited in scope to make them easier to review.

I think a follow-up PR to move this into test_common.py would be nice to have.

glemaitre (Contributor), Oct 28, 2019

I would agree for newcomers, where a focused PR might be easier to handle.

But here, it would just double the review effort (you need to find reviewers for this PR and the next one) for adding a parametrize and moving the test, and we might even forget to open the next PR.

thomasjpfan (Author, Member), Oct 28, 2019

Okay, I'll move it

thomasjpfan (Author, Member), Oct 28, 2019

Hmm, there is no test_common.py in ensemble. Creating one just for Voting and Stacking seems slightly strange.

glemaitre (Contributor), Oct 28, 2019

Uhm, apparently I did not commit my test_common.py when refactoring. So a second review will be required in this case :)
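glemaitre's suggestion amounts to writing one check and running it against every subclass of the shared base class. A stdlib-only sketch of that idea: the toy classes stand in for _BaseHeterogeneousEnsemble and its Voting/Stacking subclasses and are hypothetical, as is the check name, and the sketch assumes both subclasses follow the Voting-style convention the PR's test asserts. In scikit-learn itself this would more likely be a `pytest.mark.parametrize` over the real classes; a plain loop keeps the sketch dependency-free.

```python
class ToyHeterogeneousEnsemble:
    """Hypothetical base class that builds named_estimators_ in one place."""

    def __init__(self, estimators):
        self.estimators = estimators

    def fit(self):
        kept = [est for _, est in self.estimators if est != "drop"]
        self.estimators_ = ["fitted-" + est for est in kept]
        fitted_iter = iter(self.estimators_)
        # Dropped entries keep their 'drop' placeholder (Voting-style).
        self.named_estimators_ = {
            name: est if est == "drop" else next(fitted_iter)
            for name, est in self.estimators
        }
        return self


class ToyVoting(ToyHeterogeneousEnsemble):
    pass


class ToyStacking(ToyHeterogeneousEnsemble):
    pass


def check_named_estimators_with_drop(Ensemble):
    """One 'common' check, run against every subclass."""
    est = Ensemble([("lr", "drop"), ("tree", "Tree")]).fit()
    assert est.named_estimators_["lr"] == "drop"
    assert est.named_estimators_["tree"] == "fitted-Tree"


for Ensemble in (ToyVoting, ToyStacking):
    check_named_estimators_with_drop(Ensemble)
```

Because the mapping is built in the base class, a single shared check covers every subclass, which is the argument for a common test module.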


- |Fix| The `named_estimators_` attribute in :class:`voting.VotingClassifier`
and :class:`voting.VotingRegressor` now correctly maps to dropped estimators.
Previously, `named_estimators_` mapped to estimators that were not dropped.

NicolasHug (Contributor), Oct 28, 2019

Not sure that's correct either. As far as I understand, this should be:

Previously, the named_estimators_ mapping was incorrect whenever one of the estimators was dropped.

glemaitre (Contributor), Oct 28, 2019

The name was wrong; the estimator was correct.

thomasjpfan (Author, Member), Oct 28, 2019

It wasn't always incorrect. Maybe something like:

Previously, named_estimators_ mapped to estimators that were not dropped
if the dropped estimator was ahead in the estimators list.

NicolasHug (Contributor), Oct 28, 2019

The only case where named_estimators_ wasn't wrong is when the dropped estimator was the last one. I wouldn't go into such details. We can just say it was wrong.
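NicolasHug's claim, that the old mapping was only right when the dropped estimator came last, can be checked exhaustively for a small list. A sketch assuming the pre-fix code zipped the full list of names against the fitted models, which skip dropped entries; all names here are hypothetical and strings stand in for estimators.

```python
def buggy_mapping(estimators):
    # Assumed pre-fix behavior: zip all names with only the kept models.
    names = [name for name, _ in estimators]
    fitted = [est for _, est in estimators if est != "drop"]
    return dict(zip(names, fitted))


def correct_mapping(estimators):
    # Reference mapping for kept estimators only; dropped names are omitted
    # so the comparison checks whether each kept name got its own model.
    return {name: est for name, est in estimators if est != "drop"}


def positions_where_buggy_is_right(n):
    """Drop the estimator at each position in turn; report where it works."""
    ok = []
    for i in range(n):
        estimators = [
            (f"est{j}", "drop" if j == i else f"model{j}") for j in range(n)
        ]
        if buggy_mapping(estimators) == correct_mapping(estimators):
            ok.append(i)
    return ok


print(positions_where_buggy_is_right(4))  # [3]
```

Only the last position survives, since zip simply truncates the unmatched final name; every earlier position shifts later names onto the wrong model.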

thomasjpfan added 2 commits Oct 28, 2019
…mators_bug
glemaitre merged commit 7c47337 into scikit-learn:master Oct 28, 2019
14 of 17 checks passed
scikit-learn.scikit-learn Build #20191028.50 failed
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas) failed
ci/circleci: deploy Your tests are queued behind your running builds
LGTM analysis: C/C++ No code changes detected
LGTM analysis: JavaScript No code changes detected
LGTM analysis: Python No new or fixed alerts
ci/circleci: doc Your tests passed on CircleCI!
ci/circleci: doc artifact Link to 0/doc/_changed.html
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
ci/circleci: lint Your tests passed on CircleCI!
scikit-learn.scikit-learn (Linux py35_conda_openblas) succeeded
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) succeeded
scikit-learn.scikit-learn (Linux pylatest_conda_mkl) succeeded
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) succeeded
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) succeeded
scikit-learn.scikit-learn (Windows py37_conda_mkl) succeeded
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) succeeded
est.fit(X, y)
assert rec if drop is None else not rec

assert est.named_estimators_['lr'] == drop

glemaitre (Contributor), Oct 28, 2019

I'm having second thoughts here. I think the fix is not right. In StackingClassifier, we do not report the dropped estimator, and I think that is what the docstring is actually saying. Basically, we should have only 'tree' in named_estimators_.
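The alternative glemaitre describes omits dropped estimators from the mapping instead of recording them under their name, which for the test's estimators would leave only 'tree'. A stdlib-only sketch of that convention; the function name is hypothetical and strings stand in for fitted estimators.

```python
def named_estimators_omit_dropped(estimators, fitted):
    """Hypothetical Stacking-style mapping: dropped names are left out.

    `fitted` holds the fitted models for the kept estimators, in order.
    """
    kept_names = [name for name, est in estimators if est != "drop"]
    if len(kept_names) != len(fitted):
        raise ValueError("expected one fitted model per kept estimator")
    return dict(zip(kept_names, fitted))


estimators = [("lr", "drop"), ("tree", "Tree")]
named = named_estimators_omit_dropped(estimators, ["fitted-Tree"])
print(named)  # {'tree': 'fitted-Tree'}
```

Under this convention the mapping has exactly one entry per fitted model, so it always has the same length as a fitted list that excludes dropped estimators; the Voting-style mapping asserted in the test has one extra entry per dropped estimator.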

glemaitre (Contributor), Oct 28, 2019

Voting is probably fine; see the discussion in #15387.

thomasjpfan (Author, Member), Oct 28, 2019

Fundamentally, it would be good to have len(named_estimators_) == len(estimators_). Do you recall why we do not include 'drop' in estimators_?
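One way to get the invariant thomasjpfan mentions while still reporting dropped estimators would be to keep the 'drop' placeholders in estimators_ as well. The sketch below is a hypothetical design for illustration only; per the question above, 'drop' is not included in estimators_ here, and the helper name is invented.

```python
def fit_keeping_placeholders(estimators):
    # Hypothetical design: estimators_ keeps 'drop' in place, so it stays
    # aligned index-by-index with the user's `estimators` list.
    estimators_ = [
        est if est == "drop" else "fitted-" + est for _, est in estimators
    ]
    # With aligned lists, a plain zip gives the right name -> model mapping.
    named_estimators_ = dict(zip((name for name, _ in estimators), estimators_))
    return estimators_, named_estimators_


estimators = [("lr", "drop"), ("tree", "Tree")]
estimators_, named = fit_keeping_placeholders(estimators)
assert len(named) == len(estimators_) == len(estimators)

# Trade-off: any code that predicts with estimators_ must skip placeholders.
active = [est for est in estimators_ if est != "drop"]
```

The cost of this design is that every consumer of estimators_ (prediction, transform, attribute access) would need to filter out the placeholders, which is presumably why the fitted list excludes them.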

glemaitre (Contributor) commented Oct 28, 2019

A common test illustrating the difference is in #15387.

4 participants