[MRG] BUG Fixes voting named_estiamtor bug #15375

Merged

Conversation

thomasjpfan
Member

@thomasjpfan thomasjpfan commented Oct 28, 2019

Reference Issues/PRs

Fixes #15374

@thomasjpfan
Member Author

@thomasjpfan thomasjpfan commented Oct 28, 2019

@@ -560,3 +560,21 @@ def test_deprecate_none_transformer(Voter, BaseEstimator):
           "Use the string 'drop' instead.")
    with pytest.warns(DeprecationWarning, match=msg):
        est.fit(X, y)


# TODO: Remove drop pparametrize in 0.24 when None is removed in Voting*
Member

@qinhanmin2014 qinhanmin2014 Oct 28, 2019

typo parametrize :)

Member

@NicolasHug NicolasHug left a comment

IIUC, named_estimators_ was previously wrong whenever there was a dropped estimator? If so, the what's new entry could be more explicit.

LGTM otherwise

(VotingRegressor, DecisionTreeRegressor)]
)
@pytest.mark.parametrize("drop", [None, 'drop'])
def test_correct_named_estimator_with_drop(Voter, BaseEstimator, drop):
Contributor

@glemaitre glemaitre Oct 28, 2019

Since named_estimators_ is now shared between the classes inheriting from _BaseHeterogeneousEnsemble, we could move the test to test_common.py and run it on both Stacking and Voting.

Member Author

@thomasjpfan thomasjpfan Oct 28, 2019

I prefer to keep bug fix PRs limited in scope to make it easier for a reviewer.

I think a follow up PR to place this in test_common.py would be nice to have.

Contributor

@glemaitre glemaitre Oct 28, 2019

I would agree for a newcomer, where it might be easier to have a focused PR.

But here, it will just take twice as much review time (you need to find reviewers for this PR and the next one) to add a parametrize and move the test, and we could even forget to open the next PR.

Member Author

@thomasjpfan thomasjpfan Oct 28, 2019

Okay, I'll move it

Member Author

@thomasjpfan thomasjpfan Oct 28, 2019

Hmm there is no test_common in ensemble. Creating a test_common just for Voting and Stacking in ensemble seems slightly strange.

Contributor

@glemaitre glemaitre Oct 28, 2019

Uhm apparently I did not commit my test_common.py when refactoring. So a second review will be required in this case :)


- |Fix| The `named_estimators_` attribute in :class:`voting.VotingClassifier`
and :class:`voting.VotingRegressor` now correctly maps to dropped estimators.
Previously, `named_estimators_` mapped to estimators that were not dropped.
Member

@NicolasHug NicolasHug Oct 28, 2019

Not sure that's correct either. As far as I understand this should be

Previously, the named_estimators_ mapping was incorrect whenever one of the estimators was dropped.

Contributor

@glemaitre glemaitre Oct 28, 2019

The name was wrong; the estimator was correct.

Member Author

@thomasjpfan thomasjpfan Oct 28, 2019

It wasn't always incorrect. Maybe something like

Previously, named_estimators_ mapped to estimators that were not dropped
if the dropped estimator was ahead in the estimators list.

Member

@NicolasHug NicolasHug Oct 28, 2019

The only case where named_estimators_ wasn't wrong is when the dropped estimator was the last one. I wouldn't go into such details. We can just say it was wrong.
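
To make the misalignment discussed in this thread concrete, here is a minimal pure-Python sketch. It assumes the old behaviour amounted to zipping the user-supplied names with the list of fitted (non-dropped) estimators; the helper names `buggy_named_estimators` and `fixed_named_estimators` and the string stand-ins for estimators are hypothetical and are not scikit-learn API.

```python
# Pure-Python sketch; no scikit-learn import needed.
# `estimators` mimics the user-supplied (name, estimator) list and
# `fitted` mimics `estimators_`, which only holds non-dropped estimators.


def buggy_named_estimators(estimators, fitted):
    """Zip names with fitted estimators, ignoring dropped entries
    (assumed approximation of the old behaviour)."""
    return {name: est for (name, _), est in zip(estimators, fitted)}


def fixed_named_estimators(estimators, fitted):
    """Keep None / 'drop' as a placeholder so names stay aligned."""
    fitted_iter = iter(fitted)
    named = {}
    for name, est in estimators:
        dropped = est is None or (isinstance(est, str) and est == "drop")
        named[name] = est if dropped else next(fitted_iter)
    return named


# Dropped estimator comes first: every later name shifts by one.
estimators = [("lr", "drop"), ("tree", "TreeModel"), ("svm", "SVMModel")]
fitted = ["fitted TreeModel", "fitted SVMModel"]  # stand-ins for fitted clones

buggy = buggy_named_estimators(estimators, fitted)
fixed = fixed_named_estimators(estimators, fitted)

# Dropped estimator comes last: the zip happens to line up.
estimators_last = [("tree", "TreeModel"), ("lr", "drop")]
fitted_last = ["fitted TreeModel"]
buggy_last = buggy_named_estimators(estimators_last, fitted_last)

print(buggy)
print(fixed)
print(buggy_last)
```

In this sketch, `buggy` maps `'lr'` to the fitted tree, while `fixed` maps `'lr'` to `'drop'` and keeps `'tree'` pointing at the fitted tree. Only when the dropped entry is last does the zip-based mapping pair the remaining names correctly.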

@glemaitre glemaitre merged commit 7c47337 into scikit-learn:master Oct 28, 2019
14 of 17 checks passed
        est.fit(X, y)
    assert rec if drop is None else not rec

    assert est.named_estimators_['lr'] == drop
Contributor

@glemaitre glemaitre Oct 28, 2019

I'm having second thoughts here. I think that the fix is not right. In StackingClassifier, we are not reporting the dropped estimator. I think that is what the docstring is actually saying. Basically, we should have only 'tree' in named_estimators_.

Contributor

@glemaitre glemaitre Oct 28, 2019

Voting is probably fine, see discussion in #15387

Member Author

@thomasjpfan thomasjpfan Oct 28, 2019

Fundamentally, it would be good to have len(named_estimators_) == len(estimators_). Do you recall why we do not include 'drop' in estimators_?
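
The two conventions being weighed here can be compared side by side in a small pure-Python sketch. The function names `named_with_placeholders` and `named_without_dropped` are hypothetical, and the sketch does not claim which convention either scikit-learn class should follow; it only shows which length invariant each one gives.

```python
# Pure-Python sketch of two possible conventions for `named_estimators_`.
# Strings stand in for estimators; nothing here is scikit-learn API.


def _is_dropped(est):
    return est is None or (isinstance(est, str) and est == "drop")


def named_with_placeholders(estimators, fitted):
    """Report every name; dropped entries keep their None / 'drop' value."""
    fitted_iter = iter(fitted)
    return {
        name: est if _is_dropped(est) else next(fitted_iter)
        for name, est in estimators
    }


def named_without_dropped(estimators, fitted):
    """Report only the names whose estimator was actually fitted."""
    kept_names = [name for name, est in estimators if not _is_dropped(est)]
    return dict(zip(kept_names, fitted))


estimators = [("lr", "drop"), ("tree", "TreeModel")]
fitted = ["fitted TreeModel"]  # mimics `estimators_` without dropped entries

with_placeholders = named_with_placeholders(estimators, fitted)
without_dropped = named_without_dropped(estimators, fitted)

print(with_placeholders, len(with_placeholders) == len(estimators))
print(without_dropped, len(without_dropped) == len(fitted))
```

With placeholders, the mapping has one entry per user-supplied estimator, so its length matches `estimators` rather than `estimators_`. Omitting dropped names gives `len(named_estimators_) == len(estimators_)`, at the cost of `'lr'` no longer being a key.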

@glemaitre
Contributor

@glemaitre glemaitre commented Oct 28, 2019

A common test to illustrate the difference is in #15387.
