Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] MAINT add base class for voting and stacking #15084

Merged
merged 12 commits into from Oct 5, 2019

Conversation

glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Sep 24, 2019

closes #15056

Create a base class for Voting* and Stacking*. They both are an ensemble of multiple learners type.
They could share the get_params, set_params and validation of estimators (as well as the fitted attributes then).

This base class could be contrasted with the ensemble of single learner type such as boosting (adaboost, GBDT), RF and Bagging.

@glemaitre
Copy link
Contributor Author

@glemaitre glemaitre commented Sep 24, 2019

@thomasjpfan @ogrisel @rth @adrinjalali

So the naming of the base class is terrible but I wanted to have a WIP PR such that we see what is in common and if it makes sense to merge code.

NB: the tests will fail because I did not add support for None to drop an estimator (only available in the voting and not in the stacking). This is easily fixed and would ease the deprecation.

WDYT?

sklearn/ensemble/_stacking.py Outdated Show resolved Hide resolved
Copy link
Member

@thomasjpfan thomasjpfan left a comment

I like this refactoring. The _BaseEnsembleHeterogeneousEstimator class has have well defined boundaries.

sklearn/ensemble/base.py Outdated Show resolved Hide resolved
sklearn/ensemble/base.py Outdated Show resolved Hide resolved
@glemaitre glemaitre changed the title [WIP] MAINT add base class for voting and stacking [MRG] MAINT add base class for voting and stacking Oct 1, 2019
@glemaitre
Copy link
Contributor Author

@glemaitre glemaitre commented Oct 1, 2019

Good to be reviewed. I will open a PR to deprecate None support and use 'drop' instead.

doc/whats_new/v0.22.rst Outdated Show resolved Hide resolved
doc/whats_new/v0.22.rst Outdated Show resolved Hide resolved
doc/whats_new/v0.22.rst Outdated Show resolved Hide resolved
sklearn/ensemble/base.py Show resolved Hide resolved
sklearn/ensemble/tests/test_stacking.py Outdated Show resolved Hide resolved
sklearn/ensemble/base.py Outdated Show resolved Hide resolved
@glemaitre
Copy link
Contributor Author

@glemaitre glemaitre commented Oct 2, 2019

Copy link
Member

@NicolasHug NicolasHug left a comment

I guess I'm becoming increasingly skeptical about the relevance of inheritance in some cases (like here where all it does is set a single attribute). Makes the code easy to write, but often harder to understand.

But LGTM anyway.

- |Fix| Stacking and Voting estimators now ensure that their underlying
estimators are either all classifiers or all regressors.
We introduced a new base class
:class:`ensemble.base._BaseHeterogeneousEnsemble` to raise consistent error
Copy link
Member

@thomasjpfan thomasjpfan Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include a private class in the whats new? This can be something like:

Stacking and Voting estimators now raise consistent error messages.

Copy link
Contributor Author

@glemaitre glemaitre Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misunderstood what @NicolasHug meant by adding a link?
Did you mean mentioning the class or do you expect something else?

Copy link
Member

@thomasjpfan thomasjpfan Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are not generating a new API doc for _BaseHeterogeneousEnsemble, there is nothing to link to: https://76528-843222-gh.circle-artifacts.com/0/doc/whats_new/v0.22.html#sklearn-ensemble

Copy link
Member

@NicolasHug NicolasHug Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The links referred to the Stacking and Voting estimators, sorry if that wasn't clear. I agree we shouldn't link a private class. (and I'm also fine not linking the estimators... it's just a nit)

Copy link
Contributor Author

@glemaitre glemaitre Oct 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok make sense.

@glemaitre glemaitre added this to TO BE MERGED in Guillaume's pet Oct 3, 2019
@@ -236,6 +236,15 @@ Changelog
:user:`Matt Hancock <notmatthancock>` and
:pr:`5963` by :user:`Pablo Duboue <DrDub>`.

- |Fix| Stacking and Voting estimators now ensure that their underlying
estimators are either all classifiers or all regressors.
We introduced a new base class
Copy link
Member

@thomasjpfan thomasjpfan Oct 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need a "new base class" part?

:class:ensemble.StackingClassifier, :class:ensemble.StackingRegressor, :class:ensemble.VotingClassifier, and :class:ensemeble.VotingRegressor now raise consistent error messages.

Copy link
Contributor Author

@glemaitre glemaitre Oct 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong edit

@thomasjpfan thomasjpfan merged commit 7dd03e0 into scikit-learn:master Oct 5, 2019
19 checks passed
@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Oct 5, 2019

Thank you @glemaitre !

@glemaitre glemaitre moved this from TO BE MERGED to MERGED in Guillaume's pet Oct 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
4 participants