Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG Fixes stacker compatibility with estimators without n_features_in_ #17357

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
15 changes: 15 additions & 0 deletions doc/whats_new/v0.23.rst
Expand Up @@ -2,6 +2,21 @@

.. currentmodule:: sklearn

.. _changes_0_23_2:

Version 0.23.2
==============

Changelog
---------

:mod:`sklearn.ensemble`
.......................

- |Fix| Fixes :class:`ensemble.StackingClassifier` and
:class:`ensemble.StackingRegressor` compatibility with estimators that
do not define `n_features_in_`. :pr:`17357` by `Thomas Fan`_.

.. _changes_0_23_1:

Version 0.23.1
Expand Down
13 changes: 12 additions & 1 deletion sklearn/ensemble/_stacking.py
Expand Up @@ -13,6 +13,7 @@
from ..base import clone
from ..base import ClassifierMixin, RegressorMixin, TransformerMixin
from ..base import is_classifier, is_regressor
from ..exceptions import NotFittedError
from ..utils._estimator_html_repr import _VisualBlock

from ._base import _fit_single_estimator
Expand Down Expand Up @@ -146,7 +147,6 @@ def fit(self, X, y, sample_weight=None):
delayed(_fit_single_estimator)(clone(est), X, y, sample_weight)
for est in all_estimators if est != 'drop'
)
self.n_features_in_ = self.estimators_[0].n_features_in_

self.named_estimators_ = Bunch()
est_fitted_idx = 0
Expand Down Expand Up @@ -197,6 +197,17 @@ def fit(self, X, y, sample_weight=None):

return self

@property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be worth using the same structure as others meta estimators for consistency

    def n_features_in_(self):
        # For consistency with other estimators we raise a AttributeError so
        # that hasattr() fails if the estimator isn't fitted.
        try:
            check_is_fitted(self)
        except NotFittedError as nfe:
            raise AttributeError(
                "{} object has no n_features_in_ attribute."
                .format(self.__class__.__name__)
            ) from nfe

        return self.estimators_[0].n_features_in_

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but thinking about it, would it make sense to have something shared in a base class (or mixin).

def n_features_in_(self):
"""Number of features seen during :term:`fit`."""
try:
check_is_fitted(self)
except NotFittedError as nfe:
raise AttributeError(
f"{self.__class__.__name__} object has no attribute "
f"n_features_in_") from nfe
return self.estimators_[0].n_features_in_

def _transform(self, X):
"""Concatenate and return the predictions of the estimators."""
check_is_fitted(self)
Expand Down
31 changes: 31 additions & 0 deletions sklearn/ensemble/tests/test_stacking.py
Expand Up @@ -17,6 +17,8 @@
from sklearn.datasets import load_iris
from sklearn.datasets import load_diabetes
from sklearn.datasets import load_breast_cancer
from sklearn.datasets import make_regression
from sklearn.datasets import make_classification

from sklearn.dummy import DummyClassifier
from sklearn.dummy import DummyRegressor
Expand Down Expand Up @@ -491,3 +493,32 @@ def test_stacking_cv_influence(stacker, X, y):
with pytest.raises(AssertionError, match='Not equal'):
assert_allclose(stacker_cv_3.final_estimator_.coef_,
stacker_cv_5.final_estimator_.coef_)


@pytest.mark.parametrize("make_dataset, Stacking, Estimator", [
(make_classification, StackingClassifier, LogisticRegression),
(make_regression, StackingRegressor, LinearRegression)
])
def test_stacking_without_n_features_in(make_dataset, Stacking, Estimator):
# Stacking supports estimators without `n_features_in_`. Regression test
# for #17353

class MyEstimator(Estimator):
"""Estimator without n_features_in_"""
def fit(self, X, y):
super().fit(X, y)
del self.n_features_in_

X, y = make_dataset(random_state=0, n_samples=100)
stacker = Stacking(estimators=[('lr', MyEstimator())])

msg = f"{Stacking.__name__} object has no attribute n_features_in_"
with pytest.raises(AttributeError, match=msg):
stacker.n_features_in_

# Does not raise
stacker.fit(X, y)

msg = "'MyEstimator' object has no attribute 'n_features_in_'"
with pytest.raises(AttributeError, match=msg):
stacker.n_features_in_