[MRG] Pass sample_weight when predicting on stacked folds #16539

Merged · 22 commits · Mar 6, 2020

Commits
840cd39
pass sample_weight when predicting on stacked folds
Feb 24, 2020
7199799
use more descriptive spec def
Feb 25, 2020
676a720
fix specs by conditionally passing sample_weight
Feb 25, 2020
9d70031
Add entry in release notes describing improvement
Feb 25, 2020
324a594
add versionchanged note about sample_weight
Feb 25, 2020
c032fd3
fixup witespace in docstring
Feb 25, 2020
8996d3e
indent bullet list in release notes
Feb 25, 2020
652a374
Merge remote-tracking branch 'upstream/master' into is/16537
Feb 25, 2020
9f14750
fix: update release notes to reference to StackingClassifier and Stac…
Feb 28, 2020
4a17b2c
fix: minor formatting suggestion from review comments
Feb 28, 2020
e4d39a5
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
Feb 28, 2020
7bae037
fix: add comment describing odd naming of _parallel_fit_estimator
Feb 28, 2020
c240791
fix: update link to PR with direct link to issue
Feb 28, 2020
84520e3
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
Feb 29, 2020
163142e
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
Feb 29, 2020
25cca55
set n_features_in_ when CheckingClassifier is fit
Feb 29, 2020
471c80c
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
Feb 29, 2020
78d75b8
fix formatting
Feb 29, 2020
8bcbbe9
use len(X) as n_features_in_
Feb 29, 2020
569169d
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
Mar 1, 2020
4baaa35
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
Mar 3, 2020
85cd8b9
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
Mar 6, 2020
7 changes: 7 additions & 0 deletions doc/whats_new/v0.23.rst
@@ -182,6 +182,13 @@ Changelog
used during `fit`.
:pr:`16437` by :user:`Jin-Hwan CHO <chofchof>`.

- |Fix| Fixed a bug in :class:`ensemble.StackingClassifier` and
:class:`ensemble.StackingRegressor` where the `sample_weight`
argument was not being passed to `cross_val_predict` when
evaluating the base estimators on cross-validation folds
to obtain the input to the meta estimator.
:pr:`16539` by :user:`Bill DeRose <wderose>`.

:mod:`sklearn.feature_extraction`
.................................

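The behavior described in the release note above can be exercised end to end. A minimal sketch of the fixed behavior (the dataset and estimator choices are illustrative, not taken from the PR):

```python
# After this fix, sample_weight passed to StackingClassifier.fit
# reaches the base estimators during the internal cross-validation,
# not only the final (meta) estimator.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
weights = np.where(y == 0, 10.0, 1.0)  # up-weight one class

stacker = StackingClassifier(
    estimators=[('tree', DecisionTreeClassifier(random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stacker.fit(X, y, sample_weight=weights)
print(stacker.score(X, y))
```

Before the fix, the weights only reached the final refit of each base estimator and the meta estimator; the cross-validated predictions used to train the meta estimator were computed without them.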
26 changes: 10 additions & 16 deletions sklearn/ensemble/_stacking.py
@@ -122,6 +122,10 @@ def fit(self, X, y, sample_weight=None):
Note that this is supported only if all underlying estimators
support sample weights.

.. versionchanged:: 0.23
when not None, `sample_weight` is passed to all underlying
estimators

Returns
-------
self : object
@@ -166,10 +170,13 @@ def fit(self, X, y, sample_weight=None):
self._method_name(name, est, meth)
for name, est, meth in zip(names, all_estimators, stack_method)
]

fit_params = ({"sample_weight": sample_weight}
if sample_weight is not None
else None)
predictions = Parallel(n_jobs=self.n_jobs)(
delayed(cross_val_predict)(clone(est), X, y, cv=deepcopy(cv),
method=meth, n_jobs=self.n_jobs,
fit_params=fit_params,
verbose=self.verbose)
for est, meth in zip(all_estimators, self.stack_method_)
if est != 'drop'
@@ -183,21 +190,8 @@
]

X_meta = self._concatenate_predictions(X, predictions)
if sample_weight is not None:
try:
self.final_estimator_.fit(
X_meta, y, sample_weight=sample_weight
)
except TypeError as exc:
if "unexpected keyword argument 'sample_weight'" in str(exc):
raise TypeError(
"Underlying estimator {} does not support sample "
"weights."
.format(self.final_estimator_.__class__.__name__)
) from exc
raise
else:
self.final_estimator_.fit(X_meta, y)
_fit_single_estimator(self.final_estimator_, X_meta, y,
sample_weight=sample_weight)

return self

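The conditional `fit_params` construction in the diff above is a general pattern: build the keyword dict only when weights are supplied, so estimators that do not accept `sample_weight` keep working in the unweighted case. A hedged sketch (`fit_with_optional_weight` is an illustrative helper, not part of the PR):

```python
# Sketch of the conditional fit_params pattern used in _stacking.py.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

def fit_with_optional_weight(est, X, y, sample_weight=None):
    # Only forward sample_weight when it was actually given.
    fit_params = ({"sample_weight": sample_weight}
                  if sample_weight is not None
                  else {})
    return est.fit(X, y, **fit_params)

unweighted = fit_with_optional_weight(LogisticRegression(max_iter=500), X, y)
weighted = fit_with_optional_weight(LogisticRegression(max_iter=500), X, y,
                                    sample_weight=np.ones(len(y)))
```

In the PR itself the dict (or `None`) is handed to `cross_val_predict` via its `fit_params` argument, which forwards it to each fold's `fit` call.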
14 changes: 14 additions & 0 deletions sklearn/ensemble/tests/test_stacking.py
@@ -38,6 +38,7 @@
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import KFold

from sklearn.utils._mocking import CheckingClassifier
from sklearn.utils._testing import assert_allclose
from sklearn.utils._testing import assert_allclose_dense_sparse
from sklearn.utils._testing import ignore_warnings
@@ -439,6 +440,19 @@ def test_stacking_with_sample_weight(stacker, X, y):
assert np.abs(y_pred_no_weight - y_pred_biased).sum() > 0


def test_stacking_classifier_sample_weight_fit_param():
# check sample_weight is passed to all invocations of fit
stacker = StackingClassifier(
estimators=[
('lr', CheckingClassifier(expected_fit_params=['sample_weight']))
],
final_estimator=CheckingClassifier(
expected_fit_params=['sample_weight']
)
)
stacker.fit(X_iris, y_iris, sample_weight=np.ones(X_iris.shape[0]))


@pytest.mark.filterwarnings("ignore::sklearn.exceptions.ConvergenceWarning")
@pytest.mark.parametrize(
"stacker, X, y",
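The new test relies on `CheckingClassifier`, a private mock that asserts every name listed in `expected_fit_params` is actually forwarded to `fit`. A sketch of that behavior (private API, subject to change between releases):

```python
# CheckingClassifier raises an AssertionError when an expected fit
# parameter is missing, which is how the test detects a sample_weight
# that got dropped somewhere along the stacking call chain.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.utils._mocking import CheckingClassifier

X, y = load_iris(return_X_y=True)

clf = CheckingClassifier(expected_fit_params=['sample_weight'])
clf.fit(X, y, sample_weight=np.ones(len(y)))  # check passes

try:
    CheckingClassifier(expected_fit_params=['sample_weight']).fit(X, y)
    weight_was_optional = True
except AssertionError:
    weight_was_optional = False  # missing sample_weight is rejected
```

This is why the test above wires `CheckingClassifier` in as both a base estimator and the final estimator: if either `fit` call lost `sample_weight`, the mock's assertion would fail the test.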
1 change: 1 addition & 0 deletions sklearn/utils/_mocking.py
@@ -95,6 +95,7 @@ def fit(self, X, y, **fit_params):
assert self.check_X(X)
if self.check_y is not None:
assert self.check_y(y)
self.n_features_in_ = len(X)
self.classes_ = np.unique(check_array(y, ensure_2d=False,
allow_nd=True))
if self.expected_fit_params: