Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Pass sample_weight when predicting on stacked folds #16539

Merged
merged 22 commits into from
Mar 6, 2020

Conversation

wderose
Copy link

@wderose wderose commented Feb 25, 2020

Reference Issues/PRs

Fixes #16537.

What does this implement/fix? Explain your changes.

This PR passes sample_weight via fit_params when we call cross_val_predict in _BaseStacking, as per the comment here.

predictions = Parallel(n_jobs=self.n_jobs)(
delayed(cross_val_predict)(clone(est), X, y, cv=deepcopy(cv),
method=meth, n_jobs=self.n_jobs,
verbose=self.verbose)
for est, meth in zip(all_estimators, self.stack_method_)
if est != 'drop'
)

Any other comments?

cc: @caioaao @glemaitre

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a |Fix| entry to the change log at doc/whats_new/v0.23.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:

Please add a versionchanged note in the docstring of sample_weight

@jnothman
Copy link
Member

But tests are failing. Should have checked that first

@wderose
Copy link
Author

wderose commented Feb 25, 2020

But tests are failing. Should have checked that first

@jnothman : Thank you for the prompt review.

I've added the requested doc changes and fixed the failing specs.

Please let me know if you'd like more details in the versionchange/release notes or if you'd like to see more tests.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Only a couple of comments

@wderose
Copy link
Author

wderose commented Feb 29, 2020

@glemaitre Thank you for the comments.

I updated the release notes to reference StackingClassifier and StackingRegressor directly.

I also created a follow-on issue (#16595) regarding the naming of _parallel_fit_estimator and added a comment noting the strange naming.

Please take another look at these changes and let me know if there is anything else you'd like me to include.

@wderose wderose requested a review from glemaitre February 29, 2020 00:33
@@ -96,6 +96,7 @@ def fit(self, X, y, **fit_params):
assert self.check_X(X)
if self.check_y is not None:
assert self.check_y(y)
self._check_n_features(X, reset=True)
Copy link
Author

@wderose wderose Feb 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to add this call so that n_features_in_ is set correctly in CheckingClassifier.

The property is read during the fit method of BaseStacking: d205638#diff-c8f397939f2393da5b2eef4285761da1R143. We use CheckingClassifier in the unit tests for the stacking estimators: https://github.com/scikit-learn/scikit-learn/pull/16539/files#diff-3979301645c7cbae79edc4085d997029R443

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @wderose

LGTM!

@thomasjpfan thomasjpfan merged commit f1acf83 into scikit-learn:master Mar 6, 2020
@wderose wderose deleted the is/16537 branch March 7, 2020 01:27
ashutosh1919 pushed a commit to ashutosh1919/scikit-learn that referenced this pull request Mar 13, 2020
gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Is sample_weight missing when calling cross_val_predict in stacked model?
4 participants