Skip to content

Conversation

glemaitre
Copy link
Member

closes #19119

SelfTrainingClassifier did not accept nested estimators that did not expose predict_proba.
One fix is to validate a fitted estimator where we know if predict_proba will be then available.

@@ -207,8 +207,11 @@ def fit(self, X, y):

if self.n_iter_ == 1:
# Only validate in the first iteration so that n_iter=0 is
# equivalent to the base_estimator itself.
_validate_estimator(self.base_estimator)
# equivalent to the base_estimator_ itself.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oliverrausch Do you recall what is the meaning of this line and the previous one?
It was not obvious to me.

Copy link
Member

@ogrisel ogrisel Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I suggest:

# Only validate the fitted estimator of the first iteration.

Actually since the validation is really cheap, I would not mind simplifying this code and removing the if self.n_iter_ == 1: condition.

@glemaitre glemaitre added this to the 0.24.1 milestone Jan 6, 2021
@glemaitre glemaitre added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Jan 6, 2021
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (once the review comments are addressed).

@@ -207,8 +207,11 @@ def fit(self, X, y):

if self.n_iter_ == 1:
# Only validate in the first iteration so that n_iter=0 is
# equivalent to the base_estimator itself.
_validate_estimator(self.base_estimator)
# equivalent to the base_estimator_ itself.
Copy link
Member

@ogrisel ogrisel Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I suggest:

# Only validate the fitted estimator of the first iteration.

Actually since the validation is really cheap, I would not mind simplifying this code and removing the if self.n_iter_ == 1: condition.

@riyadhctg
Copy link

riyadhctg commented Jan 6, 2021

LGTM. For the sake of sharing, the solution I had in my mind for #19119 was to change the order of the following two lines in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/_stacking.py

names, all_estimators = self._validate_estimators()
self._validate_final_estimator()

to

self._validate_final_estimator()
names, all_estimators = self._validate_estimators()

@glemaitre
Copy link
Member Author

@riyadhctg The changes that you proposed is in the StackingClassifier. I think that the code is fine there. The changes required is in the SelfTrainingClassifier.

glemaitre and others added 2 commits January 6, 2021 17:29
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@ogrisel ogrisel merged commit 6d3d1b8 into scikit-learn:master Jan 8, 2021
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Jan 18, 2021
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
jeremiedbb pushed a commit that referenced this pull request Jan 19, 2021
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:semi_supervised To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StackClassifier is SelfTrainingClassifier throws error
4 participants