Skip to content

FEA SLEP006: Metadata routing for SelfTrainingClassifier#28494

Merged
OmarManzoor merged 64 commits into
scikit-learn:mainfrom
adam2392:self-learn-meta
Jul 9, 2024
Merged

FEA SLEP006: Metadata routing for SelfTrainingClassifier#28494
OmarManzoor merged 64 commits into
scikit-learn:mainfrom
adam2392:self-learn-meta

Conversation

@adam2392

@adam2392 adam2392 commented Feb 21, 2024

Copy link
Copy Markdown
Member

Reference Issues/PRs

Towards: #22893

What does this implement/fix? Explain your changes.

Implements metadata routing for SelfTrainingClassifier. Note the added code diff simply comes from a replacement of base_estimator for estimator.

  • convert base_estimator to estimator and add a deprecation
  • uncomments other functions that previously were not tested in NonConsumingClassifier
  • fixes some minor design choices in the unit-testing framework of test_metaestimators_metadata_routing.py (e.g. try/except -> if/else to be more transparent)

Any other comments?

cc: @adrinjalali

Some open questions:

1. I presume, we want to forward metadata within all the functions possibly?
2. As a result, I'm not sure if the unit-testing approach is the best, so I was wondering if you have any suggestions? Should I try potentially refactoring the existing unit-testing code to allow testing for more than just fit?

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@github-actions

github-actions Bot commented Feb 21, 2024

Copy link
Copy Markdown

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 2cacdc9. Link to the linter CI: here

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 marked this pull request as ready for review February 23, 2024 20:38

@adrinjalali adrinjalali left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes, thanks @adam2392

Comment thread sklearn/semi_supervised/_self_training.py Outdated
Comment thread sklearn/semi_supervised/_self_training.py Outdated
Comment thread sklearn/semi_supervised/_self_training.py Outdated
Comment thread sklearn/semi_supervised/_self_training.py Outdated
Comment thread sklearn/semi_supervised/_self_training.py Outdated
Signed-off-by: Adam Li <adam2392@gmail.com>

@adam2392 adam2392 left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review and pointers! I went thru and fixed the issues in the doc-strings and Bunch.

Comment thread sklearn/semi_supervised/_self_training.py Outdated
Comment thread sklearn/semi_supervised/_self_training.py Outdated
Comment thread sklearn/semi_supervised/_self_training.py Outdated
Comment thread sklearn/semi_supervised/_self_training.py Outdated
@adam2392 adam2392 requested a review from adrinjalali February 27, 2024 17:17
@adam2392

Copy link
Copy Markdown
Member Author

Resolved conflicts. Feel free to ping me if there's additional changes desired

@adam2392

adam2392 commented Apr 2, 2024

Copy link
Copy Markdown
Member Author

This PR should not be affected by #28734

Comment thread sklearn/semi_supervised/tests/test_self_training.py Outdated
Comment thread sklearn/semi_supervised/tests/test_self_training.py Outdated
Comment thread sklearn/tests/test_metaestimators_metadata_routing.py Outdated
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
@adam2392

Copy link
Copy Markdown
Member Author

Thanks for the review @OmarManzoor!

I'll change the last comment if @adrinjalali has any issues (#28494 (comment))

@OmarManzoor OmarManzoor left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @adam2392

@adrinjalali adrinjalali left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adam2392

Comment thread sklearn/semi_supervised/_self_training.py
Comment thread sklearn/semi_supervised/_self_training.py
Comment thread sklearn/semi_supervised/_self_training.py
Comment thread sklearn/semi_supervised/_self_training.py
Comment thread sklearn/semi_supervised/_self_training.py Outdated
Comment thread sklearn/semi_supervised/_self_training.py Outdated
Comment on lines +652 to +655
if method_name in ["fit", "partial_fit", "score"]:
# `fit`, `partial_fit`, 'score' accept y, others don't.
method(X, y, **method_kwargs)
except TypeError:
else:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this makes things hard to debug really. The try/except is more foolproof since a library like imbalance-learn adds some methods to the whole system by patching a few things in sklearn and things would just work.

adam2392 and others added 3 commits July 5, 2024 08:16
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>

@adam2392 adam2392 left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @adrinjalali ! I addressed your comments in 3b5a3f0

Comment thread sklearn/semi_supervised/_self_training.py
Comment thread sklearn/semi_supervised/_self_training.py
Comment thread sklearn/semi_supervised/_self_training.py Outdated
Comment on lines +652 to +655
if method_name in ["fit", "partial_fit", "score"]:
# `fit`, `partial_fit`, 'score' accept y, others don't.
method(X, y, **method_kwargs)
except TypeError:
else:

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. I think it was hard before because the error message was not very clear on what method was failing from what class, so I assume that's been fixed with #29226

Comment on lines +652 to +655
if method_name in ["fit", "partial_fit", "score"]:
# `fit`, `partial_fit`, 'score' accept y, others don't.
method(X, y, **method_kwargs)
except TypeError:
else:

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted it to the try/except

Comment thread sklearn/semi_supervised/_self_training.py Outdated
)
else:
estimator_ = clone(self.estimator)
return estimator_

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still missing the case where both estimator and base_estimator are passed, in which case we need to raise

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.



# TODO(1.8): remove in 1.8
def test_deprecation_warning_base_estimator():

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should also test for all other cases in _get_estimator

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adam2392 added 3 commits July 5, 2024 09:12
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 requested a review from adrinjalali July 5, 2024 14:21
@adam2392

adam2392 commented Jul 6, 2024

Copy link
Copy Markdown
Member Author

Lmk if I missed anything else @adrinjalali.

Thanks for the review and patience!

@adrinjalali adrinjalali left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

I'll let @OmarManzoor have another look since this changed a bit since last he reviewed.

Comment thread sklearn/semi_supervised/_self_training.py Outdated
adam2392 added 2 commits July 8, 2024 09:42
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 requested a review from OmarManzoor July 8, 2024 13:42
@adam2392

adam2392 commented Jul 8, 2024

Copy link
Copy Markdown
Member Author

SG! Thanks for the reviews.

@OmarManzoor OmarManzoor left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @adam2392

@OmarManzoor OmarManzoor merged commit cef803a into scikit-learn:main Jul 9, 2024
@adam2392 adam2392 deleted the self-learn-meta branch July 9, 2024 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants