
FEA SLEP006: Metadata routing for SelfTrainingClassifier #28494

Open
wants to merge 28 commits into base: main
Conversation

@adam2392 (Contributor) commented Feb 21, 2024

Reference Issues/PRs

Towards: #22893

What does this implement/fix? Explain your changes.

Implements metadata routing for SelfTrainingClassifier. The unit tests need to cover scoring, decision_function, and prediction, so I am leaning towards adding this as a dedicated unit test, possibly with something like Pipeline.

Alternatively, it could fit inside the existing unit-testing framework with the other classifiers such as BaggingClassifier.
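
For context, here is a minimal sketch of the intended user-facing behavior, assuming the routing added in this PR lands (illustrative only, not the merged API): the sub-estimator declares which metadata it consumes, and SelfTrainingClassifier forwards it.

    from sklearn import set_config
    from sklearn.linear_model import LogisticRegression
    from sklearn.semi_supervised import SelfTrainingClassifier

    set_config(enable_metadata_routing=True)

    # The wrapped classifier must explicitly request the metadata it consumes.
    inner = LogisticRegression().set_fit_request(sample_weight=True)
    clf = SelfTrainingClassifier(inner)

    # With routing in place, the meta-estimator advertises the routed keys ...
    print(clf.get_metadata_routing())
    # ... and `clf.fit(X, y, sample_weight=w)` would forward `w` to `inner.fit`
    # (y marks unlabeled samples with -1, per SelfTrainingClassifier's contract).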

Any other comments?

cc: @adrinjalali

Some open questions:

  1. I presume we want to forward metadata through all of the relevant methods (fit, predict, decision_function, score)?
  2. As a result, I'm not sure the current unit-testing approach is the best fit; do you have any suggestions? Should I try refactoring the existing unit-testing code to allow testing methods beyond fit?

github-actions bot commented Feb 21, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 523b365. Link to the linter CI: here

@adam2392 marked this pull request as ready for review on February 23, 2024 at 20:38
@adrinjalali (Member) left a comment

A few notes, thanks @adam2392

5 resolved review comments on sklearn/semi_supervised/_self_training.py (outdated)
@adam2392 (Contributor, Author) left a comment

Thank you for the review and pointers! I went through and fixed the issues in the docstrings and the Bunch.

4 resolved review comments on sklearn/semi_supervised/_self_training.py (outdated)
@adam2392 (Contributor, Author) commented:
Resolved conflicts. Feel free to ping me if there are additional changes desired.

@adam2392 (Contributor, Author) commented Apr 2, 2024

This PR should not be affected by #28734

2 resolved review comments on sklearn/semi_supervised/tests/test_self_training.py (outdated)
"X": X,
"y": y,
"preserves_metadata": True,
"estimator_routing_methods": ["fit"],
Member commented:

we can add the other methods here, can't we?

@adam2392 (Contributor, Author) commented Apr 11, 2024

I can add predict as well. But when I add score and decision_function, I get the following error:

 AssertionError: Expected dict_keys(['sample_weight']) vs dict_keys([])

I technically already test those other methods with the SimpleEstimator class; I don't know if that's sufficient. I assumed the Pipeline tests were kept separate because of the extra complexity of testing all the methods together rather than within this common test file, so SelfTrainingClassifier follows a similar pattern to Pipeline.
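
For reference, a hedged sketch of what the extended registry entry could look like (same keys as the snippet quoted above; whether all of these methods can be listed depends on the framework questions discussed below):

    {
        "X": X,
        "y": y,
        "preserves_metadata": True,
        "estimator_routing_methods": ["fit", "predict", "decision_function", "score"],
    }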

Member commented:

You might need to fix the Consumer object as well for this to work. Pipeline is a different story because it has several steps and transforms data in between; this object is much simpler and might not need the separate tests at all.
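
A rough illustration of that suggestion, with hypothetical names rather than the actual test helpers: the consuming sub-estimator's non-fit methods also need to accept and record the metadata so the common tests can assert it was routed there.

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin

    class ConsumingClassifierSketch(ClassifierMixin, BaseEstimator):
        def fit(self, X, y, sample_weight=None):
            # Record what arrived so a test can assert the routing happened.
            self.fit_metadata_ = {"sample_weight": sample_weight}
            self.classes_ = np.unique(y)
            return self

        def predict(self, X):
            return np.full(len(X), self.classes_[0])

        def score(self, X, y, sample_weight=None):
            # Recording the metadata here is what lets a routing test check it.
            self.score_metadata_ = {"sample_weight": sample_weight}
            return 1.0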

@adam2392 (Contributor, Author) commented:

Okay, I think I got this working now. I reverted the Pipeline SimpleEstimator back and leveraged the entire framework in test_metaestimators_metadata_routing.py instead. The unit tests seem to pass locally.

@adam2392 (Contributor, Author) commented:

Actually, when I add score to the list of methods to test, I get the following errors. It's challenging for me to figure out why this is occurring. Do you have any pointers?

                # `set_{method}_request({metadata}==True)` on the underlying objects
>               set_requests(
                    estimator,
                    method_mapping=method_mapping,
                    methods=[method_name],
                    metadata_name=key,
                )

sklearn/tests/test_metaestimators_metadata_routing.py:692: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sklearn/tests/test_metaestimators_metadata_routing.py:548: in set_requests
    set_request_for_method(**{metadata_name: value})
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (), kw = {'metadata': True}

    def func(*args, **kw):
        """Updates the request for provided parameters
    
        This docstring is overwritten below.
        See REQUESTER_DOC for expected functionality
        """
        if not _routing_enabled():
            raise RuntimeError(
                "This method is only available when metadata routing is enabled."
                " You can enable it using"
                " sklearn.set_config(enable_metadata_routing=True)."
            )
    
        if self.validate_keys and (set(kw) - set(self.keys)):
>           raise TypeError(
                f"Unexpected args: {set(kw) - set(self.keys)} in {self.name}. "
                f"Accepted arguments are: {set(self.keys)}"
            )
E           TypeError: Unexpected args: {'metadata'} in score. Accepted arguments are: {'sample_weight'}

sklearn/utils/_metadata_requests.py:1264: TypeError
============================================== short test summary info ===============================================
FAILED sklearn/tests/test_metaestimators_metadata_routing.py::test_error_on_missing_requests_for_sub_estimator[SelfTrainingClassifier]
FAILED sklearn/tests/test_metaestimators_metadata_routing.py::test_setting_request_on_sub_estimator_removes_error[SelfTrainingClassifier]
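
For context on the failure above: set_{method}_request validates its keyword arguments against the parameters the underlying method actually declares, so requesting a generic metadata key on a score that only knows sample_weight raises exactly this TypeError. A minimal reproduction outside the test framework (LogisticRegression is used here only as an example of a classifier whose score accepts sample_weight):

    from sklearn import set_config
    from sklearn.linear_model import LogisticRegression

    set_config(enable_metadata_routing=True)

    est = LogisticRegression()
    est.set_score_request(sample_weight=True)  # accepted: `score` declares sample_weight
    est.set_score_request(metadata=True)       # raises TypeError: Unexpected args: {'metadata'} in score.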

@glemaitre changed the title from "FEA Metadata routing for SelfTrainingClassifier" to "FEA SLEP006: Metadata routing for SelfTrainingClassifier" on May 16, 2024
@adam2392 (Contributor, Author) commented:
Apologies for the delay, @adrinjalali. Assuming the CIs are green, this should now be fixed according to your previous comments.

Comment on lines +652 to +655
+            if method_name in ["fit", "partial_fit", "score"]:
+                # `fit`, `partial_fit`, 'score' accept y, others don't.
                 method(X, y, **method_kwargs)
-            except TypeError:
+            else:
@adam2392 (Contributor, Author) commented:

The previous try/except pattern made the errors very hard to debug and was not explicit. This pattern makes it explicit which methods take y and which do not.
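
A sketch of the resulting explicit dispatch (the body of the else branch is assumed here, since the diff above only shows the branch being introduced):

    if method_name in ["fit", "partial_fit", "score"]:
        # `fit`, `partial_fit`, and `score` accept y; the other methods don't.
        method(X, y, **method_kwargs)
    else:
        method(X, **method_kwargs)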

@@ -628,14 +645,14 @@ def test_error_on_missing_requests_for_sub_estimator(metaestimator):
         set_requests(
             estimator,
             method_mapping=metaestimator.get("method_mapping", {}),
-            methods=["fit"],
+            methods=[method_name],
@adam2392 (Contributor, Author) commented:

When score is called, it should unset the method request for score, not fit.

@adam2392 requested a review from adrinjalali on May 28, 2024 at 15:18
Projects: In Progress

2 participants