
FEA SLEP006: Metadata routing for SelfTrainingClassifier #28494

Open
wants to merge 28 commits into base: main
Conversation

@adam2392 (Contributor) commented Feb 21, 2024

Reference Issues/PRs

Towards: #22893

What does this implement/fix? Explain your changes.

Implements metadata routing for SelfTrainingClassifier. The unit tests need to cover scoring, decision_function, and prediction, so I am leaning towards adding this as a dedicated unit test, possibly with something like Pipeline.

Alternatively, it could fit inside the existing unit-testing framework with the other classifiers such as BaggingClassifier.
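
For context, here is a minimal sketch of the intended user-facing behavior, assuming the routing added in this PR lands (illustrative only, not the merged API): the sub-estimator declares which metadata it consumes, and SelfTrainingClassifier forwards it.

    from sklearn import set_config
    from sklearn.linear_model import LogisticRegression
    from sklearn.semi_supervised import SelfTrainingClassifier

    set_config(enable_metadata_routing=True)

    # The wrapped classifier must explicitly request the metadata it consumes.
    inner = LogisticRegression().set_fit_request(sample_weight=True)
    clf = SelfTrainingClassifier(inner)

    # With routing in place, the meta-estimator advertises the routed keys ...
    print(clf.get_metadata_routing())
    # ... and `clf.fit(X, y, sample_weight=w)` would forward `w` to `inner.fit`
    # (y marks unlabeled samples with -1, per SelfTrainingClassifier's contract).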

Any other comments?

cc: @adrinjalali

Some open questions:

  1. I presume we want to forward metadata through all of the relevant methods (fit, predict, decision_function, score)?
  2. As a result, I'm not sure the current unit-testing approach is the best fit; do you have any suggestions? Should I try refactoring the existing unit-testing code to allow testing methods beyond fit?

github-actions bot commented Feb 21, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 523b365. Link to the linter CI: here

@adam2392 marked this pull request as ready for review on February 23, 2024 at 20:38
@adrinjalali (Member) left a comment

A few notes, thanks @adam2392

5 resolved review comments on sklearn/semi_supervised/_self_training.py (outdated)
@adam2392 (Contributor, Author) left a comment

Thank you for the review and pointers! I went through and fixed the issues in the docstrings and the Bunch.

4 resolved review comments on sklearn/semi_supervised/_self_training.py (outdated)
@adam2392 (Contributor, Author) commented:
Resolved conflicts. Feel free to ping me if there are additional changes desired.

@adam2392 (Contributor, Author) commented Apr 2, 2024

This PR should not be affected by #28734

2 resolved review comments on sklearn/semi_supervised/tests/test_self_training.py (outdated)
"X": X,
"y": y,
"preserves_metadata": True,
"estimator_routing_methods": ["fit"],
Member commented:

we can add the other methods here, can't we?

@adam2392 (Contributor, Author) commented Apr 11, 2024

I can add predict as well. But when I add score and decision_function, I get the following error:

 AssertionError: Expected dict_keys(['sample_weight']) vs dict_keys([])

I technically already test those other methods with the SimpleEstimator class; I don't know if that's sufficient. I assumed the Pipeline tests were kept separate because of the extra complexity of testing all the methods together rather than within this common test file, so SelfTrainingClassifier follows a similar pattern to Pipeline.
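
For reference, a hedged sketch of what the extended registry entry could look like (same keys as the snippet quoted above; whether all of these methods can be listed depends on the framework questions discussed below):

    {
        "X": X,
        "y": y,
        "preserves_metadata": True,
        "estimator_routing_methods": ["fit", "predict", "decision_function", "score"],
    }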

Member commented:

You might need to fix the Consumer object as well for this to work. Pipeline is a different story because it has several steps and transforms data in between; this object is much simpler and might not need the separate tests at all.
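
A rough illustration of that suggestion, with hypothetical names rather than the actual test helpers: the consuming sub-estimator's non-fit methods also need to accept and record the metadata so the common tests can assert it was routed there.

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin

    class ConsumingClassifierSketch(ClassifierMixin, BaseEstimator):
        def fit(self, X, y, sample_weight=None):
            # Record what arrived so a test can assert the routing happened.
            self.fit_metadata_ = {"sample_weight": sample_weight}
            self.classes_ = np.unique(y)
            return self

        def predict(self, X):
            return np.full(len(X), self.classes_[0])

        def score(self, X, y, sample_weight=None):
            # Recording the metadata here is what lets a routing test check it.
            self.score_metadata_ = {"sample_weight": sample_weight}
            return 1.0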

@adam2392 (Contributor, Author) commented:

Okay, I think I got this working now. I reverted the Pipeline SimpleEstimator back and leveraged the entire framework in test_metaestimators_metadata_routing.py instead. The unit tests seem to pass locally.

@adam2392 (Contributor, Author) commented:

Actually, when I add score to the list of methods to test, I get the following errors. It's challenging for me to figure out why this is occurring. Do you have any pointers?

                # `set_{method}_request({metadata}==True)` on the underlying objects
>               set_requests(
                    estimator,
                    method_mapping=method_mapping,
                    methods=[method_name],
                    metadata_name=key,
                )

sklearn/tests/test_metaestimators_metadata_routing.py:692: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sklearn/tests/test_metaestimators_metadata_routing.py:548: in set_requests
    set_request_for_method(**{metadata_name: value})
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (), kw = {'metadata': True}

    def func(*args, **kw):
        """Updates the request for provided parameters
    
        This docstring is overwritten below.
        See REQUESTER_DOC for expected functionality
        """
        if not _routing_enabled():
            raise RuntimeError(
                "This method is only available when metadata routing is enabled."
                " You can enable it using"
                " sklearn.set_config(enable_metadata_routing=True)."
            )
    
        if self.validate_keys and (set(kw) - set(self.keys)):
>           raise TypeError(
                f"Unexpected args: {set(kw) - set(self.keys)} in {self.name}. "
                f"Accepted arguments are: {set(self.keys)}"
            )
E           TypeError: Unexpected args: {'metadata'} in score. Accepted arguments are: {'sample_weight'}

sklearn/utils/_metadata_requests.py:1264: TypeError
============================================== short test summary info ===============================================
FAILED sklearn/tests/test_metaestimators_metadata_routing.py::test_error_on_missing_requests_for_sub_estimator[SelfTrainingClassifier]
FAILED sklearn/tests/test_metaestimators_metadata_routing.py::test_setting_request_on_sub_estimator_removes_error[SelfTrainingClassifier]
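
For context on the failure above: set_{method}_request validates its keyword arguments against the parameters the underlying method actually declares, so requesting a generic metadata key on a score that only knows sample_weight raises exactly this TypeError. A minimal reproduction outside the test framework (LogisticRegression is used here only as an example of a classifier whose score accepts sample_weight):

    from sklearn import set_config
    from sklearn.linear_model import LogisticRegression

    set_config(enable_metadata_routing=True)

    est = LogisticRegression()
    est.set_score_request(sample_weight=True)  # accepted: `score` declares sample_weight
    est.set_score_request(metadata=True)       # raises TypeError: Unexpected args: {'metadata'} in score.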

@glemaitre changed the title from "FEA Metadata routing for SelfTrainingClassifier" to "FEA SLEP006: Metadata routing for SelfTrainingClassifier" on May 16, 2024
@adam2392 (Contributor, Author) commented:
Apologies for the delay, @adrinjalali. Assuming the CIs are green, this should now be fixed according to your previous comments.

Comment on lines +652 to +655
+            if method_name in ["fit", "partial_fit", "score"]:
+                # `fit`, `partial_fit`, 'score' accept y, others don't.
                 method(X, y, **method_kwargs)
-            except TypeError:
+            else:
@adam2392 (Contributor, Author) commented:

The previous try/except pattern made the errors very hard to debug and was not explicit. This pattern makes it explicit which methods take y and which do not.
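
A sketch of the resulting explicit dispatch (the body of the else branch is assumed here, since the diff above only shows the branch being introduced):

    if method_name in ["fit", "partial_fit", "score"]:
        # `fit`, `partial_fit`, and `score` accept y; the other methods don't.
        method(X, y, **method_kwargs)
    else:
        method(X, **method_kwargs)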

@@ -628,14 +645,14 @@ def test_error_on_missing_requests_for_sub_estimator(metaestimator):
         set_requests(
             estimator,
             method_mapping=metaestimator.get("method_mapping", {}),
-            methods=["fit"],
+            methods=[method_name],
@adam2392 (Contributor, Author) commented:

When score is called, it should unset the method request for score, not fit.

@adam2392 requested a review from adrinjalali on May 28, 2024 at 15:18
Projects: In Progress

2 participants