FEA SLEP006: Metadata routing for SelfTrainingClassifier
#28494
base: main
Conversation
A few notes, thanks @adam2392
Signed-off-by: Adam Li <adam2392@gmail.com>
Thank you for the review and pointers! I went through and fixed the issues in the docstrings and `Bunch`.
Signed-off-by: Adam Li <adam2392@gmail.com>
…learn into self-learn-meta
Resolved conflicts. Feel free to ping me if there are additional changes desired.
Signed-off-by: Adam Li <adam2392@gmail.com>
This PR should not be affected by #28734
"X": X, | ||
"y": y, | ||
"preserves_metadata": True, | ||
"estimator_routing_methods": ["fit"], |
we can add the other methods here, can't we?
I can add `predict` as well. But when I add `score` and `decision_function`, I get the following error:
AssertionError: Expected dict_keys(['sample_weight']) vs dict_keys([])
I already test those other methods with the `SimpleEstimator` class. I don't know if that's sufficient? I assumed that the `Pipeline` tests stand on their own because of the extra complexity of testing all the methods together, versus within this common test file. Thus `SelfTrainingClassifier` follows a similar pattern to `Pipeline`.
You might need to fix the Consumer object as well for this to work. Pipeline is a different story cause it has several steps and transforming data in between and all, this object is much simpler and it might not need the separate tests at all.
Okay, I think I got this working now. I reverted the `Pipeline` `SimpleEstimator` back and leveraged the entire framework in test_metaestimators_metadata_routing.py instead. The unit tests seem to pass locally.
Actually, I think when I add `score` to the list of methods to test, I get the following errors. It's challenging for me to figure out why this is occurring, though, to be honest. Was wondering if you had any pointers?
# `set_{method}_request({metadata}==True)` on the underlying objects
> set_requests(
estimator,
method_mapping=method_mapping,
methods=[method_name],
metadata_name=key,
)
sklearn/tests/test_metaestimators_metadata_routing.py:692:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/tests/test_metaestimators_metadata_routing.py:548: in set_requests
set_request_for_method(**{metadata_name: value})
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
args = (), kw = {'metadata': True}
def func(*args, **kw):
"""Updates the request for provided parameters
This docstring is overwritten below.
See REQUESTER_DOC for expected functionality
"""
if not _routing_enabled():
raise RuntimeError(
"This method is only available when metadata routing is enabled."
" You can enable it using"
" sklearn.set_config(enable_metadata_routing=True)."
)
if self.validate_keys and (set(kw) - set(self.keys)):
> raise TypeError(
f"Unexpected args: {set(kw) - set(self.keys)} in {self.name}. "
f"Accepted arguments are: {set(self.keys)}"
)
E TypeError: Unexpected args: {'metadata'} in score. Accepted arguments are: {'sample_weight'}
sklearn/utils/_metadata_requests.py:1264: TypeError
============================================== short test summary info ===============================================
FAILED sklearn/tests/test_metaestimators_metadata_routing.py::test_error_on_missing_requests_for_sub_estimator[SelfTrainingClassifier]
FAILED sklearn/tests/test_metaestimators_metadata_routing.py::test_setting_request_on_sub_estimator_removes_error[SelfTrainingClassifier]
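For context on the TypeError above: sklearn's generated `set_{method}_request` setters validate the requested keys against the parameters that actually appear in the method's signature, and the sub-estimator's `score` only accepts `sample_weight`. A hedged, self-contained mimic of that validation (names here are illustrative, not scikit-learn's actual internals):

```python
def make_request_setter(method_name, accepted_keys):
    """Mimic the key validation behind sklearn's generated
    `set_{method}_request` methods: reject any metadata name that is
    not a parameter of the target method."""
    accepted = set(accepted_keys)

    def setter(**kw):
        unexpected = set(kw) - accepted
        if unexpected:
            # mirrors the error raised in sklearn/utils/_metadata_requests.py
            raise TypeError(
                f"Unexpected args: {unexpected} in {method_name}. "
                f"Accepted arguments are: {accepted}"
            )
        return kw

    return setter


set_score_request = make_request_setter("score", ["sample_weight"])
set_score_request(sample_weight=True)  # accepted: in score's signature
try:
    set_score_request(metadata=True)  # rejected: not in score's signature
except TypeError as exc:
    print(exc)
```

This is why requesting a generic `metadata` key on `score` fails while `sample_weight` works.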
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
…learn into self-learn-meta
SelfTrainingClassifier
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Apologies for the delay @adrinjalali. Assuming the CIs are green, this should be fixed according to your previous comments.
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
+        if method_name in ["fit", "partial_fit", "score"]:
+            # `fit`, `partial_fit`, and `score` accept y, others don't.
+            method(X, y, **method_kwargs)
-        except TypeError:
+        else:
This try/except pattern makes it very hard to debug the errors and is not explicit. The new pattern makes it explicit which methods use y and which do not.
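The explicit pattern being discussed can be sketched as follows (a minimal illustration with hypothetical names, not the actual test helper):

```python
# Branch on a known set of methods that accept y, instead of calling with y
# unconditionally and silently swallowing the resulting TypeError.
METHODS_WITH_Y = {"fit", "partial_fit", "score"}


def call_estimator_method(estimator, method_name, X, y, **method_kwargs):
    """Call `estimator.<method_name>`, passing y only to methods that take it."""
    method = getattr(estimator, method_name)
    if method_name in METHODS_WITH_Y:
        # `fit`, `partial_fit`, and `score` accept y; the others don't.
        return method(X, y, **method_kwargs)
    return method(X, **method_kwargs)
```

A failure now surfaces as a real signature error at the call site rather than being masked by a broad `except TypeError`.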
@@ -628,14 +645,14 @@ def test_error_on_missing_requests_for_sub_estimator(metaestimator):
     set_requests(
         estimator,
         method_mapping=metaestimator.get("method_mapping", {}),
-        methods=["fit"],
+        methods=[method_name],
When `score` is called, it should unset the method request for `score`, not `fit`.
Reference Issues/PRs
Towards: #22893
What does this implement/fix? Explain your changes.
Implements metadata routing for `SelfTrainingClassifier`. The unit tests need to support scoring, decision_function, and prediction, so I am leaning towards adding this as a unit test, possibly with something like `Pipeline`? Or seeing if it can fit inside the existing unit-testing framework with the other classifiers like `BaggingClassifier`.
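The core idea of SLEP006 routing in a metaestimator can be sketched in a few lines. This is a toy illustration with invented class names, not scikit-learn's implementation: the sub-estimator declares which metadata its `fit` consumes (in scikit-learn this is done via `set_fit_request`), and the metaestimator forwards only the requested keys.

```python
class SubEstimator:
    """Toy consumer: declares that its fit() wants sample_weight."""

    def __init__(self):
        # in scikit-learn this flag is set via est.set_fit_request(sample_weight=True)
        self.fit_request = {"sample_weight": True}
        self.received = None

    def fit(self, X, y, sample_weight=None):
        self.received = sample_weight
        return self


class SelfTrainingLike:
    """Toy metaestimator: routes only the metadata the sub-estimator requested."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **metadata):
        routed = {
            key: value
            for key, value in metadata.items()
            if self.estimator.fit_request.get(key)
        }
        self.estimator.fit(X, y, **routed)
        return self
```

If the sub-estimator has not requested `sample_weight`, the metaestimator simply does not forward it, which is what the common tests for missing/explicit requests exercise.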
Any other comments?
cc: @adrinjalali
Some open questions: `fit`?