
ENH Add metadata routing for RANSACRegressor #28261

Merged
merged 39 commits into scikit-learn:main from metadata_RANSACRegressor on Feb 26, 2024

Conversation


@StefanieSenger StefanieSenger commented Jan 25, 2024

Reference Issues/PRs

Towards #22893

What does this implement/fix? Explain your changes.

This PR implements metadata routing for RANSACRegressor.

Any other comments?

Resolved issue:

  • RANSACRegressor.fit() routes to the sub-estimator's fit, predict and score. As a result, fit runs into UnsetMetadataPassedError when set_fit_request is set but set_score_request is not. The tests need to allow several set_{method}_requests to be set at once, as needed.


github-actions bot commented Jan 25, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 3af9455.

@adrinjalali adrinjalali self-requested a review January 29, 2024 10:51
Member

@adrinjalali adrinjalali left a comment


Not a complete review, but leaving the comments I worked on for now.

Resolved review threads on sklearn/linear_model/_ransac.py (×6) and sklearn/tests/metadata_routing_common.py.
"y": y,
"preserves_metadata": False,
"estimator_routing_methods": ["fit", "predict", "score"],
"requests_set_together": {"fit": ["predict", "score"]},
Member


Suggested change
"requests_set_together": {"fit": ["predict", "score"]},
"requests_set_together": {"fit": ["fit", "predict", "score"]},

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding fit is unnecessary, because when `fit` is used, `set_fit_request` has already been set.

Resolved review threads on sklearn/tests/test_metaestimators_metadata_routing.py (×2).
Contributor Author

@StefanieSenger StefanieSenger left a comment


@adrinjalali
Thanks for the review. I've implemented most of your suggestions and commented on the remaining few.

Comment on lines 442 to 445
f"{e}, which is used internally by `RANSACRegressor.fit()`."
f"Call `{estimator.__class__.__name__}.set_{{method}}_request("
"{metadata}=True)` for each metadata."
),
Contributor Author


ConsumingRegressor.set_score_request(sample_weight=True)

Do you mean I should add an example within the error message? I think this would clutter up people's consoles.

Resolved review thread on sklearn/tests/metadata_routing_common.py.
Resolved review threads on sklearn/tests/test_metaestimators_metadata_routing.py (×3).
@StefanieSenger
Contributor Author

From my side, there is one remaining issue:

test_ransac_fit_sample_weight and test_ransac_final_model_fit_sample_weight from test_ransac.py both raise:

y_pred = estimator.predict(X, **routed_params.estimator.predict)
E TypeError: LinearModel.predict() got an unexpected keyword argument 'sample_weight'

The reason is that RANSACRegressor never routed sample_weight to the underlying estimator's predict before (estimator.predict() is called from within RANSACRegressor's fit), but now it does, and estimator.predict() isn't prepared to accept sample_weight.

Actually, no regressor in scikit-learn currently seems to accept sample_weight in predict (at least I didn't find any), so there is no regressor we could use for the test, unless ConsumingRegressor were modified to have a coef_ attribute.

Either that, or: what are the 100 reasons that speak against integrating sample_weight into LinearModel._decision_function like this:

    def _decision_function(self, X, sample_weight=None):
        check_is_fitted(self)

        X = self._validate_data(X, accept_sparse=["csr", "csc", "coo"], reset=False)

        if sample_weight is not None:
            weighted_X = X * sample_weight.reshape(-1, 1)
            return (
                safe_sparse_dot(weighted_X, self.coef_.T, dense_output=True)
                + self.intercept_
            )
        else:
            return safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_

    def predict(self, X, sample_weight=None):
        """
        Predict using the linear model.

        Parameters
        ----------
        X : array-like or sparse matrix, shape (n_samples, n_features)
            Samples.

        sample_weight : array-like, default=None
            Sample weights.

        Returns
        -------
        C : array, shape (n_samples,)
            Returns predicted values.
        """
        return self._decision_function(X, sample_weight)

@adrinjalali

@StefanieSenger
Contributor Author

Update: ConsumingRegressor can be used for test_ransac_final_model_fit_sample_weight quite easily, but test_ransac_fit_sample_weight is more elaborate and needs a real regressor that can actually fit and predict, which we don't have as far as I know. The test would also be useless if it doesn't reflect a real scenario.

So I have finally realized that I should never have passed fit_params into estimator.predict().

@StefanieSenger StefanieSenger marked this pull request as ready for review February 1, 2024 12:57
Comment on lines 442 to 445
f"{e}, which is used internally by `RANSACRegressor.fit()`."
f"Call `{estimator.__class__.__name__}.set_{{method}}_request("
"{metadata}=True)` for each metadata."
),
Member


I mean, the whole message, not just a part of it.

Resolved review thread on sklearn/linear_model/_ransac.py.
    #     )
    #     return np.zeros(shape=(len(X),))
    def score(self, X, y, sample_weight="default", metadata="default"):
        self.predict(X, sample_weight="default", metadata="default")
Member


We don't need to add this, and our existing scorers certainly don't route metadata to predict.

Contributor Author


No, the existing score methods don't route metadata to predict. I went through all of them (I believe) to confirm that. Thus, this line is really not needed here.

Out of curiosity: what's the problem with sample_weight in predict? I can see it in all kinds of methods, just not in predict. I mean, if I modify my data before fitting, I need to modify it using the same pattern before predicting...

Resolved review threads on sklearn/tests/metadata_routing_common.py (×2).
Comment on lines 493 to 510
with pytest.raises(UnsetMetadataPassedError, match=re.escape(msg)):
    method = getattr(instance, method_name)
    method(X, y, **method_kwargs)
if "fit" not in method_name:
    # set request on fit
    set_requests(estimator, methods=["fit"], metadata_name=key)
    # make sure error message corresponding to `method_name`
    # is used for test
    if method_name == "predict":
        set_requests(estimator, methods=["score"], metadata_name=key)
    if method_name == "score":
        set_requests(estimator, methods=["predict"], metadata_name=key)
    # fit before calling method
    fit_method = getattr(instance, "fit")
    fit_method(X, y, **method_kwargs)
    if method_name == "predict":
        method(X, **method_kwargs)
    else:
        method(X, y, **method_kwargs)
Member


This section could be simplified as:

            method_requests = metaestimator.get("requests_set_together", {}).get(method_name, [])
            set_requests(estimator, methods=method_requests, metadata_name=key)
            with pytest.raises(UnsetMetadataPassedError, match=re.escape(msg)):
                method = getattr(instance, method_name)
                if "fit" not in method_name:
                    instance.fit(X, y)
                try:
                    # fit and partial_fit accept y, others don't.
                    method(X, y, **method_kwargs)
                except TypeError:
                    method(X, **method_kwargs)

We don't need to pass metadata to fit when checking metadata routing in predict and score.

Contributor Author


Yes: when any other routing method is called from within fit (as RANSACRegressor calls predict and score from within fit), we need to pass metadata to fit as well. The goal is to make sure that an error is raised if the user sets the request on fit but not on score, for instance, and then passes metadata (unaware that score also consumes it from within fit). If no metadata is passed, the error cannot be raised. Your suggestion doesn't cover that case.

This is the case I'm trying to modify this test for.

Note: when I wrote the test, I thought that we'd also prepare estimator.predict to be passed metadata (since ConsumingRegressor.predict was coded to receive metadata). This idea has changed since then (it would be too large an endeavour for this PR), and I will adjust this test accordingly. Also, I've used some of your suggestions to simplify this test.

Though I am still looking for a more generic version of it that's not so specific to RANSAC. Something like the following, maybe:

                if "fit" not in method_name:
                    # set request on fit
                    set_requests(estimator, methods=["fit"], metadata_name=key)
                    # make sure error message corresponding to `method_name`
                    # is used for test
                    requests_set_together = metaestimator.get("requests_set_together", {})
                    if method_name in requests_set_together.get("fit", {}):
                        # rotate through setting requests, leaving out only method_name
                        ...
                    # fit before calling method
                    instance.fit(X, y, **method_kwargs)
                try:
                    # `fit` and `partial_fit` accept y, others don't.
                    method(X, y, **method_kwargs)
                except TypeError:
                    method(X, **method_kwargs)

But I need to think about it a little more. Next time it could be a method other than fit that internally routes metadata to more than one sub-estimator method.

Comment on lines 554 to 568
if method_name in ["predict", "score"]:
    # fit before calling method
    set_requests(estimator, methods=["fit"], metadata_name=key)
    if requests_set_together:
        set_requests(
            estimator,
            methods=requests_set_together["fit"],
            metadata_name=key,
        )
    fit_method = getattr(instance, "fit")
    fit_method(X, y, **method_kwargs, **extra_method_args)
if method_name == "predict":
    method(X, **method_kwargs, **extra_method_args)
else:
    method(X, y, **method_kwargs, **extra_method_args)
Member


A similar pattern applies here.

Member

@adrinjalali adrinjalali left a comment


As for simpli

Resolved review thread on sklearn/linear_model/_ransac.py.
Comment on lines 442 to 445
f"{e}, which is used internally by `RANSACRegressor.fit()`."
f"Call `{estimator.__class__.__name__}.set_{{method}}_request("
"{metadata}=True)` for each metadata."
),
Member


Yes, exactly. Because I'm confused as to why `e` is at the beginning of the message.

Comment on lines 500 to 504
set_requests(estimator, methods=["fit"], metadata_name=key)
# make sure error message corresponding to `method_name`
# is used for test
if method_name != "score":
    set_requests(estimator, methods=["score"], metadata_name=key)
Member


The information on which metadata to set should be in the dictionary where we declare the tests. This:

        "requests_set_together": {"fit": ["fit", "score"]},

Then we don't need to special-case here.

Contributor Author


Yes, we still do need that special case here, because in this case we need to deviate from how the test flows.

The test rotates through fit, predict and score, and if we are testing that anything other than score raises an error, we need to set the score request, not the ones set together with fit.

Comment on lines 442 to 445
f"{e}, which is used internally by `RANSACRegressor.fit()`."
f"Call `{estimator.__class__.__name__}.set_{{method}}_request("
"{metadata}=True)` for each metadata."
),
Member


Two things I'm not sure about here:

  • metadata and method are not replaced with actual values, and I'm not sure this makes the error message more understandable.
  • This is a pattern repeated pretty much everywhere we call process_routing; do we want to change all of those to this pattern? I don't think we should, it's really verbose. We could maybe improve the error message raised by the validation inside process_routing instead.

Resolved review threads on sklearn/linear_model/_ransac.py (×2) and sklearn/tests/test_metaestimators_metadata_routing.py.
Comment on lines 342 to 347
- requests_set_together: a dict that defines which set_{method}_requests need
to be set together with the key; used in case a router routes to different
methods of the sub-estimator from within the same meta-estimator's method.
For instance, {"fit": ["score"]} would signal that
`estimator.set_fit_request` requires `estimator.set_score_request` to be set
as well.
Member


This is basically the method mapping. I think when present, it should include all mappings instead of adding what's on top of a one-to-one mapping, which would also make the test code easier and more straightforward. It can also be called method_mapping.

Contributor Author


But if we include the one-to-one mapping, in this case fit to fit, then we would need to check whether method_mapping exists very early in the test and have a big divide between how to handle the two different cases.

I had thought about this before and came to the conclusion that it would make the test more complicated. After all, it's better to handle this further down in the test; this is why I designed it this way. Would you mind approving it as it is, @adrinjalali?

@StefanieSenger
Contributor Author

I went through the comments and reacted to all of them, @adrinjalali. Please have another look.

I am wondering about the codecov warnings for lines 417 and 424 in test_metaestimators_metadata_routing.py. Do we have to test this function (which is itself used by a test)? If so, where is the rest of this function tested? I think it's nowhere.

Do you also want to have a look, @glemaitre?

Member

@adrinjalali adrinjalali left a comment


d3ac747 shows my idea of how to do the multiple mapping for methods.

Comment on lines 440 to 451
try:
    routed_params = process_routing(self, "fit", **fit_params)
except UnsetMetadataPassedError as e:
    raise UnsetMetadataPassedError(
        message=(
            f"{e}, which is used internally by `RANSACRegressor.fit()`. "
            f"Call `{estimator.__class__.__name__}.set_{{method}}_request("
            "{metadata}=True)` for each metadata."
        ),
        unrequested_params=e.unrequested_params,
        routed_params=e.routed_params,
    ) from e
Member


I can't find the previous messages about this.

I don't think we should modify the exception here. If the exception needs to be improved, it needs to be handled for every estimator, and we shouldn't expect this much boilerplate from meta-estimators.

@StefanieSenger
Contributor Author

StefanieSenger commented Feb 22, 2024

Okay, I have now reverted my re-raising of the UnsetMetadataPassedError here and will introduce it as a general improvement in a separate PR.
I have also moved the re-setting of the requests outside of the try/except, because it always works; the try block now only contains the code that might fail.
Please have another go, @adrinjalali. What do you think about the CodeCov warnings? Is this something I should take care of, and if so, where? I think the rest of this helper function isn't tested either. Could we have it start with an underscore so that CodeCov accepts it?

Member

@adrinjalali adrinjalali left a comment


Otherwise LGTM.

@OmarManzoor @glemaitre you could have a look here maybe?

Resolved review thread on sklearn/linear_model/_ransac.py.
Contributor

@OmarManzoor OmarManzoor left a comment


Generally looks good. Just a minor change suggested. Also, code coverage shows two uncovered lines, which relate to the new ValueErrors we are raising inside the test_metaestimators_metadata_routing file. Do we need to cover them, @adrinjalali?

Resolved review thread on sklearn/tests/metadata_routing_common.py.
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Member

@adrinjalali adrinjalali left a comment


I don't think we need to cover those two lines, but it's nice to make it explicit that we're not covering them.

Resolved review threads on sklearn/tests/test_metaestimators_metadata_routing.py (×2).
OmarManzoor and others added 2 commits February 26, 2024 17:32
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Contributor

@OmarManzoor OmarManzoor left a comment


LGTM. Thanks @StefanieSenger

@OmarManzoor OmarManzoor enabled auto-merge (squash) February 26, 2024 12:53
@OmarManzoor OmarManzoor merged commit e2f9530 into scikit-learn:main Feb 26, 2024
28 checks passed
@StefanieSenger StefanieSenger deleted the metadata_RANSACRegressor branch February 26, 2024 13:53