Skip to content

Conversation

@StefanieSenger
Copy link
Member

Reference Issues/PRs

Towards #22893

What does this implement/fix? Explain your changes.

This PR adds metadata routing to FeatureUnion and the corresponding tests.

I was confused about why we cannot do set_fit_transform_request. I know we can work around it, but from a user's perspective it's not self-explanatory.

@github-actions
Copy link

github-actions bot commented Jan 21, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 0304593. Link to the linter CI: here

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_fit_transform_request isn't well-defined since it's a composite method, therefore it always has to be a combination of routing for fit and transform.

@adrinjalali adrinjalali self-requested a review February 2, 2024 15:05
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the test about making sure get_metadata_routing works on an unfitted estimator here.

method_mapping=MethodMapping()
.add(callee="fit", caller="fit")
.add(callee="fit_transform", caller="fit")
.add(callee="fit_transform", caller="fit_transform"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't do explicit fit_transform here, since it's a compound method.

Copy link
Member Author

@StefanieSenger StefanieSenger Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see:

From FeatureUnion's fit_transform we need to prepare for routing into three directions: to fit, to transform and to fit_transform. This is because of the way _fit_transform_one is designed:

        if hasattr(transformer, "fit_transform"):
            res = transformer.fit_transform(X, y, **params.get("fit_transform", {}))
        else:
            res = transformer.fit(X, y, **params.get("fit", {})).transform(
                X, **params.get("transform", {})
            )

It is designed this way, because a transformer can use either fit and transform or fit_transform and in fact, metadata routing is only possible, if it overwrites TransformerMixin's fit_transform.

But yes, we don't route from the FeaturUnion's fit to the estimator's fit_transform.

I've changed the routing to my new understandings.

Copy link
Member Author

@StefanieSenger StefanieSenger Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually don't have a test for transformer.fit().transform(). I made one using a new class ConsumingNoFitTransformTransformer, that is special, because it doesn't inherit from TransformerMixin. it has a long name, but it's a speaking name. :)

We need this new class to make transformer.fit().transform() possible.

ConsumingClassifier's fit and transform would not be used, because it already has a fit_transform, that would be used instead.

@adrinjalali adrinjalali self-requested a review February 13, 2024 12:11
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

routed_fit_transform_params = Bunch()
for name, _ in self.transformer_list:
routed_fit_transform_params[name] = Bunch(fit_transform={})
routed_fit_transform_params[name].fit_transform = fit_params
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
routed_fit_transform_params[name].fit_transform = fit_params
routed_fit_transform_params[name].fit_transform = fit_params
routed_fit_transform_params[name].fit = fit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it needs to be like that:

                if hasattr(obj, "fit_transform"):
                    routed_params[name] = Bunch(fit_transform={})
                    routed_params[name].fit_transform = params
                else:
                    routed_params[name] = Bunch(fit={})
                    routed_params[name] = Bunch(transform={})
                    routed_params[name].fit = params
                    routed_params[name].transform = params

It accounts for the metadata to be passed to the correct methods of the sub-transformer. I've also added a test for that (test_feature_union_fit_params_without_fit_transform).

@StefanieSenger
Copy link
Member Author

I've addressed all your review comments, @adrinjalali.
Would you have a fresh look if the PR is alright now?

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

@StefanieSenger
Copy link
Member Author

Okay, I've taken care of that, @adrinjalali.

@glemaitre glemaitre self-requested a review February 23, 2024 13:38
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good. Just some questions reagarding transform and thus the renaming from fit_params to params.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

@glemaitre
Copy link
Member

@StefanieSenger I think that we are missing forwarding the metadata in the transform method of FeatureUnion. Quickly speaking with @adrinjalali, it seems this is something that we overlooked here.

@StefanieSenger
Copy link
Member Author

Oh yes, @glemaitre and @adrinjalali, sorry for the oversight.
I've added that now.

@StefanieSenger
Copy link
Member Author

I've also simplified the Metadata Router of FeatureUnion (by aligning if and else conditions), because they are almost identical and I think no harm would happen, by providing more possibilities than an object would use.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @StefanieSenger

@glemaitre glemaitre merged commit b1ce4c1 into scikit-learn:main Mar 5, 2024
@StefanieSenger StefanieSenger deleted the metadata_FeatureUnion branch March 6, 2024 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants