Skip to content

API ColumnTransformer.transformers_: passthrough -> FunctionTransformer #27204

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Oct 5, 2023

Conversation

adrinjalali
Copy link
Member

While working on adding metadata routing to ColumnTransformer, I was quite confused by certain parts of the code. This PR adds a few docstrings to help future poor souls reading the code.

With @glemaitre we also noticed "passthrough" is kept as is instead of being replaced by the corresponding FunctionTransformer in .transformers_. Putting the fitted FunctionTransformer in the fitted transformers_ attribute would make sense, and it simplifies the code quite a bit.

Some other parts are also cleaned up as a result, and also discovered a bug which was detected once "passthrough" was replaced (and our tests detected the bug).

._iter(...) now has more explicit args to filter steps, and there's no more a need for replace_strings argument.

cc @thomasjpfan

@github-actions
Copy link

github-actions bot commented Aug 29, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 70f6e68. Link to the linter CI: here

@glemaitre glemaitre self-requested a review August 29, 2023 16:58
Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

adrinjalali and others added 2 commits September 1, 2023 03:00
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! The title needs to change to a API tag, because ColumnTransformer's API changes.

Comment on lines 104 to 105
- |API| :class:`~compose.ColumnTransformer` now replaces `"passthrough"` with a
corresponding :class:`~preprocessing.FunctionTransformer` in the fitted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always wanted this change, but it is backward incompatible. I am undecided on going through a deprecation cycle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a FIX to me, and we really won't be breaking many people's code by this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the backward incompatibility problematic. I would have considered as a fix if we would have only transformers in transformers_ and no string. But this is not the case. I can imagine some user code relying on the "passthrough" or "drop" strings. I know that this is annoying to handle those but I would rather be more confortable with a proper deprecation cycle.

I can imagine that we could a keyword legacy_mode_attributes that could allow replace back the FunctionTransformer in the transformers_ property and raise the deprecation warnings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would mean introducing the arg, do the deprecation, then introduce a deprecation for the arg that we just added to deprecate something which should never have been there in the first place. I rather not do that and annoy very few people whose code might break.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any Python trick that would make string comparisons FunctionTransformer() == 'passthrough' evaluate to True, but keep repr as intended?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see @lorentzenchr and yourself OK to consider it as a bug fix. So let's go with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just summed up the net benefit of both paths und the bugfix version won with less estimated trouble in total/for everyone.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is what it is, but I'm going to throw in the opinion that this should have gone through a deprecation cycle. I have a production pipeline containing a very long-running incremental estimator that can't predict on 1.4.0, and is nontrivial to retrain from scratch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we generally do not guarantee any form of forward compatibility of trained estimators. So train with 1.3 and load and predict with 1.4 won't work (or just by chance).
The pipeline itself that you have is very likely to work with this PR merged as long as you don't directly access the transformers_ attribute of a ColumnTransformer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though perhaps by coincidence there's previously been enough compatibility such that I've been able to decouple our dependency upgrades and re-fitting our stateful transformers to prevent service downtime. This update threw the following error in our test suite (due to "pass through")

>       res = transformer.transform(X, **params.transform)
E       AttributeError: 'str' object has no attribute 'transform'

../../../../.cache/pypoetry/virtualenvs/pricingbanditservice-pioIYUID-py3.11/lib/python3.11/site-packages/sklearn/pipeline.py:1283: AttributeError

Fortunately, the update is backward compatible enough, so the solution for us was to train a new pipeline on 1.4.0, start serving it on 1.3.2, then do the dependency update.

@adrinjalali adrinjalali changed the title MNT some ColumnTransformer cleanup and add a few docstrings API some ColumnTransformer cleanup and add a few docstrings Sep 2, 2023
@adrinjalali adrinjalali changed the title API some ColumnTransformer cleanup and add a few docstrings API ColumnTransformer.transformers_: passtgrough -> FunctionTransformer Sep 2, 2023
@thomasjpfan thomasjpfan changed the title API ColumnTransformer.transformers_: passtgrough -> FunctionTransformer API ColumnTransformer.transformers_: passthrough -> FunctionTransformer Sep 3, 2023
of tuples of len 2.
transformers, dropping the columns.

DO NOT USE: This is for the implementation of get_params via
Copy link
Member

@betatim betatim Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Why the loud shouty "DO NOT USE"? Can we turn it into something more informative? Otherwise I feel like I didn't learn anything that I didn't already know from _transformers being a private attribute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended the comment.

This is not rendered in our docs, so the note is not meant for end users. This is for us and future poor maintainers who will be confused. I started by using this instead of self._iter to iterate through the transformers and then I realized I shouldn't.

Comment on lines +724 to +728
if func is _fit_transform_one:
fitted = False
else: # func is _transform_one
fitted = True

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two reasons to keep fitted as a separate parameter: (i) explicit is better than implicit; (ii) potentially when partial_fit is implemented we'll need fitted=False for the first call and fitted=True for subsequent calls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was as a result of a previous review, and I rather keep it as is now.

@adrinjalali
Copy link
Member Author

Should we merge this then?

@lorentzenchr
Copy link
Member

Should we merge this then?

We need a second reviewer's approval, don't we?

@glemaitre
Copy link
Member

Let me review this one. I'll do once I branch for the release 1.3.1

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart of making the deprecation explicit, I am fine with the code changes.

If True, 'drop' transformers are filtered out.

skip_empty_columns : bool
If True, transformers with empty selected columns are filtered out.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also have Yields section to show that this is a generator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adding something, not sure if you like it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this is fine with me.

Comment on lines 104 to 105
- |API| :class:`~compose.ColumnTransformer` now replaces `"passthrough"` with a
corresponding :class:`~preprocessing.FunctionTransformer` in the fitted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the backward incompatibility problematic. I would have considered as a fix if we would have only transformers in transformers_ and no string. But this is not the case. I can imagine some user code relying on the "passthrough" or "drop" strings. I know that this is annoying to handle those but I would rather be more confortable with a proper deprecation cycle.

I can imagine that we could a keyword legacy_mode_attributes that could allow replace back the FunctionTransformer in the transformers_ property and raise the deprecation warnings.

@glemaitre glemaitre self-requested a review September 25, 2023 13:17
@glemaitre glemaitre requested review from glemaitre and removed request for glemaitre September 25, 2023 13:17
@glemaitre
Copy link
Member

glemaitre commented Oct 5, 2023

@adrinjalali Could you change make the entry as a |Fix| (even if we have both tags.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@glemaitre glemaitre enabled auto-merge (squash) October 5, 2023 10:01
@glemaitre
Copy link
Member

Adding auto-merge here.

@glemaitre glemaitre merged commit 24b782f into scikit-learn:main Oct 5, 2023
@adrinjalali adrinjalali deleted the columntransformer/clean branch October 5, 2023 11:47
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
…er (scikit-learn#27204)

Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants