-
-
Notifications
You must be signed in to change notification settings - Fork 26k
API ColumnTransformer.transformers_: passthrough -> FunctionTransformer #27204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API ColumnTransformer.transformers_: passthrough -> FunctionTransformer #27204
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! The title needs to change to a API
tag, because ColumnTransformer
's API changes.
doc/whats_new/v1.4.rst
Outdated
- |API| :class:`~compose.ColumnTransformer` now replaces `"passthrough"` with a | ||
corresponding :class:`~preprocessing.FunctionTransformer` in the fitted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always wanted this change, but it is backward incompatible. I am undecided on going through a deprecation cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a FIX to me, and we really won't be breaking many people's code by this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the backward incompatibility problematic. I would have considered as a fix if we would have only transformers in transformers_
and no string. But this is not the case. I can imagine some user code relying on the "passthrough" or "drop" strings. I know that this is annoying to handle those but I would rather be more confortable with a proper deprecation cycle.
I can imagine that we could a keyword legacy_mode_attributes
that could allow replace back the FunctionTransformer
in the transformers_
property and raise the deprecation warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would mean introducing the arg, do the deprecation, then introduce a deprecation for the arg that we just added to deprecate something which should never have been there in the first place. I rather not do that and annoy very few people whose code might break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any Python trick that would make string comparisons FunctionTransformer() == 'passthrough'
evaluate to True, but keep repr as intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see @lorentzenchr and yourself OK to consider it as a bug fix. So let's go with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just summed up the net benefit of both paths und the bugfix version won with less estimated trouble in total/for everyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is what it is, but I'm going to throw in the opinion that this should have gone through a deprecation cycle. I have a production pipeline containing a very long-running incremental estimator that can't predict on 1.4.0, and is nontrivial to retrain from scratch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we generally do not guarantee any form of forward compatibility of trained estimators. So train with 1.3 and load and predict with 1.4 won't work (or just by chance).
The pipeline itself that you have is very likely to work with this PR merged as long as you don't directly access the transformers_
attribute of a ColumnTransformer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, though perhaps by coincidence there's previously been enough compatibility such that I've been able to decouple our dependency upgrades and re-fitting our stateful transformers to prevent service downtime. This update threw the following error in our test suite (due to "pass through")
> res = transformer.transform(X, **params.transform)
E AttributeError: 'str' object has no attribute 'transform'
../../../../.cache/pypoetry/virtualenvs/pricingbanditservice-pioIYUID-py3.11/lib/python3.11/site-packages/sklearn/pipeline.py:1283: AttributeError
Fortunately, the update is backward compatible enough, so the solution for us was to train a new pipeline on 1.4.0, start serving it on 1.3.2, then do the dependency update.
of tuples of len 2. | ||
transformers, dropping the columns. | ||
|
||
DO NOT USE: This is for the implementation of get_params via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Why the loud shouty "DO NOT USE"? Can we turn it into something more informative? Otherwise I feel like I didn't learn anything that I didn't already know from _transformers
being a private attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extended the comment.
This is not rendered in our docs, so the note is not meant for end users. This is for us and future poor maintainers who will be confused. I started by using this instead of self._iter
to iterate through the transformers and then I realized I shouldn't.
…it-learn into columntransformer/clean
if func is _fit_transform_one: | ||
fitted = False | ||
else: # func is _transform_one | ||
fitted = True | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two reasons to keep fitted
as a separate parameter: (i) explicit is better than implicit; (ii) potentially when partial_fit
is implemented we'll need fitted=False
for the first call and fitted=True
for subsequent calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was as a result of a previous review, and I rather keep it as is now.
Should we merge this then? |
We need a second reviewer's approval, don't we? |
Let me review this one. I'll do once I branch for the release 1.3.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart of making the deprecation explicit, I am fine with the code changes.
If True, 'drop' transformers are filtered out. | ||
|
||
skip_empty_columns : bool | ||
If True, transformers with empty selected columns are filtered out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also have Yields
section to show that this is a generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding something, not sure if you like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this is fine with me.
doc/whats_new/v1.4.rst
Outdated
- |API| :class:`~compose.ColumnTransformer` now replaces `"passthrough"` with a | ||
corresponding :class:`~preprocessing.FunctionTransformer` in the fitted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the backward incompatibility problematic. I would have considered as a fix if we would have only transformers in transformers_
and no string. But this is not the case. I can imagine some user code relying on the "passthrough" or "drop" strings. I know that this is annoying to handle those but I would rather be more confortable with a proper deprecation cycle.
I can imagine that we could a keyword legacy_mode_attributes
that could allow replace back the FunctionTransformer
in the transformers_
property and raise the deprecation warnings.
@adrinjalali Could you change make the entry as a |Fix| (even if we have both tags. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adding auto-merge here. |
…er (scikit-learn#27204) Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
While working on adding metadata routing to
ColumnTransformer
, I was quite confused by certain parts of the code. This PR adds a few docstrings to help future poor souls reading the code.With @glemaitre we also noticed
"passthrough"
is kept as is instead of being replaced by the correspondingFunctionTransformer
in.transformers_
. Putting the fittedFunctionTransformer
in the fittedtransformers_
attribute would make sense, and it simplifies the code quite a bit.Some other parts are also cleaned up as a result, and also discovered a bug which was detected once
"passthrough"
was replaced (and our tests detected the bug).._iter(...)
now has more explicit args to filter steps, and there's no more a need forreplace_strings
argument.cc @thomasjpfan