-
-
Notifications
You must be signed in to change notification settings - Fork 26.3k
ENH Implement get_feature_names_out for FeatureUnion in case of "passthrough" #24058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Implement get_feature_names_out for FeatureUnion in case of "passthrough" #24058
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR! I left a minor comment on the whats new.
Overall LGTM
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
17de56e
to
9ca75e9
Compare
Updated with main branch and resolved a conflict in the whats_new. Anything else required before we can merge @thomasjpfan :) ? |
Thanks for the PR @diederikwp. Mostly looks good, however it still does not work if we don't pass the feature names to import pandas as pd
from sklearn.pipeline import FeatureUnion
from sklearn.decomposition import PCA
union = FeatureUnion([("pca", PCA(n_components=1)),
("pass", "passthrough")])
X = pd.DataFrame([[0., 1., 3], [2., 2., 5]], columns=["a", "b", "c"])
union.fit_transform(X)
union.get_feature_names_out() raises
It comes from |
Good catch @jeremiedbb, we do not have a test for not passing in I think the simplest solution is for def get_feature_names_out(self, input_features=None):
input_features = _check_feature_names_in(
self, input_features, generate_names=True
)
for name, trans, _ in self._iter():
... Note (There is a more complicated solution that requires #23993 and having |
#23993 has been merged which automatically fixes the issue reported here #24058 (comment). I just added a test for that. @thomasjpfan is it still good for you ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @diederikwp
Thanks @thomasjpfan and @jeremiedbb for making it work without passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup this still looks good. Thank you for the update @diederikwp and @jeremiedbb !
Reference Issues/PRs
What does this implement/fix? Explain your changes.
FeatureUnion
allows"passthrough"
to be supplied in stead of a regular transformer, in which case all input features will be passed through. Currently, callingget_feature_names_out
on such a union fails however. This PR makes it return the input feature names (prefixed with the transformer name) instead of failing.Example: This code
will return
array(['imp__f1', 'imp__f2', 'imp__f3', 'pass__f1', 'pass__f2', 'pass__f3'], dtype=object)
after merging this PR.Currently it raises
AttributeError: Transformer pass (type FunctionTransformer) does not provide get_feature_names_out.
Any other comments?
I believe this is the behaviour most users would expect. It also makes
FeatureUnion
consistent withColumnTransformer
when it comes to handling"passthrough"
inget_feature_names_out
.