FIX ensure consistency of column and feature names in FunctionTransformer #27801
@@ -330,7 +330,7 @@ def test_function_transformer_get_feature_names_out(
     transformer = FunctionTransformer(
         feature_names_out=feature_names_out, validate=validate
     )
-    transformer.fit_transform(X)
+    transformer.fit(X)
Calling `transform` allows us to find the issue with the number of columns and the function used here. Therefore, we can call `fit` to avoid this check.
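As a minimal sketch of why calling only `fit` sidesteps the check: when no `inverse_func` is given, `fit` does not run the wrapped function, so a mismatch between the function's output and the declared names can only surface in `transform`. The names `identity_with_counter` and `calls` are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

calls = []  # records every invocation of the wrapped function


def identity_with_counter(X):
    # Hypothetical function: returns its input and notes that it was called.
    calls.append(1)
    return X


X = np.array([[1.0, 2.0], [3.0, 4.0]])
transformer = FunctionTransformer(identity_with_counter)

# fit only records metadata about X; the wrapped function is not executed.
transformer.fit(X)
calls_after_fit = len(calls)

# transform is where the wrapped function actually runs.
transformer.transform(X)
calls_after_transform = len(calls)
```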
So here, we only raise a better error message. There is no magic, but we provide an explanation of what to do. I am not a big fan of the magical solution, and I am not sure that we will be able to return the expected type (NumPy vs. pandas) since it will depend on what
LGTM
LGTM
Needs sync with main before merging.
Thanks @lesteve for syncing the PR. Merging with the 2 above approvals.
closes #27695
Raise an explicit error when the column names of the container returned by `transform` are not consistent with the output of `get_feature_names_out` in `FunctionTransformer`.
In #27695, the error raised is not easy to understand when the `FunctionTransformer` is embedded within a `Pipeline`.
Here, we also suggest some ways to resolve the problem.
I see that we have tests failing in our test suite. I need to check whether the failures are legitimate. Some of them come from the fact that `feature_names_out` returns fewer names than the number of columns in `X_trans`.