Skip to content

FIX ensure consistency or column and feature names in FunctionTransformer #27801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 1, 2023

Conversation

glemaitre
Copy link
Member

closes #27695

Raise an explicit error when the column names of the container given by transform is not consistent with the output of get_feature_names_out in FunctionTransformer.

In #27695, the error raised is not easy to understand when the FunctionTransformer is embedded within a Pipeline.

Here, we also give some solution how to resolve the problem.

I see that we have test failing in our test suite. I need to check if they are legitimate. I see that some come from the fact that feature_names_out return less names than the number of columns in X_trans.

@glemaitre glemaitre marked this pull request as draft November 17, 2023 14:13
Copy link

github-actions bot commented Nov 17, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 468d0b3. Link to the linter CI: here

@@ -330,7 +330,7 @@ def test_function_transformer_get_feature_names_out(
transformer = FunctionTransformer(
feature_names_out=feature_names_out, validate=validate
)
transformer.fit_transform(X)
transformer.fit(X)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling transform allows us to find the issue with the number of columns and the function used here. Therefore, we can call fit to avoid this check.

@glemaitre glemaitre marked this pull request as ready for review November 17, 2023 18:38
@glemaitre
Copy link
Member Author

So here, we only raise a better error message. There is no magic but we provide an explanation what to do.

I am not a big fan of the magical solution and I am not sure that we will be able to somehow return the expected type (NumPy vs. Pandas) since it will depend of what func is supposed to output (in the case that validate=False).

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan added the Waiting for Second Reviewer First reviewer is done, need a second one! label Nov 24, 2023
Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lorentzenchr
Copy link
Member

Needs sync with main before merging.

@glemaitre
Copy link
Member Author

Thanks @lesteve for syncing the PR. Merging with the 2 above approvals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:preprocessing Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pipeline using FunctionTransformer with feature_names_out=... fails when applied to dataframe argument
4 participants