Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FunctionTransformer should not convert DataFrames to arrays by default #10655

Closed
jnothman opened this Issue Feb 19, 2018 · 6 comments

Comments

Projects
None yet
5 participants
@jnothman
Copy link
Member

jnothman commented Feb 19, 2018

I would expect a common use of FunctionTransformer is to apply some function to a Pandas DataFrame, ideally using its own methods or accessors. As noted in #10648, it can be easy for users to miss that they need to set validate=False to pass through a DataFrame without converting it to a NumPy array. I think it would be more user-friendly to have validate='array-or-frame' by default, which would pass through DataFrames to the function, but otherwise convert its input to a 2d array. For strict backwards compatibility, the default should be changed through a deprecation cycle, warning whenever using the default validation means a DataFrame is currently converted to an array.

Do others agree?

@rth

This comment has been minimized.

Copy link
Member

rth commented Feb 20, 2018

Sounds reasonable, particularly since at present,

From: #10453
Some meta-estimators (notably model selection and pipeline tools) will pass a dataframe along as-is to nested estimators.

So this would be just one more step in #5523 ..

for strict backwards compatibility, the default should be changed through a deprecation cycle, warning whenever using the default validation means a DataFrame is currently converted to an array.

The warning could say to manually apply check_array to get previous behaviour.

@amueller

This comment has been minimized.

Copy link
Member

amueller commented Feb 20, 2018

sounds good

@bmanohar16

This comment has been minimized.

Copy link
Contributor

bmanohar16 commented Feb 20, 2018

Can i work on this?
I should change to validate=False in the __init__() method and issue a warning inside the fit() or _transform() method?

@jnothman

This comment has been minimized.

Copy link
Member Author

jnothman commented Feb 20, 2018

@mohamed-ali

This comment has been minimized.

Copy link
Contributor

mohamed-ali commented Mar 5, 2018

@bmanohar16 are you working on this, or would you like me to take it over?

@bmanohar16

This comment has been minimized.

Copy link
Contributor

bmanohar16 commented Mar 6, 2018

@mohamed-ali, you can take over the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.