Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FunctionTransformer should not convert DataFrames to arrays by default #10655

Closed
jnothman opened this issue Feb 19, 2018 · 6 comments
Closed

FunctionTransformer should not convert DataFrames to arrays by default #10655

jnothman opened this issue Feb 19, 2018 · 6 comments
Labels

Comments

@jnothman
Copy link
Member

@jnothman jnothman commented Feb 19, 2018

I would expect a common use of FunctionTransformer is to apply some function to a Pandas DataFrame, ideally using its own methods or accessors. As noted in #10648, it can be easy for users to miss that they need to set validate=False to pass through a DataFrame without converting it to a NumPy array. I think it would be more user-friendly to have validate='array-or-frame' by default, which would pass through DataFrames to the function, but otherwise convert its input to a 2d array. For strict backwards compatibility, the default should be changed through a deprecation cycle, warning whenever using the default validation means a DataFrame is currently converted to an array.

Do others agree?

@rth
Copy link
Member

@rth rth commented Feb 20, 2018

Sounds reasonable, particularly since at present,

From: #10453
Some meta-estimators (notably model selection and pipeline tools) will pass a dataframe along as-is to nested estimators.

So this would be just one more step in #5523 ..

for strict backwards compatibility, the default should be changed through a deprecation cycle, warning whenever using the default validation means a DataFrame is currently converted to an array.

The warning could say to manually apply check_array to get previous behaviour.

@amueller
Copy link
Member

@amueller amueller commented Feb 20, 2018

sounds good

@bmanohar16
Copy link
Contributor

@bmanohar16 bmanohar16 commented Feb 20, 2018

Can i work on this?
I should change to validate=False in the __init__() method and issue a warning inside the fit() or _transform() method?

@jnothman
Copy link
Member Author

@jnothman jnothman commented Feb 20, 2018

@mohamed-ali
Copy link
Contributor

@mohamed-ali mohamed-ali commented Mar 5, 2018

@bmanohar16 are you working on this, or would you like me to take it over?

@bmanohar16
Copy link
Contributor

@bmanohar16 bmanohar16 commented Mar 6, 2018

@mohamed-ali, you can take over the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

5 participants