Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeprecationWarning with sklearn.GridSearchCV #76

Closed
MSardelich opened this issue Jan 21, 2017 · 15 comments
Closed

DeprecationWarning with sklearn.GridSearchCV #76

MSardelich opened this issue Jan 21, 2017 · 15 comments

Comments

@MSardelich
Copy link

When using the library together with sklearn.GridSearchCV I am getting the deprecation warning below:

/home/marcelo/.local/lib/python2.7/site-packages/sklearn/base.py:122: DeprecationWarning: Estimator DataFrameMapper modifies parameters in __init__. This behavior is deprecated as of 0.18 and support for this behavior will be removed in 0.20. % type(estimator).__name__, DeprecationWarning)

Since the behavior will change soon, I would check if it is something relevant to this library.

@dukebody
Copy link
Collaborator

Hi @MSardelich! Thanks for the heads up. I'd have to investigate why does this happen, and how to fix it.

Do you have time to do this yourself and submit a PR to fix it? Hints:

@MSardelich
Copy link
Author

MSardelich commented Jan 23, 2017

Hi @dukebody! Thanks for the prompt response.

I see sklearn-pandas works in a way that modifies sklearn constructor parameters.
That said, I could spot the same lines that raise the exception in the sklearn library. However, I don't think it is a behavior they want to change their side. Firstly, because it is a relatively new exception and probably was implemented on purpose. Secondly, I am afraid sklearn-pandas will stop working soon, if not fixed.

Altogether, IMHO it deserves a closer approach to the sklearn team or re-engineering the way sklearn-pandas works.

What is your take?

@dukebody
Copy link
Collaborator

From a brief read I understand this requirement is for the cloning happening in the cross-validation to work easily. You are right, we need to modify the way sklearn-pandas work for it to be compatible with sklearn>=0.20. Are you interested in contributing code to fix this?

@MSardelich
Copy link
Author

@dukebody Unfortunately I don't feel confident enough to propose these structural changes. It seems to be relatively complex to change the way skleran-pandas behaves. Is Paul still contributing actively to the project? What about going straight to the sklearn community and asking them to allow the constructor behavior?

@dukebody
Copy link
Collaborator

Paul is not actively contributing to the sklearn-pandas project anymore. I'll investigate which changes are needed and ping the sklearn devs if neccessary.

@acourdavault
Copy link

Hello i just wonder If this can be fixed, or there would be a usage workaround maybe?

thanks

@dukebody
Copy link
Collaborator

It could be fixed, but requires a change in how skearn-pandas internally works. The dataframe mapper modifies the constructor parameters in the __init__ method to create a pipeline, which is what the warning is complaining about apparently.

However checking the code around https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py#L111 it looks like the scikit-learn clone function expects all input arguments of an estimator to be numpy ndarrays or sparse matrices. Since the DataFrameMapper receives lists of estimators as inputs, I'm not sure if we can comply with the new requirements.

I asked the sklearn maintainers in scikit-learn/scikit-learn#5540 (comment) for help.

@wpm
Copy link

wpm commented May 15, 2017

It looks like the issue here lies with the semantics of the __init__ parameters of an sklearn Estimator. Sklearn wants these to be unchanged by the constructor under the assumption that they might be things we want to set with set_params. But the features argument of DataFrameMapper isn't something that set_params will modify.

DataFrameMapper isn't really fully built until _build_feature has transformed all the feature tuples but Sklearn tacitly assumes that estimators are "fully built", i.e. the only modification happens via set_params. DataFrameMapper could be made to conform to this assumption by creating it with a factory function that takes the (name, transform) tuples as input, transforms them with _build_feature, then passes those transformed objects to the DataFrameMapper constructor.

@dukebody
Copy link
Collaborator

dukebody commented Jun 10, 2017

@wpm could you submit a PR to fix this issue?

According to the sklearn mantainers explanation I understant that what we should do is defer the _build_feature transformation until the fit method, which looks reasonable.

@drorata
Copy link

drorata commented Jun 16, 2017

I have a similar problem, when using the decision tree classifier.

mapper = DataFrameMapper(
    [([col], StandardScaler()) for col in X.dtypes[X.dtypes != 'category'].index.tolist()] +
    [([col], OneHotEncoder())  for col in X.dtypes[X.dtypes == 'category'].index.tolist()]
)
cross_val_score(make_pipeline(mapper, DecisionTreeClassifier()), X, Y)

Warns:

<cut>/lib/python3.6/site-packages/sklearn/base.py:122: DeprecationWarning: Estimator DataFrameMapper modifies parameters in __init__. This behavior is deprecated as of 0.18 and support for this behavior will be removed in 0.20.
  % type(estimator).__name__, DeprecationWarning)
<cut>/lib/python3.6/site-packages/sklearn/base.py:122: DeprecationWarning: Estimator DataFrameMapper modifies parameters in __init__. This behavior is deprecated as of 0.18 and support for this behavior will be removed in 0.20.
  % type(estimator).__name__, DeprecationWarning)
<cut>/lib/python3.6/site-packages/sklearn/base.py:122: DeprecationWarning: Estimator DataFrameMapper modifies parameters in __init__. This behavior is deprecated as of 0.18 and support for this behavior will be removed in 0.20.
  % type(estimator).__name__, DeprecationWarning)

@jimmywan
Copy link

Can someone take a look at my PR #105 ?

@dukebody
Copy link
Collaborator

@jimmywan Thanks a lot for your contribution! This is a much expected feature and I didn't have the energy to work on it myself.

I commited a patch based on yours with some extra changes to deal with the default argument and keep compatibility with old pickled mappers.

@drorata
Copy link

drorata commented Jun 24, 2017

Thanks for the efforts! Is the update already available on pip?

@dukebody
Copy link
Collaborator

Yesssss! sklearn-pandas 1.5.0! https://pypi.python.org/pypi/sklearn-pandas

Please try out and don't hesitate to report any bug or odd behavior!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants