DeprecationWarning with sklearn.GridSearchCV #76

MSardelich · 2017-01-21T12:34:23Z

When using the library together with sklearn.GridSearchCV I am getting the deprecation warning below:

/home/marcelo/.local/lib/python2.7/site-packages/sklearn/base.py:122: DeprecationWarning: Estimator DataFrameMapper modifies parameters in __init__. This behavior is deprecated as of 0.18 and support for this behavior will be removed in 0.20. % type(estimator).__name__, DeprecationWarning)

Since the behavior will change soon, I would check if it is something relevant to this library.

The text was updated successfully, but these errors were encountered:

dukebody · 2017-01-22T17:16:56Z

Hi @MSardelich! Thanks for the heads up. I'd have to investigate why does this happen, and how to fix it.

Do you have time to do this yourself and submit a PR to fix it? Hints:

MSardelich · 2017-01-23T10:06:53Z

Hi @dukebody! Thanks for the prompt response.

I see sklearn-pandas works in a way that modifies sklearn constructor parameters.
That said, I could spot the same lines that raise the exception in the sklearn library. However, I don't think it is a behavior they want to change their side. Firstly, because it is a relatively new exception and probably was implemented on purpose. Secondly, I am afraid sklearn-pandas will stop working soon, if not fixed.

Altogether, IMHO it deserves a closer approach to the sklearn team or re-engineering the way sklearn-pandas works.

What is your take?

dukebody · 2017-01-28T13:26:50Z

From a brief read I understand this requirement is for the cloning happening in the cross-validation to work easily. You are right, we need to modify the way sklearn-pandas work for it to be compatible with sklearn>=0.20. Are you interested in contributing code to fix this?

MSardelich · 2017-01-28T22:05:47Z

@dukebody Unfortunately I don't feel confident enough to propose these structural changes. It seems to be relatively complex to change the way skleran-pandas behaves. Is Paul still contributing actively to the project? What about going straight to the sklearn community and asking them to allow the constructor behavior?

dukebody · 2017-01-29T08:58:51Z

Paul is not actively contributing to the sklearn-pandas project anymore. I'll investigate which changes are needed and ping the sklearn devs if neccessary.

dukebody · 2017-01-29T09:02:04Z

Refs:

sklearn.base.clone cannot clone estimator with pandas data frame parameters scikit-learn/scikit-learn#5522
[MRG + 1] remove complicated equality checks in clone as __init__ shouldn't touch anything. scikit-learn/scikit-learn#5540

acourdavault · 2017-05-05T13:06:53Z

Hello i just wonder If this can be fixed, or there would be a usage workaround maybe?

thanks

dukebody · 2017-05-13T14:47:24Z

It could be fixed, but requires a change in how skearn-pandas internally works. The dataframe mapper modifies the constructor parameters in the __init__ method to create a pipeline, which is what the warning is complaining about apparently.

However checking the code around https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py#L111 it looks like the scikit-learn clone function expects all input arguments of an estimator to be numpy ndarrays or sparse matrices. Since the DataFrameMapper receives lists of estimators as inputs, I'm not sure if we can comply with the new requirements.

I asked the sklearn maintainers in scikit-learn/scikit-learn#5540 (comment) for help.

wpm · 2017-05-15T21:56:40Z

It looks like the issue here lies with the semantics of the __init__ parameters of an sklearn Estimator. Sklearn wants these to be unchanged by the constructor under the assumption that they might be things we want to set with set_params. But the features argument of DataFrameMapper isn't something that set_params will modify.

DataFrameMapper isn't really fully built until _build_feature has transformed all the feature tuples but Sklearn tacitly assumes that estimators are "fully built", i.e. the only modification happens via set_params. DataFrameMapper could be made to conform to this assumption by creating it with a factory function that takes the (name, transform) tuples as input, transforms them with _build_feature, then passes those transformed objects to the DataFrameMapper constructor.

dukebody · 2017-06-10T16:44:38Z

@wpm could you submit a PR to fix this issue?

According to the sklearn mantainers explanation I understant that what we should do is defer the _build_feature transformation until the fit method, which looks reasonable.

drorata · 2017-06-16T11:25:39Z

I have a similar problem, when using the decision tree classifier.

mapper = DataFrameMapper(
    [([col], StandardScaler()) for col in X.dtypes[X.dtypes != 'category'].index.tolist()] +
    [([col], OneHotEncoder())  for col in X.dtypes[X.dtypes == 'category'].index.tolist()]
)
cross_val_score(make_pipeline(mapper, DecisionTreeClassifier()), X, Y)

Warns:

<cut>/lib/python3.6/site-packages/sklearn/base.py:122: DeprecationWarning: Estimator DataFrameMapper modifies parameters in __init__. This behavior is deprecated as of 0.18 and support for this behavior will be removed in 0.20.
  % type(estimator).__name__, DeprecationWarning)
<cut>/lib/python3.6/site-packages/sklearn/base.py:122: DeprecationWarning: Estimator DataFrameMapper modifies parameters in __init__. This behavior is deprecated as of 0.18 and support for this behavior will be removed in 0.20.
  % type(estimator).__name__, DeprecationWarning)
<cut>/lib/python3.6/site-packages/sklearn/base.py:122: DeprecationWarning: Estimator DataFrameMapper modifies parameters in __init__. This behavior is deprecated as of 0.18 and support for this behavior will be removed in 0.20.
  % type(estimator).__name__, DeprecationWarning)

jimmywan · 2017-06-21T22:43:04Z

Can someone take a look at my PR #105 ?

dukebody · 2017-06-24T17:07:10Z

@jimmywan Thanks a lot for your contribution! This is a much expected feature and I didn't have the energy to work on it myself.

I commited a patch based on yours with some extra changes to deal with the default argument and keep compatibility with old pickled mappers.

drorata · 2017-06-24T17:32:35Z

Thanks for the efforts! Is the update already available on pip?

dukebody · 2017-06-24T17:34:31Z

Yesssss! sklearn-pandas 1.5.0! https://pypi.python.org/pypi/sklearn-pandas

Please try out and don't hesitate to report any bug or odd behavior!

maxnoe mentioned this issue Feb 7, 2017

DeprecationWarning for cross_validation #80

Closed

dukebody mentioned this issue May 13, 2017

[MRG + 1] remove complicated equality checks in clone as __init__ shouldn't touch anything. scikit-learn/scikit-learn#5540

Merged

jimmywan mentioned this issue Jun 21, 2017

Do not mutate features in __init__. #105

Closed

dukebody closed this as completed in f784956 Jun 24, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DeprecationWarning with sklearn.GridSearchCV #76

DeprecationWarning with sklearn.GridSearchCV #76

MSardelich commented Jan 21, 2017

dukebody commented Jan 22, 2017

MSardelich commented Jan 23, 2017 •

edited

Loading

dukebody commented Jan 28, 2017

MSardelich commented Jan 28, 2017

dukebody commented Jan 29, 2017

dukebody commented Jan 29, 2017

acourdavault commented May 5, 2017

dukebody commented May 13, 2017

wpm commented May 15, 2017 •

edited

Loading

dukebody commented Jun 10, 2017 •

edited

Loading

drorata commented Jun 16, 2017 •

edited

Loading

jimmywan commented Jun 21, 2017

dukebody commented Jun 24, 2017

drorata commented Jun 24, 2017

dukebody commented Jun 24, 2017

DeprecationWarning with sklearn.GridSearchCV #76

DeprecationWarning with sklearn.GridSearchCV #76

Comments

MSardelich commented Jan 21, 2017

dukebody commented Jan 22, 2017

MSardelich commented Jan 23, 2017 • edited Loading

dukebody commented Jan 28, 2017

MSardelich commented Jan 28, 2017

dukebody commented Jan 29, 2017

dukebody commented Jan 29, 2017

acourdavault commented May 5, 2017

dukebody commented May 13, 2017

wpm commented May 15, 2017 • edited Loading

dukebody commented Jun 10, 2017 • edited Loading

drorata commented Jun 16, 2017 • edited Loading

jimmywan commented Jun 21, 2017

dukebody commented Jun 24, 2017

drorata commented Jun 24, 2017

dukebody commented Jun 24, 2017

MSardelich commented Jan 23, 2017 •

edited

Loading

wpm commented May 15, 2017 •

edited

Loading

dukebody commented Jun 10, 2017 •

edited

Loading

drorata commented Jun 16, 2017 •

edited

Loading