Bugfix: attribute naming consistency, drop_cols #219

VHeusinkveld · 2020-09-23T14:34:38Z

Within the scikit-learn ecosystem, it is standard practice to store each __init__ argument as an attribute with the same name. This way the get_params method can retrieve the attributes. From scikit-learn 0.24 onwards this behavior is required.
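A minimal sketch of the convention, assuming a simplified re-implementation of get_params (MiniBaseEstimator and Mapper are hypothetical names, not scikit-learn or sklearn_pandas classes). Parameter names are read from the __init__ signature, and each one is looked up as an attribute with the same name:

```python
import inspect


class MiniBaseEstimator:
    """Simplified stand-in for scikit-learn's BaseEstimator (hypothetical, for illustration)."""

    @classmethod
    def _get_param_names(cls):
        # Parameter names come from the __init__ signature, skipping `self`.
        sig = inspect.signature(cls.__init__)
        return sorted(p.name for p in sig.parameters.values() if p.name != "self")

    def get_params(self):
        # Each parameter is looked up as an attribute with the *same* name.
        # Like scikit-learn 0.24+, a missing attribute raises AttributeError here.
        return {name: getattr(self, name) for name in self._get_param_names()}


class Mapper(MiniBaseEstimator):
    # Hypothetical estimator following the convention: attribute name == argument name.
    def __init__(self, features, drop_cols=None):
        self.features = features
        self.drop_cols = drop_cols


m = Mapper(features=[("a", None)], drop_cols=["b"])
print(m.get_params())  # {'drop_cols': ['b'], 'features': [('a', None)]}
```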

VHeusinkveld · 2020-09-23T14:38:25Z

To elaborate on the issue: it specifically concerns the DataFrameMapper. It turned up when I embedded the mapper in a scikit-learn Pipeline. The warning is raised here: https://github.com/scikit-learn/scikit-learn/blob/0fb307bf3/sklearn/base.py#L209

VHeusinkveld · 2020-09-23T18:10:54Z

The test seems to have passed incorrectly in the past, because the testing relies on the get_params method. Since the attribute name != the argument name, get_params would have returned None for drop_cols, and the object would not have been cloned correctly.
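A minimal sketch of this failure mode, assuming a simplified pre-0.24-style lookup that silently returns None for a missing attribute. MismatchedMapper and its _drop_cols attribute are hypothetical names for illustration, not the actual sklearn_pandas code:

```python
import inspect


def lenient_get_params(est):
    # Pre-0.24-style lookup (simplified): a missing attribute silently becomes None.
    names = [p for p in inspect.signature(type(est).__init__).parameters if p != "self"]
    return {name: getattr(est, name, None) for name in names}


class MismatchedMapper:
    # Hypothetical estimator that stores drop_cols under a *different* attribute name.
    def __init__(self, features, drop_cols=None):
        self.features = features
        self._drop_cols = drop_cols or []


original = MismatchedMapper(features=[("a", None)], drop_cols=["b"])
params = lenient_get_params(original)
print(params["drop_cols"])  # None: the attribute is named _drop_cols, not drop_cols

# A clone is rebuilt from the retrieved params, so the user's drop_cols are lost.
rebuilt = MismatchedMapper(**params)
print(original._drop_cols, rebuilt._drop_cols)  # ['b'] []
```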

In addition, this evaluation might be why the test is failing:

For the DataFrameMapper, the init argument drop_cols defaults to None. During initialization this is replaced by a newly created list (drop_cols=drop_cols or []). When cloning, something like this probably happens: the constructor receives None as input, so a new list is created during initialization, and that list is a different object from the list in the instance we are comparing against.
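A minimal sketch of the identity check involved, assuming a simplified mini_clone helper (hypothetical, not scikit-learn's clone implementation): parameters are copied, passed to the constructor, and each resulting attribute must be the very object that was passed in. OrMapper is a hypothetical class using the drop_cols or [] idiom:

```python
import copy
import inspect


def mini_clone(est):
    """Simplified sketch of a clone-style sanity check (not scikit-learn's code)."""
    names = [p for p in inspect.signature(type(est).__init__).parameters if p != "self"]
    params = {name: copy.deepcopy(getattr(est, name)) for name in names}
    new = type(est)(**params)
    for name in names:
        # The constructor must store exactly the object it received.
        if getattr(new, name) is not params[name]:
            raise RuntimeError(
                f"Cannot clone object, as the constructor either does not set "
                f"or modifies parameter {name}"
            )
    return new


class OrMapper:
    # Hypothetical estimator using the `drop_cols or []` idiom from this thread.
    def __init__(self, drop_cols=None):
        self.drop_cols = drop_cols or []


mini_clone(OrMapper(drop_cols=["b"]))  # passes: the populated list is stored as-is

try:
    mini_clone(OrMapper())  # drop_cols defaults to None and is stored as []
except RuntimeError as exc:
    print(exc)
```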

VHeusinkveld · 2020-09-23T18:34:01Z

The problem is in the following statement:

drop_cols or []

If drop_cols is an empty list, the or-expression returns its second operand, which is a newly created list.

If drop_cols is populated, it returns the first operand, i.e. the list that was passed in.

(An empty list evaluates to False, while a populated list evaluates to True.)
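The truthiness rule above can be checked directly in plain Python; this snippet only illustrates how the or-expression picks its operand and whether the result is the same object:

```python
empty, populated = [], ["b"]

# `a or b` returns a if a is truthy, otherwise b.
print((empty or []) is empty)          # False: [] is falsy, so a *new* list is returned
print((populated or []) is populated)  # True: the populated list itself is returned
print(bool([]), bool(["b"]))           # False True
```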

VHeusinkveld · 2020-09-23T18:52:46Z

This fix does two things:

Compatibility with the scikit-learn get_params method of the base estimator (the naming mismatch let a bug silently pass the tests, and it would be breaking behavior from scikit-learn 0.24 onwards).
It solves an issue with cloning an object whose drop_cols == [].
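A minimal sketch of one way to avoid the problem, assuming an is-None check in __init__ and a list default when restoring pickled state (the latter inferred only from the commit title "drop_cols get state default from None to List"). FixedMapper is hypothetical and not necessarily the exact code merged in this PR:

```python
class FixedMapper:
    """Hypothetical sketch of the fix, not the exact code merged in this PR."""

    def __init__(self, drop_cols=None):
        # Replace only None; any list the caller passes (even []) is stored as-is,
        # so a clone-style identity check sees the same object it passed in.
        self.drop_cols = [] if drop_cols is None else drop_cols

    def __setstate__(self, state):
        # Assumption: objects pickled without drop_cols get a list, not None.
        self.__dict__.update(state)
        self.drop_cols = state.get("drop_cols", [])


passed = []
print(FixedMapper(passed).drop_cols is passed)  # True

old = FixedMapper.__new__(FixedMapper)
old.__setstate__({})  # simulate restoring a state that lacks drop_cols
print(old.drop_cols)  # []
```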

VHeusinkveld · 2020-09-30T15:25:01Z

@ragrawal could you review the code?

It seems that the issue was introduced in PR #217

ragrawal · 2020-10-01T00:47:10Z

@VHeusinkveld thanks for the changes. Let me review them by tomorrow and get back to you.

ragrawal

Hi @VHeusinkveld
Thanks for explaining the issue and fixing it. Can you please add your name to the list of contributors and bump the version number from 2.0.1 to 2.0.2 (in sklearn_pandas/__init__.py)?

Regards,
Ritesh

sklearn_pandas/dataframe_mapper.py

VHeusinkveld

Changes look good!

VHeusinkveld · 2020-10-01T10:04:57Z

@ragrawal everything should be updated now.

Bugfix: attribute naming consistency, drop_cols

4986269

Within the scikit-learn ecosystem, it is standard practice to store each __init__ argument as an attribute with the same name. This way the get_params method can retrieve the attributes. From scikit-learn 0.24 onwards this behavior is required.

Bugfix: drop_cols incorrectly creates new list object

a8ee718

ragrawal requested changes Oct 1, 2020

sklearn_pandas/dataframe_mapper.py

VHeusinkveld added 4 commits October 1, 2020 08:39

drop_cols get state default from None to List

4bc5721

Update changelog and contribution list

5dbfa94

Formatting

5e57e16

Bump version to 2.0.2

d1a0308

VHeusinkveld commented Oct 1, 2020


ragrawal approved these changes Oct 1, 2020


ragrawal merged commit e85877a into scikit-learn-contrib:master Oct 1, 2020
