Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: attribute naming consistency, drop_cols #219

Merged
merged 6 commits into from Oct 1, 2020

Conversation

VHeusinkveld
Copy link
Contributor

Within the scikit-learn ecosystem, it is standard practice to name the attributes like the init arguments. In this way the get_parms method can get the attributes. From sklearn version 24 onwards this is required behavior.

Within the scikit-learn ecosystem, it is standard practice to name the attributes like the init arguments. In this way the get_parms method can get the attributes. From sklearn version 24 onwards this is required behavior.
@VHeusinkveld
Copy link
Contributor Author

To eleborate on the issue: it is specifically concerned with the DataFrameMapper. It turned up when I embedded it in a scikit learn Pipeline. The warning can be found here: https://github.com/scikit-learn/scikit-learn/blob/0fb307bf3/sklearn/base.py#L209

@VHeusinkveld
Copy link
Contributor Author

VHeusinkveld commented Sep 23, 2020

<Incorrect explanation, for the real reason check the comment after this one>

The test seemed to have incorrectly passed in the past as in testing the get_params method is used. As the attribute name != argument name, the get_params for drop_cols would have returned None, and the object would not have been cloned correctly.

In addition, this evaluation might be why the test is failing:
image

For the Dataframe Mapper, we give the init argument None to drop_cols. During initialization this will be set to a newly created list (drop_cols=drop_cols or [ ]). When creating the clone somethings like this must be going on; we get 'None' as input argument, as a result a new list will be created during initialization which is a different object as the list from the instance to which we are comparing too.

@VHeusinkveld
Copy link
Contributor Author

VHeusinkveld commented Sep 23, 2020

The problem is in the following statement:

drop_cols or [ ]

if the columns are empty the or statement will go for the second (which is a newly created list)
image

While if they are populated it goes for the first:
image

(empty list evaluates to False while a populated list evaluates to True)

@VHeusinkveld
Copy link
Contributor Author

VHeusinkveld commented Sep 23, 2020

This fix does two things:

  • Compatibility with scikit learn get_parms method associated with the base estimator (which caused a bug to silently pass the tests, this would be breaking behavior from sklearn 24 onwards)
  • Solves an issue regarding cloning an object which has drop_cols == [ ].

@VHeusinkveld
Copy link
Contributor Author

@ragrawal could you review the code?

It seems that the issue was introduced in PR #217

@ragrawal
Copy link
Collaborator

ragrawal commented Oct 1, 2020

@VHeusinkveld thanks for changes. Please let me review it by tomorrow and get back to you.

Copy link
Collaborator

@ragrawal ragrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @VHeusinkveld
Thanks for explaining the issue and fixing it . Can you please add your name to the list of contributors and bump the version number from 2.0.1 to 2.0.2 (in sklearn_pandas/initi.py)

Regards,
Ritesh

sklearn_pandas/dataframe_mapper.py Show resolved Hide resolved
Copy link
Contributor Author

@VHeusinkveld VHeusinkveld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good!

@VHeusinkveld
Copy link
Contributor Author

@ragrawal all things should be updated now.

@ragrawal ragrawal merged commit e85877a into scikit-learn-contrib:master Oct 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants