-
-
Notifications
You must be signed in to change notification settings - Fork 25.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Add check for non binary variables in OneHotEncoder. #16585
Conversation
Thanks for the PR! I haven't entirely followed the original issue so I might be missing something. Generally it would be good to have a regression test, i.e. a test that shows that now the generated feature names are as expected with your fix. Cheers, |
@amueller @jnothman np.array([None, None, 10, 1], dtype=object) instead of np.array([-1, -1, 10, 10], dtype=np.int_) The reason being that So if we should agree with any solution and then we can go ahead. ping @ogrisel @jeremiedbb @NicolasHug @thomasjpfan as well. |
Indeed, see also #16593 (that I would rather label as a New Feature, but maybe I'm missing something...). If we want to use indexes in |
As I would like to move forward in #15706, I have implemented |
I'm ok with this change but since it will change the result of a public attribute, shouldn't it pass through a deprecation cycle ? unless we consider it a bug :) |
The convention of setting |
@rth @thomasjpfan this was discussed during #16245 right? Could you please remind us the pros and cons of each approach? |
The sentinel value itself shouldn't matter, I think, other than for readability. -1 allows to keep the dtype as int. I also have a slight preference for |
Hi @rth, hope you don't mind I've just tried the "Request for reviewers" button ... :) |
Co-Authored-By: Thomas J Fan <thomasjpfan@gmail.com>
Thanks @thomasjpfan ! |
In general I think arrays of mixed type are a strange beast. They're not
especially helpful as arrays, and should rather be a list. I don't really
see the problem in using -1 as a sentinel, as long as it's well tested. But
the consensus seems to be towards another option.
|
I can't really judge how many users will think |
@jnothman @thomasjpfan , I will not try to persuade you , just want to add a clarification. |
Shrug. I'm fine either way. I agree usability is likely the greater concern
here than efficiency or succinct representation. In numerical computing it
is not at all uncommon when working with positive ints to give negative
numbers special meanings. For instance, see how leaves are indicated in
decision trees. But it can indeed add confusion.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hope you don't mind I've just tried the "Request for reviewers" button
Don't hesitate to use that button on reviewers @cmarmo :) The code LGTM.
In numerical computing it
is not at all uncommon when working with positive ints to give negative
numbers special meanings. For instance, see how leaves are indicated in
decision trees.
Indeed, but I guess the issue here is that array index is not necessarily a positive integer, and that we are introducing a different meaning from what negative index commonly means in python. Another possibility for the sentinel could have been np.iinfo(np.int32).max () == 2147483647
. Anyway any of these would likely be OK.
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@rth, @thomasjpfan , after discussion with @glemaitre I've finally understood that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM apart of this small issue
Oh and we will need an entry in whats new: Please add an entry to the change log at doc/whats_new/v0.23.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user: |
No need for the what's new the bug was only introduced in dev. |
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Someone available for merging? :) Thanks a lot! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cmarmo !
…n#16585) Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…n#16585) Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Fixes #16552
Closes #16554 (as superseded)
What does this implement/fix? Explain your changes.
Adds a check on the value of
drop_idx_
elements:None
is the value for no dropping.