Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX Encoder should accept categories having dtype='S' #19727

Merged
merged 8 commits into from Apr 21, 2021

Conversation

andrewdelong
Copy link
Contributor

@andrewdelong andrewdelong commented Mar 19, 2021

Reference Issues/PRs

Fixes #19677, which is a regression in how OneHotEncoder and OrdinalEncoder handle categories having dtype='S'.

What does this implement/fix? Explain your changes.

  • Changed the dtype.kind checks from 'OU' to 'OUS' and updated the test.
  • To facilitate the test, also added a dtype arg to _convert_container. Examples:
>>> _convert_container([['a'], ['b']], "list", dtype='S')
[[b'a'], [b'b']]
>>> _convert_container([[1], [2]], "list", dtype=np.float32)
[[1.], [2.]]

- add dtype='S' cases to `test_encoders_string_categories`
- add dtype arg to `_convert_container` to force dtype when applicable
- `_convert_container` to list now converts container `[[1], [2]]` to requested dtype (e.g., `[[1.], [2.]]`)
@andrewdelong andrewdelong changed the title Fix encoder categories having dtype='S' [MRG] Fix encoder categories having dtype='S' Mar 19, 2021
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @andrewdelong !

sklearn/preprocessing/tests/test_encoders.py Outdated Show resolved Hide resolved
sklearn/preprocessing/tests/test_encoders.py Outdated Show resolved Hide resolved
sklearn/utils/_testing.py Outdated Show resolved Hide resolved
@cmarmo cmarmo added this to the 0.24.2 milestone Mar 23, 2021
@glemaitre glemaitre self-assigned this Apr 21, 2021
@glemaitre glemaitre changed the title [MRG] Fix encoder categories having dtype='S' FIX encoder should accept categories having dtype='S' Apr 21, 2021
@glemaitre glemaitre added To backport PR merged in master that need a backport to a release branch defined based on the milestone. and removed Waiting for Reviewer labels Apr 21, 2021
@glemaitre glemaitre removed their assignment Apr 21, 2021
Copy link
Contributor

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Oh, actually we miss a second approval. @thomasjpfan do you want to have a new look at it.

@thomasjpfan thomasjpfan changed the title FIX encoder should accept categories having dtype='S' FIX Encoder should accept categories having dtype='S' Apr 21, 2021
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Thank you for working on this @andrewdelong !

@thomasjpfan thomasjpfan merged commit a67b284 into scikit-learn:main Apr 21, 2021
29 checks passed
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Apr 22, 2021
…9727)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
glemaitre added a commit that referenced this pull request Apr 28, 2021
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:preprocessing module:utils To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OneHot/OrdinalEncoder categories broken for dtype='S'
4 participants