FIX Fixes CountVectorizer sample invariance with max_features #18016

Merged
merged 2 commits into from Jul 28, 2020

@thomasjpfan thomasjpfan commented Jul 27, 2020

Reference Issues/PRs

Fixes #17939

What does this implement/fix? Explain your changes.

Adds conditional sorting of the features, depending on whether `max_features` is set.
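
To illustrate why tie-breaking matters for sample order invariance, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: the helper names `top_k_terms` and `top_k_terms_first_seen` and the toy documents are made up for this example. It only shows the general idea that, when several terms share the same count, selecting the top `k` terms should not depend on the order in which documents were seen.

```python
from collections import Counter


def _count_terms(docs):
    """Count whitespace-separated, lowercased terms across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def top_k_terms(docs, k):
    """Pick the k most frequent terms, breaking ties alphabetically.

    Hypothetical helper: ranking by (-count, term) makes the selection
    independent of document order.
    """
    counts = _count_terms(docs)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return sorted(term for term, _ in ranked[:k])


def top_k_terms_first_seen(docs, k):
    """Pick the k most frequent terms, breaking ties by first appearance.

    Hypothetical helper: the sort is stable, so tied terms keep the order
    in which they were first encountered, which depends on document order.
    """
    counts = _count_terms(docs)
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    return sorted(term for term, _ in ranked[:k])


# Toy corpus: "apple", "banana" and "cherry" all occur twice.
docs = ["apple banana", "cherry apple", "banana cherry date"]
reversed_docs = docs[::-1]

# Alphabetical tie-breaking gives the same selection for both orders.
print(top_k_terms(docs, 2), top_k_terms(reversed_docs, 2))

# First-seen tie-breaking gives a different selection when the order changes.
print(top_k_terms_first_seen(docs, 2), top_k_terms_first_seen(reversed_docs, 2))
```

With a fixed secondary sort key, shuffling the input documents leaves the selected vocabulary unchanged, which is the property the changelog entry calls sample order invariance.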

@thomasjpfan thomasjpfan added this to the 0.23.2 milestone Jul 27, 2020
@thomasjpfan thomasjpfan changed the title from "FIX Fixes CountVectorizer sample invariance when tie breaking" to "FIX Fixes CountVectorizer sample invariance with max_features" Jul 27, 2020

- |Fix| Fixes bug in :class:`feature_extraction.text.CountVectorizer` where
sample order invariance was broken when `max_features` was set and features
had the same count. :pr:`18016` by `Thomas Fan`_, `Roman Yurchak`_, and
Pull request by committee! :D

rth
rth approved these changes Jul 28, 2020
@rth rth left a comment

Great, thank you!

@rth rth merged commit 06bb486 into scikit-learn:master Jul 28, 2020
7 checks passed
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Aug 3, 2020
Successfully merging this pull request may close these issues.

Diff in CountVectorizer between versions 0.22 and 0.23