FIX Fixes CountVectorizer sample invariance with max_features #18016

Merged
merged 2 commits into from
Jul 28, 2020

@thomasjpfan thomasjpfan commented Jul 27, 2020

Reference Issues/PRs

Fixes #17939

What does this implement/fix? Explain your changes.

Adds conditional sorting of the features, depending on `max_features`.
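To illustrate why tie-breaking can break sample order invariance, here is a minimal pure-Python sketch. It is not the scikit-learn implementation: `top_k_terms` and its `sort_first` flag are hypothetical names that stand in for a `max_features`-style cut-off. The assumption is that tied terms are kept in the order they were first seen, so the selected vocabulary can depend on document order unless the terms are sorted before selection.

```python
from collections import Counter


def top_k_terms(docs, k, sort_first):
    """Return the k most frequent terms (hypothetical stand-in for max_features)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    # Counter keeps first-seen order, so this list depends on document order.
    terms = list(counts)
    if sort_first:
        # Alphabetical order makes the tie-breaking independent of document order.
        terms.sort()
    # sorted() is stable: terms with equal counts keep their order from `terms`.
    ranked = sorted(terms, key=lambda term: -counts[term])
    return sorted(ranked[:k])


docs = ["apple banana", "banana cherry", "cherry apple"]  # every term occurs twice
reversed_docs = docs[::-1]

# Without sorting first, the selected terms change with document order.
unsorted_a = top_k_terms(docs, 2, sort_first=False)
unsorted_b = top_k_terms(reversed_docs, 2, sort_first=False)

# With sorting first, both document orders select the same terms.
sorted_a = top_k_terms(docs, 2, sort_first=True)
sorted_b = top_k_terms(reversed_docs, 2, sort_first=True)
```

In this toy case all three terms are tied, so the unsorted variant keeps whichever two terms happened to appear first, while the sorted variant always keeps the alphabetically first two.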

@thomasjpfan thomasjpfan added this to the 0.23.2 milestone Jul 27, 2020
@thomasjpfan thomasjpfan changed the title from "FIX Fixes CountVectorizer sample invariance when tie breaking" to "FIX Fixes CountVectorizer sample invariance with max_features" Jul 27, 2020

- |Fix| Fixes bug in :class:`feature_extraction.text.CountVectorizer` where
sample order invariance was broken when `max_features` was set and features
had the same count. :pr:`18016` by `Thomas Fan`_, `Roman Yurchak`_, and

Pull request by committee! :D

@rth rth left a comment

Great, thank you!

@rth rth merged commit 06bb486 into scikit-learn:master Jul 28, 2020
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Aug 3, 2020
Successfully merging this pull request may close these issues.

Diff in CountVectorizer between versions 0.22 and 0.23
3 participants