
FIX Fixes CountVectorizer sample invariance with max_features#18016

Merged: rth merged 2 commits into scikit-learn:master from thomasjpfan:fix_countvectorizer on Jul 28, 2020.


@thomasjpfan (Member) commented on Jul 27, 2020

Reference Issues/PRs

Fixes #17939

What does this implement/fix? Explain your changes.

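Adds conditional sorting based on `max_features`: when `max_features` is set, the features are sorted before the vocabulary is limited, so that terms with tied counts are selected independently of the order of the samples.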
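A minimal pure-Python sketch of why sorting before limiting restores sample-order invariance. This is not the scikit-learn code: the helpers `term_counts`, `top_k_first_seen` and `top_k_sorted_first` are hypothetical, documents are tokenized on whitespace, and ties are broken with Python's stable `sorted()`. Terms are ranked by their total count across the corpus, which is how `max_features` ranks them.

```python
from collections import Counter


def term_counts(docs):
    """Total count of each whitespace-separated term across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts


def top_k_first_seen(docs, k):
    """Hypothetical: keep the k most frequent terms, ties broken by first appearance.

    The candidate order depends on the order of ``docs``, so a tied term
    may be kept or dropped depending on how the samples are ordered.
    """
    counts = term_counts(docs)
    # Counter preserves insertion order, i.e. the order terms were first seen.
    candidates = list(counts)
    # sorted() is stable: terms with equal counts keep their first-seen order.
    return sorted(candidates, key=lambda t: -counts[t])[:k]


def top_k_sorted_first(docs, k):
    """Hypothetical: sort candidates alphabetically, then keep the k most frequent.

    Ties are now broken by the term itself, which does not depend on
    the order of ``docs``.
    """
    counts = term_counts(docs)
    candidates = sorted(counts)
    return sorted(candidates, key=lambda t: -counts[t])[:k]


docs = ["apple banana", "cherry banana"]
print(top_k_first_seen(docs, 2), top_k_first_seen(docs[::-1], 2))
# ['banana', 'apple'] ['banana', 'cherry']
print(top_k_sorted_first(docs, 2), top_k_sorted_first(docs[::-1], 2))
# ['banana', 'apple'] ['banana', 'apple']
```

With the first-seen rule, reversing the two documents changes which of the tied terms `apple` and `cherry` survives the cut; sorting the candidates first makes the selection identical for both orders.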

@thomasjpfan added this to the 0.23.2 milestone on Jul 27, 2020
@thomasjpfan changed the title from "FIX Fixes CountVectorizer sample invariance when tie breaking" to "FIX Fixes CountVectorizer sample invariance with max_features" on Jul 27, 2020
Review comment thread on doc/whats_new/v0.23.rst:

- |Fix| Fixes bug in :class:`feature_extraction.text.CountVectorizer` where
sample order invariance was broken when `max_features` was set and features
had the same count. :pr:`18016` by `Thomas Fan`_, `Roman Yurchak`_, and

Pull request by committee! :D

@rth left a comment


Great, thank you!

@rth merged commit 06bb486 into scikit-learn:master on Jul 28, 2020
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request on Aug 3, 2020


Linked issue closed by this pull request: Diff in CountVectorizer between versions 0.22 and 0.23 (#17939)

3 participants