-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
Fix requires_fit tag for stateless FeatureHasher and HashingVectorizer #31851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix requires_fit tag for stateless FeatureHasher and HashingVectorizer #31851
Conversation
- Set requires_fit=False for both FeatureHasher and HashingVectorizer - Both estimators are documented as stateless and work without fit() - Added tests to verify the tag behavior - Addresses inconsistency noted in issue scikit-learn#30689
- Remove whitespace from blank lines - Use double quotes for strings - Add proper spacing between functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add a changelog entry for this, otherwise LGTM.
@adrinjalali Changelog entry added as requested. Thanks for the review! All checks are now passing. |
@@ -0,0 +1 @@ | |||
:class:`feature_extraction.FeatureHasher` and :class:`feature_extraction.HashingVectorizer` now correctly set ``requires_fit=False`` tag to reflect their stateless nature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow the max 88 chars here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrinjalali I changed it, currently less than 88 char.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @hqkqn32. Here are some suggestions.
@@ -0,0 +1 @@ | |||
Set ``requires_fit=False`` for ``FeatureHasher`` and ``HashingVectorizer``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it's a fix so the changelog fragment should be named 31851.fix.rst
instead.
Set ``requires_fit=False`` for ``FeatureHasher`` and ``HashingVectorizer``. | |
- Set the tag `requires_fit=False` for the classes | |
:class:`feature_extraction.FeatureHasher` and | |
:class:`feature_extraction.HashingVectorizer`. | |
By :user:`hakan çanakçı <hqkqn32>`. |
doc/whats_new/upcoming_changes/sklearn.feature_extraction/31851.fix.rst
Outdated
Show resolved
Hide resolved
@jeremiedbb Thanks for the suggestions. All steps completed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @hqkqn32
scikit-learn#31851) Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
Reference Issues/PRs
Fixes #30689
What does this implement/fix? Explain your changes.
This PR fixes an inconsistency in the tag system for two stateless estimators.
Problem:
FeatureHasher
andHashingVectorizer
are documented as stateless estimators (nofit()
required), but theirrequires_fit
tag was incorrectly set toTrue
.Solution:
__sklearn_tags__()
method in both classes to setrequires_fit=False
fit()
Files changed:
sklearn/feature_extraction/_hash.py
- Addedrequires_fit=False
tagsklearn/feature_extraction/text.py
- Addedrequires_fit=False
tagsklearn/feature_extraction/tests/test_feature_hasher.py
- Added tag validation testssklearn/feature_extraction/tests/test_text.py
- Added tag validation testsThis ensures consistency with other stateless estimators in scikit-learn.
Any other comments?
This change is backward compatible and doesn't affect the public API. The estimators continue to work exactly as before, but now their internal tags correctly reflect their stateless nature.
cc/ @glemaitre @adrinjalali (as mentioned in the original issue)