

hqkqn32 (Contributor) commented Jul 29, 2025

Reference Issues/PRs

Fixes #30689

What does this implement/fix? Explain your changes.

This PR fixes an inconsistency in the tag system for two stateless estimators.

Problem:
FeatureHasher and HashingVectorizer are documented as stateless estimators (no fit() required), but their requires_fit tag was left at its default value of True.

Solution:

  • Modified the __sklearn_tags__() method in both classes to set requires_fit=False
  • Added unit tests to verify the correct tag behavior
  • Added integration tests to ensure both estimators work without fit()

Files changed:

  • sklearn/feature_extraction/_hash.py - Added requires_fit=False tag
  • sklearn/feature_extraction/text.py - Added requires_fit=False tag
  • sklearn/feature_extraction/tests/test_feature_hasher.py - Added tag validation tests
  • sklearn/feature_extraction/tests/test_text.py - Added tag validation tests

This ensures consistency with other stateless estimators in scikit-learn.

Any other comments?

This change is backward compatible and doesn't affect the public API. The estimators continue to work exactly as before, but now their internal tags correctly reflect their stateless nature.

cc @glemaitre @adrinjalali (as mentioned in the original issue)

- Set requires_fit=False for both FeatureHasher and HashingVectorizer
- Both estimators are documented as stateless and work without fit()
- Added tests to verify the tag behavior
- Addresses inconsistency noted in issue scikit-learn#30689

github-actions bot commented Jul 29, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 46cee0f. Link to the linter CI: here

hqkqn32 added 3 commits July 29, 2025 17:55
- Remove whitespace from blank lines
- Use double quotes for strings
- Add proper spacing between functions
adrinjalali (Member) left a comment


You need to add a changelog entry for this, otherwise LGTM.

hqkqn32 (Contributor, Author) commented Jul 31, 2025

@adrinjalali Changelog entry added as requested. Thanks for the review! All checks are now passing.

@@ -0,0 +1 @@
:class:`feature_extraction.FeatureHasher` and :class:`feature_extraction.HashingVectorizer` now correctly set ``requires_fit=False`` tag to reflect their stateless nature.
Member


Please follow the max 88 chars here as well.

Contributor Author


@adrinjalali I changed it; each line is now under 88 characters.

jeremiedbb (Member) left a comment


Thanks for the PR @hqkqn32. Here are some suggestions.

@@ -0,0 +1 @@
Set ``requires_fit=False`` for ``FeatureHasher`` and ``HashingVectorizer``.
Member


I'd say it's a fix, so the changelog fragment should be named 31851.fix.rst instead.

Suggested change
-Set ``requires_fit=False`` for ``FeatureHasher`` and ``HashingVectorizer``.
+- Set the tag `requires_fit=False` for the classes
+  :class:`feature_extraction.FeatureHasher` and
+  :class:`feature_extraction.HashingVectorizer`.
+  By :user:`hakan çanakçı <hqkqn32>`.
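The two review requests on the changelog fragment (an `<PR number>.<type>.rst` filename and a maximum of 88 characters per line) can be checked mechanically. Below is a minimal sketch with a hypothetical helper, `check_fragment`; it is not part of scikit-learn's tooling, and the filename pattern is an assumption generalized from `31851.fix.rst`.

```python
import re

# Hypothetical naming pattern "<PR number>.<type>.rst", e.g. "31851.fix.rst".
FRAGMENT_NAME = re.compile(r"^\d+\.[a-z-]+\.rst$")


def check_fragment(filename, text, max_len=88):
    """Return a list of problems found in a changelog fragment."""
    problems = []
    if not FRAGMENT_NAME.match(filename):
        problems.append(f"bad filename: {filename!r}")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_len:
            problems.append(f"line {lineno} has {len(line)} > {max_len} chars")
    return problems


fragment = (
    "- Set the tag `requires_fit=False` for the classes\n"
    "  :class:`feature_extraction.FeatureHasher` and\n"
    "  :class:`feature_extraction.HashingVectorizer`.\n"
    "  By :user:`hakan çanakçı <hqkqn32>`.\n"
)
print(check_fragment("31851.fix.rst", fragment))  # → []
```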

hqkqn32 (Contributor, Author) commented Aug 1, 2025

@jeremiedbb Thanks for the suggestions. All steps completed.

jeremiedbb (Member) left a comment


LGTM. Thanks @hqkqn32

jeremiedbb enabled auto-merge (squash) August 4, 2025 09:38
jeremiedbb merged commit 52d93e1 into scikit-learn:main Aug 4, 2025
36 checks passed
lucyleeow pushed a commit to lucyleeow/scikit-learn that referenced this pull request Aug 22, 2025
scikit-learn#31851)

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>

Successfully merging this pull request may close these issues.

FeatureHasher and HashingVectorizer does not expose requires_fit=False tag
3 participants