[MRG] text vectorizers should raise warnings when user params will be unused #14602
# Conflicts:
#   sklearn/compose/_column_transformer.py
@jnothman, @rth - can you please give some feedback on the test case?
Co-Authored-By: Roman Yurchak <rth.yurchak@gmail.com>
Adding only CountVectorizer for now. I will add the others as I add warnings for them in text.py.
@jnothman, @rth - can you please check why some of the checks are failing for this commit?
Linux pylatest_conda_mkl_pandas
It is an unrelated issue. It should be fixed when #14619 gets merged.
HashingVectorizer doesn't do much in fit. I suspect we should change it to call _build_analyzer just as a smoke test.
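A minimal sketch of that suggestion, assuming a toy stand-in class (ToyHashingVectorizer, its parameters and its tokenizers are hypothetical, not scikit-learn's code): fit() learns nothing, but it builds the analyzer once and discards it, so an invalid parameter fails at fit time instead of at the first transform(). Python's built-in hash() stands in for a real hash function and is salted per process, so the bucket positions are not reproducible across runs.

```python
class ToyHashingVectorizer:
    """Hypothetical stateless vectorizer; not scikit-learn's HashingVectorizer."""

    def __init__(self, analyzer="word", n_features=16):
        self.analyzer = analyzer
        self.n_features = n_features

    def build_analyzer(self):
        # Return a callable that maps one document to a list of tokens.
        if callable(self.analyzer):
            return self.analyzer
        if self.analyzer == "word":
            return lambda doc: doc.lower().split()
        if self.analyzer == "char":
            return lambda doc: list(doc.lower())
        raise ValueError(f"{self.analyzer!r} is not a valid analyzer")

    def fit(self, X, y=None):
        # Nothing to learn: building the analyzer is only a smoke test that
        # surfaces bad parameters early. The result is thrown away.
        self.build_analyzer()
        return self

    def transform(self, X):
        analyze = self.build_analyzer()
        rows = []
        for doc in X:
            row = [0] * self.n_features
            for token in analyze(doc):
                row[hash(token) % self.n_features] += 1  # hashing trick
            rows.append(row)
        return rows


# Usage: the typo in `analyzer` is reported by fit(), before any transform().
try:
    ToyHashingVectorizer(analyzer="wrod").fit(["some text"])
except ValueError as exc:
    print(exc)
```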
2. Adding condition for self.analyzer != 'word' or callable(self.analyzer) in build_analyzer
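The kind of check described in item 2 could look like the sketch below. It is a standalone function over a plain dict of parameters; the name warn_for_unused_params, the dict interface and the warning wording are assumptions for illustration, not scikit-learn's private helper. The parameter pairs are based on the documented behaviour that tokenizer, token_pattern and stop_words only apply when analyzer == 'word', and that a callable analyzer receives the raw input; the list is illustrative and not necessarily complete.

```python
import warnings

DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"  # CountVectorizer's default token_pattern


def warn_for_unused_params(params):
    """Hypothetical helper: warn once per parameter the analyzer will ignore.

    `params` is a dict keyed by vectorizer parameter names. Returns the list
    of ignored parameter names so callers (and tests) can inspect it.
    """
    analyzer = params.get("analyzer", "word")
    ignored = []

    # Mirrors the condition from the commit message; note that a callable
    # compares unequal to 'word', so the callable() check is redundant.
    if callable(analyzer) or analyzer != "word":
        # Only the built-in word analyzer tokenizes and drops stop words.
        if params.get("tokenizer") is not None:
            ignored.append("tokenizer")
        if params.get("stop_words") is not None:
            ignored.append("stop_words")
        pattern = params.get("token_pattern", DEFAULT_TOKEN_PATTERN)
        if pattern is not None and pattern != DEFAULT_TOKEN_PATTERN:
            ignored.append("token_pattern")

    if callable(analyzer):
        # A callable analyzer gets the raw document: no preprocessing or n-grams.
        if params.get("preprocessor") is not None:
            ignored.append("preprocessor")
        ngram_range = params.get("ngram_range")
        if ngram_range is not None and tuple(ngram_range) != (1, 1):
            ignored.append("ngram_range")

    reason = "'analyzer' is callable" if callable(analyzer) else "'analyzer' != 'word'"
    for name in ignored:
        warnings.warn(f"The parameter {name!r} will not be used since {reason}",
                      UserWarning, stacklevel=2)
    return ignored


# Usage: emits one UserWarning naming 'stop_words'.
warn_for_unused_params({"analyzer": "char", "stop_words": "english"})
```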
Tests are failing. Let us know if you need help.
Thanks. A different unit test, test_callable_analyzer_change_behavior, seems to be failing. I'll check the reason and get back to you.
@jnothman - Could you help me understand the reason for adding assert len(records) == 1 in test_callable_analyzer_change_behavior? This condition is failing for HashingVectorizer. It passes for CountVectorizer and TfidfVectorizer. I have commented out this condition in the test case for now, but will add it back once I understand it better. The difference between HashingVectorizer's fit() method and the other vectorizers' is the call to build_analyzer(). https://github.com/scikit-learn/scikit-learn/pull/14602#issuecomment-521613247
It's saying that HashingVectorizer is now issuing the ChangedBehaviorWarning twice. It should only issue it once. By putting build_analyzer in its fit, it might now be repeating that warning.
Yes, build_analyzer() is now being called inside both the fit() and transform() methods, which is likely causing the double warning. Looking into it now.
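The double warning can be reproduced with a small standalone sketch. ToyVectorizer, its build_in_fit flag and the warning text are hypothetical, and this is not the fix the PR applied: if building the analyzer emits a warning and both fit() and transform() build it, then a fit_transform() written as fit(X).transform(X) emits it twice. Counting recorded warnings mirrors the assert len(records) == 1 check; warnings.simplefilter("always") is needed because the default filters would otherwise suppress a repeat from the same code location.

```python
import warnings


class ToyVectorizer:
    """Hypothetical vectorizer whose analyzer construction emits a warning."""

    def __init__(self, analyzer, build_in_fit=True):
        self.analyzer = analyzer
        self.build_in_fit = build_in_fit

    def build_analyzer(self):
        if callable(self.analyzer):
            # Stand-in for the behaviour-change warning on callable analyzers.
            warnings.warn("callable analyzer receives the raw input", UserWarning)
            return self.analyzer
        return str.split

    def fit(self, X, y=None):
        if self.build_in_fit:
            self.build_analyzer()  # smoke test: first warning
        return self

    def transform(self, X):
        analyze = self.build_analyzer()  # second warning when fit() also built it
        return [len(analyze(doc)) for doc in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


def count_warnings(vectorizer, X):
    """Return how many warnings fit_transform() emits."""
    with warnings.catch_warnings(record=True) as records:
        warnings.simplefilter("always")  # record repeats from the same line too
        vectorizer.fit_transform(X)
    return len(records)


docs = ["this is text"]
print(count_warnings(ToyVectorizer(str.split, build_in_fit=True), docs))   # 2
print(count_warnings(ToyVectorizer(str.split, build_in_fit=False), docs))  # 1
```

Keeping the smoke test in fit() therefore means the warning-emitting step has to be kept out of it, or shared with transform(), for the exactly-once assertion to hold.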
@jnothman - just wanted to make sure that you are getting a notification for the comment above.
Otherwise LGTM. (Though I've not checked that the set of parameter pairs is complete.)
Please add an Enhancement entry to the change log at doc/whats_new/v0.22.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
Assuming you also think this is ready for merge, please change WIP -> MRG.
…unused_params And remove blank lines
Please add a note in what's new as requested above. Thanks!
Yup, I was just working on it. Added now.
doc/whats_new/v0.22.rst
Outdated
@@ -146,6 +146,14 @@ Changelog
:mod:`sklearn.feature_extraction`
.................................

- |Enhancement| :func:`feature_extraction.text._warn_for_unused_params` will
Please describe changes only in terms of public API: "a warning is now raised if a parameter choice means that another parameter will be unused" or something more clear than that.
Got it. Can you please check whether the latest commit makes more sense?
Makes more sense. We might yet find clearer words before release, but as far as I'm concerned we can merge. Thanks, @getgaurav2!
Reference Issues/PRs
#14580
What does this implement/fix? Explain your changes.
Add warnings and test cases for unused parameters in the text vectorizers.
Any other comments?