[MRG] Positive sample weight #17178
Conversation
Merging changes from the main repository
Thanks!
I think we should only change this setting via the global config, and not add new parameters to individual estimators, as that would clutter the API.
You can do, […]
I agree, it looks nicer. I will remove changes from […]
The issue with this approach is that the user can't enable/disable it for different estimators in the same pipeline, since the check is done in […]. This is another example of where I think having all the parameters set in […]
I suppose we could define some method in […]. I'm not sure that would address an actual problem though. 98% of users probably don't have negative sample weights and would be happy with the default. For the 2% remaining they can just disable it globally and live with the risk (same as everyone is doing in the current situation). If we actually have complaints about not doing this check selectively then we could consider adding the method. Another method in all estimators is still cluttering the API, just in a different place. It would be justified for sample props, but harder to justify here.
I suppose you're right. I guess the 2% could also have a […]. I'm happy with the global option then. What's the default value though?
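For illustration, a minimal sketch of how the global option under discussion would be used. It assumes the assume_positive_sample_weights config key proposed in this PR (never merged into scikit-learn); sklearn.config_context itself is the real configuration entry point:

import numpy as np
import sklearn
from sklearn.linear_model import Ridge

X = np.ones((4, 2))
y = np.array([0.0, 1.0, 1.0, 0.0])
sample_weight = np.array([1.0, -1.0, 1.0, 1.0])  # one negative weight

# Under the PR's proposed default (True), fitting with a negative
# weight would raise a ValueError during validation.

# Opting out globally, as discussed above; the switch applies to every
# estimator at once, so it cannot differ between steps of one pipeline.
with sklearn.config_context(assume_positive_sample_weights=False):
    Ridge().fit(X, y, sample_weight=sample_weight)  # would not raise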
@@ -1071,6 +1073,7 @@ def test_check_sample_weight():
     # float32 dtype is preserved
     X = np.ones((5, 2))
     sample_weight = np.ones(5, dtype=np.float32)
+    print(sample_weight)
why is the print here...?
please remove the print
sklearn/utils/validation.py
Outdated
if force_positive is False:
    warnings.warn("assume_positive_sample_weights=False - "
                  "negative values in sample_weight won't raise an error.")
I assume setting force_positive to False will raise a warning. This makes sure the warning is raised if we set force_positive to False when calling the function.
Suggested change:
-if force_positive is False:
+if not force_positive:
     warnings.warn("assume_positive_sample_weights=False - "
                   "negative values in sample_weight won't raise an error.")
Sorry, @haochunchang, but I'm not sure what your point is. Does it change anything overall?
sklearn/utils/validation.py
Outdated
if force_positive is True:
    if np.any(sample_weight <0):
        raise ValueError("There are negative values in sample_weight")
Not sure if this is more clear.
Suggested change:
-if force_positive is True:
-    if np.any(sample_weight <0):
-        raise ValueError("There are negative values in sample_weight")
+if force_positive and np.any(sample_weight < 0):
+    raise ValueError("There are negative values in sample_weight")
@@ -1071,6 +1073,7 @@ def test_check_sample_weight():
     # float32 dtype is preserved
     X = np.ones((5, 2))
     sample_weight = np.ones(5, dtype=np.float32)
+    print(sample_weight)
Suggested change:
-print(sample_weight)
Thanks for the work so far @arka204
sklearn/_config.py
Outdated
assume_positive_sample_weights : bool, optional
    If in function _check_sample_weight parameter force_positive is set
    to None, then it's value is set to assume_positive_sample_weights.
this should move to the last place (new parameters are documented after the existing ones)
@@ -1071,6 +1073,7 @@ def test_check_sample_weight():
     # float32 dtype is preserved
     X = np.ones((5, 2))
     sample_weight = np.ones(5, dtype=np.float32)
+    print(sample_weight)
please remove the print
_check_sample_weight(sample_weight, X, force_positive=False)

# no error for negative weights if global parameter set to False
_set_config(assume_positive_sample_weights=False)
use with config_context(...): instead maybe?
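For example (a sketch, assuming the PR's proposed assume_positive_sample_weights key; config_context restores the previous value on exit, unlike a bare set_config call):

import numpy as np
from sklearn import config_context
from sklearn.utils.validation import _check_sample_weight

X = np.ones((5, 2))
sample_weight = -np.ones(5)  # negative weights

# The setting only applies inside the block and is undone afterwards,
# so it cannot leak into other tests.
with config_context(assume_positive_sample_weights=False):
    _check_sample_weight(sample_weight, X)  # would not raise under this PR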
sklearn/utils/validation.py
Outdated
warnings.warn("assume_positive_sample_weights=False - negative "
              "values in sample_weight won't raise an error.")
why the warning?
I thought that if a user were to accidentally end up with the global variable changed to False, they should be warned that something is wrong. I will remove it if you want.
This only happens if they explicitly set the value to False, in which case they shouldn't see a warning.
Thank you for your reviews @KumarGanesha1996, @haochunchang, @adrinjalali!
very good sir!...
Tests are failing @arka204
I fixed the tests @adrinjalali, @rth.
@@ -28,7 +29,8 @@ def get_config():

 def set_config(assume_finite=None, working_memory=None,
-               print_changed_only=None, display=None):
+               print_changed_only=None, display=None,
+               assume_positive_sample_weights=None):
this is not documented in the docstring
with config_context(assume_positive_sample_weights=False):
    err_msg = "sample_weight cannot contain negative weight"
    with pytest.raises(ValueError, match=err_msg):
        model.fit(X, y, sample_weight=sample_weight)
err_msg = "There are negative values in sample_weight"
I think we should probably unify these messages and have the same error message.
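One way to unify them, sketched here: define the message once and reuse it in both the validation helper and the estimator-level checks (the constant and function names below are hypothetical, not from the PR):

import numpy as np

# Hypothetical shared constant, e.g. in sklearn/utils/validation.py,
# so every test can match one canonical message.
_NEGATIVE_WEIGHT_MSG = "There are negative values in sample_weight"

def _raise_on_negative_weights(sample_weight):
    # Single place where the sign check and its message live.
    if np.any(np.asarray(sample_weight) < 0):
        raise ValueError(_NEGATIVE_WEIGHT_MSG)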
sklearn/neighbors/tests/test_kde.py
Outdated
with config_context(assume_positive_sample_weights=False):
    expected_err = "sample_weight must have positive values"
    with pytest.raises(ValueError, match=expected_err):
        kde.fit(data, sample_weight=sample_weight)

expected_err = "There are negative values in sample_weight"
same here
sklearn/utils/validation.py
Outdated
@@ -1275,13 +1275,25 @@ def _check_sample_weight(sample_weight, X, dtype=None):
     is be allocated. If `dtype` is not one of `float32`, `float64`,
     `None`, the output will be of dtype `float64`.

+    force_positive : {True, False or None}
Suggested change:
-force_positive : {True, False or None}
+force_positive : bool, default=None
sklearn/utils/validation.py
Outdated
If None, assumes value of assume_positive_sample_weights,
which is initially set to True.
Suggested change:
-If None, assumes value of assume_positive_sample_weights,
-which is initially set to True.
+If None, assumes value of assume_positive_sample_weights
+from the global config which is initially set to True.
Due to my last commit, the code coverage test is not passing (but I think that's normal, since unifying the error messages made me delete some lines in the test files). Should I do something about it?
@@ -111,7 +111,7 @@ def fit(self, X, y, sample_weight=None):
     sample_weight = _check_sample_weight(sample_weight, X, np.float64)
     sample_weight /= sample_weight.sum()
     if np.any(sample_weight < 0):
-        raise ValueError("sample_weight cannot contain negative weights")
+        raise ValueError("There are negative values in sample_weight")
seems like this line is never run in the tests, which probably means you need to have a separate test for it. I think even if the user sets the flag, this model still requires positive weights, and that's what needs to be tested.
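Such a test could look roughly like this (a sketch: SomeEstimator is a placeholder for the model touched by the diff above, and the config key is the one proposed in this PR):

import numpy as np
import pytest
from sklearn import config_context

def test_negative_weights_rejected_even_when_global_check_disabled():
    # "SomeEstimator" is a placeholder for the actual model under test.
    X = np.ones((4, 2))
    y = np.array([0, 1, 0, 1])
    sample_weight = np.array([1.0, -1.0, 1.0, 1.0])
    # Even with the proposed global check switched off, this estimator
    # performs its own sign check in fit and must still raise.
    with config_context(assume_positive_sample_weights=False):
        with pytest.raises(ValueError,
                           match="There are negative values in sample_weight"):
            SomeEstimator().fit(X, y, sample_weight=sample_weight)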
Closing as superseded by #21132.
Reference Issues/PRs
Fixes #15531
References #12464 and #3774
What does this implement/fix? Explain your changes.
It adds a parameter force_positive in _check_sample_weight and a global parameter assume_positive_sample_weights. Weights will be enforced to be positive, unless the user directly decides to do otherwise. It also changes _ridge.py, adding a similar parameter to its methods.

Any other comments?
Since the user cannot interact with _check_sample_weight inside algorithms directly (except by constantly modifying the global parameter), I modified functions inside _ridge.py to enable this interaction. If this is the desired behavior, other functions may have to be modified in a similar way.
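To make the proposed semantics concrete, a sketch of how the two knobs were meant to interact (neither force_positive nor assume_positive_sample_weights exists in released scikit-learn; the resolution order follows the docstring quoted earlier):

import numpy as np
from sklearn import config_context
from sklearn.utils.validation import _check_sample_weight

X = np.ones((3, 2))
sample_weight = np.array([1.0, -2.0, 1.0])  # contains a negative value

# Explicit per-call opt-out: skips the sign check for this call only.
_check_sample_weight(sample_weight, X, force_positive=False)

# force_positive=None (the default) defers to the global config key.
with config_context(assume_positive_sample_weights=False):
    _check_sample_weight(sample_weight, X)

# With everything left at the defaults, negative weights would raise:
# _check_sample_weight(sample_weight, X)  # ValueError under this PR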