
[MRG] Positive sample weight #17178

Closed
wants to merge 8 commits

Conversation

@arka204 (Contributor) commented May 10, 2020

Reference Issues/PRs

Fixes #15531
References #12464 and #3774

What does this implement/fix? Explain your changes.

It adds a force_positive parameter to _check_sample_weight and a global parameter assume_positive_sample_weights. Weights are enforced to be positive unless the user explicitly decides otherwise.
It also changes _ridge.py, adding a similar parameter to its methods.

Any other comments?

Since the user cannot interact with _check_sample_weight inside the algorithms directly (other than by repeatedly modifying the global parameter), I modified the functions inside _ridge.py to enable this interaction. If this is the desired behavior, other functions may have to be modified in a similar way.
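
A minimal sketch of the mechanism this PR describes, using its parameter names (force_positive, assume_positive_sample_weights); this is an illustration, not the PR's actual diff:

import numpy as np
from sklearn import get_config  # returns the global config as a dict


def check_sample_weight_sketch(sample_weight, X, force_positive=None):
    # force_positive=None means: defer to the global flag proposed in this PR
    # (absent from stock scikit-learn, hence the .get() with a default).
    if force_positive is None:
        force_positive = get_config().get("assume_positive_sample_weights", True)
    if sample_weight is None:
        sample_weight = np.ones(X.shape[0], dtype=np.float64)
    sample_weight = np.asarray(sample_weight, dtype=np.float64)
    if force_positive and np.any(sample_weight < 0):
        raise ValueError("There are negative values in sample_weight")
    return sample_weight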

@rth (Member) commented May 10, 2020

Thanks!

It also changes _ridge.py, adding a similar parameter to its methods.

I think we should only change this setting via the global config, and not add new parameters to individual estimators, as that would clutter the API.

@rth (Member) commented May 10, 2020

Since the user cannot interact with _check_sample_weight inside the algorithms directly (other than by repeatedly modifying the global parameter)

You can do,

import sklearn

with sklearn.config_context(assume_positive_sample_weights=False):
    ...  # some estimator
# another estimator

@arka204 (Contributor, Author) commented May 10, 2020

You can do,

import sklearn

with sklearn.config_context(assume_positive_sample_weights=False):
    ...  # some estimator
# another estimator

I agree, it looks nicer. I will remove the changes from _ridge.py then.

@adrinjalali (Member)

I think we should only change this setting via the global config, and not add new parameters to individual estimators, as that would clutter the API.

The issue with this approach is that the user can't enable/disable it for different estimators in the same pipeline, since the check is done in fit. We would probably have issues with this design once we have a resampler which augments the sample weights as well, and a sample props implementation.

This is another example of where I think having all the parameters set in __init__ may not be the best idea. For example, in the sample props implementation (whose SLEP has not passed yet), we're making an exception for set_props_request and get_props_request exactly to avoid cluttering the API. I think it'd make sense to separate these configuration-like parameters from hyperparameters and not have them in __init__. WDYT @rth?

@rth (Member) commented May 11, 2020

This is another example of where I think having all the parameters set in __init__ may not be the best idea. For example, in the sample props implementation (whose SLEP has not passed yet), we're making an exception for set_props_request and get_props_request exactly to avoid cluttering the API.

I suppose we could define some method in BaseEstimator, pass sample weights to BaseEstimator._validate_data, and run _check_sample_weight the same way we do for X, y, with assume_positive_sample_weights depending on the options set in this method.
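
For concreteness, a sketch of that idea; the hook name _sample_weight_options is a hypothetical illustration (not scikit-learn API), and the inline check stands in for the force_positive behavior this PR adds to _check_sample_weight:

import numpy as np
from sklearn.base import BaseEstimator


class MyEstimator(BaseEstimator):
    def _sample_weight_options(self):
        # hypothetical hook: subclasses override this to relax the check
        return {"force_positive": True}

    def fit(self, X, y, sample_weight=None):
        X, y = self._validate_data(X, y)  # private helper available at the time
        if sample_weight is not None:
            sample_weight = np.asarray(sample_weight, dtype=np.float64)
            # stand-in for _check_sample_weight(..., force_positive=...)
            if self._sample_weight_options()["force_positive"] \
                    and np.any(sample_weight < 0):
                raise ValueError("There are negative values in sample_weight")
        return self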

I'm not sure that would address an actual problem though. 98% of users probably don't have negative sample weights and would be happy with the default. The remaining 2% can just disable it globally and live with the risk (same as everyone is doing in the current situation). If we actually get complaints about not being able to apply this check selectively, we could consider adding the method then. Another method in all estimators still clutters the API, just in a different place. It would be justified for sample props, but is harder to justify here.

@adrinjalali (Member)

I suppose you're right. I guess the 2% could also have a ForcePositiveSampleWeights meta-estimator and add the check to their steps in the pipeline if they wish (a rough sketch follows below).

I'm happy with the global option then. What's the default value though?
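
A rough sketch of that ForcePositiveSampleWeights wrapper; the class name comes from the comment above, the implementation is an assumption:

import numpy as np
from sklearn.base import BaseEstimator, MetaEstimatorMixin


class ForcePositiveSampleWeights(MetaEstimatorMixin, BaseEstimator):
    """Reject negative sample weights, then delegate to the wrapped estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None, sample_weight=None):
        if sample_weight is not None and np.any(np.asarray(sample_weight) < 0):
            raise ValueError("There are negative values in sample_weight")
        self.estimator.fit(X, y, sample_weight=sample_weight)
        return self

    def predict(self, X):
        return self.estimator.predict(X)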

@@ -1071,6 +1073,7 @@ def test_check_sample_weight():
    # float32 dtype is preserved
    X = np.ones((5, 2))
    sample_weight = np.ones(5, dtype=np.float32)
+   print(sample_weight)


why is the print here...?

Member:

please remove the print

Comment on lines 1293 to 1295
if force_positive is False:
    warnings.warn("assume_positive_sample_weights=False - "
                  "negative values in sample_weight won't raise an error.")
Contributor:

I assume setting force_positive to False will raise a warning.
This makes sure the warning is raised if we set force_positive to False when calling the function.

Suggested change
- if force_positive is False:
-     warnings.warn("assume_positive_sample_weights=False - "
-                   "negative values in sample_weight won't raise an error.")
+ if not force_positive:
+     warnings.warn("assume_positive_sample_weights=False - "
+                   "negative values in sample_weight won't raise an error.")

Contributor (Author):

Sorry, @haochunchang, but I'm not sure what your point is. Does it change anything overall?

Comment on lines 1317 to 1319
if force_positive is True:
    if np.any(sample_weight <0):
        raise ValueError("There are negative values in sample_weight")
Contributor:

Not sure if this is more clear.

Suggested change
- if force_positive is True:
-     if np.any(sample_weight <0):
-         raise ValueError("There are negative values in sample_weight")
+ if force_positive and np.any(sample_weight < 0):
+     raise ValueError("There are negative values in sample_weight")

@@ -1071,6 +1073,7 @@ def test_check_sample_weight():
    # float32 dtype is preserved
    X = np.ones((5, 2))
    sample_weight = np.ones(5, dtype=np.float32)
+   print(sample_weight)
Contributor:

Suggested change
- print(sample_weight)

@adrinjalali (Member) left a comment

Thanks for the work so far @arka204

Comment on lines 96 to 99
assume_positive_sample_weights : bool, optional
    If in function _check_sample_weight parameter force_positive is set
    to None, then it's value is set to assume_positive_sample_weights.

Member:

this should move to the last place

@@ -1071,6 +1073,7 @@ def test_check_sample_weight():
    # float32 dtype is preserved
    X = np.ones((5, 2))
    sample_weight = np.ones(5, dtype=np.float32)
+   print(sample_weight)
Member:

please remove the print

_check_sample_weight(sample_weight, X, force_positive=False)

# no error for negative weights if global parameter set to False
_set_config(assume_positive_sample_weights=False)
Member:

use with config_context(...): instead maybe?
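
What the reviewer is suggesting, roughly; this sketch assumes a build containing this PR, since stock scikit-learn's config_context does not accept assume_positive_sample_weights:

import numpy as np
from sklearn import config_context
from sklearn.utils.validation import _check_sample_weight

X = np.ones((5, 2))
sample_weight = np.array([1.0, -1.0, 1.0, 1.0, 1.0])

# Unlike calling _set_config directly, the context manager restores the
# previous setting on exit, so the test cannot leak a modified global flag.
with config_context(assume_positive_sample_weights=False):
    _check_sample_weight(sample_weight, X)  # no error for negative weights
# outside the block the default (True) applies again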

Comment on lines 1294 to 1295
warnings.warn("assume_positive_sample_weights=False - negative "
              "values in sample_weight won't raise an error.")
Member:

why the warning?

Contributor (Author):

I thought that if the user were to accidentally end up with the global variable set to False, they should be warned that something is wrong. I will remove it if you want.

Member:

This only happens if they explicitly set the value to False, in which case they shouldn't see a warning.

@arka204 (Contributor, Author) commented May 23, 2020

Thank you for your reviews, @KumarGanesha1996, @haochunchang, @adrinjalali!
I applied some of your suggestions; I hope the code looks better now.

@KumarGanesha1996 left a comment

very good sir !...

@adrinjalali (Member)

Tests are failing @arka204

@arka204 (Contributor, Author) commented May 30, 2020

I fixed the tests, @adrinjalali, @rth.
Do you think it could be merged now, or do you have some improvements in mind?

@arka204 changed the title from [WIP] Positive sample weight to [MRG] Positive sample weight on May 30, 2020
@@ -28,7 +29,8 @@ def get_config():


def set_config(assume_finite=None, working_memory=None,
-              print_changed_only=None, display=None):
+              print_changed_only=None, display=None,
+              assume_positive_sample_weights=None):
Member:

this is not documented in the docstring

Comment on lines 582 to 586
with config_context(assume_positive_sample_weights=False):
    err_msg = "sample_weight cannot contain negative weight"
    with pytest.raises(ValueError, match=err_msg):
        model.fit(X, y, sample_weight=sample_weight)
err_msg = "There are negative values in sample_weight"
Member:

I think we should probably unify these messages and have the same error message.

Comment on lines 217 to 223

with config_context(assume_positive_sample_weights=False):
    expected_err = "sample_weight must have positive values"
    with pytest.raises(ValueError, match=expected_err):
        kde.fit(data, sample_weight=sample_weight)

expected_err = "There are negative values in sample_weight"
Member:

same here

@@ -1275,13 +1275,25 @@ def _check_sample_weight(sample_weight, X, dtype=None):
is be allocated. If `dtype` is not one of `float32`, `float64`,
`None`, the output will be of dtype `float64`.

force_positive : {True, False or None}
Member:

Suggested change
- force_positive : {True, False or None}
+ force_positive : bool, default=None

Comment on lines 1281 to 1282
If None, assumes value of assume_positive_sample_weights,
which is initially set to True.
Member:

Suggested change
- If None, assumes value of assume_positive_sample_weights,
- which is initially set to True.
+ If None, assumes value of assume_positive_sample_weights
+ from the global config which is initially set to True.


@arka204 (Contributor, Author) commented Jun 8, 2020

Due to my last commit, the code coverage check is not passing (I think that's expected, since unifying the error messages made me delete some lines in the test files). Should I do something about it?
Is there anything more I should correct, @adrinjalali?

@@ -111,7 +111,7 @@ def fit(self, X, y, sample_weight=None):
    sample_weight = _check_sample_weight(sample_weight, X, np.float64)
    sample_weight /= sample_weight.sum()
    if np.any(sample_weight < 0):
-       raise ValueError("sample_weight cannot contain negative weights")
+       raise ValueError("There are negative values in sample_weight")
Member:

seems like this line is never run in the tests, which probably means you need to have a separate test for it. I think even if the user sets the flag, this model still requires positive weights, and that's what needs to be tested.
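
A sketch of the kind of test being asked for; since the thread does not name the estimator, the stub below reproduces the fit logic from the diff above, and the config flag exists only on this PR's branch:

import numpy as np
import pytest
from sklearn import config_context


class _StubModel:
    # Mirrors the diff above: this model rejects negative weights
    # unconditionally, after normalizing them.
    def fit(self, X, y, sample_weight=None):
        sample_weight = np.asarray(sample_weight, dtype=np.float64)
        sample_weight /= sample_weight.sum()
        if np.any(sample_weight < 0):
            raise ValueError("There are negative values in sample_weight")
        return self


def test_negative_weights_rejected_even_with_flag_off():
    X, y = np.ones((3, 2)), np.array([0, 1, 0])
    with config_context(assume_positive_sample_weights=False):
        with pytest.raises(ValueError, match="negative values in sample_weight"):
            _StubModel().fit(X, y, sample_weight=np.array([1.0, -1.0, 1.0]))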

Base automatically changed from master to main January 22, 2021 10:52
@cmarmo added the "Superseded (PR has been replaced by a newer PR)" label on Mar 28, 2022
@cmarmo (Member) commented Aug 6, 2022

Closing as superseded by #21132.

@cmarmo closed this Aug 6, 2022
Labels
module:linear_model, module:utils, Superseded (PR has been replaced by a newer PR)
Successfully merging this pull request may close these issues: Enforce positive sample_weight (#15531)