
[MRG] Positive sample weight #17178

Closed
wants to merge 8 commits

Conversation

@arka204 (Contributor) commented May 10, 2020

Reference Issues/PRs

Fixes #15531
References #12464 and #3774

What does this implement/fix? Explain your changes.

It adds a force_positive parameter to _check_sample_weight and a global parameter assume_positive_sample_weights. Weights are enforced to be positive unless the user explicitly decides otherwise.
It also changes _ridge.py, adding a similar parameter to its methods.

Any other comments?

Since the user cannot interact with _check_sample_weight inside the algorithms directly (other than by repeatedly modifying the global parameter), I modified the functions inside _ridge.py to enable this interaction. If this is the desired behavior, other functions may have to be modified in a similar way.
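
A minimal sketch of the mechanism this PR describes, using its parameter names (force_positive, assume_positive_sample_weights); this is an illustration, not the PR's actual diff:

import numpy as np
from sklearn import get_config  # returns the global config as a dict


def check_sample_weight_sketch(sample_weight, X, force_positive=None):
    # force_positive=None means: defer to the global flag proposed in this PR
    # (absent from stock scikit-learn, hence the .get() with a default).
    if force_positive is None:
        force_positive = get_config().get("assume_positive_sample_weights", True)
    if sample_weight is None:
        sample_weight = np.ones(X.shape[0], dtype=np.float64)
    sample_weight = np.asarray(sample_weight, dtype=np.float64)
    if force_positive and np.any(sample_weight < 0):
        raise ValueError("There are negative values in sample_weight")
    return sample_weight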

@rth (Member) commented May 10, 2020

Thanks!

It also changes _ridge.py, adding a similar parameter to its methods.

I think we should only change this setting via the global config, and not add new parameters to individual estimators, as that would clutter the API.

@rth (Member) commented May 10, 2020

Since the user cannot interact with _check_sample_weight inside the algorithms directly (other than by repeatedly modifying the global parameter)

You can do,

import sklearn

with sklearn.config_context(assume_positive_sample_weights=False):
    ...  # some estimator
# another estimator

@arka204 (Contributor, Author) commented May 10, 2020

You can do,

import sklearn

with sklearn.config_context(assume_positive_sample_weights=False):
    ...  # some estimator
# another estimator

I agree, it looks nicer. I will remove the changes from _ridge.py then.

@adrinjalali (Member)

I think we should only change this setting via the global config, and not add new parameters to individual estimators, as that would clutter the API.

The issue with this approach is that the user can't enable/disable it for different estimators in the same pipeline, since the check is done in fit. We would probably have issues with this design once we have a resampler which augments the sample weights as well, and a sample props implementation.

This is another example of where I think having all the parameters set in __init__ may not be the best idea. For example, in the sample props implementation (whose SLEP has not passed yet), we're making an exception for set_props_request and get_props_request exactly to avoid cluttering the API. I think it'd make sense to separate these configuration-like parameters from hyperparameters and not have them in __init__. WDYT @rth?

@rth (Member) commented May 11, 2020

This is another example of where I think having all the parameters set in __init__ may not be the best idea. For example, in the sample props implementation (whose SLEP has not passed yet), we're making an exception for set_props_request and get_props_request exactly to avoid cluttering the API.

I suppose we could define some method in BaseEstimator, pass sample weights to BaseEstimator._validate_data, and run _check_sample_weight the same way we do for X, y, with assume_positive_sample_weights depending on the options set in this method.
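
For concreteness, a sketch of that idea; the hook name _sample_weight_options is a hypothetical illustration (not scikit-learn API), and the inline check stands in for the force_positive behavior this PR adds to _check_sample_weight:

import numpy as np
from sklearn.base import BaseEstimator


class MyEstimator(BaseEstimator):
    def _sample_weight_options(self):
        # hypothetical hook: subclasses override this to relax the check
        return {"force_positive": True}

    def fit(self, X, y, sample_weight=None):
        X, y = self._validate_data(X, y)  # private helper available at the time
        if sample_weight is not None:
            sample_weight = np.asarray(sample_weight, dtype=np.float64)
            # stand-in for _check_sample_weight(..., force_positive=...)
            if self._sample_weight_options()["force_positive"] \
                    and np.any(sample_weight < 0):
                raise ValueError("There are negative values in sample_weight")
        return self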

I'm not sure that would address an actual problem though. 98% of users probably don't have negative sample weights and would be happy with the default. The remaining 2% can just disable it globally and live with the risk (same as everyone is doing in the current situation). If we actually get complaints about not being able to apply this check selectively, we could consider adding the method then. Another method in all estimators still clutters the API, just in a different place. It would be justified for sample props, but is harder to justify here.

@adrinjalali (Member)

I suppose you're right. I guess the 2% could also have a ForcePositiveSampleWeights meta-estimator and add the check to their steps in the pipeline if they wish (a rough sketch follows below).

I'm happy with the global option then. What's the default value though?
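
A rough sketch of that ForcePositiveSampleWeights wrapper; the class name comes from the comment above, the implementation is an assumption:

import numpy as np
from sklearn.base import BaseEstimator, MetaEstimatorMixin


class ForcePositiveSampleWeights(MetaEstimatorMixin, BaseEstimator):
    """Reject negative sample weights, then delegate to the wrapped estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None, sample_weight=None):
        if sample_weight is not None and np.any(np.asarray(sample_weight) < 0):
            raise ValueError("There are negative values in sample_weight")
        self.estimator.fit(X, y, sample_weight=sample_weight)
        return self

    def predict(self, X):
        return self.estimator.predict(X)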

@@ -1071,6 +1073,7 @@ def test_check_sample_weight():
    # float32 dtype is preserved
    X = np.ones((5, 2))
    sample_weight = np.ones(5, dtype=np.float32)
+   print(sample_weight)


why is the print here...?

Member:

please remove the print

Comment on lines 1293 to 1295
if force_positive is False:
    warnings.warn("assume_positive_sample_weights=False - "
                  "negative values in sample_weight won't raise an error.")
Contributor:

I assume setting force_positive to False will raise a warning.
This makes sure the warning is raised if we set force_positive to False when calling the function.

Suggested change
- if force_positive is False:
-     warnings.warn("assume_positive_sample_weights=False - "
-                   "negative values in sample_weight won't raise an error.")
+ if not force_positive:
+     warnings.warn("assume_positive_sample_weights=False - "
+                   "negative values in sample_weight won't raise an error.")

Contributor (Author):

Sorry, @haochunchang, but I'm not sure what your point is. Does it change anything overall?

Comment on lines 1317 to 1319
if force_positive is True:
    if np.any(sample_weight <0):
        raise ValueError("There are negative values in sample_weight")
Contributor:

Not sure if this is more clear.

Suggested change
- if force_positive is True:
-     if np.any(sample_weight <0):
-         raise ValueError("There are negative values in sample_weight")
+ if force_positive and np.any(sample_weight < 0):
+     raise ValueError("There are negative values in sample_weight")

@@ -1071,6 +1073,7 @@ def test_check_sample_weight():
    # float32 dtype is preserved
    X = np.ones((5, 2))
    sample_weight = np.ones(5, dtype=np.float32)
+   print(sample_weight)
Contributor:

Suggested change
- print(sample_weight)

@adrinjalali (Member) left a comment

Thanks for the work so far @arka204

Comment on lines 96 to 99
assume_positive_sample_weights : bool, optional
    If in function _check_sample_weight parameter force_positive is set
    to None, then it's value is set to assume_positive_sample_weights.

Member:

this should move to the last place

@@ -1071,6 +1073,7 @@ def test_check_sample_weight():
    # float32 dtype is preserved
    X = np.ones((5, 2))
    sample_weight = np.ones(5, dtype=np.float32)
+   print(sample_weight)
Member:

please remove the print

_check_sample_weight(sample_weight, X, force_positive=False)

# no error for negative weights if global parameter set to False
_set_config(assume_positive_sample_weights=False)
Member:

use with config_context(...): instead maybe?
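
What the reviewer is suggesting, roughly; this sketch assumes a build containing this PR, since stock scikit-learn's config_context does not accept assume_positive_sample_weights:

import numpy as np
from sklearn import config_context
from sklearn.utils.validation import _check_sample_weight

X = np.ones((5, 2))
sample_weight = np.array([1.0, -1.0, 1.0, 1.0, 1.0])

# Unlike calling _set_config directly, the context manager restores the
# previous setting on exit, so the test cannot leak a modified global flag.
with config_context(assume_positive_sample_weights=False):
    _check_sample_weight(sample_weight, X)  # no error for negative weights
# outside the block the default (True) applies again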

Comment on lines 1294 to 1295
warnings.warn("assume_positive_sample_weights=False - negative "
              "values in sample_weight won't raise an error.")
Member:

why the warning?

Contributor (Author):

I thought that if the user were to accidentally end up with the global variable set to False, they should be warned that something is wrong. I will remove it if you want.

Member:

This only happens if they explicitly set the value to False, in which case they shouldn't see a warning.

@arka204 (Contributor, Author) commented May 23, 2020

Thank you for your reviews, @KumarGanesha1996, @haochunchang, @adrinjalali!
I applied some of your suggestions; I hope the code looks better now.

@KumarGanesha1996 left a comment

very good sir !...

@adrinjalali (Member)

Tests are failing @arka204

@arka204 (Contributor, Author) commented May 30, 2020

I fixed the tests, @adrinjalali, @rth.
Do you think it could be merged now, or do you have some improvements in mind?

@arka204 changed the title from [WIP] Positive sample weight to [MRG] Positive sample weight on May 30, 2020
@@ -28,7 +29,8 @@ def get_config():


def set_config(assume_finite=None, working_memory=None,
-              print_changed_only=None, display=None):
+              print_changed_only=None, display=None,
+              assume_positive_sample_weights=None):
Member:

this is not documented in the docstring

Comment on lines 582 to 586
with config_context(assume_positive_sample_weights=False):
    err_msg = "sample_weight cannot contain negative weight"
    with pytest.raises(ValueError, match=err_msg):
        model.fit(X, y, sample_weight=sample_weight)
err_msg = "There are negative values in sample_weight"
Member:

I think we should probably unify these messages and have the same error message.

Comment on lines 217 to 223

with config_context(assume_positive_sample_weights=False):
    expected_err = "sample_weight must have positive values"
    with pytest.raises(ValueError, match=expected_err):
        kde.fit(data, sample_weight=sample_weight)

expected_err = "There are negative values in sample_weight"
Member:

same here

@@ -1275,13 +1275,25 @@ def _check_sample_weight(sample_weight, X, dtype=None):
is be allocated. If `dtype` is not one of `float32`, `float64`,
`None`, the output will be of dtype `float64`.

force_positive : {True, False or None}
Member:

Suggested change
- force_positive : {True, False or None}
+ force_positive : bool, default=None

Comment on lines 1281 to 1282
If None, assumes value of assume_positive_sample_weights,
which is initially set to True.
Member:

Suggested change
- If None, assumes value of assume_positive_sample_weights,
- which is initially set to True.
+ If None, assumes value of assume_positive_sample_weights
+ from the global config which is initially set to True.


@arka204 (Contributor, Author) commented Jun 8, 2020

Due to my last commit, the code coverage check is not passing (I think that's expected, since unifying the error messages made me delete some lines in the test files). Should I do something about it?
Is there anything more I should correct, @adrinjalali?

@@ -111,7 +111,7 @@ def fit(self, X, y, sample_weight=None):
    sample_weight = _check_sample_weight(sample_weight, X, np.float64)
    sample_weight /= sample_weight.sum()
    if np.any(sample_weight < 0):
-       raise ValueError("sample_weight cannot contain negative weights")
+       raise ValueError("There are negative values in sample_weight")
Member:

seems like this line is never run in the tests, which probably means you need to have a separate test for it. I think even if the user sets the flag, this model still requires positive weights, and that's what needs to be tested.
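
A sketch of the kind of test being asked for; since the thread does not name the estimator, the stub below reproduces the fit logic from the diff above, and the config flag exists only on this PR's branch:

import numpy as np
import pytest
from sklearn import config_context


class _StubModel:
    # Mirrors the diff above: this model rejects negative weights
    # unconditionally, after normalizing them.
    def fit(self, X, y, sample_weight=None):
        sample_weight = np.asarray(sample_weight, dtype=np.float64)
        sample_weight /= sample_weight.sum()
        if np.any(sample_weight < 0):
            raise ValueError("There are negative values in sample_weight")
        return self


def test_negative_weights_rejected_even_with_flag_off():
    X, y = np.ones((3, 2)), np.array([0, 1, 0])
    with config_context(assume_positive_sample_weights=False):
        with pytest.raises(ValueError, match="negative values in sample_weight"):
            _StubModel().fit(X, y, sample_weight=np.array([1.0, -1.0, 1.0]))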

Base automatically changed from master to main January 22, 2021 10:52
@cmarmo added the "Superseded (PR has been replaced by a newer PR)" label on Mar 28, 2022
@cmarmo (Member) commented Aug 6, 2022

Closing as superseded by #21132.

@cmarmo closed this Aug 6, 2022
Labels
module:linear_model, module:utils, Superseded (PR has been replaced by a newer PR)
Successfully merging this pull request may close these issues: Enforce positive sample_weight (#15531)