
Issues with negative values in sample_weight #12464

Open
agamemnonc opened this issue Oct 26, 2018 · 6 comments
@agamemnonc
Contributor

Description

I am not sure what the interpretation of a negative value in sample_weight might be, or why it should be supported, but I believe sample_weight should be constrained to non-negative values in several places: negative weights can produce some very strange results.

See the example below for r2_score, where negative weights yield a value larger than one, which does not make sense.

Steps/Code to Reproduce

import numpy as np
from sklearn.metrics import r2_score

np.random.seed(seed=2)
x = np.random.randn(100,)
y = x + 0.3 * np.random.randn(*x.shape)
w = np.random.randn(*x.shape)  # weights drawn from N(0, 1), so roughly half are negative

r2_score(x, y, sample_weight=w)  # x plays the role of y_true, y of y_pred

Expected Results

Something smaller than or equal to 1.0

Actual Results

1.1919195778883198
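To see mechanically why this happens: r2_score computes 1 - SS_res / SS_tot using weighted sums (and a weighted mean for centering). A negative weight can make SS_tot negative, which flips the sign of the subtracted term and lets R² exceed 1. A hand computation of that formula on a toy example (values chosen purely for illustration):

```python
import numpy as np

y_true = np.array([0.0, 1.0, 2.0])
y_pred = np.array([0.1, 0.9, 2.1])   # a good fit: every residual is +/- 0.1
w = np.array([1.0, 1.0, -0.5])       # one negative weight

ss_res = np.sum(w * (y_true - y_pred) ** 2)   # 0.01 + 0.01 - 0.005 = 0.015
y_bar = np.sum(w * y_true) / np.sum(w)        # weighted mean = 0.0
ss_tot = np.sum(w * (y_true - y_bar) ** 2)    # 0 + 1 - 2 = -1.0  (negative!)
r2 = 1 - ss_res / ss_tot                      # 1 - (0.015 / -1.0) = 1.015 > 1
```

With all-positive weights, SS_tot is at least SS_res for the best constant predictor, so the ratio stays bounded; the negative weight breaks that guarantee.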

Versions

System:
python: 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)]
executable: C:\Users\nak142\Miniconda3\envs\sklearn_contrib\pythonw.exe
machine: Windows-10-10.0.17134-SP0

BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: C:/Users/nak142/Miniconda3/envs/sklearn_contrib\Library\lib
cblas_libs: mkl_rt

Python deps:
pip: 10.0.1
setuptools: 40.0.0
sklearn: 0.21.dev0
numpy: 1.15.0
scipy: 1.1.0
Cython: 0.28.5
pandas: None

@amueller
Member

We had people from high energy physics argue strongly in the other direction.
I think for most people negative sample weights make no sense.
A possible way around this could be to add a global flag that optionally allows negative sample weights, and to deprecate the use of negative sample weights when the flag is not set.
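One possible shape for such an opt-in check (the flag and helper names below are hypothetical sketches, not existing scikit-learn API):

```python
import numpy as np

ALLOW_NEGATIVE_SAMPLE_WEIGHT = False  # hypothetical global opt-in flag

def check_sample_weight(sample_weight):
    """Validate sample weights, rejecting negative entries unless the
    user has explicitly opted in via the global flag."""
    sample_weight = np.asarray(sample_weight, dtype=float)
    if not ALLOW_NEGATIVE_SAMPLE_WEIGHT and np.any(sample_weight < 0):
        raise ValueError(
            "Negative values in sample_weight; set "
            "ALLOW_NEGATIVE_SAMPLE_WEIGHT = True to allow them."
        )
    return sample_weight
```

Estimators and metrics would call the check at their entry points, so the default behavior errors out while high-energy-physics users can flip the flag once per session.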

@amueller
Member

Original issue in #3774. Maybe @ndawe is still around and could say if he still thinks the feature is useful and if having to explicitly enable it with a global flag would be acceptable.

@amueller
Member

also ping @alexpearce

@agamemnonc
Contributor Author

Apologies, I missed that other thread; I did a quick search but it didn't show up.

Please feel free to close this issue and continue the discussion on the other one should you wish to.

@alexpearce

Thanks for the ping @amueller.

I still think the feature is very useful. Samples with negative weights play a critical role in the area of high energy physics that I work in. But one does need to understand that one can easily arrive at nonsensical results when using them, so having a warning/error is probably a good sanity check for many use-cases.

I think a global flag is a nice idea, as then the user has to admit "I think I know what I'm doing" and has to interpret any results accordingly.

@kmqanda

kmqanda commented Mar 28, 2022

I would like to check the status of this open item; a global flag enabling negative sample weights would be helpful in my use case. In the meantime, I use a direct implementation that allows negative weights:

import numpy as np

# Weighted least squares via the normal equations, allowing negative weights:
# coef = (X^T W X)^{-1} X^T W y
# Assumes x is (n_samples, n_features) and w is a 1-D array of length n_samples.
xw = x * w[:, None]                  # scale each row of x by its weight
xwx = np.dot(xw.T, x)
xwx_inv = np.linalg.inv(xwx)
coef = np.dot(xwx_inv, np.dot(xw.T, y))
xf = np.dot(x, coef)                 # fitted values
res = y - xf                         # residuals
res_wss = np.dot(w.T, res ** 2)      # weighted residual sum of squares
tot_wss = np.dot(w.T, y ** 2)        # weighted (uncentered) total sum of squares
wr2 = 1 - res_wss / tot_wss
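Wrapped in a self-contained function for illustration (the name is mine, not from any library), the snippet above amounts to solving the weighted normal equations and reporting an uncentered weighted R². Note that, unlike r2_score, the total sum of squares here is taken about zero rather than about the weighted mean of y:

```python
import numpy as np

def weighted_ols_r2(x, y, w):
    """Weighted least squares via the normal equations, tolerating
    negative entries in w; returns an *uncentered* weighted R^2."""
    xw = x * w[:, None]                          # scale rows of x by their weights
    coef = np.linalg.solve(xw.T @ x, xw.T @ y)   # solve (X^T W X) coef = X^T W y
    res = y - x @ coef                           # residuals of the weighted fit
    return 1 - (w @ res ** 2) / (w @ y ** 2)     # 1 - weighted SS_res / SS_tot

rng = np.random.RandomState(0)
x = rng.randn(50, 2)
y = x @ np.array([1.0, -2.0]) + 0.1 * rng.randn(50)
w = rng.randn(50)  # mixed-sign weights, as in the use case above
print(weighted_ols_r2(x, y, w))
```

Using `np.linalg.solve` instead of explicitly inverting X^T W X is numerically preferable and behaves the same way here.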
