MultiOutputRegressor: Support for matrices of sample_weight #10912

Open
Celelibi opened this issue Apr 3, 2018 · 3 comments


@Celelibi

Celelibi commented Apr 3, 2018

Description

With MultiOutputRegressor.fit, it would be nice to be able to provide one sample_weight vector per output. The gain would simply be ease of use: when you have several regressions to perform at once, you could use a single MultiOutputRegressor instead of having to manage one estimator per regression. This shouldn't be difficult to implement.

I think it would also be nice to have this supported directly by the underlying estimators that handle multiple outputs natively (like most linear regressors). However, I have no idea how hard that would be to implement.
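For plain least squares, one way to picture what native support would involve: with one weight vector per output, each output becomes its own weighted least-squares problem, so the outputs can no longer share a single solve. The following is a minimal NumPy sketch under that assumption. It fits no intercept, and `weighted_lstsq_per_output` is a hypothetical name, not scikit-learn code.

```python
import numpy as np


def weighted_lstsq_per_output(X, Y, W):
    """Solve one weighted least-squares problem per column of Y.

    Hypothetical helper for illustration only (no intercept term).
    X: (n_samples, n_features), Y and W: (n_samples, n_outputs).
    Returns coefficients of shape (n_features, n_outputs).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=float)
    if W.shape != Y.shape:
        raise ValueError("W must have the same shape as Y")

    coef = np.empty((X.shape[1], Y.shape[1]))
    for k in range(Y.shape[1]):
        # Scaling rows by sqrt(w) turns weighted least squares into
        # ordinary least squares on the rescaled data.
        s = np.sqrt(W[:, k])
        coef[:, k] = np.linalg.lstsq(X * s[:, None], Y[:, k] * s, rcond=None)[0]
    return coef


rng = np.random.default_rng(0)
X = rng.random((20, 3))
Y = rng.random((20, 2))
W = rng.random((20, 2)) + 0.1  # strictly positive weights
coef = weighted_lstsq_per_output(X, Y, W)
```

The loop makes the cost visible: the rescaled design matrix differs for every output, which is presumably why a shared-weight implementation does not extend to a weight matrix for free.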

Steps/Code to Reproduce

Expected usage example.

#!/usr/bin/env python3

import numpy as np
import sklearn as sk
import sklearn.linear_model
import sklearn.multioutput

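X = np.random.random((10, 10))  # 10 samples, 10 features
y = np.random.random((10, 2))   # 2 outputs
w = np.random.random((10, 2))   # one column of sample weights per output
reg = sk.linear_model.LinearRegression()
reg = sk.multioutput.MultiOutputRegressor(reg)
reg.fit(X, y, w)  # desired: column k of w is used when fitting output k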

Expected Results

Perform the multiple regressions, passing one column of w as sample_weight to each call to fit on the underlying estimator.
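Until something like this exists, the expected behaviour can be approximated by hand. This is a rough sketch, assuming the underlying estimator's fit accepts a sample_weight argument (LinearRegression does). `fit_per_output` and `predict_per_output` are hypothetical helper names, not part of scikit-learn.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression


def fit_per_output(estimator, X, y, sample_weight):
    """Fit one clone of `estimator` per column of y, each with its own weights.

    Hypothetical helper, not part of scikit-learn.
    """
    y = np.asarray(y)
    sample_weight = np.asarray(sample_weight)
    if sample_weight.shape != y.shape:
        raise ValueError("sample_weight must have the same shape as y")
    return [
        clone(estimator).fit(X, y[:, k], sample_weight=sample_weight[:, k])
        for k in range(y.shape[1])
    ]


def predict_per_output(estimators, X):
    """Stack per-output predictions into an (n_samples, n_outputs) array."""
    return np.column_stack([est.predict(X) for est in estimators])


rng = np.random.default_rng(0)
X = rng.random((10, 3))
y = rng.random((10, 2))
w = rng.random((10, 2))

models = fit_per_output(LinearRegression(), X, y, w)
pred = predict_per_output(models, X)
```

This works, but it means carrying around a list of estimators instead of a single object, which is exactly the inconvenience the request is about.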

Actual Results

Not supported yet.

Versions

Python 3.6.5rc1 (default, Mar 14 2018, 06:54:23) 
[GCC 7.3.0]
NumPy 1.14.0
SciPy 0.19.1
Scikit-Learn 0.19.1
@jnothman
Member

jnothman commented Apr 3, 2018

What is the use-case?

@Celelibi
Author

Celelibi commented Apr 3, 2018

I am unsure why the specific use-case matters, but here it is.
I have 5 outputs to predict from the features. The values of those outputs follow more or less a normal distribution, and are (assumed to be) independent.
As it turns out, it's more important to correctly predict the samples whose output is far from the average, while we don't care much about being wrong on the samples whose output is close to the average.

Therefore I wanted to try weighting the samples so that the rarer they are (according to the estimated distribution), the higher the weight. I would then have 5 weight vectors, one per output.
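As an illustration of the kind of weights meant here, the sketch below uses the inverse of a per-column Gaussian density estimated from the targets themselves. Inverse density is only one possible choice and is an assumption of this example, as are the name `rarity_weights`, the normalization to a column mean of 1, and the requirement that every output column has nonzero variance.

```python
import numpy as np


def rarity_weights(y, eps=1e-12):
    """Return weights shaped like y, larger for values far from the column mean.

    Hypothetical helper: each column of y is modelled as a normal
    distribution, and a sample's weight is the inverse of its density.
    """
    y = np.asarray(y, dtype=float)
    mu = y.mean(axis=0)
    sigma = y.std(axis=0)  # assumed nonzero for every column
    z = (y - mu) / sigma
    density = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
    w = 1.0 / np.maximum(density, eps)  # clamp to avoid division by ~0
    return w / w.mean(axis=0)           # each column averages to 1


rng = np.random.default_rng(0)
y = rng.normal(size=(50, 5))  # 5 outputs, as in the use-case above
w = rarity_weights(y)         # 5 weight vectors, one per column
```

The result is a matrix with the same shape as y, which is why a matrix-valued sample_weight would be the natural way to pass it.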

@kevin1kevin1k
Contributor

I'm in the same situation, hoping for such a feature.
