MultiOutputRegressor doesn't support sparse y matrix #16686

Open
allentran opened this issue Mar 13, 2020 · 4 comments · May be fixed by #16800


allentran commented Mar 13, 2020

This snippet works fine because the sparse y is converted to a dense NumPy array before fitting.

from sklearn import linear_model, multioutput
from scipy.sparse import csr_matrix

x = csr_matrix((13, 5))
y = csr_matrix((13, 2))

m_reg = multioutput.MultiOutputRegressor(linear_model.SGDRegressor())
m_reg.fit(x, y.toarray())

The same snippet fails when the sparse y matrix is passed without converting it:

from sklearn import linear_model, multioutput
from scipy.sparse import csr_matrix

x = csr_matrix((13, 5))
y = csr_matrix((13, 2))

m_reg = multioutput.MultiOutputRegressor(linear_model.SGDRegressor())
m_reg.fit(x, y)

This is the traceback:

/apps/python3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    758                         dtype=None)
    759     else:
--> 760         y = column_or_1d(y, warn=True)
    761         _assert_all_finite(y)
    762     if y_numeric and y.dtype.kind == 'O':

/apps/python3/lib/python3.7/site-packages/sklearn/utils/validation.py in column_or_1d(y, warn)
    795         return np.ravel(y)
    796 
--> 797     raise ValueError("bad input shape {0}".format(shape))
    798 
    799 

ValueError: bad input shape ()

My reading of the source is that the line here does not convert a sparse matrix into a NumPy array. That is,

np.asarray(csr_matrix((13, 2)))

does not return a NumPy array of shape (13, 2); instead, it returns a shape () array containing the sparse matrix as its only element.
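The difference can be checked directly. This is a minimal sketch, assuming NumPy and SciPy behave as described above; the variable names `wrapped` and `dense` are only illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

y = csr_matrix((13, 2))

# np.asarray does not densify a SciPy sparse matrix:
# it wraps the matrix object itself in a 0-d object array.
wrapped = np.asarray(y)
print(wrapped.shape, wrapped.dtype)

# toarray() is the explicit conversion to a dense 2-D array.
dense = y.toarray()
print(dense.shape)
```

Any code path that relies on `np.asarray` to validate y will therefore see a scalar-like object rather than a 2-D array, which is consistent with the "bad input shape ()" error in the traceback.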

@jnothman
Member

I'm curious in what case you'd need a sparse y for regression. Could you describe your use case / application a bit more?

But yes, we could implement densifying each y as we need to...?

@allentran
Author

Basically, I have a huge sparse Y matrix as well as a sparse X (I'd been using NumPy's multi-output lstsq, but it does not work with sparse matrices and blows up in RAM). I think it is a common enough use case (or at least it appears in the docs).

I wrote up a small Keras model to do a similar thing in mini-batches, where I call y.toarray() on each batch on the fly, and I wanted to compare its performance against scikit-learn.

Converting each y as necessary would be fine as long as you don't convert the entire y at once (which obviously the user could just do on their own).
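As a rough illustration of what converting each y as necessary could look like from user code, here is a sketch that fits one estimator per output column and densifies only that column. The helper `fit_per_column` is hypothetical and is not scikit-learn's API or the change proposed in the linked PR; the random data is only there to make the example runnable.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.base import clone
from sklearn.linear_model import SGDRegressor


def fit_per_column(estimator, X, Y_sparse):
    """Hypothetical helper: fit a clone of `estimator` on each column of a sparse Y."""
    Y_csc = Y_sparse.tocsc()  # CSC makes column slicing cheap
    fitted = []
    for j in range(Y_csc.shape[1]):
        # Densify a single column only, never the whole Y at once.
        y_j = Y_csc[:, [j]].toarray().ravel()
        fitted.append(clone(estimator).fit(X, y_j))
    return fitted


# Illustrative random data with the same shapes as in the report.
X = sparse_random(13, 5, density=0.5, format="csr", random_state=0)
Y = sparse_random(13, 2, density=0.3, format="csr", random_state=1)

models = fit_per_column(SGDRegressor(random_state=0), X, Y)
coefs = np.vstack([m.coef_ for m in models])  # one row of coefficients per output
print(coefs.shape)
```

Peak extra memory here is one dense column of length n_samples at a time, rather than the full n_samples by n_outputs array.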

@jnothman
Member

jnothman commented Mar 17, 2020 via email

@allentran
Author

For context, the use case isn't prediction; rather, I'm interested in the estimated coefficients. At least for what I'm using it for, I want the beta/parameter that reflects the average effect for the cross-sectional unit, including the zero units.

The hierarchical model definitely makes sense for the prediction case though (something like a latent z that manifests as the observed y > 0 if z >= 0 else 0).
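To make the latent-variable idea concrete, here is a small simulation sketch. It assumes one particular reading of the sentence above, namely that the observed y equals z when z >= 0 and is 0 otherwise; the coefficients, noise, and sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 1000, 3

X = rng.normal(size=(n_samples, n_features))
beta = np.array([1.0, -2.0, 0.5])  # illustrative coefficients, not from the thread

# Latent response: linear signal plus noise.
z = X @ beta + rng.normal(size=n_samples)

# Observed response: the latent value where it is non-negative, zero otherwise.
y = np.where(z >= 0, z, 0.0)

zero_fraction = np.mean(y == 0)
print(f"fraction of exact zeros in y: {zero_fraction:.2f}")
```

A y generated this way has many exact zeros, which is one way a regression target can end up naturally sparse.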
