
RidgeCV triggers a call to toarray on sparse matrix input #1921

Closed
ogrisel opened this Issue · 4 comments

4 participants

@ogrisel
Owner

RidgeCV calls toarray on sparse input, which densifies the matrix and thus causes a MemoryError on high-dimensional data.

Details here: http://stackoverflow.com/a/16351308/163740
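To see why a single toarray call is enough to exhaust memory, here is a minimal sketch that builds a wide CSR matrix and compares the bytes its sparse arrays actually occupy with the bytes a float64 `X.toarray()` would need. The shape and the number of nonzeros per row are illustrative assumptions, not figures from the report, and the dense size is only computed, never allocated.

```python
import numpy as np
import scipy.sparse as sp

# Illustrative shape (an assumption, not taken from the report):
# 1,000 samples, 1,000,000 features, about 10 nonzeros per row.
n_samples, n_features, nnz_per_row = 1_000, 1_000_000, 10
rng = np.random.default_rng(0)

rows = np.repeat(np.arange(n_samples), nnz_per_row)
cols = rng.integers(0, n_features, size=n_samples * nnz_per_row)
data = np.ones(n_samples * nnz_per_row)
# Converting from COO to CSR sums any duplicate (row, col) entries.
X = sp.coo_matrix((data, (rows, cols)), shape=(n_samples, n_features)).tocsr()

# Memory actually held by the three CSR arrays.
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
# Memory X.toarray() would need for a float64 result (computed, not allocated).
dense_bytes = n_samples * n_features * X.dtype.itemsize

print(f"CSR storage: {sparse_bytes / 1e6:.2f} MB")
print(f"X.toarray(): {dense_bytes / 1e9:.1f} GB")
```

The sparse matrix fits in well under a megabyte, while its dense copy would need 8 GB.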

@mblondel
Owner

As I answered on SO, using gcv_mode="eigen" should work around the problem.
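A minimal sketch of that workaround, using synthetic sparse data (the shape, density and alpha grid are assumptions for illustration). As we understand it, the "eigen" mode decomposes the n_samples x n_samples Gram matrix X @ X.T, which can be computed from a sparse X without densifying X itself, so it stays cheap as long as n_samples is moderate.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
# Synthetic sparse data (an assumption for illustration): 200 samples, 5000 features.
X = sp.random(200, 5000, density=0.01, format="csr", random_state=0)
y = rng.standard_normal(200)

# Force the eigen mode instead of relying on gcv_mode="auto".
model = RidgeCV(alphas=(0.1, 1.0, 10.0), gcv_mode="eigen")
model.fit(X, y)
print("selected alpha:", model.alpha_)
```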

To fix the problem in scikit-learn, we need to use a proper sparse SVD when gcv_mode="svd". We can also automatically choose the "eigen" mode when gcv_mode="auto" and the data is sparse (currently "auto" uses the svd mode if n_samples > n_features, and the eigen mode otherwise).
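The proposed "auto" rule can be sketched as a small selection function. `choose_gcv_mode` is a hypothetical helper written for illustration, not scikit-learn's internal code. It keeps the existing shape-based rule for dense input and prefers "eigen" whenever X is sparse.

```python
import numpy as np
import scipy.sparse as sp


def choose_gcv_mode(X, gcv_mode="auto"):
    """Hypothetical helper sketching the proposed gcv_mode="auto" rule."""
    if gcv_mode != "auto":
        # An explicit user choice is left untouched.
        return gcv_mode
    if sp.issparse(X):
        # Proposal: sparse input always goes to the eigen mode,
        # which avoids densifying X.
        return "eigen"
    n_samples, n_features = X.shape
    # Existing behaviour for dense input.
    return "svd" if n_samples > n_features else "eigen"


print(choose_gcv_mode(sp.csr_matrix((10, 5))))  # sparse, tall: eigen
print(choose_gcv_mode(np.zeros((10, 5))))       # dense, tall: svd
```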

BTW, I didn't write this part of the code...

@larsmans referenced this issue from a commit in larsmans/scikit-learn
BUG disable memory-blowing SVD for sparse input in RidgeCV
Fixes #1921.
48ed65e
@larsmans closed this in daf5277
@bdkearns

Surely one can construct arrays that blow up the memory usage of any method? So here the "fix" was not only to make eigen the default for sparse arrays, but to force all sparse inputs to eigen and ban them from SVD entirely? That definitely broke other usage; see #2354.
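The trade-off can be made concrete with rough arithmetic. The sketch below assumes that densifying X is what blows up in the reported svd path, and that the eigen mode materialises a dense n_samples x n_samples Gram matrix. It ignores intermediate copies, so the figures are lower bounds, and the two example shapes are hypothetical. For wide data the eigen mode is far cheaper; for tall data the Gram matrix is the one that explodes.

```python
def float64_bytes(n_rows, n_cols):
    """Bytes needed by a dense float64 array of the given shape."""
    return n_rows * n_cols * 8


def gcv_memory_estimates(n_samples, n_features):
    # Rough lower bounds under the assumptions stated above:
    # a densified copy of X versus the n_samples x n_samples Gram matrix.
    return {
        "dense_X": float64_bytes(n_samples, n_features),
        "gram": float64_bytes(n_samples, n_samples),
    }


# Wide data: the Gram matrix is small (8 MB), densifying X is not (8 GB).
wide = gcv_memory_estimates(n_samples=1_000, n_features=1_000_000)
# Tall data: densifying X needs 800 MB, but the Gram matrix needs 8 TB.
tall = gcv_memory_estimates(n_samples=1_000_000, n_features=100)
print(wide)
print(tall)
```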

@ddofer

The error still occurs (latest version of sklearn, Windows 8.1, MNIST data; I don't know whether that counts as sparse or not).

@ogrisel
Owner

@ddofer MNIST is typically loaded as a dense numpy array, so this is probably a different issue. Can you please open a separate issue with a minimal reproduction script and the full traceback you observe?
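Whether data "counts as sparse" for scikit-learn depends on its container type, not on how many zeros it holds: a dense numpy array full of zeros is still dense. A minimal sketch of a check to include in a reproduction script, where `describe_input` is a hypothetical helper written for illustration:

```python
import numpy as np
import scipy.sparse as sp


def describe_input(X):
    """Hypothetical helper: report the container type and fraction of nonzeros."""
    if sp.issparse(X):
        # A scipy.sparse container; only its stored nonzeros are counted.
        density = X.nnz / (X.shape[0] * X.shape[1])
        return {"sparse": True, "density": density}
    X = np.asarray(X)
    density = np.count_nonzero(X) / X.size
    return {"sparse": False, "density": density}


print(describe_input(np.eye(4)))                 # dense, even though mostly zeros
print(describe_input(sp.csr_matrix(np.eye(4))))  # the same values, stored sparsely
```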
