RidgeCV triggers a call to toarray on sparse matrix input #1921

ogrisel opened this Issue · 4 comments



and thus causes a MemoryError on high-dimensional data.

Details here:


As I answered on SO, using gcv_mode="eigen" should work around the problem.
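A minimal sketch of that workaround, assuming a scikit-learn version whose RidgeCV accepts the gcv_mode parameter together with sparse input. The matrix shape, density, and alpha grid are made up for illustration and are not taken from the original report:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import RidgeCV

# Illustrative data only: few samples, many features, stored as sparse CSR.
rng = np.random.RandomState(0)
X = sp.random(50, 2000, density=0.01, format="csr", random_state=rng)
y = rng.randn(50)

# Explicitly request the eigen-decomposition path instead of relying on "auto".
model = RidgeCV(alphas=[0.1, 1.0, 10.0], gcv_mode="eigen")
model.fit(X, y)

print("selected alpha:", model.alpha_)
print("coefficient shape:", model.coef_.shape)
```

The eigen mode works on the n_samples x n_samples Gram matrix, so it fits this setting best when the number of samples is modest compared to the number of features.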

To fix the problem in scikit-learn, we need to use a proper sparse SVD when gcv_mode="svd". We can also automatically choose the "eigen" mode when gcv_mode="auto" and the data is sparse (currently "auto" uses the "svd" mode if n_samples > n_features, otherwise it uses the "eigen" mode).
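The proposed "auto" rule could be sketched roughly as follows. The helper name choose_gcv_mode and its arguments are hypothetical and are not scikit-learn API; the sketch only encodes the selection logic described above and does not address the separate need for a sparse SVD when "svd" is requested explicitly:

```python
def choose_gcv_mode(n_samples, n_features, is_sparse, gcv_mode="auto"):
    """Hypothetical helper: pick the GCV decomposition mode.

    Not part of scikit-learn; it only illustrates the rule proposed here.
    """
    # An explicit user choice is passed through unchanged.
    if gcv_mode in ("svd", "eigen"):
        return gcv_mode
    if gcv_mode != "auto":
        raise ValueError("gcv_mode must be 'auto', 'svd' or 'eigen'")
    # Proposed change: sparse input always takes the eigen path under "auto".
    if is_sparse:
        return "eigen"
    # Behavior described for dense data: svd when there are more samples
    # than features, eigen otherwise.
    return "svd" if n_samples > n_features else "eigen"


print(choose_gcv_mode(10000, 50, is_sparse=False))
print(choose_gcv_mode(10000, 50, is_sparse=True))
```

With this rule, only users who pass gcv_mode="svd" explicitly would still reach the SVD path on sparse data.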

BTW, I didn't write this part of the code...

larsmans referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
larsmans referenced this issue from a commit in larsmans/scikit-learn
BUG disable memory-blowing SVD for sparse input in RidgeCV
Fixes #1921.
larsmans closed this in daf5277

Certainly one can construct arrays that blow up the memory usage of any method? So here the 'fix' was not only to make eigen the default for sparse arrays, but to force all sparse inputs to eigen and actually ban them from SVD? That definitely broke other usage, see #2354.


The error is still occurring. (Latest version of sklearn, Windows 8.1, MNIST data [I don't know if that counts as sparse or not].)


@ddofer MNIST is typically loaded as a dense numpy array. This is probably a different issue. Can you please open a separate issue with a minimal reproduction script and the full traceback you observe?
