RidgeCV triggers a call to toarray on sparse matrix input #1921

Closed
ogrisel opened this Issue May 3, 2013 · 4 comments

Owner

ogrisel commented May 3, 2013

and thus causes a MemoryError on high dimensional data.

Details here: http://stackoverflow.com/a/16351308/163740

Owner

mblondel commented May 3, 2013

As I answered on SO, using gcv_mode="eigen" should work around the problem.
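
A minimal sketch of that workaround, assuming a recent scikit-learn and SciPy. The data is synthetic, and the shapes, density and alpha grid are illustrative choices, not values from the original report:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import RidgeCV

# Synthetic sparse design matrix with n_samples > n_features, the case in
# which gcv_mode="auto" picked the "svd" path at the time of this issue.
X = sp.random(200, 50, density=0.05, format="csr", random_state=0)
y = np.random.RandomState(0).randn(200)

alphas = [0.1, 1.0, 10.0]
# Request the eigendecomposition-based mode explicitly instead of "auto".
model = RidgeCV(alphas=alphas, gcv_mode="eigen").fit(X, y)

print(model.alpha_, model.coef_.shape)
```

Note that the eigen mode works with an n_samples x n_samples matrix, so it stays practical only while the number of samples is moderate.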

To fix the problem in scikit-learn, we need to use a proper sparse SVD when gcv_mode="svd". We could also automatically choose the "eigen" mode when gcv_mode="auto" and the data is sparse (currently "auto" uses the "svd" mode if n_samples > n_features, and the "eigen" mode otherwise).
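
The selection rule suggested here could be sketched roughly as follows. `choose_gcv_mode` and its arguments are hypothetical names used only for illustration; this is not the code that scikit-learn actually contains:

```python
def choose_gcv_mode(n_samples, n_features, is_sparse, gcv_mode="auto"):
    """Hypothetical helper: resolve "auto" to a concrete GCV mode."""
    if gcv_mode != "auto":
        # An explicit user choice is returned unchanged.
        return gcv_mode
    if is_sparse:
        # Proposed behaviour: sparse data never takes the "svd" path.
        return "eigen"
    # Rule described for "auto": svd when there are more samples than features.
    return "svd" if n_samples > n_features else "eigen"


print(choose_gcv_mode(1000, 20, is_sparse=False))
print(choose_gcv_mode(1000, 20, is_sparse=True))
print(choose_gcv_mode(20, 1000, is_sparse=False))
```

With this rule, only dense inputs with more samples than features end up in the "svd" mode when the user leaves gcv_mode at "auto".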

BTW, I didn't write this part of the code...

larsmans added a commit to larsmans/scikit-learn that referenced this issue May 3, 2013

larsmans closed this in daf5277 May 3, 2013

Contributor

bdkearns commented Oct 26, 2013

Certainly one can construct arrays that blow up any method in memory usage? So here the 'fix' was to not only make eigen the default for sparse arrays but to force all sparse inputs to eigen and actually ban them from SVD? That definitely broke other usage, see #2354

ddofer commented Apr 24, 2015

Error is still occurring. (Latest version of sklearn, Windows 8.1, MNIST data [Don't know if that counts as sparse or not])

Owner

ogrisel commented Apr 24, 2015

@ddofer MNIST is typically loaded as a dense numpy array. This is probably a different issue. Can you please open a separate issue with a minimalistic reproduction script and the full traceback you observe?
