RidgeCV triggers a call to toarray on sparse matrix input #1921
Comments
As I answered on SO, using gcv_mode="eigen" should work around the problem. To fix the problem in scikit-learn, we need to use a proper sparse SVD when gcv_mode="svd". We could also automatically choose the "eigen" mode when gcv_mode="auto" and the data is sparse (currently "auto" uses the "svd" mode if n_samples > n_features, and the "eigen" mode otherwise). BTW, I didn't write this part of the code...
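To make the proposed "auto" rule concrete, here is a minimal sketch of the mode selection described above. The helper name `choose_gcv_mode` and its arguments are hypothetical and are not part of scikit-learn; this illustrates the suggestion, not the library's actual implementation.

```python
def choose_gcv_mode(gcv_mode, n_samples, n_features, is_sparse):
    """Hypothetical helper: resolve the GCV mode as suggested in this thread.

    Assumption: an explicit "svd" or "eigen" request is passed through
    unchanged; only "auto" is resolved here.
    """
    if gcv_mode in ("svd", "eigen"):
        return gcv_mode
    if gcv_mode != "auto":
        raise ValueError("unknown gcv_mode: %r" % (gcv_mode,))
    if is_sparse:
        # Proposed change: prefer "eigen" for sparse input so that the
        # dense SVD path (and its toarray call) is not taken.
        return "eigen"
    # Behaviour described as current for "auto":
    # "svd" if n_samples > n_features, otherwise "eigen".
    return "svd" if n_samples > n_features else "eigen"


if __name__ == "__main__":
    print(choose_gcv_mode("auto", 1000, 50, is_sparse=False))
    print(choose_gcv_mode("auto", 1000, 50, is_sparse=True))
```

Until such a rule exists in the library, the workaround is to pass gcv_mode="eigen" to RidgeCV explicitly when fitting on sparse input.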
Certainly one can construct arrays that blow up any method in memory usage? So here the 'fix' was to not only make eigen the default for sparse arrays but to force all sparse inputs to eigen and actually ban them from SVD? That definitely broke other usage, see #2354
The error is still occurring. (Latest version of sklearn, Windows 8.1, MNIST data [I don't know if that counts as sparse or not].)
@ddofer MNIST is typically loaded as a dense numpy array. This is probably a different issue. Can you please open a separate issue with a minimal reproduction script and the full traceback you observe?
and thus causes a MemoryError on high-dimensional data. Details here: http://stackoverflow.com/a/16351308/163740