Support large sparse matrices in SGD* and SequentialDataset #11355

Open
jnothman opened this issue Jun 25, 2018 · 2 comments
Labels
Enhancement, help wanted, Moderate (anything that requires some knowledge of conventions and best practices), module:linear_model

Comments

@jnothman
Member

Support for sparse matrices having indices, indptr, row or col attributes with int64 dtype was recently added or confirmed for most sparse-supporting estimators.

SGDClassifier and SGDRegressor do not yet support such large sparse matrices (they use accept_large_sparse=False). We could try to fix this.
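To make "large" concrete: a CSR matrix needs int64 index arrays roughly when its number of columns or its number of stored values no longer fits in a signed 32-bit integer. The sketch below is only an illustration of that threshold. `needs_int64_indices` is a hypothetical helper, not a scikit-learn or SciPy function, and the exact rule a library applies may differ.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def needs_int64_indices(n_cols, nnz):
    """Hypothetical check: can a CSR matrix of this size use int32 indices?

    In CSR, `indices` stores column numbers (bounded by n_cols) and
    `indptr` stores offsets into the data array (bounded by nnz).
    If either bound exceeds INT32_MAX, 64-bit indices are required.
    """
    return max(n_cols, nnz) > INT32_MAX


# A typical text-classification matrix fits comfortably in int32.
print(needs_int64_indices(n_cols=100_000, nnz=5_000_000))

# More than 2**31 - 1 stored values forces int64 `indptr` entries.
print(needs_int64_indices(n_cols=100_000, nnz=3_000_000_000))
```

Matrices in the second category are the ones an estimator rejects when it validates its input with accept_large_sparse=False.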

@jnothman added the Enhancement, Moderate and help wanted labels on Jun 25, 2018
@TomDLT
Member

TomDLT commented Jul 4, 2018

Supporting int64 indices in sparse SGD mainly comes down to updating CSRDataset.
As Cython fused types do not work with class attributes, this will need to use the same template workaround as in #11155, which might become quite heavy.
I wonder whether there is a cleaner way based on fused types, maybe by dropping SequentialDataset.
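A rough sketch of why a template-based approach can become heavy: each combination of value dtype and index dtype needs its own generated class. The example uses Python's `string.Template` as a stand-in for a real templating engine. The class names, the attribute names, and the dtype lists are assumptions for illustration, and this is not the code from #11155.

```python
from itertools import product
from string import Template

# Hypothetical Cython class skeleton; one copy is generated per dtype pair.
CLASS_TEMPLATE = Template(
    "cdef class CSRDataset${suffix}:\n"
    "    cdef ${float_t} *X_data_ptr\n"
    "    cdef ${index_t} *X_indptr_ptr\n"
    "    cdef ${index_t} *X_indices_ptr\n"
)

# Assumed dtype choices: name suffix -> C type.
FLOAT_TYPES = {"64": "double", "32": "float"}
INDEX_TYPES = {"Int32": "int", "Int64": "long long"}


def render_variants():
    """Return a dict mapping each generated class name to its source text."""
    variants = {}
    for (f_name, float_t), (i_name, index_t) in product(
        FLOAT_TYPES.items(), INDEX_TYPES.items()
    ):
        suffix = f_name + i_name
        variants["CSRDataset" + suffix] = CLASS_TEMPLATE.substitute(
            suffix=suffix, float_t=float_t, index_t=index_t
        )
    return variants


variants = render_variants()
print(sorted(variants))
print(variants["CSRDataset64Int64"])
```

With two value dtypes and two index dtypes this already produces four classes, and every caller has to pick the right one at run time.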

@rth
Member

rth commented Mar 26, 2019

> As Cython fused types do not work with class attributes, this will need to use the same template workaround as in #11155, which might become quite heavy.

That, or maybe some void pointer arithmetic (cf. http://blog.yclin.me/deep/learning/2016/08/08/Fused-Types-Limitation/). Neither seems ideal.
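The void pointer idea can be illustrated in plain Python with `ctypes`: a single class stores an untyped base address plus the item size, and computes element addresses by hand instead of relying on a typed attribute. This is only a sketch of the general technique under that assumption. `IndexBuffer` is a hypothetical name, and a real implementation would live in Cython or C.

```python
import ctypes


class IndexBuffer:
    """Hypothetical type-erased view over an int32 or int64 index array."""

    def __init__(self, values, use_int64):
        # The element type is chosen at run time, not baked into the class.
        self._ctype = ctypes.c_int64 if use_int64 else ctypes.c_int32
        # Keep a reference to the array so the memory stays alive.
        self._array = (self._ctype * len(values))(*values)
        # Untyped base address, playing the role of a `void *`.
        self._base = ctypes.addressof(self._array)
        self._itemsize = ctypes.sizeof(self._ctype)
        self._n = len(values)

    def get(self, i):
        """Read element i via manual address arithmetic."""
        if not 0 <= i < self._n:
            raise IndexError(i)
        address = self._base + i * self._itemsize
        return self._ctype.from_address(address).value


small = IndexBuffer([0, 3, 7], use_int64=False)
large = IndexBuffer([0, 2**40], use_int64=True)
print(small.get(2), large.get(1))
```

The cost is visible even here: every access goes through a run-time item size and an explicit bounds check, and the compiler can no longer verify the element type.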

Also, there is actually an existing PR for this issue, #6889. The problem was also previously reported in #5776.

4 participants