Fast NMF algorithm for dense and sparse data #896

Closed
mblondel opened this issue Jun 8, 2012 · 8 comments

@mblondel
Member

mblondel commented Jun 8, 2012

Here's an algorithm which I think would be a good candidate for inclusion in scikit-learn:

http://www.cs.utexas.edu/~cjhsieh/nmf/

@ogrisel
Member

ogrisel commented Jun 8, 2012

I would prefer an algorithm that scales with n_samples rather than n_features. Maybe averaged SGD optimization of the NMF cost function, plus projections onto the non-negative orthant?
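One possible reading of this suggestion, as a minimal dense sketch: stream over the samples, encode each one against the current dictionary, take a gradient step on the squared reconstruction error, clip to non-negative values, and keep a running average of the iterates. The function names, the learning rate, and the inner encoding loop are all assumptions made for illustration; nothing here comes from the linked paper or from scikit-learn.

```python
import numpy as np


def encode(x, H, n_inner=10):
    """Approximate non-negative code w for one sample, minimizing 0.5 * ||x - w @ H||^2."""
    HHt = H @ H.T
    Hx = H @ x
    # trace(HHt) upper-bounds the largest eigenvalue of the PSD matrix HHt,
    # so 1 / trace is a safe (conservative) step size.
    step = 1.0 / (np.trace(HHt) + 1e-12)
    w = np.zeros(H.shape[0])
    for _ in range(n_inner):
        # Projected gradient step: gradient step, then clip at zero.
        w = np.maximum(0.0, w - step * (HHt @ w - Hx))
    return w


def nmf_projected_asgd(X, n_components, n_epochs=20, lr=0.01, seed=0):
    """Hypothetical averaged projected SGD for the dictionary H (dense X only)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    H = rng.random((n_components, n_features))
    H_avg = H.copy()
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):
            w = encode(X[i], H)
            residual = w @ H - X[i]
            # Gradient of 0.5 * ||x - w @ H||^2 w.r.t. H is outer(w, residual).
            H = np.maximum(0.0, H - lr * np.outer(w, residual))
            t += 1
            # Running mean of the projected iterates.
            H_avg += (H - H_avg) / (t + 1)
    return H_avg
```

The cost of each step depends on n_features and n_components but not on n_samples, which is the scaling property asked for. The sketch updates the full dense H on every step, so it says nothing about whether averaging can be made efficient for sparse data.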

@mblondel
Member Author

mblondel commented Jun 9, 2012

Their algorithm didn't give me the impression that it scales poorly with n_samples (the datasets they use in their experiments are pretty large). Variable selection also seems to accelerate convergence a lot.

BTW, the tricks necessary for efficient implementation of averaging in the sparse case may not be applicable if there's a projection step (to be verified).

@ogrisel
Member

ogrisel commented Jun 9, 2012

OK, interesting. The code seems simple enough to implement, too.

@mblondel
Member Author

I added a preliminary implementation of this method here:
https://gist.github.com/mblondel/09648344984565f9477a

One difference is that my implementation uses cyclic coordinate selection instead of greedy selection.

CC @vene @larsmans
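To illustrate what cyclic coordinate selection means here, the following is a minimal NumPy sketch of coordinate descent for the squared-error NMF objective 0.5 * ||X - WH||_F^2 with W, H >= 0. It is not the code from the gist; the function names and the random initialization are assumptions made for illustration. Each coordinate is visited in a fixed order and set to its exact one-dimensional minimizer, clipped at zero, whereas a greedy scheme would pick the coordinate with the largest expected decrease first.

```python
import numpy as np


def _update_factor(W, HHt, XHt):
    """One cyclic pass over the columns of W, in place.

    Given H, the objective separates over the rows of W, so updating
    column k for all rows at once is the exact coordinate update of
    the k-th coordinate of every row.
    """
    for k in range(W.shape[1]):
        if HHt[k, k] == 0.0:
            continue  # component k is empty; the gradient is zero too
        # Partial derivative w.r.t. W[:, k], using the freshest W.
        grad = W @ HHt[:, k] - XHt[:, k]
        # Exact 1-D minimizer (Newton step), projected onto w >= 0.
        W[:, k] = np.maximum(0.0, W[:, k] - grad / HHt[k, k])
    return W


def nmf_cyclic_cd(X, n_components, n_iter=100, seed=0):
    """Hypothetical helper: alternate cyclic passes over W and H."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components))
    H = rng.random((n_components, n_features))
    for _ in range(n_iter):
        _update_factor(W, H @ H.T, X @ H.T)
        # H.T is a view, so the in-place update also modifies H.
        _update_factor(H.T, W.T @ W, X.T @ W)
    return W, H
```

Because every update is an exact minimization over one coordinate, the objective cannot increase from one pass to the next. The data matrix only enters through the products X @ H.T and X.T @ W, which is what makes this family of updates attractive for sparse input.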

@mblondel
Member Author

I obtained a 20x speedup by compiling the most computationally expensive part with Numba. Computing the NMF on the full News20 dataset now takes 10 seconds.

https://gist.github.com/mblondel/09648344984565f9477a

@amueller
Member

Should we close this in favor of #4811?

@mblondel
Member Author

Let's close it when #4852 is merged :)

@amueller
Member

#4852 is merged now :)
