[MRG+1] Add multiplicative-update solver in NMF, with all beta-divergence #5295

Merged
merged 35 commits into scikit-learn:master on Dec 12, 2016

Conversation

TomDLT
Member

@TomDLT TomDLT commented Sep 22, 2015

This PR is the second part of what has been discussed in #4811 (the first part, #4852, has been merged).
It includes:

  • a Multiplicative-Update solver in NMF (a usage sketch follows this list). This solver is generally slower than the Coordinate-Descent solver, but it handles all beta-divergences (including Frobenius, generalized Kullback-Leibler and Itakura-Saito).
  • a plot to visualize the beta-divergence for several values of beta.
  • benchmarks below
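A minimal usage sketch of the new solver, assuming the `solver='mu'` and `beta_loss` parameter names shown in the rendered doc (illustrative only, not the exact code in the PR):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.RandomState(0)
X = np.abs(rng.randn(20, 10))  # NMF needs a nonnegative input matrix

# Multiplicative-update solver with the generalized Kullback-Leibler loss;
# beta_loss may also be 'frobenius', 'itakura-saito', or any float beta.
model = NMF(n_components=5, solver='mu', beta_loss='kullback-leibler',
            max_iter=500, random_state=0)
W = model.fit_transform(X)
H = model.components_
print(model.reconstruction_err_)
```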

Link to the rendered doc


[Figure: plot_beta_divergence.py, the beta-divergence for several values of beta]



@mblondel
Member

I would favor using CD for beta divergences as well. We decided to remove the slow PG solver, and I would prefer not to add a new slow solver if possible. The KDD paper shows that for generalized KL, we can just use the element-wise Newton update without line search. For other divergences this will need some verification (unless there are more recent works that cover this?), but in the worst case we can implement a simple line search. As a start, it would be nice to implement the CD update for generalized KL to compare.
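For reference, a rough dense-NumPy sketch of such an element-wise Newton update for the generalized KL divergence (no line search); the function name and structure are illustrative, not code from this PR or the KDD paper:

```python
import numpy as np

def newton_cd_update_W(X, W, H, eps=1e-12):
    """One sweep of element-wise Newton coordinate descent on W for the
    generalized KL divergence D(X || WH). Illustrative sketch only."""
    WH = W @ H
    n_samples, n_components = W.shape
    for k in range(n_components):
        for i in range(n_samples):
            ratio = X[i] / np.maximum(WH[i], eps)
            # First and second derivatives of D(X || WH) w.r.t. W[i, k]
            grad = np.dot(H[k], 1.0 - ratio)
            hess = np.dot(H[k] ** 2, ratio / np.maximum(WH[i], eps))
            if hess <= 0:
                continue
            new_w = max(0.0, W[i, k] - grad / hess)
            # Keep the reconstruction in sync with the coordinate change
            WH[i] += (new_w - W[i, k]) * H[k]
            W[i, k] = new_w
    return W, WH
```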

@tttthomasssss
Contributor

Could anybody comment on when this is planned to be merged into master? I'm quite keen on using NMF with generalized KL divergence.

@agramfort
Member

agramfort commented Oct 21, 2015 via email

@TomDLT
Member Author

TomDLT commented Nov 10, 2015

Sorry for the long silence, I do not have as much time for this as before.
@mblondel I spent some time trying to implement the coordinate descent for the generalized Kullback-Leibler divergence, as described in Coordinate descent with greedy selection, at least in the dense case.
However, I did not manage to be faster than multiplicative update, which is not what is indicated in the paper, and probably highlights my own limits (or proves that my implementation of multiplicative update is very good :)). More seriously, even if we manage to be faster, the extension to all beta-divergences is not trivial and would require much more work.

I understand that multiplicative update is an old method and probably not the fastest, yet it is still a good way to improve NMF in scikit-learn, extending it to many different loss functions.

@TomDLT
Member Author

TomDLT commented Nov 19, 2015

I compared my implementation of multiplicative update with #2540 (Frobenius norm) and #1348 (KL divergence).


Compared to #2540, the results are identical, up to some slight modifications.
To avoid division by zero, which option do we prefer? (A small sketch of the three options follows this list.)

  1. A / (B + eps) (like in [WIP] NMF estimator based on the Lee and Seung algorithm #2540)
  2. (A + eps) / (B + eps) (like in NMF for Kullback-Leibler divergence #1348)
  3. B[B == 0] = eps then A / B
    I have a preference for the last one, since it does not affect updates with nonzero elements. WDYT?
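A quick illustration of the three options on a toy example (the arrays here are just placeholders):

```python
import numpy as np

eps = np.finfo(np.float64).eps
A = np.array([1.0, 2.0, 3.0])
B = np.array([0.0, 2.0, 4.0])

opt1 = A / (B + eps)             # 1. shifts every denominator
opt2 = (A + eps) / (B + eps)     # 2. shifts numerator and denominator
B_safe = B.copy()
B_safe[B_safe == 0] = eps        # 3. only touches the zero denominators,
opt3 = A / B_safe                #    so nonzero updates are unchanged
```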

Compared to #1348, the results are different since that implementation forgets a division in the update
(see Eq. 5 here) and adds a normalization of H.

@TomDLT
Member Author

TomDLT commented Jan 22, 2016

Some benchmarking:
[Benchmark figures for four datasets:]
  • 20_news, sparse (11314, 39116)
  • Faces, dense (400, 4096)
  • MNIST, dense (70000, 784)
  • RCV1, sparse (804414, 47236)

The MU solver is not very fast, but it was implemented not for its speed but because it handles all beta-divergences. Note that the poor performance with 'nndsvd' initialization (top-left) is expected, since this initialization produces a lot of zeros that cannot be modified by a multiplicative update (see the small sketch below).
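A tiny sketch of why those zeros stay frozen: a multiplicative update only rescales each entry, so an entry initialized at zero remains zero (the numerator/denominator values below are placeholders):

```python
import numpy as np

W = np.array([[0.0, 0.5],
              [1.0, 2.0]])           # zeros as an 'nndsvd'-like init would produce
numerator = np.full_like(W, 3.0)     # stand-in for what the MU rule computes
denominator = np.full_like(W, 2.0)

W *= numerator / denominator         # multiplicative update: W[0, 0] stays 0.0
print(W)
```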

@TomDLT TomDLT changed the title [WIP] Add multiplicative-update solver in NMF, with all beta-divergence [MRG] Add multiplicative-update solver in NMF, with all beta-divergence Jan 22, 2016
@tguillemot
Contributor

@TomDLT @mblondel @agramfort Is this PR mergeable?
I will do a first round of review, but I don't know these methods. Can someone have a look with me?


return timeset, err
import matplotlib.pyplot as plt
import pandas
Contributor

Can we use pandas for benchmarks?

Member Author

Using it in a benchmark does not make it a dependency of the package, just like matplotlib.

@agramfort
Member

@mblondel what's your take on this PR?

@mblondel
Member

I spent some time trying to implement the coordinate descent for the generalized Kullback-Leibler divergence, as described in Coordinate descent with greedy selection, at least in the dense case.

Thanks for the investigation.

More seriously, even if we manage to be faster, the extension to all beta-divergences is not trivial and would require much more work.

Agreed.

@mblondel what's your take on this PR?

+1 on my side and sorry for the late reply. As a big fan of CD, I would be potentially interested in the CD code for GKL divergence as it could be faster in the sparse case.

@agramfort
Member

agramfort commented Aug 20, 2016 via email

@TomDLT
Member Author

TomDLT commented Aug 22, 2016

let's make it a WIP->MRG then

It is; all reviews will be greatly appreciated.

However, NMF can also be used with a different function to measure the
distance between X and the matrix product WH. Another typical distance
function used in NMF is the (generalized) Kullback-Leibler (KL) divergence,
also referred as I-divergence:
Contributor

I would change the formulation a bit:
"Other distance functions can be used in NMF, for example the (generalized) Kullback-Leibler (KL) divergence, also referred to as I-divergence:

.. math::
    d_{KL}(X, Y) = \sum_{i,j} (X_{ij} \log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})

Or the Itakura-Saito (IS) divergence:"
See if you prefer.
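For reference, the Itakura-Saito divergence mentioned above is commonly written as:

.. math::
    d_{IS}(X, Y) = \sum_{i,j} (\frac{X_{ij}}{Y_{ij}} - \log(\frac{X_{ij}}{Y_{ij}}) - 1)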

@TomDLT
Member Author

TomDLT commented Oct 26, 2016

Done

@amueller
Member

Thanks. Hm, @mblondel, I saw a +1 from you up there, but that wasn't a full review, was it? @agramfort, did you review? I see a +1 from @ogrisel.

@agramfort
Member

agramfort commented Oct 27, 2016 via email

@VictorBst

Hi, @TomDLT and @agramfort invited me to have a look at the code.

I mostly looked at the math aspects of the code and they seem to be correct. After quick experiments on my data, everything seems to work as intended and gives similar results to other simple NMF Python codes I tested. As a side note, this one was always either on par with or the fastest in my tests, especially for higher dimensions.

@amueller
Member

Hm, I'm not sure if this should be merged before #7927; there are some conflicts, I think. Opinions, @TomDLT?

@TomDLT
Member Author

TomDLT commented Nov 29, 2016

OK, let's wait for #7927.

@ogrisel
Member

ogrisel commented Dec 12, 2016

@TomDLT now that #7927 has been merged, this needs a rebase.

@TomDLT
Member Author

TomDLT commented Dec 12, 2016

Done

@ogrisel ogrisel merged commit ae4f710 into scikit-learn:master Dec 12, 2016
@ogrisel
Member

ogrisel commented Dec 12, 2016

Thanks @TomDLT (and @VictorBst for the review). I squash-merged. 🍻

@TomDLT
Member Author

TomDLT commented Dec 12, 2016

🍾 Cool! Thanks a lot for the reviews! 🍾

@tguillemot
Contributor

Yeah !!!

@amueller
Member

Awesome! Congrats!

@TomDLT TomDLT deleted the nmf_mu branch December 20, 2016 14:19
sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017