
# [MRG+1] Add multiplicative-update solver in NMF, with all beta-divergence #5295

Merged
merged 35 commits into scikit-learn:master from TomDLT:nmf_mu Dec 12, 2016

## Conversation

### TomDLT commented Sep 22, 2015 • edited

This PR is the second part of what has been discussed in #4811 (first part, #4852, merged). It includes:

- a multiplicative-update solver in NMF. This solver is generally slower than the coordinate-descent solver, but it handles all beta-divergences (including Frobenius, generalized Kullback-Leibler and Itakura-Saito);
- a plot to visualize the beta-divergence for several values of beta;
- benchmarks below.

Link to the rendered doc: plot_beta_divergence.py
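For reference, the beta-divergence family this PR implements (following [Fevotte, 2011], cited in the docs below) is usually stated element-wise as:

```latex
d_\beta(x, y) =
  \begin{cases}
    \frac{1}{\beta(\beta - 1)}\left(x^\beta + (\beta - 1)\, y^\beta - \beta\, x\, y^{\beta - 1}\right) & \beta \notin \{0, 1\} \\
    x \log\frac{x}{y} - x + y & \beta = 1 \ \text{(generalized Kullback-Leibler)} \\
    \frac{x}{y} - \log\frac{x}{y} - 1 & \beta = 0 \ \text{(Itakura-Saito)}
  \end{cases}
```

summed over all entries of X and WH; beta = 2 recovers (half) the squared Frobenius norm.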
mentioned this pull request Sep 22, 2015

### mblondel commented Sep 23, 2015

I would favor using CD for beta-divergences as well. We decided to remove the slow PG solver, and I would prefer not to add a new slow solver if possible. The KDD paper shows that for generalized KL, we can just use the element-wise Newton update without line search. For other divergences, this will need some verification (unless there are more recent works that cover this?), but in the worst case we can implement a simple line search. As a start, it would be nice to implement the CD update for generalized KL to compare.
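For context on the element-wise Newton update mentioned here (from the KDD coordinate-descent paper), one coordinate step for the generalized KL loss D(X || WH) can be sketched as below. This is an illustrative reconstruction, not code from this PR or the paper; `newton_update_Hkj` and the cached `WH` argument are my own names.

```python
import numpy as np

def newton_update_Hkj(X, W, H, WH, k, j, eps=1e-12):
    """One element-wise Newton step on H[k, j] for D(X || WH)
    with the generalized KL divergence, clipped to stay >= 0.
    Illustrative sketch only (dense case, no greedy selection)."""
    w = W[:, k]
    y = np.maximum(WH[:, j], eps)          # current model column, guarded
    ratio = X[:, j] / y
    grad = w.sum() - w @ ratio             # first derivative w.r.t. H[k, j]
    hess = (w ** 2) @ (ratio / y)          # second derivative, > 0 when X > 0
    h_new = max(0.0, H[k, j] - grad / max(hess, eps))  # Newton step, no line search
    WH[:, j] += (h_new - H[k, j]) * w      # keep the cached product consistent
    H[k, j] = h_new
```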

### tttthomasssss commented Oct 20, 2015

 Could anybody comment on when this is planned to be merged into master? I'm quite keen on using NMF with generalized KL divergence.

### agramfort commented Oct 21, 2015

@mblondel maybe we can do better than multiplicative updates, but I think we should be pragmatic here. We can deprecate it later once we have something better. WDYT?
mentioned this pull request Nov 10, 2015

### TomDLT commented Nov 10, 2015 • edited

Sorry for the long silence, I do not have as much time for this as before. @mblondel I spent some time trying to implement coordinate descent for the generalized Kullback-Leibler divergence, as described in "Coordinate descent with greedy selection", at least in the dense case. However, I did not manage to be faster than multiplicative update, which is not what the paper indicates, and probably underlines my own limits (or proves that my implementation of multiplicative update is very good :)). More seriously, even if we managed to be faster, the extension to all beta-divergences is not trivial and would require much more work. I understand that multiplicative update is an old method and probably not the fastest, yet it is still a good way to improve the NMF in scikit-learn, extending it to a lot of different loss functions.

### TomDLT commented Nov 19, 2015 • edited

I compared my implementation of multiplicative update with #2540 (Frobenius norm) and #1348 (KL divergence).

Comparing to #2540, the results are identical, provided some slight modifications. To avoid division by zero, which do we prefer?

- `A / (B + eps)` (like in #2540)
- `(A + eps) / (B + eps)` (like in #1348)
- `B[B == 0] = eps`, then `A / B`

I have a preference for the last one, since it does not affect updates with nonzero elements. WDYT?

Comparing to #1348, the results are different since it forgets a division in the update (see Eq. 5 here) and adds a normalization of H.
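The three options, made concrete in NumPy (a minimal sketch; `A` stands for the update's numerator and `B` for its denominator):

```python
import numpy as np

eps = np.finfo(np.float64).eps
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[2.0, 0.0], [1.0, 4.0]])  # one zero denominator

# Option 1 (#2540): shift only the denominator; slightly biases every ratio.
opt1 = A / (B + eps)

# Option 2 (#1348): shift both; ratios with nonzero B are still perturbed.
opt2 = (A + eps) / (B + eps)

# Option 3 (preferred above): patch only the zero denominators, so updates
# with nonzero elements are left exactly unchanged.
B_safe = B.copy()
B_safe[B_safe == 0] = eps
opt3 = A / B_safe
```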

### TomDLT commented Jan 22, 2016 • edited

Some benchmarking on four datasets:

- 20_news: sparse (11314, 39116)
- Faces: dense (400, 4096)
- MNIST: dense (70000, 784)
- RCV1: sparse (804414, 47236)

The MU solver is not very fast, but it was implemented not for its speed but because it handles all beta-divergences. Note that the poor performance with 'nndsvd' initialization (top-left) is expected, since this initialization produces a lot of zeros that cannot be modified by a multiplicative update.
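Why 'nndsvd' interacts badly with MU: multiplicative updates rescale each entry pointwise, so an entry initialized to exactly zero stays zero forever. A minimal demonstration with the classical Lee-Seung Frobenius update (not this PR's exact code):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(6, 5)
W = rng.rand(6, 3)
H = rng.rand(3, 5)
H[0, 0] = 0.0  # e.g. a zero produced by 'nndsvd' initialization

for _ in range(100):
    # multiplicative update for H (Frobenius loss): H *= (W^T X) / (W^T W H)
    H *= (W.T @ X) / (W.T @ W @ H)

print(H[0, 0])  # prints 0.0: the zero is never revived
```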
changed the title [WIP] Add multiplicative-update solver in NMF, with all beta-divergence [MRG] Add multiplicative-update solver in NMF, with all beta-divergence Jan 22, 2016
added the label Apr 26, 2016

### tguillemot commented Aug 18, 2016

@TomDLT @mblondel @agramfort Is this PR mergeable? I will do a first round of review, but I don't know these methods. Can someone have a look with me?
```python
    return timeset, err

import matplotlib.pyplot as plt
import pandas
```

#### tguillemot Aug 18, 2016 Contributor

Can we use pandas for benchmarks?

#### TomDLT Aug 22, 2016 Author Member

Using it in a benchmark does not make it a dependency of the package, just like matplotlib.

```python
    If beta == 1, this is the generalized Kullback-Leibler divergence
    If beta == 0, this is the Itakura-Saito divergence
    Else, this is the general beta-divergence.
    """
```

#### tguillemot Aug 18, 2016 • edited Contributor

Can you specify what X, W, H and beta are, with the classical docstring?
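Something like the usual NumPy-style parameter section would answer this; a hypothetical sketch (the function name and exact shapes are inferred, not taken from the PR):

```python
def beta_divergence(X, W, H, beta):
    """Compute the beta-divergence between X and the product WH.

    Parameters
    ----------
    X : array-like or sparse matrix, shape (n_samples, n_features)
        Data matrix to be factorized.
    W : array-like, shape (n_samples, n_components)
        First factor of the decomposition.
    H : array-like, shape (n_components, n_features)
        Second factor of the decomposition.
    beta : float
        Parameter of the beta-divergence: beta == 2 gives the squared
        Frobenius norm, beta == 1 the generalized Kullback-Leibler
        divergence, and beta == 0 the Itakura-Saito divergence.
    """
```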

```python
    # np.sum(np.dot(W, H) ** beta)
    sum_WH_beta = 0
    for i in range(X.shape[1]):
        sum_WH_beta += np.sum(fast_dot(W, H[:, i]) ** beta)
```

#### tguillemot Aug 18, 2016 Contributor

Is there a better option?

#### TomDLT Aug 18, 2016 • edited Author Member

This is a temporary fix to be memory-efficient in the sparse case: we cannot compute `np.dot(W, H)` directly, since the dense product has the same shape as X (for RCV1, 804414 x 47236 float64 entries, i.e. roughly 300 GB).

We could write Cython code for it, yet the use case is rather limited: sparse X, and beta not in [0, 1, 2] (which are the most useful cases). So I propose to merge it as it is and improve it in a separate PR.

Good for me :)

### agramfort commented Aug 20, 2016

 @mblondel what's your take on this PR?

### mblondel commented Aug 20, 2016

> I spent some time trying to implement coordinate descent for the generalized Kullback-Leibler divergence, as described in "Coordinate descent with greedy selection", at least in the dense case.

Thanks for the investigation.

> More seriously, even if we managed to be faster, the extension to all beta-divergences is not trivial and would require much more work.

Agreed.

> @mblondel what's your take on this PR?

+1 on my side, and sorry for the late reply. As a big fan of CD, I would be potentially interested in the CD code for the GKL divergence, as it could be faster in the sparse case.

### agramfort commented Aug 20, 2016

Cool, looks like we can proceed then :) Let's make it WIP -> MRG.

### TomDLT commented Aug 22, 2016 • edited

> let's make it a WIP->MRG then

It is, and all reviews will be greatly appreciated.
```rst
However, NMF can also be used with a different function to measure the
distance between X and the matrix product WH. Another typical distance
function used in NMF is the (generalized) Kullback-Leibler (KL)
divergence, also referred to as I-divergence:
```

#### tguillemot Sep 1, 2016 Contributor

I would change the formulation a bit:

"Other distance functions can be used in NMF as, for example, the (generalized) Kullback-Leibler (KL) divergence, also referred to as I-divergence:

.. math::
    d_{KL}(X, Y) = \sum_{i,j} (X_{ij} * log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})

Or, the Itakura-Saito (IS) divergence:"

See if you prefer it.

```rst
    d_{IS}(X, Y) = \sum_{i,j} (\frac{X_{ij}}{Y_{ij}} - log(\frac{X_{ij}}{Y_{ij}}) - 1)

These three distances are special cases of the beta-divergence family, with
:math:`\beta = 2, 1, 0` respectively [Fevotte, 2011]. The beta-divergences are
```

#### tguillemot Sep 1, 2016 Contributor

Maybe you can add a link to [Fevotte, 2011] here (even if it's already in the references section).
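Both quoted formulas are also straightforward to check numerically; a minimal NumPy sketch (assuming dense matrices with strictly positive entries, so the logs are defined):

```python
import numpy as np

def kl_divergence(X, Y):
    # d_KL(X, Y) = sum_ij (X_ij * log(X_ij / Y_ij) - X_ij + Y_ij)
    return np.sum(X * np.log(X / Y) - X + Y)

def itakura_saito_divergence(X, Y):
    # d_IS(X, Y) = sum_ij (X_ij / Y_ij - log(X_ij / Y_ij) - 1)
    ratio = X / Y
    return np.sum(ratio - np.log(ratio) - 1)
```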

```rst
also referred to as I-divergence:

.. math::
    d_{KL}(X, Y) = \sum_{i,j} (X_{ij} * log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})
```

#### tguillemot Sep 1, 2016 Contributor

Remove the multiplication symbol *

added 3 commits Oct 6, 2016
 minor leftovers 
 2af4a23 
 non-ascii and nitpick 
 259d827 
 safe_min instead of min 
 057c70c 

 Done

### amueller commented Oct 26, 2016

Thanks. Hm, @mblondel, I saw a +1 from you up there, but that wasn't a full review, was it? @agramfort, did you review? I see a +1 from @ogrisel.

### agramfort commented Oct 27, 2016

No, I did not. No time :(

### VictorBst commented Nov 29, 2016

Hi, @TomDLT and @agramfort invited me to have a look at the code. I mostly looked at the math aspects of the code and it seems to be correct. After quick experiments on my data, everything seems to work as intended and gives similar results to other simple NMF Python codes I tested. As a side note, this one was always either tied for fastest or the fastest in my tests, especially in higher dimensions.
 solve conflict with master 
 a9ac84a 

### amueller commented Nov 29, 2016

Hm, I'm not sure if this should be merged before #7927; there are some conflicts, I think... Opinions, @TomDLT?

### TomDLT commented Nov 29, 2016

 ok let's wait for #7927

### ogrisel commented Dec 12, 2016

 @TomDLT now that #7927 has been merged, this needs a rebase.
added 2 commits Dec 12, 2016
 Merge branch 'master' into nmf_mu 
 5e017c6 
 minor doc update 
 928ea89 

### TomDLT commented Dec 12, 2016

 Done
merged commit ae4f710 into scikit-learn:master Dec 12, 2016
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

### ogrisel commented Dec 12, 2016

 Thanks @TomDLT (and @VictorBst for the review). I squash-merged. 🍻

### TomDLT commented Dec 12, 2016

🍾 Cool! Thanks a lot for the reviews! 🍾

### tguillemot commented Dec 12, 2016

 Yeah !!!

### amueller commented Dec 12, 2016

 Awesome! Congrats!
deleted the TomDLT:nmf_mu branch Dec 20, 2016
added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
 [MRG+1] Add multiplicative-update solver in NMF, with all beta-divergence (scikit-learn#5295) 
 6ff493e 
mentioned this pull request Mar 17, 2017
added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
 [MRG+1] Add multiplicative-update solver in NMF, with all beta-divergence (scikit-learn#5295) 
 d52e939 
added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
 [MRG+1] Add multiplicative-update solver in NMF, with all beta-divergence (scikit-learn#5295) 
 8e6b103 
added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
 [MRG+1] Add multiplicative-update solver in NMF, with all beta-divergence (scikit-learn#5295) 
 1e20981 
added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
 [MRG+1] Add multiplicative-update solver in NMF, with all beta-divergence (scikit-learn#5295) 
 fb39155 