
[MRG+1] Add multiplicative-update solver in NMF, with all beta-divergence #5295

Merged: 35 commits from TomDLT:nmf_mu into scikit-learn:master on Dec 12, 2016

Conversation

@TomDLT (Member) commented on Sep 22, 2015

This PR is the second part of what has been discussed in #4811 (first part: #4852, merged).
It includes:

  • a multiplicative-update solver in NMF (see the usage sketch after this list). This solver is generally slower than the coordinate-descent solver, but it handles all beta-divergences (including Frobenius, generalized Kullback-Leibler and Itakura-Saito).
  • a plot to visualize the beta-divergence for several values of beta.
  • benchmarks below
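A minimal usage sketch of the solver proposed here, assuming the parameter names introduced in this PR (solver='mu' and beta_loss); illustrative only, not a benchmark:

    import numpy as np
    from sklearn.decomposition import NMF

    # Small strictly positive data matrix, so that every beta-divergence
    # (including Itakura-Saito) is well defined.
    rng = np.random.RandomState(0)
    X = rng.rand(20, 10) + 0.01

    # Multiplicative-update solver with the generalized Kullback-Leibler
    # divergence (beta = 1); beta_loss also accepts 'frobenius',
    # 'itakura-saito', or a float value of beta.
    model = NMF(n_components=5, solver='mu', beta_loss='kullback-leibler',
                max_iter=500, random_state=0)
    W = model.fit_transform(X)
    H = model.components_
    print(model.reconstruction_err_)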

Link to the rendered doc


[Figure: plot_beta_divergence.py, the beta-divergence for several values of beta]



@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from f4405fb to d9d7d44 Sep 22, 2015
@mblondel (Member) commented on Sep 23, 2015

I would favor using CD for beta-divergences as well. We decided to remove the slow PG solver, and I would prefer not to add a new slow solver if possible. The KDD paper shows that for generalized KL we can just use the element-wise Newton update without line search. For other divergences this will need some verification (unless there are more recent works that cover this?), but in the worst case we can implement a simple line search. As a start, it would be nice to implement the CD update for generalized KL to compare.
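For reference, a rough sketch of that element-wise Newton update for the generalized KL divergence (one coordinate-descent sweep over H with W fixed); this only illustrates the idea, it is not code from this PR and it omits the greedy selection:

    import numpy as np

    def kl_newton_sweep_H(X, W, H, eps=1e-12):
        """One coordinate-descent sweep over H for generalized KL, W fixed."""
        WH = np.dot(W, H)
        for k in range(H.shape[0]):
            for j in range(H.shape[1]):
                # Gradient and curvature of D_KL(X, WH) w.r.t. H[k, j]
                wh_j = np.maximum(WH[:, j], eps)
                grad = W[:, k].sum() - np.dot(W[:, k], X[:, j] / wh_j)
                hess = np.dot(W[:, k] ** 2, X[:, j] / wh_j ** 2)
                new_h = max(0.0, H[k, j] - grad / max(hess, eps))
                # Keep the running product WH consistent with the update
                WH[:, j] += W[:, k] * (new_h - H[k, j])
                H[k, j] = new_h
        return H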

@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from d9d7d44 to 2f2f44a Sep 23, 2015
@tttthomasssss (Contributor) commented on Oct 20, 2015

Could anybody comment on when this is planned to be merged into master? I'm quite keen on using NMF with generalized KL divergence.

@agramfort (Member) commented on Oct 21, 2015

@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from 2f2f44a to cfc6955 Nov 10, 2015
@TomDLT (Member, Author) commented on Nov 10, 2015

Sorry for the long silence, I do not have as much time for this as before.
@mblondel I spent some time trying to implement the coordinate descent for the generalized Kullback-Leibler divergence, as described in Coordinate descent with greedy selection, at least in the dense case.
However, I did not manage to be faster than multiplicative update, which is not what the paper indicates, and probably underlines my own limits (or proves that my implementation of multiplicative update is very good :)). More seriously, even if we managed to be faster, the extension to all beta-divergences is not trivial and would require much more work.

I understand that multiplicative update is an old method and probably not the fastest, yet it is still a good way to improve the NMF in scikit-learn, extending it to a lot of different loss functions.

@TomDLT (Member, Author) commented on Nov 19, 2015

I compared my implementation of multiplicative update with #2540 (Frobenius norm) and #1348 (KL divergence).


Compared to #2540, the results are identical, given some slight modifications.
To avoid division by zero, which do we prefer?

  1. A / (B + eps) (like in #2540)
  2. (A + eps) / (B + eps) (like in #1348)
  3. B[B == 0] = eps then A / B
    I prefer the last one, since it does not affect updates of nonzero elements (sketched below). WDYT?
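A small sketch of the three options, where A and B stand for the numerator and denominator arrays of a multiplicative update (toy values, just to illustrate the difference):

    import numpy as np

    eps = np.finfo(np.float64).eps
    A = np.array([1.0, 2.0, 0.0])
    B = np.array([4.0, 0.0, 0.0])

    # Option 1: guard only the denominator (like #2540)
    option1 = A / (B + eps)

    # Option 2: guard numerator and denominator (like #1348)
    option2 = (A + eps) / (B + eps)

    # Option 3: replace exact zeros in the denominator,
    # leaving nonzero entries untouched
    B_safe = B.copy()
    B_safe[B_safe == 0] = eps
    option3 = A / B_safe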

Compared to #1348, the results are different since it misses a division in the update
(see Eq. 5 here) and adds a normalization of H.

@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from cfc6955 to a68d728 Nov 19, 2015
@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from a68d728 to a44f2b7 Jan 22, 2016
@TomDLT (Member, Author) commented on Jan 22, 2016

Some benchmarking (one benchmark figure per dataset):
  • 20_news - sparse (11314, 39116)
  • Faces - dense (400, 4096)
  • MNIST - dense (70000, 784)
  • RCV1 - sparse (804414, 47236)

The MU solver is not very fast, but it was implemented not for its speed but because it handles all beta-divergences. Note that the poor performance with 'nndsvd' initialization (top-left) is expected, since this initialization has a lot of zeros that cannot be modified by a multiplicative update (see the sketch below).
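A tiny illustration of the 'nndsvd' issue: a multiplicative update only rescales each entry, so entries initialized exactly at zero stay at zero whatever the update factor is (toy numbers):

    import numpy as np

    H = np.array([[0.0, 0.5],
                  [1.2, 0.0]])
    update_factor = np.array([[1.3, 0.8],
                              [0.7, 1.6]])

    # Multiplicative update: H <- H * factor, element-wise.
    H_new = H * update_factor
    print(H_new)  # the zero entries are still exactly zero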

@TomDLT TomDLT changed the title [WIP] Add multiplicative-update solver in NMF, with all beta-divergence [MRG] Add multiplicative-update solver in NMF, with all beta-divergence Jan 22, 2016
@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from a44f2b7 to ec497a6 Jan 25, 2016
@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from ec497a6 to 9f668a2 Apr 26, 2016
@tguillemot (Contributor) commented on Aug 18, 2016

@TomDLT @mblondel @agramfort Is this PR mergeable?
I will do a first round of review, but I don't know these methods. Can someone have a look with me?


return timeset, err
import matplotlib.pyplot as plt
import pandas

@tguillemot (Contributor) commented on Aug 18, 2016

Can we use pandas for benchmarks?

@TomDLT (Member, Author) commented on Aug 22, 2016

Using it in a benchmark does not make it a dependency of the package, just like matplotlib.

If beta == 1, this is the generalized Kullback-Leibler divergence
If beta == 0, this is the Itakura-Saito divergence
Else, this is the general beta-divergence.
"""

@tguillemot (Contributor) commented on Aug 18, 2016

Can you specify what X, W, H, and beta are, with the classical docstring format?
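For example, a classical numpydoc-style block could look like this (the descriptions below are my own wording, not the PR's):

    """Compute the beta-divergence between X and the product WH.

    Parameters
    ----------
    X : array-like or sparse matrix, shape (n_samples, n_features)
        Data matrix.

    W : array-like, shape (n_samples, n_components)
        First factor of the factorization.

    H : array-like, shape (n_components, n_features)
        Second factor of the factorization.

    beta : float
        Parameter of the beta-divergence: 2 for Frobenius,
        1 for generalized Kullback-Leibler, 0 for Itakura-Saito.
    """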

# np.sum(np.dot(W, H) ** beta)
sum_WH_beta = 0
for i in range(X.shape[1]):
    sum_WH_beta += np.sum(fast_dot(W, H[:, i]) ** beta)

@tguillemot (Contributor) commented on Aug 18, 2016

Is there a better option?

@TomDLT (Member, Author) commented on Aug 18, 2016

This is a temporary fix so as to be memory efficient in the sparse case
(we cannot compute np.dot(W, H) directly without densifying).

We could have Cython code for it, yet the use case is rather limited: sparse X and beta not in [0, 1, 2] (which are the most useful cases). So I propose to merge it as it is and improve it in a separate PR.

@tguillemot (Contributor) commented on Aug 18, 2016

Good for me :)

@agramfort (Member) commented on Aug 20, 2016

@mblondel what's your take on this PR?

@mblondel (Member) commented on Aug 20, 2016

> I spent some time trying to implement the coordinate descent for the generalized Kullback-Leibler divergence, as described in Coordinate descent with greedy selection, at least in the dense case.

Thanks for the investigation.

> More seriously, even if we managed to be faster, the extension to all beta-divergences is not trivial and would require much more work.

Agreed.

> @mblondel what's your take on this PR?

+1 on my side, and sorry for the late reply. As a big fan of CD, I would potentially be interested in the CD code for the GKL divergence, as it could be faster in the sparse case.

@agramfort (Member) commented on Aug 20, 2016

@TomDLT (Member, Author) commented on Aug 22, 2016

> let's make it a WIP->MRG then

It is; all reviews will be greatly appreciated.

@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from 9f668a2 to 9f81779 Aug 22, 2016
However, NMF can also be used with a different function to measure the
distance between X and the matrix product WH. Another typical distance
function used in NMF is the (generalized) Kullback-Leibler (KL) divergence,
also referred to as I-divergence:

@tguillemot (Contributor) commented on Sep 1, 2016

I would change the formulation a bit:
"Other distance functions can be used in NMF, for example the (generalized) Kullback-Leibler (KL) divergence, also referred to as I-divergence:

.. math::
    d_{KL}(X, Y) = \sum_{i,j} (X_{ij} * log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})

Or the Itakura-Saito (IS) divergence:"
See if you prefer it.

d_{IS}(X, Y) = \sum_{i,j} (\frac{X_{ij}}{Y_{ij}} - log(\frac{X_{ij}}{Y_{ij}}) - 1)
These three distances are special cases of the beta-divergence family, with
:math:`\beta = 2, 1, 0` respectively [Fevotte, 2011]. The beta-divergence are

@tguillemot (Contributor) commented on Sep 1, 2016

Maybe you can add a link to [Fevotte, 2011] (even if it is in the references section).

also referred to as I-divergence:

.. math::
d_{KL}(X, Y) = \sum_{i,j} (X_{ij} * log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})

@tguillemot (Contributor) commented on Sep 1, 2016

Remove the multiplication symbol *
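To make the connection concrete, here is a dense NumPy sketch of these divergences as special cases of the beta-divergence, with Y standing for the product WH (illustrative only, not the PR's implementation; it assumes strictly positive entries):

    import numpy as np

    def beta_divergence(X, Y, beta):
        """d_beta(X, Y) for dense, strictly positive arrays (sketch)."""
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        if beta == 2:    # half the squared Frobenius norm
            return 0.5 * np.sum((X - Y) ** 2)
        if beta == 1:    # generalized Kullback-Leibler (I-divergence)
            return np.sum(X * np.log(X / Y) - X + Y)
        if beta == 0:    # Itakura-Saito
            return np.sum(X / Y - np.log(X / Y) - 1)
        # General beta-divergence
        return np.sum(X ** beta + (beta - 1) * Y ** beta
                      - beta * X * Y ** (beta - 1)) / (beta * (beta - 1))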

@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from b5588db to c28ce4b Sep 1, 2016
@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from 37a6491 to 057c70c Oct 26, 2016
@TomDLT (Member, Author) commented on Oct 26, 2016

Done

@amueller (Member) commented on Oct 26, 2016

Thanks. Hm, so @mblondel, I saw a +1 from you up there, but that wasn't a full review, was it? @agramfort, did you review? I see a +1 from @ogrisel.

@agramfort (Member) commented on Oct 27, 2016

@VictorBst commented on Nov 29, 2016

Hi, @TomDLT and @agramfort invited me to have a look at the code.

I mostly looked at the math aspect of the code, and it seems to be correct. After quick experiments on my data, everything seems to work as intended and gives similar results to other simple NMF Python codes I tested. As a side note, this one was always at least as fast as the others in my tests, especially in higher dimensions.

@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from e508811 to bae1ede Nov 29, 2016
@TomDLT TomDLT force-pushed the TomDLT:nmf_mu branch from bae1ede to a9ac84a Nov 29, 2016
@amueller (Member) commented on Nov 29, 2016

Hm, I'm not sure if this should be merged before #7927; there are some conflicts, I think... Opinions, @TomDLT?

@TomDLT (Member, Author) commented on Nov 29, 2016

OK, let's wait for #7927.

@ogrisel (Member) commented on Dec 12, 2016

@TomDLT now that #7927 has been merged, this needs a rebase.

TomDLT added 2 commits Dec 12, 2016
@TomDLT (Member, Author) commented on Dec 12, 2016

Done

@ogrisel ogrisel merged commit ae4f710 into scikit-learn:master Dec 12, 2016
2 checks passed:
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
  • continuous-integration/travis-ci/pr: The Travis CI build passed
@ogrisel (Member) commented on Dec 12, 2016

Thanks @TomDLT (and @VictorBst for the review). I squash-merged. 🍻

@TomDLT (Member, Author) commented on Dec 12, 2016

🍾 Cool! Thanks a lot for the reviews! 🍾

@tguillemot (Contributor) commented on Dec 12, 2016

Yeah!!!

@amueller (Member) commented on Dec 12, 2016

Awesome! Congrats!

@TomDLT TomDLT deleted the TomDLT:nmf_mu branch Dec 20, 2016
sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017