[WIP] Add L-bfgs solver for NMF #2263

Closed

Owner
vene commented Jul 26, 2013

Addresses #2124. Actually, not yet, because I need to make the L-BFGS solver efficient on sparse data in the way @mblondel suggested.

I will benchmark to see which solver is better in which cases and with which settings, and implement an 'auto' option.

I will also try profiling it a bit.

Owner
vene commented Jul 27, 2013

I am not pleased with the performance of L-BFGS here compared to the current solver, probably because of the loop over n_targets (99% of the time is spent in the calls to L-BFGS). I will instead adapt the current solver for use in transform and open a new PR.

Owner
vene commented Jul 27, 2013

I wasted too much time on this, so I'll go for the quick fix, but I would appreciate any input on this approach.

Basically, L-BFGS converges fine without the adaptive n_iter scheme used for the projected gradient (PG) solver, and this is good because it reaches a better reconstruction error than PG with adaptive n_iter.
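For context, here is a minimal sketch (not the code in this PR; the helper name `nls_lbfgsb` is hypothetical) of the alternating NLS inner step under discussion, assuming SciPy's `fmin_l_bfgs_b`: each column of the coefficient matrix is fit under non-negativity bounds, and it is this per-column loop that dominates the runtime.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def nls_lbfgsb(X, W):
    """Illustrative sketch: solve min_{H >= 0} 0.5 * ||X - W H||_F^2 column by column."""
    n_components = W.shape[1]
    n_targets = X.shape[1]
    H = np.zeros((n_components, n_targets))

    for j in range(n_targets):  # this loop over n_targets is the bottleneck
        x = X[:, j]

        def f(h):
            r = W @ h - x
            return 0.5 * np.dot(r, r), W.T @ r  # loss and its gradient

        h0 = np.zeros(n_components)
        H[:, j], _, _ = fmin_l_bfgs_b(f, h0, bounds=[(0, None)] * n_components)
    return H
```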

Owner
mblondel commented Jul 27, 2013

+1 for giving up the L-BFGS-B solver for NMF. In retrospect, I now realize it is more suitable for learning a single vector.

Owner
vene commented Jul 27, 2013

@mblondel,

@fabianp told me today that he has an application where he uses L-BFGS on multiple outputs by concatenating the target vectors. I think the point was that convergence is harder but iterations are much faster, and it ends up OK.
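A hedged sketch of what a single L-BFGS call over all outputs could look like (my interpretation of the description, not @fabianp's actual code; the helper `multi_target_lstsq` is hypothetical): the coefficients for every target are flattened into one vector, and the least-squares gradient is evaluated in matrix form, which sidesteps the duplication of X that comes up just below.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def multi_target_lstsq(X, Y):
    """Illustrative sketch: min_H 0.5 * ||Y - X H||_F^2 in one L-BFGS call over flattened H."""
    n_features, n_targets = X.shape[1], Y.shape[1]

    def f(h_flat):
        H = h_flat.reshape(n_features, n_targets)
        R = X @ H - Y                           # residuals for all targets at once
        return 0.5 * np.sum(R ** 2), (X.T @ R).ravel()

    h0 = np.zeros(n_features * n_targets)
    h_opt, _, _ = fmin_l_bfgs_b(f, h0)
    return h_opt.reshape(n_features, n_targets)
```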


Owner
mblondel commented Jul 27, 2013

But then you need to concatenate X n_targets times, right?

Owner
vene commented Jul 27, 2013

Maybe he's doing it with strides?


Owner
GaelVaroquaux commented Jul 27, 2013

Maybe he's doing it with strides?

Yey! np.lib.stride_tricks.as_strided! Use with a safety net
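As a side note, here is a minimal illustration of the as_strided trick being alluded to (purely illustrative; as the next comments note, nobody confirmed the solver actually does this): the leading axis of the view gets a stride of zero, so X appears stacked n_targets times without any data being copied.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

X = np.random.rand(1000, 50)
n_targets = 20

# A (n_targets, n_samples, n_features) view that repeats X without copying it:
# the leading axis has stride 0, so every "slice" points at the same memory.
X_rep = as_strided(X, shape=(n_targets,) + X.shape, strides=(0,) + X.strides)

assert X_rep.shape == (n_targets, 1000, 50)
assert np.shares_memory(X_rep, X)  # no data was duplicated

# Hence the "safety net": writing through such a view aliases every slice
# onto the same buffer, so it should be treated as read-only.
```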

Owner
vene commented Jul 27, 2013

@gaelvaroquaux do you know for a fact that this is used in that solver?


Owner

@gaelvaroquaux do you know for a fact that this is used in that solver?

No

Owner
larsmans commented Sep 20, 2013

This is using L-BFGS as the inner loop of the alternating NLS algorithm, right? Is there any reason to not just use L-BFGS-B to solve the whole problem in one go? I'm asking because even after my recent optimizations, I still find p-g NMF to run too slowly for my needs (topic modeling on large document sets).

If we can just use a single L-BFGS, then an OnlineNMF that does a few iterations of optimization in a partial_fit isn't too far away.

As I understand L-BFGS-B, it's already a projected gradient method.

Owner
vene commented Sep 20, 2013

Hi Lars,

You mean to solve ||X - WH|| in one go? How would you even go about doing this? It's non-convex as a whole (but convex in each of W and H given the other).

IIRC the slowdown when using L-BFGS is because you need to do a for loop over the columns.

I remember you mentioned having a wrapper for Naoaki Okazaki's liblbfgs, which also supports L1 regularization; does it do multi-target natively? It might be a good idea to build an NMF implementation on top of that.


Owner

What I want to try is to just feed L-BFGS-B the cost function and gradient of the full problem, with non-negativity enforced by its internal box constraints. The problem is non-convex, but GD algorithms have been thrown at it before, and L-BFGS is also popular among neural net people, who typically deal with non-convex problems. I actually got it to find a pretty good reconstruction today, even with a numerically estimated gradient. But maybe alternating is better in the end.

I don't see why, even in the alternating setup, you'd need to do the columns separately? It's still optimizing a single scalar loss function on a vector of parameters.

As for liblbfgs, it solves anything a vanilla L-BFGS solves + L1 reg: any differentiable scalar loss function. But the thought had crossed my mind too :)
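For completeness, a minimal sketch of the "whole problem in one go" idea described above (an illustration of the approach, not code from this PR or from liblbfgs; the helper `nmf_lbfgsb` is hypothetical): W and H are flattened into one parameter vector, the Frobenius loss and its gradient are handed to L-BFGS-B, and non-negativity comes from the box constraints.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def nmf_lbfgsb(X, n_components, random_state=0):
    """Illustrative sketch: joint (non-convex) fit of min_{W,H >= 0} 0.5 * ||X - W H||_F^2."""
    rng = np.random.RandomState(random_state)
    n_samples, n_features = X.shape
    w_size = n_samples * n_components

    def f(params):
        W = params[:w_size].reshape(n_samples, n_components)
        H = params[w_size:].reshape(n_components, n_features)
        R = W @ H - X
        grad_W = R @ H.T                       # d/dW of 0.5 * ||W H - X||_F^2
        grad_H = W.T @ R                       # d/dH
        return 0.5 * np.sum(R ** 2), np.concatenate([grad_W.ravel(), grad_H.ravel()])

    x0 = np.abs(rng.randn(w_size + n_components * n_features))
    x_opt, loss, _ = fmin_l_bfgs_b(f, x0, bounds=[(0, None)] * x0.size)
    W = x_opt[:w_size].reshape(n_samples, n_components)
    H = x_opt[w_size:].reshape(n_components, n_features)
    return W, H, loss
```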

Owner

We can probably close this in light of #4852, right?

Owner
mblondel commented Aug 29, 2015

Right, with the upcoming CD solver, we won't need an L-BFGS solver. Closing.

@mblondel mblondel closed this Aug 29, 2015