evaluate various changes to L-BFGS optimizer #370
Comments
This looks good; a relative termination condition is a good improvement, in my opinion. For the sake of understanding: based on what you've said, it seems that L-BFGS gets stuck in some kind of valley with a small gradient (small, but not small enough to terminate), and "walks" down this valley to the R = 0 saddle point even though the objective function improvement at each iteration is very small. This would imply that either increasing the gradient norm tolerance, or adding a relative objective-function-improvement termination criterion (wow, that's a long set of strung-together nouns), would allow LRSDP to avoid that saddle point. Just one very minor question before I merge: where does the … Your observations with LRSDP, and mine in the past, further suggest that it's difficult to make LRSDP converge, and that we may not easily be able to provide an LRSDP implementation that always converges.
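The relative objective-function-improvement criterion discussed above can be sketched as follows. This is a minimal illustration, not mlpack's actual API; the function and parameter names (`RelativeImprovementReached`, `relTol`) are hypothetical:

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical sketch of a relative-improvement termination test:
// stop when the objective decrease between iterations, scaled by the
// magnitude of the old objective value, falls below a tolerance.
bool RelativeImprovementReached(double fOld, double fNew, double relTol)
{
  // Guard the denominator so an objective near zero does not divide by zero.
  const double denom = std::max(std::abs(fOld), 1e-10);
  return std::abs(fOld - fNew) / denom < relTol;
}
```

Scaling by the old function value (rather than using an absolute threshold) is what lets the same tolerance work across objectives of very different magnitudes.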
It's actually a bit subtle, because this diff tries to fix two problems. Let me try to explain. For Lovász theta, the optimization proceeds as follows: the augmented Lagrangian method starts at small values of … Here's where the current (pre-merge) mlpack implementation differs from the scipy one. In mlpack, since X is approximately 0, grad F is also approximately 0 at L-BFGS's starting point, even as we increase sigma. However, the current implementation was written so that if grad F is approximately zero, it won't run even a single iteration of L-BFGS; it just exits immediately. Hence, we now check if … Now, I also added a relative function value termination criterion to address your other concern, that the optimization was indeed slow. Indeed, it turns out …
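One plausible reading of the fix described above (the exact check is cut off in the comment) is to refuse to take the small-gradient exit before at least one iteration has been run, since for the augmented Lagrangian with X ≈ 0 the gradient is tiny even at a useless starting point. A hedged sketch, with all names (`ShouldTerminate`, `minGradientNorm`) illustrative rather than mlpack's real code:

```cpp
// Hypothetical sketch: only allow the small-gradient termination test
// after at least one L-BFGS iteration, so a deceptively flat starting
// point (grad F ~ 0 because X ~ 0) does not cause an immediate exit.
bool ShouldTerminate(double gradNorm, double minGradientNorm, int iteration)
{
  // Never terminate on the very first evaluation; take at least one step.
  if (iteration == 0)
    return false;
  return gradNorm < minGradientNorm;
}
```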
To address your observation that LRSDP does not converge generically and needs a lot of hand-holding: yes! Even if you read the original paper, the experimental section is full of text saying they had to use all these hacks with initialization points and penalty schedules to get the SDPs to converge. In fact, if you email Sam Burer asking about these heuristics, he will tell you he can't quite piece them together himself unless he spends quite a bit of time looking at his code. I think this is just an unfortunate property of the algorithm, and we don't know how to make it generically robust (if we did, that would be paper-worthy in itself). Hence, I think the best we can do is to offer a few choices of solvers (which is why I'm slowly working on this interior point solver; it will help for small- and medium-sized problem instances).

Another thing we might want to think about: LRSDP doesn't provide any dual certificates, so we can't say anything about optimality. One cool thing we could do towards this, I think, is to run a dual ascent solver in parallel, so that at least you could certify that the duality gap is below a specific tolerance. Then I could grid-search LRSDP from different starting points with different penalty parameters until the duality gap is small enough.
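The certification idea above can be made concrete with a tiny sketch: if a dual ascent solver run alongside LRSDP produces a dual feasible value, the gap between the primal and dual values bounds the suboptimality. Everything here (`RelativeDualityGap`, `CertifiedOptimal`) is an illustrative assumption, not an existing mlpack API:

```cpp
#include <algorithm>
#include <cmath>

// Relative duality gap between a primal value (from LRSDP) and a dual
// feasible value (from a hypothetical parallel dual ascent solver).
double RelativeDualityGap(double primalVal, double dualVal)
{
  // Scale by the magnitudes involved, with a floor of 1 for tiny values.
  const double denom = std::max(1.0, std::abs(primalVal) + std::abs(dualVal));
  return std::abs(primalVal - dualVal) / denom;
}

// Accept an LRSDP run (e.g., during a grid search over starting points
// and penalty parameters) only when the gap is below tolerance.
bool CertifiedOptimal(double primalVal, double dualVal, double tol)
{
  return RelativeDualityGap(primalVal, dualVal) < tol;
}
```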
Okay; I understand the changes you've made then. Thanks for the explanation. One thing I overlooked, though: you've split out the line search termination conditions in order to give better failure messages, but on those failures it no longer returns false; it just breaks. Is there a particular reason for this? I think that this could cause …
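The concern above is that `break` and `return false` are not interchangeable in an optimizer's main loop: `break` merely ends the loop and falls through to the normal (successful) return path, so the caller cannot distinguish a line-search failure from genuine convergence. A toy sketch contrasting the two (these functions are illustrative, not the real L-BFGS code):

```cpp
// Reports line-search failure to the caller via the return value.
bool OptimizeReportingFailure(int failAtIteration, int maxIterations)
{
  for (int i = 0; i < maxIterations; ++i)
  {
    if (i == failAtIteration)
      return false; // propagate the failure
  }
  return true; // converged normally
}

// Swallows the failure: break exits the loop, then the function falls
// through to the success path, so the failure is invisible to the caller.
bool OptimizeSwallowingFailure(int failAtIteration, int maxIterations)
{
  for (int i = 0; i < maxIterations; ++i)
  {
    if (i == failAtIteration)
      break;
  }
  return true;
}
```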
Hah, I have one of those emails too! :) He suggested that he avoided the R = 0 problem by using the determinant of R^T R, and linked to a Mathematical Programming paper titled "Local Minima and Convergence in Low-Rank Semidefinite Programming", and also suggested a (slightly) more recent paper. It has been a very long time since I have done so much as open those papers, though... I do think a more robust LRSDP would definitely be paper-worthy and of high interest. It would also allow mlpack to provide faster implementations of SDP-based algorithms that actually work. (This is better than the old rejected code that was fast but didn't work...) A dual solver would definitely help us know when a restart was necessary, but I don't know whether the entire algorithm would then be slower overall than just a regular SDP solver. Either way, that is certainly an interesting avenue to investigate.
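Burer's det(R^T R) suggestion quoted above can be illustrated cheaply: near the R = 0 saddle point the Gram determinant collapses to zero, so it works as an alarm for a needed restart. For an n x 2 factor R with columns r1 and r2, det(R^T R) = (r1·r1)(r2·r2) − (r1·r2)². This sketch (names `GramDeterminant2`, `NearZeroSaddle`) is an assumption for illustration, not mlpack's LRSDP code:

```cpp
#include <cstddef>
#include <vector>

// Gram determinant det(R^T R) for an n x 2 factor R given by its columns.
double GramDeterminant2(const std::vector<double>& r1,
                        const std::vector<double>& r2)
{
  double a = 0.0, b = 0.0, c = 0.0;
  for (std::size_t i = 0; i < r1.size(); ++i)
  {
    a += r1[i] * r1[i]; // r1 . r1
    b += r2[i] * r2[i]; // r2 . r2
    c += r1[i] * r2[i]; // r1 . r2
  }
  return a * b - c * c;
}

// Flag iterates that are collapsing toward the R = 0 saddle point.
bool NearZeroSaddle(const std::vector<double>& r1,
                    const std::vector<double>& r2, double tol)
{
  return GramDeterminant2(r1, r2) < tol;
}
```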
I think I actually needed … I don't think, however, that you should change …
Also, thanks for the pointer to the more recent paper on necessary and sufficient conditions. It's in my (ever growing) reading queue now.
Did we ever come to a consensus on this? I can't remember.
I sat down and ran some tests, and the L-BFGS implementation is not any slower than before (maybe a little faster? Probably about the same), but the modifications also cause LRSDP to converge on the test cases an order of magnitude faster. I pulled from your …
This is a continuation of the thread of discussion in #3.
The following diff (stephentu/mlpack@mlpack:10d90d0...stephentu:8af98d5) allows the Lovász theta SDP test case to pass without removing the last edge that was causing the problem.
Essentially, the pattern of optimization looks like:
In my diff above, I added some logic to the mlpack optimizer to help guide it out of the saddle point. I'm not submitting this as a pull request yet because I'm not sure it is the right thing to do; I'll defer that to Ryan, who implemented L-BFGS in the first place.
@rcurtin: To respond to your desire to keep out the external Fortran L-BFGS-B dependency: that's fine by me if you feel very strongly about it (which you clearly do 😄). But I will point out that I was pushing for L-BFGS-B not because I think the specifics of that modified variant of the algorithm (versus what you have implemented) are necessary to get the SDPs to converge; rather, I was just looking for a fast, reliable black box with a self-contained implementation that we could easily import. In other words, if you want to invest time in improving an in-house optimization routine, I think it's better to build on the existing implementation.
Anyways, this patch should be enough for you to decide how to proceed with the optimizer.
I'm going to turn my attention to writing a few more SDP test cases, and then building a higher level wrapper for matrix completion problems.