Added the BFGS(Hess) Algorithm for minimization independent of target function. #5318
Conversation
Reading the documentation I was a bit confused as to what was going on. If I understand correctly, this is for gradient-only minimization (when the objective function is unavailable), and instead of a line search you just multiply the step size by beta whenever a step overshoots. Is this idea basically backtracking line search? https://en.wikipedia.org/wiki/Backtracking_line_search
Thanks for reading. I didn't know the actual term, but yes, this is an application of backtracking line search to the BFGS algorithm. Regarding f, I have made a new commit to allow None as a passable function if none is available. This lets users run the minimizer with (a) no objective function at all, or (b) any function that can be used to describe the objective function (but is not at a minimum when the objective function is).
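For readers following along, here is a minimal sketch of Armijo backtracking as described in the Wikipedia article linked above; the function and constant names are illustrative, not from this PR:

```python
import numpy as np

def backtracking_alpha(f, x, g, p, alpha0=1.0, beta=0.5, c=1e-4, max_iter=50):
    """Shrink alpha by beta until the Armijo sufficient-decrease test passes."""
    alpha = alpha0
    fx = f(x)
    for _ in range(max_iter):
        if f(x + alpha * p) <= fx + c * alpha * np.dot(g, p):
            break
        alpha *= beta
    return alpha

# usage on f(x) = ||x||^2 with a steepest-descent direction p = -g
f = lambda x: x @ x
x = np.array([3.0, 4.0])
g = 2 * x
alpha = backtracking_alpha(f, x, g, -g)
```

The key point is that the step size only ever shrinks within a single line search; the direction p is left untouched.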
Heads up: I realize I committed hastily. It'll throw an error during output if fval is undefined. Quickly fixing this.
General question: I'm also confused about the description and didn't try to figure out the code. If you are not minimizing an objective function, what's the difference from the system-of-equations solvers or root-finding functions that are in scipy.optimize?
What does this mean?
I've been using Nudged Elastic Band (NEB), which is a method for minimizing a reaction pathway. This popular method calculates (1) the forces on the atoms in each frame of the pathway and (2) the forces on the atoms due to virtual springs keeping consecutive frames from converging to each other. This allows for minimization of energy barriers and calculation of the pathways. In this situation there is no well-defined objective function; however, there is a way to check whether the minimization is working. Hence the typical line search is impractical, whereas the backtracking line search isn't. The reason we don't use the other solvers is that BFGS is known to be efficient at these sorts of calculations and is typically the algorithm used. @argriffing
This is a small detail, but you probably don't want references to angstroms in the code.
@argriffing I agree. The code had originally been used for NEB calculations, which were done in angstroms. Thanks for spotting!
I'm not completely following this... is the idea to abuse the line-search procedure of BFGS as the inner stage of a two-stage minimization?
@argriffing I don't quite understand what you mean by "abuse the line-search step of BFGS as the inner stage of a two-stage minimization". The normal BFGS method in scipy uses a Wolfe line search; however, without a well-defined objective function you cannot use this method. The BFGS(Hess) implementation would normally run the BFGS method with a constant alpha, but with a very small step size it would take forever and could skip the minimum. The code I committed takes BFGS and, instead of using the Wolfe line search, allows the user to either (a) run BFGS(Hess) with f=None or (b) use the backtracking line search, in which the step size is adjusted whenever f(x_k+1) > f(x_k). This has proven applicable to one situation (specifically the NEB I mentioned previously) and I believe may be applicable to other situations.
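A schematic of the two modes described in this comment, as a hypothetical helper (not the PR's actual code): with an auxiliary f supplied, the step size backtracks on any increase; with f=None the step is taken with the current alpha unchanged.

```python
import numpy as np

def step(x, p, f=None, alpha=1.0, beta=0.5, max_backtracks=10):
    """One step: backtrack alpha while f increases, or step blindly if f is None."""
    if f is None:
        return x + alpha * p, alpha
    fx = f(x)
    for _ in range(max_backtracks):
        x_new = x + alpha * p
        if f(x_new) <= fx:      # simple decrease test, not a Wolfe condition
            return x_new, alpha
        alpha *= beta           # overstepped the minimum: shrink and retry
    return x + alpha * p, alpha

f = lambda x: float(x @ x)
x_new, a = step(np.array([3.0]), np.array([-4.0]), f=f)
```

The max_backtracks cap keeps the loop finite even if no decrease is ever found.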
The idea is that the line search will repeatedly evaluate a function that is somewhat related to the true objective function but faster to compute, instead of evaluating the actual objective function? I'm just trying to see how this would be used outside of the NEB application.
@argriffing Yes, this is accurate. Sorry for not understanding the original question.
@argriffing This code is useful outside of NEB whenever there is no analytical expression for the function to be minimized, as is the case in most quantum-related problems. Another case is whenever there is inconsistent post-processing of the function after each iteration (such as the addition of spring forces in NEB, generalized to whenever hybrid functions need minimization).
I didn't read the references yet, but some comments off the cuff: this looks somewhat related to the Broyden-based root-finding routines that we have, which do use backtracking by default. If the mathematics are actually the same, it could be more appropriate to instead improve the root finders by making it possible for the user to supply a merit function for the line search there, and to think more closely about the quasi-Newton step-length strategy and resetting (which I think is not optimal currently). BTW, if I'm reading this right, the algorithm here doesn't maintain the quasi-Newton secant condition, which the BFGS solver in scipy on the other hand does by recomputing:

    A1 = I - sk[:, numpy.newaxis] * yk[numpy.newaxis, :]
    A2 = I - yk[:, numpy.newaxis] * sk[numpy.newaxis, :]
    Hk = numpy.dot(A1, numpy.dot(Hk, A2)) + \
         (sk[:, numpy.newaxis] * sk[numpy.newaxis, :])
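The secant condition pv refers to (H_{k+1} y_k = s_k for the inverse-Hessian approximation) can be checked numerically. The textbook BFGS update, with the 1/(y^T s) factor included, satisfies it exactly; this sketch is illustrative, not the PR's code:

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Standard BFGS inverse-Hessian update with the rho = 1/(y^T s) factor."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
        + rho * np.outer(s, s)

H = np.eye(3)
s = np.array([0.1, -0.2, 0.3])
y = np.array([0.4, -0.1, 0.2])
H_new = bfgs_inverse_update(H, s, y)
# H_new @ y recovers s: the secant condition holds
```

Dropping rho (as in the quoted snippet above) breaks this identity, which is exactly the concern raised here.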
Is this the BFGS update? There are no divisions by y^T s?
Yes. I had to remove the y^T * s part, or as the original code had it:

    rhok = 1.0 / (numpy.dot(yk, sk))

This was because adjusting alpha manually with the backtracking line search would cause rhok to blow up. As rhok was a scalar always multiplied by alpha and both were changing, I figured having the code adjust alpha alone would sort it out (which in the end it does). However, I have made a lot of adjustments since then and haven't yet tried adding rhok back in to see if the issue was caused by something else. I'll test that out now.
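On the blow-up: a common alternative to dropping rhok entirely is to guard the division. scipy's fmin_bfgs falls back to a large constant in the degenerate case; the exact value below mirrors that style but is an illustrative choice:

```python
import numpy as np

def safe_rhok(yk, sk, fallback=1000.0):
    """Guard rhok = 1/(y^T s) against blow-up when the curvature term is ~0."""
    denom = float(np.dot(yk, sk))
    if denom == 0.0:
        return fallback
    rhok = 1.0 / denom
    return rhok if np.isfinite(rhok) else fallback
```

This keeps the secant-condition-preserving update in place while staying robust to tiny or zero y^T s.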
And done. In fact, reinstating rhok better mimics the test function's trace from the original BFGS algorithm (as it should, since the only difference now should be the line-search method).
@pv I haven't read up much on the Broyden-based root-finding routines, but if they are functions to find roots rather than to optimize an arbitrary objective function, then the BFGS_H method is not the same. Regarding the secant condition, though, I think you're correct. I adjusted the algorithm to fix this (I don't know why I didn't have it this way originally).
So, just to be clear: this is a "minimizer" that takes something that is supposed to be the gradient of an objective function, and seeks a minimum of the objective function without ever evaluating it. Is that right?

Not every vector field is the gradient of a function; the exterior derivative has to be zero (and the domain needs to be simply connected). How does this solver behave if this constraint is violated? I'm not just talking about (say) rigid-rotation or vortex vector fields, but what about a gradient that is "jiggered"? What are the constraints on the jiggering? Obviously if the user provides sufficiently bogus input they won't get useful output, but are there any diagnostics for these pathological cases? Are infinite loops a possible failure mode?

As I understand it, this solver can also take advantage of an approximate version of the objective function to provide hints for step sizes. How approximate can this fake objective function be? What features does it need to share with the real objective function to produce a reasonable minimization?
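One cheap diagnostic for this question: a smooth vector field is (locally) a gradient iff its Jacobian is symmetric, and that can be spot-checked by finite differences. A sketch with illustrative names, not part of the PR:

```python
import numpy as np

def symmetry_defect(g, x, eps=1e-6):
    """Max asymmetry of the finite-difference Jacobian of g at x; for a true
    gradient field the Jacobian is the Hessian and must be symmetric."""
    n = len(x)
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (g(x + e) - g(x - e)) / (2 * eps)
    return np.max(np.abs(J - J.T))

grad = lambda x: np.array([2*x[0] + x[1], x[0]])   # gradient of x0^2 + x0*x1
rot  = lambda x: np.array([-x[1], x[0]])           # rigid rotation: not a gradient
```

The rigid-rotation field mentioned above gives a large defect, while a genuine gradient gives one near machine precision.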
@aarchiba You are correct in the first paragraph. I don't know if I can adequately answer your question about the exterior derivative not being zero, though; I'm not sure which constraints should be checked and which are implicitly satisfied with regard to the exterior derivative and 'jiggering'. As for infinite loops, now that I look back at it, the k incrementer comes after the continue statement, which is a potential source of one. I will add a second counter before the continue to track how many times the step size has been backtracked and break the loop if it exceeds a limit (other codes seem to use 10, so I'll start with that). Finally, regarding how approximate the function can be: that depends entirely on the problem. In the case of NEB (sorry to keep using this as an example; it's just the one I can most readily give), the energies along the reaction pathway work well. It is really up to the user to justify how 'approximate' they want to go and are willing to accept.
BFGS enforces a symmetric Jacobian approximation, so it likely requires the input to actually be a gradient field.
@pv I cannot guarantee superlinear convergence right now, as I first need to double-check the proof given by Broyden. Secondly, I'll try out the root finding (in particular scipy.optimize.broyden1) and get back to you on that. Finally, just a curiosity, but does anyone know what happened to the build VM on travis-ci.org/scipy? One of the tests for my last push has been running for 16 hours when it normally finishes in roughly 30 minutes...
@pv I tried using the broyden1 root-finding method on fprime, to no avail. In practice, with a well-defined gradient, I think it would work; however, in the NEB simulations at the very least it causes overlap of atomic coordinates within one frame and crashes. I'm not entirely sure where the issue arises, but with the default 'armijo' line search it fails on the second calculation, and with the 'wolfe' line search it fails on the third. Maybe having a forced maximum x step would restrain it, but then one would be manually messing with step sizes again instead of using the 'true' algorithm. Edit: more fiddling, and I found that adjusting the alpha value allows the simulation to run longer (though in the wrong direction, depending on what alpha is chosen)... though I have no clue what alpha is, as the documentation says it relates to the "initial guess for the jacobian" but it is also just a float?
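For anyone reproducing this: broyden1 treats the supplied fprime as the residual to drive to zero, and its alpha parameter scales the initial inverse-Jacobian guess (the docs phrase this as the initial Jacobian being -1/alpha). A minimal call on a case where it does behave, root-finding the gradient of a simple quadratic:

```python
import numpy as np
from scipy.optimize import broyden1

# stationary point of f(x) = (x0 - 1)^2 + (x1 + 2)^2 via root-finding its gradient
def fprime(x):
    return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)])

root = broyden1(fprime, np.zeros(2), line_search='armijo', f_tol=1e-10)
```

Swapping line_search='wolfe' reproduces the other configuration tried in this comment.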
That's probably a question of the first step length --- you can try
I don't know what went wrong with Travis --- it's supposed to stop the job after 40 min even if something hangs. I stopped it manually, but it's probably just some random glitch in their system, unrelated to what is done here.
So here's the current comparison of everything. I used scipy.optimize.broyden1 with alpha=-1 and the two line-search methods, then I used the current BFGS(Hess). Each simulation was allowed to run for 20 function calls (or gradient calls; it's the same output either way). The data is output oddly, but an example output line is:

where the first integer is the number of function calls at that time, the values from 4.2 to 13.0 are the reaction pathway being minimized, and the last value is the RMS force for spring plus atomic forces. So far, BFGS(Hess) is working better (but that could just be because I haven't found the right 'settings' for running the broyden1 root finder).
Sheppard et al. are using L-BFGS, not BFGS. |
@larsmans You are correct that Sheppard et al. use L-BFGS. The method should still hold for BFGS, though, with the caveat that it would only work for systems with lower memory requirements.
@pv Is the comparison I posted earlier good enough to show the viability of BFGS_H?
A backtracking line search works well with LBFGS because most implementations of that algorithm are extremely good at selecting a good step size as well as a step direction. My experience with BFGS implementations is that they often rely much more heavily on the line search to find a good step size; in particular, there is an open issue #3581 regarding the step size in fmin_bfgs. This paper (http://pubs.acs.org/doi/abs/10.1021/ct5008718) gives benchmarks for different optimizers in both traditional optimization (energy minimization) and gradient-only optimization (nudged elastic band and smallest-eigenvalue search). Can you show us that this minimizer has performance similar to, e.g., the "optim" or "pele" optimizers, which both use LBFGS with a backtracking line search? This website (http://theory.cm.utexas.edu/benchmarks/index.html) has the necessary data to run all the benchmarks in that paper, for example minimizing the Lennard-Jones potential (http://theory.cm.utexas.edu/benchmarks/minimization.html#lj38).
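For anyone wanting to try the LJ38 benchmark mentioned above, a plain sketch of the Lennard-Jones energy and analytic gradient (reduced units, O(N^2) pair loop; illustrative only, not the benchmark site's code):

```python
import numpy as np

def lj(x, eps=1.0, sigma=1.0):
    """Energy and gradient of a Lennard-Jones cluster; x is a flat 3N array."""
    pos = x.reshape(-1, 3)
    n = len(pos)
    energy = 0.0
    grad = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[i] - pos[j]
            r2 = d @ d
            inv6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * eps * (inv6 * inv6 - inv6)
            coef = 24.0 * eps * (2.0 * inv6 * inv6 - inv6) / r2
            grad[i] -= coef * d   # gradient = -force
            grad[j] += coef * d
    return energy, grad.ravel()

# dimer at the equilibrium separation 2**(1/6): E = -eps, gradient ~ 0
e, g = lj(np.array([0.0, 0.0, 0.0, 2.0 ** (1 / 6), 0.0, 0.0]))
```

Feeding lj's gradient (with the energy withheld) to the proposed minimizer would be a direct way to run the gradient-only comparison.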
Small correction: I spoke too quickly; the smallest-eigenvalue optimization is not "gradient only".
Thanks for the heads up. I'll take a look at benchmarking it later.
Hey Jacob,

Here are the results of benchmarking the 1000 38-atom Lennard-Jones minimizations: Pass: 1000, Fail: 0.

Although it doesn't compare with the Optim or Pele LBFGS codes, it is a

ALSO! Just as a heads up, when trying to benchmark I realized I needed to

Either way, the above results are for the new, updated version.

Henry
This implements the BFGS(Hess) algorithm for gradient-based minimization of a function independent of the target function. This is an improvement on the existing code, which requires a well-defined target function for minimization via the line-search method.

Broyden-Fletcher-Goldfarb-Shanno (BFGS) is a popular minimization algorithm; however, it is typically implemented with a line search for the step size alpha. This is not always possible (one example being Nudged Elastic Band simulations, in which the target function is a not-well-defined hybrid of two independent calculations), so BFGS(Hess) instead assumes a changing alpha that is updated by a beta value whenever the optimization oversteps the minimum.
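As a sketch of the overall idea (not the PR's actual code; here overstepping is detected by growth of the gradient norm, which is one possible criterion when no objective function is available):

```python
import numpy as np

def bfgs_gradient_only(fprime, x0, alpha=1.0, beta=0.5, gtol=1e-6, maxiter=500):
    """Sketch of gradient-only BFGS: never evaluates the objective itself."""
    x = np.asarray(x0, dtype=float)
    g = fprime(x)
    H = np.eye(len(x))                      # inverse-Hessian approximation
    for _ in range(maxiter):
        if np.linalg.norm(g) < gtol:
            break
        p = -H @ g
        x_new = x + alpha * p
        g_new = fprime(x_new)
        if np.linalg.norm(g_new) > np.linalg.norm(g):
            alpha *= beta                   # overstepped the minimum: backtrack
            continue
        s, y = x_new - x, g_new - g
        ys = y @ s
        if ys > 1e-12:                      # standard BFGS inverse update
            rho = 1.0 / ys
            I = np.eye(len(x))
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

xmin = bfgs_gradient_only(lambda x: 2 * x, np.array([3.0, 4.0]))
```

Only fprime is ever called; alpha monotonically shrinks by beta whenever a step makes things worse, which is the behavior the description above refers to.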