
Added the BFGS(Hess) Algorithm for minimization independent of target function. #5318

Closed
wants to merge 21 commits into from

Conversation

hherbol

@hherbol hherbol commented Oct 6, 2015

This implements the BFGS(Hess) algorithm for gradient-based minimization independent of the target function. It improves on the existing code, which requires a well-defined target function because its line-search method must evaluate it.

Broyden-Fletcher-Goldfarb-Shanno (BFGS) is a popular minimization algorithm; however, it is typically implemented with a line search for the step size alpha. This is not always possible (one example being Nudged Elastic Band simulations, in which the target function is an ill-defined hybrid of two independent calculations). BFGS(Hess) therefore simply assumes a varying alpha that is scaled by a beta factor whenever the optimization oversteps the minimum.

@js850
Contributor

js850 commented Oct 7, 2015

Reading the documentation I was a bit confused as to what was going on. If I understand correctly, this is for gradient-only minimization (when the objective function is unavailable), and instead of a line search you just multiply the stepsize by beta if f increases. But what is f if the objective function is unavailable?

Is this idea basically backtracking linesearch? https://en.wikipedia.org/wiki/Backtracking_line_search

@hherbol
Author

hherbol commented Oct 7, 2015

Thanks for reading. I didn't know the actual term, but yes, this is an application of backtracking line search to the BFGS algorithm. Regarding f, I have made a new commit that allows 'None' to be passed when no function is available. This lets users run the solver with (a) no objective function at all, or (b) any function that can be used to describe the objective function (even if it is not at a minimum exactly where the objective function is).
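For anyone following along, backtracking line search is simple to sketch. This is an illustrative stand-alone version; the function and variable names are mine, not the PR's:

```python
import numpy as np

def backtracking_step(f, x, p, alpha=1.0, beta=0.5, max_backtracks=10):
    """Shrink alpha by beta until f decreases along the search direction p.

    If f is None (no objective available), the caller simply takes the
    fixed step alpha * p.
    """
    if f is None:
        return alpha
    f0 = f(x)
    for _ in range(max_backtracks):
        if f(x + alpha * p) < f0:
            break
        alpha *= beta
    return alpha

# Toy example on f(x) = x . x, whose gradient is 2x
x = np.array([1.0, 1.0])
p = -2.0 * x                       # steepest-descent direction
alpha = backtracking_step(lambda z: z @ z, x, p)
```

Here the full step (alpha = 1) lands on a point with equal f, so one halving is needed before the step is accepted.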

@hherbol
Author

hherbol commented Oct 7, 2015

Heads up, I realize I hastily committed. It'll throw an error during output if fval is undefined. Quickly fixing this.

@josef-pkt
Member

General question: I'm also confused about the description and didn't try to figure out the code.

If you are not minimizing an objective function, what's the difference to the system of equation solvers or root finding functions that are in scipy.optimize?

@argriffing
Contributor

> gradient based minimization of a function independent of the target function

What does this mean?

@hherbol
Author

hherbol commented Oct 7, 2015

@josef-pkt

I've been using Nudged Elastic Band (NEB), a method for minimizing a reaction pathway. This popular method calculates (1) the forces on the atoms in each frame of the pathway and (2) the forces due to virtual springs that keep consecutive frames from converging onto each other. This allows minimization of energy barriers and calculation of the pathways. In this situation there is no well-defined objective function; however, there is a way to check whether the minimization is working. Hence the typical line search is impractical, whereas the backtracking line search is not. The reason we don't use the other solvers is that BFGS is known to be efficient at these sorts of calculations and is typically the algorithm used.

@argriffing
I'm sorry for the confusing wording. It was meant to describe that the function passed to this solver does not need to be the objective function (or, as I had called it, the target function). It can instead be 'None', in which case a constant, user-defined alpha is used as the step size, or any function to minimize on. At this point it may seem that one could just use the regular BFGS algorithm, since we are supplying a function anyway. The problem is that most line-search methods need that function to be inexpensive to evaluate (and, referring back a paragraph, the NEB spring forces are not consistent, as they differ on each iteration of the calculation); the backtracking method, by contrast, allows expensive evaluations to be 'coarsened'.

@argriffing
Contributor

This is a small detail but you probably don't want references to angstroms in the code.

@hherbol
Author

hherbol commented Oct 7, 2015

@argriffing I agree. The code had originally been used for NEB calculations in which they were done in Angstroms. Thanks for spotting!

@argriffing
Contributor

I'm not completely following this... is the idea to abuse the line-search procedure of BFGS as the inner stage of a two-stage minimization?

@hherbol
Author

hherbol commented Oct 7, 2015

@argriffing I don't quite understand what you mean by "abuse the line-search step of BFGS as the inner stage of a two-stage minimization". The normal BFGS method in scipy uses a Wolfe line search; without a well-defined objective function you cannot use that method. BFGS(Hess) would normally run BFGS with a constant alpha, but with a very small step size it would take forever, while a larger one could skip past the minimum. The code I committed takes BFGS and, instead of using the Wolfe line search, lets the user either (a) use BFGS(Hess) with f=None, or (b) use the backtracking line search, in which the step size is reduced whenever f(x_k+1) > f(x_k). This has proven applicable to one situation (specifically the NEB I mentioned earlier) and I believe it may be applicable to others.
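To make the control flow concrete, here is a rough stand-alone sketch of the scheme described above: a BFGS direction with a fixed step when f is None, and backtracking by beta otherwise. All names are illustrative; this is not the PR's actual implementation:

```python
import numpy as np

def bfgs_hess_min(fprime, x0, f=None, alpha=0.5, beta=0.5,
                  gtol=1e-5, maxiter=200):
    """Sketch only: BFGS steps of size alpha; if an optional objective f
    is supplied and a step increases it, scale alpha by beta and retry."""
    x = np.asarray(x0, dtype=float)
    Hk = np.eye(x.size)                  # inverse-Hessian approximation
    gk = np.asarray(fprime(x), dtype=float)
    for _ in range(maxiter):
        if np.linalg.norm(gk, np.inf) < gtol:
            break
        p = -Hk @ gk                     # quasi-Newton direction
        x_new = x + alpha * p
        if f is not None and f(x_new) > f(x):
            alpha *= beta                # overstepped the minimum: backtrack
            continue
        g_new = np.asarray(fprime(x_new), dtype=float)
        sk, yk = x_new - x, g_new - gk
        ys = yk @ sk
        if ys > 1e-10:                   # keep the update well defined
            rhok = 1.0 / ys
            I = np.eye(x.size)
            Hk = ((I - rhok * np.outer(sk, yk)) @ Hk
                  @ (I - rhok * np.outer(yk, sk))
                  + rhok * np.outer(sk, sk))
        x, gk = x_new, g_new
    return x
```

On a simple quadratic both modes converge quickly; the interesting cases are, of course, gradients like NEB's that come from no single objective.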

@argriffing
Contributor

The idea is that the line search will repeatedly evaluate a function that is somewhat related to the true objective function but that is faster to compute, instead of trying to evaluate the actual objective function? I'm just trying to see how this would be used outside of the NEB application.

@hherbol
Author

hherbol commented Oct 7, 2015

@argriffing Yes, this is accurate. Sorry for not understanding the original question.

@hherbol
Author

hherbol commented Oct 7, 2015

@argriffing This code is useful outside of NEB whenever there is no analytical expression for the function being minimized, which is the case in most quantum-related problems. It also applies whenever there is inconsistent post-processing of the function after each iteration (such as the addition of spring forces in NEB; this generalizes to any hybrid function that needs minimization).

@pv
Member

pv commented Oct 7, 2015

I didn't read the references yet, but some comments off the cuff:

This looks somewhat related to the Broyden-based root-finding routines that we have, which do use backtracking by default. If the mathematics are actually the same, it could be more appropriate to instead improve the root finders by making it possible for the user to supply a merit function for the line search there, and to think more closely about the quasi-Newton step-length strategy and resetting (which I think is not optimal currently).

BTW, if I'm reading this right, the algorithm here doesn't maintain the quasi-Newton secant condition (the BFGS solver in scipy, on the other hand, does, by recomputing sk according to the step size).

A1 = I - sk[:, numpy.newaxis] * yk[numpy.newaxis, :]
A2 = I - yk[:, numpy.newaxis] * sk[numpy.newaxis, :]
Hk = numpy.dot(A1, numpy.dot(Hk, A2)) + \
     (sk[:, numpy.newaxis] * sk[numpy.newaxis, :])
Member


Is this the BFGS update? --- no divisions by y^T s?

Author


Yes. I had to remove the y^T s part (or, as the original code had it):

rhok = 1.0 / (numpy.dot(yk, sk))

This was because adjusting alpha manually with the backtracking line search caused rhok to blow up. As rhok was a scalar always multiplied by alpha, and both were changing, I figured having the code adjust alpha alone would sort it out (which in the end it does). However, I have made a lot of adjustments since then and haven't yet tried adding rhok back in to see whether the issue was caused by something else. I'll test that out now.
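For reference, the standard BFGS inverse-Hessian update (e.g. as given in Nocedal & Wright) keeps the rho_k factor:

```latex
H_{k+1} = \left(I - \rho_k s_k y_k^{\mathsf T}\right) H_k \left(I - \rho_k y_k s_k^{\mathsf T}\right)
          + \rho_k s_k s_k^{\mathsf T},
\qquad \rho_k = \frac{1}{y_k^{\mathsf T} s_k}
```

Dropping rho_k changes the scaling of the approximation, which is consistent with the secant-condition concern raised above.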

Author


And done. In fact, re-implementing rhok better mimics the test function's trace from the original BFGS algorithm (as it should, since the only difference now is the line-search method).

@hherbol
Author

hherbol commented Oct 7, 2015

@pv I haven't read up much on the Broyden-based root-finding routines, but if they find roots rather than optimize an arbitrary objective function, then the BFGS_H method is not the same. Regarding the secant condition, though, I think you're correct. I've adjusted the algorithm to fix this (I don't know why I didn't have it this way originally).

@aarchiba
Contributor

aarchiba commented Oct 7, 2015

So, just to be clear: this is a "minimizer" that takes something that is supposed to be the gradient of an objective function, and seeks a minimum of the objective function without ever evaluating it. Is that right?

Not every vector field is the gradient of a function; the exterior derivative has to be zero (and the domain needs to be simply connected). How does this solver behave if this constraint is violated? I'm not just talking about (say) rigid-rotation or vortex vector fields, but what about a gradient that is "jiggered"? What are the constraints on the jiggering? Obviously if the user provides sufficiently bogus input they won't get useful output, but are there any diagnostics for these pathological cases? Are infinite loops a possible failure mode?

As I understand it, this solver can also take advantage of an approximate version of the objective function to provide hints for step sizes. How approximate can this fake objective function be? What features does it need to share with the real objective function to produce a reasonable minimization?

@hherbol
Author

hherbol commented Oct 7, 2015

@aarchiba You are correct in the first paragraph. I don't know if I can adequately answer your question about the exterior derivative not being zero, though. I'm not sure which constraints should be checked and which are implicitly satisfied with regard to the exterior derivative and 'jiggering'.

As for infinite loops, now that I look back at it, the k counter is incremented after the continue statement, so one is indeed possible. I will add a second counter before the continue to track how many times the step size has been backtracked and break the loop if it happens too many times (other codes seem to use 10, so I'll start with that).

Finally, regarding how approximate the function can be, that depends entirely on the problem. In the case of NEB (sorry to keep using this example; it's just the one I can most readily give), the energies of the reaction pathway work well. It is really up to the user to decide how 'approximate' they are willing to go.
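The loop guard described above can be sketched as follows (the counter name and retry budget are assumptions, not the PR's actual variables):

```python
# Cap the number of consecutive backtracks so a step that never
# "succeeds" cannot loop forever.
MAX_BACKTRACKS = 10

def backtrack_with_guard(step_ok, alpha, beta=0.5):
    """Shrink alpha until step_ok(alpha) holds or the retry budget runs out.

    Returns (alpha, accepted); accepted is False when we gave up.
    """
    for _ in range(MAX_BACKTRACKS):
        if step_ok(alpha):
            return alpha, True
        alpha *= beta
    return alpha, False  # give up instead of looping forever

# Toy acceptance test: a step only "works" once alpha drops below 0.1.
alpha, accepted = backtrack_with_guard(lambda a: a < 0.1, 1.0)
```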

@pv
Member

pv commented Oct 8, 2015

BFGS enforces a symmetric Jacobian approximation, so it likely requires that the field is a gradient field to work properly. This is the difference from the Broyden root-finding methods.

Convergence properties of the algorithm are then a different question --- the difference here from standard BFGS is in the line search, but some properties of BFGS come from the line search. It probably loses superlinear convergence due to not trying step length 1. The condition rhok > 0 is probably not guaranteed with backtracking. For which problems that matters in practice is then a different question, but I would prefer to have verification that the algorithm works better than root finding applied to fprime, at least (which may be the case if there are many saddle points).

@hherbol
Author

hherbol commented Oct 8, 2015

@pv I cannot guarantee superlinear convergence right now, as I first need to double-check the proof given by Broyden. Secondly, I'll try out the root finding (in particular scipy.optimize.broyden1) and get back to you on that. Finally, just out of curiosity, does anyone know what happened to the build VM on travis-ci.org/scipy? One of the tests for my last push has been running for 16 hours when it normally finishes in roughly 30 minutes...

@hherbol
Author

hherbol commented Oct 8, 2015

@pv I tried using the broyden1 root-finding method on fprime, to no avail. In practice, with a well-defined gradient, I think it would work; however, in the NEB simulations at the very least it causes overlap of atomic coordinates within a frame and crashes. I'm not entirely sure where the issue arises, but with the default 'armijo' line search it fails on the second calculation and with the 'wolfe' line search it fails on the third. Maybe a forced maximum x step would restrain it, but then one would be manually messing with step sizes again instead of using the 'true' algorithm.

Edit --- more fiddling, and I found that adjusting the alpha value allows the simulation to run for longer (though in the wrong direction, depending on what alpha is chosen)... though I have no clue what alpha is, as the documentation says it is the "initial guess for the jacobian" but also just a float?

@pv
Member

pv commented Oct 8, 2015

That's probably a question of the first step length --- you can try broyden1(..., alpha=-1) to start from an identity Jacobian approximation, similarly to BFGS. (The BFGS update may be better here than the quasi-Newton routines, however, as it preserves symmetry.) By root finding I actually primarily meant the default minpack hybrd solver, but I realize it will try to calculate derivatives, which may not be so nice for large-scale problems.

@pv
Member

pv commented Oct 8, 2015

I don't know what went wrong with travis --- it's supposed to stop the job after 40min even if something hangs. I stopped it manually, but it's probably just some random glitch in their system, unrelated to what is done here.

@hherbol
Author

hherbol commented Oct 8, 2015

So here's the current comparison of everything. I used scipy.optimize.broyden1 with alpha=-1 and the two line-search methods, then the current BFGS(Hess). Each simulation was allowed to run for 20 function calls (or gradient calls; it's the same output either way). The data is output oddly, but an example output line is:

4. Real RMS: 0.306951, 4 0.061248 +   4.2  42.1  99.8 127.7  88.5  43.9  18.4  18.9  13.0  0.0420799234741

where the first integer is the number of function calls at that time, the values from 4.2 to 13.0 are the reaction pathway being minimized, and the last value is the RMS force for spring + atomic forces. So far, BFGS(Hess) is working better (but that could just be because I haven't found the right 'settings' for running the broyden1 root finder).

Running neb with optimization method BFGS_root, wolfe line search
Spring Constant for NEB: 0.1837 Ha/Ang
Convergence Criteria: gtol = 1e-05 (Ha/Ang)
Run_Name = hcn_bfgs_r
---------------------------------------------
0 0.061248 +   9.0  42.2  94.2 137.1  99.2  43.0  31.7  23.1  13.0  0.0851144614847
1 0.061248 +   9.0  42.2  94.2 137.1  99.2  43.0  31.7  23.1  13.0  0.0851144611473
2 0.061248 +   7.2  44.9 110.0 247.4 247.7 123.8  85.8 118.0  13.0  0.197715775992
3 0.061248 +   7.2  44.9 110.0 247.4 247.7 123.8  85.8 118.0  13.0  0.197715773232
4 0.061248 +   8.9  42.1  94.1 136.2  98.0  42.6  31.1  22.5  13.0  0.0796671924776
5 0.061248 +   8.9  42.1  94.1 136.2  98.0  42.6  31.1  22.5  13.0  0.0796671925868
0:  |F(x)| = 0.737527; step 0.01; tol 0.788488
6 0.061248 +   8.9  42.1  94.1 136.2  98.0  42.6  31.1  22.5  13.0  0.0796671922772
7 0.061248 +   5.1  42.7  98.9 148.1 128.5  44.3  22.1  48.7  13.0  0.170055522215
8 0.061248 +   5.1  42.7  98.9 148.1 128.5  44.3  22.1  48.7  13.0  0.170055519844
9 0.061248 +   8.8  42.0  94.0 136.2  98.2  42.6  30.7  22.4  13.0  0.0799902829522
10 0.061248 +   8.8  42.0  94.0 136.2  98.2  42.6  30.7  22.4  13.0  0.079990249716
11 0.061248 +   5.1  42.7  98.9 148.1 128.5  44.3  22.1  48.7  13.0  0.170055522322
1:  |F(x)| = 1.57431; step 1; tol 0.9999
12 0.061248 +   5.1  42.7  98.9 148.1 128.5  44.3  22.1  48.7  13.0  0.170055285398
13 0.061248 +   7.0  52.8 149.4 129.2  98.1  66.4 164.6 115.8  13.0  0.294960111146
14 0.061248 +   7.0  52.8 149.4 129.2  98.1  66.4 164.6 115.8  13.0  0.294960105702
15 0.061248 +   5.5  44.2  98.4 139.4 109.1  48.7  21.5  13.8  13.0  0.0962799198868
16 0.061248 +   5.5  44.2  98.4 139.4 109.1  48.7  21.5  13.8  13.0  0.0962799832531
2:  |F(x)| = 0.891321; step 0.290788; tol 0.89982
17 0.061248 +   5.5  44.2  98.4 139.4 109.1  48.7  21.5  13.8  13.0  0.0962799205067
18 0.061248 +   9.0  49.8  99.8 137.4 104.9  52.5  30.6 126.4  13.0  0.241020827699
3:  |F(x)| = 2.23127; step 1; tol 0.9999
19 0.061248 +   9.0  49.8  99.8 137.4 104.9  52.5  30.6 126.4  13.0  0.241020813252
20 0.061248 +  12.1  52.9 126.2 150.7 146.9  44.5  97.9 148.5  13.0  0.219924872762    



Running neb with optimization method BFGS_root, armijo line search
Spring Constant for NEB: 0.1837 Ha/Ang
Convergence Criteria: gtol = 1e-05 (Ha/Ang)
Run_Name = hcn_bfgs_r
---------------------------------------------
0 0.061248 +   9.0  42.2  94.2 137.1  99.2  43.0  31.7  23.1  13.0  0.0851144614847
1 0.061248 +   7.2  44.9 110.0 247.4 247.7 123.8  85.8 118.0  13.0  0.197715775992
2 0.061248 +   8.1  41.3  93.4 131.6  92.1  40.6  27.0  19.3  13.0  0.0429526474726
0:  |F(x)| = 0.397638; step 0.0926604; tol 0.229201
3 0.061248 +   5.2  42.8  99.2 143.6 114.6  46.7  19.2  50.3  13.0  0.15826365002
4 0.061248 +   7.8  41.1  93.5 131.6  92.4  40.7  26.2  18.9  13.0  0.0436202719091
5 0.061248 +   8.0  41.2  93.4 131.6  92.2  40.6  26.8  19.2  13.0  0.0430854275883
6 0.061248 +   8.0  41.3  93.4 131.6  92.1  40.6  26.8  19.2  13.0  0.0430296513982
7 0.061248 +   5.2  42.8  99.2 143.6 114.6  46.7  19.2  50.3  13.0  0.158263650019
1:  |F(x)| = 1.46514; step 1; tol 0.9999
8 0.061248 +   6.4  50.6 143.1 128.7  95.9  51.6 111.2 133.4  13.0  0.217316788233
9 0.061248 +   5.4  44.1  99.3 137.5 106.8  47.9  20.7  14.2  13.0  0.0910745601212
2:  |F(x)| = 0.843132; step 0.265183; tol 0.89982
10 0.061248 +   8.6  50.1 101.7 132.7 102.7  48.0  27.3 105.6  13.0  0.211700535224
11 0.061248 +   5.6  44.6  99.2 134.7 103.2  46.3  20.2  15.4  13.0  0.076313111426
3:  |F(x)| = 0.706476; step 0.092538; tol 0.728708
12 0.061248 +  12.8  58.1 119.0 134.2 109.2  46.4  23.4  57.5  13.0  0.117511518039
13 0.061248 +   6.5  45.5  99.0 134.9 104.5  46.1  20.4  16.8  13.0  0.0709410428237
4:  |F(x)| = 0.656744; step 0.210866; tol 0.777749
14 0.061248 +  13.7 166.8 511.1 133.7 104.7  55.1  21.3  98.5  13.0  0.539271017794
15 0.061248 +   6.6  45.5  99.0 134.7 104.3  46.2  20.4  16.8  13.0  0.0698660069855
5:  |F(x)| = 0.646792; step 0.00865269; tol 0.87293
16 0.061248 +  59.8 197.0 420.9 186.7 116.7  75.2  90.7 238.4  13.0  0.199621153464
17 0.061248 +   7.6  46.5 104.5 136.4 105.2  46.6  21.5  17.3  13.0  0.0826852012196
18 0.061248 +   6.6  45.5  99.1 134.8 104.4  46.2  20.5  16.8  13.0  0.0703089446021
19 0.061248 +   6.8  45.6  99.7 135.1 104.6  46.3  20.7  16.8  13.0  0.0721557422351
20 0.061248 +   6.6  45.5  99.0 134.7 104.4  46.2  20.4  16.8  13.0  0.0699517359781    



Running neb with optimization method BFGS(Hess)
        alpha = 0.5, beta = 0.5, H_reset = True
Spring Constant for NEB: 0.1837 Ha/Ang
Convergence Criteria: gtol = 1e-05 (Ha/Ang)
Run_Name = hcn_bfgs_r
---------------------------------------------
0 0.061248 +   9.0  42.2  94.2 137.1  99.2  43.0  31.7  23.1  13.0  0.0851144614847
1. Real RMS: 0.50797, 1 0.061248 +   5.7  40.1  95.6 158.5 130.9  58.8  31.8  36.9  13.0  0.130415395523
Resetting System as 0.219739 > 0.198316!
        alpha: 0.5 -> 0.25
        mag(sk) = 0.128029
2. Real RMS: 0.721753, 2 0.061248 +   6.8  40.3  93.2 133.8  95.9  42.6  24.1  19.9  13.0  0.0572865869999
3. Real RMS: 0.360446, 3 0.061248 +   5.3  39.6  94.4 131.5  94.3  42.6  21.0  14.8  13.0  0.0442682979428
4. Real RMS: 0.306951, 4 0.061248 +   4.2  42.1  99.8 127.7  88.5  43.9  18.4  18.9  13.0  0.0420799234741
5. Real RMS: 0.296155, 5 0.061248 +   4.2  43.9 101.6 127.6  88.3  44.7  19.7  16.2  13.0  0.0332857608363
6. Real RMS: 0.271533, 6 0.061248 +   6.8  55.5 111.3 127.1  88.2  50.0  30.8  24.2  13.0  0.0732739765896
7. Real RMS: 0.428943, 7 0.061248 +   7.7  55.8 115.3 127.3  90.0  53.8  32.3  33.7  13.0  0.0910976754086
Resetting System as 0.188594 > 0.188347!
        alpha: 0.25 -> 0.125
        mag(sk) = 0.0827921
        <yk|sk> = -0.00309248    

8. Real RMS: 0.517904, 8 0.061248 +   6.7  52.3 111.2 126.3  88.2  49.2  27.9  15.0  13.0  0.021763957864
9. Real RMS: 0.238269, 9 0.061248 +   6.6  52.4 111.1 126.1  88.5  49.5  27.1  15.0  13.0  0.0185400868059
10. Real RMS: 0.232608, 10 0.061248 +   6.5  52.9 111.1 125.9  89.4  50.6  26.8  15.4  13.0  0.0199107201377
11. Real RMS: 0.23289, 11 0.061248 +   7.6  55.5 111.9 125.6  91.6  53.4  28.2  16.3  13.0  0.0229113481883
12. Real RMS: 0.243299, 12 0.061248 + 120.6 300.7 180.6 125.3 119.0  80.0  73.4  37.7  13.0  0.123600680883
Resetting System as 0.361941 > 0.186866!
        alpha: 0.125 -> 0.0625
        mag(sk) = 0.612823
        <yk|sk> = 0.00309485    

13. Real RMS: 0.836394, 13 0.061248 +   7.5  55.3 111.9 125.5  91.6  53.4  28.3  16.3  13.0  0.0189794465275
14. Real RMS: 0.234434, 14 0.061248 +   7.5  55.4 112.0 125.5  91.7  53.7  28.5  16.4  13.0  0.0186162887824
Resetting System as 0.18677 > 0.186743!
        alpha: 0.0625 -> 0.03125
        mag(sk) = 0.00919875
        <yk|sk> = 0.000141258    

15. Real RMS: 0.233165, 15 0.061248 +   7.5  55.3 112.0 125.5  91.6  53.5  28.3  16.3  13.0  0.0181052822363
16. Real RMS: 0.232671, 16 0.061248 +   7.5  55.3 112.0 125.5  91.8  53.7  28.5  16.4  13.0  0.0180924553968
Resetting System as 0.186758 > 0.186727!
        alpha: 0.03125 -> 0.015625
        mag(sk) = 0.00837365
        <yk|sk> = 2.8658e-05    

17. Real RMS: 0.232316, 17 0.061248 +   7.5  55.3 112.0 125.5  91.6  53.5  28.3  16.3  13.0  0.0177952269343
18. Real RMS: 0.232065, 18 0.061248 +   7.5  55.3 112.0 125.5  91.7  53.6  28.5  16.4  13.0  0.0178413230141
Resetting System as 0.186745 > 0.186723!
        alpha: 0.015625 -> 0.0078125
        mag(sk) = 0.00550441
        <yk|sk> = 6.73725e-06    

19. Real RMS: 0.231982, 19 0.061248 +   7.5  55.3 112.0 125.5  91.6  53.5  28.4  16.3  13.0  0.0176627432305
20. Real RMS: 0.23181, 20 0.061248 +   7.5  55.3 112.0 125.5  91.7  53.6  28.4  16.4  13.0  0.0176977355638
Resetting System as 0.186734 > 0.186722!
        alpha: 0.0078125 -> 0.00390625
        mag(sk) = 0.00312175
        <yk|sk> = 1.64393e-06

@larsmans
Contributor

larsmans commented Oct 8, 2015

Sheppard et al. are using L-BFGS, not BFGS.

@hherbol
Author

hherbol commented Oct 8, 2015

@larsmans You are correct that Sheppard et al. use L-BFGS. The method should still hold for BFGS, though, with the caveat that it would only work for systems with lower memory requirements.

@hherbol
Author

hherbol commented Oct 9, 2015

@pv Is the comparison I posted earlier sufficient to show the viability of BFGS_H?

@rgommers rgommers added enhancement A new feature or improvement scipy.optimize labels Oct 10, 2015
@js850
Contributor

js850 commented Nov 8, 2015

A backtracking line search works well with LBFGS because most implementations of that algorithm are extremely good at selecting a good step size as well as a step direction. My experience with BFGS implementations is that they often rely much more heavily on the line search to find a good step size. In particular there is an open issue #3581 regarding the step size in fmin_bfgs.

This paper (http://pubs.acs.org/doi/abs/10.1021/ct5008718) gives benchmarks for different optimizers in both traditional optimization (energy minimization) and gradient only optimization (nudged elastic band and smallest eigenvalue search). Can you show us that this minimizer has similar performance to e.g. the "optim" or "pele" optimizers which both use LBFGS with backtracking line search? This website (http://theory.cm.utexas.edu/benchmarks/index.html) has the necessary data to run all the benchmarks in that paper. For example, minimizing the lennard jones potential (http://theory.cm.utexas.edu/benchmarks/minimization.html#lj38)

@js850
Contributor

js850 commented Nov 8, 2015

Small correction: I spoke too quickly --- the smallest-eigenvalue optimization is not "gradient only".

@hherbol
Author

hherbol commented Nov 9, 2015

Thanks for the heads up. I'll take a look at benchmarking it later this week.


@hherbol
Author

hherbol commented Nov 26, 2015

Hey Jacob,

Sorry for taking so long to respond; I got busy recently.

Here are the results of the benchmarking for the 1000 38-atom Lennard-Jones clusters:

Pass: 1000 Fail: 0
Min: 381 Max: 5076 Avg: 796

Although it doesn't compare with the Optim or Pele L-BFGS codes, as a method that only takes the gradients into account it is comparable with the FIRE optimization code. I haven't submitted the code to the benchmark website yet, as I need to put it together in a concise manner for submission, but I figured I should show the results by now.

ALSO! Just as a heads up: when trying to benchmark, I realized I needed to improve the code itself. It did not have a max step size (which originally caused it to explode). The updates I made were:

  1. Include a max step size.
  2. Add a 'reset' option, which can be None or any positive integer. Say it is 10: after every 10 iterations in which no backtracking occurs, (a) if alpha is less than the original alpha, reset it (and the inverse Hessian) to the original values (in the case of the inverse Hessian, set it to the identity), or (b) if alpha equals the original, start increasing the step size by dividing by beta instead. After testing, this leads to faster convergence (note that this whole 'reset' behavior can always be turned off by passing None).
  3. I also included the possibility of using the armijo line-search method instead, but found that it doesn't ever seem to work (maybe I messed up the implementation?).
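The reset rule in item 2 can be sketched as follows (the names here are illustrative, not the actual code's):

```python
def update_alpha(alpha, alpha0, beta, n_steady, reset):
    """Sketch of the reset heuristic described above.

    After `reset` consecutive iterations with no backtracking:
      - if alpha has shrunk below its starting value alpha0, restore it
        (the caller would also reset the inverse Hessian to the identity);
      - otherwise grow the step by dividing by beta.
    Returns (new_alpha, reset_hessian).
    """
    if reset is None or n_steady < reset:
        return alpha, False          # nothing to do yet
    if alpha < alpha0:
        return alpha0, True          # restore alpha; reset H to identity
    return alpha / beta, False       # cautiously increase the step
```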

Either way, the above results are for the new, updated version. I'm currently comparing NEB results to see, given a gtol, whether FIRE, BFGS(Hess), Quick Min, or Steepest Descent is best.

Henry


@rgommers rgommers deleted the branch scipy:master January 3, 2022 16:32
@rgommers rgommers closed this Jan 3, 2022