
Line Search Methods

In line search methods, each iteration is given by $x_{k+1} = x_k + \alpha_k p_k$, where $p_k$ is the search direction and $\alpha_k$ is the step length.

The search direction is often of the form $p_k = -B_k^{-1} \nabla f_k$, where $B_k$ is a symmetric and non-singular matrix. The form of $p_k$ depends on the choice of algorithm.

The ideal step length would be $\min_{\alpha > 0} f(x_k + \alpha p_k)$, but this is generally too expensive to calculate. Instead, an inexact line search condition such as the Wolfe conditions can be used:

$$\begin{aligned} f(x_k + \alpha_k p_k) &\leq f(x_k) + c_1 \alpha_k \nabla f_k^T p_k \\ \nabla f(x_k + \alpha_k p_k)^T p_k &\geq c_2 \nabla f_k^T p_k \end{aligned}$$

with $0 < c_1 < c_2 < 1$. Here, the first condition ensures that $\alpha_k$ gives a sufficient decrease in $f$, whilst the second condition rules out unacceptably short steps. [Nocedal]
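As a concrete illustration, the sketch below checks both Wolfe conditions at a trial step length. The callables f and grad, and the default values c1 = 1e-4 and c2 = 0.9 (common choices in the literature), are assumptions made for the example rather than part of any particular implementation.

```python
import numpy as np

def wolfe_conditions(f, grad, x_k, p_k, alpha, c1=1e-4, c2=0.9):
    """Check the Wolfe conditions for a trial step length alpha along p_k."""
    g_k = grad(x_k)
    # Sufficient decrease (Armijo) condition.
    armijo = f(x_k + alpha * p_k) <= f(x_k) + c1 * alpha * g_k @ p_k
    # Curvature condition: rules out unacceptably short steps.
    curvature = grad(x_k + alpha * p_k) @ p_k >= c2 * g_k @ p_k
    return armijo and curvature

# Example: quadratic f(x) = x^T x with a steepest-descent direction.
x = np.array([1.0, -2.0])
f = lambda z: z @ z
grad = lambda z: 2 * z
print(wolfe_conditions(f, grad, x, -grad(x), alpha=0.25))
```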

Steepest Descent (steepest_descent)

A simple method in which the search direction $p_k$ is set to $-\nabla f_k$, i.e. the direction along which $f$ decreases most rapidly.
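A minimal sketch of the resulting iteration is shown below, using a simple backtracking (Armijo) rule for the step length; the callables f and grad, the tolerance, and the parameter values are assumptions made for illustration, not part of any particular implementation.

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=1000):
    """Steepest descent with a backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # stop when the gradient is (nearly) zero
            break
        p = -g                           # steepest-descent direction
        alpha, c1 = 1.0, 1e-4
        # Halve alpha until the sufficient-decrease condition holds.
        while f(x + alpha * p) > f(x) + c1 * alpha * g @ p:
            alpha *= 0.5
        x = x + alpha * p
    return x

# Example: minimise the simple quadratic f(x) = ||x - 1||^2.
x_min = steepest_descent(lambda z: (z - 1) @ (z - 1), lambda z: 2 * (z - 1), np.zeros(2))
```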

Advantages:
  • Low storage requirements
  • Easy to compute
Disadvantages:
  • Slow convergence for nonlinear problems

[Nocedal]

Conjugate Gradient (conjugate_gradient)

Conjugate Gradient methods have a faster convergence rate than Steepest Descent, while avoiding the high computational cost of methods that require the inverse Hessian.

Given an iterate $x_0$, evaluate $f_0 = f(x_0)$ and $\nabla f_0 = \nabla f(x_0)$.

Set $p_0 \leftarrow -\nabla f_0$, $k \leftarrow 0$.

Then, while $\nabla f_k \neq 0$:

Carry out a line search to compute the next iterate $x_{k+1}$, then evaluate $\nabla f_{k+1}$ and use it to determine the subsequent conjugate direction $p_{k+1} = -\nabla f_{k+1} + \beta_{k+1} p_k$.

Different variations of the Conjugate Gradient algorithm use different formulas for $\beta_k$, for example:

Fletcher-Reeves: $\beta_{k+1} = \frac{\nabla f_{k+1}^T \nabla f_{k+1}}{\nabla f_k^T \nabla f_k}$

Polak-Ribière: $\beta_{k+1} = \frac{\nabla f_{k+1}^T (\nabla f_{k+1} - \nabla f_k)}{\|\nabla f_k\|^2}$

[Nocedal] [Poczos]
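A minimal sketch of the iteration supporting both formulas for $\beta$ is given below; the callables f and grad and the simple backtracking step-length rule are assumptions made for illustration (in practice a line search satisfying the strong Wolfe conditions is normally used so that each $p_k$ remains a descent direction).

```python
import numpy as np

def nonlinear_cg(f, grad, x0, tol=1e-6, max_iter=1000, variant="FR"):
    """Illustrative nonlinear Conjugate Gradient (Fletcher-Reeves or Polak-Ribiere)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    p = -g                                   # initial direction p_0 = -grad f_0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Backtracking (Armijo) step-length rule, used here only for brevity.
        alpha, c1 = 1.0, 1e-4
        while f(x + alpha * p) > f(x) + c1 * alpha * g @ p:
            alpha *= 0.5
        x = x + alpha * p
        g_new = grad(x)
        if variant == "FR":                  # Fletcher-Reeves
            beta = (g_new @ g_new) / (g @ g)
        else:                                # Polak-Ribiere
            beta = g_new @ (g_new - g) / (g @ g)
        p = -g_new + beta * p                # next conjugate direction
        g = g_new
    return x
```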

Advantages:
  • Considered to be one of the best general purpose methods.
  • Faster convergence rate compared to Steepest Descent, and only requires evaluation of the objective function and its gradient - no matrix operations.
Disadvantages:
  • For the Fletcher-Reeves method it can be shown that if the method generates a bad direction and step, then the next direction and step are also likely to be bad. This is not the case for the Polak-Ribière method.
  • Generally, the Polak-Ribière method is more efficient than the Fletcher-Reeves method, but it has the disadvantage of requiring one more vector of storage.

[Nocedal]

BFGS (bfgs)

The most popular quasi-Newton method, which uses an approximation to the Hessian rather than the true Hessian required by a Newton line search method.

Starting with an initial approximation $H_0$ to the inverse Hessian and a starting point $x_0$:

While $\|\nabla f_k\| > \epsilon$:

Compute the search direction $p_k = -H_k \nabla f_k$.

Then find the next iterate $x_{k+1}$ by performing a line search.

Next, define $s_k = x_{k+1} - x_k$ and $y_k = \nabla f_{k+1} - \nabla f_k$, then compute

$$H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T$$

with $\rho_k = \frac{1}{y_k^T s_k}$.
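A minimal sketch of this loop is given below; the callables f and grad, the backtracking step-length rule, and the identity initialisation of $H_0$ are assumptions made for illustration (a Wolfe line search is normally used so that $y_k^T s_k > 0$ and the update remains well defined).

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=200):
    """Illustrative BFGS iteration; H approximates the inverse Hessian."""
    x = np.asarray(x0, dtype=float)
    I = np.eye(len(x))
    H = I.copy()                       # initial inverse Hessian approximation H_0
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        p = -H @ g                     # search direction p_k = -H_k grad f_k
        # Backtracking (Armijo) line search, for brevity; a Wolfe search is
        # normally used so that y_k^T s_k > 0 is guaranteed.
        alpha, c1 = 1.0, 1e-4
        while f(x + alpha * p) > f(x) + c1 * alpha * g @ p:
            alpha *= 0.5
        x_new = x + alpha * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        rho = 1.0 / (y @ s)
        # BFGS update of the inverse Hessian approximation.
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x
```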

Advantages:
  • Superlinear rate of convergence
  • Has self-correcting properties: if there is a bad estimate for $H_k$, it will tend to correct itself within a few iterations.
  • No need to compute the Jacobian or Hessian.
Disadvantages:
  • Newton's method has quadratic convergence but this is lost with BFGS.

[Nocedal]

Gauss Newton (gauss_newton)

A modified Newton's method with line search. Instead of solving the standard Newton equations


$$\nabla^2 f(x_k) p = -\nabla f(x_k),$$

solve the system


$$J_k^T J_k p_k^{GN} = -J_k^T r_k$$

(where $J_k$ is the Jacobian of the residuals $r_k$) to obtain the search direction $p_k^{GN}$. The next iterate is then set as $x_{k+1} = x_k + p_k^{GN}$.

Here, the approximation of the Hessian $\nabla^2 f_k \approx J_k^T J_k$ has been made, which helps to save computation time as second derivatives are not calculated.
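A minimal sketch of the iteration is given below; the callables residual and jacobian are assumptions made for illustration, and the normal equations are solved via a least-squares solve rather than by forming $J_k^T J_k$ explicitly, which behaves better when $J_k$ is ill-conditioned.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, tol=1e-8, max_iter=100):
    """Illustrative Gauss-Newton iteration for minimising 0.5 * ||r(x)||^2."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = residual(x)                    # residual vector r_k
        J = jacobian(x)                    # Jacobian J_k of the residuals
        g = J.T @ r                        # gradient of 0.5 * ||r||^2
        if np.linalg.norm(g) < tol:
            break
        # Solve J_k^T J_k p = -J_k^T r_k as the least-squares problem min ||J p + r||.
        p, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + p                          # full Gauss-Newton step
    return x
```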

Advantages:
  • Calculation of second derivatives is not required.
  • If the residuals or their second-order partial derivatives are small, then $J_k^T J_k$ is a close approximation to $\nabla^2 f_k$ and convergence of Gauss-Newton is fast.
  • The search direction $p_k^{GN}$ is always a descent direction as long as $J_k$ has full rank and the gradient $\nabla f_k$ is nonzero.
Disadvantages:
  • Without a good initial guess, or if the matrix $J_k^T J_k$ is ill-conditioned, the Gauss-Newton algorithm can be very slow to converge to a solution.
  • If the relative residuals are large, then a large amount of information is lost in the approximation $\nabla^2 f_k \approx J_k^T J_k$ and convergence slows.
  • $J_k$ must have full rank.

[Nocedal] [Floater]

Floater

Michael S. Floater (2018), Lecture 13: Non-linear least squares and the Gauss-Newton method, University of Oslo

Nocedal

Jorge Nocedal, Stephen J. Wright (2006), Numerical Optimization

Poczos

Barnabas Poczos, Ryan Tibshirani (2012), Lecture 10: Optimization, School of Computer Science, Carnegie Mellon University