# BFGS

## Description
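The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is an iterative quasi-Newton optimization method. It evaluates only first derivatives (gradients) but builds up an approximation to second-order curvature information.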

### Quasi-Newton Methods
BFGS is called a quasi-Newton method because, unlike Newton's method, which uses the exact Hessian matrix, it works with an approximation to the Hessian that is updated from gradient information at every iteration. Each iteration has the form

$$
x_{k+1} = x_k - \alpha_k B_k^{-1} \nabla f(x_k)
$$

where
* $B_k$ is an approximation to the Hessian $\nabla^2 f(x_k)$
* $\alpha_k$ is a step length obtained from a line search
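The source does not say which line search produces $\alpha_k$. As a minimal sketch, assuming a backtracking line search with the Armijo sufficient-decrease condition $f(x_k + \alpha p_k) \le f(x_k) + c_1 \alpha \nabla f(x_k)^T p_k$ along $p_k = -B_k^{-1} \nabla f(x_k)$, the step length can be found by halving $\alpha$ until the condition holds. The helper name `backtracking` and the constants `c1`, `shrink` and `max_halvings` are illustrative choices, not part of the original notebook.

```python
import numpy as np

def backtracking(f, x, fx, gx, p, alpha=1.0, c1=1e-4, shrink=0.5, max_halvings=50):
    """Return a step length satisfying the Armijo condition (hypothetical helper).

    f  : objective function
    x  : current point x_k
    fx : f(x_k)
    gx : gradient of f at x_k
    p  : descent direction, e.g. p_k = -B_k^{-1} grad f(x_k)
    """
    slope = np.dot(gx, p)  # directional derivative; negative for a descent direction
    for _ in range(max_halvings):
        # Armijo sufficient-decrease test.
        if f(x + alpha * p) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= shrink  # step too long: shrink it and try again
    return alpha  # give up and return the last (very small) trial step


# Usage: one quasi-Newton step on f(x) = x1^2 + 10 x2^2 with B_k = I.
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
xk = np.array([1.0, 1.0])
Bk = np.eye(2)
pk = np.linalg.solve(Bk, -grad_f(xk))
alpha_k = backtracking(f, xk, f(xk), grad_f(xk), pk)
x_next = xk + alpha_k * pk
print(alpha_k, x_next)  # 0.0625 [ 0.875 -0.25 ]
```

Here the full step $\alpha = 1$ overshoots badly, and four halvings are needed before the Armijo test passes.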

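BFGS belongs to the family of secant updating methods, which typically converge superlinearly (convergence rate $r$ with $1 < r < 2$).
* They converge more slowly than Newton's method ($r = 2$), but each iteration is cheaper because no second derivatives are computed.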
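For reference, the rate $r$ is defined through the error $e_k = x_k - x^*$ with respect to the minimizer $x^*$:

$$
\lim_{k \to \infty} \frac{\|e_{k+1}\|}{\|e_k\|^r} = C, \qquad C \text{ finite and nonzero}
$$

Here $r = 1$ (with $C < 1$) is linear, $r > 1$ is superlinear and $r = 2$ is quadratic. Any $r > 1$ implies $\|e_{k+1}\| / \|e_k\| \to 0$, so each step shrinks the error by an ever larger factor.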

### BFGS Algorithm
1. Start with an initial guess $x_0$ and the initial Hessian approximation $B_0 = I$.
2. Solve $B_k s_k = -\nabla f(x_k)$ for $s_k$.
3. Compute $x_{k+1} = x_k + s_k$ (a full step, i.e. $\alpha_k = 1$).
4. Compute the difference in gradients $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$.
5. Update the approximate Hessian:
$$
B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}
$$
6. Repeat from step 2 until a stopping criterion is met, for example $\|\nabla f(x_{k+1})\|$ falling below a tolerance.
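Step 5 can be checked numerically. This minimal NumPy sketch (the helper name `bfgs_update` and the sample vectors are illustrative) applies the rank-two update and confirms three properties of the BFGS formula: $B_{k+1}$ satisfies the secant equation $B_{k+1} s_k = y_k$, stays symmetric, and stays positive definite when $y_k^T s_k > 0$.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Return B_{k+1} from the BFGS rank-two update (hypothetical helper).

    Uses (B s)(B s)^T in place of B s s^T B, which is valid because B is symmetric.
    """
    Bs = B @ s
    return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)

B = np.eye(3)
s = np.array([0.5, -1.0, 0.25])   # step s_k
y = np.array([1.0, -1.5, 0.5])    # gradient change y_k, chosen so that y @ s > 0
B_new = bfgs_update(B, s, y)

print(np.allclose(B_new @ s, y))              # secant equation holds
print(np.allclose(B_new, B_new.T))            # symmetric
print(np.all(np.linalg.eigvalsh(B_new) > 0))  # positive definite
```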

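Alternatively, rather than updating $B_k$ and solving the linear system in step 2 from scratch, which costs $O(n^3)$ work per iteration, one can update a factorization of $B_k$ so that each system can be solved in $O(n^2)$ work.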
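A closely related and widely used alternative to updating a factorization, swapped in here because it is short, updates the inverse $H_k = B_k^{-1}$ directly:
$H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T$ with $\rho_k = 1/(y_k^T s_k)$.
The step becomes the matrix-vector product $s_k = -H_k \nabla f(x_k)$. Expanding the product means the update needs only one matrix-vector product and a few outer products, so it also costs $O(n^2)$. The sketch below, with a hypothetical helper `inverse_bfgs_update`, checks the result against inverting the $B_{k+1}$ update from step 5.

```python
import numpy as np

def inverse_bfgs_update(H, s, y):
    """Return H_{k+1} = B_{k+1}^{-1} in O(n^2) work (hypothetical helper).

    Expands (I - rho s y^T) H (I - rho y s^T) + rho s s^T using only a
    matrix-vector product and outer products; assumes H symmetric, y @ s > 0.
    """
    rho = 1.0 / (y @ s)
    Hy = H @ y  # the only O(n^2) matrix-vector product
    return (H
            - rho * (np.outer(s, Hy) + np.outer(Hy, s))
            + (rho**2 * (y @ Hy) + rho) * np.outer(s, s))

# Compare with inverting the direct B update (O(n^3)) on sample data.
B = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])
s = np.array([0.5, -1.0, 0.25])
y = np.array([1.0, -1.5, 0.5])
Bs = B @ s
B_new = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
H_new = inverse_bfgs_update(np.linalg.inv(B), s, y)
print(np.allclose(H_new, np.linalg.inv(B_new)))
```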

### References
> Michael T. Heath. 2018. Scientific Computing: An Introductory Survey, Revised Second Edition. SIAM-Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.

```python
# The numpy interface of autograd wraps all numpy ops with autodiff.
import autograd.numpy as np

from autograd import grad

import scipy.optimize as opt
```

## BFGS Method

```python
def bfgs(fx, gradfx, x0, tol, maxiter):
    """
    bfgs returns an approximate minimizer xk of fx using the BFGS method

    Note: every iteration takes the full step (alpha_k = 1); there is no
    line search.

    Parameters
    ----------
    fx : function
        function to minimize
    gradfx : function
        gradient of function to minimize
    x0 : numpy.ndarray
        initial guess for xk
    tol : float
        convergence threshold on the norm of the gradient
    maxiter : int
        maximum number of iterations

    Returns
    -------
    numpy.ndarray
        vector xk where fx is (approximately) minimum
    numpy.ndarray
        position, value and gradient history, one row per iteration
        [[x0, fx(x0), gradfx(x0)],
         [x1, fx(x1), gradfx(x1)],...]
    """

    xk, gradfxk, Bk = x0, gradfx(x0), np.eye(x0.size)

    # Preallocate history; each row holds [xk, fx(xk), gradfx(xk)].
    steps = np.zeros((maxiter, (x0.size*2)+1))
    steps[0,:] = np.hstack((x0, fx(x0), gradfxk))

    # Repeat up to maximum number of iterations.
    for k in range(1,maxiter):

        # Stop iteration when gradient is near zero; keep only filled rows.
        if np.linalg.norm(gradfxk) < tol:
            steps = steps[:k,:]
            break

        # Solve Bk*sk = -grad(xk) for sk.
        sk = np.linalg.solve(Bk, -1. * gradfxk)

        # Update xk and evaluate gradient at new value of xk.
        xk = xk + sk
        gradfxk1 = gradfx(xk)

        # Compute difference in gradients.
        yk = gradfxk1 - gradfxk

        # Update approximate Hessian (BFGS rank-two update).
        term1 = np.outer(yk, yk) / np.dot(yk, sk)
        term2a = np.dot(np.dot(Bk, np.outer(sk, sk)), Bk)
        term2b = np.dot(np.dot(sk, Bk), sk)
        Bk = Bk + term1 - (term2a / term2b)

        # Update the gradient at xk.
        gradfxk = gradfxk1

        # Save iteration history.
        steps[k,:] = np.hstack((xk, fx(xk), gradfxk))

    return xk, steps
```
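The `bfgs` function above always takes the full step ($\alpha_k = 1$) and updates $B_k$ even when $y_k^T s_k \le 0$, in which case the update can lose positive definiteness. The following sketch adds an Armijo backtracking line search for $\alpha_k$ and skips the update when the curvature condition $y_k^T s_k > 0$ fails. It assumes a hand-supplied gradient and imports plain NumPy as `onp` so that the notebook's autograd-wrapped `np` is not rebound. The name `bfgs_linesearch` and the constants `c1`, `shrink` and the `1e-12` thresholds are illustrative, not part of the original notebook.

```python
import numpy as onp  # plain NumPy; leaves the notebook's autograd `np` untouched

def bfgs_linesearch(fx, gradfx, x0, tol=1e-8, maxiter=1000, c1=1e-4, shrink=0.5):
    """BFGS with Armijo backtracking and a curvature safeguard (illustrative sketch).

    Returns the final iterate xk.
    """
    xk = onp.asarray(x0, dtype=float)
    gk = gradfx(xk)
    Bk = onp.eye(xk.size)
    for _ in range(maxiter):
        # Stop when the gradient is near zero.
        if onp.linalg.norm(gk) < tol:
            break
        # Search direction p_k = -B_k^{-1} grad f(x_k).
        pk = onp.linalg.solve(Bk, -gk)
        # Backtrack from alpha = 1 until the Armijo condition holds.
        alpha, fk, slope = 1.0, fx(xk), gk @ pk
        while fx(xk + alpha * pk) > fk + c1 * alpha * slope and alpha > 1e-12:
            alpha *= shrink
        sk = alpha * pk
        xk = xk + sk
        gk1 = gradfx(xk)
        yk = gk1 - gk
        # Update B_k only when y_k^T s_k > 0, which keeps B_k positive definite.
        ys = yk @ sk
        if ys > 1e-12:
            Bs = Bk @ sk
            Bk = Bk + onp.outer(yk, yk) / ys - onp.outer(Bs, Bs) / (sk @ Bs)
        gk = gk1
    return xk

# Usage on a convex quadratic f(x) = 0.5 x^T A x - b^T x, whose minimizer solves A x = b.
A = onp.array([[3.0, 1.0], [1.0, 2.0]])
b = onp.array([1.0, -1.0])
quad = lambda x: 0.5 * x @ A @ x - b @ x
quad_grad = lambda x: A @ x - b
x_star = bfgs_linesearch(quad, quad_grad, onp.zeros(2))
print(x_star, onp.linalg.solve(A, b))
```

Skipping the update rather than forcing it is one common safeguard; damped updates are another, not shown here.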

## Test Function: Rosenbrock Function

```python
def rosenbrock(x):
    """
    rosenbrock evaluates Rosenbrock function at vector x

    Parameters
    ----------
    x : array
        x is a D-dimensional vector, [x1, x2, ..., xD]

    Returns
    -------
    float
        scalar result
    """
    D = len(x)
    i, iplus1 = np.arange(0,D-1), np.arange(1,D)
    return np.sum(100*(x[iplus1] - x[i]**2)**2 + (1-x[i])**2)
```
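The notebook obtains the gradient with autograd. As a cross-check, or where autograd is unavailable, the Rosenbrock gradient can be written by hand: $x_j$ contributes $-400 x_j (x_{j+1} - x_j^2) - 2(1 - x_j)$ from the term where it is the first index and $200 (x_j - x_{j-1}^2)$ from the term where it is the second. A minimal sketch, with the hypothetical name `rosenbrock_grad`, importing plain NumPy as `onp` so the notebook's autograd `np` is not rebound:

```python
import numpy as onp  # plain NumPy; leaves the notebook's autograd `np` untouched

def rosenbrock_grad(x):
    """Analytic gradient of the D-dimensional Rosenbrock function (hypothetical helper)."""
    x = onp.asarray(x, dtype=float)
    g = onp.zeros_like(x)
    # x_j as the first index of its term: -400 x_j (x_{j+1} - x_j^2) - 2 (1 - x_j).
    g[:-1] += -400.0 * x[:-1] * (x[1:] - x[:-1]**2) - 2.0 * (1.0 - x[:-1])
    # x_j as the second index of the previous term: 200 (x_j - x_{j-1}^2).
    g[1:] += 200.0 * (x[1:] - x[:-1]**2)
    return g

print(rosenbrock_grad([-1.0, -1.0]))  # [-804. -400.]
print(rosenbrock_grad([1.0, 1.0]))    # zero gradient at the minimizer (1, 1)
```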

## Solution to Rosenbrock Function

```python
fx, gradfx = rosenbrock, grad(rosenbrock)
x0, tol, maxiter = np.array([-1.,-1.]), 1e-6, 20000
xk, steps = bfgs(fx, gradfx, x0, tol, maxiter)

print("x0               :", x0)
print("rosenbrock f(x0) :", rosenbrock(x0))
print("----------------------------------")
print("xk               :", xk)
print("rosenbrock f(xk) :", rosenbrock(xk))
print("nsteps           :", len(steps))
# The gradient occupies the last x0.size columns of each history row.
print("norm(gradfx)     :", np.linalg.norm(steps[-1,x0.size+1:]))
```

```
x0               : [-1. -1.]
rosenbrock f(x0) : 404.0
----------------------------------
xk               : [0.99999997 0.99999994]
rosenbrock f(xk) : 1.3256730840753958e-15
nsteps           : 123
norm(gradfx)     : 9.865687600739896e-07
```


## Test Function: Goldstein-Price Function

```python
def goldstein_price(x):
    """
    goldstein_price evaluates Goldstein-Price function at vector x

    Parameters
    ----------
    x : array
        x is a 2-dimensional vector, [x1, x2]

    Returns
    -------
    float
        scalar result
    """
    a = (x[0] + x[1] + 1)**2
    b = 19 - 14*x[0] + 3*x[0]**2 - 14*x[1] + 6*x[0]*x[1] + 3*x[1]**2
    c = (2*x[0] - 3*x[1])**2
    d = 18 - 32*x[0] + 12*x[0]**2 + 48*x[1] - 36*x[0]*x[1] + 27*x[1]**2
    return (1. + a*b) * (30. + c*d)
```
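The Goldstein-Price function is usually studied on $[-2, 2]^2$, where its global minimum is $f(0, -1) = 3$. A quick sanity check, using plain NumPy imported as `onp` and a central-difference gradient (the helper names `goldstein_price_np` and `fd_grad` are illustrative), confirms the value and that the gradient vanishes there.

```python
import numpy as onp  # plain NumPy; leaves the notebook's autograd `np` untouched

def goldstein_price_np(x):
    """Same formula as goldstein_price, repeated so this block is self-contained."""
    a = (x[0] + x[1] + 1)**2
    b = 19 - 14*x[0] + 3*x[0]**2 - 14*x[1] + 6*x[0]*x[1] + 3*x[1]**2
    c = (2*x[0] - 3*x[1])**2
    d = 18 - 32*x[0] + 12*x[0]**2 + 48*x[1] - 36*x[0]*x[1] + 27*x[1]**2
    return (1. + a*b) * (30. + c*d)

def fd_grad(f, x, h=1e-6):
    """Central-difference gradient approximation (hypothetical helper)."""
    g = onp.zeros_like(x)
    for i in range(x.size):
        e = onp.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x_min = onp.array([0.0, -1.0])
print(goldstein_price_np(x_min))                                # 3.0
print(onp.linalg.norm(fd_grad(goldstein_price_np, x_min)))      # close to zero
print(goldstein_price_np(onp.array([-1.0, -1.5])))              # 1595.41015625
```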

## Solution to Goldstein-Price Function

```python
fx, gradfx = goldstein_price, grad(goldstein_price)
x0, tol, maxiter = np.array([-1.0,-1.5]), 1e-15, 20000
#xk, steps = bfgs(fx, gradfx, x0, tol, maxiter)

# NOTE(mmorais): bfgs above has no line search (it always takes the full
# step) and fails from this starting point; use scipy's BFGS, which
# performs a line search, instead.
def _bfgs(fx, gradfx, x0, tol, maxiter=None):
    res = opt.minimize(fx, x0, method='BFGS', jac=gradfx, tol=tol)
    # Copy OptimizeResult to the same (xk, steps) form returned by bfgs.
    xk = res.x
    # Only the final row is filled; earlier rows are zero placeholders so
    # that len(steps) equals the iteration count.
    steps = np.zeros((max(res.nit, 1), (x0.size*2)+1))
    steps[-1,:] = np.hstack((res.x, res.fun, res.jac))
    return xk, steps

xk, steps = _bfgs(fx, gradfx, x0, tol, maxiter)

print("x0                    :", x0)
print("goldstein_price f(x0) :", goldstein_price(x0))
print("----------------------------------")
print("xk                    :", xk)
print("goldstein_price f(xk) :", goldstein_price(xk))
print("nsteps                :", len(steps))
# The gradient occupies the last x0.size columns of each history row.
print("norm(gradfx)          :", np.linalg.norm(steps[-1,x0.size+1:]))
```

```
x0                    : [-1.  -1.5]
goldstein_price f(x0) : 1595.41015625
----------------------------------
xk                    : [ 1.45153988e-14 -1.00000000e+00]
goldstein_price f(xk) : 2.999999999999943
nsteps                : 23
norm(gradfx)          : 1.1705318380849725e-10
```
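The `_bfgs` wrapper keeps only the final iterate, so every row of `steps` except the last is zero. A sketch that records the full `[xk, fx(xk), gradfx(xk)]` history instead, assuming SciPy's `minimize` `callback` argument (for `method='BFGS'` it is called with the current parameter vector after each iteration). The name `bfgs_with_history` is hypothetical, and the demo uses a plain NumPy quadratic (NumPy imported as `onp`) with an analytic gradient rather than autograd.

```python
import numpy as onp  # plain NumPy; leaves the notebook's autograd `np` untouched
import scipy.optimize as opt

def bfgs_with_history(fx, gradfx, x0, tol):
    """Run SciPy's BFGS and record [xk, fx(xk), gradfx(xk)] at every iterate (sketch)."""
    rows = [onp.hstack((x0, fx(x0), gradfx(x0)))]

    def record(xk):
        # Called by SciPy after each iteration with the current iterate.
        rows.append(onp.hstack((xk, fx(xk), gradfx(xk))))

    res = opt.minimize(fx, x0, method='BFGS', jac=gradfx, tol=tol, callback=record)
    return res.x, onp.vstack(rows)

# Demo on f(x) = 0.5 x^T A x - b^T x, whose minimizer solves A x = b.
A = onp.array([[3.0, 1.0], [1.0, 2.0]])
b = onp.array([1.0, -1.0])
quad = lambda x: 0.5 * x @ A @ x - b @ x
quad_grad = lambda x: A @ x - b

x_end, history = bfgs_with_history(quad, quad_grad, onp.zeros(2), 1e-10)
print(history.shape)          # (iterations + 1, 2*n + 1)
print(history[-1, :2], x_end)
```

With this history, `len(history)` counts the starting point plus one row per iteration, matching the row layout used by `bfgs`.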
