# Gradient Descent

## Description
The Gradient Descent method is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function.

### Gradient Descent Algorithm
1. Start with some initial guess $x_0$ and learning rate $\alpha$.
2. Update $x_k$ in the direction of the negative gradient: $x_k = x_{k-1} - \alpha \nabla f(x_{k-1})$.
3. Evaluate the gradient at the new point, $\nabla f(x_k)$.
4. Repeat from step 2 until $\nabla f(x_k) \approx 0$, that is, until $\lVert \nabla f(x_k) \rVert$ falls below a chosen tolerance.
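
The four steps above can be sketched on a one-dimensional quadratic. This is a minimal illustration rather than the notebook's implementation: the function $f(x) = (x - 3)^2$, its hand-written derivative `grad_f`, and the values of `alpha` and `tol` are all assumptions chosen so the behavior is easy to follow.

```python
# Minimal sketch of steps 1-4 on f(x) = (x - 3)**2 (an assumed toy problem).
def f(x):
    return (x - 3.0) ** 2

def grad_f(x):
    # Hand-written derivative of f; assumed here instead of autodiff.
    return 2.0 * (x - 3.0)

x, alpha, tol = 0.0, 0.1, 1e-6   # step 1: initial guess and learning rate
g = grad_f(x)
n_updates = 0
while abs(g) > tol:              # step 4: stop when the gradient is near zero
    x = x - alpha * g            # step 2: move against the gradient
    g = grad_f(x)                # step 3: gradient at the new point
    n_updates += 1

print(x, n_updates)
```

With these values each update multiplies the error $x_k - 3$ by $1 - 2\alpha = 0.8$, so the iterates approach the minimizer $x = 3$ geometrically.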

### References
> Mykel J. Kochenderfer and Tim A. Wheeler. 2019. Algorithms for Optimization. The MIT Press.

In [1]:
# The numpy interface of autograd wraps all numpy ops with autodiff.
import autograd.numpy as np

from autograd import grad
import matplotlib.pyplot as plt

%matplotlib inline

## Gradient Descent Method

In [2]:
def gradient_descent(fx, gradfx, x0, alpha, tol, maxiter=None):
    """
    gradient_descent minimizes fx by fixed-step gradient descent from x0

    Parameters
    ----------
    fx : function
        function to minimize
    gradfx : function
        gradient of function to minimize
    x0 : numpy.ndarray
        initial guess for xk
    alpha : float
        learning rate
    tol : float
        convergence threshold
    maxiter : int
        maximum number of iterations

    Returns
    -------
    numpy.ndarray
        final iterate xk (an approximate local minimizer if converged)
    list
        iteration history [(x0, fx(x0), gradfx(x0)), ...]
    """

    xk, fxk, gradfxk = x0, fx(x0), gradfx(x0)
    steps = [(x0, fxk, gradfxk)]
    
    # Stop iteration when gradient is near zero.
    while np.linalg.norm(gradfxk) > tol:

        # Update xk based on product of learning rate and gradient.
        xk = xk - alpha * gradfxk

        # Evaluate gradient at new value of xk.
        gradfxk = gradfx(xk)

        # Evaluate the function at new value of xk.
        fxk = fx(xk)

        # Append (xk, fxk, gradfxk) to iteration history.
        steps.append((xk, fxk, gradfxk))

        # Stop after maxiter updates (steps also holds the initial point x0).
        if maxiter is not None and len(steps) - 1 >= maxiter:
            break

    return xk, steps

## Test Function: Rosenbrock Function

In [3]:
def rosenbrock(x):
    """
    rosenbrock evaluates Rosenbrock function at vector x

    Parameters
    ----------
    x : array
        x is a D-dimensional vector, [x1, x2, ..., xD]

    Returns
    -------
    float
        scalar result
    """
    D = len(x)
    i, iplus1 = np.arange(0,D-1), np.arange(1,D)
    return np.sum(100*(x[iplus1] - x[i]**2)**2 + (1-x[i])**2)
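
As a sanity check on the gradient that gets passed to `gradient_descent`, the two-dimensional Rosenbrock gradient can be written by hand and compared against central finite differences. The sketch below uses plain NumPy rather than autograd. `rosenbrock_grad_2d` and `finite_difference_grad` are hypothetical helper names, and the function is restated (with slices instead of index arrays) so the snippet runs on its own.

```python
import numpy as np

def rosenbrock(x):
    # Same formula as above, written with slices so this snippet is self-contained.
    return np.sum(100 * (x[1:] - x[:-1] ** 2) ** 2 + (1 - x[:-1]) ** 2)

def rosenbrock_grad_2d(x):
    # Hand-derived gradient for the D = 2 case only (hypothetical helper).
    dx1 = -400 * x[0] * (x[1] - x[0] ** 2) - 2 * (1 - x[0])
    dx2 = 200 * (x[1] - x[0] ** 2)
    return np.array([dx1, dx2])

def finite_difference_grad(f, x, h=1e-6):
    # Central differences, one coordinate at a time (hypothetical helper).
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x0 = np.array([-1.0, -1.0])
print(rosenbrock_grad_2d(x0))                   # analytic gradient at x0
print(finite_difference_grad(rosenbrock, x0))   # numerical approximation
```

At $x_0 = (-1, -1)$ the hand-derived gradient is $(-804, -400)$, and the finite-difference estimate should agree with it to several digits.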

## Solution to Rosenbrock Function

In [4]:
fx, gradfx = rosenbrock, grad(rosenbrock)
x0, alpha, tol, maxiter = np.array([-1.,-1.]), 1e-3, 1e-2, 20000
xk, steps = gradient_descent(fx, gradfx, x0, alpha, tol, maxiter)

print("x0               :", x0)
print("rosenbrock f(x0) :", rosenbrock(x0))
print("----------------------------------")
print("xk               :", xk)
print("rosenbrock f(xk) :", rosenbrock(xk))
print("nsteps           :", len(steps))
print("norm(gradfx)     :", np.linalg.norm(steps[-1][2]))

x0               : [-1. -1.]
rosenbrock f(x0) : 404.0
----------------------------------
xk               : [0.98892181 0.97792171]
rosenbrock f(xk) : 0.0001229255320492028
nsteps           : 8345
norm(gradfx)     : 0.009997142295548968
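
The Rosenbrock function has its global minimum at $(1, 1)$, where $f = 0$, so the printed result can be compared against it directly. The sketch below is an illustrative check that hard-codes the reported `xk` (rounded to the digits shown above) and measures how far it lies from the minimizer. The name `x_star` is an assumption introduced here.

```python
import numpy as np

# Reported final iterate, copied from the output above (rounded as printed).
xk = np.array([0.98892181, 0.97792171])
x_star = np.array([1.0, 1.0])  # known minimizer of the Rosenbrock function

# Euclidean distance between the final iterate and the minimizer.
distance = np.linalg.norm(xk - x_star)
print(distance)
```

The distance comes out at roughly 0.025 even though the gradient norm is below 1e-2. The floor of the Rosenbrock valley is nearly flat, so a small gradient norm does not imply an equally small error in $x$.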


## Test Function: Goldstein-Price Function

In [5]:
def goldstein_price(x):
    """
    goldstein_price evaluates Goldstein-Price function at vector x

    Parameters
    ----------
    x : array
        x is a 2-dimensional vector, [x1, x2]

    Returns
    -------
    float
        scalar result
    """
    a = (x[0] + x[1] + 1)**2
    b = 19 - 14*x[0] + 3*x[0]**2 - 14*x[1] + 6*x[0]*x[1] + 3*x[1]**2
    c = (2*x[0] - 3*x[1])**2
    d = 18 - 32*x[0] + 12*x[0]**2 + 48*x[1] - 36*x[0]*x[1] + 27*x[1]**2
    return (1. + a*b) * (30. + c*d)
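
The Goldstein-Price function has a known global minimum of $f(0, -1) = 3$, which gives a quick way to check the implementation before optimizing. The sketch below restates the function with plain NumPy so it runs on its own, then evaluates it at that point and at the starting point $(-1, -1.5)$ used in the run that follows.

```python
import numpy as np

def goldstein_price(x):
    # Restated from the cell above so this snippet is self-contained.
    a = (x[0] + x[1] + 1)**2
    b = 19 - 14*x[0] + 3*x[0]**2 - 14*x[1] + 6*x[0]*x[1] + 3*x[1]**2
    c = (2*x[0] - 3*x[1])**2
    d = 18 - 32*x[0] + 12*x[0]**2 + 48*x[1] - 36*x[0]*x[1] + 27*x[1]**2
    return (1. + a*b) * (30. + c*d)

f_min = goldstein_price(np.array([0.0, -1.0]))     # value at the known global minimizer
f_start = goldstein_price(np.array([-1.0, -1.5]))  # value at the starting point
print(f_min, f_start)
```

At $(0, -1)$ the factor $a$ vanishes and $30 + c\,d = 30 + 9 \cdot (-3) = 3$, so the function value is exactly 3. The value at the starting point matches the 1595.41015625 printed in the run that follows.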

## Solution to Goldstein-Price Function

In [6]:
fx, gradfx = goldstein_price, grad(goldstein_price)
x0, alpha, tol, maxiter = np.array([-1.0,-1.5]), 1e-5, 1e-2, 20000
xk, steps = gradient_descent(fx, gradfx, x0, alpha, tol, maxiter)

print("x0                    :", x0)
print("goldstein_price f(x0) :", goldstein_price(x0))
print("----------------------------------")
print("xk                    :", xk)
print("goldstein_price f(xk) :", goldstein_price(xk))
print("nsteps                :", len(steps))
print("norm(gradfx)          :", np.linalg.norm(steps[-1][2]))

x0                    : [-1.  -1.5]
goldstein_price f(x0) : 1595.41015625
----------------------------------
xk                    : [-2.23924172e-05 -1.00001049e+00]
goldstein_price f(xk) : 3.0000001231481543
nsteps                : 2419
norm(gradfx)          : 0.009960882484498525
