[**OPEN IN COLAB**](https://colab.research.google.com/drive/1tVXUROGgvxgcwfcV1E1eRt0fZuPihkLi)

# **Lab 7: Optimization**
**Edvin von Platen**

# **Abstract**
In this lab we implement and evaluate the following two methods for optimization in $R^n$:

1. Gradient Descent
2. Newton's Method

The implementations of the two methods appear to be sound and they perform quite well, especially Newton's Method.

# **About the code**

In [0]:
"""This program is a template for lab reports in the course"""
"""DD2363 Methods in Scientific Computing, """
"""KTH Royal Institute of Technology, Stockholm, Sweden."""

# Copyright (C) 2020 Edvin von Platen

# This file is part of the course DD2363 Methods in Scientific Computing
# KTH Royal Institute of Technology, Stockholm, Sweden
#
# This is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

# This template is maintained by Johan Hoffman
# Please report problems to jhoffman@kth.se


# **Set up environment**

To have access to the necessary modules you have to run this cell. If you need additional modules, this is where you add them.

In [0]:
# Load necessary modules.
from google.colab import files

import time
import numpy as np

from matplotlib import pyplot as plt
from matplotlib import tri
from matplotlib import axes
from mpl_toolkits.mplot3d import Axes3D

# **Introduction**

We implement and evaluate the following two algorithms for optimization in $R^n$:

1. Gradient Descent
2. Newton's Method

All implementations and mathematical concepts presented in this report are based on the lecture notes from the course [DD2363 Methods in Scientific Computing](https://kth.instructure.com/courses/17068).

# **Methods**

### **Gradient Descent in $R^n$**
Gradient descent is an iterative algorithm for finding a minimum of an objective function $f(x)$. The idea is to search for the minimum in the direction opposite to the gradient. We implement Algorithm 15.1 with step length $\alpha^{(k)}$ satisfying
$$
f(x^{(k)} - \alpha^{(k)}\nabla f(x^{(k)})) \leq \beta f(x^{(k)}),
$$
with $0 < \beta < 1$ as a parameter. For the stopping criterion we use,
$$
\Vert \nabla f(x^{(k)}) \Vert < TOL.
$$
A finite difference approximation of the gradient is computed in each step.
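
For reference, the approximation used in the code below is the central difference, written here with $e_i$ denoting the $i$-th unit vector (a symbol introduced only for this formula) and $h$ the step size, which defaults to $0.05$ in the implementation:
$$
\frac{\partial f}{\partial x_i}(x) \approx \frac{f(x + h e_i) - f(x - h e_i)}{2h}, \quad i = 1, \dots, n.
$$
For a sufficiently smooth $f$ the error of this approximation is of order $h^2$.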

In [0]:
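# Central finite difference approximation of the gradient of f at x.
# x is expected to be a float array.
def compute_gradient(f, x, h=0.05):
  n = x.shape[0]
  Df = np.zeros(n)
  for i in range(n):
    # Perturb only coordinate i: first to x_i + h, then to x_i - h.
    x_tmp = np.copy(x)
    x_tmp[i] = x_tmp[i] +  h
    f1 = f(x_tmp)
    x_tmp[i] =  x_tmp[i] - 2*h
    f2 = f(x_tmp)
    Df[i] = (f1 - f2)/(2.0*h)
  return Df

# Backtracking search: shrink alpha until f(x - alpha*Df) < beta*f(x),
# or until alpha drops to 0.005 or below.
def get_step_length(f, Df, x, beta):
  # With alpha = 1 the method fails to find minima for some functions.
  alpha = 0.99
  fx = f(x)
  factor = 0.7
  while f(x - alpha * Df) >= beta * fx and alpha > 0.005:
    alpha = alpha * factor
  return alpha

# Gradient descent with a finite difference gradient and backtracking step length.
def gradient_descent_method(f, x0, beta=0.5, h=0.05, TOL=0.001):
  x = x0
  Df = compute_gradient(f, x0, h)
  iters = 0
  # Note: the loop condition tests the gradient from the previous pass,
  # i.e. at the point before the most recent update of x.
  while np.linalg.norm(Df) > TOL:
    Df = compute_gradient(f, x, h)
    alpha = get_step_length(f, Df, x, beta)
    x =  x - alpha*Df
    iters += 1
  return x, iters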

### **Newton's Method in $R^n$**

Newton's method is based on the second-order Taylor expansion (15.3)
$$
f(x) \approx f(y) + \nabla f(y)^T(x-y)+ \frac{1}{2}(x-y)^T Hf(y)(x-y).
$$
Setting $x = x^{(k+1)}$ and $y = x^{(k)}$, and requiring the derivative of this quadratic model with respect to the increment to vanish, gives
$$
\frac{df(x^{(k+1)})}{d(\Delta x)} = \nabla f(x^{(k)}) + Hf(x^{(k)}) \Delta x = 0.
$$
where $\Delta x = x^{(k+1)} - x^{(k)}$. This gives the increment of Newton's method for finding stationary points,
$$
\Delta x =  - (Hf(x^{(k)}))^{-1} \nabla f(x^{(k)}). 
$$
The resulting iteration is
$$
x^{(k+1)} = x^{(k)} - (Hf(x^{(k)}))^{-1} \nabla f(x^{(k)}).
$$
We implement Algorithm 15.3 with $\alpha = 1.0$ and the stopping criterion
$$
\Vert \nabla f(x^{(k)}) \Vert < TOL.
$$
Note that Algorithm 15.3 states both $x = x - \alpha dx$ and $dx = solve(Hf, -Df)$; one of the two signs has to be flipped to obtain the formula above. In our implementation we compute $dx = -solve(Hf, Df)$ and update $x = x + \alpha dx$.

In [0]:
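def newtons_method(x0, grad, H, TOL= 0.001, alpha = 1.0):
  # grad(x) and H(x) return the gradient and Hessian of the objective at x.
  x = x0
  Df = grad(x)
  iters = 0
  # Note: the loop condition tests the gradient evaluated before the latest update of x.
  while np.linalg.norm(Df) > TOL:
    Df = grad(x)
    Hf = H(x)
    # Solve Hf*dx = Df and negate, instead of forming the inverse of the Hessian.
    dx = (-1)*np.linalg.solve(Hf, Df)
    x =  x + alpha*dx
    iters += 1
  return x, iters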
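
As a quick sanity check of the update formula, the sketch below applies a single full Newton step ($\alpha = 1$) to a simple quadratic, $x^2 + y^2 + z^2 + x + y$. The helper `newton_step` and the names `grad_q` and `hess_q` are illustrative assumptions and not part of the lab code. Because the Hessian of a quadratic is constant, one step should land on the stationary point.

```python
import numpy as np

def newton_step(x, grad, hess):
    """One full Newton step: solve H dx = -grad f, then return x + dx (illustrative helper)."""
    dx = np.linalg.solve(hess(x), -grad(x))
    return x + dx

# Gradient and Hessian of q(x, y, z) = x^2 + y^2 + z^2 + x + y.
grad_q = lambda x: np.array([2.0 * x[0] + 1.0, 2.0 * x[1] + 1.0, 2.0 * x[2]])
hess_q = lambda x: 2.0 * np.eye(3)

x1 = newton_step(np.array([10.0, 10.0, 10.0]), grad_q, hess_q)
print(x1)                          # the stationary point (-1/2, -1/2, 0)
print(np.linalg.norm(grad_q(x1)))  # gradient norm after one step
```

Using `np.linalg.solve` rather than an explicit inverse mirrors the choice made in `newtons_method` above.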

# **Results**

We verify the accuracy and convergence of our methods against the exact solutions, using two test functions,
$$
f(x,y) = (1 - x^2 - y^2)^2, \ \ g(x,y,z) = x^2 + y^2 + z^2 + x + y
$$
with minimum values $f(x,y) = 0$, attained on the circle $x^2+ y^2 = 1$, and $g(x,y,z) = -\frac{1}{2}$, attained at $(-\frac{1}{2}, -\frac{1}{2}, 0)$.
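
Both minima can be confirmed by hand. Since $f$ is a square, it is non-negative and vanishes exactly when $x^2 + y^2 = 1$. For $g$, completing the square gives
$$
g(x,y,z) = \left(x + \tfrac{1}{2}\right)^2 + \left(y + \tfrac{1}{2}\right)^2 + z^2 - \tfrac{1}{2} \geq -\tfrac{1}{2},
$$
with equality only at $(-\frac{1}{2}, -\frac{1}{2}, 0)$.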

We start with gradient descent:

In [53]:
f = lambda x: (1-x[0]*x[0] - x[1]*x[1]) * (1-x[0]*x[0] - x[1]*x[1])
g = lambda x: x[0]*x[0] + x[1]*x[1] + x[2]*x[2] + x[0] + x[1]

g_exact = np.array([-1/2, -1/2, 0])
g_descent, iter_g_descent = gradient_descent_method(g, np.array([10.0,10.0,10.0]), TOL = 0.001, beta=0.5)
f_descent, iter_f_descent = gradient_descent_method(f, np.array([2.0, 2.0]), TOL = 0.001, beta=0.5)
print("GRADIENT DESCENT")
print("Stationary point f(x,y): " + str(f_descent))
print("x^2 +  y^2 = 1 check: " + str(f_descent[0]*f_descent[0] + f_descent[1]*f_descent[1]))
print("Iterations f: " + str(iter_f_descent))
print()
print("Absolute Error g(x,y,z): " + str(abs(g_descent - g_exact)))
print("Iterations g: " + str(iter_g_descent))

GRADIENT DESCENT
Stationary point f(x,y): [0.70630603 0.70630603]
x^2 +  y^2 = 1 check: 0.9977364030035543
Iterations f: 63

Absolute Error g(x,y,z): [0.00028484 0.00028484 0.00027127]
Iterations g: 336


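Gradient descent stops close to a minimum in both cases: for $f$ the computed point satisfies $x^2 + y^2 \approx 0.9977$, and for $g$ the componentwise error is about $3 \cdot 10^{-4}$. Note that the tolerance is imposed on the norm of the approximate gradient, not on the error in $x$.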
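
One factor that may account for the computed point for $f$ lying slightly inside the unit circle is the bias of the central difference with $h = 0.05$. For this quartic $f$, expanding the difference quotient by hand gives an approximate gradient $4x(x^2 + y^2 - 1 + h^2)$ in the first component (and similarly in the second), which vanishes on the circle $x^2 + y^2 = 1 - h^2 = 0.9975$ rather than on the unit circle. The sketch below checks this numerically; `central_diff_gradient` and `exact_grad_f` are illustrative names, not the functions used in the lab.

```python
import numpy as np

def central_diff_gradient(f, x, h):
    """Central difference approximation of the gradient of f at x (illustrative helper)."""
    grad = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

f = lambda x: (1.0 - x[0]**2 - x[1]**2)**2
exact_grad_f = lambda x: 4.0 * x * (x[0]**2 + x[1]**2 - 1.0)

h = 0.05
x = np.array([0.6, 0.3])  # arbitrary test point

# Difference between approximate and exact gradient, compared with 4*h^2*x.
bias = central_diff_gradient(f, x, h) - exact_grad_f(x)
print(bias, 4.0 * h**2 * x)

# The approximate gradient vanishes on the circle of radius sqrt(1 - h^2).
r = np.sqrt(1.0 - h**2)
x_shifted = r * np.array([np.cos(0.3), np.sin(0.3)])
print(np.linalg.norm(central_diff_gradient(f, x_shifted, h)))
```

A smaller $h$ would move this shifted circle closer to the true minimum, at the cost of larger rounding error in the difference quotient.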

We continue with Newton's method for which we have to compute the gradient and Hessian for both functions.
$$
\nabla f = (-4x+4x^3+ 4xy^2, \ -4y+4yx^2+4y^3)^T,
$$
$$
Hf = \begin{pmatrix} -4 + 12x^2 + 4y^2 & 8xy \\ 8yx & -4 + 4x^2 + 12y^2 \end{pmatrix},
$$
$$
\nabla g = (2x + 1, 2y + 1, 2z)^T,
$$
$$
Hg = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2& 0 \\ 0 & 0 & 2 \end{pmatrix}.
$$
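
Hand-derived derivatives are easy to get wrong, so a numerical cross-check can be useful. The sketch below compares the Hessian of $f$ given above with a central difference Jacobian of the gradient given above. The names `grad_f`, `hess_f` and `fd_jacobian` and the step size `1e-6` are assumptions made for this illustration only.

```python
import numpy as np

def grad_f(p):
    """Analytic gradient of f(x, y) = (1 - x^2 - y^2)^2."""
    x, y = p
    return np.array([-4.0 * x + 4.0 * x**3 + 4.0 * x * y**2,
                     -4.0 * y + 4.0 * y * x**2 + 4.0 * y**3])

def hess_f(p):
    """Analytic Hessian of f(x, y) = (1 - x^2 - y^2)^2."""
    x, y = p
    return np.array([[-4.0 + 12.0 * x**2 + 4.0 * y**2, 8.0 * x * y],
                     [8.0 * x * y, -4.0 + 4.0 * x**2 + 12.0 * y**2]])

def fd_jacobian(fun, p, h=1e-6):
    """Central difference Jacobian of a vector-valued function (illustrative helper)."""
    n = p.size
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (fun(p + e) - fun(p - e)) / (2.0 * h)
    return J

p = np.array([2.0, 2.0])  # the starting point used for f in this report
max_diff = np.max(np.abs(fd_jacobian(grad_f, p) - hess_f(p)))
print(max_diff)  # should be small if gradient and Hessian are consistent
```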

In [57]:
# Gradient and Hessian of f(x,y) = (1 - x^2 - y^2)^2, taking x as a column vector of shape (2,1).
def fG(x):
  g = np.array([[-4.0*x[0,0] + 4.0*(x[0,0]**3) + 4*x[0,0]*x[1,0]*x[1,0]], [-4.0*x[1,0] + 4.0*x[1,0]*x[0,0]*x[0,0] + 4.0*(x[1,0]**3)]])
  return g
def fH(x):
  H = np.array([[-4.0 + 12.0*x[0,0]*x[0,0] + 4.0*x[1,0]*x[1,0], 8*x[0,0]*x[1,0]], [8*x[0,0]*x[1,0], -4.0 + 4.0*x[0,0]*x[0,0] + 12.0*x[1,0]*x[1,0]]])
  return H

def gG(x):
  g = np.array([[2.0*x[0,0] + 1.0], [2.0*x[1,0] + 1], [2.0*x[2,0]]])
  return g

def gH(x):
  H = np.array([[2.0,0.0,0.0], [0.0,2.0,0.0], [0.0,0.0,2.0]])
  return H

f_newton, iter_f_newton  = newtons_method(np.array([[2.0],[2.0]]), fG, fH)
g_newton, iter_g_newton = newtons_method(np.array([[10.0],[10.0],[10.0]]), gG, gH)

print("NEWTONS METHOD")
print("Stationary point f(x,y): " + str(f_newton))
print("x^2 +  y^2 = 1 check: " + str(f_newton[0,0]*f_newton[0,0] + f_newton[1,0]*f_newton[1,0]))
print("Iterations f: " + str(iter_f_newton))
print()
print("Absolute Error g(x,y,z): " + str(abs(g_newton.flatten() - g_exact)))
print("Iterations g: " + str(iter_g_newton))

NEWTONS METHOD
Stationary point f(x,y): [[0.70710678]
 [0.70710678]]
x^2 +  y^2 = 1 check: 1.0000000000019007
Iterations f: 7

Absolute Error g(x,y,z): [0. 0. 0.]
Iterations g: 2


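Newton's method also converges to minima of the two functions, to within rounding error for $g$. Because $g$ is quadratic with a constant Hessian, a single Newton step reaches its minimizer exactly; the reported count of 2 includes one extra pass, since the loop tests the gradient computed before the latest update.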

# **Discussion**

The implementations of the two methods appear to be sound and they behave as expected.

Newton's method is expected to be both more accurate and faster to converge than gradient descent, since it uses second-order information about the function. Still, it was quite surprising just how much faster and more accurate it was: 7 and 2 iterations compared with 63 and 336 for gradient descent. However, gradient descent uses a finite difference approximation of the gradient while Newton's method is given the exact gradient and Hessian, so the comparison is not entirely fair.