https://colab.research.google.com/drive/1rAiWyPGy48Mrj4Rl3DVvgfMNP4ppzetC

# **Lab 7 : Optimization and learning**
**Patrik Svensson**


# **Abstract**
In this lab we have explored the concept of optimization and learning. In optimization we want to find a critical point of a function. To find the critical point we can use an iterative method, where we stop the iteration when we are close enough to the correct answer. The result is an implementation of the gradient descent method and Newton's method in $R^n$.

# **Set up environment**
To set up the environment, run the two following lines of code.

In [0]:
!pip install numdifftools

import numpy as np
import unittest
import math
from scipy.optimize import fmin
import scipy
import random
import numdifftools


# **Introduction**
This lab is all about optimization. Optimization problems are about finding the minimum or maximum points of a function, which for a differentiable function occur at critical points. In this lab we will explore some iterative methods for solving several optimization problems.

# **Methods**
In this chapter, I will present how the implementation of the functions was conducted. The study was conducted in the following way.

1.   Literature research
2.   Implementation
3.   Testing

In the sections below, I provide a reference to where each algorithm was found, or how it was deduced, followed by a code implementation in Python, and lastly unit tests to check the accuracy of the implementation.

# Gradient descent method in $R^n$ 
A minimization problem in $R^n$ is about finding an $x^* \in D$ that satisfies the following inequality from lecture notes 15.1.

$f(x^*) \leq f(x)$, $\forall x \in D$

Where $D$ is a search space for a solution to the inequality. The function $f$ can be defined as $f: D \rightarrow R$.
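
To make the inequality concrete, the sketch below checks it numerically on sampled points. The function `f`, the minimizer `x_star` and the box $D = [-5, 5]^2$ are hypothetical choices made for illustration only; they are not taken from the lecture notes.

```python
import numpy as np

# Hypothetical example function with known minimizer x* = (1, -2)
def f(x):
    return (x[0] - 1.0)**2 + (x[1] + 2.0)**2

x_star = np.array([1.0, -2.0])

# Sample points from the assumed search space D = [-5, 5] x [-5, 5]
rng = np.random.default_rng(0)
samples = rng.uniform(-5.0, 5.0, size=(1000, 2))

# f(x*) <= f(x) should hold for every sampled x in D
is_minimizer = all(f(x_star) <= f(x) for x in samples)
print(is_minimizer)
```

A sampled check like this can only fail to find a counterexample; it does not prove that `x_star` is the minimizer.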

One method of finding a minimum is the *gradient descent method*. The algorithm works in the following way:


1.   Choose initial value for point $x$ in $R^n$ 
2.   Find $\nabla f(x)$
3.   Move from $x$ in the direction of $-\nabla f(x)$ until you reach either a point $x_{min}$ where $\nabla f(x_{min})$ is close to zero, or a point $x_{orto}$ where $\langle\nabla f(x), \nabla f(x_{orto})\rangle = 0$. If $x_{min}$ is reached, the search for a minimum is finished; otherwise iterate from step 2 with $x = x_{orto}$.
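
As a small illustration of the orthogonality condition in step 3, the sketch below uses a hypothetical quadratic $f(x) = \frac{1}{2}x^TAx - b^Tx$, for which the step length along $-\nabla f(x)$ has a closed form. The matrix `A`, the vector `b` and the helper names are assumptions made for this example; they are not part of the lab code.

```python
import numpy as np

# Hypothetical quadratic f(x) = 0.5 * x^T A x - b^T x, with A symmetric positive definite
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(x):
    # Analytic gradient of the quadratic
    return A @ x - b

def exact_line_search_step(x):
    g = grad(x)
    # Step length that minimizes f along -g (closed form, valid for quadratics only)
    alpha = (g @ g) / (g @ A @ g)
    return x - alpha * g

# One step from the start point: the new gradient is orthogonal to the old one
x = np.array([0.0, 0.0])
x_orto = exact_line_search_step(x)
inner = grad(x) @ grad(x_orto)
print(inner)

# Repeat steps 2 and 3 until the gradient is close to zero
tol = 1e-8
x_min = x
while np.linalg.norm(grad(x_min)) > tol:
    x_min = exact_line_search_step(x_min)
print(x_min)
```

For a general function no closed-form step length exists, so the step has to be found by a numerical search along the direction $-\nabla f(x)$.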

This algorithm is implemented below in Python, together with unit tests. The code is based on the pseudocode in lecture notes 15.2.


In [0]:
TOL = 0.001

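def gradient_descent_method(f, x0):
  # Work on a float copy so integer start points behave as expected
  x = np.asarray(x0, dtype=float)
  df = compute_gradient(f, x)
  # Iterate until the gradient norm drops below the tolerance
  while np.linalg.norm(df) > TOL:
    alpha = get_step_length(f, df, x)
    # Step along the normalized negative gradient
    x = x - alpha * df / np.linalg.norm(df)
    # Recompute the gradient at the new point before the next check
    df = compute_gradient(f, x)
  return x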

def compute_gradient(f, x):
  # Use a sufficiently small finite-difference step (square root of machine epsilon)
  delta_size = np.sqrt(np.finfo(float).eps)
  epsilon = np.full(x.shape[0], delta_size)
  return scipy.optimize.approx_fprime(x, f, epsilon)

def get_step_length(f, df, x):
  # Walk along the normalized negative gradient in fixed increments until the
  # gradient is close to zero or orthogonal to the search direction
  step = 0.0001
  norm_df = df / np.linalg.norm(df)

  i = 0
  while True:
    gradient = compute_gradient(f, x - norm_df * i * step)
    if np.linalg.norm(gradient) < TOL or abs(np.inner(gradient, norm_df)) < TOL:
      return i * step
    i += 1

Below are unit tests that check the implementation of the gradient descent method.

In [0]:
class TestEulerMethod(unittest.TestCase): 
  def test_accuracy_2D(self):
    for i in range(10):
      a = random.uniform(-10, 10)
      b = random.uniform(-10, 10)
      function = lambda x: (x[0] + a)**2 + (x[1] + b)**2
      expected_result = fmin(function, np.array([1,2]), disp=False)

      result = gradient_descent_method(function, np.array([0, 0]))
      np.testing.assert_almost_equal(result, expected_result, 1)

  def test_accuracy_3D(self):
    for i in range(10):
      a = random.uniform(-10, 10)
      b = random.uniform(-10, 10)
      c = random.uniform(-10, 10)
      function = lambda x: (x[0] + a)**2 + (x[1] + b)**2 + (x[2] + c)**2
      expected_result = fmin(function, np.array([0, 0, 0]), disp=False)

      result = gradient_descent_method(function, np.array([0, 0, 0]))
      np.testing.assert_almost_equal(result, expected_result, 1)

if __name__ == '__main__':
    # Help from user Pierre S. in the stack overflow thread to give the main arguments: 
    # https://stackoverflow.com/questions/49952317/python3-for-unit-test-attributeerror-module-main-has-no-attribute-kerne 
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

..
----------------------------------------------------------------------
Ran 2 tests in 64.707s

OK


# Newton's method in $R^n$

Newton's method in $R^n$ is based on Taylor's formula:

$f(x) \approx f(y) + \nabla f(y)^T(x-y) + \frac{1}{2}(x-y)^THf(y)(x-y)$

If we substitute $x$ with $x^{k+1}$ and $y$ with $x^k$, differentiate with respect to $\Delta x$, and set the derivative equal to zero, we get:

$\frac{df(x^{k+1})}{d(\Delta x)} = \frac{d}{d(\Delta x)}(f(x^k) + \nabla f(x^k)^T\Delta x + \frac{1}{2}\Delta x^T Hf(x^k)\Delta x) = $

$\nabla f(x^k) + Hf(x^k)\Delta x = 0$

where $\Delta x = x^{k + 1} - x^k$.

Solving this linear system gives $\Delta x$:

$\Delta x = -Hf(x^k)^{-1}\nabla f(x^k)$

With $\Delta x$ we can retrieve $x^{k+1} = x^k + \Delta x$.
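
The update can be tried out on a small example. The sketch below repeats the Newton step for a hypothetical function $f(x) = e^{x_0} - x_0 + x_1^2$, whose minimum is at the origin, using an analytic gradient and Hessian. The function, the start point and the tolerance are assumptions chosen for illustration; they are not taken from the lecture notes.

```python
import numpy as np

# Hypothetical test function with minimum at (0, 0)
def f(x):
    return np.exp(x[0]) - x[0] + x[1]**2

def grad(x):
    # Analytic gradient of f
    return np.array([np.exp(x[0]) - 1.0, 2.0 * x[1]])

def hess(x):
    # Analytic Hessian of f (positive definite everywhere)
    return np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])

x = np.array([1.0, 1.0])
tol = 1e-10
steps = 0
while np.linalg.norm(grad(x)) > tol:
    # Solve Hf(x) dx = -grad f(x) instead of forming the inverse explicitly
    dx = np.linalg.solve(hess(x), -grad(x))
    x = x + dx
    steps += 1

print(x, f(x), steps)
```

Solving the linear system with `np.linalg.solve` avoids computing $Hf(x^k)^{-1}$ explicitly, which is generally cheaper and numerically more stable.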

In Newton's method we will reiterate the calculation of $x^{k+1}$ until it's close enough to the minimum.

The code below is based on the pseudocode in lecture notes 15.3.

In [0]:
def compute_gradient(f, x):
  delta_size = np.sqrt(np.finfo(float).eps)
  epsilon = np.full(x.shape[0], delta_size)
  return scipy.optimize.approx_fprime(x, f, epsilon)

def compute_hessian(f, x):
  return numdifftools.Hessian(f)(x)

def solve_linear_system(a, b):
  return np.linalg.solve(a, b)

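def newton_method(f, x0):
  # Work on a float copy so integer start points behave as expected
  x = np.asarray(x0, dtype=float)
  df = compute_gradient(f, x)
  TOL = 0.01

  # Iterate until the gradient norm drops below the tolerance
  while np.linalg.norm(df) > TOL:
    hf = compute_hessian(f, x)
    # Solve Hf(x) dx = -grad f(x) for the Newton step
    dx = solve_linear_system(hf, -df)
    x = dx + x
    # Recompute the gradient at the new point before the next check
    df = compute_gradient(f, x)

  return x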

The code below is the implementation of unit tests.

In [0]:
class TestEulerMethod(unittest.TestCase): 
  def test_accuracy_2D(self):
    for i in range(10):
      a = random.uniform(-10, 10)
      b = random.uniform(-10, 10)
      function = lambda x: (x[0] + a)**2 + (x[1] + b)**2
      expected_result = fmin(function, np.array([1,2]), disp=False)

      result = newton_method(function, np.array([0, 0]))
      np.testing.assert_almost_equal(result, expected_result, 1)

  def test_accuracy_3D(self):
    for i in range(10):
      a = random.uniform(-10, 10)
      b = random.uniform(-10, 10)
      c = random.uniform(-10, 10)
      function = lambda x: (x[0] + a)**2 + (x[1] + b)**2 + (x[2] + c)**2
      expected_result = fmin(function, np.array([0, 0, 0]), disp=False)

      result = newton_method(function, np.array([0, 0, 0]))
      np.testing.assert_almost_equal(result, expected_result, 1)

if __name__ == '__main__':
    # Help from user Pierre S. in the stack overflow thread to give the main arguments: 
    # https://stackoverflow.com/questions/49952317/python3-for-unit-test-attributeerror-module-main-has-no-attribute-kerne 
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

..
----------------------------------------------------------------------
Ran 2 tests in 0.286s

OK


# **Results**

The results of this lab are implementations of the gradient descent method and Newton's method, both in $R^n$.

In [0]:
if __name__ == '__main__':
    # Help from user Pierre S. in the stack overflow thread to give the main arguments: 
    # https://stackoverflow.com/questions/49952317/python3-for-unit-test-attributeerror-module-main-has-no-attribute-kerne 
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

..
----------------------------------------------------------------------
Ran 2 tests in 0.284s

OK


# **Discussion**
The gradient descent method takes considerable time to run when good precision is required, and more so for more complex functions in higher dimensions. In the tests above, the two gradient descent tests took about 65 s, whereas the two Newton's method tests took about 0.3 s. It would be interesting to solve a difficult problem with this method given more computing resources.

