# **Lab 6: Optimization**
**Lovisa Strange**

# **Abstract**

In this lab report, a method for finding an extreme point of a function is presented.

# **About the code**

A short statement on who is the author of the file, and if the code is distributed under a certain license.

In [1]:
"""This program is a template for lab reports in the course"""
"""DD2363 Methods in Scientific Computing, """
"""KTH Royal Institute of Technology, Stockholm, Sweden."""

# Copyright (C) 2024 Lovisa Strange (lstrange@kth.se)

# This file is part of the course DD2363 Methods in Scientific Computing
# KTH Royal Institute of Technology, Stockholm, Sweden
#
# This is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

# This template is maintained by Johan Hoffman
# Please report problems to jhoffman@kth.se

'KTH Royal Institute of Technology, Stockholm, Sweden.'

# **Set up environment**

To have access to the necessary modules you have to run this cell. If you need additional modules, this is where you add them.

In [2]:
# Load necessary modules.
from google.colab import files

import time
import numpy as np
from scipy.optimize import fsolve
#try:
#    from dolfin import *; from mshr import *
#except ImportError as e:
#    !apt-get install -y -qq software-properties-common
#    !add-apt-repository -y ppa:fenics-packages/fenics
#    !apt-get update -qq
#    !apt install -y --no-install-recommends fenics
#    from dolfin import *; from mshr import *

#import dolfin.common.plotting as fenicsplot

from matplotlib import pyplot as plt
from matplotlib import tri
from matplotlib import axes
from mpl_toolkits.mplot3d import Axes3D

# **Introduction**

An important problem in mathematics is finding the maximum (or minimum) of a function. This can be done analytically, for example by looking at the derivative of a function.

There are also different types of minimization problems (Methods in Computational Science, p.325, Johan Hoffman). For example, we can minimize a function without any constraints. Another type is the constrained minimization problem, where the solution must satisfy some condition, for example $$
\begin{cases}
\min \;\; f(x_1,x_2)\\
x_1+x_2 = 1
\end{cases}
$$
We also distinguish between global and local extreme points. A global extreme point is the minimum or maximum of a function on the whole domain, while a local extreme point is only guaranteed to be minimal or maximal in a neighborhood of the point. It is also worth noting that a minimum point of a function $f(x)$ is a maximum point of the function $-f(x),$ so it is enough to discuss one of the two cases. Any method for finding a minimum point, applied to $-f(x)$, is then a method for finding a maximum point of $f(x)$.
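
The equivalence between maximizing $f$ and minimizing $-f$ can be checked on a small example. The sketch below is only an illustration: the test function `h` is a hypothetical choice that is not part of the lab, and a brute-force grid search stands in for a proper optimization method.

```python
import numpy as np

# Hypothetical test function (an assumption for illustration): maximum at x = 1.
def h(x):
    return 3.0 - (x - 1.0)**2

# Brute-force search on a uniform grid over [-2, 4].
grid = np.linspace(-2.0, 4.0, 601)

x_max = grid[np.argmax(h(grid))]    # maximizer of h
x_min = grid[np.argmin(-h(grid))]   # minimizer of -h

print("argmax of h :", x_max)
print("argmin of -h:", x_min)
```

Both searches pick the same grid point, so a routine that only knows how to minimize can also be used to maximize.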

However, it is also useful to be able to minimize a function numerically, and one example of a method for doing so is the gradient descent method.

# **Method**

## Gradient descent method in $R^n$
The gradient descent method (Methods in Computational Science, p.327, Johan Hoffman) searches for a minimum of a function by moving in the direction opposite to the gradient of the function at the current point. We know that the gradient is orthogonal to the level curves $$
L_c(f) = \{x\in D: f(x) = c\}.
$$
The negative direction of the gradient can then be used to move closer to a minimum of the function.
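
The orthogonality between the gradient and the level curves can be illustrated numerically. The sketch below assumes a hypothetical test function `g` with a hand-coded gradient `grad_g` (neither is part of the lab). It compares the change in function value for a small step tangent to the level curve with the change for a step along the negative gradient.

```python
import numpy as np

# Hypothetical test function and its analytical gradient (assumptions for illustration).
def g(x):
    return x[0]**2 + 2.0*x[1]**2

def grad_g(x):
    return np.array([2.0*x[0], 4.0*x[1]])

x = np.array([1.0, 0.5])
n = grad_g(x) / np.linalg.norm(grad_g(x))   # unit vector along the gradient
t = np.array([-n[1], n[0]])                 # unit vector orthogonal to the gradient

h = 1e-4
change_tangent = g(x + h*t) - g(x)    # second order in h: tangent to the level curve
change_descent = g(x - h*n) - g(x)    # first order in h and negative

print("change along the level curve:", change_tangent)
print("change along -gradient      :", change_descent)
```

The step along the level curve changes the function value only to second order in the step size, while the step along the negative gradient decreases it to first order.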

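Using Taylor's theorem, we get $$
f(x+ \Delta x) - f(x) = \nabla f(x)^T \Delta x + \mathcal{O}(||\Delta x||^2),
$$
so for a small step $\Delta x$ the change in $f$ is dominated by the first-order term $$\nabla f(x)^T \Delta x.$$ For a fixed step length $||\Delta x||$, this term is most negative when $\Delta x$ points in the direction of $-\nabla f(x).$ We get the iterative method $$
x^{(k+1)} = x^{(k)} - \alpha^{(k)}\nabla f(x^{(k)}).
$$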
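
The iteration can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: the quadratic test function `q`, its gradient `grad_q`, and the constant step length `alpha` are hypothetical choices, whereas the method in this report computes a new step length in every iteration.

```python
import numpy as np

# Hypothetical quadratic test function with minimum at (1, -2) (an assumption for illustration).
def q(x):
    return (x[0] - 1.0)**2 + (x[1] + 2.0)**2

def grad_q(x):
    return np.array([2.0*(x[0] - 1.0), 2.0*(x[1] + 2.0)])

x = np.array([0.0, 0.0])   # starting guess
alpha = 0.1                # constant step length (assumption; the lab uses a line search)

for k in range(200):
    x = x - alpha*grad_q(x)   # x^(k+1) = x^(k) - alpha * grad f(x^(k))

print("approximate minimizer:", x)
```

A constant step length is the simplest choice, but it has to be tuned to the function: too large a value makes the iteration diverge, and too small a value makes it slow.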

In each step, the step length $\alpha^{(k)}$ is computed. This can be done by searching for the minimum of the one-dimensional function $$
g(s) = f(x^{(k)}-s\nabla f(x^{(k)})).
$$
We can do this using a golden ratio line search method. We pick an interval and compute two interior points $s_i$, placed so that the ratio between the lengths of the sub-intervals stays the same in every iteration. Depending on which interior point has the smallest function value, one of the interior points becomes a new end point of the interval. This is repeated until the interval is small enough, and the midpoint of the final interval is used as the step length. This is done in the algorithm below.
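
The reason for using the golden ratio can be made explicit with a short calculation. As a sketch, write $\varphi = (\sqrt{5}-1)/2$, which satisfies $\varphi^2 = 1-\varphi$, and let the current interval be $[s_a, s_b]$ with length $L = s_b - s_a$. The two interior points are $$
s_3 = s_a + (1-\varphi)L, \qquad s_4 = s_a + \varphi L.
$$
If $s_4$ becomes the new right end point, the new interval has length $\varphi L$ and its upper interior point is $$
s_a + \varphi(\varphi L) = s_a + (1-\varphi)L = s_3,
$$
so the old point $s_3$ and its function value can be reused, and only one new function evaluation is needed per iteration. The other case follows in the same way. Each iteration shrinks the interval by the factor $\varphi \approx 0.618$, so with an initial length of $2$ and a tolerance of $10^{-6}$ the search stops after $31$ iterations, since $$
2\varphi^{30} \approx 1.08\cdot 10^{-6} > 10^{-6} > 2\varphi^{31} \approx 6.6\cdot 10^{-7}.
$$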

In [3]:
## Golden ratio line search

# Input: Function f, vector x (the gradient is taken from the globally defined function gradient)
# Output: s in [0, 2] so that f(x-s*grad(f)) is approximately minimized

def line_search(f,x):
  tolerance = 10**(-6)
  phi = (np.sqrt(5)-1)/2

  max_iterations = 100
  iterations = 0
  grad_f = np.array(gradient(x))

  s_a  = 0
  s_b = 2

  s3 = s_a + (1-phi)*(s_b-s_a)
  s4 = s_a + phi*(s_b-s_a)

  f_3 = f(x - s3*grad_f)
  f_4 = f(x - s4*grad_f)

  while abs(s_a-s_b) > tolerance and max_iterations > iterations:
    iterations +=1

    if f_3 < f_4:
      s_b = s4
      s4 = s3
      s3 = s_a + (1-phi)*(s_b-s_a)
      f_4 = f_3
      f_3 = f(x - s3*grad_f)
    else:
      s_a = s3
      s3 = s4
      s4 = s_a + phi*(s_b-s_a)
      f_3 = f_4
      f_4 = f(x - s4*grad_f)

  return (s_a+s_b)/2



Then, we have the algorithm for the gradient descent method.

In [10]:
## Gradient descent method, based on Algorithm 15.1 (gradient_descent_method), p.327, Methods in Computational Science

# Input: Function f, starting guess x0 (the gradient is taken from the globally defined function gradient)
# Output: approximate minimizer x

def gradient_descent(f,x0):
  tolerance = 10**(-10)
  x = np.array(x0)
  Df = np.array(gradient(x))
  i = 0

  while np.linalg.norm(Df) > tolerance:

    if i % 1000 ==0: # For printing convergence
      print("Iteration: ",i," x = ", x)

    Df = np.array(gradient(x))
    alpha = line_search(f,x)

    x = x - alpha*Df
    i+=1

  return x



# **Results**
In this section, the results produced by the algorithm in the previous section are presented.

## Gradient descent method in $R^n$

We can define a function $$
f(x_1,x_2) = x_1^2+x_2^2 +x_1x_2 + 3x_1+2x_2 + 20.
$$
It has a gradient of $$
∇ f (x_1,x_2) = [2x_1+x_2+3, 2x_2+x_1+2]^T.
$$
We also know that it has a minimum at $$
(x_1,x_2) = (-4/3, -1/3).
$$
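
The stated gradient and minimum can be double-checked numerically. The sketch below is an independent check, not part of the method itself. The names `f_test` and `grad_test` are hypothetical, chosen to avoid clashing with the functions defined in the notebook cells. Since the gradient is linear in $x$, the stationary point is obtained by solving a $2\times 2$ linear system.

```python
import numpy as np

# The test function and its analytical gradient, as stated above.
def f_test(x):
    return 20 + x[0]**2 + x[1]**2 + x[0]*x[1] + 3*x[0] + 2*x[1]

def grad_test(x):
    return np.array([2*x[0] + x[1] + 3, 2*x[1] + x[0] + 2])

# The gradient is linear, grad f = A x + b, so the stationary point solves A x = -b.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 2.0])
x_star = np.linalg.solve(A, -b)

# A is the Hessian; positive eigenvalues confirm that x_star is a minimum.
eigenvalues = np.linalg.eigvalsh(A)

# Central finite-difference check of the gradient at an arbitrary point.
x = np.array([0.7, -1.2])
h = 1e-6
fd = np.array([(f_test(x + h*e) - f_test(x - h*e)) / (2*h) for e in np.eye(2)])

print("stationary point:", x_star)
print("Hessian eigenvalues:", eigenvalues)
print("max gradient error:", np.max(np.abs(fd - grad_test(x))))
```

The Hessian is constant and positive definite, so the stationary point is the unique global minimum of the function.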

In [5]:
def f(x):
  return 20 + x[0]**2 + x[1]**2 +x[0]*x[1] + 3*x[0] +2*x[1]

In [6]:
def gradient(x):
  return [2*x[0] + x[1]+3,2*x[1]+x[0]+2]

We can look at the convergence of the solution by printing the solution vector at some iterations during the calculation.

Another way of checking the result is by comparing the analytical solution to the computed one, using the initial guess $$
x_0 = [-5,-1].
$$

In [13]:
print("convergence of x-values:")
num_sol = gradient_descent(f,[-5,-1])
exact_sol = [-4/3,-1/3]
print(" ")
print("exact solution - numerical solution = ", exact_sol-num_sol)



convergence of x-values:
Iteration:  0  x =  [-5 -1]
Iteration:  1000  x =  [-1.33333328 -0.33333328]
Iteration:  2000  x =  [-1.33333333 -0.33333333]
Iteration:  3000  x =  [-1.33333333 -0.33333333]
Iteration:  4000  x =  [-1.3333333 -0.3333333]
Iteration:  5000  x =  [-1.33333331 -0.33333331]
Iteration:  6000  x =  [-1.33333334 -0.33333334]
Iteration:  7000  x =  [-1.33333331 -0.33333331]
Iteration:  8000  x =  [-1.33333336 -0.33333336]
 
exact solution - numerical solution =  [8.23949797e-11 8.23945356e-11]


As we can see, the iterates converge to the exact solution, and the difference between the exact and numerical solutions is of the order $10^{-10}$, as it should be given the tolerance. This also holds for the other functions $f$ that were tried.

# **Discussion**

The results were mostly as expected. The solution converged to the expected value for the different starting points that were tried, although the convergence rate depended on the starting point. The choice of function also affected the convergence rate of the method. For example, for some functions that were very flat around the minimum, it was harder to find a starting point from which the method converged. In a flat region the gradient is small, which makes the iterates converge very slowly in some cases.
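
The effect of flatness can be illustrated with a small one-dimensional experiment. The sketch below is an illustration under simplifying assumptions, not a reproduction of the tests in this lab. The helper `iterations_to_converge`, the two test functions, and the constant step length are hypothetical choices, and a fixed step is used instead of the golden ratio line search.

```python
# Fixed-step gradient descent in one dimension (a simplification; the lab uses a line search).
def iterations_to_converge(grad, x0, alpha=0.1, tolerance=1e-6, max_iterations=10**6):
    x = x0
    k = 0
    while abs(grad(x)) > tolerance and k < max_iterations:
        x = x - alpha*grad(x)
        k += 1
    return k

# Hypothetical test functions: x**2 is curved at its minimum, x**4 is flat there.
n_quadratic = iterations_to_converge(lambda x: 2*x, 1.0)     # gradient of x**2
n_quartic = iterations_to_converge(lambda x: 4*x**3, 1.0)    # gradient of x**4

print("iterations for x**2:", n_quadratic)
print("iterations for x**4:", n_quartic)
```

Both functions have their minimum at $x = 0$, but the quartic needs many times more iterations, because its gradient becomes very small long before the iterate is close to the minimum.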