<a href="https://colab.research.google.com/github/johanhoffman/DD2363_VT23/blob/juliusha/Lab6/report_lab_6_juilus.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Lab 6: Optimization and Learning**
**Julius Häger**

# **Abstract**

Two methods for function minimization, gradient descent and Newton's method for optimization, are presented and tested. It is found that gradient descent is simpler and more intuitive than Newton's method while maintaining decent convergence.

# **About the code**

In [None]:
"""This program is a lab report in the course"""
"""DD2363 Methods in Scientific Computing, """
"""KTH Royal Institute of Technology, Stockholm, Sweden."""

# Copyright (C) 2023 Julius Häger (juliusha@kth.se)

# This file is part of the course DD2363 Methods in Scientific Computing
# KTH Royal Institute of Technology, Stockholm, Sweden
#
# This is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

'KTH Royal Institute of Technology, Stockholm, Sweden.'

# **Set up environment**

In [None]:
# Load necessary modules.
from google.colab import files

import time
import numpy as np

from IPython.display import display, Math

from typing import Callable, Tuple

import pandas as pd
import plotly.graph_objs as go

# So we can output png/svg graphs for GitHub
#!pip install -U kaleido

#try:
#    from dolfin import *; from mshr import *
#except ImportError as e:
#    !apt-get install -y -qq software-properties-common 
#    !add-apt-repository -y ppa:fenics-packages/fenics
#    !apt-get update -qq
#    !apt install -y --no-install-recommends fenics
#    from dolfin import *; from mshr import *
    
#import dolfin.common.plotting as fenicsplot

from matplotlib import pyplot as plt
from matplotlib import tri
from matplotlib import axes
from mpl_toolkits.mplot3d import Axes3D

import plotly.io as pio


# **Introduction**

**This notebook should be viewed in Google Colab!**

*This notebook contains interactive plots that are not supported on GitHub, therefore it is recommended to view this notebook in Google Colab.*

Finding the minimum value of a function, $\min_x f(x)$, can be of great value. The problem could be anything from finding the most stable and lightweight wing construction to reducing the distance a mail truck has to drive to deliver packages. The function that measures the quantity we want to minimize is often called the objective function.

These optimization problems are often divided into discrete and continuous optimization problems. In discrete optimization problems the function input $x$ is some vector of discrete values, while continuous optimization problems have a continuous input, often $x \in \mathbb{R}^n$. Notably, arbitrarily small changes in the input can't be made in discrete optimization problems, which means that algorithms relying on small changes likely won't work as well for them.

This lab will focus on continuous optimization problems of the form $\min_{x} f(x),\,\,x \in \mathbb{R}^n$, with known gradient $\nabla f$ and, for the second method, known Hessian $\mathbf{H}(f) = \mathbf{J}(\nabla f)$. The first method presented is *gradient descent*, which uses the gradient of the function to find the direction in the input space that gives the steepest local decrease of $f$. It is implemented using the update rule $x_{n+1} = x_n - λ \nabla f(x_n)$, with $λ$ being a parameter of the method, often called the *learning rate* (from its prevalent use in machine learning).
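As a minimal worked example of this update rule (a sketch, not part of the lab's implementation), take the assumed test function $f(x) = x^2$ with $\nabla f(x) = 2x$. Each step then multiplies the iterate by $(1 - 2λ)$, so the influence of the learning rate can be read off directly. The helper name `descend_quadratic` is hypothetical.

```python
def descend_quadratic(x0: float, lam: float, steps: int) -> float:
    """Apply x <- x - lam * f'(x) for the assumed example f(x) = x**2."""
    x = x0
    for _ in range(steps):
        x = x - lam * 2.0 * x  # the gradient of x**2 is 2x
    return x


# Compare a few learning rates after 10 steps from x0 = 1.
for lam in (0.1, 0.25, 0.9, 1.1):
    print(f"lam={lam:<4} -> x after 10 steps: {descend_quadratic(1.0, lam, 10)}")
```

For this particular function the iterates shrink toward the minimum at $x = 0$ whenever $|1 - 2λ| < 1$ and grow when $|1 - 2λ| > 1$, which is why the last learning rate in the loop moves away from the minimum.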

The second method is a modification of Newton's method that makes it converge on critical points of the function $f$; given a starting point close enough to a minimum, it will converge on that minimum. This is done by changing the original update rule of Newton's method, $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$, into $x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)} = x_n - [f''(x_n)]^{-1}f'(x_n)$. The intuition is that the new update rule applies Newton's root-finding method to $f'$, so it converges on points where $f'(x) = 0$, which are the critical points of $f$.
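The scalar form of this rule can be sketched in a few lines. The example below assumes $f(x) = \cos x$, so $f'(x) = -\sin x$ and $f''(x) = -\cos x$; the function name `newton_critical_point`, the tolerance and the iteration cap are illustrative choices, not taken from the lab code.

```python
import math


def newton_critical_point(df, d2f, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - f'(x) / f''(x) until |f'(x)| <= tol."""
    x = x0
    for _ in range(max_iter):
        if abs(df(x)) <= tol:
            break
        x = x - df(x) / d2f(x)
    return x


# Assumed example: f(x) = cos(x).
df = lambda x: -math.sin(x)
d2f = lambda x: -math.cos(x)

x_star = newton_critical_point(df, d2f, 3.0)
print(x_star, df(x_star), d2f(x_star))
```

Starting from $x_0 = 3$, the iterates approach $π$, where $f'$ vanishes and $f'' > 0$, so this critical point is a minimum. The rule itself only drives $f'$ to zero; the sign of $f''$ has to be checked separately.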

# **Method**

## Assignment 1: Gradient descent method in $\mathbb{R}^n$



The implementation of gradient descent is very simple and just follows the update formula $x_{n+1} = x_n - λ \nabla f(x_n)$. The stopping criterion is the norm of the gradient, $\| \nabla f(x_n) \|$, which is close to 0 near a minimum of $f$ (and near any other critical point). The iteration also stops after at most 10,000 steps. This implementation stores all intermediate values for visualization purposes.

In [None]:
# Assignment 1: Gradient descent method in Rn

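def gradient_descent(df, initial_guess : np.ndarray, λ = 0.01, TOL = 1e-16):
  """Minimize f by gradient descent; df returns the gradient of f.

  Returns an array with every iterate, starting with initial_guess."""
  guesses = [initial_guess]
  xi = initial_guess
  iterations = 0
  max_iter = 10_000
  # Stop when the gradient norm is within TOL or the iteration cap is hit.
  while (np.linalg.norm(grad := df(xi))) > TOL and (iterations < max_iter):
    xi = xi - λ*grad  # step against the gradient, scaled by the learning rate
    guesses.append(xi)
    iterations += 1
  return np.array(guesses)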

## Extra Assignment: Newton's method in $\mathbb{R}^n$

Similarly, the implementation of Newton's method isn't very complicated. The step $\Delta x = -[f''(x_n)]^{-1}f'(x_n)$ in the update rule $x_{n+1} = x_n - [f''(x_n)]^{-1}f'(x_n) = x_n + \Delta x$ can be computed without forming the inverse, by solving the linear system $f''(x_n) \Delta x = -f'(x_n)$. We use `numpy.linalg.solve` for this. If the gradient and Hessian have a special structure, more specialized solvers could be used, and the solution might even be computed directly. The implementation also accepts a step scaling $λ$, which defaults to 1 (the plain Newton step).
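To illustrate the last point, the sketch below computes one Newton step in two ways for the test function $f(x, y) = \cos x \sin y$ used in the Results section: once with `numpy.linalg.solve`, and once with the closed-form inverse of a $2 \times 2$ matrix. The evaluation point and the variable names are assumptions made for this example.

```python
import numpy as np


def grad(p):
    """Gradient of f(x, y) = cos(x) * sin(y)."""
    x, y = p
    return np.array([-np.sin(x) * np.sin(y), np.cos(x) * np.cos(y)])


def hess(p):
    """Hessian of f(x, y) = cos(x) * sin(y)."""
    x, y = p
    return np.array([[-np.cos(x) * np.sin(y), -np.sin(x) * np.cos(y)],
                     [-np.sin(x) * np.cos(y), -np.cos(x) * np.sin(y)]])


p = np.array([0.4, -1.0])  # assumed evaluation point
g, H = grad(p), hess(p)
rhs = -g

# General-purpose route: solve H dx = -g.
dx_solve = np.linalg.solve(H, rhs)

# Direct route for a 2x2 system: inverse = adj(H) / det(H).
(a, b), (c, d) = H
det = a * d - b * c
dx_direct = np.array([d * rhs[0] - b * rhs[1],
                      -c * rhs[0] + a * rhs[1]]) / det

print(dx_solve, dx_direct)
```

Both routes should agree up to rounding. The closed form only pays off for very small or specially structured systems; for general $n$ a linear solver is the safer default.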

In [None]:
# Extra assignment: Newton's method in Rn

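def newtons_method(df, hf, initial_guess : np.ndarray, λ = 1, TOL = 1e-16):
  """Find a critical point of f with Newton's method.

  df returns the gradient and hf the Hessian of f.
  Returns an array with every iterate, starting with initial_guess."""
  guesses = [initial_guess]
  xi = initial_guess
  iterations = 0
  max_iter = 10_000
  # Stop when the gradient norm is within TOL or the iteration cap is hit.
  while (np.linalg.norm(grad := df(xi))) > TOL and (iterations < max_iter):
    hess = hf(xi)
    dx = np.linalg.solve(hess, -grad)  # Newton step: solve H dx = -grad
    xi = xi + λ*dx
    guesses.append(xi)
    iterations += 1
  return np.array(guesses)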

# **Results**

## Assignment 1: Gradient descent method in $\mathbb{R}^n$

To test the gradient descent procedure we compare it to an analytically computed solution. The function used is $f(x, y) = \cos{x}\sin{y}$ and the analytical minimum we are looking for is $x=0,\,y=-2π+\frac{3π}{2} = -\frac{π}{2}$. The starting point $(0.4, 1.0)$ is picked to produce an interesting result when visualizing the method. Four different values for the *learning rate* parameter $λ$ are tested and plotted to see how this parameter affects the method.

In [None]:
# Assignment 1: Gradient descent method in Rn

def test_λ(grad, guess, λs):
  return [ gradient_descent(grad, guess, λ=λ) for λ in λs ]

def fit_function(array_of_values, margin=0.5, N=100):
  min_x, min_y = np.amin(np.array([ np.amin(values, axis=0) for values in array_of_values ]), axis=0)
  max_x, max_y = np.amax(np.array([ np.amax(values, axis=0) for values in array_of_values ]), axis=0)

  x_mid = (min_x + max_x) * 0.5
  y_mid = (min_y + max_y) * 0.5

  x_diff = (max_x - min_x)
  y_diff = (max_y - min_y)

  h_diff = max(x_diff, y_diff) * 0.5

  xs = np.linspace(x_mid-h_diff-margin, x_mid+h_diff+margin, N)
  ys = np.linspace(y_mid-h_diff-margin, y_mid+h_diff+margin, N)

  return np.meshgrid(xs, ys)

f = lambda x : np.cos(x[0]) * np.sin(x[1])
grad = lambda x : np.array( [-np.sin(x[0])*np.sin(x[1]), np.cos(x[0])*np.cos(x[1])] )

λs = [0.01, 0.1, 1, 1.5]
results = test_λ(grad, np.array([0.4, 1.0]), λs)

exact = np.array([0, 2*np.pi*-1+(3*np.pi/2)])

final = [ r[-1] for r in results ]
diffs = final - exact

xs, ys = fit_function(results)
zs = f([xs, ys])

fig = go.Figure()
# showscale=False hides the colorbar; it is a property of the surface trace, not of a coloraxis.
fig.add_surface(x=xs, y=ys, z=zs, opacity=0.7, showscale=False)
for (λ, error, result) in zip(λs, np.linalg.norm(diffs, axis=1), results):
  print(f"λ: {λ:<5} -> iterations: {result.shape[0]:<5}, error: {error:<25}")
  fig.add_scatter3d(x=result[:,0], y=result[:,1], z=f(result.T), mode='lines', name=f"λ={λ}")

fig.update_layout(title="test", xaxis_title="x", yaxis_title="y", width=700, height=500)
fig.update_layout(legend_orientation="h")
fig.show()

λ: 0.01  -> iterations: 10001, error: 1.0880185641326534e-14   
λ: 0.1   -> iterations: 10001, error: 8.881784197001252e-16    
λ: 1     -> iterations: 9    , error: 0.0                      
λ: 1.5   -> iterations: 56   , error: 4.3883303761312494e-17   


As we can see, all four runs converge to a point close to the exact solution. For the two small values of $λ$ the method never reaches the desired tolerance and instead stops at the 10,000-iteration cap, likely because the product $λ \nabla f(x_n)$ of a small $λ$ and a small gradient becomes too small to change $x_n$ in floating point precision. For $λ=1$ only 9 iterates are needed to get an answer with zero representable error (the reported count includes the initial guess, so this corresponds to 8 update steps). A value of $λ=1.5$ results in visible overshooting, and 56 iterates are needed to get within tolerance.
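The floating point explanation can be checked in isolation. The sketch below uses assumed values (a gradient component of $10^{-15}$ near $y = -\frac{π}{2}$) to show that an update smaller than half the spacing between neighbouring floats is rounded away, while the same gradient with $λ = 1$ still moves the iterate.

```python
import numpy as np

y = -np.pi / 2               # y-coordinate of the minimum
gap = np.spacing(abs(y))     # distance from |y| to the next larger float

grad_component = 1e-15       # assumed small gradient component
small_step = 0.01 * grad_component  # step with λ = 0.01
big_step = 1.0 * grad_component     # step with λ = 1

print("float spacing near y:", gap)
print("λ = 0.01 step is lost:", y - small_step == y)
print("λ = 1 step is lost:   ", y - big_step == y)
```

Under these assumptions the small step is below half the float spacing and leaves `y` unchanged, so the iterate stalls before the gradient norm reaches the tolerance.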

## Extra Assignment: Newton's method in $\mathbb{R}^n$

To test Newton's method the same approach is used: we compare our results to an analytically calculated exact answer. For this method, however, we need to change the starting point so that Newton's method actually finds a minimum instead of some other critical point.

In [None]:
f = lambda x : np.cos(x[0]) * np.sin(x[1])
grad = lambda x : np.array( [-np.sin(x[0])*np.sin(x[1]), np.cos(x[0])*np.cos(x[1])] )
hess = lambda x : np.array([[-np.cos(x[0])*np.sin(x[1]), -np.sin(x[0])*np.cos(x[1])],\
                            [-np.sin(x[0])*np.cos(x[1]), -np.cos(x[0])*np.sin(x[1])]])

exact = np.array([0, 2*np.pi*-1+(3*np.pi/2)])

x0 = np.array([0.4, -1.0])

result = newtons_method(grad, hess, x0, λ=1)

λs = [0.01, 0.1, 1, 1.5]
results = test_λ(grad, x0, λs)

final = [ r[-1] for r in results ]
diffs = final - exact

xs, ys = fit_function([result] + results)
zs = f([xs, ys])

print(f"Final: {result[-1]}")
print(f"            Iterations: {result.shape[0]:<5}, error: {np.linalg.norm(result[-1] - exact):<25}")

fig = go.Figure()
# showscale=False hides the colorbar; it is a property of the surface trace, not of a coloraxis.
fig.add_surface(x=xs, y=ys, z=zs, opacity=0.7, showscale=False)
fig.add_scatter3d(x=result[:,0], y=result[:,1], z=f(result.T), mode='lines', name="Newton's method")
for (λ, error, result) in zip(λs, np.linalg.norm(diffs, axis=1), results):
  print(f"λ: {λ:<5} -> iterations: {result.shape[0]:<5}, error: {error:<25}")
  fig.add_scatter3d(x=result[:,0], y=result[:,1], z=f(result.T), mode='lines', name=f"λ={λ}")

fig.update_layout(title="test", xaxis_title="x", yaxis_title="y", width=700, height=500)
fig.update_layout(legend_orientation="h")
fig.show()

Final: [ 0.         -1.57079633]
            Iterations: 6    , error: 0.0                      
λ: 0.01  -> iterations: 10001, error: 1.0880185641326534e-14   
λ: 0.1   -> iterations: 10001, error: 8.881784197001252e-16    
λ: 1     -> iterations: 5    , error: 0.0                      
λ: 1.5   -> iterations: 53   , error: 3.799445756768334e-17    


As we can see here, Newton's method needs one iterate more to converge than gradient descent with $λ=1$ from the same starting point (6 versus 5, both counts including the starting point). This could be due to the particular function being minimized.
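One property of this particular function that may explain the similar iteration counts: at the minimum $(0, -\frac{π}{2})$ the Hessian of $f$ is the identity matrix, so close to the minimum a Newton step and a gradient descent step with $λ=1$ nearly coincide. The sketch below checks this numerically; the sample gradient `g` is an arbitrary assumed value, not data from the runs above.

```python
import numpy as np


def hess(p):
    """Hessian of f(x, y) = cos(x) * sin(y)."""
    x, y = p
    return np.array([[-np.cos(x) * np.sin(y), -np.sin(x) * np.cos(y)],
                     [-np.sin(x) * np.cos(y), -np.cos(x) * np.sin(y)]])


minimum = np.array([0.0, -np.pi / 2])
H = hess(minimum)
print(H)

g = np.array([1e-3, -2e-3])            # arbitrary small gradient (assumption)
newton_step = np.linalg.solve(H, -g)   # Newton step with this Hessian
gd_step = -1.0 * g                     # gradient descent step with λ = 1
print(newton_step, gd_step)
```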

# **Discussion**

The field of function minimization, and optimization in general, is vast, and many topics related to the two methods presented in this lab are not touched on: convergence, convergence rate, numerical stability, basins of attraction, etc. This lab merely scratches the surface of using these methods. These methods are also heavily utilized in machine learning, which this lab would not be complete without mentioning.

Also of note is that this implementation of Newton's method is highly sensitive to the starting point for this kind of minimization problem. For oscillating functions it is very likely to converge to critical points other than minima of the function. This can be quite unintuitive, as starting points close to a minimum sometimes do not converge toward that minimum. This leads me to favor gradient descent for its simplicity and intuitive results.
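This sensitivity can be made visible by classifying the point Newton's method ends up at. The sketch below is a compact re-implementation for illustration only: the names `newton` and `classify`, the tolerance and the iteration cap are assumptions. It runs the method on $f(x, y) = \cos x \sin y$ from the two starting points used in this lab and inspects the eigenvalues of the Hessian at the final iterate.

```python
import numpy as np


def grad(p):
    """Gradient of f(x, y) = cos(x) * sin(y)."""
    x, y = p
    return np.array([-np.sin(x) * np.sin(y), np.cos(x) * np.cos(y)])


def hess(p):
    """Hessian of f(x, y) = cos(x) * sin(y)."""
    x, y = p
    return np.array([[-np.cos(x) * np.sin(y), -np.sin(x) * np.cos(y)],
                     [-np.sin(x) * np.cos(y), -np.cos(x) * np.sin(y)]])


def newton(p0, tol=1e-12, max_iter=50):
    """Plain Newton iteration on the gradient (step scaling of 1)."""
    p = np.array(p0, dtype=float)
    for _ in range(max_iter):
        g = grad(p)
        if np.linalg.norm(g) <= tol:
            break
        p = p + np.linalg.solve(hess(p), -g)
    return p


def classify(p):
    """Classify a critical point by the signs of the Hessian eigenvalues."""
    eig = np.linalg.eigvalsh(hess(p))
    if np.all(eig > 0):
        return "minimum"
    if np.all(eig < 0):
        return "maximum"
    return "saddle point"


for start in ([0.4, 1.0], [0.4, -1.0]):
    end = newton(start)
    print(f"start {start} -> end {end}, {classify(end)}")
```

From $(0.4, 1.0)$, the starting point of the gradient descent test, the iteration should end near the maximum at $(0, \frac{π}{2})$, while $(0.4, -1.0)$ leads to the minimum at $(0, -\frac{π}{2})$. A check of this kind is one way to detect that a run has converged to the wrong type of critical point.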