<a href="https://colab.research.google.com/github/johanhoffman/DD2363_VT22/blob/leogabac-Lab7/Lab7/leogabac_Lab7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Lab 7: Optimization and learning**
**Leonardo Gabriel Alanis Cantú**

# **Abstract**

In this report we study the gradient descent method in $\mathbb{R}^n$ for minimizing functions of several variables. The method was implemented in Python with a numerical gradient and a grid-based line search, and it recovered the known minima of two test functions with errors below $10^{-7}$. Details are given in the sections below.

In [1]:
"""This program is a template for lab reports in the course"""
"""DD2363 Methods in Scientific Computing, """
"""KTH Royal Institute of Technology, Stockholm, Sweden."""

# Copyright (C) 2020 Johan Hoffman (jhoffman@kth.se)

# This file is part of the course DD2365 Advanced Computation in Fluid Mechanics
# KTH Royal Institute of Technology, Stockholm, Sweden
#
# This is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

# This template is maintained by Johan Hoffman
# Please report problems to jhoffman@kth.se


# **Set up environment**

In [2]:
# Load necessary modules.
from google.colab import files

import time
import numpy as np
from math import *
from numpy import mean

from matplotlib import pyplot as plt
from matplotlib import tri
from matplotlib import axes
from mpl_toolkits.mplot3d import Axes3D

# **Introduction**

The main task of this report is the following: given a function $f(x_1, \ldots, x_n)$,

$$
f: \mathbb{R}^n \to \mathbb{R}
$$

we want to find (iteratively) a local minimum of this function. One way to do this is through the update formula

$$
x_{k+1} = x_k + \alpha_k \hat{n}.
$$

Here $x$ denotes the vector $(x_1,\ldots,x_n)$, $\hat{n}$ denotes the direction in which the next step is taken, and $\alpha_k$ is the size of that step. Since we want to eventually reach a local minimum, a natural choice of $\hat{n}$ is the negative of the gradient, $\hat{n} = -\nabla f(x_k)$, which points in the direction of steepest descent. This method is known as _gradient descent_:

$$
x_{k+1} = x_k - \alpha_k \nabla f(x_k).
$$

Note that we allow $\alpha_k$ to be different at each step. Choosing a suitable $\alpha_k$ deserves careful thought: a value that is too small slows down convergence, whereas a value that is too large can make the iteration diverge. How large is too large depends on the function; in this report we restrict the step size to values between 0 and 1.
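
The effect of the step size can be seen on a small example. The sketch below is not part of the lab code: the helper `fixed_step_descent` and the test function $f(x) = \|x\|^2$ (with gradient $2x$) are assumptions chosen purely for illustration, and the step size is kept constant instead of being chosen by a line search.

```python
import numpy as np

def fixed_step_descent(grad_f, x0, alpha, n_steps):
    """Gradient descent with a constant step size (illustration only)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - alpha * grad_f(x)
    return x

# Hypothetical test function f(x) = |x|^2: its gradient is 2x and its
# minimum is at the origin.
def grad_quadratic(x):
    return 2.0 * x

x0 = [1.0, -1.0]
for alpha in (0.01, 0.4, 1.1):
    x = fixed_step_descent(grad_quadratic, x0, alpha, n_steps=20)
    print(f"alpha = {alpha:4.2f}   distance to minimum = {np.linalg.norm(x):.3e}")
```

For this particular function each step multiplies $x$ by $1 - 2\alpha$, so the small step converges slowly, the moderate step converges quickly, and the step larger than 1 moves away from the minimum. For other functions the threshold between convergence and divergence is different.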

A "good" choice of $\alpha$ at step $k$ is the one that induces the greatest descent. We can find this value by minimizing

$$
h(\alpha_k) = f(x_k - \alpha_k\nabla f(x_k)) , \quad \alpha_k \in (0,1)
$$

which is another optimization problem, but only a one-dimensional one: a line search along the direction of descent.
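
One possible way to carry out such a one-dimensional search is golden-section search. The sketch below is an alternative shown for illustration, not the routine used later in this report; the function name `golden_section_search`, its tolerance, and the hard-coded point and gradient are assumptions, and the method is only reliable when $h$ has a single minimum on the interval.

```python
import math

def golden_section_search(h, a=0.0, b=1.0, tol=1e-6):
    """Minimize a unimodal function h on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    hc, hd = h(c), h(d)
    while (b - a) > tol:
        if hc < hd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, hd = d, c, hc
            c = b - inv_phi * (b - a)
            hc = h(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, hc = c, d, hd
            d = a + inv_phi * (b - a)
            hd = h(d)
    return 0.5 * (a + b)

# Example: one line search for f(x, y) = (x-1)^2 + (y+2)^2 - 3 at (5, 5).
def f(x):
    return (x[0] - 1.0)**2 + (x[1] + 2.0)**2 - 3.0

x_k = (5.0, 5.0)
grad_k = (8.0, 14.0)  # analytical gradient of f at x_k

def h(alpha):
    return f((x_k[0] - alpha * grad_k[0], x_k[1] - alpha * grad_k[1]))

alpha_star = golden_section_search(h)
print("best step size:", alpha_star)
```

Each iteration reuses one of the two previous function values, so only one new evaluation of $h$ is needed per interval reduction.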

To implement the gradient descent algorithm, we need to compute gradients numerically, and hence numerical partial derivatives. We will approximate them with a central difference scheme

$$
\dfrac{\partial f}{\partial x_i}(x) \approx \dfrac{f(x + \tilde{h}) - f(x - \tilde{h})}{2h}
$$

where $\tilde{h}$ is the vector $(0,\ldots,h,\ldots,0)^\text{T}$ with an $h$ in the $i$th position.
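
To illustrate why a central difference is a reasonable choice, the sketch below compares it with a one-sided (forward) difference on a function whose derivative is known. The helper names `forward_diff` and `central_diff` and the choice of $\sin$ as test function are assumptions made for this illustration only.

```python
import math

def forward_diff(f, x, h):
    """One-sided difference quotient, error of order h."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """Central difference quotient, error of order h^2."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x0 = 1.0
exact = math.cos(x0)  # exact derivative of sin at x0
for h in (1e-1, 1e-2, 1e-3):
    err_fwd = abs(forward_diff(math.sin, x0, h) - exact)
    err_cen = abs(central_diff(math.sin, x0, h) - exact)
    print(f"h = {h:.0e}   forward error = {err_fwd:.2e}   central error = {err_cen:.2e}")
```

Reducing $h$ by a factor of 10 reduces the central difference error by roughly a factor of 100, whereas the forward difference error only drops by roughly a factor of 10.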

# **Method**

Let us implement all needed routines.

In [3]:
def pdv(f,x,variable, h = 0.01):
    # ===== INPUT ===== #
    # f: Function of several variables
    # x: Point at which the derivative is evaluated
    # variable: Index of the variable with respect to which we differentiate
    # h: Step size of the central difference
    # ===== Output ===== #
    # ∂f/∂x[variable] evaluated at x
    
    h_vec = np.zeros(len(x))
    h_vec[variable] = h
    
    diff = f(x + h_vec) - f(x - h_vec)
    
    return diff/(2*h)

def gradient(f, x):
    # ===== INPUT ===== #
    # f: Function in several variables
    # x: Point at which the gradient is evaluated
    # ===== Output ===== #
    # ∇f evaluated at x
    
    n = len(x)
    grad = np.array( [pdv(f,x,xi) for xi in range(n)] )
    return grad


def line_search2(f,x,grad):
    
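    # Crude line search: evaluate f at 50 equally spaced step sizes
    # in [0, 1] along -grad and return the step size with the lowest value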
    
    a = np.linspace(0,1,50)
    feval = [f(x-ai*grad) for ai in a]
    ind = np.argmin(feval)
    
    return a[ind]
        
    
def gradient_descent(f,ansatz, alpha, TOL = 0.01):
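    # ===== INPUT ===== #
    # f: Function to minimize
    # ansatz: Initial guess
    # alpha: Step size argument (currently unused: it is overwritten
    #        by line_search2 at every iteration)
    # TOL: Tolerance on the norm of the gradient
    # ===== Output ===== #
    # x_min: approximate local minimizer of f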
    
    x_min = np.copy(ansatz) # initialize
    grad = np.ones( len(ansatz) ) # initialize
    
    while np.linalg.norm(grad) > TOL:
        grad = gradient(f,x_min)
        alpha = line_search2(f,x_min,grad)
        x_min = x_min - alpha*grad
        
    return x_min
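
The loop above has no iteration cap, and if the line search ever returns a step size of 0 the iterate stops changing while the gradient stays above the tolerance. The sketch below is one possible guarded variant, not the routine used for the results in this report; the names `numerical_gradient` and `guarded_gradient_descent`, the `max_iter` parameter, and the test function `steep` are assumptions introduced for illustration.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-2):
    """Central-difference gradient, same scheme as pdv/gradient above."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

def guarded_gradient_descent(f, x0, tol=1e-5, max_iter=1000):
    """Grid line-search descent with an iteration cap and a stall check.

    Returns (x, n_iter, converged).
    """
    x = np.asarray(x0, dtype=float)
    alphas = np.linspace(0.0, 1.0, 50)
    for n_iter in range(max_iter):
        grad = numerical_gradient(f, x)
        if np.linalg.norm(grad) <= tol:
            return x, n_iter, True
        values = [f(x - a * grad) for a in alphas]
        alpha = alphas[int(np.argmin(values))]
        if alpha == 0.0:
            # No grid point improves on the current x: stop instead of
            # repeating the same step forever.
            return x, n_iter, False
        x = x - alpha * grad
    return x, max_iter, False

def paraboloid(x):
    return (x[0] - 1.0)**2 + (x[1] + 2.0)**2 - 3.0

def steep(x):
    # Hypothetical function whose curvature is too large for the grid:
    # even the smallest nonzero step overshoots the minimum.
    return 1000.0 * x[0]**2

print(guarded_gradient_descent(paraboloid, [5.0, 5.0]))
print(guarded_gradient_descent(steep, [1.0]))
```

Unlike the loop above, this variant tests the gradient at the current iterate before stepping, so it may stop one update earlier. For `steep` it reports a stall at the starting point, a case in which the unguarded loop would not terminate.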
    
    

# **Results**

To test the implementation, we apply it to functions whose minima we already know. The function

$$
f(x,y) = (x-1)^2 + (y+2)^2 -3
$$

is a paraboloid centered at $(1,-2)$, where it attains its minimum value $-3$. This test should be simple enough for the algorithm.

In [4]:
def test_func(x):
    return (x[0]-1)**2 + (x[1] + 2)**2 - 3

x_min = gradient_descent(test_func, [5,5], 0.1, TOL = 1e-5)
diff = np.array([1,-2]) - x_min
print("Error:", np.linalg.norm(diff))

Error: 2.854146892646271e-08
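
The size of this error can be traced back to the grid used in `line_search2`. The following short derivation assumes that the central differences reproduce the gradient of this quadratic exactly, up to rounding. With $c = (1,-2)$ we have $\nabla f(x) = 2(x - c)$, so

$$
h(\alpha) = f(x_k - \alpha \nabla f(x_k)) = (1 - 2\alpha)^2 \, \|x_k - c\|^2 - 3,
$$

which is minimal at $\alpha = 1/2$: an exact line search would reach $c$ in a single step. The grid `np.linspace(0,1,50)` consists of the points $j/49$ and does not contain $1/2$. Its two nearest points, $24/49$ and $25/49$, both give $|1 - 2\alpha| = 1/49$, so each update satisfies

$$
\|x_{k+1} - c\| = \dfrac{1}{49} \, \|x_k - c\|.
$$

Starting from $\|x_0 - c\| = \sqrt{65} \approx 8.06$, five updates give $\sqrt{65}/49^5 \approx 2.85 \times 10^{-8}$, which is consistent with the error printed above.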


Now, the function

$$
f(x,y) = \cos(x) \sin(y)
$$

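has infinitely many minima, all with value $-1$. For any two integers $m$ and $n$, one family of minima is located at

$$
x = 2\pi n, \quad y = 2\pi m - \dfrac{\pi}{2}
$$

A second family, where $\cos(x) = -1$ and $\sin(y) = 1$, is located at $x = (2n+1)\pi$, $y = 2\pi m + \pi/2$.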
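
That the points $x = 2\pi n$, $y = 2\pi m - \pi/2$ are indeed minima can be checked with the analytical gradient and Hessian of $f$. The sketch below is a sanity check added for illustration; the helper names `grad_f` and `hessian_f` and the sampled index pairs are assumptions.

```python
import numpy as np

def grad_f(x, y):
    """Analytical gradient of f(x, y) = cos(x) sin(y)."""
    return np.array([-np.sin(x) * np.sin(y), np.cos(x) * np.cos(y)])

def hessian_f(x, y):
    """Analytical Hessian of f(x, y) = cos(x) sin(y)."""
    fxx = -np.cos(x) * np.sin(y)
    fxy = -np.sin(x) * np.cos(y)
    fyy = -np.cos(x) * np.sin(y)
    return np.array([[fxx, fxy], [fxy, fyy]])

# A few sample members of the family (n, m).
for n, m in [(0, 0), (1, 0), (-1, 2)]:
    x, y = 2 * np.pi * n, 2 * np.pi * m - np.pi / 2
    grad_norm = np.linalg.norm(grad_f(x, y))
    eigenvalues = np.linalg.eigvalsh(hessian_f(x, y))
    print(f"n = {n:2d}, m = {m:2d}   |grad| = {grad_norm:.1e}   Hessian eigenvalues = {eigenvalues}")
```

A vanishing gradient together with strictly positive Hessian eigenvalues confirms a local minimum at each sampled point.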

Let us try to find some of the minima.

In [5]:
def test_func(x):
    return cos(x[0])*sin(x[1])

x_min = gradient_descent(test_func, [4.125,-1], 0.1, TOL = 1e-5)
diff = np.array([2*pi,-pi/2]) - x_min
print("Error:", np.linalg.norm(diff))

Error: 9.610881359995597e-11


# **Discussion**

In this last report we looked at gradient descent, one of the most natural minimization techniques for functions of many variables. In both tests the computed minimizer agreed with the known one to within $10^{-7}$, which is expected from this type of algorithm.

On a personal note, I had always wanted to implement a numerical gradient, so I was excited to take the chance to do it here.