# Example 38: Unconstrained Optimization With Gradient Descent

## Contents

* [Overview](#overview) 
    * [Unconstrained optimization with Gradient Descent](#ekf)
* [Include files](#include_files)
* [The main function](#m_func)
* [Results](#results)
* [Source code](#source_code)
* [References](#refs)

## <a name="overview"></a> Overview

Perhaps the simplest algorithm for unconstrained optimization is <a href="https://en.wikipedia.org/wiki/Gradient_descent">gradient descent</a>, also known as steepest descent.

### <a name="ekf"></a> Unconstrained optimization with Gradient Descent

Let's consider the following function [1]

$$f(\theta_1, \theta_2) = \frac{1}{2}(\theta_{1}^2 - \theta_2)^2 + \frac{1}{2}(\theta_1 -1)^2$$
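For reference, differentiating $f$ term by term gives its gradient. This is what the update rule below needs, and it is the same pair of expressions coded in `Function::gradients` in the main function:

$$\nabla f(\theta_1, \theta_2) = \begin{bmatrix} 2\theta_1(\theta_1^2 - \theta_2) + (\theta_1 - 1) \\ -(\theta_1^2 - \theta_2) \end{bmatrix}$$

Both components vanish only when $\theta_2 = \theta_1^2$ and $\theta_1 = 1$, so the only stationary point is $(\theta_1, \theta_2) = (1, 1)$, where $f = 0$; since $f \geq 0$ everywhere, this is the global minimizer.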

We are interested in finding $\theta_1, \theta_2$ that minimize $f$. Gradient descent is an iterative algorithm that uses the gradient of the function in order to update the parameters. The update rule is 

$$\boldsymbol{\theta}_k = \boldsymbol{\theta}_{k-1} - \eta \nabla f|_{\boldsymbol{\theta}_{k-1}} $$

The parameter $\eta$ is the so-called learning rate; it sets the size of the step taken along the negative gradient direction. A small $\eta$ slows down convergence, whilst a large value can make the iterates overshoot the minimum so that the algorithm fails to converge. This is shown in the two figures below: 

<figure>
<img src="gd_1.png" alt="Gradient descent eta 0.1"
	title="Gradient descent eta 0.1" width="400" height="350" />
<figcaption>Figure: Gradient descent with eta 0.1.</figcaption>
</figure>

<figure>
<img src="gd_2.png" alt="Gradient descent eta 0.6"
	title="Gradient descent eta 0.6" width="400" height="350" />
<figcaption>Figure: Gradient descent with eta 0.6.</figcaption>
</figure>
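A local linearization offers one way to see why $\eta = 0.6$ misbehaves while $\eta = 0.1$ does not. The Hessian of $f$, evaluated at the minimizer $(1, 1)$, is

$$\nabla^2 f = \begin{bmatrix} 6\theta_1^2 - 2\theta_2 + 1 & -2\theta_1 \\ -2\theta_1 & 1 \end{bmatrix} \quad\Rightarrow\quad \nabla^2 f|_{(1,1)} = \begin{bmatrix} 5 & -2 \\ -2 & 1 \end{bmatrix}, \qquad \lambda_{\pm} = 3 \pm 2\sqrt{2}$$

Near the minimizer the error $\mathbf{e}_k = \boldsymbol{\theta}_k - (1, 1)^T$ evolves approximately as $\mathbf{e}_k \approx (I - \eta \nabla^2 f|_{(1,1)})\,\mathbf{e}_{k-1}$, which shrinks only if $|1 - \eta\lambda_{\pm}| < 1$, that is

$$0 < \eta < \frac{2}{\lambda_+} = \frac{2}{3 + 2\sqrt{2}} = 6 - 4\sqrt{2} \approx 0.343$$

With $\eta = 0.1$ the slowest factor is $1 - 0.1\,\lambda_- \approx 0.983$, so the iteration is stable but slow; with $\eta = 0.6$ the factor along the stiff direction is $1 - 0.6\,\lambda_+ \approx -2.50$, so iterates close to the minimizer are pushed away with alternating sign. This argument is local only and says nothing about the trajectory far from $(1, 1)$.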

## <a name="include_files"></a> Include files

```cpp
#include "kernel/base/config.h"
#include "kernel/base/types.h"
#include "kernel/base/kernel_consts.h"
#include "kernel/utilities/common_uitls.h"
#include "kernel/maths/optimization/serial_gradient_descent.h"
#include "kernel/maths/functions/function_base.h"

#include <iostream>
#include <stdexcept>
```

## <a name="m_func"></a> The main function

```cpp
namespace example {

using kernel::real_t;
using kernel::uint_t;
using kernel::DynMat;
using kernel::DynVec;
using kernel::maths::opt::Gd;
using kernel::maths::opt::GDConfig;

class Function: public kernel::FunctionBase<real_t, DynVec<real_t>>
{
public:

    typedef kernel::FunctionBase<real_t, DynVec<real_t>>::output_t output_t;

    // constructor
    Function(const DynVec<real_t>& coeffs);

    // compute the value of the function
    virtual output_t value(const DynVec<real_t>&  input)const override final;

    // compute the gradients of the function
    virtual DynVec<real_t> gradients(const DynVec<real_t>&  input)const override final;

    // the number of coefficients
    virtual uint_t n_coeffs()const override final{return 2;}

    // reset the coefficients
    void set_coeffs(const DynVec<real_t>&  coeffs){coeffs_ = coeffs;}

    // get a copy of the coefficients
    DynVec<real_t> coeffs()const{return coeffs_;}

private:

    // coefficients vector
    DynVec<real_t> coeffs_;

};

Function::Function(const DynVec<real_t>& coeffs)
    :
      coeffs_(coeffs)
{}

Function::output_t
Function::value(const DynVec<real_t>&  input)const{

    return 0.5*(kernel::utils::sqr(kernel::utils::sqr(input[0]) - input[1])) +
           0.5*(kernel::utils::sqr(input[0] - 1.0));
}

DynVec<real_t>
Function::gradients(const DynVec<real_t>&  input)const{

    auto grad1= 2.0*input[0]*(kernel::utils::sqr(input[0]) - input[1]) + (input[0] - 1.0);
    auto grad2 = -(kernel::utils::sqr(input[0]) - input[1]);
    DynVec<real_t> rslt(2, 0.0);
    rslt[0] = grad1;
    rslt[1] = grad2;
    return rslt;
}

}

int main(){

    using namespace example;
    try{

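        // maximum iterations, exit tolerance and learning rate eta
        GDConfig config(20, kernel::KernelConsts::tolerance(), 0.1);
        config.set_show_iterations_flag(true);
        Gd gd(config);

        // starting point theta_0 = (0, 0)
        DynVec<real_t> coeffs(2, 0.0);

        Function f(coeffs);

        // run gradient descent; info holds the summary printed below
        auto info = gd.solve(f);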
        std::cout<<info<<std::endl;

    }
    catch(std::logic_error& error){

        std::cerr<<error.what()<<std::endl;
    }
    catch(...){
        std::cerr<<"Unknown exception occurred"<<std::endl;
    }

    return 0;
}
```

## <a name="results"></a> Results

```
>GD: iteration: 1
	 eta: 0.1 ABS error: 0.09495 Exit Tol: 1e-08
>GD: iteration: 2
	 eta: 0.1 ABS error: 0.0762246 Exit Tol: 1e-08
>GD: iteration: 3
	 eta: 0.1 ABS error: 0.0596829 Exit Tol: 1e-08
>GD: iteration: 4
	 eta: 0.1 ABS error: 0.0452378 Exit Tol: 1e-08
>GD: iteration: 5
	 eta: 0.1 ABS error: 0.0333377 Exit Tol: 1e-08
>GD: iteration: 6
	 eta: 0.1 ABS error: 0.0242542 Exit Tol: 1e-08
>GD: iteration: 7
	 eta: 0.1 ABS error: 0.0178224 Exit Tol: 1e-08
>GD: iteration: 8
	 eta: 0.1 ABS error: 0.0135337 Exit Tol: 1e-08
>GD: iteration: 9
	 eta: 0.1 ABS error: 0.0107686 Exit Tol: 1e-08
>GD: iteration: 10
	 eta: 0.1 ABS error: 0.00898365 Exit Tol: 1e-08
>GD: iteration: 11
	 eta: 0.1 ABS error: 0.00778693 Exit Tol: 1e-08
>GD: iteration: 12
	 eta: 0.1 ABS error: 0.00693079 Exit Tol: 1e-08
>GD: iteration: 13
	 eta: 0.1 ABS error: 0.0062723 Exit Tol: 1e-08
>GD: iteration: 14
	 eta: 0.1 ABS error: 0.00573354 Exit Tol: 1e-08
>GD: iteration: 15
	 eta: 0.1 ABS error: 0.00527306 Exit Tol: 1e-08
>GD: iteration: 16
	 eta: 0.1 ABS error: 0.00486855 Exit Tol: 1e-08
>GD: iteration: 17
	 eta: 0.1 ABS error: 0.00450745 Exit Tol: 1e-08
>GD: iteration: 18
	 eta: 0.1 ABS error: 0.00418203 Exit Tol: 1e-08
>GD: iteration: 19
	 eta: 0.1 ABS error: 0.00388703 Exit Tol: 1e-08
>GD: iteration: 20
	 eta: 0.1 ABS error: 0.00361855 Exit Tol: 1e-08
# iterations:..20
# processors:..1
# threads:.....1
Residual:......0.00361855
Tolerance:.....1e-08
Convergence:...No
Total time:....0.000364831
Learning rate:..0.1


```

## <a name="source_code"></a> Source code

<a href="../exe.cpp">exe.cpp</a>

## <a name="refs"></a> References

1. Kevin P. Murphy, *Machine Learning: A Probabilistic Perspective*, The MIT Press, 2012.