Conjugate Gradient
==================

The Nonlinear Conjugate Gradient (CG) algorithm is used to solve optimization problems of the form

\min_{x \in X} f(x)

where f is convex and (at least) once differentiable. The algorithm requires the gradient of f to be supplied.
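In code, this means the objective callable returns f(x) and, whenever the gradient pointer is non-null, writes \nabla_x f(x) through it. Below is a minimal sketch of the expected shape, using the Armadillo wrappers (full runnable examples appear at the end of this page):

.. code:: cpp

    #define OPTIM_ENABLE_ARMA_WRAPPERS
    #include "optim.hpp"

    // f(x) = x'x, with gradient 2x; the gradient is only filled in when requested
    inline
    double
    objfn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
    {
        if (grad_out) {
            *grad_out = 2.0 * vals_inp;
        }

        return arma::dot(vals_inp, vals_inp);
    }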

The updating rule for CG is described below. Let x^{(i)} denote the function input values at stage i of the algorithm.

  1. Compute the descent direction using:

     d^{(i)} = - [\nabla_x f(x^{(i)})] + \beta^{(i)} d^{(i-1)}

  2. Determine if \beta^{(i)} should be reset (to zero), which occurs when

     \dfrac{| [\nabla f(x^{(i)})]^\top [\nabla f(x^{(i-1)})] |}{ [\nabla f(x^{(i)})]^\top [\nabla f(x^{(i)})] } > \nu

     where \nu is set via cg_settings.restart_threshold.

  3. Compute the optimal step size using line search:

     \alpha^{(i)} = \arg \min_{\alpha} f(x^{(i)} + \alpha \times d^{(i)})

  4. Update the candidate solution vector using:

     x^{(i+1)} = x^{(i)} + \alpha^{(i)} \times d^{(i)}

The algorithm stops when at least one of the following conditions is met:

  1. the norm of the gradient vector, \| \nabla f \|, is less than grad_err_tol;
  2. the relative change between x^{(i+1)} and x^{(i)} is less than rel_sol_change_tol;
  3. the total number of iterations exceeds iter_max.

The parameter \beta^{(i)} is computed using one of the following update formulas, selected via cg_settings.method:

  • cg_settings.method = 1 Fletcher–Reeves (FR):

    \beta_{\text{FR}} = \dfrac{ [\nabla_x f(x^{(i)})]^\top [\nabla_x f(x^{(i)})] }{ [\nabla_x f(x^{(i-1)})]^\top [\nabla_x f(x^{(i-1)})] }
    
  • cg_settings.method = 2 Polak-Ribiere (PR):

    \beta_{\text{PR}} = \dfrac{ [\nabla_x f(x^{(i)})]^\top ( [\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})] ) }{ [\nabla_x f(x^{(i-1)})]^\top [\nabla_x f(x^{(i-1)})] }
    
  • cg_settings.method = 3 FR-PR Hybrid:

    \beta = \begin{cases}
        - \beta_{\text{FR}} & \text{ if } \beta_{\text{PR}} < - \beta_{\text{FR}} \\
        \beta_{\text{PR}} & \text{ if } |\beta_{\text{PR}}| \leq \beta_{\text{FR}} \\
        \beta_{\text{FR}} & \text{ if } \beta_{\text{PR}} > \beta_{\text{FR}} \end{cases}
    
  • cg_settings.method = 4 Hestenes-Stiefel:

    \beta_{\text{HS}} = \dfrac{[\nabla_x f(x^{(i)})] \cdot ([\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})])}{([\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})]) \cdot d^{(i-1)}}
    
  • cg_settings.method = 5 Dai-Yuan:

    \beta_{\text{DY}} = \dfrac{[\nabla_x f(x^{(i)})] \cdot [\nabla_x f(x^{(i)})]}{([\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})]) \cdot d^{(i-1)}}
    
  • cg_settings.method = 6 Hager-Zhang:

    \begin{aligned}
    \beta_{\text{HZ}} &= \left( y^{(i)} - 2 \times \dfrac{y^{(i)} \cdot y^{(i)}}{y^{(i)} \cdot d^{(i-1)}} \times d^{(i-1)} \right) \cdot \dfrac{[\nabla_x f(x^{(i)})]}{y^{(i)} \cdot d^{(i-1)}} \\
    y^{(i)} &:= [\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})]
    \end{aligned}
    

Finally, we set:

\beta^{(i)} = \max \{ 0, \beta_{*} \}

where \beta_{*} denotes the value produced by the chosen update method.
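To make the iteration concrete, below is a minimal sketch of a single CG update in Armadillo-based C++, using the Polak-Ribiere rule (method 2) together with the restart test from step 2. The backtrack routine is a simplistic Armijo stand-in for the Wolfe line search that the library actually performs, and all names here are illustrative rather than part of the optim API:

.. code:: cpp

    #include <armadillo>
    #include <cmath>
    #include <functional>

    // crude Armijo backtracking, standing in for the Wolfe line search
    // (cf. cg_settings.wolfe_cons_1)
    static double
    backtrack(const std::function<double (const arma::vec&)>& f,
              const arma::vec& x, const arma::vec& d, const arma::vec& grad)
    {
        double alpha = 1.0;
        const double c1 = 1E-03;

        while (alpha > 1E-12 && f(x + alpha*d) > f(x) + c1*alpha*arma::dot(grad, d)) {
            alpha *= 0.5;
        }

        return alpha;
    }

    // one CG iteration (steps 1-4): given x^{(i)}, the previous direction d,
    // the current gradient grad, and the previous gradient grad_p, return x^{(i+1)}
    arma::vec
    cg_step(const std::function<double (const arma::vec&)>& f,
            const arma::vec& x, arma::vec& d,
            const arma::vec& grad, const arma::vec& grad_p, double nu)
    {
        // beta^{(i)} = max{0, beta_PR}
        double beta = std::max(0.0, arma::dot(grad, grad - grad_p) / arma::dot(grad_p, grad_p));

        // restart rule (step 2): reset beta when successive gradients are far from orthogonal
        if (std::abs(arma::dot(grad, grad_p)) / arma::dot(grad, grad) > nu) {
            beta = 0.0;
        }

        d = - grad + beta*d;                     // step 1: descent direction
        double alpha = backtrack(f, x, d, grad); // step 3: line search

        return x + alpha*d;                      // step 4: update
    }

The library entry points are declared below.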


.. doxygenfunction:: cg(ColVec_t& init_out_vals, std::function<fp_t (const ColVec_t& vals_inp, ColVec_t* grad_out, void* opt_data)> opt_objfn, void* opt_data)
   :project: optimlib

.. doxygenfunction:: cg(ColVec_t& init_out_vals, std::function<fp_t (const ColVec_t& vals_inp, ColVec_t* grad_out, void* opt_data)> opt_objfn, void* opt_data, algo_settings_t& settings)
   :project: optimlib


The basic control parameters are:

  • fp_t grad_err_tol: the error tolerance value controlling how small the L_2 norm of the gradient vector \| \nabla f \| should be before 'convergence' is declared.

  • fp_t rel_sol_change_tol: the error tolerance value controlling how small the proportional change in the solution vector should be before 'convergence' is declared.

    The relative change is computed using:

    \left\| \dfrac{x^{(i)} - x^{(i-1)}}{ |x^{(i-1)}| + \epsilon } \right\|_1
    

    where \epsilon is a small number added for numerical stability. (A code sketch of this computation follows this list.)

  • size_t iter_max: the maximum number of iterations/updates before the algorithm exits.

  • bool vals_bound: whether the search space of the algorithm is bounded. If true, then

    • ColVec_t lower_bounds: defines the lower bounds of the search space.
    • ColVec_t upper_bounds: defines the upper bounds of the search space.
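As referenced above, the relative solution change can be computed as in the following sketch, where the function name and the default value of eps are illustrative rather than library identifiers:

.. code:: cpp

    #include <armadillo>

    // L1 norm of the elementwise relative change between successive iterates;
    // eps guards against division by zero
    double
    rel_sol_change(const arma::vec& x_new, const arma::vec& x_old, double eps = 1E-08)
    {
        return arma::norm((x_new - x_old) / (arma::abs(x_old) + eps), 1);
    }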

Additional settings:

  • int cg_settings.method: the update method for \beta, as enumerated above.
    • Default value: 2.
  • fp_t cg_settings.restart_threshold: parameter \nu from step 2 in the algorithm description.
    • Default value: 0.1.
  • bool use_rel_sol_change_crit: whether to enable the rel_sol_change_tol stopping criterion.
    • Default value: false.
  • fp_t cg_settings.wolfe_cons_1: Line search tuning parameter that controls the tolerance on the Armijo sufficient decrease condition.
    • Default value: 1E-03.
  • fp_t cg_settings.wolfe_cons_2: Line search tuning parameter that controls the tolerance on the curvature condition.
    • Default value: 0.10.
  • int print_level: Set the level of detail for printing updates on optimization progress.
    • Level 0: Nothing (default).
    • Level 1: Print the iteration count and current error values.
    • Level 2: Level 1 plus the current candidate solution values, x^{(i+1)}.
    • Level 3: Level 2 plus the direction vector, d^{(i)}, and the gradient vector, \nabla_x f(x^{(i+1)}).
    • Level 4: Level 3 plus \beta^{(i)}.
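These values are adjusted by passing an algo_settings_t object to the second form of cg shown above. A minimal sketch follows; the particular values chosen here are illustrative, not recommendations:

.. code:: cpp

    #define OPTIM_ENABLE_ARMA_WRAPPERS
    #include "optim.hpp"

    inline
    double
    sphere_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
    {
        if (grad_out) {
            *grad_out = 2.0 * vals_inp;
        }

        return arma::dot(vals_inp, vals_inp);
    }

    int main()
    {
        optim::algo_settings_t settings;

        settings.iter_max = 2000;                      // iteration cap
        settings.grad_err_tol = 1E-08;                 // gradient-norm tolerance
        settings.print_level = 1;                      // iteration count and error values
        settings.cg_settings.method = 4;               // Hestenes-Stiefel update
        settings.cg_settings.restart_threshold = 0.2;  // restart parameter (nu)

        arma::vec x = arma::ones(5,1);

        bool success = optim::cg(x, sphere_fn, nullptr, settings);

        return success ? 0 : 1;
    }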

Example code for minimizing the sphere function, f(x) = x^\top x, is given below.

.. toggle-header::
    :header: **Armadillo (Click to show/hide)**

    .. code:: cpp

        #define OPTIM_ENABLE_ARMA_WRAPPERS
        #include "optim.hpp"

        inline
        double
        sphere_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
        {
            double obj_val = arma::dot(vals_inp,vals_inp);

            if (grad_out) {
                *grad_out = 2.0*vals_inp;
            }

            return obj_val;
        }

        int main()
        {
            const int test_dim = 5;

            arma::vec x = arma::ones(test_dim,1); // initial values (1,1,...,1)

            bool success = optim::cg(x, sphere_fn, nullptr);

            if (success) {
                std::cout << "cg: sphere test completed successfully." << "\n";
            } else {
                std::cout << "cg: sphere test completed unsuccessfully." << "\n";
            }

            arma::cout << "cg: solution to sphere test:\n" << x << arma::endl;

            return 0;
        }

.. toggle-header::
    :header: **Eigen (Click to show/hide)**

    .. code:: cpp

        #define OPTIM_ENABLE_EIGEN_WRAPPERS
        #include "optim.hpp"

        inline
        double
        sphere_fn(const Eigen::VectorXd& vals_inp, Eigen::VectorXd* grad_out, void* opt_data)
        {
            double obj_val = vals_inp.dot(vals_inp);

            if (grad_out) {
                *grad_out = 2.0*vals_inp;
            }

            return obj_val;
        }

        int main()
        {
            const int test_dim = 5;

            Eigen::VectorXd x = Eigen::VectorXd::Ones(test_dim); // initial values (1,1,...,1)

            bool success = optim::cg(x, sphere_fn, nullptr);

            if (success) {
                std::cout << "cg: sphere test completed successfully." << "\n";
            } else {
                std::cout << "cg: sphere test completed unsuccessfully." << "\n";
            }

            std::cout << "cg: solution to sphere test:\n" << x << std::endl;

            return 0;
        }


Example code for minimizing the Booth function is given below.

.. toggle-header::
    :header: **Armadillo (Click to show/hide)**

    .. code:: cpp

        #define OPTIM_ENABLE_ARMA_WRAPPERS
        #include "optim.hpp"

        inline
        double
        booth_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
        {
            double x_1 = vals_inp(0);
            double x_2 = vals_inp(1);

            double obj_val = std::pow(x_1 + 2*x_2 - 7.0,2) + std::pow(2*x_1 + x_2 - 5.0,2);

            if (grad_out) {
                // gradient of the Booth function
                (*grad_out)(0) = 2*(x_1 + 2*x_2 - 7.0) + 2*(2*x_1 + x_2 - 5.0)*2;
                (*grad_out)(1) = 2*(x_1 + 2*x_2 - 7.0)*2 + 2*(2*x_1 + x_2 - 5.0);
            }

            return obj_val;
        }

        int main()
        {
            arma::vec x = arma::zeros(2,1); // initial values (0,0)

            bool success = optim::cg(x, booth_fn, nullptr);

            if (success) {
                std::cout << "cg: Booth test completed successfully." << "\n";
            } else {
                std::cout << "cg: Booth test completed unsuccessfully." << "\n";
            }

            arma::cout << "cg: solution to Booth test:\n" << x << arma::endl;

            return 0;
        }

.. toggle-header::
    :header: **Eigen (Click to show/hide)**

    .. code:: cpp

        #define OPTIM_ENABLE_EIGEN_WRAPPERS
        #include "optim.hpp"

        inline
        double
        booth_fn(const Eigen::VectorXd& vals_inp, Eigen::VectorXd* grad_out, void* opt_data)
        {
            double x_1 = vals_inp(0);
            double x_2 = vals_inp(1);

            double obj_val = std::pow(x_1 + 2*x_2 - 7.0,2) + std::pow(2*x_1 + x_2 - 5.0,2);

            if (grad_out) {
                // gradient of the Booth function
                (*grad_out)(0) = 2*(x_1 + 2*x_2 - 7.0) + 2*(2*x_1 + x_2 - 5.0)*2;
                (*grad_out)(1) = 2*(x_1 + 2*x_2 - 7.0)*2 + 2*(2*x_1 + x_2 - 5.0);
            }

            return obj_val;
        }

        int main()
        {
            Eigen::VectorXd x = Eigen::VectorXd::Zero(2); // initial values (0,0)

            bool success = optim::cg(x, booth_fn, nullptr);

            if (success) {
                std::cout << "cg: Booth test completed successfully." << "\n";
            } else {
                std::cout << "cg: Booth test completed unsuccessfully." << "\n";
            }

            std::cout << "cg: solution to Booth test:\n" << x << std::endl;

            return 0;
        }