# Implementing the conjugate gradient algorithm

Dmytro Fedoriaka, 2019.

Here I build a framework for testing the performance of optimization algorithms.

Then I implement the conjugate gradient algorithm from scratch, as described in the [paper](http://users.clas.ufl.edu/hager/papers/CG/cg_compare.pdf) by Hager and Zhang, and compare
its performance with that of SciPy's conjugate gradient implementation ([minimize(method='CG')](https://docs.scipy.org/doc/scipy/reference/optimize.minimize-cg.html)).


#### Algorithm comparison

First, for each test case I check that both algorithms find the correct minimum. Then I compare their running costs.
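The correctness check can be sketched roughly as follows. This is a minimal illustration, not the code in `test_cases`: the helper `finds_minimum`, the test function `paraboloid`, and the tolerance `atol=1e-3` are all assumptions made for this example.

```python
import numpy as np
from scipy import optimize


def paraboloid(x):
    # Hypothetical test function: f(x) = sum((x - 1)^2), minimized at x = (1, ..., 1).
    return np.sum((x - 1.0) ** 2)


def paraboloid_grad(x):
    # Analytic gradient of the paraboloid above.
    return 2.0 * (x - 1.0)


def finds_minimum(algo, f, fprime, x0, x_true, atol=1e-3):
    """Return True if algo(f, x0, fprime) ends within atol of the known minimizer."""
    x_found = algo(f, x0, fprime)
    return bool(np.allclose(x_found, x_true, atol=atol))


def scipy_cg(f, x0, fprime):
    # SciPy's nonlinear conjugate gradient minimizer.
    return optimize.fmin_cg(f, x0, fprime, disp=False, gtol=1e-4)


is_correct = finds_minimum(scipy_cg, paraboloid, paraboloid_grad,
                           x0=np.zeros(3), x_true=np.ones(3))
```

Comparing against a known minimizer only works for test functions whose minimum is known in closed form, which is the case for paraboloids and the Rosenbrock function.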

As proposed in [2] (formula 32), let's define the cost of running an optimization algorithm as NF + 3 NG, where NF is the number of function evaluations and NG is the number of gradient evaluations.

For each test case below I ran both SciPy's algorithm and the Hager-Zhang algorithm and computed the ratio of their costs (SciPy's cost divided by Hager-Zhang's). If this ratio is greater than 1, the Hager-Zhang algorithm is cheaper on that test case.
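As a rough sketch of how this cost can be measured, the snippet below counts evaluations with a small wrapper and applies the NF + 3 NG formula. The names `CountingFunction`, `cost`, and `cost_ratio` are hypothetical and chosen for this illustration; the actual bookkeeping inside `test_cases` may differ.

```python
class CountingFunction:
    """Wrap a callable and count how many times it is called."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.fn(*args, **kwargs)


def cost(func_calls, grad_calls):
    # Cost model from the text: NF + 3 * NG.
    return func_calls + 3 * grad_calls


def cost_ratio(nf1, ng1, nf2, ng2):
    # Cost of algorithm 1 divided by cost of algorithm 2.
    return cost(nf1, ng1) / cost(nf2, ng2)


# Counting calls to a toy function.
square = CountingFunction(lambda x: x * x)
square(2.0)
square(3.0)

# Evaluation counts taken from the "Parabola 1D" row of the results table:
# SciPy used 5 function and 5 gradient calls, Hager-Zhang used 32 and 35.
ratio = cost_ratio(5, 5, 32, 35)  # 20 / 137
```

Wrapping both `f` and `fprime` in such counters before handing them to an optimizer gives NF and NG without touching the optimizer's code.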

#### Parameters

On my test cases the line-search parameter $\sigma \approx 0.1$ works better than the value 0.9 suggested in [2]; the code below passes `sigma=0.09`.

#### References

1. Hager, William W., and Hongchao Zhang.
   "A new conjugate gradient method with guaranteed descent and an efficient line search."
   SIAM Journal on Optimization 16.1 (2005): 170-192.
   [link](https://www.math.lsu.edu/~hozhang/papers/cg_descent.pdf)

2. Hager, William W., and Hongchao Zhang.
   "Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent."
   ACM Transactions on Mathematical Software (TOMS) 32.1 (2006): 113-137.
   [link](http://users.clas.ufl.edu/hager/papers/CG/cg_compare.pdf)

3. Nocedal, Jorge, and Stephen Wright.
   Numerical Optimization.
   Springer Science & Business Media, 2006.

In [1]:
import importlib

import numpy as np
import pandas as pd
from scipy import optimize

import test_cases, cg_hager_zhang

# Reload the local modules so that edits to them are picked up without restarting the kernel.
importlib.reload(test_cases)
importlib.reload(cg_hager_zhang)

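# SciPy's nonlinear conjugate gradient minimizer (the baseline).
def algo_scipy(f, x0, fprime, gtol=1e-4, maxiter=1000):
    return optimize.fmin_cg(f, x0, fprime, disp=False, maxiter=maxiter, gtol=gtol)

# Hager-Zhang implementation from the local cg_hager_zhang module,
# called with the same tolerance and iteration limit as the baseline.
def algo_hz(f, x0, fprime, gtol=1e-4, maxiter=1000):
    return cg_hager_zhang.minimize_hz(f, x0, fprime,
                                      gtol=gtol,
                                      maxiter=maxiter,
                                      sigma=0.09)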


def compare_performance(test_case, algo1, algo2):
    """Run both algorithms on one test case and return their call counts and cost ratio."""
    results1 = test_case.check_algorithm(algo1)
    assert results1['correct']
    results2 = test_case.check_algorithm(algo2)
    assert results2['correct']
    return {
        'test_case': test_case.name,
        'func_calls_1': results1['func_calls'],
        'func_calls_2': results2['func_calls'],
        'grad_calls_1': results1['grad_calls'],
        'grad_calls_2': results2['grad_calls'],
        # Cost of algo1 divided by cost of algo2; > 1 means algo2 is cheaper.
        'cost_ratio': results1['cost'] / results2['cost']
    }

results = [compare_performance(test_case, algo_scipy, algo_hz) for test_case in test_cases.TEST_CASES]
results = pd.DataFrame(results, columns=[
        'test_case', 'func_calls_1', 'func_calls_2', 'grad_calls_1', 'grad_calls_2', 'cost_ratio'])
results

 

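| | test_case | func_calls_1 | func_calls_2 | grad_calls_1 | grad_calls_2 | cost_ratio |
|---|---|---|---|---|---|---|
| 0 | Parabola 1D | 5 | 32 | 5 | 35 | 0.145985 |
| 1 | Paraboloid simplest | 4 | 10 | 4 | 12 | 0.347826 |
| 2 | Paraboloid 2D v1 | 22 | 9 | 22 | 11 | 2.095238 |
| 3 | Paraboloid 3D v1 | 22 | 9 | 22 | 11 | 2.095238 |
| 4 | Paraboloid 5D v1 | 49 | 42 | 49 | 53 | 0.975124 |
| 5 | Paraboloid 10D v1 | 214 | 264 | 214 | 321 | 0.697637 |
| 6 | Paraboloid of 4th order | 76 | 43 | 76 | 52 | 1.527638 |
| 7 | Rosenbrock 2D v1 | 62 | 75 | 62 | 117 | 0.58216 |
| 8 | Rosenbrock 2D v1 | 51 | 58 | 51 | 81 | 0.677741 |
| 9 | Rosenbrock 7D | 258 | 505 | 258 | 648 | 0.421396 |

Columns with suffix `_1` refer to SciPy's algorithm and columns with suffix `_2` to the Hager-Zhang implementation; `cost_ratio` is SciPy's cost divided by Hager-Zhang's.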
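One way to summarize the `cost_ratio` column is sketched below. The ratios are copied from the table above; using the geometric mean as an aggregate is a choice made for this illustration, not something taken from the referenced papers.

```python
import math

# cost_ratio values copied from the results table (rows 0 to 9).
cost_ratios = [0.145985, 0.347826, 2.095238, 2.095238, 0.975124,
               0.697637, 1.527638, 0.58216, 0.677741, 0.421396]

# Number of test cases where the second algorithm (Hager-Zhang) is cheaper.
hz_wins = sum(1 for r in cost_ratios if r > 1)

# The geometric mean treats a ratio of 2 and a ratio of 1/2 symmetrically.
geo_mean = math.exp(sum(math.log(r) for r in cost_ratios) / len(cost_ratios))

print(f"Hager-Zhang is cheaper on {hz_wins} of {len(cost_ratios)} test cases")
print(f"Geometric mean of cost ratios: {geo_mean:.3f}")
```

A geometric mean below 1 would indicate that, across this particular test set, SciPy's algorithm is cheaper on balance under the NF + 3 NG cost model.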
