# Optimization with scipy.optimize

When we want to optimize something, we do not, of course, need to start everything from scratch. It is good to know how the algorithms work, but if developing new algorithms is not the main point, one can simply use existing packages and libraries.

In Python, there are multiple packages for optimization. In this lecture, we are going to take a look at the *scipy.optimize* package.



## Starting up

When we want to study a package in Python, we can import it:

In [1]:
from scipy.optimize import minimize

If we want to see the documentation (and, where available, the source code), we can write the name of the function followed by two question marks and press Enter:

In [5]:
minimize??
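
The `??` syntax is specific to IPython and Jupyter. In a plain Python session, a rough equivalent is to read the docstring directly. The snippet below is a minimal sketch using the standard library's `inspect` module; printing only the first five lines is an arbitrary choice to keep the output short.

```python
import inspect
from scipy.optimize import minimize

# Fetch the cleaned-up docstring of minimize as a plain string.
doc = inspect.getdoc(minimize)

# Print only the first few lines; help(minimize) would show all of it.
print("\n".join(doc.splitlines()[:5]))
```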

## Optimization of multiple variables

Let us define again our friendly objective function:

In [4]:
def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2+x[0]**2
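
For this function the minimizer can be found by hand, which gives a reference value for the solver runs. Setting the partial derivatives to zero, $2(x_0 - 10) + 2x_0 = 0$ and $2(x_1 + 5) = 0$, gives $x_0 = 5$, $x_1 = -5$ and $f = 25 + 0 + 25 = 50$. The short check below repeats the definition of `f_simple` so that it runs on its own; the list `neighbours` is only an illustrative set of test points.

```python
def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

# Stationary point obtained by setting both partial derivatives to zero.
x_star = [5.0, -5.0]
f_star = f_simple(x_star)
print(f_star)  # 50.0

# Nearby points should not give a smaller value.
neighbours = [[5.1, -5.0], [4.9, -5.0], [5.0, -4.9], [5.0, -5.1]]
print(all(f_simple(p) > f_star for p in neighbours))  # True
```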

### Method: `Nelder-Mead`

The documentation has the following to say:

<pre>
    Method :ref:`Nelder-Mead <optimize.minimize-neldermead>` uses the
    Simplex algorithm [1]_, [2]_. This algorithm has been successful
    in many applications but other algorithms using the first and/or
    second derivatives information might be preferred for their better
    performances and robustness in general.
...
     References
    ----------
    .. [1] Nelder, J A, and R Mead. 1965. A Simplex Method for Function
        Minimization. The Computer Journal 7: 308-13.
    .. [2] Wright M H. 1996. Direct search methods: Once scorned, now
        respectable, in Numerical Analysis 1995: Proceedings of the 1995
        Dundee Biennial Conference in Numerical Analysis (Eds. D F
        Griffiths and G A Watson). Addison Wesley Longman, Harlow, UK.
        191-208.
</pre>

In [9]:
res = minimize(f_simple, [0, 0], method='Nelder-Mead',
               options={'disp': True})
print(res.x)

Optimization terminated successfully.
         Current function value: 50.000000
         Iterations: 99
         Function evaluations: 189
[ 5.00003542 -4.99997315]


In [13]:
print(type(res))
print(res)
print(res.message)

<class 'scipy.optimize.optimize.Result'>
  status: 0
    nfev: 189
 success: True
     fun: 50.000000003230383
       x: array([ 5.00003542, -4.99997315])
 message: 'Optimization terminated successfully.'
     nit: 99
Optimization terminated successfully.
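
The fields of the result object can also be read one at a time, which is convenient inside a larger program. The following is a minimal sketch that redefines `f_simple` to be self-contained and assumes a recent SciPy, where the result class is called `OptimizeResult` but carries the same attribute names as shown above.

```python
from scipy.optimize import minimize

def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

res = minimize(f_simple, [0, 0], method='Nelder-Mead')

# Check the success flag before trusting the reported solution.
if res.success:
    print("minimizer:", res.x)
    print("minimum value:", res.fun)
    print("function evaluations:", res.nfev)
else:
    print("solver failed:", res.message)
```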


### Method: `CG`

The documentation has the following to say:
<pre>
    Method :ref:`CG <optimize.minimize-cg>` uses a nonlinear conjugate
    gradient algorithm by Polak and Ribiere, a variant of the
    Fletcher-Reeves method described in [5]_ pp.  120-122. Only the
    first derivatives are used.
...
   References
    ----------
...
    .. [5] Nocedal, J, and S J Wright. 2006. Numerical Optimization.
       Springer New York.
</pre>
The conjugate gradient method needs the gradient. The documentation has the following to say:
<pre>
    jac : bool or callable, optional
        Jacobian (gradient) of objective function. Only for CG, BFGS,
        Newton-CG, L-BFGS-B, TNC, SLSQP, dogleg, trust-ncg.
        If `jac` is a Boolean and is True, `fun` is assumed to return the
        gradient along with the objective function. If False, the
        gradient will be estimated numerically.
        `jac` can also be a callable returning the gradient of the
        objective. In this case, it must accept the same arguments as `fun`.
</pre>
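
The Boolean form of `jac` described above can be illustrated as follows. This is a sketch, not part of the original lecture: the name `f_and_grad` is hypothetical, and the gradient was derived by hand from `f_simple`.

```python
import numpy as np
from scipy.optimize import minimize

def f_and_grad(x):
    """Return the objective value and its gradient as a pair."""
    value = (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2
    grad = np.array([2.0 * (x[0] - 10.0) + 2.0 * x[0],
                     2.0 * (x[1] + 5.0)])
    return value, grad

# jac=True tells minimize that fun returns (value, gradient).
res = minimize(f_and_grad, [0, 0], method='CG', jac=True)
print(res.x)
```

Returning both values from one function is useful when the objective and the gradient share expensive intermediate computations.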

### Estimating the gradient numerically

In [14]:
import numpy as np
res = minimize(f_simple, [0,0], method='CG', #Conjugate gradient method
               options={'disp': True})
print(res.x)

Optimization terminated successfully.
         Current function value: 50.000000
         Iterations: 2
         Function evaluations: 20
         Gradient evaluations: 5
[ 5.00000003 -4.99999999]
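
To see what "estimated numerically" can mean, here is a minimal forward-difference sketch. The helper `forward_difference_gradient` and the step size `h=1e-8` are illustrative assumptions; SciPy's own scheme may differ in its details. Each estimate costs extra function evaluations, which is consistent with the higher function evaluation count reported above.

```python
import numpy as np

def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

def forward_difference_gradient(f, x, h=1e-8):
    """Approximate the gradient of f at x with forward differences."""
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h  # perturb one coordinate at a time
        grad[i] = (f(x + step) - f0) / h
    return grad

# The exact gradient at the origin is [-20, 10].
g = forward_difference_gradient(f_simple, [0.0, 0.0])
print(g)
```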


### Giving the gradient with ad

In [15]:
import ad
res = minimize(f_simple, [0,0], method='CG', #Conjugate gradient method
               options={'disp': True}, jac=ad.gh(f_simple)[0])
print(res.x)

Optimization terminated successfully.
         Current function value: 50.000000
         Iterations: 2
         Function evaluations: 5
         Gradient evaluations: 5
[ 5. -5.]
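
If the `ad` package is not available, the gradient of such a small function can also be written by hand and passed as a callable. The sketch below uses a hypothetical helper `grad_f_simple` and checks it against a finite-difference estimate with `scipy.optimize.check_grad` before using it; the test point `[1.0, 2.0]` is arbitrary.

```python
import numpy as np
from scipy.optimize import minimize, check_grad

def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

def grad_f_simple(x):
    """Hand-derived gradient of f_simple."""
    return np.array([2.0 * (x[0] - 10.0) + 2.0 * x[0],
                     2.0 * (x[1] + 5.0)])

# check_grad compares the callable with a finite-difference estimate;
# a small number means the two agree.
err = check_grad(f_simple, grad_f_simple, np.array([1.0, 2.0]))
print(err)

res = minimize(f_simple, [0, 0], method='CG', jac=grad_f_simple)
print(res.x)
```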


### Method: `Newton-CG` 

The Newton-CG method uses a Newton-CG algorithm [5] pp. 168 (also known as the truncated Newton method). It uses a CG method to compute the search direction. See also the *TNC* method for box-constrained minimization with a similar algorithm.

<pre>
    References
    ----------
    .. [5] Nocedal, J, and S J Wright. 2006. Numerical Optimization.
       Springer New York.
</pre>


The Newton-CG algorithm needs the Jacobian and can also use the Hessian, either in full or as a Hessian-vector product. The documentation has the following to say:
<pre>
    hess, hessp : callable, optional
        Hessian (matrix of second-order derivatives) of objective function or
        Hessian of objective function times an arbitrary vector p.  Only for
        Newton-CG, dogleg, trust-ncg.
        Only one of `hessp` or `hess` needs to be given.  If `hess` is
        provided, then `hessp` will be ignored.  If neither `hess` nor
        `hessp` is provided, then the Hessian product will be approximated
        using finite differences on `jac`. `hessp` must compute the Hessian
        times an arbitrary vector.
 </pre>
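
As an illustration of the `hessp` option, the Hessian of `f_simple` is the constant matrix diag(4, 2), so the product with a vector `p` is easy to write down. This is a sketch with hand-derived, hypothetical helpers `grad_f_simple` and `hessp_f_simple`; it is not taken from the original lecture.

```python
import numpy as np
from scipy.optimize import minimize

def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

def grad_f_simple(x):
    return np.array([2.0 * (x[0] - 10.0) + 2.0 * x[0],
                     2.0 * (x[1] + 5.0)])

def hessp_f_simple(x, p):
    """Hessian-vector product; the Hessian of f_simple is diag(4, 2)."""
    return np.array([4.0 * p[0], 2.0 * p[1]])

res = minimize(f_simple, [0, 0], method='Newton-CG',
               jac=grad_f_simple, hessp=hessp_f_simple)
print(res.x)
```

Supplying only the product avoids forming the full matrix, which matters when the number of variables is large.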

### Trying without reading the documentation

In [16]:
res = minimize(f_simple, [0, 0], method='Newton-CG',  # Newton-CG method, no jac given
               options={'disp': True})
print(res.x)

ValueError: Jacobian is required for Newton-CG method

### Giving the gradient

In [17]:
import ad
res = minimize(f_simple, [0,0], method='Newton-CG', #Newton-CG method
               options={'disp': True},jac=ad.gh(f_simple)[0])
print(res.x)

Optimization terminated successfully.
         Current function value: 50.000000
         Iterations: 7
         Function evaluations: 8
         Gradient evaluations: 23
         Hessian evaluations: 0
[ 4.99999999 -4.99999999]


### Also giving the Hessian

In [18]:
import ad
res = minimize(f_simple, [0,0], method='Newton-CG', #Newton-CG method
               options={'disp': True},jac=ad.gh(f_simple)[0],
               hess=ad.gh(f_simple)[1])
print(res.x)

Optimization terminated successfully.
         Current function value: 50.000000
         Iterations: 7
         Function evaluations: 8
         Gradient evaluations: 7
         Hessian evaluations: 7
[ 5. -5.]
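
Because `f_simple` is quadratic, a single exact Newton step from the starting point already reaches the minimizer. The sketch below works this out with NumPy, using the hand-derived gradient and Hessian. SciPy's Newton-CG computes its search direction with a CG method rather than an exact solve, so its iteration count differs from this idealized picture.

```python
import numpy as np

x = np.array([0.0, 0.0])

# Gradient and Hessian of f_simple at x, derived by hand.
grad = np.array([2.0 * (x[0] - 10.0) + 2.0 * x[0],
                 2.0 * (x[1] + 5.0)])      # [-20, 10]
hess = np.array([[4.0, 0.0],
                 [0.0, 2.0]])

# Newton step: solve H d = -g rather than inverting H explicitly.
d = np.linalg.solve(hess, -grad)
x_new = x + d
print(x_new)  # [ 5. -5.]
```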


## Line search

In [19]:
def f_singlevar(x):
    return 2+(1-x)**2

In [27]:
from scipy.optimize import minimize_scalar
minimize_scalar??

### Method: `Golden`

The documentation has the following to say:
<pre>
    Method :ref:`Golden <optimize.minimize_scalar-golden>` uses the
    golden section search technique. It uses analog of the bisection
    method to decrease the bracketed interval. It is usually
    preferable to use the *Brent* method.
</pre>

In [34]:
minimize_scalar(f_singlevar,method='golden',tol=0.00001)

  fun: 2.0
    x: 1.0
 nfev: 30
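
The idea behind golden section search can be sketched in a few lines of plain Python. This is an illustrative implementation, not SciPy's: the function name `golden_section_search` and the starting interval `[0, 3]` are assumptions, and the function is assumed to have a single minimum inside the interval.

```python
import math

def f_singlevar(x):
    return 2 + (1 - x)**2

def golden_section_search(f, a, b, tol=1e-5):
    """Shrink [a, b] around a minimum of a unimodal function f."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

x_min = golden_section_search(f_singlevar, 0.0, 3.0)
print(x_min)
```

Each pass through the loop costs one new function evaluation and shrinks the interval by the same fixed factor, which is why the method is reliable but slow.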

### Method: `Brent`

The documentation has the following to say about the Brent method:

    Method *Brent* uses Brent's algorithm to find a local minimum.
    The algorithm uses inverse parabolic interpolation when possible to
    speed up convergence of the golden section method.

In [25]:
minimize_scalar(f_singlevar,method='brent')

  fun: 2.0
 nfev: 5
  nit: 4
    x: 0.99999998519
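
The parabolic interpolation mentioned in the documentation can be illustrated on its own. The sketch below fits a parabola through three points and returns its vertex; `parabolic_step` is a hypothetical helper and the three sample points are arbitrary. It shows only this one ingredient, not Brent's full algorithm with its safeguards. Since `f_singlevar` is itself a parabola, one step finds the minimum, which helps explain why Brent needs far fewer evaluations here than the golden section method.

```python
def f_singlevar(x):
    return 2 + (1 - x)**2

def parabolic_step(f, a, b, c):
    """Return the vertex of the parabola through (a, f(a)), (b, f(b)), (c, f(c))."""
    fa, fb, fc = f(a), f(b), f(c)
    num = (b - a)**2 * (fb - fc) - (b - c)**2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den

x_vertex = parabolic_step(f_singlevar, 0.0, 0.5, 3.0)
print(x_vertex)  # 1.0
```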