In [None]:
from IPython.display import HTML
# Widen the notebook cells to 95% of the browser window
HTML("<style>.container { width:95% !important; }</style>")

# Lecture 5: scipy.optimize

# Optimization software
* When we want to optimize something, we do not need to start from scratch. It is good to know how the algorithms work, but if developing new algorithms is not the main goal, we can simply use existing packages and libraries.
* First, we look at some of the available software, and then we take a closer look at scipy.optimize.



## Have you used any optimization software before? Please share your experiences.

## Wolfram Alpha
* Free web-based computational engine built on Mathematica
* http://www.wolframalpha.com/
* Can perform either symbolic or numerical calculations
* Also includes some basic optimization

## Rosenbrock function
A non-convex function
$$
f(x) = (1-x_1)^2 +100(x_2-x_1^2)^2
$$
that has a global minimum at $x^*=(1,1)^T$, where $f(x^*)=0$. The minimum is located in a narrow, banana-shaped valley.

The coefficient of the second term can be adjusted, but this does not change the location of the global minimum. The Rosenbrock function is a standard test problem for optimization algorithms: finding the valley is easy, but converging along it to the global minimum is difficult.


In [None]:
def f_rosenb(x):
    return (1.0 -x[0])**2 + 100*(x[1] - x[0]**2)**2
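
SciPy also ships the Rosenbrock function together with its gradient and Hessian as `scipy.optimize.rosen`, `rosen_der` and `rosen_hess` (an $n$-dimensional version that reduces to the formula above for two variables). The sketch below redefines `f_rosenb` so that it runs on its own, checks that both definitions agree, and evaluates the derivatives at $x^*=(1,1)^T$. The derivative functions are handy when trying gradient-based methods on this problem.

```python
import numpy as np
from scipy.optimize import rosen, rosen_der, rosen_hess

def f_rosenb(x):
    return (1.0 - x[0])**2 + 100*(x[1] - x[0]**2)**2

x_star = np.array([1.0, 1.0])
x_test = np.array([-2.0, -10.0])

# The hand-written function and SciPy's version agree in two dimensions
print(f_rosenb(x_test), rosen(x_test))
# At the global minimum the value and the gradient are zero
print(rosen(x_star), rosen_der(x_star))
# Hessian at the minimum: positive definite but badly conditioned
print(rosen_hess(x_star))
```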

## Matlab - Optimization toolbox
* Interactive environment for numerical computing
* Subroutines for unconstrained optimization:
  * fminbnd: find minimum of single-variable function on fixed interval
  * fminsearch: find minimum of unconstrained multivariable function using derivative-free method
  * fminunc: find minimum of unconstrained multivariable function using gradient-based method
* The Matlab source code of these subroutines can be found in the Matlab installation directory, for example:
 ..\MATLAB\R2013a\toolbox\optim\optim\
* You can also use Octave (https://www.gnu.org/software/octave/), an open-source environment that is compatible with many Matlab scripts


# Optimization with scipy.optimize
In Python, there are multiple packages for optimization. In this lecture, we take a look at the *scipy.optimize* package.

## Starting up

To study a package in Python, we first import what we need from it:

In [None]:
from scipy.optimize import minimize

To see the documentation, type the name of the function followed by two question marks and run the cell (`??` shows the docstring and, where available, the source code; a single `?` shows only the docstring):

In [None]:
minimize??

## Optimization of multiple variables

Let us define again our friendly objective function:

In [None]:
def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2+x[0]**2
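
Because this function is a sum of squares, it can be minimized by hand, which gives a reference point for checking the numerical results below. Setting the gradient to zero gives
$$
\nabla f(x) = \begin{pmatrix} 2(x_1-10) + 2x_1 \\ 2(x_2+5) \end{pmatrix} = \begin{pmatrix} 4x_1-20 \\ 2(x_2+5) \end{pmatrix} = 0
\quad\Longrightarrow\quad x^* = (5,-5)^T,\qquad f(x^*) = 25 + 0 + 25 = 50,
$$
and since the Hessian $\nabla^2 f = \mathrm{diag}(4,2)$ is positive definite, $x^*$ is the global minimum.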

### Method: `Nelder-Mead`

![Nelder-Mead simplex steps](images/nelder_mead.png "Nelder-Mead")
The documentation has the following to say:

<pre>
    Method :ref:`Nelder-Mead <optimize.minimize-neldermead>` uses the
    Simplex algorithm [1]_, [2]_. This algorithm has been successful
    in many applications but other algorithms using the first and/or
    second derivatives information might be preferred for their better
    performances and robustness in general.
...
     References
    ----------
    .. [1] Nelder, J A, and R Mead. 1965. A Simplex Method for Function
        Minimization. The Computer Journal 7: 308-13.
    .. [2] Wright M H. 1996. Direct search methods: Once scorned, now
        respectable, in Numerical Analysis 1995: Proceedings of the 1995
        Dundee Biennial Conference in Numerical Analysis (Eds. D F
        Griffiths and G A Watson). Addison Wesley Longman, Harlow, UK.
        191-208.
</pre>
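
The docstring does not list the steps of the method. Below is a minimal sketch of the textbook iteration, assuming the standard coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5). `nelder_mead_sketch` is a hypothetical name, and SciPy's implementation differs in details such as the initial simplex and the stopping rule. Note that this "simplex" is just a set of $n+1$ points; it is unrelated to the simplex method of linear programming.

```python
import numpy as np

def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

def nelder_mead_sketch(f, x0, step=1.0, xtol=1e-8, ftol=1e-10, max_iter=5000):
    """Textbook Nelder-Mead (illustration only, not SciPy's code)."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    # Initial simplex: x0 and x0 + step * e_i for each coordinate direction
    simplex = np.vstack([x0] + [x0 + step * np.eye(n)[i] for i in range(n)])
    fvals = np.array([f(v) for v in simplex])
    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst
        order = np.argsort(fvals)
        simplex, fvals = simplex[order], fvals[order]
        # Stop when the simplex is small and its function values are close
        if (np.max(np.abs(simplex[1:] - simplex[0])) < xtol
                and np.max(np.abs(fvals[1:] - fvals[0])) < ftol):
            break
        centroid = simplex[:-1].mean(axis=0)   # centroid of all but the worst
        worst = simplex[-1]
        xr = centroid + (centroid - worst)     # reflection
        fr = f(xr)
        if fvals[0] <= fr < fvals[-2]:
            simplex[-1], fvals[-1] = xr, fr
        elif fr < fvals[0]:
            xe = centroid + 2.0 * (centroid - worst)   # expansion
            fe = f(xe)
            simplex[-1], fvals[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:
            if fr < fvals[-1]:
                xc = centroid + 0.5 * (xr - centroid)      # outside contraction
                fc = f(xc)
                accepted = fc <= fr
            else:
                xc = centroid + 0.5 * (worst - centroid)   # inside contraction
                fc = f(xc)
                accepted = fc < fvals[-1]
            if accepted:
                simplex[-1], fvals[-1] = xc, fc
            else:
                # Shrink every vertex halfway towards the best one
                simplex[1:] = simplex[0] + 0.5 * (simplex[1:] - simplex[0])
                fvals[1:] = [f(v) for v in simplex[1:]]
    best = np.argmin(fvals)
    return simplex[best], fvals[best]

x_nm, f_nm = nelder_mead_sketch(f_simple, [0.0, 0.0])
print(x_nm, f_nm)
```

Only function values are used, which is why the method needs no gradient.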

In [None]:
res = minimize(f_simple, [0, 0], method='Nelder-Mead',  # derivative-free simplex method
               options={'disp': True})
print(res.x)
res = minimize(f_simple, [0, 0], method='Powell',  # Powell's derivative-free conjugate direction method
               options={'disp': True})
print(res.x)

In [None]:
print(type(res))
print(res)
print(res.message)
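
The object returned by `minimize` is an `OptimizeResult`, a dictionary subclass whose keys can also be read as attributes. A short sketch of the fields used most often (which fields are present depends on the method; the ones below are reported by Nelder-Mead):

```python
import numpy as np
from scipy.optimize import minimize

def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

res = minimize(f_simple, np.zeros(2), method='Nelder-Mead')

print(res.success)   # True if the optimizer reports convergence
print(res.x)         # solution found
print(res.fun)       # objective value at res.x
print(res.nfev)      # number of function evaluations
print(res.message)   # human-readable reason for termination
print(sorted(res.keys()))
```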

In [None]:
res = minimize(f_rosenb,[-2.0,-10],method='Nelder-Mead', 
         options={'disp': True})

In [None]:
print(res)

### Method: `CG`
* The idea is to improve the convergence of steepest descent
* Each new search direction is a combination of the current negative gradient (the steepest descent direction) and the previous search direction


The documentation has the following to say:
<pre>
    Method :ref:`CG <optimize.minimize-cg>` uses a nonlinear conjugate
    gradient algorithm by Polak and Ribiere, a variant of the
    Fletcher-Reeves method described in [5]_ pp.  120-122. Only the
    first derivatives are used.
...
   References
    ----------
...
    .. [5] Nocedal, J, and S J Wright. 2006. Numerical Optimization.
       Springer New York.
</pre>
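
In nonlinear CG the new search direction combines the negative gradient with the previous direction,
$$
d_k = -\nabla f(x_k) + \beta_k d_{k-1}, \qquad
\beta_k^{PR} = \frac{\nabla f(x_k)^T\left(\nabla f(x_k)-\nabla f(x_{k-1})\right)}{\nabla f(x_{k-1})^T\nabla f(x_{k-1})}.
$$
The sketch below implements this Polak-Ribiere update under a few assumptions: the gradient of `f_simple` is coded by hand, the step length comes from `scipy.optimize.line_search` (a Wolfe line search), and $\beta_k$ is clipped at zero (the common "PR+" safeguard). `cg_polak_ribiere` is a hypothetical name; this is an illustration, not SciPy's `method='CG'` code.

```python
import numpy as np
from scipy.optimize import line_search

def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

def grad_f_simple(x):
    # Hand-derived gradient of f_simple
    return np.array([4.0 * x[0] - 20.0, 2.0 * (x[1] + 5.0)])

def cg_polak_ribiere(f, grad, x0, tol=1e-8, max_iter=200):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                      # first direction: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Wolfe line search along d (c2=0.1 is a common choice for CG)
        alpha = line_search(f, grad, x, d, gfk=g, c2=0.1)[0]
        if alpha is None:       # line search failed: restart with steepest descent
            d = -g
            alpha = line_search(f, grad, x, d, gfk=g)[0]
            if alpha is None:
                break
        x_new = x + alpha * d
        g_new = grad(x_new)
        # Polak-Ribiere coefficient, clipped at zero ("PR+")
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

x_cg = cg_polak_ribiere(f_simple, grad_f_simple, [0.0, 0.0])
print(x_cg)
```

On a two-variable quadratic such as `f_simple`, CG with exact line searches reaches the minimum in two iterations; the inexact Wolfe search may need a few more.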
The conjugate gradient method needs the gradient. The documentation has the following to say:
<pre>
    jac : bool or callable, optional
        Jacobian (gradient) of objective function. Only for CG, BFGS,
        Newton-CG, L-BFGS-B, TNC, SLSQP, dogleg, trust-ncg.
        If `jac` is a Boolean and is True, `fun` is assumed to return the
        gradient along with the objective function. If False, the
        gradient will be estimated numerically.
        `jac` can also be a callable returning the gradient of the
        objective. In this case, it must accept the same arguments as `fun`.
</pre>
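
Both forms of `jac` described above can be tried with a hand-derived gradient, $\nabla f(x) = (4x_1 - 20,\ 2(x_2+5))^T$ for `f_simple`, instead of an automatic differentiation package. `grad_f_simple` and `f_and_grad` are illustrative names. `scipy.optimize.check_grad` compares the hand-written gradient with a finite-difference estimate and should return a number close to zero.

```python
import numpy as np
from scipy.optimize import minimize, check_grad

def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

def grad_f_simple(x):
    # Hand-derived gradient of f_simple
    return np.array([4.0 * x[0] - 20.0, 2.0 * (x[1] + 5.0)])

def f_and_grad(x):
    # With jac=True, fun must return the pair (value, gradient)
    return f_simple(x), grad_f_simple(x)

# Sanity check of the hand-written gradient against finite differences
grad_error = check_grad(f_simple, grad_f_simple, np.array([1.0, 2.0]))
print(grad_error)

res_callable = minimize(f_simple, [0.0, 0.0], method='CG', jac=grad_f_simple)
res_combined = minimize(f_and_grad, [0.0, 0.0], method='CG', jac=True)
print(res_callable.x, res_callable.nfev)
print(res_combined.x, res_combined.nfev)
```

The `jac=True` form is convenient when the value and the gradient share intermediate computations.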

### Estimating the gradient numerically

In [None]:
res = minimize(f_simple, [0,0], method='CG', #Conjugate gradient method
               options={'disp': True})
print(res.x)

### Giving the gradient with the `ad` package (automatic differentiation)

In [None]:
import ad
res = minimize(f_simple, [0,0], method='CG', #Conjugate gradient method
               options={'disp': True}, jac=ad.gh(f_simple)[0])
print(res.x)

### Method: `Newton-CG` 

The Newton-CG method uses a Newton-CG algorithm [5] p. 168 (also known as the truncated Newton method). It uses a CG method to compute the search direction. See also the *TNC* method for box-constrained minimization with a similar algorithm.

<pre>
    References
    ----------
    .. [5] Nocedal, J, and S J Wright. 2006. Numerical Optimization.
       Springer New York.
</pre>
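
The truncated Newton idea can be sketched as follows: the Newton system $\nabla^2 f(x)\,p = -\nabla f(x)$ is solved only approximately with linear CG, which needs Hessian-vector products rather than the full Hessian. This is a simplified illustration; the helper names are hypothetical and the stopping rules are simpler than SciPy's. For `f_simple` the Hessian is $\mathrm{diag}(4,2)$, so the fully converged inner CG returns the exact Newton step, and one step from $(0,0)^T$ lands on $(5,-5)^T$.

```python
import numpy as np

def grad_f_simple(x):
    return np.array([4.0 * x[0] - 20.0, 2.0 * (x[1] + 5.0)])

def hessp_f_simple(x, p):
    # Hessian of f_simple is diag(4, 2); return H @ p without forming H
    return np.array([4.0 * p[0], 2.0 * p[1]])

def newton_cg_direction(hessp, x, g, tol=1e-10, max_iter=50):
    """Approximately solve H p = -g by linear CG using only H @ v products.
    Stops early (truncates) on a small residual or on negative curvature."""
    p = np.zeros_like(g)
    r = -g.copy()            # residual of H p = -g at p = 0
    d = r.copy()
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) < tol:
            break
        Hd = hessp(x, d)
        curvature = d @ Hd
        if curvature <= 0:   # negative curvature: stop with what we have
            if not p.any():
                p = -g       # fall back to steepest descent
            break
        alpha = rr / curvature
        p = p + alpha * d
        r = r - alpha * Hd
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
    return p

x = np.zeros(2)
p = newton_cg_direction(hessp_f_simple, x, grad_f_simple(x))
print(x + p)
```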


The Newton-CG algorithm requires the gradient (Jacobian) and also makes use of the Hessian. The documentation has the following to say:
<pre>
    hess, hessp : callable, optional
        Hessian (matrix of second-order derivatives) of objective function or
        Hessian of objective function times an arbitrary vector p.  Only for
        Newton-CG, dogleg, trust-ncg.
        Only one of `hessp` or `hess` needs to be given.  If `hess` is
        provided, then `hessp` will be ignored.  If neither `hess` nor
        `hessp` is provided, then the Hessian product will be approximated
        using finite differences on `jac`. `hessp` must compute the Hessian
        times an arbitrary vector.
 </pre>
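
Following the docstring above, second-order information can be passed either as a full matrix (`hess`) or as a Hessian-vector product (`hessp`), or left out, in which case the products are approximated by finite differences of `jac`. A sketch with hand-derived derivatives of `f_simple` (its Hessian is the constant matrix $\mathrm{diag}(4,2)$); the helper names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

def grad_f_simple(x):
    return np.array([4.0 * x[0] - 20.0, 2.0 * (x[1] + 5.0)])

def hess_f_simple(x):
    # Constant Hessian of f_simple
    return np.array([[4.0, 0.0], [0.0, 2.0]])

def hessp_f_simple(x, p):
    # Hessian times an arbitrary vector p, without forming the matrix
    return np.array([4.0 * p[0], 2.0 * p[1]])

x0 = np.zeros(2)
res_hess = minimize(f_simple, x0, method='Newton-CG',
                    jac=grad_f_simple, hess=hess_f_simple)
res_hessp = minimize(f_simple, x0, method='Newton-CG',
                     jac=grad_f_simple, hessp=hessp_f_simple)
# No hess/hessp: Hessian products approximated by finite differences of jac
res_fd = minimize(f_simple, x0, method='Newton-CG', jac=grad_f_simple)
for r in (res_hess, res_hessp, res_fd):
    print(r.x, r.nit)
```

`hessp` is useful for large problems where storing the full Hessian would be too expensive.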

### Trying without reading the documentation

In [None]:
import numpy as np
res = minimize(f_simple, [0,0], method='Newton-CG', #Newton-CG method
               options={'disp': True})
print(res.x)

### Giving the gradient

In [None]:
import ad
res = minimize(f_simple, [0,0], method='Newton-CG', #Newton-CG method
               options={'disp': True},jac=ad.gh(f_simple)[0])
print(res.x)

### Also giving the Hessian

In [None]:
import ad
res = minimize(f_simple, [0,0], method='Newton-CG', #Newton-CG method
               options={'disp': True},jac=ad.gh(f_simple)[0],
               hess=ad.gh(f_simple)[1])
print(res.x)

## Another example
$$
\begin{align}
\min \quad & (x_1-2)^4+(x_1 - 2x_2)^2\\
\text{s.t.}\quad &x_1,x_2\in\mathbb R
\end{align}  
$$
Since both terms are nonnegative and both vanish at $x=(2,1)^T$, the optimal solution is $x^*=(2,1)^T$ with $f(x^*)=0$.
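
One property of this example is worth checking: at $x^*$ the Hessian is singular, because the quartic term is flat there. With hand-derived derivatives,
$\nabla^2 f(x) = \begin{pmatrix} 12(x_1-2)^2+2 & -4 \\ -4 & 8 \end{pmatrix}$, which at $x^*$ has eigenvalues $0$ and $10$. Newton-type methods therefore typically converge only linearly near $x^*$ on this problem. The sketch below checks this numerically; the derivative helpers are illustrative names.

```python
import numpy as np

def f_simple2(x):
    return (x[0] - 2.0)**4 + (x[0] - 2.0*x[1])**2

def grad_f_simple2(x):
    return np.array([4.0*(x[0] - 2.0)**3 + 2.0*(x[0] - 2.0*x[1]),
                     -4.0*(x[0] - 2.0*x[1])])

def hess_f_simple2(x):
    return np.array([[12.0*(x[0] - 2.0)**2 + 2.0, -4.0],
                     [-4.0, 8.0]])

x_star = np.array([2.0, 1.0])
print(f_simple2(x_star))                            # objective value at x*
print(grad_f_simple2(x_star))                       # gradient at x*
print(np.linalg.eigvalsh(hess_f_simple2(x_star)))   # Hessian eigenvalues at x*
```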

In [None]:
def f_simple2(x):
    return (x[0] - 2.0)**4 + (x[0] - 2.0*x[1])**2

In [None]:
import numpy as np
import ad
x0 = np.array([-96.0, -2000.0])  # floating-point starting point
res = minimize(f_simple2,x0,method='Nelder-Mead', 
         options={'disp': True})
print(res.x)
res = minimize(f_simple2,x0,method='Powell', 
         options={'disp': True})
print(res.x)
res = minimize(f_simple2, x0, method='CG', #Conjugate gradient method
               options={'disp': True}, jac=ad.gh(f_simple2)[0])
print(res.x)
res = minimize(f_simple2, x0, method='Newton-CG', #Newton-CG method
               options={'disp': True},jac=ad.gh(f_simple2)[0],
               hess=ad.gh(f_simple2)[1])
print(res.x)

In [None]:
# Add here optimization of the Rosenbrock function with gradient based algorithms

## Line search

In [None]:
def f_singlevar(x):
    return 2+(1-x)**2

In [None]:
from scipy.optimize import minimize_scalar
minimize_scalar??

### Method: `Golden`

The documentation has the following to say:
<pre>
    Method :ref:`Golden <optimize.minimize_scalar-golden>` uses the
    golden section search technique. It uses analog of the bisection
    method to decrease the bracketed interval. It is usually
    preferable to use the *Brent* method.
</pre>
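
The bracketing idea in the docstring can be sketched as a plain golden-section search on a fixed interval `[a, b]`, which we assume contains a single minimum. Each step compares two interior points, discards the part of the interval that cannot contain the minimum, and reuses one of the two function values, so only one new evaluation is needed per step. `golden_section` is a hypothetical name; SciPy's `method='golden'` also finds a bracket first and uses its own stopping rule.

```python
import math

def f_singlevar(x):
    return 2 + (1 - x)**2

def golden_section(f, a, b, tol=1e-5):
    """Golden-section search on [a, b] for a unimodal f."""
    invphi = (math.sqrt(5) - 1) / 2      # 1/phi, about 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2

x_gs = golden_section(f_singlevar, -3.0, 5.0)
print(x_gs)
```

The interval shrinks by the factor $1/\varphi \approx 0.618$ per iteration regardless of how smooth $f$ is.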

In [None]:
minimize_scalar(f_singlevar,method='golden',tol=0.00001)

### Method: `Brent`

The documentation has the following to say about the Brent method:

    Method *Brent* uses Brent's algorithm to find a local minimum.
    The algorithm uses inverse parabolic interpolation when possible to
    speed up convergence of the golden section method.
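
The parabolic step at the heart of Brent's method fits a parabola through three points and jumps to its vertex. The sketch below shows only that single step (`parabolic_step` is a hypothetical helper); Brent's algorithm additionally checks whether the step is acceptable and falls back to golden-section steps otherwise. Because `f_singlevar` is itself a parabola, one step from the bracket $(0, 0.5, 3)$ lands on the minimizer $x=1$ up to rounding.

```python
from scipy.optimize import minimize_scalar

def f_singlevar(x):
    return 2 + (1 - x)**2

def parabolic_step(f, a, b, c):
    """Vertex of the parabola through (a, f(a)), (b, f(b)), (c, f(c))."""
    fa, fb, fc = f(a), f(b), f(c)
    num = (b - a)**2 * (fb - fc) - (b - c)**2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den

x_par = parabolic_step(f_singlevar, 0.0, 0.5, 3.0)
print(x_par)

# For comparison: Brent's method started from the same bracket
res_brent = minimize_scalar(f_singlevar, bracket=(0.0, 0.5, 3.0), method='brent')
print(res_brent.x, res_brent.nfev)
```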

In [None]:
minimize_scalar(f_singlevar,method='brent')