**NOTE: This notebook is written for the Google Colab platform. However, it can also be run (possibly with minor modifications) as a standard Jupyter notebook.**



In [None]:
#@title -- Import of Necessary Packages -- { display-mode: "form" }
import numpy as np
import matplotlib.pyplot as plt
import sympy as sp
from sympy.utilities.lambdify import lambdify
from scipy.optimize import minimize

In [None]:
#@title -- Downloading Data -- { display-mode: "form" }
# also create a directory for storing any outputs
import os
os.makedirs("output", exist_ok=True)

In [None]:
#@title -- Auxiliary Functions -- { display-mode: "form" }
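def plot_func(xx, yy, zz, X=None):
    """Draw a contour plot of zz over the grid (xx, yy); optionally overlay the points in X."""
    plt.contour(xx, yy, zz, cmap='Spectral')
    # use the same scale on both axes, label them, and add a colorbar
    plt.gca().set_aspect('equal')
    plt.xlabel('x'); plt.ylabel('y')
    plt.colorbar(label='z')

    # X, if given, is an array of shape (n_points, 2)
    if X is not None:
        plt.scatter(X[:, 0], X[:, 1])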

## Optimization Using `scipy`

In our next example we are going to show how optimization can be applied using the `scipy` package. This package implements several advanced methods, including second-order methods. These are typically more effective than gradient descent and its various versions, which we have considered up till now. Their disadvantage, however, is limited scalability: they typically cannot be applied to problems with a large number of parameters (and, in the context of machine learning, there is a similar scaling problem with dataset size).
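
To make the difference concrete, here is a minimal sketch (not part of the original notebook code) that compares a single Newton step with ten steps of plain gradient descent on a quadratic bowl of the form $f(x, y) = (5x)^2 + y^2$. The gradient and Hessian are written out by hand, and the learning rate of 0.03 is an arbitrary choice made for illustration.

```python
import numpy as np

def grad(p):
    # gradient of f(x, y) = (5x)^2 + y^2, written out by hand
    x, y = p
    return np.array([50.0 * x, 2.0 * y])

# the Hessian of this quadratic is constant
H = np.array([[50.0, 0.0],
              [0.0, 2.0]])

p0 = np.array([-9.0, -8.0])

# one Newton step: p - H^{-1} grad(p)
p_newton = p0 - np.linalg.solve(H, grad(p0))

# ten gradient descent steps with a fixed, illustrative learning rate
p_gd = p0.copy()
for _ in range(10):
    p_gd = p_gd - 0.03 * grad(p_gd)

print("Newton, 1 step:             {}".format(p_newton))
print("Gradient descent, 10 steps: {}".format(p_gd))
```

On a quadratic the Newton step lands on the minimum directly, while gradient descent is still far away along the shallow $y$ direction. The price is the Hessian, whose size grows quadratically with the number of parameters.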

### Defining the Objective Function

As the first step we will again define the objective function and derive its gradient.
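
For reference, and assuming the objective $f(x, y) = (5x)^2 + y^2$ used in the following cell, the gradient can also be worked out by hand, which gives something to check the symbolic result against:

$$
f(x, y) = (5x)^2 + y^2 = 25x^2 + y^2, \qquad
\nabla f(x, y) = \begin{pmatrix} 50x \\ 2y \end{pmatrix}
$$

The gradient vanishes only at $(0, 0)$, where $f = 0$, so this is the minimum the solver should find.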



In [None]:
symx, symy = sp.symbols('x y')
symf = (5*symx)**2 + symy ** 2
f = lambdify((symx, symy), symf, "numpy")

sym_grad_f = sp.Matrix([symf]).jacobian([symx, symy])
grad_f = lambdify((symx, symy), sym_grad_f, "numpy")

As usual, we will also display the visualization.



In [None]:
xx, yy = np.mgrid[-10:10.2:0.2, -10:10.2:0.2]
zz = f(xx, yy)
plot_func(xx, yy, zz)

### Minimization Using `scipy`

Next we are going to call the `minimize` function. We will specify the following arguments:

* The objective function `fun` that is to be minimized. `minimize` expects a function that accepts a single vector as its input. We therefore wrap our function in a lambda function, which unpacks the input vector into the individual arguments $x$ and $y$ using the `*` operator.
* The initial point `x0` from which the optimization starts.
* The method: we can pick one of a range of different solvers.
* Gradient: here denoted `jac`, because it is also possible to specify a full Jacobian (for vector functions).
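
The unpacking described in the first bullet can be sketched in isolation as follows. The plain-Python `f` here is a stand-in for the lambdified function, and the names `wrapped` and `value` are only illustrative.

```python
# Stand-in for the lambdified objective: takes x and y as separate arguments.
def f(x, y):
    return (5 * x) ** 2 + y ** 2

# minimize passes a single vector, so the wrapper unpacks it with *.
wrapped = lambda X: f(*X)

value = wrapped([1.0, 2.0])  # same as calling f(1.0, 2.0)
print(value)
```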


In [None]:
res = minimize(fun=lambda X: f(*X),
               x0=[-9, -8],
               method='L-BFGS-B',
               # grad_f returns a 1x2 array; flatten it to the shape (2,) that minimize expects
               jac=lambda X: np.ravel(grad_f(*X)))

The function will return an object that contains the resulting point as well as the value of the objective function at that point:



In [None]:
print("The point: {}".format(res.x))
print("The value: {}".format(res.fun))
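
Besides `x` and `fun`, the returned `OptimizeResult` carries diagnostic fields such as `success`, `message`, `nit` and `nfev`. A minimal, self-contained sketch follows; the functions `objective` and `gradient` are hand-written stand-ins for the lambdified functions, introduced here only so the block runs on its own.

```python
import numpy as np
from scipy.optimize import minimize

# Hand-written stand-ins for the notebook's objective and gradient.
def objective(p):
    return (5 * p[0]) ** 2 + p[1] ** 2

def gradient(p):
    return np.array([50 * p[0], 2 * p[1]])

res = minimize(fun=objective, x0=[-9, -8], method='L-BFGS-B', jac=gradient)

print("Converged:  {}".format(res.success))   # whether the solver reports success
print("Message:    {}".format(res.message))   # reason for termination
print("Iterations: {}".format(res.nit))
print("Function evaluations: {}".format(res.nfev))
```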

More detailed documentation of the function can be displayed using:



In [None]:
print(minimize.__doc__)

### Visualizing the Minimization

If we intend (as in the previous examples) to visualize the course of the minimization and not just its result, we can use the `callback` argument. `minimize` calls the callback with the current point after each iteration, so passing `X.append` collects the points in the list `X`.



In [None]:
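X = [[-9, -8]]  # the path, starting with the initial point

res = minimize(fun=lambda p: f(*p),
               x0=X[0],
               method='L-BFGS-B',
               # flatten the 1x2 gradient to the shape (2,) that minimize expects
               jac=lambda p: np.ravel(grad_f(*p)),
               callback=X.append)  # called with the current point after each iteration

X = np.array(X)  # shape (number of points, 2), as plot_func expects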
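
As a variation, and only as a sketch, the callback can be an ordinary function, which makes it easy to record more than the point itself. The names `path`, `values` and `record` are hypothetical, and the objective and gradient are again written by hand so that the block runs on its own.

```python
import numpy as np
from scipy.optimize import minimize

def objective(p):
    return (5 * p[0]) ** 2 + p[1] ** 2

def gradient(p):
    return np.array([50 * p[0], 2 * p[1]])

start = np.array([-9.0, -8.0])
path = [start]                 # visited points, beginning with the start
values = [objective(start)]    # objective value at each visited point

def record(xk):
    # minimize calls this with the current point after each iteration
    path.append(np.array(xk))  # store a copy of the point
    values.append(objective(xk))

res = minimize(fun=objective, x0=start, method='L-BFGS-B',
               jac=gradient, callback=record)

path = np.array(path)          # shape (number of points, 2), ready for plotting
print(path.shape, values[-1])
```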

The resulting visualization will then look as follows:



In [None]:
xx, yy = np.mgrid[-10:10.2:0.2, -10:10.2:0.2]
zz = f(xx, yy)
plot_func(xx, yy, zz, X)

### Not Specifying the Gradient

It is possible to call `minimize` without specifying the gradient (`jac`). For one thing, some solvers do not use the gradient at all. But even for the solvers that do, the gradient can be estimated numerically (by perturbing each input variable in turn). This is only practical if the input is low-dimensional: every gradient estimate costs at least one extra function evaluation per input dimension, which quickly becomes too computationally expensive.
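
The perturbation idea can be sketched with forward differences. This illustrates the principle and is not necessarily the scheme `scipy` uses internally; the helper name `numerical_gradient` and the step size `eps` are assumptions made for the example.

```python
import numpy as np

def objective(p):
    return (5 * p[0]) ** 2 + p[1] ** 2

def numerical_gradient(fun, p, eps=1e-6):
    """Forward-difference estimate: one extra call of fun per dimension."""
    p = np.asarray(p, dtype=float)
    f0 = fun(p)
    grad = np.zeros_like(p)
    for i in range(p.size):
        shifted = p.copy()
        shifted[i] += eps          # perturb only the i-th coordinate
        grad[i] = (fun(shifted) - f0) / eps
    return grad

p = np.array([-9.0, -8.0])
estimate = numerical_gradient(objective, p)
exact = np.array([50 * p[0], 2 * p[1]])   # analytic gradient (50x, 2y)
print(estimate, exact)
```

For an input of dimension $n$, each estimate of this kind needs $n + 1$ evaluations of the objective, which is why the approach does not scale to high-dimensional problems.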



In [None]:
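X = [[-9, -8]]  # the path, starting with the initial point

# no jac argument: the solver estimates the gradient numerically
res = minimize(fun=lambda p: f(*p),
               x0=X[0],
               method='L-BFGS-B',
               callback=X.append)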

In [None]:
print("The result: {}".format(res.x))
print("The function's value: {}".format(res.fun))

You can compare the result of this minimization with the one computed before. It may be slightly less precise, because the solver no longer has the exact gradient at its disposal.
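
One way to make this comparison is sketched here, with hand-written stand-ins (`objective`, `gradient`) for the lambdified functions so that the block is self-contained:

```python
import numpy as np
from scipy.optimize import minimize

def objective(p):
    return (5 * p[0]) ** 2 + p[1] ** 2

def gradient(p):
    return np.array([50 * p[0], 2 * p[1]])

x0 = [-9, -8]
res_exact = minimize(fun=objective, x0=x0, method='L-BFGS-B', jac=gradient)
res_numeric = minimize(fun=objective, x0=x0, method='L-BFGS-B')  # gradient estimated numerically

for name, res in [("analytic gradient", res_exact),
                  ("numerical gradient", res_numeric)]:
    print("{}: x = {}, f = {}, nfev = {}".format(name, res.x, res.fun, res.nfev))
```

The field `nfev` counts evaluations of the objective. The run with the numerically estimated gradient is expected to need more of them, because each gradient estimate costs additional evaluations.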

