## SciPy's optimize package

* The scipy.optimize package provides several commonly used optimization algorithms including
   * Unconstrained and constrained minimization of multivariate scalar functions (minimize) using a variety of algorithms
   * Global optimization routines (e.g., basinhopping, differential_evolution, brute)
   * Least-squares minimization (leastsq) and curve fitting (curve_fit) algorithms
   * Minimizers for scalar univariate functions (minimize_scalar) and root finders (newton)
   * Multivariate equation system solvers (root) using a variety of algorithms (e.g., hybrid Powell, Levenberg-Marquardt or large-scale methods such as Newton-Krylov).
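
As a quick taste of two entries in the list above, here is a minimal sketch that calls `minimize` on SciPy's built-in Rosenbrock test function and `newton` on a simple scalar equation. The starting points and the choice of the BFGS method are illustrative assumptions, not recommendations.

```python
import numpy as np
from scipy import optimize

# Minimize the 2-D Rosenbrock function; its global minimum is at (1, 1).
# optimize.rosen and optimize.rosen_der are test helpers shipped with SciPy.
result = optimize.minimize(
    optimize.rosen,
    x0=np.array([-1.2, 1.0]),   # illustrative starting point
    jac=optimize.rosen_der,     # analytic gradient
    method="BFGS",
)
print("minimizer:", result.x)

# Find a root of cos(x) - x = 0 (secant method, since no derivative is given).
root = optimize.newton(lambda x: np.cos(x) - x, x0=1.0)
print("root:", root)
```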


In [None]:
import numpy as np
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
from scipy import optimize

### Let's get our feet wet with the **curve_fit** function...

* Uses non-linear least squares to fit a function, f, to data.
* Assumes ydata = f(xdata, *params) + eps
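
The assumption `ydata = f(xdata, *params) + eps` can be sketched with synthetic data: generate points from a known model, add noise, and check that `curve_fit` recovers parameters close to the true ones. The model, the "true" parameter values, the noise level, and the names `model` and `popt_demo` are all made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model: exponential decay with amplitude a and rate b
def model(x, a, b):
    return a * np.exp(-b * x)

rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
true_params = (2.5, 1.3)  # illustrative values

# ydata = f(xdata, *params) + eps, with eps drawn from a Gaussian
y = model(x, *true_params) + rng.normal(scale=0.05, size=x.size)

# curve_fit returns the best-fit parameters and their estimated covariance
popt_demo, pcov_demo = curve_fit(model, x, y)
print("fitted:", popt_demo, "true:", true_params)
```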

<img src="images/xkcd-curve-fitting.png" height="200" />

### ...and try our hand at curve fitting with a parametric function or two!

In [None]:
## Load and show the data we'll fit

pts = np.load('images/xkcd.points.npy')
plt.gca().invert_yaxis()
plt.scatter(*pts.T)

In [None]:
## Preliminary definitions and bookkeeping

from scipy.optimize import curve_fit

# Put the points in order for line plots
sorted_pts = pts[pts[:,0].argsort()]
xdata, ydata = sorted_pts[:,0], sorted_pts[:,1]


# Define a function to compute r^2 for assessing goodness of fit
# NOTE: for a least-squares fit that includes an intercept term, r^2 lies in [0, 1];
# the closer to 1, the better the fit. Other models can give a negative r^2.
def get_r2(func, popt, xdata, ydata, yhat=None):
    '''
    R^2 measures how much variance is captured by the model.
    '''
    if yhat is None:
        yhat = func(xdata, *popt)
    sse = ((ydata - yhat)**2).sum()
    avg_y = np.mean(ydata)
    sst = ((ydata - avg_y)**2).sum()
    r2 = 1 - sse/sst
    return r2

# Define a convenience function for plotting the fitted curve with the points
def show_curve(func, popt, pcov, xdata, ydata):
    plt.gca().invert_yaxis()
    plt.plot(xdata, func(xdata, *popt), 'g--')
    plt.scatter(xdata, ydata, c='blue')
    plt.show()
    print(f'params={popt},\npcov={pcov},\nerror={np.sqrt(np.diag(pcov))},\nr^2={get_r2(func, popt, xdata, ydata)}')

### Step 1: Define the parametric function to fit

In [None]:
# Let's just do a linear function for starters:
def func_lin(x, m, b):
    # y = mx + b
    # curve_fit will optimize the parameters m (slope) and b (y-intercept)
    return m * x + b

### Step 2: Call curve_fit with the function and data

In [None]:
# get the optimal values for the function's parameters and their estimated covariance
popt, pcov = optimize.curve_fit(func_lin, xdata, ydata)

show_curve(func_lin, popt, pcov, xdata, ydata)
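
To make the second return value more concrete, here is a minimal sketch on synthetic data showing how the diagonal of `pcov` yields one-standard-deviation error estimates for each parameter. The data, the noise level, and the names `line` and `perr` are illustrative assumptions and are not part of the notebook's dataset.

```python
import numpy as np
from scipy.optimize import curve_fit

# Same linear form as func_lin, redefined so this sketch is self-contained
def line(x, m, b):
    return m * x + b

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = line(x, 3.0, -2.0) + rng.normal(scale=1.0, size=x.size)  # made-up data

popt_demo, pcov_demo = curve_fit(line, x, y)

# The square roots of the diagonal of the covariance matrix are the
# estimated standard errors of the fitted parameters
perr = np.sqrt(np.diag(pcov_demo))
for name, value, err in zip(("m", "b"), popt_demo, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

The off-diagonal entries of `pcov` describe how the parameter estimates co-vary; for a line, the slope and intercept estimates are typically correlated.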

### Exercise: Choose and fit another (parametric) curve from the xkcd cartoon to these points and show the results. How did your curve compare to the cartoon? Is your function a better or worse fit for the data than linear?

For example, to fit a quadratic curve use the following function:
```
   y = ax^2 + bx + c
```

The parameters to be fit are a, b, and c.

In [None]:
# Exercise: Fit a parametric curve to the points represented by xdata and ydata

# Step 1: Define a function (with the correct parameters)
def func_2(...):
    # Define the function here
    pass
    
# Step 2: Call curve_fit (with the correct parameters)
popt, pcov = optimize.curve_fit(...)

# Show the curve
show_curve(func_2, popt, pcov, xdata, ydata)

In [None]:
# Solution
# Exercise: Fit a parametric curve to the points represented by xdata and ydata

# Step 1: Define a function (with the correct parameters)
def func_2(x, a, b, c):
    return a * x**2 + b * x + c
    
# Step 2: Call curve_fit (with the correct parameters)
popt, pcov = optimize.curve_fit(func_2, xdata, ydata)

# Show the curve
show_curve(func_2, popt, pcov, xdata, ydata)

In [None]:
# Defining the "no slope" curve illustrates how we're relying on array, not scalar, calculations
def func_no_slope(x, b):
    return np.ones(x.shape) * b

popt, pcov = optimize.curve_fit(func_no_slope, xdata, ydata)
show_curve(func_no_slope, popt, pcov, xdata, ydata)
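
A useful sanity check on the "no slope" fit: the least-squares constant is the mean of the y values, so its r^2 should be essentially zero. The sketch below checks this on synthetic data; the data and the names `constant` and `b_fit` are illustrative assumptions, and `np.full_like` is used here as one alternative way to return an array shaped like `x`.

```python
import numpy as np
from scipy.optimize import curve_fit

def constant(x, b):
    # Return an array with the same shape as x, filled with b
    return np.full_like(x, b, dtype=float)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
y = 4.0 + rng.normal(scale=0.5, size=x.size)  # made-up data

(b_fit,), _ = curve_fit(constant, x, y)

# Same r^2 formula as get_r2: 1 - SSE/SST
sse = ((y - b_fit) ** 2).sum()
sst = ((y - y.mean()) ** 2).sum()
print("fitted b:", b_fit, "mean of y:", y.mean(), "r^2:", 1 - sse / sst)
```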

In [None]:
### And it's that simple. Feel free to define other functions to fit!