In [None]:
import numpy as np
np.random.seed(123)
import matplotlib.pyplot as plt

# Curve Fitting

Let's examine how we can use data stored in numpy arrays to perform curve fitting.

## Polynomial Functions in NumPy

`numpy` provides the class `np.poly1d`, which lets you construct and evaluate polynomial functions of arbitrary order. You can construct a polynomial by passing `np.poly1d` an array of coefficients (ordered from the highest power down to the constant term) or, with `r=True`, an array of roots.

## $y(x) = c_0 + c_1 x + c_2 x^2 + \dots + c_n x^n = \sum_{i=0}^n c_i x^i$
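
As a small illustration of the two construction modes, the sketch below builds the same quadratic once from its coefficients and once from its roots. Note that `np.poly1d` expects coefficients ordered from the highest power down to the constant term, which is the reverse of the $c_0, c_1, \dots$ ordering in the formula above. The polynomial $x^2 - 3x + 2$ is an arbitrary choice for this example.

```python
import numpy as np

# Coefficients are listed from the highest power down: x^2 - 3x + 2
p_coef = np.poly1d([1.0, -3.0, 2.0])

# The same polynomial built from its roots (x = 1 and x = 2)
p_root = np.poly1d([1.0, 2.0], r=True)

print(p_coef.coeffs)          # coefficients, highest power first
print(p_root.coeffs)          # should match p_coef up to round-off
print(p_coef(3.0))            # evaluate at x = 3: 9 - 9 + 2
print(np.sort(p_coef.roots))  # roots of the polynomial, sorted
```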

In [None]:
help(np.poly1d)

Let's plot a straight line $y(x)=mx+b$ with slope m=0.5 and y-intercept b=0.

In [None]:
x = np.linspace(-5, 5)
y = np.poly1d([0.5, 0.])(x)
plt.scatter(x, y)
plt.plot(x, y)

Let's construct a simple parabola, $y(x)=x^2$.

In [None]:
x = np.linspace(-5, 5)
y = np.poly1d([1.0, 0.0, 0.0])(x)
plt.scatter(x, y)
plt.plot(x, y)

## Polynomial Function Fitting

In this example, we attempt to fit an $n$-degree polynomial to the function $y(x) = \sin(2x)$, where $n$ takes the values 4, 6, 8, and 10.

### x ~ inputs (features, independent variables)
### y ~ outputs (labels, dependent variables)
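
Before fitting the sine curve, it can help to see what `np.polyfit` returns on data whose answer is known. The minimal sketch below samples an exact quadratic (the coefficients 2, -1, and 0.5 are made up for this illustration), fits it with `deg=2`, and wraps the result in `np.poly1d` for evaluation. `np.polyfit` returns least-squares coefficients ordered from the highest power down, so they can be passed to `np.poly1d` directly.

```python
import numpy as np

# Hypothetical test data: an exact quadratic y = 2x^2 - x + 0.5
true_coeffs = [2.0, -1.0, 0.5]
x_demo = np.linspace(-1.0, 1.0, 21)
y_demo = np.polyval(true_coeffs, x_demo)

# Least-squares fit of a degree-2 polynomial
fit_coeffs = np.polyfit(x_demo, y_demo, deg=2)
p_fit = np.poly1d(fit_coeffs)

print(fit_coeffs)                              # close to true_coeffs
print(np.max(np.abs(p_fit(x_demo) - y_demo)))  # largest residual on the samples
```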

In [None]:
x = np.linspace(0, 2*np.pi, 64)
y = np.sin(2*x)
plt.plot(x, y)
plt.show()
plt.close()

In [None]:
help(np.polyfit)

In [None]:
for deg in [4, 6, 8, 10]:
    z = np.polyfit(x, y, deg=deg)
    y_fit = np.poly1d(z)
    plt.plot(x, y_fit(x))
    plt.plot(x, y)
    plt.legend(['pred', 'target'])
    plt.title("poly deg: {}".format(deg))
    plt.show()
    plt.close()

Always be aware that a model fit to data can behave unexpectedly when applied outside the range of the training data. If the form of the model does not match the process that generated the data (here, a polynomial fit to a periodic function), you may see odd results!
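
One rough way to quantify this is to compare the root-mean-square error (RMSE) of each fit inside the training interval with its error on points just beyond it. The sketch below rebuilds the same training data so that it runs on its own; the extrapolation window $[2\pi, 2.2\pi]$ and the helper name `rmse` are choices made for this illustration.

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error between two arrays."""
    return np.sqrt(np.mean((pred - target) ** 2))

# Training data, as in the fitting example
x_train = np.linspace(0, 2*np.pi, 64)
y_train = np.sin(2*x_train)

# Points just outside the training interval
x_out = np.linspace(2*np.pi, 2.2*np.pi, 32)
y_out = np.sin(2*x_out)

results = {}
for deg in [4, 6, 8, 10]:
    p = np.poly1d(np.polyfit(x_train, y_train, deg=deg))
    # Store (in-range RMSE, extrapolated RMSE) for this degree
    results[deg] = (rmse(p(x_train), y_train), rmse(p(x_out), y_out))
    print("deg {:2d}: in-range RMSE = {:.3e}, extrapolated RMSE = {:.3e}".format(
        deg, *results[deg]))
```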

In [None]:
x_pred = np.linspace(0, 2.2*np.pi, 128)

for deg in [4, 6, 8, 10]:
    z = np.polyfit(x, y, deg=deg)
    y_fit = np.poly1d(z)
    plt.plot(x_pred, y_fit(x_pred))
    plt.plot(x_pred, np.sin(2*x_pred))
    plt.legend(['pred', 'target'])
    plt.title("poly deg: {}".format(deg))
    plt.show()
    plt.close()

## Fitting Arbitrary Functions

Here, we use the `scipy.optimize.curve_fit` function to fit a periodic function to the same data. Be aware that the starting guess for the parameters affects the outcome!
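
The sensitivity to the starting guess can be checked directly by running the same fit from two different values of `p0`. In the sketch below, `p0=[1.7]` is the guess used in this notebook, while `p0=[0.5]` is a deliberately poor guess chosen for illustration; the names `sine_model` and `sse` are likewise assumptions of this example. Because `curve_fit` performs a local least-squares search, a poor guess may settle in a different local minimum with a larger residual.

```python
import numpy as np
from scipy.optimize import curve_fit

def sine_model(x, k):
    """Periodic model with a single frequency parameter k."""
    return np.sin(k * x)

x_data = np.linspace(0, 2*np.pi, 64)
y_data = np.sin(2.0 * x_data)

def sse(k):
    """Sum of squared residuals of the model for a given k."""
    return np.sum((sine_model(x_data, k) - y_data) ** 2)

fits = {}
for guess in [1.7, 0.5]:
    pars, pcov = curve_fit(sine_model, x_data, y_data, p0=[guess])
    fits[guess] = pars[0]
    print("p0 = {:.1f} -> k = {:.4f}, SSE = {:.3e}".format(
        guess, pars[0], sse(pars[0])))
```

Comparing the two printed residuals shows whether the second starting point reached the same solution as the first.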

In [None]:
from scipy.optimize import curve_fit
help(curve_fit)

In [None]:
x = np.linspace(0, 2*np.pi, 64)
y = np.sin(2.0*x)

def func(x, k):
    return np.sin(k*x)

pars, pcov = curve_fit(func, x, y, p0=[1.7])
print(pars, pcov)

In [None]:
plt.plot(x, func(x, *pars))
plt.plot(x, y)
plt.legend(['pred', 'target'])

# Example: Estimating the Rate Constant of a First-Order Reaction

Let's consider a batch reactor in which a first-order, irreversible reaction consumes some species "A". In this problem, we'd like to determine the rate constant $k_1$ by measuring the concentration of species A, $C_A$, at fixed time intervals.

For a first-order, irreversible reaction, we have from the law of mass action:

### $\frac{dC_A}{dt} = -k_1 C_A$

This first-order differential equation has the solution:

$C_A(t) = C_{A,0} \exp(-k_1 t)$

where $C_{A,0}$ is the starting concentration of species A.
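
For completeness, the solution follows from separation of variables, taking the rate equation with the minus sign that corresponds to consumption of A, $\frac{dC_A}{dt} = -k_1 C_A$. Dividing by $C_A$ and integrating from $t=0$ (where $C_A = C_{A,0}$) to time $t$ gives:

$\frac{dC_A}{C_A} = -k_1 \, dt$

$\int_{C_{A,0}}^{C_A(t)} \frac{dC_A'}{C_A'} = -k_1 \int_0^t dt'$

$\ln \frac{C_A(t)}{C_{A,0}} = -k_1 t$

$C_A(t) = C_{A,0} \exp(-k_1 t)$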

## Generating Data

In lieu of collecting actual measurements, we will generate synthetic data using the solution to the ODE and add random noise to represent "measurement error".

In [None]:
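k = 0.5       # true rate constant k_1 (s^-1)
Ca_0 = 10.0   # initial concentration C_A0 (mol)
noise = 5e-1  # standard deviation of the Gaussian measurement error
t = np.linspace(0, 10, 20)  # 20 evenly spaced sample times
Ca = Ca_0 * np.exp(-k*t)    # exact solution of the rate equation
Ca += noise * np.random.randn(*Ca.shape)  # add zero-mean Gaussian noise
plt.scatter(t, Ca)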

## Estimating the Rate Constant with Imperfect Data

Let's use `curve_fit` to determine the rate constant $k_1$ given our "measurement" data and functional form of the concentration as a function of time.

In [None]:
def func(t, Ca_0, k):
    return Ca_0 * np.exp(-k*t)

pars, pcov = curve_fit(func, t, Ca, p0=[1., 1.])
print(pars)
plt.scatter(t, Ca)
plt.plot(t, func(t, *pars))
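
The covariance matrix returned by `curve_fit` can also be used to attach rough uncertainties to the fitted parameters: the square roots of its diagonal entries are one-standard-deviation estimates. The sketch below regenerates synthetic data with the same settings as above so that it runs on its own; the names `first_order`, `k_true`, `Ca_0_true`, and `perr` are introduced here for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

np.random.seed(123)

def first_order(t, Ca_0, k):
    """Concentration of A for a first-order, irreversible reaction."""
    return Ca_0 * np.exp(-k * t)

# Synthetic "measurements" with the same settings as the notebook
k_true, Ca_0_true, noise = 0.5, 10.0, 0.5
t = np.linspace(0, 10, 20)
Ca = first_order(t, Ca_0_true, k_true) + noise * np.random.randn(t.size)

pars, pcov = curve_fit(first_order, t, Ca, p0=[1., 1.])
perr = np.sqrt(np.diag(pcov))  # one-standard-deviation estimates

print("Ca_0 = {:.2f} +/- {:.2f}".format(pars[0], perr[0]))
print("k    = {:.3f} +/- {:.3f}".format(pars[1], perr[1]))
```

If the fit is reasonable, the true values used to generate the data should lie within a few standard deviations of the estimates.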