# Nonlinear Regression

How do we estimate the parameters for a nonlinear regression problem? For example, suppose we have some data that we believe are represented by a power law with two parameters, e.g.

\begin{equation}
\hat{y}(t) = \hat{\beta_1} t^{\hat{\beta_2}}
\end{equation}

This model is not linear in the parameters since the derivative of $\hat{y}$ with respect to $\hat{\beta_1}$ depends on $\hat{\beta_2}$. Moreover, the derivative with respect to $\hat{\beta_2}$ depends on both $\hat{\beta_1}$ and $\hat{\beta_2}$.

\begin{align}
\frac{\partial \hat{y}}{\partial \hat{\beta_1}} &= t^{\hat{\beta_2}} \\
\frac{\partial \hat{y}}{\partial \hat{\beta_2}} &= \hat{\beta_1} t^{\hat{\beta_2}} \log(t)
\end{align}
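The two partial derivatives can be checked numerically. The following is a minimal sketch, assuming illustrative parameter values and a central finite-difference step `h` that are not part of the original problem; the helper names `power_law` and `jacobian` are hypothetical.

```python
import numpy as np


def power_law(t, beta1, beta2):
    """Power-law model y = beta1 * t**beta2."""
    return beta1 * t**beta2


def jacobian(t, beta1, beta2):
    """Analytical partial derivatives with respect to beta1 and beta2."""
    d_beta1 = t**beta2
    d_beta2 = beta1 * t**beta2 * np.log(t)
    return np.column_stack([d_beta1, d_beta2])


# Hypothetical values chosen only for illustration
t = np.linspace(1, 10, 5)
beta1, beta2 = 2.0, -1.0
h = 1e-6

# Central finite differences as an independent check of each derivative
fd_beta1 = (power_law(t, beta1 + h, beta2) - power_law(t, beta1 - h, beta2)) / (2 * h)
fd_beta2 = (power_law(t, beta1, beta2 + h) - power_law(t, beta1, beta2 - h)) / (2 * h)
J_fd = np.column_stack([fd_beta1, fd_beta2])

J = jacobian(t, beta1, beta2)
print(np.allclose(J, J_fd, atol=1e-6))
```

Because both columns of `J` change when `beta2` changes, the derivatives are not constants, which is what makes the model nonlinear in the parameters.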

For example, suppose we measure a time series of ocean currents. We believe that the power spectrum of the data should be red with frequency dependence of $\omega^{-1}$.

In [None]:
import colorednoise as cn
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as sig

We can generate synthetic data with a -1 spectral slope using the `colorednoise` package.

In [None]:
n_data = 2**10
t, dt = np.linspace(0, 100, n_data, retstep=True)
u = cn.powerlaw_psd_gaussian(1, n_data, random_state=1351)

fig, ax = plt.subplots()
ax.plot(t, u)
ax.set_xlabel("t")
ax.set_ylabel("u(t)")

The power spectrum should have a -1 slope, but this is difficult to see on linear axes.

In [None]:
f, Pu = sig.welch(u, fs=1 / dt, window="hann", nperseg=256, noverlap=128)

fig, ax = plt.subplots()
ax.plot(f, Pu)
ax.set_xlabel("f")
ax.set_ylabel("$P_u(f)$")

Transforming to a log-log axis makes the power law much more obvious. 

In [None]:
fig, ax = plt.subplots()
ax.loglog(f, Pu)
ax.set_xlabel("f")
ax.set_ylabel("$P_u(f)$")

# Reference line with a -1 slope for visual comparison
f_guide = np.array([1e0, 2e0])
ax.loglog(f_guide, f_guide ** (-1), "k--")
ax.annotate("$f^{-1}$", xy=(f_guide[0], f_guide[0] ** (-1)))

Plotting on logarithmic axes is equivalent to transforming the model by taking the logarithm.

\begin{equation}
\log(\hat{y}(t)) = \log(\hat{\beta_1}) + \hat{\beta_2} \log(t)
\end{equation}

In this case, taking the logarithm transforms the problem into a linear regression problem. We could rewrite the equation above as

\begin{equation}
z = b + m x 
\end{equation}

where $z = \log(\hat{y}(t))$, $b = \log(\hat{\beta_1})$, $m=\hat{\beta_2}$ and $x = \log(t)$. This is now the univariate, two-parameter linear model. The parameters can be estimated using the linear regression techniques seen before.
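As a minimal sketch of this transformation, the cell below fits a straight line to the logarithm of synthetic power-law data using `np.polyfit`. The "true" parameters, the noise level, and the choice of multiplicative log-normal noise are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters for the synthetic data
beta1_true, beta2_true = 3.0, -1.0
t = np.linspace(1, 100, 200)

# Assumed multiplicative, log-normal noise (always positive)
eps = np.exp(rng.normal(0, 0.1, t.size))
y = beta1_true * t**beta2_true * eps

# Transform to the linear model z = b + m x
x = np.log(t)
z = np.log(y)

# polyfit returns coefficients from highest degree to lowest: slope, then intercept
m, b = np.polyfit(x, z, 1)

# Transform back to the power-law parameters
beta1_hat = np.exp(b)
beta2_hat = m
print(beta1_hat, beta2_hat)
```

The slope `m` estimates the exponent directly, while the intercept must be exponentiated to recover the amplitude.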

One key assumption in this transformation is that the errors are multiplicative, not additive. The linear regression model we saw before had additive errors, e.g. $y = \hat{y} + \epsilon$, thus allowing us to minimize $\langle \epsilon^2 \rangle = \langle (y - \hat{y})^2 \rangle$. Taking the logarithm of the additive error model impedes our ability to minimize the error. How can we rearrange $\log(\hat{y} + \epsilon)$ to minimize $\epsilon$? If the errors are multiplicative, meaning $y = \epsilon \hat{y}$, then taking the logarithm produces an additive error term that can be minimized.
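Writing out the multiplicative case makes the step explicit. Assuming $\epsilon > 0$ so that the logarithm is defined,

\begin{equation}
\log(y) = \log(\epsilon \hat{y}) = \log(\hat{y}) + \log(\epsilon)
\end{equation}

so a least squares fit in log space minimizes $\langle (\log(y) - \log(\hat{y}))^2 \rangle = \langle \log(\epsilon)^2 \rangle$, the mean square of the logarithm of the error rather than of the error itself.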

Another issue when applying linear regression to the logarithm of data is the asymmetric weighting of data in the fit. Compared with a least squares fit in linear space, small values are given more weight, so large values may not be fit well. A model difference of 1 in $\log_{10}$ space is equivalent to a multiplicative difference of $10 \times$ in linear space. Thus, a misfit of 1 in $\log_{10}$ space at a data point with a value of 100 could mean a difference of $10 \times 100 - 100 = 900$ or $100/10 - 100 = -90$. The same $\log_{10}$ misfit of 1 at a data point with a value of 1 could mean a difference of only $+9$ or $-0.9$. Much smaller! Be wary.
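The arithmetic above can be reproduced in a few lines. This is only an illustrative sketch; the variable names are hypothetical.

```python
# A misfit of 1 in log10 space corresponds to a factor of 10 in linear space
log_misfit = 1.0
factor = 10**log_misfit

diffs = {}
for y in (1.0, 100.0):
    over = y * factor - y  # model too high by one decade
    under = y / factor - y  # model too low by one decade
    diffs[y] = (over, under)
    print(f"y = {y:g}: overestimate {over:+g}, underestimate {under:+g}")
```

The same misfit in log space spans very different absolute differences depending on the size of the data value.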

What about a nonlinear model that cannot be reduced to a linear problem, for example $\hat{y}(t) = \hat{\beta_1} \cos(\hat{\beta_2} t)$? Or what if we don't want to reduce our power law to a linear regression problem because we are not sure that the assumptions hold? Then we need to use nonlinear regression.

Minimizing the mean square error $\langle \epsilon^2 \rangle$ is still the main goal of nonlinear regression. However, now we must take a new approach. In general, we are not able to derive an analytical solution, so we rely on a numerical solution.
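To give a sense of what a numerical solution looks like in practice, the sketch below fits the power law directly with `scipy.optimize.curve_fit`, without taking logarithms. The synthetic data, additive Gaussian noise level, and initial guess `p0` are assumptions for illustration only. Note that `curve_fit` is a library routine swapped in for convenience; for unbounded problems it defaults to the Levenberg-Marquardt method.

```python
import numpy as np
from scipy.optimize import curve_fit


def power_law(t, beta1, beta2):
    """Power-law model y = beta1 * t**beta2."""
    return beta1 * t**beta2


rng = np.random.default_rng(1)

# Hypothetical "true" parameters and synthetic data with additive noise
beta_true = (3.0, -1.0)
t = np.linspace(1, 10, 100)
y = power_law(t, *beta_true) + rng.normal(0, 0.05, t.size)

# Iteratively minimize the sum of squared residuals from an initial guess
beta_hat, beta_cov = curve_fit(power_law, t, y, p0=(1.0, -0.5))
print(beta_hat)
```

Unlike the linear case, the result can depend on the initial guess, so it is worth trying more than one starting point.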

## Nonlinear Least Squares

Several methods are available to solve a nonlinear least squares problem. We will consider the Gauss-Newton algorithm.