Machine Learning - Curve Fitting

To fit $y = A + B \log x$, just fit $y$ against $\log x$.

In [6]:
import numpy as np

x = np.array([1, 7, 20, 50, 79])
y = np.array([10, 19, 30, 35, 51])

np.polyfit(np.log(x), y, 1)

# polyfit returns coefficients highest degree first: [B, A]
# y ≈ 8.46 log(x) + 6.62

array([8.46295607, 6.61867463])
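
As a quick check, the fitted coefficients can be unpacked and evaluated at the data points. This is a minimal sketch reusing the data above; the names A, B, y_fit and residuals are introduced here for illustration only.

```python
import numpy as np

x = np.array([1, 7, 20, 50, 79])
y = np.array([10, 19, 30, 35, 51])

# polyfit returns the highest-degree coefficient first, so the slope B
# (multiplying log x) comes before the intercept A.
B, A = np.polyfit(np.log(x), y, 1)

# Evaluate the fitted model y = A + B log(x) at the original x values.
y_fit = A + B * np.log(x)

# Residuals in y; for a least-squares line with an intercept they sum to ~0.
residuals = y - y_fit
print(A, B)
print(residuals)
```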

For fitting $y=A e^{B x}$, taking the logarithm of both sides gives $\log y=\log A+B x$. So fit $\log y$ against $x$.
Note that fitting $\log y$ as if it were linear emphasizes small values of $y$, causing large deviations for large $y$. This is because polyfit (linear regression) works by minimizing $\sum_{i}(\Delta Y_{i})^{2}=\sum_{i}\left(Y_{i}-\hat{Y}_{i}\right)^{2}$. When $Y_{i}=\log y_{i}$, the residuals are

$\Delta Y_{i}=\Delta\left(\log y_{i}\right) \approx \Delta y_{i} / |y_{i}|.$

So even if polyfit makes a very bad decision for large $y$, the "divide-by-$|y|$" factor shrinks that error, causing polyfit to favor small values.
This could be alleviated by giving each entry a "weight" proportional to $y$. polyfit supports weighted least squares via the $w$ keyword argument. Because polyfit applies $w$ to the unsquared residuals, passing $w=\sqrt{y}$ weights each squared residual by $y$.

In [14]:
x = np.array([10, 19, 30, 35, 51])
y = np.array([1, 7, 20, 50, 79])

np.polyfit(x, np.log(y), 1)

#    y ≈ exp(-0.401) * exp(0.105 * x) = 0.670 * exp(0.105 * x)
# (^ biased towards small values)

np.polyfit(x, np.log(y), 1, w=np.sqrt(y))

#    y ≈ exp(1.42) * exp(0.0601 * x) = 4.12 * exp(0.0601 * x)
# (^ not so biased)

array([0.06009446, 1.41648096])
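
To make the bias concrete, both fits can be compared in the original $y$ scale. The sketch below is illustrative only: the helper sse_in_y is a name made up here, and the sum of squared errors in $y$ is just one reasonable yardstick.

```python
import numpy as np

x = np.array([10, 19, 30, 35, 51])
y = np.array([1, 7, 20, 50, 79])

def sse_in_y(coeffs):
    """Sum of squared errors in y for a fit of log(y) = log(A) + B*x.

    coeffs is the polyfit result [B, log(A)] (hypothetical helper).
    """
    B, log_A = coeffs
    y_fit = np.exp(log_A) * np.exp(B * x)
    return np.sum((y - y_fit) ** 2)

unweighted = np.polyfit(x, np.log(y), 1)
weighted = np.polyfit(x, np.log(y), 1, w=np.sqrt(y))

# The weighted fit should track the large y values more closely,
# giving a smaller error when measured in y rather than log(y).
print(sse_in_y(unweighted), sse_in_y(weighted))
```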

If scipy is available, scipy.optimize.curve_fit can fit any model directly,
without transformations.

For $y = A + B \log x$ the result is the same as with the transformation method:

In [19]:
from scipy.optimize import curve_fit


x = np.array([1, 7, 20, 50, 79])
y = np.array([10, 19, 30, 35, 51])

curve_fit(lambda t,a,b: a+b*np.log(t),  x,  y)

# curve_fit returns (popt, pcov): the best-fit [a, b] and their covariance matrix
# y ≈ 6.62 + 8.46 log(x)

(array([6.61867467, 8.46295606]),
 array([[28.15948002, -7.89609542],
        [-7.89609542,  2.9857172 ]]))
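
The same approach extends to $y = A e^{B x}$. The following is a sketch rather than a definitive recipe: the initial guess p0=(4, 0.1) is an assumption, chosen near the weighted polyfit result above, because curve_fit's default starting point (all parameters equal to 1) can behave poorly for exponential models. The name exp_model is likewise introduced here for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([10, 19, 30, 35, 51])
y = np.array([1, 7, 20, 50, 79])

def exp_model(t, a, b):
    # Model y = a * exp(b * t); fitted directly, without taking logs.
    return a * np.exp(b * t)

# p0 is an assumed starting point, not something prescribed by scipy.
popt, pcov = curve_fit(exp_model, x, y, p0=(4, 0.1))
A, B = popt

# Sum of squared errors in y, for comparison with the log-transform fits.
sse = np.sum((y - exp_model(x, A, B)) ** 2)
print(A, B, sse)
```

Since curve_fit minimizes the squared error in $y$ directly, no weighting correction is needed here.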