In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('default')

# Data Analysis and Curve Fitting
## Lecture 13

This lecture depends on the data file `falling_object.dat`, which must be in the current directory.

# Best fits with polynomials

### Example 1

Suppose an experiment is run and the following data is generated:

In [None]:
balldata = np.array([[0,1.302], [0.03333,1.411],[0.06667,1.5],[0.1,1.578],
                    [0.1333,1.646],[0.1667,1.703],[0.2,1.745],[0.2333,1.781],
                    [0.2667,1.807],[0.3,1.828],[0.3333,1.818],[0.3667,1.818],
                    [0.4,1.807],[0.4333,1.776],[0.4667,1.734],[0.5,1.682],
                    [0.5333,1.63],[0.567,1.552],[0.6,1.469],[0.6333,1.37],
                    [0.667,1.266],[0.7,1.151],[0.733,1.026],[0.7667,0.875],
                    [0.8,0.719],[0.8333,0.557],[0.867,0.385],[0.9,0.193],
                    [0.9333,0.005]])

We can print the data as a table. The data set is stored as a matrix with two columns: time in the first and position in the second.

In [None]:
print(balldata)

We can label each column as a separate variable

In [None]:
t = balldata[:,0]
y = balldata[:,1]

Then we can plot the data

In [None]:
fig, axes = plt.subplots(figsize=(6,4))
plt.plot(t, y, '.')
plt.xlabel('Time (s)')
plt.ylabel('Position (m)')
plt.title('Ball thrown upward')
plt.show()

In experimental physics, curve fitting is an important statistical tool for analyzing data and quantifying correlations between variables.
The command `np.polyfit` finds the coefficients of a polynomial by doing a best fit, in the least-squares sense, of the polynomial to a set of data.
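
To make "best fit in the least-squares sense" concrete, here is a minimal sketch that builds the polynomial design matrix by hand and solves the least-squares problem with `np.linalg.lstsq`. The data (`t_demo`, `y_demo`) are made up for illustration and are not the ball data. This shows the idea only; it is not a claim about how `np.polyfit` is implemented internally.

```python
import numpy as np

# Hypothetical sample data: a noisy parabola (illustration only).
rng = np.random.default_rng(0)
t_demo = np.linspace(0.0, 1.0, 20)
y_demo = 1.3 + 3.0*t_demo - 4.9*t_demo**2 + rng.normal(0.0, 0.01, t_demo.size)

# Design matrix with columns t^2, t, 1 (highest power first).
V = np.vander(t_demo, 3)

# Coefficients that minimize the sum of squared residuals |V @ c - y|^2.
coeffs_lstsq, *_ = np.linalg.lstsq(V, y_demo, rcond=None)

# The same fit through np.polyfit, for comparison.
coeffs_polyfit = np.polyfit(t_demo, y_demo, 2)

print(coeffs_lstsq)
print(coeffs_polyfit)
```

Both approaches should print the same three coefficients up to rounding.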

For example, we can fit a quadratic to the ball data like this:

In [None]:
np.polyfit(t, y, 2)

The third argument is the degree of the polynomial fit; for a quadratic the degree is 2. Notice that the function `np.polyfit` returns an array of three numbers.

These are the coefficients of the polynomial, ordered from the highest power to the lowest:

$$P(t) = a t^2 + b t + c$$ 

We could write:

In [None]:
a, b, c = np.polyfit(t, y, 2)

Then

In [None]:
y_fit = a*t**2 + b*t + c

and finally compare the best fit curve to the original data

In [None]:
fig, axes = plt.subplots(figsize=(6,4))
plt.plot(t, y, '.')
plt.plot(t, y_fit, '-') 
plt.xlabel('Time (s)')
plt.ylabel('Position (m)')
plt.title('Ball thrown upward')
plt.show()

Since evaluating a polynomial is a very common operation, there is a function for it called `np.polyval`:

In [None]:
p = np.polyfit(t, y, 2)
y_fit = np.polyval(p, t)

which gives exactly the same thing

In [None]:
fig, axes = plt.subplots(figsize=(6,4))
plt.plot(t, y, '.')
plt.plot(t, y_fit, '-') 
plt.xlabel('Time (s)')
plt.ylabel('Position (m)')
plt.title('Ball thrown upward')
plt.show()

### Example 2

Here's another example with a linear fit and a set of artificial data

In [None]:
data=np.array([[0,-1],[1,2],[3,7],[4,6],[7,9],[11,14]])

In [None]:
print(data)

In [None]:
x, y = data.T # equivalent to x, y = data[:,0], data[:,1]

Fit the data to a degree-one polynomial, that is, a straight line.

In [None]:
p = np.polyfit(x, y, 1)
y_fit = np.polyval(p, x)

Plot the data and best fit line together in the same plot.

In [None]:
fig, axes = plt.subplots(figsize=(6,4))
plt.plot(x, y, 'o')
plt.plot(x, y_fit, '-') 
plt.xlabel('x')
plt.ylabel('y')
plt.show()

## Nonlinear Curve Fitting

In the section above, we fit the parameters of a polynomial. Under the hood, this is typically done by solving a linear system of equations for the parameters. Let's bring up our small data set and consider the best fit again.

In [None]:
np.polyfit(x, y, 1)

We can also fit our data to functions that are not polynomials. The `curve_fit` function from the `scipy.optimize` subpackage is useful here.

In [None]:
from scipy import optimize

To use `curve_fit` we need to first define a function for the model we want to fit.

In [None]:
def linear(x, A, B):
    return A*x + B

optimize.curve_fit(linear, x, y)

What is returned is the array of best-fit parameters, followed by the covariance matrix. This covariance matrix can be used to estimate confidence intervals for the parameters.
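
As a rough sketch of how the covariance matrix is used: the square roots of its diagonal entries are one-standard-deviation uncertainties on the parameters. The names `linear_demo`, `x_demo` and `y_demo` are introduced here only so that the block runs on its own; the data values are the same artificial points as in Example 2.

```python
import numpy as np
from scipy import optimize

def linear_demo(x, A, B):
    return A*x + B

# Same artificial data as Example 2, repeated so the block is self-contained.
x_demo = np.array([0.0, 1.0, 3.0, 4.0, 7.0, 11.0])
y_demo = np.array([-1.0, 2.0, 7.0, 6.0, 9.0, 14.0])

params_demo, cov_demo = optimize.curve_fit(linear_demo, x_demo, y_demo)

# One-standard-deviation uncertainty of each parameter.
sigma_demo = np.sqrt(np.diag(cov_demo))

for name, value, err in zip(['A', 'B'], params_demo, sigma_demo):
    print(f'{name} = {value:.3f} +/- {err:.3f}')
```

Since no measurement uncertainties are passed in, `curve_fit` scales the covariance using the scatter of the residuals, so these error bars are only as trustworthy as the assumption that the model is correct.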

Here's a routine that fits the data `x`, `y` to the function provided and then makes a plot.

In [None]:
def plot_fit(x, y, func):
    
    params, cov = optimize.curve_fit(func, x, y)
    x_fit = np.linspace(min(x), max(x), 100)
    y_fit = func(x_fit, *params)
    
    plt.plot(x, y, 'o')
    plt.plot(x_fit, y_fit, '-') 
    
    plt.xlabel('x')
    plt.ylabel('y')
    
    return params

So, 

In [None]:
fig, axes = plt.subplots(figsize=(6,4))
plot_fit(x, y, linear)
plt.show()

Which, of course, is what we had seen before.  

If we wanted, we could do a best fit to a different function:

$$ y = A + B \cos(x) + Cx^2 $$

In [None]:
def func1(x, A, B, C):
    return A + B*np.cos(x) + C*x**2

fig, axes = plt.subplots(figsize=(6,4))
plot_fit(x, y, func1)
plt.show()

This is still a linear fit, since the model is linear in the parameters $A$, $B$ and $C$: it is a linear combination of the fixed functions $1$, $\cos(x)$ and $x^2$.
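
One way to see what "linear" means here is to test superposition in the parameters: for a linear model, adding two parameter sets adds the model outputs. The sketch below checks this for the model above and for an exponential of the form `A*np.exp(B*x)`. The names `func1_demo` and `exp_demo` and the parameter values are arbitrary, chosen only for illustration.

```python
import numpy as np

def func1_demo(x, A, B, C):
    return A + B*np.cos(x) + C*x**2

def exp_demo(x, A, B):
    return A*np.exp(B*x)

x_demo = np.linspace(0.0, 2.0, 5)

# Linear in the parameters: f(x; p1 + p2) equals f(x; p1) + f(x; p2).
p1 = np.array([1.0, 2.0, 3.0])
p2 = np.array([-0.5, 0.3, 1.5])
linear_ok = np.allclose(func1_demo(x_demo, *(p1 + p2)),
                        func1_demo(x_demo, *p1) + func1_demo(x_demo, *p2))

# Not linear in B: the same check fails for the exponential model.
q1 = np.array([1.0, 0.5])
q2 = np.array([2.0, 0.2])
nonlinear_ok = np.allclose(exp_demo(x_demo, *(q1 + q2)),
                           exp_demo(x_demo, *q1) + exp_demo(x_demo, *q2))

print(linear_ok, nonlinear_ok)  # True False
```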

For more complicated functions like

$$ y = A e^{B x} $$

we need to do a non-linear fit because the parameter $B$ appears in a nonlinear way in the fitting function. Thankfully, `curve_fit` handles non-linear fits as well.

In [None]:
def func2(x, A, B):
    return A * np.exp(B*x)

fig, axes = plt.subplots(figsize=(6,4))
plot_fit(x, y, func2)
plt.show()


### Initial guesses

Note that in calling `curve_fit`, you are able to provide initial guesses for the parameters through the optional `p0` argument. In general it is difficult to find the best fit in the nonlinear case. Rather, routines find the best fit near the initial guess. Different initial guesses may yield different fit parameters. We won't go any deeper into the methods of finding parameters for nonlinear fits.
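
Here is a small sketch of how the starting point can matter. It uses made-up sinusoidal data (`sine_demo`, true values $A=2$, $B=3$) rather than the data above, and fits it from two different initial guesses. The second guess is deliberately poor. What it converges to is not guaranteed, so compare the printed sums of squared residuals.

```python
import numpy as np
from scipy import optimize

def sine_demo(x, A, B):
    return A*np.sin(B*x)

# Hypothetical noisy data generated with A = 2, B = 3.
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 2*np.pi, 50)
y_demo = sine_demo(x_demo, 2.0, 3.0) + rng.normal(0.0, 0.05, x_demo.size)

def sum_sq(params):
    """Sum of squared residuals for a given parameter set."""
    return np.sum((y_demo - sine_demo(x_demo, *params))**2)

# Initial guess close to the true parameters.
params_near, _ = optimize.curve_fit(sine_demo, x_demo, y_demo, p0=[1.5, 2.9])
print('start near the truth:', params_near, sum_sq(params_near))

# Initial guess far from the true frequency; the fit may stall or fail.
try:
    params_far, _ = optimize.curve_fit(sine_demo, x_demo, y_demo, p0=[1.5, 1.0])
    print('start far away:      ', params_far, sum_sq(params_far))
except RuntimeError as err:
    print('fit did not converge:', err)
```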


### Best fits vs interpolation

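The idea of a best fit of a curve is different from an interpolation of the data: a best-fit curve need not pass through any data point, while an interpolant passes through all of them. For interpolation, we could use the `interpolate.interp1d` function from `scipy`. For comparison,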

In [None]:
from scipy import interpolate

fig, axes = plt.subplots(1,2, figsize=(8,4))
x_fit = np.linspace(min(x), max(x), 100)

# linear interpolation vs linear fit  ########
plt.sca(axes[0])
plt.plot(x, y, 'o') # plot the data
func = lambda x, A, B: A*x + B
params, cov = optimize.curve_fit(func, x, y)
plt.plot(x_fit,func(x_fit, *params), label='best fit')
interp = interpolate.interp1d(x, y, kind='linear')
plt.plot(x_fit, interp(x_fit), label='interpolation') 
plt.xlabel('x')
plt.ylabel('y')
plt.title('Linear interpolation vs linear fit')
plt.legend(loc='lower right')

# cubic interpolation vs cubic fit  ########
plt.sca(axes[1])
plt.plot(x, y, 'o') # plot the data
func = lambda x, A, B, C, D: A*x**3 + B*x**2 + C*x + D  # cubic polynomial
params, cov = optimize.curve_fit(func, x, y)
plt.plot(x_fit,func(x_fit, *params), label='best fit')
interp = interpolate.interp1d(x, y, kind='cubic')
plt.plot(x_fit, interp(x_fit), label='interpolation')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Cubic interpolation vs cubic fit')
plt.legend(loc='lower right')

plt.show()
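
The difference between the two curves can also be checked numerically. In this sketch the cubic interpolant reproduces every data point, while the cubic best fit leaves nonzero residuals. The arrays `x_demo` and `y_demo` repeat the artificial data from Example 2, and `np.polyfit` stands in for `curve_fit` to keep the example short.

```python
import numpy as np
from scipy import interpolate

# Same artificial data as Example 2, repeated so the block is self-contained.
x_demo = np.array([0.0, 1.0, 3.0, 4.0, 7.0, 11.0])
y_demo = np.array([-1.0, 2.0, 7.0, 6.0, 9.0, 14.0])

# Residuals of the cubic interpolant at the data points.
interp_demo = interpolate.interp1d(x_demo, y_demo, kind='cubic')
interp_resid = y_demo - interp_demo(x_demo)

# Residuals of the cubic least-squares fit at the data points.
fit_resid = y_demo - np.polyval(np.polyfit(x_demo, y_demo, 3), x_demo)

print('largest interpolation residual:', np.max(np.abs(interp_resid)))
print('largest fit residual:          ', np.max(np.abs(fit_resid)))
```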

## Application of Fitting

There is a data file that you should download along with this exercise called `falling_object.dat` that has two columns.  The first column contains time (s) and the second contains height (m).

a) Load the data into a table and plot the height as a function of time.

In [None]:
falling_object = np.loadtxt('falling_object.dat')
print(falling_object)

In [None]:
data_t, data_y = falling_object.T

In [None]:
fig, axes = plt.subplots(figsize=(6,4))
plt.plot(data_t, data_y, '.')
plt.xlabel('time (s)')
plt.ylabel('height (m)')
plt.title('Falling Object')
plt.show()

b) Using the centred difference scheme, calculate and plot the velocity (do not interpolate) as a function of time.  You should see the velocity approach a terminal value.

#### Centered scheme
$$\frac{df}{dt}(t_0) \approx \frac{f(t_0+\Delta t)-f(t_0-\Delta t)}{2\Delta t}= \frac{y_{i+1} - y_{i-1}}{t_{i+1} - t_{i-1}}$$

In [None]:
vel_t = data_t[1:-1] #Shorthand to make it equal to data_t but without first or last element
vel_y = np.zeros(len(data_t) - 2)

for i in range(1, len(data_y) - 1):
    vel_y[i-1] = (data_y[i+1] - data_y[i-1]) / (data_t[i+1] - data_t[i-1])
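
The same centered difference can be written without an explicit loop by using array slices. This is a sketch with a hypothetical helper `centered_velocity`, tested on made-up frictionless free-fall data. For that quadratic on a uniform grid the centered difference is exact.

```python
import numpy as np

def centered_velocity(t, y):
    """Centered first derivative at the interior points of (t, y)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # y[2:] holds y_{i+1} and y[:-2] holds y_{i-1} for i = 1 .. N-2.
    return (y[2:] - y[:-2]) / (t[2:] - t[:-2])

# Hypothetical test data: free fall without drag, y = 100 - g t^2 / 2.
t_demo = np.linspace(0.0, 2.0, 21)
y_demo = 100.0 - 4.905*t_demo**2

v_demo = centered_velocity(t_demo, y_demo)
print(v_demo[:3])   # compare with -9.81 * t at the interior points
```

The result has two fewer elements than the input, matching `data_t[1:-1]` in the loop version.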

In [None]:
fig, axes = plt.subplots(figsize=(6,4))
plt.plot(vel_t, vel_y, '.')
plt.xlabel('time (s)')
plt.ylabel('velocity (m/s)')
plt.title('Falling Object')
plt.show()

c) Calculate and plot the acceleration of the falling object directly from the height data by using the centred scheme for the second derivative.  You should see the acceleration approach zero.  The acceleration graph looks noisy. This is because the original measurements contain some uncertainty and random noise. This noise gets amplified by taking derivatives.
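
To see how strongly differencing amplifies noise, the sketch below applies the centered schemes to pure random noise. The sampling interval `dt` and noise level `sigma` are hypothetical values, not taken from the data file. For independent noise on a uniform grid, the expected scatter is $\sigma/(\sqrt{2}\,\Delta t)$ in the velocity and $\sqrt{6}\,\sigma/(\Delta t)^2$ in the acceleration.

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.01       # hypothetical sampling interval (s)
sigma = 0.001   # hypothetical measurement noise in the height (m)
noise = rng.normal(0.0, sigma, 100_000)

# Centered first and second differences applied to the noise alone.
vel_noise = (noise[2:] - noise[:-2]) / (2*dt)
acc_noise = (noise[2:] - 2*noise[1:-1] + noise[:-2]) / dt**2

print('velocity scatter:    ', vel_noise.std(), 'expected', sigma/(np.sqrt(2)*dt))
print('acceleration scatter:', acc_noise.std(), 'expected', np.sqrt(6)*sigma/dt**2)
```

With these numbers, one millimetre of scatter in the height becomes roughly 0.07 m/s in the velocity and roughly 24 m/s$^2$ in the acceleration.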

#### Centered scheme for 2nd order derivative
$$\frac{d^2 f}{dt^2}(t_0) \approx \frac{f(t_0+\Delta t)-2f(t_0)+f(t_0-\Delta t)}{(\Delta t)^2}=\frac{y_{i+1} - 2y_i + y_{i-1}}{(t_{i+1} - t_{i})^2} $$


In [None]:
accel_t = data_t[1:-1] #Shorthand to make it equal to data_t but without first or last element
accel_y = np.zeros(len(data_t) - 2)

for i in range(1, len(data_y) - 1):
    accel_y[i-1] = (data_y[i+1] - 2*data_y[i] + data_y[i-1]) / (data_t[i+1]-data_t[i])**2

In [None]:
fig, axes = plt.subplots(figsize=(6,4))
plt.plot(accel_t, accel_y, '.')
plt.xlabel('time (s)')
plt.ylabel('acceleration (m/s$^2$)')
plt.title('Falling Object')
plt.show()

d)  Let's assume that the object experiences a drag force of $F= - b v$.  We will attempt to find $b$.  As usual, we'll begin with Newton's 2nd illustrious Law

\begin{align}
F&=ma \\
ma &= -bv -mg
\end{align}
 
 Aha!

If we plot the quantity $ma$ as a function of $-v$, the graph should be a straight line (with noise) with intercept $-mg$ and slope $b$.

Make a plot of $m a$ vs $-v$.

(The mass of the object is 0.2 kg, while acceleration due to gravity is 9.81 m/s/s. These values should jibe with our intercept value.)

In [None]:
m = 0.2 # kg
g = 9.81 # m/s^2

In [None]:
fig, axes = plt.subplots(figsize=(8,6))
plt.plot(-vel_y, m*accel_y, 'o')
plt.xlabel('-v (m/s)')
plt.ylabel('m a (kg m/s$^2$)')
plt.title('Falling Object')
plt.show()

e) Make a linear fit for the data in part (d) using the `np.polyfit` command.

In [None]:
p = np.polyfit(-vel_y, m*accel_y, 1)
print(p)

f) Show both the fit line and the discrete data in a single graph.

In [None]:
fig, axes = plt.subplots(figsize=(8,6))

plt.plot(-vel_y, m*accel_y, 'o')
plt.plot(-vel_y, np.polyval(p, -vel_y))

plt.xlabel('-v (m/s)')
plt.ylabel('m a (kg m/s$^2$)')
plt.title('Falling Object')
plt.show()

g) Consider the parameters of the best fit

In [None]:
print(p)

The slope of this line is $b$ (units of kg/s)

In [None]:
b = p[0]
print(b)

And the intercept is $-mg$ (units of kg m /s$^2$)

In [None]:
print(p[1])

Let's compare the intercept value with what we expect it to be ($-mg$).


In [None]:
print(-m*g)

Golden!
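
The fitted slope also predicts the terminal velocity seen in part (b): setting $a = 0$ in $m a = -b v - m g$ gives $v_{\text{terminal}} = -mg/b$. In this sketch `b_demo` is a hypothetical placeholder, not the fitted value. Substitute `p[0]` from the fit above.

```python
m_demo = 0.2    # kg, from the text
g_demo = 9.81   # m/s^2, from the text
b_demo = 0.5    # kg/s, hypothetical slope; replace with p[0] from the fit

# Setting a = 0 in m a = -b v - m g gives the terminal velocity.
v_terminal = -m_demo*g_demo/b_demo
print(v_terminal)   # negative: the object is moving downward

# Sanity check: the net force vanishes at the terminal velocity.
net_force = -b_demo*v_terminal - m_demo*g_demo
print(net_force)
```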

h) Newton's equation $m a = -b v - m g$ can be solved analytically for the height as a function of time.

$$
y(t) = C_1 + C_2 e^{-\frac{b}{m}t} - \frac{gm}{b}t
$$

where $C_1$ and $C_2$ are related to the initial height and initial velocity.  Use the values of $m$ and $g$ above.  

j) Use `optimize.curve_fit` to find the values of $b$, the initial height and the initial velocity.

In [None]:
def model1(t, b, C1, C2):
    return C1 + C2*np.exp(-b/m*t) - g*m/b*t

params1, cov = optimize.curve_fit(model1, data_t, data_y)
print(params1)

It is not immediately obvious how $C_1$ and $C_2$ relate to the initial height and initial velocity.
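
One way to recover them is to evaluate the solution and its derivative at $t=0$: $y(0) = C_1 + C_2$ and $y'(0) = -\frac{b}{m}C_2 - \frac{gm}{b}$. The sketch below wraps this in a hypothetical helper, `constants_to_initial`. The sample values of `b_demo`, `C1_demo` and `C2_demo` are invented for illustration and are not fit results.

```python
import numpy as np

def constants_to_initial(b, C1, C2, m=0.2, g=9.81):
    """Convert (C1, C2) to (y0, v0) for y(t) = C1 + C2*exp(-b t/m) - g m t/b."""
    y0 = C1 + C2                 # y(0)
    v0 = -b*C2/m - g*m/b         # y'(0)
    return y0, v0

# Hypothetical constants, chosen to correspond to a drop from rest at 100 m.
b_demo, C1_demo, C2_demo = 0.5, 101.5696, -1.5696
y0_demo, v0_demo = constants_to_initial(b_demo, C1_demo, C2_demo)
print(y0_demo, v0_demo)
```

The same function could be applied to the fitted values as `constants_to_initial(*params1)`.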

We can also use a tool like [Wolfram Alpha](https://www.wolframalpha.com) to solve the equation directly in terms of the initial conditions $y_0$ and $v_0$, using the input

    {m y''(t) = -(b y'(t))-m g, y(0) = y0, y'(0) = v0}
    
To get


$$y(t) = \frac{g m \left(m \left(-e^{-b t/m}\right) - b t + m\right) + b \left(m \left(v_0 - v_0 e^{-b t/m}\right) + b y_0\right)}{b^2}$$

In [None]:
def model2(t, b, y0, v0):
    return (g*m*(m*(-np.exp(-(b*t)/m)) - b*t + m) + b* (m *(v0-v0*np.exp(-(b*t)/m))+b*y0))/b**2

params2, cov = optimize.curve_fit(model2, data_t, data_y)
print(params2)

This makes the interpretation of the coefficients much clearer. In this problem, $y_0 = 100$ m and $v_0 = 0.0$ m/s.

k) Now plot the fitting functions over the data. Does it seem like our model ($F=-b v - m g$) describes the data?

In [None]:
%matplotlib notebook

fit_t = np.linspace(min(data_t), max(data_t), 100)
fit_y1 = model1(fit_t, *params1)
fit_y2 = model2(fit_t, *params2)
    
fig, axes = plt.subplots(figsize=(6,4))

plt.plot(data_t, data_y, '.', markersize=1, label='Data')
plt.plot(fit_t, fit_y1, '-', linewidth=1, label='Model1') 
plt.plot(fit_t, fit_y2, '-', linewidth=1, label='Model2') 
plt.xlabel('time (s)')
plt.ylabel('height (m)')
plt.title('Falling Object')
plt.legend()

plt.show()
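
Besides judging the plot by eye, the quality of the fit can be quantified with the root-mean-square residual, which should be comparable to the measurement noise if the model is adequate. The sketch below uses synthetic data from a hypothetical `drag_model` (an algebraically equivalent form of the solution with $y_0$ and $v_0$) with invented values $b = 0.5$ kg/s and noise 0.05 m. It does not use `falling_object.dat`.

```python
import numpy as np
from scipy import optimize

m_demo, g_demo = 0.2, 9.81   # kg and m/s^2, from the text

def drag_model(t, b, y0, v0):
    """Height under linear drag, written in terms of y0 and v0."""
    decay = np.exp(-b*t/m_demo)
    return y0 + (m_demo/b)*(v0 + g_demo*m_demo/b)*(1 - decay) - g_demo*m_demo*t/b

# Synthetic data with hypothetical true parameters and noise level.
rng = np.random.default_rng(2)
noise_sigma = 0.05
t_demo = np.linspace(0.0, 5.0, 200)
y_demo = drag_model(t_demo, 0.5, 100.0, 0.0) + rng.normal(0.0, noise_sigma, t_demo.size)

params_demo, _ = optimize.curve_fit(drag_model, t_demo, y_demo, p0=[0.4, 90.0, 1.0])

# Root-mean-square residual of the fit.
rms_resid = np.sqrt(np.mean((y_demo - drag_model(t_demo, *params_demo))**2))
print(params_demo)
print('RMS residual:', rms_resid, 'noise level:', noise_sigma)
```

An RMS residual much larger than the expected noise, or residuals that show a clear trend in time, would suggest the drag model is missing something.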