# Least-Squares Fitting

### Prof. Robert Quimby
&copy; 2018 Robert Quimby

## In this tutorial you will...

* find the best-fit model when you have more data than model parameters
* learn how to use `numpy.matrix` objects to do least-squares fits
* estimate the uncertainty in the best-fit model parameters

## Plot some Data

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (7, 5)

x = [1.3, 3.4]
y = [2.1, 5.9]

plt.plot(x, y, 'ro')
plt.xlim(0, 8)
plt.ylim(0, 15)

## Fit a Line to these Data

$y = mx + b$

$$
\begin{align}
  5.9 & =  3.4  m  +  b \\
-( 2.1 & = 1.3  m  +  b) \\
\hline
  3.8 & =  2.1  m 
\end{align}
$$

In [None]:
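# from the subtraction above: 3.8 = 2.1 m
m = 3.8 / 2.1
# substitute m back into 2.1 = 1.3 m + b
b = 2.1 - 1.3 * m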

## Plot the linear relation

In [None]:
plt.plot(x, y, 'ro')
plt.xlim(0, 8)
plt.ylim(0, 15)

import numpy as np
modelx = np.linspace(0, 8, 10)
modely = m * modelx + b
plt.plot(modelx, modely, '--')

## What if we want to fit a line to three points?

In [None]:
x = [1.3, 3.4, 6.4]
y = [2.1, 5.9, 13.5]
plt.plot(x, y, 'ro')

# overlay the model
plt.plot(modelx, modely, '--')
plt.xlim(0, 8)
plt.ylim(0, 15)

## Dealing with Overdetermined Data

* Real-world measurements are imperfect!
* Your best attempts to measure something will still have some error
* The quantity you are trying to measure may deviate from the predictions of your model

#### Therefore...
* Three or more data points will usually **not** fall *exactly* on a line, even if you think they should

## Fitting Models to Overdetermined Data

* **IF** we can assume that the deviations from our ideal model are random and follow a Gaussian distribution
* **THEN** there is an ideal method for determining the best fitting model

## Least-Squares Fitting

* minimize the sum of the squares of the deviations of the data from the model

### Fit a line to the data in the least-squares sense

data:
$$x = [1.3, 3.4, 6.4]$$
$$y = [2.1, 5.9, 13.5]$$

model: $$y = mx + b$$

#### The sum of the squares of the deviations as a function of $m$ and $b$ is:
$$S(m, b) = \sum_i (y_i - \theta_i)^2$$

where $\theta_i$ is the **model prediction** for the $i^{\rm th}$ data point. With $\theta_i = mx_i + b$, we have:

$$
\begin{align}
S(m, b) = {} & (2.1 - (1.3m + b))^2 \\
             & + (5.9 - (3.4m + b))^2 \\
             & + (13.5 - (6.4m + b))^2
\end{align}
$$

This simplifies to:

$$ S(m, b) = 54.21 m^2 + 22.2  m  b - 218.38  m + 3 b^2 - 43  b + 221.47$$
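
As a quick sanity check, the coefficients of this polynomial can be rebuilt from sums over the data. The sketch below assumes NumPy; the names `Sx`, `Sxx`, `Sxy`, `S_direct`, and `S_expanded` are illustrative and not part of the original notebook.

```python
import numpy as np

x = np.array([1.3, 3.4, 6.4])
y = np.array([2.1, 5.9, 13.5])

# Expanding sum((y_i - (m*x_i + b))**2) term by term gives
# S = Sxx*m^2 + 2*Sx*m*b - 2*Sxy*m + n*b^2 - 2*Sy*b + Syy
n = len(x)
Sx, Sy = x.sum(), y.sum()
Sxx, Sxy, Syy = (x * x).sum(), (x * y).sum(), (y * y).sum()

coeffs = {"m^2": Sxx, "m*b": 2 * Sx, "m": -2 * Sxy,
          "b^2": n, "b": -2 * Sy, "const": Syy}
for term, value in coeffs.items():
    print(f"{term:>5}: {value:.2f}")


def S_direct(m, b):
    """Sum of squared deviations computed straight from the data."""
    return ((y - (m * x + b)) ** 2).sum()


def S_expanded(m, b):
    """The same quantity evaluated from the expanded polynomial."""
    return (Sxx * m**2 + 2 * Sx * m * b - 2 * Sxy * m
            + n * b**2 - 2 * Sy * b + Syy)
```

Evaluating `S_direct` and `S_expanded` at any trial `(m, b)` should give the same number, which is a cheap way to catch an algebra slip.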

### Now we just have to minimize this...

### Find the partial derivatives, $\partial S / \partial m$ and $\partial S / \partial b$

$$ \partial S / \partial m = 108.42 m + 22.2 b - 218.38 $$
$$ \partial S / \partial b = 6 b + 22.2 m - 43 $$

### ...set these to zero and solve for $m$ and $b$

In [None]:
m = 494. / 219.
b = (43 - 22.2 * m) / 6
print(m, b)
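
Setting both derivatives to zero gives two linear equations in $m$ and $b$, so the hand-derived values can be cross-checked with a linear solver. This is a minimal sketch assuming NumPy; `A`, `rhs`, `m_check`, and `b_check` are illustrative names.

```python
import numpy as np

# dS/dm = 0  ->  108.42 m + 22.2 b = 218.38
# dS/db = 0  ->   22.2  m +  6   b =  43
A = np.array([[108.42, 22.2],
              [22.2, 6.0]])
rhs = np.array([218.38, 43.0])

m_check, b_check = np.linalg.solve(A, rhs)
print(m_check, b_check)
```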

In [None]:
# now we can plot the best fit line with our data
modely = m * modelx + b
plt.plot(x, y, 'ro')
plt.plot(modelx, modely, '--')
plt.xlim(0, 8)
plt.ylim(0, 15)

## That was for a two parameter model with 3 data points...

* think about doing this for, say, 100 data points

## Matrices to the Rescue!

We can take the three equations:

$$
\begin{align}
2.1   &  =  1.3m + b\\
5.9   &  =  3.4m + b\\
13.5  &  =  6.4m + b\\
\end{align}
$$

and turn them into a single matrix equation:

$$ Y = Xp $$

$Y = 
\left[ \begin{array}{c}
2.1  \\
5.9  \\
13.5  \end{array} \right] $, 
$X = 
\left[ \begin{array}{cc}
1.3 & 1 \\
3.4 & 1 \\
6.4 & 1 \end{array} \right] 
$, and $p = \left[ \begin{array}{c}
m \\
b \end{array} \right] $

### Now, just solve for $p$!

$$ Xp = Y $$

$$ X^T X p = X^T Y $$

$$ (X^T X)^{-1} (X^T X) p = (X^T X)^{-1} X^T Y $$

$$ p = (X^T X)^{-1} X^T Y $$
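
With three points, $Xp = Y$ generally has no exact solution; the $p$ in the last line is the one that minimizes $S(m, b)$. The sketch below evaluates it numerically, assuming NumPy arrays and the `@` matrix-product operator rather than `numpy.matrix`. The names `XtX` and `XtY` are illustrative. Note that $X^T X$ and $X^T Y$ come out as exactly half the coefficients in the two derivative equations, so both routes lead to the same $m$ and $b$.

```python
import numpy as np

X = np.array([[1.3, 1.0],
              [3.4, 1.0],
              [6.4, 1.0]])
Y = np.array([2.1, 5.9, 13.5])

XtX = X.T @ X   # 2 x 2
XtY = X.T @ Y   # length 2

# p = (X^T X)^{-1} X^T Y, written as a linear solve
# instead of forming the inverse explicitly
p = np.linalg.solve(XtX, XtY)
print("X^T X =\n", XtX)
print("X^T Y =", XtY)
print("m, b =", p)
```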

## A quick intro to `numpy.matrix` objects

### `numpy.matrix` is NOT the same as `numpy.array`

In [None]:
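# create an array and a matrix for testing
# (illustrative values; any small square example works)
a1 = np.array([[1, 2], [3, 4]])
m1 = np.matrix([[1, 2], [3, 4]])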
print("the array is:\n", a1)
print("the matrix is:\n", m1)

In [None]:
# you can multiply them by scalars
print("the scaled array is: \n", a1 * 2)
print("the scaled matrix is: \n", m1 * 2)

In [None]:
# you can add them
print("the array sum is:\n", a1 + a1)
print("the matrix sum is:\n", m1 + m1)

### Key difference between `numpy.matrix` and `numpy.array`: multiplication!

In [None]:
# note that array and matrix multiplication are different
print("the array product is:\n", a1 * a1)
print("the matrix product is:\n", m1 * m1)
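
To make the difference concrete, here is a small self-contained sketch with illustrative values (the names `a`, `mat`, and the 2 x 2 entries are assumptions, not taken from the cells above). It also shows that plain arrays can form the matrix product with the `@` operator, which is the approach current NumPy documentation recommends over `numpy.matrix` for new code.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
mat = np.matrix([[1, 2],
                 [3, 4]])

elementwise = a * a      # array '*' multiplies element by element
array_matmul = a @ a     # array '@' is the matrix product
matrix_star = mat * mat  # matrix '*' is the matrix product

print("array * array:\n", elementwise)
print("array @ array:\n", array_matmul)
print("matrix * matrix:\n", matrix_star)
```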

## Recall matrix multiplication

$$
\left[ \begin{array}{cc}
a & b \\
c & d \\
\end{array} \right]
\left[ \begin{array}{cc}
w & x \\
y & z \\
\end{array} \right] 
=
\left[ \begin{array}{cc}
aw + by & ax + bz \\
cw + dy & cx + dz \\      
\end{array} \right] $$



for more see:
 * https://en.wikipedia.org/wiki/Matrix_multiplication
 * http://mathworld.wolfram.com/MatrixMultiplication.html

## Matrix Math Makes Least-Squares Fitting a Snap!

Express the model, $y = mx + b$, as:
$$ Y = Xp $$

where $Y = 
\left[ \begin{array}{c}
y_0  \\
y_1  \\
 \vdots  \\
y_{N-1}  \end{array} \right] $, 
$X = 
\left[ \begin{array}{cc}
x_0 & 1 \\
x_1 & 1 \\
 \vdots & \vdots \\
x_{N-1} & 1 \end{array} \right] 
$, and $p = \left[ \begin{array}{c}
m \\
b \end{array} \right] $

### here's what that looks like with `numpy.matrix` objects

In [None]:
# set up the x values in a 3-row by 2-column (N x 2) matrix
X = np.matrix([x, np.ones(len(x))]).T
print(X)

In [None]:
# set up the y values in a single-column (N x 1) matrix
Y = np.matrix(y).T
print(Y)

In [None]:
# solve for p = (X^T X)^-1 X^T Y
p = (X.T * X).I * X.T * Y
print(p)

## What if we have additional parameters in our model?

As long as you can express the model as a linear combination of independent model parameters,
$$ Y = Xp $$
will work.

### for example, what about a quadratic equation...

$$ y = ax^2 + bx + c $$

no problem!

$$
\left[ \begin{array}{c}
y_0  \\
y_1  \\
 \vdots  \\
y_{N-1}  \end{array} \right] = \left[ \begin{array}{ccc}
x_0^2 & x_0 & 1 \\
x_1^2 & x_1 & 1 \\
 \vdots & \vdots & \vdots \\
x_{N-1}^2 & x_{N-1} & 1 \end{array} \right] 
\left[ \begin{array}{c}
a \\
b \\
c \end{array} \right] $$
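
A minimal sketch of the quadratic case, using the same `numpy.matrix` recipe. The data are hypothetical and noise-free (generated from $y = 2x^2 - 3x + 1$), so the fit should simply return the coefficients that were put in.

```python
import numpy as np

# hypothetical noise-free data drawn from y = 2x^2 - 3x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x**2 - 3 * x + 1

# one column per parameter: x^2, x, and a column of ones
X = np.matrix(np.column_stack([x**2, x, np.ones_like(x)]))  # N x 3
Y = np.matrix(y).T                                           # N x 1

p = (X.T * X).I * X.T * Y
a, b, c = p.A1  # flatten the 3 x 1 matrix into three scalars
print(a, b, c)
```

Only the columns of `X` changed; the line that solves for `p` is identical to the straight-line case.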


## Sample Variance

* How close are the data points to the model?

In [None]:
# predicted y values
modelY = X * p

# residuals from observed values
resid = Y - modelY
print(resid)

### Variance (${\rm var} = \sigma^2$) of the data from the model 
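
In symbols, with $M$ data samples, $N$ model parameters, and model predictions $\theta_i$, the usual unbiased estimate of the variance is the sum of the squared residuals divided by the number of degrees of freedom. This is stated here as the standard definition, on the assumption that it is what the cell below is meant to compute:

$$ \sigma^2 = \frac{1}{M - N} \sum_i (y_i - \theta_i)^2 $$

For the three-point straight-line fit, $M - N = 3 - 2 = 1$.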

In [None]:
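M = len(x)  # number of data samples
N = 2       # number of model parameters
# sum of squared residuals divided by the degrees of freedom (a 1 x 1 matrix)
sample_var = (Y - modelY).T * (Y - modelY) / (M - N)
print(sample_var)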

In [None]:
# plot the data with error bars set by the sample standard deviation
plt.errorbar(x, y, np.sqrt(sample_var[0,0]), ls='None', marker='o', color='red', capsize=3)
plt.plot(modelx, modely, '--')

## Uncertainty in the model parameters
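
For a least-squares fit in which every data point has the same independent Gaussian uncertainty, a standard result is that the covariance matrix of the parameters is the sample variance times $(X^T X)^{-1}$. The $1\sigma$ uncertainty on each parameter is the square root of the corresponding diagonal element:

$$ {\rm cov}(p) = \sigma^2 \, (X^T X)^{-1} $$

$$ \sigma_{p_j} = \sqrt{{\rm cov}(p)_{jj}} $$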

In [None]:
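# now for the model parameter uncertainties
# variances are the diagonal of sample_var * (X^T X)^-1
pvar = np.diag(np.asarray(sample_var[0, 0] * (X.T * X).I))
psig = np.sqrt(pvar)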

In [None]:
# best fit parameters
m, b = p.A1
msig, bsig = np.asarray(psig).ravel()

# now for the parameter errors
print("m is {:.3f} +/- {:.3f}".format(m, msig))
print("b is {:.3f} +/- {:.3f}".format(b, bsig))
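
The same recipe handles the 100-point case raised earlier without any extra algebra. The sketch below is a demonstration on synthetic data only: the "true" slope, intercept, noise level, and random seed are all hypothetical choices, and `pcov`, `m_fit`, and `b_fit` are illustrative names.

```python
import numpy as np

# hypothetical "true" line and noise level, chosen only for this demonstration
m_true, b_true, noise_sigma = 2.0, -1.0, 0.5

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x = np.linspace(0, 8, 100)
y = m_true * x + b_true + rng.normal(0.0, noise_sigma, size=x.size)

# same matrix recipe as for three points
X = np.matrix(np.column_stack([x, np.ones_like(x)]))  # 100 x 2
Y = np.matrix(y).T                                     # 100 x 1
p = (X.T * X).I * X.T * Y

# sample variance and parameter uncertainties
M, N = X.shape  # number of data samples, number of model parameters
resid = Y - X * p
sample_var = (resid.T * resid)[0, 0] / (M - N)
pcov = sample_var * (X.T * X).I
m_fit, b_fit = p.A1
msig, bsig = np.sqrt(np.diag(np.asarray(pcov)))

print("m is {:.3f} +/- {:.3f}".format(m_fit, msig))
print("b is {:.3f} +/- {:.3f}".format(b_fit, bsig))
```

If the assumptions hold, the fitted values should land within a few `msig` and `bsig` of the values used to generate the data, and the parameters should match what `np.polyfit(x, y, 1)` returns.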

## For more details see...

["Least-Squares and Chi-Square for the Budding Aficionado: Art and Practice"](http://ugastro.berkeley.edu/radio/2015/handout_links/lsfit_2008.pdf) by Carl Heiles (UC Berkeley)