# Least-Squares Fitting with a Shift

### Prof. Robert Quimby
&copy; 2018 Robert Quimby

## In this tutorial you will...

* Quickly review how to fit a line to data with least-squares
* Consider the uncertainty in the best-fit model
* Find that model uncertainty can often be minimized with the right choice of coordinates

## Load some data

In [None]:
import numpy as np
data = np.genfromtxt('media/xy.dat', names='x, y')

In [None]:
# setup plotting
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10, 7)

In [None]:
# make a scatter plot
plt.????
plt.xlabel('x value', weight='bold', fontsize=14)
plt.ylabel('y value', weight='bold', fontsize=14);

## Recall how to least-squares fit a line

Assume the model:
$$y_i = mx_i + b$$

where $x_i$ and $y_i$ are the x and y values, respectively, for the $i^{\rm th}$ observation. In matrix form:

$$
\left[ \begin{array}{c}
y_1  \\
y_2  \\
\vdots \\
y_N  \end{array} \right]
= 
\left[ \begin{array}{cc}
x_1 & 1 \\
x_2 & 1 \\
\vdots & \vdots\\
x_N & 1 \end{array} \right] 
\left[ \begin{array}{c}
m \\
b \end{array} \right] $$

In [None]:
X = ????
Y = ????
p = ????
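
As one possible way to set up a cell like this, the sketch below builds the design matrix and solves the normal equations $p = (X^T X)^{-1} X^T Y$ on synthetic data. The arrays `x_demo` and `y_demo` are made-up stand-ins for the columns of `media/xy.dat`, and plain NumPy arrays with the `@` operator are used instead of `np.matrix`; both choices are assumptions made for illustration only.

```python
import numpy as np

# hypothetical noiseless data on the line y = 2x + 1 (not the tutorial's file)
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = 2.0 * x_demo + 1.0

# design matrix: one row [x_i, 1] per observation
X_demo = np.column_stack([x_demo, np.ones_like(x_demo)])
Y_demo = y_demo.reshape(-1, 1)

# solve the normal equations (X^T X) p = X^T Y for p = [m, b]
p_demo = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ Y_demo)
m_demo, b_demo = p_demo.ravel()
```

Because the synthetic points lie exactly on a line, the recovered slope and intercept should match the values used to generate them, which makes this a convenient sanity check before moving to real data.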

In [None]:
plt.scatter(data['x'], data['y'], c='r')
plt.xlabel('x value', weight='bold', fontsize=14)
plt.ylabel('y value', weight='bold', fontsize=14);
plt.plot(????, ls='dashed');

## Determine the model parameter uncertainties

In [None]:
# predicted y values and residuals from the observed values
modely = X * p
dY = Y - modely

# variance of the data about the model
M = X.shape[0] # number of data samples
N = p.shape[0] # number of model parameters
sample_var = (dY.T * dY) / (M - N)

# best fit parameters
m, b = p.A1

# now for the parameter errors
pvar = sample_var * np.diagonal((X.T * X).I)
msig, bsig = np.sqrt(pvar).A1
print("m is {:.2f} +/- {:.2f}".format(m, msig))
print("b is {:.2f} +/- {:.2f}".format(b, bsig))
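
Written as equations, the cell above computes the quantities below, where $M$ is the number of data samples and $N$ is the number of model parameters. The symbols $s^2$ and $\sigma_{p_j}$ are notation introduced here for `sample_var` and the square root of each entry of `pvar`; they do not appear elsewhere in the tutorial.

$$
s^2 = \frac{(Y - Xp)^T (Y - Xp)}{M - N},
\qquad
\sigma_{p_j} = \sqrt{ s^2 \left[ (X^T X)^{-1} \right]_{jj} }
$$

Only the diagonal of $(X^T X)^{-1}$ enters these errors, so any correlation between the slope and the intercept is not captured by $\sigma_m$ and $\sigma_b$ alone.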

## Randomly generate some possible models within the uncertainty

In [None]:
def plot_models(data, m, msig, b, bsig, n=30):
    # show n realizations of the model with errors
    ms = ????
    bs = ????
    modelx = np.array([min(data['x'])-1, max(data['x'])+1])
    for thism, thisb in zip(ms, bs):
        plt.plot(modelx, thism * modelx + thisb, color='0.75')
    plt.plot(data['x'], data['y'], 'ro')
    plt.plot(modelx, modelx * m + b, '--')

In [None]:
plot_models(data, m, msig, b, bsig)
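
One way to generate the realizations is to sample each parameter independently from a normal distribution centered on its best-fit value, with its 1-sigma error as the width. The sketch below does this with hypothetical numbers (`m_demo`, `msig_demo`, `b_demo`, `bsig_demo` are placeholders, not results from the tutorial's data). Drawing the slope and intercept independently ignores any covariance between them, which is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator so the draws are repeatable

# hypothetical best-fit values and 1-sigma errors (for illustration only)
m_demo, msig_demo = 2.0, 0.1
b_demo, bsig_demo = 1.0, 0.5
n = 30

# draw n slopes and n intercepts, each independently of the other
ms_demo = rng.normal(m_demo, msig_demo, n)
bs_demo = rng.normal(b_demo, bsig_demo, n)

# evaluate every sampled line at the two ends of a hypothetical x range
modelx_demo = np.array([-1.0, 11.0])
lines_demo = ms_demo[:, None] * modelx_demo + bs_demo[:, None]  # shape (n, 2)
```

Each row of `lines_demo` holds the y values of one sampled line at the two x positions, so it could be passed directly to `plt.plot(modelx_demo, row)` inside a loop.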

## One more time with a shifted origin

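Now fit the same line in coordinates shifted by $(x_0, y_0)$, taken here to be the means of the x and y data (`cx` and `cy` in the code below):

$$(y - y_0) = m(x - x_0) + b$$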
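
Expanding the shifted model shows how it relates to the unshifted line $y = mx + b$:

$$
y = m x + \left( b + y_0 - m x_0 \right)
$$

The slope is the same in both coordinate systems, while the value of $y$ at $x = 0$ becomes $b + y_0 - m x_0$ rather than $b$. The intercept $b$ of the shifted fit is therefore the model's offset at $x = x_0$, not at the origin.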

In [None]:
cx = np.mean(data['x'])
cy = np.mean(data['y'])
X = np.matrix( [[????, 1] for xi in data['x']] )
Y = np.matrix( [[????] for yi in data['y']] )
p = (X.T * X).I * X.T * Y

In [None]:
# predicted y values
modely = X * p

# residuals from observed values
dY = Y - modely

# variance of the data from the model
M = X.shape[0] # number of data samples
N = p.shape[0] # number of model parameters
var = (dY.T * dY) / (M - N)

# best fit parameters
m2 = p[0, 0]
b2 = p[1, 0]

# now for the parameter errors
err_p = np.sqrt(var * np.diagonal((X.T * X).I))

msig2 = err_p[0, 0]
bsig2 = err_p[0, 1]
print("m is {:.2f} +/- {:.2f}".format(m2, msig2))
print("b is {:.2f} +/- {:.2f}".format(b2, bsig2))

In [None]:
def plot_models2(data, m, msig, b, bsig, n=30, cx=0, cy=0):
    # show n realizations of the model with errors
    ms = np.random.normal(m, msig, n)
    bs = np.random.normal(b, bsig, n)
    modelx = np.array([min(data['x'])-1, max(data['x'])+1])
    # best-fit line, converted back to the original (unshifted) coordinates
    besty = (modelx - cx) * m + b + cy
    for thism, thisb in zip(ms, bs):
        plt.plot(modelx, (modelx - cx) * thism + thisb + cy, color='0.75')
    plt.plot(data['x'], data['y'], 'ro')
    plt.plot(modelx, besty, '--')

In [None]:
plot_models2(data, m2, msig2, b2, bsig2, cx=cx, cy=cy)

## Fit comparison

In [None]:
print("m  is {:.2f} +/- {:.2f}".format(m, msig))
print("m2 is {:.2f} +/- {:.2f}".format(m2, msig2))
print("b  is {:.2f} +/- {:.2f}".format(b, bsig))
print("b2 is {:.2f} +/- {:.2f}".format(b2, bsig2))
print("value of y at x=0 is {:.2f} (from the shifted fit)".format(cy - (cx * m2) + b2))
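
The comparison above can be reproduced on synthetic data to see why the shift helps. The sketch below is a minimal illustration, assuming made-up data far from the origin and a hypothetical helper `fit_line` that returns the full parameter covariance matrix $s^2 (X^T X)^{-1}$ rather than only its diagonal. When the x values are centered on their mean they sum to zero, so the off-diagonal term of $(X^T X)^{-1}$ vanishes and the slope and intercept errors decouple.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line fit (hypothetical helper).

    Returns the parameters [m, b] and their 2x2 covariance matrix.
    """
    X = np.column_stack([x, np.ones_like(x)])
    XtX_inv = np.linalg.inv(X.T @ X)
    p = XtX_inv @ X.T @ y
    resid = y - X @ p
    s2 = resid @ resid / (len(x) - len(p))  # variance of data about the model
    return p, s2 * XtX_inv

# hypothetical noisy data located far from x = 0
rng = np.random.default_rng(0)
x_demo = np.linspace(100.0, 110.0, 20)
y_demo = 0.5 * x_demo + 3.0 + rng.normal(0.0, 1.0, x_demo.size)

# fit once in the original coordinates and once with the origin at the means
p_raw, cov_raw = fit_line(x_demo, y_demo)
p_shift, cov_shift = fit_line(x_demo - x_demo.mean(), y_demo - y_demo.mean())

bsig_raw = np.sqrt(cov_raw[1, 1])
bsig_shift = np.sqrt(cov_shift[1, 1])
print("slope: {:.3f} (raw) vs {:.3f} (shifted)".format(p_raw[0], p_shift[0]))
print("intercept error: {:.3f} (raw) vs {:.3f} (shifted)".format(bsig_raw, bsig_shift))
print("slope-intercept covariance: {:.3g} (raw) vs {:.3g} (shifted)".format(
    cov_raw[0, 1], cov_shift[0, 1]))
```

In this setup the two fits give the same slope and slope error, but the intercept error is much smaller in the shifted coordinates because it no longer includes the uncertainty from extrapolating the slope all the way back to $x = 0$.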