In [1]:
import numpy
%pylab inline

Populating the interactive namespace from numpy and matplotlib


# Analytically marginalizing the parameters of a linear fit

We want to rewrite the log likelihood for a linear fit, which is minus one-half $\chi^2$ up to an additive constant, from the form $\chi^2 = ({\bf Y} - {\bf A X})^T {\bf C}^{-1}\,({\bf Y} - {\bf A X})$ to the form $\chi^2 = ({\bf X} - {\bf W})^T {\bf V}^{-1}\,({\bf X} - {\bf W}) + {\bf U}$. We will test this with the data from *Hogg et al. (2010)*. First we load these data:

In [2]:
datastring= """1 & 201 & 592 & 61
               2 & 244 & 401 & 25
               3 & 47 & 583 & 38
               4 & 287 & 402 & 15
               5 & 203 & 495 & 21
               6 & 58 & 173 & 15
               7 & 210 & 479 & 27
               8 & 202 & 504 & 14
               9 & 198 & 510 & 30
               10 & 158 & 416 & 16
               11 & 165 & 393 & 14
               12 & 201 & 442 & 25
               13 & 157 & 317 & 52
               14 & 131 & 311 & 16
               15 & 166 & 400 & 34
               16 & 160 & 337 & 31
               17 & 186 & 423 & 42
               18 & 125 & 334 & 26
               19 & 218 & 533 & 16
               20 & 146 & 344 & 22"""
data= []
for line in datastring.split('\n'):
    data.append([float(f) for f in line.split('&')])
data= numpy.array(data)

Create the necessary arrays, ${\bf Y}$, ${\bf A}$, and ${\bf C}$:

In [3]:
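Y= data[:,2] # measured y values
A= numpy.vstack((numpy.ones_like(Y),data[:,1])).T # design matrix with columns [1, x]
C= numpy.diag(data[:,3]**2.) # diagonal covariance built from the y uncertainties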

Calculate the solution, ${\bf W} = [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf  Y}$, and its uncertainty covariance ${\bf V} = ({\bf A}^T {\bf C}^{-1}{\bf A})^{-1}$:

In [4]:
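Cinv= numpy.linalg.inv(C) # compute the inverse data covariance once and reuse it
V= numpy.linalg.inv(numpy.dot(A.T,numpy.dot(Cinv,A))) # parameter covariance
W= numpy.dot(V,numpy.dot(A.T,numpy.dot(Cinv,Y))) # best-fit [intercept, slope]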

These agree with *Hogg et al. (2010; Fig. 2)*:

In [5]:
print(W)
print(numpy.sqrt(numpy.diag(V)))

[ 213.27349198    1.07674752]
[ 14.39403311   0.07740678]
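
As a cross-check on the explicit-inverse formulas, the same ${\bf W}$ can be obtained from an ordinary least-squares solve on "whitened" data, in which each row of ${\bf A}$ and each entry of ${\bf Y}$ is divided by its uncertainty. The sketch below is self-contained and uses synthetic data: the arrays `x`, `sigma`, and `y` are made up for illustration and are not the Hogg et al. (2010) table, and the whitening shortcut assumes a diagonal ${\bf C}$.

```python
import numpy

# Synthetic straight-line data (made up for illustration only)
rng = numpy.random.default_rng(0)
x = numpy.linspace(50., 300., 20)
sigma = rng.uniform(10., 50., size=x.size)  # per-point y uncertainties
y = 30. + 2. * x + sigma * rng.normal(size=x.size)

A = numpy.vstack((numpy.ones_like(x), x)).T  # design matrix, columns [1, x]
Cinv = numpy.diag(1. / sigma**2)             # inverse of the diagonal covariance

# Explicit formulas from the text
V = numpy.linalg.inv(A.T @ Cinv @ A)
W = V @ (A.T @ Cinv @ y)

# Same fit via ordinary least squares on whitened data (rows divided by sigma)
W_lstsq, *_ = numpy.linalg.lstsq(A / sigma[:, None], y / sigma, rcond=None)

print(W)
print(W_lstsq)
```

The two printed vectors should agree to numerical precision. For larger or poorly conditioned problems, a least-squares or linear solve is generally preferable to forming explicit inverses.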


Minus twice the log likelihood is equal to $\chi^2 = ({\bf Y} - {\bf A X})^T {\bf C}^{-1}\,({\bf Y} - {\bf A X})$, which, evaluated at the solution ${\bf X} = {\bf W}$, is equal to

In [6]:
YminusAW= Y-numpy.dot(A,W)
twochi2= numpy.dot(YminusAW.T,numpy.dot(numpy.linalg.inv(C),YminusAW))
print(twochi2)

289.963722782


The $\chi^2$ is also equal to 

\begin{equation}
\chi^2 = ({\bf X} - [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf  Y} )^T [{\bf A}^T {\bf C}^{-1} {\bf A}] ({\bf X} - [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf  Y} )-{\bf Y}^T {\bf C}^{-1} {\bf A} [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf Y} + {\bf Y}^T {\bf C}^{-1}{\bf  Y}\,.
\end{equation}
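
To spell out the intermediate step: expanding the original form and using the symmetry of ${\bf C}^{-1}$ gives

\begin{equation}
\chi^2 = {\bf X}^T {\bf A}^T {\bf C}^{-1} {\bf A}\,{\bf X} - 2\,{\bf X}^T {\bf A}^T {\bf C}^{-1} {\bf Y} + {\bf Y}^T {\bf C}^{-1} {\bf Y}\,.
\end{equation}

Completing the square in ${\bf X}$, that is, adding and subtracting ${\bf Y}^T {\bf C}^{-1} {\bf A} [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf Y}$, then yields the expression above.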

That is, when writing it as $({\bf X} - {\bf W})^T {\bf V}^{-1}\,({\bf X} - {\bf W}) + {\bf U}$ we have
\begin{equation}
    {\bf W} = [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf  Y}\,,
\end{equation}

as well as

\begin{equation}
    {\bf V} = ({\bf A}^T {\bf C}^{-1}{\bf A})^{-1}\,,
\end{equation}

and also

\begin{equation}
    {\bf U} = -{\bf Y}^T {\bf C}^{-1} {\bf A} [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf Y} + {\bf Y}^T {\bf C}^{-1}{\bf  Y}\,,
\end{equation}

which we can also write as

\begin{equation}
    {\bf U} = {\bf Y}^T {\bf C}^{-1} ({\bf Y}-{\bf A} {\bf W})\,.
\end{equation}

At the solution ${\bf X} = {\bf W}$, the first term vanishes and $\chi^2$ reduces to ${\bf U}$:

In [7]:
new_twochi2= numpy.dot(Y.T,numpy.dot(numpy.linalg.inv(C),YminusAW))
print(new_twochi2)

289.963722782


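This agrees with the directly calculated value above. Notice that the expression for ${\bf U}$ is *very* similar in form to the original $\chi^2$ evaluated at ${\bf X} = {\bf W}$: the only difference is that its left-hand factor is ${\bf Y}^T$ rather than $({\bf Y}-{\bf A} {\bf W})^T$. In fact, we can just as well write ${\bf U}$ as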

\begin{equation}
        {\bf U} = ({\bf Y}-{\bf A} {\bf W})^T {\bf C}^{-1} ({\bf Y}-{\bf A} {\bf W})\,,
\end{equation}

because

\begin{equation}
        ({\bf A} {\bf W})^T {\bf C}^{-1} ({\bf Y}-{\bf A} {\bf W}) = 0\,.
\end{equation}

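That this is the case follows directly from the derivation of ${\bf W}$ as the maximum-likelihood solution: setting the gradient of $\chi^2$ with respect to ${\bf X}$ to zero gives the normal equations ${\bf A}^T {\bf C}^{-1} ({\bf Y}-{\bf A} {\bf W}) = 0$, and left-multiplying by ${\bf W}^T$ gives the identity above. We can also test this numerically: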

In [8]:
print(numpy.dot(numpy.dot(A,W).T,numpy.dot(numpy.linalg.inv(C),YminusAW)))

4.16378043155e-12


Thus, we have that

\begin{equation}
    ({\bf Y} - {\bf A X})^T {\bf C}^{-1}\,({\bf Y} - {\bf A X}) = ({\bf X} - {\bf W})^T {\bf V}^{-1}\,({\bf X} - {\bf W}) + ({\bf Y} - {\bf A W})^T {\bf C}^{-1}\,({\bf Y} - {\bf A W})\,.
\end{equation}
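
The identity holds for any ${\bf X}$, not only at the solution. A minimal numerical sketch of that claim follows, again on synthetic data: the arrays `x`, `sigma`, and `Y` and the helper names `chi2_direct` and `chi2_quadratic` are introduced here for illustration and are not part of the original notebook.

```python
import numpy

# Synthetic straight-line data (made up for illustration only)
rng = numpy.random.default_rng(1)
x = numpy.linspace(50., 300., 20)
sigma = rng.uniform(10., 50., size=x.size)
Y = 30. + 2. * x + sigma * rng.normal(size=x.size)

A = numpy.vstack((numpy.ones_like(x), x)).T
Cinv = numpy.diag(1. / sigma**2)
Vinv = A.T @ Cinv @ A                         # inverse parameter covariance
W = numpy.linalg.solve(Vinv, A.T @ Cinv @ Y)  # best-fit parameters
resid = Y - A @ W
U = resid @ Cinv @ resid                      # chi^2 at the best fit

def chi2_direct(X):
    """Original form: (Y - A X)^T C^-1 (Y - A X)."""
    r = Y - A @ X
    return r @ Cinv @ r

def chi2_quadratic(X):
    """Rewritten form: (X - W)^T V^-1 (X - W) + U."""
    d = X - W
    return d @ Vinv @ d + U

# Compare the two forms at a few parameter vectors scattered around W
trial_X = W + rng.normal(scale=[20., 0.2], size=(5, 2))
for X in trial_X:
    print(chi2_direct(X), chi2_quadratic(X))
```

Each printed pair should agree to numerical precision, which is what allows the likelihood to be treated as a Gaussian in ${\bf X}$ with mean ${\bf W}$ and covariance ${\bf V}$.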