In [1]:
import numpy
%pylab inline

Populating the interactive namespace from numpy and matplotlib


# Analytically marginalizing the parameters of a linear fit

We want to rewrite the log likelihood, which is minus one-half of $\chi^2$, for a linear fit from the form $\chi^2 = ({\bf Y} - {\bf A X})^T {\bf C}^{-1}\,({\bf Y} - {\bf A X})$ to the form $\chi^2 = ({\bf X} - {\bf W})^T {\bf V}^{-1}\,({\bf X} - {\bf W}) + {\bf U}$, where ${\bf W}$, ${\bf V}$, and ${\bf U}$ do not depend on ${\bf X}$. We will test this with the data from *Hogg et al. (2010)*. First we load the data:

In [2]:
datastring= """1 & 201 & 592 & 61
               2 & 244 & 401 & 25
               3 & 47 & 583 & 38
               4 & 287 & 402 & 15
               5 & 203 & 495 & 21
               6 & 58 & 173 & 15
               7 & 210 & 479 & 27
               8 & 202 & 504 & 14
               9 & 198 & 510 & 30
               10 & 158 & 416 & 16
               11 & 165 & 393 & 14
               12 & 201 & 442 & 25
               13 & 157 & 317 & 52
               14 & 131 & 311 & 16
               15 & 166 & 400 & 34
               16 & 160 & 337 & 31
               17 & 186 & 423 & 42
               18 & 125 & 334 & 26
               19 & 218 & 533 & 16
               20 & 146 & 344 & 22"""
data= []
for line in datastring.split('\n'):
    data.append([float(f) for f in line.split('&')])
data= numpy.array(data)

Create the necessary arrays, ${\bf Y}$, ${\bf A}$, and ${\bf C}$:

In [3]:
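# Y: measured y values; A: design matrix with columns (1, x); C: diagonal data covariance
Y= data[:,2]
A= numpy.vstack((numpy.ones_like(Y),data[:,1])).T
C= numpy.diag(data[:,3]**2.)
# Weighted least-squares solution for X = (b, m); this is the same expression as W below
X= np.matmul( np.linalg.inv( np.linalg.multi_dot([A.T,np.linalg.inv(C),A]) ), np.linalg.multi_dot([A.T,np.linalg.inv(C),Y]) )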

In [4]:
A0 = numpy.vstack((numpy.zeros_like(Y),data[:,1])).T

In [8]:
np.linalg.multi_dot([A0.T,np.linalg.inv(C),A,X])

array([   0.        , 3447.19353805])

In [9]:
np.linalg.multi_dot([X.T,A.T,np.linalg.inv(C),A0])

array([   0.        , 3447.19353805])

Calculate the solution, ${\bf W} = [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf  Y}$, and its uncertainty covariance ${\bf V} = ({\bf A}^T {\bf C}^{-1}{\bf A})^{-1}$:

In [4]:
V= numpy.linalg.inv(numpy.dot(A.T,numpy.dot(numpy.linalg.inv(C),A)))
W= numpy.dot(V,numpy.dot(A.T,numpy.dot(numpy.linalg.inv(C),Y)))

These agree with *Hogg et al. (2010; Fig. 2)*:

In [5]:
print(W)
print(numpy.sqrt(numpy.diag(V)))

[213.27349198   1.07674752]
[14.39403311  0.07740678]
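
Because ${\bf C}$ is diagonal here, the same quantities can be computed without forming ${\bf C}^{-1}$ as a matrix, and ${\bf W}$ can be obtained with a linear solve rather than an explicit inverse. The following is a minimal sketch of that alternative, not the method used in the cells above; the names `x`, `y`, `sigma_y`, `inv_var`, and `Vinv` are introduced here for illustration only.

```python
import numpy as np

# Data copied from the table above: x, y, and the uncertainty on y
x = np.array([201, 244, 47, 287, 203, 58, 210, 202, 198, 158,
              165, 201, 157, 131, 166, 160, 186, 125, 218, 146], dtype=float)
y = np.array([592, 401, 583, 402, 495, 173, 479, 504, 510, 416,
              393, 442, 317, 311, 400, 337, 423, 334, 533, 344], dtype=float)
sigma_y = np.array([61, 25, 38, 15, 21, 15, 27, 14, 30, 16,
                    14, 25, 52, 16, 34, 31, 42, 26, 16, 22], dtype=float)

A = np.vstack((np.ones_like(x), x)).T   # design matrix with columns (1, x)
inv_var = 1.0 / sigma_y**2              # diagonal of C^{-1}

# A^T C^{-1} A and A^T C^{-1} Y, using the diagonal structure of C
Vinv = A.T @ (A * inv_var[:, None])
rhs = A.T @ (y * inv_var)

W = np.linalg.solve(Vinv, rhs)          # solves V^{-1} W = A^T C^{-1} Y
V = np.linalg.inv(Vinv)                 # the inverse is still needed for the uncertainties

print(W)
print(np.sqrt(np.diag(V)))
```

The linear solve avoids inverting ${\bf C}$ altogether, which matters little for 20 points but scales better when the number of data points is large.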


Minus twice the log likelihood is $\chi^2 = ({\bf Y} - {\bf A X})^T {\bf C}^{-1}\,({\bf Y} - {\bf A X})$, which, evaluated at the solution ${\bf X} = {\bf W}$, is equal to

In [6]:
YminusAW= Y-numpy.dot(A,W)
twochi2= numpy.dot(YminusAW.T,numpy.dot(numpy.linalg.inv(C),YminusAW))
print(twochi2)

289.9637227819993
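
For a diagonal covariance, the matrix expression for $\chi^2$ reduces to a sum of squared residuals, each divided by its variance. The sketch below compares the two forms on the same data; the variable names are illustrative and not part of the cells above.

```python
import numpy as np

x = np.array([201, 244, 47, 287, 203, 58, 210, 202, 198, 158,
              165, 201, 157, 131, 166, 160, 186, 125, 218, 146], dtype=float)
y = np.array([592, 401, 583, 402, 495, 173, 479, 504, 510, 416,
              393, 442, 317, 311, 400, 337, 423, 334, 533, 344], dtype=float)
sigma_y = np.array([61, 25, 38, 15, 21, 15, 27, 14, 30, 16,
                    14, 25, 52, 16, 34, 31, 42, 26, 16, 22], dtype=float)

A = np.vstack((np.ones_like(x), x)).T
inv_var = 1.0 / sigma_y**2
W = np.linalg.solve(A.T @ (A * inv_var[:, None]), A.T @ (y * inv_var))

resid = y - A @ W

# Sum of squared, uncertainty-normalized residuals (valid only for diagonal C)
chi2_sum = np.sum((resid / sigma_y) ** 2)

# General matrix form, as in the cell above
C = np.diag(sigma_y**2)
chi2_matrix = resid @ np.linalg.inv(C) @ resid

print(chi2_sum, chi2_matrix)
```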


Completing the square in ${\bf X}$ shows that the $\chi^2$ is also equal to

\begin{equation}
\chi^2 = ({\bf X} - [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf  Y} )^T [{\bf A}^T {\bf C}^{-1} {\bf A}] ({\bf X} - [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf  Y} )-{\bf Y}^T {\bf C}^{-1} {\bf A} [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf Y} + {\bf Y}^T {\bf C}^{-1}{\bf  Y}\,.
\end{equation}

That is, when writing it as $({\bf X} - {\bf W})^T {\bf V}^{-1}\,({\bf X} - {\bf W}) + {\bf U}$ we have
\begin{equation}
    {\bf W} = [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf  Y}\,,
\end{equation}

as well as

\begin{equation}
    {\bf V} = ({\bf A}^T {\bf C}^{-1}{\bf A})^{-1}\,,
\end{equation}

and also

\begin{equation}
    {\bf U} = -{\bf Y}^T {\bf C}^{-1} {\bf A} [{\bf A}^T {\bf C}^{-1}{\bf A}]^{-1} {\bf A}^T {\bf C}^{-1}{\bf Y} + {\bf Y}^T {\bf C}^{-1}{\bf  Y}\,,
\end{equation}

which we can also write as

\begin{equation}
    {\bf U} = {\bf Y}^T {\bf C}^{-1} ({\bf Y}-{\bf A} {\bf W})\,.
\end{equation}

More generally, keeping ${\bf W}$ unexpanded when equating the original $\chi^{2}$ with the new form gives

\begin{equation}
    {\bf U} = {\bf Y}^T {\bf C}^{-1} {\bf Y} - {\bf W}^{T} {\bf V}^{-1} {\bf W}\,.
\end{equation}
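
The equivalent expressions for ${\bf U}$ given above can be compared numerically. This is a small sketch on the same data; the names `U_expanded`, `U_residual`, and `U_unexpanded` are introduced here only to label the three forms.

```python
import numpy as np

x = np.array([201, 244, 47, 287, 203, 58, 210, 202, 198, 158,
              165, 201, 157, 131, 166, 160, 186, 125, 218, 146], dtype=float)
y = np.array([592, 401, 583, 402, 495, 173, 479, 504, 510, 416,
              393, 442, 317, 311, 400, 337, 423, 334, 533, 344], dtype=float)
sigma_y = np.array([61, 25, 38, 15, 21, 15, 27, 14, 30, 16,
                    14, 25, 52, 16, 34, 31, 42, 26, 16, 22], dtype=float)

A = np.vstack((np.ones_like(x), x)).T
Cinv = np.diag(1.0 / sigma_y**2)

Vinv = A.T @ Cinv @ A
V = np.linalg.inv(Vinv)
W = V @ A.T @ Cinv @ y

# Fully expanded form: -Y^T C^-1 A [A^T C^-1 A]^-1 A^T C^-1 Y + Y^T C^-1 Y
U_expanded = -(y @ Cinv @ A @ V @ A.T @ Cinv @ y) + y @ Cinv @ y
# Compact form: Y^T C^-1 (Y - A W)
U_residual = y @ Cinv @ (y - A @ W)
# Form with W kept unexpanded: Y^T C^-1 Y - W^T V^-1 W
U_unexpanded = y @ Cinv @ y - W @ Vinv @ W

print(U_expanded, U_residual, U_unexpanded)
```

The three values should agree up to floating-point rounding; the expanded and unexpanded forms subtract two large numbers, so they are slightly less accurate than the residual form.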

At the solution ${\bf X} = {\bf W}$, the first term is zero and we are left with ${\bf U}$:

In [7]:
new_twochi2= numpy.dot(Y.T,numpy.dot(numpy.linalg.inv(C),YminusAW))
print(new_twochi2)

289.96372278199215


This agrees with the directly calculated value above. Notice that ${\bf U}$ is *very* similar to the original $\chi^2$ that we calculated. In fact, we can show that we can just as easily write ${\bf U}$ as

\begin{equation}
        {\bf U} = ({\bf Y}-{\bf A} {\bf W})^T {\bf C}^{-1} ({\bf Y}-{\bf A} {\bf W})\,.
\end{equation}

Multiplying out the left-hand bracket gives two terms. The first is the expression for ${\bf U}$ above; the second vanishes:

\begin{equation}
        ({\bf A} {\bf W})^T {\bf C}^{-1} ({\bf Y}-{\bf A} {\bf W}) = 0\,.
\end{equation}

This can be shown by substituting the expressions for ${\bf W}$ and ${\bf W}^T$, which yields the identity

\begin{equation}
    {\bf Y}^T {\bf C}^{-1} {\bf A} [ {\bf A}^T {\bf C}^{-1} {\bf A} ]^{-1} {\bf A}^T {\bf C}^{-1} {\bf Y} = {\bf Y}^T {\bf C}^{-1} {\bf A} [ {\bf A}^T {\bf C}^{-1} {\bf A} ]^{-1} {\bf A}^T {\bf C}^{-1} {\bf A} [ {\bf A}^T {\bf C}^{-1} {\bf A} ]^{-1} {\bf A}^T {\bf C}^{-1} {\bf Y}\,,
\end{equation}

where the central factor ${\bf A}^T {\bf C}^{-1} {\bf A}$ on the right-hand side and one of the adjacent brackets $[ {\bf A}^T {\bf C}^{-1} {\bf A} ]^{-1}$ are inverses of one another and cancel. That this holds follows directly from the derivation of ${\bf W}$ as the maximum-likelihood solution. We can also check numerically that the cross term above is equal to 0:

In [8]:
print(numpy.dot(numpy.dot(A,W).T,numpy.dot(numpy.linalg.inv(C),YminusAW)))

-7.208456054286216e-12
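
The cross term vanishes because ${\bf W}$ satisfies the normal equations ${\bf A}^T {\bf C}^{-1} ({\bf Y} - {\bf A W}) = 0$, which is the condition that the gradient of $\chi^2$ with respect to ${\bf X}$ is zero at the maximum-likelihood solution. The sketch below checks this vector condition directly; `normal_eq` and `cross_term` are illustrative names.

```python
import numpy as np

x = np.array([201, 244, 47, 287, 203, 58, 210, 202, 198, 158,
              165, 201, 157, 131, 166, 160, 186, 125, 218, 146], dtype=float)
y = np.array([592, 401, 583, 402, 495, 173, 479, 504, 510, 416,
              393, 442, 317, 311, 400, 337, 423, 334, 533, 344], dtype=float)
sigma_y = np.array([61, 25, 38, 15, 21, 15, 27, 14, 30, 16,
                    14, 25, 52, 16, 34, 31, 42, 26, 16, 22], dtype=float)

A = np.vstack((np.ones_like(x), x)).T
inv_var = 1.0 / sigma_y**2

Vinv = A.T @ (A * inv_var[:, None])
rhs = A.T @ (y * inv_var)
W = np.linalg.solve(Vinv, rhs)

# Normal equations: A^T C^{-1} (Y - A W) should vanish component by component
normal_eq = A.T @ (inv_var * (y - A @ W))

# Cross term from the text: (A W)^T C^{-1} (Y - A W) = W^T [A^T C^{-1} (Y - A W)]
cross_term = W @ normal_eq

# Compare with the size of A^T C^{-1} Y to judge what "zero" means here
print(normal_eq / np.abs(rhs))
print(cross_term)
```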


Thus, we have that

\begin{equation}
    ({\bf Y} - {\bf A X})^T {\bf C}^{-1}\,({\bf Y} - {\bf A X}) = ({\bf X} - {\bf W})^T {\bf V}^{-1}\,({\bf X} - {\bf W}) + ({\bf Y} - {\bf A W})^T {\bf C}^{-1}\,({\bf Y} - {\bf A W})\,.
\end{equation}

In [16]:
print( np.linalg.multi_dot( [(Y-np.matmul(A,X)).T,np.linalg.inv(C),(Y-np.matmul(A,X))] ) )
print( np.linalg.multi_dot( [(X-W).T,np.linalg.inv(V),(X-W)] ) )
print( np.linalg.multi_dot( [(Y-np.matmul(A,W)).T,np.linalg.inv(C),(Y-np.matmul(A,W))] ) )

289.9637227819993
0.0
289.9637227819993


Awesome!
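
Because ${\bf X}$ was set to the best-fit solution, the check above only exercises the identity at ${\bf X} = {\bf W}$, where the quadratic term is zero. A slightly stronger test, sketched here under the assumption that a handful of random trial vectors is representative, evaluates both sides away from the solution. The function names and the scatter scales are arbitrary choices for illustration.

```python
import numpy as np

x = np.array([201, 244, 47, 287, 203, 58, 210, 202, 198, 158,
              165, 201, 157, 131, 166, 160, 186, 125, 218, 146], dtype=float)
y = np.array([592, 401, 583, 402, 495, 173, 479, 504, 510, 416,
              393, 442, 317, 311, 400, 337, 423, 334, 533, 344], dtype=float)
sigma_y = np.array([61, 25, 38, 15, 21, 15, 27, 14, 30, 16,
                    14, 25, 52, 16, 34, 31, 42, 26, 16, 22], dtype=float)

A = np.vstack((np.ones_like(x), x)).T
Cinv = np.diag(1.0 / sigma_y**2)

Vinv = A.T @ Cinv @ A
W = np.linalg.solve(Vinv, A.T @ Cinv @ y)
U = (y - A @ W) @ Cinv @ (y - A @ W)

def chi2_direct(X):
    """(Y - A X)^T C^-1 (Y - A X)."""
    r = y - A @ X
    return r @ Cinv @ r

def chi2_rewritten(X):
    """(X - W)^T V^-1 (X - W) + U."""
    d = X - W
    return d @ Vinv @ d + U

rng = np.random.default_rng(0)
# Trial parameter vectors (b, m) scattered around W; the scales are arbitrary
trials = W + rng.normal(size=(5, 2)) * np.array([50.0, 0.3])

direct = np.array([chi2_direct(X) for X in trials])
rewritten = np.array([chi2_rewritten(X) for X in trials])
print(np.column_stack((direct, rewritten)))
```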

# Doing the same thing with a prior

We now perform the same marginalization with a Gaussian prior on ${\bf X}$, which adds to $\chi^2$ a term of the form

\begin{equation}
    ( {\bf X} - {\bf X}_{0} )^{T} {\bf \Sigma}^{-1} ( {\bf X} - {\bf X}_{0} )
\end{equation}

For this prior, ${\bf X}_{0}$ is a column vector of the same shape as ${\bf X}$ that sets the prior mean of each parameter (here $b$ and $m$). The covariance matrix ${\bf \Sigma}$ sets the width of the Gaussian prior. For a flat (uniform) prior on a parameter, set the corresponding element of ${\bf X}_{0}$ to 0 (technically, any value works) and set the corresponding diagonal element of the $N \times N$ inverse covariance matrix ${\bf \Sigma}^{-1}$ to 0, along with any off-diagonal elements in its row and column; here $N$ is the number of parameters, and a zero inverse variance corresponds to an infinite standard deviation for that parameter.
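
As a concrete illustration of this recipe, the sketch below builds ${\bf X}_0$ and ${\bf \Sigma}^{-1}$ for a Gaussian prior on the slope $m$ and a flat prior on the intercept $b$. The prior mean and width are made-up values chosen only for the example, and `prior_chi2` is a hypothetical helper.

```python
import numpy as np

# Hypothetical prior: Gaussian on the slope m, flat on the intercept b.
# Parameter order follows the columns of A: X = (b, m).
m_prior_mean = 1.0    # illustrative value only
m_prior_sigma = 0.5   # illustrative value only

X0 = np.array([0.0, m_prior_mean])                   # the entry for b is arbitrary
Sigma_inv = np.diag([0.0, 1.0 / m_prior_sigma**2])   # zero inverse variance = flat prior on b

def prior_chi2(X):
    """Prior term (X - X0)^T Sigma^-1 (X - X0)."""
    d = np.asarray(X, dtype=float) - X0
    return d @ Sigma_inv @ d

# Changing b does not change the prior term; only m matters
print(prior_chi2([0.0, 1.5]))
print(prior_chi2([500.0, 1.5]))
```

Working with ${\bf \Sigma}^{-1}$ rather than ${\bf \Sigma}$ is what makes the flat-prior case representable, since the corresponding ${\bf \Sigma}$ would have an infinite entry.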

The new ${\bf V}$ takes the form:

\begin{equation}
    {\bf V}^{-1} = {\bf A}^{T} {\bf C}^{-1} {\bf A} + {\bf \Sigma}^{-1}
\end{equation}

i.e. the inverse variance matrices add linearly.

The new ${\bf W}$ takes the form:

\begin{equation}
    {\bf W} = ( {\bf A}^{T} {\bf C}^{-1} {\bf A} + {\bf \Sigma}^{-1} )^{-1}( {\bf A}^{T} {\bf C}^{-1} {\bf Y} + {\bf \Sigma}^{-1} {\bf X}_{0} )
\end{equation}
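
The expressions for ${\bf V}$ and ${\bf W}$ with a prior can be tested in the same way as before. The sketch below assumes that the constant term becomes ${\bf U} = {\bf Y}^T {\bf C}^{-1} {\bf Y} + {\bf X}_0^T {\bf \Sigma}^{-1} {\bf X}_0 - {\bf W}^T {\bf V}^{-1} {\bf W}$, which is what completing the square gives but is not stated in the text above. The prior values are made up for illustration.

```python
import numpy as np

x = np.array([201, 244, 47, 287, 203, 58, 210, 202, 198, 158,
              165, 201, 157, 131, 166, 160, 186, 125, 218, 146], dtype=float)
y = np.array([592, 401, 583, 402, 495, 173, 479, 504, 510, 416,
              393, 442, 317, 311, 400, 337, 423, 334, 533, 344], dtype=float)
sigma_y = np.array([61, 25, 38, 15, 21, 15, 27, 14, 30, 16,
                    14, 25, 52, 16, 34, 31, 42, 26, 16, 22], dtype=float)

A = np.vstack((np.ones_like(x), x)).T
inv_var = 1.0 / sigma_y**2

# Hypothetical prior: flat on b, Gaussian on m with mean 1.0 and width 0.1
X0 = np.array([0.0, 1.0])
Sigma_inv = np.diag([0.0, 1.0 / 0.1**2])

# V^{-1} = A^T C^{-1} A + Sigma^{-1}
Vinv = A.T @ (A * inv_var[:, None]) + Sigma_inv
# W = V (A^T C^{-1} Y + Sigma^{-1} X0)
W = np.linalg.solve(Vinv, A.T @ (y * inv_var) + Sigma_inv @ X0)
# Constant term (assumption: obtained by completing the square)
U = y @ (inv_var * y) + X0 @ Sigma_inv @ X0 - W @ Vinv @ W

def chi2_total(X):
    """Data term plus prior term."""
    r = y - A @ X
    d = X - X0
    return r @ (inv_var * r) + d @ Sigma_inv @ d

X_trial = np.array([150.0, 1.5])   # arbitrary trial parameters (b, m)
lhs = chi2_total(X_trial)
rhs = (X_trial - W) @ Vinv @ (X_trial - W) + U
print(lhs, rhs)
```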

The expression for ${\bf W}$ is equivalent to the product rule for multivariate normal distributions; see, for example, Section 8.1.8 of the Matrix Cookbook: http://compbio.fmph.uniba.sk/vyuka/ml/old/2008/handouts/matrix-cookbook.pdf

Note that this is also the linear-algebra solution to the linear best fit.
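
The connection to the product of two Gaussians can be made explicit: the likelihood alone is a Gaussian in ${\bf X}$ with mean and inverse covariance given by the no-prior solution, and multiplying it by the prior adds the inverse covariances and combines the means weighted by them. The sketch below compares that route with the expression for ${\bf W}$ above; the prior values and the names `W_like`, `W_product`, and `W_text` are illustrative assumptions.

```python
import numpy as np

x = np.array([201, 244, 47, 287, 203, 58, 210, 202, 198, 158,
              165, 201, 157, 131, 166, 160, 186, 125, 218, 146], dtype=float)
y = np.array([592, 401, 583, 402, 495, 173, 479, 504, 510, 416,
              393, 442, 317, 311, 400, 337, 423, 334, 533, 344], dtype=float)
sigma_y = np.array([61, 25, 38, 15, 21, 15, 27, 14, 30, 16,
                    14, 25, 52, 16, 34, 31, 42, 26, 16, 22], dtype=float)

A = np.vstack((np.ones_like(x), x)).T
inv_var = 1.0 / sigma_y**2

# Hypothetical prior: flat on b, Gaussian on m with mean 1.0 and width 0.1
X0 = np.array([0.0, 1.0])
Sigma_inv = np.diag([0.0, 1.0 / 0.1**2])

# Likelihood alone: Gaussian in X with mean W_like and inverse covariance V_like_inv
V_like_inv = A.T @ (A * inv_var[:, None])
W_like = np.linalg.solve(V_like_inv, A.T @ (y * inv_var))

# Product of two Gaussians: inverse covariances add,
# and the means combine weighted by their inverse covariances
V_post_inv = V_like_inv + Sigma_inv
W_product = np.linalg.solve(V_post_inv, V_like_inv @ W_like + Sigma_inv @ X0)

# Expression for W from the text
W_text = np.linalg.solve(V_post_inv, A.T @ (y * inv_var) + Sigma_inv @ X0)

print(W_like)
print(W_product)
print(W_text)
```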