---
title: Gaussian Posterior Formulas
description: Some useful formulas for Gaussian posteriors.
date: 7/9/2022
draft: false
bibliography: references.bib
csl: annals_statistics.csl
format:
  html:
    code-fold: true
---

This post is a (working) collection of formulas useful for computing posteriors involving Gaussian likelihoods and priors. All formulas are given in terms of precisions rather than covariances, and all precision matrices are assumed to be symmetric positive definite (in particular, nonsingular).

# Notation

Recall that $x \sim \mathcal{N}\left( \mu_x, Q_{x}^{-1}   \right)$ denotes a random variable with density
$$
\begin{align*}
\pi(x) &= \frac{1}{\sqrt{(2 \pi)^n \det \left( Q_x^{-1} \right)}} \exp\left\{ - \frac{1}{2} \left( x - \mu_x \right)^T Q_x \left( x - \mu_x \right) \right\} \\
&\propto \exp\left\{ - \frac{1}{2} \left( x - \mu_x \right)^T Q_x \left( x - \mu_x \right) \right\}
\end{align*} 
$$
where $\mu_x \in \mathbb{R}^{n \times 1}$, $Q_x \in \mathbb{R}^{n \times n}$. It can be convenient to also work with the *canonical parameterization*^[See chapter 2 of [@Rue2005]] of a Gaussian, where $x \sim \mathcal{N}_C(b_x, Q_x)$ denotes a random variable with density
$$
\pi(x) \propto \exp \left\{ - \frac{1}{2} x^T Q_x x + b_x^T x \right\}.
$$

To condition a Gaussian in the canonical parameterization on an observation $y$, we use 
$$
\begin{align*}
x &\sim \mathcal{N}_C\left( b_x, Q_x \right), \\
y \, | \, x  &\sim \mathcal{N}\left( x, Q_{y}^{-1} \right), \\
x \, | \, y &\sim \mathcal{N}_C\left( b_x + Q_{y} y, Q_{x} + Q_{y}  \right).
\end{align*}
$$
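
As a quick check of this rule (a sketch that uses only the definitions above), multiply the likelihood by the prior and collect the terms in $x$:
$$
\begin{align*}
\pi(x \, | \, y) &\propto \exp\left\{ -\frac{1}{2} \left( y - x \right)^T Q_y \left( y - x \right) \right\} \exp\left\{ -\frac{1}{2} x^T Q_x x + b_x^T x \right\} \\
&\propto \exp\left\{ -\frac{1}{2} x^T \left( Q_x + Q_y \right) x + \left( b_x + Q_y y \right)^T x \right\},
\end{align*}
$$
where the term $-\frac{1}{2} y^T Q_y y$ does not depend on $x$ and has been absorbed into the proportionality constant, and the symmetry of $Q_y$ has been used to combine the two cross terms.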

To convert between the two notations, we can use
$$
\begin{align*}
x \sim \mathcal{N}\left( \mu_x, Q_x^{-1} \right) &\Leftrightarrow x \sim \mathcal{N}_C\left( Q_x \mu_x, Q_x \right), \\
x \sim \mathcal{N}_C\left( b_x, Q_x \right) &\Leftrightarrow x \sim \mathcal{N}\left( Q_x^{-1} b_x , Q_x^{-1} \right).
\end{align*}
$$
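
The conversion and conditioning rules above translate almost line for line into NumPy. The following is a minimal sketch, not a reference implementation; the function names and the small 2-d example values are hypothetical and chosen only for illustration.

```python
import numpy as np

def moment_to_canonical(mu, Q):
    """N(mu, Q^{-1}) -> N_C(b, Q), with b = Q mu."""
    return Q @ mu, Q

def canonical_to_moment(b, Q):
    """N_C(b, Q) -> N(mu, Q^{-1}), with mu = Q^{-1} b (solve, do not invert)."""
    return np.linalg.solve(Q, b), Q

def condition_canonical(b_x, Q_x, y, Q_y):
    """Posterior N_C(b_x + Q_y y, Q_x + Q_y) when y | x ~ N(x, Q_y^{-1})."""
    return b_x + Q_y @ y, Q_x + Q_y

# Hypothetical 2-d example values.
Q_x = np.array([[2.0, 0.5], [0.5, 1.0]])
mu_x = np.array([1.0, -1.0])
Q_y = 4.0 * np.eye(2)
y = np.array([0.5, 0.0])

b_x, _ = moment_to_canonical(mu_x, Q_x)
b_post, Q_post = condition_canonical(b_x, Q_x, y, Q_y)
mu_post, _ = canonical_to_moment(b_post, Q_post)
```

In the canonical form, conditioning is just addition, which is why it is convenient to stay in that form until a mean is actually needed.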

# Gaussian Prior and Likelihood
Using a Gaussian prior with Gaussian observations, we have
$$
\begin{align*}
x &\sim \mathcal{N}\left( \mu_x, Q_x^{-1} \right), \\
y \, | \, x &\sim \mathcal{N} \left( G_x x, Q_{y}^{-1} \right), \\
Q_{x \, | \, y} &=  Q_x + G_x^T Q_{y} G_x,  \\
b_{x \, | \, y} &= Q_x \mu_x + G_x^T Q_{y} y, \\
\mu_{x \, | \, y} &= Q_{x \, | \, y}^{-1} b_{x \, | \, y}, \\
x \, | \, y &\sim \mathcal{N}_C \left( b_{x \, | \, y}, Q_{x \, | \, y} \right), \\
x \, | \, y &\sim \mathcal{N} \left(  \mu_{x \, | \, y}  , Q_{x \, | \, y}^{-1} \right),
\end{align*}
$${#eq-gaussian_prior_gaussian_likelihood}
where the corresponding densities are
$$
\begin{align*}
\pi(x) &\propto \exp\left\{ -\frac{1}{2} \left( x - \mu_x \right)^T Q_x \left( x - \mu_x \right)   \right\}, \\
\pi(y \, | \, x) &\propto \exp\left\{ -\frac{1}{2} \left( y - G_x x \right)^T Q_{y} \left( y - G_x x \right)   \right\}, \\
\pi(x \, | \, y) &\propto \exp\left\{ -\frac{1}{2} \left( x - \mu_{x \, | \, y} \right)^T Q_{x \, | \, y}  \left(  x - \mu_{x \, | \, y} \right)    \right\}, \\
\pi(x \, | \, y) &\propto \exp\left\{ -\frac{1}{2} x^T Q_{x \, | \, y} x + b_{x \, | \, y}^T x   \right\}.
\end{align*}
$$
Note that the last two are equivalent.
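
A minimal NumPy sketch of these posterior formulas is given here, together with one way to draw a sample when only the precision is available. The function names and the random test problem are assumptions made for illustration; the sampler uses a dense Cholesky factor, which is only one of several reasonable choices.

```python
import numpy as np

def gaussian_posterior(mu_x, Q_x, G_x, Q_y, y):
    """Posterior mean, precision, and canonical vector b for the linear Gaussian model."""
    Q_post = Q_x + G_x.T @ Q_y @ G_x
    b_post = Q_x @ mu_x + G_x.T @ Q_y @ y
    mu_post = np.linalg.solve(Q_post, b_post)
    return mu_post, Q_post, b_post

def sample_from_precision(mu, Q, rng):
    """Draw one sample from N(mu, Q^{-1}) using the Cholesky factor Q = L L^T."""
    L = np.linalg.cholesky(Q)
    z = rng.standard_normal(mu.shape[0])
    # Solving L^T w = z gives Cov(w) = L^{-T} L^{-1} = Q^{-1}.
    return mu + np.linalg.solve(L.T, z)

# Hypothetical test problem.
rng = np.random.default_rng(0)
n, m = 3, 5
G_x = rng.standard_normal((m, n))
Q_x = 2.0 * np.eye(n)
Q_y = 4.0 * np.eye(m)
mu_x = np.ones(n)
y = rng.standard_normal(m)

mu_post, Q_post, b_post = gaussian_posterior(mu_x, Q_x, G_x, Q_y, y)
draw = sample_from_precision(mu_post, Q_post, rng)
```

The posterior mean computed this way should agree with the covariance-form update $\mu_x + \Sigma_x G_x^T \left( G_x \Sigma_x G_x^T + \Sigma_y \right)^{-1} \left( y - G_x \mu_x \right)$, where $\Sigma = Q^{-1}$, which makes a convenient sanity check.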

## Diagonal Constant Precisions

Suppose that $Q_x = \frac{1}{\gamma^2} I$ and $Q_{y} = \frac{1}{\sigma^2} I$. Then our formulas become:
$$
\begin{align*}
x &\sim \mathcal{N}\left( \mu_x, \gamma^2 I \right), \\
y \, | \, x &\sim \mathcal{N} \left( G_x x, \sigma^2 I \right), \\
Q_{x \, | \, y} &=  \frac{1}{\gamma^2} I + \frac{1}{\sigma^2} G_x^T G_x,  \\
b_{x \, | \, y} &= \frac{1}{\gamma^2}  \mu_x +  \frac{1}{\sigma^2} G_x^T  y, \\
\mu_{x \, | \, y} &= \left(  \frac{1}{\gamma^2} I + \frac{1}{\sigma^2} G_x^T G_x    \right)^{-1}  b_{x \, | \, y}, \\
x \, | \, y &\sim \mathcal{N}_C \left( b_{x \, | \, y}, Q_{x \, | \, y} \right), \\
x \, | \, y &\sim \mathcal{N} \left(  \mu_{x \, | \, y}  , Q_{x \, | \, y}^{-1} \right).
\end{align*}
$$
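
As a worked rearrangement of the posterior mean (with $\alpha = \sigma^2 / \gamma^2$ introduced here only for illustration), multiplying both $Q_{x \, | \, y}$ and $b_{x \, | \, y}$ by $\sigma^2$ gives
$$
\mu_{x \, | \, y} = \left( G_x^T G_x + \alpha I \right)^{-1} \left( G_x^T y + \alpha \mu_x \right),
$$
which is the Tikhonov (ridge) regularized least-squares solution, that is, the minimizer of $\| G_x x - y \|_2^2 + \alpha \| x - \mu_x \|_2^2$. In the scalar case $n = 1$, $G_x = 1$, this reduces to the precision-weighted average
$$
\mu_{x \, | \, y} = \frac{\sigma^2 \mu_x + \gamma^2 y}{\sigma^2 + \gamma^2}.
$$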

## Diagonal Non-Constant Precisions

Suppose that $Q_x = \Pi$ and $Q_{y} = \Lambda$ are diagonal matrices whose diagonal entries may differ. Then these formulas become:
$$
\begin{align*}
x &\sim \mathcal{N}\left( \mu_x, \Pi^{-1} \right), \\
y \, | \, x &\sim \mathcal{N} \left( G_x x, \Lambda^{-1} \right), \\
Q_{x \, | \, y} &=  \Pi + G_x^T \Lambda G_x,  \\
b_{x \, | \, y} &= \Pi \mu_x + G_x^T \Lambda y, \\
\mu_{x \, | \, y} &= \left( \Pi + G_x^T \Lambda G_x \right)^{-1} b_{x \, | \, y}, \\
x \, | \, y &\sim \mathcal{N}_C \left( b_{x \, | \, y}, Q_{x \, | \, y} \right), \\
x \, | \, y &\sim \mathcal{N} \left(  \mu_{x \, | \, y}  , Q_{x \, | \, y}^{-1} \right).
\end{align*}
$$
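
When $\Pi$ and $\Lambda$ are diagonal, they can be stored as vectors and applied by broadcasting instead of forming dense matrices. The sketch below assumes this storage convention; the names `pi_diag` and `lam_diag` and the random example are hypothetical.

```python
import numpy as np

def posterior_diagonal_precisions(mu_x, pi_diag, G_x, lam_diag, y):
    """Posterior mean and precision when Q_x = diag(pi_diag) and Q_y = diag(lam_diag)."""
    # G_x.T * lam_diag scales column j of G_x^T by lam_diag[j], i.e. it forms G_x^T Lambda.
    GtL = G_x.T * lam_diag
    Q_post = np.diag(pi_diag) + GtL @ G_x
    b_post = pi_diag * mu_x + GtL @ y
    return np.linalg.solve(Q_post, b_post), Q_post

# Hypothetical example values.
rng = np.random.default_rng(1)
n, m = 4, 6
G_x = rng.standard_normal((m, n))
pi_diag = rng.uniform(0.5, 2.0, size=n)
lam_diag = rng.uniform(0.5, 2.0, size=m)
mu_x = rng.standard_normal(n)
y = rng.standard_normal(m)

mu_post, Q_post = posterior_diagonal_precisions(mu_x, pi_diag, G_x, lam_diag, y)
```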

# Exact Data Augmentation (EDA)

In the case of diagonal non-constant precisions, it can be useful to introduce auxiliary variables as proposed by [@Marnissi2018] to simplify sampling of conditionals. These methods are known as exact data augmentation (EDA) methods. 

## EDA for the Noise Precision

Suppose that $Q_{y} = \Lambda$ is a diagonal matrix. Define a new variable
$$
u \, | \, x \sim \mathcal{N}\left( G_u x, Q_u^{-1} \right).
$$
Then the joint density is proportional to
$$
\begin{align*}
\pi(u, x \, | \, y) &\propto \pi(u \, | \, x ) \pi(y \, | \, x ) \pi(x) \\ 
&\propto \exp\left\{ -\frac{1}{2} \left( G_x x - y \right)^T \Lambda \left( G_x x - y \right) \right\} \\
&\quad \times \exp\left\{  -\frac{1}{2} \left( u - G_u x  \right)^T Q_u \left( u - G_u x \right)   \right\} \times \pi(x) \\
&\propto \exp\left\{ -\frac{1}{2} \left( x^T G_x^T \Lambda G_x x + y^T \Lambda y - 2 y^T \Lambda G_x x + u^T Q_u u + x^T G_u^T Q_u G_u x - 2 u^T Q_u G_u x     \right)  \right\} \times \pi(x) \\
&\propto \exp\left\{ -\frac{1}{2} \left( x^T \left( G_x^T \Lambda G_x +  G_u^T Q_u G_u  \right) x + u^T Q_u u - 2 x^T \left(  G_x^T \Lambda y + G_u^T Q_u u \right)    \right)  \right\} \times \pi(x).
\end{align*}                                                                                                                        
$$
If we pick
$$
\begin{align*}
G_u &= G_x, \\
Q_u &= \left( \frac{1}{\lambda} I - \Lambda \right),
\end{align*}
$$
where $0 < \lambda < \frac{1}{\| \Lambda \|}$ (so that $Q_u$ is positive definite), then 
$$
\begin{align*}
G_x^T \Lambda G_x +  G_u^T Q_u G_u &= G_x^T \Lambda G_x +  G_x^T \left( \frac{1}{\lambda} I - \Lambda \right) G_x \\
&= \frac{1}{\lambda} G_x^T G_x,
\end{align*}
$$
and the joint density becomes 
$$
\begin{align*}
\pi(u, x \, | \, y) &\propto \exp\left\{ -\frac{1}{2} \left( \frac{1}{\lambda} x^T G_x^T G_x x + u^T Q_u u - 2 x^T G_x^T \left(  \Lambda y + Q_u u \right)    \right)  \right\} \times \pi(x).
\end{align*}
$$
This motivates the change of variables $v = Q_u u$, whose conditional distribution given $x$ is
$$
v \, | \, x \sim \mathcal{N}\left( \left( \frac{1}{\lambda} I - \Lambda \right) G_x x , \left( \frac{1}{\lambda} I - \Lambda \right) \right), 
$$
with which $x$ has the conditional density
$$
\pi(x \, | \, v, y) \propto \exp\left\{ -\frac{1}{2\lambda} \| G_x x - \lambda \left( \Lambda y + v  \right) \|_2^2  \right\} \times \pi(x).
$$


If we pick
$$
x \sim \mathcal{N}\left(0, \left(R^T \Pi R \right)^{-1} \right),
$$
then from our formula in @eq-gaussian_prior_gaussian_likelihood the conditional for $x$ becomes
$$
\begin{align*}
Q_{x \, | \, v} &= R^T \Pi R + \frac{1}{\lambda} G_x^T G_x,  \\
\mu_{x \, | \, v} &= Q_{x \, | \, v}^{-1} \left( G_{x}^T \left( \Lambda y + v \right) \right)  , \\
x \, | \, v, y &\sim \mathcal{N}\left( \mu_{x \, | \, v}, Q_{x \, | \, v}^{-1} \right).
\end{align*}
$$


Some algebra will show that the joint density is given by
$$
\pi(v, x \, | \, y) \propto \exp\left\{ - \frac{1}{2}  \left(  \frac{1}{\lambda} x^T G_x^T G_x x + v^T \Gamma^{-1} v +  y^T \Lambda y - 2 x^T G_x^T \left( \Lambda y + v     \right)     \right) \right\} \times \pi(x),
$$
where $\Gamma = \frac{1}{\lambda} I - \Lambda$ is the covariance of $v \, | \, x$.
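
The two conditionals $v \, | \, x$ and $x \, | \, v, y$ suggest a two-block Gibbs sampler. The following is a rough sketch under stated assumptions: a zero-mean prior $x \sim \mathcal{N}(0, Q_x^{-1})$, the diagonal of $\Lambda$ stored as the vector `lam_diag`, and a dense Cholesky factorization for the $x$ update. The function name and example values are hypothetical, and this is not claimed to be the implementation used in [@Marnissi2018].

```python
import numpy as np

def eda_noise_gibbs(G_x, lam_diag, Q_x, y, lam, n_iter, rng):
    """Two-block Gibbs sampler for (x, v) with a zero-mean prior N(0, Q_x^{-1})."""
    gamma_diag = 1.0 / lam - lam_diag           # diagonal of Gamma = I / lam - Lambda
    if lam <= 0 or np.any(gamma_diag <= 0):
        raise ValueError("need 0 < lam < 1 / max(lam_diag)")
    # Precision of x | v, y; it does not involve Lambda and is factorized once.
    Q_cond = Q_x + (G_x.T @ G_x) / lam
    L = np.linalg.cholesky(Q_cond)              # Q_cond = L L^T
    x = np.zeros(G_x.shape[1])
    samples = np.empty((n_iter, x.size))
    for k in range(n_iter):
        # v | x ~ N(Gamma G_x x, Gamma), with Gamma diagonal.
        v = gamma_diag * (G_x @ x) + np.sqrt(gamma_diag) * rng.standard_normal(y.size)
        # x | v, y ~ N(Q_cond^{-1} G_x^T (Lambda y + v), Q_cond^{-1}).
        b = G_x.T @ (lam_diag * y + v)
        mean = np.linalg.solve(L.T, np.linalg.solve(L, b))
        x = mean + np.linalg.solve(L.T, rng.standard_normal(x.size))
        samples[k] = x
    return samples

# Hypothetical example values.
rng = np.random.default_rng(2)
n, m = 3, 5
G_x = rng.standard_normal((m, n))
lam_diag = rng.uniform(0.5, 2.0, size=m)
lam = 0.9 / lam_diag.max()
Q_x = np.eye(n)
y = rng.standard_normal(m)

samples = eda_noise_gibbs(G_x, lam_diag, Q_x, y, lam, n_iter=500, rng=rng)
```

The point of the augmentation is visible in `Q_cond`: the non-constant diagonal $\Lambda$ no longer appears in the precision of the $x$ update, so a single factorization (or a fast solver for $Q_x + \frac{1}{\lambda} G_x^T G_x$, where one exists) can be reused at every iteration. Marginalizing $v$ out of the joint recovers the original posterior precision, since $Q_x + \frac{1}{\lambda} G_x^T G_x - G_x^T \Gamma G_x = Q_x + G_x^T \Lambda G_x$.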

## EDA for the Prior Precision

Suppose that $\Pi$ is a diagonal matrix and $Q_{x} = R^T \Pi R$. Define a new variable
$$
u \, | \, x \sim \mathcal{N}\left( G_u x, Q_u^{-1} \right).
$$
Then the joint density is proportional to
$$
\begin{align*}
\pi(u, x \, | \, y) &\propto \pi(u \, | \, x) \pi(y \, | \, x ) \pi(x) \\ 
&\propto \exp\left\{ - \frac{1}{2} x^T R^T \Pi R x \right\} \times \exp\left\{  -\frac{1}{2} \left( u - G_u x  \right)^T Q_u \left( u - G_u x \right)   \right\} \times \pi(y \, | \, x) \\
&\propto \exp\left\{ -\frac{1}{2} \left( x^T \left( R^T \Pi R +  G_u^T Q_u G_u  \right) x + u^T Q_u u - 2 x^T G_u^T Q_u u \right)   \right\} \times \pi(y \, | \, x).
\end{align*}                                                                                                                        
$$
If we pick
$$
\begin{align*}
G_u &= R, \\
Q_u &= \left( \frac{1}{\lambda} I - \Pi \right),
\end{align*}
$$
where $0 < \lambda < \frac{1}{\| \Pi \|}$ (so that $Q_u$ is positive definite), then 
$$
\begin{align*}
R^T \Pi R +  G_u^T Q_u G_u &= R^T \Pi R +  R^T \left( \frac{1}{\lambda} I - \Pi \right) R \\
&= \frac{1}{\lambda} R^T R,
\end{align*}
$$
and the joint density becomes 
$$
\begin{align*}
\pi(u, x \, | \, y) &\propto \exp\left\{ -\frac{1}{2} \left( \frac{1}{\lambda} x^T R^T R x + u^T Q_u u - 2 x^T R^T Q_u u  \right)  \right\} \times \pi(y \, | \, x).
\end{align*}
$$

This motivates the change of variables $v = Q_u u$, whose conditional distribution given $x$ is
$$
v \, | \, x \sim \mathcal{N}\left( \left( \frac{1}{\lambda} I - \Pi \right) R x , \left( \frac{1}{\lambda} I - \Pi \right) \right), 
$$
with which $x$ has the conditional density
$$
\pi(x \, | \, v, y) \propto \exp\left\{ -\frac{1}{2\lambda} \| R x - \lambda v \|_2^2  \right\} \times \pi(y \, | \, x).
$$
If we pick
$$
\pi(y \, | \, x) \propto \exp\left\{  -\frac{1}{2} \left( G_x x - y \right)^T \Lambda \left( G_x x - y \right)   \right\},
$$
then some algebra will show that the conditional density satisfies
$$
\pi(x \, | \, v) \propto \exp\left\{ -\frac{1}{2} \left( x^T \left( \frac{1}{\lambda} R^T R + G_x^T \Lambda G_x \right) x \right) + x^T \left( R^T v + G_x^T \Lambda y \right) \right\}.
$$
Matching this to the canonical parameterization gives
$$
\begin{align*}
Q_{x \, | \, v} &= \frac{1}{\lambda} R^T R + G_x^T \Lambda G_x, \\
b_{x \, | \, v} &= R^T v + G_x^T \Lambda y, \\
x \, | \, v &\sim \mathcal{N}_C\left( b_{x \, | \, v}, Q_{x \, | \, v} \right),
\end{align*}
$$
which in the standard parameterization is
$$
x \, | \, v \sim \mathcal{N}\left( Q_{x \, | \, v}^{-1} b_{x \, | \, v} , Q_{x \, | \, v}^{-1} \right).
$$
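
One sweep of the corresponding Gibbs sampler can be sketched as follows. This is an illustrative sketch only: it assumes the diagonals of $\Pi$ and $\Lambda$ are stored as the vectors `pi_diag` and `lam_diag`, that $\frac{1}{\lambda} R^T R + G_x^T \Lambda G_x$ is positive definite, and it uses a dense Cholesky factor. The function name and the example matrices are hypothetical.

```python
import numpy as np

def eda_prior_gibbs_step(x, R, pi_diag, G_x, lam_diag, y, lam, rng):
    """One sweep: draw v | x, then x | v, y. Returns the new (x, v)."""
    gamma_diag = 1.0 / lam - pi_diag            # diagonal of Gamma = I / lam - Pi
    if lam <= 0 or np.any(gamma_diag <= 0):
        raise ValueError("need 0 < lam < 1 / max(pi_diag)")
    # v | x ~ N(Gamma R x, Gamma), with Gamma diagonal.
    v = gamma_diag * (R @ x) + np.sqrt(gamma_diag) * rng.standard_normal(pi_diag.size)
    # x | v, y ~ N_C(b, Q) with Q = R^T R / lam + G_x^T Lambda G_x.
    Q = (R.T @ R) / lam + (G_x.T * lam_diag) @ G_x
    b = R.T @ v + G_x.T @ (lam_diag * y)
    L = np.linalg.cholesky(Q)                   # Q = L L^T
    mean = np.linalg.solve(L.T, np.linalg.solve(L, b))
    x_new = mean + np.linalg.solve(L.T, rng.standard_normal(x.size))
    return x_new, v

# Hypothetical example values; R is an invertible bidiagonal matrix.
rng = np.random.default_rng(3)
n, m = 4, 6
R = np.eye(n) - 0.5 * np.eye(n, k=1)
pi_diag = rng.uniform(0.5, 2.0, size=n)
lam_diag = rng.uniform(0.5, 2.0, size=m)
lam = 0.9 / pi_diag.max()
G_x = rng.standard_normal((m, n))
y = rng.standard_normal(m)

x0 = np.zeros(n)
x_new, v = eda_prior_gibbs_step(x0, R, pi_diag, G_x, lam_diag, y, lam, rng)
```

Since $Q_{x \, | \, v}$ does not depend on $v$, in a full sampler the Cholesky factor would normally be computed once outside the loop rather than on every sweep as in this sketch.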

Some algebra will show that the joint density is given by
$$
\pi(v, x \, | \, y) \propto \exp\left\{ -\frac{1}{2} \left( x^T \left( \frac{1}{\lambda} R^T R + G_x^T \Lambda G_x \right) x +  v^T \Gamma^{-1} v + y^T \Lambda y - 2 v^T R x - 2 y^T \Lambda G_x x  \right) \right\},
$$
where $\Gamma = \frac{1}{\lambda} I - \Pi$ is the covariance of $v \, | \, x$.

