# Normal distribution and belief propagation 

In the following we derive belief propagation formulae for Hidden Markov Models where the states $x_i$ and the observations $y_i$ are continuous real variables and the correspondence between them is governed by the linear equations 

\begin{align*}
x_{i+1}&=ax_i + b +w_i\\
y_i&=cx_i+ d +v_i 
\end{align*}

where 
* the initial state $x_0$ is fixed;
* state disturbances $w_i$ are modelled by a normal distribution $\mathcal{N}(0, \sigma_i)$;
* measurement noise $v_i$ is modelled by a normal distribution $\mathcal{N}(0, \tau_i)$;
* all random variables  $w_0, \ldots, w_{n-1}, v_1, \ldots, v_n$ are assumed to be independent.


The setup where the states and observations are vectors of real numbers and the update rules are vector equations is analogous but highly technical and thus omitted from this tutorial. 

## I. Known facts about normal distributions 

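To simplify the treatment, we introduce a couple of facts about univariate and multivariate normal distributions. Throughout, $\mathcal{N}(\mu,\sigma)$ denotes the normal distribution with mean $\mu$ and standard deviation $\sigma$.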


### Closure under linear combinations

A linear combination $v=\alpha_1 u_1+\alpha_2 u_2+\cdots+\alpha_n u_n$ of independent normally distributed random variables

\begin{align*}
u_1&\sim\mathcal{N}(\mu_1,\sigma_1)\\
u_2&\sim\mathcal{N}(\mu_2,\sigma_2)\\
\cdots&\sim\cdots\\
u_n&\sim\mathcal{N}(\mu_n,\sigma_n)
\end{align*}

is also normally distributed, $v\sim\mathcal{N}(\mu, \sigma)$, where the parameters $\mu$ and $\sigma$ can be determined with moment matching, that is, by computing the mean and the variance of $v$ directly.
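As a small illustration of moment matching, the sketch below computes the parameters of a linear combination from the standard identities $\mu=\sum_k\alpha_k\mu_k$ and $\sigma^2=\sum_k\alpha_k^2\sigma_k^2$ for independent terms. The function name `linear_combination_params` is hypothetical and introduced only for this example; the second parameter of each distribution is treated as a standard deviation.

```python
import math

def linear_combination_params(alphas, mus, sigmas):
    """Mean and standard deviation of sum_k alphas[k] * u_k for independent
    u_k ~ N(mus[k], sigmas[k]); every sigma is a standard deviation."""
    # The mean is linear in the terms.
    mu = sum(a * m for a, m in zip(alphas, mus))
    # Variances of independent terms add up after scaling by alpha^2.
    var = sum((a * s) ** 2 for a, s in zip(alphas, sigmas))
    return mu, math.sqrt(var)

# Example: v = 2*u_1 - u_2 with u_1 ~ N(1, 3) and u_2 ~ N(3, 8).
mu, sigma = linear_combination_params([2.0, -1.0], [1.0, 3.0], [3.0, 8.0])
print(mu, sigma)
```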


### Closure under conditioning with linear constraints

Let $\boldsymbol{x}$ be distributed according to a multivariate normal distribution and let $A\boldsymbol{x}=\boldsymbol{b}$ be a linear constraint on $\boldsymbol{x}$. Then the conditional distribution 
$p[\boldsymbol{x}|A\boldsymbol{x}=\boldsymbol{b}]$ can be expressed as a multivariate normal distribution, i.e., $\boldsymbol{x}|A\boldsymbol{x}=\boldsymbol{b}$ is normally distributed.


## II. Belief propagation for Markov chains

Let us first consider the Markov chain with real-valued state that evolves according to the affine update rule 

\begin{align*}
x_{i+1}=a x_{i} + b + w_i 
\end{align*}

where the initial state $x_0$ is a fixed number and the error terms satisfy the following restrictions:

* all error terms $w_i$ are independent;  
* each error term $w_i$ is distributed according to $\mathcal{N}(0,\sigma_i)$.

**First observation.** The state $x_1$ is normally distributed, as it is an affine transformation of $w_0$: $x_1=ax_0+b+w_0$ with $x_0$ fixed. 


**Second observation.** If the state $x_i$ is distributed according to the normal distribution $\mathcal{N}(\mu_i,\rho_i)$, then closure under linear combinations assures that $x_{i+1}$ is also normally distributed, $x_{i+1}\sim\mathcal{N}(\mu_{i+1},\rho_{i+1})$, because $x_i$ and $w_i$ are independent. 
By induction, all states $x_{1}, \ldots, x_n$ are normally distributed.


In principle, it is possible to derive the density function of $x_{i+1}$ directly as

\begin{align}\tag{P1} 
\pi[x_{i+1}]&=\int\limits_{-\infty}^\infty p[x_i|x_0]\cdot p[x_{i+1}|x_i]\cdot dx_i = \int\limits_{-\infty}^\infty \pi[x_i]\cdot p[x_{i+1}|x_i]\cdot dx_i\enspace. 
\end{align}

However, the resulting integral

\begin{align*}
\pi[x_{i+1}]&=\frac{1}{2\pi\rho_i\sigma_i}\cdot\int\limits_{-\infty}^\infty 
\exp\left(-\frac{(x_i-\mu_i)^2}{2\rho_i^2}\right)\cdot \exp\left(-\frac{(x_{i+1}-ax_i-b)^2}{2\sigma_i^2}\right)\cdot dx_i
\end{align*}

is technically hard to compute, and it is much simpler to use moment matching

\begin{align*}
\mathbf{E}(x_{i+1})&=\mathbf{E}(ax_{i}+b+w_i)=a\mu_i+b\\
\mathbf{D}(x_{i+1})&=\mathbf{D}(ax_{i}+b+w_i)=a^2\rho_i^2+\sigma_i^2
\end{align*}

to derive the parameters of the resulting normal distribution

\begin{align*}
\mu_{i+1}  &=a\mu_i+b\\
\rho_{i+1} &=\sqrt{a^2\rho_i^2+\sigma_i^2}\enspace. 
\end{align*}

In other words, the prior propagation can be done by iteratively updating the parameters $\mu_i$ and $\rho_i$ from the start of the chain to the end of the chain, starting from $\mu_0=x_0$ and $\rho_0=0$. 
For non-homogeneous Markov chains, the update rule simply varies from node to node.
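The iterative update can be sketched in a few lines of Python. This is a minimal illustration under the assumptions of this section (fixed $x_0$, constant $a$ and $b$, independent $w_i\sim\mathcal{N}(0,\sigma_i)$ with $\sigma_i$ a standard deviation); the function name `propagate_prior` is hypothetical. Note that the mean update includes the offset $b$, as follows from taking the expectation of $ax_i+b+w_i$.

```python
import math

def propagate_prior(x0, a, b, sigmas):
    """Return [(mu_1, rho_1), ..., (mu_n, rho_n)] for x_{i+1} = a*x_i + b + w_i,
    where sigmas[i] is the standard deviation of w_i and x_0 is fixed."""
    mu, rho = x0, 0.0  # a fixed initial state is a point mass at x0
    params = []
    for sigma in sigmas:
        # Moment matching: the mean passes through the affine map,
        # variances of independent terms add up.
        mu = a * mu + b
        rho = math.sqrt(a * a * rho * rho + sigma * sigma)
        params.append((mu, rho))
    return params

priors = propagate_prior(x0=1.0, a=2.0, b=1.0, sigmas=[3.0, 8.0])
print(priors)
```

For a non-homogeneous chain one would pass sequences of $a_i$ and $b_i$ instead of constants and pick the matching entry in each iteration.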


## III. Likelihood propagation and a reverse chain 

The second essential component of belief propagation is likelihood propagation through the formula

\begin{align}\tag{L1}
  \lambda[x_{i}] = p[x_n|x_i]= \int\limits_{-\infty}^\infty p[x_n|x_{i+1}]\cdot p[x_{i+1}|x_i]\cdot dx_{i+1}= \int\limits_{-\infty}^\infty \lambda[x_{i+1}]\cdot p[x_{i+1}|x_i]\cdot dx_{i+1}
\end{align}

which is surprisingly similar to the prior propagation formula. In fact, we can define a reverse Markov chain with carefully crafted transition probabilities $q[x_{i}|x_{i+1}]$ so that the corresponding prior satisfies $\pi^*[x_i]=c_i\lambda[x_i]$ for some constant $c_i$. 
For that it is sufficient that

\begin{align}\tag{R1}
   c_i\cdot\int\limits_{-\infty}^\infty \pi^*[x_{i+1}]\cdot p[x_{i+1}|x_i]\cdot dx_{i+1}= c_{i}c_{i+1}\lambda[x_i]=
   c_{i+1}\cdot\int\limits_{-\infty}^\infty \pi^*[x_{i+1}]\cdot q[x_i|x_{i+1}]\cdot dx_{i+1}\enspace
\end{align}

which is clearly satisfied when we define 

\begin{align}\tag{R2}
 q[x_i|x_{i+1}]\propto p[x_{i+1}|x_i]\enspace,
\end{align}

where the hidden coefficient just normalises the density $q[x_i|x_{i+1}]$. 
Now note that 

\begin{align*}
p[x_{i+1}|x_i]\propto \exp\left( -\frac{(x_{i+1}-ax_i-b)^2}{2\sigma_i^2}\right)
= \exp\left( -\frac{\left(x_i-\frac{x_{i+1}}{a}+\frac{b}{a}\right)^2}{2\left(\frac{\sigma_i}{a}\right)^2}\right)
\end{align*}

which indicates that, for $a\neq 0$, $q[x_i|x_{i+1}]$ can be defined as a normal distribution $\mathcal{N}(\mu_i^*, \sigma_i^*)$ with parameters

\begin{align*}
\mu_i^*&= \frac{x_{i+1}}{a}-\frac{b}{a}\\
\sigma^*_i&=\frac{\sigma_i}{|a|}\enspace.
\end{align*}

Note that the affine update rule

\begin{align*}
x_i=\frac{x_{i+1}-b-w_i}{a}
\end{align*}

gives exactly the same density, and thus we can compute the likelihood as the prior in the reverse chain. 
Let the corresponding distribution $\pi^*[x_i]$ be denoted by $\mathcal{N}(\mu_i^*, \rho_i^*)$, where from now on $\mu_i^*$ stands for the mean of $\pi^*[x_i]$ rather than for the conditional mean above.  
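The reverse chain can be propagated with the same moment matching as the forward chain. The sketch below assumes $a\neq 0$, a fixed end state $x_n$ as the starting point of the reverse chain, and the reverse update $x_i=(x_{i+1}-b-w_i)/a$ obtained by solving the forward rule for $x_i$; the function name `propagate_likelihood` is hypothetical.

```python
import math

def propagate_likelihood(xn, a, b, sigmas):
    """Return [(mu*_0, rho*_0), ..., (mu*_{n-1}, rho*_{n-1})], the parameters of
    the reverse-chain priors, where sigmas[i] is the standard deviation of w_i."""
    if a == 0:
        raise ValueError("the reverse chain is only defined for a != 0")
    mu, rho = xn, 0.0  # the observed end state is a point mass at xn
    params = []
    for sigma in reversed(sigmas):
        # Moment matching for x_i = (x_{i+1} - b - w_i) / a.
        mu = (mu - b) / a
        rho = math.sqrt(rho * rho + sigma * sigma) / abs(a)
        params.append((mu, rho))
    params.reverse()  # store in chain order i = 0, ..., n-1
    return params

likelihoods = propagate_likelihood(xn=7.0, a=2.0, b=1.0, sigmas=[3.0, 8.0])
print(likelihoods)
```

Each returned pair describes $\lambda[x_i]$ only up to a multiplicative constant, which is all that the smoothing step needs.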

## IV. Smoothing for Markov chains

From the previous results we know that the prior and the likelihood in our Markov chain are proportional to densities of normal distributions, and thus we are left 
with the following analytical simplification task

\begin{align}\tag{S1}
p[x_i|x_0,x_n]\propto \pi[x_i]\cdot \lambda[x_i]\propto \exp\left(-\frac{(x_i-\mu_i)^2}{2\rho_i^2}\right)\cdot \exp\left(-\frac{(x_i-\mu_i^*)^2}{2{\rho_i^*}^2}\right)\enspace.
\end{align}

Again this is technically demanding unless you notice that we can define a two-dimensional normal distribution with independent components

\begin{align*}
\xi_1&\sim \mathcal{N}(\mu_i, \rho_i)\\
\xi_2&\sim \mathcal{N}(\mu_i^*, \rho_i^*)
\end{align*}

and consider its conditional distribution under the constraint $\xi_1=\xi_2$.
As the normal distribution is closed under conditioning with linear constraints, we know that the resulting distribution is again a normal distribution. 
The easiest way to derive its parameters is direct manipulation of the expression
  
  
\begin{align*}
p[x_i| x_0, x_n]
&\propto\exp\Biggl(-\frac{(x_i-\mu_i)^2}{2\rho_i^2}\Biggr)\cdot
\exp\biggl(-\frac{(x_{i}-\mu_i^*)^2}{2\rho_i^{*2}}\biggr)\\
&\propto\exp\Biggl(-\frac{\rho_i^{*2}(x_i-\mu_i)^2+ \rho_i^2(x_i-\mu^*_i)^2}{2\rho_i^2\rho_i^{*2}}\Biggr)\\
&\propto\exp\Biggl(-\frac{(\rho_i^{*2}+\rho_i^2)x_i^2-2(\rho_i^{*2}\mu_i+\rho_i^2\mu_i^*)x_i}{2\rho_i^2\rho_i^{*2}}\Biggr)\\
&\propto\exp\Biggl(-\frac{\rho_i^{*2}+\rho_i^2}{2\rho_i^2\rho_i^{*2}}\cdot 
\biggl(x_i^2-2\cdot\frac{\rho_i^{*2}\mu_i+\rho_i^2\mu_i^*}{\rho_i^{*2}+\rho_i^2}x_i\biggr)\Biggr)\\
&\propto\exp\left(- 
\frac{\biggl(x_i-\frac{\rho_i^{*2}\mu_i+\rho_i^2\mu_i^*}{\rho_i^{*2}+\rho_i^2}\biggr)^2}
{2\frac{\rho_i^2\rho_i^{*2}}{\rho_i^{*2}+\rho_i^2}}\right)\enspace.
\end{align*}

From this we can conclude that the conditional distribution $x_i|x_0,x_n$ indeed follows a normal distribution $\mathcal{N}(\mu, \sigma)$ with parameters

\begin{align*}
\mu&=\frac{\rho_i^{*2}\mu_i+\rho_i^2\mu_i^*}{\rho_i^{*2}+\rho_i^2}\\
\sigma^2&=\frac{\rho_i^2\rho_i^{*2}}{\rho_i^{*2}+\rho_i^2}\enspace.
\end{align*}

**Fusion formula.**
This result is more general. Whenever a density is proportional to the product of the densities of two normal distributions $\mathcal{N}(\mu_1, \sigma_1)$ and $\mathcal{N}(\mu_2, \sigma_2)$, it is again the density of a normal distribution $\mathcal{N}(\mu,\sigma)$ with parameters

\begin{align*}
\mu&=\frac{\sigma_2^{2}\mu_1+\sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2}\\
\sigma^2&=\frac{\sigma_1^2\sigma_2^{2}}{\sigma_1^{2}+\sigma_2^2}\enspace.
\end{align*}
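The fusion formula is easy to check numerically. The following sketch implements it and verifies at a few sample points that the product of the two densities differs from the fused density only by a constant factor. The helper names `fuse` and `normal_pdf` are hypothetical, and all second parameters are standard deviations.

```python
import math

def fuse(mu1, sigma1, mu2, sigma2):
    """Parameters (mean, standard deviation) of the normal density that is
    proportional to the product of N(mu1, sigma1) and N(mu2, sigma2)."""
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    mu = (var2 * mu1 + var1 * mu2) / (var1 + var2)
    sigma = math.sqrt(var1 * var2 / (var1 + var2))
    return mu, sigma

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = fuse(0.0, 3.0, 10.0, 4.0)

# If the formula holds, this ratio is the same constant for every x.
ratios = [
    normal_pdf(x, 0.0, 3.0) * normal_pdf(x, 10.0, 4.0) / normal_pdf(x, mu, sigma)
    for x in (-1.0, 2.0, 5.0)
]
print(mu, sigma, ratios)
```

The fused mean is a weighted average of the two means in which the more certain source (smaller variance) gets the larger weight, and the fused variance is smaller than either input variance.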




## V. Filtering for Hidden Markov Models

Let $f[x_i]=p[x_i|x_0, y_1,\ldots, y_{i}]$ denote the desired conditional density and
recall that the belief propagation in Hidden Markov Models is governed by the following equation

\begin{align*}
\pi[x_{i+1}]=p[x_{i+1}|x_0, y_1,\ldots, y_{i}]=\int\limits_{-\infty}^\infty p[x_{i+1},x_i|x_0, y_1,\ldots, y_{i}]\cdot dx_i=\int\limits_{-\infty}^\infty p[x_{i+1}|x_i]\cdot p[x_i|x_0, y_1,\ldots, y_{i}] \cdot dx_i\enspace.
\end{align*}

Then we get the relation 

\begin{align}\tag{F1}
\pi[x_{i+1}]=\int\limits_{-\infty}^\infty p[x_{i+1}|x_i]\cdot f[x_i] \cdot dx_i 
\end{align}

where

\begin{align}\tag{F2}
f[x_i]= p[x_i|x_0, y_1,\ldots, y_{i}]\propto p[y_i|x_i]\cdot \pi[x_i]\enspace.
\end{align}

**First observation.** The prior $\pi[x_1]$ is the density of a normal distribution: there are no observations to condition on, and the state $x_1$ is normally distributed as an affine transformation of $w_0$. 


**Second observation.** If the prior $\pi[x_i]$ is proportional to the density of a normal distribution, then $f[x_i]$ is also proportional to the density of a normal distribution.

Indeed, the formula (F2) is analogous to the smoothing formula (S1), where both factors are densities of normal distributions.
As the emission probability $p[y_i|x_i]$, viewed as a function of $x_i$, is also proportional to the density of a normal distribution, the analogous reasoning assures that $f[x_i]$ is the density of a normal distribution.


**Third observation.** If $f[x_i]$ is proportional to the density of a normal distribution, then the next prior $\pi[x_{i+1}]$ is proportional to the density of a normal distribution.

Indeed, the formula (F1) is structurally identical to the prior propagation formula (P1) for Markov chains and thus the claim follows.

**Corollary.** The priors $\pi[x_i]$ and the filtering densities $f[x_i]$ are all densities of normal distributions, and their parameters can be found by moment matching and the fusion formula.
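The corollary translates into a short filtering loop that alternates moment matching (F1) with the fusion formula (F2). The sketch below is an illustration under stated assumptions rather than a definitive implementation: it assumes $c\neq 0$ and uses the fact that $p[y_i|x_i]$, viewed as a function of $x_i$, is proportional to the density of $\mathcal{N}\bigl((y_i-d)/c,\ \tau_i/|c|\bigr)$. The names `fuse` and `filter_hmm` are hypothetical.

```python
import math

def fuse(mu1, sigma1, mu2, sigma2):
    """Fusion formula; all sigmas are standard deviations."""
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    mu = (var2 * mu1 + var1 * mu2) / (var1 + var2)
    return mu, math.sqrt(var1 * var2 / (var1 + var2))

def filter_hmm(x0, ys, a, b, c, d, sigmas, taus):
    """Return [(prior_i, filtered_i)] for i = 1..n, each a (mean, std) pair.
    ys[i-1] is y_i, sigmas[i-1] is the std of w_{i-1}, taus[i-1] the std of v_i."""
    if c == 0:
        raise ValueError("observations carry no information about x when c == 0")
    mu, rho = x0, 0.0  # a fixed initial state is a point mass at x0
    result = []
    for y, sigma, tau in zip(ys, sigmas, taus):
        # Prediction step (F1): moment matching through the state equation.
        mu_prior = a * mu + b
        rho_prior = math.sqrt(a * a * rho * rho + sigma * sigma)
        # Correction step (F2): fuse the prior with the emission term,
        # rewritten as a normal density in x_i.
        mu, rho = fuse(mu_prior, rho_prior, (y - d) / c, tau / abs(c))
        result.append(((mu_prior, rho_prior), (mu, rho)))
    return result

steps = filter_hmm(x0=1.0, ys=[27.0], a=2.0, b=1.0, c=2.0, d=1.0,
                   sigmas=[3.0], taus=[8.0])
print(steps)
```

In this scalar setting the loop computes the same quantities as a one-dimensional Kalman filter, written in terms of the fusion formula instead of a gain.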

## VI. Likelihood propagation for Hidden Markov Models

We can reuse the techniques for propagating likelihoods in a Markov chain. Let us define a reverse chain with the property that the likelihood $\lambda[x_i]$ is proportional to the prior $\pi^*[x_i]$ in the reverse chain. 
For that we need to expand the likelihood 

\begin{align}\tag{L2}
\lambda[x_i]= p[y_{i+1},\ldots, y_n| x_i]=\int\limits_{-\infty}^\infty p[y_{i+1},\ldots, y_n| x_i, x_{i+1}]\cdot p[x_{i+1}|x_i]\cdot dx_{i+1}=\int\limits_{-\infty}^\infty p[y_{i+1},\ldots, y_n| x_{i+1}]\cdot p[x_{i+1}|x_i]\cdot dx_{i+1}
\end{align}

while the prior in the reverse chain can be expressed as

\begin{align}\tag{R3}
\pi^*[x_i]=\int\limits_{-\infty}^\infty q[x_i|x_{i+1}]\cdot q[x_{i+1}|y_{i+1},\ldots, y_{n}] \cdot dx_{i+1}\enspace,
\end{align}

where $q[\cdot|\cdot]$ stands for conditional densities in the reverse chain. 
Again note that 

\begin{align}\tag{R4}
f^*[x_{i+1}]=q[x_{i+1}|y_{i+1},\ldots, y_{n}]\propto  q[y_{i+1}|x_{i+1}]\cdot q[x_{i+1}|y_{i+2},\ldots, y_n]=q[y_{i+1}|x_{i+1}]\cdot\pi^*[x_{i+1}]\enspace. 
\end{align}

The structural similarity between the equation pairs (R3)-(R4) and (F1)-(F2) guarantees that if we define 

\begin{align*}
q[x_i|x_{i+1}]\propto p[x_{i+1}|x_i]
\end{align*}

and define $y_n$ as the starting point of the reverse Hidden Markov Model, we get the desired property $\lambda[x_i]\propto \pi^*[x_i]$.

## VII. Smoothing for Hidden Markov Models

Again let us start from the observation

\begin{align*}
p[x_i|y_1,\ldots,y_i,\ldots,y_n]\propto p[y_{i},\ldots, y_{n}|x_i, y_1,\ldots, y_{i-1}]\cdot p[x_i|y_1,\ldots, y_{i-1}]\propto p[y_i|x_i]\cdot \lambda[x_i] \cdot\pi[x_i]\enspace.
\end{align*}

As all terms in the expression are proportional to densities of normal distributions in $x_i$, closure under conditioning with linear constraints assures that the result is also a normal distribution.
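
The parameters of the smoothed distribution can be written out by applying the fusion formula twice. The following is a sketch under assumptions that go beyond the text above: $c\neq 0$, the prior is $\pi[x_i]\sim\mathcal{N}(\mu_i,\rho_i)$, the likelihood satisfies $\lambda[x_i]\propto\mathcal{N}(\mu_i^*,\rho_i^*)$ in analogy with the Markov chain case, and the emission term $p[y_i|x_i]$ is read as a density in $x_i$ proportional to $\mathcal{N}\bigl((y_i-d)/c,\ \tau_i/|c|\bigr)$. In terms of inverse variances the fusion formula is additive, which gives for $x_i|y_1,\ldots,y_n\sim\mathcal{N}(\mu,\sigma)$

\begin{align*}
\frac{1}{\sigma^2}&=\frac{1}{\rho_i^2}+\frac{c^2}{\tau_i^2}+\frac{1}{\rho_i^{*2}}\\
\mu&=\sigma^2\cdot\biggl(\frac{\mu_i}{\rho_i^2}+\frac{c\,(y_i-d)}{\tau_i^2}+\frac{\mu_i^*}{\rho_i^{*2}}\biggr)\enspace.
\end{align*}

Dropping the middle term recovers the smoothing parameters for plain Markov chains derived earlier.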
