### The problem

Let's prove a wonderfully useful result that crops up in many areas of Bayesian probability, notably in the derivation of the Kalman filter.

Suppose we have a situation where

$$X \sim N(\mu, \sigma^2)$$

and the distribution of $Y$ _conditional on the value of_ $X$ is

$$ Y | X \sim N(X, \tau^2)$$

(In words: once we know the value of $X$, $Y$ is Normally distributed with mean $X$ and variance $\tau^2$.)

The question we want to answer is, in this situation, what is the _marginal_ distribution of $Y$ (before we know the value of $X$)?
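Before deriving the answer, a quick simulation can give a feel for what to expect. The sketch below draws $X$, then draws $Y$ given that $X$, many times over, and prints the sample mean and variance of $Y$. The parameter values (`mu`, `sigma`, `tau`), the seed and the sample size are arbitrary choices made for illustration, not part of the problem.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
mu, sigma, tau = 1.0, 2.0, 1.5

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_y():
    """Draw X from N(mu, sigma^2), then Y from N(X, tau^2)."""
    x = rng.gauss(mu, sigma)
    return rng.gauss(x, tau)


ys = [sample_y() for _ in range(200_000)]

sample_mean = statistics.fmean(ys)
sample_var = statistics.variance(ys)
print("sample mean of Y:    ", sample_mean)
print("sample variance of Y:", sample_var)
```

The printed values are only estimates, but they can be compared with the exact result derived in the rest of this page.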

### Theorem of total probability

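Well, to know the distribution of $Y$, we must know $P(Y = y)$ for any value $y$. (Because $Y$ is continuous, $P(Y = y)$ here denotes the probability density of $Y$ at $y$, not a literal probability.)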

This is calculated via the integral

$$
\begin{align}
P(Y = y) & = \int P(Y = y, X = x) dx \\
\end{align}
$$

Intuitively, for any value of $y$, there are lots of possible values of $X$ that could "lead to" $Y$ taking on the value $y$. The probability that $Y = y$ is the sum of all these possible "ways" that the situation $Y = y$ could happen. That is, we need to sum up $P(Y = y, X = x)$ for all $x$. And because we are dealing with continuous variables, the sum we are talking about is an integral.

What the previous paragraph describes is a continuous version of the Theorem of Total Probability (a very important theorem, well worth understanding).
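For comparison, the discrete version of the same idea replaces the integral with a sum. If $X$ could only take a countable set of values, the theorem would read as follows (a standard statement, included here only as a reference point):

$$
\begin{align}
P(Y = y) & = \sum_{x} P(Y = y, X = x)
\end{align}
$$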

### The product rule

We can now use the product rule of probability (the identity from which Bayes' rule is derived) to convert our integral into terms we know about.

In words, the product rule states that the probability of two events X and Y happening together is the product of 
- the probability of X happening and 
- the probability of Y happening given that X has happened.

So

$$
\begin{align}
P(Y = y) & = \int P(Y = y, X = x) dx \\
         & = \int P(Y = y| X = x) P(X = x) dx
\end{align}
$$
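The integral above can be approximated numerically, which gives a concrete way to evaluate the marginal density before doing any algebra. The sketch below is only an illustration: the parameter values, the helper names `normal_pdf` and `marginal_density`, the integration range and the choice of a midpoint rule are all assumptions made for this example.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
mu, sigma, tau = 1.0, 2.0, 1.5


def normal_pdf(v, mean, sd):
    """Density of N(mean, sd^2) evaluated at v."""
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (math.sqrt(2 * math.pi) * sd)


def marginal_density(y, lo=-40.0, hi=40.0, n=8000):
    """Approximate the integral of P(Y=y | X=x) P(X=x) dx with a midpoint rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h  # midpoint of the i-th slice
        total += normal_pdf(y, x, tau) * normal_pdf(x, mu, sigma)
    return total * h


for y in (-2.0, 0.5, 1.0, 4.0):
    print(f"y = {y:5.1f}   density ~ {marginal_density(y):.6f}")
```

The integration range is wide enough that the neglected tails are negligible for these parameter values; other values might need a different range.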

### Fill in terms

Because we know the form of the Normal distribution's pdf, we know all the terms inside the integral. We can write it out in full.

$$
\begin{align}
P(Y = y) & = \int P(Y = y|X = x) P(X = x) dx \\
             & = \int \frac{1}{\sqrt{2 \pi} \tau} \exp \left\{ -\frac{1}{2} \frac{(y - x)^2}{\tau^2} \right\} \frac{1}{\sqrt{2 \pi} \sigma} \exp \left\{ -\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2} \right\}  \ dx \\
             & = \frac{1}{\sqrt{2 \pi} \tau} \frac{1}{\sqrt{2 \pi} \sigma} \int \exp \left\{ -\frac{1}{2} \left( \frac{(y - x)^2}{\tau^2} + \frac{(x - \mu)^2}{\sigma^2} \right) \right\}  \ dx \\
\end{align}
$$

### Focus on the exponent

Let's focus on the bracketed term in the exponent inside the integral. Call this term $E$.

$$
\begin{align}
E & = \frac{(y - x)^2}{\tau^2} + \frac{(x - \mu)^2}{\sigma^2} \\
  & = \frac{1}{\tau^2 \sigma^2} \left[ \sigma^2 (y - x)^2 + \tau^2 (x - \mu)^2 \right] \\
  & = \frac{1}{\tau^2 \sigma^2} \left[ \sigma^2 (y^2 - 2xy + x^2) + \tau^2 (x^2 - 2x\mu + \mu^2) \right] \\
\end{align}
$$

### Collect terms

Now we want to collect together all the terms involving $x^2$, all those involving $x$, and all the terms that are constant with respect to $x$.

$$
\begin{align}
E & = \frac{1}{\tau^2 \sigma^2} \left[ (\sigma^2 x^2 + \tau^2 x^2) + (-2 \sigma^2 xy - 2 \tau^2 x\mu) + (\sigma^2 y^2 + \tau^2 \mu^2) \right] \\
  & = \frac{1}{\tau^2 \sigma^2} \left[ (\sigma^2 + \tau^2) x^2 - 2(\sigma^2 y + \tau^2 \mu)x + (\sigma^2 y^2 + \tau^2 \mu^2) \right] \\
\end{align}
$$

Now let's rearrange a bit by factoring $\sigma^2 + \tau^2$ out of the bracket:

$$
\begin{align}
E & = \frac{\sigma^2 + \tau^2}{\tau^2 \sigma^2} \left[ x^2 - 2 \left( \frac{\sigma^2 y + \tau^2 \mu}{\sigma^2 + \tau^2}\right) x + \frac{(\sigma^2 y^2 + \tau^2 \mu^2)}{\sigma^2 + \tau^2} \right] \\
\end{align}
$$
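Algebra like this is easy to get wrong, so a numerical spot check can be reassuring. The sketch below evaluates $E$ in its original, collected and factorised forms at a few arbitrary points. The function names, parameter values and test points are made up for this illustration.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
mu, sigma, tau = 1.0, 2.0, 1.5
s2, t2 = sigma**2, tau**2


def e_original(x, y):
    """E as first written: the sum of the two scaled squares."""
    return (y - x) ** 2 / t2 + (x - mu) ** 2 / s2


def e_collected(x, y):
    """E after collecting the x^2, x and constant terms."""
    return ((s2 + t2) * x**2
            - 2 * (s2 * y + t2 * mu) * x
            + (s2 * y**2 + t2 * mu**2)) / (t2 * s2)


def e_factorised(x, y):
    """E after factoring (sigma^2 + tau^2) out of the bracket."""
    lead = (s2 + t2) / (t2 * s2)
    return lead * (x**2
                   - 2 * (s2 * y + t2 * mu) / (s2 + t2) * x
                   + (s2 * y**2 + t2 * mu**2) / (s2 + t2))


points = [(0.0, 0.0), (1.3, -0.7), (-2.5, 4.0)]
for x, y in points:
    print(x, y, e_original(x, y), e_collected(x, y), e_factorised(x, y))
```

Agreement at a handful of points is not a proof, but a disagreement would immediately flag a slip in the algebra.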

### Complete the square

Now we use a very clever trick called "completing the square". What we want is an expression of the form 

$$A (x - B)^2 + C$$

(Why will become clear later.)

Although it looks a bit nasty, we can get there from the expression above. Let's introduce some shorthand:

$$
\begin{align}
a & = \frac{\sigma^2 + \tau^2}{\tau^2 \sigma^2} \\
b & = \frac{\sigma^2 y + \tau^2 \mu}{\sigma^2 + \tau^2} \\
c & = \frac{(\sigma^2 y^2 + \tau^2 \mu^2)}{\sigma^2 + \tau^2} \\
\end{align}
$$

Then our expression $E$ becomes

$$E = a \left[ x^2 - 2 bx + c \right] $$

Now let's add and subtract $b^2$ inside that bracket. This doesn't change the bracket's value, so we still have $E$.

$$E = a \left[ x^2 - 2 bx + b^2 - b^2 + c \right] $$

And now we spot that the first three terms in the bracket are exactly $(x-b)^2$.

$$
\begin{align}
E & = a \left[ (x - b)^2 - b^2 + c \right] \\
  & = a (x - b)^2 + a \left[- b^2 + c \right] \\
  & = a (x - b)^2 + a \left( c - b^2 \right) \\
\end{align}
$$

We've completed the square: we've got an expression for $E$ in the form 

$$A (x - B)^2 + C$$

where 

$$
\begin{align}
A & = a \\
B & = b \\
C & = a (c - b^2)
\end{align}
$$
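The identity is worth checking on plain numbers. The following sketch compares $a \left[ x^2 - 2bx + c \right]$ with $A (x - B)^2 + C$ for some arbitrary values of $x$, $a$, $b$ and $c$ (the values and function names are invented for this example).

```python
import math


def bracket_form(x, a, b, c):
    """a [x^2 - 2 b x + c], the expression before completing the square."""
    return a * (x**2 - 2 * b * x + c)


def completed_square_form(x, a, b, c):
    """A (x - B)^2 + C with A = a, B = b, C = a (c - b^2)."""
    A, B, C = a, b, a * (c - b**2)
    return A * (x - B) ** 2 + C


# Arbitrary (x, a, b, c) test values, chosen only for illustration.
cases = [(0.0, 1.0, 2.0, 3.0), (1.5, 0.5, -2.0, 7.0), (-3.0, 2.0, 0.25, 0.1)]
for x, a, b, c in cases:
    lhs = bracket_form(x, a, b, c)
    rhs = completed_square_form(x, a, b, c)
    print(f"x={x}: {lhs:.6f} vs {rhs:.6f}")
```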

### Relabel

Now, let's relabel a bit. If we let 

$$
\begin{align}
D = \frac{1}{\sqrt{a}} \iff a = \frac{1}{D^2}
\end{align}
$$

then

$$E = \frac{(x - B)^2}{D^2} + C$$

Again, the reason for this decision will become clearer below!

### Rearrange the original integral

Ok, let's go back to our original integral. We now have

$$
\begin{align}
P(Y = y) & = \frac{1}{\sqrt{2 \pi} \tau} \frac{1}{\sqrt{2 \pi} \sigma} \int \exp \left\{ -\frac{1}{2} \left( \frac{(x - B)^2}{D^2} + C \right) \right\}  \ dx \\
         & = \frac{1}{\sqrt{2 \pi}} \frac{1}{\sqrt{2 \pi}} \frac{1}{\tau \sigma} \exp \left\{ -\frac{1}{2} C \right\} \int \exp \left\{ -\frac{1}{2} \frac{(x - B)^2}{D^2} \right\}  \ dx \\
\end{align}
$$

I'm now going to multiply the expression by $\frac{D}{D} = 1$, which doesn't change its value, and rearrange

$$
\begin{align}
P(Y = y) & = \frac{1}{\sqrt{2 \pi}} \frac{1}{\sqrt{2 \pi}} \frac{1}{\tau \sigma} \frac{D}{D} \exp \left\{ -\frac{1}{2} C \right\} \int \exp \left\{ -\frac{1}{2} \frac{(x - B)^2}{D^2} \right\}  \ dx \\
         & = \frac{1}{\sqrt{2 \pi}} \frac{1}{\tau \sigma} D \exp \left\{ -\frac{1}{2} C \right\} \int \frac{1}{\sqrt{2 \pi}} \frac{1}{D} \exp \left\{ -\frac{1}{2} \frac{(x - B)^2}{D^2} \right\}  \ dx \\
\end{align}
$$

### A fantastically useful trick

We now use a fantastically useful trick, which comes up a lot when we are working with probability density functions and integrals. 

Because of the choices we made above, we managed to rearrange the exponential term inside the integral into the form 

$$
\begin{align}
\exp \left\{ -\frac{1}{2} \frac{(x - B)^2}{D^2} \right\}
\end{align}
$$ 

Also, the constant in front of the exponential (inside the integral) is the correct constant for a Normal distribution with this mean and variance. So we recognise that the integral here is that of a Normal pdf with mean $B$ and variance $D^2$.

The value of the integral is therefore equal to 1 (we know that any probability density must integrate to 1). This holds whatever the value of $B$, so the fact that $B$ depends on $y$ does not matter. So the integral drops out of our expression for $P(Y = y)$ completely!
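To see this numerically, the sketch below integrates the function inside the integral for one particular choice of parameters and of $y$. Everything specific here (the parameter values, the value of `y`, the helper `midpoint_integral` and the integration range of 12 standard deviations either side of $B$) is an assumption made for the illustration.

```python
import math

# Hypothetical parameter values and an arbitrary y, chosen only for illustration.
mu, sigma, tau = 1.0, 2.0, 1.5
y = 0.5

# a, B and D as defined in the derivation above.
a = (sigma**2 + tau**2) / (tau**2 * sigma**2)
B = (sigma**2 * y + tau**2 * mu) / (sigma**2 + tau**2)
D = 1 / math.sqrt(a)


def integrand(x):
    """Normal pdf with mean B and variance D^2."""
    return math.exp(-0.5 * (x - B) ** 2 / D**2) / (math.sqrt(2 * math.pi) * D)


def midpoint_integral(f, lo, hi, n):
    """Approximate the integral of f over [lo, hi] with n midpoint slices."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


total = midpoint_integral(integrand, B - 12 * D, B + 12 * D, 20_000)
print("integral ~", total)
```

Changing `y` moves the centre $B$ of the bell curve but should leave the printed value essentially unchanged.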

### The integral disappears

So we now have an expression for $P(Y = y)$ that does not involve an integral:

$$
\begin{align}
P(Y = y) 
 & = \frac{1}{\sqrt{2 \pi}} \frac{1}{\tau \sigma} D \exp \left\{ -\frac{1}{2} C \right\} \\
 & = \frac{1}{\sqrt{2 \pi}} \frac{1}{\tau \sigma} D \exp \left\{ -\frac{1}{2} a \left( c - b^2 \right) \right\} 
\end{align}
$$

All that remains is to work out the exact dependence on $y$ from the above expression.

We're getting there! It might seem like hard work, but I hope you're following and enjoying the fact that everything we've done is from first principles. All we've assumed are some very well-known laws of probability and calculus.

### Focus on the exponent again

In the remaining expression for $P(Y = y)$, we see that the only dependence on $y$ is once again inside the exponential term ($y$ does not appear in the definition of $D$ above or any of the other constants outside the exponential). 

So let's call the main part of that exponent $F$, and simplify it.

$$
\begin{align}
F =  a \left( c - b^2 \right)
\end{align}
$$

Using our definitions of $a$, $b$ and $c$ from above, we have

$$
\begin{align}
F & = \frac{\sigma^2 + \tau^2}{\tau^2 \sigma^2} \left[ \frac{(\sigma^2 y^2 + \tau^2 \mu^2)}{\sigma^2 + \tau^2} - \left(\frac{\sigma^2 y + \tau^2 \mu}{\sigma^2 + \tau^2} \right)^2 \right] \\
  & = \frac{\sigma^2 + \tau^2}{\tau^2 \sigma^2} \left[ \frac{(\sigma^2 y^2 + \tau^2 \mu^2)}{\sigma^2 + \tau^2} - \frac{\left(\sigma^2 y + \tau^2 \mu\right)^2}{\left(\sigma^2 + \tau^2\right)^2} \right] \\
  & = \frac{\sigma^2 + \tau^2}{\tau^2 \sigma^2} \left[ \frac{(\sigma^2 + \tau^2)(\sigma^2 y^2 + \tau^2 \mu^2)}{\left(\sigma^2 + \tau^2\right)^2} - \frac{\left(\sigma^2 y + \tau^2 \mu\right)^2}{\left(\sigma^2 + \tau^2\right)^2} \right] \\
  & = \frac{1}{\tau^2 \sigma^2(\sigma^2 + \tau^2)} \left[ (\sigma^2 + \tau^2)(\sigma^2 y^2 + \tau^2 \mu^2) - \left(\sigma^2 y + \tau^2 \mu\right)^2 \right] \\
\end{align}
$$

### Simplify the exponent

This looks a bit ugly, but we can simplify it.

If we multiply out the terms in the square bracket, we get

$$
\begin{align}
F & = \frac{1}{\tau^2 \sigma^2(\sigma^2 + \tau^2)} \left[ \sigma^4 y^2 + \tau^2\sigma^2 y^2 + \tau^2 \sigma^2 \mu^2 + \tau^4 \mu^2 - \left(\sigma^4 y^2 + 2 \sigma^2 \tau^2 \mu y + \tau^4 \mu^2 \right) \right] \\
  & = \frac{1}{\tau^2 \sigma^2(\sigma^2 + \tau^2)} \left[ \tau^2\sigma^2 y^2 + \tau^2 \sigma^2 \mu^2 - 2 \sigma^2 \tau^2 \mu y \right] \\
  & = \frac{1}{\sigma^2 + \tau^2} \left[ y^2 + \mu^2 - 2 \mu y \right] \\
  & = \frac{(y - \mu)^2}{\sigma^2 + \tau^2}
\end{align}
$$

Wow - that's a real simplification of what previously looked like a nasty formula!
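A simplification this dramatic deserves a sanity check. The sketch below computes $F$ directly from the definitions of $a$, $b$ and $c$, and compares it with the simplified form at a few arbitrary inputs (function names and test values are hypothetical).

```python
import math


def f_from_abc(y, mu, sigma, tau):
    """F = a (c - b^2), using the definitions of a, b and c from the derivation."""
    s2, t2 = sigma**2, tau**2
    a = (s2 + t2) / (t2 * s2)
    b = (s2 * y + t2 * mu) / (s2 + t2)
    c = (s2 * y**2 + t2 * mu**2) / (s2 + t2)
    return a * (c - b**2)


def f_simplified(y, mu, sigma, tau):
    """The simplified form (y - mu)^2 / (sigma^2 + tau^2)."""
    return (y - mu) ** 2 / (sigma**2 + tau**2)


# Arbitrary (y, mu, sigma, tau) test values, chosen only for illustration.
cases = [(0.5, 1.0, 2.0, 1.5), (-3.0, 0.0, 1.0, 1.0), (4.2, -1.0, 0.5, 3.0)]
for y, mu, sigma, tau in cases:
    print(y, mu, sigma, tau, f_from_abc(y, mu, sigma, tau), f_simplified(y, mu, sigma, tau))
```

The two columns should agree up to floating-point rounding; the subtraction $c - b^2$ can lose a few digits when the two terms are nearly equal.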

### Simplify constant terms

If we recall also that 

$$
D = \frac{1}{\sqrt{a}} 
  = \frac{1}{\sqrt{\frac{\sigma^2 + \tau^2}{\tau^2 \sigma^2}}} 
  = \frac{1} {\frac{\sqrt{\sigma^2 + \tau^2}}{\tau \sigma}} 
  = \frac{\tau \sigma}{\sqrt{\sigma^2 + \tau^2}} 
$$

then our final expression for $P(Y = y)$ becomes

$$
\begin{align}
P(Y = y) & = \frac{1}{\sqrt{2 \pi}} \frac{1}{\tau \sigma} \frac{\tau \sigma}{\sqrt{\sigma^2 + \tau^2}} \exp \left\{ -\frac{1}{2} \frac{(y - \mu)^2}{\sigma^2 + \tau^2} \right\} \\
         & = \frac{1}{\sqrt{2 \pi}} \frac{1}{\sqrt{\sigma^2 + \tau^2}} \exp \left\{ -\frac{1}{2} \frac{(y - \mu)^2}{\sigma^2 + \tau^2} \right\} \\
         & = \frac{1}{\sqrt{2 \pi}} \frac{1}{\sigma'} \exp \left\{ -\frac{1}{2} \frac{(y - \mu)^2}{\sigma'^2} \right\} \\
\end{align}
$$

where we have set $\sigma' = \sqrt{\sigma^2 + \tau^2}$.

### Recognise the pdf

We can use that trick from above again: we recognise that this is the pdf of a Normal distribution with mean $\mu$ and variance

$$\sigma'^2 = \sigma^2 + \tau^2$$

### We're done!

We have proved that if we have 

$$X \sim N(\mu, \sigma^2)$$

and 

$$ Y | X \sim N(X, \tau^2)$$

then the marginal distribution for $Y$ is

$$ Y \sim N(\mu, \sigma^2 + \tau^2)$$

In words, marginally (without knowing the value of $X$) $Y$ has the same mean as $X$, but has variance equal to the sum of the marginal variance of $X$ and the conditional variance of $Y$ given $X$.
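As a usage sketch, the result lets us work with $Y$ directly, without integrating over $X$ each time. The example below uses Python's standard-library `statistics.NormalDist`; the parameter values are hypothetical. The last lines rely on an equivalent way of writing the model that is not used in the derivation above, namely $Y = X + \varepsilon$ with $\varepsilon \sim N(0, \tau^2)$ independent of $X$.

```python
import math
from statistics import NormalDist

# Hypothetical parameter values, chosen only for illustration.
mu, sigma, tau = 1.0, 2.0, 1.5

# Marginal distribution of Y according to the result: N(mu, sigma^2 + tau^2).
# NormalDist takes a standard deviation, hence the square root.
marginal_y = NormalDist(mu, math.sqrt(sigma**2 + tau**2))

print("mean of Y:     ", marginal_y.mean)
print("variance of Y: ", marginal_y.variance)
print("density at 0.5:", marginal_y.pdf(0.5))
print("P(Y > 3):      ", 1 - marginal_y.cdf(3.0))

# Cross-check (assumption: Y = X + noise, with independent N(0, tau^2) noise).
# Adding two NormalDist objects treats them as independent Normal variables.
via_sum = NormalDist(mu, sigma) + NormalDist(0.0, tau)
print("via sum:       ", via_sum)
```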