<img align="left" width="30%" style="padding-right:10px;" src="Images/Ccom.png">

___
# Multiplication of General Normal Distributions


## Semme J. Dijkstra

<a href="https://teams.microsoft.com/l/channel/19%3afd7ef9823b064892bc126bc40f2b4710%40thread.tacv2/VGNSS?groupId=ed82d769-1aaa-4613-9de0-2dd04127f30a&tenantId=d6241893-512d-46dc-8d2b-be47e25f5666"><img src="Images/help.png"  title="Ask questions on Teams" align="right" width="10%" alt="Teams.com"></a><br><br> 

In [1]:
import sys
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
from numpy import pi, cos, sin
from numpy.linalg import inv, eig
from scipy.stats.distributions import chi2
from scipy.special import binom

mypath=Path('../') # Get the path to the folder containing the mycode folder
print(mypath.resolve())
sys.path.append(str(mypath.resolve())) # add the folder to the list of paths



/home/jupyter-semmed/Modules


___
<img align="left" width="6%" style="padding-right:10px;" src="./Images/info.png">

# LaTeX<br>

This is a [Jupyter](https://jupyter.org/) notebook that makes heavy use of [LaTeX](https://www.latex-project.org/). According to [the LaTeX project](https://www.latex-project.org/), LaTeX is *"a high-quality typesetting system; it includes features designed for the production of technical and scientific documentation. LaTeX is the de facto standard for the communication and publication of scientific documents."*

LaTeX allows the creation of macros and other typesetting conveniences. This Markdown cell is used to define new LaTeX operators and commands, but they are hidden from view. If you are interested, double-click on this cell to enter its edit mode and you will be able to see how they are implemented.

<div hidden>
$\usepackage{amsmath,amssymb}$

$\DeclareRobustCommand{\bbone}{\text{\usefont{U}{bbold}{m}{n}1}}$

$\DeclareMathOperator{\EX}{\mathbb{E}}% expected value$

$\DeclareMathOperator{\res}{\vec{r}}$

$\DeclareMathOperator{\mf}{\mu_{1}\sigma_{2}^{2}+\mu_{2}\sigma_{1}^{2}\over{\sigma_{1}^{2} + \sigma_{2}^{2}}}$
    
$\DeclareMathOperator{\sf}{{\sigma_{1}^{2}\sigma_{2}^{2}}\over{\sigma_{1}^{2}+\sigma_{2}^{2}}}$
    
$\newcommand{\ex}[1]{\mathbb{E}\{#1\}}$
    
$\newcommand{\dev}[1]{\mathbb{E}\{#1^o_i - \mathbb{E}\{#1\}\}}$
    
$\newcommand{\cov}[1]{\mathbb{E}\{(#1^o_i - \mathbb{E}\{#1\})(#1 - \mathbb{E}\{#1\})^T\}}$
    
$\newcommand{\m}[1]{\mathbf{#1}}$
</div>

___
# Foundation and Acknowledgment for this Notebook

The notation used here is consistent with the paper R. Faragher, *"Understanding the Basis of the Kalman Filter Via a Simple and Intuitive Derivation (Lecture Notes)"*, in IEEE Signal Processing Magazine, vol. 29, no. 5, pp. 128-132, Sept. 2012, doi: 10.1109/MSP.2012.2203621 [Faragher(2012)](./Documentation/Understanding_the_Basis_of_the_Kalman_Filter_Via_a_Simple_and_Intuitive_Derivation_Lecture_Notes.pdf), with the exception that the parameter $x$ is used instead of $r$. The paper provides one of the most clearly written and intuitively understood introductions to the realm of Kalman filtering.

One of the key statements of the paper is:

    A key property of the Gaussian function is exploited at this point: the product of two Gaussian functions is another Gaussian function. This is critical as it permits an endless number of Gaussian pdfs to be multiplied over time
    
Though the paper is otherwise thorough in its explanations, it glosses over the derivation of the Gaussian pdf (I prefer the term general normal pdf to avoid confusion) from the product of two Gaussian functions, thereby producing a logical jump. If a student is willing to accept the logical jump, or is not clear on the distinction between a Gaussian function and a pdf, the text is easy to follow. However, it is not so easy for a student to reproduce the steps needed to bridge the logical jump. This notebook tries to clarify these steps.
   

# Gaussian Function or *Bell Curve*

A Gaussian function has the form:

$$f(x;S,a,b) = Se^{{-\left(x-a\right)^2}\over{2b^2}}\tag{1}$$
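
As a small illustration of eqn $(1)$, the sketch below evaluates the Gaussian function with Python's standard library only. The function name `gaussian` and the sample values are assumptions chosen for this example; they show that the curve peaks at $x=a$ with height $S$ and is symmetric about $a$.

```python
import math

def gaussian(x, S, a, b):
    """Gaussian function of eqn (1): peak height S, centre a, width b."""
    return S * math.exp(-((x - a) ** 2) / (2.0 * b ** 2))

# Illustrative values (assumptions): S = 3, a = 2, b = 0.5
peak = gaussian(2.0, S=3.0, a=2.0, b=0.5)   # value at the centre x = a
left = gaussian(1.5, S=3.0, a=2.0, b=0.5)   # one width b to the left of a
right = gaussian(2.5, S=3.0, a=2.0, b=0.5)  # one width b to the right of a

print(peak, left, right)
```

One width $b$ away from the centre the function has dropped to $Se^{-1/2}$ on either side.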


# The General Normal Distribution PDF

Imagine that we obtain a set of observations of the variable $x$ that are normally distributed with mean $\mu$ and variance $\sigma^2$. The **p**robability **d**ensity **f**unction (**pdf**) is then given by the Gaussian function:

$$y(x;\mu,\sigma)\triangleq {1\over \sqrt{2\pi \sigma^{2}}}e^{-{(x-\mu)^{2}\over 2\sigma^{2}}}\tag{2}$$

known as the **general normal pdf**. Note that we may interpret this as a Gaussian function (eqn $1$) with $a=\mu$ and $b=\sigma$:

$$y(x;\mu,\sigma)=f(x;S,\mu,\sigma)\tag{3}$$

with $S$:

$$S={1\over \sqrt{2\pi \sigma^{2}}}$$

where $S$ is a scaling factor used to ensure that the associated cumulative probability for $-\infty< x<\infty$ is 1.
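
The role of the scaling factor can be checked numerically. The sketch below is only an illustration: `normal_pdf` implements eqn $(2)$, while `integrate` is a hypothetical helper (a plain trapezoidal rule) and the values of $\mu$ and $\sigma$ are arbitrary assumptions. Integrating over $\mu\pm10\sigma$ stands in for the infinite domain.

```python
import math

def normal_pdf(x, mu, sigma):
    """General normal pdf of eqn (2)."""
    S = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)  # scaling factor S
    return S * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate(func, lo, hi, n=20000):
    """Composite trapezoidal rule on n equal intervals (hypothetical helper)."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * h)
    return total * h

mu, sigma = 1.0, 2.0  # arbitrary illustrative values
area = integrate(lambda x: normal_pdf(x, mu, sigma),
                 mu - 10.0 * sigma, mu + 10.0 * sigma)

print(area)  # should be very close to 1
```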

___
# The Product of two General Normal Distributions

When reading literature on subjects like Kalman filtering you are likely to encounter a statement like:

    It is well known that the product of two general normal distributions is another Gaussian function. 
    
If the sentence above means little to you, you should definitely study this section. If you are familiar with it but have never seen a derivation showing it to be true, then you are not alone. For a statement that is so commonly used, it is surprisingly difficult to find derivations that substantiate it; this includes [Faragher(2012)](./Documentation/Understanding_the_Basis_of_the_Kalman_Filter_Via_a_Simple_and_Intuitive_Derivation_Lecture_Notes.pdf), which explains the basis of Kalman filtering. A good explanation may be found in section 1 of *Products and Convolutions of Gaussian Probability Density Functions* authored by [P.A. Bromiley in 2014](./Documentation/http_www_lucamartino_altervista.org_2003-003.pdf).

In this section we will follow a path similar to, but slightly different from, [Bromiley(2014)](./Documentation/http_www_lucamartino_altervista.org_2003-003.pdf), using notation consistent with [Faragher(2012)](./Documentation/Understanding_the_Basis_of_the_Kalman_Filter_Via_a_Simple_and_Intuitive_Derivation_Lecture_Notes.pdf). It may be seen from the following derivation that the logical step the reader is expected to take is not a trivial one for a student who seeks understanding by working out the math.

Imagine that we have two independently obtained sets of observations of the same parameter $x$ with the pdfs:

$$\begin{cases}y_{1}(x;\mu_{1},\sigma_{1})\triangleq {1\over \sqrt{2\pi \sigma_{1}^{2}}}e^{-{(x-\mu_{1})^{2}\over 2\sigma_{1}^{2}}} \\ y_{2}(x;\mu_{2},\sigma_{2})\triangleq {1\over \sqrt{2\pi \sigma_{2}^{2}}}e^{-{(x-\mu_{2})^{2}\over 2\sigma_{2}^{2}}}\end{cases}\tag{4}$$

We can fuse the two pdfs by multiplying them:

$$y_{1}(x;\mu_{1},\sigma_{1})\times y_{2}(x;\mu_{2},\sigma_{2})= {1\over \sqrt{2\pi \sigma_{1}^{2}}}e^{-{(x-\mu_{1})^{2}\over 2\sigma_{1}^{2}}}\times{1\over \sqrt{2\pi \sigma_{2}^{2}}}e^{-{(x-\mu_{2})^{2}\over 2\sigma_{2}^{2}}}$$

$$y_{1,2}(x;\mu_{1},\sigma_{1},\mu_{2},\sigma_{2})= {1\over 2\pi\sqrt{ \sigma_{1}^{2}\sigma_{2}^{2}}}e^{-\left({(x-\mu_{1})^{2}\over 2\sigma_{1}^{2}}+{(x-\mu_{2})^{2}\over 2\sigma_{2}^{2}}\right)}$$

We may write:

$$y_{1,2}(x;\mu_{1},\sigma_{1},\mu_{2},\sigma_{2})= {1\over 2\pi\sqrt{ \sigma_{1}^{2}\sigma_{2}^{2}}}e^{-\alpha}\tag{5}$$

where:

$$\alpha = {(x-\mu_{1})^{2}\over 2\sigma_{1}^{2}}+{(x-\mu_{2})^{2}\over 2\sigma_{2}^{2}}$$

Let us expand $\alpha$:

$$\alpha = {x^2-2x\mu_{1}+\mu_{1}^{2}\over 2\sigma_{1}^{2}}+{x^2-2x\mu_{2}+\mu_{2}^{2}\over 2\sigma_{2}^{2}}$$

Common denominator:

$$\alpha = {\sigma_{2}^{2}\left(x^2-2x\mu_{1}+\mu_{1}^{2}\right)\over 2\sigma_{1}^{2}\sigma_{2}^{2}}+{\sigma_{1}^{2}\left(x^2-2x\mu_{2}+\mu_{2}^{2}\right)\over 2\sigma_{1}^{2}\sigma_{2}^{2}}$$

Expand:

$$\alpha = {{{x^2\sigma_{2}^{2}-2x\mu_{1}\sigma_{2}^{2}+\mu_{1}^{2}\sigma_{2}^{2}+ x^2\sigma_{1}^{2}-2x\mu_{2}\sigma_{1}^{2}+\mu_{2}^{2}\sigma_{1}^{2}}}\over{2\sigma_{1}^{2}\sigma_{2}^{2}}}$$

Combine:

$$\alpha = {{{x^2(\sigma_{1}^{2} + \sigma_{2}^{2})-2x(\mu_{1}\sigma_{2}^{2}+\mu_{2}\sigma_{1}^{2})+\mu_{1}^{2}\sigma_{2}^{2}+\mu_{2}^{2}\sigma_{1}^{2}}}\over{2\sigma_{1}^{2}\sigma_{2}^{2}}}$$

Divide numerator and denominator by $(\sigma_{1}^{2} + \sigma_{2}^{2})$:

$$\alpha = {{{x^2-2x\left(\mf\right)}+{\left({\mu_{1}^{2}\sigma_{2}^{2}+\mu_{2}^{2}\sigma_{1}^{2}}\over{\sigma_{1}^{2}+\sigma_{2}^{2}}\right)}}\over{2\sf}}$$

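We recognize the beginning of a quadratic form in $x$ and $\mf$, and complete the square by adding and subtracting $\left(\mf\right)^2$ in the numerator: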

$$\alpha = {{{x^2-2x\left(\mf\right) + \left(\mf\right)^2- \left(\mf\right)^2}+{\left({\mu_{1}^{2}\sigma_{2}^{2}+\mu_{2}^{2}\sigma_{1}^{2}}\over{\sigma_{1}^{2}+\sigma_{2}^{2}}\right)}}\over{2\sf}}$$

Thus if we substitute:

$$\mu_f = \mf\tag{6}$$

and:

$$\sigma_f^2 = \sf\tag{7}$$
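
As a quick numerical illustration of eqns $(6)$ and $(7)$, the sketch below computes the fused parameters for two made-up observations. The function name `fuse` and the sample values are assumptions for illustration only, and the right-hand side of eqn $(7)$ is treated here as the fused variance $\sigma_f^2$.

```python
def fuse(mu1, sigma1, mu2, sigma2):
    """Return the fused mean (eqn 6) and fused variance (eqn 7)."""
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    mu_f = (mu1 * var2 + mu2 * var1) / (var1 + var2)
    var_f = (var1 * var2) / (var1 + var2)
    return mu_f, var_f

# Illustrative values (assumptions): a coarse and a more precise observation
mu_f, var_f = fuse(mu1=10.0, sigma1=2.0, mu2=12.0, sigma2=1.0)

print(mu_f, var_f)
```

With these values the fused mean lies between the two input means, closer to the observation with the smaller variance, and the fused variance is smaller than either input variance.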

We find:

$$\alpha = {{{x^2-2x\mu_f + \mu_f^2-{\left(\mu_{1}\sigma_{2}^{2}+\mu_{2}\sigma_{1}^{2}\over{\sigma_{1}^{2} + \sigma_{2}^{2}}\right)}^2}+{\left({\mu_{1}^{2}\sigma_{2}^{2}+\mu_{2}^{2}\sigma_{1}^{2}}\over{\sigma_{1}^{2}+\sigma_{2}^{2}}\right)}}\over{2\sigma_f^2}}$$

Rearrange:

$$\alpha = {{(x-\mu_f)^2}\over{2\sigma_f^2}}+{{{\left({\mu_{1}^{2}\sigma_{2}^{2}+\mu_{2}^{2}\sigma_{1}^{2}}\over{\sigma_{1}^{2}+\sigma_{2}^{2}}\right)}-{\left(\mu_{1}\sigma_{2}^{2}+\mu_{2}\sigma_{1}^{2}\over{\sigma_{1}^{2} + \sigma_{2}^{2}}\right)}^2}\over{2\sigma_f^2}}$$

Define $\beta$:

$$\beta = {{{\left({\mu_{1}^{2}\sigma_{2}^{2}+\mu_{2}^{2}\sigma_{1}^{2}}\over{\sigma_{1}^{2}+\sigma_{2}^{2}}\right)}-{\left(\mu_{1}\sigma_{2}^{2}+\mu_{2}\sigma_{1}^{2}\over{\sigma_{1}^{2} + \sigma_{2}^{2}}\right)}^2}\over{2{{\sigma_{1}^{2}\sigma_{2}^{2}}\over{\sigma_{1}^{2}+\sigma_{2}^{2}}}}}$$

Multiply numerator and denominator by ${(\sigma_{1}^{2}+\sigma_{2}^{2})}^2$:

$$\beta = {(\mu_{1}^2\sigma_{2}^{2}+\mu^2_{2}\sigma_{1}^{2})(\sigma_{1}^{2}+\sigma_{2}^{2})-{(\mu_{1}\sigma_{2}^{2}+\mu_{2}\sigma_{1}^{2})}^2\over{2\sigma_{1}^{2}\sigma_{2}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})}}$$

Rearrange:

$$\beta = {{\mu_{1}^2(\sigma_{1}^{2}\sigma_{2}^{2}+\sigma_{2}^{4}-\sigma_{2}^{4}) + \mu_{2}^2(\sigma_{1}^{2}\sigma_{2}^{2}+\sigma_{1}^{4}-\sigma_{1}^{4}) - 2\mu_{1}\mu_{2}\sigma_{1}^{2}\sigma_{2}^{2}}\over{2\sigma_{1}^{2}\sigma_{2}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})}}$$

Simplify:

$$\beta = {{\mu_{1}^2(\sigma_{1}^{2}\sigma_{2}^{2}) + \mu_{2}^2(\sigma_{1}^{2}\sigma_{2}^{2}) - 2\mu_{1}\mu_{2}\sigma_{1}^{2}\sigma_{2}^{2}}\over{2\sigma_{1}^{2}\sigma_{2}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})}}$$

Divide numerator and denominator by $(\sigma_{1}^{2}\sigma_{2}^{2})$:

$$\beta = {{\mu_{1}^2 - 2\mu_{1}\mu_{2} + \mu_{2}^2}\over{2(\sigma_{1}^{2}+\sigma_{2}^{2})}}$$

Simplify:

$$\beta = {{(\mu_{1}-\mu_{2})}^2\over{2(\sigma_{1}^{2}+\sigma_{2}^{2})}}$$

The term $\beta$ is constant over $x$ and therefore does not affect the central tendency. Substituting $\beta$ into $\alpha$:

$$\alpha = {{(x-\mu_f)^2}\over{2\sigma_f^2}}+\beta$$
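
Because the algebra leading to this form is easy to get wrong, a numerical cross-check can be reassuring. The sketch below compares $\alpha$ as originally defined with the completed-square form plus $\beta$, taking $\beta={(\mu_1-\mu_2)^2\over 2(\sigma_1^2+\sigma_2^2)}$. The sample values and the helper names `alpha_direct` and `alpha_completed` are assumptions made for this illustration.

```python
# Illustrative values (assumptions)
mu1, sigma1 = 10.0, 2.0
mu2, sigma2 = 12.0, 1.0
var1, var2 = sigma1 ** 2, sigma2 ** 2

mu_f = (mu1 * var2 + mu2 * var1) / (var1 + var2)   # eqn (6)
var_f = (var1 * var2) / (var1 + var2)              # eqn (7), fused variance
beta = (mu1 - mu2) ** 2 / (2.0 * (var1 + var2))    # constant term

def alpha_direct(x):
    """alpha as the sum of the two original exponents."""
    return (x - mu1) ** 2 / (2.0 * var1) + (x - mu2) ** 2 / (2.0 * var2)

def alpha_completed(x):
    """alpha after completing the square."""
    return (x - mu_f) ** 2 / (2.0 * var_f) + beta

xs = [5.0 + 0.25 * i for i in range(49)]  # sample points from 5.0 to 17.0
max_diff = max(abs(alpha_direct(x) - alpha_completed(x)) for x in xs)

print(max_diff)  # expected to be at the level of floating-point round-off
```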

Substituting $\alpha$ back into equation $(5)$:

$$y_{1,2}(x;\mu_{1},\sigma_{1},\mu_{2},\sigma_{2})= {1\over 2\pi\sqrt{ \sigma_{1}^{2}\sigma_{2}^{2}}}e^{-\left({{(x-\mu_f)^2}\over{2\sigma_f^2}}+\beta\right)}$$

Rearrange:

$$y_{1,2}(x;\mu_{1},\sigma_{1},\mu_{2},\sigma_{2})= {1\over 2\pi\sqrt{ \sigma_{1}^{2}\sigma_{2}^{2}}}e^{-{{(x-\mu_f)^2}\over{2\sigma_f^2}}}e^{-\beta}$$

From eqn $(7)$ we find that $\sigma_{1}^{2}\sigma_{2}^{2}=\sigma_f^2(\sigma_{1}^{2}+\sigma_{2}^{2})$:

$$y_{1,2}(x;\mu_{1},\sigma_{1},\mu_{2},\sigma_{2})= {1\over 2\pi\sqrt{ \sigma_f^2(\sigma_{1}^{2}+\sigma_{2}^{2})}}e^{-{{(x-\mu_f)^2}\over{2\sigma_f^2}}}e^{-\beta}$$

Rearrange:

$$y_{1,2}(x;\mu_{1},\sigma_{1},\mu_{2},\sigma_{2})= {1\over\sqrt{2\pi\sigma_f^2}}e^{-{{(x-\mu_f)^2}\over{2\sigma_f^2}}}{1\over\sqrt{2\pi(\sigma_{1}^{2}+\sigma_{2}^{2})}}e^{-\beta}$$

Substitute $\beta$ back:

$$y_{1,2}(x;\mu_{1},\sigma_{1},\mu_{2},\sigma_{2})= {1\over\sqrt{2\pi\sigma_f^2}}e^{-{{(x-\mu_f)^2}\over{2\sigma_f^2}}}{1\over\sqrt{2\pi(\sigma_{1}^{2}+\sigma_{2}^{2})}}e^{-{{(\mu_{1}-\mu_{2})}^2\over{2(\sigma_{1}^{2}+\sigma_{2}^{2})}}}$$

We recognize this as the product of a general normal pdf (eqn $2$) centered on $\mu_f$ with standard deviation $\sigma_f$ and a Gaussian function (eqn $1$) of the parameters $\mu_1,\mu_2,\sigma_1,$ and $\sigma_2$ that does not depend on $x$ and therefore acts as a constant scale factor $S_{1,2}$ over the domain of $x$:

$$y_{1,2}(x;\mu_{1},\sigma_{1},\mu_{2},\sigma_{2})=S_{1,2}\times y_f(x;\mu_{f},\sigma_{f})\tag{8}$$
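
The factorisation can be checked point by point. In the sketch below the scale factor is taken to be ${1\over\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}}e^{-\beta}$, which is the same as a general normal pdf evaluated at $\mu_1$ with mean $\mu_2$ and variance $\sigma_1^2+\sigma_2^2$. The helper `normal_pdf` (parameterised by the variance) and the sample values are assumptions made for this illustration.

```python
import math

def normal_pdf(x, mu, var):
    """General normal pdf of eqn (2), parameterised by the variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Illustrative values (assumptions)
mu1, var1 = 10.0, 4.0
mu2, var2 = 12.0, 1.0

mu_f = (mu1 * var2 + mu2 * var1) / (var1 + var2)   # eqn (6)
var_f = (var1 * var2) / (var1 + var2)              # eqn (7), fused variance

# Scale factor: constant over x, a Gaussian function of mu1 - mu2
scale = normal_pdf(mu1, mu2, var1 + var2)

xs = [6.0 + 0.5 * i for i in range(25)]  # sample points from 6.0 to 18.0
max_rel_diff = max(
    abs(normal_pdf(x, mu1, var1) * normal_pdf(x, mu2, var2)
        - scale * normal_pdf(x, mu_f, var_f))
    / (scale * normal_pdf(x, mu_f, var_f))
    for x in xs
)

print(scale, max_rel_diff)
```

The relative difference should be at round-off level, while the scale factor itself is well below one.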

It is important to note that many Gaussian estimation techniques (e.g. Kalman filtering) use only the defining parameters $\mu_f$ and $\sigma_f$, not the functions themselves, though these are implied. Substituting the general normal distribution $y_f$ for $y_{1,2}$ is a rescaling; it does not bias the results and allows for the usual interpretation of probability, i.e., the integral of $y_f$ over the full domain is equal to one.
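
To make the difference between the raw product and a pdf concrete, the sketch below integrates the product numerically. It is an illustration under assumptions: `integrate` is a hypothetical trapezoidal-rule helper, the sample values are arbitrary, and a finite interval stands in for the full domain. The area under the product comes out equal to the scale factor rather than one, and dividing the product by that area recovers $y_f$.

```python
import math

def normal_pdf(x, mu, var):
    """General normal pdf of eqn (2), parameterised by the variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(func, lo, hi, n=20000):
    """Composite trapezoidal rule on n equal intervals (hypothetical helper)."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * h)
    return total * h

# Illustrative values (assumptions)
mu1, var1 = 10.0, 4.0
mu2, var2 = 12.0, 1.0
mu_f = (mu1 * var2 + mu2 * var1) / (var1 + var2)
var_f = (var1 * var2) / (var1 + var2)

def product(x):
    """Raw product of the two pdfs, before any rescaling."""
    return normal_pdf(x, mu1, var1) * normal_pdf(x, mu2, var2)

area = integrate(product, -10.0, 30.0)       # area under the raw product
scale = normal_pdf(mu1, mu2, var1 + var2)    # scale factor of the product

x = 11.0
renormalised = product(x) / area             # product rescaled to unit area

print(area, scale)
print(renormalised, normal_pdf(x, mu_f, var_f))
```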


You now know that when, in literature on subjects like Kalman filtering, you encounter a statement like:

    It is well known that the product of two general normal distributions is another normal distribution. 
    
it is wrong; it should read as the sentence that this section started off with, or at least be modified to say:

    It is well known that the product of two general normal distributions is a scaled general normal distribution.

___
# Interpretation
