# Hidden Markov Models with continuous state space

* Many physical systems can be modelled with [Markov chains with continuous state space](04_markov_chains_with_continous_state_space.ipynb).
* Unfortunately, the internal state of the system is not directly observable.
* By adding a model for the measurement procedure we get a Hidden Markov model.


In [2]:
%config IPCompleter.greedy=True

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import sklearn
import string

from pandas import Series
from pandas import DataFrame
from typing import List,Tuple

from pandas import Categorical
from pandas.api.types import CategoricalDtype

from tqdm import tnrange#, tqdm_notebook
from sklearn.linear_model import LogisticRegression
from plotnine import *

# Local imports
from common import *
from convenience import *

from scipy.stats import norm
from scipy.stats import binom
from scipy.stats import multivariate_normal


## I. Hidden Markov models with a real-valued state

Let us consider a simple Hidden Markov model

\begin{align*}
x_{i+1}&=ax_i+ w_i\\
y_i&=cx_i+v_i 
\end{align*}

where $y_i$ is the observable quantity and $x_i$ is the hidden system state.
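* The process noise $w_i$, which absorbs unmodelled effects such as nonlinearity, is modelled by a normal distribution $\mathcal{N}(0, \sigma_i)$ with standard deviation $\sigma_i$.
* Measurement noise $v_i$ is modelled by a normal distribution $\mathcal{N}(0, \tau_i)$ with standard deviation $\tau_i$.
* Quantities $x_0, w_0, \ldots, w_{n-1}, v_1, \ldots, v_n$ are assumed to be independent.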
* The initial state $x_0$ is distributed according to the normal distribution $\mathcal{N}(\mu_0, \sigma_0)$. 
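
The model above can be simulated directly. The following is a minimal sketch, assuming constant noise levels (the text allows $\sigma_i$ and $\tau_i$ to vary with $i$) and treating the second parameter of $\mathcal{N}(\cdot,\cdot)$ as a standard deviation. The function name `simulate_hmm` and the parameter values are illustrative choices, not part of the original notebook.

```python
import numpy as np

def simulate_hmm(n, a, c, sigma, tau, mu0, sigma0, seed=0):
    """Draw one trajectory x_0..x_n and observations y_1..y_n.

    Assumption: sigma, tau and sigma0 are standard deviations and the
    noise levels do not change over time.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    y = np.full(n + 1, np.nan)      # y[0] stays NaN: there is no observation y_0
    x[0] = rng.normal(mu0, sigma0)  # initial state x_0
    for i in range(n):
        x[i + 1] = a * x[i] + rng.normal(0.0, sigma)    # x_{i+1} = a x_i + w_i
        y[i + 1] = c * x[i + 1] + rng.normal(0.0, tau)  # y_{i+1} = c x_{i+1} + v_{i+1}
    return x, y

x, y = simulate_hmm(n=50, a=0.9, c=1.0, sigma=0.5, tau=1.0, mu0=0.0, sigma0=1.0)
```

Only `y[1:]` would be available to an observer; `x` is kept here solely to compare estimates against the true hidden states.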

### Standard questions 

Similarly to Hidden Markov models with a discrete state space, we can ask the following questions:
* What is the distribution of $x_i$ given observations $y_1,\ldots, y_{i-1}$?

* What is the distribution of $x_i$ given observations $y_{i+1},\ldots, y_{n}$?

* What is the distribution of $x_i$ given observations $y_{1},\ldots, y_{n}$?



To answer the first question we need to compute the prior

\begin{align*}
\pi_{X_i}(x_i)=p[x_i|y_1,\ldots, y_{i-1}]\enspace.
\end{align*}

To answer the second question we need to compute the likelihood

\begin{align*}
\lambda_{X_i}(x_i)=p[y_{i+1},\ldots, y_n|x_i]\enspace.
\end{align*}

To answer the third question we need to compute the marginal posterior

\begin{align*}
p_{X_i}(x_i)=p[x_i|y_1,\ldots, y_n]\propto \pi_{X_i}(x_i)\cdot p[y_i|x_i]\cdot \lambda_{X_i}(x_i) \enspace.
\end{align*}




## II. Conditional normal distribution as information fusion gate

* problem
*

## III. Prior propagation

We use the same tricks to simplify the derivation of the prior propagation rules.

* Evolution of states conditioned on observations $y_1,\ldots, y_{i-1}$

\begin{align*}
x_i=\mu_i+\varepsilon_i,\qquad \varepsilon_i\sim\mathcal{N}(0, \rho_i)\enspace.
\end{align*}

* Evolution of observations conditioned on $x_i$

\begin{align*}
y_i=cx_i+v_i= c(\mu_i+\varepsilon_i)+ v_i=c\mu_i + c\varepsilon_i+ v_i\enspace.
\end{align*}
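
Since $\varepsilon_i$ and $v_i$ are independent zero-mean normal variables, the decomposition above implies that $y_i$ given $y_1,\ldots,y_{i-1}$ is normal with mean $c\mu_i$ and variance $c^2\rho_i^2+\tau_i^2$. A small sketch of this one-step-ahead prediction, with the hypothetical helper name `predictive_observation` and all second parameters read as standard deviations:

```python
import math

def predictive_observation(mu_i, rho_i, c, tau_i):
    """Mean and standard deviation of y_i given y_1..y_{i-1}.

    Assumes x_i ~ N(mu_i, rho_i) and v_i ~ N(0, tau_i) are independent,
    with rho_i and tau_i being standard deviations.
    """
    mean = c * mu_i
    std = math.sqrt(c**2 * rho_i**2 + tau_i**2)  # variances of c*eps_i and v_i add up
    return mean, std

pred_mean, pred_std = predictive_observation(mu_i=1.0, rho_i=0.5, c=2.0, tau_i=1.0)
```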

### Base

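By definition $x_1=a x_0+w_0$ with $w_0\sim\mathcal{N}(0, \sigma_0)$. No observation has been made yet, so the prior of $x_1$ is normal with mean $\mu_1=a\,\mathbf{E}[x_0]$ and variance $\rho_1^2=a^2\,\mathbf{Var}[x_0]+\sigma_0^2$. If the initial state $x_0$ is known exactly, this reduces to $\rho_1=\sigma_0$.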

### General induction step

Assume that the prior $p[x_{i-1}|y_{1},\ldots, y_{i-2}]$ is a normal distribution $\mathcal{N}(\mu_{i-1},\rho_{i-1})$ and we can express 

\begin{align*}
x_{i-1}=\mu_{i-1}+\varepsilon_{i-1}, \qquad \varepsilon_{i-1}\sim\mathcal{N}(0,\rho_{i-1})\enspace.
\end{align*}

Now let us simplify the probability 

\begin{align*}
p[x_i, x_{i-1}|y_1,\ldots, y_{i-1}]
&= p[x_i| x_{i-1},y_1,\ldots, y_{i-1}]\cdot p[x_{i-1}|y_1,\ldots, y_{i-1}]\\
&= p[x_i| x_{i-1}]\cdot p[x_{i-1}|y_1,\ldots, y_{i-1}]\\
\end{align*}

Note that the right term can be further expanded

\begin{align*}
p[x_{i-1}|y_1,\ldots, y_{i-1}] 
&=\frac{p[y_{i-1}|x_{i-1},y_1,\ldots, y_{i-2}]\cdot p[x_{i-1}|y_1,\ldots, y_{i-2}]}{p[y_{i-1}|y_1,\ldots, y_{i-2}]}\\
& \propto p[y_{i-1}|x_{i-1}]\cdot p[x_{i-1}|y_1,\ldots, y_{i-2}]\enspace.
\end{align*}

As a result, we can simplify

\begin{align*}
p[x_{i-1}|y_1,\ldots, y_{i-1}] 
&\propto
\exp\Biggl(-\frac{(y_{i-1}-cx_{i-1})^2}{2\tau_{i-1}^2}\Biggr)\cdot
\exp\Biggl(-\frac{(x_{i-1}-\mu_{i-1})^2}{2\rho_{i-1}^2}\Biggr)\\
&\propto\exp\Biggl(-\frac{\rho_{i-1}^2(y_{i-1}-cx_{i-1})^2 + \tau_{i-1}^2(x_{i-1}-\mu_{i-1})^2}{2\tau_{i-1}^2\rho_{i-1}^2}\Biggr)\\
&\propto\exp\Biggl(-\frac{(\rho_{i-1}^2c^2+ \tau_{i-1}^2)x_{i-1}^2-2(\rho_{i-1}^2y_{i-1}c +\tau_{i-1}^2\mu_{i-1})x_{i-1}}{2\tau_{i-1}^2\rho_{i-1}^2}\Biggr)\\
&\propto\exp\Biggl(-\frac{(\rho_{i-1}^2c^2+ \tau_{i-1}^2)\bigl(x_{i-1}-\frac{\rho_{i-1}^2y_{i-1}c +\tau_{i-1}^2\mu_{i-1}}{\rho_{i-1}^2c^2+ \tau_{i-1}^2}\bigr)^2}{2\tau_{i-1}^2\rho_{i-1}^2}\Biggr)
\end{align*}

and thus $x_{i-1}$ conditioned on the observations $y_1,\ldots, y_{i-1}$ has a normal distribution $\mathcal{N}(\mu_{i-1}^*, \sigma_{i-1}^*)$ with the parameters

\begin{align*}
\mu_{i-1}^*&=\frac{\rho_{i-1}^2y_{i-1}c +\tau_{i-1}^2\mu_{i-1}}{\rho_{i-1}^2c^2+ \tau_{i-1}^2}\\
\sigma_{i-1}^{*2}&=\frac{\tau_{i-1}^2\rho_{i-1}^2}{\rho_{i-1}^2c^2+ \tau_{i-1}^2}
\end{align*}
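
The measurement update can be checked numerically. Completing the square in the product of the measurement term and the prior term gives a posterior mean of $(\rho^2 c\,y+\tau^2\mu)/(\rho^2c^2+\tau^2)$ and a posterior variance of $\tau^2\rho^2/(\rho^2c^2+\tau^2)$. The sketch below implements these two expressions and compares them with a brute-force normalisation on a grid. The function name `measurement_update` and the test values are assumptions made for illustration.

```python
import numpy as np

def measurement_update(mu, rho, y, c, tau):
    """Fuse the prior N(mu, rho) of a state with one observation y = c*x + v.

    rho and tau are standard deviations. Returns the mean and the standard
    deviation of the state conditioned on y.
    """
    denom = rho**2 * c**2 + tau**2
    mu_star = (rho**2 * c * y + tau**2 * mu) / denom
    sigma_star = np.sqrt(tau**2 * rho**2 / denom)
    return mu_star, sigma_star

# Brute-force check: multiply the two exponentials on a grid and normalise.
mu, rho, y, c, tau = 1.0, 2.0, 3.0, 0.5, 1.5
grid = np.linspace(-30.0, 30.0, 200001)
unnorm = (np.exp(-(y - c * grid)**2 / (2 * tau**2))
          * np.exp(-(grid - mu)**2 / (2 * rho**2)))
weights = unnorm / unnorm.sum()
grid_mean = (weights * grid).sum()
grid_std = np.sqrt((weights * (grid - grid_mean)**2).sum())

mu_star, sigma_star = measurement_update(mu, rho, y, c, tau)
```

Both routes should agree up to discretisation error, which is a quick way to catch sign mistakes in the closed-form expressions.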



Thus the joint probability can be expressed

\begin{align*}
p[x_i, x_{i-1}|y_1,\ldots, y_{i-1}]&\propto
\exp\Biggl(-\frac{(x_i-ax_{i-1})^2}{2\sigma_{i-1}^2}\Biggr)\cdot
\exp\Biggl(-\frac{(x_{i-1}-\mu_{i-1}^*)^2}{2\sigma_{i-1}^{*2}}\Biggr)\\
&\propto
\exp\Biggl(-\frac{\sigma_{i-1}^{*2}(x_i-ax_{i-1})^2+\sigma_{i-1}^2(x_{i-1}-\mu_{i-1}^*)^2}{2\sigma_{i-1}^2\sigma_{i-1}^{*2}}\Biggr)\\
&\propto
\exp\Biggl(-\frac{\sigma_{i-1}^{*2}x_i^2+ (\sigma_{i-1}^{*2}a^2+\sigma_{i-1}^2)x_{i-1}^2 -2\sigma_{i-1}^{*2}ax_ix_{i-1}  -2\sigma_{i-1}^2x_{i-1}\mu_{i-1}^*}{2\sigma_{i-1}^2\sigma_{i-1}^{*2}}\Biggr)\\
\end{align*}

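which is the density function of a two-dimensional normal distribution for the vector $(x_i, x_{i-1})$ with mean vector $\boldsymbol{m}$ and covariance matrix $\boldsymbol{\Sigma}$

\begin{align*}
\boldsymbol{m}&=\begin{pmatrix} a\mu_{i-1}^*\\ \mu_{i-1}^*\end{pmatrix}\\
\boldsymbol{\Sigma}&=\begin{pmatrix} a^2\sigma_{i-1}^{*2}+\sigma_{i-1}^2 & a\sigma_{i-1}^{*2}\\ a\sigma_{i-1}^{*2} & \sigma_{i-1}^{*2}\end{pmatrix}\enspace.
\end{align*}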

Finally, we can use the fact that a marginal distribution of a normal distribution is also a normal distribution and express the parameters of the resulting prior $\mathcal{N}(\mu_i,\rho_i)$ of $x_i$

\begin{align*}
\mu_i&= a\mu_{i-1}^*\\
\rho_i^2&= a^2\sigma_{i-1}^{*2}+\sigma_{i-1}^2\enspace.
\end{align*}
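
Because $x_i=ax_{i-1}+w_{i-1}$ is a sum of independent normal variables, the prior of $x_i$ has mean $a\mu_{i-1}^*$ and variance $a^2\sigma_{i-1}^{*2}+\sigma_{i-1}^2$, and the covariance between $x_{i-1}$ and $x_i$ is $a\sigma_{i-1}^{*2}$. The sketch below codes this prediction step and compares it with a seeded Monte Carlo simulation. The name `predict` and the numerical values are illustrative assumptions.

```python
import numpy as np

def predict(mu_star, sigma_star, a, sigma):
    """Propagate N(mu_star, sigma_star) through x_next = a*x + w, w ~ N(0, sigma).

    sigma_star and sigma are standard deviations. Returns the mean and the
    standard deviation of the prior of the next state.
    """
    mu_next = a * mu_star
    rho_next = np.sqrt(a**2 * sigma_star**2 + sigma**2)
    return mu_next, rho_next

mu_next, rho_next = predict(mu_star=2.0, sigma_star=1.5, a=0.8, sigma=0.5)

# Monte Carlo check of the mean, the spread and the cross-covariance.
rng = np.random.default_rng(1)
x_prev = rng.normal(2.0, 1.5, size=200_000)
x_next = 0.8 * x_prev + rng.normal(0.0, 0.5, size=200_000)
mc_mean, mc_std = x_next.mean(), x_next.std()
mc_cov = np.cov(x_prev, x_next)[0, 1]
```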

This completes the recursion.
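
Alternating the measurement update and the prediction step gives a forward pass over all observations. The following is a sketch under stated assumptions: constant noise levels `sigma` and `tau`, standard deviations as second parameters, and a given starting prior $\mathcal{N}(\mu_1,\rho_1)$ for $x_1$. The name `forward_filter` is hypothetical.

```python
import numpy as np

def forward_filter(ys, a, c, sigma, tau, mu_1, rho_1):
    """Run the prior/posterior recursion over observations y_1..y_n.

    Returns four arrays of length n:
      prior_mu[k], prior_rho[k]  : parameters of p[x_{k+1} | y_1..y_k]
      post_mu[k],  post_sigma[k] : parameters of p[x_{k+1} | y_1..y_{k+1}]
    (array index k is zero-based, state and observation indices start at 1).
    """
    n = len(ys)
    prior_mu, prior_rho = np.empty(n), np.empty(n)
    post_mu, post_sigma = np.empty(n), np.empty(n)
    mu, rho = mu_1, rho_1
    for k, y in enumerate(ys):
        prior_mu[k], prior_rho[k] = mu, rho
        # Measurement update: fuse the prior with the observation y.
        denom = rho**2 * c**2 + tau**2
        post_mu[k] = (rho**2 * c * y + tau**2 * mu) / denom
        post_sigma[k] = np.sqrt(tau**2 * rho**2 / denom)
        # Prediction: push the fused estimate through x_next = a*x + w.
        mu = a * post_mu[k]
        rho = np.sqrt(a**2 * post_sigma[k]**2 + sigma**2)
    return prior_mu, prior_rho, post_mu, post_sigma

ys = np.array([0.3, -1.2, 0.8, 2.0])
prior_mu, prior_rho, post_mu, post_sigma = forward_filter(
    ys, a=0.9, c=1.5, sigma=0.7, tau=1.1, mu_1=0.2, rho_1=1.3)
```

Note that the standard deviations do not depend on the observed values, only on the model parameters, so they could be precomputed before any data arrives.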

## IV. Likelihood propagation

The derivation of the likelihood propagation formulae is exactly analogous. Note also that we can derive these formulae by reversing the chain and reusing the prior propagation formulae derived above.



## V. Marginal posterior

The derivation of the marginal posterior is also analogous. Note that engineers commonly use the maximum a posteriori estimate. For a normal posterior this estimate coincides with the mean, so often only the posterior mean is needed.
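
One way to obtain the marginal posterior without deriving a backward recursion is to stack all states and observations into a single jointly normal vector and apply the conditional normal formula. This is a brute-force sketch, not the propagation scheme described above; it assumes constant noise levels, standard deviations as second parameters, and a prior $\mathcal{N}(\mu_1,\rho_1)$ for $x_1$. The name `smooth_by_conditioning` is hypothetical.

```python
import numpy as np

def smooth_by_conditioning(ys, a, c, sigma, tau, mu_1, rho_1):
    """Mean and standard deviation of p[x_i | y_1..y_n] for i = 1..n."""
    ys = np.asarray(ys, dtype=float)
    n = len(ys)
    # Marginal variances of x_1..x_n before any observation is used.
    var = np.empty(n)
    var[0] = rho_1**2
    for j in range(1, n):
        var[j] = a**2 * var[j - 1] + sigma**2
    # Cov(x_j, x_k) = a^{|j-k|} * Var(x_min(j,k)) for the linear chain.
    idx = np.arange(n)
    lag = np.abs(idx[:, None] - idx[None, :])
    cov_xx = a**lag * var[np.minimum(idx[:, None], idx[None, :])]
    mean_x = mu_1 * a**idx
    # Observations y = c*x + v add independent noise on the diagonal.
    cov_yy = c**2 * cov_xx + tau**2 * np.eye(n)
    # gain = Cov(x, y) Cov(y, y)^{-1}; cov_yy is symmetric, hence the transpose.
    gain = np.linalg.solve(cov_yy, c * cov_xx).T
    post_mean = mean_x + gain @ (ys - c * mean_x)
    post_cov = cov_xx - gain @ (c * cov_xx)
    return post_mean, np.sqrt(np.diag(post_cov))

smooth_mean, smooth_std = smooth_by_conditioning(
    [0.3, -1.2, 0.8, 2.0], a=0.9, c=1.5, sigma=0.7, tau=1.1, mu_1=0.2, rho_1=1.3)
```

Here `smooth_mean` is also the maximum a posteriori estimate of each state. The cost grows cubically with the number of observations, so this approach is mainly useful for validating a recursive implementation on short sequences.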