# Continuous HMM and change detection

Recall that in the HMM practice session we tried to detect the wet and dry seasons of Singapore

* [03_change_detection_with_hidden_markov_models.ipynb](../03/03_change_detection_with_hidden_markov_models.ipynb)

but at that time we did not know how to fit the model parameters. The aim of this exercise session is to complete the task again, this time with parameter fitting.

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
from matplotlib import pyplot as plt
import sklearn

from pandas import Series
from pandas import DataFrame
from typing import Tuple

from tqdm import tnrange#, tqdm_notebook
from plotnine import *

from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

# Local imports
from common import *
from convenience import *

## I. E-step formulas for clustering and HMM

For the clustering we must assign a weight to each character image by rescaling the likelihood matrix. More precisely, let $(p_{ij})$ be the probabilities that the $i$th image belongs to cluster $j$. Then
\begin{align*}
 w_{ij} = \frac{p_{ij}}{\sum_{k} p_{ik}}\enspace.
\end{align*}
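As a minimal sketch of this rescaling, assuming the likelihoods are stored in a NumPy array with one row per image and one column per cluster (the function name `cluster_weights` and the numbers are made up for illustration):

```python
import numpy as np

def cluster_weights(p: np.ndarray) -> np.ndarray:
    """Rescale each row of a likelihood matrix so that it sums to one."""
    p = np.asarray(p, dtype=float)
    # Divide every entry p[i, j] by the sum of its row i
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical likelihoods for 3 images and 2 clusters
p = np.array([[0.20, 0.60],
              [0.05, 0.05],
              [0.90, 0.10]])
w = cluster_weights(p)
print(w)
```

Only the ratios within a row matter, so the second image gets equal weights even though both of its likelihoods are small.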
For the HMM we need to compute two marginal probabilities:

* $\gamma_{j}(i)$, the probability that the $i$th internal state is $j$ given the observation vector and the HMM parameters;
* $\xi_{jk}(i)$, the probability that the $i$th and $(i+1)$th internal states are $j$ and $k$ given the observation vector and the HMM parameters.

More formally,

\begin{align*}
\gamma_{j}(i)&=\Pr[x_i=j|\boldsymbol{y},\boldsymbol{\Theta}]\\
\xi_{jk}(i)&=\Pr[x_i=j, x_{i+1}=k|\boldsymbol{y},\boldsymbol{\Theta}]
\end{align*}
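One possible way to obtain both quantities is a scaled forward-backward pass. The sketch below is not the reference solution of this course; it assumes a single observation sequence, normal emissions, and hypothetical names (`e_step`, `normal_pdf`, `fwd`, `bwd`). The arguments `beta` and `alpha` stand for the initial and transition probabilities, and indices are 0-based, unlike the 1-based formulas above.

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y (broadcasts over arrays)."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def e_step(y, beta, alpha, mu, sigma):
    """Return gamma of shape (n, m) and xi of shape (n-1, m, m).

    alpha[j, k] is assumed to be Pr[x_{i+1}=k | x_i=j].
    """
    y = np.asarray(y, dtype=float)
    n, m = len(y), len(beta)
    b = normal_pdf(y[:, None], np.asarray(mu), np.asarray(sigma))  # (n, m)

    # Forward pass, rescaled at every step to avoid underflow
    fwd = np.zeros((n, m))
    c = np.zeros(n)
    fwd[0] = beta * b[0]
    c[0] = fwd[0].sum()
    fwd[0] /= c[0]
    for i in range(1, n):
        fwd[i] = (fwd[i - 1] @ alpha) * b[i]
        c[i] = fwd[i].sum()
        fwd[i] /= c[i]

    # Backward pass, using the same scaling constants
    bwd = np.ones((n, m))
    for i in range(n - 2, -1, -1):
        bwd[i] = alpha @ (b[i + 1] * bwd[i + 1]) / c[i + 1]

    gamma = fwd * bwd
    xi = (fwd[:-1, :, None] * alpha[None, :, :]
          * (b[1:] * bwd[1:])[:, None, :] / c[1:, None, None])
    return gamma, xi

# Made-up two-state example
y = np.array([0.1, -0.3, 4.8, 5.2, 0.4])
gamma, xi = e_step(y,
                   beta=np.array([0.5, 0.5]),
                   alpha=np.array([[0.9, 0.1], [0.2, 0.8]]),
                   mu=np.array([0.0, 5.0]),
                   sigma=np.array([1.0, 1.0]))
print(gamma.round(3))
```

Each row of `gamma` and each slice `xi[i]` sums to one, which is a quick sanity check before moving on to the M-step.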




## II. M-step formulas for clustering and HMM

For the clustering and the HMM emissions we can use the Gaussian mixtures discussed in the previous exercise session

* [03_concepts_behind_expectation_maximisation_algorithm](../07/03_concepts_behind_expectation_maximisation_algorithm.ipynb)

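For a single observation sequence of length $n$, we can update the initial state probabilities $\beta_j$ and the transition probabilities $\alpha_{jk}$ with the formulae

\begin{align*}
\beta_j&=\gamma_{j}(1)\\
\alpha_{jk}&=\frac{\sum_{i=1}^{n-1} \xi_{jk}(i)}{\sum_{i=1}^{n-1} \gamma_j(i)}
\end{align*}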

The generalisation to multiple observation sequences, as in our problem, is straightforward: we compute the sums over all observation sequences and normalise appropriately.
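The single-sequence updates can be sketched as follows. The function name `m_step` is hypothetical, and the emission update uses the usual weighted mean and weighted standard deviation for normal emissions, which is an assumption about what the previous session derived. The posteriors in the example are hard (one-hot) assignments chosen by hand so that the result is easy to verify.

```python
import numpy as np

def m_step(y, gamma, xi):
    """Update (beta, alpha, mu, sigma) from the posteriors of one sequence."""
    y = np.asarray(y, dtype=float)
    beta = gamma[0]                                  # initial state probabilities
    # Expected transition counts divided by expected visits (last step excluded)
    alpha = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    # Weighted mean and standard deviation of the normal emissions
    weight = gamma.sum(axis=0)
    mu = (gamma * y[:, None]).sum(axis=0) / weight
    var = (gamma * (y[:, None] - mu) ** 2).sum(axis=0) / weight
    return beta, alpha, mu, np.sqrt(var)

# Hand-made example: the state path 0, 0, 1, 1 is known with certainty
y = np.array([0.0, 1.0, 5.0, 7.0])
states = np.array([0, 0, 1, 1])
gamma = np.eye(2)[states]                            # shape (4, 2)
xi = np.zeros((3, 2, 2))
xi[np.arange(3), states[:-1], states[1:]] = 1.0

beta, alpha, mu, sigma = m_step(y, gamma, xi)
print(beta, alpha, mu, sigma, sep="\n")
```

The sketch does not guard against a state that is never visited, in which case the divisions would fail. For several sequences, the numerators and denominators would be accumulated over all sequences before dividing.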


# Homework

## 2.1 Implement the EM-algorithm for the HMM with normal emissions (<font color='red'>3p</font>)

Implement the weight computation for the HMM. The computation of $\gamma_{j}(i)$ was already done in the HMM exercise session, as it is the marginal state probability given all observations. For the weights $\xi_{jk}(i)$ you need to define an analogous computation scheme. Check the consistency of your derivations with the following formula

\begin{align*}
  \gamma_{j}(i)=\sum_k\xi_{jk}(i)
\end{align*}
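For very short sequences, the posteriors can also be computed by enumerating every state path, which gives an independent reference to compare a faster implementation against. The sketch below assumes normal emissions and uses hypothetical names (`brute_force_posteriors`, `normal_pdf`) and made-up parameters. Its cost grows exponentially with the sequence length, so it is only meant for testing.

```python
import itertools
import numpy as np

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def brute_force_posteriors(y, beta, alpha, mu, sigma):
    """Compute gamma and xi by summing the joint probability over all paths."""
    n, m = len(y), len(beta)
    gamma = np.zeros((n, m))
    xi = np.zeros((n - 1, m, m))
    for path in itertools.product(range(m), repeat=n):
        # Joint probability of this state path and the observations
        p = beta[path[0]]
        for i in range(n):
            if i > 0:
                p *= alpha[path[i - 1], path[i]]
            p *= normal_pdf(y[i], mu[path[i]], sigma[path[i]])
        for i in range(n):
            gamma[i, path[i]] += p
        for i in range(n - 1):
            xi[i, path[i], path[i + 1]] += p
    total = gamma[0].sum()          # probability of the observations
    return gamma / total, xi / total

# Made-up two-state example with three observations (8 paths)
y = np.array([0.1, 4.9, 5.3])
gamma, xi = brute_force_posteriors(
    y,
    beta=np.array([0.5, 0.5]),
    alpha=np.array([[0.9, 0.1], [0.2, 0.8]]),
    mu=np.array([0.0, 5.0]),
    sigma=np.array([1.0, 1.0]),
)
# Consistency check: summing xi over the next state recovers gamma
print(np.allclose(gamma[:-1], xi.sum(axis=2)))
```

The identity holds by construction here, so the useful part is comparing these reference values with the output of your own forward-backward code on the same inputs.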

Implement the M-step by updating the parameters and then assemble the entire algorithm. As the stopping criterion, simply run a fixed 100 iterations. Run the algorithm on the dataset and output the model parameters.
Redo the visual annotations by showing 4 types of state assignments as in the notebook 
[03_change_detection_with_hidden_markov_models.ipynb](../03/03_change_detection_with_hidden_markov_models.ipynb).


## 2.2 Experiment with the model structure (<font color='red'>3p</font>)

By default the EM-algorithm adjusts all model parameters. Try different models:

* a model where the transition matrix is fixed and you must learn only the emission distribution;
* a model where the number of states is three and all parameters are free;
* a model with cascading states as described in exercise 3.1 of the notebook [03_change_detection_with_hidden_markov_models.ipynb](../03/03_change_detection_with_hidden_markov_models.ipynb).