# Hidden Markov Models
In this notebook we go through Hidden Markov Models (HMMs) from definition to implementation. We look at the following toy example (taken from Sebastian Thrun's lesson **Happy Grumpy Problem**):

[hgp]: ./assets/rainsun_happygrumpy.png

<center>

![alt text][hgp]

</center>
where we can only observe whether the person is happy or grumpy, not whether it is raining or sunny (the hidden state).

Let's define some notation for HMMs:

* $\pi$: initial distribution of the hidden state
* $a_{ij}$: the probability of transitioning from state $i$ to state $j$
* $A = \left(a_{ij}\right)$: the set of state transition probabilities
* $s_t$: the state at time $t$: 
$$s_t = i \text{ with } i\in\left\{\text{rain, sun}\right\} $$
* $o_t$: the observation at time $t$: 
$$o_t = k \text{ with }k\in\left\{\text{happy, grumpy}\right\} $$
* $b$: state output probability, i.e.
$$b_i(k) \text{ represents the probability of generating }k \text{ in state }i$$
* $B = \left(b_i(k)\right)$: the set of state output probabilities
* an HMM is often represented by the tuple $(\pi, A, B)$
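
To see how these pieces fit together, the joint probability of a hidden-state sequence and an observation sequence can be written as a product of the quantities above. This is a sketch of the standard factorization, assuming that every state from $s_0$ to $s_T$ emits one observation and writing $\pi_i$ for $P(s_0 = i)$ (a notation introduced here for illustration); other indexing conventions shift the terms slightly.

$$P(s_0,\dots,s_T,\ o_0,\dots,o_T) = \pi_{s_0}\, b_{s_0}(o_0) \prod_{t=1}^{T} a_{s_{t-1}s_t}\, b_{s_t}(o_t)$$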

This notebook is organized as follows:

* Generating rain-sun/happy-grumpy data
* Training an HMM on the generated observations to recover the original probabilities

## Generating data
We use the following parameters for our HMM:
* $\pi = [0.5, 0.5]$, i.e.
$$P(s_0 = \text{rain}) = P(s_0=\text{sun}) = 0.5$$
* the transition matrix is given below
$$
\begin{array}{lcc}
            & s_{t+1}\\
     s_t    & \text{rain} & \text{sun}\\
\text{rain} & 0.6  & 0.4\\
\text{sun}  & 0.2  & 0.8
\end{array}
$$
i.e.
\begin{split}
P(s_{t+1}=\text{rain}\left|\ s_t=\text{rain}\right.) &= 0.6\\
P(s_{t+1}=\text{sun}\left|\ s_t=\text{rain}\right.) &= 0.4\\
P(s_{t+1}=\text{rain}\left|\ s_t=\text{sun}\right.) &= 0.2\\
P(s_{t+1}=\text{sun}\left|\ s_t=\text{sun}\right.) &= 0.8
\end{split}
* the state output probabilities are
$$
\begin{array}{lcc}
& o_t \\
     s_t    & \text{grumpy} & \text{happy} \\
\text{rain} & 0.6  & 0.4\\
\text{sun}  & 0.1  & 0.9
\end{array}
$$
i.e.
\begin{split}
P(o_t=\text{happy}\left|\ s_t=\text{rain}\right.) &= 0.4\\
P(o_t=\text{grumpy}\left|\ s_t=\text{rain}\right.) &= 0.6\\
P(o_t=\text{happy}\left|\ s_t=\text{sun}\right.) &= 0.9\\
P(o_t=\text{grumpy}\left|\ s_t=\text{sun}\right.) &= 0.1
\end{split}

In [1]:
import numpy as np

# initial probabilities
P0 = np.array([0.5, 0.5])

# transition probabilities
A = np.array([[0.6, 0.4],
              [0.2, 0.8]])

# output probabilities
B = np.array([[0.6, 0.4],
              [0.1, 0.9]])
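
Before sampling, it can help to confirm that the parameters are valid probability distributions. The snippet below is a minimal sketch of such a check; the helper name `is_stochastic` is hypothetical and not part of the original notebook.

```python
import numpy as np

# same parameters as in the cell above
P0 = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4],
              [0.2, 0.8]])
B = np.array([[0.6, 0.4],
              [0.1, 0.9]])

def is_stochastic(M):
    """Hypothetical helper: True if all entries are >= 0 and every row sums to 1."""
    M = np.atleast_2d(M)  # treat a 1-D distribution as a single row
    return bool(np.all(M >= 0) and np.allclose(M.sum(axis=1), 1.0))

params_ok = is_stochastic(P0) and is_stochastic(A) and is_stochastic(B)
print(params_ok)
```

Each row of `A` and `B` is a conditional distribution given the current hidden state, which is why the check sums along rows rather than columns.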

We implement utility functions to generate the hidden states and the observed outputs:

In [9]:
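def generate_hidden_states(P0, A, T):
    """Sample T hidden states ('R' = rain, 'S' = sun) from the Markov chain.

    The initial state s_0 is drawn from P0 and only seeds the chain;
    the returned list holds the T states that follow it.
    """
    # one uniform draw for s_0 plus one per transition
    uv = np.random.uniform(size=(T+1))
    s0 = 0 if uv[0] < P0[0] else 1
    st = s0
    hidden_states = []
    for i in range(1, T+1):
        # row of A for the current state: [P(rain | st), P(sun | st)]
        Pcond = A[st]
        if uv[i] < Pcond[0]:
            hidden_states.append('R')
            st = 0
        else:
            hidden_states.append('S')
            st = 1
    return hidden_states

def generate_output_states(B, hidden_states):
    """Sample one observation ('G' = grumpy, 'H' = happy) per hidden state."""
    uv = np.random.uniform(size=(len(hidden_states)))
    out_states = []
    for i,s in enumerate(hidden_states):
        s_idx = 0 if s == 'R' else 1
        # row of B for this state: [P(grumpy | s), P(happy | s)]
        Pcond = B[s_idx,:]
        if uv[i] < Pcond[0]:
            out_states.append('G')
        else:
            out_states.append('H')
    return out_states

def generate_hmm_seq(P0, A, B, T):
    """Generate an observation sequence of length T; the hidden states are discarded."""
    hidden_states = generate_hidden_states(P0, A, T)
    out_states = generate_output_states(B, hidden_states)
    return out_states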
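
The functions above draw from NumPy's global random state, so each run produces different sequences. As a hedged alternative, the sketch below samples with an explicitly seeded `np.random.default_rng` generator to make runs reproducible. The function name `sample_hmm` is hypothetical; it follows the same convention as above, where $s_0$ only seeds the chain and is not emitted.

```python
import numpy as np

def sample_hmm(P0, A, B, T, rng):
    """Hypothetical variant: sample T observations ('G'/'H') using a seeded generator."""
    state = rng.choice(len(P0), p=P0)  # s_0, used only to seed the chain
    outputs = []
    for _ in range(T):
        state = rng.choice(A.shape[1], p=A[state])   # next hidden state (0 = rain, 1 = sun)
        symbol = rng.choice(B.shape[1], p=B[state])  # emitted symbol (0 = grumpy, 1 = happy)
        outputs.append('GH'[symbol])
    return outputs

P0 = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4],
              [0.2, 0.8]])
B = np.array([[0.6, 0.4],
              [0.1, 0.9]])

rng = np.random.default_rng(0)  # fixed seed for reproducibility
seq = sample_hmm(P0, A, B, 20, rng)
print(''.join(seq))
```

Passing the generator in as an argument keeps the sampling function free of hidden global state, so two calls with generators seeded identically return the same sequence.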

We generate 10 sequences, each of length $T=20$:

In [18]:
training_datas = []
N = 10
T = 20
for i in range(N):
    training_datas.append(generate_hmm_seq(P0, A, B, T))

for i in range(N):
    print ('Observation {}-th:\t{}'.format(i, ''.join(training_datas[i])))

Observation 0-th:	HGHHHHHHHHHGHGGHHHHH
Observation 1-th:	HHHHHHHHHHHHHHHHGHHG
Observation 2-th:	GGGHHHHHHHHHHHHHHHHH
Observation 3-th:	GGHHGGHHHHHHHHHHHHHH
Observation 4-th:	GHHHHHGHHHHGGHHHHHHH
Observation 5-th:	GGHHGHGGHHHHHGHHHGHG
Observation 6-th:	HHHGHGHHHHHGHHHHHHHH
Observation 7-th:	HHHHHHGHGHGHHGHHGGHH
Observation 8-th:	HHHHHHHHHHHHHHHHHHHH
Observation 9-th:	HGHHHHHHHHHHHHHHGGGH
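
As a rough sanity check on the generated data, we can compare the fraction of grumpy observations with the long-run value implied by $A$ and $B$. The sketch below assumes the chain is close to its stationary distribution, which is only approximately true since it starts from $\pi$, and with 200 observations the two numbers should agree only loosely. The variable names are illustrative.

```python
import numpy as np

A = np.array([[0.6, 0.4],
              [0.2, 0.8]])
B = np.array([[0.6, 0.4],
              [0.1, 0.9]])

# the ten observation sequences printed above
observations = [
    'HGHHHHHHHHHGHGGHHHHH',
    'HHHHHHHHHHHHHHHHGHHG',
    'GGGHHHHHHHHHHHHHHHHH',
    'GGHHGGHHHHHHHHHHHHHH',
    'GHHHHHGHHHHGGHHHHHHH',
    'GGHHGHGGHHHHHGHHHGHG',
    'HHHGHGHHHHHGHHHHHHHH',
    'HHHHHHGHGHGHHGHHGGHH',
    'HHHHHHHHHHHHHHHHHHHH',
    'HGHHHHHHHHHHHHHHGGGH',
]

# empirical fraction of grumpy observations
n_grumpy = sum(seq.count('G') for seq in observations)
n_total = sum(len(seq) for seq in observations)
empirical_grumpy = n_grumpy / n_total

# stationary distribution: left eigenvector of A for the eigenvalue 1
eigvals, eigvecs = np.linalg.eig(A.T)
stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stationary = stationary / stationary.sum()  # normalize so the entries sum to 1

# long-run probability of a grumpy observation (column 0 of B)
expected_grumpy = stationary @ B[:, 0]

print('empirical: {:.3f}, expected: {:.3f}'.format(empirical_grumpy, expected_grumpy))
```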


## HMM Decoding/Training
Now, given the observation data above, we can ask: can we recover the parameters of our HMM, i.e. find

$$(\pi, A, B)\text{ that maximizes the probability of the observations above.}$$
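
The quantity being maximized is the likelihood of the observations under $(\pi, A, B)$. As a minimal sketch, it can be computed with the forward recursion, which sums over all hidden-state paths without enumerating them. The function name `sequence_likelihood` and the encoding `G` → 0, `H` → 1 are assumptions for illustration, and $\pi$ is taken here as the distribution of the first emitting state.

```python
import numpy as np

def sequence_likelihood(pi, A, B, obs):
    """P(obs | pi, A, B) via the forward recursion; obs holds column indices of B."""
    # alpha[i] = P(o_1..o_t, s_t = i), initialized with the first observation
    alpha = pi * B[:, obs[0]]
    for k in obs[1:]:
        # propagate one step through A, then weight by the emission probability
        alpha = (alpha @ A) * B[:, k]
    return alpha.sum()

P0 = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4],
              [0.2, 0.8]])
B = np.array([[0.6, 0.4],
              [0.1, 0.9]])

# encode an observation string: grumpy -> column 0 of B, happy -> column 1
encode = {'G': 0, 'H': 1}
obs = [encode[c] for c in 'HGHHHHHHHHHGHGGHHHHH']
lik = sequence_likelihood(P0, A, B, obs)
print(lik)
```

For long sequences the product of probabilities underflows, so practical implementations usually rescale `alpha` at each step or work with log-probabilities.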

We will look at the following algorithms:
* [Viterbi algorithm](https://en.wikipedia.org/wiki/Viterbi_algorithm) for finding the most **likely** sequence of hidden states, called the **Viterbi path**
* [Baum–Welch algorithm](https://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm) for finding the unknown parameters of an HMM

### Viterbi algorithm