# Hidden Markov Models
In this notebook we go through Hidden Markov Models (HMMs) from definition to implementation. We use the following toy example (taken from Sebastian Thrun's lesson **Happy Grumpy Problem**):

[hgp]: ./assets/rainsun_happygrumpy.png

<center>

![alt text][hgp]

</center>
where we can only observe whether a person is Happy or Grumpy, but not the weather (rain/sun), which is the hidden state.

Let's define some notation for HMMs

* $\pi$: initial distribution of hidden state
* $a_{ij}$: the probability of a transition from state $i$ to state $j$
* $A = \left(a_{ij}\right)$: the set of state transition probabilities
* $s_t$: the state at time $t$: 
$$s_t = i \text{ with } i\in\left\{\text{rain, sun}\right\} $$
* $o_t$: the observation at time $t$: 
$$o_t = k \text{ with }k\in\left\{\text{happy, grumpy}\right\} $$
* $b$: state output probability, i.e.
$$b_i(k) \text{ represents the probability of generating }k \text{ in state }i$$
* $B = \left(b_i(k)\right)$: the set of state output probabilities
* an HMM is often represented by the tuple $(\pi, A, B)$

The notebook is organized as follows:

* Generating rain-sun/happy-grumpy data
* Decoding and training HMMs on the generated observations to recover the original probabilities

## Generating data
We use the following parameters for our HMM
* $\pi = [0.4, 0.6]$, i.e.
$$P(s_1 = \text{rain}) = 0.4, P(s_1=\text{sun}) = 0.6$$
* the transition matrix is given below (each column is a distribution over the next state)
$$
\begin{array}{lcc}
            & s_{t}\\
     s_{t+1}& \text{rain} & \text{sun}\\
\text{rain} & 0.6  & 0.2\\
\text{sun}  & 0.4  & 0.8
\end{array}
$$
i.e 
\begin{split}
P(s_{t+1}=\text{rain}\left|\ s_t=\text{rain}\right.) &= 0.6\\
P(s_{t+1}=\text{sun}\left|\ s_t=\text{rain}\right.) &= 0.4\\
P(s_{t+1}=\text{rain}\left|\ s_t=\text{sun}\right.) &= 0.2\\
P(s_{t+1}=\text{sun}\left|\ s_t=\text{sun}\right.) &= 0.8
\end{split}
* the state output probabilities are
$$
\begin{array}{lcc}
& s_t \\
     o_t    & \text{rain} & \text{sun} \\
\text{grumpy} & 0.6  & 0.1\\
\text{happy}  & 0.4  & 0.9
\end{array}
$$
i.e
\begin{split}
P(o_t=\text{happy}\left|\ s_t=\text{rain}\right.) &= 0.4\\
P(o_t=\text{grumpy}\left|\ s_t=\text{rain}\right.) &= 0.6\\
P(o_t=\text{happy}\left|\ s_t=\text{sun}\right.) &= 0.9\\
P(o_t=\text{grumpy}\left|\ s_t=\text{sun}\right.) &= 0.1
\end{split}

In [1]:
import numpy as np

# initial probabilities
P0 = np.array([0.4, 0.6])

# transition probabilities
A = np.array([[0.6, 0.2],
              [0.4, 0.8]])

# output probabilities
B = np.array([[0.6, 0.1],
              [0.4, 0.9]])

logP0 = np.log(P0)
logA  = np.log(A)
logB  = np.log(B)

# map state labels to indices (and back)
hstate_idx_map = {'R' :  0, 'S' :  1}
hidx_state_map = { 0  : 'R', 1  : 'S'}

ostate_idx_map = {'G' : 0, 'H' : 1}
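
As a quick sanity check on the convention used above (a sketch, not part of the original notebook): each column of `A` and `B` is a conditional distribution given the current hidden state, so each column should sum to 1. The arrays are re-declared so the snippet runs on its own; `col_sums_A` and `col_sums_B` are names introduced here for illustration.

```python
import numpy as np

P0 = np.array([0.4, 0.6])
A = np.array([[0.6, 0.2],
              [0.4, 0.8]])
B = np.array([[0.6, 0.1],
              [0.4, 0.9]])

# A[j, i] = P(s_{t+1}=j | s_t=i) and B[k, i] = P(o_t=k | s_t=i),
# so the normalization runs down each column (axis=0), not across rows.
col_sums_A = A.sum(axis=0)
col_sums_B = B.sum(axis=0)

assert np.isclose(P0.sum(), 1.0)
assert np.allclose(col_sums_A, 1.0)
assert np.allclose(col_sums_B, 1.0)
```

This matters later: with this layout, `A.dot(v)` propagates a distribution over states one step forward in time.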

We implement utility functions to generate hidden-states and output-states

In [2]:
def generate_hidden_states(P0, A, T):
    uv = np.random.uniform(size=(T))
    st = 0 if uv[0] < P0[0] else 1
    hidden_states = [hidx_state_map[st]]
    for i in range(1, T):
        Pcond = A[:, st]
        if uv[i] < Pcond[0]:            
            st = 0
        else:            
            st = 1
        # add new hidden-state
        hidden_states.append(hidx_state_map[st])
        
    return hidden_states

def generate_output_states(B, hidden_states):
    uv = np.random.uniform(size=(len(hidden_states)))
    out_states = []
    for i,s in enumerate(hidden_states):
        s_idx = hstate_idx_map[s]
        Pcond = B[:, s_idx]
        
        if uv[i] < Pcond[0]:
            out_states.append('G')
        else:
            out_states.append('H')
    return out_states
            
def generate_hmm_seq(P0, A, B, T):
    hidden_states = generate_hidden_states(P0, A, T)
    out_states = generate_output_states(B, hidden_states)
    return hidden_states, out_states

def generate_data(N, T):
    hidden_datas   = []
    training_datas = []
    for i in range(N):
        h_states, o_states = generate_hmm_seq(P0, A, B, T)
        hidden_datas.append(h_states)
        training_datas.append(o_states)
    
    return hidden_datas, training_datas

def count(hidden_datas, training_datas):
    init_count  = {}
    trans_count = {}
    out_count = {}
    for h,t in zip(hidden_datas, training_datas):
        init_count[h[0]] = init_count.get(h[0], 0) + 1
        for i in range(len(h)):
            oh = t[i] + h[i]
            out_count[oh] = out_count.get(oh, 0) + 1
            if i > 0:
                ts = h[i] + h[i-1]
                trans_count[ts] = trans_count.get(ts, 0) + 1
    return init_count, trans_count, out_count        
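
For reproducible experiments, the same sampling scheme can be sketched with a seeded generator. This is an alternative under stated assumptions, not the notebook's own code: `sample_hmm` is a hypothetical helper, it works on integer indices instead of the `'R'/'S'` and `'G'/'H'` labels, and the seed `0` is an arbitrary choice.

```python
import numpy as np

def sample_hmm(P0, A, B, T, rng):
    """Sample T hidden-state and observation indices from a column-stochastic HMM."""
    states = np.empty(T, dtype=int)
    obs = np.empty(T, dtype=int)
    states[0] = rng.choice(len(P0), p=P0)
    obs[0] = rng.choice(B.shape[0], p=B[:, states[0]])
    for t in range(1, T):
        # column states[t-1] of A is the distribution of the next hidden state
        states[t] = rng.choice(A.shape[0], p=A[:, states[t - 1]])
        # column states[t] of B is the distribution of the emitted observation
        obs[t] = rng.choice(B.shape[0], p=B[:, states[t]])
    return states, obs

P0 = np.array([0.4, 0.6])
A = np.array([[0.6, 0.2], [0.4, 0.8]])
B = np.array([[0.6, 0.1], [0.4, 0.9]])

rng = np.random.default_rng(0)  # fixed seed so reruns give the same sequence
states, obs = sample_hmm(P0, A, B, 20, rng)
print(''.join('RS'[s] for s in states))
print(''.join('GH'[o] for o in obs))
```

Passing the generator explicitly keeps the sampling deterministic for a given seed, which makes the counting check easier to repeat.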

We generate $N=1000$ sequences, each of length $T=100$

In [3]:
N = 1000
T = 100
hidden_datas, training_datas = generate_data(N, T)

In [4]:
init_count, trans_count, out_count = count(hidden_datas, training_datas)

In [5]:
print(init_count)
print('Prob s_t+1=S|s_t=S = {:.2f}'.format(trans_count['SS']/(trans_count['SS'] + trans_count['RS'])))
print('Prob s_t+1=S|s_t=R = {:.2f}'.format(trans_count['SR']/(trans_count['SR'] + trans_count['RR'])))
print('Prob o_t=H|s_t=S   = {:.2f}'.format(out_count['HS']/(out_count['HS'] + out_count['GS'])))
print('Prob o_t=H|s_t=R   = {:.2f}'.format(out_count['HR']/(out_count['HR'] + out_count['GR'])))

{'R': 380, 'S': 620}
Prob s_t+1=S|s_t=S = 0.80
Prob s_t+1=S|s_t=R = 0.40
Prob o_t=H|s_t=S   = 0.90
Prob o_t=H|s_t=R   = 0.30


In [6]:
N = 10
T = 5
hidden_datas, training_datas = generate_data(N, T)

for i in range(N):
    print ('Hidden-state {}-th:\t{}'.format(i, ''.join(hidden_datas[i])))
    print ('Observation  {}-th:\t{}\n'.format(i, ''.join(training_datas[i])))

Hidden-state 0-th:	RRSSS
Observation  0-th:	GGGHH

Hidden-state 1-th:	SSSSS
Observation  1-th:	HHHHH

Hidden-state 2-th:	SSSSS
Observation  2-th:	HHGHH

Hidden-state 3-th:	SSSRR
Observation  3-th:	HHHGG

Hidden-state 4-th:	SSSSS
Observation  4-th:	HHHHH

Hidden-state 5-th:	RSRSS
Observation  5-th:	GHGHH

Hidden-state 6-th:	RSSSS
Observation  6-th:	GHHGH

Hidden-state 7-th:	SSSSS
Observation  7-th:	HHHGH

Hidden-state 8-th:	RRRRS
Observation  8-th:	GGHGH

Hidden-state 9-th:	RSSSS
Observation  9-th:	HHGHH



## HMM Decoding/Training
Given the observation data above, a natural question is whether we can recover the parameters of our HMM, i.e. find

$$(\pi, A, B)\text{ that maximizes the chance that we see above observation.}$$

We will look at the following algorithms
* [Viterbi algorithm](https://en.wikipedia.org/wiki/Viterbi_algorithm) for finding the most **likely** sequence of hidden states - called the **Viterbi path**
* [Baum–Welch algorithm](https://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm) for finding the unknown parameters of a HMM

### Viterbi algorithm
Suppose we know the model parameters, i.e. the initial distribution $\pi$, the transition matrix $A$ and the emission matrix $B$, and we observe a sequence of outputs $o_1,\ldots,o_T$. We now want to find the hidden-state sequence $s_1,\ldots,s_T$ that most likely produced the observed sequence.

This problem can be solved by the Viterbi algorithm (see [wiki](https://en.wikipedia.org/wiki/Viterbi_algorithm) for more detail). The main idea is to find the $s_1,\ldots,s_T$ that maximize the joint probability
$$
\mathrm{arg}\max_{s_1,\ldots,s_T}P(o_1,\ldots,o_T,s_1,\ldots,s_T)
$$

Using the Markov assumption for HMMs, we can write
$$
P(o_1,\ldots,o_t,s_1,\ldots,s_t) = P(o_t|s_t) \times P(s_t|s_{t-1}) \times P(o_1,\ldots,o_{t-1},s_1,\ldots,s_{t-1})
$$
From this recursive form we can derive the following dynamic program, where $S$ is the set of hidden states:

* $V_{1,s} = P(o_1,s_1=s)=P(o_1|s_1=s)\times \pi_s$
* $V_{t,s} = \max_{x\in S} P(o_t|s_t=s)\times P(s_t=s|s_{t-1}=x) \times V_{t-1, x}$

This allows us to find the probability of the most probable state sequence $s_1,\ldots,s_t$ that ends in state $s_t=s$. Remembering the maximizing $x$ at each step lets us trace back the sequence itself.
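
As a worked instance of the recursion (added here for illustration), take the two observations $o_1=\text{happy}$, $o_2=\text{grumpy}$ and the parameters from the data-generation section:
$$
\begin{split}
V_{1,\text{rain}} &= 0.4 \times 0.4 = 0.16\\
V_{1,\text{sun}} &= 0.9 \times 0.6 = 0.54\\
V_{2,\text{rain}} &= 0.6 \times \max(0.6 \times 0.16,\ 0.2 \times 0.54) = 0.6 \times 0.108 = 0.0648\\
V_{2,\text{sun}} &= 0.1 \times \max(0.4 \times 0.16,\ 0.8 \times 0.54) = 0.1 \times 0.432 = 0.0432
\end{split}
$$
The larger final value is $V_{2,\text{rain}}$, and its maximum was attained at $x=\text{sun}$, so the most probable path is (sun, rain) with joint probability $0.0648$.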

Let's implement the Viterbi algorithm now

In [7]:
def viterbi_path_eval(hidden, observation, logP0, logA, logB, debug = False):
    # get state at t = 0
    s_0   = hstate_idx_map[hidden[0]]
    obs_0 = ostate_idx_map[observation[0]]   
    
    path_val = logP0[s_0] + logB[obs_0, s_0]
    if debug:
        print ('step {:2d}-th path={}'.format(0, path_val))
    
    # continue the path
    prev_s = s_0
    for i in range(1, len(hidden)):
        s   = hstate_idx_map[hidden[i]]
        obs = ostate_idx_map[observation[i]]   
        path_val += logA[s, prev_s] + logB[obs, s]
        
        if debug:
            print ('step {:2d}-th path={}'.format(i, path_val))
        
        prev_s = s
    
    return path_val

def viterbi_decode(observation, logP0, logA, logB, debug = False):
    hidden_states = []
    S = len(logP0)  # number of hidden states, taken from the argument rather than the global P0
    
#     print('logA {}'.format(logA))
#     print('\nlogB {}\n'.format(logB))
    # compute V_{1,s}
    obs = ostate_idx_map[observation[0]]    
    V = logB[obs] + logP0
    if debug:
        print('step {:2d}-th V={}\n'.format(0, V))
        print('logP0 = {}'.format(logP0))
        print('logA = {}'.format(logA))
        print('logB = {}'.format(logB))
    
    prev_states = [] # to trace-back
    # dp update V_{t,s}    
    for i in range(1, len(observation)):
        obs_i = ostate_idx_map[observation[i]]
        
        # for each s, we compute trans_prob = logA[s, x] + V[x] for all x in S
        # this's equivalent to add each row of logA by V
        trans_prob = logA + V
                
        # get best x for each row
        best_states = np.argmax(trans_prob, axis=1)

        # update V with new best_states new_V[s] = logB[obs_i, s] + max_x logA[s,x] + V[x]
        V = logB[obs_i] + trans_prob[np.arange(S), best_states]
        
        if debug:
            print('\nstep {:2d}-th\ntrans_prob={}'.format(i, trans_prob))
            print('best_states={}'.format(best_states))
            print('V={}'.format(V))
            
        # keep track best-states to back-track
        prev_states.append(best_states)
        
    # trace back
    st = np.argmax(V)
    hidden_states.insert(0, hidx_state_map[st])
    for i in range(len(observation)-2,-1,-1):        
        st = prev_states[i][st]
        hidden_states.insert(0, hidx_state_map[st])
    return hidden_states

def decode_error(truth, decode):
    N = len(truth)
    err = 0
    for i in range(N):
        if truth[i] != decode[i]:
            err += 1
    return err/N
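
One way to gain confidence in a Viterbi implementation is to compare it with exhaustive search on a short sequence. The sketch below is self-contained and independent of the functions above: `joint_log_prob` and `viterbi` are hypothetical names, states and observations are integer-coded (R=0, S=1, G=0, H=1), and the test sequence is an arbitrary choice.

```python
import itertools
import numpy as np

P0 = np.array([0.4, 0.6])
A = np.array([[0.6, 0.2], [0.4, 0.8]])   # A[s, x] = P(s_t=s | s_{t-1}=x)
B = np.array([[0.6, 0.1], [0.4, 0.9]])   # B[o, s] = P(o_t=o | s_t=s)

def joint_log_prob(states, obs):
    """log P(o_1..o_T, s_1..s_T) for index-coded states and observations."""
    lp = np.log(P0[states[0]]) + np.log(B[obs[0], states[0]])
    for t in range(1, len(obs)):
        lp += np.log(A[states[t], states[t - 1]]) + np.log(B[obs[t], states[t]])
    return lp

def viterbi(obs):
    """Log-space Viterbi; returns (best path, its joint log-probability)."""
    V = np.log(P0) + np.log(B[obs[0]])
    back = []
    for o in obs[1:]:
        scores = np.log(A) + V            # scores[s, x] = log A[s, x] + V[x]
        back.append(scores.argmax(axis=1))
        V = np.log(B[o]) + scores.max(axis=1)
    path = [int(V.argmax())]
    for ptr in reversed(back):            # follow back-pointers from the end
        path.insert(0, int(ptr[path[0]]))
    return path, float(V.max())

obs = [0, 0, 1, 1, 1]                     # G G H H H
path, best = viterbi(obs)

# exhaustive search over all 2**5 hidden-state sequences
brute = max(itertools.product(range(2), repeat=len(obs)),
            key=lambda s: joint_log_prob(s, obs))

print(path, list(brute), round(best, 2))
```

Exhaustive search costs $|S|^T$ evaluations, so it is only usable as a check on short sequences; the dynamic program needs on the order of $T\,|S|^2$ operations.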

In [8]:
'''
This is quite interesting where it fails to find the most probable
Test sampe 3-th
obs      = GGHHH
hidden   = RRSRR
path-val = -8.19
decode   = RRSSS
path-val = -10.41
decode-err = 0.40
'''

idx = 9
decoded = viterbi_decode(training_datas[idx], logP0, logA, logB)
print('hidden   = {}\npath-val = {:.2f}\n'.format(''.join(hidden_datas[idx]), viterbi_path_eval(hidden_datas[idx],
                                                                                              training_datas[idx],
                                                                                             logP0, logA, logB)))
print('decode   = {}\npath-val = {:.2f}\n'.format(''.join(decoded), viterbi_path_eval(decoded,
                                                                                    training_datas[idx],
                                                                                    logP0, logA, logB)))
print('decode-err = {:.2f}\n'.format(decode_error(hidden_datas[idx], decoded)))


hidden   = RSSSS
path-val = -6.32

decode   = SSSSS
path-val = -4.13

decode-err = 0.20



In [9]:
for idx in range(N):
    decoded = viterbi_decode(training_datas[idx], logP0, logA, logB)
    print('Test sampe {}-th'.format(idx))
    print('obs      = {}'.format(''.join(training_datas[idx])))
    print('hidden   = {}\npath-val = {:.2f}'.format(''.join(hidden_datas[idx]), viterbi_path_eval(hidden_datas[idx],
                                                                                              training_datas[idx],
                                                                                             logP0, logA, logB)))
    print('decode   = {}\npath-val = {:.2f}'.format(''.join(decoded), viterbi_path_eval(decoded,
                                                                                    training_datas[idx],
                                                                                    logP0, logA, logB)))
    print('decode-err = {:.2f}\n'.format(decode_error(hidden_datas[idx], decoded)))

Test sampe 0-th
obs      = GGGHH
hidden   = RRSSS
path-val = -6.02
decode   = RRRSS
path-val = -4.36
decode-err = 0.20

Test sampe 1-th
obs      = HHHHH
hidden   = SSSSS
path-val = -1.93
decode   = SSSSS
path-val = -1.93
decode-err = 0.00

Test sampe 2-th
obs      = HHGHH
hidden   = SSSSS
path-val = -4.13
decode   = SSSSS
path-val = -4.13
decode-err = 0.00

Test sampe 3-th
obs      = HHHGG
hidden   = SSSRR
path-val = -4.11
decode   = SSSRR
path-val = -4.11
decode-err = 0.00

Test sampe 4-th
obs      = HHHHH
hidden   = SSSSS
path-val = -1.93
decode   = SSSSS
path-val = -1.93
decode-err = 0.00

Test sampe 5-th
obs      = GHGHH
hidden   = RSRSS
path-val = -5.61
decode   = RRRSS
path-val = -5.21
decode-err = 0.20

Test sampe 6-th
obs      = GHHGH
hidden   = RSSSS
path-val = -5.48
decode   = RSSSS
path-val = -5.48
decode-err = 0.00

Test sampe 7-th
obs      = HHHGH
hidden   = SSSSS
path-val = -4.13
decode   = SSSSS
path-val = -4.13
decode-err = 0.00

Test sampe 8-th
obs      = GGHGH
hidden 

### Baum-Welch algorithm
Now let's assume that we only observe the output states and do not know the parameters $(\pi,A,B)$, which we want to recover. This is the goal of the [Baum-Welch](https://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm) algorithm, which consists of the following steps

1. Initialize $\theta=(\pi, A, B)$, for example randomly, where
\begin{split}
\pi(s) &= P(s_1=s)\\
A(s,x) &= P(s_{t}=s | s_{t-1} = x)\\
B(o,s) &= P(o_t=o|s_t=s)
\end{split}

2. Forward procedure, let 
$$
    \alpha_s(t) = P\left(o_1,\ldots,o_t,s_t=s|\theta\right)
$$
be the probability of seeing $(o_1,\ldots,o_t)$ **and** being in hidden state $s$ at time $t$. It is computed recursively (via the law of total probability and the Markov property)
\begin{split}
\alpha_s(1) &= \pi(s)\times B(o_1, s)\\
\alpha_s(t) &= B(o_t,s) \times \sum_{x\in S} A(s, x) \times \alpha_x(t-1)
\end{split}

3. Backward procedure, let
$$
\beta_s(t) = P(o_{t+1},\ldots, o_T | s_t = s;\theta)
$$
that is, the probability of the remaining observations $o_{t+1},\ldots, o_{T}$ **given** that the hidden state at time $t$ equals $s$. We calculate $\beta_s(t)$ as
\begin{split}
\beta_s(T) &= 1\\
\beta_s(t) &= \sum_{x\in S} \beta_x(t+1) \times B(o_{t+1}, x) \times A(x,s)
\end{split}
Let's derive the recursion above:
\begin{split}
\beta_s(t) &= \frac{P(o_{t+1},\ldots, o_T , s_t = s|\theta)}{P(s_t=s|\theta)}\\
&= \frac{\sum_{x\in S} P(o_{t+1},\ldots, o_T , s_t = s, s_{t+1}=x|\theta) }{P(s_t=s|\theta)}\\
&= \frac{\sum_{x\in S} P(o_{t+1},\ldots, o_T | s_t = s, s_{t+1}=x;\theta)\times P(s_t = s, s_{t+1}=x|\theta) }{P(s_t=s|\theta)}\\
&= \sum_{x\in S} P(o_{t+1},\ldots, o_T |s_{t+1}=x;\theta)\times P(s_{t+1}=x|s_t = s;\theta) \\
&= \sum_{x\in S} P(o_{t+2},\ldots, o_T |s_{t+1}=x;\theta)\times P(o_{t+1}|s_{t+1}=x;\theta)\times  P(s_{t+1}=x|s_t = s;\theta)\\
&= \sum_{x\in S} \beta_x(t+1)\times B(o_{t+1}, x)\times A(x,s)
\end{split}

4. Temporary variables, let
\begin{split}
\gamma_s(t) &= P(s_t=s|o_1,\ldots,o_T;\theta) \\
            &= \frac{P(s_t=s, o_1,\ldots,o_T|\theta)}{P(o_1,\ldots,o_T|\theta)}\\
            &= \frac{P(s_t=s, o_1,\ldots,o_t|\theta)P(o_{t+1},\ldots,o_T|s_t=s, o_1,\ldots,o_t;\theta)}{\sum_{s'\in S}P(s_t=s', o_1,\ldots,o_T|\theta)}\\
            &= \frac{\alpha_s(t) \beta_s(t)}{\sum_{s'\in S}\alpha_{s'}(t)\beta_{s'}(t)}
\end{split}
Similarly, we compute
$$
\xi_{x,s}(t)=P(s_t=s,s_{t+1}=x|o_1,\ldots,o_T;\theta) = \frac{\alpha_s(t)\cdot A(x,s)\cdot \beta_x(t+1)\cdot  B(o_{t+1},x)}{\sum_{s'\in S}\sum_{x' \in S} \alpha_{s'}(t)\cdot A(x',s')\cdot \beta_{x'}(t+1)\cdot  B(o_{t+1},x')} 
$$

5. Parameter update: $\theta$ can now be re-estimated as
\begin{split}
    \pi_s  &= \mathbb{E}\left[\gamma_s(1)\right]\\
    A(x,s) &= \frac{\mathbb{E}\left[\sum_{t=1}^{T-1}\xi_{x,s}(t)\right]}{\mathbb{E}\left[\sum_{t=1}^{T-1}\gamma_{s}(t)\right]}\\
    B(o,s) &= \frac{\mathbb{E}\left[\sum_{t=1}^{T}1_{o_t=o}\gamma_s(t)\right]}{\mathbb{E}\left[\sum_{t=1}^{T}\gamma_s(t)\right]}
\end{split}
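where the expectation $\mathbb{E}$ is the average over all observation sequences. Steps 2 to 5 are repeated with the updated $\theta$ for a fixed number of iterations or until the parameters stop changing.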

Now, let's implement it 

In [73]:
def init_theta():
    tP0 = np.array([0.85, 0.15])
    tA = np.array([[0.3, 0.1],
                   [0.7, 0.9]])                   
    tB = np.array([[0.4, 0.5],
                   [0.6, 0.5]])
    return tP0, tA, tB

def forward_path(obs, P0, A, B, normalized=False):
    S = len(P0)
    T = len(obs)
    alpha = np.zeros((T,S))
    
    # compute alpha_s(1) (note that index start at 0 so time t is equivalent to index t-1)
    alpha[0, :] = np.multiply(P0, B[obs[0], :])    
    if normalized:
        alpha[0, :] /= np.sum(alpha[0, :])
    
    for t in range(1, T):
        alpha[t, :] = np.multiply(B[obs[t], :], A.dot(alpha[t-1,:]))
        if normalized:
            alpha[t, :] /= np.sum(alpha[t, :])
    
    return alpha

def backward_path(obs, A, B, normalized=False):
    S = B.shape[1]
    T = len(obs)
    beta = np.ones((T, S))
    for t in range(T-2,-1,-1):
        beta[t, :] = np.dot(beta[t+1,:], A * B[obs[t+1], :, None])
        if normalized:
            beta[t, :] /= np.sum(beta[t,:])
    return beta

def compute_temp_var(obs, alpha, beta, A, B):
    T, S = alpha.shape
    gamma = np.zeros((T,S))
    xi = np.zeros((T-1,S,S))
    
    alphabeta = alpha*beta
    prob_obs  = np.sum(alphabeta, axis=1, keepdims=True)
    gamma     = alphabeta / prob_obs
    
    for t in range(T-1):
        AB = A * B[obs[t+1], :, None]
        xi[t] = alpha[t] * beta[t+1,:,None] * AB 
        xi[t] /= prob_obs[t]    
    
    return gamma, xi

def compute_gamma_xi(obs, P0, A, B):
    alpha = forward_path(obs, P0, A, B)
    beta = backward_path(obs, A, B)
    return compute_temp_var(obs, alpha, beta, A, B)

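def update_theta(obs, gamma, xi, tP0, tA, tB):
    # accumulate (in place) the expected counts of one observation sequence;
    # the caller normalizes tP0, tA and tB after all sequences are added
    tP0 += gamma[0, :]           # expected initial-state counts
    tA  += np.sum(xi, axis=0)    # expected transition counts, summed over t
    for t,o in enumerate(obs):
        tB[o] += gamma[t]        # expected counts of emitting o from each state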
        
def baum_welch(obs, init_theta, n_iter):
    P0, A, B = init_theta()
    for it in range(n_iter):
        nP0 = np.zeros_like(P0)
        nA  = np.zeros_like(A)
        nB  = np.zeros_like(B)
        cache = {}
        for o in obs:
            o = tuple(o) #so it can be used as key for dictionary
            if o in cache:
                gamma, xi = cache[o]
            else:
                gamma, xi = compute_gamma_xi(o, P0, A, B)
                cache[o] = (gamma, xi)
                
            update_theta(o, gamma, xi, nP0, nA, nB)
        
        # update theta        
        P0 = nP0 / np.sum(nP0)    
        A  = nA  / np.sum(nA, axis=0, keepdims=True)
        B  = nB  / np.sum(nB, axis=0, keepdims=True)
    return P0, A, B
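
For long sequences the unscaled $\alpha_s(t)$ shrink geometrically and eventually underflow to zero. A common remedy is to rescale $\alpha$ at every step and accumulate the logarithms of the scaling factors, which gives the log-likelihood directly. The following is a minimal sketch of that idea, assuming the same column-stochastic `A` and `B` layout as above; `forward_loglik` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def forward_loglik(obs, P0, A, B):
    """Scaled forward pass: returns log P(o_1..o_T) for index-coded obs."""
    alpha = P0 * B[obs[0]]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = B[obs[t]] * A.dot(alpha)
        c = alpha.sum()        # scaling factor c_t = P(o_t | o_1..o_{t-1})
        alpha = alpha / c      # alpha now holds P(s_t | o_1..o_t)
        loglik += np.log(c)    # log P(o_1..o_T) is the sum of the log c_t
    return loglik

P0 = np.array([0.85, 0.15])
A = np.array([[0.3, 0.1], [0.7, 0.9]])
B = np.array([[0.4, 0.5], [0.6, 0.5]])

short_ll = forward_loglik([0, 1, 1, 0], P0, A, B)
long_ll = forward_loglik([0, 1] * 2000, P0, A, B)  # 4000 steps, still finite
print(short_ll, long_ll)
```

The sequence of 4000 observations is only there to show that the scaled recursion stays finite where the raw product of probabilities would not be representable in double precision.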

In [54]:
tP0, tA, tB = init_theta()
obs = [0,1,1,0]
#obs = [1,0,1]
alpha = forward_path(obs, tP0, tA, tB)
beta = backward_path(obs, tA, tB)
gamma, xi = compute_temp_var(obs, alpha, beta, tA, tB)
print(alpha)
print(beta)
alphabeta = alpha*beta
obs_prob = np.sum(alphabeta, axis=1, keepdims=True)
print(gamma)
print(xi)

[[ 0.34        0.075     ]
 [ 0.0657      0.15275   ]
 [ 0.020991    0.0917325 ]
 [ 0.00618822  0.04862648]]
[[ 0.133143  0.127281]
 [ 0.2561    0.2487  ]
 [ 0.47      0.49    ]
 [ 1.        1.      ]]
[[ 0.82584825  0.17415175]
 [ 0.30695729  0.69304271]
 [ 0.17998404  0.82001596]
 [ 0.11289345  0.88710655]]
[[[ 0.28593281  0.02102447]
  [ 0.53991544  0.15312728]]

 [[ 0.10140018  0.07858385]
  [ 0.2055571   0.61445886]]

 [[ 0.04595337  0.06694008]
  [ 0.13403066  0.75307589]]]
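
A useful consistency check on the forward and backward passes is that $\sum_{s}\alpha_s(t)\beta_s(t)$ equals $P(o_1,\ldots,o_T|\theta)$ for every $t$. The sketch below recomputes both passes for the same `obs` and initial parameters as the cell above and compares them with a brute-force sum over all hidden paths; it is self-contained, and the variable names `per_step` and `total` are introduced here for illustration.

```python
import itertools
import numpy as np

P0 = np.array([0.85, 0.15])
A = np.array([[0.3, 0.1], [0.7, 0.9]])   # A[x, s] = P(s_{t+1}=x | s_t=s)
B = np.array([[0.4, 0.5], [0.6, 0.5]])   # B[o, s] = P(o_t=o | s_t=s)
obs = [0, 1, 1, 0]
T, S = len(obs), len(P0)

# forward pass
alpha = np.zeros((T, S))
alpha[0] = P0 * B[obs[0]]
for t in range(1, T):
    alpha[t] = B[obs[t]] * A.dot(alpha[t - 1])

# backward pass
beta = np.ones((T, S))
for t in range(T - 2, -1, -1):
    # sum over next state x of beta_x(t+1) * B(o_{t+1}, x) * A(x, s)
    beta[t] = (beta[t + 1] * B[obs[t + 1]]).dot(A)

# brute force: sum the joint probability over every hidden path
total = 0.0
for path in itertools.product(range(S), repeat=T):
    p = P0[path[0]] * B[obs[0], path[0]]
    for t in range(1, T):
        p *= A[path[t], path[t - 1]] * B[obs[t], path[t]]
    total += p

per_step = (alpha * beta).sum(axis=1)    # one value per time step
print(per_step, total)
```

If the entries of `per_step` differ from each other or from `total`, one of the two recursions is indexing `A` or `B` with the wrong orientation.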


In [66]:
obs1 = [0,1,1,0]
obs2 = [1, 0, 1]
gamma1, xi1 = compute_gamma_xi(obs1, tP0, tA, tB)
gamma2, xi2 = compute_gamma_xi(obs2, tP0, tA, tB)

In [67]:
nP0 = np.zeros_like(tP0)
nA  = np.zeros_like(tA)
nB  = np.zeros_like(tB)

for i in range(10):
    update_theta(obs1, gamma1, xi1, nP0, nA, nB)

for i in range(20):
    update_theta(obs2, gamma2, xi2, nP0, nA, nB)

nP0/= np.sum(nP0)    
nA /= np.sum(nA, axis=0, keepdims=True)
nB /= np.sum(nB, axis=0, keepdims=True)
print(nP0)
print(nA)
print(nB)

[ 0.85384446  0.14615554]
[[ 0.29820319  0.10593123]
 [ 0.70179681  0.89406877]]
[[ 0.35594186  0.42914219]
 [ 0.64405814  0.57085781]]


In [76]:
obs = [tuple(obs1)]*10 + [tuple(obs2)]*20
nP0, nA, nB = baum_welch(obs, init_theta, n_iter=10)
print(nP0)
print(nA)
print(nB)

[ 0.85782455  0.14217545]
[[ 0.24879284  0.1280343 ]
 [ 0.75120716  0.8719657 ]]
[[ 0.35691794  0.42734717]
 [ 0.64308206  0.57265283]]
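
Baum-Welch is an instance of expectation-maximization, so the total log-likelihood of the training sequences should never decrease from one iteration to the next. The sketch below tracks that quantity on the same toy data and starting point as above. It is a compact re-implementation written for this check, assuming short sequences so that no scaling is needed; `e_step` and `em_iteration` are hypothetical names.

```python
import numpy as np

def e_step(obs, P0, A, B):
    """Return gamma, xi and the log-likelihood for one index-coded sequence."""
    T, S = len(obs), len(P0)
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    alpha[0] = P0 * B[obs[0]]
    for t in range(1, T):
        alpha[t] = B[obs[t]] * A.dot(alpha[t - 1])
    for t in range(T - 2, -1, -1):
        beta[t] = (beta[t + 1] * B[obs[t + 1]]).dot(A)
    lik = alpha[-1].sum()                      # P(o_1..o_T | theta)
    gamma = alpha * beta / lik
    # xi[t][x, s] = alpha_s(t) * A[x, s] * B[o_{t+1}, x] * beta_x(t+1) / lik
    xi = np.array([alpha[t] * (beta[t + 1] * B[obs[t + 1]])[:, None] * A
                   for t in range(T - 1)]) / lik
    return gamma, xi, np.log(lik)

def em_iteration(seqs, P0, A, B):
    """One Baum-Welch update; also returns the log-likelihood before the update."""
    nP0, nA, nB = np.zeros_like(P0), np.zeros_like(A), np.zeros_like(B)
    total_ll = 0.0
    for obs in seqs:
        gamma, xi, ll = e_step(obs, P0, A, B)
        total_ll += ll
        nP0 += gamma[0]
        nA += xi.sum(axis=0)
        for t, o in enumerate(obs):
            nB[o] += gamma[t]
    return nP0 / nP0.sum(), nA / nA.sum(axis=0), nB / nB.sum(axis=0), total_ll

P0 = np.array([0.85, 0.15])
A = np.array([[0.3, 0.1], [0.7, 0.9]])
B = np.array([[0.4, 0.5], [0.6, 0.5]])
seqs = [[0, 1, 1, 0]] * 10 + [[1, 0, 1]] * 20

history = []
for _ in range(10):
    P0, A, B, ll = em_iteration(seqs, P0, A, B)
    history.append(ll)   # log-likelihood under the parameters before this update
print(np.round(history, 4))
```

A log-likelihood that drops between iterations usually points to a bug in the E-step or in the normalization of the accumulated counts.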
