# An Intuitive Guide to Hidden Markov Model (HMM)


This is an intuitive tutorial on the Hidden Markov Model (HMM). It aims to give readers a basic understanding of HMMs without burying beginners in mathematical formulas. The focus is on what an HMM is and how it works, so the proofs behind the algorithms and the applications are covered only briefly.

The tutorial is organized as below:

1. **Why it matters**,

2. **What the HMM is**,

3. **The three basic problems** for HMMs,

4. Brief **examples**.

Have fun reading!



## 1. Introduction

Scenarios like the following are common in daily life:

> *A foreigner is trying to say something in a language we don’t understand. We do hear the changing sounds and tones, but we cannot figure out the **hidden meaning** of each sound piece because we don’t know the language at all.*


> *The stock price for a certain company is going up and down for some reasons. We only observe that the price is changing from day to day, but typically we don’t know the **hidden market forces** that caused all these changes.*

In both of the above cases, there are some **hidden factors** (*meaning of sounds*, *market forces*) that lead to the events we observed (*sounds of speaking*, *price movement*). However, we don’t know **what the factors are** and **how they are affecting the observable results**. 

We would really like to know, though, because the hidden factors **are of great interest** (*we want to understand the speech*) and **are crucial for prediction** (*we want to guess tomorrow's stock price*).

The **Hidden Markov Model (HMM)** is a model designed for this kind of problem. It is widely used in data science and machine learning, in areas such as price prediction, time-series analysis, and speech, image, and gesture recognition. 


## 2. What is a Hidden Markov Model

### 2.1. An Intuitive Example

To begin with, let's consider a very simple situation where we are playing with three different dice. Let's name them `D6`, `D4`, and `D8`.

<img src="images/dice_detail.png" width=500>

> As you can see, `D6` has six possible results `[1,2,3,4,5,6]`, each with probability `1/6`. `D4` and `D8` are defined similarly.

We play sequentially: we pick up one of the dice and roll it to get a number. Then we put it back and repeat the process.

We pick up the first die randomly (`Pr=1/3` for each). **After that, there is a special "transition setting"**:

<img src="images/transition.png">

> If we are holding `D6` right now, next time we are most likely to pick up `D4` ($P(D6\rightarrow D4)=0.7$), and less likely to pick up `D6` ($P(D6\rightarrow D6)=0.2$) or `D8` ($P(D6\rightarrow D8)=0.1$). Similarly, the probability of picking up a given die next depends on which die (`D6`|`D4`|`D8`) we are holding right now.

In this manner, we pick up and roll the dice one after another. For example, we may pick up the dice sequence `D4,D6,D4,D6,D6,D8` and get the number sequence, say, `4,1,2,5,3,8`:

<img src="images/possible_result.png">

Now, what if we write down the resulting numbers and show **ONLY** those numbers to one of our friends? 

To give him a sense of the whole picture, we can tell him about every part of our process:
1. there are **`three dice`**,
2. there are **`eight unique numbers`** generated by the dice,
3. there is a **probabilistic `“transition setting”`** between current die and the next pick up,
4. each **die has a `probability distribution`** to generate certain result values, and
5. we **`initialize the process`** by randomly choosing a die.

For our dear friend, only the **result sequence is observable**, although he knows that it is generated by a **dice sequence that is hidden** from him. 


<img src="images/possible_result_2.png">


> Can he make a good guess at **what the dice sequence is**?

**What we have done can already be modeled by an HMM.** In other words, if our friend has read this tutorial before, he can make a pretty good guess about the hidden dice sequence, and even predict the next result number.
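
The game described in this section can also be simulated, which may help in seeing how the hidden dice sequence and the visible number sequence are produced together. The sketch below is illustrative only: `FACES`, `TRANSITION` and `play` are names invented here, and the `D4` and `D8` rows of the transition setting are taken from the transition matrix $A$ in section 2.2.1.

```python
import random

# Number of faces per die; every face of a die is equally likely.
FACES = {"D6": 6, "D4": 4, "D8": 8}
DICE = ["D6", "D4", "D8"]

# TRANSITION[current][k] is the probability of picking DICE[k] next.
TRANSITION = {
    "D6": [0.2, 0.7, 0.1],
    "D4": [0.5, 0.3, 0.2],
    "D8": [0.3, 0.1, 0.6],
}

def play(n_rolls, seed=None):
    """Return (dice_sequence, number_sequence) for n_rolls picks and rolls."""
    rng = random.Random(seed)
    dice, numbers = [], []
    current = rng.choice(DICE)  # first die: probability 1/3 each
    for _ in range(n_rolls):
        dice.append(current)
        numbers.append(rng.randint(1, FACES[current]))  # roll the die
        # pick the next die according to the transition setting
        current = rng.choices(DICE, weights=TRANSITION[current])[0]
    return dice, numbers

dice, numbers = play(6, seed=0)
print("hidden dice:     ", dice)
print("observed numbers:", numbers)
```

Only `numbers` would be shown to our friend; `dice` stays hidden.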

### 2.2. Definition of Hidden Markov Model

#### 2.2.1. Five Elements of a Hidden Markov Model

Corresponding to the five pieces of information we gave to our friend (shown in parentheses in the definitions below), a Hidden Markov Model is characterized by the following **five elements**:

1. **The number of hidden `states` in the model.** $N$. (`three dice`). The hidden states are something we cannot observe. They give rise to the observable events, and the model can transition from one state to another.<br><br>
2. **The number of distinct `observation symbols` in the model.** $M$. (`eight unique numbers`). The observation symbols correspond to the observed output of the system being modeled. Unlike the hidden states, they can be observed by us.<br><br>
3. **The `state transition` probability distribution.** $A$. (`transition setting`). This describes the probability of moving from one state (the current one) to another state (the next one). It can be written as the `transition matrix` $A$. For example, we can write the "transition setting" in the previous example as:<br><br>$$A=\left\lgroup\matrix{~&D6&D4&D8\cr D6&0.2&0.7&0.1\cr D4&0.5&0.3&0.2\cr D8&0.3&0.1&0.6}\right\rgroup$$<br>where $A_{ij}$ is the probability of picking up die $j$ next when currently holding die $i$.<br><br>
4. **The `observation symbol` probability distribution in each state.** $B$. (`die's probability distribution`). For each state (die), there is a probability distribution over which observation symbol is generated. This can also be written as a matrix, the `emission matrix` $B$. For the previous example:<br><br>$$B=\left\lgroup\matrix{~&1&2&3&4&5&6&7&8\cr D6&1/6&1/6&1/6&1/6&1/6&1/6&0&0\cr D4&1/4&1/4&1/4&1/4&0&0&0&0\cr D8&1/8&1/8&1/8&1/8&1/8&1/8&1/8&1/8}\right\rgroup$$<br>where $B_{ij}$ is the probability of rolling a $j$ with die $i$.<br><br>
5. **The `initial state` distribution** $\pi$. (`initialize the process`). In the previous example, our initial state distribution is $\pi=[\frac{1}{3},\frac{1}{3},\frac{1}{3}]$, as we randomly pick one of the three dice.
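
As a quick sanity check, the five elements of the dice example can be written down as plain Python data. This is an illustrative sketch only: the helper `is_distribution` is a hypothetical name, and exact fractions are used so that the row sums can be compared with 1 without rounding issues.

```python
from fractions import Fraction as F

states = ["D6", "D4", "D8"]           # hidden states
symbols = [1, 2, 3, 4, 5, 6, 7, 8]    # observation symbols
N, M = len(states), len(symbols)      # N = 3, M = 8

# A[i][j]: probability of picking die j next while holding die i
A = [[F(2, 10), F(7, 10), F(1, 10)],
     [F(5, 10), F(3, 10), F(2, 10)],
     [F(3, 10), F(1, 10), F(6, 10)]]

# B[i][k]: probability that die i shows symbols[k]; a fair die with
# `faces` sides shows 1..faces with equal probability and nothing larger
B = [[F(1, faces) if s <= faces else F(0) for s in symbols]
     for faces in (6, 4, 8)]

# pi[i]: probability that die i is the first one picked
pi = [F(1, N)] * N

def is_distribution(row):
    """True if the entries are non-negative and sum to exactly 1."""
    return all(p >= 0 for p in row) and sum(row) == 1

print(all(is_distribution(row) for row in A))  # every row of A
print(all(is_distribution(row) for row in B))  # every row of B
print(is_distribution(pi))
```

Every row of $A$ and $B$, and $\pi$ itself, must be a probability distribution; a row that does not sum to 1 usually signals a typo in the model.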

Just as our friend cannot see the hidden dice, the hidden states are always unobservable. That's why this is a **HIDDEN** Markov Model.

#### 2.2.2. Markov Process

Then why hidden **MARKOV**? The answer is in the state transition assumption. 

An essential feature of an HMM is that the **hidden states** follow a **`Markov process`**. Conceptually, this means that the distribution of the next state depends only on the current state, not on the states before it. **In other words, the future state is conditionally independent of the previous states given the current state.**


<img src="images/markov_process.png">

> In our dice example, according to our setting, the probability of each die being picked in the next round **depends only on which die we are holding right now**. This is exactly an example of a `Markov process`.

Ref: 
[CMU-stat-lecture-note](http://www.stat.cmu.edu/~cshalizi/754/notes/lecture-09.pdf), [wikipedia](https://en.wikipedia.org/wiki/Markov_chain)
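
To make the Markov property concrete: because each pick depends only on the die currently held, the probability of a whole dice sequence is simply the initial probability multiplied by one transition probability per step. A minimal sketch, where the function name `sequence_probability` is an assumption made for illustration:

```python
from fractions import Fraction as F

# Initial distribution and transition setting of the dice example.
pi = {"D6": F(1, 3), "D4": F(1, 3), "D8": F(1, 3)}
A = {
    "D6": {"D6": F(2, 10), "D4": F(7, 10), "D8": F(1, 10)},
    "D4": {"D6": F(5, 10), "D4": F(3, 10), "D8": F(2, 10)},
    "D8": {"D6": F(3, 10), "D4": F(1, 10), "D8": F(6, 10)},
}

def sequence_probability(dice):
    """Probability of picking exactly this sequence of dice."""
    prob = pi[dice[0]]                      # first pick
    for current, nxt in zip(dice, dice[1:]):
        prob *= A[current][nxt]             # one factor per transition
    return prob

# 1/3 * Pr(D4 -> D6) * Pr(D6 -> D4) = 1/3 * 1/2 * 7/10
print(sequence_probability(["D4", "D6", "D4"]))  # 7/60
```

Note that no factor ever looks further back than one step, which is exactly what the Markov assumption allows.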

### 2.3. A Practical Definition

Given the conceptual definition above, we can define an HMM in Python as follows:

In [1]:
import numpy as np

class HMM:
    def __init__(self, Ann, Bnm, pi1n):
        self.A = np.array(Ann) # A, transition matrix (N * N)
        self.B = np.array(Bnm) # B, emission matrix (N * M)
        self.pi = np.array(pi1n) # pi, initial state (1 * N)
        self.N = self.A.shape[0] # N, number of hidden states
        self.M = self.B.shape[1] # M, number of observation symbols

        
    # print out the HMM model.
    def printhmm(self):
        print("========================================")
        print("HMM content: ")
        print("N = ", self.N)
        print("M = ", self.M)
        print("hmm.A = ", self.A)
        print("hmm.B = ", self.B)
        print("hmm.pi = ", self.pi)
        print("========================================")


## 3. Three Basic Problems for HMMs

Then what can an HMM do? 

> **Generally speaking, you can build an HMM from your observations alone, and make predictions based on this model.**

To achieve this, there are **three problems** of interest that must be solved for an HMM to be useful. 

1. **Evaluation**: Given the model ($A,B,\pi$), what is the probability of getting a given observation sequence $O=o_1\dots o_t$?
2. **Decoding**: Given the model ($A,B,\pi$) and the observation sequence $O$, what is the most likely state sequence $P=p_1\dots p_t$?
3. **Learning**: Given the observation sequence $O$, the number of states $N$, and the number of observation symbols $M$, how do we learn a model?

In the following sections, we continue with the rolling dice example. Assume we roll the dice three times and get the observation sequence $O=[4,1,2]$.


<img src='images/example_observation.png' width = 650/>

### 3.1. Evaluation - Forward algorithm

> Given the model ($A,B,\pi$), what is the probability of getting a given observation sequence $O=o_1\dots o_t$?

The most intuitive solution is to enumerate every possible dice sequence that could generate the given observation sequence, calculate the probability of each, and sum them up. However, this is computationally very expensive: with $N$ dice and $T$ rolls there are $N^T$ possible dice sequences.
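
To see what this exhaustive approach looks like, and why it does not scale, here is a rough sketch for the dice example. The function name `brute_force_probability` is hypothetical, and observations are written as zero-based symbol indices (a roll of `4` becomes `3`), which is the convention this tutorial's code uses to index $B$.

```python
from fractions import Fraction as F
from itertools import product

N = 3  # dice in the order D6, D4, D8
pi = [F(1, 3)] * N
A = [[F(2, 10), F(7, 10), F(1, 10)],
     [F(5, 10), F(3, 10), F(2, 10)],
     [F(3, 10), F(1, 10), F(6, 10)]]
# B[i][k]: probability that die i shows the number k + 1
B = [[F(1, faces) if k < faces else F(0) for k in range(8)]
     for faces in (6, 4, 8)]

def brute_force_probability(O):
    """Sum Pr(dice sequence, O) over all N**T possible dice sequences."""
    total = F(0)
    for path in product(range(N), repeat=len(O)):
        p = pi[path[0]] * B[path[0]][O[0]]
        for t in range(1, len(O)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][O[t]]
        total += p
    return total

O = [3, 0, 1]  # the rolls 4, 1, 2 as zero-based indices
prob = brute_force_probability(O)
print(N ** len(O), "dice sequences enumerated")
print(prob, "=", float(prob))
```

Three rolls need only 27 sequences, but 100 rolls would already need $3^{100}$.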

An alternative is the `forward algorithm`. Below is a simple illustration.

Assume that we roll a die first and get the observation $o_1=4$. We can calculate **the probability of picking that die and getting the observed $o_1$** for each possible die (`D6`|`D4`|`D8`). e.g., for `D6`, 

$$Pr(p_1=D6,o_1=4)=Pr(p_1=D6)*Pr(o_1=4|p_1=D6)=\frac{1}{3}*\frac{1}{6}=\frac{1}{18}$$

In this way, we can generate a table:


<img src="images/forward_1.png">

Each cell represents **the probability of getting the observation sequence so far with a certain die picked at this step.** (A formal definition can be found in the code below.) For convenience, we denote the computed value of each cell as `Val(D*, o*)`.

Similarly, we can pick and roll a die again (and get 1), and we can calculate the second column:

<img src="images/forward_2.png">

Notice that to calculate each cell in $o_2$, we need the value of every cell in $o_1$ (`Val(Di, o1)`).
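
As a worked example (computed here from the $A$, $B$ and $\pi$ given in section 2.2.1 rather than read off the figure), the cell for `D6` in the $o_2$ column combines all three cells of the $o_1$ column:

$$Val(D6,o_2)=\Big[\sum_{i}Val(D_i,o_1)*Pr(p_2=D6|p_1=D_i)\Big]*Pr(o_2=1|p_2=D6)=\Big[\frac{1}{18}*0.2+\frac{1}{12}*0.5+\frac{1}{24}*0.3\Big]*\frac{1}{6}=\frac{47}{720}*\frac{1}{6}=\frac{47}{4320}$$

The bracket sums over every die we might have held in the first step; the final factor is the chance of then rolling a 1 with `D6`.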

Again, calculate the third column:

<img src="images/forward_3.png">

**Notice that to calculate each cell in $o_3$, we need the value of each cell in $o_2$ (`Val(Di, o2)`). However, we DO NOT need the information in $o_1$!**

In this manner (which is in fact `dynamic programming`), we can calculate the total probability at any step cheaply: each new column costs only about $N^2$ multiplications, no matter how long the sequence already is.
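
The observation that only the previous column matters can be turned into a very small sketch that keeps a single column in memory. This is an illustrative pure-Python version that assumes zero-based observation indices (a roll of `4` becomes `3`); the name `forward_probability` is made up here, and it is not meant to replace the fuller implementation that follows.

```python
from fractions import Fraction as F

N = 3  # dice in the order D6, D4, D8
pi = [F(1, 3)] * N
A = [[F(2, 10), F(7, 10), F(1, 10)],
     [F(5, 10), F(3, 10), F(2, 10)],
     [F(3, 10), F(1, 10), F(6, 10)]]
# B[i][k]: probability that die i shows the number k + 1
B = [[F(1, faces) if k < faces else F(0) for k in range(8)]
     for faces in (6, 4, 8)]

def forward_probability(O):
    """Pr(O | model), computed column by column."""
    # first column: pick die i, then roll the first observation with it
    column = [pi[i] * B[i][O[0]] for i in range(N)]
    for o in O[1:]:
        # next column: reach die j from any die i, then roll o with die j
        column = [sum(column[i] * A[i][j] for i in range(N)) * B[j][o]
                  for j in range(N)]
    return sum(column)

print(forward_probability([3]))        # 1/18 + 1/12 + 1/24 = 13/72
print(forward_probability([3, 0, 1]))  # the rolls 4, 1, 2
```

Overwriting `column` in every step is safe precisely because older columns are never needed again.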

Below is a simple implementation:

In [2]:
# An Intuitive Guide to Hidden Markov Model (HMM)
# Forward Algorithm
# @author author
# @params O       the observation sequence
# @return alphas  a sequence of alpha generated when doing the algorithm
# @return prob    the probability of generating this observation sequence
def Forward(self, O):
    # T: length of the observation sequence
    T = len(O) 

    # alphas: an empty table to store middle results (value of each cell in the table)
    alphas = np.zeros((T, self.N), float) # np.float was removed from recent NumPy versions
    # alphas[t, i] can be formally defined as:
    # the joint probability of generating the observation sub-sequence (o1, o2, ..., ot)
    # and being in hidden state i at time t

    # first calculate the initial situation
    for i in range(self.N):
        alphas[0, i] = self.pi[i] * self.B[i, O[0]]

    # then for each step, calculate the next step alpha value based on the current alpha value
    # according to the table shown above, for ot+1, if the hidden state is j,
    # alphas(t+1, j) = sum(i = 1...N){alpha(t, i) * A(i, j) * B(j, ot+1)}
    #               = sum(i = 1...N){alpha(t, i) * A(i, j)} * B(j, ot+1)
    for t in range(T - 1):
        for j in range(self.N):
            total = 0.0 # named total to avoid shadowing the built-in sum()
            for i in range(self.N):
                total += alphas[t, i] * self.A[i, j] # A is the transition matrix
            alphas[t + 1, j] = total * self.B[j, O[t + 1]] # B is the emission matrix

    # the final probability is the sum of the alphas of the current step
    prob = 0.0
    for i in range(self.N):
        prob += alphas[T - 1, i]

    return alphas, prob

# add this function to the customized HMM class
HMM.Forward = Forward

### 3.2. Decoding - Viterbi algorithm

> Given the model ($A,B,\pi$) and the observation sequence $O$, what is the most likely state sequence $P=p_1\dots p_t$?

Again, as the model is given, we could enumerate every possible dice sequence and keep the most probable one, but this is computationally very expensive.
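
For a sequence as short as $O=[4,1,2]$, the exhaustive search is still feasible and gives a reference answer to compare against. The sketch below is illustrative only: `joint_probability` and `brute_force_decode` are hypothetical names, and observations are zero-based symbol indices (a roll of `4` becomes `3`).

```python
from fractions import Fraction as F
from itertools import product

names = ["D6", "D4", "D8"]
N = len(names)
pi = [F(1, 3)] * N
A = [[F(2, 10), F(7, 10), F(1, 10)],
     [F(5, 10), F(3, 10), F(2, 10)],
     [F(3, 10), F(1, 10), F(6, 10)]]
# B[i][k]: probability that die i shows the number k + 1
B = [[F(1, faces) if k < faces else F(0) for k in range(8)]
     for faces in (6, 4, 8)]

def joint_probability(path, O):
    """Pr(dice sequence = path, observations = O)."""
    p = pi[path[0]] * B[path[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][O[t]]
    return p

def brute_force_decode(O):
    """Try all N**T dice sequences and return the most probable one."""
    best = max(product(range(N), repeat=len(O)),
               key=lambda path: joint_probability(path, O))
    return [names[i] for i in best], joint_probability(best, O)

best_path, best_prob = brute_force_decode([3, 0, 1])  # the rolls 4, 1, 2
print(best_path, best_prob)  # ['D4', 'D6', 'D4'] 7/5760
```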

An alternative is the `Viterbi algorithm`. Below is a simple illustration.

Again, let's assume that we roll a die first and get the observation $o_1=4$. For each possible die (`D6`|`D4`|`D8`), the **probability of picking that die and getting $o_1=4$** can be computed in the same way as in the forward algorithm.


<img src="images/viterbi_1.png">

Now let's take a look at each cell in the table. In each column (step), we want to record **the highest probability, over all dice sequences ending in that die, of generating the observations so far**. e.g., the cell (D6,$o_2$) records the highest probability of generating `[4,1]` when the second die in the sequence ($p_2$) is `D6`. i.e.

$$Val(D6,o_2)=\max\{Val(D6,o_1)*Pr(p_2=D6|p_1=D6)*Pr(o_2=1|p_2=D6),\\\quad \quad \quad \quad \quad \quad \quad Val(D4,o_1)*Pr(p_2=D6|p_1=D4)*Pr(o_2=1|p_2=D6),\\\quad \quad \quad \quad \quad \quad \quad Val(D8,o_1)*Pr(p_2=D6|p_1=D8)*Pr(o_2=1|p_2=D6)\}$$
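
Plugging in the numbers of the dice example (computed here from the $A$, $B$ and $\pi$ of section 2.2.1, with $Val(D6,o_1)=\frac{1}{18}$, $Val(D4,o_1)=\frac{1}{12}$ and $Val(D8,o_1)=\frac{1}{24}$) gives:

$$Val(D6,o_2)=\max\Big\{\frac{1}{18}*0.2*\frac{1}{6},\ \frac{1}{12}*0.5*\frac{1}{6},\ \frac{1}{24}*0.3*\frac{1}{6}\Big\}=\max\Big\{\frac{1}{540},\ \frac{1}{144},\ \frac{1}{480}\Big\}=\frac{1}{144}$$

The maximum comes from the `D4` term, so this cell also remembers `D4` as its best predecessor. That record is what later allows the most likely dice sequence to be traced back.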


<img src="images/viterbi_2.png">

The third column is computed similarly.

<img src="images/viterbi_3.png">

It is easy to see again that **to calculate the current cell, we only need the computed values in the previous column!** Step by step, when we reach the final step, we can trace back the sequence of hidden states that has the highest probability of generating the given observation.

Below is a simple implementation:

In [3]:
# An Intuitive Guide to Hidden Markov Model (HMM)
# Viterbi Algorithm
# @author author
# @params O             the observation sequence
# @return hiddenStates  the predicted hidden states sequence
# @return prob          the probability that the hidden states sequence generate this observation sequence
def viterbi(self, O):
    # T: length of the observation sequence
    T = len(O)
    # deltas: an empty table to store middle results (value of each cell in the table)
    deltas = np.zeros((T, self.N), float)
    # phi: an empty table to store which previous state is chosen at each cell
    phi = np.zeros((T, self.N), int)
    # hiddenStates: the hidden state sequence that most likely generates the observation
    hiddenStates = np.zeros(T, int)
    
    # first calculate the initial situation
    for i in range(self.N):
        deltas[0, i] = self.pi[i] * self.B[i, O[0]]
        phi[0, i] = 0

    # then for each step, calculate the current step delta value based on the previous delta value
    # according to the table shown above, for ot, if the hidden state is j,
    # delta(t, j) = max(i = 1...N){delta(t-1, i) * A(i, j) * B(j, ot)}
    #             = max(i = 1...N){delta(t-1, i) * A(i, j)} * B(j, ot)
    for t in range(1, T):
        for i in range(self.N):
            # calculate each product first to make it easier to choose max
            tmp = np.array([deltas[t - 1, j] * self.A[j, i] for j in range(self.N)])
            deltas[t, i] = self.B[i, O[t]] * tmp.max() # calculate new delta
            phi[t, i] = tmp.argmax() # record which state is chosen to reach the max delta

    # the max probability in the last column is the final probability
    prob = deltas[T - 1, :].max()
    # restore the sequence of hidden states
    hiddenStates[T - 1] = deltas[T - 1, :].argmax()
    for t in range(T - 2, -1, -1):
        # notice that each time we choose a previous state to reach the highest prob for current cell
        # so the t-th hiddenstates is actually chosen in the (t+1)-th step
        hiddenStates[t] = phi[t + 1, int(hiddenStates[t + 1])]
    return hiddenStates, prob

# add this function to the customized HMM class
HMM.viterbi = viterbi

### 3.3. Learning - Baum-Welch Algorithm

> Given the observation sequence $O$, the number of states $N$, and the number of observation symbols $M$, how do we learn a model?

Most of the time, we don't know the model in advance. All we have is the observation sequence(s). Thus, learning the model is an extremely important task.

The Baum-Welch algorithm is an instance of the EM (expectation-maximization) algorithm. **In general, the algorithm initializes the model ($A,B,\pi$) randomly and uses this model to calculate how likely each hidden state and each transition is at every step, given the observation sequence (`E-step` in the following code). Then the algorithm updates the model based on the calculation result (`M-step` in the following code).**

As this algorithm is much more difficult to explain in detail, here we simply take a look at straightforward example code. Hopefully we can get a sense of what the algorithm is trying to achieve.
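
For orientation, the quantities that the code below computes can be summarized as follows. These are the standard textbook re-estimation formulas (see the Rabiner tutorial listed under Further Resources), written for a single observation sequence of length $T$; the notation $\alpha_t(i)$, $\beta_t(i)$, $\gamma_t(i)$, $\xi_t(i,j)$ is an assumption chosen to match the variable names `alpha`, `beta`, `gamma` and `xi`.

$$\alpha_t(i)=Pr(o_1\dots o_t,\ p_t=i),\qquad \beta_t(i)=Pr(o_{t+1}\dots o_T\mid p_t=i)$$

$$\gamma_t(i)=\frac{\alpha_t(i)\beta_t(i)}{\sum_{j}\alpha_t(j)\beta_t(j)},\qquad \xi_t(i,j)=\frac{\alpha_t(i)A_{ij}B_{j,o_{t+1}}\beta_{t+1}(j)}{\sum_{i'}\sum_{j'}\alpha_t(i')A_{i'j'}B_{j',o_{t+1}}\beta_{t+1}(j')}$$

$$\pi_i\leftarrow\gamma_1(i),\qquad A_{ij}\leftarrow\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)},\qquad B_{ik}\leftarrow\frac{\sum_{t:\,o_t=k}\gamma_t(i)}{\sum_{t=1}^{T}\gamma_t(i)}$$

Here $\gamma_t(i)$ is the probability of being in state $i$ at step $t$ given the whole observation sequence, and $\xi_t(i,j)$ is the probability of moving from state $i$ at step $t$ to state $j$ at step $t+1$. The first two lines are the E-step and the last line is the M-step. The code additionally mixes each update with a small uniform term (the `0.001` and `0.999` factors) so that no probability becomes exactly zero, and it accumulates the sums over all training sequences.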

In [4]:
# An Intuitive Guide to Hidden Markov Model (HMM)
# Backward Algorithm
# Similar to forward algorithm. we need the betas to do the Baum-Welch Algorithm.
# @author author
# @params O      the observation sequence
# @return betas  middle variables storing information used for model update in Baum-Welch Algorithm
def Backward(self, O):
    # T: length of the observation sequence
    T = len(O)
    # betas: an empty table to store middle calculation results
    betas = np.zeros((T, self.N), float)
    # betas[t, i]: the probability of generating the remaining observations (ot+1, ..., oT)
    # given that the hidden state at time t is i

    # first calculate the initial situation
    for i in range(self.N):
        betas[T - 1, i] = 1.0

    # then for each step, calculate the current step beta value based on the next beta value
    for t in range(T - 2, -1, -1):
        for i in range(self.N):
            sum = 0.0
            for j in range(self.N):
                sum += self.A[i, j] * self.B[j, O[t + 1]] * betas[t + 1, j]
            betas[t, i] = sum # update beta, so the next iteration can use previous results
    return betas

HMM.Backward = Backward

In [5]:
# An Intuitive Guide to Hidden Markov Model (HMM)
# Compute Xi - an important middle calculation result to do the Baum-Welch Algorithm.
# @author author
# @params O      the observation sequence
# @params alpha  alpha in forward algorithm
# @params beta   beta in backward algorithm
# @params gamma  gamma generated by ComputeGamma
# @return xi     middle variables used for model update in Baum-Welch Algorithm
def ComputeXi(self, O, alpha, beta, gamma):
    # T: length of the observation sequence
    T = len(O)
    # xi: an empty table to store middle calculation results
    xi = np.zeros((T, self.N, self.N))

    # calculate xi based on current observation and current model
    # this implementation is based on lecture slides of 04831230-Theory of Automatic Control, Peking University
    for t in range(T - 1):
        sum = 0.0
        for i in range(self.N):
            for j in range(self.N):
                xi[t, i, j] = alpha[t, i] * beta[t + 1, j] * self.A[i, j] * self.B[j, O[t + 1]]
                sum += xi[t, i, j]
        for i in range(self.N):
            for j in range(self.N):
                xi[t, i, j] /= sum # normalized
    return xi

HMM.ComputeXi = ComputeXi

In [6]:
# An Intuitive Guide to Hidden Markov Model (HMM)
# Compute Gamma - another important middle calculation result to do the Baum-Welch Algorithm.
# @author author
# @params T      the length of observation sequence
# @params alpha  alpha in forward algorithm
# @params beta   beta in backward algorithm
# @return gamma  middle variables used for model update in Baum-Welch Algorithm
def ComputeGamma(self, T, alpha, beta):

    gamma = np.zeros((T, self.N), float)

    # calculate gamma based on current observation and current model
    # this implementation is based on lecture slides of 04831230-Theory of Automatic Control, Peking University
    for t in range(T):
        denominator = 0.0
        for j in range(self.N):
            gamma[t, j] = alpha[t, j] * beta[t, j]
            denominator += gamma[t, j]
        for i in range(self.N):
            gamma[t, i] = gamma[t, i] / denominator
    
    return gamma

HMM.ComputeGamma = ComputeGamma

In [7]:
# An Intuitive Guide to Hidden Markov Model (HMM)
# Baum-Welch Algorithm
# this implementation is based on lecture slides of 04831230-Theory of Automatic Control, Peking University
# @author author
# @params O      the observation sequences (list of list). multiple observations as training data
def BaumWelch(self, O):

    # initial params
    L = len(O) # number of training data (observations)
    T = len(O[0]) # length of observation
    stopThreshold = 0.01 # threshold to stop
    iterCount = 0 # number of iterations
    isInit = 1 # check if it is the first time
    prob = 0.0 # current model's highest probability in the current round of EM algorithm
    prevProb = 0.0 # last model's highest probability
    delta = 0.0 # difference between current probability and previous probability
    prevDelta = 10e-20 # previous delta
    ratio = 0.0 # delta / prevDelta
    # middle variables to store calculation results
    alpha = np.zeros((T, self.N), float)
    beta = np.zeros((T, self.N), float)
    gamma = np.zeros((T, self.N), float)
    xi = np.zeros((T, self.N, self.N))
    pi = np.zeros((self.N), float) # one accumulator per hidden state (N entries, not T)
    # sub middle variables to store calculation results based on above variables
    denominatorA = np.zeros((self.N), float)
    denominatorB = np.zeros((self.N), float)
    numeratorA = np.zeros((self.N, self.N), float)
    numeratorB = np.zeros((self.N, self.M), float)
    
    # begin iteration
    # Generally speaking, the E-step calculates all the needed variables 
    # by using the current model to evaluate the current observation.
    # Then in the M-step, we update the model parameters (A, B, pi) based on
    # our calculation results from the E-step.
    # The details of all formulas and why this works can be found in the reference resources
    while True:
        iterCount += 1 # count the number of iterations
        
        # E-step: calculate all the middle variables based on current observation sequence and current model
        for l in range(L):
            # first, calculate different middle variables defined above (alpha, beta, gamma, xi)
            alpha, curprob = self.Forward(O[l])
            beta = self.Backward(O[l])
            gamma = self.ComputeGamma(T, alpha, beta)
            xi = self.ComputeXi(O[l], alpha, beta, gamma)
            # then, store the current probability, later on this will be used to judge if the iteration should stop
            prob += curprob
            # finally, pre-calculate some middle calculation results to make it more clear in M-step
            for i in range(self.N):
                pi[i] += gamma[0, i]
                for t in range(T - 1):
                    denominatorA[i] += gamma[t, i]
                    denominatorB[i] += gamma[t, i]
                denominatorB[i] += gamma[T - 1, i]

                for j in range(self.N):
                    for t in range(T - 1):
                        numeratorA[i, j] += xi[t, i, j]
                for k in range(self.M):
                    for t in range(T):
                        if O[l][t] == k:
                            numeratorB[i, k] += gamma[t, i]

        # M-step: use all pre-computed middle result in E-step to update model parameter (A, B, pi)
        for i in range(self.N):
            self.pi[i] = 0.001 / self.N + 0.999 * pi[i] / L # update pi
            for j in range(self.N):
                self.A[i, j] = 0.001 / self.N + 0.999 * numeratorA[i, j] / denominatorA[i] # update A
                numeratorA[i, j] = 0.0 # reset the middle variables for the next iteration
            for k in range(self.M):
                self.B[i, k] = 0.001 / self.M + 0.999 * numeratorB[i, k] / denominatorB[i] # update B
                numeratorB[i, k] = 0.0 # reset the middle variables for the next iteration

            pi[i] = denominatorA[i] = denominatorB[i] = 0.0 # reset the middle variables for the next iteration

        # after each training observation has been taken into account, check if we should break the iteration
        if isInit == 1: # first iteration, there is no prevProb, so handle separately
            isInit = 0
            prevProb = prob
            prob = 0.0 # reset the accumulator so the next round does not add to this one
            ratio = 1
            continue
        # check how much the probability has been improved
        delta = prob - prevProb
        # the improve ratio compared to last time
        ratio = delta / prevDelta
        prevProb = prob
        prevDelta = delta
        prob = 0
        
        # if there is not much improvement, stop the EM algorithm
        if ratio <= stopThreshold:
            print("# iteration:", iterCount)
            break
            
HMM.BaumWelch = BaumWelch

## 4. Applications

Given the three algorithms above, we can use an HMM to train on and make predictions for different kinds of datasets. At the beginning of this tutorial, we talked about stock price data and speech sound data. Both can be modeled by an HMM.

For simplicity, let's look back at the rolling dice example. Our friend needs to guess what the hidden dice sequence is. Here is how he can do it:

In [8]:
# Rolling dice example - prediction
# Observation: 4, 1, 2, 5, 3, 8
# True dice sequence (hidden from our friend): D4, D6, D4, D6, D6, D8

diceNames = ['D6', 'D4', 'D8']
diceNumber = ['1', '2', '3', '4', '5', '6', '7', '8']

A = [[0.2, 0.7, 0.1],
     [0.5, 0.3, 0.2],
     [0.3, 0.1, 0.6]]
B = [[0.167, 0.167, 0.167, 0.167, 0.167, 0.167, 0, 0],
     [0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0],
     [0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125]]
Pi = [0.333, 0.333, 0.334]

# observations as zero-based symbol indices (a roll of r is stored as r - 1)
O = [3, 0, 1, 4, 2, 7]

print("Hidden dice sequence:")
print(["D4", "D6", "D4", "D6", "D6", "D8"])

print("Observation:")
print([diceNumber[i] for i in O])

# our friend guesses the hidden dice sequence by using the Viterbi algorithm
hmm1 = HMM(A, B, Pi)
guess, _ = hmm1.viterbi(O)

print("\nOur friend's guess:")
print([diceNames[int(i)] for i in guess])


Hidden dice sequence:
['D4', 'D6', 'D4', 'D6', 'D6', 'D8']
Observation:
['4', '1', '2', '5', '3', '8']

Our friend's guess:
['D4', 'D6', 'D4', 'D6', 'D4', 'D8']


A pretty good guess: five of the six dice are right!

He can also learn the model by himself:

In [9]:
# Rolling dice example - learning

# random guess
randA = [[0.3, 0.3, 0.4],
         [0.3, 0.4, 0.3],
         [0.4, 0.3, 0.3]]
randB = [[0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125],
         [0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125],
         [0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125]]
randPi = [0.333, 0.333, 0.334]

O = [3, 0, 1, 4, 2, 7]

# our friend can learn the model in this way
hmm2 = HMM(randA, randB, randPi)
hmm2.BaumWelch([O])




**More concrete examples can be found here: [wikipedia-example](https://en.wikipedia.org/wiki/Hidden_Markov_model#A_concrete_example), [using-sklearn-for-the-wiki-problem](http://sujitpal.blogspot.com/2013/03/the-wikipedia-bob-alice-hmm-example.html).**

## 5. Further Resources

[A-simple-explanation-of-the-Hidden-Markov-Model](https://www.quora.com/What-is-a-simple-explanation-of-the-Hidden-Markov-Model-algorithm)

[A-tutorial-on-hidden-Markov-models-by-Rabiner](http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf) (recommended)

[What-are-some-good-resources-for-learning-about-Hidden-Markov-Models](https://www.quora.com/What-are-some-good-resources-for-learning-about-Hidden-Markov-Models)