# Hidden Markov Models (HMM):

## Introduction:
Consider a stochastic process $X(t)$ that can assume $N$ different states: $s_1, s_2, ... ,s_N$ with first-order Markov chain dynamics. Let us also suppose that we cannot observe the state of $X(t)$, but we have access to another process $O(t)$, connected to $X(t)$, which produces observable outputs (often known as emissions). The resulting process is called a Hidden Markov Model (HMM).

A first-order hidden Markov model instantiates two simplifying assumptions:
1. **First-Order Markov Chain:** the probability of a particular state depends only on the previous state:
$$P(x_i|x_{1},...,x_{i-1}) = P(x_{i}|x_{i-1})$$
2. **Output Independence:** the probability of an output observation $o_i$ depends only on the state $x_i$ that produced the observation and not on any other states or any other observations.
$$P(o_i|x_1, ..., x_i,...,x_T ,o_1,...,o_{i-1},o_{i+1},...,o_T ) = P(o_i|x_i)$$

## Hidden Markov Model (HMM):
An HMM has no input; its outputs are governed by a given probability distribution. Formally, an HMM is a five-tuple $\lambda = \{S,V,A,B,\Pi\} $

with:

* **State set:** $S= \{s_1,\ldots,s_N\}$
* **Output set:** $V= \{v_1,\ldots,v_M\}$
* **Matrix of transition probabilities:** $ A = (a_{ij}) $, where $ a_{ij} $ is the probability of moving from state $s_i $ to state $s_j $
* **Matrix of emission probabilities:** $ B $, where $b_i(k)$ is the probability of observing $v_k$ in the state $ s_i $
* **Initial state distribution:** $ \Pi $, where $ \pi_i $ is the probability that $ s_i $ is the initial state
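
Combining the two assumptions from the introduction with these parameters gives the joint probability of a hidden state sequence and an observation sequence. The following is a sketch in this notebook's notation, where $t$ is used as the time index, $T$ is the sequence length, and $a_{x_{t-1} x_t}$ and $b_{x_t}(o_t)$ denote the entries of $A$ and $B$ selected by the states and observations at each step:

$$P(O, X|\lambda) = \pi_{x_1} \, b_{x_1}(o_1) \prod_{t=2}^{T} a_{x_{t-1} x_t} \, b_{x_t}(o_t)$$

The likelihood of the observations alone follows by summing over all possible state sequences:

$$P(O|\lambda) = \sum_{X} P(O, X|\lambda)$$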

Hidden Markov models are characterized by three fundamental problems:

* **Problem 1 (Likelihood):** Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O|λ ).
* **Problem 2 (Decoding):** Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence X.
* **Problem 3 (Learning):** Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B.

## Example:
The purpose of this example is to use the observations of how much ice cream Karam ate each day this summer to estimate the temperature on each day. To simplify this weather task, we assume that there are only two kinds of days: cold (C) and hot (H).

The two hidden states ($s_1= H$ and $s_2 = C$) correspond to hot and cold weather, and the observations $O = \{1, 2, 3\}$ correspond to the number of ice creams eaten by Karam on a given day.



### hmmlearn:
`hmmlearn` implements Hidden Markov Models (HMMs). Three of the models available in `hmmlearn` are:

* `hmm.GaussianHMM` Hidden Markov Model with Gaussian emissions.
* `hmm.GMMHMM` Hidden Markov Model with Gaussian mixture emissions.
* `hmm.MultinomialHMM` Hidden Markov Model with multinomial (discrete) emissions.

To install this library in your virtual environment, you can use the following cell:

In [1]:
## Install the library
!pip install hmmlearn



### Import Libraries:
First, we import all the packages required for this exercise.
- [numpy](www.numpy.org) is the main package for scientific computing with Python.
- [matplotlib](http://matplotlib.org) is a library to plot graphs in Python.
- `np.random.seed(1)` fixes the random seed so that results are reproducible across runs.
- `hmmlearn` implements the Hidden Markov Models (HMMs).

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from hmmlearn import hmm

%matplotlib inline
np.random.seed(1)

In [3]:
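class HMM(hmm.MultinomialHMM):
    # Note: this notebook relies on MultinomialHMM modelling one discrete symbol
    # per time step. Newer hmmlearn releases expose that behaviour as
    # hmm.CategoricalHMM, so the base class may need to be swapped there.
    def __init__(self,A,B,pi,**kwargs): # kwargs are forwarded to hmmlearn
        n_components        = A.shape[0] # number of hidden states
        super().__init__(n_components,**kwargs)
        self.transmat_     = A
        self.emissionprob_ = B
        self.startprob_    = pi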
        
    def likelihood(self,obs_seq):
        if len(obs_seq.shape)==1:
            obs_seq = obs_seq.reshape(-1, 1)
        # logprob -> probability
        return np.exp(self.score(obs_seq))
         
    def decoding(self,obs_seq):
        if len(obs_seq.shape)==1:
            obs_seq = obs_seq.reshape(-1, 1)
        # logprob -> probability
        logprob, seq = self.decode(obs_seq)
        return np.exp(logprob), seq
    
    def learning(self,obs_seq):
        if len(obs_seq.shape)==1:
            obs_seq = obs_seq.reshape(-1, 1)
            
        self.fit(obs_seq)
    
    def show_model(self):
        np.set_printoptions(precision=4, suppress=True)
        print('A: Transition probability matrix')
        print(self.transmat_)
        print('------------------------------')
        print('B: Emission probability matrix')
        print(self.emissionprob_)
        print('-------------------------------')
        print('pi: Initital state distribution')
        print(self.startprob_)

<img src="https://i.imgur.com/4HLwgVN.png" style="width:600px;height:200px;">

In [11]:
# Manual check of Prob(O = {1, 2} | pi, A, B): sum over the state paths HH, HC, CH, CC
# of pi[x1] * b[x1](1) * a[x1][x2] * b[x2](2)
prob_1_2 = (0.8*0.2*0.6*0.4 + 0.8*0.2*0.4*0.4
            + 0.2*0.5*0.5*0.4 + 0.2*0.5*0.5*0.4)
print("{:0.4f}".format(prob_1_2))

0.1040


In [4]:
states = ('Hot', 'Cold')
 
observations = ('1','2','3')
 
start_probability = {'Hot': 0.8, 'Cold': 0.2}
 
transition_probability = {
   'Hot' : {'Hot': 0.6, 'Cold': 0.4},
   'Cold': {'Hot': 0.5, 'Cold': 0.5},
   }
 
emission_probability = {
   'Hot' : {'1': 0.2, '2': 0.4, '3': 0.4},
   'Cold': {'1': 0.5, '2': 0.4, '3': 0.1},
   }
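
The dictionaries above are a readable description of the model, while the library works with arrays. As a minimal sketch, the arrays can be derived from the dictionaries, assuming the state order `('Hot', 'Cold')` and the observation order `('1', '2', '3')`. The names `obs_index` and `encoded` are illustrative and are not used elsewhere in this notebook.

```python
import numpy as np

states = ('Hot', 'Cold')
observations = ('1', '2', '3')
start_probability = {'Hot': 0.8, 'Cold': 0.2}
transition_probability = {
    'Hot':  {'Hot': 0.6, 'Cold': 0.4},
    'Cold': {'Hot': 0.5, 'Cold': 0.5},
}
emission_probability = {
    'Hot':  {'1': 0.2, '2': 0.4, '3': 0.4},
    'Cold': {'1': 0.5, '2': 0.4, '3': 0.1},
}

# Row/column order follows the order of `states` and `observations`.
pi = np.array([start_probability[s] for s in states])
A = np.array([[transition_probability[i][j] for j in states] for i in states])
B = np.array([[emission_probability[i][o] for o in observations] for i in states])

# Observations are passed to the model as column indices of B.
obs_index = {o: k for k, o in enumerate(observations)}
encoded = np.array([obs_index[o] for o in ('1', '2', '3')])
print(encoded)  # indices into the columns of B
```

With this ordering, index 0 stands for "Hot" in state sequences and for "1 ice cream" in observation sequences.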

In [5]:
# Define the Multinomial HMM
pi= np.array([0.8, 0.2])  # initial state distribution (Hot, Cold)
A = np.array([[0.6, 0.4],
              [0.5, 0.5]]) # transition probabilities
B = np.array([[0.2, 0.4, 0.4],
              [0.5, 0.4, 0.1]]) # emission probabilities

model = HMM(A,B,pi)   # n_components (number of states) is inferred from A
model.show_model()

A: Transition probability matrix
[[0.6 0.4]
 [0.5 0.5]]
------------------------------
B: Emission probability matrix
[[0.2 0.4 0.4]
 [0.5 0.4 0.1]]
-------------------------------
pi: Initital state distribution
[0.8 0.2]


**Problem 1 (Likelihood):** Given an HMM $λ = (A, B)$ and an observation sequence $O$, determine the likelihood $P(O|λ )$

**Note:** The likelihood is computed by calling `.likelihood`, which exponentiates the log-likelihood returned by hmmlearn's `.score`.

How likely is a given sequence?
* $ O= \{1\}$
* $ O= \{2\}$
* $ O= \{3\}$
* $ O= \{1,2\}$

In [6]:
# O= {1}  (the observations 1, 2, 3 are encoded as the indices 0, 1, 2)
obs_seq = np.array([0])
print("Prob(O | pi, A, B) = {:0.4f}".format(model.likelihood(obs_seq)))

Prob(O | pi, A, B) = 0.2600


For a single observation, the likelihood sums, over the hidden states, the product of the initial state probability and the emission probability. For “only one ice cream”: $\pi_H b_H(1) + \pi_C b_C(1) = 0.8 \times 0.2 + 0.2 \times 0.5 = 0.26$ (26%).
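
The same sum extends to longer sequences through the forward algorithm. The following is a minimal NumPy sketch, independent of `hmmlearn`; `forward_likelihood` is a hypothetical helper name, and the arrays repeat the model parameters of this example.

```python
import numpy as np

pi = np.array([0.8, 0.2])                # initial state distribution (Hot, Cold)
A = np.array([[0.6, 0.4],
              [0.5, 0.5]])               # transition probabilities
B = np.array([[0.2, 0.4, 0.4],
              [0.5, 0.4, 0.1]])          # emission probabilities

def forward_likelihood(obs, pi, A, B):
    """Return P(O | lambda) for a sequence of observation indices."""
    # alpha[i] = P(o_1..o_t, x_t = s_i); start with the first observation.
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        # Propagate one step through A, then weight by the emission of o.
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print("{:0.4f}".format(forward_likelihood([0], pi, A, B)))     # 0.2600
print("{:0.4f}".format(forward_likelihood([0, 1], pi, A, B)))  # 0.1040
```

This version multiplies raw probabilities, which is fine for short sequences but underflows for long ones.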

In [7]:
# O= {2}
obs_seq = np.array([1])
print("Prob(O | pi, A, B) = {:0.4f}".format(model.likelihood(obs_seq)))

Prob(O | pi, A, B) = 0.4000


In [8]:
# O= {3}
obs_seq = np.array([2])
print("Prob(O | pi, A, B) = {:0.4f}".format(model.likelihood(obs_seq)))

Prob(O | pi, A, B) = 0.3400


In [10]:
# O= {1,2}
obs_seq = np.array([0,1])
print("Prob(O | pi, A, B) = {:0.4f}".format(model.likelihood(obs_seq)))

Prob(O | pi, A, B) = 0.1040


**Problem 2 (Decoding):** Given an observation sequence $O$ and an HMM $λ = (A, B)$, discover the best hidden state sequence $X$.

The **Viterbi algorithm** is one of the most common decoding algorithms for HMMs. Its goal is to find the most likely hidden state sequence corresponding to a series of observations. 

**Note:** Decoding is done by calling `.decoding`, which wraps hmmlearn's `.decode`.

What is the most probable “path” for generating a given sequence?
* $ O= \{1\}$
* $ O= \{2\}$
* $ O= \{3\}$
* $ O= \{1,2,3\}$

In [12]:
# O= {1}
obs_seq = np.array([0])
prob,state_seq = model.decoding(obs_seq)
print ("Most likely state sequence: ", state_seq)
print("Probability: {:0.6f}".format(prob))

Most likely state sequence:  [0]
Probability: 0.160000


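Given the known model and the observation “only one ice cream”, the most likely state is “Hot”. The reported value 0.16 (16%) is the joint probability of the observation and that state, not the posterior probability of “Hot”: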

* Probability for “Hot”  : 0.8 x 0.2 = 0.16 (16%)
* Probability for “Cold” : 0.2 x 0.5 = 0.1 (10%).
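
For longer sequences the same comparison is made step by step, keeping only the best path into each state. The following is a minimal NumPy sketch of the Viterbi recursion in the probability domain, independent of `hmmlearn`; `viterbi` is a hypothetical helper name, and the arrays repeat the model parameters of this example.

```python
import numpy as np

pi = np.array([0.8, 0.2])
A = np.array([[0.6, 0.4],
              [0.5, 0.5]])
B = np.array([[0.2, 0.4, 0.4],
              [0.5, 0.4, 0.1]])

def viterbi(obs, pi, A, B):
    """Return (joint probability, state path) of the most likely path."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))               # best path probability ending in each state
    backptr = np.zeros((T, N), dtype=int)  # best predecessor of each state
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # scores[i, j]: best path ending in state i at t-1, followed by i -> j
        scores = delta[t - 1][:, None] * A
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Trace the best path backwards from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return delta[-1].max(), path

prob, path = viterbi([0, 1, 2], pi, A, B)
print(path, "{:0.6f}".format(prob))  # [0 0 0] 0.009216
```

For long sequences, a practical implementation would work with log-probabilities to avoid underflow.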

In [13]:
# O= {2}
obs_seq = np.array([1])
prob,state_seq = model.decoding(obs_seq)
print ("Most likely state sequence: ", state_seq)
print("Probability: {:0.6f}".format(prob))

Most likely state sequence:  [0]
Probability: 0.320000


In [14]:
# O= {3}
obs_seq = np.array([2])
prob,state_seq = model.decoding(obs_seq)
print ("Most likely state sequence: ", state_seq)
print("Probability: {:0.6f}".format(prob))

Most likely state sequence:  [0]
Probability: 0.320000


In [15]:
# O= {1,2,3}
obs_seq = np.array([0,1,2])
prob,state_seq = model.decoding(obs_seq)
print ("Most likely state sequence: ", state_seq)
print("Probability: {:0.6f}".format(prob))

Most likely state sequence:  [0 0 0]
Probability: 0.009216


**Problem 3 (Learning):** Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B.

But first, we need to generate data that our HMM is likely to produce.

The HMM is a generative probabilistic model, in which a sequence of observable variables $O$ is generated by a sequence of internal hidden states $X$. 

**Note:** Sampling is done by calling `.sample`, which returns the observations and the hidden states that produced them.
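
The generative process itself is short: draw the first state from the initial distribution, then alternate between emitting an observation and moving to the next state. The following is a minimal NumPy sketch of that process, not hmmlearn's implementation; `sample_hmm` and its `seed` parameter are hypothetical, and it returns flat index arrays rather than the column-shaped array that `.sample` returns.

```python
import numpy as np

pi = np.array([0.8, 0.2])
A = np.array([[0.6, 0.4],
              [0.5, 0.5]])
B = np.array([[0.2, 0.4, 0.4],
              [0.5, 0.4, 0.1]])

def sample_hmm(n_samples, pi, A, B, seed=0):
    """Draw observation and state index sequences from a discrete HMM."""
    rng = np.random.default_rng(seed)
    n_states, n_symbols = B.shape
    states = np.empty(n_samples, dtype=int)
    obs = np.empty(n_samples, dtype=int)
    # First state comes from the initial distribution.
    states[0] = rng.choice(n_states, p=pi)
    obs[0] = rng.choice(n_symbols, p=B[states[0]])
    for t in range(1, n_samples):
        # Next state depends only on the previous state (row of A).
        states[t] = rng.choice(n_states, p=A[states[t - 1]])
        # Observation depends only on the current state (row of B).
        obs[t] = rng.choice(n_symbols, p=B[states[t]])
    return obs, states

obs, states = sample_hmm(10, pi, A, B, seed=1)
print(obs)
print(states)
```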

In [16]:
# Generate the dataset: a sequence of 100 observations
O, _ = model.sample(100)
print("The first 10 observations in the sequence : ", O[:10].transpose())

The first 10 observations in the sequence :  [[2 1 0 1 1 2 2 2 1 0]]


In [18]:
model.likelihood(O)

3.5929450181935767e-48
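
A likelihood this small shows why sequence probabilities are usually handled as log-probabilities: for much longer sequences the plain product underflows to zero. The following is a minimal sketch of the forward recursion in the log domain, using NumPy's `np.logaddexp` for the sums; `log_forward` is a hypothetical helper name, and the arrays repeat the model parameters of this example.

```python
import numpy as np

pi = np.array([0.8, 0.2])
A = np.array([[0.6, 0.4],
              [0.5, 0.5]])
B = np.array([[0.2, 0.4, 0.4],
              [0.5, 0.4, 0.1]])

def log_forward(obs, pi, A, B):
    """Return log P(O | lambda) without leaving the log domain."""
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    log_alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # Log-sum-exp over the previous state i for every next state j.
        log_alpha = np.logaddexp.reduce(log_alpha[:, None] + log_A, axis=0)
        log_alpha = log_alpha + log_B[:, o]
    return np.logaddexp.reduce(log_alpha)

print("{:0.4f}".format(np.exp(log_forward([0, 1], pi, A, B))))  # 0.1040

# A long sequence still yields a finite log-likelihood.
long_obs = [0] * 5000
print(np.isfinite(log_forward(long_obs, pi, A, B)))  # True
```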

**Note:** In machine-learning terms, the observations are the training data and the number of hidden states is a hyperparameter of the model.

Define an initial model for training. For reference, the model that generated the data is:

In [17]:
model.show_model()

A: Transition probability matrix
[[0.6 0.4]
 [0.5 0.5]]
------------------------------
B: Emission probability matrix
[[0.2 0.4 0.4]
 [0.5 0.4 0.1]]
-------------------------------
pi: Initital state distribution
[0.8 0.2]


In [19]:
# Define the initial HMM (starting point for training)
pi_init = np.array([0.5, 0.5])  # initial state distribution
A_init  = np.array([[0.5, 0.5],
                    [0.6, 0.4]]) # transition probabilities
B_init  = np.array([[0.1, 0.4, 0.5],
                    [0.6, 0.3, 0.1]]) # emission probabilities

model2 = HMM(A_init,B_init,pi_init,
                           init_params='',  # keep the values above instead of re-initializing
                           n_iter=100,      # maximum number of EM iterations
                           tol=0.05)        # stop when the log-likelihood gain falls below tol
model2.show_model()

A: Transition probability matrix
[[0.5 0.5]
 [0.6 0.4]]
------------------------------
B: Emission probability matrix
[[0.1 0.4 0.5]
 [0.6 0.3 0.1]]
-------------------------------
pi: Initital state distribution
[0.5 0.5]


How can we learn the HMM parameters from a sequence of observations?

**Note:** Learning is done by calling `.learning`, which wraps hmmlearn's `.fit` and estimates the parameters with the Baum-Welch (EM) algorithm.
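
What the `fit` call does can be sketched in plain NumPy. The following is a minimal sketch of Baum-Welch re-estimation for a single discrete sequence, using a scaled forward-backward pass; it is not hmmlearn's actual implementation. `forward_backward` and `baum_welch_step` are hypothetical helper names, the starting values are the initial model of this example, and the short training sequence is the first ten sampled observations printed earlier.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns alpha, beta and scale factors."""
    T, N = len(obs), len(pi)
    alpha, beta, scale = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    return alpha, beta, scale

def baum_welch_step(obs, pi, A, B):
    """One EM update; also returns the log-likelihood of the input parameters."""
    obs = np.asarray(obs)
    alpha, beta, scale = forward_backward(obs, pi, A, B)
    gamma = alpha * beta                    # gamma[t, i] = P(x_t = s_i | O)
    emit_next = B[:, obs[1:]].T * beta[1:]  # shape (T-1, N)
    # xi[t, i, j] = P(x_t = s_i, x_{t+1} = s_j | O)
    xi = (alpha[:-1, :, None] * A[None, :, :] * emit_next[:, None, :]
          / scale[1:, None, None])
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    # Expected count of each symbol per state, normalized per state.
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, np.log(scale).sum()

obs = [2, 1, 0, 1, 1, 2, 2, 2, 1, 0]
pi_k = np.array([0.5, 0.5])
A_k = np.array([[0.5, 0.5], [0.6, 0.4]])
B_k = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
logliks = []
for _ in range(5):
    pi_k, A_k, B_k, ll = baum_welch_step(obs, pi_k, A_k, B_k)
    logliks.append(ll)
print(np.round(logliks, 4))  # log-likelihood should not decrease between steps
```

With a single training sequence, the re-estimated initial distribution is simply the posterior of the first hidden state, which is why a learned $\Pi$ tends to concentrate on one state.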

In [20]:
model2.learning(O)
print("Done!")

Done!


In [21]:
model2.show_model()

A: Transition probability matrix
[[0.5425 0.4575]
 [0.5543 0.4457]]
------------------------------
B: Emission probability matrix
[[0.0868 0.4293 0.4839]
 [0.5408 0.3638 0.0954]]
-------------------------------
pi: Initital state distribution
[0.9996 0.0004]


In [22]:
model.show_model()

A: Transition probability matrix
[[0.6 0.4]
 [0.5 0.5]]
------------------------------
B: Emission probability matrix
[[0.2 0.4 0.4]
 [0.5 0.4 0.1]]
-------------------------------
pi: Initital state distribution
[0.8 0.2]
