# Markov Models

* Forward algorithm - calculate the probability that a certain sequence of observations occurs
* Viterbi algorithm - find the most probable sequence of hidden states given a sequence of observations
* Baum-Welch algorithm - find the parameters of a model that maximize the likelihood of a certain sequence of observations occurring

The model must have the Markov property, which means that the distribution of the next hidden state and observation depends only on the current state (or on a fixed number of past states). The model is also time-homogeneous, which means that the conditional distributions don't change over time. It should also have the property that the probability of getting from one state to any other one is always non-zero.
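
For the first-order case, the Markov property can be written out as below. The symbols $x_t$ (hidden state at step $t$) and $y_t$ (observation at step $t$) are notation introduced here for illustration and are not used elsewhere in this notebook.

$$
P(x_{t+1} \mid x_t, x_{t-1}, \dots, x_0) = P(x_{t+1} \mid x_t),
\qquad
P(y_t \mid x_t, x_{t-1}, \dots, x_0) = P(y_t \mid x_t)
$$

Time-homogeneity then says that neither of the two right-hand sides depends on $t$.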

The model has transition weights (the distribution of the next hidden state given the current hidden state), emission weights (the distribution of observations in each hidden state), and an initial state (or a vector of initial-state probabilities).

The hidden states are usually discrete, while the observations can be discrete or continuous.

The number of hidden states grows exponentially with the number of features that are modelled. That is, if the model of a person's speech has N states and you additionally need to include whether the person is male or female, you now need 2N states, and every further binary feature doubles the count again.
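
The growth in state count can be illustrated with a small sketch. It assumes that the added feature evolves independently of the original states, in which case the joint transition matrix is the Kronecker product of the two individual ones. The matrices `base` and `extra` and the helper `joint_state_count` are made up for this illustration.

```python
import numpy as np

# Hypothetical 3-state chain for the original feature (each row sums to 1).
base = np.array([[0.8, 0.1, 0.1],
                 [0.2, 0.6, 0.2],
                 [0.3, 0.3, 0.4]])

# Hypothetical 2-state chain for one added binary feature.
extra = np.array([[0.9, 0.1],
                  [0.2, 0.8]])

# Assuming the two features evolve independently, the chain over pairs
# (base state, extra state) has the Kronecker product as transition matrix.
joint = np.kron(base, extra)

print(joint.shape)        # 3 * 2 = 6 joint states
print(joint.sum(axis=1))  # every row is still a probability distribution


def joint_state_count(base_states, binary_features):
    """Number of joint states after adding `binary_features` binary features."""
    return base_states * 2 ** binary_features


print(joint_state_count(3, 1))
print(joint_state_count(3, 5))
```

Each binary feature doubles the number of rows and columns, so the number of matrix entries grows by a factor of four per feature.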

In [1]:
import functools as ft
import itertools as it
import json
import math
import operator as op
import os

from IPython.display import display
from ipywidgets import interact, interact_manual, widgets
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import misc, stats
from sklearn import metrics

# Discrete Observations

First I will consider the simpler case where both hidden states and observations are discrete. A Markov model is defined by choosing the number of hidden states $h$ and the number of visible observations $v$ (we don't care about assigning symbols to them and will use the numbers $0 \dots h-1$ and $0 \dots v-1$ as states). It's then necessary to define two matrices of probabilities.

The first is the matrix $H$ of transition probabilities between hidden states, of size $h \times h$, where $h$ is the number of hidden states. $H_{i, j}$ is the probability of going from hidden state $i$ to hidden state $j$. For all $i$, $\sum_{j=0}^{h-1} H_{i, j} = 1$, and for all $i, j$, $H_{i, j} > 0$.

The second is the matrix $V$ of probabilities of emitting observations while being in a certain state, of size $h \times v$, where $v$ is the number of possible observations (visible states). $V_{i, k}$ is the probability of emitting observation $k$ while in hidden state $i$. For all $i$, $\sum_{k=0}^{v-1} V_{i, k} = 1$.
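
The two row-sum constraints are easy to get wrong when typing matrices by hand, so a small check can help. The following is a minimal sketch; the helper names `is_row_stochastic` and `shapes_match` are hypothetical and not part of the rest of this notebook.

```python
import numpy as np


def is_row_stochastic(matrix, atol=1e-9):
    """True if the matrix is 2-D, non-negative, and every row sums to 1."""
    matrix = np.asarray(matrix, dtype=np.float64)
    if matrix.ndim != 2:
        return False
    non_negative = np.all(matrix >= 0.0)
    rows_sum_to_one = np.allclose(matrix.sum(axis=1), 1.0, rtol=0.0, atol=atol)
    return bool(non_negative and rows_sum_to_one)


def shapes_match(H, V):
    """True if H is square (h x h) and V has one row per hidden state (h x v)."""
    H = np.asarray(H)
    V = np.asarray(V)
    return H.ndim == 2 and V.ndim == 2 and H.shape[0] == H.shape[1] == V.shape[0]


# Made-up example matrices: 2 hidden states, 3 possible observations.
H_example = [[0.7, 0.3],
             [0.4, 0.6]]
V_example = [[0.3, 0.4, 0.3],
             [0.1, 0.9, 0.0]]

print(is_row_stochastic(H_example), is_row_stochastic(V_example))
print(shapes_match(H_example, V_example))
print(is_row_stochastic([[0.5, 0.6]]))  # this row sums to 1.1
```

Note that this check only tests non-negativity; the stricter condition $H_{i, j} > 0$ would need a separate comparison.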

In [11]:
hidden_n = 4
visible_n = 3

# square matrix h x h, each row sums to 1
hidden_weights = np.array([[0.1, 0.4, 0.4, 0.1], 
                           [0.3, 0.3, 0.2, 0.2],
                           [0.4, 0.2, 0.3, 0.1], 
                           [0.3, 0.3, 0.1, 0.3]]) 
# matrix h x v, each row sums to 1
visible_weights = np.array([[0.3, 0.4, 0.3], 
                            [0.1, 0.9, 0.0], 
                            [0.5, 0.4, 0.1],
                            [0.2, 0.1, 0.7]])

def hidden_selected(hidden_n, index):
    # one-hot distribution: the model is in state `index` with certainty
    hidden_states = np.zeros(hidden_n, dtype=np.float64)
    hidden_states[index] = 1.0
    return hidden_states

print(hidden_selected(hidden_n, 2))

def hidden_uniform(hidden_n):
    # every hidden state is equally likely
    return np.ones(hidden_n, dtype=np.float64) / hidden_n

print(hidden_uniform(hidden_n))

initial_hidden = np.array([0.1, 0.1, 0.7, 0.1])
print(initial_hidden)

[ 0.  0.  1.  0.]
[ 0.25  0.25  0.25  0.25]
[ 0.1  0.1  0.7  0.1]


In [12]:
def generate_visible(visible_weights, hidden_states):
    # distribution over observations given a distribution over hidden states
    return hidden_states @ visible_weights

print(generate_visible(visible_weights, hidden_selected(hidden_n, 2)))
print(generate_visible(visible_weights, hidden_uniform(hidden_n)))
print(generate_visible(visible_weights, initial_hidden))

[ 0.5  0.4  0.1]
[ 0.275  0.45   0.275]
[ 0.41  0.42  0.17]
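
Besides computing distributions, the same matrices can be used to draw a concrete sequence of hidden states and observations. The sketch below is self-contained and uses made-up 2-state matrices; the function name `sample_sequence` is hypothetical. It assumes that each step first emits an observation from the current hidden state and then moves to the next hidden state.

```python
import numpy as np


def sample_sequence(H, V, initial, length, rng):
    """Draw `length` hidden states and observations from a discrete model."""
    hidden = np.empty(length, dtype=np.int64)
    visible = np.empty(length, dtype=np.int64)
    state = rng.choice(len(initial), p=initial)  # initial hidden state
    for t in range(length):
        hidden[t] = state
        visible[t] = rng.choice(V.shape[1], p=V[state])  # emit an observation
        state = rng.choice(H.shape[0], p=H[state])       # move to the next state
    return hidden, visible


# Made-up example: 2 hidden states, 3 observations; state 1 never emits 2.
H_small = np.array([[0.7, 0.3],
                    [0.4, 0.6]])
V_small = np.array([[0.3, 0.4, 0.3],
                    [0.1, 0.9, 0.0]])
initial_small = np.array([0.5, 0.5])

rng = np.random.default_rng(0)
hidden_seq, visible_seq = sample_sequence(H_small, V_small, initial_small, 20, rng)
print(hidden_seq)
print(visible_seq)
```

Sequences sampled this way are handy as test input for the algorithms in the following sections, because the true hidden states are known.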


## Forward Algorithm

Calculate the probability that a given sequence of observations occurs.

In [None]:
def forward_probability(hidden_weights, visible_weights, initial_hidden, observations):
    pass
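
The stub above could be filled in along the following lines. This is a minimal sketch rather than a definitive implementation. It assumes that `initial_hidden` is the distribution of the hidden state that emits the first observation, that `observations` is a sequence of integer indices, and that the sequence is short enough for plain probabilities not to underflow. The name `forward_probability_sketch` is chosen here to keep it apart from the stub.

```python
import numpy as np


def forward_probability_sketch(hidden_weights, visible_weights, initial_hidden, observations):
    """Probability of an observation sequence, summed over all hidden paths."""
    H = np.asarray(hidden_weights, dtype=np.float64)
    V = np.asarray(visible_weights, dtype=np.float64)
    pi = np.asarray(initial_hidden, dtype=np.float64)
    if len(observations) == 0:
        return 1.0  # the empty sequence is certain
    # alpha[i]: probability of the observations so far, ending in hidden state i
    alpha = pi * V[:, observations[0]]
    for obs in observations[1:]:
        # one transition step, then weight by the emission probability
        alpha = (alpha @ H) * V[:, obs]
    return float(alpha.sum())


# Made-up example: 2 hidden states, 2 observations.
H_small = np.array([[0.7, 0.3],
                    [0.4, 0.6]])
V_small = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
initial_small = np.array([0.6, 0.4])

print(forward_probability_sketch(H_small, V_small, initial_small, [0]))  # about 0.62
print(forward_probability_sketch(H_small, V_small, initial_small, [0, 0, 1, 1]))
```

For long sequences the entries of `alpha` shrink towards zero, so a practical version would rescale `alpha` at every step or work with logarithms.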

## Viterbi Algorithm

Calculate the most likely sequence of hidden states given a sequence of observations.

In [None]:
def most_likely_hidden_viterbi(hidden_weights, visible_weights, initial_hidden, observations):
    pass
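
One possible way to fill in the stub is sketched below. It works with log-probabilities to avoid underflow and makes the same assumptions as before: `initial_hidden` is the distribution of the hidden state that emits the first observation, and `observations` holds integer indices. The name `most_likely_hidden_viterbi_sketch` is chosen here to keep it apart from the stub.

```python
import numpy as np


def most_likely_hidden_viterbi_sketch(hidden_weights, visible_weights, initial_hidden, observations):
    """Most probable hidden state path for a sequence of observation indices."""
    n = len(observations)
    if n == 0:
        return []
    # zero probabilities become -inf, which is fine for max/argmax
    with np.errstate(divide="ignore"):
        log_H = np.log(np.asarray(hidden_weights, dtype=np.float64))
        log_V = np.log(np.asarray(visible_weights, dtype=np.float64))
        log_pi = np.log(np.asarray(initial_hidden, dtype=np.float64))
    h = log_H.shape[0]
    # score[i]: log-probability of the best path so far that ends in state i
    score = log_pi + log_V[:, observations[0]]
    back = np.zeros((n, h), dtype=np.int64)
    for t in range(1, n):
        # candidates[i, j]: best path ending in i, followed by a move to j
        candidates = score[:, None] + log_H
        back[t] = np.argmax(candidates, axis=0)
        score = candidates[back[t], np.arange(h)] + log_V[:, observations[t]]
    # follow the back-pointers from the best final state
    path = [int(np.argmax(score))]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path


# Made-up example: 2 hidden states, 2 observations.
H_small = np.array([[0.7, 0.3],
                    [0.4, 0.6]])
V_small = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
initial_small = np.array([0.6, 0.4])

print(most_likely_hidden_viterbi_sketch(H_small, V_small, initial_small, [0, 0, 1, 1]))
# [0, 0, 1, 1]
```

When several paths are equally probable, this sketch returns the one that `np.argmax` picks first, which is the lowest state index at each tie.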

## Baum-Welch Algorithm

Estimate the weights $H$ and $V$ given sequences of observations.

In [None]:
def estimate_weights_baum_welch(hidden_n, visible_n, initial_hidden, observations):
    pass
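
A possible shape for this function is sketched below, under several simplifying assumptions that go beyond the stub: it trains on a single sequence of at least two integer observations, keeps `initial_hidden` fixed, starts from random weights, and runs a fixed number of iterations instead of testing for convergence. The parameters `iterations` and `seed`, the returned log-likelihood history, and the name `estimate_weights_baum_welch_sketch` are all additions made for this illustration.

```python
import numpy as np


def estimate_weights_baum_welch_sketch(hidden_n, visible_n, initial_hidden, observations,
                                       iterations=50, seed=0):
    """Estimate H and V from one observation sequence with scaled forward-backward."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(observations, dtype=np.int64)
    pi = np.asarray(initial_hidden, dtype=np.float64)
    n = len(obs)
    # random starting point with strictly positive, normalized rows
    H = rng.random((hidden_n, hidden_n)) + 0.1
    H /= H.sum(axis=1, keepdims=True)
    V = rng.random((hidden_n, visible_n)) + 0.1
    V /= V.sum(axis=1, keepdims=True)
    log_likelihoods = []
    for _ in range(iterations):
        # E-step, forward pass: alpha is rescaled to sum to 1 at every step
        alpha = np.zeros((n, hidden_n))
        scale = np.zeros(n)
        alpha[0] = pi * V[:, obs[0]]
        scale[0] = alpha[0].sum()
        alpha[0] /= scale[0]
        for t in range(1, n):
            alpha[t] = (alpha[t - 1] @ H) * V[:, obs[t]]
            scale[t] = alpha[t].sum()
            alpha[t] /= scale[t]
        # E-step, backward pass with the same scale factors
        beta = np.ones((n, hidden_n))
        for t in range(n - 2, -1, -1):
            beta[t] = (H @ (V[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
        # log-likelihood of the sequence under the current H and V
        log_likelihoods.append(float(np.log(scale).sum()))
        # gamma[t, i]: probability of being in state i at step t
        gamma = alpha * beta
        # xi[t, i, j]: probability of moving from i at step t to j at step t + 1
        xi = (alpha[:-1, :, None] * H[None, :, :]
              * (V[:, obs[1:]].T * beta[1:])[:, None, :]
              / scale[1:, None, None])
        # M-step: expected transition and emission counts, normalized per state
        H = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        V = np.zeros((hidden_n, visible_n))
        for k in range(visible_n):
            V[:, k] = gamma[obs == k].sum(axis=0)
        V /= gamma.sum(axis=0)[:, None]
    return H, V, log_likelihoods


# Made-up observation sequence over 3 symbols.
example_obs = [0, 1, 0, 1, 2, 2, 1, 0, 0, 1, 2, 2, 2, 1, 0, 1, 0, 0, 2, 1]
H_est, V_est, history = estimate_weights_baum_welch_sketch(2, 3, [0.5, 0.5], example_obs)
print(H_est)
print(V_est)
print(history[0], history[-1])
```

The result depends on the random starting point, since the procedure only finds a local maximum of the likelihood; running it with several values of `seed` and keeping the best final log-likelihood is a common workaround.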