# Lesson 3.1 Hidden Markov Models
---

## Introduction

We view the world as a series of frames, or snapshots in time, each of which contains a set of random variables, some observable and some not. Hidden Markov Models (HMMs) are a machine learning technique used to make sense of sequences in time. HMMs became a dominant technique in the field of speech recognition. These models take the frames that make up an input signal and return a result based on the trained model. HMMs rest on a rigorous body of mathematical theory, which allowed speech researchers to build on mathematical results developed in other fields. An HMM must be trained for a particular end result; in speech recognition, this could be a word or a series of words. Once trained, these models can recognize a signal in the presence of noise, at different magnitudes, or with states of different durations. This is the power available in HMMs.

This lesson covers the algorithms and methods behind HMMs and their application to detecting patterns in vibrational signatures in aeromechanical testing.

## The Algorithm

### Creating the Model

Hidden Markov Models are state-based models of data used to recognize patterns in that data. Given a set of states S = {s1, s2, ..., si}, the process moves among these states over time, producing a series that can be observed. For visualization, let's look at an example. Consider the following graph representing 4 states:

![graph](./images/graph.PNG)

Given these 4 states, we can compute the transition probabilities from the number of frames each state consumes. A state that is expected to last $n$ frames is left with probability $1/n$ at each frame and kept with probability $1 - 1/n$:

1. State 1 consumes 10 frames. Therefore, the probability of leaving the state (the output probability) is 0.1 and the probability of remaining in state 1 is 0.9.
2. State 2 consumes 5 frames. Therefore, its transition probabilities are 0.2 and 0.8.
3. State 3 consumes 20 frames. Therefore, its transition probabilities are 0.05 and 0.95.
4. State 4 consumes 3 frames. Therefore, its transition probabilities are 0.333 and 0.667.
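
The arithmetic in the list above can be sketched in a few lines of Python. The function name `transition_probs` and the `durations` dictionary are hypothetical names chosen for illustration; only the frame counts come from the example.

```python
def transition_probs(duration_frames):
    """Return (exit_prob, stay_prob) for a state expected to last `duration_frames` frames."""
    exit_prob = 1.0 / duration_frames   # chance of leaving the state at each frame
    stay_prob = 1.0 - exit_prob         # chance of taking the self-loop instead
    return exit_prob, stay_prob


# Frame counts for the four states in the example graph.
durations = {1: 10, 2: 5, 3: 20, 4: 3}

for state, frames in durations.items():
    exit_p, stay_p = transition_probs(frames)
    print(f"State {state}: exit = {exit_p:.3f}, stay = {stay_p:.3f}")
```

The reciprocal rule follows from the fact that a state with self-loop probability $p$ has an expected duration of $1/(1-p)$ frames.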

After calculating our transition probabilities, we can draw the following Markov Model:

![examplehmm](./images/examplehmm.PNG)

Next, we need to consider the start probabilities. There are two ways to look at this. First, the model could be as shown, with state 1 as its only entrance point. Alternatively, a sequence may begin in the middle of the model. We can represent this in the following way, where the entrance probabilities are uniform:

![examplehmm2](./images/examplehmm2.PNG)
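
As a rough sketch, the model and both choices of start probabilities could be stored as plain Python lists. The names `left_to_right_transitions`, `start_at_first`, and `start_uniform` are hypothetical. The sketch also assumes that the exit from the last state leaves the model, so the last row of the matrix sums to less than 1.

```python
def left_to_right_transitions(durations):
    """Build a transition matrix T where T[i][j] = P(next state j | current state i)."""
    n = len(durations)
    T = [[0.0] * n for _ in range(n)]
    for i, frames in enumerate(durations):
        exit_p = 1.0 / frames
        T[i][i] = 1.0 - exit_p          # self-loop: remain in state i
        if i + 1 < n:
            T[i][i + 1] = exit_p        # advance to the next state
        # Assumption: the last state's exit leaves the model, so it is not stored.
    return T


T = left_to_right_transitions([10, 5, 20, 3])

# Option 1: the only entrance point is state 1.
start_at_first = [1.0, 0.0, 0.0, 0.0]

# Option 2: uniform entrance probabilities over all states.
start_uniform = [1.0 / len(T)] * len(T)

for row in T:
    print([round(p, 3) for p in row])
```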

### The Forward-Backward Algorithm

HMMs are temporal probabilistic models in which the state of the process is described by a single discrete variable. The possible values of the variable are the possible states of the world. We can represent an HMM in the following way:

$X_{t}$ = A single discrete state variable  
$S$ = Number of possible states

The transition model $P(X_{t} | X_{t-1})$ is represented as an $S \times S$ matrix $T$, where $T_{ij} = P(X_{t}=j | X_{t-1}=i)$ is the probability of a transition from state $i$ to state $j$.

Next, we look at the evidence and observations:

$e_{t}$ = Evidence variable at time $t$

For each state $i$, $P(e_{t} | X_{t} = i)$ specifies how likely it is that state $i$ causes evidence $e_{t}$ to appear. These probabilities are placed on the diagonal of an $S \times S$ diagonal matrix $O_{t}$, whose $i$-th diagonal entry is $P(e_{t} | X_{t} = i)$.
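
To make the shape of $O_{t}$ concrete, here is a minimal sketch that builds it from an emission table. The table `emission`, its symbols `"low"` and `"high"`, and the function `observation_matrix` are all made up for illustration and do not come from this lesson.

```python
# Hypothetical emission table: emission[i][symbol] = P(e_t = symbol | X_t = i)
emission = [
    {"low": 0.8, "high": 0.2},   # state 0
    {"low": 0.3, "high": 0.7},   # state 1
]


def observation_matrix(emission, symbol):
    """Return the diagonal matrix O_t for the evidence `symbol` seen at time t."""
    n = len(emission)
    return [
        [emission[i][symbol] if i == j else 0.0 for j in range(n)]
        for i in range(n)
    ]


O_t = observation_matrix(emission, "high")
for row in O_t:
    print(row)
```

Because $O_{t}$ is diagonal, an implementation can store only the diagonal as a vector and multiply element-wise.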

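Given the above, we can compute the forward equation by simple matrix-vector operations, where $f_{1:t}$ is the forward message (the distribution over states given the evidence up to time $t$) and $\alpha$ is a normalizing constant: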

$$f_{1:t+1} = \alpha O_{t+1}T^{T}f_{1:t}$$

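and the backward equation, where $b_{k+1:t}$ is the backward message (the likelihood of the evidence from time $k+1$ to $t$ given each state at time $k$):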

$$b_{k+1:t} = TO_{k+1}b_{k+2:t}$$

These equations represent the forward-backward algorithm applied to a sequence of length *t*. The forward-backward algorithm is an inference algorithm for HMMs that computes the posterior marginals of all hidden state variables given a sequence of observations (emissions).
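
The two equations can be sketched in pure Python as follows. This is an illustrative sketch, not a reference implementation: the two-state model at the bottom uses made-up numbers that are not taken from this lesson, and each $O_{t}$ is stored as a list holding only its diagonal.

```python
def normalize(v):
    """Scale a vector so that its entries sum to 1 (the alpha in the forward equation)."""
    total = sum(v)
    return [x / total for x in v]


def forward_step(f, T, o):
    """f_{1:t+1} = alpha * O_{t+1} T^T f_{1:t}, with o the diagonal of O_{t+1}."""
    n = len(f)
    predicted = [sum(T[i][j] * f[i] for i in range(n)) for j in range(n)]
    return normalize([o[j] * predicted[j] for j in range(n)])


def backward_step(b, T, o):
    """b_{k+1:t} = T O_{k+1} b_{k+2:t}, with o the diagonal of O_{k+1}."""
    n = len(b)
    return [sum(T[i][j] * o[j] * b[j] for j in range(n)) for i in range(n)]


def forward_backward(prior, T, obs_diags):
    """Return the smoothed distribution over states for every time step."""
    forwards = [prior]
    for o in obs_diags:
        forwards.append(forward_step(forwards[-1], T, o))
    b = [1.0] * len(prior)                    # backward message past the last step
    smoothed = [None] * len(obs_diags)
    for k in range(len(obs_diags) - 1, -1, -1):
        smoothed[k] = normalize([f * bk for f, bk in zip(forwards[k + 1], b)])
        b = backward_step(b, T, obs_diags[k])
    return smoothed


# Hypothetical two-state model (illustrative numbers only).
T = [[0.7, 0.3],
     [0.3, 0.7]]
prior = [0.5, 0.5]
obs_diags = [[0.9, 0.2], [0.9, 0.2]]          # the same evidence seen twice

smoothed = forward_backward(prior, T, obs_diags)
for step, dist in enumerate(smoothed, start=1):
    print(step, [round(p, 3) for p in dist])
```

The forward pass stores every message so that the backward pass can combine them in a single sweep from the end of the sequence to the start.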

### The Viterbi Trellis

The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states, called the Viterbi path, that results in a sequence of observed events. The algorithm can be represented as a trellis in which all possible paths are laid out and the transition and emission probabilities of each state are used to determine the best path. Let's take the HMM displayed previously and represent it as a Viterbi trellis:

![VT](./images/viterbiTrellis.PNG)

Notice that the transition probabilities are filled in and that each state is a row of the trellis. The trellis displayed is a simple linear model in which we must enter state 1 before state 2, state 2 before state 3, and so on. However, a trellis can also be built for models that allow other transitions, such as going from state 1 directly to state 3, skipping state 2.

Next, we take an observed sequence such as {-2,-1,-1,0,1,0,1,2,2} and compute, for each observed value, the probability associated with each state on the trellis. For this example we will use the following, where $S_{i}$ denotes state $i$ and $X_{t}$ here denotes the value observed at frame $t$:

$P(S_{1} | X_{t} = -2) = 1.0$  
$P(S_{1} | X_{t} = -1) = 0.5$  
$P(S_{2} | X_{t} = -1) = 0.5$  
$P(S_{2} | X_{t} = 0) = 0.5$  
$P(S_{3} | X_{t} = 0) = 0.5$  
$P(S_{3} | X_{t} = 1) = 0.5$  
$P(S_{4} | X_{t} = 1) = 0.5$  
$P(S_{4} | X_{t} = 2) = 1.0$  

This yields the following result, with the probabilities placed in the circles of the trellis:

![VT2](./images/viterbiTrellis2.PNG)
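
Filling the circles of the trellis amounts to a table lookup. The sketch below assumes that any state and value pair not listed above has probability 0; the names `node_prob` and `trellis` are hypothetical.

```python
# (state, observed value) -> probability, copied from the list above.
node_prob = {
    (1, -2): 1.0, (1, -1): 0.5,
    (2, -1): 0.5, (2, 0): 0.5,
    (3, 0): 0.5, (3, 1): 0.5,
    (4, 1): 0.5, (4, 2): 1.0,
}

observations = [-2, -1, -1, 0, 1, 0, 1, 2, 2]

# One row per state, one column per frame.
# Assumption: pairs that are not listed get probability 0.
trellis = [
    [node_prob.get((state, value), 0.0) for value in observations]
    for state in (1, 2, 3, 4)
]

for state, row in zip((1, 2, 3, 4), trellis):
    print(f"State {state}: {row}")
```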

Next, we compute the path probabilities and choose the path with the highest probability. At each step, we multiply the probability of the path so far by the transition probability into the next state and by that state's probability for the observed value. Doing this, we get the following, where each step probability is shown in red and the best path in green:

![VT3](./images/viterbiTrellis3.PNG)
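
The path search can be sketched as a small dynamic program over the trellis. This is a minimal sketch under stated assumptions, not necessarily the exact procedure used to draw the figure: the model is entered at state 1, state and value pairs that were not listed have probability 0, and the exit from state 4 is ignored. The function name `viterbi` and all variable names are hypothetical.

```python
def viterbi(start, T, node_probs):
    """Return (best path as 1-based state numbers, probability of that path)."""
    n_states = len(start)
    n_frames = len(node_probs[0])
    # Best path probability ending in each state at the first frame.
    delta = [start[s] * node_probs[s][0] for s in range(n_states)]
    back_pointers = []
    for t in range(1, n_frames):
        new_delta, pointers = [], []
        for j in range(n_states):
            # Best predecessor of state j at frame t.
            best_i = max(range(n_states), key=lambda i: delta[i] * T[i][j])
            new_delta.append(delta[best_i] * T[best_i][j] * node_probs[j][t])
            pointers.append(best_i)
        delta = new_delta
        back_pointers.append(pointers)
    # Trace the best path backwards from the most probable final state.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [s + 1 for s in path], delta[last]


# Linear four-state model built from the frame counts 10, 5, 20, and 3.
durations = [10, 5, 20, 3]
T = [[0.0] * 4 for _ in range(4)]
for i, frames in enumerate(durations):
    T[i][i] = 1.0 - 1.0 / frames
    if i + 1 < 4:
        T[i][i + 1] = 1.0 / frames

node_prob = {(1, -2): 1.0, (1, -1): 0.5, (2, -1): 0.5, (2, 0): 0.5,
             (3, 0): 0.5, (3, 1): 0.5, (4, 1): 0.5, (4, 2): 1.0}
observations = [-2, -1, -1, 0, 1, 0, 1, 2, 2]
trellis = [[node_prob.get((s, x), 0.0) for x in observations] for s in (1, 2, 3, 4)]

path, prob = viterbi([1.0, 0.0, 0.0, 0.0], T, trellis)
print("Best path:", path)
print("Path probability:", prob)
```

For long sequences, the products become very small, so implementations often sum log probabilities instead of multiplying raw probabilities.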

## Visual Example: Recognizing "HEY" in Morse Code