In [7]:
import pandas as pd
import numpy as np
import os 


### Hidden Markov Model: parameters
A Markov process with unobservable (hidden) states is referred to as a Hidden Markov Model (HMM). Explicitly, such a process evolves through time or space with state dependency defined by the Markov property, $P(Z_{t+1} | Z_t) = P(Z_{t+1} | Z_t, \dots, Z_0)$. The variable $Z_k$ is interpreted as the hidden state of the sequence at time k. As this hidden process evolves, the hidden state $Z_k$ generates an observable variable $X_k$, the observation of the process at time k. It is assumed that each observed variable depends only on the current hidden state and is not influenced by previous observations.

Distilling the above concepts, a Hidden Markov Model is completely defined by three components: 
<br>
    &emsp; **Initial distribution**, a probability distribution across all initial/starting states: 
        $ \pi_i $
<br>
    &emsp; A **(hidden) state transition matrix**, which models how the process evolves from one hidden state to the next. Again, the Markov property is assumed, whereby the hidden state at time t+1, $Z_{t+1}$, depends only on the hidden state at time t, $Z_t$: 
        $ P(Z_{t+1} = j |Z_{t} = i) $ 
<br>
    &emsp; Finally, the **emission matrix**, which defines the probability distribution of the observation $X_t$ given the hidden state $Z_t$:
        $ P(X_t = k |Z_t = j) $ 
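
As a minimal sketch of how the three components fit together, the block below writes them as NumPy arrays and samples a short sequence from them. The names `pi`, `A`, `B`, `is_valid_hmm` and `sample_hmm`, and every probability value, are assumptions made for illustration; they are not the parameters stored in this notebook's JSON file.

```python
import numpy as np

# Hypothetical parameters for a two-state dice HMM (illustrative values only).
states = ["fair", "loaded"]
pi = np.array([0.5, 0.5])            # pi[i]   = P(first hidden state = i)
A = np.array([[0.95, 0.05],          # A[i, j] = P(Z_{t+1} = j | Z_t = i)
              [0.10, 0.90]])
B = np.array([[1 / 6] * 6,           # B[j, k] = P(X_t = face k+1 | Z_t = j)
              [0.1] * 5 + [0.5]])


def is_valid_hmm(pi, A, B):
    """Check that pi and every row of A and B are probability distributions."""
    return bool(np.isclose(pi.sum(), 1.0)
                and np.allclose(A.sum(axis=1), 1.0)
                and np.allclose(B.sum(axis=1), 1.0)
                and min(pi.min(), A.min(), B.min()) >= 0)


def sample_hmm(T, pi, A, B, seed=0):
    """Draw hidden states z and observations x (both as indices) for T steps."""
    rng = np.random.default_rng(seed)
    z = np.empty(T, dtype=int)
    x = np.empty(T, dtype=int)
    for t in range(T):
        # The first state comes from pi, later states from the row of A for z[t-1]
        state_dist = pi if t == 0 else A[z[t - 1]]
        z[t] = rng.choice(len(pi), p=state_dist)
        # The observation depends only on the current hidden state
        x[t] = rng.choice(B.shape[1], p=B[z[t]])
    return z, x


assert is_valid_hmm(pi, A, B)
hidden, rolls = sample_hmm(10, pi, A, B)
print([states[i] for i in hidden])
print("".join(str(k + 1) for k in rolls))
```

Storing each conditional distribution as a row makes the "rows sum to one" check a single `sum(axis=1)` call, and lets a row be passed directly as the `p` argument when sampling.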
 
 
Once an HMM is learned, the aim is to infer the hidden states that generated the observations. Being able to calculate the different aspects of this hidden sequence (online and offline belief state prediction, and the most probable sequence, covered later) provides a powerful predictive toolset. Because of this predictive power and their natural ability to model sequential dependencies, HMMs have applications in speech and handwriting recognition, genetics, natural language processing and reinforcement learning. Each application is really concerned with the inverse of the distributions defined above. At inference time the objective is the hidden state given the observed data (the resulting distribution is also termed the belief state), $P(Z_t = j|X_t=i)$. A derivation of this distribution is provided below, though it is simply an application of all three components: *initial distribution*, *state transition matrix* and *emission matrix*.

 
 
### Major inference methods
HMM inference has numerous applications and derivations; at the heart of every HMM inferential task is inferring the hidden state sequence, given both the observed data and estimates of the above parameters.

#### Online vs offline
An important distinction between HMM inference methods is whether they are _online_ or _offline_. _Online_ methods can be used while data is still being collected, which is useful for predicting the current hidden state (_filtering_, covered below). _Offline_ inference requires the sequence to be _complete_, i.e. all the data has been collected up to time T, the end of the sequence. At the cost of losing real-time prediction, _offline_ algorithms gain the benefit of applying hindsight to predictions earlier in the chain, decreasing the overall uncertainty.

__Filtering__: a method that derives the current _belief state_ over the _hidden state_ given the current and past _observed data_. Explicitly, the filtering distribution is written as <br> $$ P(Z_t |X_{1:t})$$ 
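
Because filtering is online, each new observation only requires the previous belief state. A minimal sketch of one such update follows; `filter_step`, the arrays `pi`, `A`, `B` and their values are hypothetical and chosen for illustration, with observations encoded as zero-based indices (face 6 is index 5).

```python
import numpy as np

# Hypothetical two-state parameters (illustrative values only).
pi = np.array([0.5, 0.5])                        # initial distribution
A = np.array([[0.95, 0.05], [0.10, 0.90]])       # A[i, j] = P(Z_t = j | Z_{t-1} = i)
B = np.array([[1 / 6] * 6, [0.1] * 5 + [0.5]])   # B[j, k] = P(X_t = k | Z_t = j)


def filter_step(belief, obs_index, A, B):
    """Update P(Z_{t-1} | X_{1:t-1}) to P(Z_t | X_{1:t}) for one new observation."""
    predicted = belief @ A                      # predict: P(Z_t | X_{1:t-1})
    unnormalised = B[:, obs_index] * predicted  # weight by the emission likelihood
    return unnormalised / unnormalised.sum()    # normalise to a distribution


# Usage: process rolls one at a time, as they would arrive in real time.
rolls = [5, 5, 3]                               # the faces 6, 6, 4 as indices
belief = pi * B[:, rolls[0]]                    # first step uses pi, no transition
belief = belief / belief.sum()
for roll in rolls[1:]:
    belief = filter_step(belief, roll, A, B)
print(belief)
```

Only the latest belief vector is kept between steps, so the memory cost does not grow with the length of the sequence.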


__Smoothing__: a method to compute the belief over hidden states _retrospectively_. _Smoothing_ calculates the distribution 

<br> $$ P(Z_t | X_{1:T}) $$

where T is the final time step of the sequence. Using the entire sequence of data to _retrospectively_ update beliefs about hidden states reduces the overall uncertainty: since the current hidden state affects future observations, those future observations provide evidence about the current hidden state. 
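
One common way to compute the smoothing distribution is the forward-backward algorithm, sketched below under stated assumptions: the function name `forward_backward`, the array layout of `pi`, `A`, `B` and all probability values are hypothetical, and observations are zero-based indices. It is an illustration of the idea, not necessarily the approach taken later in this notebook.

```python
import numpy as np

# Hypothetical two-state parameters (illustrative values only).
pi = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05], [0.10, 0.90]])       # A[i, j] = P(Z_{t+1} = j | Z_t = i)
B = np.array([[1 / 6] * 6, [0.1] * 5 + [0.5]])   # B[j, k] = P(X_t = k | Z_t = j)


def forward_backward(obs, pi, A, B):
    """Return filtered beliefs P(Z_t | X_{1:t}) and smoothed beliefs P(Z_t | X_{1:T})."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))

    # Forward pass: normalised filtering distributions
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)
        alpha[t] /= alpha[t].sum()

    # Backward pass: evidence from future observations (rescaled for stability)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Combine past and future evidence, then normalise each time step
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return alpha, gamma


obs = [int(c) - 1 for c in "664153"]
filtered, smoothed = forward_backward(obs, pi, A, B)
print(smoothed)
```

At the final time step there are no future observations, so the smoothed and filtered beliefs coincide; earlier steps differ because smoothing also uses the rolls that came afterwards.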

In [9]:
# Read in emission and transition probability matrices
HMM_PARAMS = pd.read_json("hmm_forward_alg_params.json")
HMM_EMISSION = HMM_PARAMS['emissions']
HMM_INITIAL = HMM_PARAMS['initial']
HMM_TRANSITION = HMM_PARAMS['transition']

Index(['fair', 'loaded'], dtype='object')

#### The forward algorithm
The forward algorithm is useful for deriving the current _belief state_ over $Z_t$ given observations up to time _t_. 
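
As a sketch of the recursion behind the forward algorithm, write $\alpha_t(j)$ for the unnormalised belief in hidden state $j$ at time $t$ (the symbol $\alpha$ is shorthand introduced here, and the first time step is indexed as 1). The first step combines the initial distribution with the emission probability of the first observation:

$$ \alpha_1(j) = P(X_1 | Z_1 = j) \, \pi_j $$

For later steps, Bayes' rule gives $P(Z_t = j | X_{1:t}) \propto P(X_t | Z_t = j, X_{1:t-1}) \, P(Z_t = j | X_{1:t-1})$. The observation depends only on the current hidden state, and the Markov property lets the second factor be expanded over the previous hidden state:

$$ \alpha_t(j) = P(X_t | Z_t = j) \sum_i P(Z_t = j | Z_{t-1} = i) \, P(Z_{t-1} = i | X_{1:t-1}) $$

Normalising over the hidden states yields the filtering distribution:

$$ P(Z_t = j | X_{1:t}) = \frac{\alpha_t(j)}{\sum_{j'} \alpha_t(j')} $$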

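Define the loaded casino game: each roll comes from either a _fair_ or a _loaded_ die (the hidden state), and the observed sequence of rolls is: 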
    '664153216162115234653214356634261655234232315142464156663246'

In [10]:
ROLL_SEQUENCE = '664153216162115234653214356634261655234232315142464156663246'
observed_sequence = ROLL_SEQUENCE
initial_observation = observed_sequence[0]

initial_dist = HMM_INITIAL

In [11]:
initial_observation = observed_sequence[0]
alpha_1 = [ HMM_EMISSION[i][initial_observation] * HMM_INITIAL[i] for i in initial_dist.keys()]
alpha_1

[0.074999999999999997, 0.25]

In [94]:
def forward_dist(observed_sequence, initial_dist, transition, emission):
    """Return the normalised belief state P(Z_t | X_1:t) for every time step t."""
    initial_observation = observed_sequence[0]
    hidden_states = list(initial_dist.keys())

    alpha_1_to_t = []

    # Calculate alpha for initial observation: emission probability * initial probability
    # (uses the function arguments rather than the global HMM_* variables)
    alpha_1 = [emission[i][initial_observation] * initial_dist[i] for i in hidden_states]
    alpha_1_normalised = np.array(alpha_1) / sum(alpha_1)

    alpha_1_to_t.append(alpha_1_normalised)
    alpha_t_normalised = alpha_1_normalised

    # Iterate belief state calculation through remaining sequence
    for observation in observed_sequence[1:]:
        alpha_t = []
        for j in hidden_states:
            # Sum over previous states i, weighting each by the transition i -> j
            temp = 0
            for index, i in enumerate(hidden_states):
                temp += emission[j][observation] * transition[i][j] * alpha_t_normalised[index]
            alpha_t.append(temp)

        alpha_t_normalised = normalise_array(alpha_t)

        # Append result to 'belief state' list for time t
        alpha_1_to_t.append(alpha_t_normalised)

    # Return 'belief state' list
    return alpha_1_to_t




def normalise_array(array):
    """Return `array` (list or ndarray) as a NumPy array scaled to sum to 1."""
    array = np.asarray(array, dtype=float)
    return array / array.sum()
    
    
    
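def binary_softmax(tuple_sequence, labels=(0, 1), threshold=0.5):
    """
    Return a list labelling each belief state in the sequence: labels[0] if the
    probability of the first state exceeds `threshold`, otherwise labels[1].
    Despite the name, this is a hard threshold rather than a softmax.
    """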
    binary_list = [ ]
    
    for i in tuple_sequence: 
        if i[0] > threshold:
            binary_list.append(labels[0])
        else: 
            binary_list.append(labels[1])
    
    return binary_list

In [84]:
belief_state_seq = forward_dist(ROLL_SEQUENCE, HMM_INITIAL, HMM_TRANSITION, HMM_EMISSION)

In [98]:
print(binary_softmax(belief_state_seq, labels = ["fair","loaded"]))
print(ROLL_SEQUENCE)
for i in belief_state_seq: 
    print(i)

['loaded', 'loaded', 'loaded', 'loaded', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'loaded', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'loaded', 'fair', 'fair', 'fair', 'fair', 'fair', 'loaded', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'fair', 'loaded', 'loaded', 'loaded', 'fair', 'fair', 'loaded']
664153216162115234653214356634261655234232315142464156663246
