## Part III Viterbi Algorithm
This algorithm outputs the most likely latent sequence given the observed data and the maximum likelihood estimates (MLE) of the parameters.
`myViterbi`:
- **Input**:

    – data: a T-by-1 sequence of observations
    
    – parameters: `mx`, `mz`, `w`, `A` and `B`:
    - `mx`: Count of distinct values X can take.
    - `mz`: Count of distinct values Z can take.
    - `w`: An mz-by-1 probability vector representing the initial distribution for $Z_1$.
    - `A`: The mz-by-mz transition probability matrix that models the progression from $Z_t$ to $Z_{t+1}$.
    - `B`: The mz-by-mx emission probability matrix, indicating how X is produced from Z.

- **Output**:

    – Z: A T-by-1 sequence where each entry is a number ranging from 1 to mz.
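
With these inputs, one common way to write the recursion on the log scale is sketched below. The symbols $\delta_t(i)$ (best log score of any path ending in state $i$ at time $t$), $x_t$ (the observation at time $t$), and $\hat{Z}_t$ (the decoded state) are notation assumed here for illustration and do not appear in the assignment text.

$$
\begin{aligned}
\delta_1(i) &= \log w(i) + \log B(i, x_1), \\
\delta_t(i) &= \max_{j} \big[ \delta_{t-1}(j) + \log A(j, i) \big] + \log B(i, x_t), \qquad t = 2, \dots, T, \\
\hat{Z}_T &= \arg\max_{i} \, \delta_T(i), \\
\hat{Z}_t &= \arg\max_{j} \big[ \delta_t(j) + \log A(j, \hat{Z}_{t+1}) \big], \qquad t = T-1, \dots, 1.
\end{aligned}
$$

The first two lines fill a T-by-mz table forward in time, and the last two lines read the most likely sequence off that table backward from $t = T$.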


#### Note on Calculations in Viterbi:

Many HMM computations multiply long sequences of probabilities, which produces extremely small values. These values can become so small that software like R or Python rounds them to zero (floating-point underflow). This is a particular problem for the Viterbi algorithm, which has to compare the magnitudes of such products: once two candidates have both underflowed to zero, they can no longer be distinguished. It is therefore advisable to evaluate these probabilities on a logarithmic scale in the Viterbi algorithm.
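
The underflow issue can be illustrated with a minimal sketch. The sequence length and the per-step probabilities below are made-up values chosen only to trigger underflow; they are not taken from the assignment data.

```python
import numpy as np

# Hypothetical per-step probabilities along two competing paths (illustrative values only).
T = 2000
p_path_a = np.full(T, 0.30)
p_path_b = np.full(T, 0.25)

# Direct products underflow to exactly 0.0 in double precision, so the two paths tie.
prod_a = np.prod(p_path_a)
prod_b = np.prod(p_path_b)

# Sums of logs stay finite and keep the ordering between the two paths.
log_a = np.sum(np.log(p_path_a))
log_b = np.sum(np.log(p_path_b))

print(prod_a == prod_b)   # the raw products cannot be told apart
print(log_a > log_b)      # the log scores still can
```

Because the logarithm is monotone, taking the maximum over log scores selects the same path as taking the maximum over the raw products would in exact arithmetic.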


#### Testing
Test your code with the provided data sequence: [Coding4_part2_data.txt]. Set `mz = 2` and start with the following initial values:
$$
    w = \begin{pmatrix}
        0.5 \\
        0.5
        \end{pmatrix}, \quad
    A = \begin{pmatrix}
        0.5 & 0.5\\
        0.5 & 0.5
        \end{pmatrix}, \quad
    B = \begin{pmatrix}
        1/9 & 3/9 & 5/9\\
        1/6 & 2/6 & 3/6
        \end{pmatrix}
$$
Run your implementation with **100** iterations.

In [1]:
import pandas as pd
import numpy as np

In [2]:
def viterbi(data, A, B, w,  mx=3, mz=2):
    T = len(data)
    delta = np.zeros((T, mz))

    log_A = np.log(A)
    log_B = np.log(B)
    log_w = np.log(w)


    # Compute delta; observations are coded 1..mx, so subtract 1 for 0-based indexing
    delta[0,:] = log_w  +  log_B[:, data[0]-1]

    for t in range(1, T):
        for i in range(mz):
            # -1 because python index starts from 0
            delta[t,i] = np.max(delta[t-1,:] + log_A[:,i]) + log_B[i, data[t]-1]

    # compute the most prob sequence Z
    Z = np.zeros(T).astype(int)

    # start from the end

    Z[T-1] = np.argmax(delta[T-1, :])

    for t in range(T-2, -1, -1):
        Z[t] = np.argmax(delta[t, :] + log_A[:, Z[t+1]])    

    # +1 to convert 0-based Python indices back to state labels 1..mz
    return Z + 1

In [3]:
data = np.squeeze(pd.read_csv('Coding4_part2_data.txt', header=None).values)

A_bw = np.array([[0.49793938, 0.50206062],
            [0.44883431, 0.55116569]])

B_bw = np.array([[0.22159897, 0.20266127, 0.57573976],
               [0.34175148, 0.17866665, 0.47958186]])

w = np.array([0.5, 0.5])
Z_est = viterbi(data, A_bw, B_bw, w)
print(Z_est)

[1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1 2 1 1
 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 2 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
 1 1 1 2 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
 2 2 2 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 1 1 1 1 1
 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1]


In [4]:
Z_act_list = np.squeeze(pd.read_csv('Coding4_part2_Z.txt',header=None).values)
Z_act_str_list = []
for z_i in Z_act_list:
    z_i_str_list = z_i.strip().split(" ")
    Z_act_str_list.extend(z_i_str_list)

Z_act = np.array([int(z) for z in Z_act_str_list])
print(Z_act)

[1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1 2 1 1
 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 2 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
 1 1 1 2 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
 2 2 2 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 1 1 1 1 1
 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1]


In [5]:
print((Z_act == Z_est).all())

True
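
Beyond agreement with the provided sequence, a Viterbi implementation can also be checked against brute-force enumeration on a tiny model. The sketch below is self-contained and uses hypothetical names (`viterbi_log`, `joint_log_prob`) together with made-up toy parameters and observations; unlike the function above, it assumes 0-based observation symbols and returns 0-based states. It compares joint log-probabilities rather than the paths themselves, so ties between equally likely paths do not cause false failures.

```python
import itertools
import numpy as np

def viterbi_log(obs, w, A, B):
    """Log-domain Viterbi; obs holds 0-based symbols, returns 0-based states."""
    T, mz = len(obs), len(w)
    log_w, log_A, log_B = np.log(w), np.log(A), np.log(B)
    delta = np.zeros((T, mz))
    delta[0] = log_w + log_B[:, obs[0]]
    for t in range(1, T):
        # Rows index the previous state, columns the current state.
        delta[t] = np.max(delta[t - 1][:, None] + log_A, axis=0) + log_B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = np.argmax(delta[t] + log_A[:, path[t + 1]])
    return path

def joint_log_prob(path, obs, w, A, B):
    """log P(Z = path, X = obs) under the HMM."""
    lp = np.log(w[path[0]]) + np.log(B[path[0], obs[0]])
    for t in range(1, len(obs)):
        lp += np.log(A[path[t - 1], path[t]]) + np.log(B[path[t], obs[t]])
    return lp

# Toy parameters and observations, made up for this check only.
w = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = np.array([0, 2, 1, 2, 2, 0, 1, 0])

# Enumerate all 2^8 state sequences and keep the best joint log-probability.
best_brute = max(
    joint_log_prob(np.array(p), obs, w, A, B)
    for p in itertools.product(range(2), repeat=len(obs))
)
best_viterbi = joint_log_prob(viterbi_log(obs, w, A, B), obs, w, A, B)
print(np.isclose(best_brute, best_viterbi))
```

Enumeration costs mz^T evaluations, so this kind of check is only practical for very short sequences; it is meant as a sanity test, not as a replacement for the dynamic program.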
