## Coding Assignment 4

Team:
- Olivia Dalglish (od4)
- Arindam Saha (saha2)

Contribution:

- Olivia: Part I
- Arindam: Part II

In addition to the above, we discussed our approaches and checked each other's work.

### Part I: Gaussian Mixtures

In [1]:
import numpy as np

### Part II: HMM

##### Baum-Welch Algorithm

In [2]:
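def BW_onestep(data, mx, mz, w, A, B):
    # Switching data to be 0-indexed
    data = data - 1
    n = len(data)

    # Forward pass: alpha[t, i] = P(x[0..t], Z_t = i), left unnormalized
    alpha = np.zeros((n, mz))
    alpha[0] = w * B[:, data[0]]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ A) * B[:, data[t]]

    # Backward pass: beta[t, i] = P(x[t+1..n-1] | Z_t = i)
    beta = np.zeros((n, mz))
    beta[n - 1] = np.ones(mz)
    for t in range(n - 2, -1, -1):
        beta[t] = A @ (B[:, data[t + 1]] * beta[t + 1])

    # gamma[t, i, j] is proportional to P(Z_t = i, Z_{t+1} = j | x); the common
    # factor 1 / P(x) is omitted because it cancels in the ratios below
    gamma = np.zeros((n - 1, mz, mz))
    for t in range(n - 1):
        gamma[t] = alpha[t][:, np.newaxis] * (A * (B[:, data[t + 1]] * beta[t + 1]))

    # Update A: expected i -> j transition counts, normalized over j
    gamma_plus = np.sum(gamma, axis=0)
    A_next = gamma_plus / np.sum(gamma_plus, axis=1)[:, np.newaxis]

    # gamma_ti[t, i] is proportional to P(Z_t = i | x); the last time step has
    # no outgoing transition, so it uses alpha[n - 1] (beta[n - 1] is all ones)
    gamma_ti = np.vstack((np.sum(gamma, axis=2), alpha[n - 1]))
    # Update B: expected count of state i emitting symbol l, normalized per state
    B_next = np.zeros(B.shape)
    for l in range(mx):
        t_idxs = np.where(data == l)[0]
        B_next[:, l] = np.sum(gamma_ti[t_idxs], axis=0) / np.sum(gamma_ti, axis=0)

    return A_next, B_next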


def myBW(data, mx, mz, w, A, B, niter):
    for _ in range(niter):
        A, B = BW_onestep(data, mx, mz, w, A, B)
    return A, B
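
The recursions in `BW_onestep` use unnormalized `alpha` and `beta`, which works for this data set but would underflow to zero on much longer sequences. A minimal sketch of a scaled forward pass that returns the log-likelihood instead, assuming the same `w`, `A`, `B` conventions and 1-indexed data; `forward_loglik` is a hypothetical helper and the toy parameters are made up for illustration, not taken from the assignment:

```python
import numpy as np

def forward_loglik(data, w, A, B):
    """Log-likelihood log P(x) via a scaled forward pass.

    Hypothetical helper, not part of the assignment; `data` is 1-indexed
    like the input to BW_onestep.
    """
    obs = np.asarray(data) - 1
    alpha = w * B[:, obs[0]]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        c = alpha.sum()        # scaling constant P(x_t | x_0..x_{t-1})
        loglik += np.log(c)
        alpha = alpha / c      # renormalize so alpha sums to 1
    return loglik

# Toy parameters chosen only for illustration
w = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
rng = np.random.default_rng(0)
long_data = rng.integers(1, 4, size=5000)  # symbols in {1, 2, 3}
print(forward_loglik(long_data, w, A, B))  # finite, although P(x) itself underflows
```

Tracking this quantity across the iterations of `myBW` would be one way to confirm that the likelihood never decreases.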

##### Viterbi Algorithm

In [3]:
def myViterbi(data, mx, mz, w, A, B):
    # Switching data to be 0-indexed
    data = data - 1

    # Evaluating probabilities on a logarithmic scale so that very small
    # probability values do not underflow and can be compared correctly
    w = np.log(w)
    A = np.log(A)
    B = np.log(B)

    n = len(data)
    delta = np.zeros((n, mz))
    delta[0] = w + B[:, data[0]]
    for t in range(1, n):
        for i in range(mz):
            delta[t, i] = np.max(delta[t - 1] + A[:, i]) + B[i, data[t]]

    Z = np.zeros(n, dtype=int)
    Z[n - 1] = np.argmax(delta[n - 1])
    for t in range(n - 2, -1, -1):
        Z[t] = np.argmax(delta[t] + A[:, Z[t + 1]])

    # Switching Z to be 1-indexed
    return Z + 1
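
One way to gain confidence in the recursion and backtracking is to compare them with exhaustive search on a tiny problem. The sketch below uses toy parameters that are not from the assignment, and `path_logprob`, `brute_force_best`, and `viterbi_log` are hypothetical names; `viterbi_log` restates the recursion of `myViterbi` for 0-indexed observations so that the snippet runs on its own.

```python
import itertools
import numpy as np

def path_logprob(path, obs, w, A, B):
    """Joint log-probability of a 0-indexed state path and the observations."""
    lp = np.log(w[path[0]]) + np.log(B[path[0], obs[0]])
    for t in range(1, len(obs)):
        lp += np.log(A[path[t - 1], path[t]]) + np.log(B[path[t], obs[t]])
    return lp

def brute_force_best(obs, w, A, B):
    """Enumerate all mz**n state paths; only feasible for very small n."""
    mz, n = len(w), len(obs)
    return max(itertools.product(range(mz), repeat=n),
               key=lambda p: path_logprob(p, obs, w, A, B))

def viterbi_log(obs, w, A, B):
    """Same log-space recursion as myViterbi, for 0-indexed observations."""
    lw, lA, lB = np.log(w), np.log(A), np.log(B)
    n, mz = len(obs), len(w)
    delta = np.zeros((n, mz))
    delta[0] = lw + lB[:, obs[0]]
    for t in range(1, n):
        # entry [j, i] of the sum is delta[t - 1, j] + log A[j, i]
        delta[t] = np.max(delta[t - 1][:, None] + lA, axis=0) + lB[:, obs[t]]
    Z = np.zeros(n, dtype=int)
    Z[-1] = np.argmax(delta[-1])
    for t in range(n - 2, -1, -1):
        Z[t] = np.argmax(delta[t] + lA[:, Z[t + 1]])
    return Z

# Toy parameters chosen only for illustration
w = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = np.array([0, 1, 2, 2, 0, 1])

best = brute_force_best(obs, w, A, B)
Z = viterbi_log(obs, w, A, B)
# Compare scores rather than paths so that ties cannot cause a false alarm
print(np.isclose(path_logprob(Z, obs, w, A, B), path_logprob(best, obs, w, A, B)))
```

Exhaustive search costs `mz**n` evaluations, so this check is only practical for a handful of time steps.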

##### Testing

##### 1.

In [4]:
data = np.loadtxt('Coding4_part2_data.txt').astype(int)

mx = 3
mz = 2

w = np.array([0.5, 0.5])
A_init = np.array([
    [0.5, 0.5],
    [0.5, 0.5],
])
B_init = np.array([
    [1/9, 3/9, 5/9],
    [1/6, 2/6, 3/6],
])

A, B = myBW(data, mx, mz, w, A_init, B_init, 100)

print(f'A: the 2-by-2 transition matrix:\n{A}\n')
print(f'B: the 2-by-3 emission matrix:\n{B}\n')


Z = myViterbi(data, mx, mz, w, A, B)
print(f'Z: {Z}')

with open('Coding4_part2_Z.txt') as fh:
    expected_Z = np.fromstring(fh.read().strip(), dtype=int, sep=' ')

print(f'matches: {np.array_equal(Z, expected_Z)}')


A: the 2-by-2 transition matrix:
[[0.49793938 0.50206062]
 [0.44883431 0.55116569]]

B: the 2-by-3 emission matrix:
[[0.22159897 0.20266127 0.57573976]
 [0.34175148 0.17866665 0.47958186]]

Z: [1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1 2 1 1
 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 2 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
 1 1 1 2 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
 2 2 2 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 1 1 1 1 1
 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1]
matches: True


As the output above shows, our Viterbi path `Z` matches the expected sequence in `Coding4_part2_Z.txt` exactly.

##### 2.

In [5]:
B_init_same = np.array([
    [1/3, 1/3, 1/3],
    [1/3, 1/3, 1/3],
])
A_20, B_20 = myBW(data, mx, mz, w, A_init, B_init_same, 20)
A_100, B_100 = myBW(data, mx, mz, w, A_init, B_init_same, 100)

print(f'A_20: the 2-by-2 transition matrix:\n{A_20}\n')
print(f'B_20: the 2-by-3 emission matrix:\n{B_20}\n')

print(f'A_100: the 2-by-2 transition matrix:\n{A_100}\n')
print(f'B_100: the 2-by-3 emission matrix:\n{B_100}\n')

A_20: the 2-by-2 transition matrix:
[[0.5 0.5]
 [0.5 0.5]]

B_20: the 2-by-3 emission matrix:
[[0.285 0.19  0.525]
 [0.285 0.19  0.525]]

A_100: the 2-by-2 transition matrix:
[[0.5 0.5]
 [0.5 0.5]]

B_100: the 2-by-3 emission matrix:
[[0.285 0.19  0.525]
 [0.285 0.19  0.525]]



In [6]:
np.mean(data == 1), np.mean(data == 2), np.mean(data == 3)

(0.285, 0.19, 0.525)

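We can see above that when both latent states start with identical emission rows and a uniform transition matrix, the states are indistinguishable and Baum-Welch cannot separate them: `A` stays at its initial value, and each row of `B` equals the empirical frequency of each symbol in the data (0.285, 0.19, 0.525), with no change between 20 and 100 iterations. This makes sense because the model is then equivalent to a single state that emits according to the frequencies found in the data.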
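
This fixed point can also be checked by hand. The short derivation below assumes the initialization used above ($w_i = 1/2$, $A_{ij} = 1/2$, and identical emission rows $B_i(l) = b(l)$). The symbols $\gamma_t(i)$ and $\xi_t(i, j)$ are the usual posterior state and transition probabilities; this notation is introduced here and, up to normalization, corresponds to `gamma_ti` and `gamma` in `BW_onestep`. Because the observations carry no information about the states, the posterior equals the prior:

$$
\gamma_t(i) = P(Z_t = i \mid x) = \tfrac{1}{2}, \qquad
\xi_t(i, j) = P(Z_t = i, Z_{t+1} = j \mid x) = \tfrac{1}{4}.
$$

Substituting into the Baum-Welch updates gives

$$
A^{\text{new}}_{ij} = \frac{\sum_{t=1}^{n-1} \xi_t(i, j)}{\sum_{t=1}^{n-1} \sum_{j'} \xi_t(i, j')} = \frac{(n-1)/4}{(n-1)/2} = \frac{1}{2}, \qquad
B^{\text{new}}_i(l) = \frac{\sum_{t:\, x_t = l} \gamma_t(i)}{\sum_{t=1}^{n} \gamma_t(i)} = \frac{\#\{t : x_t = l\}}{n}.
$$

The updated parameters again have a uniform $A$ and identical rows in $B$, so every later iteration returns the same values, which is consistent with `A_20` and `B_20` being equal to `A_100` and `B_100`.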