In [1]:
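# Assumes an older hmmlearn release in which MultinomialHMM emits one categorical
# symbol per step; newer releases provide CategoricalHMM for that behavior.
from hmmlearn.hmm import MultinomialHMM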
import numpy as np
from collections import Counter

In [2]:
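# State 0 is the fair die, state 1 is the unfair die.
model = MultinomialHMM(n_components=2)
model.startprob_ = np.array([0.5, 0.5])
# Row i holds the probabilities of moving from state i to states 0 and 1.
model.transmat_ = np.array([[11/12, 1/12],
                            [19/20, 1/20]])
# Column k is the probability of emitting symbol k (die face k + 1).
model.emissionprob_ = np.array([[1/6, 1/6, 1/6, 1/6, 1/6, 1/6],
                                [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]])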

# a)

In [3]:
results = model.sample(100000)
c = Counter(results[0].flatten())
for i in range(len(c)):
    print(str(i) + ': ' + str(c[i] / 100000))

0: 0.19359
1: 0.16213
2: 0.16031
3: 0.16192
4: 0.16149
5: 0.16056


I approached the first problem by drawing a finite-state machine (FSM) with two states. The drawing is provided in the PDF along with the transition and emission probabilities. After modeling this with hmmlearn's MultinomialHMM class, I sampled 100000 symbols from the model. The probability that the emitted symbol is a 1 (index 0 in the output above) is approximately 0.193, which is slightly higher than 1/6 ≈ 0.167. Each of the other faces has a probability of about 0.161, slightly below 1/6, because the unfair die emits each of them with probability only 0.1.
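As a sanity check on the sampled frequencies, the long-run symbol probabilities can also be computed exactly. The sketch below is my own addition, not part of the original solution. It uses only the standard library and assumes the chain has reached its stationary distribution, which is a reasonable approximation for 100000 samples even though the start probabilities are 0.5/0.5. The names `p`, `q`, `pi`, `emission`, and `marginal` are illustrative.

```python
from fractions import Fraction as F

# Transition probabilities from the model above (state 0 = fair, state 1 = unfair).
p = F(1, 12)   # P(fair -> unfair)
q = F(19, 20)  # P(unfair -> fair)

# Stationary distribution of a two-state Markov chain.
pi = [q / (p + q), p / (p + q)]

# Emission probabilities, one row per state, one column per symbol.
emission = [[F(1, 6)] * 6,
            [F(1, 2)] + [F(1, 10)] * 5]

# Long-run probability of each symbol: sum over states of pi[s] * emission[s][k].
marginal = [sum(pi[s] * emission[s][k] for s in range(2)) for k in range(6)]

for k, prob in enumerate(marginal):
    print(f"{k}: {prob} = {float(prob):.5f}")
# 0: 6/31 = 0.19355
# 1: 5/31 = 0.16129  (and likewise for symbols 2 through 5)
```

The exact values 6/31 and 5/31 agree with the sampled frequencies above to about three decimal places.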

# b)

In [4]:
num_list = [10, 100, 1000, 10000]

for num_symbols in num_list:
    samples = np.empty((10, num_symbols), dtype=np.int8)
    true_seq = np.empty((10, num_symbols), dtype=np.int8)
    for i in range(10):
        result = model.sample(num_symbols)
        samples[i,:], true_seq[i,:] = (result[0].ravel(), result[1].ravel())

    accuracies = np.empty(10, dtype=np.float32)

    for i in range(10):
        predict_seq = model.decode(samples[i,:].reshape(-1,1))[1].ravel()
        accuracies[i] = np.equal(predict_seq, true_seq[i,:]).astype(int).sum() / num_symbols
        
    print(str(num_symbols) + ': ' + str(accuracies.mean()))

10: 0.90999997
100: 0.919
1000: 0.92059994
10000: 0.92021


For this problem, I simulated sequences of 10, 100, 1000, and 10000 draws, 10 times each, and used MultinomialHMM.decode to predict the hidden state sequence from each sample. With the predicted and true states in hand, I computed the fraction of positions where they agree. I recorded this accuracy for each of the 10 trials and averaged them. Drawing longer sequences did not improve accuracy: the mean stayed between roughly 0.91 and 0.92 for every sequence length.
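One possible explanation for the flat accuracy is that, with these parameters, the most likely state path almost never leaves the fair state, so decoding scores about as well as always guessing "fair" (the chain spends 57/62 ≈ 0.919 of its time there in the long run). The sketch below is my own standard-library reimplementation, not hmmlearn's code. It assumes `decode` uses Viterbi decoding, which is its default algorithm; `sample` and `viterbi` are illustrative helpers.

```python
import math
import random

# Parameters copied from the model definition above.
START = [0.5, 0.5]
TRANS = [[11/12, 1/12],
         [19/20, 1/20]]
EMIT = [[1/6] * 6,
        [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]]


def sample(n, rng):
    """Draw n symbols and the hidden states that produced them."""
    symbols, states = [], []
    state = rng.choices([0, 1], weights=START)[0]
    for _ in range(n):
        states.append(state)
        symbols.append(rng.choices(range(6), weights=EMIT[state])[0])
        state = rng.choices([0, 1], weights=TRANS[state])[0]
    return symbols, states


def viterbi(symbols):
    """Return the most likely state path, computed in log space."""
    n_states = len(START)
    score = [math.log(START[s]) + math.log(EMIT[s][symbols[0]])
             for s in range(n_states)]
    backptrs = []
    for sym in symbols[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor for ending in state s at this step.
            best = max(range(n_states),
                       key=lambda prev: score[prev] + math.log(TRANS[prev][s]))
            ptr.append(best)
            new_score.append(score[best] + math.log(TRANS[best][s])
                             + math.log(EMIT[s][sym]))
        score = new_score
        backptrs.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


rng = random.Random(0)
symbols, states = sample(10000, rng)
path = viterbi(symbols)

viterbi_acc = sum(a == b for a, b in zip(path, states)) / len(states)
baseline_acc = states.count(0) / len(states)  # always predict the fair state
print("viterbi:", viterbi_acc)
print("always fair:", baseline_acc)
```

In this sketch the decoded path can differ from the all-fair guess only at the first position, so the two accuracies are nearly identical. If hmmlearn's decoder behaves the same way, longer sequences cannot raise the accuracy above the fraction of time the chain spends in the fair state.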