# Using hmmlearn to fit an HMM

## Modeling numerical values
The basic HMM can use different types of emission probabilities. In this example we adopt a simple Gaussian emission model (GaussianHMM): in each state, the observations are drawn from a Gaussian distribution with a state-specific mean and variance (either given as input or learned from the data). Alternatives for numerical sequence data include mixtures of Gaussians (GMMHMM).
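The generative story behind a 1-D Gaussian HMM can be sketched in a few lines of NumPy: the hidden state evolves as a Markov chain, and each observation is drawn from the Gaussian of the current state. This is an illustration only. The helper name `sample_gaussian_hmm` and all parameter values are hypothetical. Once a model is fitted, hmmlearn's own `sample` method does this for you.

```python
import numpy as np

def sample_gaussian_hmm(startprob, transmat, means, stds, n_steps, seed=0):
    """Draw (observations, states) from a 1-D Gaussian HMM."""
    rng = np.random.default_rng(seed)
    n_states = len(startprob)
    states = np.empty(n_steps, dtype=int)
    obs = np.empty(n_steps)
    for t in range(n_steps):
        # The hidden state follows a Markov chain: start distribution first, then rows of transmat
        p = startprob if t == 0 else transmat[states[t - 1]]
        states[t] = rng.choice(n_states, p=p)
        # Given the state, the observation is Gaussian with that state's mean and std
        obs[t] = rng.normal(means[states[t]], stds[states[t]])
    return obs, states

# Hypothetical parameters: a "cold" and a "warm" regime that tend to persist
startprob = np.array([0.5, 0.5])
transmat = np.array([[0.95, 0.05],
                     [0.10, 0.90]])
means = np.array([10.0, 25.0])
stds = np.array([1.0, 2.0])
obs, states = sample_gaussian_hmm(startprob, transmat, means, stds, n_steps=200)
```

Fitting a GaussianHMM solves the inverse problem: it sees only `obs` and estimates `startprob`, `transmat`, the means and the variances.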

Taken from https://waterprogramming.wordpress.com/2018/07/03/fitting-hidden-markov-models-part-ii-sample-python-script/

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from hmmlearn.hmm import GaussianHMM, GMMHMM
plt.rcParams["figure.figsize"] = (20, 3)

# Load the series as a 1-D NumPy array
ff = np.loadtxt('water_temp_ok_2.txt')

plt.plot(ff)
plt.grid()
plt.show()

In [None]:
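# hmmlearn expects a 2-D array of shape (n_samples, n_features): here a single feature
dataset = ff.reshape(-1, 1)
# random_state fixes the parameter initialization, so repeated runs give the same fit
model = GaussianHMM(n_components=5, n_iter=1000, random_state=0).fit(dataset)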
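`fit` runs the Baum-Welch (EM) algorithm, which iteratively increases the log-likelihood of the observed series under the model. hmmlearn reports that quantity via `model.score(dataset)`. The sketch below computes the same kind of log-likelihood with a scaled forward pass for a 1-D Gaussian HMM. It is an illustration, not hmmlearn's implementation. `forward_loglik` and `gaussian_logpdf` are hypothetical helper names.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x (broadcasts)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def forward_loglik(x, startprob, transmat, means, variances):
    """log p(x_1..x_T) for a 1-D Gaussian HMM, via a scaled forward pass."""
    log_b = gaussian_logpdf(x[:, None], means[None, :], variances[None, :])  # (T, K)
    shift = log_b.max(axis=1)            # per-step shift to avoid underflow in exp
    b = np.exp(log_b - shift[:, None])   # emissions rescaled so each row's max is 1
    loglik = 0.0
    alpha = startprob * b[0]
    for t in range(len(x)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emissions
            alpha = (alpha @ transmat) * b[t]
        c = alpha.sum()                  # scaling factor for step t
        alpha = alpha / c
        loglik += np.log(c) + shift[t]   # undo both the scaling and the shift
    return loglik
```

Assuming the default diagonal covariance, `forward_loglik(ff, model.startprob_, model.transmat_, model.means_[:, 0], model.covars_[:, 0, 0])` should agree with `model.score(dataset)` up to rounding. This relies on `covars_` being exposed as full matrices, as in recent hmmlearn releases.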

In [None]:
print("Initial state probabilities:\n", model.startprob_)
print()
print("Mean value for each state:\n", model.means_)
print()
print("State-to-state transition matrix (%):\n", model.transmat_*100)
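The diagonal of the transition matrix is often the easiest part to interpret. The entry a_ii is the probability of staying in state i for one more step, so the number of consecutive steps spent in state i is geometrically distributed with mean 1 / (1 - a_ii). A minimal sketch (`expected_dwell_times` is a hypothetical helper):

```python
import numpy as np

def expected_dwell_times(transmat):
    """Expected run length of each state: 1 / (1 - a_ii)."""
    self_loops = np.diag(transmat)
    # A state with a_ii == 1 is absorbing; its expected dwell time is infinite
    with np.errstate(divide="ignore"):
        return 1.0 / (1.0 - self_loops)

transmat = np.array([[0.9, 0.1],
                     [0.5, 0.5]])
print(expected_dwell_times(transmat))  # about 10 steps for state 0, 2 for state 1
```

Applied to `model.transmat_`, this gives the typical duration of each regime, in units of the series' sampling interval.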


In [None]:
# Original series with one horizontal line per state mean
plt.plot(ff)
for mean in model.means_[:, 0]:
    plt.plot(np.full_like(ff, mean))
plt.grid()
plt.show()

In [None]:
# Most likely hidden state for each time step (Viterbi decoding by default)
pred = model.predict(dataset)
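By default, `model.predict` uses Viterbi decoding. It returns the single most likely state path, found by dynamic programming over log-probabilities. `model.decode(dataset)` returns the same path together with its log-probability. The sketch below shows that recursion and assumes the per-step log emission likelihoods are already available as a (T, K) array. `viterbi` is a hypothetical helper, not an hmmlearn function.

```python
import numpy as np

def viterbi(log_startprob, log_transmat, log_emission):
    """Most likely state path; log_emission[t, k] = log p(x_t | state k)."""
    T, K = log_emission.shape
    delta = log_startprob + log_emission[0]      # best log-score of a path ending in each state
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_transmat   # scores[i, j]: arrive in j coming from i
        backptr[t] = scores.argmax(axis=0)       # best predecessor of each state j
        delta = scores.max(axis=0) + log_emission[t]
    # Backtrack from the best final state
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path

# Sticky two-state chain; emissions clearly favor state 0, then state 1
log_start = np.log([0.5, 0.5])
log_trans = np.log([[0.9, 0.1], [0.1, 0.9]])
log_emis = np.log([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
path = viterbi(log_start, log_trans, log_emis)
```

Working in log space replaces products of many small probabilities with sums, which avoids numerical underflow on long series.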

In [None]:
plt.plot(ff)
for mean in model.means_[:, 0]:
    plt.plot(np.full_like(ff, mean))
plt.grid()
plt.show()

# Decoded state index over time
plt.plot(pred)
plt.title('Decoded hidden states')
plt.show()
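State labels are arbitrary integers, so state 3 need not have a larger mean than state 1. Plotting `pred` directly therefore shows when the state changes, but not what each state represents. An alternative view replaces each time step with the mean of its decoded state, which yields a piecewise-constant approximation that can be overlaid on the original series. A minimal sketch, with a hypothetical helper name and made-up stand-ins for `model.means_` and `pred`:

```python
import numpy as np

def state_mean_signal(means, states):
    """Replace every time step by the mean of the state decoded for it."""
    # means has shape (n_states, n_features); keep the first (only) feature
    return means[np.asarray(states), 0]

# Hypothetical stand-ins for model.means_ and model.predict(dataset)
example_means = np.array([[12.0], [18.5], [24.0]])
example_states = np.array([0, 0, 2, 2, 1])
reconstruction = state_mean_signal(example_means, example_states)
```

With the fitted model, `plt.plot(ff); plt.plot(state_mean_signal(model.means_, pred))` would draw the series and its decoded regimes on the same axes.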

In [None]:
# sample() returns (generated observations, hidden states that generated them)
samples = model.sample(len(dataset))
plt.plot(samples[0])
plt.plot(ff)
plt.grid()
plt.title('Generated time series vs. original one')
plt.show()
plt.plot(samples[1])
plt.title('Sequence of generated states')
plt.show()
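One quick sanity check on the generated data is to count the transitions in the sampled state sequence and compare the row-normalized counts with `model.transmat_`. For long samples, the two tend to agree. `empirical_transmat` is a hypothetical helper.

```python
import numpy as np

def empirical_transmat(states, n_states):
    """Row-normalized transition counts observed in a state sequence."""
    states = np.asarray(states)
    counts = np.zeros((n_states, n_states))
    # Count each (previous state, next state) pair
    np.add.at(counts, (states[:-1], states[1:]), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # Rows for states that are never left stay all-zero instead of dividing by 0
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```

For example, `empirical_transmat(samples[1], model.n_components)` can be printed next to `model.transmat_`.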


## HMM on discrete values
Discrete values (strings, symbols, tags, etc.) must be encoded as integers; more precisely, as consecutive integers starting from 0.
We use the simple emission model CategoricalHMM, which assigns a probability p(o|s) to each observation o in each state s.
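NumPy offers one way to obtain such an encoding. `np.unique` returns the sorted distinct values and, with `return_inverse=True`, the index of each input value among them. That index is a 0-based consecutive code. The sketch below uses a hypothetical helper name, `encode_symbols`, and works both for strings and for integer bins with gaps.

```python
import numpy as np

def encode_symbols(values):
    """Encode arbitrary symbols as consecutive integers 0..V-1."""
    # np.unique sorts the distinct values; return_inverse gives each value's position among them
    symbols, codes = np.unique(np.asarray(values).ravel(), return_inverse=True)
    # hmmlearn wants one observation per row: shape (n_samples, 1)
    return codes.reshape(-1, 1), symbols

codes, symbols = encode_symbols(["sun", "rain", "sun", "fog"])
```

`symbols[codes.ravel()]` recovers the original values, which is useful for turning sampled or decoded codes back into readable labels.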

In [None]:
from hmmlearn.hmm import CategoricalHMM

# Discretize into bins of width 3, then shift so that the smallest symbol is 0
bins = np.round(ff / 3).astype(int)
dataset2 = (bins - bins.min()).reshape(-1, 1)
print(f"Distinct values after discretization: {np.unique(dataset2)}")
plt.plot(dataset2)
plt.grid()
plt.show()

In [None]:
# CategoricalHMM expects one symbol per row, i.e. shape (n_samples, 1), not (1, n_samples)
model2 = CategoricalHMM(n_components=5, n_iter=1000).fit(dataset2)

In [None]:
model2.transmat_
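A fitted CategoricalHMM is fully described by `startprob_` (π), `transmat_` (A) and `emissionprob_` (B, rows = states, columns = symbols). For an observation sequence o and a hidden path s of length T, the joint probability factorizes as log p(o, s) = log π[s_1] + Σ_{t=2..T} log A[s_{t-1}, s_t] + Σ_{t=1..T} log B[s_t, o_t]. A direct NumPy transcription of that formula, with a hypothetical helper name and a made-up 2-state, 3-symbol model:

```python
import numpy as np

def joint_logprob(obs, states, startprob, transmat, emissionprob):
    """log p(o_1..o_T, s_1..s_T) for a categorical HMM."""
    obs = np.asarray(obs).ravel()
    states = np.asarray(states).ravel()
    lp = np.log(startprob[states[0]])                      # initial state
    lp += np.log(transmat[states[:-1], states[1:]]).sum()  # state transitions
    lp += np.log(emissionprob[states, obs]).sum()          # emissions p(o_t | s_t)
    return lp

# Hypothetical model: 0.6 * 0.3 * 0.5 * 0.6 = 0.054 for the example below
startprob = np.array([0.6, 0.4])
transmat = np.array([[0.7, 0.3], [0.4, 0.6]])
emissionprob = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
lp = joint_logprob([0, 2], [0, 1], startprob, transmat, emissionprob)
```

Evaluated on `dataset2` with the path from `model2.predict(dataset2)` and the fitted parameters, the result should match the log-probability returned by `model2.decode(dataset2)`, up to rounding.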

In [None]:
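samples = model2.sample(len(dataset2))
# Compare the generated symbols with the discretized series, which is on the same scale
plt.plot(samples[0])
plt.plot(dataset2)
plt.grid()
plt.title('Generated symbols vs. discretized series')
plt.show()
plt.plot(samples[1])
plt.title('Sequence of generated states')
plt.show()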

## Example on Flickr dataset
We apply CategoricalHMM to a processed dataset of Flickr photos. Each line of the input file contains the ordered list of attractions photographed by one user in Venice, terminated by ".". Here are a few sample lines:

Palazzo_Santa_Maria_del_Giglio  Palazzo_Ducale .

Opera_Santa_Maria_Della_Carita'  Palazzo_Ducale .

Palazzo_Ducale  Torre_dell'orologio .

Chiesa_di_San_Trovaso  Museo_Correr  Campo_San_Benedetto .


In [None]:
# Read all tokens (POI names and "." end-of-sequence markers), ignoring line breaks
with open('sequences_of_poits.text') as fvenice:
    POIs = fvenice.read().split()

Map each POI name to an integer and concatenate all sequences into a single one. Note that "." is kept as a special POI that marks the end of each user's sequence.

In [None]:
seq_POI = []    # encoded sequence, one [symbol] per row
dict_POI = {}   # POI name -> integer code
list_POI = []   # integer code -> POI name
for p in POIs:
    if p not in dict_POI:
        dict_POI[p] = len(dict_POI)
        list_POI.append(p)
    seq_POI.append([dict_POI[p]])
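Keeping "." as a symbol lets one long sequence stand in for all users. The model then treats consecutive users as one continuous trajectory, linked through the "." symbol. hmmlearn's `fit` also accepts a `lengths` argument (`fit(X, lengths)`) that marks where independent sequences start and end within a concatenated `X`. The sketch below is an alternative preprocessing built on that option, not the approach used in this notebook. It drops the markers and records each user's sequence length. `split_sequences` is a hypothetical helper.

```python
def split_sequences(tokens, end_marker="."):
    """Split a flat token list into per-user sequences at end_marker.

    Returns (codes, lengths, vocab): all users' integer codes concatenated
    (markers removed), the length of each user's sequence, and id -> name.
    """
    index = {}                 # POI name -> integer code
    vocab = []                 # integer code -> POI name
    codes, lengths, current = [], [], 0
    for tok in tokens:
        if tok == end_marker:
            if current > 0:    # skip empty sequences (e.g. two markers in a row)
                lengths.append(current)
            current = 0
            continue
        if tok not in index:
            index[tok] = len(vocab)
            vocab.append(tok)
        codes.append(index[tok])
        current += 1
    if current > 0:            # last sequence without a trailing marker
        lengths.append(current)
    return codes, lengths, vocab
```

With this output, the model could be fitted as `CategoricalHMM(n_components=5, n_iter=1000).fit(np.array(codes).reshape(-1, 1), lengths)`.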

Learn the model parameters from the input sequence.

In [None]:
model_POI = CategoricalHMM(n_components=5, n_iter=1000).fit(seq_POI)

In [None]:
# Generate 20 symbols and map them back to POI names
[list_POI[i[0]] for i in model_POI.sample(20)[0]]

In [None]:
model_POI.transmat_
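Another way to read a transition matrix is through its stationary distribution π. This is the solution of π = π A with the entries of π summing to 1. For an irreducible, aperiodic chain, π gives the long-run fraction of time spent in each hidden state. A minimal sketch that assumes such a unique distribution exists (`stationary_distribution` is a hypothetical helper):

```python
import numpy as np

def stationary_distribution(transmat):
    """Solve pi = pi @ A with sum(pi) = 1 (assumes a unique solution)."""
    n = transmat.shape[0]
    # Stack (A^T - I) pi = 0 with the normalization row 1^T pi = 1
    a = np.vstack([transmat.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    # The system is overdetermined but consistent, so least squares gives the exact solution
    pi, *_ = np.linalg.lstsq(a, b, rcond=None)
    return pi

A = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = stationary_distribution(A)
```

Applied to `model_POI.transmat_`, this indicates which hidden states dominate the visitors' trajectories in the long run.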

In [None]:
model_POI.emissionprob_

Show the five most probable POIs for each hidden state.

In [None]:
for j, s in enumerate(model_POI.emissionprob_):
    # Indices of the five largest emission probabilities, highest first
    top5 = np.argsort(s)[::-1][:5]
    print(f"state S_{j}:")
    for i in top5:
        print(f"\t{list_POI[i]} ({int(s[i] * 100)}%)")