# 05 - hidden Markov state models (HMMs)
In this notebook, we will learn about hidden Markov state models and how to use them to deal with bad discretization. We also revisit coarse-grained MSMs and realize that they are just HMMs.

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import mdshare
import pyemma

## Case 1: preprocessed, two-dimensional data (toy model)
We load the two-dimensional trajectory as well as the true discrete trajectory from an archive using `numpy`...

In [None]:
file = mdshare.fetch('hmm-doublewell-2d-100k.npz', working_directory='data')
with np.load(file) as fh:
    data = fh['trajectory']
    good_dtraj = fh['discrete_trajectory']

... and discretize the two-dimensional data poorly:

In [None]:
poor_clustercenters = np.asarray([[-0.1, -0.6], [0.1, 1.4]])
poor_dtraj = pyemma.coordinates.assign_to_centers(data, centers=poor_clustercenters)[0]

The result of this poor clustering is depicted in the left panel below, whereas the correct discretization is shown on the right:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
pyemma.plots.plot_state_map(*data.T, poor_dtraj, ax=axes[0], cbar=False)
axes[0].scatter(*poor_clustercenters.T, s=15, c='C1')
pyemma.plots.plot_state_map(*data.T, good_dtraj, ax=axes[1], cbar=False)
axes[1].scatter(*np.asarray([[0, -1], [0, 1]]).T, s=15, c='C1')
for ax in axes.flat:
    ax.set_xlabel('$x$')
axes[0].set_ylabel('$y$')
axes[0].set_title('poor discretization')
axes[1].set_title('good discretization')
fig.tight_layout()

One of the first steps after discretization should always be an implied timescale convergence plot. We try this here for both discretizations...

In [None]:
lags = [i + 1 for i in range(10)]

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
pyemma.plots.plot_implied_timescales(
    pyemma.msm.its(poor_dtraj, lags=lags, errors='bayes'), ylog=False, ax=axes[0])
pyemma.plots.plot_implied_timescales(
    pyemma.msm.its(good_dtraj, lags=lags, errors='bayes'), ylog=False, ax=axes[1])
axes[0].set_title('MSM with poor discretization')
axes[1].set_title('MSM with good discretization')
fig.tight_layout()

... and see that we need a lagtime of at least $4$ or $5$ steps to roughly converge the implied timescales for the poor discretization. The good discretization yields a converged timescale from lagtime $1$.
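
As a reminder of what is being plotted, the following minimal sketch computes implied timescales $t_i = -\tau / \ln|\lambda_i(\tau)|$ from the eigenvalues $\lambda_i$ of a transition matrix estimated at lagtime $\tau$. The function `implied_timescales` and the matrix `T` are hypothetical illustrations, not part of PyEMMA or of the data used here.

```python
import numpy as np

def implied_timescales(T, lag):
    """Implied timescales t_i = -lag / ln|lambda_i| of a row-stochastic matrix T."""
    eigenvalues = np.linalg.eigvals(T)
    # sort by magnitude, largest first; the leading eigenvalue (1.0) is the stationary one
    magnitudes = np.sort(np.abs(eigenvalues))[::-1]
    return -lag / np.log(magnitudes[1:])

# hypothetical two-state transition matrix at a lagtime of 1 step
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
its_lag1 = implied_timescales(T, lag=1)
# a truly Markovian model satisfies T(k * lag) = T(lag)^k,
# so its implied timescales do not depend on the lagtime
its_lag5 = implied_timescales(np.linalg.matrix_power(T, 5), lag=5)
print(its_lag1, its_lag5)
```

For a Markovian process both calls return the same value; a timescale that keeps growing with the lagtime, as in the left panel above, indicates that the discrete dynamics is not yet Markovian at that lagtime.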

We continue to build MSM objects at the respective lagtimes and print out the first implied timescale for both:

In [None]:
poor_msm = pyemma.msm.estimate_markov_model(poor_dtraj, lag=5)
good_msm = pyemma.msm.estimate_markov_model(good_dtraj, lag=1)

print('MSM (poor): 1. implied timescale = {:.2f} steps'.format(poor_msm.timescales()[0]))
print('MSM (good): 1. implied timescale = {:.2f} steps'.format(good_msm.timescales()[0]))

Even though the `poor_msm` was estimated from a poorly discretized trajectory, the higher estimation lagtime mends this issue to some extent and brings the estimate close to the value obtained from a very good discretization.

Looking at the next validation step, a Chapman-Kolmogorov test, we observe that the `poor_msm`...

In [None]:
pyemma.plots.plot_cktest(poor_msm.cktest(2));

... as well as the `good_msm`...

In [None]:
pyemma.plots.plot_cktest(good_msm.cktest(2));

... show excellent agreement between higher lagtime estimation and model prediction.
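
The idea behind this test can be sketched with plain `numpy`: a model estimated at lagtime $\tau$ predicts $T(k\tau) = T(\tau)^k$, and this prediction is compared with a transition matrix re-estimated at lagtime $k\tau$. The example below is only an illustration under stated assumptions: the trajectory is simulated from a hypothetical two-state chain, `estimate_transition_matrix` is a simple helper written for this sketch, and PyEMMA's `cktest()` compares probabilities on metastable sets rather than full matrices.

```python
import numpy as np

def estimate_transition_matrix(dtraj, lag, nstates):
    """Row-normalized sliding-window transition counts at the given lagtime."""
    counts = np.zeros((nstates, nstates))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1)
    return counts / counts.sum(axis=1, keepdims=True)

# simulate a hypothetical two-state Markov chain (not the notebook's data)
rng = np.random.default_rng(0)
T_true = np.array([[0.95, 0.05],
                   [0.10, 0.90]])
dtraj = np.zeros(100000, dtype=int)
jumps = rng.random(len(dtraj))
for t in range(1, len(dtraj)):
    # the next state is 1 with probability T_true[current state, 1]
    dtraj[t] = int(jumps[t] < T_true[dtraj[t - 1], 1])

T_1 = estimate_transition_matrix(dtraj, lag=1, nstates=2)
deviations = {}
for k in (2, 5, 10):
    predicted = np.linalg.matrix_power(T_1, k)  # model prediction
    estimated = estimate_transition_matrix(dtraj, lag=k, nstates=2)  # re-estimation
    deviations[k] = np.abs(predicted - estimated).max()
    print(k, deviations[k])
```

Because the simulated chain is Markovian by construction, the deviations stay within statistical noise.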

Let us now repeat both estimations using hidden Markov state models instead of regular MSMs. We begin with the implied timescale convergence using the `pyemma.msm.timescales_hmsm()` function and two hidden states:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(poor_dtraj, 2, lags=lags, errors='bayes'), ylog=False, ax=axes[0])
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(good_dtraj, 2, lags=lags, errors='bayes'), ylog=False, ax=axes[1])
axes[0].set_title('HMM with poor discretization')
axes[1].set_title('HMM with good discretization')
fig.tight_layout()

Now, both discretizations give us converged implied timescales from the very start (lagtime $1$).

We estimate HMMs using both discretizations at lagtime 1 and two hidden states...

In [None]:
poor_hmm = pyemma.msm.estimate_hidden_markov_model(poor_dtraj, 2, lag=1)
good_hmm = pyemma.msm.estimate_hidden_markov_model(good_dtraj, 2, lag=1)

print('HMM (poor): 1. implied timescale = {:.2f} steps'.format(poor_hmm.timescales()[0]))
print('HMM (good): 1. implied timescale = {:.2f} steps'.format(good_hmm.timescales()[0]))

... and obtain nearly identical estimates for the first implied timescale.

We observe that HMMs, unlike MSMs, seem to be somewhat resistant to discretization errors.
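
Why a poor discretization hurts an MSM can be illustrated analytically. In the sketch below, a hidden two-state chain `P` is observed through an emission matrix `B` that mixes both metastable states into each discrete state. All numbers are hypothetical and chosen for illustration only. An MSM built directly on the observed states underestimates the slow timescale and only slowly approaches it with increasing lagtime, whereas an HMM models `P` and `B` explicitly.

```python
import numpy as np

# hypothetical hidden two-state dynamics at lagtime 1 and its stationary distribution
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])
# hypothetical emission probabilities B[h, o] = P(observed state o | hidden state h);
# with a poor discretization, each discrete state mixes both metastable states
B = np.array([[0.8, 0.2],
              [0.2, 0.8]])

# timescale of the hidden process, from the non-stationary eigenvalue of P
true_timescale = -1.0 / np.log(np.linalg.eigvals(P).real.min())

def apparent_msm_timescale(lag):
    """Slowest implied timescale of an MSM built on the observed states only."""
    # joint probability of observing state i at time t and state j at time t + lag
    C = B.T @ np.diag(pi) @ np.linalg.matrix_power(P, lag) @ B
    T_obs = C / C.sum(axis=1, keepdims=True)
    second_eigenvalue = np.sort(np.linalg.eigvals(T_obs).real)[-2]
    return -lag / np.log(second_eigenvalue)

lags = [1, 2, 5, 10, 20]
apparent = [apparent_msm_timescale(lag) for lag in lags]
print('hidden timescale:', true_timescale)
print('apparent MSM timescales:', apparent)
```

The apparent timescales increase monotonically with the lagtime but remain below the hidden one, which mirrors the slow convergence seen for the MSM with poor discretization.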

Regarding the CK test, we again see that the `poor_hmm`...

In [None]:
pyemma.plots.plot_cktest(poor_hmm.cktest(2));

... and the `good_hmm`...

In [None]:
pyemma.plots.plot_cktest(good_hmm.cktest(2));

... are in perfect agreement.

Let us now worsen the discretization using three badly chosen clustercenters and show the discretization, the MSM-ITS convergence, and the HMM-ITS convergence:

In [None]:
bad_clustercenters = np.asarray([[-2.5, -1.4], [0.3, 1.2], [2.7, -0.6]])
bad_dtraj = pyemma.coordinates.assign_to_centers(data, centers=bad_clustercenters)[0]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
pyemma.plots.plot_state_map(*data.T, bad_dtraj, ax=axes[0], cbar=False)
axes[0].scatter(*bad_clustercenters.T, s=15, c='C1')
pyemma.plots.plot_implied_timescales(
    pyemma.msm.its(bad_dtraj, lags=lags, errors='bayes'), ylog=False, ax=axes[1])
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(bad_dtraj, 2, lags=lags, errors='bayes'), ylog=False, ax=axes[2])
axes[0].set_xlabel('$x$')
axes[0].set_ylabel('$y$')
axes[0].set_title('bad discretization')
axes[1].set_title('MSM with bad discretization')
axes[2].set_title('HMM with bad discretization')
for ax in axes.flat[1:]:
    ax.set_ylim(-0.5, 12.5)
fig.tight_layout()

All three discrete states include data points from the two metastable regions (left panel) and, as the middle panel shows, this discretization error cannot be fixed by using a large lagtime and a regular MSM estimation.

The HMM estimate (right panel) still yields a converged implied timescale, even at lagtime $1$. Also the HMM's CK test passes:

In [None]:
bad_hmm = pyemma.msm.estimate_hidden_markov_model(bad_dtraj, 2, lag=1)
print('HMM (bad): 1. implied timescale = {:.2f} steps'.format(bad_hmm.timescales()[0]))
pyemma.plots.plot_cktest(bad_hmm.cktest(2));

Besides the direct estimation shown above, there is another method to obtain an HMM, namely

### Coarse-graining an existing MSM

To show how this works, we estimate a regular MSM using the bad discretization and (the unsuitable) lagtime $1$, and print the first implied timescale. Next, we coarse grain this MSM onto two states using the `coarse_grain()` method and print the type of the resulting object:

In [None]:
bad_msm = pyemma.msm.estimate_markov_model(bad_dtraj, lag=1)
print('MSM (bad, normal): 1. implied timescale = {:.2f} steps'.format(bad_msm.timescales()[0]))

bad_coarse_msm = bad_msm.coarse_grain(2)
print(type(bad_coarse_msm))

Our coarse-grained MSM apparently is an HMM which, as we can see...

In [None]:
print('MSM (bad, coarse): 1. implied timescale = {:.2f} steps'.format(bad_coarse_msm.timescales()[0]))

pyemma.plots.plot_cktest(bad_coarse_msm.cktest(2));

... agrees well with the HMM directly estimated on the bad discretization.

## Case 2: low-dimensional molecular dynamics data (alanine dipeptide)
We now illustrate two typical use cases of hidden Markov state models: **coarse graining** an existing MSM and **reestimating** an HMM from scratch (to resolve faster processes than an MSM).

Let us begin with the (already familiar) coarse graining use case.

We fetch the alanine dipeptide data set, load the backbone torsions into memory, directly discretize the full space using $k$-means clustering, visualize the marginal and joint distributions of both components as well as the cluster centers, and show the ITS convergence to help select a suitable lagtime:

In [None]:
pdb = mdshare.fetch('alanine-dipeptide-nowater.pdb', working_directory='data')
files = mdshare.fetch('alanine-dipeptide-*-250ns-nowater.dcd', working_directory='data')

feat = pyemma.coordinates.featurizer(pdb)
feat.add_backbone_torsions()
data = pyemma.coordinates.load(files, features=feat)
data_all = np.concatenate(data)

cluster = pyemma.coordinates.cluster_kmeans(data, k=100, max_iter=50, stride=10)
dtrajs_all = np.concatenate(cluster.dtrajs)

its = pyemma.msm.its(cluster.dtrajs, lags=[1, 2, 5, 10, 20, 50], nits=4, errors='bayes')

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
pyemma.plots.plot_feature_histograms(data_all, feature_labels=[r'$\Phi$', r'$\Psi$'], ax=axes[0])
pyemma.plots.plot_density(*data_all.T, ax=axes[1], cbar=False, alpha=0.3)
axes[1].scatter(*cluster.clustercenters.T, s=15)
axes[1].set_xlabel(r'$\Phi$')
axes[1].set_ylabel(r'$\Psi$')
pyemma.plots.plot_implied_timescales(its, ax=axes[2], units='ps')
fig.tight_layout()

Based on the implied timescale convergence plot, we choose a lagtime of $10$ steps, estimate a Bayesian MSM, and show the results of a CK test:

In [None]:
bayesian_msm = pyemma.msm.bayesian_markov_model(cluster.dtrajs, lag=10, dt_traj='0.001 ns')

nstates = 4
pyemma.plots.plot_cktest(bayesian_msm.cktest(nstates));

At this point, we have a (Bayesian) MSM with $100$ discrete states and basic validation. To obtain an HMM with only four states (the number for which we have validated our MSM), we call the `coarse_grain()` method and then print some properties of both our MSM and the HMM:

In [None]:
hmm = bayesian_msm.coarse_grain(nstates)

print('MSM: lagtime = {} ({} steps), data timestep = {}'.format(
    bayesian_msm.dt_model, bayesian_msm.lag, bayesian_msm.dt_traj))
print('HMM: lagtime = {} ({} steps), data timestep = {}'.format(
    hmm.dt_model, hmm.lag, hmm.dt_traj))

Apparently, the HMM has inherited the timestep and lagtime information of the original MSM object.

We now plot the stationary distribution (left) of the MSM as well as the metastable memberships (right) taken from the HMM:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
pyemma.plots.plot_contour(
    *data_all.T, bayesian_msm.pi[dtrajs_all],
    ax=axes[0], cbar_label='stationary distribution',
    method='nearest', mask=True)
pyemma.plots.plot_state_map(
    *data_all.T,
    hmm.metastable_assignments[dtrajs_all],
    ax=axes[1])
for ax in axes.flat:
    ax.set_xlabel(r'$\Phi$')
axes[0].set_ylabel(r'$\Psi$')
fig.tight_layout()

In this first use case, the HMM helps us to understand which discrete states form metastable sets.
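
The indexing pattern `hmm.metastable_assignments[dtrajs_all]` used in the plot above maps every frame to the metastable set of its discrete state. The following sketch mimics this with a hypothetical membership matrix and a hypothetical discrete trajectory. That the crisp assignment is the most probable metastable set per discrete state is an assumption of this illustration.

```python
import numpy as np

# hypothetical membership matrix: rows = 5 discrete states, columns = 2 metastable sets
memberships = np.array([[0.9, 0.1],
                        [0.7, 0.3],
                        [0.4, 0.6],
                        [0.2, 0.8],
                        [0.1, 0.9]])
# crisp assignment: each discrete state goes to its most probable metastable set
metastable_assignments = memberships.argmax(axis=1)

# hypothetical discrete trajectory (index of the discrete state in every frame)
dtraj = np.array([0, 1, 1, 2, 3, 4, 4, 2, 0])
# integer-array indexing maps every frame to the metastable set of its discrete state
metastable_traj = metastable_assignments[dtraj]
print(metastable_traj)
```
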
For the second use case, we start from the same featurization and a coarse (but reasonable) discretization:

In [None]:
cluster = pyemma.coordinates.cluster_regspace(data, dmin=1.0)
dtrajs_all = np.concatenate(cluster.dtrajs)

its = pyemma.msm.its(cluster.dtrajs, lags=[1, 2, 5, 10, 20, 50], nits=6, errors='bayes')

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
pyemma.plots.plot_feature_histograms(data_all, feature_labels=[r'$\Phi$', r'$\Psi$'], ax=axes[0])
pyemma.plots.plot_density(*data_all.T, ax=axes[1], cbar=False, alpha=0.1)
axes[1].scatter(*cluster.clustercenters.T, s=15, c='C1')
axes[1].set_xlabel(r'$\Phi$')
axes[1].set_ylabel(r'$\Psi$')
pyemma.plots.plot_implied_timescales(its, ax=axes[2], units='ps')
fig.tight_layout()

Instead of building an MSM, we use the ITS plot to guess possible numbers of metastable states and repeat the ITS convergence analysis using (Bayesian) HMMs and small lagtimes:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(cluster.dtrajs, 4, lags=[1, 2, 3, 4, 5], errors='bayes'), ax=axes[0])
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(cluster.dtrajs, 6, lags=[1, 2, 3, 4, 5], errors='bayes'), ax=axes[1])
fig.tight_layout()

The left panel shows that an HMM with four hidden states yields converged implied timescales from lagtime $1$.

The right panel, however, shows that an HMM with six hidden states and lagtime $1$ can resolve two additional processes.

Let us follow up on this and perform a CK test for a four state HMM at lagtime $1$...

In [None]:
hmm_4 = pyemma.msm.bayesian_hidden_markov_model(cluster.dtrajs, 4, lag=1, dt_traj='0.001 ns')
pyemma.plots.plot_cktest(hmm_4.cktest(mlags=5));

... and then for the six state HMM at lagtime $1$ (we use `mlags=2` because we would lose the two fast processes at lagtimes $\geq3$):

In [None]:
hmm_6 = pyemma.msm.bayesian_hidden_markov_model(cluster.dtrajs, 6, lag=1, dt_traj='0.001 ns')
pyemma.plots.plot_cktest(hmm_6.cktest(mlags=2));

In both cases, the CK test is passed.

If we now compare both metastable membership plots...

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
for hmm, ax in zip([hmm_4, hmm_6], axes.flat):
    pyemma.plots.plot_state_map(
        *data_all.T,
        hmm.metastable_assignments[dtrajs_all],
        ax=ax)
    ax.set_title('HMM with {} hidden states'.format(hmm.nstates))
    ax.set_xlabel(r'$\Phi$')
axes[0].set_ylabel(r'$\Psi$')
fig.tight_layout()

... we see that the six state HMM is able to subdivide the two most metastable states of the four state HMM and, thus, give us a more detailed view of the underlying system.

This is the second use case for HMMs: due to their low sensitivity to discretization errors, we can afford to estimate HMMs at smaller lagtimes than MSMs and, thus, resolve more processes.

Let us repeat this approach again for another featurization: we already know that it is possible to resolve six metastable states (five slow processes) using an HMM estimated on a coarse discretization of the backbone torsions. Can you achieve the same level of resolution using heavy atom distances and a suitable TICA projection?

**Exercise 1**: obtain the heavy atom distances, use TICA for dimension reduction, and discretize using a method of your choice.

In [None]:
feat = #FIXME
feat. #FIXME
data = #FIXME

tica = #FIXME
tica_all = np.concatenate(tica.get_output())

cluster = #FIXME
dtrajs_all = #FIXME

its = pyemma.msm.its(cluster.dtrajs, lags=[1, 2, 5, 10, 20, 50], nits=4, errors='bayes')

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
pyemma.plots.plot_feature_histograms(tica_all, ax=axes[0])
pyemma.plots.plot_density(*tica_all[:, :2].T, ax=axes[1], cbar=False, alpha=0.1)
axes[1].scatter(*cluster.clustercenters[:, :2].T, s=15, c='C1')
axes[1].set_xlabel('IC 1')
axes[1].set_ylabel('IC 2')
pyemma.plots.plot_implied_timescales(its, ax=axes[2], units='ps')
fig.tight_layout()

In [None]:
feat = pyemma.coordinates.featurizer(pdb)
feat.add_distances(feat.select_Heavy())
data = pyemma.coordinates.load(files, features=feat)

tica = pyemma.coordinates.tica(data, lag=3)
tica_all = np.concatenate(tica.get_output())

cluster = pyemma.coordinates.cluster_kmeans(tica, k=100, max_iter=50, stride=10)
dtrajs_all = np.concatenate(cluster.dtrajs)

its = pyemma.msm.its(cluster.dtrajs, lags=[1, 2, 5, 10, 20, 50], nits=4, errors='bayes')

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
pyemma.plots.plot_feature_histograms(tica_all, ax=axes[0])
pyemma.plots.plot_density(*tica_all[:, :2].T, ax=axes[1], cbar=False, alpha=0.1)
axes[1].scatter(*cluster.clustercenters[:, :2].T, s=15, c='C1')
axes[1].set_xlabel('IC 1')
axes[1].set_ylabel('IC 2')
pyemma.plots.plot_implied_timescales(its, ax=axes[2], units='ps')
fig.tight_layout()

**Exercise 2**: let's see if your discretized data is suitable to converge five slow implied timescales using a Bayesian HMM.

In [None]:
pyemma.plots.plot_implied_timescales #FIXME

In [None]:
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(cluster.dtrajs, 6, lags=[1, 2, 3, 4], errors='bayes'));

**Exercise 3**: estimate a Bayesian HMM and perform a CK test.

In [None]:
hmm = #FIXME
pyemma.plots. #FIXME

In [None]:
hmm = pyemma.msm.bayesian_hidden_markov_model(cluster.dtrajs, 6, lag=1, dt_traj='0.001 ns')
pyemma.plots.plot_cktest(hmm.cktest(mlags=2));

**Exercise 4**: now that you have a model, be creative and visualize the metastable regions in your projected space.

In [None]:
#FIXME

In [None]:
def draw_panel(ax, i, j):
    pyemma.plots.plot_state_map(
        *tica_all[:, [i, j]].T,
        hmm.metastable_assignments[dtrajs_all],
        ax=ax)
    ax.set_xlabel('IC {}'.format(i + 1))
    ax.set_ylabel('IC {}'.format(j + 1))

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
draw_panel(axes[0, 0], 0, 2)
draw_panel(axes[0, 1], 1, 2)
draw_panel(axes[1, 0], 0, 1)
axes[1, 1].set_axis_off()
fig.tight_layout()

## Wrapping up
In this notebook, we have learned how to use hidden Markov state models (HMMs) and how they differ from MSMs. In detail, we have used
- the `pyemma.msm.timescales_hmsm()` function to obtain an implied timescale object for HMMs,
- `pyemma.msm.estimate_hidden_markov_model()` to estimate a regular HMM,
- `pyemma.msm.bayesian_hidden_markov_model()` to estimate a Bayesian HMM, and
- the `metastable_assignments` attribute of an HMM object to access the metastable membership of discrete states.