# 05 - hidden Markov state models (HMMs)
In this notebook, we will learn about hidden Markov state models and how to use them to deal with bad discretization. We further explain how to obtain a coarse model based on an initial MSM analysis.

Maintainers:

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" title='This work is licensed under a Creative Commons Attribution 4.0 International License.' align="left"/></a>

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import mdshare
import pyemma

## Case 1: preprocessed, two-dimensional data (toy model)
We load the two-dimensional data as well as the true discrete trajectory from an archive using `numpy`...

In [None]:
file = mdshare.fetch('hmm-doublewell-2d-100k.npz', working_directory='data')
with np.load(file) as fh:
    data = fh['trajectory']
    good_dtraj = fh['discrete_trajectory']

... and discretize the two-dimensional data poorly:

In [None]:
poor_clustercenters = np.asarray([[-0.1, -0.6], [0.1, 1.4]])
poor_dtraj = pyemma.coordinates.assign_to_centers(data, centers=poor_clustercenters)[0]

The result of this poor clustering is depicted in the left panel below, whereas the correct discretization is shown on the right:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
pyemma.plots.plot_state_map(*data.T, poor_dtraj, ax=axes[0])
axes[0].scatter(*poor_clustercenters.T, s=15, c='C1')
pyemma.plots.plot_state_map(*data.T, good_dtraj, ax=axes[1])
axes[1].scatter(*np.asarray([[0, -1], [0, 1]]).T, s=15, c='C1')
for ax in axes.flat:
    ax.set_xlabel('$x$')
    ax.set_xlim(-4, 4)
    ax.set_ylim(-4, 4)
    ax.set_aspect('equal')
axes[0].set_ylabel('$y$')
axes[0].set_title('poor discretization')
axes[1].set_title('good discretization')
fig.tight_layout()

One of the first steps after discretization should always be an implied timescale convergence plot. We try this here for both discretizations...

In [None]:
lags = [i + 1 for i in range(10)]

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
pyemma.plots.plot_implied_timescales(
    pyemma.msm.its(poor_dtraj, lags=lags, errors='bayes', n_jobs=None), ylog=False, ax=axes[0])
pyemma.plots.plot_implied_timescales(
    pyemma.msm.its(good_dtraj, lags=lags, errors='bayes', n_jobs=None), ylog=False, ax=axes[1])
axes[0].set_title('MSM with poor discretization')
axes[1].set_title('MSM with good discretization')
fig.tight_layout()

... and see that we need a lagtime of at least $4$ or $5$ steps to roughly converge the implied timescales for the poor discretization. The good discretization yields a converged timescale from lagtime $1$. For demonstration purposes, let's choose a lagtime of $1$ for the models of both discretizations and see what happens.

We continue to build MSM objects and print out the first implied timescale for both:

In [None]:
poor_msm = pyemma.msm.estimate_markov_model(poor_dtraj, lag=1)
good_msm = pyemma.msm.estimate_markov_model(good_dtraj, lag=1)

print('MSM (poor): 1. implied timescale = {:.2f} steps'.format(poor_msm.timescales()[0]))
print('MSM (good): 1. implied timescale = {:.2f} steps'.format(good_msm.timescales()[0]))
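For intuition, the implied timescales printed above are derived from the eigenvalues $\lambda_i$ of the transition matrix at lagtime $\tau$ via $t_i = -\tau / \ln|\lambda_i|$. The following is a minimal sketch of that relation using plain `numpy`; the transition matrix `T` is a made-up example and not the one estimated from the notebook's data.

```python
import numpy as np

# Hypothetical 2-state transition matrix at lagtime tau = 1 step (rows sum to 1).
tau = 1
T = np.array([[0.95, 0.05],
              [0.10, 0.90]])

# Eigenvalues in descending order; the largest one is always 1 (stationary process).
eigenvalues = np.sort(np.linalg.eigvals(T).real)[::-1]

# Implied timescales t_i = -tau / ln|lambda_i| for all non-stationary eigenvalues.
timescales = -tau / np.log(np.abs(eigenvalues[1:]))
print('1. implied timescale = {:.2f} steps'.format(timescales[0]))
```

If the model is Markovian at lagtime $\tau$, repeating this calculation at larger lagtimes should give roughly the same timescales, which is what the convergence plot checks.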

Looking at the next validation step, a Chapman-Kolmogorov test, we observe that the `poor_msm`...

In [None]:
pyemma.plots.plot_cktest(poor_msm.cktest(2));

... in fact shows deviations that are too large for a simple double well system. In comparison, the `good_msm`...

In [None]:
pyemma.plots.plot_cktest(good_msm.cktest(2));

... shows excellent agreement between higher lagtime estimation and model prediction.
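The idea behind the CK test can be sketched without PyEMMA: a transition matrix estimated at lagtime $\tau$ and raised to the power $k$ should agree with a transition matrix estimated directly at lagtime $k\tau$. The example below is an illustrative simplification that compares full transition matrices of a simulated toy chain; the helper `estimate_transition_matrix` and the matrix `T_true` are assumptions made for this sketch, and PyEMMA's `cktest` compares probabilities between metastable sets instead.

```python
import numpy as np

def estimate_transition_matrix(dtraj, lag, n_states):
    """Row-normalized sliding-window count matrix at the given lag."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1)
    return counts / counts.sum(axis=1, keepdims=True)

# Simulate a hypothetical 2-state Markov chain (toy data, not the notebook's).
rng = np.random.default_rng(0)
T_true = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
n_steps = 50000
u = rng.random(n_steps)
dtraj = np.zeros(n_steps, dtype=int)
for t in range(1, n_steps):
    # Jump to state 0 with probability T_true[s, 0], otherwise to state 1.
    dtraj[t] = 0 if u[t] < T_true[dtraj[t - 1], 0] else 1

T_lag1 = estimate_transition_matrix(dtraj, 1, 2)
deviations = {}
for k in (2, 3, 4):
    predicted = np.linalg.matrix_power(T_lag1, k)        # model prediction
    estimated = estimate_transition_matrix(dtraj, k, 2)  # direct estimate at lag k
    deviations[k] = np.abs(predicted - estimated).max()
    print('k = {}: max deviation = {:.4f}'.format(k, deviations[k]))
```

Because the toy chain is truly Markovian in its two states, the deviations should stay within statistical noise, which corresponds to the behavior seen for `good_msm`.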

Let us now repeat both estimations using hidden Markov state models instead of regular MSMs. We begin with the implied timescale convergence using the `pyemma.msm.timescales_hmsm()` function and two hidden states:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(poor_dtraj, 2, lags=lags, errors='bayes', n_jobs=None),
    ylog=False, ax=axes[0])
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(good_dtraj, 2, lags=lags, errors='bayes', n_jobs=None),
    ylog=False, ax=axes[1])
axes[0].set_title('HMM with poor discretization')
axes[1].set_title('HMM with good discretization')
fig.tight_layout()

Now, both discretizations give us converged implied timescales from the very start (lagtime $1$).

We estimate HMMs using both discretizations at lagtime 1 and two hidden states...

In [None]:
poor_hmm = pyemma.msm.estimate_hidden_markov_model(poor_dtraj, 2, lag=1)
good_hmm = pyemma.msm.estimate_hidden_markov_model(good_dtraj, 2, lag=1)

print('HMM (poor): 1. implied timescale = {:.2f} steps'.format(poor_hmm.timescales()[0]))
print('HMM (good): 1. implied timescale = {:.2f} steps'.format(good_hmm.timescales()[0]))

... and obtain nearly identical estimates for the first implied timescale.

We observe that HMMs, unlike MSMs, seem to be somewhat resistant to discretization errors.
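One way to see why this happens is a small analytical sketch, under the assumption that the data comes from a hidden two-state chain `P` whose states are observed through an overlapping discretization described by an emission matrix `B`. All numbers below are made up for illustration. An MSM built on the observed states underestimates the slow timescale and only approaches it slowly with increasing lagtime, whereas the hidden chain itself has a lagtime-independent timescale.

```python
import numpy as np

# Hypothetical hidden 2-state chain and its stationary distribution.
P = np.array([[0.95, 0.05],
              [0.05, 0.95]])
pi = np.array([0.5, 0.5])

# Hypothetical emission matrix: B[i, k] is the probability that hidden
# state i is observed as discrete state k (overlap = poor discretization).
B = np.array([[0.8, 0.2],
              [0.2, 0.8]])

def slowest_timescale(T, lag):
    """Implied timescale of the second-largest eigenvalue of T."""
    eigenvalues = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -lag / np.log(eigenvalues[1])

true_timescale = slowest_timescale(P, 1)

msm_timescales = []
for lag in range(1, 11):
    # Equilibrium correlation matrix between observed states at this lag.
    C = B.T @ np.diag(pi) @ np.linalg.matrix_power(P, lag) @ B
    # Row normalization gives the apparent MSM on the observed states.
    T_obs = C / C.sum(axis=1, keepdims=True)
    msm_timescales.append(slowest_timescale(T_obs, lag))

print('hidden chain: {:.2f} steps'.format(true_timescale))
print('MSM at lag 1: {:.2f} steps'.format(msm_timescales[0]))
print('MSM at lag 10: {:.2f} steps'.format(msm_timescales[-1]))
```

An HMM estimates `P` and `B` jointly, so the overlap is absorbed by the emission probabilities rather than distorting the kinetics.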

Regarding the CK test, we again see that the `poor_hmm`...

In [None]:
pyemma.plots.plot_cktest(poor_hmm.cktest(2));

... and the `good_hmm`...

In [None]:
pyemma.plots.plot_cktest(good_hmm.cktest(2));

... are in perfect agreement.

Let us now worsen the discretization using three badly chosen clustercenters and show the discretization, the MSM-ITS convergence, and the HMM-ITS convergence:

In [None]:
bad_clustercenters = np.asarray([[-2.5, -1.4], [0.3, 1.2], [2.7, -0.6]])
bad_dtraj = pyemma.coordinates.assign_to_centers(data, centers=bad_clustercenters)[0]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
pyemma.plots.plot_state_map(*data.T, bad_dtraj, ax=axes[0])
axes[0].scatter(*bad_clustercenters.T, s=15, c='C1')
pyemma.plots.plot_implied_timescales(
    pyemma.msm.its(bad_dtraj, lags=lags, errors='bayes'),
    ylog=False, ax=axes[1])
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(bad_dtraj, 2, lags=lags, errors='bayes'),
    ylog=False, ax=axes[2])
axes[0].set_xlabel('$x$')
axes[0].set_ylabel('$y$')
axes[0].set_xlim(-4, 4)
axes[0].set_ylim(-4, 4)
axes[0].set_aspect('equal')
axes[0].set_title('bad discretization')
axes[1].set_title('MSM with bad discretization')
axes[2].set_title('HMM with bad discretization')

for ax in axes.flat[1:]:
    ax.set_ylim(-0.5, 12.5)
    ts = good_msm.timescales()[0]
    ax.hlines(ts, *ax.get_xlim(), linestyle=':')
    ax.annotate('good_msm', xy=(1.5, 1.05 * ts), ha='left', va='bottom')

fig.tight_layout()

All three discrete states include data points from the two metastable regions (left panel) and, as the middle panel shows, this discretization error cannot be fixed by using a large lagtime and a regular MSM estimation. The dotted line represents the result from our previously estimated MSM with the good discretization.

The HMM estimate (right panel) still yields a converged implied timescale, even at lagtime $1$. The HMM's CK test also passes:

In [None]:
bad_hmm = pyemma.msm.estimate_hidden_markov_model(bad_dtraj, 2, lag=1)
print('HMM (bad): 1. implied timescale = {:.2f} steps'.format(bad_hmm.timescales()[0]))
pyemma.plots.plot_cktest(bad_hmm.cktest(2));

## Case 2: low-dimensional molecular dynamics data (alanine dipeptide)
We now illustrate a typical use case of hidden Markov state models: estimating an MSM that serves as a heuristic for the number of slow processes or hidden states, and then estimating an HMM (to resolve faster processes than an MSM).

We fetch the alanine dipeptide data set, load the backbone torsions into memory...

In [None]:
pdb = mdshare.fetch('alanine-dipeptide-nowater.pdb', working_directory='data')
files = mdshare.fetch('alanine-dipeptide-*-250ns-nowater.dcd', working_directory='data')

feat = pyemma.coordinates.featurizer(pdb)
feat.add_backbone_torsions()
data = pyemma.coordinates.load(files, features=feat)
data_all = np.concatenate(data)

cluster = pyemma.coordinates.cluster_kmeans(data, k=75, max_iter=50, stride=10)
dtrajs_all = np.concatenate(cluster.dtrajs)

... discretize the full space using $k$-means clustering, visualize the marginal and joint distributions of both components as well as the cluster centers, and show the ITS convergence to help select a suitable lagtime:

In [None]:
its = pyemma.msm.its(
    cluster.dtrajs, lags=[1, 2, 5, 10, 20, 50], nits=4, errors='bayes', n_jobs=None)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
pyemma.plots.plot_feature_histograms(data_all, feature_labels=[r'$\Phi$', r'$\Psi$'], ax=axes[0])
pyemma.plots.plot_density(*data_all.T, ax=axes[1], cbar=False, alpha=0.3)
axes[1].scatter(*cluster.clustercenters.T, s=15, c='C1')
axes[1].set_xlabel(r'$\Phi$')
axes[1].set_ylabel(r'$\Psi$')
pyemma.plots.plot_implied_timescales(its, ax=axes[2], units='ps')
fig.tight_layout()

Based on the implied timescale convergence plot, we choose a lagtime of $10$ steps. We further find $3$ slow processes in the implied timescales plot, meaning that we can expect $4$ metastable sets or hidden states. First, we estimate a Bayesian MSM, and show the results of a CK test:

In [None]:
bayesian_msm = pyemma.msm.bayesian_markov_model(cluster.dtrajs, lag=10, dt_traj='0.001 ns')

nstates = 4
pyemma.plots.plot_cktest(bayesian_msm.cktest(nstates), units='ps');
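The step from "three slow processes" to `nstates = 4` can be made explicit with a simple spectral-gap heuristic: look for the largest ratio between consecutive implied timescales and count the processes above it. The sketch below uses made-up timescale values for illustration only (they are not results from this data set), and the heuristic itself is a common rule of thumb rather than something PyEMMA applies automatically.

```python
import numpy as np

# Made-up implied timescales in ps, sorted from slowest to fastest (illustration only).
timescales = np.array([1500.0, 70.0, 35.0, 1.5, 1.2, 1.0])

# Ratio between consecutive timescales; the largest ratio marks the spectral gap.
gaps = timescales[:-1] / timescales[1:]
n_slow = int(np.argmax(gaps)) + 1   # number of processes above the gap
n_hidden = n_slow + 1               # one more state than slow processes

print('slow processes: {}, hidden states: {}'.format(n_slow, n_hidden))
```

In practice, the choice should still be confirmed visually in the implied timescales plot and by a CK test, as done in this notebook.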

At this point, we have a (Bayesian) MSM with $75$ discrete states and basic validation. To obtain an HMM with only four states (the number for which we have validated our MSM), we compute the implied timescales for HMMs with this number of hidden states.

We repeat the ITS convergence analysis using (Bayesian) HMMs and small lagtimes for a $4$-state HMM. For demonstration purposes, we add the same analysis with a $6$-state HMM to visualize what happens if the number of states is not as clear as in this example:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(
        cluster.dtrajs, 4, lags=[1, 2, 3, 4, 5], errors='bayes', nsamples=50),
    ax=axes[0], units='ps')
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(
        cluster.dtrajs, 6, lags=[1, 2, 3, 4, 5], errors='bayes', nsamples=50),
    ax=axes[1], units='ps')
fig.tight_layout()

The left panel shows that an HMM with four hidden states yields converged implied timescales from lagtime $1$.

The right panel, however, shows that an HMM with six hidden states and lagtime $1$ can resolve two additional processes.

Let us follow up on this and perform a CK test for a four state HMM at lagtime $1$...

In [None]:
hmm_4 = pyemma.msm.bayesian_hidden_markov_model(cluster.dtrajs, 4, lag=1, dt_traj='0.001 ns', nsamples=50)
pyemma.plots.plot_cktest(hmm_4.cktest(mlags=5), units='ps');

... and then for the six state HMM at lagtime $1$ (we use `mlags=2` because we would lose the two fast processes at lagtimes $\geq3$):

In [None]:
hmm_6 = pyemma.msm.bayesian_hidden_markov_model(cluster.dtrajs, 6, lag=1, dt_traj='0.001 ns', nsamples=50)
pyemma.plots.plot_cktest(hmm_6.cktest(mlags=2), units='ps');

In both cases, the CK test is passed.

If we now compare both metastable membership plots...

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
for hmm, ax in zip([hmm_4, hmm_6], axes.flat):
    pyemma.plots.plot_state_map(
        *data_all.T,
        hmm.metastable_assignments[dtrajs_all], 
        ax=ax)
    ax.set_title('HMM with {} hidden states'.format(hmm.nstates))
    ax.set_xlabel(r'$\Phi$')
axes[0].set_ylabel(r'$\Psi$')
fig.tight_layout()

... we see that the six state HMM is able to subdivide the two most metastable states of the four state HMM and, thus, give us a more detailed view on the underlying system. As one would have expected from the implied timescale plot, the metastable dynamics is already well-described with $4$ hidden states.
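The expression `hmm.metastable_assignments[dtrajs_all]` used for these plots relies on `numpy` integer-array indexing: an array that maps each discrete state to a hidden state is indexed with the discrete trajectory, which yields one hidden-state label per frame. A minimal sketch with a hypothetical membership matrix (five discrete states, two hidden states; the values are invented) looks as follows.

```python
import numpy as np

# Hypothetical memberships: row = discrete state, column = hidden state.
memberships = np.array([[0.90, 0.10],
                        [0.70, 0.30],
                        [0.40, 0.60],
                        [0.20, 0.80],
                        [0.05, 0.95]])

# Crisp assignment: each discrete state goes to its most probable hidden state.
assignments = memberships.argmax(axis=1)

# A short hypothetical discrete trajectory (one discrete state index per frame).
dtraj = np.array([0, 1, 1, 2, 4, 3, 0])

# Integer-array indexing maps every frame to the hidden state of its discrete state.
hidden_traj = assignments[dtraj]
print(hidden_traj)
```

Note that this crisp assignment discards the fuzzy membership information, so frames in overlapping discrete states are forced into a single hidden state.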

Due to their low sensitivity to discretization errors, we can afford to estimate HMMs at smaller lagtimes than MSMs and, thus, resolve more processes.

As with classical MSMs, we can further analyze properties of the HMM. As an example, have a look at the transition paths and committor probabilities below.

In [None]:
A = [0]
B = [3]
flux = pyemma.msm.tpt(hmm_4, A, B)

highest_membership = hmm_4.metastable_distributions.argmax(1)
coarse_state_centers = cluster.clustercenters[hmm_4.observable_set[highest_membership]]

Please note one important difference when operating on metastable sets: since HMMs operate directly on the metastable sets, we do not compute the flux between the `msm.metastable_sets` but between lists of macrostate numbers, e.g., instead of `A = msm.metastable_sets[0]` we need `A = [0]`.
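To make the role of `A` and `B` concrete, the forward committor can be sketched with plain `numpy`: it is $0$ on `A`, $1$ on `B`, and on all other states it satisfies $q_i = \sum_j T_{ij} q_j$. The four-state transition matrix below is a made-up example rather than the estimated HMM, and this is only an illustration of the underlying linear system, not PyEMMA's implementation.

```python
import numpy as np

# Hypothetical 4-state transition matrix between macrostates (rows sum to 1).
T = np.array([[0.90, 0.05, 0.05, 0.00],
              [0.10, 0.80, 0.05, 0.05],
              [0.05, 0.05, 0.80, 0.10],
              [0.00, 0.05, 0.05, 0.90]])
A = [0]  # source macrostate
B = [3]  # target macrostate
n = T.shape[0]
intermediate = [i for i in range(n) if i not in A + B]

# Boundary conditions: q = 0 on A and q = 1 on B.
q = np.zeros(n)
q[B] = 1.0

# On intermediate states I: (Id - T_II) q_I = sum over j in B of T_Ij.
lhs = np.eye(len(intermediate)) - T[np.ix_(intermediate, intermediate)]
rhs = T[np.ix_(intermediate, B)].sum(axis=1)
q[intermediate] = np.linalg.solve(lhs, rhs)
print(q)
```

Each entry of `q` is the probability of reaching `B` before `A` when starting from that macrostate.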

Let's now visualize the committor as before. Does it look familiar?

In [None]:
fig, ax = plt.subplots(figsize=(8, 6))

pyemma.plots.plot_contour(
    *data_all.T, flux.committor[hmm_4.metastable_assignments[dtrajs_all]], cmap='brg', ax=ax,
    mask=True, method='nearest', cbar_label='committor 0 -> 3', alpha=.8, zorder=-1);

pyemma.plots.plot_flux(flux, coarse_state_centers, flux.stationary_distribution, ax=ax, 
                       show_committor=False, figpadding=0, show_frame=True);

ax.set_xlabel(r'$\Phi$')
ax.set_ylabel(r'$\Psi$')
ax.set_xlim(data_all[:, 0].min(), data_all[:, 0].max())
ax.set_ylim(data_all[:, 1].min(), data_all[:, 1].max())
fig.tight_layout()

As we see, HMMs provide the same analysis tools as MSMs in addition to the properties described here. For example, eigenvectors and mean first passage times can be extracted as described in previous notebooks.

Let us now repeat this approach for another featurization: we already know that it is possible to resolve six metastable states (five slow processes) using an HMM estimated on a discretization of the backbone torsions. Can you achieve the same level of resolution using heavy atom distances and a suitable TICA projection?

**Exercise 1**: obtain the heavy atom distances, use TICA for dimension reduction, and discretize using a method of your choice.

In [None]:
feat = #FIXME
feat. #FIXME
data = #FIXME

tica = #FIXME
tica_all = np.concatenate(tica.get_output())

cluster = #FIXME
dtrajs_all = #FIXME

its = pyemma.msm.its(
    cluster.dtrajs, lags=[1, 2, 5, 10, 20, 50], nits=4, errors='bayes', n_jobs=None)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
pyemma.plots.plot_feature_histograms(tica_all, ax=axes[0])
pyemma.plots.plot_density(*tica_all[:, :2].T, ax=axes[1], cbar=False, alpha=0.1)
axes[1].scatter(*cluster.clustercenters[:, :2].T, s=15, c='C1')
axes[1].set_xlabel('IC 1')
axes[1].set_ylabel('IC 2')
pyemma.plots.plot_implied_timescales(its, ax=axes[2], units='ps')
fig.tight_layout()

In [None]:
feat = pyemma.coordinates.featurizer(pdb)
feat.add_distances(feat.select_Heavy())
data = pyemma.coordinates.load(files, features=feat)

tica = pyemma.coordinates.tica(data, lag=3)
tica_all = np.concatenate(tica.get_output())

cluster = pyemma.coordinates.cluster_kmeans(tica, k=75, max_iter=50, stride=10)
dtrajs_all = np.concatenate(cluster.dtrajs)

its = pyemma.msm.its(
    cluster.dtrajs, lags=[1, 2, 5, 10, 20, 50], nits=4, errors='bayes', n_jobs=None)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
pyemma.plots.plot_feature_histograms(tica_all, ax=axes[0])
pyemma.plots.plot_density(*tica_all[:, :2].T, ax=axes[1], cbar=False, alpha=0.1)
axes[1].scatter(*cluster.clustercenters[:, :2].T, s=15, c='C1')
axes[1].set_xlabel('IC 1')
axes[1].set_ylabel('IC 2')
pyemma.plots.plot_implied_timescales(its, ax=axes[2], units='ps')
fig.tight_layout()

**Exercise 2**: let's see if your discretized data is suitable to converge five slow implied timescales using a Bayesian HMM.

In [None]:
pyemma.plots.plot_implied_timescales #FIXME

In [None]:
pyemma.plots.plot_implied_timescales(
    pyemma.msm.timescales_hmsm(
        cluster.dtrajs, 6, lags=[1, 2, 3, 4], errors='bayes'), units='ps');

**Exercise 3**: estimate a Bayesian HMM and perform a CK test.

In [None]:
hmm = #FIXME
pyemma.plots. #FIXME

In [None]:
hmm = pyemma.msm.bayesian_hidden_markov_model(cluster.dtrajs, 6, lag=1, dt_traj='0.001 ns')
pyemma.plots.plot_cktest(hmm.cktest(mlags=2), units='ps');

**Exercise 4**: now that you have a model, be creative and visualize the metastable regions in your projected space.

In [None]:
#FIXME

In [None]:
def draw_panel(ax, i, j):
    pyemma.plots.plot_state_map(
        *tica_all[:, [i, j]].T,
        hmm.metastable_assignments[dtrajs_all],
        ax=ax)
    ax.set_xlabel('IC {}'.format(i + 1))
    ax.set_ylabel('IC {}'.format(j + 1))

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
draw_panel(axes[0, 0], 0, 2)
draw_panel(axes[0, 1], 1, 2)
draw_panel(axes[1, 0], 0, 1)
axes[1, 1].set_axis_off()
fig.tight_layout()

## Wrapping up
In this notebook, we have learned how to use hidden Markov state models (HMMs) and how they differ from MSMs. In detail, we have used
- the `pyemma.msm.timescales_hmsm()` function to obtain an implied timescale object for HMMs,
- `pyemma.msm.estimate_hidden_markov_model()` to estimate a regular HMM,
- `pyemma.msm.bayesian_hidden_markov_model()` to estimate a Bayesian HMM, and
- the `metastable_assignments` attribute of an HMM object to access the metastable membership of discrete states.