# 00 - Showcase Pentapeptide: A PyEMMA walkthrough
In this notebook, we introduce the most basic features of PyEMMA. Overall, the notebook serves as an example workflow for analyzing molecular dynamics trajectories. We keep the details to an absolute minimum and refer to the more specialized notebooks, which flesh out each topic covered here in more detail and include exercises and some theory.

Maintainers: 

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" title='This work is licensed under a Creative Commons Attribution 4.0 International License.' align="left"/></a>

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import mdshare
import pyemma

In [None]:
mdshare.__version__

### Data input and featurization
We start our short walkthrough tutorial by loading a topology file (in this case, a PDB) and the trajectory data.

In [None]:
pdb = mdshare.fetch('pentapeptide-impl-solv.pdb', working_directory='data')
files = mdshare.fetch('pentapeptide-*-500ns-impl-solv.dcd', working_directory='data')

The above cell has downloaded the data from our servers; in general, only strings of file paths need to be provided.

In [None]:
print(pdb)
print(files)

Since it is not known in advance which feature describes the system best, we start with a broad, systematic analysis. For the sake of simplicity, we are only interested in modeling the backbone kinetics. Thus, we only consider features describing the backbone.

In PyEMMA, the `featurizer` is a central object that incorporates the system's topology. Features are computed by adding the target feature to it, e.g., with `featurizer.add_backbone_torsions()`. We will load backbone torsion angles, backbone heavy atom positions, and backbone heavy atom distances.

In [None]:
torsions_feat = pyemma.coordinates.featurizer(pdb)
torsions_feat.add_backbone_torsions(cossin=True)
torsions_data = pyemma.coordinates.load(files, features=torsions_feat)
labels = ['backbone\ntorsions']

positions_feat = pyemma.coordinates.featurizer(pdb)
positions_feat.add_selection(positions_feat.select_Backbone())
positions_data = pyemma.coordinates.load(files, features=positions_feat)
labels += ['backbone atom\npositions']

distances_feat = pyemma.coordinates.featurizer(pdb)
distances_feat.add_distances(
    distances_feat.pairs(distances_feat.select_Backbone(), excluded_neighbors=2))
distances_data = pyemma.coordinates.load(files, features=distances_feat)
labels += ['backbone atom\ndistances']

### Feature selection
#TODO: explain vamp-2 score.
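As a rough illustration of what such a score measures, the sketch below computes a VAMP-2 score with plain NumPy as the sum of squared singular values of the whitened time-lagged covariance matrix. The helpers `inv_sqrt` and `vamp2_sketch` and the synthetic data are illustrative assumptions, not PyEMMA functions; PyEMMA's own `score()` may differ in details such as regularization and the treatment of the constant function, so absolute numbers are not comparable. A feature with slow dynamics should score higher than uncorrelated noise.

```python
import numpy as np

def inv_sqrt(c, eps=1e-10):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(c)
    keep = w > eps  # drop numerically degenerate directions
    return (v[:, keep] / np.sqrt(w[keep])) @ v[:, keep].T

def vamp2_sketch(x, lag):
    """Sum of squared singular values of the whitened time-lagged covariance."""
    x0, xt = x[:-lag], x[lag:]
    x0 = x0 - x0.mean(axis=0)
    xt = xt - xt.mean(axis=0)
    n = len(x0)
    c00, ctt, c0t = x0.T @ x0 / n, xt.T @ xt / n, x0.T @ xt / n
    k = inv_sqrt(c00) @ c0t @ inv_sqrt(ctt)
    return float(np.sum(np.linalg.svd(k, compute_uv=False) ** 2))

# synthetic one-dimensional features: white noise and a slow AR(1) process
rng = np.random.default_rng(0)
noise = rng.normal(size=(5000, 1))
slow = np.zeros_like(noise)
for t in range(1, len(slow)):
    slow[t] = 0.95 * slow[t - 1] + noise[t]

score_slow = vamp2_sketch(slow, lag=1)
score_noise = vamp2_sketch(noise, lag=1)
print('slow feature: {:.3f}, noise: {:.3f}'.format(score_slow, score_noise))
```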

In [None]:
dim = 10

fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, lag in zip(axes.flat, [5, 10, 20]):
    torsions_vamp = pyemma.coordinates.vamp(torsions_data, lag=lag, dim=dim)
    scores = [torsions_vamp.score()]
    positions_vamp = pyemma.coordinates.vamp(positions_data, lag=lag, dim=dim)
    scores += [positions_vamp.score()]
    distances_vamp = pyemma.coordinates.vamp(distances_data, lag=lag, dim=dim)
    scores += [distances_vamp.score()]
    ax.bar(labels, scores)
    ax.set_title(r'lag time $\tau$={:.1f}ns'.format(lag * 0.1))
    if lag == 5:
        vamp_bars_plot = dict(labels=labels, scores=scores, dim=dim, lag=lag) # save for later
axes[0].set_ylabel('VAMP2 score')
fig.tight_layout()

In [None]:
lags = [1, 2, 5, 10, 20, 50]
dims = [i + 1 for i in range(10)]

fig, ax = plt.subplots()
for lag in lags:
    scores = [pyemma.coordinates.vamp(torsions_data, lag=lag, dim=dim).score()
              for dim in dims]
    ax.plot(dims, scores, label='lag={:.1f}ns'.format(lag * 0.1))
ax.legend()
ax.set_xlabel('number of dimensions')
ax.set_ylabel('VAMP2 score')
fig.tight_layout()

#TODO: Explain why we chose lag 5

### Coordinate transform and discretization

We perform TICA using the lag time obtained from the VAMP-2 score. Please note the general PyEMMA API that applies to all estimators. By calling the TICA estimator with the data (`tica = pyemma.coordinates.tica(torsions_data)`), the estimation is done and an estimator instance (`tica`) is returned. It contains all the information about the estimated model, and the transformed input data is obtained by calling `tica.get_output()`.

In [None]:
tica = pyemma.coordinates.tica(torsions_data, lag=5)
tica_output = tica.get_output()
tica_concatenated = np.concatenate(tica_output)

We visualize the marginal and joint distributions of our TICA components by simple histogramming.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
pyemma.plots.plot_feature_histograms(tica_concatenated, ax=axes[0])
pyemma.plots.plot_density(*tica_concatenated.T[:2], ax=axes[1], logscale=True)
fig.tight_layout()

We note that the projection yields well-defined clusters of high density, which are likely to correspond to metastable basins.

Let’s have a look at what one of the trajectories looks like in the space of the first $4$ TICA components. We can see that the TICA components nicely resolve the slow transitions as discrete jumps. Thus, metastability is well-described in this projection.

In [None]:
fig, axes = plt.subplots(4, 1, figsize=(12, 5), sharex=True)
x = 0.1 * np.arange(tica_output[0].shape[0])
for i, (ax, tic) in enumerate(zip(axes.flat, tica_output[0].T)):
    ax.plot(x, tic)
    ax.set_ylabel('IC {}'.format(i + 1))
axes[-1].set_xlabel('time / ns')
fig.tight_layout()

The TICA coordinates are now clustered into a number of discrete states using the $k$-means algorithm. The $k$-means algorithm requires the number of clusters as input. The trajectories are automatically assigned to the cluster centers, and the resulting discrete trajectories are available as `cluster.dtrajs`.
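Conceptually, a discrete trajectory stores, for each frame, the index of the nearest cluster center. A minimal NumPy sketch of this assignment step follows; `frames` and `centers` are made-up toy arrays and `assign_to_centers` is a hypothetical helper, not a PyEMMA function.

```python
import numpy as np

def assign_to_centers(frames, centers):
    """Return the index of the closest center (Euclidean) for every frame."""
    # pairwise distances, shape (n_frames, n_centers)
    dist = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

centers = np.array([[0.0, 0.0], [5.0, 5.0]])
frames = np.array([[0.2, -0.1], [4.8, 5.3], [0.9, 1.1], [3.0, 4.0]])
dtraj = assign_to_centers(frames, centers)
print(dtraj)  # one state index per frame
```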

In [None]:
cluster = pyemma.coordinates.cluster_kmeans(tica_output, k=200, max_iter=50, stride=10)
dtrajs_concatenated = np.concatenate(cluster.dtrajs)

We can check the location of our discrete states by plotting them onto the density of our data in the first two TICA dimensions. The cluster centers are contained in the `cluster` object.

In [None]:
fig, ax = plt.subplots()
pyemma.plots.plot_density(*tica_concatenated.T[:2], ax=ax, cbar=False, alpha=0.3)
ax.scatter(*cluster.clustercenters.T[:2], s=5, c='C1')
ax.set_xlabel('IC 1')
ax.set_ylabel('IC 2')
fig.tight_layout()

The states are well distributed in TICA space.

### MSM estimation and validation

Now, we calculate the implied timescales with `pyemma.msm.its()` up to a lag time of $50$ steps, which is equivalent to $5\,\mathrm{ns}$ for this dataset ($\Delta t=0.1\,\mathrm{ns}$). The uncertainty of the implied timescales is quantified based on Markov models sampled according to a Bayesian scheme. If this is too time-consuming, maximum likelihood MSMs can be used instead by setting the `errors` keyword argument to `None`.

Please note that instead of a single number (`lags=50`), an array can be passed to compute the ITS at defined lag times. When we pass an integer $K$, a set of lag times from $\Delta t$ to $K\Delta t$ is generated, using a multiplier of $1.5$ between successive lag times.
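To get a feeling for the resulting lag time grid, here is a small sketch of such a geometric series. The function `generate_lags`, its rounding rule (round half up), and the choice to always append the maximum lag are assumptions made for illustration; the list PyEMMA generates internally may differ.

```python
import math

def generate_lags(maxlag, multiplier=1.5):
    """Integer lag times from 1 to maxlag, growing by roughly `multiplier`."""
    lags = [1]
    lag = 1
    while True:
        lag = math.floor(lag * multiplier + 0.5)  # round half up (assumed rule)
        if lag > maxlag:
            break
        lags.append(lag)
    if lags[-1] != maxlag:
        lags.append(maxlag)  # always include the requested maximum (assumed)
    return lags

print(generate_lags(50))  # -> [1, 2, 3, 5, 8, 12, 18, 27, 41, 50]
```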

In [None]:
its = pyemma.msm.its(cluster.dtrajs, lags=50, nits=10, errors='bayes')
pyemma.plots.plot_implied_timescales(its, units='ns', dt=0.1);

The implied timescales converge quickly. Above $0.5$ ns, the implied timescales of the slowest processes are constant within error. We thus select a lag time of $5$ steps ($0.5$ ns) to build a Markov model. As a quick check we print the fraction of states and counts that are in the active set. 

Please note the similarity of the `msm` object to the `tica` object. Both are estimator instances and contain all the relevant information from the estimation and methods for validation and further analysis. 

In order to keep track of our trajectory time step, a `dt_traj` keyword argument can be passed that contains the trajectory time step unit.

In [None]:
msm = pyemma.msm.bayesian_markov_model(cluster.dtrajs, lag=5, dt_traj='0.1 ns')
print('fraction of states used = {:.2f}'.format(msm.active_state_fraction))
print('fraction of counts used = {:.2f}'.format(msm.active_count_fraction))

The model is validated with a Chapman-Kolmogorov test. We use the implied timescales plot as a heuristic to estimate the number of metastable states to test for. We can resolve $3$ slow processes up to lag times of $5$ ns. Since the Chapman-Kolmogorov test involves estimations at higher lag times, we will attempt to capture those processes by choosing $4$ metastable states.

In [None]:
nstates = 4
cktest = msm.cktest(nstates)
pyemma.plots.plot_cktest(cktest, dt=0.1, units='ns');

Assuming $4$ metastable states yields a passing Chapman-Kolmogorov test.
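The idea behind the test can be sketched on a toy system: a model estimated at lag time $\tau$ predicts the transition probabilities at lag time $k\tau$ as $T(\tau)^k$, which is compared to a model estimated directly at $k\tau$. The two-state chain, the helper `estimate_tmatrix`, and the simple non-reversible count normalization below are illustrative assumptions; PyEMMA's `cktest` performs the comparison on the level of metastable sets.

```python
import numpy as np

def estimate_tmatrix(dtraj, lag, n_states):
    """Row-normalized sliding-window transition counts at the given lag."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1)
    return counts / counts.sum(axis=1, keepdims=True)

# simulate a toy two-state Markov chain (made-up transition matrix)
true_t = np.array([[0.9, 0.1], [0.2, 0.8]])
rng = np.random.default_rng(1)
u = rng.random(20000)
dtraj = np.zeros(20000, dtype=int)
for t in range(1, len(dtraj)):
    # jump to state 0 with probability T[previous, 0], otherwise to state 1
    dtraj[t] = 0 if u[t] < true_t[dtraj[t - 1], 0] else 1

k = 3
predicted = np.linalg.matrix_power(estimate_tmatrix(dtraj, 1, 2), k)
estimated = estimate_tmatrix(dtraj, k, 2)
max_dev = np.abs(predicted - estimated).max()
print('largest deviation between prediction and estimate: {:.4f}'.format(max_dev))
```

For Markovian data such as this toy chain, prediction and estimate agree up to statistical noise.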

### MSM spectral analysis and PCCA

From the MSM object `msm`, various properties can be obtained. We start the spectral analysis by looking at implied timescales. 
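Implied timescales follow from the eigenvalues $\lambda_i(\tau)$ of the transition matrix estimated at lag time $\tau$ via $t_i = -\tau / \ln|\lambda_i(\tau)|$, where the stationary eigenvalue $\lambda_1 = 1$ is skipped. The sketch below applies this relation to a made-up two-state transition matrix; `implied_timescales` is a hypothetical helper, not the PyEMMA method.

```python
import numpy as np

def implied_timescales(tmatrix, lag_time):
    """t_i = -lag_time / ln|lambda_i| for all eigenvalues except lambda_1 = 1."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(tmatrix)))[::-1]
    return -lag_time / np.log(eigvals[1:])

tmatrix = np.array([[0.9, 0.1], [0.2, 0.8]])  # toy transition matrix
ts = implied_timescales(tmatrix, lag_time=0.5)  # e.g. 5 steps of 0.1 ns
print(ts)  # timescales in the same unit as lag_time
```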

In [None]:
nits = 15
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].errorbar(range(1, nits+1), msm.timescales(k = nits), yerr = msm.sample_std('timescales', nits), fmt='.')
axes[1].plot(range(1, nits), msm.timescales(k = nits)[:-1]/msm.timescales(k = nits)[1:], '.')
axes[0].set_ylabel('implied timescales / ns')
axes[1].set_ylabel('timescale fraction / unitless')
for i, ax in enumerate(axes):
    ax.set_xlabel('implied timescale index')
    ax.set_xticks(range(1, nits+1))
    ax.grid('on', axis='x')
    if i == 0:
        ax.hlines(msm.lag*0.1, 1, nits, lw=1.5, linestyles="--")
    elif i == 1:
        ax.set_xticks(range(1, nits))
        ax.set_xticklabels(["{:d}/{:d}".format(k, j) for k,j in zip(range(1,nits+2), range(2,nits+1))], rotation=45)
        ax.set_xlabel('implied timescale indices')

fig.tight_layout()

As we see, PyEMMA sorts the implied timescales (and their corresponding eigenfunctions) in descending order. From the timescale fraction (the ratio of successive timescales) we can see that the largest timescale gap, within the time resolution of the model, is between the 3rd and 4th process, suggesting that $4$ metastable states may be a good choice for coarse-graining. We discuss this further below.

We go on by analyzing the stationary distribution and the free energy computed over the first two TICA coordinates. The stationary distribution, $\pi$, is stored in `msm.pi` or  (as an alias) `msm.stationary_distribution`. We compute the free energy landscape by re-weighting the trajectory frames with stationary probabilities from the MSM (returned by  `msm.trajectory_weights()`).

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharex=True, sharey=True)
pyemma.plots.plot_contour(
    *tica_concatenated.T[:2],
    msm.pi[dtrajs_concatenated],
    ax=axes[0],
    mask=True,
    cbar_label='stationary distribution')
pyemma.plots.plot_free_energy(
    *tica_concatenated.T[:2],
    weights=np.concatenate(msm.trajectory_weights()),
    ax=axes[1],
    legacy=False)
for ax in axes.flat:
    ax.set_xlabel('IC 1')
axes[0].set_ylabel('IC 2')
fig.tight_layout()

The eigenvectors corresponding to the slowest processes (largest implied timescales) contain information about which configurational changes happen on which timescales. We analyze the slowest processes by inspecting the values of the first four eigenfunctions projected onto the first two TICA coordinates. As the first right eigenvector corresponds to the stationary process (equilibrium), it is constant at $1$.

In [None]:
eigvec = msm.eigenvectors_right()
print('The first eigenvector is one: {} (min={}, max={})'.format(
    np.allclose(eigvec[:, 0], 1, atol=1e-15), eigvec[:, 0].min(), eigvec[:, 0].max()))

fig, axes = plt.subplots(1, 4, figsize=(15, 3))
for i, ax in enumerate(axes.flat):
    pyemma.plots.plot_contour(
        *tica_concatenated.T[:2],
        eigvec[dtrajs_concatenated, i + 1],
        ax=ax,
        cmap='PiYG',
        cbar_label='{}. right eigenvector'.format(i + 2),
        mask=True)
    ax.set_xlabel('IC 1')
axes[0].set_ylabel('IC 2')
fig.tight_layout()

The eigenvectors of the MSM contain information about which conformational changes happen on a given timescale, namely the corresponding implied timescale. Specifically, conformational states in regions of configuration space with negative values of a given eigenvector exchange with the regions with positive values of the same eigenvector. The relaxation timescale of this exchange process is exactly the implied timescale. Since the eigenvectors are internally sorted according to their eigenvalues, they correspond to the four slowest processes of the implied timescale plot. We see that, indeed, the slowest processes occur between the dense clusters in the TICA projection.

Next, we do a PCCA coarse graining into a user defined number of macrostates. As already discussed, $4$ is a good choice for this example.

In [None]:
msm.pcca(nstates);

We have now determined the probability for each microstate to belong to a given macrostate. These are encoded by the PCCA memberships to a given macrostate.

In [None]:
fig, axes = plt.subplots(1, msm.n_metastable, figsize=(15, 3), sharex=True, sharey=True)
for i, ax in enumerate(axes.flat):
    pyemma.plots.plot_contour(
        *tica_concatenated.T[:2],
        msm.metastable_distributions[i][dtrajs_concatenated],
        ax=ax,
        cmap='afmhot_r', 
        mask=True,
        method='nearest',
        cbar_label='metastable distribution {}'.format(i))
    ax.set_xlabel('IC 1')
axes[0].set_ylabel('IC 2')
fig.tight_layout()

As we see, microstates are fuzzily assigned to macrostates by membership probabilities. If desired, crisp assignments can be accessed via the attribute `msm.metastable_assignments`. Let's see how PCCA has coarse-grained our state space in the first two TICA projections.
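A crisp assignment can be thought of as picking, for every microstate, the macrostate with the largest membership. The sketch below shows this on a made-up membership matrix; it assumes an argmax rule and is not taken from the PyEMMA source.

```python
import numpy as np

# toy membership matrix: rows are microstates, columns are macrostates
memberships = np.array([[0.9, 0.1, 0.0],
                        [0.4, 0.5, 0.1],
                        [0.1, 0.2, 0.7],
                        [0.5, 0.3, 0.2]])

# crisp assignment: macrostate with the largest membership per microstate
crisp = memberships.argmax(axis=1)

# microstate indices grouped by macrostate
sets = [np.where(crisp == m)[0] for m in range(memberships.shape[1])]
print(crisp)
print(sets)
```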

In [None]:
fig, ax = plt.subplots(figsize=(8, 6))

metastable_traj = msm.metastable_assignments[dtrajs_concatenated]

pyemma.plots.plot_state_map(*tica_concatenated.T[:2], metastable_traj, ax=ax, zorder=-1)

ax.set_xlabel('IC 1')
ax.set_ylabel('IC 2')
ax.set_xlim(-2, 8)
ax.set_ylim(-2, 8)
fig.tight_layout()

PCCA has nicely separated our state space. For each macrostate we can generate a number of representative sample structures and store them into a trajectory file for visual inspection. The following cell writes trajectory files to hard disc.

In [None]:
pcca_samples = msm.sample_by_distributions(msm.metastable_distributions, 10)
torsions_source = pyemma.coordinates.source(files, features=torsions_feat)
pyemma.coordinates.save_trajs(
    torsions_source,
    pcca_samples,  
    outfiles=['./data/pcca{}_10samples.xtc'.format(n) for n in range(msm.n_metastable)])

Alternatively, one can visualize the structures in this notebook as follows:

In [None]:
def visualize_metastable(samples, cmap, selection='backbone'):
    """ visualize metastable states
    Parameters
    ----------
    samples: list of mdtraj.Trajectory objects
        each element contains all samples for one metastable state.
    cmap: matplotlib.colors.ListedColormap
        color map used to visualize metastable states before.
    selection: str
        which part of the molecule to select for visualization. For details have a look here:
        http://mdtraj.org/latest/examples/atom-selection.html#Atom-Selection-Language
    """
    import nglview
    from matplotlib.colors import to_hex

    widget = nglview.NGLWidget()
    widget.clear_representations()
    ref = samples[0]
    for i, s in enumerate(samples):
        s = s.superpose(ref)
        s = s.atom_slice(s.top.select(selection))
        comp = widget.add_trajectory(s)
        comp.add_ball_and_stick()

    # this has to be done in a separate loop for whatever reason...
    x = np.linspace(0, 1, num=len(samples))
    for i, x_ in enumerate(x):
        c = to_hex(cmap(x_))
        widget.update_ball_and_stick(color=c, component=i, repr_index=i)
        widget.remove_cartoon(component=i)
    return widget

In [None]:
my_samples = [pyemma.coordinates.save_traj(files, idist, outfile=None, top=pdb)
              for idist in msm.sample_by_distributions(msm.metastable_distributions, 50)]

cmap = mpl.cm.get_cmap('viridis', nstates)
visualize_metastable(my_samples, cmap, selection='backbone')

This coarse-grained representation of the dynamics is more directly amenable to human interpretation. Nevertheless, as for the conventional MSM, we can still compute several interesting properties. We start with the stationary distribution, which encodes the free energy of the states. The free energy of a coarse-grained state $\mathcal{S}_i$ is obtained by summing the stationary probabilities of all its microstates:
$$
G_i = - k_\mathrm{B} T \ln \left( \sum_{j\in \mathcal{S}_i} \pi_j \right)
$$

In [None]:
print('state\tπ\t\tG/kT')
for i, s in enumerate(msm.metastable_sets):
    p = msm.pi[s].sum()
    print('{}\t{:f}\t{:f}'.format(i, p, -np.log(p)))

Knowing PCCA metastable states, we can also extract mean first passage times (MFPTs) and transition rates between them. They can be simply printed...

In [None]:
from itertools import product

mfpt = np.zeros((nstates, nstates))
for i, j in product(range(nstates), repeat = 2):
    mfpt[i, j] = msm.mfpt(
        msm.metastable_sets[i],
        msm.metastable_sets[j])
        
rate = np.zeros_like(mfpt)
nz = mfpt.nonzero()
rate[nz] = 1.0 / mfpt[nz]

print('from/to\t|' + '_______'.join([str(n) for n in range(msm.n_metastable)]))
print('\n'.join(['{}\t|'.format(n) + '\t'.join(np.round(m, 1).astype(str)) for n, m in enumerate(mfpt)]))

... or displayed in a network plot. Here, we scale the arrows according to the transition rates and annotate them with MFPTs.

In [None]:
fig, ax = plt.subplots(figsize=(6, 6))
pyemma.plots.plot_network(
   rate,
   pos=np.asarray([[np.cos(x + 0.4), np.sin(x + 0.5)] for x in np.linspace(0, 2 * np.pi, msm.n_metastable, endpoint=False)]),
   figpadding=0.2,
   state_scale=0.3,
   arrow_label_format='%.1f ns',
   arrow_labels=mfpt,
   arrow_scale=0.5,
   size=8,
   ax=ax);
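For intuition, the MFPT into a target set can be obtained from a linear system: for every state $i$ outside the target, $m_i = \tau + \sum_{j \notin \mathrm{target}} T_{ij} m_j$. The sketch below solves this for a made-up two-state transition matrix; `mfpt_to_set` is a hypothetical helper. It returns one MFPT per starting state and does not average over a starting set as a set-to-set MFPT would.

```python
import numpy as np

def mfpt_to_set(tmatrix, target, lag_time=1.0):
    """MFPT from every state to the set `target`, in units of lag_time."""
    n = tmatrix.shape[0]
    rest = np.setdiff1d(np.arange(n), target)
    # m_i = lag_time + sum_{j not in target} T_ij m_j for i outside the target
    a = np.eye(len(rest)) - tmatrix[np.ix_(rest, rest)]
    m = np.zeros(n)  # states inside the target have MFPT zero
    m[rest] = np.linalg.solve(a, lag_time * np.ones(len(rest)))
    return m

tmatrix = np.array([[0.9, 0.1], [0.2, 0.8]])  # toy transition matrix
m = mfpt_to_set(tmatrix, target=[1], lag_time=0.5)
rate = 1.0 / m[0]  # transition rate as the inverse MFPT
print(m, rate)
```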

### Transition path theory
Further, the flux between metastable states can be computed and coarse-grained as follows. As an example, we choose metastable states $1$ and $2$ (zero-based indices into `msm.metastable_sets`), between which the flux is computed.

In [None]:
start, final = 1, 2
A = msm.metastable_sets[start]
B = msm.metastable_sets[final]
flux = pyemma.msm.tpt(msm, A, B)

cg, cgflux = flux.coarse_grain(msm.metastable_sets)

The committor as projected in the first two TICA dimensions can be displayed with a filled contour plot.

In [None]:
fig, ax = plt.subplots(figsize=(8, 6))

pyemma.plots.plot_contour(
    *tica_concatenated.T[:2], flux.committor[np.concatenate(cluster.dtrajs)], cmap='brg', ax=ax,
    mask=True, method='nearest', cbar_label=r'committor {} $\to$ {}'.format(start, final))

ax.set_xlim(-2, 8)
ax.set_ylim(-2, 8)
fig.tight_layout()

We find that the committor is constant within the metastable sets defined above. Transition regions can be identified by committor values $\approx 0.5$.
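The forward committor $q_i$ is the probability of reaching the final set $B$ before the start set $A$ when starting in state $i$. It is $0$ in $A$, $1$ in $B$, and satisfies $q_i = \sum_j T_{ij} q_j$ elsewhere. A minimal sketch for a made-up three-state transition matrix follows; `forward_committor` is a hypothetical helper, not the PyEMMA implementation.

```python
import numpy as np

def forward_committor(tmatrix, a_set, b_set):
    """Probability to reach b_set before a_set, for every state."""
    n = tmatrix.shape[0]
    inter = np.setdiff1d(np.arange(n), np.concatenate([a_set, b_set]))
    q = np.zeros(n)  # q = 0 in the start set
    q[b_set] = 1.0   # q = 1 in the final set
    # (I - T_II) q_I = T_IB 1 for the intermediate states I
    lhs = np.eye(len(inter)) - tmatrix[np.ix_(inter, inter)]
    rhs = tmatrix[np.ix_(inter, b_set)].sum(axis=1)
    q[inter] = np.linalg.solve(lhs, rhs)
    return q

tmatrix = np.array([[0.8, 0.2, 0.0],
                    [0.1, 0.8, 0.1],
                    [0.0, 0.3, 0.7]])
q = forward_committor(tmatrix, a_set=[0], b_set=[2])
print(q)
```

In this toy example the middle state is equally likely to proceed to either end state, so it plays the role of a transition state.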

### Computing experimental observables

Having thoroughly constructed, validated, and analyzed our MSM above, we may want to take the next step and compare our model to experimental data. `PyEMMA` enables computation of stationary as well as dynamic experimental observables; below we give some examples. We will make use of some external library functionality provided by `MDTraj`.

In [None]:
from joblib import Parallel, delayed
from mdtraj import shrake_rupley, compute_rg

# We compute a maximum likelihood MSM for comparison
mlmsm = pyemma.msm.estimate_markov_model(cluster.dtrajs, lag=5, dt_traj='0.1 ns')

We will pre-compute the experimental observables in the following cell. It may take a few minutes to complete. It can be sped up by increasing the `NJOBS` variable to the number of cores you have available.

In [None]:
NJOBS = 4
markov_samples = [smpl
                  for smpl in msm.sample_by_state(20)]
_samples = Parallel(n_jobs = NJOBS)(delayed(pyemma.coordinates.save_traj)(files, smpl, outfile=None, top=pdb) for smpl in markov_samples )
markov_sasa_all = Parallel(n_jobs = NJOBS)(delayed(shrake_rupley)(sample, mode='residue')
                           for sample in _samples)
markov_rg_all = Parallel(n_jobs = NJOBS)(delayed(compute_rg)(sample)
                           for sample in _samples)

markov_average_trp_sasa = np.array(markov_sasa_all).mean(axis=1)[:, 0]
markov_average_rg = np.array(markov_rg_all).mean(axis=1)

### Radius of gyration
The squared radius of gyration $r^2_g(x)$ is a measure of the overall dimensions of a set of particles in configuration $x$. It is a quantity often extracted from light-scattering experiments. For proteins and nucleic acids, these experiments are usually performed on bulk samples, so the observable is stationary and averaged over the Boltzmann distribution:
$$ 
 R_{g, \mathrm{obs}}^2 = \mathcal{Z}^{-1}\int_{\Omega} \mathrm{d}x\, r^2_g(x) \exp\left(-\frac{E(x)}{kT}\right).
$$
This value is also called the _expectation value of $r^2_g(x)$ with respect to the Boltzmann distribution_. Since we have access to the stationary distribution from the Markov model we can approximate this continuous integral by the sum:

$$
 R_{g, \mathrm{obs}}^2 \approx \sum_i \pi_i \mathbf{r^2_g}_i 
$$
with
$$ \mathbf{r^2_g}_i  = \frac{1}{\pi_i\mathcal{Z}} \int_{x\in S_i} \mathrm{d}x\, r^2_g(x)\exp(-\beta E(x)). $$
The last expression constitutes the average of the experimental observable ($r^2_g(x)$) confined to each of our Markov states individually.

Above we pre-computed the vector $\mathbf{r^2_g}=\{\mathbf{r^2_g}_i\}$ as `markov_average_rg`. Using this vector and our estimated Markov model we can compute the expectation value using the `expectation` method.
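The discrete sum above is simply a dot product between the stationary distribution and the vector of per-state averages. A toy sketch with made-up numbers (three states, values in arbitrary units):

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])       # toy stationary distribution, sums to 1
obs = np.array([0.40, 0.55, 0.70])   # toy per-state averages of the observable

# expectation value: sum_i pi_i * obs_i
expectation = float(np.dot(pi, obs))
print('{:.3f}'.format(expectation))
```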

In [None]:
print("The average radius of gyration of penta-peptide is {:.3f} nm".format(msm.expectation(markov_average_rg)))

Since we have estimated a Bayesian MSM, we can also compute the uncertainty in our prediction of the observable as standard deviations or confidence intervals:

In [None]:
print("The standard deviation of our prediction of the average radius of gyration of penta-peptide is {:.9f} nm".format(msm.sample_std('expectation', markov_average_rg)))
print("The {:d}% CI of our prediction of the average radius of gyration of penta-peptide have the bounds ({:.5f}, {:.5f})".format(int(msm.conf*100), *msm.sample_conf('expectation', markov_average_rg)))

So our model is very confident in the prediction of the radius of gyration. However, this does not guarantee that it is accurate, that is, that it agrees with experimental measurements. If we lack quantitative agreement with experiments, we can estimate MSMs which optimally balance experimental data and simulation data using the Augmented Markov model (AMM) procedure. A dedicated AMM tutorial can be [found here](http://www.emma-project.org/latest/generated/augmented_markov_model_walkthrough.html).


### Trp-fluorescence auto-correlation

Fluctuations in the tryptophan fluorescence can be measured using spectroscopic techniques. These fluctuations depend, among other things, on the solvent accessible surface area (SASA) of tryptophan residues. Here, we use a third-party library (MDTraj) to estimate the SASA with the Shrake-Rupley algorithm (computed above). Since we are not interested in inspecting the experimental observable as a function of time, and since the computation can be expensive for large datasets, we instead sample a representative set of configurations for each of our Markov states. Hint: A similar strategy can be used if expensive external software, including ab initio calculations, is needed to compute the observables.
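Given per-state averages $a_i$ of an observable, its equilibrium autocorrelation under an MSM can be written as $\mathrm{acf}(k\tau) = \sum_{i,j} a_i \pi_i [T^k]_{ij} a_j$. The sketch below evaluates this expression for a made-up two-state model; the helper names and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def stationary_distribution(tmatrix):
    """Left eigenvector of the transition matrix for eigenvalue 1, normalized."""
    w, v = np.linalg.eig(tmatrix.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    return pi / pi.sum()

def autocorrelation(tmatrix, obs, n_steps):
    """acf(k) = sum_ij obs_i pi_i [T^k]_ij obs_j for k = 0 .. n_steps - 1."""
    pi = stationary_distribution(tmatrix)
    acf = []
    tk = np.eye(len(obs))  # T^0
    for _ in range(n_steps):
        acf.append(float((pi * obs) @ tk @ obs))
        tk = tk @ tmatrix
    return np.array(acf)

tmatrix = np.array([[0.9, 0.1], [0.2, 0.8]])  # toy transition matrix
obs = np.array([1.0, 2.0])                    # toy per-state observable
acf = autocorrelation(tmatrix, obs, n_steps=50)
print(acf[0], acf[-1])
```

At long times the autocorrelation decays to the squared equilibrium mean of the observable.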


Let's compute an auto-correlation function of tryptophan fluorescence:

In [None]:
eq_time_ml, eq_acf_ml = mlmsm.correlation(markov_average_trp_sasa, maxtime=15)

eq_time_bayes, eq_acf_bayes = msm.sample_mean(
    'correlation',
    np.array(markov_average_trp_sasa),
    maxtime=15)

eq_acf_bayes_ci_l, eq_acf_bayes_ci_u = msm.sample_conf(
    'correlation',
    np.array(markov_average_trp_sasa),
    maxtime=15)

fig, ax = plt.subplots(figsize=(5, 4))
ax.plot(eq_time_ml, eq_acf_ml, color='orange', marker='.', label='ML MSM')
ax.plot(
    eq_time_bayes, eq_acf_bayes, ls='--',marker='x',
    color='teal', label='Bayes sample mean')
ax.fill_between(
    eq_time_bayes, eq_acf_bayes_ci_l[1], eq_acf_bayes_ci_u[1],
    color='teal', alpha=0.2, lw=0)
ax.semilogx()

ax.set_xlim((eq_time_ml[1], eq_time_ml[-1]))
ax.set_xlabel(r'time / $\mathrm{ns}$')
ax.set_ylabel(r'Trp-1 SASA ACF / $\mathrm{nm}^4$')

ax.legend()
fig.tight_layout()

This amplitude is likely too small to be experimentally measurable if we consider experimental uncertainty. Using stopped flow, T-jump, P-jump, or other similar experimental setups, we can prepare our ensemble in a non-equilibrium initial condition. With `PyEMMA` we can simulate such a scenario using the `relaxation` method.
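In a relaxation experiment, the ensemble starts from a non-equilibrium distribution $p_0$ and the expected signal after $k$ lag times is $\sum_{i,j} p_{0,i} [T^k]_{ij} a_j$. A minimal sketch with a made-up two-state model follows; the `relaxation` function defined here is a toy helper, not the PyEMMA method of the same name.

```python
import numpy as np

def relaxation(tmatrix, p0, obs, n_steps):
    """Expected observable while the distribution p0 relaxes to equilibrium."""
    p = np.asarray(p0, dtype=float)
    values = []
    for _ in range(n_steps):
        values.append(float(p @ obs))
        p = p @ tmatrix  # propagate the distribution by one lag time
    return np.array(values)

tmatrix = np.array([[0.9, 0.1], [0.2, 0.8]])  # toy transition matrix
obs = np.array([1.0, 2.0])                    # toy per-state observable
signal = relaxation(tmatrix, p0=[1.0, 0.0], obs=obs, n_steps=50)
print(signal[0], signal[-1])
```

The signal starts at the value of the initially populated state and converges to the equilibrium expectation.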

In [None]:
eq_time_ml, eq_acf_ml = mlmsm.relaxation(
    msm.metastable_distributions[0],
    markov_average_trp_sasa,
    maxtime=15)

eq_time_bayes, eq_acf_bayes = msm.sample_mean(
    'relaxation',
    msm.metastable_distributions[0],
    np.array(markov_average_trp_sasa),
    maxtime=15)

eq_acf_bayes_CI_l, eq_acf_bayes_CI_u = msm.sample_conf(
    'relaxation', 
    msm.metastable_distributions[0],
    np.array(markov_average_trp_sasa),
    maxtime=15)

fig, ax = plt.subplots(figsize=(5, 4))
ax.semilogx(eq_time_ml, eq_acf_ml, color='orange', marker='^', label='ML MSM')
ax.plot(
    eq_time_bayes, eq_acf_bayes, ls='--',
    color='teal', label='Bayes sample mean')
ax.fill_between(
    eq_time_bayes, eq_acf_bayes_CI_l[1], eq_acf_bayes_CI_u[1],
    color='teal', alpha=0.2, lw=0)
ax.semilogx()

ax.set_xlim((eq_time_ml[1], eq_time_ml[-1]))
ax.set_xlabel(r'time / $\mathrm{ns}$')
ax.set_ylabel(r'Average Trp-1 SASA / $\mathrm{nm}^2$')

ax.legend()
fig.tight_layout()

In [None]:
from matplotlib.gridspec import GridSpec
from matplotlib.ticker import LogLocator
from matplotlib.cm import get_cmap

mpl.rcParams['axes.titlesize'] = 8
mpl.rcParams['axes.labelsize'] = 8
mpl.rcParams['font.size'] = 8
mpl.rcParams['legend.fontsize'] = 5
mpl.rcParams['xtick.labelsize'] = 5
mpl.rcParams['ytick.labelsize'] = 5
mpl.rcParams['xtick.minor.pad'] = 3
mpl.rcParams['xtick.major.pad'] = 3
mpl.rcParams['ytick.minor.pad'] = 3
mpl.rcParams['ytick.major.pad'] = 3
mpl.rcParams['axes.labelpad'] = 1
mpl.rcParams['lines.markersize'] = 4

In [None]:
fig = plt.figure(figsize=(3.47, 4))
gw = int(np.floor(0.5 + 1000 * fig.get_figwidth()))
gh = int(np.floor(0.5 + 1000 * fig.get_figheight()))
gs = plt.GridSpec(gh, gw)
gs.update(hspace=0.0, wspace=0.0, left=0.0, right=1.0, bottom=0.0, top=1.0)

ax_box = fig.add_subplot(gs[:, :])
ax_box.set_axis_off()
ax_box.text(0.00, 0.97, '(a)', size=10)
ax_box.text(0.00, 0.55, '(b)', size=10)
ax_box.text(0.55, 0.97, '(c)', size=10)
ax_box.text(0.00, 0.30, '(d)', size=10)

ax_feat = fig.add_subplot(gs[200:1300, 400:1800])
ax_feat.bar(vamp_bars_plot['labels'], vamp_bars_plot['scores'], 0.5)
ax_feat.tick_params(axis='x', labelrotation=35)
ax_feat.set_ylabel('VAMP2 score')
ax_feat.set_title(r'lag time $\tau$={:.1f}ns'.format(vamp_bars_plot['lag'] * 0.1))

ax_density = fig.add_subplot(gs[400:1550, 2200:3350])
_, _, misc = pyemma.plots.plot_density(
    *tica_concatenated.T[:2],
    ax=ax_density,
    cax=fig.add_subplot(gs[300:350, 2200:3350]),
    cbar_orientation='horizontal',
    logscale=True)
misc['cbar'].set_ticks(LogLocator(base=10.0))
misc['cbar'].ax.xaxis.set_ticks_position('top')
misc['cbar'].ax.xaxis.set_label_position('top')
ax_density.set_xlabel('IC 1')
ax_density.set_ylabel('IC 2')

x = 0.1 * np.arange(tica_output[0].shape[0])
ax_tic1 = fig.add_subplot(gs[1850:2200, 400:3350])
ax_tic2 = fig.add_subplot(gs[2200:2550, 400:3350])

ax_tic1.plot(x, tica_output[0][:, 0])
ax_tic2.plot(x, tica_output[0][:, 1])
ax_tic1.set_ylabel('IC 1')
ax_tic2.set_ylabel('IC 2')
ax_tic2.set_xlabel('time / ns')

ax_its = fig.add_subplot(gs[2870:3670, 400:3350])
pyemma.plots.plot_implied_timescales(its, units='ns', dt=0.1, ax=ax_its, nits=4, ylog=False)
ax_its.set_ylim(1, 15)
ax_its.set_xlabel(r'lag time $\tau$ / ns')

fig.savefig('figure_1.pdf', dpi=300)

In [None]:
fig, axes = pyemma.plots.plot_cktest(cktest, figsize=[3.47, 3.47], dt=0.1, units='ns')
for ax in axes[-1, :]:
    ax.set_xlabel(r'$\tau$ / ns')
for ax in axes[:, 0]:
    ax.set_ylabel('prob.')
fig.savefig('figure_2.pdf', dpi=300)

In [None]:
fig = plt.figure(figsize=(3.47, 4))
gw = int(np.floor(0.5 + 1000 * fig.get_figwidth()))
gh = int(np.floor(0.5 + 1000 * fig.get_figheight()))
gs = plt.GridSpec(gh, gw)
gs.update(hspace=0.0, wspace=0.0, left=0.0, right=1.0, bottom=0.0, top=1.0)

ax_box = fig.add_subplot(gs[:, :])
ax_box.set_axis_off()
ax_box.text(0.00, 0.97, '(a)', size=10)
ax_box.text(0.52, 0.97, '(b)', size=10)
ax_box.text(0.00, 0.50, '(c)', size=10)
ax_box.text(0.52, 0.50, '(d)', size=10)

ax_fe = fig.add_subplot(gs[400:1750, 400:1750])
_, _, misc = pyemma.plots.plot_free_energy(
    *tica_concatenated.T[:2],
    weights=np.concatenate(msm.trajectory_weights()),
    ax=ax_fe,
    cax=fig.add_subplot(gs[300:350, 400:1750]),
    cbar_orientation='horizontal',
    legacy=False)
misc['cbar'].set_ticks(np.linspace(0, 8, 5))
misc['cbar'].ax.xaxis.set_ticks_position('top')
misc['cbar'].ax.xaxis.set_label_position('top')
ax_fe.set_ylabel('IC 2')
ax_fe.set_xticklabels([])

ax_state = fig.add_subplot(gs[400:1750, 2000:3350])
_, _, misc = pyemma.plots.plot_state_map(
    *tica_concatenated.T[:2],
    metastable_traj,
    ax=ax_state,
    cax=fig.add_subplot(gs[300:350, 2000:3350]),
    cbar_label='metastable state',
    cbar_orientation='horizontal')
misc['cbar'].ax.xaxis.set_ticks_position('top')
misc['cbar'].ax.xaxis.set_label_position('top')
ax_state.set_xticklabels([])
ax_state.set_yticklabels([])

evec_idx = 1
ax_eig = fig.add_subplot(gs[2300:3650, 400:1750], zorder=1)
_, _, misc = pyemma.plots.plot_contour(
    *tica_concatenated.T[:2],
    eigvec[dtrajs_concatenated, evec_idx],
    cmap='PiYG',
    ax=ax_eig,
    mask=True,
    cax=fig.add_subplot(gs[2200:2250, 400:1750]),
    cbar_label='{}. right eigenvector'.format(evec_idx + 1),
    cbar_orientation='horizontal')
misc['cbar'].set_ticks(np.linspace(*misc['cbar'].get_clim(), 3))
misc['cbar'].ax.xaxis.set_ticks_position('top')
misc['cbar'].ax.xaxis.set_label_position('top')
ax_eig.set_xlabel('IC 1')
ax_eig.set_ylabel('IC 2')

ax_flux = fig.add_subplot(gs[2300:3650, 2000:3350], zorder=1)
_, _, misc = pyemma.plots.plot_contour(
    *tica_concatenated.T[:2],
    flux.committor[np.concatenate(cluster.dtrajs)],
    cmap='brg',
    ax=ax_flux,
    mask=True,
    cax=fig.add_subplot(gs[2200:2250, 2000:3350]),
    cbar_label=r'committor {} $\to$ {}'.format(start, final),
    cbar_orientation='horizontal')
misc['cbar'].set_ticks(np.linspace(0, 1, 3))
misc['cbar'].set_ticklabels(['start', 'transition state', 'final'])
misc['cbar'].ax.xaxis.set_ticks_position('top')
misc['cbar'].ax.xaxis.set_label_position('top')
ax_flux.set_xlabel('IC 1')
ax_flux.set_yticklabels([])

fig.savefig('figure_3.pdf', dpi=300)