# Analysing alanine dipeptide with TRAM
As an exercise, find the potential of mean force (PMF) with respect to the torsion angles of alanine dipeptide.

Alanine dipeptide is a small peptide that is often used as a model system. It consists of 22 atoms, and we are interested in two backbone torsion angles, $\phi$ and $\psi$.

![Alanine dipeptide](img/alanine.png)
(image source: https://www.cp2k.org/)

We want to know how alanine dipeptide is structured: specifically, which combinations of these two torsion angles are energetically favourable and which are unfavourable.

To this end, simulations have been performed at 21 different temperatures between 300 K and 500 K. Each simulation corresponds to one thermodynamic state, and 10000 samples were taken during each simulation (the energies and torsion angles have been stored).

Use TRAM to combine the data from these different simulations, and estimate the free energy of each state. Then use those free energies to estimate the free energy surface as a function of the two torsion angles.

## Input data
The temperatures of the different simulations (i.e. replicas, i.e. thermodynamic states) are given, as well as some useful imports and constants:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
from deeptime.clustering import KMeans
from deeptime.markov.msm import TRAMDataset, TRAM
import mdshare


N_REPLICAS = 21 # total number of temperature replicas (=simulations)
SAMPLES_PER_T = 10000 # number of samples that were taken per simulation

temperatures = np.arange(300, 501, 10) # the temperature of each simulation in K: 300, 310, ..., 500

# Boltzmann constant kB in kJ/(mol K)
kB_kJ = 0.00831446261815324

# Boltzmann constant kB in kcal/(mol K)
kB_kcal = 0.0019872042586408316

The input data consist of energies and angles. These are loaded into lists, each of length `N_REPLICAS`. The `i`-th element in each list contains the data for the temperature at index `i`. In other words:

* `angles[i][n]` is of shape `(2,)` and contains the angles $\phi$ and $\psi$ of the `n`-th sample taken in simulation `i` (i.e. at temperature `i`), in degrees.

* `energies[i][n]` is the potential energy belonging to that same sample, in kcal/mol. 

In [None]:
angles_file_name = mdshare.fetch('alanine_dipeptide_parallel_tempering_dihedrals.npz', working_directory='data')
energies_file_name = mdshare.fetch('alanine_dipeptide_parallel_tempering_energies.npz', working_directory='data')

angles = []
energies = []

for T in temperatures:
    angles.append(np.load(angles_file_name)[f't{T}'])
    energies.append(np.load(energies_file_name)[f't{T}'])
    
print(f"angles    -    length: {len(angles)},  shape: {angles[0].shape}")
print(f"energies  -    length: {len(energies)},  shape: {energies[0].shape}")

## Construct the bias matrix
The energies are used to fill the bias matrix. For each sample, the bias needs to be computed in each thermodynamic state. In other words: for each sample, compute the bias energy $b^k(x) = U^k(x) - U^0(x)$ for every thermodynamic state $k$. 

First compute the inverse temperature $\beta$ for each thermodynamic state. Note: the energies are stored in kcal/mol, but the bias energies need to be non-dimensional! Choose $\beta$ accordingly. The constants defined in the imports cell above will be useful here.
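
For a temperature ladder like this one, every replica samples the same potential $U(x)$. Assuming the lowest temperature (index 0) is taken as the reference state, the reduced potential of state $k$ is $\beta_k U(x)$, and the non-dimensional bias can be written as:

$$
\beta_k = \frac{1}{k_B T_k}, \qquad b^k(x) = U^k(x) - U^0(x) = (\beta_k - \beta_0)\, U(x)
$$

Here $k_B$ must be expressed in the same energy unit as $U(x)$, so that the product $\beta_k U(x)$ carries no units.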

In [None]:
betas =

In [None]:
# beta = 1 / (kB * T); the energies are in kcal/mol, so use kB in kcal/(mol K)
betas = 1.0 / (kB_kcal * temperatures.astype(float))

Now compute the bias matrices and add them to the list. You should obtain a list of bias matrices of length `N_REPLICAS`, with each bias matrix of shape `(SAMPLES_PER_T, N_REPLICAS)`
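
The shape `(SAMPLES_PER_T, N_REPLICAS)` follows from broadcasting a column of energies against a row of inverse temperatures. A minimal sketch with made-up numbers (all `toy_*` names and values are hypothetical, not the alanine dipeptide data):

```python
import numpy as np

# Hypothetical stand-ins: 3 thermodynamic states, 4 samples from one simulation
toy_betas = np.array([1.0, 0.8, 0.5])            # one inverse temperature per state
toy_energies = np.array([2.0, -1.0, 0.5, 3.0])   # potential energy of each sample

# toy_energies[:, None] has shape (4, 1); broadcasting against shape (3,) gives (4, 3)
toy_bias = (toy_betas - toy_betas[0]) * toy_energies[:, None]

print(toy_bias.shape)    # (n_samples, n_states)
print(toy_bias[:, 0])    # the reference state (index 0) has zero bias everywhere
```

Row `n` of the result holds the bias of sample `n` evaluated in every thermodynamic state, which is the layout the exercise asks for.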

In [None]:
bias_matrices = []

In [None]:
bias_matrices = []

for k, T in enumerate(temperatures):
    # apply the bias factors to the potential energies to produce bias energies
    bias_matrices.append((betas - betas[0]) * energies[k][:, None])

## Discretize the trajectories
The torsion angles $\phi$ and $\psi$ need to be transformed into discrete trajectories from which the transition counts are computed.

Discretize the angles into Markov states using an appropriate clustering method (for example k-means++: https://deeptime-ml.github.io/latest/notebooks/clustering.html#k-means++-initialization).
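
To see what "transition counts" means for a discrete trajectory, here is a small illustrative sketch. The helper `count_transitions` is hypothetical and written only for this illustration; in the exercise itself you only need to supply the discrete trajectories and a lag time.

```python
import numpy as np

def count_transitions(dtraj, n_states, lagtime=1):
    """Sliding-window counts C[i, j] = number of times state i at time t
    is followed by state j at time t + lagtime (lagtime must be >= 1)."""
    counts = np.zeros((n_states, n_states), dtype=int)
    # np.add.at accumulates correctly even when an (i, j) pair occurs repeatedly
    np.add.at(counts, (dtraj[:-lagtime], dtraj[lagtime:]), 1)
    return counts

# Hypothetical discrete trajectory over 3 Markov states
toy_dtraj = np.array([0, 0, 1, 1, 2, 0, 1])
toy_counts = count_transitions(toy_dtraj, n_states=3, lagtime=1)
print(toy_counts)
```

A trajectory of length 7 at lag time 1 contributes 6 counted transitions in total.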

In [None]:
estimator =

In [None]:
estimator = KMeans(
    n_clusters=20, # we will cluster data to 20 Markov states
    init_strategy='kmeans++',
    max_iter=10,
    fixed_seed=13,
    n_jobs=8
)

Use the estimator to obtain a clustering model.

In [None]:
clustering =

In [None]:
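# fit on all samples from all temperatures at once: stack the list into one (n_samples, 2) array
clustering = estimator.fit(np.concatenate(angles)).fetch_model()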

Now compute the dtrajs by applying the clustering transformation.

In [None]:
dtrajs = 

In [None]:
dtrajs = []

for A in angles:
    dtrajs.append(np.asarray(clustering.transform(A)))

## Analyse the data with TRAM
Now use TRAM to estimate the free energies. First construct a TRAMDataset, and use this to restrict the data to the largest connected set.

In [None]:
dataset = 

In [None]:
dataset = TRAMDataset(dtrajs, bias_matrices, lagtime=10)
dataset.restrict_to_largest_connected_set(connectivity='BAR_variance')

Now create the TRAM estimator and fit the model.

Convergence can take some time (you will need at least a few hundred iterations). Use the `MBAR` initialization strategy to speed up the initial convergence, and pass a tqdm progress bar to the TRAM object to visualize the progress.

It may help to run only a few TRAM iterations first, and plot the `TRAMModel.therm_state_energies` as a sanity check, and once everything behaves as you would expect, run TRAM until convergence. 
(The `therm_state_energies` are the free energies of the thermodynamic states. Convince yourself that they behave as expected, given the increasing temperatures).

In [None]:
tram_estimator =
model = 

In [None]:
tram_estimator = TRAM(lagtime=10, maxiter=5000, progress=tqdm, maxerr=1e-8,  
                      init_strategy="MBAR", init_maxerr=1e-10, init_maxiter=1000)
model = tram_estimator.fit_fetch(dataset)

plt.plot(model.therm_state_energies)

## Recover the PMF
Recover the free energy surface as a function of the torsion angles. For this, you will need to discretize the angles into a one-dimensional set of bins over the space (-180, 180). Choose a number of bins and use numpy's digitize to discretize each angle.
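
As an illustration of how `np.digitize` assigns angles to bins, here is a sketch with only four bins and a handful of made-up angles (the `toy_*` names are hypothetical). It assumes the bins are described by their left edges:

```python
import numpy as np

toy_n_bins = 4
# left edges of 4 equal-width bins over [-180, 180): [-180, -90, 0, 90]
toy_edges = np.linspace(-180, 180, toy_n_bins, endpoint=False)

toy_angles = np.array([-180.0, -91.0, -90.0, 0.0, 179.9])
# np.digitize returns 1-based positions relative to the edges, so subtract 1
toy_bin_idx = np.digitize(toy_angles, toy_edges, right=False) - 1
print(toy_bin_idx)  # [0 0 1 2 3]
```

With `right=False`, an angle that falls exactly on an edge is assigned to the bin that starts at that edge.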

In [None]:
n_bins = 20
bins = np.linspace(-180, 180, n_bins, endpoint=True)
binned_angles =

In [None]:
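n_bins = 20
bins = np.linspace(-180, 180, n_bins, endpoint=True)  # coordinates used for plotting later on
bin_edges = np.linspace(-180, 180, n_bins, endpoint=False)  # left edges of the n_bins bins
# np.digitize is 1-based with respect to the edges, hence the -1
binned_angles = np.digitize(angles, bin_edges, right=False) - 1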

Turn the 2-dimensional angle indices into a 1-dimensional index.
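
One common way to do this is row-major flattening, sketched here on made-up bin indices (the `toy_*` names are hypothetical). The same convention determines how a flat array can later be reshaped back into a 2D grid:

```python
import numpy as np

toy_n_bins = 4
toy_phi_idx = np.array([0, 1, 3])   # bin index along phi
toy_psi_idx = np.array([2, 0, 3])   # bin index along psi

# row-major flattening: phi selects the row, psi the column
toy_flat = toy_n_bins * toy_phi_idx + toy_psi_idx
print(toy_flat)  # [ 2  4 15]

# numpy's built-in equivalent, and the inverse mapping
same = np.ravel_multi_index((toy_phi_idx, toy_psi_idx), (toy_n_bins, toy_n_bins))
phi_back, psi_back = np.divmod(toy_flat, toy_n_bins)
```

Because $\phi$ selects the row, reshaping a flat array of length `toy_n_bins**2` to `(toy_n_bins, toy_n_bins)` gives a grid indexed as `[phi_bin, psi_bin]`.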

In [None]:
binned_trajectories =

In [None]:
binned_trajectories = n_bins * binned_angles[:, :, 0] + binned_angles[:, :, 1]

Use the `compute_PMF` method of `TRAMModel` to compute the PMF over the bins. Since we are interested in free energy differences, shift the PMF so that the minimum is at 0.

In [None]:
pmf =

In [None]:
pmf = model.compute_PMF(dtrajs, bias_matrices, binned_trajectories) * kB_kcal * 300
pmf -= pmf.min()

A plot of the free energy surface with the torsion angles $\phi$ and $\psi$ on the x- and y-axes is called a Ramachandran plot. Make such a plot for alanine dipeptide, showing the free energy surface in kcal/mol at T = 300 K (recall that TRAM works with unitless quantities). You can use matplotlib's `contourf` for visualization, and numpy's `meshgrid` to construct 2D coordinates from the bins.

* Have you recovered the metastable states?
* Can you identify the transition path between the different states?
* What are the free energy differences?
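
Converting a unitless free energy into kcal/mol at a given temperature amounts to multiplying by $k_B T$. A minimal sketch with made-up PMF values (the `toy_*` arrays are hypothetical, not TRAM output):

```python
import numpy as np

kB_kcal = 0.0019872042586408316   # Boltzmann constant in kcal/(mol K)
T_ref = 300.0                     # temperature of interest in K
kT = kB_kcal * T_ref              # roughly 0.596 kcal/mol

toy_pmf_reduced = np.array([3.0, 1.0, 5.5])   # hypothetical unitless free energies
toy_pmf_kcal = toy_pmf_reduced * kT           # now in kcal/mol
toy_pmf_kcal -= toy_pmf_kcal.min()            # shift so the minimum sits at 0
print(toy_pmf_kcal)
```

Only differences in the PMF are meaningful, which is why shifting the minimum to zero loses no information.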

In [None]:
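XS, YS = np.meshgrid(bins, bins)
# after reshaping, pmf is indexed as [phi_bin, psi_bin]; transpose so that phi runs along the x-axis
im = plt.contourf(XS, YS, np.reshape(pmf, [n_bins, n_bins]).T, cmap='jet', levels=50)
plt.xlabel(r'$\phi$ [deg]')
plt.ylabel(r'$\psi$ [deg]')
plt.colorbar(im, label='free energy [kcal/mol]');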

What else?
* The underlying Markov models of the states into which you clustered the data are stored in `model.msm_collection`. Use these to analyse kinetic properties.
* What about the lagtime dependence of the model?