# Physicochemical properties

This notebook demonstrates some of the higher-level analysis tools from MDTraj that focus on the physicochemical properties of a macromolecule. These can be useful to understand (de)stabilizing factors. A complete overview of analysis functions in MDTraj can be found here: http://mdtraj.org/latest/analysis.html

In [None]:
import matplotlib.pyplot as plt
import mdtraj
import nglview
import numpy as np
import pandas

In [None]:
traj = mdtraj.load('traj.dcd', top='init.pdb')
traj.restrict_atoms(traj.topology.select("protein"))
df = pandas.read_csv("scalars.csv")

## 1. Hydrogen bonding

Internal hydrogen bonds are one of the main factors that stabilize protein folds, and they often play an important role in other macromolecular systems as well. We will use the [Wernet-Nilsson criterion](http://mdtraj.org/latest/api/generated/mdtraj.wernet_nilsson.html) implemented in MDTraj to count the number of hydrogen bonds in each frame.

In [None]:
# One array per frame; each row holds the (donor, hydrogen, acceptor) atom indices.
hbonds = mdtraj.wernet_nilsson(traj)
numhbonds = [len(triplets) for triplets in hbonds]
plt.close(1)  # This is needed to rerun the code cell correctly
fig, ax1 = plt.subplots(num=1)
ax1.plot(df["Time (ps)"], numhbonds, label="Num. HBonds")
ax1.set_xlabel("Time [ps]")
ax1.set_ylabel("Number of hydrogen bonds", color="C0")
ax2 = ax1.twinx()
ax2.plot(df["Time (ps)"], df["Potential Energy (kJ/mole)"],
         color="C1", label="Pot. Energy")
ax2.set_ylabel("Potential energy [kJ/mol]", color="C1")

The potential energy reaches its equilibrium value much faster than the number of hydrogen bonds does. This shows (again) that one should not rely on a single property to identify the equilibration phase. Most of the increase in potential energy simply reflects the thermal activation of the solvent, which is not necessarily representative of the solute.
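
Besides the total count, it can be informative to see which individual hydrogen bonds persist. The sketch below computes, for each (donor, hydrogen, acceptor) triplet, the fraction of frames in which it is present. It runs on a small synthetic list (`hbonds_demo`, a made-up stand-in) that mimics the per-frame arrays returned by `mdtraj.wernet_nilsson`; the helper `hbond_occupancy` is hypothetical and not part of MDTraj.

```python
from collections import Counter

import numpy as np

# Synthetic stand-in for the output of mdtraj.wernet_nilsson(traj):
# one integer array per frame, each row is (donor, hydrogen, acceptor).
hbonds_demo = [
    np.array([[0, 1, 9], [4, 5, 12]]),
    np.array([[0, 1, 9]]),
    np.array([[0, 1, 9], [4, 5, 12], [7, 8, 2]]),
    np.empty((0, 3), dtype=int),  # a frame without any hydrogen bond
]


def hbond_occupancy(hbonds_per_frame):
    """Return {(donor, hydrogen, acceptor): fraction of frames present}."""
    counts = Counter()
    for triplets in hbonds_per_frame:
        # A set guards against counting a duplicated row twice in one frame.
        counts.update({tuple(int(i) for i in row) for row in triplets})
    n_frames = len(hbonds_per_frame)
    return {bond: n / n_frames for bond, n in counts.items()}


occupancy = hbond_occupancy(hbonds_demo)
for bond, fraction in sorted(occupancy.items(), key=lambda kv: -kv[1]):
    print(bond, f"{fraction:.2f}")
```

With the real trajectory, one would pass `hbonds` from the cell above instead of `hbonds_demo`.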

## 2. Secondary structure assignment

The secondary structure assignment of every residue (at each time step) can be obtained with the [compute_dssp](http://mdtraj.org/latest/api/generated/mdtraj.compute_dssp.html) function of MDTraj. For small proteins like the Villin headpiece, the results can be plotted conveniently with Matplotlib.

In [None]:
dssp = mdtraj.compute_dssp(traj)
# Convert string characters to numbers
dssp_num = np.vectorize(lambda char: "HEC".index(char))(dssp)
plt.close(2)  # This is needed to rerun the code cell correctly
fig, ax = plt.subplots(num=2)
im = ax.pcolormesh(df["Time (ps)"], np.arange(dssp.shape[1]), dssp_num.T,
                   shading="nearest")
cbar = fig.colorbar(im, values=[0, 1, 2], ticks=[0, 1, 2])
cbar.ax.set_yticklabels(['H', 'E', 'C'])
ax.set_xlabel("Time [ps]")
ax.set_ylabel("Residue index")

The plot clearly shows the three alpha helices and how they change somewhat over time.
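
A per-frame summary can complement the residue-by-time map. The following sketch computes the fraction of residues assigned to a given DSSP code in every frame. It uses a small synthetic array (`dssp_demo`, made up for illustration) with the same layout as the output of `compute_dssp`, namely one row per frame and one column per residue; the helper `code_fraction` is hypothetical.

```python
import numpy as np

# Synthetic stand-in for mdtraj.compute_dssp(traj): shape (n_frames, n_residues),
# with the simplified codes 'H' (helix), 'E' (strand) and 'C' (coil).
dssp_demo = np.array([
    list("CHHHHCCEEC"),
    list("CHHHCCCEEC"),
    list("CCHHCCCCEC"),
])


def code_fraction(dssp_codes, code):
    """Fraction of residues assigned `code` in every frame."""
    return (dssp_codes == code).mean(axis=1)


helix_fraction = code_fraction(dssp_demo, "H")
strand_fraction = code_fraction(dssp_demo, "E")
print(helix_fraction)
print(strand_fraction)
```

Applied to the real `dssp` array, `helix_fraction` could be plotted against `df["Time (ps)"]` to follow the helical content over time.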

**<span style="color:#A03;font-size:14pt">
&#x270B; HANDS-ON! &#x1F528;
</span>**

> Modify the above example to plot the 8-level DSSP assignment, instead of the simplified 3-level assignment.

## 3. Radial distribution function

The coordination of a functional group by another chemical moiety can be characterized with a [radial distribution function](https://en.wikipedia.org/wiki/Radial_distribution_function) (RDF). In general, a radial distribution function is a ratio of two probability densities as function of an inter-atomic distance:

- The numerator is the empirically observed probability density of finding a given pair of atoms at that distance.
- The denominator is the probability density that would be obtained if the two atoms were uniformly distributed over space.

For dense liquids, the RDF converges to one at large distances because a normal liquid has no long-range structure. For a single isolated system (like a protein), the RDF converges to zero because all observed inter-atomic distances are bounded by the size of the molecule.
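
The ratio described above can be sketched with a histogram: count the observed pair distances per bin and divide by the counts expected for uniformly distributed pairs, which are proportional to the volume of each spherical shell. This is a minimal illustration assuming a constant periodic box volume; the function `rdf_from_distances` and the random test points are hypothetical and are not the MDTraj implementation.

```python
import numpy as np


def rdf_from_distances(distances, volume, r_range, n_bins):
    """Histogram pair distances and divide by the uniform-density expectation.

    distances: array of shape (n_frames, n_pairs), same length unit as r_range.
    volume: volume of the (constant) periodic box.
    """
    counts, edges = np.histogram(distances, bins=n_bins, range=r_range)
    r = 0.5 * (edges[1:] + edges[:-1])
    shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    n_frames, n_pairs = distances.shape
    # Expected counts per bin if each pair were uniformly distributed.
    expected = n_frames * n_pairs * shell_volumes / volume
    return r, counts / expected


# Demo: uniformly random points in a periodic unit cube should give g(r) ~ 1.
rng = np.random.default_rng(0)
points = rng.random((500, 3))
delta = points[:, None, :] - points[None, :, :]
delta -= np.round(delta)  # minimum-image convention for a unit box
i, j = np.triu_indices(len(points), k=1)
pair_distances = np.linalg.norm(delta[i, j], axis=1)[None, :]  # one frame

r_demo, g_demo = rdf_from_distances(pair_distances, 1.0, (0.05, 0.45), 10)
print(np.round(g_demo, 2))
```

Because the test points are structureless, the printed values scatter around one, which is the dense-liquid limit mentioned above.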

We can use [compute_rdf](http://mdtraj.org/latest/api/generated/mdtraj.compute_rdf.html) in combination with [select_pairs](http://mdtraj.org/latest/api/generated/mdtraj.Topology.html#mdtraj.Topology.select_pairs) to easily generate RDFs.

In the example below, we compute the RDF between all backbone amide nitrogens (atom name `N`) and all amide oxygens (atom name `O`).

In [None]:
r, gr = mdtraj.compute_rdf(
    traj,
    traj.topology.select_pairs("protein and name N", "protein and name O"),
    r_range=(0.1, 0.5),
    bin_width=0.0005)
plt.close(3)  # This is needed to rerun the code cell correctly
fig, ax = plt.subplots(num=3)
ax.plot(r, gr)
ax.set_xlabel("Radius [nm]")
ax.set_ylabel("Radial distribution function g(r)")

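The sharp peak at approximately $0.225\,\textrm{nm}$ coincides with the N–O distance across a single peptide bond, which is fixed by covalent geometry, so it should not be read as a hydrogen-bond signal on its own. Internal hydrogen bonds, mostly in alpha helices, are expected to contribute at somewhat larger N···O distances (around $0.3\,\textrm{nm}$). Subsequent peaks are likely caused by amide groups in the alpha helices forming cooperative hydrogen bonds.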

## 4. Solvent accessible surface area

The Shrake-Rupley method can be used to measure the solvent-accessible surface area (SASA) of each residue, showing which residues are exposed to the solvent. For this, we use the [shrake_rupley](http://mdtraj.org/latest/api/generated/mdtraj.shrake_rupley.html) function from MDTraj.

In [None]:
sasa = mdtraj.shrake_rupley(traj[::10], mode="residue")
plt.close(4)  # This is needed to rerun the code cell correctly
fig, ax = plt.subplots(num=4)
# Use the shape of sasa itself, so this cell does not depend on the DSSP cell.
im = ax.pcolormesh(df["Time (ps)"][::10], np.arange(sasa.shape[1]), sasa.T,
                   shading="nearest")
cbar = fig.colorbar(im)
cbar.set_label("Solvent-accessible surface area [nm$^2$]")
ax.set_xlabel("Time [ps]")
ax.set_ylabel("Residue index")
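
To give an idea of what the Shrake-Rupley method computes, here is a minimal sketch of the idea for a handful of spheres: test points are placed on each atom's sphere, expanded by the probe radius, and the accessible area is the fraction of points that do not fall inside any neighboring expanded sphere. The function names, the atomic radius of 0.17 nm, the probe radius, and the number of test points are illustrative assumptions; this brute-force loop is not how MDTraj implements it and would be far too slow for a full trajectory.

```python
import numpy as np


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    k = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * k / n)
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * k
    return np.column_stack([
        np.cos(azimuth) * np.sin(polar),
        np.sin(azimuth) * np.sin(polar),
        np.cos(polar),
    ])


def shrake_rupley_sketch(centers, radii, probe_radius=0.14, n_points=960):
    """Per-atom accessible surface area, in the square of the input length unit."""
    centers = np.asarray(centers, dtype=float)
    expanded = np.asarray(radii, dtype=float) + probe_radius
    unit = sphere_points(n_points)
    areas = np.empty(len(centers))
    for i, (center, radius) in enumerate(zip(centers, expanded)):
        surface = center + radius * unit
        accessible = np.ones(n_points, dtype=bool)
        for j, (other, other_radius) in enumerate(zip(centers, expanded)):
            if i != j:
                # A test point is buried if it lies inside another expanded sphere.
                accessible &= np.linalg.norm(surface - other, axis=1) >= other_radius
        areas[i] = 4.0 * np.pi * radius ** 2 * accessible.mean()
    return areas


# Two atoms far apart are fully exposed; moving them together buries surface.
radii_demo = [0.17, 0.17]
far = shrake_rupley_sketch([[0, 0, 0], [5, 0, 0]], radii_demo)
near = shrake_rupley_sketch([[0, 0, 0], [0.2, 0, 0]], radii_demo)
print(far, near)
```

For the two overlapping spheres, the buried part is a spherical cap, so the accessible area of each atom drops below the full-sphere value printed for the well-separated pair.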