# TrpCage Analysis

First, we need to download the simulation output files from Colab:


In [14]:
import sys
sys.path.append("..")
import helpers
helpers.set_style()

In [35]:
# copy url from colab
!curl -o archive.zip https://transfer.sh/RhNEo8/archive.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1371k  100  147k    0     0   160k      0  0:00:08 --:--:--  0:00:08  162k 1371k    0     0  1422k      0 --:--:-- --:--:-- --:--:-- 1439k


In [36]:
# unzip
!unzip archive.zip

Archive:  archive.zip
  inflating: trp_cage_gb.dcd         
  inflating: trp_cage_heating.dcd    
  inflating: trp_cage_linear.nc      
  inflating: trp_cage_linear.pdb     
  inflating: trp_cage.prmtop         
  inflating: production.csv          
  inflating: heating.csv             


In [37]:
import numpy as np
import pytraj as pt
import nglview as nv

import matplotlib.pyplot as plt

from matplotlib.ticker import MaxNLocator
from scipy import stats


## Let's load the trajectory files

In [38]:
traj = pt.load('trp_cage_gb.dcd', top='trp_cage.prmtop')

Let's visualize our trajectory to get a first idea of what happened to our linear peptide over the course of 20 ns.

In [39]:
view = nv.show_pytraj(traj)
view.clear_representations()
view.add_representation('licorice', selection='not hydrogen')
view.add_representation('cartoon')
view.add_representation('contact')
view.center()
view

NGLWidget(max_frame=199)

<div class="exercise admonition" name="6ex3" style="padding: 10px">
<p class="title">Exercise 3</p>
What type of structure is the folded Trp-cage miniprotein? List the main components contributing to this structure, including the residues which are responsible for their formation.
</div>

We will also load an experimental structure taken from an NMR ensemble.

```{note} Experimental structures Quick overview
:class: dropdown
Experimental structures can be obtained using three main methods:

**X-ray diffraction**

The protein is crystallized (the crystals are typically cryo-cooled) and exposed to X-rays. The diffraction pattern allows us to reconstruct an electron density map into which a protein model can be fitted.
The models are usually very accurate, but they may contain artefacts due to crystal packing and low temperature. The method works for both small and large proteins, although some proteins are difficult to crystallize.

**NMR**

The structure is determined in solution using nuclear magnetic resonance (usually of carbon and hydrogen nuclei) with complicated pulse sequences. NMR can resolve dynamical properties of the protein but is normally limited to smaller proteins.

**Cryo-EM**

The new kid on the block of structural biology. It works especially well for large proteins; models are fitted into a reconstructed map of the protein. The resolution tends to be lower than with X-ray crystallography, but sample preparation is often easier because no crystals are needed.
```

## Align trajectory

Aligning a trajectory is an important step when analysing molecular dynamics simulations. We choose a reference structure or frame and superimpose each frame onto it (e.g., by minimizing the RMSD to the reference). This removes the overall translation and rotation of the molecule, so that only internal motion remains.

In this case we align on the backbone atoms N, CA, and C.

As a reference we use one structure from the experimental NMR ensemble (PDB `1l2y`).

In [40]:
ref = pt.load('1L2Y.pdb')

In [42]:
aligned_traj = pt.align(traj, ref=ref, mask='@N,CA,C')
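
To make the superposition step less of a black box, here is a minimal NumPy sketch of one common way to do it, the Kabsch algorithm: remove the translation, find the optimal rotation from an SVD, and report the RMSD after fitting. It runs on synthetic coordinates only. `kabsch_rmsd` is a hypothetical helper name, and we do not claim that pytraj uses exactly this procedure internally.

```python
import numpy as np

def kabsch_rmsd(mobile, reference):
    """Superimpose `mobile` onto `reference` (both of shape (n_atoms, 3)).

    Returns the fitted coordinates and the RMSD after fitting.
    Hypothetical helper for illustration only.
    """
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # 1) remove the translation by centring both structures
    ref_centroid = reference.mean(axis=0)
    P = mobile - mobile.mean(axis=0)
    Q = reference - ref_centroid

    # 2) optimal rotation from the SVD of the covariance matrix
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # guard against an improper rotation (reflection)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # 3) rotate, move onto the reference centroid, and measure the RMSD
    aligned = P @ R.T + ref_centroid
    rmsd = np.sqrt(np.mean(np.sum((aligned - reference) ** 2, axis=1)))
    return aligned, rmsd

# synthetic test: a rotated and shifted copy should fit back perfectly
rng = np.random.default_rng(0)
ref_xyz = rng.normal(size=(20, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
moved_xyz = ref_xyz @ Rz.T + np.array([1.0, -2.0, 0.5])

fitted_xyz, fit_rmsd = kabsch_rmsd(moved_xyz, ref_xyz)
print(f"RMSD after fitting: {fit_rmsd:.2e}")
```

Because the moved copy differs from the reference only by a rigid-body motion, the RMSD after fitting should be zero up to rounding error. For a real trajectory the residual RMSD reflects genuine conformational differences.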

We can also compute the RMSD to the reference structure with a different function, `pt.rmsd`.

In [43]:
rmsd_to_nmr = pt.rmsd(aligned_traj, ref=ref, mask='@N,CA,C')

Next, we plot this RMSD.

In [None]:
fig, ax = plt.subplots(1)

times = np.linspace(0,20, num=traj.n_frames)

ax.set_title('RMSD to NMR structure')
ax.plot(times,rmsd_to_nmr)
ax.set_xlabel('time [ns]')
ax.set_ylabel(r'RMSD [$\AA$]')
plt.show()

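An alternative metric is the root-mean-square fluctuation (RMSF), which measures how much each atom fluctuates around its average position.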

In [45]:
# we compute RMSF for the last 10 ns of the trajectory
rmsf = pt.rmsf(aligned_traj[100:], mask="@CA")
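
As a rough illustration of what the RMSF measures, the sketch below computes it by hand with NumPy for a tiny synthetic "trajectory". It assumes the coordinates are already aligned and stored as an array of shape `(n_frames, n_atoms, 3)`. `rmsf_per_atom` is a hypothetical helper, not part of pytraj.

```python
import numpy as np

def rmsf_per_atom(coords):
    """RMSF of each atom around its mean position.

    coords: array of shape (n_frames, n_atoms, 3), assumed to be aligned.
    Hypothetical helper for illustration only.
    """
    coords = np.asarray(coords, dtype=float)
    mean_pos = coords.mean(axis=0)                      # (n_atoms, 3)
    sq_dev = np.sum((coords - mean_pos) ** 2, axis=2)   # (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))                 # (n_atoms,)

# synthetic example: atom 0 never moves, atom 1 jumps between x = +1 and x = -1
toy_coords = np.zeros((4, 2, 3))
toy_coords[:, 1, 0] = [1.0, -1.0, 1.0, -1.0]

toy_rmsf = rmsf_per_atom(toy_coords)
print(toy_rmsf)
```

The rigid atom has an RMSF of 0, while the oscillating atom has an RMSF of 1 (its distance from the mean position in every frame). In a protein, high values usually point to flexible regions such as termini and loops.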

In [46]:
# residue numbers 1..N, one per CA atom (no hard-coded residue count)
resids = np.arange(1, rmsf.shape[0] + 1)

In [None]:
fig, ax = plt.subplots(1)

ax.set_title('RMSF')
ax.plot(resids, rmsf[:, 1])  # the second column holds the RMSF values
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
ax.set_xlim(0,20)
ax.set_xlabel('residue')
ax.set_ylabel(r'RMSF [$\AA$]')
plt.show()

<div class="exercise admonition" name="6ex5" style="padding: 10px">
<p class="title">Exercise 4</p>
Explain the RMSD and RMSF plots.  Does the trajectory reach the same conformation as the experimental structure?
Which metric is more useful for the problem at hand? <b>Bonus:</b>  Provide a use case for the other metric. 
</div>

## Visualize trajectory

Now let's compare the trajectory to one of the structures from the NMR structural ensemble to see how our linear Trp cage folded. 

In [59]:
nmr_structure =  nv.PyTrajTrajectory(pt.load('1L2Y.pdb'))

In [60]:
view = nv.show_pytraj(aligned_traj)
view.clear_representations()
view.add_structure(nmr_structure)
view.add_representation('licorice', selection='not hydrogen')
view.add_representation('cartoon')
view.add_representation('contact')

view.center()

view

NGLWidget(max_frame=199)

## Emergence of secondary structure
### Hbonds

We know that hydrogen bonds are very important for the formation of secondary structure elements. Let's count the number of hydrogen bonds per frame and average them in 1 ns bins (roughly 10 frames each).

In [50]:
hbond_data = pt.hbond(aligned_traj)

In [51]:
n_hbonds_per_frame = hbond_data.total_solute_hbonds()


In [52]:
# Bin every nanosecond (i.e. 10 snapshots)
times = np.linspace(0,20, num=traj.n_frames)
statistic, bin_edges, binnumber = stats.binned_statistic(times, n_hbonds_per_frame, bins=20)
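
The binning step can also be written as a plain block average, which makes it explicit that each point is the mean of 10 consecutive frames. The sketch below uses synthetic counts instead of real hydrogen bond data. `block_average` is a hypothetical helper, and the frame spacing of 0.1 ns is an assumption based on 200 frames covering 20 ns. It should give nearly, though not necessarily exactly, the same values as `stats.binned_statistic`, because the bin boundaries fall slightly differently.

```python
import numpy as np

def block_average(values, block_size):
    """Mean of consecutive, non-overlapping blocks of `block_size` values.

    Trailing values that do not fill a whole block are dropped.
    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    n_blocks = len(values) // block_size
    trimmed = values[: n_blocks * block_size]
    return trimmed.reshape(n_blocks, block_size).mean(axis=1)

# synthetic per-frame counts standing in for n_hbonds_per_frame
toy_counts = np.arange(200)
frames_per_ns = 10          # assumption: 200 frames over 20 ns
frame_spacing_ns = 0.1      # assumption: 1 / frames_per_ns

toy_means = block_average(toy_counts, frames_per_ns)
# time at the centre of each block, in ns
toy_centers = (np.arange(len(toy_means)) + 0.5) * frames_per_ns * frame_spacing_ns

print(toy_means[:3], toy_centers[:3])
```

Plotting the block means against the block centres keeps the x-axis in nanoseconds rather than in bin indices.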

In [None]:
fig, ax = plt.subplots(1)

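# plot each bin's mean at the centre of its time window so the x-axis is in ns
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

ax.set_title('Average number of Hbonds')
ax.plot(bin_centers, statistic)
ax.set_xlabel('time [ns]')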
ax.set_ylabel('n hbonds')
plt.show()

<div class="exercise admonition" name="6ex6" style="padding: 10px">
<p class="title">Exercise 5</p>
Include the hbond graph in your report, and explain the observed trend with reference to the structural components of the Trp-cage miniprotein.
</div>

<div class="exercise admonition" name="6ex4" style="padding: 10px">
<p class="title">Exercise 6</p>
Monitor an individual hydrogen bond involved in a secondary structure after checking the trajectory, and provide the graph of the hydrogen bond length over time. Can you infer at which interval (in nanoseconds) the secondary structure forms?
</div>

A: A hydrogen bond in the alpha helix is a good choice. In this case, the alpha helix has formed after around 5 ns.

In [None]:
dist = pt.distance(traj, 'CHOOSE ATOMS HERE')  # use the :resid@atomname format, e.g. :4@O
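
If you want a more objective estimate than reading the plot by eye, one possible approach is to look for the first frame from which the distance stays below a cutoff for a while. The sketch below shows the idea on synthetic coordinates. The helper names, the 3.5 Å cutoff, and the 10-frame window are all assumptions chosen for illustration, not values taken from this tutorial.

```python
import numpy as np

def pair_distance(coords, i, j):
    """Distance between atoms i and j in every frame.

    coords: array of shape (n_frames, n_atoms, 3).
    Hypothetical helper for illustration only.
    """
    coords = np.asarray(coords, dtype=float)
    return np.linalg.norm(coords[:, i] - coords[:, j], axis=1)

def first_stable_frame(dist, cutoff=3.5, window=10):
    """First frame from which `dist` stays below `cutoff` for `window` frames.

    Returns None if no such stretch exists. Cutoff and window are assumptions.
    """
    below = np.asarray(dist) < cutoff
    for start in range(len(below) - window + 1):
        if below[start:start + window].all():
            return start
    return None

# synthetic example: two atoms 8 A apart for 5 frames, then 3 A apart
toy_xyz = np.zeros((20, 2, 3))
toy_xyz[:5, 1, 0] = 8.0
toy_xyz[5:, 1, 0] = 3.0

toy_dist = pair_distance(toy_xyz, 0, 1)
toy_start = first_stable_frame(toy_dist)
print(toy_start)
```

To convert the frame index into a time, multiply it by the spacing between saved frames. The result depends on the chosen cutoff and window, so treat it as a guide rather than a precise folding time.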

In [None]:
fig, ax = plt.subplots(1)

times = np.linspace(0,20, num=traj.n_frames)

ax.set_title('H-Bond in alpha helix')
ax.plot(times,dist)
ax.set_xlabel('time [ns]')
ax.set_ylabel(r'distance [$\AA$]')
plt.show()

### Dihedral

The central tryptophan in the Trp-cage protein is located in the alpha helix. Here we track its backbone dihedral angle over the course of our simulation. Once the protein forms ordered secondary structure elements, we expect the dihedral to fluctuate only slightly.

In [None]:
# look at dihedral angle of Trp6 in the alpha helix
df_trp6_dihedrals = pt.multidihedral(aligned_traj, resrange='6', dtype='dataframe')
df_trp6_dihedrals.head(6)
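
For readers who have not worked with dihedral angles before, the following NumPy sketch shows how a single dihedral can be computed from four atom positions, by projecting the two outer bonds onto the plane perpendicular to the central bond. `dihedral_deg` is a hypothetical helper and the coordinates are made up; pytraj does this for us on the real backbone atoms.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    Hypothetical helper for illustration only.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1          # central bond
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)

    # components of the outer bonds perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1

    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

# made-up geometries: cis (0 degrees), trans (180 degrees), and a right angle
cis_angle = dihedral_deg([1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1])
trans_angle = dihedral_deg([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1])
right_angle = dihedral_deg([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1])
print(cis_angle, trans_angle, right_angle)
```

Note that the result is periodic: +180° and -180° describe the same conformation, which matters when angles are averaged.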

In [None]:
fig, ax = plt.subplots(1)

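times = np.linspace(0,20, num=traj.n_frames)

binned_dihedral, bin_edges, binnumber = stats.binned_statistic(times, df_trp6_dihedrals['phi_6'], bins=30)
# plot each bin's mean at the centre of its time window so the x-axis is in ns
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

ax.set_title(r'Trp6 $\phi$ dihedral')
ax.plot(bin_centers, binned_dihedral)
ax.set_xlabel('time [ns]')
ax.set_ylabel('dihedral [degrees]')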
plt.show()
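
One caveat when averaging angles: a plain arithmetic mean can be misleading if the values wrap around at ±180°. A possible remedy, sketched below with NumPy, is the circular mean, which averages the angles as unit vectors. `circular_mean_deg` is a hypothetical helper, and the numbers are made up to show the effect.

```python
import numpy as np

def circular_mean_deg(angles_deg):
    """Mean direction of a set of angles given in degrees.

    Hypothetical helper for illustration only.
    """
    rad = np.radians(np.asarray(angles_deg, dtype=float))
    return np.degrees(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean()))

# two angles close to the +/-180 degree boundary
wrapped = [170.0, -170.0]
naive_mean = float(np.mean(wrapped))        # arithmetic mean ignores periodicity
wrapped_mean = circular_mean_deg(wrapped)   # lies at the boundary, as expected

# far from the boundary both approaches agree
helix_like_mean = circular_mean_deg([-70.0, -50.0])
print(naive_mean, wrapped_mean, helix_like_mean)
```

For the two wrapped angles the arithmetic mean is 0°, which points in the opposite direction, whereas the circular mean lies at ±180°. If the binned dihedral plot shows sudden jumps, it may be worth checking whether wrapping is the cause.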

<div class="exercise admonition" name="6ex7" style="padding: 10px">
<p class="title">Exercise 7</p>
Why is it useful to constrain bond lengths for larger MD simulations (typically with the SHAKE algorithm)? Which bonds would you typically constrain in such a scenario, and why?
</div>

<div class="exercise admonition" name="6bex1" style="padding: 10px">
<p class="title">Bonus exercise 8</p>
Which properties do you need to take into account in order to select an appropriate timestep for your MD simulation? Are there any other reasons you might wish to reduce or increase this timestep?
</div>

<div class="exercise admonition" name="6bex2" style="padding: 10px">
<p class="title">Bonus exercise 9</p>
What are the advantages and disadvantages of implicit solvation? Will it influence the folding kinetics in your simulation?
</div>

<div class="exercise admonition" name="6bex3" style="padding: 10px">
<p class="title">Bonus exercise 10</p>
Is it better to sample 2 x 10 ns from the same starting structure or 1 x 20 ns in order to explore conformational space efficiently? 
</div>