# Day 2 – Tutorial 03: Coupling AF2 and GROMACS for protein structure prediction and MD simulations

### Overview

In this hands-on tutorial, we will first employ **ColabFold**, developed by Milot Mirdita, Sergey Ovchinnikov and Martin Steinegger, to predict the structure of a given protein on **Google Colab**. In short, ColabFold is an implementation of **AlphaFold2** that uses **MMseqs2** (and, optionally, **HHsearch** for template search). MMseqs2 enables ultra-fast searching and clustering of huge protein and nucleotide sequence sets, which is key for accelerating the generation of the multiple sequence alignment (**MSA**) needed by AlphaFold2 and makes structure prediction feasible on Google Colab.

We highly recommend that you check out the [ColabFold GitHub](https://github.com/sokrypton/ColabFold) and read their manuscript for more information:

- Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: Making protein folding accessible to all. [*Nature Methods*, 2022, 19, 6, 679-682](https://www.nature.com/articles/s41592-022-01488-1)

Once the protein structure is predicted, we will install **GROMACS**, an MD simulation package, which we will use to set up and perform MD simulations of our predicted protein. We will visualize our protein structure using **py3Dmol**, while the trajectory from our MD production run will be visualized in a web version of **NGLview**.

For more detailed information on other analyses of protein motions, we highly recommend checking the [GROMACS tutorials](http://www.mdtutorials.com/gmx/) developed by Justin Lemkul, our implementation of some of these analyses on Google Colab in our [Molecular Modeling and Simulation GitHub](https://github.com/pb3lab/ibm3202), and our article:

- Engelberger F, Galaz-Davison P, Bravo G, Rivera M, Ramirez-Sarmiento CA. Developing and Implementing Cloud-Based Tutorials That Combine Bioinformatics Software, Interactive Coding, and Visualization Exercises for Distance Learning on Structural Bioinformatics. [*J Chem Educ*, 2021, 98, 5, 1801–1807](https://pubs.acs.org/doi/full/10.1021/acs.jchemed.1c00022)


## Part I. Protein Structure Prediction on ColabFold

We will run ColabFold on a predefined sequence using the default parameters (except for the number of recycles, which is set to 1). For continuity with the previous tutorials, we will model the structure of a putative ancestral sequence at the node where the human FoxP and FoxO proteins diverged.

😱**EMERGENCY BACKUP**: "But I did not save that sequence!". Do not worry, we got you:


```
>AncOP
RPPFSYASLITQAIEESPEKRLTLNEIYNWIMRNFPYFRDKGDSNNAAGWKNSIRHNLSLHKCFVKVPREHDESTGKGSFWTID
```

Therefore, the instructions for this tutorial are:

1. Paste your protein sequence in the `query_sequence` input field of cell 1.
2. Run ColabFold by clicking on the "Play" button on the left of each cell (or use `Runtime` -> `Run all`).
3. The prediction pipeline consists of 5 steps (cells 1 to 5). The currently running cell is indicated by a circle with a stop sign next to it.

Once the final protein structure is predicted, the results will be visually presented on ColabFold and a ZIP file will be generated, which contains the following information:

1. PDB-formatted structures sorted by average pLDDT (complexes are sorted by pTMscore), unrelaxed and, if `use_amber` is enabled, also relaxed.
2. Plots of the model quality.
3. Plots of the MSA coverage.
4. Parameter log file.
5. A3M formatted input MSA.
6. A `predicted_aligned_error_v1.json` using [AlphaFold-DB's format](https://alphafold.ebi.ac.uk/faq#faq-7) and a `scores.json` for each model which contains an array (list of lists) for PAE, a list with the average pLDDT and the pTMscore.
7. BibTeX file with citations for all used tools and databases.

At the end of the job, a download dialog box will pop up with a `jobname.result.zip` file. Additionally, if the `save_to_google_drive` option was selected, the `jobname.result.zip` will be uploaded to your Google Drive.

For a description of additional options (prediction of protein complexes, custom MSAs, custom templates, etc.), please check out the latest version of [ColabFold](http://colabfold.com).
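Besides the plots, the per-model scores file can be summarized with a few lines of Python. The sketch below is a minimal example, assuming the JSON stores per-residue pLDDT values as a list under a `plddt` key and the pTM score under `ptm` (check the keys in your own file, since they may differ between ColabFold versions). The confidence bands are those used by the AlphaFold Protein Structure Database, and the function names `summarize_scores` and `summarize_scores_file` are hypothetical.

```python
import json
import numpy as np

def summarize_scores(scores):
    """Summarize a ColabFold-style scores dict (assumed keys: 'plddt' and optionally 'ptm')."""
    plddt = np.asarray(scores["plddt"], dtype=float)  # per-residue pLDDT (0-100)
    bands = {
        "very high (>90)": float(np.mean(plddt > 90)),
        "confident (70-90)": float(np.mean((plddt > 70) & (plddt <= 90))),
        "low (50-70)": float(np.mean((plddt > 50) & (plddt <= 70))),
        "very low (<=50)": float(np.mean(plddt <= 50)),
    }
    return {"mean_plddt": float(plddt.mean()), "ptm": scores.get("ptm"), "fraction_per_band": bands}

def summarize_scores_file(path):
    """Load a scores JSON file and summarize it."""
    with open(path) as fh:
        return summarize_scores(json.load(fh))

# Toy example; with real results you could try something like:
# import glob
# for path in sorted(glob.glob("*scores*.json")):  # file names vary with job name and version
#     print(path, summarize_scores_file(path))
print(summarize_scores({"plddt": [95.0, 88.5, 72.0, 64.3, 41.2], "ptm": 0.71}))
```

The fraction of residues in each band is often more informative than the mean alone, since a few disordered termini can pull the average down.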


In [None]:
#@title 1. Input protein sequence(s), then hit `Runtime` -> `Run all`
from google.colab import files
import os.path
import re
import hashlib
import random

def add_hash(x,y):
  return x+"_"+hashlib.sha1(y.encode()).hexdigest()[:5]

query_sequence = 'THISISNOTAPROTEINSEQUENCE' #@param {type:"string"}
#@markdown  - Use `:` to specify inter-protein chainbreaks for **modeling complexes** (supports homo- and hetero-oligomers). For example **PI...SK:PI...SK** for a homodimer

# remove whitespaces
query_sequence = "".join(query_sequence.split())

jobname = 'test' #@param {type:"string"}
# remove whitespaces
basejobname = "".join(jobname.split())
basejobname = re.sub(r'\W+', '', basejobname)
jobname = add_hash(basejobname, query_sequence)
while os.path.isfile(f"{jobname}.csv"):
  jobname = add_hash(basejobname, ''.join(random.sample(query_sequence,len(query_sequence))))

with open(f"{jobname}.csv", "w") as text_file:
    text_file.write(f"id,sequence\n{jobname},{query_sequence}")

queries_path=f"{jobname}.csv"

# Amber relaxation and template options
use_amber = False #@param {type:"boolean"}
template_mode = "none" #@param ["none", "pdb70","custom"]
#@markdown - "none" = no template information is used, "pdb70" = detect templates in pdb70, "custom" - upload and search own templates (PDB or mmCIF format, see [notes below](#custom_templates))

if template_mode == "pdb70":
  use_templates = True
  custom_template_path = None
elif template_mode == "custom":
  custom_template_path = f"{jobname}_template"
  os.mkdir(custom_template_path)
  uploaded = files.upload()
  use_templates = True
  for fn in uploaded.keys():
    os.rename(fn, f"{jobname}_template/{fn}")
else:
  custom_template_path = None
  use_templates = False


In [None]:
#@markdown ### 2. MSA options (custom MSA upload, single sequence, pairing mode)
msa_mode = "MMseqs2 (UniRef+Environmental)" #@param ["MMseqs2 (UniRef+Environmental)", "MMseqs2 (UniRef only)","single_sequence","custom"]
pair_mode = "unpaired+paired" #@param ["unpaired+paired","paired","unpaired"] {type:"string"}
#@markdown - "unpaired+paired" = pair sequences from same species + unpaired MSA, "unpaired" = separate MSA for each chain, "paired" - only use paired sequences.

# decide which a3m to use
if msa_mode.startswith("MMseqs2"):
  a3m_file = f"{jobname}.a3m"
elif msa_mode == "custom":
  a3m_file = f"{jobname}.custom.a3m"
  if not os.path.isfile(a3m_file):
    custom_msa_dict = files.upload()
    custom_msa = list(custom_msa_dict.keys())[0]
    header = 0
    import fileinput
    for line in fileinput.FileInput(custom_msa,inplace=1):
      if line.startswith(">"):
         header = header + 1
      if not line.rstrip():
        continue
      if line.startswith(">") == False and header == 1:
         query_sequence = line.rstrip()
      print(line, end='')

    os.rename(custom_msa, a3m_file)
    queries_path=a3m_file
    print(f"moving {custom_msa} to {a3m_file}")
else:
  a3m_file = f"{jobname}.single_sequence.a3m"
  with open(a3m_file, "w") as text_file:
    text_file.write(">1\n%s" % query_sequence)

In [None]:
#@markdown ### 3. Advanced settings
model_type = "auto" #@param ["auto", "AlphaFold2-ptm", "AlphaFold2-multimer-v1", "AlphaFold2-multimer-v2"]
#@markdown - "auto" = protein structure prediction using "AlphaFold2-ptm" and complex prediction "AlphaFold-multimer-v2". For complexes "AlphaFold-multimer-v[1,2]" and "AlphaFold-ptm" can be used.
num_recycles = 1 #@param [1,3,6,12,24,48] {type:"raw"}
save_to_google_drive = False #@param {type:"boolean"}

#@markdown -  if the save_to_google_drive option was selected, the result zip will be uploaded to your Google Drive
dpi = 200 #@param {type:"integer"}
#@markdown - set dpi for image resolution

#@markdown Don't forget to hit `Runtime` -> `Run all` after updating the form.


if save_to_google_drive:
  from pydrive.drive import GoogleDrive
  from pydrive.auth import GoogleAuth
  from google.colab import auth
  from oauth2client.client import GoogleCredentials
  auth.authenticate_user()
  gauth = GoogleAuth()
  gauth.credentials = GoogleCredentials.get_application_default()
  drive = GoogleDrive(gauth)
  print("You are logged into Google Drive and are good to go!")

In [None]:
#@title 4. Install AF2 dependencies
%%bash -s $use_amber $use_templates

set -e

USE_AMBER=$1
USE_TEMPLATES=$2

if [ ! -f COLABFOLD_READY ]; then
  # install dependencies
  # We have to use "--no-warn-conflicts" because colab already has a lot preinstalled with requirements different to ours
  pip install -q --no-warn-conflicts "colabfold[alphafold-minus-jax] @ git+https://github.com/sokrypton/ColabFold"
  # high risk high gain
  pip install -q "jax[cuda11_cudnn805]>=0.3.8,<0.4" -f https://storage.googleapis.com/jax-releases/jax_releases.html
  touch COLABFOLD_READY
fi

# setup conda
if [ ${USE_AMBER} == "True" ] || [ ${USE_TEMPLATES} == "True" ]; then
  if [ ! -f CONDA_READY ]; then
    wget -qnc https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -bfp /usr/local 2>&1 1>/dev/null
    rm Miniconda3-latest-Linux-x86_64.sh
    touch CONDA_READY
  fi
fi
# setup template search
if [ ${USE_TEMPLATES} == "True" ] && [ ! -f HH_READY ]; then
  conda install -y -q -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 python=3.7 2>&1 1>/dev/null
  touch HH_READY
fi
# setup openmm for amber refinement
if [ ${USE_AMBER} == "True" ] && [ ! -f AMBER_READY ]; then
  conda install -y -q -c conda-forge openmm=7.5.1 python=3.7 pdbfixer 2>&1 1>/dev/null
  touch AMBER_READY
fi

In [None]:
#@title 5. Run Prediction

import sys

from colabfold.download import download_alphafold_params, default_data_dir
from colabfold.utils import setup_logging
from colabfold.batch import get_queries, run, set_model_type
K80_chk = !nvidia-smi | grep "Tesla K80" | wc -l
if "1" in K80_chk:
  print("WARNING: found GPU Tesla K80: limited to total length < 1000")
  if "TF_FORCE_UNIFIED_MEMORY" in os.environ:
    del os.environ["TF_FORCE_UNIFIED_MEMORY"]
  if "XLA_PYTHON_CLIENT_MEM_FRACTION" in os.environ:
    del os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"]

from colabfold.colabfold import plot_protein
from pathlib import Path
import matplotlib.pyplot as plt


# For some reason we need that to get pdbfixer to import
if use_amber and '/usr/local/lib/python3.7/site-packages/' not in sys.path:
    sys.path.insert(0, '/usr/local/lib/python3.7/site-packages/')

def prediction_callback(unrelaxed_protein, length, prediction_result, input_features, type):
  fig = plot_protein(unrelaxed_protein, Ls=length, dpi=150)
  plt.show()
  plt.close()

result_dir="."
setup_logging(Path(".").joinpath("log.txt"))
queries, is_complex = get_queries(queries_path)
model_type = set_model_type(is_complex, model_type)
download_alphafold_params(model_type, Path("."))
run(
    queries=queries,
    result_dir=result_dir,
    use_templates=use_templates,
    custom_template_path=custom_template_path,
    use_amber=use_amber,
    msa_mode=msa_mode,    
    model_type=model_type,
    num_models=5,
    num_recycles=num_recycles,
    model_order=[1, 2, 3, 4, 5],
    is_complex=is_complex,
    data_dir=Path("."),
    keep_existing_results=False,
    recompile_padding=1.0,
    rank_by="auto",
    pair_mode=pair_mode,
    stop_at_score=float(100),
    prediction_callback=prediction_callback,
    dpi=dpi
)

In [None]:
#@title 6. Display 3D structure {run: "auto"}
import py3Dmol
import glob
import matplotlib.pyplot as plt
from colabfold.colabfold import plot_plddt_legend
rank_num = 1 #@param ["1", "2", "3", "4", "5"] {type:"raw"}
color = "lDDT" #@param ["chain", "lDDT", "rainbow"]
show_sidechains = False #@param {type:"boolean"}
show_mainchains = False #@param {type:"boolean"}

jobname_prefix = ".custom" if msa_mode == "custom" else ""
if use_amber:
  pdb_filename = f"{jobname}{jobname_prefix}_relaxed_rank_{rank_num}_model_*.pdb"
else:
  pdb_filename = f"{jobname}{jobname_prefix}_unrelaxed_rank_{rank_num}_model_*.pdb"

pdb_file = glob.glob(pdb_filename)

def show_pdb(rank_num=1, show_sidechains=False, show_mainchains=False, color="lDDT"):
  model_name = f"rank_{rank_num}"
  view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js',)
  view.addModel(open(pdb_file[0],'r').read(),'pdb')

  if color == "lDDT":
    view.setStyle({'cartoon': {'colorscheme': {'prop':'b','gradient': 'roygb','min':50,'max':90}}})
  elif color == "rainbow":
    view.setStyle({'cartoon': {'color':'spectrum'}})
  elif color == "chain":
    chains = len(queries[0][1]) + 1 if is_complex else 1
    for n,chain,color in zip(range(chains),list("ABCDEFGH"),
                     ["lime","cyan","magenta","yellow","salmon","white","blue","orange"]):
      view.setStyle({'chain':chain},{'cartoon': {'color':color}})
  if show_sidechains:
    BB = ['C','O','N']
    view.addStyle({'and':[{'resn':["GLY","PRO"],'invert':True},{'atom':BB,'invert':True}]},
                        {'stick':{'colorscheme':f"WhiteCarbon",'radius':0.3}})
    view.addStyle({'and':[{'resn':"GLY"},{'atom':'CA'}]},
                        {'sphere':{'colorscheme':f"WhiteCarbon",'radius':0.3}})
    view.addStyle({'and':[{'resn':"PRO"},{'atom':['C','O'],'invert':True}]},
                        {'stick':{'colorscheme':f"WhiteCarbon",'radius':0.3}})  
  if show_mainchains:
    BB = ['C','O','N','CA']
    view.addStyle({'atom':BB},{'stick':{'colorscheme':f"WhiteCarbon",'radius':0.3}})

  view.zoomTo()
  return view


show_pdb(rank_num,show_sidechains, show_mainchains, color).show()
if color == "lDDT":
  plot_plddt_legend().show() 

In [None]:
#@title 7. Generate plots {run: "auto"}
from IPython.display import display, HTML
import base64
from html import escape

# see: https://stackoverflow.com/a/53688522
def image_to_data_url(filename):
  ext = filename.split('.')[-1]
  prefix = f'data:image/{ext};base64,'
  with open(filename, 'rb') as f:
    img = f.read()
  return prefix + base64.b64encode(img).decode('utf-8')

pae = image_to_data_url(f"{jobname}{jobname_prefix}_PAE.png")
cov = image_to_data_url(f"{jobname}{jobname_prefix}_coverage.png")
plddt = image_to_data_url(f"{jobname}{jobname_prefix}_plddt.png")
display(HTML(f"""
<style>
  img {{
    float:left;
  }}
  .full {{
    max-width:100%;
  }}
  .half {{
    max-width:50%;
  }}
  @media (max-width:640px) {{
    .half {{
      max-width:100%;
    }}
  }}
</style>
<div style="max-width:90%; padding:2em;">
  <h1>Plots for {escape(jobname)}</h1>
  <img src="{pae}" class="full" />
  <img src="{cov}" class="half" />
  <img src="{plddt}" class="half" />
</div>
"""))


### Description of the plots
*   **Number of sequences per position** - We want to see at least 30 sequences per position,  and for best performance, ideally 100 sequences or more.
*   **Predicted lDDT per position** - Model confidence (out of 100) at each position. The higher the better.
*   **Predicted Alignment Error** - For homo-oligomers, this can be a useful metric to assess how confident the model is about the interface. The lower the better.
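The number of sequences per position can be roughly reproduced from the A3M alignment file that ColabFold writes. The sketch below assumes standard A3M conventions: uppercase letters and `-` are match columns aligned to the query, while lowercase letters (and `.`) are insertions relative to the query and are dropped. The function names are hypothetical, and ColabFold's own coverage plot may apply additional processing.

```python
import numpy as np

def read_a3m(text):
    """Parse A3M text into aligned sequences with insertions (lowercase, '.') removed."""
    seqs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and '#' comment lines
            continue
        if line.startswith(">"):              # a new header closes the previous record
            if current:
                seqs.append("".join(current))
            current = []
        else:
            current.append("".join(c for c in line if not (c.islower() or c == ".")))
    if current:
        seqs.append("".join(current))
    return seqs

def coverage(seqs):
    """Number of sequences with a residue (not '-') at each query position."""
    length = len(seqs[0])
    counts = np.zeros(length, dtype=int)
    for s in seqs:
        if len(s) != length:
            raise ValueError("aligned sequence length differs from the query length")
        counts += np.array([c != "-" for c in s], dtype=int)
    return counts

toy_a3m = """>query
MKTAYIAK
>hit1
MKTAyyYIAK
>hit2
--TAYI--
>hit3
MK-AYIAR
"""
depth = coverage(read_a3m(toy_a3m))
print(depth)
print("Positions with fewer than 30 sequences:", np.flatnonzero(depth < 30))
```

With your own results, something like `coverage(read_a3m(open(a3m_file).read()))` should give per-position depths to compare against the 30 and 100 sequence guidelines above, assuming the file named in cell 2 exists and holds a single-chain alignment.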

In [None]:
#@title 8. Package and download results (optional)
#@markdown If you are having issues downloading the result archive, try disabling your adblocker and run this cell again. If that fails, click on the little folder icon to the left, navigate to file: `jobname.result.zip`, right-click and select "Download" (see [screenshot](https://pbs.twimg.com/media/E6wRW2lWUAEOuoe?format=jpg&name=small)).

if msa_mode == "custom":
  print("Don't forget to cite your custom MSA generation method.")

!zip -FSr $jobname".result.zip" config.json $jobname*".json" $jobname*".a3m" $jobname*"relaxed_rank_"*".pdb" "cite.bibtex" $jobname*".png"
files.download(f"{jobname}.result.zip")

if save_to_google_drive == True and drive:
  uploaded = drive.CreateFile({'title': f"{jobname}.result.zip"})
  uploaded.SetContentFile(f"{jobname}.result.zip")
  uploaded.Upload()
  print(f"Uploaded {jobname}.result.zip to Google Drive with ID {uploaded.get('id')}")

## Part II. Download and install GROMACS

We will start by setting up **GROMACS** on Google Colab using a previously compiled and installed version of GROMACS.


In [None]:
#@title ### Installing GROMACS 2020.6 version
!wget https://raw.githubusercontent.com/pb3lab/ibm3202/master/software/gromacs.tar.gz
!tar xzf gromacs.tar.gz
# It is recommended (and required for GROMACS 2021) to upgrade cmake
!pip install cmake --upgrade

In [None]:
#@title ### Checking that GROMACS 2020.6 runs on Google Colab

%%bash
source /content/gromacs/bin/GMXRC
gmx -h

## Part III. Preparing our MD simulation system


### Part III.A. Parameterizing the atoms building up our system for MD simulations

Now, we will work with GROMACS to parameterize our protein, generating:

- A .gro or .pdb coordinate file that contains all the atom types as defined by a given force field (including hydrogens).
- A .top topology file containing the parameters for bonds, angles, dihedrals and non-bonded interactions defined by a given force field (potential energy function) to employ in our simulations.

We will parameterize our protein with the AMBER99SB-ILDN force field in GROMACS, obtaining these files with `gmx pdb2gmx` as shown in the code cell below. This force field is extensively used in MD simulations, and its parameters represent the dynamics and flexibility of folded proteins well. Note, however, that it was not primarily parameterized to describe highly flexible proteins or intrinsically disordered regions, and other force fields may better suit such goals.
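For reference, the classic AMBER functional form, on which AMBER99SB-ILDN is built, sums harmonic bond and angle terms, a Fourier series for dihedral (torsion) angles, and Lennard-Jones plus Coulomb terms for non-bonded atom pairs. The .top file written by `pdb2gmx` lists the parameters for each of these terms:

$$
V = \sum_{\text{bonds}} K_r \left(r - r_{eq}\right)^2 + \sum_{\text{angles}} K_\theta \left(\theta - \theta_{eq}\right)^2 + \sum_{\text{dihedrals}} \frac{V_n}{2}\left[1 + \cos\left(n\phi - \gamma\right)\right] + \sum_{i<j} \left[\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{4\pi\varepsilon_0 R_{ij}}\right]
$$

Here $r$, $\theta$ and $\phi$ are bond lengths, bond angles and dihedral angles, $R_{ij}$ is the distance between atoms $i$ and $j$, and $q_i$ are partial charges. Non-bonded terms are excluded for atoms separated by one or two bonds, and 1-4 pairs are scaled (by 0.5 for Lennard-Jones and by 1/1.2 for electrostatics in AMBER force fields). GROMACS writes harmonic terms as $\tfrac{1}{2}k(x - x_0)^2$, so its force constants are twice the AMBER values after unit conversion. The ILDN variant differs from AMBER99SB in revised side-chain torsion parameters for Ile, Leu, Asp and Asn.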

In [None]:
#@title #### 1. Making a copy of the ColabFold predictions for safety
!mkdir -p /content/results
!cp /content/*pdb /content/results/.
!cp /content/*json /content/results/.
!cp /content/*png /content/results/.
!cp /content/results/*rank_1*.pdb /content/structure.pdb

In [None]:
#@title #### 2. Parameterizing our protein using the AMBER99SB-ILDN force field
%%bash
source /content/gromacs/bin/GMXRC

#Using pdb2gmx to parameterize our PDB with the AMBER forcefield and SPC/E water
gmx pdb2gmx -f structure.pdb -o processed.pdb -water spce -ignh -ff amber99sb-ildn -quiet

### Part III.B. Solvating our protein

We will now define a periodic box for our simulation system, center our protein in it, and then fill this box with water molecules to solvate the protein. Typically, a **padding distance** of 1.0-1.5 nm between the protein and the box edges is recommended for globular proteins.
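How large will the box be? To our understanding of the `gmx editconf` documentation, with `-bt cubic` and `-d`, the box edge is set to the diameter of the solute (its largest interatomic distance) plus twice the padding distance. The sketch below reproduces that arithmetic with NumPy and gives a crude upper bound for the number of water molecules, assuming a water number density of about 33.4 molecules/nm³ (liquid water at ~1 g/cm³) and ignoring the volume occupied by the protein. The function name and the toy coordinates are hypothetical.

```python
import numpy as np

WATER_PER_NM3 = 33.4  # approximate number density of liquid water at ~1 g/cm^3

def cubic_box_estimate(coords_nm, padding_nm=1.0):
    """Return (edge, volume, max_waters) for a cubic box around the given coordinates (nm)."""
    coords = np.asarray(coords_nm, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]  # all pairwise difference vectors
    diameter = np.sqrt((diff ** 2).sum(axis=-1)).max()
    edge = diameter + 2.0 * padding_nm
    volume = edge ** 3
    return edge, volume, int(volume * WATER_PER_NM3)

# Toy "protein" made of three atoms (coordinates in nm)
edge, volume, max_waters = cubic_box_estimate([[0, 0, 0], [3, 0, 0], [0, 4, 0]], padding_nm=1.0)
print(f"edge = {edge:.2f} nm, volume = {volume:.1f} nm^3, at most {max_waters} waters")
```

PDB files store coordinates in Å, so divide them by 10 before passing real coordinates to this function.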

In [None]:
#@title #### 1. Setting up a cubic box with 1.0 nm padding

%%bash
source /content/gromacs/bin/GMXRC

#Using editconf to create a cubic box with 1.0 nm padding for our solvated system
gmx editconf -f processed.pdb -o newbox.pdb -c -d 1.0 -bt cubic -quiet

In [None]:
#@title #### 2. Filling up our cubic box with water molecules
%%bash
source /content/gromacs/bin/GMXRC

#Using solvate to fill up our box with water molecules
gmx solvate -cp newbox.pdb -o solv.pdb -p topol.top -quiet

Please note that, since water molecules are added to our simulation system, `gmx solvate` writes **a new coordinate file with the added water molecules** (`solv.pdb`) and **updates the topology file** (`topol.top`) with the number of water molecules added.

In [None]:
#@title #### 3. Visualization of our solvated system

import py3Dmol

#First we create a py3Dmol viewer object
view=py3Dmol.view()
#addModel loads the solvated system from the PDB file into the viewer
view.addModel(open('solv.pdb', 'r').read(),'pdb')
#Here we set the background color as white
view.setBackgroundColor('white')
#Here we set the visualization style and color of the protein
view.setStyle({'cartoon': {'color':'green'}})
#Here we add a style showing the oxygen atoms of water molecules as small spheres
view.addStyle({'atom':'OW'},{'sphere':{'radius':0.2}})
#Centering the view on all visible atoms
view.zoomTo()
#And we finally visualize the structures using the command below
view.show()

### Part III.C. Adding counterions to neutralize the global charge of the system

Now we have a solvated box, but our system may have a non-zero net charge, which we need to neutralize.


In [None]:
#@title #### 1. The charge of your solvated system is:
!grep "qtot" topol.top | awk 'END{print $(NF)}'


Therefore, we will replace water molecules with the counterions required to bring the net charge of the system to **zero**.
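Before running `genion`, it can help to anticipate how many ions will be added. The sketch below gives a rough estimate from the sequence alone, under simplifying assumptions: Lys and Arg count as +1, Asp and Glu as -1, His is neutral, and the charged N- and C-termini cancel out. These are common default protonation states at neutral pH, but the `qtot` value read from `topol.top` above remains the authoritative number. The helper names are hypothetical.

```python
# Formal charges under the simple assumptions stated above (His treated as neutral)
RESIDUE_CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}

def estimate_net_charge(sequence):
    """Rough net charge of a protein from its sequence (charged termini cancel out)."""
    return sum(RESIDUE_CHARGE.get(aa, 0) for aa in sequence.upper())

def counterions(net_charge):
    """Number of (Na+, Cl-) ions needed to neutralize an integer net charge."""
    q = int(round(net_charge))
    return max(-q, 0), max(q, 0)

seq = "RPPFSYASLITQAIEESPEKRLTLNEIYNWIMRNFPYFRDKGDSNNAAGWKNSIRHNLSLHKCFVKVPREHDESTGKGSFWTID"
q = estimate_net_charge(seq)
n_na, n_cl = counterions(q)
print(f"Estimated net charge: {q:+d} -> add {n_na} NA and {n_cl} CL")
```

If the estimate matches the `qtot` of the topology, `genion -neutral` should replace that many water molecules with `NA` or `CL` ions.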

In [None]:
#@title #### 2. Neutralizing the simulation system
%%bash
wget https://raw.githubusercontent.com/pb3lab/ibm3202/master/files/ions.mdp
source /content/gromacs/bin/GMXRC

#Using grompp and an MD instruction file to add counterions to our system
gmx grompp -f ions.mdp -c solv.pdb -p topol.top -o ions.tpr -quiet

#This is a trick to provide interactive options to gmx
echo "SOL" > options
echo " " >> options

#Using genion and the tpr to add counterions to our solvated system
gmx genion -s ions.tpr -o solv_ions.pdb -p topol.top -pname NA -nname CL -neutral < options

## Part IV. Running an MD simulation

### Part IV.A. Minimization and equilibration of our system

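Now, we are ready to minimize the energy of our system, which removes the high energies and forces arising from unfavorable initial coordinates (e.g., steric clashes), and then to equilibrate it, first at constant volume and temperature (NVT ensemble) and then at constant pressure and temperature (NPT ensemble).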

Put simply, each simulation step in GROMACS requires:
1. Generating an **MD parameter (.mdp) file** that provides the details of the simulation
2. Pre-processing this file using **`grompp`**
3. Running the simulation using **`mdrun`**

As an example, we will start by downloading an MD instruction file that contains all of the parameters required for the minimization of our system and also print its contents for inspection.
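The three-step pattern above (.mdp file, then `grompp`, then `mdrun`) can also be scripted from Python. The sketch below uses hypothetical helpers, not part of GROMACS: one writes an .mdp file from a dictionary, the other builds the `gmx grompp` and `gmx mdrun` command lines used in this tutorial, which could then be executed with `subprocess.run` once GROMACS is on the path. The parameter values are placeholders for illustration, not the contents of the `em.mdp` file used in this tutorial.

```python
from pathlib import Path

def write_mdp(params, path):
    """Write an .mdp file with one 'key = value' line per parameter."""
    lines = [f"{key:<24} = {value}" for key, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n")

def gmx_step_commands(name, coords, topology="topol.top", checkpoint=None, restraint_ref=None):
    """Build the grompp and mdrun command lines for one simulation step."""
    grompp = ["gmx", "grompp", "-f", f"{name}.mdp", "-c", coords, "-p", topology,
              "-o", f"{name}.tpr", "-quiet"]
    if restraint_ref:                  # reference coordinates for position restraints
        grompp += ["-r", restraint_ref]
    if checkpoint:                     # continue from a previous step's checkpoint
        grompp += ["-t", checkpoint]
    mdrun = ["gmx", "mdrun", "-deffnm", name, "-nb", "gpu", "-quiet"]
    return grompp, mdrun

# Illustrative minimization parameters (placeholders, not the tutorial's em.mdp)
write_mdp({"integrator": "steep", "emtol": 1000.0, "emstep": 0.01, "nsteps": 50000}, "example_em.mdp")
grompp_cmd, mdrun_cmd = gmx_step_commands("example_em", "solv_ions.pdb")
print(" ".join(grompp_cmd))
print(" ".join(mdrun_cmd))
# To actually run them (requires GROMACS): subprocess.run(grompp_cmd, check=True)
```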

In [None]:
#@title ### 1. Download and analyze a GROMACS .mdp file
!wget https://raw.githubusercontent.com/pb3lab/ibm3202/master/files/em.mdp
!paste em.mdp

As you can see, the minimization process involves the use of a **steepest descent** (`steep`) integrator. This integrator, alongside conjugate gradient (`cg`) and a quasi-Newton method (`l-bfgs`), is a **minimization algorithm**: rather than integrating Newton's equations of motion to propagate the atomic positions in time, it iteratively displaces the atoms in the direction that lowers the energy (along the forces, i.e., the negative gradient of the potential) in order to **minimize the potential energy**.
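To make the idea concrete, the sketch below applies steepest descent to a single Lennard-Jones pair distance, whose minimum lies at $r = 2^{1/6}\sigma$. The step-size rule (move along the force normalized by its largest component, multiply the maximum displacement by 1.2 after an accepted step and by 0.2 after a rejected one) follows our reading of the steepest descent scheme described in the GROMACS reference manual. The functions and the reduced units are illustrative assumptions, not GROMACS code.

```python
import numpy as np

def lj_energy(r, eps=1.0, sigma=1.0):
    """Lennard-Jones pair energy in reduced units."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 ** 2 - sr6)

def lj_force(r, eps=1.0, sigma=1.0):
    """Force along r, F = -dV/dr (positive means repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps / r * (2.0 * sr6 ** 2 - sr6)

def steepest_descent(x0, energy, force, h0=0.01, fmax=1e-4, max_steps=10_000):
    """Adaptive-step steepest descent; stops when the largest |force| drops below fmax."""
    x = np.asarray(x0, dtype=float)
    e, h = energy(x), h0
    for step in range(max_steps):
        f = force(x)
        fnorm = np.max(np.abs(f))
        if fnorm < fmax:
            break
        trial = x + f / fnorm * h      # displace along the force (downhill)
        e_trial = energy(trial)
        if e_trial < e:                # accept the step and try a larger one next
            x, e, h = trial, e_trial, 1.2 * h
        else:                          # reject the step and try a smaller one
            h *= 0.2
    return x, e, step

r_min, e_min, n_steps = steepest_descent(1.5, lj_energy, lj_force)
print(f"r = {float(r_min):.5f} (exact: {2 ** (1 / 6):.5f}), E = {float(e_min):.5f} after {n_steps} steps")
```

In GROMACS, the equivalent convergence criterion is the `emtol` value in the .mdp file, expressed as a maximum force in kJ mol⁻¹ nm⁻¹.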


In [None]:
#@title ### 2. Run the energy minimization
%%bash
source /content/gromacs/bin/GMXRC

#Using grompp to prepare our minimization MD
gmx grompp -f em.mdp -c solv_ions.pdb -p topol.top -o em.tpr -quiet

#Run our minimization
gmx mdrun -deffnm em -nb gpu -quiet

Once our minimization is finished, we can check how the potential energy of the system changes over each minimization step.

In [None]:
#@title ### 3. Extract the change in potential energy during minimization
%%bash
source /content/gromacs/bin/GMXRC

#This is a trick to provide interactive options to gmx
echo "Potential" > options
echo " " >> options

#Using energy to extract the potential energy of the system
gmx energy -f em.edr -o em_potential.xvg -xvg none -quiet < options

In [None]:
#@title ### 4. Plot the change in potential energy of the system
import matplotlib.pyplot as plt
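#'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in newer matplotlib versions
plt.style.use('seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid')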
import numpy as np

#Reading the text file containing this information
data = np.loadtxt('em_potential.xvg')

plt.title('Potential Energy during Minimization')
plt.xlabel('Energy Minimization Step')
plt.ylabel(r'E$_P$ [kJ•mol$^{-1}$]')
plt.plot(data[:,0], data[:,1], linestyle='solid', linewidth='2', color='red') 
plt.show()

Next, we will **equilibrate the energy and density of our system at constant temperature and pressure** before the MD production runs.

First, we will equilibrate our system at a target temperature (in our case, 300 K) using a thermostat (thermal bath). The initial velocities of the atoms at our target temperature are drawn from a Maxwell-Boltzmann distribution.
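As an illustration of what generating velocities "at 300 K" means, the sketch below draws each Cartesian velocity component from a Gaussian with variance $k_B T / m$ (the Maxwell-Boltzmann distribution), removes the center-of-mass motion, and recovers the temperature from the kinetic energy through equipartition, $T = 2E_{kin}/(N_{df} k_B)$. It uses GROMACS units (nm/ps, g/mol, kJ/mol); the random masses form a toy system, and details such as constraints (which reduce $N_{df}$) are ignored, so this is not how GROMACS itself implements the step.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K), i.e. GROMACS units

def maxwell_boltzmann_velocities(masses, temperature, rng):
    """Draw velocities (nm/ps) for atoms with masses in g/mol at the given temperature (K)."""
    masses = np.asarray(masses, dtype=float)
    sigma = np.sqrt(KB * temperature / masses)  # std. dev. of each velocity component
    v = rng.normal(size=(masses.size, 3)) * sigma[:, None]
    # Remove the center-of-mass velocity so the whole system does not drift
    v -= (masses[:, None] * v).sum(axis=0) / masses.sum()
    return v

def kinetic_temperature(masses, v):
    """Instantaneous temperature from equipartition, T = 2 E_kin / (N_df k_B)."""
    masses = np.asarray(masses, dtype=float)
    e_kin = 0.5 * np.sum(masses[:, None] * v ** 2)
    n_df = 3 * masses.size - 3                  # 3 COM degrees of freedom removed
    return 2.0 * e_kin / (n_df * KB)

rng = np.random.default_rng(42)
masses = rng.choice([1.008, 12.011, 14.007, 15.999], size=20_000)  # toy H/C/N/O mixture
v = maxwell_boltzmann_velocities(masses, 300.0, rng)
print(f"Instantaneous temperature: {kinetic_temperature(masses, v):.1f} K")
```

Because the velocities are random, the instantaneous temperature only approximates the target; the thermostat then keeps it fluctuating around 300 K during the NVT run.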

In [None]:
#@title ### 5. Downloading an MD parameter file and running an NVT equilibration
%%time
%%bash
wget https://raw.githubusercontent.com/pb3lab/ibm3202/master/files/nvt.mdp

#Using grompp to prepare our NVT equilibration MD
source /content/gromacs/bin/GMXRC
gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr -quiet

#Run our NVT equilibration MD
source /content/gromacs/bin/GMXRC
gmx mdrun -deffnm nvt -nb gpu -quiet

What we just did corresponds to a simulation setup in which the number of atoms, the volume and the temperature are kept constant: **NVT ensemble**. Thus, the temperature of the system should oscillate around the desired temperature (in our case, 300 K). We will confirm this condition below.

In [None]:
#@title ### 6. Extract the change in temperature during the NVT equilibration
%%bash
source /content/gromacs/bin/GMXRC

#This is a trick to provide interactive options to gmx
echo "Temperature" > options
echo " " >> options

#Using energy to extract the temperature of the system during the NVT equil MD
gmx energy -f nvt.edr -o nvt_temp.xvg -xvg none -quiet < options

In [None]:
#@title ### 7. Plotting the change in temperature of the system during the NVT equilibration
import matplotlib.pyplot as plt
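#'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in newer matplotlib versions
plt.style.use('seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid')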
import numpy as np

#Reading the text file containing this information
data = np.loadtxt('nvt_temp.xvg')

plt.title('Temperature during 0.1 ns Equilibration (NVT)')
plt.xlabel('Time (ps)')
plt.ylabel('Temperature [K]')
plt.plot(data[:,0], data[:,1], linestyle='solid', linewidth='2', color='red') 
plt.show()

Lastly, we will equilibrate our system at **constant pressure**, allowing the box volume to adjust so that the density of the system matches what we would expect for a protein in solution at atmospheric pressure. Thus, in this case we will use an ensemble in which the number of atoms, the pressure and the temperature of the system remain constant: **the NPT ensemble**.

In [None]:
#@title ### 8. Downloading an MD parameter file and running an NPT equilibration 
%%time
%%bash
wget https://raw.githubusercontent.com/pb3lab/ibm3202/master/files/npt.mdp

#Using grompp to prepare our NPT equilibration MD
source /content/gromacs/bin/GMXRC
gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr -quiet

#Run our NPT equilibration MD
source /content/gromacs/bin/GMXRC
gmx mdrun -deffnm npt -nb gpu -quiet

Given that we are using an NPT ensemble to keep our simulation at constant pressure, we should check that the pressure and density of the system fluctuate around stable average values.
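Because the instantaneous pressure fluctuates by hundreds of bar in a system of this size, comparing its average with the reference pressure set in the .mdp file (commonly 1 bar) is more informative than inspecting single points. Consecutive frames are correlated, so a common way to estimate the uncertainty of such an average is block averaging, sketched below with NumPy on a synthetic correlated series. The `block_average` helper and the synthetic data are assumptions for illustration.

```python
import numpy as np

def block_average(x, n_blocks=10):
    """Mean and standard error of a time-correlated series using block averaging."""
    x = np.asarray(x, dtype=float)
    usable = len(x) - len(x) % n_blocks           # drop the remainder so blocks are equal
    blocks = x[:usable].reshape(n_blocks, -1).mean(axis=1)
    sem = blocks.std(ddof=1) / np.sqrt(n_blocks)  # standard error from the block means
    return blocks.mean(), sem

# Synthetic, autocorrelated "pressure" trace fluctuating around 1 bar (AR(1) process)
rng = np.random.default_rng(0)
n, phi = 1000, 0.8
pressure = np.empty(n)
pressure[0] = 1.0
for i in range(1, n):
    pressure[i] = 1.0 + phi * (pressure[i - 1] - 1.0) + rng.normal(scale=150.0)

mean_p, sem_p = block_average(pressure)
print(f"<P> = {mean_p:.1f} +/- {sem_p:.1f} bar")

# With the real data (columns: time, pressure, density), after running the cells below:
# data = np.loadtxt('npt_press_dens.xvg')
# print(block_average(data[:, 1]), block_average(data[:, 2]))
```

If the reference pressure lies within a couple of standard errors of the average, the pressure coupling is behaving as expected for such a short run.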

In [None]:
#@title ### 9. Extract the change in pressure and density during the NPT equilibration 
%%bash
source /content/gromacs/bin/GMXRC

#This is a trick to provide interactive options to gmx
echo "Pressure" > options
echo "Density" >> options
echo " " >> options

#Using energy to extract the pressure and density of the system during the NPT equil MD
gmx energy -f npt.edr -o npt_press_dens.xvg -xvg none -quiet < options

In [None]:
#@title ### 10. Plotting the change in pressure and density of the system during the NPT equilibration
import matplotlib.pyplot as plt
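#'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in newer matplotlib versions
plt.style.use('seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid')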
import numpy as np

#Reading the text file containing this information (columns: time, pressure, density)
data = np.loadtxt('npt_press_dens.xvg')

fig, (ax1, ax2) = plt.subplots(2)
fig.suptitle('Pressure and density during 0.1 ns Equilibration (NPT)')

ax1.set(ylabel='Pressure [bar]')
#ax1.set_ylim(-250, 250)

ax2.set(xlabel='Time (ps)', ylabel='Density [kg•m$^{-3}$]')
#ax2.set_ylim(1000, 1020)

#Smoothing the pressure with a Savitzky-Golay filter (21-point window, 5th-order polynomial)
from scipy.signal import savgol_filter
yhat = savgol_filter(data[:,1], 21, 5)

#Plot raw pressure (red), smoothed pressure (blue) and raw density (red)
ax1.plot(data[:,0], data[:,1], linestyle='solid', linewidth=2, color='red')
ax1.plot(data[:,0], yhat, linestyle='solid', linewidth=2, color='blue')
ax2.plot(data[:,0], data[:,2], linestyle='solid', linewidth=2, color='red')

plt.show()

### Part IV.B. Running an MD simulation

Now we are ready to run our MD simulation. Due to time constraints and the size of the system, we will only generate **0.1 ns of production run**, whereas simulations reported in current articles often span hundreds of ns.
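After the production run, this tutorial computes the Cα RMSD (a global measure of how far each frame deviates from a reference structure after optimal superposition) and the Cα RMSF (the fluctuation of each atom around its average position). The NumPy sketch below illustrates both definitions, using the Kabsch algorithm for superposition. It is not GROMACS's implementation (for example, `gmx rms` can mass-weight the fit, which makes no difference when all fitted atoms are Cα), and the toy coordinates are assumptions.

```python
import numpy as np

def superpose(P, Q):
    """Return P (N x 3) optimally rotated and translated onto Q (Kabsch algorithm)."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - p_mean, Q - q_mean
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)                   # SVD of the 3x3 covariance matrix
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0   # avoid improper rotations (reflections)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return Pc @ R.T + q_mean

def rmsd(P, Q):
    """RMSD after optimal superposition of P onto Q."""
    return np.sqrt(((superpose(P, Q) - Q) ** 2).sum(axis=1).mean())

def rmsf(frames, reference):
    """Per-atom RMSF of a trajectory (T x N x 3) after fitting every frame to reference."""
    fitted = np.array([superpose(f, reference) for f in frames])
    return np.sqrt(((fitted - fitted.mean(axis=0)) ** 2).sum(axis=2).mean(axis=0))

rng = np.random.default_rng(1)
ref = rng.normal(size=(50, 3))                     # toy "C-alpha" coordinates (nm)
theta = np.deg2rad(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = ref @ rot.T + np.array([1.0, -2.0, 0.5])   # same structure, rotated and shifted
noisy = [ref + rng.normal(scale=0.05, size=ref.shape) for _ in range(100)]
print(f"RMSD of rigidly moved copy: {rmsd(moved, ref):.2e} nm")
print(f"Mean RMSF of noisy frames: {rmsf(noisy, ref).mean():.3f} nm")
```

A rigidly moved copy has an RMSD of essentially zero, which is why superposition is needed before comparing structures: it separates internal motions from overall rotation and translation.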

In [None]:
#@title ### 1. Downloading an MD parameter file and running a production MD simulation
%%time
%%bash
wget https://raw.githubusercontent.com/pb3lab/ibm3202/master/files/md.mdp

#Using grompp to prepare our production MD
source /content/gromacs/bin/GMXRC
gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_1.tpr -quiet

#Run our production MD
source /content/gromacs/bin/GMXRC
gmx mdrun -deffnm md_1 -nb gpu -quiet

In [None]:
#@title ### 2. Analyzing the global (RMSD) and local (RMSF) dynamics of the protein
%%bash
source /content/gromacs/bin/GMXRC

#Commands for RMSD
echo "C-alpha" > options
echo " " >> options
echo "C-alpha" >> options
echo " " >> options
#RMSD calculation
gmx rms -s em.gro -f md_1.xtc -xvg none < options
#RMSF calculation
gmx rmsf -s em.gro -f md_1.xtc -xvg none -res < options

In [None]:
#@title ### 3. Plotting the RMSD of our short simulation
import matplotlib.pyplot as plt
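#'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in newer matplotlib versions
plt.style.use('seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid')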
import numpy as np

#Reading the text file containing this information
data = np.loadtxt('rmsd.xvg')

plt.title('Ca-RMSD during 0.1 ns production MD')
plt.xlabel('Time (ps)')
plt.ylabel('Ca-RMSD (nm)')
plt.plot(data[:,0], data[:,1], linestyle='solid', linewidth='2', color='red') 
plt.show()

In [None]:
#@title ### 4. Plotting the RMSF of our short simulation
import matplotlib.pyplot as plt
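#'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in newer matplotlib versions
plt.style.use('seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid')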
import numpy as np

#Reading the text file containing this information
data = np.loadtxt('rmsf.xvg')

plt.title('Ca-RMSF during 0.1 ns production MD')
plt.xlabel('residue number')
plt.ylabel('Ca-RMSF (nm)')
plt.plot(data[:,0], data[:,1], linestyle='solid', linewidth='2', color='red') 
plt.show()

To finalize, we will visualize our simulation. For this, we will use the `trjconv` module from GROMACS to remove the solvent from our system and convert our trajectory into a PDB file.

Given that our protein may have drifted from the center of the box and reached its edges, we will also take the opportunity to make the protein whole and recenter it in the periodic box, so that its atoms are not split across the periodic boundaries.
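Conceptually, making a molecule whole means shifting atoms by integer multiples of the box vectors until bonded neighbors are nearest periodic images of each other, and recentering means translating the molecule to the middle of the box. The NumPy sketch below shows this for a linear chain in a cubic box, assuming that consecutive atoms are bonded and that no bond exceeds half the box length. GROMACS instead uses the molecular topology stored in the .tpr file and also handles triclinic boxes, so treat the function names and logic as a hypothetical illustration.

```python
import numpy as np

def make_whole(coords, box_length):
    """Unwrap a chain in a cubic box so each atom is the nearest image of the previous one."""
    coords = np.array(coords, dtype=float)  # copy, so the input is left untouched
    for i in range(1, len(coords)):
        delta = coords[i] - coords[i - 1]
        coords[i] -= box_length * np.round(delta / box_length)  # minimum-image shift
    return coords

def center_in_box(coords, box_length):
    """Translate coordinates so their geometric center sits at the box center."""
    return coords - coords.mean(axis=0) + box_length / 2.0

# A 3-atom chain split across the x boundary of a 5 nm box
wrapped = [[4.8, 1.0, 1.0], [0.2, 1.0, 1.0], [0.6, 1.0, 1.0]]
whole = make_whole(wrapped, 5.0)
print(whole)
print(center_in_box(whole, 5.0))
```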

In [None]:
#@title ### 5. Removing the solvent from our MD simulation
%%bash
source /content/gromacs/bin/GMXRC

#This is a trick to provide interactive options to gmx
echo "Protein" > options
echo " " >> options
echo "Protein" >> options
echo " " >> options

#Using trjconv to extract only the protein atoms from the simulation trajectory
#and also recenter the protein if its atoms crossed the periodic boundaries
gmx trjconv -s md_1.tpr -f md_1.xtc -o md_traj.pdb -pbc mol -center -quiet < options

In [None]:
#@markdown Now, you can download this new PDB file and load it onto [**NGLviewer**](http://nglviewer.org/ngl/) as a **trajectory** PDB file to visualize the protein motions explored within this short simulation time.
files.download("md_traj.pdb")

## Part V. Backing up your files

1. If you want to download your produced files, execute the code below. A compressed .tar.gz file will be generated and automatically downloaded to your computer (if you use an ad-blocker, you may have to download it manually).

In [None]:
#Compressing all files into a .tar.gz file
!tar -czf D2-tutorial-03.tar.gz *

In [None]:
from google.colab import files
files.download("/content/D2-tutorial-03.tar.gz")

2. Alternatively, you can transfer the files directly to your Google Drive as shown below:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import os
import shutil
from pathlib import Path 
backup = Path("/content/drive/MyDrive/saocarlos2022/")
if os.path.exists(backup):
  print("Sao Carlos Workshop 2022 - Backup folder already exists")
else:
  os.mkdir(backup)
  print("Sao Carlos Workshop 2022 - Backup folder did not exist and was successfully created")

#Backing up
shutil.copy(str('/content/D2-tutorial-03.tar.gz'), str(backup/'D2-tutorial-03.tar.gz'))
print("Day 2 - Tutorial 3 files successfully backed up!")