In [1]:
import gromacs
from Bio.PDB import PDBList

import datetime
now = datetime.datetime.now()

from sys import argv

# gromacs.config.setup() # UNCOMMENT if not yet setup on your system!
print("GMX version:",gromacs.release(),"\n")
# help(gromacs.pdb2gmx)

GMX version: 2020.2-MODIFIED 



# 1.) SOLVATION TUTORIAL

This notebook follows the GROMACS "Lysozyme in Water" tutorial developed by Justin A. Lemkul.

In [2]:
# Download the PDB target
import pypdb

# Define the simulation datapath
def now_dir_ts():
    now_ts = str(now.year)+"_"+str(now.month)+"_"+str(now.day)+"_"+str(now.hour)+"_"+str(now.minute)+"_"+str(now.second)
    return now_ts

sim_dir = "/Users/jacobnorth/Box/extracurriculars/research/SURE_S2020_fileshare/sure_data/"+input("Please enter a sub-directory of sure_data/:")+"/mdsim_"+now_dir_ts()

print(sim_dir)      # Print the simulation datapath

/Users/jacobnorth/Box/extracurriculars/research/SURE_S2020_fileshare/sure_data/1aki/mdsim_2020_6_23_21_36_59
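The timestamp built above is not zero-padded (`2020_6_23_...`), so run directories do not sort chronologically by name. A minimal sketch of a zero-padded alternative using `strftime` follows; the helper name `timestamped_dir` and the relative base path are illustrative assumptions, not part of the original notebook.

```python
from datetime import datetime
from pathlib import Path

def timestamped_dir(base, when=None):
    """Return base/mdsim_YYYY_MM_DD_HH_MM_SS as a Path (the directory is not created)."""
    when = when or datetime.now()
    # strftime zero-pads every field, so names sort in chronological order
    return Path(base) / when.strftime("mdsim_%Y_%m_%d_%H_%M_%S")

# Fixed datetime so the example is reproducible
example = timestamped_dir("sure_data/1aki", datetime(2020, 6, 23, 21, 36, 59))
print(example.name)
```

Note that this changes the directory names relative to `now_dir_ts()`, so it is only a drop-in replacement for new runs.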


In [3]:
id = input("Please enter a PDB ID to simulate:")

pdbl = PDBList()            # Create a PDBList object 
pdbl.retrieve_pdb_file(id, file_format='pdb', pdir=sim_dir)       # Retrieve the PDB file in PDB format

# pypdb.get_pdb_file('1oca', filetype='PDB')      # Search and download on PDB

Downloading PDB structure '1aki'...


'/Users/jacobnorth/Box/extracurriculars/research/SURE_S2020_fileshare/sure_data/1aki/mdsim_2020_6_23_21_36_59/pdb1aki.ent'

In [None]:
with open(sim_dir+"/pdb"+id+".ent") as f: # The with keyword automatically closes the file when you are done
    pdbfile = f.read()
    print(pdbfile)

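Next, run grep from inside the simulation directory to strip the crystal waters (residue name HOH) and save the result under a new name that matches the file the pdb2gmx call below expects:
```
grep -v HOH pdb1aki.ent > 1aki_clean.pdb
```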

Note that such a procedure is **not universally appropriate** (e.g., the case of a tightly bound or otherwise functional active-site water molecule). For our intentions here, we do not need crystal water.
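The same cleanup can be done from Python without leaving the notebook. The sketch below is slightly stricter than `grep -v HOH`: it drops only ATOM/HETATM records whose residue-name field (columns 18-20 of the PDB format) is HOH, so header lines that merely mention HOH are kept. The function name `strip_waters` and the sample records are made up for illustration.

```python
def strip_waters(pdb_lines):
    """Drop ATOM/HETATM records whose residue name (columns 18-20) is HOH."""
    kept = []
    for line in pdb_lines:
        is_coordinate_record = line.startswith(("ATOM", "HETATM"))
        if is_coordinate_record and line[17:20] == "HOH":
            continue  # skip crystal water
        kept.append(line)
    return kept

# Made-up sample records in PDB column layout
sample = [
    "REMARK   2 RESOLUTION.    1.50 ANGSTROMS.",
    "ATOM      1  N   LYS A   1      35.365  22.342 -11.980  1.00 22.28           N",
    "HETATM 1002  O   HOH A 130      28.000  15.000  -3.000  1.00 30.00           O",
]
cleaned = strip_waters(sample)
print(len(cleaned), "of", len(sample), "lines kept")
```

To apply it to the downloaded file, read the lines of `pdb<id>.ent`, pass them through `strip_waters`, and write the result to `<id>_clean.pdb` in `sim_dir`.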

Always:
- Check your .pdb file for entries listed under the comment MISSING, as these entries indicate either atoms or whole residues that are not present in the crystal structure. Terminal regions may be absent, and may not present a problem for dynamics. 
- Incomplete internal sequences or any amino acid residues that have missing atoms will cause pdb2gmx to fail. These missing atoms/residues must be modeled in using other software packages. 
- Also note that pdb2gmx is not magic. It cannot generate topologies for arbitrary molecules, only for the residues defined by the force field (in the *.rtp files: generally proteins, nucleic acids, and a very limited set of cofactors, such as NAD(H) and ATP).
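The check for MISSING entries can be automated. In the PDB format, missing residues are listed under REMARK 465 and missing atoms under REMARK 470. The sketch below collects those lines; the function name `find_missing_records` and the sample text are illustrative assumptions.

```python
def find_missing_records(pdb_text):
    """Return the REMARK 465 (missing residues) and REMARK 470 (missing atoms) lines."""
    flagged = ("REMARK 465", "REMARK 470")
    return [line for line in pdb_text.splitlines() if line.startswith(flagged)]

# Made-up excerpt for illustration
sample_text = "\n".join([
    "REMARK   2 RESOLUTION.    1.50 ANGSTROMS.",
    "REMARK 465 MISSING RESIDUES",
    "REMARK 470 MISSING ATOM",
    "ATOM      1  N   LYS A   1      35.365  22.342 -11.980  1.00 22.28           N",
])
missing = find_missing_records(sample_text)
print(len(missing), "MISSING-related lines found")
```

In the notebook, `find_missing_records(pdbfile)` would scan the file contents read in the cell above; an empty list suggests no residues or atoms are reported as missing.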

Now that the crystal waters are gone and we have verified that all the necessary atoms are present, the PDB file should contain only protein atoms, and is ready to be input into the first GROMACS module, pdb2gmx. The purpose of pdb2gmx is to generate three files:
- The topology for the molecule.
- A position restraint file.
- A post-processed structure file. 

The topology (topol.top by default) contains all the information necessary to define the molecule within a simulation. This information includes nonbonded parameters (atom types and charges) as well as bonded parameters (bonds, angles, and dihedrals). We will take a more detailed look at the topology once it has been generated.

Execute pdb2gmx by issuing the following command:

In [6]:
# gromacs.pdb2gmx(f=sim_dir+"/pdb"+id+".ent", o=sim_dir+"/protein.gro", p=sim_dir+"/topol.top", ff="oplsaa", water="tip4p")
gromacs.pdb2gmx(f=sim_dir+"/"+id+"_clean.pdb", o=sim_dir+"/"+id+"_processed"+".gro", p=sim_dir+"/"+id+"_topol.top", ff="oplsaa", water="tip4p")

(0, None, None)

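The force field will contain the information that will be written to the topology. This is a very important choice! You should always read thoroughly about each force field and decide which is most applicable to your situation. For this tutorial, we use the all-atom OPLS force field. When pdb2gmx is run interactively at the command prompt, you would select it from the menu (by typing 15, followed by 'Enter', in the original tutorial); in the call above, the choice is made non-interactively through ff="oplsaa", with water="tip4p" selecting the water model.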

There are many other options that can be passed to pdb2gmx. Some commonly used ones are listed here:

- -ignh: Ignore H atoms in the PDB file; especially useful for NMR structures. Otherwise, if H atoms are present, they must be named exactly as the force fields in GROMACS expect them to be. Different conventions exist, so dealing with H atoms can occasionally be a headache! If you need to preserve the initial H coordinates, but renaming is required, then the Linux sed command is your friend.
- -ter: Interactively assign charge states for N- and C-termini.
- -inter: Interactively assign charge states for Glu, Asp, Lys, Arg, and His; choose which Cys are involved in disulfide bonds. 

You have now generated three new files. With the arguments used above and the PDB ID 1aki, these are 1aki_processed.gro, 1aki_topol.top, and posre.itp (written to the current working directory, since no -i option was given). 1aki_processed.gro is a GROMACS-formatted structure file that contains all the atoms defined within the force field (i.e., H atoms have been added to the amino acids in the protein). 1aki_topol.top is the system topology. The posre.itp file contains information used to restrain the positions of heavy atoms (more on this later).

One final note: many users assume that a .gro file is mandatory. This is not true. GROMACS can handle many different file formats, with .gro simply being the default for commands that write coordinate files. It is a very compact format, but it has limited precision. If you prefer to use, for instance, PDB format, all you need to do is to specify an appropriate file name with .pdb extension as your output. The purpose of pdb2gmx is to produce a force field-compliant topology; the output structure is largely a side effect of this purpose and is intended for user convenience. The format can be just about anything you like (see the GROMACS manual for different formats).
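To make the "compact but limited precision" remark concrete: each atom line of a .gro file is fixed-width, with coordinates stored in nm to three decimal places. The sketch below formats and re-parses one atom line using the standard .gro layout; the helper names and the coordinate values are made up for illustration.

```python
def format_gro_atom(resnum, resname, atomname, atomnum, x, y, z):
    """Format one atom line in the fixed-width .gro layout (coordinates in nm)."""
    return "%5d%-5s%5s%5d%8.3f%8.3f%8.3f" % (resnum, resname, atomname, atomnum, x, y, z)

def parse_gro_atom(line):
    """Parse the fixed-width fields of a .gro atom line back into Python values."""
    return {
        "resnum": int(line[0:5]),
        "resname": line[5:10].strip(),
        "atomname": line[10:15].strip(),
        "atomnum": int(line[15:20]),
        "xyz": tuple(float(line[20 + 8 * i : 28 + 8 * i]) for i in range(3)),
    }

# The extra digits of x are lost when the line is written
line = format_gro_atom(1, "LYS", "N", 1, 3.53651234, 2.2342, -1.198)
atom = parse_gro_atom(line)
print(repr(line))
print(atom["xyz"])
```

Anything beyond 0.001 nm is rounded away on output, which is usually acceptable for starting structures but worth knowing when comparing coordinates across formats.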

With the topology generated, it is time to continue building our system. In this example, we are going to be simulating a simple aqueous system. It is possible to simulate proteins and other molecules in different solvents, provided that good parameters are available for all species involved.

There are two steps to defining the box and filling it with solvent:

- Define the box dimensions using the editconf module.
- Fill the box with water using the solvate module (formerly called genbox). 

You are now presented with a choice as to how to treat the unit cell. The original tutorial uses a simple cubic box; this notebook uses the rhombic dodecahedron instead, because its volume is ~71% of that of a cubic box with the same periodic distance, which reduces the number of water molecules needed to solvate the protein.
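The ~71% figure follows from the volume formulas for the two cells: for a periodic image distance d, a cube has volume d³ and a rhombic dodecahedron has volume (√2/2)·d³. A short check, with illustrative function names and an arbitrary example distance:

```python
import math

def cubic_volume(d):
    """Volume of a cubic cell with image distance d (nm^3 if d is in nm)."""
    return d ** 3

def rhombic_dodecahedron_volume(d):
    """Volume of a rhombic dodecahedron cell with the same image distance d."""
    return math.sqrt(2) / 2 * d ** 3

d = 7.0  # arbitrary example image distance in nm
ratio = rhombic_dodecahedron_volume(d) / cubic_volume(d)
print(f"dodecahedron / cube volume ratio: {ratio:.3f}")
```

The ratio is independent of d, so the saving in solvent applies to any system size.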

Let's define the box using editconf:

In [7]:
gromacs.editconf(f=sim_dir+"/"+id+"_processed"+".gro", o=sim_dir+"/"+id+"_boxed.gro", bt="dodecahedron", d=1.5, princ=True, input="Protein")

(0, None, None)

In [9]:
gromacs.tools.Genion()

<gromacs.tools.Genion at 0x1269e6b50>

The editconf command above aligns the protein along its principal axes (-princ, using the Protein group supplied as input) and places it at least 1.5 nm from the box edge (-d 1.5); centering in the box (-c) is implied by -d. The box type is a rhombic dodecahedron (-bt dodecahedron). The distance to the edge of the box is an important parameter. Since we will be using periodic boundary conditions, we must satisfy the minimum image convention. That is, a protein should never see its periodic image; otherwise the calculated forces will be spurious. Specifying a solute-box distance of 1.5 nm means that there are at least 3.0 nm between any two periodic images of the protein. This distance is sufficient for just about any cutoff scheme commonly used in simulations.
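The image-separation arithmetic can be written down explicitly: the minimum distance between two periodic images of the solute is twice the solute-box distance, and it should exceed the longest nonbonded cutoff. A minimal sketch, with illustrative function names and an assumed example cutoff of 1.0 nm:

```python
def min_image_separation(solute_box_distance_nm):
    """Minimum distance (nm) between periodic images of the solute."""
    return 2.0 * solute_box_distance_nm

def box_is_large_enough(solute_box_distance_nm, cutoff_nm):
    """True if periodic images are farther apart than the nonbonded cutoff."""
    return min_image_separation(solute_box_distance_nm) > cutoff_nm

for d in (1.0, 1.5):
    gap = min_image_separation(d)
    print(f"-d {d}: images at least {gap} nm apart, ok for 1.0 nm cutoff: {box_is_large_enough(d, 1.0)}")
```

This only covers the initial geometry; a protein that unfolds or extends during the run can still approach its image.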

Now that we have defined a box, we can fill it with solvent (water). Solvation is accomplished using solvate:

In [8]:
# Fill the box with TIP4P water and let solvate update the topology in place
gromacs.solvate(cp=sim_dir+"/"+id+"_boxed.gro", cs="tip4p", p=sim_dir+"/"+id+"_topol.top", o=sim_dir+"/"+id+"_solvated.gro")

(0, None, None)

The configuration of the protein (-cp) is contained in the output of the previous editconf step, and the configuration of the solvent (-cs) is part of the standard GROMACS installation. We are using tip4p.gro, an equilibrated box of four-point TIP4P water, to match the water="tip4p" choice made in pdb2gmx. (For three-point models such as SPC, SPC/E, or TIP3P, you would use the generic spc216.gro instead.) The output is written to 1aki_solvated.gro, and we tell solvate the name of the topology file (1aki_topol.top) so it can be modified: solvate appends the number of added water molecules (SOL) to the [ molecules ] directive of the topology.
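To inspect the [ molecules ] directive after solvate has run, the topology can be parsed directly. The sketch below is a simplified reader that ignores comments and preprocessor lines; the function name `read_molecules` and the sample topology (including the water count) are made up for illustration.

```python
def read_molecules(top_text):
    """Return (name, count) pairs from the [ molecules ] directive of a topology."""
    entries = []
    in_section = False
    for raw in top_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line or line.startswith("#"):
            continue  # skip blanks and #include / #ifdef lines
        if line.startswith("["):
            in_section = line.strip("[] ").lower() == "molecules"
            continue
        if in_section:
            name, count = line.split()[:2]
            entries.append((name, int(count)))
    return entries

# Made-up topology excerpt; the SOL count is not from a real run
sample_top = """
[ system ]
LYSOZYME in water

[ molecules ]
; Compound        #mols
Protein_chain_A     1
SOL             10000
"""
print(read_molecules(sample_top))
```

In the notebook, reading the topology file written by pdb2gmx and passing its text to `read_molecules` before and after the solvate cell would show the added SOL line.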

Given an MDP input file for energy minimization, generate the TPR file and run the energy minimization locally:
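The notebook does not show the MDP file itself. A minimal sketch of how one could be written from Python follows; the option names are standard GROMACS MDP options, but the specific values are assumptions modeled on typical steepest-descent minimization settings and should be checked against your force field and system. The helper name `write_mdp` is illustrative.

```python
import tempfile
from pathlib import Path

# Assumed example settings for steepest-descent minimization; adjust as needed
MINIM_PARAMS = {
    "integrator": "steep",      # steepest-descent minimizer
    "emtol": 1000.0,            # stop when max force < 1000 kJ/mol/nm
    "emstep": 0.01,             # initial step size (nm)
    "nsteps": 50000,            # maximum number of steps
    "nstlist": 1,               # neighbor-list update frequency
    "cutoff-scheme": "Verlet",
    "coulombtype": "PME",       # long-range electrostatics
    "rcoulomb": 1.0,            # Coulomb cutoff (nm)
    "rvdw": 1.0,                # van der Waals cutoff (nm)
    "pbc": "xyz",               # periodic in all directions
}

def write_mdp(path, params):
    """Write key = value lines to an MDP file and return its Path."""
    lines = [f"{key:<15} = {value}" for key, value in params.items()]
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path

# Written to a temporary directory here; in the notebook the target would be sim_dir
mdp_path = write_mdp(Path(tempfile.mkdtemp()) / "minim.mdp", MINIM_PARAMS)
mdp_text = mdp_path.read_text()
print(mdp_text)
```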

In [None]:
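# Assemble the run input (.tpr) for energy minimization from the solvated structure.
# Alternative that uses the topology written by pdb2gmx and updated by solvate:
# gromacs.grompp(f=sim_dir+"/minim.mdp", c=sim_dir+"/"+id+"_solvated.gro", p=sim_dir+"/"+id+"_topol.top", o=sim_dir+"/minim.tpr")

gromacs.grompp(f=sim_dir+"/minim.mdp", c=sim_dir+"/"+id+"_solvated.gro", p="/Users/jacobnorth/Documents/GitHub/mdcode/jupyter_nbs/general/md_recipes/temp.top", o=sim_dir+"/minim.tpr")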

In [None]:
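# Read the minimization input written by grompp and write all outputs as sim_dir/emin.*
gromacs.mdrun(v=True, s=sim_dir+"/minim.tpr", deffnm=sim_dir+"/emin")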

Assuming it all went well, set up and run an MD simulation, starting from the energy-minimized system:

In [None]:
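# Build the production run input from the minimized structure and the topology written above
gromacs.grompp(f=sim_dir+"/md.mdp", c=sim_dir+"/emin.gro", p=sim_dir+"/"+id+"_topol.top", o=sim_dir+"/md.tpr")
# Run in sim_dir so that mdrun finds md.tpr and writes md.* next to it
gromacs.mdrun(v=True, deffnm=sim_dir+"/md")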
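"Assuming it all went well" can be made explicit. The `(0, None, None)` outputs shown earlier are the (return code, stdout, stderr) tuples returned by the wrapper, so each step can be checked before the next one starts. The helper below is an illustrative sketch, not part of GromacsWrapper; depending on how the wrapper's failure handling is configured, a failed command may already raise on its own.

```python
def require_success(result, step):
    """Raise if a (return code, stdout, stderr) tuple reports a non-zero return code."""
    return_code = result[0]
    if return_code != 0:
        raise RuntimeError(f"{step} failed with return code {return_code}")
    return result

# Stand-in tuples; in the notebook these would be the values returned by gromacs.* calls
ok = require_success((0, None, None), "editconf")
print("editconf ok:", ok)
```

In the notebook this would wrap each call, for example `require_success(gromacs.grompp(...), "grompp")`, so that mdrun is never started on a missing or stale .tpr file.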