In [1]:
%set_env SHELL=/bin/bash


env: SHELL=/bin/bash


In [2]:
import mbuild
compound = mbuild.load("spce216.gro")
compound.visualize()


<py3Dmol.view at 0x14c69722f910>

Let us begin setting up our simulation files. Our goal is to simulate a box of water, and we want the box to contain at least 4000 water molecules, which we have already established is a reasonable system size.

**Question** How do we decide what is an optimal system size for any of our simulations?

First we need to see which files we have and which files we still need.

In GROMACS, the input comes in three main types of file:
(1) A topology file (extension \*.top), which describes the atoms (atomtypes) and molecules in our system and how they interact with each other.
(2) A coordinate file, which holds the initial coordinates of all the atoms in our system. It can be supplied in several formats, most commonly gro and pdb. We will use the \*.gro format.
(3) A parameter file (extension \*.mdp), which holds the settings for our MD simulation. We will look at this file in detail later.

Note that every MD package has its own formats and names for its input files. In the end, however, we always need to provide the same three pieces of information: what our system comprises, the initial coordinates, and the parameters for the MD simulation.

It is important to choose and understand each of these inputs carefully.

**Question** Where do we get information about the atomtypes and interactions to give as input in our files? What does this look like? (Hint: Force fields)

In [3]:
%%bash
pwd
ls 

/jet/home/erjank/watersims/runtutorial
analysis-tutorial.ipynb
em.mdp
runmdshort.mdp
spce216.gro
spce_MASTER.top
water-tutorial.ipynb


We have:
(1) one topology file (spce_MASTER.top)
(2) two mdp files (em.mdp and runmdshort.mdp)
(3) one gro file (spce216.gro)

Let's do the easy bit first. If you open the gro file, you will see that it contains 216 water molecules. However, we want a simulation box with at least 4000 water molecules, so we will run the following command to generate a larger box. (Remember, these commands are GROMACS-specific. Other MD software needs different commands, but the general steps are the same.)
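As a rough sanity check on the box size, we can estimate how many water molecules fit in a cubic box from the bulk density of liquid water. This is only a back-of-the-envelope sketch: the density (0.997 g/cm^3 near room temperature) and molar mass (18.015 g/mol) are textbook values assumed here, not numbers taken from the tutorial files, and the function name is purely illustrative.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
WATER_MOLAR_MASS = 18.015     # g/mol (assumed textbook value)
WATER_DENSITY = 0.997         # g/cm^3 near room temperature (assumed)
CM3_PER_NM3 = 1.0e-21         # 1 nm^3 expressed in cm^3


def estimate_water_count(box_edge_nm):
    """Estimate the number of water molecules in a cubic box at bulk density."""
    volume_cm3 = box_edge_nm ** 3 * CM3_PER_NM3
    moles = WATER_DENSITY * volume_cm3 / WATER_MOLAR_MASS
    return moles * AVOGADRO


for edge in (4.0, 5.0, 6.0):
    print(f"{edge:.1f} nm box: about {estimate_water_count(edge):.0f} molecules")
```

Under these assumptions a 5.0 nm edge comes out a little above 4000 molecules, which is why that box size is a sensible choice for our target.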

In [4]:
!mpirun gmx_mpi solvate -cs spce216.gro -cp spce216.gro -box 5.0 -o spcesolv.gro

               :-) GROMACS - gmx solvate, 2022.2-conda_forge (-:

Executable:   /opt/conda/bin.AVX2_256/gmx_mpi
Data prefix:  /opt/conda
Working dir:  /jet/home/erjank/watersims/runtutorial
Command line:
  gmx_mpi solvate -cs spce216.gro -cp spce216.gro -box 5.0 -o spcesolv.gro

Reading solute configuration
Reading solvent configuration

Initialising inter-atomic distances...

         based on residue and atom names, since they could not be
         definitively assigned from the information in your input
         files. These guessed numbers might deviate from the mass
         and radius of the atom type. Please check the output
         files if necessary. Note, that this functionality may
         be removed in a future GROMACS version. Please, consider
         using another file format for your input.

NOTE: From version 5.0 gmx solvate uses the Van der Waals radii
from the source below. This means the results may be different
compared to previous GROMACS versions.

++++ PLEASE 

In [5]:
import mbuild
compound = mbuild.load("spcesolv.gro")
compound.visualize()


<py3Dmol.view at 0x14c687e33f40>

That was easy! But let's understand what we did.

mpirun gmx_mpi solvate -cs spce216.gro -cp spce216.gro -box 5.0 -o spcesolv.gro

(1) gmx_mpi invokes GROMACS. The exact form of this gmx_(suffix) name depends on how GROMACS was installed and which suffix was chosen at installation time.
(2) solvate is the GROMACS command that solvates a solute (given with the -cp option) with a solvent (given with the -cs option).
(3) -box 5.0 sets the box size in nm. A single value is used for all three dimensions, so here we get a cubic box with 5.0 nm edges.
(4) Put together, the command says "solvate my solute (spce216.gro) with solvent (spce216.gro) in a 5.0 nm box"
(5) and write the output as spcesolv.gro.
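One way to check what solvate wrote is to read the .gro file directly. The sketch below relies on the standard .gro layout (a title line, the atom count, one fixed-width line per atom, and a final line with the box vectors). The function name `summarize_gro` and the tiny two-molecule sample are made up for illustration, and the function assumes consecutive atoms of one molecule share the same residue number and name.

```python
def summarize_gro(text):
    """Return (n_atoms, n_residues, box) for the contents of a .gro file."""
    lines = text.splitlines()
    n_atoms = int(lines[1])                    # line 2: number of atoms
    atom_lines = lines[2:2 + n_atoms]
    # Line after the atoms: box vector lengths in nm (first three numbers).
    box = tuple(float(v) for v in lines[2 + n_atoms].split()[:3])
    n_residues = 0
    previous = None
    for line in atom_lines:
        residue = (line[0:5], line[5:10])      # fixed-width residue number and name
        if residue != previous:                # a new residue starts here
            n_residues += 1
            previous = residue
    return n_atoms, n_residues, box


# Tiny made-up two-molecule example (coordinates are illustrative only).
rows = [(1, "SOL", "OW", 1), (1, "SOL", "HW1", 2), (1, "SOL", "HW2", 3),
        (2, "SOL", "OW", 4), (2, "SOL", "HW1", 5), (2, "SOL", "HW2", 6)]
sample = "Two waters\n%5d\n" % len(rows)
for resid, resname, atom, index in rows:
    sample += "%5d%-5s%5s%5d%8.3f%8.3f%8.3f\n" % (
        resid, resname, atom, index, 0.1 * index, 0.2, 0.3)
sample += "%10.5f%10.5f%10.5f\n" % (1.0, 1.0, 1.0)

print(summarize_gro(sample))
```

For the real file, something like `summarize_gro(open("spcesolv.gro").read())` should report the atom count, the number of water molecules, and the box edge lengths.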

In [6]:
!ls -ltrh

total 780K
-rw-r--r-- 1 erjank see220002p  738 Jul 10 18:52 spce_MASTER.top
-rw-r--r-- 1 erjank see220002p  29K Jul 10 18:52 spce216.gro
-rw-r--r-- 1 erjank see220002p  410 Jul 10 18:52 em.mdp
-rw-r--r-- 1 erjank see220002p  59K Jul 10 18:52 analysis-tutorial.ipynb
-rw-r--r-- 1 erjank see220002p 139K Jul 10 21:59 water-tutorial.ipynb
-rw-r--r-- 1 erjank see220002p 1.6K Jul 10 21:59 runmdshort.mdp
-rw-r--r-- 1 erjank see220002p 535K Jul 10 21:59 spcesolv.gro


OK, now let's understand our topology file. Remember the question I asked earlier: how do we decide on our atomtypes and the interactions between them? The answer is force fields. But at a very basic level, we need a way of telling the computer that this coordinate is an oxygen, that one is a hydrogen, and that each oxygen atom is bonded to two hydrogen atoms. How do we convey this information?

**In a GROMACS topology file** you first define the atoms in your system, then the molecules, and then the system as a whole. So if you open your topology file you will see sections such as [ atoms ], [ system ], and [ molecules ]. The last of these lists how many copies of each molecule the system contains.


Now that you know what is in the topology file, let's think about what modifications, if any, we need to make after creating a new box of water molecules.

**Question** What do we need to modify in the topology file to make it compatible with our configuration file?

In [7]:
%%bash
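fsolv="spcesolv.gro"
# Second-to-last line = last atom; its first field is e.g. "4055SOL", so drop the 3-letter residue name.
nsol=$(tail -2 "$fsolv" | head -1 | awk '{print substr($1,1,length($1)-3)}')
# Fill the XXXX placeholder in the master topology with the molecule count.
sed "s/XXXX/$nsol/g" spce_MASTER.top > spce-updated.top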
tail -2 spce-updated.top

SOL   4055 
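The same update can be sketched in Python. This is only an illustration of the idea in the shell cell above, not a replacement for it: the function names are hypothetical, the miniature inputs are made up, and the approach assumes the box contains only water with residue numbers starting at 1 (it shares the shell version's limitation that .gro residue numbers wrap around after 99999).

```python
def last_residue_number(gro_text):
    """Residue number of the last atom, read from the fixed-width .gro columns."""
    lines = gro_text.splitlines()
    last_atom_line = lines[-2]          # the final line holds the box vectors
    return int(last_atom_line[0:5])


def fill_topology(template_text, n_molecules, placeholder="XXXX"):
    """Replace every placeholder in the topology template with the molecule count."""
    return template_text.replace(placeholder, str(n_molecules))


# Made-up miniature inputs, only to show the behaviour.
gro_text = (
    "Two waters\n"
    "    2\n"
    "    1SOL     OW    1   0.100   0.200   0.300\n"
    "    2SOL     OW    2   0.400   0.500   0.600\n"
    "   1.00000   1.00000   1.00000\n"
)
template = "[ molecules ]\nSOL   XXXX\n"
print(fill_topology(template, last_residue_number(gro_text)))
```

With the real files, you would read spcesolv.gro and spce_MASTER.top from disk and write the result to spce-updated.top.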



OK, so now we have a configuration file and an updated topology file, and we know what goes into them. **What is the next file we should be thinking about?**

Yes, it's the mdp file. Here we enter all the parameters needed to run the simulation. Let's think about this for a moment. **Do you know the general algorithm of an MD simulation?**
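As a hint, the core of any MD code is a loop that computes forces and then advances positions and velocities by one small time step. The sketch below uses the velocity Verlet scheme on a single particle attached to a spring. It is a toy illustration under stated assumptions: the spring system, unit mass, and time step are made up, and the integrator that GROMACS actually uses for our run is whatever the mdp file selects (its default `md` integrator is leap-frog, a closely related scheme).

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v by n_steps of size dt."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v


def spring_force(x, k=1.0):
    """Force of a harmonic spring with stiffness k (toy system)."""
    return -k * x


x, v = velocity_verlet(x=1.0, v=0.0, force=spring_force, mass=1.0, dt=0.01, n_steps=1000)
energy = 0.5 * v ** 2 + 0.5 * x ** 2       # kinetic + potential energy
print(f"x = {x:.4f}, v = {v:.4f}, total energy = {energy:.6f}")
```

The total energy should stay very close to its starting value of 0.5, which is the kind of check we also apply to real simulations.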

**What is the general approach when starting a simulation?**
1. Run an energy minimization
2. Run an equilibration
3. Run a production simulation

We specify the minimization algorithm in the mdp file.
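To make the idea of a minimization algorithm concrete, here is a minimal sketch of steepest descent with an adaptive step size. The update rule (move every atom along its force, scaled so the largest move equals the current step size, grow the step by 1.2 after an accepted move, and halve it after a rejected one) follows the description in the GROMACS reference manual as we understand it. The harmonic test potential and all function names are assumptions made for this example, and the defaults are chosen to resemble the settings reported in the minimization log in this notebook.

```python
import math


def largest_force(forces):
    """Largest force magnitude on any single atom."""
    return max(math.sqrt(sum(c * c for c in f)) for f in forces)


def steepest_descent(positions, energy_and_forces, h0=0.01, f_tol=1000.0, max_steps=100):
    """Minimise the energy; stop when the largest force drops below f_tol."""
    x = [list(p) for p in positions]
    h = h0                                          # maximum displacement per step
    energy, forces = energy_and_forces(x)
    for _ in range(max_steps):
        f_max = largest_force(forces)
        if f_max < f_tol:
            break
        trial = [[xi + fi / f_max * h for xi, fi in zip(p, f)]
                 for p, f in zip(x, forces)]
        trial_energy, trial_forces = energy_and_forces(trial)
        if trial_energy < energy:                   # accept and take a bolder step next
            x, energy, forces = trial, trial_energy, trial_forces
            h *= 1.2
        else:                                       # reject and be more cautious
            h *= 0.5
    return x, energy, largest_force(forces)


def harmonic_well(positions, k=1000.0):
    """Toy potential: every atom is tied to the origin by a spring."""
    energy = 0.5 * k * sum(c * c for p in positions for c in p)
    forces = [[-k * c for c in p] for p in positions]
    return energy, forces


final_x, final_energy, final_fmax = steepest_descent(
    [[1.0, 0.0, 0.0]], harmonic_well, f_tol=10.0, max_steps=200)
print(final_energy, final_fmax)
```

The 1.2 growth factor is the same pattern you can see in the Dmax column of the mdrun minimization log.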

In [None]:
%%bash
more em.mdp

In [8]:
!mpirun gmx_mpi grompp -f em.mdp -c spcesolv.gro -p spce-updated.top -o minim.tpr

                :-) GROMACS - gmx grompp, 2022.2-conda_forge (-:

Executable:   /opt/conda/bin.AVX2_256/gmx_mpi
Data prefix:  /opt/conda
Working dir:  /jet/home/erjank/watersims/runtutorial
Command line:
  gmx_mpi grompp -f em.mdp -c spcesolv.gro -p spce-updated.top -o minim.tpr

Ignoring obsolete mdp entry 'ns_type'

NOTE 1 [file em.mdp]:
  nstcomm < nstcalcenergy defeats the purpose of nstcalcenergy, consider
  setting nstcomm equal to nstcalcenergy for less overhead

Setting the LD random seed to -604276867

Generated 3 of the 3 non-bonded parameter combinations

Excluding 2 bonded neighbours molecule type 'SOL'
Analysing residue names:
There are:  4055      Water residues
Number of degrees of freedom in T-Coupling group rest is 24327.00

The largest distance between excluded atoms is 0.164 nm
Calculating fourier grid dimensions for X Y Z
Using a fourier grid of 42x42x42, spacing 0.119 0.119 0.119

Estimate for the relative computational load of the PME mesh part: 0.24

This run wil
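The number of degrees of freedom that grompp reports can be reproduced with a little arithmetic. The sketch below assumes that each water molecule is treated as rigid (three constraints per molecule) and that three degrees of freedom are removed for the overall centre-of-mass translation. Neither assumption is stated in the output itself, but together they reproduce the reported number.

```python
n_waters = 4055                 # from the grompp output
atoms_per_water = 3
constraints_per_water = 3       # assumed: rigid water (two O-H bonds plus the H-H distance)
com_dof_removed = 3             # assumed: overall centre-of-mass translation

n_atoms = n_waters * atoms_per_water
dof = 3 * n_atoms - constraints_per_water * n_waters - com_dof_removed
print(n_atoms, dof)
```

This gives 12165 atoms and 24327 degrees of freedom, matching the value printed for the T-Coupling group.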

In [9]:
!mpirun gmx_mpi mdrun -s minim.tpr -deffnm spceminim -v

                :-) GROMACS - gmx mdrun, 2022.2-conda_forge (-:

Executable:   /opt/conda/bin.AVX2_256/gmx_mpi
Data prefix:  /opt/conda
Working dir:  /jet/home/erjank/watersims/runtutorial
Command line:
  gmx_mpi mdrun -s minim.tpr -deffnm spceminim -v

Reading file minim.tpr, VERSION 2022.2-conda_forge (single precision)
Using 1 MPI process

Non-default thread affinity set, disabling internal thread affinity

Using 1 OpenMP thread 


Steepest Descents:
   Tolerance (Fmax)   =  1.00000e+03
   Number of steps    =          100
Step=    0, Dmax= 1.0e-02 nm, Epot= -6.87434e+04 Fmax= 1.94193e+05, atom= 6817
Step=    1, Dmax= 1.0e-02 nm, Epot= -1.02153e+05 Fmax= 7.69022e+04, atom= 157
Step=    2, Dmax= 1.2e-02 nm, Epot= -1.24601e+05 Fmax= 2.89806e+04, atom= 9487
Step=    3, Dmax= 1.4e-02 nm, Epot= -1.41872e+05 Fmax= 1.33854e+04, atom= 10630
Step=    4, Dmax= 1.7e-02 nm, Epot= -1.53159e+05 Fmax= 5.75715e+03, atom= 2029
Step=    5, Dmax= 2.1e-02 nm, Epot= -1.62903e+05 Fmax= 2.69055e+03, atom=

In [10]:
import mbuild
compound = mbuild.load("spceminim.gro")
compound.visualize()


<py3Dmol.view at 0x14c696430220>

In [None]:
%%bash
ls -ltrh

Now that we understand the input, let's spend a few minutes going over the mdp file. Let me know if you have any questions.

Now let's compile and run the simulation!
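Every stage follows the same two-step pattern: grompp compiles the mdp, coordinate, and topology files into a tpr file, and mdrun executes it. The sketch below only assembles the command lines, using the flags and file names that appear in this notebook. The helper names are hypothetical, and `gmx_mpi` is the executable name on this particular installation.

```python
def grompp_command(mdp, coordinates, topology, tpr, gmx="gmx_mpi", maxwarn=None):
    """Build the grompp command that compiles the inputs into a .tpr file."""
    command = ["mpirun", gmx, "grompp", "-f", mdp, "-c", coordinates,
               "-p", topology, "-o", tpr]
    if maxwarn is not None:
        command += ["-maxwarn", str(maxwarn)]   # tolerate this many grompp warnings
    return command


def mdrun_command(tpr, deffnm, gmx="gmx_mpi"):
    """Build the mdrun command that runs a compiled .tpr file."""
    return ["mpirun", gmx, "mdrun", "-s", tpr, "-deffnm", deffnm, "-v"]


stages = [
    grompp_command("em.mdp", "spcesolv.gro", "spce-updated.top", "minim.tpr"),
    mdrun_command("minim.tpr", "spceminim"),
    grompp_command("runmdshort.mdp", "spceminim.gro", "spce-updated.top",
                   "spcemd.tpr", maxwarn=1),
    mdrun_command("spcemd.tpr", "spcemdshort"),
]
for command in stages:
    print(" ".join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` to execute the stage from Python instead of from a shell cell.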

In [11]:
!mpirun gmx_mpi grompp -f runmdshort.mdp -c spceminim.gro -p spce-updated.top -o spcemd.tpr -maxwarn 1

                :-) GROMACS - gmx grompp, 2022.2-conda_forge (-:

Executable:   /opt/conda/bin.AVX2_256/gmx_mpi
Data prefix:  /opt/conda
Working dir:  /jet/home/erjank/watersims/runtutorial
Command line:
  gmx_mpi grompp -f runmdshort.mdp -c spceminim.gro -p spce-updated.top -o spcemd.tpr -maxwarn 1


  The Berendsen barostat does not generate any strictly correct ensemble,
  and should not be used for new production simulations (in our opinion).
  For isotropic scaling we would recommend the C-rescale barostat that also
  ensures fast relaxation without oscillations, and for anisotropic scaling
  you likely want to use the Parrinello-Rahman barostat.

Setting the LD random seed to -136515585

Generated 3 of the 3 non-bonded parameter combinations

Excluding 2 bonded neighbours molecule type 'SOL'

turning H bonds into constraints...

Velocities were taken from a Maxwell distribution at 300 K
Analysing residue names:
There are:  4055      Water residues
Number

In [12]:
!mpirun gmx_mpi mdrun -s spcemd.tpr -deffnm spcemdshort -v 

                :-) GROMACS - gmx mdrun, 2022.2-conda_forge (-:

Executable:   /opt/conda/bin.AVX2_256/gmx_mpi
Data prefix:  /opt/conda
Working dir:  /jet/home/erjank/watersims/runtutorial
Command line:
  gmx_mpi mdrun -s spcemd.tpr -deffnm spcemdshort -v

Reading file spcemd.tpr, VERSION 2022.2-conda_forge (single precision)
Changing nstlist from 10 to 25, rlist from 0.898 to 0.994

Using 1 MPI process

Non-default thread affinity set, disabling internal thread affinity

Using 1 OpenMP thread 

starting mdrun 'Pure SPCE water'
10000 steps,     50.0 ps.
step 9900, remaining wall clock time:     0 s          
Writing final coordinates.
step 10000, remaining wall clock time:     0 s          
               Core t (s)   Wall t (s)        (%)
       Time:       96.451       96.452      100.0
                 (ns/day)    (hour/ns)
Performance:       44.794        0.536

GROMACS reminds you: "Schrödinger's backup: The condition of any backup
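The performance figures in the log can be checked by hand from the step count, the simulated time, and the wall-clock time. The sketch below just redoes that arithmetic with the numbers copied from the mdrun output. Small differences in the last digit are expected, since we do not know exactly how GROMACS rounds or counts steps internally.

```python
n_steps = 10000          # from the mdrun log
simulated_ps = 50.0      # from the mdrun log
wall_seconds = 96.452    # "Wall t (s)" from the mdrun log

timestep_fs = simulated_ps / n_steps * 1000.0                    # ps -> fs
ns_per_day = (simulated_ps / 1000.0) * 86400.0 / wall_seconds    # simulated ns per wall-clock day
hours_per_ns = 24.0 / ns_per_day

print(f"time step: {timestep_fs:.1f} fs")
print(f"performance: {ns_per_day:.2f} ns/day, {hours_per_ns:.3f} hour/ns")
```

The result is close to the 44.794 ns/day and 0.536 hour/ns that GROMACS reports, and the same formula lets you estimate how long a longer production run would take.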