# Very Quick Test Simulation with GROMACS

## Introduction
Adapted from a [GROMACS Tutorial](https://tutorials.gromacs.org/docs/md-intro-tutorial.html), this test runs a brief molecular dynamics simulation of a small protein, `Factor Xa` (PDB code `1FJS`), in complex with its co-crystallized ligand `INHIBITOR ZK-807834 (CI-1031)` (residue name `Z34`).

The objective of this exercise was to test my Singularity image of `GROMACS`, bundled with the associated goodies detailed in the Singularity definition file [obtained here](.config/gromacs_2024.5-GPU.def).

The simulation stages and their parameters are summarized below. For further background, see the MD tutorials by Justin Lemkul at http://www.mdtutorials.com/gmx/.

| Stage                | Purpose                                           | Ensemble | Duration | Time Step | Constraints          | Thermostat (T=300 K) | Barostat (P=1 bar)     | Notes |
|----------------------|---------------------------------------------------|----------|----------|-----------|----------------------|----------------------|------------------------|-------|
| Energy Minimization  | Relax bad contacts; remove steric clashes         | N/A      | ≤ 50,000 steps (until Fmax < 1000 kJ/mol/nm) | N/A       | None (or LINCS if needed) | None                 | None                   | Steepest descent (then CG optional) |
| NVT Equilibration    | Stabilize temperature at fixed volume (no pressure coupling) | NVT      | 100 ps   | 2 fs      | LINCS on bonds to H  | V-rescale (tau_t=0.1 ps) | None                   | Position restraints on heavy protein & ligand atoms |
| NPT Equilibration    | Adjust density & pressure; stabilize volume       | NPT      | 100 ps   | 2 fs      | LINCS on bonds to H  | V-rescale (tau_t=0.1 ps) | Parrinello-Rahman (tau_p=2.0 ps) | Keep position restraints; isotropic coupling |
| Production MD        | Collect trajectory for analysis (dynamics, binding) | NPT      | 5 ns     | 2 fs      | LINCS on bonds to H  | V-rescale (tau_t=0.1 ps) | Parrinello-Rahman (tau_p=2.0 ps) | Remove restraints; save frames every 10 ps (e.g. nstxout-compressed=5000) |
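To show how the production row maps onto GROMACS run parameters, here is a sketch that renders a minimal production `.mdp` file and derives the step counts from the durations (5 ns / 2 fs = 2,500,000 steps; 10 ps / 2 fs = 5000 steps between saved frames). The option names are standard `.mdp` keys, but the temperature-coupling group names (`Protein_Other Water_and_ions`) and the compressibility value are assumptions for illustration; this is not necessarily what the biobb script generates.

```python
def steps_for(duration_ps: float, dt_ps: float) -> int:
    """Number of MD steps needed to cover duration_ps with time step dt_ps."""
    return round(duration_ps / dt_ps)


def production_mdp(duration_ns: float = 5.0, dt_ps: float = 0.002,
                   frame_every_ps: float = 10.0) -> str:
    """Render a minimal production-MD .mdp from the table's values (sketch)."""
    params = {
        "integrator": "md",
        "dt": dt_ps,
        "nsteps": steps_for(duration_ns * 1000.0, dt_ps),
        # Compressed (xtc) output frequency in steps: 10 ps / 2 fs = 5000
        "nstxout-compressed": steps_for(frame_every_ps, dt_ps),
        "constraints": "h-bonds",
        "constraint-algorithm": "lincs",
        "tcoupl": "V-rescale",
        # Hypothetical coupling groups; real names depend on the index file
        "tc-grps": "Protein_Other Water_and_ions",
        "tau-t": "0.1 0.1",
        "ref-t": "300 300",
        "pcoupl": "Parrinello-Rahman",
        "pcoupltype": "isotropic",
        "tau-p": 2.0,
        "ref-p": 1.0,
        "compressibility": 4.5e-5,  # typical value for water, in bar^-1
    }
    return "\n".join(f"{key:<22}= {value}" for key, value in params.items()) + "\n"


print(production_mdp())
```

The same helper covers the equilibration stages: `steps_for(100, 0.002)` gives the 50,000 steps of a 100 ps NVT or NPT run.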


The starting point for every simulation is a molecular structure file. In this tutorial, we will simulate Factor Xa, a protein that plays a critical role in the formation of blood clots. Its 3D structure is available from the RCSB website (https://www.rcsb.org/) under PDB code `1FJS`. The PDB file for the crystal structure is in the "input" directory as `1fjs.pdb`.
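The file can also be fetched programmatically. This sketch uses only the standard library and the RCSB download URL pattern `https://files.rcsb.org/download/<ID>.pdb`; the helper names are illustrative, and the `input` directory and lower-case file name simply follow the layout described above.

```python
import urllib.request
from pathlib import Path


def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the RCSB download URL for an entry in legacy .pdb format."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def fetch_pdb(pdb_id: str, dest_dir: str = "input") -> Path:
    """Download an entry into dest_dir as lower-case <id>.pdb (network required)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / f"{pdb_id.lower()}.pdb"
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), target)
    return target
```

Calling `fetch_pdb("1fjs")` would download the entry to `input/1fjs.pdb`.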


### Nota bene

For best results, open this notebook in a browser with good WebGL support, such as Firefox or Epiphany, rather than Google Chrome (at least on Linux).

```python
import os

import nglview as ng

# Work inside the simulation directory (path as mounted in the container)
os.chdir("/host_pwd/protein-1fjs-ligand-z34-sim")

# Display the protein structure in the notebook
view = ng.show_structure_file("1fjs_protein.pdb")
view
```

## Cleaning the input structure
Once you've had a look at the molecule, strip out all atoms that do not belong to the protein (e.g. crystal waters and ligands). These atoms are labelled `HETATM` in the PDB file, and their connectivity is listed in `CONECT` records. To delete them, use a plain text editor such as vi or emacs (Linux/Mac) or Notepad (Windows). Do not use word-processing software! Alternatively, `grep` removes these lines very easily:

```bash
$ grep -v HETATM 1fjs.pdb > 1fjs_protein_tmp.pdb
$ grep -v CONECT 1fjs_protein_tmp.pdb > 1fjs_protein.pdb
```
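The same filtering can be done in Python, for example from inside the notebook. This sketch mirrors the two `grep -v` calls in a single pass without the temporary file, and it checks only the record name at the start of each line; the helper names are illustrative.

```python
def strip_hetero(lines):
    """Yield PDB lines that are neither HETATM nor CONECT records.

    Equivalent in intent to `grep -v HETATM | grep -v CONECT`, but it
    matches only the record name at the start of the line.
    """
    for line in lines:
        if not line.startswith(("HETATM", "CONECT")):
            yield line


def clean_pdb(src: str, dst: str) -> None:
    """Write a copy of src without HETATM/CONECT records to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(strip_hetero(fin))
```

Usage: `clean_pdb("1fjs.pdb", "1fjs_protein.pdb")`.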

Next, load the original `1fjs.pdb` downloaded from RCSB into `pymol` and use the sequence menu option to display the sequence. Select the residue labelled `Z34` with the mouse and save it to a separate PDB file named `z34.pdb`.

Now you have two PDB files: the protein alone in `1fjs_protein.pdb`, and the ligand in `z34.pdb`.
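If you prefer a scripted alternative to selecting the ligand in PyMOL, the sketch below keeps the atom records whose residue name (fixed-width PDB columns 18-20) is `Z34`. It is a shortcut under assumptions: if the entry holds several copies or alternate locations of the ligand, all of them are written, so inspect the result before docking. The helper names are hypothetical.

```python
def extract_residue(lines, resname: str):
    """Yield ATOM/HETATM records whose residue name (columns 18-20) equals resname."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")) and line[17:20].strip() == resname:
            yield line


def write_ligand(src: str, dst: str, resname: str = "Z34") -> int:
    """Write the matching records of src to dst and return how many were found."""
    with open(src) as fin:
        records = list(extract_residue(fin, resname))
    with open(dst, "w") as fout:
        fout.writelines(records)
        fout.write("END\n")
    return len(records)
```

Usage: `write_ligand("1fjs.pdb", "z34.pdb")`.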

## Molecular Protein-Ligand Docking

The next step is to dock the ligand optimally into the protein. To do so, download and install the `AMDock` tool from [their GitHub page](https://github.com/Valdes-Tresanco-MS/AMDock). Once docking is finished, use pymol to prepare a PDB file of the protein-ligand complex with the ligand in its docked pose.
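Docking needs a search box in the receptor. When a co-crystallized ligand is available, a common choice is to center the box on the ligand and pad its extent. This sketch computes both from the coordinates in `z34.pdb`; the 5 Å padding is an arbitrary assumption, the function name is hypothetical, and AMDock may offer its own box-definition options.

```python
def ligand_box(pdb_path: str, padding: float = 5.0):
    """Return (center, size) of an axis-aligned box around the ligand atoms, in Å.

    center is the geometric centroid; size is the coordinate span plus
    `padding` Å on each side (padding is an illustrative assumption).
    """
    coords = []
    with open(pdb_path) as fh:
        for line in fh:
            if line.startswith(("ATOM  ", "HETATM")):
                # Fixed-width PDB columns 31-38, 39-46, 47-54 hold x, y, z
                coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"no atom records in {pdb_path}")
    n = len(coords)
    center = tuple(sum(c[i] for c in coords) / n for i in range(3))
    size = tuple(max(c[i] for c in coords) - min(c[i] for c in coords) + 2 * padding
                 for i in range(3))
    return center, size
```

Usage: `center, size = ligand_box("z34.pdb")`, then enter these values as the box center and dimensions in the docking setup.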

```python
# Display the docked protein-ligand complex
view = ng.show_structure_file("1fjs_protein_z34.pdb")
view
```

## Actual Simulation

Finally, use my handy-dandy Python script to launch the simulation with the BioExcel Building Blocks (`biobb`) modules. For details, see their documentation at https://biobb-wf-protein-complex-md-setup.readthedocs.io/en/latest/index.html.

The script expects an input PDB in which all atoms and molecules are properly labelled and the ligand is already optimally docked. It splits the ligand (usually a small molecule) from the receptor (usually a protein, although a peptide, a DNA/RNA segment, or any combination of amino acids and/or nucleic acids should work just fine), protonates the ligand if necessary, and uses `acpype` to generate the ligand topology. `acpype` relies on AmberTools' `antechamber` to assign GAFF/GAFF2 atom types and, by default, AM1-BCC partial charges. The script then automagically runs the stages in the table above, in order. Once the production MD is complete, it post-processes the trajectory to remove periodic-boundary artefacts and center the complex (index group `Protein_Other`), making it ready for viewing and analysis.
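To make the ligand-parameterization step more concrete, here is a sketch that builds an `acpype` command line for a protonated ligand. The flags (`-i` input, `-b` basename, `-c` charge method, `-n` net charge, `-a` atom type) are standard acpype options, but the file name `z34_h.mol2` and the `gaff2` atom-type choice are assumptions; the biobb script may use different settings internally.

```python
def acpype_command(ligand_file: str, net_charge: int = 0, basename: str = "Z34",
                   charge_method: str = "bcc", atom_type: str = "gaff2") -> list[str]:
    """Build an acpype command line for a protonated ligand file (e.g. a .mol2).

    -c bcc requests AM1-BCC charges, -n sets the net molecular charge,
    and -a selects the atom-type set.
    """
    return ["acpype", "-i", ligand_file, "-b", basename,
            "-c", charge_method, "-n", str(net_charge), "-a", atom_type]
```

With AmberTools and acpype installed, `subprocess.run(acpype_command("z34_h.mol2"), check=True)` would write a `Z34.acpype/` directory containing GROMACS `.itp` and `.top` files for the ligand.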

```python
# Make the helper script in /host_pwd/scripts importable
import importmonkey
importmonkey.add_path("/host_pwd/scripts")
import biobb_protein_ligand_simulation as plsim
```



```python
# Input structure and run settings
input_structure = "1fjs_protein_z34.pdb"  # docked protein-ligand complex
ligand_code = "Z34"                       # ligand residue name
ligand_charge = 0                         # net charge of the ligand
outdir = "simulation"
nprocs = 12
usegpu = True
gpuid = "0"
em_steps = 5000                           # maximum energy-minimization steps

# Stage durations in ns
nvt_time_ns = 0.1
npt_time_ns = 0.1
md_time_ns = 5.0

# With a 2 fs time step, 1 ns corresponds to 500,000 steps
steps_per_ns = 500_000
nvt_steps = round(nvt_time_ns * steps_per_ns)
npt_steps = round(npt_time_ns * steps_per_ns)
md_steps = round(md_time_ns * steps_per_ns)

protein_1fjs_ligand_z34 = {
    'input_structure': input_structure,
    'ligand_code': ligand_code,
    'ligand_charge': ligand_charge,
    'outdir': outdir,
    'nprocs': nprocs,
    'usegpu': usegpu,
    'gpuid': gpuid,
    'em_steps': em_steps,
    'npt_steps': npt_steps,
    'nvt_steps': nvt_steps,
    'md_steps': md_steps,
}

plsim.molecular_dynamics(protein_1fjs_ligand_z34, protonated=False)
```


```text
Start of Protein-Ligand Dynamics Simulation
+---------------------------+-----------------------------------------------+
| Parameter                 | Value                                         |
| Input Structure           | 1fjs_protein_z34.pdb                          |
+---------------------------+-----------------------------------------------+
| Ligand Code               | Z34                                           |
+---------------------------+-----------------------------------------------+
| Ligand Charge             | 0                                             |
+---------------------------+-----------------------------------------------+
| Output Directory          | simulation                                    |
+---------------------------+-----------------------------------------------+
| Number of Processors      | 12                                            |
+---------------------------+-----------------------------------------------+
| Use GPU           

True

End of Protein-Ligand Dynamics Simulation
```


```text
               :-) GROMACS - gmx trjconv, 2024.5-conda_forge (-:

Executable:   /usr/local/bin.AVX2_256/gmx
Data prefix:  /usr/local
Working dir:  /host_pwd/protein-1fjs-ligand-z34-sim
Command line:
  gmx trjconv -f simulation/prot_1fjs_protein_z34_Z34_cluster_center_traj.xtc -s simulation/prot_1fjs_protein_z34_Z34_gppmd.tpr -o simulation/prot_1fjs_protein_z34_Z34_cluster_center_traj.xtc -pbc mol -center

Will write xtc: Compressed trajectory (portable xdr format): xtc
Reading file simulation/prot_1fjs_protein_z34_Z34_gppmd.tpr, VERSION 2024.5-conda_forge (single precision)
Reading file simulation/prot_1fjs_protein_z34_Z34_gppmd.tpr, VERSION 2024.5-conda_forge (single precision)
Group     0 (         System) has 36638 elements
Group     1 (        Protein) has  4417 elements
Group     2 (      Protein-H) has  2238 elements
Group     3 (        C-alpha) has   286 elements
Group     4 (       Backbone) has   858 elements
Group     5 (      MainChain) has  1146 elements
Group     6 (   Ma

Note that major changes are planned in future for trjconv, to improve usability and utility.
Select group for centering
Selected 0: 'System'
Select group for output
Selected 13: 'Z34'
```


```text
                   :-) GROMACS - gmx, 2024.5-conda_forge (-:

Executable:   /usr/local/bin.AVX2_256/gmx
Data prefix:  /usr/local
Working dir:  /host_pwd/protein-1fjs-ligand-z34-sim
Command line:
  gmx convert_tpr -f simulation/prot_1fjs_protein_z34_Z34_md.trr -s simulation/prot_1fjs_protein_z34_Z34_gppmd.tpr -n simulation/prot_1fjs_protein_z34_Z34_index.ndx -o simulation/prot_1fjs_protein_z34_Z34_cluster_center_traj.tpr

-------------------------------------------------------
Program:     gmx, version 2024.5-conda_forge
Source file: src/gromacs/commandline/cmdlinemodulemanager.cpp (line 385)
Function:    gmx::ICommandLineModule* gmx::CommandLineModuleManager::Impl::processCommonOptions(gmx::CommandLineCommonOptionsHolder*, int*, char***)

Error in user input:
'convert_tpr' is not a GROMACS command.

For more information and tips for troubleshooting, please check the GROMACS
website at https://manual.gromacs.org/current/user-guide/run-time-errors.html
-----------------------------------

Selected 20: 'Protein_Other'

 ctime or size or n_atoms did not match
```


### Post-Processing

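The log above shows that the script's attempt to build the reduced run input failed: it called `gmx convert_tpr`, but the GROMACS command is `gmx convert-tpr` (with a hyphen). Run `gmx convert-tpr` manually to create a `tpr` file containing only the protein-ligand index group (`Protein_Other`).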

```python
import subprocess

import MDAnalysis as mda
import nglview as ng
```

```python
ndx_file = "simulation/prot_1fjs_protein_z34_Z34_index.ndx"
traj_file = "simulation/prot_1fjs_protein_z34_Z34_cluster_center_traj.xtc"
old_top_file = "simulation/prot_1fjs_protein_z34_Z34_gppmd.tpr"            # full-system run input
top_file = "simulation/prot_1fjs_protein_z34_Z34_cluster_center_traj.tpr"  # reduced run input
ligand_code = "Z34"
```

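```python
# Extract the Protein_Other group into a reduced .tpr; convert-tpr prompts
# for the group, so its name is piped to stdin. check=True raises on failure.
subprocess.run(
    ["gmx", "convert-tpr", "-s", old_top_file, "-n", ndx_file, "-o", top_file],
    input="Protein_Other\n", text=True, check=True,
)
```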
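To confirm which groups the index file contains (the name passed to `convert-tpr` must exist), a minimal `.ndx` parser is enough. GROMACS index files list a `[ name ]` header followed by whitespace-separated, 1-based atom numbers; the function name here is hypothetical.

```python
def read_ndx(path: str) -> dict[str, list[int]]:
    """Parse a GROMACS index (.ndx) file into {group name: [1-based atom numbers]}.

    If a group name occurs twice, the later group replaces the earlier one.
    """
    groups: dict[str, list[int]] = {}
    current = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith("[") and line.endswith("]"):
                current = line[1:-1].strip()
                groups[current] = []
            elif current is not None:
                groups[current].extend(int(tok) for tok in line.split())
    return groups
```

For example, `for name, atoms in read_ndx(ndx_file).items(): print(name, len(atoms))` lists every group with its atom count.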

### Visualization

Use `MDAnalysis` and `nglview` to visualize the `xtc` trajectory.

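```python
u = mda.Universe(top_file, traj_file)

# Keep only the protein and the ligand for display
prot_lig = u.select_atoms("protein or resname " + ligand_code)
view = ng.show_mdanalysis(prot_lig)

# nglview selections use NGL syntax: square brackets select a residue by name
view.center(selection=f"[{ligand_code}]")
# Show protein-ligand contacts, including hydrogen bonds
view.add_contact(selection=f"protein or [{ligand_code}]", hydrogen_bond=True)
view
```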

NGLWidget(max_frame=2500)
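Once the trajectory loads, a common first analysis is the RMSD. MDAnalysis provides `MDAnalysis.analysis.rms.RMSD` for this. The sketch below only illustrates the underlying computation (optimal superposition via the Kabsch algorithm, then RMSD) in plain NumPy; the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = P - P.mean(axis=0)                   # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against improper rotations
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                       # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Applied to `u.select_atoms("resname Z34").positions` in two frames, this fits the ligand onto itself, so it measures internal conformational change rather than drift within the pocket. To track binding-pose stability, fit on the protein backbone instead. `rms.RMSD` supports this through its `select` and `groupselections` arguments.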

## TODO

Modify the `biobb` script to handle more exotic cases, such as DNA/RNA-peptide complexes (without the ligand-topology building step). Just as important, develop a decent workflow for **Coarse-Graining** using the `Martini3` force field.