Colicin PK Protein and Pyocyanin Complex
======================


## ``Gromacs_py`` simulation

Here is an example of a short simulation of the Colicin PK (`colpk`) protein in complex with the pyocyanin (`py`) ligand, using the CHARMM force field.


Nine successive steps are used:

1. Load the protein in its best-docked state.
   
2. Create the protein topology within the complex using ``GmxSys.add_top()``, followed by boxing and solvation/neutralization (steps 3 and 5).
   
3. Boxing of complex.
   
4. Create the ligand topology. Ideally this would use `prepare_top()`, but that did not work here, so the topology must be built manually with the `cgenff` server.
   
5. Solvate complex and add ions.

6. Minimisation of the structure using ``GmxSys.em_2_steps()``.

7. Equilibration of the system using ``GmxSys.em_equi_three_step_iter_error()``.

8. Production run using ``GmxSys.production()``.
   
9. Post-processing using ``GmxSys.convert_trj()``.

### Import

In [1]:
import sys
import os
import shutil

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

## To use `gromacs_py` in a project

In [2]:
from gromacs_py import gmx

## Simulation setup

- Define a few variables for your simulation, such as:
  
    1. simulation output folders
    2. ionic concentration
    3. number of minimisation steps
    4. equilibration and production time

### Regarding equilibration time:
The following variables define the simulation time (relative units) of each stage of the three-stage equilibration process. Check the notes below for details:

1. `HA_time`
2. `CA_time`
3. `CA_LOW_time` 


In [3]:
DATA_OUT = 'colpk_pyc_charmm'

# System Setup
vsite='none'
sys_top_folder = os.path.join(DATA_OUT, 'sys_top')
#ignore_hydrogen = {'ignh': None}

# Energy Minimisation
em_folder = os.path.join(DATA_OUT, 'em')
em_sys_folder = os.path.join(DATA_OUT, 'sys_em')
em_step_number = 10000
emtol = 10.0        # Stop minimisation when the maximum force < 10 kJ/mol/nm
emstep = 0.01       # Initial minimisation step size (nm)


# Equilibration
equi_folder = os.path.join(DATA_OUT, 'sys_equi')
HA_time = 0.5
CA_time = 1.0
CA_LOW_time = 4.0

dt_HA = 0.001
dt = 0.002

HA_step = 1000 * HA_time / dt_HA
CA_step = 1000 * CA_time / dt
CA_LOW_step = 1000 * CA_LOW_time / dt

# Production
os.makedirs(DATA_OUT, exist_ok = True)
prod_folder = os.path.join(DATA_OUT, 'sys_prod')
prod_time = 25.0

prod_step = 1000 * prod_time / dt
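
The step counts above come out as floats (for example `500000.0`). As a minimal sketch, the same conversion can be wrapped in a small helper that returns a rounded integer. The helper name `ns_to_steps` is hypothetical, and it assumes the times are given in ns and the time steps in ps, which is what the factor of 1000 in the cell above suggests.

```python
def ns_to_steps(time_ns, dt_ps):
    """Number of MD steps needed to cover time_ns with a time step of dt_ps.

    Assumption: time_ns is in nanoseconds and dt_ps in picoseconds,
    hence the factor of 1000. Rounding guards against floating-point
    results that fall just below the intended whole number.
    """
    return int(round(1000 * time_ns / dt_ps))


# Same values as in the setup cell above.
HA_step = ns_to_steps(0.5, 0.001)
CA_step = ns_to_steps(1.0, 0.002)
CA_LOW_step = ns_to_steps(4.0, 0.002)
prod_step = ns_to_steps(25.0, 0.002)
print(HA_step, CA_step, CA_LOW_step, prod_step)
```

Whether integer step counts are required depends on how the values are consumed downstream; the helper only makes the intent explicit.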

## Create the `GmxSys` object

Load the protein information only, from the docked PDB file on disk.

In [4]:
pdb_file = "pdbs/colpk_pyc.pdb"
sys_name = DATA_OUT
complex_sys = gmx.GmxSys(name=sys_name, coor_file=pdb_file)

## Create the protein topology

Topology creation runs `pdb2gmx` via the `add_top()` function. Note that the supposedly easier `prepare_top()` function fails with `CHARMM` (the intermediate `pdb` files it creates have malformed residues), although it works with `AMBER` (no problems with the intermediate `pdb` files).

Note that the `CHARMM` force field included in `gromacs_py` is outdated, and the ligand topologies created by the `CGENFF` server (see below) include more up-to-date parameters that are incompatible with it.

Thus, download the latest `CHARMM` force field. As of today (`20250119`), the URL is:

https://mackerell.umaryland.edu/download.php?filename=CHARMM_ff_params_files/charmm36-jul2022.ff.tgz

Download it, untar it, and set the environment variable `GMXLIB` to the working directory where it was untarred.
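
As a rough sketch, `GMXLIB` can also be set from within the notebook before any GROMACS command is launched. The helper name `use_force_field` and the default directory name `charmm36-jul2022.ff` are assumptions for illustration. The check for `forcefield.itp` is only a sanity test that the directory looks like a GROMACS force field. The sketch also assumes that `gmx` is started as a child process that inherits the notebook's environment.

```python
import os
from pathlib import Path


def use_force_field(parent_dir, ff_name="charmm36-jul2022.ff"):
    """Point GMXLIB at parent_dir after checking the force field is there.

    parent_dir is the directory in which the archive was untarred
    (hypothetical argument); ff_name is the untarred directory name.
    """
    ff_dir = Path(parent_dir) / ff_name
    if not (ff_dir / "forcefield.itp").is_file():
        raise FileNotFoundError(
            f"{ff_dir} does not look like a GROMACS force-field directory"
        )
    # GMXLIB must name the directory that CONTAINS the .ff directory.
    os.environ["GMXLIB"] = str(Path(parent_dir).resolve())
    return ff_dir


# Example call (the path is a placeholder):
# use_force_field("/path/to/untarred")
```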

If using `gromacs_py`, download the latest CHARMM force-field directory and **OVERWRITE** `/usr/local/lib/python3.12/site-packages/gromacs_py/gmx/template/charmm36-jul2017.ff` with it. That way, the default force field will always be the one you just downloaded, and `gromacs_py` will continue to think that it is using `charmm36-jul2017.ff`.

In [5]:
complex_sys.add_top(DATA_OUT, name=sys_name, pdb2gmx_option_dict={'ignh': None, 'vsite': vsite})

gmx pdb2gmx -f ../pdbs/colpk_pyc.pdb -o colpk_pyc_charmm_pdb2gmx.pdb -p colpk_pyc_charmm_pdb2gmx.top -i colpk_pyc_charmm_posre.itp -water tip3p -ff charmm36-jul2017 -ignh -vsite none


## Create the ligand topology and transclude it manually into the complex topology

Doing this automatically in `gromacs_py` is buggy and doesn't work, necessitating manual overrides.

1. First, add ALL hydrogen atoms to the PDB file of the best-docked ligand pose using the PyMOL GUI:
    load the ligand PDB into PyMOL, then click `builder -> add H` in the top-right buttons. Save twice, once to a `pdb` file and once to a new `mol2` file. The latter is needed to generate the `CHARMM` topology.
2. Then, visit the CGenFF server ([cgenff.com](https://cgenff.com/)), log into your account, and click "Upload molecule" at the top of the page. Upload the `mol2` file and the CGenFF server will quickly return a topology in the form of a CHARMM "stream" file (extension `.str`). Save its contents.
3. Download a suitable version of the `cgenff_charmm2gmx.py` script from [this GitHub site](https://github.com/Lemkul-Lab/cgenff_charmm2gmx). Prefer the `_py3_nx2` version. You'll need a container (conda environment) with `python-3.5.2` and `networkx-2.3`. These are older versions that are required until the script maintainer updates the script to work with more current versions. Convert the `.str` file to a `GROMACS` topology with:
    ```bash
    python cgenff_charmm2gmx.py pyc pyc.mol2 pyc.str /path/to/XXXX.ff
    ```

**Note:** You may need to give the full path to `/path/to/XXXX.ff`. **This should be the same force field that you used to build the protein topology above.**

4. Next, copy the transclusion (`#include`) lines from the ligand's CHARMM force-field topology file to the top of the complex topology file, AND copy the entry in the molecules section of the ligand's topology file to the molecules section of the complex topology file.

**Note:** The ligand information MUST come FIRST in the complex topology, because it introduces new atomtypes, and those must be defined before any `moleculetype`. [See this](https://gromacs.bioexcel.eu/t/invalid-order-for-directive-atomtypes-error/3859).

5. Make sure to adjust the paths of all transcluded files.

6. Finally, update the complex PDB file with ligand co-ordinates from the best-docked pose using `pymol`.
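
The manual merge in steps 4 and 5 can be sketched in Python as plain text manipulation. This is only an illustration under stated assumptions: the function name `add_ligand_to_topology` is hypothetical, the file names `pyc.prm` and `pyc.itp` are placeholders for whatever the conversion script produced, and the complex topology is assumed to contain an `#include` of `forcefield.itp` followed by the usual sections, with `[ molecules ]` last.

```python
def add_ligand_to_topology(top_text, ligand_includes, ligand_name, count=1):
    """Return a copy of top_text with ligand includes and molecule entry added.

    The include lines are placed directly after the force-field include,
    so that new atomtypes are defined before the first [ moleculetype ].
    """
    lines = top_text.rstrip().splitlines()
    if not any(l.replace(" ", "").startswith("[molecules]") for l in lines):
        raise ValueError("no [ molecules ] section found")

    merged, inserted = [], False
    for line in lines:
        merged.append(line)
        is_ff_include = (line.strip().startswith("#include")
                         and "forcefield.itp" in line)
        if is_ff_include and not inserted:
            merged.extend(ligand_includes)
            inserted = True
    if not inserted:
        raise ValueError("no force-field #include found")

    # [ molecules ] is assumed to be the last section, so the ligand
    # entry is appended at the very end of the file.
    merged.append(f"{ligand_name:<20}{count}")
    return "\n".join(merged) + "\n"


# Toy topology, for illustration only.
example_top = '''#include "charmm36-jul2022.ff/forcefield.itp"

[ moleculetype ]
Protein_chain_A     3

[ system ]
colpk_pyc

[ molecules ]
Protein_chain_A     1
'''
merged_top = add_ligand_to_topology(
    example_top, ['#include "pyc.prm"', '#include "pyc.itp"'], "PYC")
print(merged_top)
```

The order of entries in `[ molecules ]` has to match the order of molecules in the coordinate file, so appending the ligand last only works if its coordinates follow the protein's.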


In [None]:
complex_sys.create_box(dist=1.5, box_type="dodecahedron", check_file_out=True)
complex_sys.solvate_add_ions(out_folder=DATA_OUT, name=sys_name, create_box_flag=False, maxwarn=4)
complex_sys.display()

## Add index groups

Next, add an index group that merges the protein with the ligand (PYC). Start `make_ndx` with the shell command:

```bash
gmx make_ndx -f colpk_pyc_complex_water_ion.gro -o colpk_pyc_complex_water_ion.ndx
```
At the `make_ndx` prompt, merge group 1 with group 13 to form the `Protein_PYC` group. The temperature-control groups can then be set to `Protein_PYC` and `Water_and_ions` later.
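
The merge can also be scripted by feeding the prompt commands to `gmx make_ndx` on standard input. The following is a sketch, not a tested recipe for this system. The function name `merge_index_groups` is hypothetical, and the group numbers 1 and 13 are taken from the text above; they depend on the system and should be checked against the group list that `make_ndx` prints.

```python
import shutil
import subprocess


def merge_index_groups(gro_file, ndx_file, group_a=1, group_b=13, run=False):
    """Build (and optionally run) a non-interactive gmx make_ndx call.

    The stdin text merges group_a and group_b with the '|' operator,
    then quits and saves with 'q'.
    """
    cmd = ["gmx", "make_ndx", "-f", gro_file, "-o", ndx_file]
    stdin_text = f"{group_a} | {group_b}\nq\n"
    if run:
        if shutil.which("gmx") is None:
            raise RuntimeError("gmx executable not found on PATH")
        subprocess.run(cmd, input=stdin_text, text=True, check=True)
    return cmd, stdin_text


# Dry run: only shows what would be executed.
cmd, stdin_text = merge_index_groups(
    "colpk_pyc_complex_water_ion.gro", "colpk_pyc_complex_water_ion.ndx")
print(" ".join(cmd))
print(stdin_text)
```

Passing `run=True` executes the command. The dry run is the default so that the group numbers can be verified first.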

Finally, add the index file to the `complex_sys` object.

In [None]:
%%script true
complex_sys.add_ndx("", ndx_name="colpk_pyc_complex_water_ion", folder_out=DATA_OUT)

## Energy minimisation

Set parallelization and GPU options here. Change them later, if needed.

In [None]:
#Parallelization
nthreads = int(os.environ.get('PBS_NCPUS', '12'))

#Set Parallelization
complex_sys.nt = nthreads
#complex_sys.ntmpi = 1
complex_sys.gpu_id = '0'

complex_sys.em_2_steps(out_folder=em_folder,
        no_constr_nsteps=em_step_number,
        constr_nsteps=em_step_number,
        posres="",
        create_box_flag=False, emtol=emtol, emstep=emstep)

In [None]:
import warnings
warnings.filterwarnings('ignore')  # suppress warnings to keep the notebook output tidy

import pytraj as pt
import nglview as nv

print("nglview version = {}".format(nv.__version__))
print("pytraj version = {}".format(pt.__version__))

traj = pt.load('workspace/colpk.pdb')
view = nv.show_pytraj(traj)
view

In [None]:
nv.demo()