## A notebook for Protein-Ligand Simulations.

This notebook sets up a coarse-grained simulation to estimate $k_{on}$. For all intents and purposes, the rates from most coarse-grained molecular simulations should be used for comparisons only, and we should not trust the absolute values. For example, for a particular protein we can compare the rates qualitatively across a variety of ligands.

<center><img src="martini_drug_discovery.png" alt="Protein-ligand interactions" width="400"/></center>

### Possible Applications ...

1. Protein-Ligand Binding Studies
   - Binding affinity and free energy calculations: Martini 3 can estimate binding free energies of ligands to protein targets, aiding in understanding the thermodynamics of binding.
   - Useful for prioritizing ligands in drug discovery workflows.
   - Identifies potential binding sites by simulating ligand docking to protein surfaces.

2. Drug Discovery and Design
   - Efficiently screens large libraries of small molecules to identify candidates for further refinement.
   - Reduced computational cost allows simulation of many protein-ligand systems.
   - Lead optimization: explores the structural dynamics of protein-ligand complexes to refine ligand design for improved binding.
   - Allosteric modulation: simulates small molecules binding at allosteric sites to understand their regulatory effects on protein activity.

3. Protein-Small Molecule Interactions
   - Models interactions between enzymes and their substrates, products, or cofactors.
   - Studies the effects of metabolites on protein conformations.
   - Simulates the binding and release dynamics of small molecules to/from transport proteins (e.g., ion channels, transporters).
   - Explores the dynamics of covalent or reversible modifications of proteins by small molecules.

4. Membrane-Associated Systems
   - Studies the binding of ligands to membrane proteins (e.g., GPCRs, ion channels) in their native lipid bilayer environment.
   - Assesses the effects of the lipid environment on binding affinity and protein dynamics.
   - Models the partitioning and diffusion of small molecules within lipid bilayers.
   - Useful for designing membrane-permeable drugs.

5. Allosteric Regulation and Functional Modulation
   - Studies how small molecules modulate protein function by binding to allosteric sites.
   - Explores the dynamics of allosteric pathways within proteins.
   - Simulates ligand-induced changes in protein conformations linked to downstream signaling events.

6. Aggregation and Phase Behavior
   - Studies the effects of small molecules on the aggregation behavior of amyloid-forming proteins (e.g., Aβ peptides).
   - Explores therapeutic strategies for diseases like Alzheimer’s and Parkinson’s.
   - Models the influence of small molecules on liquid-liquid phase separation (LLPS) of intrinsically disordered proteins.

7. Toxicology and Pharmacology
   - Simulates interactions between small molecules and unintended protein targets to predict off-target effects.
   - Models how small molecules interact with detoxification proteins (e.g., cytochrome P450 enzymes) to predict pharmacokinetics.




Given that:

- We have an automated way to parametrize a variety of ligands (not far off in the future),
- We have a 'WORKING' Python wrapper for GROMACS (there may be workarounds), and
- We have an automated way to characterize important features of the binding pockets, such as the important water molecules (in principle this should be available from the PDB structures),

we could come up with a toolkit to **quickly** calculate $k_{on}$ for a batch of ligands and pass these on to more expensive calculations at a later point.

<center><img src="table_1.png" alt="Table of bead-types for small molecules." width="500"/></center>
<center><img src="table_2.png" alt="Table of bead-types for small molecules." width="500"/></center>

### Basics of Small-Molecule Parametrization

- Consider only non-hydrogen atoms to define the mapping.

- Avoid dividing chemical functional groups (e.g., amide, carboxylate, etc.) between two beads.

- Retain in the CG representation as much as possible (1) the symmetry and (2) the molecular volume and shape of the underlying AA structure.

- Keep in mind that the default bead sizes for mapping linear molecular fragments are one regular (R) bead for 4 non-hydrogen atoms (4-1), one small (S) bead for three (3-1), and one tiny (T) bead for two (2-1).

- Consider that R-beads are the most computationally-performant option; S-beads are useful to mimic the “bulkier” shape of aliphatic rings, while T-beads are particularly suited to represent atom-thick, conjugated rings.

- Optimize the number of beads such that there is at most a mismatch of ±1 non-hydrogen atom for every 10 non-hydrogen atoms. Some exceptions may be allowed if well-tested (e.g., mapping thiophene or furan with three T-beads has a mismatch of −1, given that 5 non-hydrogen atoms are mapped onto 3 T-beads, which are parameterized to represent 6 non-hydrogen atoms).

- For fully branched fragments, use beads one size smaller than the size suggested by the non-hydrogen atom count. The rationale is that the central atom of a branched group is not exposed to the environment (i.e., it is buried), which reduces its influence on the interactions. For example, a neopentane group contains 5 non-hydrogen atoms but can be effectively modeled with an R-bead.
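
The bead-size and mismatch bookkeeping in the list above can be sketched in a few lines of Python. This is only an illustration, not part of any Martini tool: the names `ATOMS_PER_BEAD`, `mapping_mismatch`, and `within_tolerance` are hypothetical, and reading the ±1-per-10 rule as `10 * |mismatch| <= n_atoms` is an assumption.

```python
# Hypothetical helpers illustrating the bead-size bookkeeping described above.
# Default number of non-hydrogen atoms represented by each bead size.
ATOMS_PER_BEAD = {"R": 4, "S": 3, "T": 2}


def mapping_mismatch(n_heavy_atoms, bead_sizes):
    """Return (atoms mapped) minus (atoms the beads are parameterized for)."""
    represented = sum(ATOMS_PER_BEAD[size] for size in bead_sizes)
    return n_heavy_atoms - represented


def within_tolerance(n_heavy_atoms, bead_sizes):
    """Assumed reading of the rule: at most 1 atom of mismatch per 10 atoms."""
    mismatch = mapping_mismatch(n_heavy_atoms, bead_sizes)
    return 10 * abs(mismatch) <= n_heavy_atoms


# Thiophene or furan: 5 non-hydrogen atoms on three T-beads.
print(mapping_mismatch(5, ["T", "T", "T"]), within_tolerance(5, ["T", "T", "T"]))
# Caffeine: 14 non-hydrogen atoms on seven T-beads.
print(mapping_mismatch(14, ["T"] * 7), within_tolerance(14, ["T"] * 7))
```

Under this reading, the thiophene mapping falls outside the strict tolerance, which matches its description as a well-tested exception, while the 7 T-bead caffeine model has no mismatch.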

<center><img src="caffine.jpeg" alt="Martini 3 CG model of caffeine. (a) The 14 non-hydrogen atoms are described by a 7 T-bead model; the indices used for the beads in the CG topology file are also shown. (b) Rendering of the CG model: apolar aromatic and intermediately polar beads are displayed in silver (TC5, TN1) and blue (TN5a) while polar (TP1a) beads are in red. As described in Sec. 2.5 and shown in the rendering, beads 1, 3, 6, and 7 are connected via constraints and form a “hinge” construction, while beads 2, 4, and 5 are constructed as virtual sites. (c) Representative bond and dihedral distributions: OPLS is in blue while Martini is in red. Note that while distance 3-6 corresponds to an actual constraint at the CG level, distances 5-6 and 1-2 at the CG-level result from the virtual site constructions. (d) A comparison of the Connolly surfaces of the AA (gray) and Martini 3 (blue) models; the inset shows a side view of the molecule." width="500"/></center>

### Set up your simulation box.

Use the CHARMM-GUI Martini Maker together with GROMACS tools (`gmx_mpi insert-molecules`) to create the simulation box; see the martini_basics tutorial.
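
As a rough sketch of the ligand-insertion step, the snippet below builds a `gmx_mpi insert-molecules` command from Python. The helper `insert_molecules_cmd`, the file names, and the ligand count are all hypothetical placeholders; only the flags `-f`, `-ci`, `-nmol`, `-o`, and `-radius` are taken from the GROMACS tool itself.

```python
import shlex


def insert_molecules_cmd(solute_gro, ligand_gro, n_ligands, out_gro,
                         gmx="gmx_mpi", radius=None):
    """Build an argument list for `gmx insert-molecules` (hypothetical helper)."""
    cmd = [gmx, "insert-molecules",
           "-f", solute_gro,         # existing configuration (protein in its box)
           "-ci", ligand_gro,        # single-molecule configuration to insert
           "-nmol", str(n_ligands),  # number of copies to insert
           "-o", out_gro]            # output configuration
    if radius is not None:
        cmd += ["-radius", str(radius)]  # default van der Waals distance (nm)
    return cmd


# Placeholder file names and ligand count; adapt to your own system.
cmd = insert_molecules_cmd("protein_box.gro", "benzene.gro", 10,
                           "protein_ligand.gro")
print(shlex.join(cmd))
# To run it (requires a GROMACS installation):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list keeps it easy to loop over many ligands, which is the use case this notebook has in mind.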

### System

Martini T4 lysozyme with benzene.

<center><img src="protein_ligand.png" alt="Martini CG model of T4 lysozyme with benzene." width="500"/></center>

In [1]:
# Imports

from sys import stdout

from openmm.unit import *
from openmm import *
from openmm.app import *
import martini_openmm as martini
from mdtraj.reporters import XTCReporter
import openmmtools  # not used in the cells below


****** PyMBAR will use 64-bit JAX! *******
* JAX is currently set to 32-bit bitsize *
* which is its default.                  *
*                                        *
* PyMBAR requires 64-bit mode and WILL   *
* enable JAX's 64-bit mode when called.  *
*                                        *
* This MAY cause problems with other     *
* Uses of JAX in the same code.          *
******************************************



In [2]:
# platform = Platform.getPlatformByName("OpenCL")
# Only relevant if a platform is passed to Simulation() below.
properties = {'Precision': 'double'}

# Starting coordinates and periodic box
conf = GromacsGroFile("./protein_ligand/wion.gro")
box_vectors = conf.getPeriodicBoxVectors()

# Collect optional topology defines, one per line of defines.txt
defines = {}
try:
    with open("defines.txt") as def_file:
        for line in def_file:
            line = line.strip()
            if line:  # skip blank lines
                defines[line] = True
except FileNotFoundError:
    pass

# Martini topology; epsilon_r is the relative dielectric constant
top = martini.MartiniTopFile(
    "./protein_ligand/t4l_only.top",
    periodicBoxVectors=box_vectors,
    defines=defines,
    epsilon_r=15.0,
)
system = top.create_system(nonbonded_cutoff=1.1 * nanometer)

# Langevin dynamics: 310 K, 10/ps friction, 20 fs time step
integrator = LangevinIntegrator(310 * kelvin,
                                10.0 / picosecond,
                                20 * femtosecond)
integrator.setRandomNumberSeed(0)  # fixed seed
simulation = Simulation(top.topology, system, integrator)
simulation.context.setPositions(conf.getPositions())

In [3]:
### Minimization ###

# Reporters fire during simulation.step(), not during minimizeEnergy(),
# so these files record the MD stages that follow the minimization.
simulation.reporters.append(PDBReporter('mini.pdb', 1000))
simulation.reporters.append(StateDataReporter(stdout, 5000,
                                              step=True,
                                              potentialEnergy=True,
                                              temperature=True,
                                              volume=True))
simulation.reporters.append(StateDataReporter("md_log.txt", 100,
                                              step=True,
                                              potentialEnergy=True,
                                              temperature=True,
                                              volume=True))
print("Minimizing energy...")
simulation.minimizeEnergy(maxIterations=5000, tolerance=1.0)

# Potential energy of the minimized structure
energy = simulation.context.getState(getEnergy=True).getPotentialEnergy()
print("System minimized at", energy, "\n")

Minimizing energy...
System minimized at -262556.15869140625 kJ/mol 



In [4]:
################################################################################
### NVT equilibration ###
simulation.context.setVelocitiesToTemperature(310 * kelvin)
print('Running NVT equilibration...')
simulation.step(50000)  # 50,000 steps x 20 fs = 1 ns
################################################################################

Running NVT equilibration...
#"Step","Potential Energy (kJ/mole)","Temperature (K)","Box Volume (nm^3)"
5000,-228414.12646484375,309.4791749549511,1000.0
10000,-228651.671875,305.5381636146542,1000.0
15000,-228931.2216796875,309.2738015712404,1000.0
20000,-229253.2001953125,306.0596903733026,1000.0
25000,-229395.052734375,310.54857155566253,1000.0
30000,-228436.431640625,314.4630259838682,1000.0
35000,-228949.7255859375,308.087920692198,1000.0
40000,-229289.67236328125,312.4811464317332,1000.0
45000,-228803.3046875,312.19211379168223,1000.0
50000,-228193.298828125,310.91912181027686,1000.0
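
A quick sanity check on the equilibration is to average the temperature column of the reporter output. The sketch below parses the first three rows printed above with the standard `csv` module. The helper name `mean_column` is hypothetical, and applying it to the contents of `md_log.txt` assumes that file has the same comma-separated layout.

```python
import csv
import io

# Header line and first three rows of the NVT output above.
LOG = '''#"Step","Potential Energy (kJ/mole)","Temperature (K)","Box Volume (nm^3)"
5000,-228414.12646484375,309.4791749549511,1000.0
10000,-228651.671875,305.5381636146542,1000.0
15000,-228931.2216796875,309.2738015712404,1000.0
'''


def mean_column(text, column):
    """Average one named column of a StateDataReporter-style CSV log."""
    # Drop the leading '#' so the header parses as ordinary quoted fields.
    reader = csv.reader(io.StringIO(text.lstrip("#")))
    header = next(reader)
    index = header.index(column)
    values = [float(row[index]) for row in reader if row]
    return sum(values) / len(values)


mean_temperature = mean_column(LOG, "Temperature (K)")
print(f"mean temperature: {mean_temperature:.1f} K (target: 310 K)")
```

With only a handful of rows the average is noisy; in practice one would feed in the whole log and compare against the 310 K thermostat setting.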


In [5]:
### NPT equilibration ###

system.addForce(MonteCarloBarostat(1 * bar, 310 * kelvin))
# Reinitialize the context so it picks up the barostat just added to the
# system; preserveState=True keeps the current positions and velocities.
simulation.context.reinitialize(True)
print('Running NPT equilibration...')
simulation.step(50000)  # 50,000 steps x 20 fs = 1 ns

# save the equilibration results to file
simulation.saveState('equi.state')
simulation.saveCheckpoint('equi.chk')

Running NPT equilibration...
55000,-217123.32373046875,310.68878814613385,1068.4465427085138
60000,-217339.70751953125,310.36267419311076,1064.2996561887303
65000,-217299.18359375,311.3086082433225,1064.3708286081667
70000,-217458.07275390625,310.9631484133843,1068.3009102094538
75000,-217984.9541015625,313.09321907293264,1062.470784180637
80000,-217380.478515625,312.6284639415436,1067.4840223385877
85000,-217581.1884765625,313.94621391638213,1066.5825948463612
90000,-217595.02587890625,307.5827555582401,1067.7289661014827
95000,-217716.84814453125,308.3243162494833,1064.9833266220817
100000,-216914.1337890625,315.18248900160086,1067.918864357926
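
The step counts passed to `simulation.step` map onto simulated time through the 20 fs integrator time step. A small sketch of that arithmetic follows; the helper names `simulated_time_ns` and `steps_for` are hypothetical and are not used elsewhere in this notebook.

```python
TIME_STEP_FS = 20.0  # integrator time step used in this notebook


def simulated_time_ns(n_steps, time_step_fs=TIME_STEP_FS):
    """Convert a number of MD steps into simulated time in nanoseconds."""
    return n_steps * time_step_fs / 1e6  # 1 ns = 1e6 fs


def steps_for(time_ns, time_step_fs=TIME_STEP_FS):
    """Number of MD steps needed to cover `time_ns` nanoseconds."""
    return round(time_ns * 1e6 / time_step_fs)


for n_steps in (50_000, 500_000):
    print(n_steps, "steps =", simulated_time_ns(n_steps), "ns")
```

With this time step, 50,000 steps correspond to 1 ns and 500,000 steps to 10 ns.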


In [6]:
# Log production data every 1000 steps
simulation.reporters.append(StateDataReporter("prod.log", 1000,
                                              step=True,
                                              potentialEnergy=True,
                                              totalEnergy=True,
                                              density=True,
                                              temperature=True,
                                              volume=True))
# save the trajectory in XTC format
xtc_reporter = XTCReporter('prod.xtc', 1000)
simulation.reporters.append(xtc_reporter)

# run simulation
print("Running simulation...")
simulation.step(500000)  # 500,000 steps x 20 fs = 10 ns

Running simulation...
105000,-218333.0771484375,307.07231927393184,1062.436735103614
110000,-217309.77392578125,309.127571048203,1063.442341701423
115000,-217056.02685546875,305.2369080752565,1066.6841871512775
120000,-217685.8857421875,311.7877210287728,1063.137576302273
125000,-218005.0078125,310.02214089579167,1064.8764064785562
130000,-217322.13232421875,314.53333652527607,1066.51109582074
135000,-216447.2822265625,308.60912607789095,1068.6090820526767
140000,-217724.3359375,307.88214485499657,1065.4319122261286
145000,-216964.1005859375,308.61818242911295,1065.7270159212962
150000,-216546.08447265625,309.31448413157136,1070.2453828113194
155000,-216851.43994140625,305.80078565006255,1068.5470195154
160000,-217702.32177734375,312.2431319275248,1063.5228292417794
165000,-217574.66259765625,311.78710856589504,1063.1872944274912
170000,-216833.056640625,309.4557294878307,1065.6253735637806
175000,-217705.2177734375,311.4618820862795,1065.4872036085146
180000,-217064.4423828125,312.091

### Sample Results

<center><img src="protein_mutants.png" alt="Benzene density around mutants L99A (A) and L99A/M102Q (B) T4 lysozyme obtained from averaging 0.9 ms of CG simulations. The blue, cyan, red, and violet isosurfaces correspond to a 10, 100, 1,000, and 10,000 fold higher benzene density than in water. These densities translate to the free energy values shown in the color map. Results obtained with Martini 3 open-beta." width="500"/></center>
<center>Benzene density around mutants L99A (A) and L99A/M102Q (B) T4 lysozyme obtained from averaging 0.9 ms of CG simulations. The blue, cyan, red, and violet isosurfaces correspond to a 10, 100, 1,000, and 10,000 fold higher benzene density than in water. These densities translate to the free energy values shown in the color map. Results obtained with Martini 3 open-beta.</center>
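
The link between density and free energy mentioned in the caption is presumably the usual Boltzmann relation, $\Delta G = -RT \ln(\rho / \rho_{water})$. The sketch below assumes that relation and, as a further assumption, uses the 310 K of this notebook, since the temperature of the simulations behind the figure is not stated here. The function name is hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def free_energy_from_enrichment(fold, temperature_k=310.0):
    """Free energy (kJ/mol) relative to bulk water for a density ratio `fold`.

    Assumes dG = -R T ln(rho / rho_water); the temperature is a placeholder.
    """
    return -R_KJ_PER_MOL_K * temperature_k * math.log(fold)


# Density ratios of the four isosurfaces in the figure
for fold in (10, 100, 1_000, 10_000):
    dg = free_energy_from_enrichment(fold)
    print(f"{fold:>6}-fold: {dg:6.1f} kJ/mol")
```

Each additional factor of 10 in density lowers the free energy by the same amount, roughly 6 kJ/mol at this temperature.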


$$k_{on} = \frac{(\text{number of binding events}) / (\text{simulation time})}{\text{concentration}}$$
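
A minimal sketch of how this estimate could be computed from a trajectory-derived distance series follows. Everything here is an assumption for illustration: the function names, the two distance thresholds, the toy distance trace, the single-ligand count, and the box volume (chosen close to the NPT volumes printed above). The concentration is taken to be the ligand concentration in the box.

```python
AVOGADRO = 6.02214076e23  # 1/mol
LITERS_PER_NM3 = 1e-24    # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def count_binding_events(distances_nm, bound_below=0.5, unbound_above=1.0):
    """Count unbound -> bound transitions in a ligand-pocket distance series.

    Two thresholds (hysteresis) avoid counting fluctuations around a single
    cutoff as separate events. The threshold values are placeholders, and the
    ligand is assumed to start unbound.
    """
    events = 0
    bound = False
    for distance in distances_nm:
        if not bound and distance < bound_below:
            bound = True
            events += 1
        elif bound and distance > unbound_above:
            bound = False
    return events


def molar_concentration(n_molecules, box_volume_nm3):
    """Concentration in mol/L of `n_molecules` in a box of the given volume."""
    return n_molecules / (AVOGADRO * box_volume_nm3 * LITERS_PER_NM3)


def k_on(n_events, simulation_time_s, concentration_molar):
    """k_on = (binding events / simulation time) / concentration, in 1/(M s)."""
    return (n_events / simulation_time_s) / concentration_molar


# Toy distance trace (nm) containing two binding events; not simulation data.
trace = [2.0, 1.2, 0.4, 0.6, 0.45, 1.5, 2.0, 0.3, 0.4]
n_events = count_binding_events(trace)
conc = molar_concentration(n_molecules=1, box_volume_nm3=1065.0)
print(n_events, conc, k_on(n_events, simulation_time_s=10e-9,
                           concentration_molar=conc))
```

As the introduction stresses, a number obtained this way is best used to rank ligands against each other rather than as an absolute rate.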