# Ligand preperations using OpenMM, RDkit, OpenFF and GaFF

The ligand preparation process is crucial when constructing protein-ligand complexes for molecular dynamics (MD) simulations. However, issues can arise when non-standard residues or ligands are present due to the constraints of classical forcefields. Remember that classical MD essentially refers to a parameterized list of bonds, angles, torsion angles, and non-bonded interaction parameters, aiming to replicate chemically reasonable behavior. This process is relatively straightforward for standard biomolecular structures like proteins, RNA, or DNA, which are composed of repeating units.

Ligands, however, can present challenges. Their novel connectivity or unusual elements make the construction of a forcefield description more difficult. As it's not feasible to tabulate all possible combinations of ligand interactions with biomolecules, we must instead process the ligand using a more generalized forcefield. While this approach might not be as precise as a carefully tuned forcefield, it generally provides sufficient insight into the structure and dynamics for most cases.

## Finding the ligand

PDB files primarily consist of two record types: atoms and hetatoms. In a properly configured PDB file, any ligand will be categorized as a hetatom.

There are two main strategies for identifying the ligand ID:

1) Manual Inspection: Open the PDB file and visually search for the ligand.
2) Automated Extraction: Utilize the script provided below to automatically extract the ligand from the PDB file.


In [37]:
from Bio.PDB import PDBParser

input_pdb_file = "assets/cookbook/pdbs/7oun_modified_correct_999.pdb"  # Replace with the name of your PDB file

# Parse the PDB file and create a new structure
parser = PDBParser()
structure = parser.get_structure("protein", input_pdb_file)

ligand_residues = set()

for model in structure:
    for chain in model:
        for residue in chain:
            residue_id = residue.get_id()
            residue_name = residue.get_resname()
            if residue_id[0].startswith("H_"):  # Check if the residue is a hetero-residue
                if residue_name in ["ZN", "NA", "CL"]:
                    continue  # Skip ions
                ligand_residues.add(residue_name)

# if len(ligand_residues) == 1:  
ligand = list(ligand_residues) #[0]
print("ligand found: ", ligand)
ligand_residue_names = ligand


ligand found:  ['LIG']


Exception ignored.
Some atoms or residues may be missing in the data structure.


### Extracting the ligands

In this block of code, we extract and isolate the ligands from a given protein structure, and save them as separate files. The process includes:

Specifying the subdirectory, "assets/cookbook/Ligands", where the resulting ligand files will be stored.

1) Implementing a function, extract_ligand_residues, which identifies ligand residues present within a protein structure. These residues correspond to small molecules we're interested in studying further.

2) The ligands subfolder is purged if it already exists, and a new, empty subfolder is created.

3) The PDB file is parsed and a new protein structure is created using the BioPython PDBParser.

4) The code iterates over all the chains in the structure, and ligand residues are extracted using our defined function, adding these residues to the ligand_residues list.

5) Each ligand residue is saved as a separate file in both PDB and SDF formats. For each residue, a unique output file path is generated based on the residue's name.

6) We utilize RDKit to convert PDB formatted files to the SDF format, useful for software that requires the latter format. The PDB file content is read into a string and converted to a RDKit Mol object, which is then saved in the SDF format.



In [38]:
import os
import shutil
from Bio.PDB import PDBParser, PDBIO, Select
from rdkit import Chem

# Set the input and output file names
ligands_subfolder = "assets/cookbook/Ligands"

# Function to extract ligand residues
def extract_ligand_residues(chain, residue_names):
    return [residue for residue in chain if residue.get_resname() in residue_names]

# Clean up the ligands subfolder and create it again
if os.path.exists(ligands_subfolder):
    shutil.rmtree(ligands_subfolder)
os.makedirs(ligands_subfolder)

# Parse the PDB file and create a new structure
parser = PDBParser()
structure = parser.get_structure("protein", input_pdb_file)

# Iterate over chains and extract ligand residues using the function
ligand_residues = []
for chain in structure.get_chains():
    ligand_residues.extend(extract_ligand_residues(chain, ligand_residue_names))

#print(ligand_residues)
#for residue in ligand_residues: 
#    print(residue, residue.get_resname())

# Save each ligand residue to a separate PDB/SDF file
ligand_counts = {}
ligand_residue_names_split=[]

io = PDBIO()
for residue in ligand_residues:
    residue_name = residue.get_resname()

    # If the ligand has already been processed before, increase its count, otherwise set its count to 1
    ligand_counts[residue_name] = ligand_counts.get(residue_name, 0) + 1
    unique_identifier = ligand_counts[residue_name]

    # Modify the output filenames to include the unique identifier
    output_ligand_pdb_file = os.path.join(ligands_subfolder, f"{residue_name}_{unique_identifier}_ligand.pdb")
    output_ligand_sdf_file = os.path.join(ligands_subfolder, f"{residue_name}_{unique_identifier}_ligand.sdf")


    io.set_structure(residue)
    io.save(output_ligand_pdb_file)
    
    # Convert the PDB file to SDF
    with open(output_ligand_pdb_file, "r") as f:
        pdb_block = f.read()

    mol = Chem.MolFromPDBBlock(pdb_block)
    Chem.MolToMolFile(mol, output_ligand_sdf_file)
    print("Writing ligand file", output_ligand_pdb_file)
    print("Writing ligand file", output_ligand_sdf_file)
    ligand_residue_names_split.append(f"{residue_name}_{unique_identifier}")

Writing ligand file assets/cookbook/Ligands/LIG_1_ligand.pdb
Writing ligand file assets/cookbook/Ligands/LIG_1_ligand.sdf


Exception ignored.
Some atoms or residues may be missing in the data structure.


## Visualising the ligand

The function visualize_ligand creates a 3D representation and a 2D sketch of the chosen ligand. It adds hydrogens to the ligand if it's composed of more than one atom, but remember this process might not always produce accurate results.

The on_ligand_selected function gets triggered when you pick a ligand from the dropdown menu. It updates the display to show the selected ligand.

The script lists all ligand files in the 'assets/cookbook/Ligands' folder and adds them to the dropdown menu. When a ligand is selected from the menu, it's visualized. If there's only one ligand, it's displayed by default.

Hydrogenation is likley to be wrong using this approach!

In [39]:
import os
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw, AllChem 
from openff.toolkit.topology import Molecule
import nglview as nv
import ipywidgets as widgets
from IPython.display import display

def visualize_ligand(ligand_file):
    rdkitmol = Chem.MolFromMolFile(ligand_file)
    # Add hydrogens only if the number of atoms is greater than 1
    if rdkitmol.GetNumAtoms() > 1:
        rdkitmol = Chem.AddHs(rdkitmol, addCoords=True)
    rdkitmol.UpdatePropertyCache(strict=False)

    # Assign stereochemistry
    Chem.AssignAtomChiralTagsFromStructure(rdkitmol)
    Chem.AssignStereochemistry(rdkitmol, force=True, cleanIt=True)
    Chem.AssignStereochemistryFrom3D(rdkitmol, replaceExistingTags=True)

    ligand = Molecule(rdkitmol, allow_undefined_stereo=True)

    # Draw shallow copy of 2d molecule for inspection 
    for mol in [rdkitmol.__copy__()]:
        AllChem.Compute2DCoords(mol)
        img = Chem.Draw.MolToImage(mol)  

    # Create NGLview visualization
    view = nv.NGLWidget()
    view.add_component(rdkitmol)
    view.representations = [{"type": "ball+stick", "params": {"multipleBond": "offset"}}]

    # Display RDKit image and NGLview side by side
    hbox = widgets.HBox([widgets.Image(value=img._repr_png_()), view])
    return hbox

def on_ligand_selected(change):
    selected_file = os.path.join('assets/cookbook/Ligands', change['new'])
    with visualization_output:
        visualization_output.clear_output()
        display(visualize_ligand(selected_file))


ligands_folder = 'assets/cookbook/Ligands'
ligand_files = [f for f in os.listdir(ligands_folder) if f.endswith('.sdf')]

dropdown = widgets.Dropdown(options=ligand_files,description='Ligand:',value=ligand_files[0])


dropdown.observe(on_ligand_selected, names='value')
visualization_output = widgets.Output()
display(dropdown)
display(visualization_output)

# Show default ligand view if there is only one ligand
if ligand_files[0] is not None:
    dropdown.value = ligand_files[0]
    on_ligand_selected({'new': ligand_files[0]})

Dropdown(description='Ligand:', options=('LIG_1_ligand.sdf',), value='LIG_1_ligand.sdf')

Output()

# OPTIONAL: Using Antechamber to assign better hyrodgenation of a Ligand

You might notice a red box about complaining about the chirality of centres around the ligand. Or that the number of hydrogens added is wrong for the ligand. Lets use antechamber to fix this.

Antechamber is a part of the AmberTools suite, specifically designed to prepare ligands for molecular simulations using Amber force fields. In this tutorial, we will discuss how to use Antechamber to assign a topology to a ligand by generating a mol2 file with AM1-BCC charges and GAFF atom types.

In [40]:
import subprocess
import os


ligands_folder = 'assets/cookbook/Ligands'
ligand_files = [f for f in os.listdir(ligands_folder) if f.endswith('.pdb') and not f.endswith('_h.pdb')]
print(ligand_files)

for file in ligand_files: 
    # Get the input file's base name without the extension
    file_base_name, _ = os.path.splitext(file)

    # Add hydrogens to the input PDB file using the 'reduce' program
    output_pdb_with_h = os.path.join(ligands_folder, f"{file_base_name}_with_h.pdb")
    with open(output_pdb_with_h, "w") as outfile:
        subprocess.run(["reduce", "-BUILD", os.path.join(ligands_folder, file)], check=True, text=True, stdout=outfile)


['LIG_1_ligand.pdb']


reduce: version 3.3 06/02/2016, Copyright 1997-2016, J. Michael Word
Processing file: "assets/cookbook/Ligands/LIG_1_ligand.pdb"
Database of HETATM connections: "/opt/conda//dat/reduce_wwPDB_het_dict.txt"
VDW dot density = 16/A^2
Orientation penalty scale = 1 (100%)
Eliminate contacts within 3 bonds.
Ignore atoms with |occupancy| <= 0.01 during adjustments.
Waters ignored if B-Factor >= 40 or |occupancy| < 0.66
Aromatic rings in amino acids accept hydrogen bonds.
Building His ring NH Hydrogens.
Flipping Asn, Gln and His groups.
For each flip state, bumps where gap is more than 0.4A are indicated with '!'.
Building or keeping OH & SH Hydrogens.
Rotating existing OH & SH Hydrogens
Rotating NH3 Hydrogens.
Not processing Met methyls.
Found 0 hydrogens (0 hets)
Standardized 0 hydrogens (0 hets)
Added 0 hydrogens (0 hets)
Removed 0 hydrogens (0 hets)
If you publish work which uses reduce, please cite:
Word, et. al. (1999) J. Mol. Biol. 285, 1735-1747.
For more information see http://kinemage

In [None]:
# import os
# from simtk.openmm import Vec3
# from simtk.openmm.app import Modeller, ForceField, PDBFile
# from openmmforcefields.generators import GAFFTemplateGenerator
# from openff.toolkit.topology import Molecule, Topology
# from simtk import unit

# ligands_folder = 'assets/cookbook/Ligands'
# ligand_files = [f for f in os.listdir(ligands_folder) if f.endswith('.sdf')]

# for file in ligand_files:
#     file_base_name, _ = os.path.splitext(file)
#     input_file_path = os.path.join(ligands_folder, file)

#     # Create OpenFF Molecule from SDF
#     openff_mol = Molecule.from_file(input_file_path, allow_undefined_stereo=True)

#     # Convert OpenFF Molecule to OpenFF Topology
#     openff_topology = Topology.from_molecules([openff_mol])

#     # Convert OpenFF Topology to OpenMM Topology
#     openmm_topology = openff_topology.to_openmm()

#     # Get the OpenFF Molecule conformer positions and convert to OpenMM Vec3 with proper units
#     openmm_positions = [Vec3(*pos) for pos in openff_mol.conformers[0].value_in_unit(unit.nanometer)]

#     # Multiply by unit.nanometer to convert to OpenMM Quantity
#     openmm_positions = [pos * unit.nanometer for pos in openmm_positions]

#     modeller = Modeller(openmm_topology, openmm_positions)

#     # Create a GAFFTemplateGenerator instance
#     gaff_generator = GAFFTemplateGenerator(molecules=[openff_mol])

#     forcefield = ForceField()
#     forcefield.registerTemplateGenerator(gaff_generator.generator)

#     modeller.addHydrogens(forcefield)

#     output_pdb_with_h = os.path.join(ligands_folder, f"{file_base_name}_with_h.pdb")
#     with open(output_pdb_with_h, 'w') as outfile:
#         PDBFile.writeFile(modeller.topology, modeller.positions, outfile)


## Reprocess the ligand with openbabel into an sdf file

OpenBabel provides a solution for reprocessing the connectivity data generated by RDKit, particularly beneficial when addressing topology discrepancies related to incorrect chiral assignments. In this context, OpenBabel is utilized for converting the data into the Structure-Data File (SDF) format.

In [41]:
from openbabel import pybel
from rdkit import Chem
import nglview


ligands_folder = 'assets/cookbook/Ligands'
ligand_files = [f for f in os.listdir(ligands_folder) if f.endswith('_h.pdb')]

for input_file in ligand_files :
    print(input_file)
    file_base_name, _ = os.path.splitext(input_file)
    mol = next(pybel.readfile("pdb", ligands_folder+"/"+input_file))
    #mol.addh()
    mol.write("sdf", ligands_folder+"/"+file_base_name+".sdf", overwrite=True)
    


LIG_1_ligand_with_h.pdb


### Veiwing the reprotonated ligand

Antechamber has produced a rehydrogenated ligand that we can examine below

In [42]:
import os
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw, AllChem 
from openff.toolkit.topology import Molecule
import nglview as nv
import ipywidgets as widgets
from IPython.display import display

def visualize_ligand(ligand_file):
    # Need to use suplier when there are hydrogens
    mol_supplier = Chem.SDMolSupplier(ligand_file, removeHs=False)
    rdkitmol = mol_supplier[0]
    with open(ligand_file, "r") as f:
         sdf_data = f.read()
    
    # Draw shallow copy of 2d molecule for inspection 
    for mol in [rdkitmol.__copy__()]:
        AllChem.Compute2DCoords(mol)
        img = Chem.Draw.MolToImage(mol)  

    # Create NGLview visualization
    view = nv.NGLWidget()
    view.add_component(rdkitmol)
    view.representations = [{"type": "ball+stick", "params": {"multipleBond": "offset"}, "showH": "true"}]

    # Display RDKit image and NGLview side by side
    hbox = widgets.HBox([widgets.Image(value=img._repr_png_()), view])
    return hbox

def on_ligand_selected(change):
    selected_file = os.path.join('assets/cookbook/Ligands', change['new'])
    with visualization_output:
        visualization_output.clear_output()
        display(visualize_ligand(selected_file))


ligands_folder = 'assets/cookbook/Ligands'
ligand_files = [f for f in os.listdir(ligands_folder) if f.endswith('with_h.sdf')]
print(ligand_files)
dropdown = widgets.Dropdown(options=ligand_files,description='Ligand:',value=ligand_files[0])


dropdown.observe(on_ligand_selected, names='value')
visualization_output = widgets.Output()
display(dropdown)
display(visualization_output)

# Show default ligand view if there is only one ligand
if ligand_files[0] is not None:
    dropdown.value = ligand_files[0]
    on_ligand_selected({'new': ligand_files[0]})

['LIG_1_ligand_with_h.sdf']


Dropdown(description='Ligand:', options=('LIG_1_ligand_with_h.sdf',), value='LIG_1_ligand_with_h.sdf')

Output()

# Creating OpenMM Compatible Files Using RDKit
## Introduction
In this segment, we aim to create a protein-ligand assembly that's ready for an OpenMM simulation. To do this, we'll take a ligand in the .sdf format, prepare it, and convert it into a format compatible with OpenMM. An important consideration is ensuring the ligand's optimal placement relative to the protein to avoid any unnecessary distance or overlap. This process can be expedited by using Nanome to design the ligand and establish a suitable starting position for molecular dynamics (MD). However we will be using the default palcement of the bound complex to begin with.

## Loading the Ligand
he first step is to load the ligand from the .sdf file. Subsequently, we'll use RDKit to attach hydrogen atoms to the ligand, as this improves the accuracy of our simulation

## Converting to OpenMM-Compatible Format
After attaching the hydrogen atoms, we'll use the OpenMM.Molecule function to transition the ligand into a format OpenMM can work with. This will enable us to use OpenMM to construct our protein-ligand complex in a standardized format.

Next, we'll visually inspect the prepared ligand with RDKit's 2D viewer and ngview to ensure accurate bond orders.

 ## Converting an RDkit molecule to an OpenMM molecule
 
To reintegrate the ligand into our system, we'll employ a pre-segmented ligand file and transform it from the SDF format to the OpenFF format, and ultimately into the OpenMM format. The function provided below accomplishes this task. Although a deep understanding of the underlying process isn't necessary for this example, here's a brief overview:

The function identifies the elements in the SDF file and processes this information to construct a topology (or connectivity) for the molecule. Using this topology, we can then apply the General AMBER Force Field (GAFF) to create all the necessary parameters for the bond, angle, and dihedrals of the ligand.


In [43]:
import openmm.app as app
from openff.toolkit.topology import Molecule

def rdkit_to_openmm(rdkit_mol, name="LIG"):
    """
    Convert an RDKit molecule to an OpenMM molecule.

    Parameters
    ----------
    rdkit_mol: rdkit.Chem.rdchem.Mol
        RDKit molecule to convert.
    name: str
        Molecule name.

    Returns
    -------
    omm_molecule: simtk.openmm.app.Modeller
        OpenMM modeller object holding the molecule of interest.
    """
    # convert RDKit to OpenFF
    off_mol = Molecule.from_rdkit(rdkit_mol, allow_undefined_stereo=True)

    # add name for molecule
    off_mol.name = name

    # add names for atoms
    element_counter_dict = {}
    for off_atom, rdkit_atom in zip(off_mol.atoms, rdkit_mol.GetAtoms()):
        element = rdkit_atom.GetSymbol()
        if element in element_counter_dict.keys():
            element_counter_dict[element] += 1
        else:
            element_counter_dict[element] = 1
        off_atom.name = element + str(element_counter_dict[element])

    # convert from OpenFF to OpenMM
    off_mol_topology = off_mol.to_topology()
    mol_topology = off_mol_topology.to_openmm()
    mol_positions = off_mol.conformers[0]

    # convert units from Ångström to nanometers
    # since OpenMM works in nm
    mol_positions = mol_positions.to("nanometers")

    # combine topology and positions in modeller object
    omm_mol = app.Modeller(mol_topology, mol_positions)

    return omm_mol, off_mol

### Create openMM objects from the hydrated ligand sdf file

In [44]:
from rdkit import Chem
import os

#rdkit_ligand = Chem.MolFromMolFile('assets/cookbook/Ligands/LZU_ligand.sdf')

# Load the hydrated ligand files
ligands_folder = 'assets/cookbook/Ligands'
ligand_files = [f for f in os.listdir(ligands_folder) if f.endswith('_h.sdf')]

omm_ligands = [] 
off_ligands = [] 

# loop through each ligand file found
for file in ligand_files:
    rdkit_ligand = Chem.MolFromMolFile(ligands_folder+"/"+file)
    ligand_name= "LIG"
    omm_ligand, off_ligand = rdkit_to_openmm(rdkit_ligand, ligand_name)
    omm_ligands.append(omm_ligand)  # Append the ligand to the list
    off_ligands.append(off_ligand)  # Append the ligand to the list

# Prepare the protein

In [45]:
from openmm.app import * 
from openmm import *
from openmm.unit import *
from openmm.openmm import *
from pdbfixer import PDBFixer
import subprocess

# PDB file that we will use as a starting structure
pdb_start = "assets/cookbook/pdbs/7oun_modified_fixed.pdb"

# PDB file that we will use as the cleaned output structure
pdb_out = 'assets/cookbook/cleaned_output.pdb'

#output_capture = io.StringIO()

# Use amber4pdb to clean up records for use with amber forcefield
pdb4amber_result = subprocess.run(["pdb4amber", "--nohyd", "--dry", pdb_start],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True
    )

with open(pdb_out, 'wb') as f:
    f.write(pdb4amber_result.stdout.encode("utf-8")) 

# Use reduce to add hydrogens according to ambers preferences    
#try:
#    out = subprocess.check_output(["reduce", "-build", "-nuclear", "assets/cookbook/cleaned_output.pdb"], stderr=subprocess.PIPE)
#except subprocess.CalledProcessError as e:
#    print("Error message from reduce:", e.stderr.decode())

# Use OpenMMs pdbfixer to fix some final issues that can crop up
fixed_pdb = PDBFixer(filename=pdb_out)
fixed_pdb.findMissingResidues()
fixed_pdb.findNonstandardResidues()
#fixer.replaceNonstandardResidues(
fixed_pdb.removeHeterogens(True) # comment to run with ligand
fixed_pdb.findMissingAtoms()
fixed_pdb.addMissingAtoms()
fixed_pdb.addMissingHydrogens(7.0)
PDBFile.writeFile(fixed_pdb.topology, fixed_pdb.positions, open(pdb_out, 'w'))



## View the uncomplexed protien

In [46]:
import nglview as nv
view = nv.show_structure_file("assets/cookbook/cleaned_output.pdb")
#view.add_ball_and_stick("protien") 
view

NGLWidget()

## Merge the protien and the ligand topologies

In [47]:
import mdtraj as md
import numpy as np
def merge_protein_and_ligand(protein, ligand, ligand_name):
    """
    Merge two OpenMM objects.

    Parameters
    ----------
    protein: pdbfixer.pdbfixer.PDBFixer
        Protein to merge.
    ligand: simtk.openmm.app.Modeller
        Ligand to merge.

    Returns
    -------
    complex_topology: simtk.openmm.app.topology.Topology
        The merged topology.
    complex_positions: simtk.unit.quantity.Quantity
        The merged positions.
    """

    
    # combine topologies
    md_protein_topology = md.Topology.from_openmm(protein.topology)  # using mdtraj for protein top
    md_ligand_topology = md.Topology.from_openmm(ligand.topology)  # using mdtraj for ligand top
    
    for residue in md_ligand_topology.residues:
        print("Added a ligand called", ligand_name)
        residue.name = ligand_name
        
    md_complex_topology = md_protein_topology.join(md_ligand_topology)  # add them together
    complex_topology = md_complex_topology.to_openmm()

    # combine positions
    total_atoms = len(protein.positions) + len(ligand.positions)

    # create an array for storing all atom positions as tupels containing a value and a unit
    # called OpenMM Quantities
    complex_positions = unit.Quantity(np.zeros([total_atoms, 3]), unit=unit.nanometers)
    complex_positions[: len(protein.positions)] = protein.positions  # add protein positions
    complex_positions[len(protein.positions) :] = ligand.positions  # add ligand positions

    return complex_topology, complex_positions

In [48]:
from simtk.openmm.app import Modeller

current_protein_topology = fixed_pdb.topology
current_protein_positions = fixed_pdb.positions

# Loop over each ligand in the omm_ligands list
for index, omm_ligand in enumerate(omm_ligands):
    # Merge the current protein (starts as fixed_pdb, then becomes the merged complex) with the ligand
    complex_topology, complex_positions = merge_protein_and_ligand(Modeller(current_protein_topology, current_protein_positions), omm_ligand, ligand_residue_names_split[index])
    
    # Update current_protein_topology and current_protein_positions to be the merged complex for the next iteration
    current_protein_topology = complex_topology
    current_protein_positions = complex_positions

print("Final complex topology has", complex_topology.getNumAtoms(), "atoms.")
complex_topology

Added a ligand called LIG_1
Final complex topology has 2121 atoms.


  self._value[key] = value / self.unit


<Topology; 2 chains, 118 residues, 2121 atoms, 2142 bonds>

# Viewing the prepearped protien ligand complex PDB

We can view the new protien ligand complex held in the "complex_topology/Complex_positions" by exporting to the temp.pdb

In [49]:
PDBFile.writeFile(complex_topology, complex_positions, open('assets/cookbook/temp.pdb', 'w'))
view = nv.show_structure_file("assets/cookbook/temp.pdb")
view

NGLWidget()

# Create the parameters for the ligand 

We have generated both the ligand positions and a topology combined with the protein (complex_positions and complex_topology). Our final setep is to register the ligand as part of the Forcefield using the Gaff Generator, which effectively parses and adds the physical description of the ligand to the standard protien Forcefield.

In [54]:
from sys import stdout
from openmm.app import ForceField
from openff.toolkit.topology import Molecule, Topology
from openmmforcefields.generators import GAFFTemplateGenerator
from mdtraj.reporters import XTCReporter

# Compute Gasteiger charges for each molecule in off_ligands
for off_ligand in off_ligands:
    off_ligand.generate_conformers(n_conformers=1)
    off_ligand.assign_partial_charges(partial_charge_method='gasteiger')

# The forcefieild for the protein and solvent (if used)
protein_ff="amber14-all.xml" 
solvent_ff="amber14/tip3pfb.xml"
forcefield = app.ForceField(protein_ff, solvent_ff)

# Generate and add the forcefeild terms to the "forcefeild" function holdiding the protein and solver terms
#gaff = GAFFTemplateGenerator(molecules=Molecule.from_rdkit(rdkit_ligand, allow_undefined_stereo=True))

# Register GAFF parameters only once
# Loop through each off_ligand in off_ligands to register GAFF parameters
gaff = GAFFTemplateGenerator(molecules=off_ligands)  # Initialize GAFF generator
forcefield.registerTemplateGenerator(gaff.generator)  # Register the GAFF parameters

# The modeller collects together the molecular data (positions and toplology) ready for combining with the forcefeild
modeller = app.Modeller(complex_topology, complex_positions)

with open('assets/cookbook/output_of_modeller_topology.pdb', 'w') as outfile:
    app.PDBFile.writeFile(modeller.topology, modeller.positions, outfile)

# Running the simulation

We have now created a unified set of coordinates and a unified topology connecting all those coordinates and a forcefield with all the necessary terms for describing the physics between the atoms and molecules. We will now set the physical conditions for the simulation, such as the temperature and timestep for the simulation to run.

The simulation below will take a little while to set up and then the number of steps will progress quite rapidly, depending on the number of atoms in the system and how often we checkpoint the simulation.

We have selected the following parameters below for the simulation.

* Integrator : Langevin
* Nonbonded cutoff distance : 1 Nanometer
* friction coefficient : 0.1 picosecond
* Temperature : 300 kelvin
* Timestep : 0.004 picoseconds
* Number of timesteps : 2000 steps
* Total simulation time : 8 picoseconds
* Number of stpes between checkpoints : 100 steps
* Real world time between steps : 0.4 picoseconds

In [55]:
# Uncomment the blow line to use GPU accelleration
#platform = Platform.getPlatformByName('CUDA')

# setting of the chemical system
system = forcefield.createSystem(modeller.topology, nonbondedMethod=NoCutoff,
        nonbondedCutoff=1*nanometer, constraints=HBonds)

# settings for how bit the timestep should be
integrator = LangevinMiddleIntegrator(300*kelvin, 1/picosecond, 0.004*picoseconds)

# Collect everything together to make a simulation instance
simulation = Simulation(modeller.topology, system, integrator)

# Set starting positions
simulation.context.setPositions(modeller.positions)
simulation.minimizeEnergy()

# File location to save output and how often to save
simulation.reporters.append(DCDReporter('assets/cookbook/first_output.dcd', 25))
simulation.reporters.append(XTCReporter('assets/cookbook/first_output.xtc', 25))

# Report the physical properties
simulation.reporters.append(StateDataReporter(stdout, 100, step=True,
        potentialEnergy=True, temperature=True))

# Number of steps to run
simulation.step(4000)

#"Step","Potential Energy (kJ/mole)","Temperature (K)"
100,-11503.629224416414,110.23563386854046
200,-10194.424665003642,177.25489892600945
300,-9450.232872855384,212.51709123815561
400,-8955.361830434482,252.83231915434607
500,-8754.508316857839,267.15355221248933
600,-8617.922211241026,270.1629614123691
700,-8538.526401569698,285.81597414732346
800,-8282.699879911603,288.93392018173853
900,-8382.344817032774,291.6027942814281
1000,-8319.239079161061,291.9893894351254
1100,-8238.031438262664,294.0996307505138
1200,-8141.607768232732,289.3538071479753
1300,-8309.483689207857,306.62626110374816
1400,-8137.055276730492,308.17044763386696
1500,-8256.197849561144,321.52995669726033
1600,-8220.79046433892,312.75748387507366
1700,-8067.365986095616,310.80884331422334
1800,-8345.523463502584,310.3918147337992
1900,-8203.700338816263,304.57355911904676
2000,-8373.408142709164,311.5970073581938
2100,-8327.93480569335,312.0667968925153
2200,-8133.491638411269,299.7487574765655
2300,-8197.891955

# Analysing the output

Notice that every 100 steps that we get a report on the state of the system". It takes around 1000 steps to reach our target temperature.

In [56]:
import mdtraj as md
import nglview as nv

import MDAnalysis as mda


traj = nv.SimpletrajTrajectory("assets/cookbook/first_output.dcd", "assets/cookbook/temp.pdb")
print(f"Trajectory has {traj.n_frames} frames")
viewtraj = nv.show_simpletraj(traj)
viewtraj.add_unitcell()
viewtraj 

Trajectory has 160 frames


NGLWidget(max_frame=159)

# Exporting the new trajectory ready for visualisation in Nanome

In [20]:
from MDAnalysis import  Writer
import MDAnalysis as mda
from simtk.openmm.app import Modeller, ForceField, PDBFile

u = mda.Universe(modeller.topology, "assets/cookbook/first_output.dcd")
protein = u.select_atoms("protein")
with Writer("assets/cookbook/first_output.xtc", u.trajectory) as W:
    for ts in u.trajectory:
        W.write(protein)

