# Protein Ligand Complex

You can run this notebook in your browser: 

[![Open On Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openmm/openmm_workshops/blob/main/section_1/protein_ligand_complex.ipynb)


In this notebook, we will demonstrate two ways of setting up a protein-ligand complex simulation in OpenMM.

- **Method 1**

  Using the OpenMM package [openmmforcefields](https://github.com/openmm/openmmforcefields) and an external package called Open Force Field toolkit ([openff-toolkit](https://github.com/openforcefield/openff-toolkit)).

  This covers the following steps:
    - Loading in the ligand with `openff-toolkit`.
    - Parameterising the ligand force-field with `openmmforcefields`.
    - Combining the topologies.
    - Solvating and simulating.

  *Note: this notebook is based on the [openff-toolkit example](https://github.com/openforcefield/openff-toolkit/blob/stable/examples/toolkit_showcase/toolkit_showcase.ipynb). We would like to give credit to the Open Force Field authors.*

- **Method 2**

  Using a third party tool ([BioSimSpace](https://biosimspace.openbiosim.org/)) to produce OpenMM-compatible input files.

  This covers the following steps:
    - Using a BSS workflow to take the protein+ligand files and produce Amber-format input files.
    - Using Amber input files with OpenMM.


## Table of Contents
- Method 1: OpenFF-toolkit
  - Extra conda packages
  - System
  - Load the molecules
  - Create the force field
  - Combine and solvate
  - Simulate
- Method 2: BioSimSpace
  - Extra conda packages
  - BSS Workflow
  - Run with OpenMM
- Extra exercises
- Solutions

## Method 1: OpenFF-toolkit 
<a id="method1"></a>

### Extra Packages
<a id="packages1"></a>

We will need to install two additional Python packages:

 - `openmmforcefields`
   - github: https://github.com/openmm/openmmforcefields
   - conda-forge: https://anaconda.org/conda-forge/openmmforcefields
 - `openff-toolkit`
   - github: https://github.com/openforcefield/openff-toolkit
   - conda-forge: https://anaconda.org/conda-forge/openff-toolkit

Both of these will be installed if you install `openmmforcefields` from conda-forge.

Note that on Apple silicon you may need to create an x86 conda environment; see the [OpenFF Toolkit FAQ](https://github.com/openforcefield/openff-toolkit/blob/main/FAQ.md#im-having-troubles-installing-the-openff-toolkit-on-my-apple-silicon-mac).

If you run into problems, we recommend creating a fresh conda environment and installing `openmmforcefields` first. It is easier for conda to resolve dependencies in a fresh environment.

In [None]:
# Execute this cell to install mamba in the Colab environment

if 'google.colab' in str(get_ipython()):
    print('Running on colab')
    !pip install -q condacolab
    import condacolab
    condacolab.install_mambaforge()
else:
    print('Not running on colab.')
    print('Make sure you create and activate a new conda environment!')

<div class="alert alert-block alert-info">
  ⚠️ <b>Note: During this step in Colab, the kernel will restart, which may trigger the error message: "Your session crashed for an unknown reason." This is expected behavior and can be safely ignored.</b>
</div>


<div class="alert alert-block alert-info">
⚠️ <b>Note that the installation will take several minutes!</b>
</div>

In [None]:
!mamba install -y -c conda-forge openmmforcefields
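
A quick way to confirm that the environment is complete is to ask Python whether each package can be found. The following is a minimal sketch using only the standard library. The helper name `is_installed` is our own, and the module names are the ones imported later in this notebook.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate `module_name` in this environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is missing.
        return False

for name in ("openmm", "openmmforcefields", "openff.toolkit"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If any line reports `MISSING`, rerun the installation cell above or start again from a fresh environment.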

### Imports

We need to be careful with the imports here because OpenMM and OpenFF have some objects with the same names. For this reason, we avoid wildcard imports and instead use explicit, namespaced imports, which is also the recommended Python practice.

In [None]:
from sys import stdout

# OpenMM imports
import openmm.app as app
import openmm as mm
import openmm.unit as unit
from openmmforcefields.generators import SMIRNOFFTemplateGenerator

# OpenFF-toolkit imports
from openff.toolkit import Molecule
from openff.toolkit import Topology as offTopology
from openff.units.openmm import to_openmm as offquantity_to_openmm

### System
<a id="system"></a>

Our example system consists of a complex of a benzene ligand and a lysozyme protein. Lysozyme is an antimicrobial protein that has been extensively studied using MD simulations. We can download the files from the GitHub repository, and we will also download a second ligand (o-xylene) to use in a later exercise.

In [None]:
# Get the files
!wget https://raw.githubusercontent.com/openmm/openmm_workshops/main/section_1/benzene.sdf
!wget https://raw.githubusercontent.com/openmm/openmm_workshops/main/section_1/o-xylene.sdf
!wget https://raw.githubusercontent.com/openmm/openmm_workshops/main/section_1/lysozyme.pdb
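
If `wget` is not available on your machine, the same files can be fetched from Python. This is a minimal sketch using the standard library. The helper names `file_url` and `fetch` are our own, and the URLs are the ones used in the cell above. The download itself is left as a function call so that nothing is fetched until you ask for it.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://raw.githubusercontent.com/openmm/openmm_workshops/main/section_1"
FILES = ["benzene.sdf", "o-xylene.sdf", "lysozyme.pdb"]

def file_url(name):
    """Build the raw GitHub URL for one workshop file."""
    return f"{BASE_URL}/{name}"

def fetch(name, directory="."):
    """Download one file unless a local copy already exists; return its path."""
    target = Path(directory) / name
    if not target.exists():
        urlretrieve(file_url(name), str(target))
    return target

# Show what would be downloaded; call fetch(name) to actually download.
for name in FILES:
    print(file_url(name))
```

Calling `fetch("benzene.sdf")` downloads the ligand file into the current directory and skips the download if the file is already there.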

The benzene-lysozyme complex is shown in the figure below.

![benzene-lysozyme](./images/benzene_lysozyme.png)
**Figure 1:** Benzene-lysozyme complex.

Note that the files we are using have already been cleaned up (see [PDBFixer](https://github.com/openmm/pdbfixer) for more info). Additionally, the ligand is aligned with the protein and in an appropriate binding site. This is something you would need to do with a docking program before using OpenMM.


### Load in the Molecules
<a id="load"></a>

The protein structure is given as a PDB file, so we can load it as before. The benzene molecule is in SDF format, for which OpenMM does not have a loader. Therefore, we will use OpenFF-toolkit to load the ligand.


In [None]:
protein_path = "lysozyme.pdb"
ligand_path = "benzene.sdf"

# Load a molecule from a SDF file
ligand = Molecule.from_file(ligand_path)

# Load the protein from a PDB file
protein_pdb = app.PDBFile(protein_path)

### Creating the ForceField Object

<a id="createff"></a>

We now need to define the force field to use. For the protein, we can use the standard force fields already available in OpenMM. However, for the benzene molecule, we will need to generate a force field template for it.

We can do this by using the residue template generators for small molecules already available from the [openmmforcefields](https://github.com/openmm/openmmforcefields) package. We have the option to choose between the [Amber GAFF small molecule force field](http://ambermd.org/antechamber/gaff.html) or the [Open Force Field Initiative force fields](https://github.com/openforcefield/openff-forcefields).

For this example, we will use [OpenFF SMIRNOFF](https://docs.openforcefield.org/projects/toolkit/en/stable/users/smirnoff.html).


In [None]:
# Create the SMIRNOFF template generator with the default force field
smirnoff = SMIRNOFFTemplateGenerator(molecules=ligand)

# We can check which version of the force field is being used
print(smirnoff.smirnoff_filename)

# Create an OpenMM ForceField object with AMBER ff14SB and TIP3P
ff = app.ForceField('amber/protein.ff14SB.xml', 'amber/tip3p_standard.xml')

# Add in the SMIRNOFF template generator
ff.registerTemplateGenerator(smirnoff.generator)

### Combine Topologies and Solvate the Complex
<a id="combine"></a>

We can convert from the OpenFF format topology to an OpenMM format topology and then use the OpenMM `Modeller` to combine the ligand and protein into a single topology. Once combined we can solvate as before.

In [None]:
# Make an OpenMM Modeller object with the protein
modeller = app.Modeller(protein_pdb.topology, protein_pdb.positions)

# Make an OpenFF Topology of the ligand
ligand_off_topology = offTopology.from_molecules(molecules=[ligand])

# Convert it to an OpenMM Topology
ligand_omm_topology = ligand_off_topology.to_openmm()

# Get the positions of the ligand
ligand_positions = offquantity_to_openmm(ligand.conformers[0])

# Add the ligand to the Modeller
modeller.add(ligand_omm_topology, ligand_positions)

# Solvate
modeller.addSolvent(ff, padding=1.0*unit.nanometer, ionicStrength=0.15*unit.molar)
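
To get a feel for what an ionic strength of 0.15 M means at this scale, we can estimate how many ion pairs fit in a box of a given size. The sketch below is a back-of-the-envelope calculation, not the procedure `Modeller` uses internally, and the 7 nm cubic box edge is a hypothetical value chosen for illustration.

```python
AVOGADRO = 6.02214076e23  # particles per mole
LITRES_PER_NM3 = 1e-24    # 1 nm^3 expressed in litres

def estimate_ion_pairs(concentration_molar, box_edge_nm):
    """Estimate the number of ion pairs in a cubic box at a given molarity."""
    volume_litres = box_edge_nm ** 3 * LITRES_PER_NM3
    return round(concentration_molar * AVOGADRO * volume_litres)

# Hypothetical 7 nm cubic box at the ionic strength used above.
print(estimate_ion_pairs(0.15, 7.0))
```

The estimate is only a few dozen ion pairs, a reminder that salt concentration in a box of this size is represented by a small, discrete number of ions.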


### Simulate
<a id="simulate1"></a>

We can now simulate in the NVT ensemble as before.

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 1</b>

Set the initial positions of the simulation.
</div>

In [None]:
# Create the system, define the integrator, and create the simulation
system = ff.createSystem(modeller.topology, nonbondedMethod=app.PME, constraints=app.HBonds)
integrator = mm.LangevinMiddleIntegrator(300*unit.kelvin, 1/unit.picosecond, 0.002*unit.picoseconds)
simulation = app.Simulation(modeller.topology, system, integrator)

# set the positions
FIXME

print("Minimizing energy...")
simulation.minimizeEnergy(maxIterations=100)

simulation.context.setVelocitiesToTemperature(300*unit.kelvin)

simulation.reporters.append(app.PDBReporter('traj.pdb', 100))

simulation.reporters.append(app.StateDataReporter(stdout, 100, step=True,
        potentialEnergy=True, temperature=True, speed=True))

print("Running simulation...")
simulation.step(1000)
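
The settings in the cell above fix how much time is simulated and how many frames end up in `traj.pdb`. The short sketch below redoes that arithmetic in plain Python with the values copied from the cell. The helper `steps_for` is our own addition for planning longer runs.

```python
n_steps = 1000          # argument passed to simulation.step
timestep_ps = 0.002     # integrator time step in picoseconds
report_interval = 100   # reporter interval in steps

simulated_time_ps = n_steps * timestep_ps
n_frames = n_steps // report_interval

def steps_for(time_ps, timestep_ps):
    """Number of integration steps needed to cover `time_ps` picoseconds."""
    return round(time_ps / timestep_ps)

print(f"Simulated time: {simulated_time_ps:.1f} ps")
print(f"Frames written: {n_frames}")
print(f"Steps for 100 ps: {steps_for(100.0, timestep_ps)}")
```

This run is deliberately short so that it finishes quickly in a workshop setting.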

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 2</b>

Download the `traj.pdb` file and visualize it. You will need to click on the `files` icon on the left side bar of the Colab window:

![screenshot](images/screenshot1.png)
</div>

When you open `traj.pdb` in VMD, it should look similar to this:

![screenshot_traj](images/screenshot3.png)


The protein appears to stick out of the side of the water box. This is only a visualization artifact of the periodic boundary conditions. By default, OpenMM wraps coordinates into the periodic box one molecule at a time, keeping each molecule whole, so part of a large molecule can extend beyond the box edge. For more information, read the OpenMM FAQ on this topic: https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#periodic.
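
The effect can be reproduced in one dimension. The sketch below is an illustration of the idea rather than OpenMM's actual implementation: it shifts a whole "molecule" by a multiple of the box length so that its centre lies inside the box, which leaves some atoms outside. The coordinates and the 5 nm box length are hypothetical.

```python
import math

def wrap_molecule_1d(coords, box_length):
    """Shift a whole molecule so that its centre lies in [0, box_length)."""
    centre = sum(coords) / len(coords)
    shift = -math.floor(centre / box_length) * box_length
    return [x + shift for x in coords]

# Hypothetical 1D molecule whose centre has drifted out of a 5 nm box.
molecule = [4.6, 5.1, 5.6, 6.1]
wrapped = wrap_molecule_1d(molecule, box_length=5.0)
print([round(x, 3) for x in wrapped])
```

After wrapping, the centre is back inside the box and the spacing between atoms is unchanged, but the first atom sits at a negative coordinate. This is the same "sticking out" seen for the protein.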

## Method 2: BioSimSpace
<a id="method2"></a>

In this method, we will use a [BioSimSpace](https://biosimspace.openbiosim.org/) (BSS) workflow to produce [Amber](https://ambermd.org/)-format input files that we can read into OpenMM.

### Extra Packages
<a id="packages2"></a>

We need to install [BioSimSpace from the OpenBioSim channel](https://anaconda.org/openbiosim/biosimspace).

The command below searches the `conda-forge` and `openbiosim` channels for `biosimspace` and also installs `gromacs` and `ambertools`. These two packages are optional dependencies of BSS, but this workflow requires them.

In [None]:
if 'google.colab' in str(get_ipython()):
    # https://github.com/googlecolab/colabtools/issues/3409
    import locale
    locale.getpreferredencoding = lambda: "UTF-8"

!mamba install -y -c conda-forge -c openbiosim biosimspace  gromacs ambertools

### Get the BSS Workflow
<a id="bssworkflow"></a>

We will use a BSS workflow (also called a node) that takes the ligand SDF file and the protein PDB file and combines them into a protein-ligand complex solvated in a water box. A BSS node is a Python script that can be run as a command-line program. We will use the script in this workshop repository, which is based on an [example script from BioSimSpace](https://github.com/michellab/BioSimSpace/blob/6a36648e1f2e95ee6de35b2e6c9ac32f201c2bc8/nodes/playground/BSSPrepNode.ipynb). For more information, see the [BioSimSpace documentation](https://biosimspace.openbiosim.org/).

In [None]:
# get the BSS workflow
!wget https://raw.githubusercontent.com/openmm/openmm_workshops/main/section_1/BSSPrepNode.py

### Run the workflow

If you run the script without any command line arguments, it will print out help info.

In [None]:
!python BSSPrepNode.py

We can then run it specifying the ligand and protein files with command line arguments. 

If you get the error *"MissingSoftwareError: 'BioSimSpace.Parameters.gaff2' is not supported. Please install AmberTools (http://ambermd.org)."* even though you have already installed AmberTools with conda, you will need to set the environment variable `AMBERHOME` to the install location. The first cell below sets it correctly for Colab. On your own device the location will be different.

In [None]:
if 'google.colab' in str(get_ipython()):
  import os
  os.environ["AMBERHOME"]="/usr/local/"
else:
  print('You might need to set AMBERHOME env variable')
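
On your own machine, one way to find a sensible value is to look for an AmberTools executable on your `PATH`. The sketch below assumes that the executables live in `$AMBERHOME/bin`, as they do in the Colab setup above, and uses `antechamber` as the program to look for. The helper name `guess_amberhome` is our own.

```python
import os
import shutil
from pathlib import Path

def guess_amberhome(executable="antechamber"):
    """Return the directory above the `bin` folder holding `executable`, or None."""
    exe = shutil.which(executable)
    if exe is None:
        return None
    # Assumption: the executable lives in <AMBERHOME>/bin/<executable>.
    return Path(exe).parent.parent

amberhome = guess_amberhome()
if amberhome is not None:
    # Keep any value the user has already set.
    os.environ.setdefault("AMBERHOME", str(amberhome))
    print("AMBERHOME =", os.environ["AMBERHOME"])
else:
    print("antechamber not found on PATH; set AMBERHOME manually")
```

Using `setdefault` means an `AMBERHOME` value you have already exported is left untouched.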

In [None]:
!python BSSPrepNode.py --ligand benzene.sdf --protein lysozyme.pdb

It will produce the files:
 - `bound.prm7` - the Amber topology file for the protein-ligand complex.
 - `bound.rst7` - the Amber coordinate file for the protein-ligand complex.

It also produces `free.prm7` and `free.rst7`, which are the input files for just the ligand solvated in a water box. These would be used in thermodynamic cycle calculations to compute binding free energies.
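
Outside a notebook the `!` shell syntax is not available, so the same step can be driven from a plain Python script. The following is a minimal sketch under that assumption: the helper names are our own, only the `--ligand` and `--protein` arguments shown above are used, and the node is run only if one of the four output files is missing.

```python
import subprocess
import sys
from pathlib import Path

EXPECTED_OUTPUTS = ["bound.prm7", "bound.rst7", "free.prm7", "free.rst7"]

def prep_command(ligand, protein, script="BSSPrepNode.py"):
    """Build the command line used to run the BSS node."""
    return [sys.executable, script, "--ligand", ligand, "--protein", protein]

def missing_outputs(directory="."):
    """List the expected output files that are not present in `directory`."""
    return [n for n in EXPECTED_OUTPUTS if not (Path(directory) / n).is_file()]

def run_prep(ligand, protein, directory="."):
    """Run the node if any output is missing; return what is still missing."""
    if missing_outputs(directory):
        subprocess.run(prep_command(ligand, protein), cwd=directory, check=True)
    return missing_outputs(directory)

print(" ".join(prep_command("benzene.sdf", "lysozyme.pdb")))
print("Missing outputs:", missing_outputs())
```

Calling `run_prep("benzene.sdf", "lysozyme.pdb")` then returns an empty list once all four files have been written.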

### Run with OpenMM
<a id="run"></a>

OpenMM can load [Amber format files](http://docs.openmm.org/latest/userguide/application/02_running_sims.html#using-amber-files). `AmberPrmtopFile` loads the topology file and `AmberInpcrdFile` loads the coordinates. The rest of the setup is very similar to using PDB files. Note that we do not need to define a force field because it is included in the Amber topology file, so we simply call `prmtop.createSystem` instead of `ff.createSystem`.

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 3</b>

Specify the OpenMM integrator.
</div>

In [None]:
import openmm.app as app
import openmm as mm
import openmm.unit as unit
from sys import stdout

prmtop = app.AmberPrmtopFile('bound.prm7')
inpcrd = app.AmberInpcrdFile('bound.rst7')
system = prmtop.createSystem(nonbondedMethod=app.PME, nonbondedCutoff=1*unit.nanometer,
        constraints=app.HBonds)

# Specify the integrator
integrator = FIXME

simulation = app.Simulation(prmtop.topology, system, integrator)
simulation.context.setPositions(inpcrd.positions)
simulation.context.setVelocitiesToTemperature(300*unit.kelvin)

print("Minimizing energy...")
simulation.minimizeEnergy(maxIterations=100)

simulation.reporters.append(app.PDBReporter('bss_traj.pdb', 100))
simulation.reporters.append(app.StateDataReporter(stdout, 100, step=True,
        potentialEnergy=True, temperature=True))

print("Running simulation...")
simulation.step(1000)

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 4</b>

Download `bss_traj.pdb` and visualize it in VMD or similar software. How does it look different from the `traj.pdb` generated in the previous method?
</div>

## Extra Exercises
<a id="extraex"></a>

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 5</b>

Modify both methods to run in the NPT ensemble.
</div>

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 6</b>

Run both setups again using the o-xylene ligand.
</div>

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 7</b>

Use the command line arguments of `BSSPrepNode.py` to specify a different force field.
</div>

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 8</b>

Take a look at the `BSSPrepNode.py` file in a text editor. Can you modify it to only output the bound state?
</div>

<div class="alert alert-block alert-info">
ℹ️ <b>Exercise 9</b>

Create a modified version of Method 1 to run a simulation of just the ligand in a water box.
</div>

## Solutions
<a id="solutions"></a>

*Exercise 1*. Set the initial positions:
```python
simulation.context.setPositions(modeller.positions)
```

*Exercise 3*. Specify the integrator:
```python
integrator = mm.LangevinMiddleIntegrator(300*unit.kelvin, 1/unit.picosecond, 0.002*unit.picoseconds)
```

*Exercise 4*. In `bss_traj.pdb`, the protein is centered within the water box, whereas in `traj.pdb`, it appears off to the side. This difference is a result of the visualization effects of periodic boundaries. Additionally, the water boxes have different dimensions due to variations in the setup steps.