How to set up host-guest simulations with the SMIRNOFF99Frosst force field in AMBER
This repository contains helper functions to read AMBER-compatible simulation parameter (e.g.,
prmtop) and coordinate (e.g.,
inpcrd) files, re-parameterize those files with the SMIRNOFF99Frosst force field, and then write out parameter and coordinate files that can be used to run simulations in AMBER. Examples are given in the notebooks.
Changing the parameters for an already prepared host-guest complex requires a few extra steps compared to swapping the parameters for an isolated small molecule. Rather than using a SMILES string or other chemical identifier to start, the method here attempts to preserve all position and topology information (e.g., presence or absence of dummy atoms, solvation and ionization status, box vectors, and so on). There have been a few sticky spots (detailed below) that have led to some creative -- and some clunky -- workarounds; the strategy outlined below is surely not the most direct method, but it has been the most robust in my testing.
Compared with the
smirnoff_host_guest.ipynb example notebook, here we start with AMBER files, don't charge the molecules, don't generate 3D conformers, don't perform docking, don't solvate the structure, and don't perform any minimization.
The general method is as follows, split into three sections.
Prepare the host and guest
- Read in existing files* and create a standards-compliant
- Split the
pdbinto separate topologies, extracting the host and guest molecules separately.
- Use the host and guest topologies to generate
mol2with SYBYL atom types, polymerizing a cyclic host from a monomer, if necessary.
- Read the host and guest
mol2files, ensure unique atom names, and convert to OpenEye
- Use the list of
OEMols to create an OpenMM system with the SMIRNOFF99Frosst force field. Since we are not running any simulations here, I don't believe the value of simulation-related options (e.g.,
nonbondedCutoff=1.1 * unit.nanometer) matter, but they are required.
- Convert the OpenMM system into a ParmEd structure.
Prepare the water, ions, and dummy atoms (if present)
- Extract the water, ions, and dummy atoms (if present) from the standards-compliant
pdbfile created after reading in the existing files.
- Parameterize the water, ions, and dummy atoms (if present) with existing force fields (e.g., TIP3P water, Joung-Cheatham monovalent ions, and so on).
- Convert the water, ions, and dummy atoms (if present) to a ParmEd structure.
Combine the host and guest with the water, ions, and dummy atoms (if present)
- Add the two ParmEd structures.
- Copy box vector information from the existing coordinates to the new coordinates.
- If necessary (because the atom indices may change between the input and output coordinates)...
- Determine the mapping between atoms in the existing and new coordinate set.
- Determine the mapipng between residues in the existing and new coordinate set.
- Rewrite any restraints that have been specified using atom index or residue number masks.
* The starting point can be multiple files or one file for the whole system. In the first example notebook, the input is one pair of files for the whole system. However, in the second example notebook, the host and guest molecules each have parameters and coordinates. To set the system coordinates, a fully solvated
pdb file is used, so this example uses five files as input.
Testing and caveats
The included notebooks run through this workflow with two examples. First, a host-guest pair from David Mobley's
benchmarksets repository (CB7-memantine). Second (more challenging), input files for an existing attach-pull-release workflow, with a multi-residue host, dummy atoms, and restraints that have been to be re-encoded with new atom ordering (αCD-1-butylamine).
A few notes on things that didn't work in my testing. Many of these things might be able to work if applied in a different context or even in a different order -- and I don't want to claim they are broken -- only that these paths led to errors one way or another, in my hands. Some of the issues may be due to my unfamiliarity with the tools, but by listing them here, someone else might avoid a few pitfalls.
- Read a
mol2file with GAFF atom types into an OpenEye
OEIFlavor_MOL2_Forcefield. This can be a big deal. Ignoring it can lead to oxygen being interpreted as osmium silently, leading to incorrect parameter assignment. When wildcard assignments are eliminated, this will probably be more obvious.
- Go straight from a
prmtopto an OpenEye
OEMolvia an OpenMM topology. This requires inferring bond orders (mentioned here), which has not always worked reliably.
- Start from a
mol2file containing host (or guest) together with water and ions. There can be unexpected results when the "Forcefield" flavor is required to parse the host (or guest). But more importantly, there is an explosion of file size when writing out a subsequent
prmtopfile, mentioned here.
- Try to run
createSystemwhen all atoms are not unique. It's best to always run
OETriposAtomNamesto generate unique atom names. See this issue for more information; a multi-model
mol2seems to be troublesome, since by definition, there are non-unique atom names.
- Write a
mol2using ParmEd after using
oemmutilsto convert an
OEMolinto an OpenMM topology. The
mol2file won't have positions. It seems the positions are lost after running through
oemmutils. (I have not tried to reproduce this recently.)
- Build a cyclic molecule (e.g., cyclodextrin) from SMILES and expect reasonable coordinates from OpenEye tools. I can't recall whether this is a problem building an initial structure from SMILES or even after running minimization. Other tools likely also fail to build cyclodextrin reasonably from SMILES.
- Use OpenEye to generate charges (e.g., AM1-BCC) for an entire host molecule (e.g., cyclodextrin). It will take a long time, the conformations generated are nonphysical and this may impact the quality of the charges;
antechamberis faster and more reliable (although I don't know if it is better).
- Read a
CONECTrecords into an
OEMol. This is going to cause problems for determining the topology, although it is possible this could be circumvented using some combination of
OEPerceiveConnectivityand related tools.
- Read a
H2in water. This occurs when using
cpptrajand TIP3P water. The improper bond will either (a) prevent creation of the OpenMM system, or (b) prevent ParmEd from saving a
prmtop. I'm not sure what determines where the error will pop up. Relatedly, this will also prevent tools like
mdtrajfrom recognizing the water molecules.
- If trying to run
createSystemwill a single molecule, make sure to pass a list of
OEMols (i.e., a list of length 1).
- Read a
pdbwith water residue named
WAT. This should be handled gracefully by
PDBFixer, but I've encountered problems with solvent not being recognized as such.
- Use the option
createSystem(). This is related to this issue and can be avoided by doing the "mixed force field" approach of parameterizing the water through existing methods in AMBER, and then combining with the host and guest structure. Also, see this and this discussion and future developments with the new
- Use OpenEye
oenb.draw_mol()to draw a cyclic host. The molecule may look collapsed and distorted, but be sure to check with an external tool, because the coordinates may actually be okay!
- Rarely, saving a structure with ParmEd at the final step results in
AttributeError: 'NoneType' object has no attribute 'used', this can be avoided by looping through all the bonds, but this is just a stop-gap measure, and requires further investigation. I think reading
PDBFixeris more likely to lead to this outcome.
- With large molecules over 100 atoms, numeric atom types written by ParmEd will be truncated by AmberTools. One way to work around this is to rewrite atom types with random characters of length (e.g.,
a9). There can still be trouble with hardcoded recognition of inferring elements from atom types (see Niel's
aimtoolsrepository for more information).
It is also worth noting that upcoming changes to
Topology may address some of these issues.
build/: Contains a
condaenvironment file used for the notebooks
tests/: Directory for test case input and output
01-convert-benchmarkset.ipynb: Example notebook #1 (see above)
02-convert-APR-files.ipynb: Example notebook #2 (see above)
utils.py: Helper functions used in the notebooks, with a few extras
Running the notebook
01-convert-benchmarkset.ipynb will download the input files from GitHub, switch parameters, and write the files
test/cb7-1/. Running the notebook
02-convert-APR-files.ipynb will read files from
test/a-bam-p/original/, switch the parameters, and write the files
For new systems, there should be just a few places where configuration might be required:
- When using
antechamberto write a
mol2with SYBYL atom types,
acdoctormay have to be disabled for carboxylates or other resonance structures. This is an option to
- When extracting the water and ions, the option
dummy_atoms=Truemay be misleading. I've added explanatory text in the notebooks.
- Determining whether atom or residue mapping is necessary. (This process is slow, because it runs on the fully solvated system. We can't run atom mapping earlier because the atom mapping changes after combining the two ParmEd structures.) This is shown in the second example notebook.
I used a custom
conda environment to test the workflow and fix the version of
openforcefield. The environment can be installed by running
conda env create -f build/environment.yaml.
jupyter notebooks and choose the
smirnovert environment for the kernel.
These tools rely on AmberTools (
cpptraj) to do some intermediate conversions and it is assumed the environmental variable
$AMBERHOME is defined. This can be changed in
utils.py. (Edit: I think this can be avoided by installing the tools in a miniconda environment and using those, without having to source any file in
$AMBERHOME). I also have some boilerplate for the intermediate file conversions in
utils.py, to minimize configuration fussing, that can be changed (e.g., water model).
The scripts also heavily leverage OpenEye tools. If you see
ImportError: No module named '_oechem', we've had luck fixing this
conda upgrade libgcc.