# Convert a host-guest pair from `https://github.com/MobleyLab/benchmarksets/`

**CB7-memantine**

The initial bonded and Lennard-Jones parameters are GAFF v1.7, the partial charges were generated with RESP, and the starting conformation was obtained by docking the host and guest with MOE. For more information, see [here](https://github.com/MobleyLab/benchmarksets/tree/d9bd05719fe42a390442d3984eccec591ec32950/input_files).

The conversion strategy is outlined in the [README](README.md). Briefly, we will download the files from the `benchmarksets` repository, create a PDB, extract the host and guest as topologies, and re-parameterize those molecules with SMIRNOFF99Frosst bond, angle, torsion, and Lennard-Jones parameters. We then parameterize the water and ions with TIP3P and Joung-Cheatham parameters, merge the structures, and write out a new set of combined parameters and coordinates.

In [8]:
%load_ext autoreload
%autoreload 2

import os
import urllib.request

import parmed as pmd

from openforcefield.typing.engines.smirnoff import ForceField, unit
from openforcefield.utils import mergeStructure

from smirnovert.utils import (create_pdb_with_conect, prune_conect, split_topology,
                              create_host_guest_topology, create_host_mol2,
                              convert_mol2_to_sybyl_antechamber, load_mol2,
                              check_unique_atom_names, check_bond_lengths,
                              extract_water_and_ions, create_water_and_ions_parameters)



Before we begin, let's specify the starting file names and the prefix for the intermediate files. We are also going to write a number of temporary files that could be cleaned up later; I leave them in place for debugging. The `utils.py` functions use the `logging` module, so we can control how much information is printed. Here I'll set the logging level to `INFO`.

In [9]:
import logging
from importlib import reload
# `logging` needs to be reloaded, because `jupyter notebook` itself 
# uses the logging module to print messages to standard output...
reload(logging)
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.basicConfig(
    format='%(asctime)s %(message)s', datefmt='%Y-%m-%d %I:%M:%S %p')
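The intermediate files mentioned above can be removed once the final parameters have been checked. The helper below is a hypothetical sketch (it is not part of `smirnovert.utils`) that deletes every regular file in a directory except those named in an explicit keep-list; subdirectories are left alone.

```python
import os


def clean_intermediate_files(path, keep):
    """Delete regular files in `path` whose names are not in `keep` (hypothetical helper).

    Returns the names of the removed files in sorted order.
    """
    removed = []
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        # Only plain files are candidates; directories are never touched.
        if os.path.isfile(full_path) and name not in keep:
            os.remove(full_path)
            removed.append(name)
    return removed
```

With the names used in this notebook, a call might look like `clean_intermediate_files(destination, keep={reference_prmtop, reference_inpcrd, 'smirnoff.prmtop', 'smirnoff.inpcrd'})`. An explicit keep-list is safer than a delete-list, because any file you forgot to list survives rather than disappears.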

In [10]:
test_case = 'cb7-1/'
destination = './tests/' + test_case
reference_prmtop = 'cb7-1.prmtop'
reference_inpcrd = 'cb7-1.rst7'


prefix = 'cb7-1'
host_resname = 'CB7'
guest_resname = 'MOL'

First, let's download a fresh host-guest example from David Mobley's `benchmarksets` GitHub repository.

In [11]:
os.makedirs(destination, exist_ok=True)

# RawGit has been shut down; raw.githubusercontent.com serves the same repository files.
base_url = "https://raw.githubusercontent.com/MobleyLab/benchmarksets/master/input_files/cb7-set1/prmtop-rst7/"
urllib.request.urlretrieve(base_url + reference_prmtop, destination + reference_prmtop)
urllib.request.urlretrieve(base_url + reference_inpcrd, destination + reference_inpcrd)

At this point, it is useful to visualize the structure with, e.g., `nglview`.

Before we start the conversion, let's grab the box dimensions from the reference files. These will come in handy later, because the box information is lost when we merge the structures below.

In [12]:
reference = pmd.load_file(destination + reference_prmtop, xyz=destination + reference_inpcrd)
box = reference.box
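ParmEd stores `Structure.box` as six numbers: the box lengths a, b, c (in Å) and the angles α, β, γ (in degrees). If explicit box vectors are ever needed, they can be recovered with the standard triclinic construction, placing vector a along x and vector b in the xy-plane. The function below is an illustrative sketch, not a ParmEd call.

```python
import math


def box_lengths_angles_to_vectors(a, b, c, alpha, beta, gamma):
    """Convert box lengths (Å) and angles (degrees) into three box vectors (Å).

    Convention: vector a lies along x and vector b lies in the xy-plane.
    """
    alpha, beta, gamma = (math.radians(angle) for angle in (alpha, beta, gamma))
    cos_a, cos_b, cos_g = math.cos(alpha), math.cos(beta), math.cos(gamma)
    sin_g = math.sin(gamma)

    vec_a = (a, 0.0, 0.0)
    vec_b = (b * cos_g, b * sin_g, 0.0)
    # Choose c's x and y components to satisfy the beta and alpha angles,
    # then fix z so that |vec_c| equals c.
    cx = c * cos_b
    cy = c * (cos_a - cos_b * cos_g) / sin_g
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return vec_a, vec_b, (cx, cy, cz)
```

For example, `box_lengths_angles_to_vectors(*reference.box)` would give the vectors for the reference box; for a rectangular box (all angles 90°) the result is simply the three lengths on the diagonal.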

Next, we'll create a proper PDB with `CONECT` records from these AMBER files; in the following step, we'll delete the solvent-solvent `CONECT` records.

In [13]:
create_pdb_with_conect(solvated_pdb=destination + reference_inpcrd,
                      amber_prmtop=destination + reference_prmtop,
                      output_pdb=destination + prefix + '.pdb')

2018-03-06 02:37:17 PM Creating ./tests/cb7-1/cb7-1.pdb with CONECT records...


We need to prune the `CONECT` records to work around [openforcefield issue #68](https://github.com/openforcefield/openforcefield/issues/68).

In [14]:
prune_conect(input_pdb=prefix + '.pdb',
            output_pdb=prefix + '.pruned.pdb',
            path=destination)

2018-03-06 02:37:17 PM Pruning water-water CONECT records...
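Conceptually, pruning keeps every `CONECT` record that involves a non-water atom and drops the records in which all atoms belong to water. A minimal stdlib sketch, assuming water residues are named `WAT` or `HOH` and the PDB uses standard fixed columns; this illustrates the idea and is not the implementation of `prune_conect`.

```python
def prune_water_conect(pdb_lines, water_resnames=("WAT", "HOH")):
    """Drop CONECT records whose atoms all belong to water residues (hypothetical helper)."""
    # Collect serial numbers (columns 7-11) of atoms whose residue name
    # (columns 18-20) marks them as water.
    water_serials = set()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() in water_resnames:
            water_serials.add(int(line[6:11]))

    pruned = []
    for line in pdb_lines:
        if line.startswith("CONECT"):
            # CONECT serials are right-justified in 5-character fields from column 7.
            fields = [line[i:i + 5] for i in range(6, len(line.rstrip("\n")), 5)]
            serials = {int(field) for field in fields if field.strip()}
            if serials and serials <= water_serials:
                continue  # water-water record: skip it
        pruned.append(line)
    return pruned
```

All other lines (atoms, `END`, and any `CONECT` touching the host or guest) pass through unchanged and in their original order.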


We'll split the pruned PDB into separate topology components and build a combined topology containing only the host and guest.

In [15]:
components = split_topology(file_name=destination + prefix + '.pruned.pdb')
hg_topology = create_host_guest_topology(components, 
                                         host_resname=host_resname, 
                                         guest_resname=guest_resname)

2018-03-06 02:37:17 PM Splitting topology into components...
2018-03-06 02:37:18 PM Creating a combined topology for the host and guest molecules...
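One common way to split a system into molecules is to treat atoms as graph nodes and bonds as edges, then take the connected components of that graph. The stdlib sketch below does this with union-find; whether `split_topology` works this way internally is an assumption, and the function operates on plain atom indices rather than topology objects.

```python
def connected_components(n_atoms, bonds):
    """Group atom indices 0..n_atoms-1 into molecules given (i, j) bond pairs."""
    parent = list(range(n_atoms))

    def find(i):
        # Walk up to the root, halving the path as we go to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union the two endpoints of every bond.
    for i, j in bonds:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    # Collect atoms by root; each group is built in increasing atom order.
    groups = {}
    for atom in range(n_atoms):
        groups.setdefault(find(atom), []).append(atom)
    # Order molecules by their lowest atom index.
    return sorted(groups.values(), key=lambda group: group[0])
```

For instance, `connected_components(6, [(0, 1), (1, 2), (3, 4)])` yields `[[0, 1, 2], [3, 4], [5]]`: a three-atom molecule, a two-atom molecule, and an isolated atom such as an ion.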


We'll also need `mol2` files **with SYBYL atom types** for the host and guest, which we'll load later when assigning SMIRNOFF parameters.

In [16]:
create_host_mol2(
    solvated_pdb=destination + prefix + '.pruned.pdb',
    amber_prmtop=destination + reference_prmtop,
    mask=host_resname,
    output_mol2=destination + host_resname + '.mol2')

# `create_host_mol2` is reused here for the guest, with the guest's
# residue name as the mask.
create_host_mol2(
    solvated_pdb=destination + prefix + '.pdb',
    amber_prmtop=destination + reference_prmtop,
    mask=guest_resname,
    output_mol2=destination + guest_resname + '.mol2')

convert_mol2_to_sybyl_antechamber(
    input_mol2=destination + host_resname + '.mol2',
    output_mol2=destination + host_resname + '-sybyl.mol2',
    ac_doctor=False)

convert_mol2_to_sybyl_antechamber(
    input_mol2=destination + guest_resname + '.mol2',
    output_mol2=destination + guest_resname + '-sybyl.mol2',
    ac_doctor=True)

2018-03-06 02:37:19 PM Writing a `mol2` for the host molecule...
2018-03-06 02:37:19 PM Writing a `mol2` for the host molecule...
2018-03-06 02:37:19 PM Converting ./tests/cb7-1/CB7.mol2 to SYBYL atom types via Antechamber...
2018-03-06 02:37:19 PM Converting ./tests/cb7-1/MOL.mol2 to SYBYL atom types via Antechamber...
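In a Tripos `mol2` file, the SYBYL atom type is the sixth whitespace-separated column of each line in the `@<TRIPOS>ATOM` section. A small stdlib reader like the one below (a hypothetical helper, not part of `smirnovert.utils`) is handy for spot-checking what Antechamber wrote.

```python
def read_sybyl_atom_types(mol2_text):
    """Return (atom_name, sybyl_type) pairs from the @<TRIPOS>ATOM section."""
    pairs = []
    in_atoms = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # A new section starts; only the ATOM section holds per-atom types.
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if in_atoms and stripped:
            # Columns: atom_id atom_name x y z atom_type [subst_id subst_name charge ...]
            fields = stripped.split()
            pairs.append((fields[1], fields[5]))
    return pairs
```

Running it on the text of `CB7-sybyl.mol2` and `MOL-sybyl.mol2` should show SYBYL types such as `C.2` or `N.am` rather than GAFF types such as `c` or `n`.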


We also need to separate out waters and ions from the starting AMBER files.

The `dummy_atoms` flag is a little confusing. In `extract_water_and_ions`, `dummy_atoms=True` means the function does not specifically try to strip dummy atoms from the reference coordinates, which is correct here because no dummy atoms are present. In `create_water_and_ions_parameters`, by contrast, `dummy_atoms=False` is needed because, with no dummy atoms present, none need parameters. The naming could be made clearer.

In [17]:
extract_water_and_ions(
    amber_prmtop=reference_prmtop,
    amber_inpcrd=reference_inpcrd,
    host_residue=':' + host_resname,
    guest_residue=':' + guest_resname,
    dummy_atoms=True,
    output_pdb='water_ions.pdb',
    path=destination)

create_water_and_ions_parameters(
    input_pdb='water_ions.pdb',
    output_prmtop='water_ions.prmtop',
    output_inpcrd='water_ions.inpcrd',
    dummy_atoms=False,
    path=destination)

2018-03-06 02:37:19 PM Extracting water and ions from cb7-1.prmtop...
2018-03-06 02:37:19 PM Creating parameters for the waters and ions...
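For orientation: TIP3P water and Joung-Cheatham ion parameters are typically assigned with AmberTools' `tleap`, where `leaprc.water.tip3p` loads TIP3P water together with Joung-Cheatham monovalent ion parameters. The sketch below only builds such a `tleap` input as a string; whether `create_water_and_ions_parameters` does exactly this internally is an assumption.

```python
def make_tleap_script(input_pdb, output_prmtop, output_inpcrd):
    """Build a tleap input that parameterizes a water/ion PDB (hypothetical helper)."""
    return "\n".join([
        # TIP3P water plus the matching Joung-Cheatham ion parameters.
        "source leaprc.water.tip3p",
        f"model = loadpdb {input_pdb}",
        f"saveamberparm model {output_prmtop} {output_inpcrd}",
        "quit",
    ]) + "\n"
```

The resulting text could be written to a file such as `tleap.in` inside `destination` and run with `tleap -f tleap.in`.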


Now, we'll load the host and guest `mol2` files (with **SYBYL atom types**) created above and check that every atom name within each molecule is unique.

In [18]:
host = load_mol2(
    filename=destination + host_resname + '-sybyl.mol2',
    name=host_resname,
    add_tripos=True)

guest = load_mol2(
    filename=destination + guest_resname + '-sybyl.mol2',
    name=guest_resname,
    add_tripos=False)

check_unique_atom_names(host)
check_unique_atom_names(guest)
molecules = [host, guest]

2018-03-06 02:37:20 PM Loading ./tests/cb7-1/CB7-sybyl.mol2...
2018-03-06 02:37:20 PM Loading ./tests/cb7-1/MOL-sybyl.mol2...
2018-03-06 02:37:20 PM Checking all atoms have unique names...
2018-03-06 02:37:20 PM Checking all atoms have unique names...
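The idea behind a uniqueness check like `check_unique_atom_names` can be expressed with `collections.Counter`. This is a sketch of the check on a plain list of names, not the library code, which works on molecule objects.

```python
from collections import Counter


def find_duplicate_atom_names(atom_names):
    """Return the atom names that occur more than once, in first-seen order."""
    counts = Counter(atom_names)
    # Counter preserves insertion order, so duplicates come back in file order.
    return [name for name in counts if counts[name] > 1]
```

An empty result means every name is unique; otherwise the returned names point directly at the atoms that need renaming.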


Finally, we can create the OpenMM system for the host and guest with SMIRNOFF99Frosst parameters.

In [19]:
ff = ForceField('forcefield/smirnoff99Frosst.ffxml')
system = ff.createSystem(
    hg_topology.topology,
    molecules,
    nonbondedCutoff=1.1 * unit.nanometer,
    ewaldErrorTolerance=1e-4)
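The `ewaldErrorTolerance` and `nonbondedCutoff` arguments together determine the Ewald separation parameter. According to the OpenMM user guide, OpenMM chooses α = sqrt(-ln(2δ)) / r_c, where δ is the error tolerance and r_c the cutoff. A quick worked check for the values used here, assuming the SMIRNOFF `ForceField` passes both arguments through to OpenMM's `NonbondedForce` unchanged:

```python
import math


def ewald_alpha(error_tolerance, cutoff_nm):
    """Ewald separation parameter (1/nm) from OpenMM's documented formula."""
    return math.sqrt(-math.log(2.0 * error_tolerance)) / cutoff_nm


alpha = ewald_alpha(1e-4, 1.1)
print(f"alpha = {alpha:.3f} nm^-1")  # prints alpha = 2.653 nm^-1
```

Tightening the tolerance increases α, which in turn requires a finer reciprocal-space grid for the same accuracy.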

We'll convert the OpenMM system to a ParmEd structure, check for unreasonably long bonds (> 4 Å), and save the host-guest parameters and coordinates.

In [20]:
hg_structure = pmd.openmm.topsystem.load_topology(
    hg_topology.topology, system, hg_topology.positions)

check_bond_lengths(hg_structure, threshold=4)

try:
    hg_structure.save(destination + 'hg.prmtop')
except OSError:
    print(
        'Check if the host-guest parameter file already exists...')

try:
    hg_structure.save(destination + 'hg.inpcrd')
except OSError:
    print(
        'Check if the host-guest coordinate file already exists...')

2018-03-06 02:37:21 PM Checking structure for bonds >4 A...


Check if the host-guest parameter file already exists...
Check if the host-guest coordinate file already exists...
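A bond stretched to several ångströms typically signals that atoms were mismatched or reordered during conversion. The sketch below shows the same idea as `check_bond_lengths` on plain coordinate tuples (in Å) and atom-index pairs; it is illustrative only and does not reproduce the library function's signature.

```python
import math


def find_long_bonds(coordinates, bonds, threshold=4.0):
    """Return (i, j, length) for every bond longer than `threshold`.

    `coordinates` is a sequence of (x, y, z) tuples; lengths use the same units.
    """
    long_bonds = []
    for i, j in bonds:
        length = math.dist(coordinates[i], coordinates[j])
        if length > threshold:
            long_bonds.append((i, j, length))
    return long_bonds
```

For example, with atoms at x = 0, 1.5, and 10 Å bonded in a chain, only the second bond (8.5 Å) is reported.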


Next, we'll load the water and ions into a ParmEd structure.

In [21]:
water_and_ions = pmd.amber.AmberParm(
    destination + 'water_ions.prmtop',
    xyz=destination + 'water_ions.inpcrd')

Now, let's merge the host-guest structure with the waters and ions (this system has no dummy atoms), and copy the box dimensions from the reference structure onto the merged structure.

In [22]:
merged = mergeStructure(hg_structure, water_and_ions)
merged.box = reference.box
try:
    merged.save(destination + 'smirnoff.prmtop')
except OSError:
    print('Check if solvated parameter file already exists...')
try:
    merged.save(destination + 'smirnoff.inpcrd')
except OSError:
    print('Check if solvated coordinate file already exists...')


Check if solvated parameter file already exists...
Check if solvated coordinate file already exists...
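At the level of bookkeeping, merging two structures amounts to concatenating their atom lists and shifting the second structure's bond indices by the number of atoms in the first. A stdlib sketch of that step on plain lists; it is not the `mergeStructure` implementation, which works on full ParmEd structures with their parameters.

```python
def merge_topologies(atoms_a, bonds_a, atoms_b, bonds_b):
    """Concatenate two atom lists and renumber the second set of (i, j) bonds."""
    offset = len(atoms_a)
    atoms = list(atoms_a) + list(atoms_b)
    # Bonds from the second structure now point past the first structure's atoms.
    bonds = list(bonds_a) + [(i + offset, j + offset) for i, j in bonds_b]
    return atoms, bonds
```

Forgetting the offset is a classic source of bonds that connect the wrong atoms, which is exactly what a long-bond check would then flag.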


Now that we have the parameters in a ParmEd structure, we can quickly compare the "reference" (here, GAFF v1.7) parameters with the SMIRNOFF99Frosst parameters.

In [23]:
def print_bond(bond):
    """Print atom indices, names, and types, plus req (Å) and k (kcal/mol/Å²)."""
    atom1, atom2 = bond.atom1, bond.atom2
    print(
        f'{atom1.idx + 1:7d} {atom1.name:4} ({atom1.type:4}) {atom2.idx + 1:7d} '
        f'{atom2.name:4} ({atom2.type:4}) {bond.type.req:10.4f} {bond.type.k:10.4f}'
    )

# Print the N2-C2 bond from each structure. Pairing bonds with zip assumes
# both structures list the host-guest atoms and bonds in the same order.
for smirnoff_bond, reference_bond in zip(merged.bonds, reference.bonds):
    if smirnoff_bond.type is None:
        continue
    for bond in (smirnoff_bond, reference_bond):
        if (bond.atom1.name, bond.atom2.name) == ('N2', 'C2'):
            print_bond(bond)

      2 N2   (2   )      30 C2   (30  )     1.3350   490.0000
      2 N2   (n   )      30 C2   (c   )     1.3789   427.6000


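These parameters are *not* identical: SMIRNOFF99Frosst assigns an equilibrium length of 1.3350 Å and a force constant of 490.0 kcal/mol/Å², while GAFF v1.7 gives 1.3789 Å and 427.6 kcal/mol/Å². Note also the difference in atom types: the SMIRNOFF structure carries numeric types (`2`, `30`) rather than GAFF's `n` and `c`, because SMIRNOFF assigns parameters through SMIRKS patterns instead of atom types.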
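To put the difference in perspective: ParmEd follows the Amber convention E = k (r - r_eq)², with k in kcal/mol/Å² and r_eq in Å (no factor of 1/2). The sketch below evaluates each force field's N2-C2 term at the other force field's equilibrium length, using the values printed above and assuming they follow that convention.

```python
def amber_bond_energy(k, r_eq, r):
    """Harmonic bond energy in the Amber convention, E = k * (r - r_eq)**2 (kcal/mol)."""
    return k * (r - r_eq) ** 2


smirnoff_k, smirnoff_req = 490.0, 1.3350  # SMIRNOFF99Frosst N2-C2, from the output above
gaff_k, gaff_req = 427.6, 1.3789          # GAFF v1.7 N2-C2, from the output above

e_smirnoff = amber_bond_energy(smirnoff_k, smirnoff_req, gaff_req)
e_gaff = amber_bond_energy(gaff_k, gaff_req, smirnoff_req)
print(f"SMIRNOFF term at GAFF r_eq: {e_smirnoff:.3f} kcal/mol")  # about 0.944
print(f"GAFF term at SMIRNOFF r_eq: {e_gaff:.3f} kcal/mol")      # about 0.824
```

A 0.044 Å shift in the equilibrium length thus costs just under 1 kcal/mol in this single term, which is why differences like this are worth checking before running simulations.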