# Convert a host-guest pair from an attach-pull-release workflow, with dummy atoms, and atom re-indexing

**$\alpha$CD-1-butylamine (primary orientation)**

The initial bonded and Lennard-Jones parameters are GAFF v1.8, with partial charges determined using AM1-BCC on a host residue monomer capped with methyls (see [Niel's paper](https://pubs.acs.org/doi/abs/10.1021/acs.jctc.7b00359) for details).

This largely follows the path in the first example notebook, with three differences: we give special treatment to the dummy atoms, we polymerize the host from a single monomer, and we generate dictionaries that map atoms and residues between the initial and final structures, so that we can rewrite restraints downstream.

In [1]:
%load_ext autoreload
%autoreload 2

import os
import urllib.request

import parmed as pmd

from openforcefield.typing.engines.smirnoff import ForceField, unit
from openforcefield.utils import mergeStructure

from smirnovert.utils import (create_pdb_with_conect, prune_conect, split_topology, create_host_guest_topology,
                    create_host_mol2, convert_mol2_to_sybyl_antechamber,
                    load_mol2, check_unique_atom_names,
                    check_bond_lengths,
                    extract_water_and_ions, create_water_and_ions_parameters,
                    map_atoms, map_residues, load_pdb)

Before we begin, let's specify the starting file names and the prefix for the intermediary files. We are also going to write a bunch of temporary files that could be cleaned up later, but I leave them in place for debugging. Also, the `utils.py` functions use the `logging` module, so we can specify how much information we want. Here I'll set the logging level to `INFO`.

In [2]:
import logging
from importlib import reload
# `logging` needs to be reloaded, because `jupyter notebook` itself 
# uses the logging module to print messages to standard output...
reload(logging)
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.basicConfig(
    format='%(asctime)s %(message)s', datefmt='%Y-%m-%d %I:%M:%S %p')

In [3]:
test_case = 'a-bam-p/'
reference_destination = './tests/' + test_case + 'original/'
reference_prmtop = 'full.topo'
reference_inpcrd = 'full.crds'

generated_destination = './tests/' + test_case + 'generated/'
prefix = 'full'
host_resname = 'MGO'
guest_resname = 'BAM'


# Create the output directory (and any missing parents) if it does not exist yet.
os.makedirs(generated_destination, exist_ok=True)

In [4]:
reference = pmd.load_file(reference_destination + reference_prmtop, 
                          xyz=reference_destination + reference_inpcrd)
box = reference.box

Next, we'll create a proper PDB from these AMBER files, and delete the `CONECT` records that are solvent-solvent.

In [5]:
create_pdb_with_conect(solvated_pdb=reference_destination + reference_inpcrd,
                      amber_prmtop=reference_destination + reference_prmtop,
                      output_pdb=generated_destination + prefix + '.pdb')

2018-03-06 02:38:13 PM Creating ./tests/a-bam-p/generated/full.pdb with CONECT records...


We need to prune the `CONECT` records to deal with https://github.com/openforcefield/openforcefield/issues/68.

In [6]:
prune_conect(input_pdb=prefix + '.pdb',
            output_pdb=prefix + '.pruned.pdb',
            path=generated_destination)

2018-03-06 02:38:13 PM Pruning water-water CONECT records...
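As a rough illustration of what this pruning step amounts to, here is a minimal sketch that drops `CONECT` records connecting only water atoms. This is not the `prune_conect` implementation; the function name `drop_water_conect`, the helper `atom_line`, and the water residue names `WAT` and `HOH` are assumptions made for the example, and it relies only on the fixed-column layout of PDB records.

```python
WATER_RESNAMES = {"WAT", "HOH"}  # assumed residue names for water


def drop_water_conect(pdb_lines):
    """Return pdb_lines without CONECT records that involve only water atoms."""
    # First pass: collect the serial numbers of all water atoms.
    water_serials = set()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])
            resname = line[17:20].strip()
            if resname in WATER_RESNAMES:
                water_serials.add(serial)

    # Second pass: keep everything except water-only CONECT records.
    kept = []
    for line in pdb_lines:
        if line.startswith("CONECT"):
            record = line.rstrip()
            fields = [record[i:i + 5] for i in range(6, len(record), 5)]
            serials = [int(f) for f in fields if f.strip()]
            if serials and all(s in water_serials for s in serials):
                continue
        kept.append(line)
    return kept


def atom_line(serial, name, resname):
    """Build a truncated ATOM record with PDB column alignment (toy helper)."""
    return "ATOM  {:5d} {:<4s} {:3s}".format(serial, name, resname)


example = [
    atom_line(1, "C1", "MGO"),
    atom_line(2, "O1", "MGO"),
    atom_line(3, "O", "WAT"),
    atom_line(4, "H1", "WAT"),
    "CONECT{:5d}{:5d}".format(1, 2),  # host bond: kept
    "CONECT{:5d}{:5d}".format(3, 4),  # water-water bond: dropped
]
pruned = drop_water_conect(example)
```

In this toy input, only the record between atoms 3 and 4 is removed; the host `CONECT` record and all `ATOM` records pass through unchanged.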


We'll split the PDB into separate topology objects, and extract the host-guest topology for now.

In [7]:
components = split_topology(file_name=generated_destination + prefix + '.pruned.pdb')
hg_topology = create_host_guest_topology(components, 
                                         host_resname=host_resname, 
                                         guest_resname=guest_resname)

2018-03-06 02:38:13 PM Splitting topology into components...
2018-03-06 02:38:15 PM Creating a combined topology for the host and guest molecules...


Now, let's polymerize the host because the input `mol2` file is only a single sugar residue and not the cyclic molecule. This *should* occur without intervention because `cpptraj` will detect that the single residue in the input `mol2` matches *multiple* residues in the `pdb`, and be able to write out the multiresidue `mol2` file with proper topology.

In [8]:
create_host_mol2(
    solvated_pdb=generated_destination + prefix + '.pruned.pdb',
    amber_prmtop=reference_destination + reference_prmtop,
    mask=host_resname,
    output_mol2=generated_destination + host_resname + '.mol2')

create_host_mol2(
    solvated_pdb=generated_destination + prefix + '.pdb',
    amber_prmtop=reference_destination + reference_prmtop,
    mask=guest_resname,
    output_mol2=generated_destination + guest_resname + '.mol2')

convert_mol2_to_sybyl_antechamber(
    input_mol2=generated_destination + host_resname + '.mol2',
    output_mol2=generated_destination + host_resname + '-sybyl.mol2',
    ac_doctor=False)

convert_mol2_to_sybyl_antechamber(
    input_mol2=generated_destination + guest_resname + '.mol2',
    output_mol2=generated_destination + guest_resname + '-sybyl.mol2',
    ac_doctor=True)

2018-03-06 02:38:15 PM Writing a `mol2` for the host molecule...
2018-03-06 02:38:15 PM Writing a `mol2` for the host molecule...
2018-03-06 02:38:15 PM Converting ./tests/a-bam-p/generated/MGO.mol2 to SYBYL atom types via Antechamber...
2018-03-06 02:38:15 PM Converting ./tests/a-bam-p/generated/BAM.mol2 to SYBYL atom types via Antechamber...


We also need to separate out waters and ions from the starting AMBER files.

Below, we are going to first run `extract_water_and_ions` with `dummy_atoms=True` to also extract the dummy atoms, then run `create_water_and_ions_parameters` with `dummy_atoms=True` to create dummy atom parameters, in contrast to the first example notebook.

In [9]:
extract_water_and_ions(
    amber_prmtop=reference_destination + reference_prmtop,
    amber_inpcrd=reference_destination + reference_inpcrd,
    host_residue=':' + host_resname,
    guest_residue=':' + guest_resname,
    dummy_atoms=True,
    output_pdb=generated_destination + 'water_ions.pdb'
    )

create_water_and_ions_parameters(
    input_pdb='water_ions.pdb',
    output_prmtop='water_ions.prmtop',
    output_inpcrd='water_ions.inpcrd',
    dummy_atoms=True,
    path=generated_destination)

2018-03-06 02:38:15 PM Extracting water and ions from ./tests/a-bam-p/original/full.topo...
2018-03-06 02:38:15 PM Creating parameters for the waters and ions...
2018-03-06 02:38:15 PM Writing a `frcmod` file for dummy atoms...
2018-03-06 02:38:15 PM Writing a `mol2` file for dummy atoms...


Now, we'll load the `mol2` files with **SYBYL atom types** for the host and guest, and check that the atom names within each molecule are unique.

In [10]:
host = load_mol2(
    filename=generated_destination + host_resname + '-sybyl.mol2',
    name=host_resname,
    add_tripos=True)

guest = load_mol2(
    filename=generated_destination + guest_resname + '-sybyl.mol2',
    name=guest_resname,
    add_tripos=False)

check_unique_atom_names(host)
check_unique_atom_names(guest)
molecules = [host, guest]

2018-03-06 02:38:17 PM Loading ./tests/a-bam-p/generated/MGO-sybyl.mol2...
2018-03-06 02:38:17 PM Loading ./tests/a-bam-p/generated/BAM-sybyl.mol2...
2018-03-06 02:38:17 PM Checking all atoms have unique names...
2018-03-06 02:38:17 PM Checking all atoms have unique names...


At this point, let's create the OpenMM system with SMIRNOFF99Frosst parameters and finish dealing with the host and guest.

In [11]:
ff = ForceField('forcefield/smirnoff99Frosst.ffxml')
system = ff.createSystem(
    hg_topology.topology,
    molecules,
    nonbondedCutoff=1.1 * unit.nanometer,
    ewaldErrorTolerance=1e-4)

We'll convert the OpenMM system to a ParmEd structure and check for bad bonds.

In [12]:
hg_structure = pmd.openmm.topsystem.load_topology(
    hg_topology.topology, system, hg_topology.positions)

check_bond_lengths(hg_structure, threshold=4)

try:
    hg_structure.save(generated_destination + 'hg.prmtop')
except OSError:
    print(
        'Check if the host-guest parameter file already exists...')

try:
    hg_structure.save(generated_destination + 'hg.inpcrd')
except OSError:
    print(
        'Check if the host-guest coordinate file already exists...')

2018-03-06 02:38:17 PM Checking structure for bonds >4 A...


Check if the host-guest parameter file already exists...
Check if the host-guest coordinate file already exists...
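The bond-length check above is a cheap sanity test: a bonded pair separated by more than about 4 Å usually means a bond was assigned between the wrong atoms. The sketch below shows the idea on plain coordinate tuples. It is not the `check_bond_lengths` implementation; the function name `find_long_bonds` and the input layout (a list of coordinates plus a list of index pairs) are assumptions for illustration.

```python
import math


def find_long_bonds(coordinates, bonds, threshold=4.0):
    """Return (i, j, length) for every bond longer than threshold.

    coordinates: list of (x, y, z) tuples, assumed to be in Angstroms.
    bonds: list of (i, j) index pairs into coordinates.
    """
    long_bonds = []
    for i, j in bonds:
        length = math.dist(coordinates[i], coordinates[j])
        if length > threshold:
            long_bonds.append((i, j, length))
    return long_bonds


# Toy data: the second bond is stretched to 6 Angstroms.
coordinates = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 6.0, 0.0)]
bonds = [(0, 1), (1, 2)]
flagged = find_long_bonds(coordinates, bonds, threshold=4.0)
```

Here `flagged` contains only the stretched bond between atoms 1 and 2. Note that `math.dist` requires Python 3.8 or later.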


Next, we'll load the water and ions into a ParmEd structure.

In [13]:
water_and_ions = pmd.amber.AmberParm(
    generated_destination + 'water_ions.prmtop',
    xyz=generated_destination + 'water_ions.inpcrd')

Now, let's merge the host-guest structure with the dummy atoms, the waters, and the ions, and set the box coordinates of the merged structure.

In [14]:
merged = mergeStructure(hg_structure, water_and_ions)
merged.box = reference.box

try:
    merged.save(generated_destination + 'smirnoff.prmtop')
except OSError:
    print('Check if solvated parameter file already exists...')
try:
    merged.save(generated_destination + 'smirnoff.inpcrd')
except OSError:
    print('Check if solvated coordinate file already exists...')


Check if solvated parameter file already exists...
Check if solvated coordinate file already exists...


At this point, we can compare parameters between the sets, as in the first example notebook. For attach-pull-release calculations, however, we usually run simulations with restraints specified relative to the dummy atoms (either by residue number or atom index). Because we *included* the dummy atoms with the water and ions here, they are written *after* the host and guest instead of at the beginning of the coordinates, so the atom and residue indexing differs between the reference and generated structures. N.B. When I included the dummy atoms with the host and guest, I was not able to correctly parameterize the host and guest with SMIRNOFF99Frosst.

To help with this, I've written functions that look through connected bonds and generate a mapping between atoms and residues. To map the atoms, I'm going to save the reference and target structures as `mol2`. To map the residues, I'm going to save the reference and target structures as `pdb`. This is because the interaction between OpenEye and ParmEd (or maybe just one of those) has difficulty saving and re-reading `mol2` files with dummy atoms. On the other hand, when reading a `pdb`, we can instruct OpenEye to *not* ignore dummy atoms (if we ignore dummy atoms, even in both cases, the graph is not isomorphic). I have not fully investigated what is going on here, but this workaround seems to work well enough.
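To make the indexing problem concrete, here is a toy sketch of how moving the dummy atoms from the front of the structure to the end shifts every index, and what an index-to-index dictionary looks like. The labels, the six-atom system, and the 1-based `old index -> new index` layout are all assumptions for illustration. The real `map_atoms` matches atoms through the bond graph rather than by label, and the notebook does not show the exact format of the dictionary it returns.

```python
# Toy atom labels; "DM1".."DM3" stand in for the dummy atoms.
reference_order = ["DM1", "DM2", "DM3", "C1", "O1", "N1"]  # dummies first
generated_order = ["C1", "O1", "N1", "DM1", "DM2", "DM3"]  # dummies last

# Look up the 1-based position of each label in the generated structure.
generated_index = {
    label: i for i, label in enumerate(generated_order, start=1)
}

# Map each 1-based reference index to its 1-based generated index.
toy_atom_mapping = {
    i: generated_index[label]
    for i, label in enumerate(reference_order, start=1)
}
```

In this toy case the first dummy atom moves from index 1 to index 4, and the first host atom moves from index 4 to index 1, so a restraint written against the reference indices would point at the wrong atoms in the generated structure.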

In [15]:
try:
    reference.save(generated_destination + 'reference.pdb')
    reference.save(generated_destination + 'reference.mol2')
except OSError:
    print('Check if reference pdb and mol2 files exist...')

try:
    merged.save(generated_destination + 'target.pdb')
    merged.save(generated_destination + 'target.mol2')
except OSError:
    print('Check if target pdb and mol2 files exist...')

Check if reference pdb and mol2 files exist...
Check if target pdb and mol2 files exist...


Now, let's load those into OpenEye `OEMol`s, so we can do the mapping.

In [16]:
reference_mol = load_mol2(generated_destination + 'reference.mol2')
target_mol = load_mol2(generated_destination + 'target.mol2')

atom_mapping = map_atoms(reference_mol, target_mol)

2018-03-06 02:38:18 PM Loading ./tests/a-bam-p/generated/reference.mol2...
2018-03-06 02:38:18 PM Loading ./tests/a-bam-p/generated/target.mol2...
2018-03-06 02:38:18 PM Generating map between atoms...


We can use the atom mapping to do the residue mapping.

In [17]:
reference_mol = load_pdb(generated_destination + 'reference.pdb')
target_mol = load_pdb(generated_destination + 'target.pdb')

residue_mapping = map_residues(atom_mapping, reference_mol, target_mol)

2018-03-06 02:38:55 PM Loading ./tests/a-bam-p/generated/reference.pdb...
2018-03-06 02:38:55 PM Loading ./tests/a-bam-p/generated/target.pdb...
2018-03-06 02:38:56 PM Generating map between residues...


Now, we're basically finished. If we had AMBER input files in the same directory, we could rewrite them by searching for the atom or residue masks used in positional restraints and replacing them with the help of the dictionaries (`rewrite_amber_input_file`). Likewise, we could rewrite files with NMR restraints using the `atom_mapping` dictionary (`rewrite_restraints_file`).
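As a hedged sketch of what rewriting an NMR restraint might involve, the function below remaps the atom indices in the `iat=` field of a single restraint line. It is not `rewrite_restraints_file`; the name `remap_iat`, the toy mapping, and the assumption that the mapping is an `old index -> new index` dictionary are all hypothetical. It handles only positive atom indices and ignores the group-based syntax that AMBER also allows.

```python
import re


def remap_iat(line, mapping):
    """Replace the atom indices after `iat=` using mapping (old -> new)."""

    def replace(match):
        # Pull out every integer in the comma-separated index list.
        indices = [int(x) for x in re.findall(r"\d+", match.group(2))]
        remapped = ",".join(str(mapping[i]) for i in indices)
        return match.group(1) + remapped + ", "

    # Group 1 is the `iat=` prefix, group 2 the index list that follows it.
    return re.sub(r"(iat\s*=\s*)([\d,\s]+)", replace, line)


# Hypothetical mapping: atoms 1 and 4 swap places.
toy_mapping = {1: 4, 4: 1}
original_line = "&rst iat= 1,4, r1= 0.0, r2= 6.0, &end"
rewritten_line = remap_iat(original_line, toy_mapping)
```

Anchoring the pattern on `iat=` keeps the other numeric fields (`r1`, `r2`, and so on) untouched. An index missing from the mapping raises a `KeyError`, which seems preferable to silently writing a stale index.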