# ForceField Parameter Deduplication
## Authors: Connor Davel, Jeffrey Wagner
## Date Created: April 21, 2021
One inefficiency in the current forcefields is redundant parameters. These add an estimated 10 to 20% to the loading time of the already very slow `ForceField.create_openmm_system()` function, not to mention any other function that needs to load and parse the file. Because searching and operating on large forcefield files tends to scale worse than linearly, the user should be given the shortest forcefield parameter file that still produces the same results in every chemical environment. The goal of this notebook is to reduce the size of `test.offxml` while reproducing the same energy calculations for the ALA_ALA protein and t4 protein files.

## Forcefield reduction methods
Any two equivalent SMIRKS patterns should have the same parameters, so one of them can be safely removed without changing how the system is parameterized. For example, the first two bonds in `test.offxml` are equivalent:

`<Bond smirks="[H][C@@]([C]=O)([C:1]([H:2])([H])[S])[N][H]" length="1.09 * angstrom" k="680.0 * angstrom**-2 * mole**-1 * kilocalorie" id="A14SB-MainChain_CYX-2C_H1"></Bond>`

`<Bond smirks="[H][C@@]([C]=O)([C:1]([H])([H:2])[S])[N][H]" length="1.09 * angstrom" k="680.0 * angstrom**-2 * mole**-1 * kilocalorie" id="A14SB-MainChain_CYX-2C_H1"></Bond>`
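A quick way to confirm that the two elements carry the same values is to parse them and compare every attribute except `smirks`. The snippet below is a minimal sketch using only the Python standard library; the helper name `non_smirks_attributes` is made up for illustration and is not part of the OpenFF toolkit.

```python
import xml.etree.ElementTree as ET

# The two <Bond> elements quoted above, copied verbatim from test.offxml.
BOND_A = (
    '<Bond smirks="[H][C@@]([C]=O)([C:1]([H:2])([H])[S])[N][H]" '
    'length="1.09 * angstrom" '
    'k="680.0 * angstrom**-2 * mole**-1 * kilocalorie" '
    'id="A14SB-MainChain_CYX-2C_H1"></Bond>'
)
BOND_B = (
    '<Bond smirks="[H][C@@]([C]=O)([C:1]([H])([H:2])[S])[N][H]" '
    'length="1.09 * angstrom" '
    'k="680.0 * angstrom**-2 * mole**-1 * kilocalorie" '
    'id="A14SB-MainChain_CYX-2C_H1"></Bond>'
)


def non_smirks_attributes(xml_text):
    """Return every attribute of the element except its SMIRKS pattern.

    Hypothetical helper, written only for this illustration.
    """
    element = ET.fromstring(xml_text)
    return {key: value for key, value in element.attrib.items() if key != "smirks"}


attrs_a = non_smirks_attributes(BOND_A)
attrs_b = non_smirks_attributes(BOND_B)

# True when length, k and id all agree between the two elements.
same_values = attrs_a == attrs_b

# True only when the SMIRKS strings are character-for-character identical.
same_smirks_text = (
    ET.fromstring(BOND_A).get("smirks") == ET.fromstring(BOND_B).get("smirks")
)

print("same values:", same_values)
print("same SMIRKS text:", same_smirks_text)
```

A plain string comparison therefore cannot find this duplicate: the values agree while the SMIRKS text does not, which is why a structural comparison is needed.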

The only difference between the two SMIRKS is which of the two identical hydrogens is mapped. Changing the mapping, that is, which specific hydrogen is treated as bonded to the carbon, does not change the structure or the bond values (length, k, etc.). Identical SMIRKS are recognized with two methods:

1) MCS substructure searching with a custom isotope search function

2) An isomorphism test adapted from `Molecule.are_isomorphic()` (the same method, but using RDKit instead of `Molecule`)
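Whichever test decides that two SMIRKS are identical, the removal step itself can be sketched as a loop over the parameter list. The code below is an illustration under stated assumptions, not the notebook's implementation: `deduplicate`, `crude_equivalent` and `strip_atom_maps` are hypothetical names, the third parameter in the demo is made up, and `crude_equivalent` is only a stand-in for the MCS and isomorphism tests (it ignores which atoms are mapped, so it would over-merge on real data). The sketch keeps the later entry of each duplicate pair, on the assumption that SMIRNOFF's last-match-wins ordering applies.

```python
import re
from typing import Callable, Dict, List

Parameter = Dict[str, str]


def strip_atom_maps(smirks: str) -> str:
    """Remove atom-map labels such as ':1' from a SMIRKS string."""
    return re.sub(r":\d+(?=\])", "", smirks)


def crude_equivalent(smirks_a: str, smirks_b: str) -> bool:
    """Stand-in predicate for illustration only.

    It compares the unmapped text and ignores WHICH atoms are mapped, so it is
    not a substitute for a real MCS or isomorphism test.
    """
    return strip_atom_maps(smirks_a) == strip_atom_maps(smirks_b)


def values_of(parameter: Parameter) -> Parameter:
    """Every attribute except the SMIRKS pattern itself."""
    return {key: value for key, value in parameter.items() if key != "smirks"}


def deduplicate(
    parameters: List[Parameter],
    are_equivalent: Callable[[str, str], bool],
) -> List[Parameter]:
    """Drop earlier entries that duplicate a later entry, preserving order."""
    kept: List[Parameter] = []
    # Walk from the end so the LAST occurrence of each duplicate survives.
    for candidate in reversed(parameters):
        is_duplicate = any(
            values_of(candidate) == values_of(other)
            and are_equivalent(candidate["smirks"], other["smirks"])
            for other in kept
        )
        if not is_duplicate:
            kept.append(candidate)
    kept.reverse()
    return kept


bonds = [
    # Made-up entry, present only to show that distinct parameters survive.
    {"smirks": "[#6:1]-[#8:2]", "length": "1.0 * angstrom", "id": "example"},
    # The two equivalent bonds from test.offxml.
    {"smirks": "[H][C@@]([C]=O)([C:1]([H:2])([H])[S])[N][H]",
     "length": "1.09 * angstrom",
     "k": "680.0 * angstrom**-2 * mole**-1 * kilocalorie",
     "id": "A14SB-MainChain_CYX-2C_H1"},
    {"smirks": "[H][C@@]([C]=O)([C:1]([H])([H:2])[S])[N][H]",
     "length": "1.09 * angstrom",
     "k": "680.0 * angstrom**-2 * mole**-1 * kilocalorie",
     "id": "A14SB-MainChain_CYX-2C_H1"},
]

reduced = deduplicate(bonds, crude_equivalent)
print(len(bonds), "->", len(reduced))
```

Passing the predicate as an argument keeps the loop independent of how equivalence is decided, so an RDKit-based test can be swapped in without touching the removal logic. The loop compares every candidate against every kept entry, so its cost grows quadratically with the number of parameters.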

Throughout the notebook, I avoided the `Molecule` and `FrozenMolecule` classes, since they do not play nicely with wild-type bonds. I used RDKit, which is free and open source, for the MCS searching and the isomorphism test, so anyone can run this notebook.

## Importing dependencies and loading forcefield

In [None]:
from openff.toolkit.topology import Molecule, Topology
from openff.toolkit.typing.engines.smirnoff import ForceField
import parmed as ParmEd
from simtk import openmm
from simtk.openmm import app, unit, XmlSerializer, LangevinIntegrator
from simtk.openmm.app import NoCutoff, HBonds
from utils import fix_carboxylate_bond_orders
import os
import itertools
import time
from pathlib import Path