# Forcefield Modification
<br />
<details>
    <summary><small>▼ Click here for dependency installation instructions</small></summary>

    # The simplest way to install dependencies is to install the examples package:
    
    conda install -c conda-forge openff-toolkit-examples
    
    # You can also install all the dependencies using the provided environment.yaml:
    
    conda env update --file ../environment.yaml 
    
    # You may also need to restart this notebook's kernel after you make these changes (Kernel -> Restart)
</details>

In this example, we'll parameterize a ligand automatically, and then play with its parameters to demonstrate the Toolkit's ability to facilitate force field optimization. For each modification, we'll calculate the energy for the original conformation, and then minimize the energy and visualize the result.

In [1]:
from copy import deepcopy

from openff.toolkit.topology import Molecule, Topology
from openff.toolkit.typing.engines.smirnoff import ForceField
import openff.toolkit.typing.engines.smirnoff.parameters as offtk_parameters
from openff.toolkit.utils.utils import get_data_file_path

import nglview
from simtk import unit
from simtk.openmm import app, LangevinIntegrator
from simtk import openmm
import numpy as np






We're going to repeatedly change a parameter and then visualise what happened, so let's define a convenience function to do just that:

In [2]:
def minimize_and_visualize(molecule, forcefield):
    # Sort out our input data
    mol_topology = molecule.to_topology()
    mol_system = forcefield.create_openmm_system(mol_topology, charge_from_molecules=[molecule])
    
    # Set up the minimization and point calculation
    integrator = LangevinIntegrator(300*unit.kelvin, 1/unit.picosecond, 0.002*unit.picoseconds)
    simulation = app.Simulation(mol_topology.to_openmm(), mol_system, integrator)
    simulation.context.setPositions(molecule.conformers[0])
    
    # Get the initial energy
    initial_potential = simulation.context.getState(getEnergy=True).getPotentialEnergy()
    
    # Energy minimize
    simulation.minimizeEnergy()
    minimized_state = simulation.context.getState(getPositions=True, getEnergy=True)
    minimized_potential = minimized_state.getPotentialEnergy()
    minimized_coords = minimized_state.getPositions(asNumpy=True)
    
    # Visualize
    vis_mol = deepcopy(molecule)
    vis_mol.conformers[0] = minimized_coords
    view = vis_mol.visualize(backend="nglview")
    print(
        f"Initial energy is {initial_potential.format('%0.1F')};",
        f"Minimized energy is {minimized_potential.format('%0.1F')}"
    )
    return view


## Getting to know you: the molecule

This "ligand" is a modified version of the molecule we introduced in the Toolkit Showcase. It's just been altered for a slightly more exciting example here. This also lets us demonstrate constructing a `Molecule` from a SMILES string!

In [3]:
ligand_smiles = 'CC(C)(C)c1c(O)c(O)c2c(c1O)[C@H]1OCCC[C@H]1[C@H](c1cc(O)c(O)c(F)c1)N2'
ligand = Molecule.from_smiles(ligand_smiles)
ligand.generate_conformers(n_conformers=1)
force_field = ForceField('openff-1.3.0.offxml')

Computing charges is expensive, and we're going to be changing the force field a lot, so we can save time by computing them just once and caching them.


<div class="alert alert-info" style="max-width: 700px; margin-left: auto; margin-right: auto;">
    ℹ️ There will be a convenience method to do this soon. Note that when we 
    call <code>create_openmm_system</code> above we pass in the charges with 
    the <code>charge_from_molecules</code> argument!
</div>

In [4]:
_, ret_top = force_field.create_openmm_system(ligand.to_topology(), return_topology=True)
ligand.partial_charges = [*ret_top.reference_molecules][0].partial_charges

### The ligand, visualised

Let's take a close look at the ligand and decide what we want to modify. We'll label the atoms with their indices so we can identify them later:

In [5]:
view = minimize_and_visualize(ligand, force_field)
view.add_label(label_type='atomindex', color="black", attachment="middle-center")
view

Initial energy is 351.8 kJ/mol; Minimized energy is 27.3 kJ/mol


NGLWidget()

## Getting to know all about you: investigating assigned parameters

Let's start with something simple: lengthening the bond to the fluorine, atom index 27. We can use `ForceField.label_molecules` to identify the appropriate parameters:

In [6]:
ff_applied_parameters = force_field.label_molecules(ligand.to_topology())
ff_applied_parameters

[{'Constraints': <openff.toolkit.topology.topology.ValenceDict at 0x7f0d18e322d0>,
  'Bonds': <openff.toolkit.topology.topology.ValenceDict at 0x7f0d185ea650>,
  'Angles': <openff.toolkit.topology.topology.ValenceDict at 0x7f0d1b670950>,
  'ProperTorsions': <openff.toolkit.topology.topology.ValenceDict at 0x7f0d1998d7d0>,
  'ImproperTorsions': <openff.toolkit.topology.topology.ImproperDict at 0x7f0d18fa5090>,
  'vdW': <openff.toolkit.topology.topology.ValenceDict at 0x7f0d18e9ea50>,
  'Electrostatics': <openff.toolkit.topology.topology.ValenceDict at 0x7f0d18eb9110>,
  'LibraryCharges': {},
  'ToolkitAM1BCC': <openff.toolkit.topology.topology.ValenceDict at 0x7f0d18e9ec50>}]

`label_molecules()` returns a list with one entry per molecule in the topology. Each entry is a dictionary, keyed by parameter handler name, that gives us access to the `ForceField` parameters applied to that molecule. Since we're only passing in a single molecule, `ff_applied_parameters` is a list of one element. We can see the bonds that are used for the ligand by converting the appropriate `ValenceDict` to a regular dictionary:

In [7]:
dict(ff_applied_parameters[0]['Bonds'])

{(0,
  1): <BondType with smirks: [#6X4:1]-[#6X4:2]  id: b1  length: 1.520980132854 A  k: 517.2187207483 kcal/(A**2 mol)  >,
 (0,
  30): <BondType with smirks: [#6X4:1]-[#1:2]  id: b83  length: 1.093910524997 A  k: 754.0714751826 kcal/(A**2 mol)  >,
 (0,
  31): <BondType with smirks: [#6X4:1]-[#1:2]  id: b83  length: 1.093910524997 A  k: 754.0714751826 kcal/(A**2 mol)  >,
 (0,
  32): <BondType with smirks: [#6X4:1]-[#1:2]  id: b83  length: 1.093910524997 A  k: 754.0714751826 kcal/(A**2 mol)  >,
 (1,
  2): <BondType with smirks: [#6X4:1]-[#6X4:2]  id: b1  length: 1.520980132854 A  k: 517.2187207483 kcal/(A**2 mol)  >,
 (1,
  3): <BondType with smirks: [#6X4:1]-[#6X4:2]  id: b1  length: 1.520980132854 A  k: 517.2187207483 kcal/(A**2 mol)  >,
 (1,
  4): <BondType with smirks: [#6X4:1]-[#6X3:2]  id: b2  length: 1.501037244555 A  k: 601.2284108955 kcal/(A**2 mol)  >,
 (2,
  33): <BondType with smirks: [#6X4:1]-[#1:2]  id: b83  length: 1.093910524997 A  k: 754.0714751826 kcal/(A**2 mol)  >,


Take a look at the first entry as an example. The dictionary is keyed by the atomic indices of the particles in the molecule that the parameter applies to, and the values are special types that specify the parameters from the force field. We'll look in more detail at one of these parameters in a second.
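Because many atom tuples share the same parameter, it can help to invert this mapping and group the atom indices by parameter `id`. The following is a minimal sketch that uses a hypothetical `BondLabel` tuple as a stand-in for the Toolkit's `BondType`, filled in by hand from the output above; with the real labels you would loop over `ff_applied_parameters[0]['Bonds'].items()` instead.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the Toolkit's BondType; it only keeps the
# fields that are visible in the printed output above.
BondLabel = namedtuple("BondLabel", ["smirks", "id", "length_angstrom"])

b1 = BondLabel("[#6X4:1]-[#6X4:2]", "b1", 1.520980132854)
b2 = BondLabel("[#6X4:1]-[#6X3:2]", "b2", 1.501037244555)
b83 = BondLabel("[#6X4:1]-[#1:2]", "b83", 1.093910524997)

# Atom-index tuples mapped to the parameter applied to them,
# copied from the first few entries of the dictionary above.
labels = {
    (0, 1): b1,
    (0, 30): b83,
    (0, 31): b83,
    (0, 32): b83,
    (1, 2): b1,
    (1, 3): b1,
    (1, 4): b2,
    (2, 33): b83,
}

# Group the atom tuples by the id of the parameter they received.
atoms_by_id = defaultdict(list)
for atom_indices, parameter in labels.items():
    atoms_by_id[parameter.id].append(atom_indices)

for parameter_id, atom_tuples in atoms_by_id.items():
    print(parameter_id, atom_tuples)
```

Grouping like this makes it easy to see at a glance how many bonds would be affected by editing a single force field parameter.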

## A little goes a long way: changing parameters

We already decided we wanted to lengthen the bond to the fluorine, so let's do that. We know parameters are indexed by the atoms they've been applied to, so we can look at our labelled widget above and pull out exactly the parameter we care about:

In [8]:
ff_applied_parameters[0]['Bonds'][(26, 27)]

<BondType with smirks: [#6:1]-[#9:2]  id: b67  length: 1.35676207799 A  k: 787.4127028387 kcal/(A**2 mol)  >

Let's dig into this type a bit more. It has a few attributes in its textual representation. The first of these is maybe the most important: the `smirks` attribute, which tells the Toolkit which atoms this parameter applies to. SMIRKS is a chemical pattern matching format; think of it as the result of a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) having a baby with a [SMILES string](https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system). This one is very simple: it is just a carbon atom (`[#6:1]`, atomic number 6) singly bonded (`-`) to a fluorine atom (`[#9:2]`, atomic number 9). The numbers after the colons just label the atoms. This is helpful when we want to match against atoms that aren't a part of the parameter; we just don't label the additional atoms. We'll use this trick later.

The other important attributes provide the actual parameterization, and so are different for different kinds of parameters. For bonds, these are the equilibrium bond length (`length`) and the force constant (`k`). These parameter values are similar to the equivalent description in most other force field formats.


<div class="alert alert-info" style="max-width: 700px; margin-left: auto; margin-right: auto;">
    <p> ℹ️ The SMIRKS specification is available online:</p>
    <p style="margin-left:3em;">
        <a href=https://www.daylight.com/dayhtml/doc/theory/theory.smirks.html>
            https://www.daylight.com/dayhtml/doc/theory/theory.smirks.html
        </a>
    </p>
    <p style="margin-left:1.5em;">
        It is closely related to the SMARTS molecular pattern matching language, 
        whose specification is probably more useful for working with the Toolkit
        and is also available online:
    </p>
    <p style="margin-left:3em;">
        <a href = https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html>
            https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
        </a>
    </p>
</div>


### Modifying a parameter

Unfortunately we can't just modify this parameter and see the results reflected in the parameterization. We need to get the appropriate parameter from the force field and modify it there.

In [9]:
fluorine_bond = force_field.get_parameter_handler('Bonds').parameters['[#6:1]-[#9:2]']
fluorine_bond.length = 10 * unit.angstrom

Here, we've selected the bond parameter with the SMIRKS code we found earlier, and changed its equilibrium length from about 1.36 Å to 10 Å! Let's see what we've wrought:

In [10]:
minimize_and_visualize(ligand, force_field)

Initial energy is 123444.0 kJ/mol; Minimized energy is 42.4 kJ/mol


NGLWidget()

Turns out Pinocchio was a real molecule all along!
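A rough back-of-the-envelope check helps explain the size of that initial energy. The sketch below assumes the harmonic bond form used by SMIRNOFF force fields, $E = \frac{k}{2}(r - r_0)^2$, and further assumes that the C-F bond in the starting conformer sits near the original equilibrium length of about 1.357 Å. The helper `harmonic_bond_energy` and the constant `KCAL_TO_KJ` are illustrative names, not part of the Toolkit.

```python
KCAL_TO_KJ = 4.184  # thermochemical calorie, kJ per kcal


def harmonic_bond_energy(k, r, r0):
    """Harmonic bond energy E = k/2 * (r - r0)**2, in the energy units of k."""
    return 0.5 * k * (r - r0) ** 2


k_cf = 787.4127028387      # kcal/(mol A**2), from parameter b67 above
r_current = 1.35676207799  # A; ASSUMED C-F distance in the starting conformer
r0_modified = 10.0         # A; the equilibrium length we just set

energy_kcal = harmonic_bond_energy(k_cf, r_current, r0_modified)
energy_kj = energy_kcal * KCAL_TO_KJ
print(f"Estimated C-F bond strain: {energy_kj:.0f} kJ/mol")
```

Under these assumptions the estimate is on the order of 10^5 kJ/mol, the same order of magnitude as the initial energy reported above, which suggests that nearly all of the penalty comes from this single stretched bond.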

### Parameters affecting multiple atom groups

Ok, that was fun, but it's only one parameter; we could easily have made this change in the OpenMM `System` or a GROMACS ITP file or whatever. What's the toolkit really giving us here?

Let's mess with all the H-X-H angles. And let's not get into SMIRKS this time; let's let the toolkit do the thinking:

In [11]:
ff_applied_parameters = force_field.label_molecules(ligand.to_topology())
for atoms, parameter in ff_applied_parameters[0]['Angles'].items():
    ele_1 = ligand.atoms[atoms[0]].element.symbol
    ele_2 = ligand.atoms[atoms[1]].element.symbol
    ele_3 = ligand.atoms[atoms[2]].element.symbol
    if (ele_1 == 'H' and ele_3 == 'H'):
        print(atoms, parameter)


(30, 0, 31) <AngleType with smirks: [#1:1]-[#6X4:2]-[#1:3]  angle: 114.294084683 deg  k: 66.55229431401 kcal/(mol rad**2)  id: a2  >
(30, 0, 32) <AngleType with smirks: [#1:1]-[#6X4:2]-[#1:3]  angle: 114.294084683 deg  k: 66.55229431401 kcal/(mol rad**2)  id: a2  >
(31, 0, 32) <AngleType with smirks: [#1:1]-[#6X4:2]-[#1:3]  angle: 114.294084683 deg  k: 66.55229431401 kcal/(mol rad**2)  id: a2  >
(33, 2, 34) <AngleType with smirks: [#1:1]-[#6X4:2]-[#1:3]  angle: 114.294084683 deg  k: 66.55229431401 kcal/(mol rad**2)  id: a2  >
(33, 2, 35) <AngleType with smirks: [#1:1]-[#6X4:2]-[#1:3]  angle: 114.294084683 deg  k: 66.55229431401 kcal/(mol rad**2)  id: a2  >
(34, 2, 35) <AngleType with smirks: [#1:1]-[#6X4:2]-[#1:3]  angle: 114.294084683 deg  k: 66.55229431401 kcal/(mol rad**2)  id: a2  >
(36, 3, 37) <AngleType with smirks: [#1:1]-[#6X4:2]-[#1:3]  angle: 114.294084683 deg  k: 66.55229431401 kcal/(mol rad**2)  id: a2  >
(36, 3, 38) <AngleType with smirks: [#1:1]-[#6X4:2]-[#1:3]  angle: 11

Wow, there's just one parameter, with a single `smirks` pattern, describing all of the H-X-H angles!

<div class="alert alert-info" style="max-width: 700px; margin-left: auto; margin-right: auto;">
    ℹ️ It's worth mentioning here that a force field might include multiple parameters with SMIRKS that match a particular group of atoms. When this happens, only the last parameter specified is applied. This allows force field authors to define general parameters first, and then override them with more specific parameters.
</div>
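The "last match wins" rule can be illustrated with a toy model. This sketch does not perform real SMIRKS matching: each pattern is replaced by a plain Python predicate, and the parameter names and angle values are hypothetical, chosen only to show how a later, more specific entry overrides an earlier, general one.

```python
# Toy stand-in for an ordered parameter list. Each entry is
# (name, predicate on the central atom, angle in degrees).
# The predicates and values are hypothetical, NOT taken from a force field.
parameters = [
    ("general", lambda atom: atom["element"] == "O", 110.0),
    ("specific", lambda atom: atom["element"] == "O" and atom["in_ring"], 105.0),
]


def assign_angle(atom, parameters):
    """Return (name, angle) of the LAST parameter whose predicate matches."""
    assigned = None
    for name, matches, angle in parameters:
        if matches(atom):
            assigned = (name, angle)  # later matches overwrite earlier ones
    return assigned


hydroxyl_oxygen = {"element": "O", "in_ring": False}
ring_oxygen = {"element": "O", "in_ring": True}

print(assign_angle(hydroxyl_oxygen, parameters))  # only "general" matches
print(assign_angle(ring_oxygen, parameters))      # both match; "specific" wins
```

Both entries match the ring oxygen, but because the specific entry comes later in the list it is the one that gets applied, while the hydroxyl oxygen falls back to the general entry.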

<div class="alert alert-success" style="max-width: 700px; margin-left: auto; margin-right: auto;">
    📗 Convince yourself that this SMIRKS code matches all of the H-X-H angles in our molecule. What H-X-H angles would it not match?
</div>


In [12]:
hxh_angle = force_field.get_parameter_handler('Angles').parameters['[#1:1]-[#6X4:2]-[#1:3]']
hxh_angle.angle = 50 * unit.degree

minimize_and_visualize(ligand, force_field)

Initial energy is 125160.5 kJ/mol; Minimized energy is 99.2 kJ/mol


NGLWidget()

That is not how a methyl group is supposed to look. Guess we did it right!

## Introducing a new parameter

Now let's get really crazy. I've always thought heteroatoms in rings looked a bit too comfortable, haven't you?

Since we have two cyclic heteroatoms in our molecule with very different chemistries, there'll probably be two separate parameters we need to modify. We can just pass the SMIRKS that come up for C-X-C angles to code that modifies the parameters directly, with no need to manually copy them over.

In [13]:
ff_applied_parameters = force_field.label_molecules(ligand.to_topology())
for atoms, parameter in ff_applied_parameters[0]['Angles'].items():
    ele_1 = ligand.atoms[atoms[0]].element.symbol
    ele_2 = ligand.atoms[atoms[1]].element.symbol
    ele_3 = ligand.atoms[atoms[2]].element.symbol
    if (ele_2 not in ['C', 'H']) and ele_1 == 'C' and ele_3 == 'C':
        print(atoms, parameter)
        
        smirks = parameter.smirks
        heteroatom_angle = force_field.get_parameter_handler('Angles').parameters[smirks]
        heteroatom_angle.angle = 179 * unit.degree # dihedral angles are undefined when linear, so let's not set to 180

        # We'll need this later
        if ele_2 == 'N':
            cyclic_nitrogen_smirks = smirks
        
minimize_and_visualize(ligand, force_field)

(9, 29, 19) <AngleType with smirks: [*:1]~[#7X3$(*~[#6X3,#6X2,#7X2+0]):2]~[*:3]  angle: 118.2043928474 deg  k: 137.4216581399 kcal/(mol rad**2)  id: a19  >
(13, 14, 15) <AngleType with smirks: [*:1]-[#8:2]-[*:3]  angle: 110.2898389197 deg  k: 134.5019777341 kcal/(mol rad**2)  id: a27  >
Initial energy is 127077.7 kJ/mol; Minimized energy is 383.3 kJ/mol


NGLWidget()

Oops! We didn't mean to make all the hydroxyl groups linear! One of our parameters must be applied to oxygen both when it's in a ring and when it's in a hydroxyl group. Sure enough, the second SMIRKS code printed above (`[*:1]-[#8:2]-[*:3]`) matches any X-O-X angle! We'll have to define a new, more specific SMIRKS entry so we only capture the appropriate oxygen.

### The SMIRKS for our new parameter

One way to do this is to make our SMIRKS code by modifying the SMILES code for the molecule. This way we know our new parameter will only apply to the atom we want it to, and not to other atoms in other molecules. The old parameters will still match, but since the new one is added to the end of the parameter list it will override them.

In [14]:
# Label the three atoms that should be a part of our angle
ligand.properties['atom_map'] = {13:1, 14:2, 15:3}
# Generate the SMIRKS code (any SMILES is a SMIRKS for a specific molecule)
smirks = ligand.to_smiles(mapped=True)
smirks

'[H][O][c]1[c]([H])[c]([C@]2([H])[N]([H])[c]3[c]([O][H])[c]([O][H])[c]([C]([C]([H])([H])[H])([C]([H])([H])[H])[C]([H])([H])[H])[c]([O][H])[c]3[C@:1]3([H])[C@@]2([H])[C]([H])([H])[C]([H])([H])[C:3]([H])([H])[O:2]3)[c]([H])[c]([F])[c]1[O][H]'

That is a much heftier SMIRKS code than we've seen before! The SMILES code uniquely identifies a molecule, so by labelling the appropriate atoms for an angle we convert the SMILES for the ligand into a SMIRKS for a specific angle. This is done by modifying the `properties['atom_map']` attribute of the ligand `Molecule`, which takes a dict whose keys are the atom indices and whose values are the labels. Check the [original visualization](#The-ligand,-visualised) for the indices of the atoms in our ligand.

You can see that in the generated SMIRKS, three atoms are labelled with `:1`, `:2` or `:3`. All the atoms must be present for the SMIRKS to match, but only these three atoms define the angle for the parameter! This means the parameter can be specific to this angle in this molecule, as well as perhaps a few stereoisomers or derivatives.
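To find the labelled atoms in such a long string without reading it by eye, a small regular expression is enough. This is a minimal sketch using only the Python standard library on an excerpt of the string printed above; the variable names are illustrative, and the pattern assumes labels always appear as `:<number>` at the end of a bracket atom.

```python
import re

# Excerpt from the end of the mapped SMILES printed above.
mapped_fragment = (
    "[C@:1]3([H])[C@@]2([H])[C]([H])([H])[C]([H])([H])"
    "[C:3]([H])([H])[O:2]3"
)

# A labelled bracket atom looks like "[<atom>:<label>]"; unlabelled
# bracket atoms such as "[H]" or "[C@@]" contain no colon and are skipped.
labelled_atoms = {
    int(label): atom
    for atom, label in re.findall(r"\[([^\]:]+):(\d+)\]", mapped_fragment)
}

for label in sorted(labelled_atoms):
    print(label, labelled_atoms[label])
```

The label `2` lands on the oxygen, confirming that it is the central atom of the angle, flanked by the two carbons labelled `1` and `3`.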

### Defining and registering the new parameter

In [15]:
# Start with the original Parsley 1.3 force field and apply the Nitrogen angle from above
force_field = ForceField('openff-1.3.0.offxml')
cyclic_nitrogen_angle = force_field.get_parameter_handler('Angles').parameters[cyclic_nitrogen_smirks]
cyclic_nitrogen_angle.angle = 179 * unit.degree

# Define the new angle parameter
angle_parameter = offtk_parameters.AngleHandler.AngleType(
    smirks=smirks,
    angle=179 * unit.degree,
    k=134.5019777341 * unit.kilocalorie / (unit.mole * unit.radians**2)
)

# Add the parameter to the force field
angles_handler = force_field.get_parameter_handler('Angles')
angles_handler.add_parameter(parameter=angle_parameter)

# Visualize the newly parameterized molecule
minimize_and_visualize(ligand, force_field)

Initial energy is 987.2 kJ/mol; Minimized energy is 239.6 kJ/mol


NGLWidget()

Perfect! The two cyclic heteroatoms are nearly linear while the hydroxyl group retains its characteristic angle.

<div class="alert alert-success" style="max-width: 700px; margin-left: auto; margin-right: auto;">
    📗 Try replacing the generated SMIRKS code with one that you wrote yourself. Make sure it is specific to the target atom!
</div>