# SMIRNOFF Force Fields and You

OpenFF force fields are distributed in SMIRNOFF format. SMIRNOFF was developed to avoid the pitfalls of existing force field specification formats.

SMIRNOFF:

1. Includes all information needed to apply parameters to a chemical system
2. Completely specifies the potential energy function
3. Works across different MD engines

<img src="https://imgs.xkcd.com/comics/standards_2x.png" style="margin:30px;width:500px"/>

How does it do this?

1. SMIRNOFF defines a map from chemistry to a potential energy function
2. Chemistry is defined through SMIRKS
3. Only the last, most specific SMIRKS sets the parameter
4. OpenFF tools export fully prepared systems, not generic force fields


What does it look like?

```xml
<?xml version="1.0" encoding="utf-8"?>
<SMIRNOFF version="0.3" aromaticity_model="OEAroModel_MDL">
    <Author>The Open Force Field Initiative</Author>
    <Date>2023-05-02</Date>

    <Constraints version="0.3">
        <Constraint smirks="[#1:1]-[*:2]" id="c1"></Constraint>
        <Constraint smirks="[#1:1]-[#8X2H2+0:2]-[#1]" id="c-tip3p-H-O" distance="0.9572 * angstrom ** 1"></Constraint>
        <Constraint smirks="[#1:1]-[#8X2H2+0]-[#1:2]" id="c-tip3p-H-O-H" distance="1.5139006545247014 * angstrom ** 1"></Constraint>
    </Constraints>

    <Bonds version="0.4" potential="harmonic" fractional_bondorder_method="AM1-Wiberg" fractional_bondorder_interpolation="linear">
        <Bond smirks="[#6X4:1]-[#6X4:2]" id="b1" length="1.527940216866 * angstrom ** 1" k="419.9869268191 * angstrom ** -2 * mole ** -1 * kilocalorie ** 1"></Bond>
        ...
    </Bonds>

    <Angles version="0.3" potential="harmonic">
        <Angle smirks="[*:1]~[#6X4:2]-[*:3]" angle="110.0631999136 * degree ** 1" k="121.1883270155 * mole ** -1 * radian ** -2 * kilocalorie ** 1" id="a1"></Angle>
        ...
    </Angles>

    <ProperTorsions version="0.4" potential="k*(1+cos(periodicity*theta-phase))" default_idivf="auto" fractional_bondorder_method="AM1-Wiberg" fractional_bondorder_interpolation="linear">
        <Proper smirks="[*:1]-[#6X4:2]-[#6X4:3]-[*:4]" periodicity1="3" phase1="0.0 * degree ** 1" id="t1" k1="0.1526959283148 * mole ** -1 * kilocalorie ** 1" idivf1="1.0"></Proper>
        ...
    </ProperTorsions>

    <ImproperTorsions version="0.3" potential="k*(1+cos(periodicity*theta-phase))" default_idivf="auto">
        <Improper smirks="[*:1]~[#6X3:2](~[*:3])~[*:4]" periodicity1="2" phase1="180.0 * degree ** 1" k1="5.230790565314 * mole ** -1 * kilocalorie ** 1" id="i1"></Improper>
        ...
    </ImproperTorsions>

    <vdW version="0.4" potential="Lennard-Jones-12-6" combining_rules="Lorentz-Berthelot" scale12="0.0" scale13="0.0" scale14="0.5" scale15="1.0" cutoff="9.0 * angstrom ** 1" switch_width="1.0 * angstrom ** 1" periodic_method="cutoff" nonperiodic_method="no-cutoff">
        <Atom smirks="[#1:1]" epsilon="0.0157 * mole ** -1 * kilocalorie ** 1" id="n1" rmin_half="0.6 * angstrom ** 1"></Atom>
        ...
    </vdW>

    <Electrostatics 
        version="0.4" 
        scale12="0.0" 
        scale13="0.0" 
        scale14="0.8333333333" 
        scale15="1.0" 
        cutoff="9.0 * angstrom ** 1" 
        switch_width="0.0 * angstrom ** 1"
        periodic_potential="Ewald3D-ConductingBoundary" 
        nonperiodic_potential="Coulomb" 
        exception_potential="Coulomb"
    ></Electrostatics>

    <LibraryCharges version="0.3">
        <LibraryCharge smirks="[#3+1:1]" charge1="1.0 * elementary_charge ** 1" id="Li+"></LibraryCharge>
        ...
    </LibraryCharges>

    <ToolkitAM1BCC version="0.3"></ToolkitAM1BCC>
</SMIRNOFF>
```

See the full specification at OpenFF Standards: https://openforcefield.github.io/standards/standards/smirnoff/
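
Because an OFFXML file is plain XML, its layout can be explored with nothing but the Python standard library. The following is a minimal sketch rather than how the OpenFF Toolkit reads force fields: it parses a trimmed copy of the fragment above and lists each section with its parameters. The names `OFFXML_FRAGMENT` and `summarize` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the example above: two sections, one parameter each.
OFFXML_FRAGMENT = """
<SMIRNOFF version="0.3" aromaticity_model="OEAroModel_MDL">
    <Bonds version="0.4" potential="harmonic">
        <Bond smirks="[#6X4:1]-[#6X4:2]" id="b1" length="1.527940216866 * angstrom ** 1" k="419.9869268191 * angstrom ** -2 * mole ** -1 * kilocalorie ** 1"></Bond>
    </Bonds>
    <Angles version="0.3" potential="harmonic">
        <Angle smirks="[*:1]~[#6X4:2]-[*:3]" angle="110.0631999136 * degree ** 1" k="121.1883270155 * mole ** -1 * radian ** -2 * kilocalorie ** 1" id="a1"></Angle>
    </Angles>
</SMIRNOFF>
""".strip()


def summarize(offxml: str) -> dict[str, list[dict[str, str]]]:
    """Map each section tag (hypothetical helper) to its parameters' attributes."""
    root = ET.fromstring(offxml)
    return {
        section.tag: [dict(parameter.attrib) for parameter in section]
        for section in root
    }


summary = summarize(OFFXML_FRAGMENT)
for section_name, parameters in summary.items():
    for parameter in parameters:
        # Every parameter pairs a SMIRKS pattern with physical values
        print(section_name, parameter["id"], parameter["smirks"])
```

Each parameter is a SMIRKS pattern plus values with explicit units, and each section carries the functional form in its `potential` attribute. Together, these are what let the file specify the energy function completely.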

In [None]:
!pip install -U https://github.com/conda-incubator/condacolab/archive/cuda-version-12.tar.gz
import condacolab
condacolab.install_mambaforge()

In [None]:
!wget -q https://raw.githubusercontent.com/openforcefield/openff-docs/2024-smirnoff-workshop/source/workshops/2024/smirnoff/utils.py
!wget -q https://raw.githubusercontent.com/openforcefield/openff-docs/2024-smirnoff-workshop/source/workshops/2024/smirnoff/conda-env.yml
!wget -q https://raw.githubusercontent.com/openforcefield/openff-docs/2024-smirnoff-workshop/source/workshops/2024/smirnoff/7FCX_prepped.pdb
!wget -q https://raw.githubusercontent.com/openforcefield/openff-docs/2024-smirnoff-workshop/source/workshops/2024/smirnoff/smirks.png
!mamba env update -q --name=base --file=conda-env.yml
from google.colab import output
output.enable_custom_widget_manager()

In [None]:
from utils import draw_molecule, nglview_show_openmm

### SMIRNOFF Maps from Chemistry to Potential Energy Function

SMIRNOFF force fields can be loaded from disk and inspected using the [`ForceField`] class:

[`ForceField`]: https://docs.openforcefield.org/toolkit/en/stable/api/generated/openff.toolkit.typing.engines.smirnoff.ForceField.html

In [None]:
from openff.toolkit import ForceField 

sage = ForceField('openff-2.1.1.offxml')

# Uncommenting this line will produce a LOT of text
# sage.to_string()

Let's demonstrate this with a simple molecule, hexanoic acid:

In [None]:
from openff.toolkit import Molecule

hexanoic_acid = Molecule.from_smiles("CCCCCC(=O)O")

draw_molecule(hexanoic_acid, explicit_hydrogens=False, atom_notes = {i:str(i) for i in range(hexanoic_acid.n_atoms)})

We can apply the Sage force field to the molecule by creating an `Interchange`. An `Interchange` represents a chemical system that is ready to simulate in a variety of MD engines. The potential parameters are stored in its `collections` attribute:

In [None]:
# This takes a second because it is computing partial charges according to the force field

hexanoic_acid_in_sage = sage.create_interchange(hexanoic_acid.to_topology())
hexanoic_acid_in_sage.collections

In [None]:
hexanoic_acid_in_sage.collections['Bonds'].get_force_field_parameters()

In [None]:
bond_collection = hexanoic_acid_in_sage.collections['Bonds']
bond_collection

In [None]:
from openff.interchange.models import BondKey

# These are split up to allow each parameter to be traced back to the original force field
potential_key = bond_collection.key_map[BondKey(atom_indices = [0, 1])]
bond_collection.potentials[potential_key]

In [None]:
draw_molecule(
    hexanoic_acid, 
    bond_notes={
        bond.atom_indices: f"{bond_collection.potentials[key].parameters['length']:.3f~P}" 
        for bond, key in bond_collection.key_map.items()
    },
    explicit_hydrogens=False
)
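
To see how a bond's `length` and `k` turn into an energy, here is a small sketch that assumes the `harmonic` bond potential has the form `(k / 2) * (r - length) ** 2`, which is our reading of the SMIRNOFF specification. The numbers are the `b1` values from the XML at the top of this notebook. The function name is hypothetical and is not part of any OpenFF package.

```python
def harmonic_bond_energy(r: float, length: float, k: float) -> float:
    """Energy of a harmonic bond, assuming the form (k / 2) * (r - length) ** 2."""
    return 0.5 * k * (r - length) ** 2


# Values of parameter b1 from the OFFXML example, with units stripped by hand
B1_LENGTH = 1.527940216866  # angstrom
B1_K = 419.9869268191  # kilocalorie / mole / angstrom ** 2

for stretch in (0.0, 0.05, 0.1):
    energy = harmonic_bond_energy(B1_LENGTH + stretch, B1_LENGTH, B1_K)
    print(f"stretched by {stretch:.2f} angstrom: {energy:.3f} kcal/mol")
```

In practice the MD engine evaluates this for you; the sketch is only meant to show that the parameters and the `potential` attribute together give a concrete formula.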

### Chemistry is defined through SMIRKS

SMIRKS extends SMILES and SMARTS with numbered atom tags (such as `:1`), so a single pattern can both match a chemical substructure and identify specific atoms within it.

<img src="smirks.png" width="400"/>
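
The numbered tags are what tie a pattern to specific atoms. As a rough illustration, the sketch below pulls the tags out of some of the SMIRKS patterns from the OFFXML example using a regular expression. It is a naive text scan and not a real SMIRKS parser (the toolkit delegates matching to a cheminformatics toolkit). The helper `tagged_indices` is hypothetical.

```python
import re

# Matches the ":N" tag that closes a bracketed atom, e.g. "[#6X4:1]"
TAG_PATTERN = re.compile(r":(\d+)\]")


def tagged_indices(smirks: str) -> list[int]:
    """Return the sorted atom tags found in a SMIRKS string (naive scan)."""
    return sorted(int(tag) for tag in TAG_PATTERN.findall(smirks))


examples = {
    "vdW": "[#1:1]",
    "Bond": "[#6X4:1]-[#6X4:2]",
    "Constraint": "[#1:1]-[#8X2H2+0]-[#1:2]",
    "Angle": "[*:1]~[#6X4:2]-[*:3]",
    "Proper": "[*:1]-[#6X4:2]-[#6X4:3]-[*:4]",
}

for kind, smirks in examples.items():
    print(f"{kind}: {smirks} tags atoms {tagged_indices(smirks)}")
```

Note the constraint pattern: it describes three atoms but tags only the two hydrogens, so the rest of the pattern provides chemical context without receiving a parameter.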

In [None]:
# Check out the first bond in Sage
sage['Bonds'][0]

In [None]:
# Can also identify bonds by SMIRKS
sage["Bonds"].get_parameter({"smirks":"[#6X4:1]-[#6X4:2]"})

In [None]:
draw_molecule(
    hexanoic_acid, 
    atom_notes={
        i: f"{i}" 
        for i in range(hexanoic_acid.n_atoms)
    },
    explicit_hydrogens=False
)

In [None]:
hexanoic_acid.chemical_environment_matches("[#6X4:1]-[#6X4:2]", unique=True)

### Later parameters override earlier ones
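
The rule that the last matching SMIRKS wins can be sketched in a few lines. This toy does not parse SMIRKS at all: each rule carries a hand-written Python predicate that stands in for real substructure matching, and the rule ids and the `ToyBond` description are invented for illustration.

```python
from typing import Callable, NamedTuple, Optional


class ToyBond(NamedTuple):
    """Hand-written stand-in for a bond's chemical environment."""

    first: str  # e.g. "C_X4": carbon with four connections
    second: str
    second_is_carbonyl: bool = False


class Rule(NamedTuple):
    rule_id: str  # hypothetical id, not taken from Sage
    smirks: str  # shown for reference only; never parsed here
    matches: Callable[[ToyBond], bool]


def is_x4_x4(bond: ToyBond) -> bool:
    return (bond.first, bond.second) == ("C_X4", "C_X4")


def is_x4_x3(bond: ToyBond) -> bool:
    return (bond.first, bond.second) == ("C_X4", "C_X3")


def is_x4_carbonyl(bond: ToyBond) -> bool:
    return is_x4_x3(bond) and bond.second_is_carbonyl


# Ordered from general to specific, like the parameters in an OFFXML section
RULES = [
    Rule("toy-1", "[#6X4:1]-[#6X4:2]", is_x4_x4),
    Rule("toy-2", "[#6X4:1]-[#6X3:2]", is_x4_x3),
    Rule("toy-3", "[#6X4:1]-[#6X3:2]=[#8X1+0]", is_x4_carbonyl),
]


def assign(bond: ToyBond, rules: list[Rule]) -> Optional[Rule]:
    """Walk the rules in order; each later match overwrites the earlier one."""
    assigned = None
    for rule in rules:
        if rule.matches(bond):
            assigned = rule
    return assigned


carbonyl_bond = ToyBond("C_X4", "C_X3", second_is_carbonyl=True)
print(assign(carbonyl_bond, RULES).smirks)
```

The carbonyl bond matches both the second and third rules, and the third is assigned because it comes later. This is why force field authors place general patterns first and specific ones last.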

In [None]:
# Look at the first 3 bond parameters in Sage
print(*(f"{bond.smirks}: {bond.k:.0f}" for bond in sage['Bonds'][:3]), sep='\n')

In [None]:
print("k is given in kilocalorie / angstrom ** 2 / mole")
draw_molecule(
    hexanoic_acid, 
    bond_notes={
        bond.atom_indices: f"k={bond_collection.potentials[key].parameters['k'].m:.0f}" 
        for bond, key in bond_collection.key_map.items()
    },
    explicit_hydrogens=False
)

In [None]:
hexanoic_acid.chemical_environment_matches("[#6X4:1]-[#6X3:2]", unique=True)

In [None]:
hexanoic_acid.chemical_environment_matches("[#6X4:1]-[#6X3:2]=[#8X1+0]", unique=True)

In [None]:
draw_molecule(
    hexanoic_acid, 
    atom_notes={
        i: f"{i}" 
        for i in range(hexanoic_acid.n_atoms)
    },
    explicit_hydrogens=False
)

### Exports fully prepared systems for multiple engines

In [None]:
from openmm import LangevinMiddleIntegrator
import openmm.unit
import numpy as np

# # Earlier:
# from openff.toolkit import Molecule, ForceField
# hexanoic_acid = Molecule.from_smiles("CCCCCC(=O)O")
# hexanoic_acid_in_sage = ForceField("openff-2.1.1.offxml").create_interchange(
#     hexanoic_acid.to_topology()
# )

simulation = hexanoic_acid_in_sage.to_openmm_simulation(
    integrator = LangevinMiddleIntegrator(
        300 * openmm.unit.kelvin,
        0.1 / openmm.unit.picosecond, 
        2 * openmm.unit.femtosecond,
    )
)
simulation.context.setPositions(np.random.rand(hexanoic_acid_in_sage.topology.n_atoms, 3))
simulation.minimizeEnergy()

minimized_positions = simulation.context.getState(getPositions=True).getPositions()
nglview_show_openmm(simulation.topology, minimized_positions)

In [None]:
from openff.interchange.drivers import get_summary_data
from openff.units import unit

hexanoic_acid_in_sage.box = [[4,0,0],[0,4,0],[0,0,4]] * unit.nanometer
hexanoic_acid_in_sage.positions = minimized_positions

# VdW difference comes from Amber putting switching settings in a file we don't export (sander.in)
# GROMACS has similar issues with PME, cutoffs
# OpenMM will be exact; everything else is best effort
get_summary_data(hexanoic_acid_in_sage)

## Interchange's matrix representation

In [None]:
hexanoic_acid_in_sage.collections['Bonds'].get_force_field_parameters()

In [None]:
hexanoic_acid_in_sage.collections['Bonds'].get_param_matrix()

In [None]:
import numpy

dotted = numpy.dot(
    hexanoic_acid_in_sage["Bonds"].get_param_matrix(),
    hexanoic_acid_in_sage["Bonds"].get_force_field_parameters().flatten(),
).reshape((-1, 2))

dotted


In [None]:
numpy.allclose(dotted, hexanoic_acid_in_sage["Bonds"].get_system_parameters())
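
The check above suggests that the parameter matrix acts as a lookup table: multiplying it by the flattened force field parameters copies each value to the bonds that use it. Below is a pure-Python toy of that idea, assuming a one-hot matrix layout. All numbers, the column order, and the helper names are made up, so treat it as a mental model rather than a description of Interchange's internals.

```python
def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def reshape_pairs(flat: list[float]) -> list[list[float]]:
    """Group a flat list into rows of two, like numpy's reshape((-1, 2))."""
    return [flat[i : i + 2] for i in range(0, len(flat), 2)]


# Two made-up bond parameter types, each with two values
force_field_parameters = [[400.0, 1.5], [300.0, 1.1]]
flat_parameters = [value for row in force_field_parameters for value in row]

# Three bonds: bonds 0 and 2 use parameter type 0, bond 1 uses type 1
assignment = [0, 1, 0]

# One row per (bond, value) pair, with a single 1.0 selecting the source value
param_matrix = []
for type_index in assignment:
    for column in range(2):
        row = [0.0] * len(flat_parameters)
        row[2 * type_index + column] = 1.0
        param_matrix.append(row)

system_parameters = reshape_pairs(matvec(param_matrix, flat_parameters))
print(system_parameters)
```

Seen this way, changing one force field parameter and multiplying again updates every bond that shares it.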

In [None]:
edited_params = hexanoic_acid_in_sage["Bonds"].get_force_field_parameters()
edited_params[0, 1] = 4.0

hexanoic_acid_in_sage["Bonds"].set_force_field_parameters(edited_params)

hexanoic_acid_in_sage.minimize()

hexanoic_acid_in_sage.visualize()

## A protein system

This section loads a prepared structure of heart-type fatty acid binding protein (hFABP) together with hexanoic acid, parametrizes the system with SMIRNOFF force fields, and runs a short simulation. The structure is based on PDB entry 7FCX:

https://www.rcsb.org/structure/7FCX

In [None]:
from openff.toolkit import Topology


topology = Topology.from_pdb(
    "7FCX_prepped.pdb", 
    unique_molecules=[hexanoic_acid]
)

In [None]:
topology.visualize()

### Combining the general Sage force field with a specific protein force field (Amber)

In [None]:
combined_force_field = ForceField(
    "openff-2.1.1.offxml", 
    "bespoke_hexanoic_acid_alltorsions.offxml",
    "ff14sb_off_impropers_0.0.3.offxml",
)

In [None]:
interchange = combined_force_field.create_interchange(topology)

In [None]:
temperature = 300 * openmm.unit.kelvin
friction_coefficient = 1 / openmm.unit.picosecond
time_step = 2 * openmm.unit.femtosecond

simulation = interchange.to_openmm_simulation(
    integrator = openmm.LangevinMiddleIntegrator(
        temperature, 
        friction_coefficient, 
        time_step,
    )
)

# Add a reporter to record the structure every few steps
dcd_reporter = openmm.app.DCDReporter(file="trajectory.dcd", reportInterval=1000)
simulation.reporters.append(dcd_reporter)

In [None]:
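def describe_state_of(simulation: openmm.app.Simulation, name: str = "State"):
    """Print the potential energy and the largest per-atom force of the current state."""
    state = simulation.context.getState(getEnergy=True, getForces=True)
    # Magnitude of the force on each atom, in OpenMM's default units of kJ/(mol nm)
    forces = [np.sqrt(v.x**2 + v.y**2 + v.z**2) for v in state.getForces()]
    max_force = max(forces)
    max_force_index = np.argmax(forces)
    # Convert explicitly instead of reading the private `_value` attribute
    energy = state.getPotentialEnergy().value_in_unit(openmm.unit.kilojoule_per_mole)
    print(
        f"{name} has energy {round(energy, 2)} kJ/mol "
        f"with maximum force {round(max_force, 2)} kJ/(mol nm) on atom {max_force_index}."
    )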

describe_state_of(simulation, "Original state")
simulation.minimizeEnergy()
describe_state_of(simulation, "Minimized state")

simulation.context.setVelocitiesToTemperature(temperature)

In [None]:
simulation.runForClockTime(1 * openmm.unit.minute)

In [None]:
w = nglview_show_openmm(
    simulation.topology, 
    "trajectory.dcd",
)
w.add_line(sele="protein")
w.add_unitcell()
w

## OpenFF BespokeFit

BespokeFit automatically optimizes torsions against QC torsion drives for SMIRNOFF force fields:

In [None]:
# !pkill redis
%env OMP_NUM_THREADS=4
!openff-bespoke executor run                               \
    --smiles "CCCCCC(=O)O"                                 \
    --force-field "openff-2.1.1.offxml"                    \
    --output-force-field "bespokefit-hexanoic-acid.offxml" \
    --target-torsion "[#6X4:1]-[#6X3:2]=[#8X1+0]"          \
    --workflow "default"                                   \
    --qc-compute-n-cores 4                                 \
    --default-qc-spec xtb gfn2xtb none

BespokeFit outputs the entire starting point force field, minus constraints, plus the new parameters, so that the output is exactly the force field that was optimized.

In [None]:
bespoke = ForceField("./bespokefit-hexanoic-acid.offxml")
bespoke_starting_point = ForceField("openff-2.1.1.offxml")

from difflib import Differ

[
    line 
    for line 
    in Differ().compare(bespoke_starting_point.to_string().splitlines(), bespoke.to_string().splitlines())
    if not line.startswith(" ")
]
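
If the diff output is hard to read, it may help to see the same filter on a toy input. The two strings below are invented stand-ins for a starting force field and a fitted one; they are not real BespokeFit output. The helper `changed_lines` simply wraps the list comprehension used above.

```python
from difflib import Differ


def changed_lines(before: str, after: str) -> list[str]:
    """Keep only the diff lines that are not common to both inputs."""
    diff = Differ().compare(before.splitlines(), after.splitlines())
    return [line for line in diff if not line.startswith(" ")]


# Invented stand-ins: the "after" text gains a bond and loses its constraints
before = "<Bonds>\n  <Bond id='b1'/>\n</Bonds>\n<Constraints/>"
after = "<Bonds>\n  <Bond id='b1'/>\n  <Bond id='b-new'/>\n</Bonds>"

for line in changed_lines(before, after):
    print(line)
```

Lines starting with `+` exist only in the second input and lines starting with `-` exist only in the first, so new bespoke parameters show up as `+` lines and the dropped constraints as `-` lines.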

## Your Own Force Fields

Tools for authoring SMIRNOFF force fields are still at an early stage:

- https://smarts.plus/ for visualizing the chemistry your SMARTS/SMIRKS can match

- https://github.com/MobleyLab/chemper for generating SMIRKS for chemical fragments

You might also take inspiration from how we produce OpenFF force fields - though unfortunately we cannot support this software as it is intended for internal use:

- ⚠️ https://github.com/openforcefield/amber-ff-porting is our tooling for porting Amber ff14SB to SMIRNOFF
- ⚠️ https://github.com/openforcefield/sage-2.2.0 is our WIP next release of Sage, including scripts used for re-fitting

### Shipping your force field

Expose a function that provides a list of directories your package places force fields in. Say you want to publish force fields in the `offxml` directory at the top level of your package `mypackage`:

`mypackage/_forcefields.py`:

```python
from importlib.resources import files

def get_forcefield_dirs_paths() -> list[str]:
    return [(files("mypackage") / "offxml").as_posix()]
```

Then, tell setuptools about your data files and entry points:

If you're using `setup.py`:

```python
from setuptools import setup

setup(
    ...
    # Package the contents of the offxml directory, even if they're not Python files
    package_data={"mypackage": ["offxml/*"]},
    # Add entry point so that the forcefield directory can be discovered by the openforcefield toolkit.
    entry_points={
        "openforcefield.smirnoff_forcefield_directory": [
            "get_forcefield_dirs_paths = mypackage._forcefields:get_forcefield_dirs_paths",
        ],
    },
)
```

Or `pyproject.toml`:

```toml
# Package the contents of the offxml directory, even if they're not Python files
[tool.setuptools.package-data]
mypackage = ["offxml/*"]

# Add entry point so that the forcefield directory can be discovered by the openforcefield toolkit.
[project.entry-points."openforcefield.smirnoff_forcefield_directory"]
get_forcefield_dirs_paths = "mypackage._forcefields:get_forcefield_dirs_paths"
```


In [None]:
from importlib_metadata import entry_points
from pathlib import Path

for entry_point in entry_points().select(
    group="openforcefield.smirnoff_forcefield_directory"
):
    for directory in entry_point.load()():
        filename = Path(directory) / "openff-2.1.1.offxml"
        if filename.is_file():
            print(filename.read_text())
            break

In [None]:
for entry_point in entry_points().select(
    group="openforcefield.smirnoff_forcefield_directory"
):
    for directory in entry_point.load()():
        for file in Path(directory).iterdir():
            print(file)

## Wrapping up

1. SMIRNOFF defines a map from chemistry to a potential energy function
2. Chemistry is defined through SMIRKS
3. Only the last, most specific SMIRKS sets the parameter
4. OpenFF tools export fully prepared systems, not generic force fields
5. You can publish SMIRNOFF force fields on Conda Forge now!

See our examples page for more: https://docs.openforcefield.org/examples

<img src="openff-examples-screenshot.png" width=600  />
