# Physical property fitting

This notebook will guide you through the process of creating the physical property fitting folder structure required for ForceBalance and demonstrate the fitting of a small toy dataset using a local OpenFF-Evaluator server.

Fitting the full dataset is computationally demanding: it required 60 GPUs and 6 days of wall-clock time on an HPC cluster. We recommend following the HPC guide in the OpenFF-Evaluator [documentation](https://docs.openforcefield.org/projects/evaluator/en/stable/backends/daskbackends.html#dask-hpc-cluster) to adapt the local Dask cluster used here to your own resources.
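When requesting an HPC allocation, it may help to express those figures as a GPU-hour budget. This simple product assumes all 60 GPUs were occupied for the full 6 days, so it is an upper bound:

$$
60\ \text{GPUs} \times 6\ \text{days} \times 24\ \frac{\text{h}}{\text{day}} = 8640\ \text{GPU-hours}
$$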

## Setting up the ForceBalance inputs

First, we create a directory to store the initial force field parameters and another to store the dataset of target physical properties that we will fit to:


In [None]:
import os
from openff.evaluator.datasets import PhysicalPropertyDataSet
from openff.evaluator.datasets.curation.components.filtering import (
    FilterBySmilesSchema,
    FilterByMoleFractionSchema,
    FilterByPropertyTypesSchema
)
from openff.evaluator.datasets.curation.workflow import CurationWorkflow, CurationWorkflowSchema

os.makedirs("forcefield", exist_ok=True)
os.makedirs(os.path.join("targets", "phys-prop"), exist_ok=True)

We can now create the fitting target dataset that goes into the targets folder. In this example we will build a simple toy dataset containing one density and one enthalpy of mixing measurement, extracted from the full DE-FF fitting dataset.

In [None]:
de_ff_dataset = PhysicalPropertyDataSet.from_json("../../../data-set-curation/physical-property/physical-data-sets/sage-train-v1.json")
toy_dataset = CurationWorkflow.apply(
    de_ff_dataset, 
    CurationWorkflowSchema(
        component_schemas=[
            FilterBySmilesSchema(smiles_to_include=["O", "OC1=NCCC1"]), 
            FilterByMoleFractionSchema(mole_fraction_ranges={2: [[(0.4, 0.6)]]}),
            FilterByPropertyTypesSchema(property_types=["Density", "EnthalpyOfMixing"], n_components={"Density": [2], "EnthalpyOfMixing": [2]})
            ]
        )
    )
toy_dataset.to_pandas()
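To make the three filters above easier to reason about, here is a plain-Python approximation of what they keep. It is not the OpenFF-Evaluator implementation: the records are made up, the mole fraction bounds are assumed to be inclusive, and only the first component's mole fraction is checked for binary mixtures (following the `{2: [[(0.4, 0.6)]]}` setting above). Each condition must hold for a record to survive.

```python
# Made-up, simplified records standing in for rows of the curated data set.
# "x1" is the mole fraction of the first component.
records = [
    {"type": "Density", "smiles": ("O", "OC1=NCCC1"), "x1": 0.5},
    {"type": "EnthalpyOfMixing", "smiles": ("O", "OC1=NCCC1"), "x1": 0.45},
    {"type": "Density", "smiles": ("O", "OC1=NCCC1"), "x1": 0.9},
    {"type": "Density", "smiles": ("O", "CCO"), "x1": 0.5},
    {"type": "EnthalpyOfVaporization", "smiles": ("OC1=NCCC1",), "x1": 1.0},
]

ALLOWED_SMILES = {"O", "OC1=NCCC1"}
ALLOWED_TYPES = {"Density": {2}, "EnthalpyOfMixing": {2}}
X1_RANGE = (0.4, 0.6)  # assumed inclusive


def keep(record):
    # like FilterBySmilesSchema: every component must be in the include list
    if not set(record["smiles"]) <= ALLOWED_SMILES:
        return False
    # like FilterByPropertyTypesSchema: type and number of components must match
    n_components = len(record["smiles"])
    if n_components not in ALLOWED_TYPES.get(record["type"], set()):
        return False
    # like FilterByMoleFractionSchema: binary mixtures need x1 inside the range
    if n_components == 2:
        low, high = X1_RANGE
        return low <= record["x1"] <= high
    return True


toy = [record for record in records if keep(record)]
print([record["type"] for record in toy])
```

Only the two water/`OC1=NCCC1` binaries near equimolar composition pass, which mirrors the one-density, one-enthalpy-of-mixing toy set described above.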

We now write the toy dataset to the target folder. To run the full fit, use the `de_ff_dataset` object instead.

In [None]:
# `json` writes the file itself when given a path, so no separate file handle is needed
toy_dataset.json(os.path.join("targets", "phys-prop", "toy-dataset.json"))


## Estimation options

The next step is to create the Evaluator estimation options file, which controls the simulation settings and how the objective function used by ForceBalance is constructed. We do this by specifying a weight for each target property type and a scale factor (denominator) that converts the error in each calculated property into a unitless value, so that different properties can be combined in a multi-objective optimisation.
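As a rough illustration of how weights and denominators interact, the sketch below builds a generic weighted least-squares objective. It is not ForceBalance's exact `Evaluator_SMIRNOFF` objective: normalising the weights by their sum and averaging within each property type are assumptions made for clarity, and plain floats stand in for unit-bearing quantities.

```python
def weighted_objective(estimated, reference, weights, denominators):
    """Illustrative weighted least-squares objective over several property types.

    Each argument is a dict keyed by property type. `estimated` and `reference`
    hold lists of plain floats in the same units as the matching denominator,
    so every scaled residual is unitless.
    """
    total_weight = sum(weights[prop] for prop in estimated)
    objective = 0.0
    for prop, est_values in estimated.items():
        ref_values = reference[prop]
        # divide each residual by the denominator, then square it
        scaled = [((e - r) / denominators[prop]) ** 2 for e, r in zip(est_values, ref_values)]
        # average over the data points of this type and apply its normalised weight
        objective += (weights[prop] / total_weight) * sum(scaled) / len(scaled)
    return objective


# Made-up values: densities in g/mL, enthalpies of mixing in kJ/mol
estimated = {"Density": [1.1], "EnthalpyOfMixing": [3.2]}
reference = {"Density": [1.0], "EnthalpyOfMixing": [1.6]}
weights = {"Density": 1.0, "EnthalpyOfMixing": 1.0}
denominators = {"Density": 0.05, "EnthalpyOfMixing": 1.6}

result = weighted_objective(estimated, reference, weights, denominators)
print(result)  # approximately 2.5
```

With these denominators, a 0.1 g/mL density error is two denominators away from the reference and contributes four times as much as a 1.6 kJ/mol enthalpy of mixing error, even though both properties carry equal weight.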

In [None]:
from forcebalance.evaluator_io import Evaluator_SMIRNOFF
from openff.units import unit
from openff.evaluator.client import RequestOptions

options = Evaluator_SMIRNOFF.OptionsFile()
# set the local path to our target dataset
options.data_set_path = "toy-dataset.json"
# set the weights of the target properties to be equal
options.weights = {"Density": 1.0, "EnthalpyOfMixing": 1.0}
# set the property scale factors (denominators)
options.denominators = {
    "Density": 0.05 * unit.grams / unit.millilitre,
    "EnthalpyOfMixing": 1.6 * unit.kilojoule / unit.mole,
}
# choose which calculation layers to use: here, direct simulation only
evaluator_options = RequestOptions()
evaluator_options.calculation_layers = ["SimulationLayer"]
options.estimation_options = evaluator_options
# write the file to the targets folder
with open(os.path.join("targets", "phys-prop", "options.json"), "w") as output:
    output.write(options.to_json())

## Starting Force Field Parameters

We need to construct a starting DE-FF from which to run the parameter optimisation. We will use the included `build_initial_ff.py` script to build a DE-FF with an optimised TIP4P water model; from this starting point we can optimise the non-bonded parameters of the non-water types against the target data.

In [None]:
# run the script to build our initial ff
! python ../../../scripts/build_initial_ff.py
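The DE-FF uses a double-exponential potential in place of the Lennard-Jones 12-6 form, which is why each non-bonded type carries an `epsilon` (well depth) and `r_min` (position of the minimum). The sketch below assumes the standard double-exponential form with handler-level `alpha` and `beta` held fixed; the `alpha`/`beta` defaults and the example parameters are illustrative only and are not read from `double-exp-ff.offxml`.

```python
import math


def double_exponential(r, epsilon, r_min, alpha=18.7, beta=3.3):
    """Double-exponential pair energy (alpha and beta here are illustrative only).

    U(r) = epsilon / (alpha - beta) * (beta * exp(alpha * (1 - r / r_min))
                                       - alpha * exp(beta * (1 - r / r_min)))
    """
    x = r / r_min
    repulsion = beta * math.exp(alpha * (1.0 - x))
    attraction = alpha * math.exp(beta * (1.0 - x))
    return epsilon / (alpha - beta) * (repulsion - attraction)


# Hypothetical parameters: epsilon in kJ/mol, r_min in nm
epsilon, r_min = 0.5, 0.35
for r in (0.30, 0.35, 0.50):
    print(f"r = {r:.2f} nm  U = {double_exponential(r, epsilon, r_min):+.4f} kJ/mol")
```

At `r = r_min` the two terms reduce to `epsilon * (beta - alpha) / (alpha - beta) = -epsilon`, so `epsilon` and `r_min` keep the same physical meaning they have in Lennard-Jones, while `alpha` and `beta` control how steep the repulsive and attractive walls are.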

Next, load the force field and label the molecule we will train on to get the list of parameters to optimise.

In [None]:
from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.topology import Molecule

de_ff = ForceField("double-exp-ff.offxml", load_plugins=True, allow_cosmetic_attributes=True)
# create a molecule object for our target molecule; water is skipped because its parameters have already been optimised
target_mol = Molecule.from_smiles("OC1=NCCC1")
# label the atoms with the non-bonded smirks types
de_labels = de_ff.label_molecules(topology=target_mol.to_topology())[0]["DoubleExponential"]

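# collect the unique SMIRKS patterns to target
target_smirks = set()
for parameter in de_labels.values():
    target_smirks.add(parameter.smirks)

# if using OpenMM < 7.7, keep `[#1:1]-[#8]` fixed by removing it from the targets
# (`discard` is used so this does not fail if the pattern is absent)
target_smirks.discard("[#1:1]-[#8]")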
target_smirks


Now that we have the list of non-bonded SMIRKS types exercised by this dataset, we can tag them so that ForceBalance knows which parameters need gradients during the optimisation.

In [None]:
de_handler = de_ff.get_parameter_handler("DoubleExponential")

for smirks in target_smirks:
    de_parameter = de_handler[smirks]
    de_parameter.add_cosmetic_attribute("parameterize", "epsilon, r_min")

# write the tagged force field to the forcefield directory
de_ff.to_file(os.path.join("forcefield", "force-field.offxml"))
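Before starting a long optimisation, it can be worth checking that the `parameterize` tags actually made it into the written file. The helper below is a hypothetical convenience function using only the standard library; it scans every element for a `parameterize` attribute, so it does not depend on the element name used inside the `DoubleExponential` section. The XML fragment is a made-up, heavily trimmed stand-in for a real OFFXML file.

```python
import xml.etree.ElementTree as ET


def tagged_parameters(root):
    """Return {smirks: parameterize} for every element that has a parameterize attribute."""
    return {
        element.get("smirks"): element.get("parameterize")
        for element in root.iter()
        if element.get("parameterize") is not None
    }


# Hypothetical, trimmed OFFXML fragment used only for illustration
example = """<SMIRNOFF version="0.3">
  <DoubleExponential version="0.3">
    <Atom smirks="[#6X4:1]" epsilon="0.1 * mole**-1 * kilocalorie" r_min="3.8 * angstrom" parameterize="epsilon, r_min"/>
    <Atom smirks="[#1:1]-[#8]" epsilon="0.01 * mole**-1 * kilocalorie" r_min="1.0 * angstrom"/>
  </DoubleExponential>
</SMIRNOFF>"""

print(tagged_parameters(ET.fromstring(example)))
```

For the real file, something like `tagged_parameters(ET.parse(os.path.join("forcefield", "force-field.offxml")).getroot())` should list exactly the SMIRKS patterns in `target_smirks`.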

## Evaluator Server

With the ForceBalance inputs created, we can now start the OpenFF-Evaluator server to which ForceBalance will send its estimation requests. For more information on how this works, see the second Evaluator [tutorial](https://docs.openforcefield.org/projects/evaluator/en/stable/tutorials/tutorial02.html).

In [None]:
# Launch the calculation backend which will distribute any calculations.
from openff.evaluator.backends import ComputeResources
from openff.evaluator.backends.dask import DaskLocalCluster

# optionally pin the worker to a specific GPU (here the first one)
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"

calculation_backend = DaskLocalCluster(
    number_of_workers=1,
    resources_per_worker=ComputeResources(
        number_of_threads=1,
        number_of_gpus=1,
        preferred_gpu_toolkit=ComputeResources.GPUToolkit.CUDA,
    ),
)
calculation_backend.start()

# Launch the server object which will listen for estimation requests and schedule any
# required calculations.
from openff.evaluator.server import EvaluatorServer

evaluator_server = EvaluatorServer(calculation_backend=calculation_backend)
evaluator_server.start(asynchronous=True)

## Run ForceBalance 

With everything created and the Evaluator server waiting for requests, we can start the optimisation with a single command. Note that the ForceBalance control file (`optimize.in`) is provided with this example; for more information on the available options, see the ForceBalance [documentation](http://leeping.github.io/forcebalance/doc/ForceBalance-Manual.pdf).
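ForceBalance reads force field files from `forcefield/` and each target from its own subdirectory of `targets/`, so a missing file usually only surfaces after the run has started. The check below is a hypothetical pre-flight helper that lists which of the files written in this notebook (plus the provided `optimize.in`) are absent. It assumes the target in `optimize.in` is named `phys-prop` and that the toy dataset is used; adjust `EXPECTED_INPUTS` if you point `options.data_set_path` at the full dataset.

```python
from pathlib import Path

# Files this notebook writes, plus the provided control file (toy-dataset run assumed)
EXPECTED_INPUTS = [
    Path("optimize.in"),
    Path("forcefield") / "force-field.offxml",
    Path("targets") / "phys-prop" / "options.json",
    Path("targets") / "phys-prop" / "toy-dataset.json",
]


def missing_inputs(root="."):
    """Return the expected input paths that are not regular files under root."""
    root = Path(root)
    return [str(path) for path in EXPECTED_INPUTS if not (root / path).is_file()]


missing = missing_inputs()
print("all inputs present" if not missing else f"missing: {missing}")
```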

In [None]:
! ForceBalance optimize.in 