Decouple typing from system generation #748

mattwthompson · 2020-10-21T17:31:59Z

Is your feature request related to a problem? Please describe.

Each parameter handler follows a similar pattern that can be reduced to the following steps:

1. Find matches via self.find_matches, which returns key-value pairs of "slots" and ParameterType objects. For example, one such pair could be a bond between atoms (4, 5) and a BondType with some values for k and length.
2. Gobble up the data in the above pairs, produce OpenMM forces that encode the physics of each ParameterType object, associate them with each "slot", and shove them into the OpenMM system.

Right now, there's not a high-level API call that will do (1) for each handler without also doing (2). Most of the time you want to do both, but there are cases in which the OpenMM step is a waste.

- Creating a non-OpenMM system, in which the output of (1) is desired but (2) is not used.
- Iterating through a large set of molecules and checking for typing coverage, without intending to run simulations or calculations.
- Finding out what value of a parameter (e.g. an interpolated torsion force) would be applied to a particular part of a topology, without doing anything with it past that.
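A minimal sketch of the coverage-checking case, assuming toy stand-ins rather than the toolkit's real classes: `find_matches_toy`, `uncovered_slots`, and the data below are hypothetical, and matching is reduced to a dictionary lookup on element pairs instead of SMIRKS. It only illustrates that coverage can be judged from the output of (1), with no OpenMM objects created.

```python
# Toy illustration only: names and data are hypothetical, not the toolkit API.

def find_matches_toy(bonds, bond_types):
    """Step (1) only: map each bond "slot" to a parameter, or None if untyped."""
    matches = {}
    for atom_indices, elements in bonds.items():
        key = tuple(sorted(elements))
        matches[atom_indices] = bond_types.get(key)
    return matches


def uncovered_slots(matches):
    """Return the slots for which no parameter was found."""
    return sorted(slot for slot, param in matches.items() if param is None)


# Hypothetical parameters keyed by a sorted element pair.
bond_types = {("C", "H"): {"k": 680.0, "length": 1.09}}

# Hypothetical molecule: bonds keyed by atom indices, with the bonded elements.
bonds = {(0, 1): ("C", "H"), (0, 2): ("H", "C"), (0, 3): ("C", "O")}

matches = find_matches_toy(bonds, bond_types)
missing = uncovered_slots(matches)
print(missing)
```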

Describe the solution you'd like
Right now, find_matches happens inside create_force, which couples the typing step to the population of the OpenMM system. I'm proposing separating this step out into its own function that either returns the result of find_matches or caches it inside the handler (and, therefore, the ForceField object). This would ideally also involve a high-level function like ForceField.find_matches that does this for each handler. In order to not break the API, create_force would either call that function or check whether it was already called and its result cached. (if self._matches is not None: ...)
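A rough sketch of the shape this could take, using toy classes only: `ToyHandler`, `ToyForceField`, the plain dict standing in for an OpenMM System, and the list-of-slots "topology" are all assumptions for illustration and not the toolkit's real ParameterHandler or ForceField.

```python
# Toy illustration only: these classes mimic the shape of the proposal and are
# not the toolkit's real ParameterHandler or ForceField.

class ToyHandler:
    def __init__(self, name, parameters):
        self.name = name
        self.parameters = parameters  # hypothetical: slot -> parameter values
        self._matches = None          # cache for the typing step
        self.match_calls = 0          # counts how often typing actually runs

    def find_matches(self, topology):
        """Step (1): typing only. Caches and returns slot -> parameter."""
        self.match_calls += 1
        self._matches = {
            slot: self.parameters[slot] for slot in topology if slot in self.parameters
        }
        return self._matches

    def create_force(self, system, topology):
        """Step (2): reuse cached matches if present, else run typing first."""
        if self._matches is None:
            self.find_matches(topology)
        system[self.name] = dict(self._matches)


class ToyForceField:
    def __init__(self, handlers):
        self.handlers = handlers

    def find_matches(self, topology):
        """High-level typing call: handlers at the top level of the result."""
        return {h.name: h.find_matches(topology) for h in self.handlers}

    def create_system(self, topology):
        system = {}  # stand-in for an OpenMM System
        for handler in self.handlers:
            handler.create_force(system, topology)
        return system


bonds = ToyHandler("Bonds", {(4, 5): {"k": 500.0, "length": 1.5}})
force_field = ToyForceField([bonds])
topology = [(4, 5), (5, 6)]  # hypothetical list of slots

typed = force_field.find_matches(topology)    # typing only
system = force_field.create_system(topology)  # reuses the cached matches
```

One caveat with this toy version: the cache is not tied to a particular topology, so a real implementation would presumably need to invalidate or key it when the topology changes.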

Describe alternatives you've considered
This can be done via monkey-patching and/or some reaching into the find_matches call inside each handler. These are not internal methods and it's not improper to access them.

ForceField.label_molecules is very similar to what I'm after here, especially for the case of a single-molecule topology, but it doesn't return data in the right shape: it returns a hierarchy that has molecules at the top, whereas I'm proposing something that has handlers at the top. Also, it is not used by create_openmm_system, whereas my idea is to add a function that could be.
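To make the difference in shape concrete, here is a small sketch that regroups molecules-at-the-top data into handlers-at-the-top data. The input layout (one dict per molecule, keyed by handler name) and the string placeholders for ParameterType objects are assumptions for illustration; `handlers_at_top` is a hypothetical helper, not an existing function.

```python
from collections import defaultdict


def handlers_at_top(molecule_labels):
    """Regroup per-molecule labels into {handler: {(molecule_index, slot): parameter}}."""
    by_handler = defaultdict(dict)
    for molecule_index, labels in enumerate(molecule_labels):
        for handler_name, matches in labels.items():
            for slot, parameter in matches.items():
                by_handler[handler_name][(molecule_index, slot)] = parameter
    return dict(by_handler)


# Hypothetical molecules-at-the-top data; strings stand in for ParameterType objects.
molecule_labels = [
    {"Bonds": {(0, 1): "b1"}, "Angles": {(1, 0, 2): "a1"}},
    {"Bonds": {(0, 1): "b2"}},
]

regrouped = handlers_at_top(molecule_labels)
```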

Additional context

This seems straightforward enough in simple cases, but most of these handlers have some tidying up and sanity checks between steps (1) and (2). This may not be tractable for the tricky cases; I have not dug deeply enough into each one to check for blockers. For example, torsions may return different data (ParameterType objects or the relevant data) based on whether or not interpolation is used.

I believe #619 is related and effectively a narrower case of what I'm getting at here.

Happy to be corrected if I'm misunderstanding label_molecules and/or this is already captured somewhere!


j-wags · 2020-10-21T19:13:06Z

> Each parameter handler follows a similar pattern that can be reduced to the following steps:

Agreed, though as you note below, #619 indicates that there's a hidden "step 0" that does Topology modification [1].

> ForceField.label_molecules is very similar to what I'm after here

Let's distinguish between three ways OpenFF System population could work:

1. ForceField.create_system(Topology): An OFFTK ForceField receives an empty OFFSystem (or PotentialHandler) object and a Topology and calls methods of each to populate the desired parameters.
2. label_molecules-ish: An OFFTK ForceField or ParameterHandler is passed a Topology and produces a hierarchical dictionary, which is then ingested by an OFFSystem to populate itself.
3. System.parametrize(ForceField, Topology): An OFFSystem has an OFFTK Topology and ForceField and calls OFFTK methods such as find_matches and label_molecules to populate itself.

All of these have the same logic running; it's just a matter of where the code goes and which intermediate structures are used. I'm in favor of 1) in the long run, though I could envision that rapid iteration on the System object might favor pursuing 3) for a while. The second seems like the worst of all worlds [2] and I don't think we want it.

If we go with option 1, we'll want to start a branch of OFFTK called something like create_off_system, where each ParameterHandler will get a new create_potential_handler method. These new methods will be populated with code that's made to interface with the OFFSystem API.
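As a very rough sketch of what option 1 could look like, with every class here being a hypothetical toy (`ToySystem`, `ToyBondHandler`, `ToyForceField`) and the OFFSystem API reduced to a dict of per-handler containers:

```python
# Toy illustration only: ToySystem, ToyBondHandler, and ToyForceField are hypothetical.

class ToySystem:
    """Stand-in for an OFFSystem: one container of potentials per handler."""

    def __init__(self):
        self.potential_handlers = {}


class ToyBondHandler:
    name = "Bonds"

    def __init__(self, parameters):
        self.parameters = parameters  # hypothetical: slot -> parameter values

    def create_potential_handler(self, topology, system):
        """Populate the system with this handler's matches; no OpenMM involved."""
        system.potential_handlers[self.name] = {
            slot: self.parameters[slot] for slot in topology if slot in self.parameters
        }


class ToyForceField:
    def __init__(self, handlers):
        self.handlers = handlers

    def create_system(self, topology):
        """Option 1: the force field drives population of an empty system."""
        system = ToySystem()
        for handler in self.handlers:
            handler.create_potential_handler(topology, system)
        return system


force_field = ToyForceField([ToyBondHandler({(4, 5): {"k": 500.0, "length": 1.5}})])
system = force_field.create_system([(4, 5), (5, 6)])
```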

[1] The case for a separate ParameterHandler.modify_topology step:
Edge cases to consider, revolving around Topology modification

Assignment of partial charges/WBOs to reference molecules
Creation of vsites during parameter application

[2] I think the "label_molecules-ish" solution is very bad, since we'll encounter the "SMIRNOFF data" design problems again, where we have a hierarchical dictionary that pretends it doesn't need any formal structure, but is actually just a standardless data model that causes complexity at all its interfaces.

mattwthompson added the api extension label Oct 21, 2020

j-wags added the under discussion label Oct 26, 2020

mattwthompson mentioned this issue Oct 26, 2020

Add coverage canary tests openforcefield/openff-forcefields#27

Merged

mattwthompson mentioned this issue Mar 25, 2021

Design: Using pre-defined data on input molecules openforcefield/openff-interchange#141

Open

mattwthompson mentioned this issue Jan 18, 2022

Fix topology serialization / reinitialization #1174

Merged


mattwthompson added the Close when Interchange backend is used label Jun 29, 2022

mattwthompson closed this as completed Aug 31, 2022
