Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple typing from system generation #748

Closed
mattwthompson opened this issue Oct 21, 2020 · 1 comment
Closed

Decouple typing from system generation #748

mattwthompson opened this issue Oct 21, 2020 · 1 comment

Comments

@mattwthompson
Copy link
Member

Is your feature request related to a problem? Please describe.

Each parameter handler follows a similar pattern that can be reduced to the following steps:

  1. Find matches via self.find_matches, which returns key-val pairs of "slots" and ParameterType objects. For example, one such pair could be a bond between atoms (4, 5) and a BondType with some values for k and length.
  2. Gobble up the data in the above pairs and produce OpenMM forces that encode the physics of each ParameterType object and associated them in each "slot" and shoves them into the OpenMM system.

Right now, there's not a high-level API call that will do (1) for each handler without also doing (2). Most of the time you want to do both, but there are cases in which the OpenMM step is a waste.

  • Creating a non-OpenMM system, in which the output of (1) is desired but (2) not used.
  • Iterating through a large set of molecules and checking for typing coverage, but not intending to run simulations or calculations
  • Finding out what value of a parameter (i.e. interpolated torsion force) would be applied to a particular part of a topology, but not doing anything with it past that

Describe the solution you'd like
Right now, find_matches happens inside create_force, which somewhat couples the typing step to the population of the OpenMM system. I'm proposing separating this step out into a separate function that either returns the result of find_matches or caches it inside the handler (and, thereforce, the ForceField object). This would ideally also involve a high-level function like ForceField.find_matches that does this for each handler. In order to not break the API for this, create_force would either call that function or look to see if it was already called and its result cached. (if self._matches is not None: ...)

Describe alternatives you've considered
This can be done via monkey-patching and/or some reaching into the find_matches call inside each handler. These are not internal methods and it's not improper to access them.

ForceField.label_molecules is very similar to what I'm after here, especially for the case of single-molecule topology, but it doesn't return data in the right shape - it returns a hierarchy that has molecules at the top, whereas I'm proposing something that has handlers at the top. Also, this is not used by create_openmm_system, whereas my idea is to add a function that could be.

Additional context

This seems straightforward enough in simple cases, but most of these handlers have some tidying up and sanity checks between step (1) and (2). This may not be tractable for the tricky cases, I have not dug too deeply into each one to check for blockers. For example, torsions may return different data (ParameterType objects or the relevant data) based on whether or not interpolation is used.

I believe #619 is related and effectively a narrower case of what I'm getting at here.

Happy to be corrected if I'm misunderstanding label_molecules and/or this is already captured somewhere!

@j-wags
Copy link
Member

j-wags commented Oct 21, 2020

Each parameter handler follows a similar pattern that can be reduced to the following steps:

Agreed, though as you note below, #619 indicates that there's a hidden "step 0" that does Topology modification [1].

ForceField.label_molecules is very similar to what I'm after here

Let's distinguish between three ways OpenFF System population could work:

  1. ForceField.create_system(Topology): An OFFTK ForceField receives an empty OFFSystem (or PotentialHandler) object and a Topology and calls methods of each to populate the desired parameters
  2. label_molecules-ish: An OFFTK ForceField or ParameterHandler is passed a Topology and produces a hierarchical dictionary, which is then ingested by an OFFSystem to populate itself.
  3. System.parametrize(ForceField, Topology): An OFFSystem has a OFFTK Topology and ForceField and calls OFFTK methods such as find_matches and label_molecules to populate itself.

All of these have the same logic running, it's just a matter of where the code goes and which intermediate structures are used. I'm in favor of 1) in the long run, though I could envision that rapid iteration on the System object might favor pursing 3) for a while. The second seems like the worst of all worlds [2] and I don't think we want it.

If we go with option 1, we'll want to start a branch of OFFTK called something like create_off_system, where each ParameterHandler will get a new create_potential_handler method. These new methods will be populated with code that's made to interface with the OFFSystem API.


[1] The case for a separate ParameterHandler.modify_topology step:
Edge cases to consider, revolving around Topology modification

  • Assignment of partial charges/WBOs to reference molecules
  • Creation of vsites during parameter application

[2] I think the "label_moleculesish" solution is very bad, since we'll encounter the "SMIRNOFF data" design problems again, where we have a hierarchical dictionary that pretends it doesn't need any formal structure, but is actually just a standardless data model that causes complexity at all its interfaces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants