Draft of proposed spec #29

mattwthompson · 2020-10-14T21:34:36Z

Description

This PR aims to materialize a specification document for this project.

codecov · 2020-10-14T21:37:50Z

Codecov Report

Merging #29 (b93d5cd) into master (c0171c4) will not change coverage.
The diff coverage is n/a.

j-wags · 2020-10-14T21:43:59Z

spec/The-OpenFF-System.md

+| Bonds | Bond constraints
+| | Fractional bond orders
+| Iterators for other valences | Residue/chain/segments
+| Periodicity boolean | Box vectors


Let's communicate that we're flexible on whether the box vectors could be part of the coordinates.

I see them as separate silos of data (them being in the same row here is just a coincidence)

j-wags · 2020-10-14T21:45:57Z

spec/The-OpenFF-System.md

+independent_variables={"x"}) potential_handler.expression
+```
+
+| MUST | MAY |  MUST NOT |


I really like this style of table.

j-wags · 2020-10-14T22:14:43Z

spec/The-OpenFF-System.md

+
+### User control over system combination
+
+A plug-in architecture will expose settings for the level of rigour executed in the internal consistency checks. The default will be strict, resulting in errors if there is any ambiguity in the physical description of each system (i.e. different non-bonded cutoff treatments). A small number (1-2) of other settings with more granularity will be exposed, i.e. one that is problematively permissive and another that implements some amount of "reasonable" discrepancies to be fudged together. This will all be constructed in a way that enables users to define their own sets of "knobs" for when system combination can and cannot be allowed, i.e. if aromaticity models need not match but cut-off treatments must.


Maybe this is scope creep so early in the process, but I think the + operator is going to be a critically important part of the implementation, since we'll be able to reuse it during system combination, modification, and splitting/subtraction (if we support it). So I wonder if we want more depth about its behavior. Something like:

Use of the + operator...

* MUST result in a physically valid system (able to consume coordinates and calculate energies)
* MUST preserve the order of particles in both systems
* MUST attempt to use all SystemCombiners registered to the System, in the order they were registered, unless one raises a specific subclass of Exception that calls for an early halt
* MAY use multiple SystemCombiners in order to fully cover the different components of the input Systems
* MAY provide non-commutative behavior (nonbonded cutoff might be taken from the system on the left)
* MAY accept objects other than OpenFF systems (like ParmEd Structures)
* MAY raise an error on atom typing collisions (maybe have a separate registry of TopologyCombiners?)
* MUST NOT, if both input Systems are OpenFF Systems, produce a new Topology that contains different chemical species than were in the original inputs
* MUST NOT perform in-place modifications on either input System
* MUST NOT (?) "add" information that wasn't present in either System (for example, adding a cutoff where neither input specified one)
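
A few of the rules above can be sketched in code. This is only an illustration under stated assumptions: the `System` class here is a toy stand-in, and `HaltCombination`, `combiners`, and `left_cutoff_wins` are hypothetical names that do not come from the spec or any existing OpenFF API. It shows particle order being preserved, inputs being left unmodified, combiners being tried in registration order, and a non-commutative cutoff rule that never invents a value.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


class HaltCombination(Exception):
    """Hypothetical exception a combiner raises to request an early halt."""


@dataclass
class System:
    """Toy stand-in for an OpenFF System; the fields are illustrative only."""

    particles: list
    cutoff: Optional[float] = None
    combiners: list = field(default_factory=list)

    def __add__(self, other):
        if not isinstance(other, System):
            return NotImplemented
        # Copy both inputs so neither is modified in place.
        left, right = copy.deepcopy(self), copy.deepcopy(other)
        # Preserve particle order: everything from the left, then the right.
        result = System(left.particles + right.particles,
                        combiners=list(self.combiners))
        # Try every registered combiner, in registration order.
        for combiner in self.combiners:
            try:
                combiner(left, right, result)
            except HaltCombination:
                break
        return result


def left_cutoff_wins(left, right, result):
    """Non-commutative rule: prefer the left cutoff, never invent one."""
    result.cutoff = left.cutoff if left.cutoff is not None else right.cutoff


# Hypothetical inputs; the cutoff values carry no particular units here.
water = System(["O", "H", "H"], cutoff=0.9, combiners=[left_cutoff_wins])
ion = System(["Na"], cutoff=1.2)
combined = water + ion
```

In this sketch the result takes its cutoff from the left operand, so `water + ion` and `ion + water` can differ, which is the kind of non-commutative behavior the MAY rule allows.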

mrshirts · 2020-10-15T02:08:54Z

spec/The-OpenFF-System.md

+
+System parameters (force field parameters applied to a chemical topology) are represented as the sum of individual components (`PotentialHandler`s). Teach term in a potential energy function is expected to be captured by a `PotentialHandler` or combination thereof. These closely mirror the `ParameterHandler`s in OpenFF Toolkit, and may merge in the future.
+
+Each `PotentialHandler` subclass must specify an string-like `expression` that encodes the algebra of its energy evaluation and a collection of `independent_variables` that specify which variables in the expression do not need to be specified by system parameters. The remaining variables are then expected to be specified in a sequence of `Potential` objects stored in the handler.


Are we sure that everything can be expressed as string-like, rather than a function that gets evaluated a number of times? "PME electrostatics short range" is technically a very complicated function, and expressing it as a string could be problematic. Machine learning potentials are going to be super hard if not impossible to express as strings.

Can we have string constants, that define the algebra elsewhere? Is that a good or bad idea?

I do think we'll need a few different flavors of this class, including one for tabulated potentials that do not care about any algebraic form.
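
For reference, the string-based requirement in the quoted diff could look roughly like the sketch below. It assumes the `expression` is parseable as a Python expression, which the draft does not state, and `parameter_names` and `validate` are hypothetical helper names. It only covers the algebraic case and does nothing for PME, machine-learned, or tabulated potentials.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class PotentialHandler:
    """Toy handler: an algebraic expression plus per-interaction parameters."""

    expression: str
    independent_variables: set
    # Each potential is modeled as a plain dict of parameter name -> value.
    potentials: list = field(default_factory=list)

    def parameter_names(self):
        """Names in the expression that system parameters must supply."""
        tree = ast.parse(self.expression, mode="eval")
        # Note: function names such as `exp` would also be collected here.
        names = {node.id for node in ast.walk(tree)
                 if isinstance(node, ast.Name)}
        return names - set(self.independent_variables)

    def validate(self):
        """Raise if any stored potential lacks a required parameter."""
        required = self.parameter_names()
        for index, potential in enumerate(self.potentials):
            missing = required - set(potential)
            if missing:
                raise ValueError(
                    f"potential {index} is missing {sorted(missing)}")


# Hypothetical harmonic bond term with illustrative parameter values.
bonds = PotentialHandler(
    expression="1/2 * k * (x - length) ** 2",
    independent_variables={"x"},
    potentials=[{"k": 500.0, "length": 1.09}],
)
required = bonds.parameter_names()
bonds.validate()
```

Here everything in the expression other than `x` is treated as a parameter that each stored potential has to provide.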

mrshirts · 2020-10-15T02:13:15Z

spec/The-OpenFF-System.md

+
+System parameters (force field parameters applied to a chemical topology) are represented as the sum of individual components (`PotentialHandler`s). Teach term in a potential energy function is expected to be captured by a `PotentialHandler` or combination thereof. These closely mirror the `ParameterHandler`s in OpenFF Toolkit, and may merge in the future.
+
+Each `PotentialHandler` subclass must specify an string-like `expression` that encodes the algebra of its energy evaluation and a collection of `independent_variables` that specify which variables in the expression do not need to be specified by system parameters. The remaining variables are then expected to be specified in a sequence of `Potential` objects stored in the handler.


Can we have string constants, that define the algebra elsewhere? Is that a good or bad idea?

mrshirts · 2020-10-15T02:14:14Z

spec/The-OpenFF-System.md

+
+## Features
+
+### Tracking parameter sources


Can you say a little more about what the data structures are that store/represent provenance?

mrshirts · 2020-10-15T02:17:48Z

spec/The-OpenFF-System.md

+
+### User control over system combination
+
+A plug-in architecture will expose settings for the level of rigour executed in the internal consistency checks. The default will be strict, resulting in errors if there is any ambiguity in the physical description of each system (i.e. different non-bonded cutoff treatments). A small number (1-2) of other settings with more granularity will be exposed, i.e. one that is problematively permissive and another that implements some amount of "reasonable" discrepancies to be fudged together. This will all be constructed in a way that enables users to define their own sets of "knobs" for when system combination can and cannot be allowed, i.e. if aromaticity models need not match but cut-off treatments must.


Maybe there can be compositions of functions? It would be a pain to write out different van der Waals, PME electrostatic short range, etc. terms that each had to reimplement a tapered cutoff to multiply by. It seems like that should only be written once.

Also, is it possible to recognize when two functions are equivalent? There are at least 2 different ways of doing LJ, harmonic force constants can be defined with or without the 1/2, and RB and periodic sums of torsions can be equivalent.
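
Both ideas can be sketched with plain closures. This is an illustration only: the function names are hypothetical, the taper is one common quintic polynomial chosen for the example rather than anything the spec prescribes, and the equivalence check is numerical sampling, not a symbolic proof.

```python
import math


def lj_epsilon_sigma(epsilon, sigma):
    """Lennard-Jones as 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    def energy(r):
        sr6 = (sigma / r) ** 6
        return 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def lj_a_b(a, b):
    """Lennard-Jones as A/r**12 - B/r**6."""
    def energy(r):
        return a / r**12 - b / r**6
    return energy


def with_switch(potential, r_switch, r_cut):
    """Wrap any pair potential in a taper that is written only once."""
    def energy(r):
        if r <= r_switch:
            return potential(r)
        if r >= r_cut:
            return 0.0
        t = (r - r_switch) / (r_cut - r_switch)
        # Quintic taper: equals 1 at t=0 and 0 at t=1.
        s = 1.0 - 10.0 * t**3 + 15.0 * t**4 - 6.0 * t**5
        return s * potential(r)
    return energy


def numerically_equivalent(f, g, points, rel_tol=1e-9):
    """Compare two functional forms on sample points (not a proof)."""
    return all(math.isclose(f(r), g(r), rel_tol=rel_tol, abs_tol=1e-12)
               for r in points)


# Hypothetical parameter values, chosen only to exercise the functions.
eps, sigma = 0.5, 0.3
form_1 = lj_epsilon_sigma(eps, sigma)
form_2 = lj_a_b(4.0 * eps * sigma**12, 4.0 * eps * sigma**6)
same = numerically_equivalent(form_1, form_2, [0.3, 0.35, 0.5, 0.8])
switched = with_switch(form_1, r_switch=0.8, r_cut=0.9)
```

The same `with_switch` wrapper could be applied to any other pair term, which is the "written once" property asked about above.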

mrshirts · 2020-10-15T02:18:38Z

spec/The-OpenFF-System.md

+## Relevant edge cases
+
+* Conception of "pre-" and "post-typing" topologies
+* Do virtual sites go in the topology, a special "post-typing" topology, or should they be computed on-the-fly as needed?


My instinct is that they belong in the topology: it defines the geometry and is affected by parameterization.

mrshirts · 2020-10-15T02:19:13Z

spec/The-OpenFF-System.md

+* Allowing/forbidding/tracking/fixing "dirty" states
+* Dealing with mal-formed files or those that play fast-and-loose with specifications (MOL2, PDB, etc.)
+* Safely supporting alchemical mutations
+* Tracking alchemical mutations (safely storing a diff?)


Could literally be arrays of parameters at each site?
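
One possible reading of this suggestion, sketched under assumptions: each site stores one value per alchemical state for every parameter, and the "diff" between two states is derived from those arrays rather than stored separately. The layout, the `state_diff` name, and the numbers are all hypothetical.

```python
# Each site maps parameter name -> list of values, one per alchemical state.
# Values are illustrative only.
sites = [
    {"charge": [-0.8, 0.0], "epsilon": [0.6, 0.6]},
    {"charge": [0.4, 0.0], "epsilon": [0.0, 0.0]},
]


def state_diff(sites, state_a, state_b):
    """Return {site index: {parameter: (value_a, value_b)}} for changes."""
    diff = {}
    for index, params in enumerate(sites):
        changed = {
            name: (values[state_a], values[state_b])
            for name, values in params.items()
            if values[state_a] != values[state_b]
        }
        if changed:
            diff[index] = changed
    return diff


mutation = state_diff(sites, 0, 1)
```

With this layout nothing is overwritten when moving between states, so the original parameters stay recoverable.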

mrshirts · 2020-10-15T02:19:36Z

spec/The-OpenFF-System.md

+* Dealing with mal-formed files or those that play fast-and-loose with specifications (MOL2, PDB, etc.)
+* Safely supporting alchemical mutations
+* Tracking alchemical mutations (safely storing a diff?)
+* How to handle polarizability


Well, it is just a functional form.

DOC: Add initial, incomplete, early draft of spec

72aed2a

j-wags reviewed Oct 14, 2020

mrshirts reviewed Oct 15, 2020

mrshirts reviewed Oct 15, 2020

mattwthompson mentioned this pull request Oct 23, 2020

Where should positions go? #33

Closed

mattwthompson added 4 commits December 3, 2020 17:17

Merge remote-tracking branch 'upstream/master' into spec

e3da6cc

DOC: Updates to spec draft

92f53d3

Merge remote-tracking branch 'upstream/master' into spec

7a25c5b

DOC: Small updates to spec

babc52f

mattwthompson closed this Nov 8, 2021

mattwthompson deleted the spec branch May 23, 2022 14:34
