Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft of proposed spec #29

Closed
wants to merge 5 commits into from
Closed

Draft of proposed spec #29

wants to merge 5 commits into from

Conversation

mattwthompson
Copy link
Member

Description

This PR aims to materialize a specification document for this project.

@codecov
Copy link

codecov bot commented Oct 14, 2020

Codecov Report

Merging #29 (b93d5cd) into master (c0171c4) will not change coverage.
The diff coverage is n/a.

| Bonds | Bond constraints
| | Fractional bond orders
| Iterators for other valences | Residue/chain/segments
| Periodicity boolean | Box vectors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's communicate that we're flexible on whether the box vectors could be part of the coordinates.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see them as separate silos of data (them being in the same row here is just a coincidence)

independent_variables={"x"}) potential_handler.expression
```

| MUST | MAY | MUST NOT |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this style of table.


### User control over system combination

A plug-in architecture will expose settings for the level of rigour executed in the internal consistency checks. The default will be strict, resulting in errors if there is any ambiguity in the physical description of each system (i.e. different non-bonded cutoff treatments). A small number (1-2) of other settings with more granularity will be exposed, i.e. one that is problematively permissive and another that implements some amount of "reasonable" discrepancies to be fudged together. This will all be constructed in a way that enables users to define their own sets of "knobs" for when system combination can and cannot be allowed, i.e. if aromaticity models need not match but cut-off treatments must.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is scope creep so early in the process, but I think the + operator is going to be a critically important part of the implementation, since we'll be able to reuse it during system combination, modification, and splitting/subtraction (if we support it). So I wonder if we want more depth about its behavior. Like

Use of the + operator...
MUST result in a physically valid system (able to consume coordinates and calculate energies)
MUST preserve the order of particles in both systems
MUST attempt to use all SystemCombiners registered to the System, in the order they were registered, unless one raises a specific subclass of Exception that calls for an early halt.
MAY use multiple SystemCombiners in order to fully cover the different components of the input Systems
MAY provide non-commutative behavior (nonbonded cutoff might be taken from the system on the left)
MAY accept objects other than OpenFF systems (like ParmEd Structures)
MAY raise an error on atom typing collisions (maybe have a separate registry of TopologyCombiners?)
MUST NOT, if both input Systems are OpenFF Systems, produce a new Topology that contains different chemical species than were in the original inputs
MUST NOT perform in-place modifications on either input System
MUST NOT (?) "add" information that wasn't present in either System (for example, adding a cutoff where neither input specified one)

spec/The-OpenFF-System.md Outdated Show resolved Hide resolved

System parameters (force field parameters applied to a chemical topology) are represented as the sum of individual components (`PotentialHandler`s). Teach term in a potential energy function is expected to be captured by a `PotentialHandler` or combination thereof. These closely mirror the `ParameterHandler`s in OpenFF Toolkit, and may merge in the future.

Each `PotentialHandler` subclass must specify an string-like `expression` that encodes the algebra of its energy evaluation and a collection of `independent_variables` that specify which variables in the expression do not need to be specified by system parameters. The remaining variables are then expected to be specified in a sequence of `Potential` objects stored in the handler.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that everything can be expressed as string-like, rather than a function that gets evaluated a number of times? "PME electrostatics short range" is technically a very complicated function, and expressing it as a string could be problematic. Machine learning potentials are going to be super hard if not impossible to express as strings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have string constants, that define the algebra elsewhere? Is that a good or bad idea?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think we'll need to have a few different flavors of this class - also for tabulated potentials that do not care about any algebraic form


System parameters (force field parameters applied to a chemical topology) are represented as the sum of individual components (`PotentialHandler`s). Teach term in a potential energy function is expected to be captured by a `PotentialHandler` or combination thereof. These closely mirror the `ParameterHandler`s in OpenFF Toolkit, and may merge in the future.

Each `PotentialHandler` subclass must specify an string-like `expression` that encodes the algebra of its energy evaluation and a collection of `independent_variables` that specify which variables in the expression do not need to be specified by system parameters. The remaining variables are then expected to be specified in a sequence of `Potential` objects stored in the handler.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have string constants, that define the algebra elsewhere? Is that a good or bad idea?


## Features

### Tracking parameter sources
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say a little more about what the data structures are that store/represent provenance?


### User control over system combination

A plug-in architecture will expose settings for the level of rigour executed in the internal consistency checks. The default will be strict, resulting in errors if there is any ambiguity in the physical description of each system (i.e. different non-bonded cutoff treatments). A small number (1-2) of other settings with more granularity will be exposed, i.e. one that is problematively permissive and another that implements some amount of "reasonable" discrepancies to be fudged together. This will all be constructed in a way that enables users to define their own sets of "knobs" for when system combination can and cannot be allowed, i.e. if aromaticity models need not match but cut-off treatments must.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there can be compositions of functions? It would be a pain to write out different van der Waals, PME electrostatic short range, etc. that each had to reimplement a tapered cutoff term to multiply by each term - seems like should only be written once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is it possible to recognize when two functions are equivalent (At least 2 different ways of doing LJ, harmonic force constants can be defined with or without 1/2, RB and periodic sums of torsion can be equivalent).

## Relevant edge cases

* Conception of "pre-" and "post-typing" topologies
* Do virtual sites go in the topology, a special "post-typing" topology, or should they be computed on-the-fly as needed?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My instinct is part of the topology, it defines the geometry, is affected by parameterization.

* Allowing/forbidding/tracking/fixing "dirty" states
* Dealing with mal-formed files or those that play fast-and-loose with specifications (MOL2, PDB, etc.)
* Safely supporting alchemical mutations
* Tracking alchemical mutations (safely storing a diff?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could literally be arrays of parameters at each site?

* Dealing with mal-formed files or those that play fast-and-loose with specifications (MOL2, PDB, etc.)
* Safely supporting alchemical mutations
* Tracking alchemical mutations (safely storing a diff?)
* How to handle polarizability
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it is just a functional form.

@mattwthompson mattwthompson deleted the spec branch May 23, 2022 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants