
Refactor to openff.units.elements #1182

Merged
merged 12 commits into topology-biopolymer-refactor on Feb 10, 2022

Conversation

@mattwthompson (Member) commented Jan 27, 2022

Resolves #1178 if accepted.

This PR proposes dropping the external elements package and removing the concept of atoms containing element objects; it seems that every use case can be covered by knowing only the atomic number (a minimal sketch follows the list below).

  • Atom.element is removed
  • Atom.element.symbol becomes Atom.symbol
  • Atom.element.mass becomes Atom.mass
  • Atom.element.atomic_number probably should never have been used, but was also probably always in sync with Atom.atomic_number
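A minimal sketch of the renamed accessors (not the toolkit's actual implementation), assuming the MASSES and SYMBOLS mappings in openff.units.elements are keyed by atomic number:

from openff.units.elements import MASSES, SYMBOLS

class Atom:
    def __init__(self, atomic_number: int):
        self._atomic_number = atomic_number

    @property
    def atomic_number(self) -> int:
        return self._atomic_number

    @property
    def symbol(self) -> str:
        # was Atom.element.symbol
        return SYMBOLS[self._atomic_number]

    @property
    def mass(self):
        # was Atom.element.mass; a unit-bearing quantity in daltons
        return MASSES[self._atomic_number]

carbon = Atom(6)
print(carbon.symbol, carbon.mass)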

Upsides:

  • No new dependencies
  • Quick import (~6 ms on my aging hardware), which could probably be optimized
  • Fast (< 100 ns) atomic mass and symbol lookups

Downsides:

  • Breaks the API (though we're allowed to for 0.11.0!), which was already going to happen for Atom.mass
  • Potentially too feature-light for future feature development (this is ambiguous)

Quick profiling script:

import time

# Import the parent package first so that only the elements module itself is timed
from openff.units import unit

start_time = time.time()
from openff.units.elements import MASSES, SYMBOLS

print(time.time() - start_time)  # ~6 ms on my aging hardware

  • Check examples - some are not passing on the refactor branch; it's not clear whether they should be fixed here or later while prepping an RC build
  • Tag issue being addressed
  • Update tests
  • Update docstrings/documentation, if applicable
  • Lint codebase
  • Update changelog

@openforcefield openforcefield deleted 3 comments from lgtm-com bot Jan 27, 2022
codecov bot commented Jan 27, 2022

Codecov Report

❗ No coverage uploaded for pull request base (topology-biopolymer-refactor@0beef75).
The diff coverage is n/a.

@mattwthompson (Member Author)

@j-wags it might take multiple iterations of review, but this is ready for a first pass - I hope it's substantial enough to make a decision on whether this is a direction we want to proceed in.

One new test (openff/toolkit/tests/test_molecule.py::TestMolecule::test_chemical_environment_matches_OE) is failing for reasons that aren't obvious to me.

@j-wags (Member) left a comment

Thanks, @mattwthompson. This looks really solid. I left a few comments, mostly non-blocking. The one blocking thing is probably a misunderstanding on my part; let's discuss at our check-in tomorrow.

The only additional thing I'd request is a few sentences of description/migration guide in the release notes. This is the first mention of openff-units in this repo, so it'd be good to give some info: the differences in getting and setting unit-bearing quantities, how to adapt scripts if users are already using OpenMM units, and what openff-units (with a link) and Pint (with a link) are.
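A hedged sketch of the kind of migration snippet such release notes could include - the exact helper names in openff.units.openmm are assumed here, not taken from this PR:

from openff.units import unit
from openmm import unit as openmm_unit

# openff-units quantities are Pint quantities
bond_length = 1.52 * unit.angstrom
print(bond_length.m_as(unit.nanometer))  # magnitude in a chosen unit

# Interconverting with openmm.unit quantities, if a script already uses them
# (assumes the from_openmm/to_openmm helpers in openff.units.openmm):
from openff.units.openmm import from_openmm, to_openmm

omm_quantity = to_openmm(bond_length)                    # -> openmm.unit.Quantity
off_quantity = from_openmm(1.52 * openmm_unit.angstrom)  # -> openff-units Quantity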

@@ -17,8 +17,7 @@ dependencies:
 - coverage
 - numpy
 - networkx
-- parmed # ???
-- mendeleev
+- parmed # ??? what ???
Member

(not blocking) Maybe this was for the example tests (from before they were separated into their own CI)? Feel free to try removing it if you have time to let CI run.

devtools/conda-envs/test_env.yaml (outdated; resolved)
openff/toolkit/topology/molecule.py (outdated; resolved)
openff/toolkit/topology/molecule.py (outdated; resolved)
@@ -1306,7 +1306,8 @@ def create_openmm_system(self, topology, **kwargs):
# This means that even though virtual sites may have been created via
# the molecule API, an empty VirtualSites tag must exist in the FF
for atom in topology.topology_atoms:
system.addParticle(atom.mass)
# addParticle(mass.m_as(unit.dalton)) would be safer but slower
Member

(blocking) Would the explicit units be substantially slower? Like, more than 1 second for 100,000 atoms? And won't atom.mass.m involve a conversion to OpenMM units anyway? I'd feel way safer if the conversion were explicit. I'm probably misunderstanding something - let's touch base on this at our check-in this week.

Member Author

Surprisingly, there is no conversion to an OpenMM quantity; internally the quickest way through this method is to pass it an int. Here is a snippet from the SWIG-generated file I found in .../site-packages/openmm/openmm.py:

    def addParticle(self, mass):
        r"""
        addParticle(self, mass) -> int
        Add a particle to the System. If the mass is 0, Integrators will ignore the particle and not modify its position or velocity. This is most often used for virtual sites, but can also be used as a way to prevent a particle from moving.

        Parameters
        ----------
        mass : double
            the mass of the particle (in atomic mass units)

        Returns
        -------
        int
            the index of the particle that was added
        """

        if unit.is_quantity(mass):
            mass = mass.value_in_unit(unit.amu)


        return _openmm.System_addParticle(self, mass)

So, giving it something that looks like a number is fastest. If it thinks it sees something wrapped with OpenMM units, it will do the necessary conversion.


In [1]: from openff.units import unit

In [2]: from openmm import System, unit as openmm_unit

In [3]: dummy_mass_openff = 1.0 * unit.dalton

In [4]: dummy_mass_openmm = 1.0 * openmm_unit.amu

In [5]: system = System()

In [6]: %timeit system.addParticle(mass=1.0)
592 ns ± 81 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [7]: %timeit system.addParticle(mass=dummy_mass_openff.m)
756 ns ± 136 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [8]: %timeit system.addParticle(mass=dummy_mass_openmm)
4.76 µs ± 1.46 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [9]: %timeit system.addParticle(mass=dummy_mass_openff.m_as(unit.dalton))
17.2 µs ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Quantity.m is fast because it does no unit checking; it just reports the magnitude in whatever units the quantity is currently in. Ensuring that the values are in daltons is slower, but only so much slower than allowing OpenMM to do the analogous check internally.

Of course, the problem here is dealing with masses in other units - not something I've ever run into, but it could be considered. The current implementation will always be in daltons as long as we don't change this upstream. There is no setter on the Atom class, so I don't see how it could get into that state without other changes that would themselves be significant behavior changes.

If we can't assume daltons, I'm having a hard time finding any way around the time it takes to do this conversion. The only middle ground I can think of right now is adding a setter that enforces the units; this would safeguard against similar cases in which we write code without considering the non-dalton case. It has the downside of pushing the cost elsewhere, of course.
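A minimal sketch of that middle ground, assuming a hypothetical mass setter (Atom has no setter today; the names here are illustrative only, not the toolkit's code):

from openff.units import unit

class Atom:
    def __init__(self, mass):
        self.mass = mass  # routed through the setter below

    @property
    def mass(self):
        return self._mass  # always in daltons once set

    @mass.setter
    def mass(self, value):
        # Pay the unit-conversion cost once here, so hot loops can safely use .m
        self._mass = value.to(unit.dalton)

atom = Atom(12.011 * unit.dalton)
print(atom.mass.m)  # 12.011, magnitude already in daltons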

In conclusion, I think the current approach is the best one (if accidentally - I did not think this all through while working on this). It assumes the units, but those are hardcoded in openff-units. It's also virtually as performant as possible, clocking in at roughly ~750 ms for adding 1,000,000 atoms, compared to ~600 ms if everything were already floats and ~4700 ms if passing OpenMM quantities.

Member

Thanks for the detailed response. 10 µs × 100,000 atoms = 1 second, so using explicit units will become a performance bottleneck at some point.

As a compromise, how about we leave this as-is, but make it so that we guarantee that Atom.mass always returns values in daltons/amus? I've added a comment for where we could test that here: https://github.com/openforcefield/openff-toolkit/pull/1182/files#r803188577

Adding the test described above would resolve the blocker here, and then you should feel free to merge. Optionally, it'd be great if you updated the Atom.mass docstring to echo the unit guarantee.
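A hedged sketch of what such a unit-guarantee test could look like - the test name, molecule choice, and assertions are placeholders, not the code actually added in the PR:

from openff.units import unit
from openff.toolkit.topology import Molecule

def test_atom_mass_is_always_in_daltons():
    ethanol = Molecule.from_smiles("CCO")
    for atom in ethanol.atoms:
        # The guarantee under discussion: Atom.mass is a unit-bearing quantity
        # expressed in daltons, so .m is safe to pass straight to addParticle()
        assert atom.mass.units == unit.dalton
        assert atom.mass.m > 0.0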

Member Author

I take "but make it so that we guarantee" to mean we'll add a unit test and document it as such, which sounds like a good idea to me.

Comment on lines +169 to +174
# Is this de-duplicated? i.e. can it be ["C", "C", "H"] or only ["C", "H"] ?
atom_symbol_list: List[str] = cif_entry["_chem_comp_atom.type_symbol"]

# Is this dict a de-duplicated mapping between element symbols and atomic numbers
# for the elements in this cif file? If so, can maybe do it in one step
"""
Member

(not blocking) In the interest of time I'm not going to give feedback on this. I forgot exactly what this function expects and we don't have tests for this file yet. This is not expected to be run by users and so runtime isn't super critical as long as it's under an hour. If you're interested in optimizing/refactoring this, feel free to do so at your discretion. The best test I can suggest would be ensuring that the openff/toolkit/utils/make_*_substructures.py scripts yield the same outputs.

@mattwthompson (Member Author) commented Feb 8, 2022

I'll not tinker with this any more, then - except maybe to suggest that it shouldn't be in the API, and should instead be tucked away in another folder?

Member

I was under the impression that the underscore prepended to the file name communicated the privateness/here-be-dragons-ness of the contents. Though I'm not sure if that's a python convention in reality.

Collaborator

As far as I remember, when we first planned this we didn't want to support a public infrastructure for doing this - only what's "enough" for the toolkit. I think I'm still in favor of that stance, since we know this problem can get pretty complex. Also, it's worth noting that this .py file is expected to be used only once in a while. My 2 cents :). BTW, I'm loving how this is evolving, thanks for the good work!

Collaborator

Though I'm not sure if that's a python convention in reality.

It is :) https://www.python.org/dev/peps/pep-0008/?#public-and-internal-interfaces

Member Author

I'd like to clarify the distinction between two different things. I'm not sure of the right jargon (happy to be told what the terminology is; I can't think of anything but "private API" and "scripts"), so I'll resort to being descriptive:

  1. Code that's for core toolkit functionality and might get called at runtime while interacting with the public API, but is either too dangerous to be explicitly exposed to users or meant to hide away implementation details. Stuff like Molecule._add_trivalent_lone_pair_virtual_site or OpenEyeToolkitWrapper._find_smarts_matches. These code paths have intrinsic value (in the sense that they are the product), should be covered by unit tests, will be included in conda tarballs, and fall under the umbrella of code we're responsible for maintaining/documenting/etc.
  2. Code that's for one-off data cleaning or developer-facing operations, but not a part of a core feature and will not be called by a user (directly or indirectly) at runtime. These files are valuable in large part for what they produce (i.e. big CIF file I don't understand the details of), but aren't covered by unit tests, may or may not be included in conda tarballs, and don't have the same maintenance burden.

What I'm getting at here is that code that cannot be accessed at runtime should be moved to /utilities/ or somewhere else that would not be packaged as part of the Python module. That folder is from before my time, but it looks like those files didn't contribute as much to the maintenance burden.

There's no danger to the user as-is, since it's definitely not part of the public API. But I think functions that are only run once in a while should not live in the core module (though they're no less important, of course).

Member

Hmm, moving it to /utilities/ isn't a bad idea... I'm just thinking about stuff like @connordavel's possible future project with custom polymers, and that having runtime access to the substructure generator from a stable package may enable some important functionality for advanced users. Also that there's a possible future where we do make this public.

So, I understand the arguments about maintenance burden, but overall there's a lot of uncertainty about the future of this functionality, so for now I'd pick "no change" as a course of action while we have so many other things going on.

Member Author

Taking no action here is certainly reasonable given its temporary nature at the moment. 👍

@mattwthompson mattwthompson marked this pull request as ready for review February 8, 2022 22:25
@mattwthompson (Member Author)

The failing test is also failing upstream on #951

openff/toolkit/tests/test_molecule.py::TestMolecule::test_chemical_environment_matches_OE

@mattwthompson (Member Author)

I also can't reproduce the collection errors in the "CI / Test on ubuntu-latest, Python 3.7, RDKit=false, OpenEye=true" job.

@mattwthompson (Member Author)

There are two failures in CI, both of which I believe are unrelated to the changes here:

  1. openff/toolkit/tests/test_molecule.py::TestMolecule::test_chemical_environment_matches_OE - reported upstream, unclear
  2. Tests with RDKit not installed and OpenEye installed are failing to collect the molecule fixtures; I can't reproduce this locally.

I'm going to merge this and we can fix those later.

@mattwthompson mentioned this pull request Feb 10, 2022
@mattwthompson merged commit 1115d25 into topology-biopolymer-refactor Feb 10, 2022
@mattwthompson deleted the minimal-elements branch February 10, 2022 21:38